Stephen Baker

The Numerati

Keeping count of people (and things)

June 15, 2010

I learned while researching The Numerati that there are 11 different Chinese spellings of Osama Bin Laden. (Maybe it's up to 12 or 13 by now.) So if the quants at the National Security Agency were attempting to monitor Chinese Web traffic about the Al Qaeda leader, their computers would have to recognize all of these spellings and group them as references to the same man.
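A minimal Python sketch of that grouping step, under stated assumptions: it relies on a hand-curated alias table (the hypothetical `ALIASES`) and uses Latin-alphabet variants of the name as stand-ins, since the post doesn't list the Chinese spellings. Real systems would add transliteration rules and fuzzy matching; this only shows the idea of mapping many surface forms onto one key.

```python
import unicodedata
from collections import defaultdict

# Hypothetical alias table: normalized variant spellings -> one canonical entity ID.
# The entries are illustrative stand-ins, not the spellings the NSA actually tracks.
ALIASES = {
    "osama bin laden": "ENTITY-1",
    "usama bin ladin": "ENTITY-1",
    "usama bin laden": "ENTITY-1",
    "osama bin ladin": "ENTITY-1",
}


def normalize(name: str) -> str:
    """Lowercase, strip accents, treat hyphens as spaces, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().replace("-", " ").split())


def group_mentions(mentions: list[str]) -> dict[str, list[str]]:
    """Group raw mentions under a canonical ID; unknown names get their own group."""
    groups: dict[str, list[str]] = defaultdict(list)
    for mention in mentions:
        key = normalize(mention)
        groups[ALIASES.get(key, key)].append(mention)
    return dict(groups)


if __name__ == "__main__":
    print(group_mentions(["Osama Bin Laden", "Usama bin Ladin", "osama  bin-laden", "Jeff Jonas"]))
```

Here the first three mentions land in one group and "Jeff Jonas" stays separate. The alias table does the real work; normalization just keeps trivial differences in case, accents and spacing from defeating the lookup.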

At the same time, I share a name with a prominent author who wrote best-selling books such as How to Live with a Neurotic Dog. Smart systems have to figure out that we're not the same person. (This, of course, is a huge issue for thousands of people whose names condemn them to no-fly lists.)

Counting sounds easy, but one of the toughest challenges in digging through unstructured data is coming up with accurate counts of people and other entities. Jeff Jonas has written a thoughtful blog post and article on this. He writes:

it is essential to understand the difference between three transactions carried out by three people versus one person who carried out all three transactions.  Without the ability to determine when entities are the same, it quickly becomes clear that sensemaking is all but impossible....I find most organizations have underestimated this principle: If a system cannot count, it cannot predict.
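Jonas's point about counting can be illustrated with a small Python sketch. It assumes a hypothetical matching rule: two records belong to the same entity if they share a strong identifier (here a `phone` or `email` field), and a name alone never triggers a merge. Records are linked with a union-find structure, so matches chain transitively. This is a toy rule for illustration, not Jonas's method; real entity-resolution systems weigh many more features and handle conflicting evidence.

```python
def count_entities(records: list[dict[str, str]], keys: tuple[str, ...] = ("phone", "email")) -> int:
    """Count distinct entities, merging records that share a value in any field of `keys`.

    Hypothetical rule: shared phone or email means same entity; names are never matched on.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    # Remember the first record seen for each (field, value) pair and link later matches to it.
    first_seen: dict[tuple[str, str], int] = {}
    for idx, record in enumerate(records):
        for key in keys:
            value = record.get(key)
            if not value:
                continue
            if (key, value) in first_seen:
                union(idx, first_seen[(key, value)])
            else:
                first_seen[(key, value)] = idx

    return len({find(i) for i in range(len(records))})


if __name__ == "__main__":
    # Three transactions, three spellings of a name, one shared phone number.
    one_person = [
        {"name": "Stephen Baker", "phone": "555-0100"},
        {"name": "Steve Baker", "phone": "555-0100"},
        {"name": "S. Baker", "phone": "555-0100"},
    ]
    # Two transactions by different people who happen to share a name.
    two_people = [
        {"name": "Stephen Baker", "phone": "555-0100"},
        {"name": "Stephen Baker", "phone": "555-0199"},
    ]
    for label, records in (("one_person", one_person), ("two_people", two_people)):
        naive = len({r["name"] for r in records})  # counting distinct name strings
        print(label, "naive:", naive, "resolved:", count_entities(records))
```

Counting distinct names gets both cases wrong (3 instead of 1, and 1 instead of 2), which is exactly the failure Jonas warns about: any prediction built on those counts inherits the error.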


©2022 Stephen Baker Media, All rights reserved.

