Stephen Baker

The Numerati

One of the Numerati goes to work for Obama
March 9, 2012 - Hop Skip Go


I remember talking to Rayid Ghani about "barnacles." Those are the shoppers who go to extraordinary lengths to buy only things on sale. Ghani, a researcher at Accenture Labs, told me that stores could identify likely barnacles in their data, and then perhaps take measures to "fire" them, i.e., send them shopping elsewhere. Barnacles, after all, cost money.

At the time, Ghani and his team were analyzing loads of supermarket data, and trying to figure out how to lead shoppers toward the items they'd most likely buy (and others the supermarket wanted to get rid of). This research would go into the "Shopper" chapter in The Numerati, in which Ghani was the lead character.

Now, as the NYTimes reports, Ghani is chief scientist for the Obama campaign. The mission is to unearth different tribes of voters from the analysis of data, and then figure out the best way to "optimize" them, as donors, organizers, or just plain voters. If Ghani had made this switch earlier, I could have featured him in the Voter chapter. But that's one of the hallmarks of the Numerati. Their skills enable them to switch from one field to the next.


Big Data and math
February 12, 2012 - Hop Skip Go

Steve Lohr has a good round-up of Big Data trends in the Times. It has very similar themes to The Numerati and to Ian Ayres' SuperCrunchers.

In fact, reading the story reminded me of my BusinessWeek story, Math Will Rock Your World, which led to The Numerati. It argues the same points, but instead of focusing on the subject of the investigation--data--it looks at the tools employed, mathematics and computers (without, you might note, shedding any light on how the mathematicians and computer scientists carry out this work).

As I've mentioned before, Steve Adler, the editor in chief, started the process by telling me to write a cover story on math. So I started interviewing mathematicians. I was learning all sorts of interesting things about encryption and operations research, but I didn't really see the BusinessWeek cover story until I delved into the world of data. In the end, I wrote a story about Big Data--but kept math in the headline.

A few paragraphs from that story:

The world is moving into a new age of numbers. Partnerships between mathematicians and computer scientists are bulling into whole new domains of business and imposing the efficiencies of math. This has happened before. In past decades, the marriage of higher math and computer modeling transformed science and engineering. Quants turned finance upside down a generation ago. And data miners plucked useful nuggets from vast consumer and business databases. But just look at where the mathematicians are now. They're helping to map out advertising campaigns, they're changing the nature of research in newsrooms and in biology labs, and they're enabling marketers to forge new one-on-one relationships with customers. As this occurs, more of the economy falls into the realm of numbers. Says James R. Schatz, chief of the mathematics research group at the National Security Agency: "There has never been a better time to be a mathematician."

From fledglings like Inform to tech powerhouses such as IBM, companies are hitching mathematics to business in ways that would have seemed fanciful even a few years ago. In the past decade, a sizable chunk of humanity has moved its work, play, chat, and shopping online. We feed networks gobs of digital data that once would have languished on scraps of paper -- or vanished as forgotten conversations. These slices of our lives now sit in databases, many of them in the public domain. From a business point of view, they're just begging to be analyzed. But even with the most powerful computers and abundant, cheap storage, companies can't sort out their swelling oceans of data, much less build businesses on them, without enlisting skilled mathematicians and computer scientists.

The rise of mathematics is heating up the job market for luminary quants, especially at the Internet powerhouses where new math grads land with six-figure salaries and rich stock deals. Tom Leighton, an entrepreneur and applied math professor at Massachusetts Institute of Technology, says: "All of my students have standing offers at Yahoo! and Google." Top mathematicians are becoming a new global elite. It's a force of barely 5,000, by some guesstimates, but every bit as powerful as the armies of Harvard University MBAs who shook up corner suites a generation ago.


Baseball playoffs begin: Moneyball season over
September 30, 2011 - Hop Skip Go

Phillies' pitcher Roy Halladay: Statistics indicate that he could win... or lose

It's time to forget Moneyball and statistical analysis. The 162-game baseball season, the six-month marathon in which statistics have the time to work their magic, is over. As play-offs begin, managers might as well return to their divining rods or the study of patterns on the bottom of their coffee cups. They're entering a season defined largely by unmeasurables such as confidence, feel, and most importantly, luck.

The Phillies-Cardinals match-up Saturday pits Roy Halladay, last year's Cy Young winner, against Kyle Lohse, a career mid-rotation starter. Lohse, over 11 years, has won about as many games as he has lost. Halladay, in 14 years, has won two games for every game he's lost. This year he went 19-6. Still, he lost six games, and could lose another against Lohse. The game might turn on one pitch and the difference of a quarter inch on Albert Pujols' bat. That tiny adjustment turns a high fly ball into a tape-measure homer. Over a long season, Halladay establishes his superiority. In one game, or even a five-game series, throw the stats out the window.

And yet, because of the new Moneyball movie, we're sure to hear every day about the wonders of baseball's quants, led by the prime number-cruncher, Bill James. In fact, it already surfaced in the recent pennant races. In one article, summing up the chances of the fading Boston Red Sox, a Harvard data cruncher named Andrew Mooney surveyed the last ten days of the season and counseled the Sox not to worry:

You’re in a funk, you say. You’ve lost nine of your last eleven, while the Rays have won eight of 10. Actually, this could be just as much a source of comfort as a cause for alarm. Simple probabilities indicate that neither team is likely to continue at such a rate for the remainder of the season; that’s just the nature of streaks. Need evidence? You started the season 2-10.

You’ve got a 10-game homestand coming (winning percentage at Fenway: .592), while the Rays will be away for their next 11 (winning percentage on the road: .557).

My question: What do "simple probabilities" count for when one team is consumed by dread and its rival buoyed by rising confidence? How do you measure the impact of such things? And what are such measurements worth in a sample of only 10 games? Nothing, I'd say. 

By instituting two series of play-offs, Major League Baseball essentially created a second season. While the first is a marathon, in which statistics rule, the second is an eight-team sprint. The best team can win this second season. But for this to happen, it must also be hot and lucky.
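To put rough numbers on the marathon-versus-sprint point, here is a minimal Python sketch. It assumes, purely for illustration, that the better team wins any single game with a fixed probability (0.55 here, a made-up figure) and that games are independent of one another. Those are exactly the simplifications this post questions, so treat the output as a baseline, not a forecast. The function name is hypothetical.

```python
from math import comb


def majority_win_prob(p: float, games: int) -> float:
    """Probability of winning more than half of an odd number of games.

    Assumes each game is won independently with probability p. Playing out
    every game gives the same answer as stopping a best-of series early.
    """
    if games % 2 == 0:
        raise ValueError("use an odd number of games so there are no ties")
    needed = games // 2 + 1
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(needed, games + 1)
    )


if __name__ == "__main__":
    p = 0.55  # hypothetical single-game edge for the better team
    # 161 stands in for a full season because 162 games could end in a tie.
    for games in (1, 5, 7, 161):
        print(f"{games:>3} games: {majority_win_prob(p, games):.3f}")
```

Under these assumed numbers, the better team takes a five-game series only about 59 percent of the time. The same edge becomes far more reliable over a season-length run of games.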

Just one more point about the movie Moneyball. The analysis in the movie boiled baseball down to its essence: The team that scores more runs than its opponents over 162 games will likely wind up on top. But for some reason, the film glided over the biggest factor in this equation: starting pitching. It didn't mention even once that Billy Beane's Oakland A's had the best trio of starting pitchers in the Major Leagues. Barry Zito won the Cy Young Award that year. He was absent from the movie. Mark Mulder and Tim Hudson were magnificent. We got brief glimpses of their uniforms.

So the movie would lead us to believe that the A's won their division that year because they played a converted catcher at first base, swung a smart deal for a left-handed reliever, got rid of a distracting Giambi brother, and urged players to take walks. The A's could have done all of those things, and without their trio of great pitchers, they would have ended up with a losing record. That's why, as The Sconz says: Moneyball is a lie.


You will be monitored, step by step
September 7, 2011 - Hop Skip Go

I think I should have been more emphatic in The Numerati. From the 2020s on, practically every senior in the industrial world will be monitored by sensors in his or her home. That includes me and, if you're not already in your golden years, it includes you. My "Patient" chapter in the book focused on Intel's efforts to monitor seniors in the Portland area, research that also spread into Ireland. National economies desperately need to save money on health care, and developing technology that can monitor seniors and intervene before they get sick, and before they fall, is simply too sensible to pass up.

Now I get this news about a similar initiative in Missouri. One interesting wrinkle is the use of technology originally developed for gaming, Microsoft's Kinect. And they use the depth perception of the camera to view the subjects as silhouettes, which protects a bit of their privacy. You can assume that if you don't reach your 70s until, say, the 2030s, the technology will be remarkably effective and, at the same time, discreet. Then again, once it works well on older people, why not extend it to everyone else? That's the future I see.

MU Researchers Use New Video Gaming Technology to Detect Illness, Prevent Falls in Older Adults from MU News Bureau on Vimeo.


How much data can Asthma inhalers provide?
May 11, 2011 - Hop Skip Go

It seems like the perfect combination for asthma research: inhalers equipped with GPS, so that each use of the medication comes with a time and place tag. Teradata's Paul Barsch cites an Economist article about Asthmopolis, the new tool to track asthma.

The idea is that people can map their own patterns, and come to understand, and hopefully avoid, places and conditions that provoke asthma attacks. And researchers, studying the data from thousands of users, might learn even more.

That's where other variables come in. The GPS data may show that 20 users in Youngstown, Ohio, suffer exacerbations between 9 p.m. and 10 p.m. on a Tuesday night. But how many of them are just traveling through Youngstown in their car? Should they count? And how many of them are spending the evening in close quarters with their cats or dogs? What did people eat? Since each of us is a complex system, the challenge with medical monitoring is to pick up as much detail of the entire life as possible.

The key--at least until English-savvy machines like Watson are on the case--is to get valuable diary data into formats machines can process. Then systems like Asthmopolis could really make a difference, both for individuals and society at large. The other key, as Barsch notes, is to do this in a way that protects people's privacy. None of the medical monitoring will work without that.


Intel casting for more sensor data
April 20, 2011 - Hop Skip Go

My health-care chapter in The Numerati focused on Intel's efforts to mine the minutiae of elderly people's lives. In pilot projects, they wired people's homes with all kinds of sensors, and then crunched petabytes of behavioral data: their walking patterns around the house, the strength of their voices, even the tilt of their bodies in the kitchen. The idea was to puzzle out the patterns of diseases like Parkinson's and Alzheimer's--and hopefully to intervene more quickly. (Here's a New Statesman article with some of the details.)

Now, writes my friend and former colleague, Olga Kharif, Intel is planning to expand research to 10,000 seniors in a $200 million study. However, in the examples Intel's Eric Dishman provides, the intelligence derived from the data is more elementary. Instead of looking at sensors producing rivers of behavioral data, the emphasis now appears to be on using sensors for alerts. This could mean less fun for dataminers. But the simple alerts could pay off more quickly.

For example: You don't need much statistical analysis to gain an insight from one single piece of data: Mom didn't get out of bed this morning. Or Grandad hasn't gone into the kitchen in a week. Or he's been in the bathroom for 4 hours. Valuable data. No machine learning required.
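The three examples above amount to simple threshold rules, and a few lines of Python can express them. This is a sketch under stated assumptions, not a description of Intel's system: the `HomeStatus` fields, the `check_alerts` function, and the 10 a.m. cutoff for "this morning" are all hypothetical, and a real deployment would have to derive these timestamps from raw sensor feeds.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class HomeStatus:
    """Hypothetical summary of what the home sensors last reported."""
    now: datetime
    last_left_bed: datetime
    last_kitchen_visit: datetime
    bathroom_entered_at: Optional[datetime]  # None if the bathroom is empty


def check_alerts(status: HomeStatus) -> List[str]:
    """Apply fixed threshold rules; no statistics or learning involved."""
    alerts = []
    # Rule 1: after an assumed 10 a.m. cutoff, still not out of bed today.
    if status.now.hour >= 10 and status.last_left_bed.date() < status.now.date():
        alerts.append("not out of bed this morning")
    # Rule 2: no kitchen visit for a week.
    if status.now - status.last_kitchen_visit >= timedelta(days=7):
        alerts.append("no kitchen visit in a week")
    # Rule 3: in the bathroom for four hours or more.
    if (status.bathroom_entered_at is not None
            and status.now - status.bathroom_entered_at >= timedelta(hours=4)):
        alerts.append("in the bathroom for four hours")
    return alerts


if __name__ == "__main__":
    now = datetime(2011, 4, 20, 11, 0)
    status = HomeStatus(
        now=now,
        last_left_bed=now - timedelta(days=1),
        last_kitchen_visit=now - timedelta(days=8),
        bathroom_entered_at=None,
    )
    print(check_alerts(status))
```

Each rule reads a single piece of data and compares it with a fixed limit, which is why alerts like these can pay off before any heavier analysis is in place.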

This doesn't mean the end of the original vision, in which computers would spot oncoming diseases, perhaps even before humans knew they had them. But according to Eric Dishman, Intel's lead researcher on the project, Intel can't develop the knowledge and technology alone: “It’s too expensive even for Intel to single-handedly produce the clinical and financial evidence that these technologies detect diseases and lower costs,” Dishman said. “Even competitors need to come together and co-invest.”

In the meantime, I'm betting that middle-aged children will urge their distant parents to install ever more sensor devices in their homes. In the short term, these gadgets will send alarms. But over time, they'll produce the rich rivers of behavioral data that the Numerati feed on.

We drove out to Madison last weekend to visit family, and made stops on the way home in Detroit and State College, Pa. The weather, for the most part, was miserable. But the visits were great. Above is my view from the restaurant of the Edgewater Hotel, in Madison, looking out on Lake Mendota.


Baseball to see new data explosion
April 4, 2011 - Hop Skip Go

In the beginning, there was the hit. Then the strikeout, the RBI, the batting average, the run scored, the win, the loss. This was the first generation of baseball data. It was the universe occupied by dead-ball era players, like Ty Cobb and Honus Wagner. Sometime early on, the first Numerati of the sport started to crunch some numbers. If each pitcher were to go a full nine-inning game, how many runs would he let in? That led to the earned-run average, or ERA. When I was a kid, the ERA was one of the more sophisticated numbers that I memorized from the back of baseball cards (such as the one of Johnny Callison, above, one of my early favorites).
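The arithmetic behind the ERA is simple enough to sketch in Python. The function below is an illustration, not an official scoring routine: it scales a pitcher's earned runs to a nine-inning game, and it takes outs recorded rather than innings so that partial innings need no special notation. The function and parameter names are hypothetical.

```python
def earned_run_average(earned_runs: int, outs_recorded: int) -> float:
    """Earned runs allowed per nine innings (27 outs)."""
    if outs_recorded <= 0:
        raise ValueError("ERA is undefined until the pitcher records an out")
    return 27 * earned_runs / outs_recorded


if __name__ == "__main__":
    # Two earned runs over six innings (18 outs) scales to three per nine.
    print(earned_run_average(2, 18))
```

A pitcher who allows two earned runs in six innings is on pace for three over nine, so the function returns 3.0 for that line.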

The second generation, as Michael Lewis described in Moneyball, came about in the '90s. Number-crunchers started to develop new enhanced statistics, which took on a life of their own at Baseball Prospectus. They brought in loads of new variables, and analyzed correlations. They could calculate, for example, the AEqR, "the number of equivalent runs scored by a team, adjusted for their opponents' pitching and defense."

But now comes the sensor revolution, which will bring to baseball (and the rest of our lives) mountains of new statistics. These, as Ira Boudway writes at Bloomberg, will measure players not by the traditional route--results--but instead by monitoring and measuring their behavior. The new monitoring, already in place at San Francisco's AT&T Park, is called Fieldf/x. Boudway writes:

Fieldf/x is a motion-capture system created by Chicago-based Sportvision. It uses four cameras perched high above the field to track players and the ball and log their movements, gathering more than 2.5 million records per game. That means you could find out whether Ichiro Suzuki truly gets the best jump on fly balls hit into the right-field gap, or if Derek Jeter really deserved that Gold Glove last year.

It's with systems like this that the Numerati establish their hegemony over businesses, including baseball. The reason is that the statistics are so rich and varied that only experts with advanced computer skills can analyze them. Of course, eventually, they build and sell the software to widen the markets to the rest of us. But their systems come to define the game.

In the end, though, they're still stats. While the correlations they find may define the past, there's no guarantee they'll predict the future. Does this mean that the old-fashioned gut will prevail? I'd say no. But the arguments about baseball, about whether Jeter is worth his contract or whether the Phillies should have traded J.A. Happ for Roy Oswalt, will now rage in the realm of enhanced stats. And, of course, they'll never replicate the complexity of the real game, where day dreams, hangovers, and even the appetite of a single mosquito can change the course of a pitch, a catch, a game, and a season. There will always be a push to collect more information, to come up with yet another generation of stats.


SAS: The rich HQ of the Numerati
January 20, 2011 - Hop Skip Go

If you had any doubts about the growth of the Numerati in the economy, consider the results just in from SAS, the privately-held titan of analytics software. In a slow economy, SAS revenue rose 5.2% to a company record $2.4 billion. Of course, it's not all from the recent boom in analytics. SAS has grown every year of its 35-year history. If this company's stock were publicly traded, we'd be reading about it constantly (and not confusing it with a Scandinavian airline).

SAS, based in Cary, NC, also routinely wins "best company to work for" awards, with these perks, according to Fortune:

Its perks are epic: on-site healthcare, high quality childcare at $410 per month, summer camp for kids, car cleaning, a beauty salon, and more -- it's all enough to make a state-of-the-art, 66,000-square-foot gym seem like nothing special by comparison.


Crunching the words of Victorian literature
December 4, 2010 - Hop Skip Go

The Numerati continue their march, now into literature. The NY Times has a story today on literature researchers carrying out statistical analysis of the language used in Victorian fiction and poetry.

a scene from Great Expectations, 1946

The research funds, naturally enough, come from Google. Researchers can use Google's scans of 19th century literature and the company's vast computing resources to parse the patterns of language, including word choice. As more of our history gets scanned, from letters to newspapers, linguists, psychologists, anthropologists, art historians, and more will be able to research the words people used as they hunt for the cultural effects of technological change, the incidence of personal depression, anger, sexual repression... in short, much of what they study now, but in history.

Naturally, there will be debate about whether the scanned words represent true samples of society at that time, and whether their interpretations are tinged with modern prejudices of one kind or another. But that's the nature of research. The point is that vast new possibilities are opening up. They exist as digital data. And the research sugar daddies, even in the humanities, are going to be companies like Google. This is one more example of what I wrote about in the book: The engineers and mathematicians are plowing into the domain of the word.


WSJ: Advertiser tracking on the rise
July 31, 2010 - Hop Skip Go

The Wall Street Journal publishes a report today (behind firewall) on cookies, and the growth of consumer-tracking on major Web sites. For the report, they analyzed big Web sites, including their own, and found that many dropped more than 100 cookies into visitors' computers. (The Journal dumps 60 cookies, slightly below the 64-cookie average on the 50 largest sites.) The only big site that doesn't track visitors is

As a reader (and former editor), I found the Journal story maddeningly vague. It says that cookies are on the rise, but doesn't give any historical context. It mentions data-analysis companies that are doing highly detailed work, but doesn't name them. And while it states what type of analysis they could do with this detailed data, it doesn't give examples of how it's being used. To wit:

"Some tracking files can record a person's keystrokes online and then transmit the text to a data-gathering company that analyzes it for content, tone, and clues to a person's social connections..... Data-gathering companies [can] build personal profiles that could include age, gender, race, zip code, income, marital status, and health concerns, along with recent purchases and favorite TV shows and movies."

Why not name a few of these companies, and, while they're at it, ask advertisers how such detailed profiles are being used? Also, note the use of the word "could" in the last sentence. Is there evidence that these unnamed companies are actually building these profiles? We don't know.

I dealt with these issues often while researching The Numerati. The problem here, as in much of the data economy, is the gap between the astonishingly rich trove of data and the undeveloped business model for it. Most companies simply don't know how to put the data to use. How do you deal with millions of detailed consumer profiles when you only have four or ten or 20 different types of ad campaigns? You ignore most of the details and put the people into enormous buckets. (Credit-card companies are a notable exception. They can create thousands of different offers and test them against different groups. But they've been at this since long before the age of cookies.)

Eventually advertisers will learn to make use of this information, if a privacy uprising doesn't shut cookies down. But for now much of this detail we're communicating with our clicks and keystrokes is piling up in data centers, largely ignored.


