Stephen Baker

The Numerati

How can terror data mining spare false positives?

October 17, 2008

It's a crucial issue, one I discussed in the book and have been stressing on the speaking rounds: data mining for terrorists runs the risk of nabbing lots of innocent people. Statisticians call them false positives.

I was talking to Eric Siegel about this, and he had an objection. Eric is the president of Prediction Impact, a company that digs through data to understand and predict customer behavior. Eric helped me as I researched the book. His point: false positives come up in all kinds of analysis, and they always have. People working from their guts (and prejudices) have arrested, fired, persecuted, and otherwise mistreated lots of innocent people (false positives) through the ages. Why is data mining any different?

Two differences, I said. First, data mining has tremendous scale. A sloppy predictive model of a potential bomber could implicate literally millions of innocent people. (Even a very smart and careful model would bring in quite a few. It's statistics, after all.) Second, data mining results carry the brand of "science" and "math," and as a result seem more conclusive and harder to contest.
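To put rough numbers on the scale problem, here is a minimal sketch of the standard base-rate arithmetic, in Python. Every figure in it (a screened population of 300 million, 3,000 actual bombers, a model that catches 99% of bombers and correctly clears 99% of innocent people) is a hypothetical assumption chosen for illustration, not data from any real program, and `screening_outcome` is a made-up helper name.

```python
# All inputs are hypothetical, chosen only to illustrate scale.
population = 300_000_000   # people screened (assumption)
actual_bombers = 3_000     # real bombers hidden in that population (assumption)
sensitivity = 0.99         # share of real bombers the model flags (assumption)
specificity = 0.99         # share of innocent people the model clears (assumption)


def screening_outcome(population, actual, sensitivity, specificity):
    """Return (true_positives, false_positives, precision) for one screen."""
    innocent = population - actual
    true_positives = actual * sensitivity
    # Innocent people the model wrongly flags: the false positives.
    false_positives = innocent * (1 - specificity)
    # Chance that any one flagged person is actually a bomber.
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


tp, fp, precision = screening_outcome(population, actual_bombers, sensitivity, specificity)
print(f"flagged bombers:   {tp:,.0f}")
print(f"flagged innocents: {fp:,.0f}")
print(f"chance a flagged person is a bomber: {precision:.2%}")
```

With these invented inputs, the model flags about 2,970 real bombers alongside roughly 3 million innocent people, so fewer than one flagged person in a thousand is actually a bomber. That is the sense in which even a careful model brings in quite a few.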

After our conversation, Eric sent me these two paragraphs:

I agree that there needs to be a cultural shift in understanding and processes so that each predictive score output by a predictive model is treated as a probability relative to limited information, rather than an unrealistically "absolute" probability pertaining to a suspect. That is, the model could score a suspect as 3% likely to be guilty relative to an average of 0.0001% chance across the general population, but that is with respect to the data available, which is always extremely limited relative to what could in principle be known about a suspect or customer, when you consider the extensive set of opinions, moral framework and brain chemistry that determines a person's behavior.
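Eric's two numbers can be compared directly. A minimal sketch, assuming we express the comparison as the ratio of the suspect's score to the population average (what predictive analysts usually call lift; the `lift` function here is only an illustrative helper, not part of any particular tool):

```python
def lift(score, base_rate):
    """How many times more likely than average a scored person is,
    relative only to the data the model had."""
    return score / base_rate


suspect_score = 0.03         # 3%, from Siegel's example
population_rate = 0.000001   # 0.0001%, from Siegel's example

print(f"lift over the population: {lift(suspect_score, population_rate):,.0f}x")
# The same score still leaves most of the probability on innocence.
print(f"chance the suspect is innocent: {1 - suspect_score:.0%}")
```

On these figures the suspect is about 30,000 times more likely than average to be guilty, yet still 97% likely to be innocent, and both numbers hold only relative to the limited data the model saw.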

To help avert unjust treatment of an individual, the process needs to be defined regarding groups of people. A group of people with "scores above 3%" can possibly be vetted in an ethical manner. By seeing them as a group, the investigator will understand that "this room of many people is likely to contain a small number of guilty perpetrators" and thereby be less likely to experience unwarranted suspicion of any individual. This group-oriented aspect of process definition is necessary but not sufficient to be just.
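The group view rests on a simple piece of arithmetic: the expected number of guilty people in a room is the sum of everyone's individual probabilities. Here is a minimal sketch under hypothetical assumptions (1,000 people scored at 3%, 5,000 scored at 0.1%, and a cutoff at 3%); the helper names `vetting_group` and `expected_guilty` are illustrative, not from any real vetting system.

```python
import math


def vetting_group(scores, threshold):
    """Select everyone whose score is at or above the threshold."""
    return [s for s in scores if s >= threshold]


def expected_guilty(group):
    """Expected guilty count in a group: the sum of each member's probability."""
    return math.fsum(group)


# Hypothetical scores: 1,000 people at 3%, 5,000 people below the cutoff.
scores = [0.03] * 1000 + [0.001] * 5000

group = vetting_group(scores, threshold=0.03)
guilty = expected_guilty(group)

print(f"people in the room: {len(group)}")
print(f"expected guilty:    {guilty:.1f}")
print(f"expected innocent:  {len(group) - guilty:.1f}")
```

With these made-up scores, the room holds 1,000 people, of whom about 30 are expected to be guilty and about 970 innocent, which is the picture Eric wants an investigator to keep in mind.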


©2021 Stephen Baker Media, All rights reserved.     Site by Infinet Design
