Bruce Schneier confuses data mining with racial profiling.
Even those who say that terrorists are likely to be Arab males have it wrong. Richard Reid, the shoe bomber, was British. Jose Padilla, arrested in Chicago in 2002 as a "dirty bomb" suspect, was a Hispanic-American. The Unabomber had once taught mathematics at Berkeley. Terrorists can be male or female, European, Asian, African or Middle Eastern. Even grandmothers can be tricked into carrying bombs on board. One problem with profiling is that, by singling out one group, it ignores the other groups. Terrorists are a surprisingly diverse group of people.
The problem with this criticism is that it is backwards. You can do racial profiling without doing data mining. Data mining is a way of doing the opposite of racial profiling. Data mining is a way of finding groups of characteristics that in combination can separate potential terrorists from others. Racial profiling is based on factors that are not well correlated with terrorism.
If data mining were used, we could search fewer Arab Americans and stop more potential terrorists. It is the people who are opposed to data mining who are going to cause the security forces to resort to racial profiling.
If Schneier does not understand this--if he has no clue how Bayesian algorithms work--then his credentials as a security expert are way overblown. If he does understand statistical inference, then he is being a demagogue. Either way, my respect for Schneier has evaporated.
The question is, what is "sufficiently small"? Are you satisfied with current airport screening, where grandmothers traveling with their grandchildren are as likely as anyone else to be searched?
I am not sure how well Bayesian algorithms could work in airport passenger screening. But one example is that during the DC sniper spree, the suspects stole a credit card, and a charge was denied right away because of the statistical profiling used by the credit card companies.
Right now, the credit card companies are better at spotting a suspect than are the law enforcement agencies. That is because the credit card companies use data mining.
Posted by Arnold Kling on October 24, 2003 02:29 PM | Permalink to Comment
Right now, the credit card companies are better at spotting a suspect
No, the credit card companies are better at spotting odd charging patterns which may or may not (likely, not) correspond with terrorist purchasing patterns. If I understand the system correctly, the credit card company looks for something like a shopping spree on a card that is usually only used for gas every week and so on. They're looking for stolen cards... they're not looking for purchases of specific items, as I understand it, nor do I really want them to.
Therefore a terrorist who charges an airline ticket and hotel room isn't likely to raise a red flag if they're using a legitimate card, whereas if you decide to go purchase a bunch of stuff for a new apartment, you might find yourself flagged using this type of system.
I also think you've misconstrued Schneier's comments, he's not equating profiling with data mining -- he's arguing against the idea that "if we had enough data, we could pick terrorists out of crowds." Seems pretty obvious if you read the piece carefully.
Posted by Zonker on October 24, 2003 05:36 PM | Permalink to Comment
As an example of getting it backward, Schneier provides an example that suggests where data mining can improve on "profiling," when he says
"Even those who say that terrorists are likely to be Arab males have it wrong. Richard Reid, the shoe bomber, was British. Jose Padilla, arrested in Chicago in 2002 as a "dirty bomb" suspect, was a Hispanic- American. The Unabomber had once taught mathematics at Berkeley."
That's the point of data mining -- to improve on crude intuition.
It also appears he has little appreciation for how complex systems are developed. It is impossible to know in advance what data and relationships among data are going to produce good predictive performance.
Posted by Dave Sheridan on October 24, 2003 05:40 PM | Permalink to Comment
I think Bruce is arguing that *any* sort of pattern matching will not work, and uses racial profiling as a well-understood example. It seems to me that you have set up a straw man. Okay, so 'race' is not highly correlated with 'being a terrorist'. What is? If the power of Bayesian algorithms can solve the problem, I'd like to see a demonstration of that, or at least a reasonable model, that shows that the numbers of false negatives and false positives are sufficiently small.
Posted by Foolish Jordan on October 24, 2003 02:08 PM | Permalink to Comment
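The question running through this thread, whether the numbers of false positives and false negatives can be "sufficiently small," can at least be framed with Bayes' rule. What follows is a minimal sketch, not a model of any real screening system: the function names and every rate in the usage lines are hypothetical, chosen only to show how the answer depends on the assumed base rate and error rates.

```python
def posterior_given_flag(base_rate, true_positive_rate, false_positive_rate):
    """Return P(target | flagged) using Bayes' rule.

    base_rate:            assumed fraction of the population that are targets
    true_positive_rate:   assumed P(flagged | target)
    false_positive_rate:  assumed P(flagged | not target)
    """
    # Total probability of being flagged, over targets and non-targets.
    p_flagged = (true_positive_rate * base_rate
                 + false_positive_rate * (1.0 - base_rate))
    return true_positive_rate * base_rate / p_flagged


def expected_errors(population, base_rate, true_positive_rate, false_positive_rate):
    """Return the expected counts of false positives and false negatives."""
    targets = population * base_rate
    non_targets = population - targets
    return {
        "false_positives": non_targets * false_positive_rate,
        "false_negatives": targets * (1.0 - true_positive_rate),
    }


# Hypothetical inputs: one target per million travelers, and a classifier
# that catches 99% of targets while wrongly flagging 1% of everyone else.
p = posterior_given_flag(1e-6, 0.99, 0.01)
errors = expected_errors(1_000_000, 1e-6, 0.99, 0.01)
print(f"P(target | flagged) = {p:.6f}")
print(errors)
```

Under these made-up inputs, nearly everyone flagged is a false positive, because the assumed base rate is so low. Different assumptions about the base rate, or a classifier with a much lower false positive rate, change the result considerably, which is why the disagreement above turns on what rates are actually achievable.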