Bruce Schneier
Schneier on Security

A blog covering security and security technology.

August 10, 2007

Police Data Mining Done Right

It's nice to find an example of the police using data mining correctly: not as security theater, but more as a business-intelligence tool:

When Munroe took over as chief two years ago, his department was drowning in crime and data. Police had a mass of data from 911 calls and crime reports; what they didn’t have was a way to connect the dots and see a pattern of behaviour.

Posted on August 10, 2007 at 6:51 AM • 35 Comments

davez • August 10, 2007 7:35 AM
This was also done right because the non-public data was given, not taken, assuming the crime victims reported the events.

Florian Liekweg • August 10, 2007 7:40 AM
It'll be interesting to see whether future analyses show that the police actions were successful. Also, I second davez's statement above.

C Gomez • August 10, 2007 7:47 AM
@Bruce: I am very happy you posted this. I have often said I want to see examples of how this great technology can be used to further positive goals, and not just slam it for being used as part of security theater. So, again I appreciate it. Great read, and let's keep educating so people demand more from their Congress.

Roxanne • August 10, 2007 7:48 AM
A key factor is that the police in Richmond were mining *their own data.* They weren't mining credit histories or telephone records or medical studies. They were looking at criminal reports and conditions within the city. In short, they were doing their job! What a novel concept....

Ale • August 10, 2007 8:24 AM
In this case, data mining can be used in a predictive fashion because the underlying system dynamics are periodic, repeating on payday in specific places, all at a known, relatively high rate. Thus the Richmond police dept.
is predicting equilibrium states and tracking their changes, instead of trying to predict anomalies. The latter is always much harder to do with good false positive/negative rates.

Fraud Guy • August 10, 2007 8:24 AM
Repeating some of the above, but the police were not trying to determine who could be a criminal, but where crime might be committed. This might push some crime off to other, less likely venues, but has at its base preventative policing.

sooth_sayer • August 10, 2007 8:40 AM
Any of these "patterns" could (should) have been perceived by a dimwit cop ... only a nitwit needed sophisticated software to correlate the obvious .. and Bruce is heaping praise to make more nitwits blow their township budget on useless software.

SteveJ • August 10, 2007 8:58 AM
@sooth_sayer: I think anyone who has dealt with real data on behaviour as complex as criminal activity knows that there is a lot of value in testing whether the "obvious" is actually true. When you have hundreds of thousands of data points, this is likely to require some kind of software. For example, the article mentions cheque-cashing stores, and that crime rises near them on payday "in specific neighbourhoods". This sounds obvious, except that it isn't obvious in which neighbourhoods it will happen and in which it won't. The "dimwit cop" who uses the software knows which stores have a provable record of nearby crime, and of those, which are the worst. You just have a vague hunch that maybe you should send more cops to areas near cheque-cashing stores, of which there may be a few hundred in your city. Good luck picking the right ones.

Tynk • August 10, 2007 9:01 AM
@sooth_sayer

sooth_sayer • August 10, 2007 9:12 AM
@Tynk Technology has ZERO value compared to what a dimwit cop SHOULD do .. that's his freaking job. Relying on useless technology like this has an additional risk .. that of diminishing the value of human deduction, and it will have far more serious long-term effects. @SteveJ ..
you make no sense at all

Bruce Schneier • August 10, 2007 9:13 AM
"I am very happy you posted this. I have often said I want to see examples of how this great technology can be used to further positive goals, and not just slam it for being used as part of security theater."
I know. I'm happy to post this, too. I want more examples of data mining done right. My usual example -- finding fraudulent patterns in credit card spending or phone calling cards -- is wearing thin.

guvn'r • August 10, 2007 9:19 AM
@sooth_sayer: Patience, grasshopper. Be a little less quick on the trigger. Richmond VA is far from a "township". In a large city environment it may be obvious to the cops who respond to the same locations every payday, but not to the higher-ups who allocate and deploy resources (unlike your rural townships where the same "nitwit" is both Chief and patrol officer). Meanwhile the city cops on the front lines may be too busy responding to calls to even notice the patterns, and certainly aren't in a position to convince the powers-that-be. That's because the pressure not to waste taxpayer dollars keeps patrol rosters small and heavily utilized, another example of penny-wise, pound-foolish public policy. These are issues introduced by scale and specialization, which consistently cause effects in large-scale complex systems contrary to intuition based on smaller and simpler situations.

greg • August 10, 2007 9:26 AM
Note that for this sort of thing, it does not need to be personally identifying data.
@sooth_sayer Cop instincts work in movies; this is real life. The dimwit usually talks out of his ass. Just what kind of definition of dimwit are you using....... Oh I see

guvn'r • August 10, 2007 9:29 AM
@Fraud Guy, "This might push some crime off to other, less likely venues, but has at its base preventative policing."
The great thing is that it will also allow recognizing when that displacement has occurred, enabling the police to adjust resources rather than continuing to protect against a deprecated threat. It also provides a good foundation for causal analysis. The next step in mining the data might be to look for correlations with hypothetical risk factors (site lighting, traffic patterns, neighborhood demographics), and especially negative correlations that would allow predicting where displacement might go and making environmental changes that would discourage criminal activity.

ForReal • August 10, 2007 9:36 AM
We are seeing the value of having software replace (or be put in place of) the human mind in the world's capital markets. And, in a somewhat different sense, in the killing fields of Iraq.

Andrew2 • August 10, 2007 10:00 AM
The danger of "obvious" is twofold. First, many things that seem obvious end up being false. Second, truth is sometimes only obvious after you get a hint. Obvious is, effectively, an emotion felt that indicates "makes sense" at an intuitive level. It's reflective, not predictive.

MathFox • August 10, 2007 10:01 AM
What I see here is a use of "datamining" in decision support. Management is making use of the data so that it can use its resources more effectively. Yes, a lot of the knowledge would be available in the individual cops, but how can "headquarters" compare the problems Cop A sees in district X with the problems reported by Cop B in district Y? Using data to fight robbery in district X first (until that is "solved") and then switch resources to drug trafficking in Y... while keeping an eye on the robbery statistics. It all is proven statistics and proven resource management, contrary to collecting random heaps of information and trying to predict who will become terrorists. That system didn't work in Eastern Europe; any good statistician should be able to tell you why.
DigitalCommando • August 10, 2007 10:04 AM
Bruce's article is simply pointing out a use of data mining which provided a positive benefit, without violating "business as usual" trampling upon our privacy. With that said, it's a bit like identifying a slaughterhouse which has chosen to pamper and massage one cow, while butchering all the rest. Perhaps software could be better used to track the crimes committed by police departments against citizens, including: creation of probable cause where none actually exists to obtain warrants, use of new spying technologies which have not been approved by any legislative body, i.e. portable millimeter-wave cameras (camero-tech.com), etc., etc., to reveal law enforcement's new role as "the thieves of liberty and privacy". I would find that report to be far more interesting. I wonder what the odds are of any police department obtaining THAT software? Slow down boys, don't all rush in at once!

Anonymous • August 10, 2007 10:42 AM
@DigitalCommando I believe the correct application of tin foil can work here.... i.e. a tinfoil hat

Alex • August 10, 2007 10:46 AM
What we see here is what usually is called 'intelligence led policing'; a concept that is particularly used in, for example, the UK and the Netherlands. And indeed you do not need to violate citizens' privacy for that. Good to have a positive example from the US.

dave X • August 10, 2007 10:57 AM
sooth_sayer, Indeed, the dim-witted ones can detect patterns in their environment; that is what humans are good at. The big problem that data mining helps solve is choosing the more significant patterns from the insignificant patterns. Examples of obvious (in retrospect) patterns that were previously unrecognized make for good quotes in the article; however, data mining is useful for discovering unexplained patterns in large, complicated data such that even the dim-witted can help with the explanation.
Using the statistics of data mining, you can optimize your allocation of effort towards significant risks, not fantasy security theatre. Guarding the 24 hour donut shop because it's open and has a cash register might be an obvious, easy beat, but you might not want all your cops doing that.

CanadianAh • August 10, 2007 11:19 AM
Vancouver BC used data mining to target car thieves. The thieves couldn't just move to other areas once they had been caught. Thieves are creatures of habit, so data mining shows where they prefer to strike again.

Realist • August 10, 2007 11:35 AM
@sooth_sayer The "nitwit" cop only knows what's happening in his/her particular field of view (patrol area). But s/he usually doesn't have the bigger picture and see patterns of movement, such as the concentration of crimes shifting over the course of a month to different areas of a large city, or patterns that only emerge when larger geographic areas or timeframes are used. The datamining could also show a singular crime, such as sexual assault, occurring in certain similar areas and help piece together the MO or other common info that would help police determine if the cases were unrelated or if there was a serial criminal involved, and concentrate their efforts accordingly. In fact, a police officer from Vancouver did develop just such software, and it has been used successfully on numerous occasions to locate the area in which the perpetrator lived. It was very sophisticated datamining, as it used not only crime locations but other environmental factors (type of area, ages, weather, etc.) to connect dots no one even knew existed.
Jim • August 10, 2007 12:21 PM
Another example. You might as well tattoo "caught" on your arm along with your arrest number. Being in a gang increases the chances of being caught doing a crime. There are plenty of dumb criminals who don't know this fact.

Tom Welsh • August 10, 2007 12:42 PM
"Robberies spiked on paydays near cheque cashing storefronts in specific neighbourhoods. Other clusters also became apparent, and pretty soon police were deploying resources in advance and predicting where crime was most likely to occur".
I would have thought that robberies being likely to happen near cheque cashing facilities on paydays might just be the kind of thing that an old-fashioned policeman could figure out for himself. Maybe they need a bit less state-of-the-art computer equipment and a bit more plain thinking. This fascination with computers is what brought the CIA down from being an average, mediocre spy service to its present wretched state. As Heinlein memorably put it in a slightly different context, "If you load a mud foot down with a lot of gadgets that he has to watch, somebody a lot more simply equipped--say with a stone ax--will sneak up and bash his head in while he is trying to read a vernier".

Jim • August 10, 2007 12:51 PM
"I would have thought that robberies being likely to happen near cheque cashing facilities on paydays might just be the kind of thing that an old-fashioned policeman could figure out for himself."
After the 3rd or 4th time it would be fairly clear to the average 12 year old kid. Data mining speeds it up so you don't need the policeman or the 12 year old kid, just a faster computer that can do all the thinking. Anybody can foul things up; to really create a mess out of things you need a computer.

Realist • August 10, 2007 1:03 PM
For all those who mention that any cop on the street should be able to do this, my question is: "Then why didn't they do something about it?"
The beat cop could easily have changed his/her patrol pattern to be in those places when needed. Obviously, the beat cop you are giving so much credit to either wasn't able to notice these patterns or, having noticed them, wasn't motivated to make use of the information and take action until forced to by the computer reports.

Jim • August 10, 2007 1:18 PM
"Then why didn't they do something about it?"
Maybe they were about to crack the case and wanted more money budgeted, so they used the data to prove the money would be well spent to reduce crime. Maybe not, I wasn't there. The goal of police bureaucracy is to keep things running smoothly. That's fine. The other goal is to increase the power of the bureaucracy. Computers are great bureaucracy builders and reduce paperwork. I'm sure the police love and want more paperwork. Look at the FBI VCF system. Less paperwork, more congressional testimony that generates, you guessed it, more paperwork.

Richmond • August 10, 2007 1:24 PM
It might be a nice example, but the article paints a prettier picture of the results than what those of us in the area see every day.

Anonymous • August 10, 2007 5:18 PM
"I would have thought that robberies being likely to happen near cheque cashing facilities on paydays might just be the kind of thing that an old-fashioned policeman could figure out for himself."
Yes, any old cop could say "I bet robberies near cheque cashing outlets are connected to paydays." But any old cop could not easily determine precisely which cheque cashing outlets had their associated robbery rates most closely associated with which companies' paydays.

DBH • August 10, 2007 7:32 PM
This type of analysis was the real secret behind NYPD success in the '90s. All the zero tolerance stuff and community based stuff was minor compared to intelligent deployment based on crime stats. Sort of a no-brainer nowadays...
Predictor • August 11, 2007 5:35 PM
It is important to separate the issues: Data mining (statistical analysis), as such, has never been the threat to privacy that data gathering has. If civil liberties and personal privacy are to be protected, then it is the gathering and sharing of data by government and businesses which needs to be controlled and monitored. Once an entity has private data, data mining (again: statistical analysis) is almost an afterthought.

Felicia Donovan • August 12, 2007 8:07 PM
As a recognized law enforcement technology expert and author, I'd like to clarify several things I've read in these comments. Data mining has its place for police departments that have enough frequency of crime to try and perform predictive analyses of patterns. Many mid- to small-sized agencies do not. Officers cannot sit and park for hours at a time waiting for a crime to happen when they are required to be on patrol and respond to calls. More and more agencies are hiring crime analysts to trend and disseminate information, but that requires a large amount of time to constantly track patterns and put that information out in a format that is quickly and easily read so that it is of value. By the time our crime analyst gets through putting one bulletin out, she has to get ready for the next. Data mining software is expensive and generally ties to the agency's Records Management System. It's all about GIGO once again. Many departments cannot justify the cost of data mining software or its constant need to be updated. Data mining software is only as good as the quality of data put into the police RMS system. It will not work if an officer codes a simple assault as a criminal mischief. That human factor still plays a big part in the effectiveness of any of these tools. Quality control is the bane of many, many agencies who lack the staff and expertise to review every single report for the minutiae that need to be reviewed.
I'm always leery of anyone's claim that they analyzed their agency's crime trends until they can prove to me that they have a handle on quality control first and foremost. Law enforcement technology has evolved greatly over the last decade, but unfortunately, funding gaps still keep most departments woefully behind. Some dislike data being collected, but when the 911 call pops up showing that the homeowner has a serious heart condition and there's no response on a callback, it really is taken much more seriously. Like it or not, that data really can save lives. Law enforcement technology is not mainstream business technology. It's shadowed by laws and standard operating procedures about how information can be used and shared. Law enforcement technology is still mostly understaffed, underfunded and always on the verge of obsolescence.
Felicia Donovan

Harold • August 15, 2007 11:28 PM
Here's an example of citizen data mining of crime data: http://blog.jonudell.net/2007/08/02/...

Roger • August 22, 2007 8:33 AM
A number of commenters have mentioned the supposedly high cost of data mining software. This is largely a myth. True, at the high end of "turn-key solutions" you can shell out six-figure sums, but it isn't necessary to go nearly so far. At a previous job a colleague and I did some very effective and useful data mining using existing GIS software, some (free) GPL data mining libraries, a week of evenings reading about the theory, and a couple of hours gluing it together with Perl. If you want something a bit more "turn key" you can expect to pay from $1000 - $5000 for entry level "solutions", but anything that doesn't require any coding at all will probably be of limited functionality as the problem domain is so broad.
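
As a rough illustration of how little code the payday question raised in this thread might need, here is a minimal Python sketch. It compares each location's robbery rate on paydays against its rate on other days, then ranks locations by the gap. Everything in it is an assumption made for illustration: the function name `payday_rates`, the store labels, the dates, and the incident list are all hypothetical, and nothing here reflects the software or data Richmond actually used.

```python
from collections import defaultdict
from datetime import date


def payday_rates(incidents, paydays, first_day, last_day):
    """Return {location: (payday_rate, other_rate)} in incidents per day.

    incidents: iterable of (date, location) pairs (hypothetical records).
    paydays:   set of dates treated as paydays (an assumption of this sketch).
    """
    total_days = (last_day - first_day).days + 1
    n_pay = sum(1 for d in paydays if first_day <= d <= last_day)
    n_other = total_days - n_pay

    # location -> [count on paydays, count on other days]
    counts = defaultdict(lambda: [0, 0])
    for day, location in incidents:
        if first_day <= day <= last_day:
            counts[location][0 if day in paydays else 1] += 1

    return {
        loc: (pay / n_pay if n_pay else 0.0,
              other / n_other if n_other else 0.0)
        for loc, (pay, other) in counts.items()
    }


# Made-up sample data: one month, three paydays, two locations.
first_day, last_day = date(2007, 6, 1), date(2007, 6, 30)
paydays = {date(2007, 6, 1), date(2007, 6, 15), date(2007, 6, 29)}
incidents = [
    (date(2007, 6, 1), "store_A"),
    (date(2007, 6, 15), "store_A"),
    (date(2007, 6, 15), "store_A"),
    (date(2007, 6, 29), "store_A"),
    (date(2007, 6, 10), "store_A"),
    (date(2007, 6, 3), "store_B"),
    (date(2007, 6, 12), "store_B"),
    (date(2007, 6, 20), "store_B"),
]

rates = payday_rates(incidents, paydays, first_day, last_day)

# Rank locations by how much worse paydays are than ordinary days.
ranked = sorted(rates, key=lambda loc: rates[loc][0] - rates[loc][1], reverse=True)
for loc in ranked:
    pay_rate, other_rate = rates[loc]
    print(f"{loc}: {pay_rate:.2f}/day on paydays, {other_rate:.2f}/day otherwise")
```

A real analysis would also need enough incidents per location for the difference to mean anything, plus the data quality checks Felicia Donovan describes above; this sketch only shows the bookkeeping.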
Schneier.com is a personal website. Opinions expressed are not necessarily those of BT.