Schneier on Security
A blog covering security and security technology.
August 10, 2007
Police Data Mining Done Right
It's nice to find an example of the police using data mining correctly: not as security theater, but more as a business-intelligence tool:
When Munroe took over as chief two years ago, his department was drowning in crime and data. Police had a mass of data from 911 calls and crime reports; what they didn’t have was a way to connect the dots and see a pattern of behaviour.
Using some sophisticated software and hardware they started overlaying crime reports with other data, such as weather, traffic, sports events and paydays for large employers. The data was analyzed three times a day and something interesting emerged: Robberies spiked on paydays near cheque cashing storefronts in specific neighbourhoods. Other clusters also became apparent, and pretty soon police were deploying resources in advance and predicting where crime was most likely to occur.
Posted on August 10, 2007 at 6:51 AM
• 35 Comments
This was also done right because the non-public data was given, not taken, assuming the crime victims reported the events.
It'll be interesting to see whether future analyses show that the police actions were successful. Also, I second davez' statement above.
I am very happy you posted this. I have often said I want to see examples of how this great technology can be used to further positive goals, and not just slam it for being used as part of security theater.
So, again I appreciate it. Great read, and let's keep educating so people demand more from their Congress.
A key factor is that the police in Richmond were mining *their own data.* They weren't mining credit histories or telephone records or medical studies. They were looking at criminal reports and conditions within the city. In short, they were doing their job! What a novel concept....
In this case, data mining can be used in a predictive fashion because the underlying system dynamics are periodic, repeating at paycheck day in specific places - all of this at a known, relatively high rate. Thus the Richmond police dept. is predicting equilibrium states and tracking their changes, instead of trying to predict anomalies. The latter is always much harder to do with good false positive/negative rates.
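The distinction drawn above between tracking a periodic baseline and hunting for anomalies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the daily counts are invented, the cycle length of 4 is chosen for brevity, and the function names are hypothetical, not taken from any software the Richmond police used.

```python
def periodic_baseline(counts, period):
    """Average count at each phase of the cycle (the 'equilibrium' pattern)."""
    baseline = []
    for phase in range(period):
        same_phase = counts[phase::period]  # every observation at this phase
        baseline.append(sum(same_phase) / len(same_phase))
    return baseline


def flag_anomalies(counts, period, threshold):
    """Return indices whose count deviates from the periodic baseline by more than threshold."""
    baseline = periodic_baseline(counts, period)
    return [
        i for i, count in enumerate(counts)
        if abs(count - baseline[i % period]) > threshold
    ]


# Invented data: three cycles of four days, with a spike on the last day of each cycle.
DAILY_ROBBERIES = [2, 1, 1, 6, 2, 1, 1, 7, 2, 1, 1, 5]

baseline = periodic_baseline(DAILY_ROBBERIES, period=4)
unusual_days = flag_anomalies(DAILY_ROBBERIES, period=4, threshold=0.5)
```

The baseline is stable and easy to estimate because the spike recurs every cycle. The flagged days are the small departures from it, which is where false positives and negatives become hard to control.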
Repeating some of the above, but the police were not trying to determine who could be a criminal, but where crime might be committed. This might push some crime off to other, less likely venues, but has at its base preventative policing.
Any of these "patterns" could(should) have been perceived by a dimwit cop ... only a nitwit needed a sophisticated software to correlate the obvious .. and Bruce heaving praise to make more nitwits blow their township budget on useless software.
I think anyone who has dealt with real data on behaviour as complex as criminal activity, knows that there is a lot of value in testing whether the "obvious" is actually true. When you have hundreds of thousands of data points, this is likely to require some kind of software.
For example the article mentions cheque-cashing stores, and that crime rises near them on payday "in specific neighbourhoods". This sounds obvious, except that it isn't obvious which neighbourhoods it will happen and which it won't.
The "dimwit cop" who uses the software knows which stores have a provable record of nearby crime, and of those which are the worst. You just have a vague hunch that maybe you should send more cops to areas near cheque-cashing stores, of which there may be a few hundred in your city. Good luck picking the right ones.
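To make the point about picking the right stores concrete, here is a minimal Python sketch that ranks stores by how much more often robberies occur nearby on paydays than on other days. Everything in it is an assumption for illustration: the store names, the incident list, the paydays, and the 30-day month are invented, and the ratio used is a simple rate comparison, not the method described in the article.

```python
from collections import defaultdict

# Hypothetical incident records: (store_id, day_of_month). Not real data.
ROBBERIES = [
    ("store_a", 15), ("store_a", 15), ("store_a", 30), ("store_a", 7),
    ("store_b", 3), ("store_b", 15), ("store_b", 22),
    ("store_c", 15), ("store_c", 30), ("store_c", 30), ("store_c", 15),
]
PAYDAYS = {15, 30}   # assumed paydays for a large local employer
DAYS_IN_MONTH = 30   # simplification for the sketch


def payday_lift(robberies, paydays, days_in_month):
    """Return {store: per-day robbery rate on paydays divided by the rate on other days}."""
    counts = defaultdict(lambda: [0, 0])  # store -> [payday count, other-day count]
    for store, day in robberies:
        counts[store][0 if day in paydays else 1] += 1

    n_pay = len(paydays)
    n_other = days_in_month - n_pay
    lift = {}
    for store, (on_pay, off_pay) in counts.items():
        pay_rate = on_pay / n_pay
        other_rate = off_pay / n_other
        # A store with no off-payday robberies gets an infinite ratio.
        lift[store] = float("inf") if other_rate == 0 else pay_rate / other_rate
    return lift


def rank_stores(lift):
    """Stores ordered from strongest to weakest payday effect."""
    return sorted(lift, key=lift.get, reverse=True)


lift = payday_lift(ROBBERIES, PAYDAYS, DAYS_IN_MONTH)
ranking = rank_stores(lift)
```

With a few hundred stores and real report volumes, the same comparison would need a significance check before anyone redeployed patrols on the strength of it.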
There are three steps to every change in police function.
1) notice where a change needs to be made. This is where your "nitwit" cops come in. They know what is going on in the area and know things need to change.
2) Find a solution to the problem. Many times this means more of your "nitwit" cops in a certain location at a certain time.
3) Get approval to re-appropriate or gain additional funding. This is where the "nitwit" cops have bosses who need to be convinced.
Step three is the hardest part, because you need to convince people who sit at a desk every day that what you need is more important than what their buddy down the hall needs. Many times your word will not be good enough. This is where the data mining comes in to prove to the bureaucrats and number crunchers what you know, using irrefutable data.
I don't like to write long posts .. but you make my point .. this software is being used as a crutch .. to convince others that the decisions of police have justification on data .. whether it's for more resources or for even patrolling certain areas heavily.
Technology has ZERO value compared to what a dimwit cop SHOULD do .. that's his freaking job.
Relying on useless technology like this has an additional risk .. that of diminishing the value of human deduction, and it will have far more serious long-term effects
@SteveJ .. you make no sense at all
"I am very happy you posted this. I have often said I want to see examples of how this great technology can be used to further positive goals, and not just slam it for being used as part of security theater."
I know. I'm happy to post this, too. I want more examples of data mining done right. My usual example -- finding fraudulent patterns in credit card spending or phone calling cards -- is wearing thin.
Patience, grasshopper. Be a little less quick on the trigger.
Richmond VA is far from a "township".
In a large city environment it may be obvious to the cops who respond to the same locations every payday, but not to the higher-ups who allocate and deploy resources (unlike your rural townships where the same "nitwit" is both Chief and patrol officer).
Meanwhile the city cops on the front lines may be too busy responding to calls to even notice the patterns, and certainly aren't in a position to convince the powers-that-be. That's because the pressure not to waste taxpayer dollars keeps patrol rosters small and heavily utilized, another example of penny-wise, pound foolish public policy.
These are issues introduced by scale and specialization, which consistently cause effects in large scale complex systems contrary to intuition based on smaller and simpler situations.
Note that for this sort of thing, it does not need to be personally identifying data.
"Technology has ZERO value compared to what a dimwit cop SHOULD do."
Cop instincts work in movies, this is real life. The dimwit usually talks out of his ass, just what kind of definition of dimwit are you using....... Oh I see
@Fraud Guy, "This might push some crime off to other, less likely venues, but has at its base preventative policing."
The great thing is that it will also allow recognizing when that displacement has occurred, enabling the police to adjust resources rather than continuing to protect against a deprecated threat.
It also provides a good foundation for causal analysis. Next step in mining the data might be to look for correlations with hypothetical risk factors (site lighting, traffic patterns, neighborhood demographics) and especially negative correlations that would allow predicting where displacement might go, and making environmental changes that would discourage criminal activity.
We are seeing the value of having software replace (or be put in place of) the human mind in the world's capital markets. And, in a somewhat different sense, in the killing fields of Iraq.
This stuff just finds the intuitively obvious when it WORKS. When it finds otherwise, you better look again, real carefully: the algorythm (AL GORE has no rhythm) is likely wrong, or misapplied.
The danger of "obvious" is twofold. First, many things that seem obvious end up being false. Second, truth is sometimes only obvious after you get a hint.
Obvious is, effectively, an emotion felt that indicates "makes sense" at an intuitive level. It's reflective, not predictive.
What I see here is a use of "datamining" in decision support. Management is making use of the data so that it can use its resources more effectively. Yes, a lot of the knowledge would be available in the individual cops, but how can "headquarters" compare the problems Cop A sees in district X with the problems reported by Cop B in district Y? Using data to fight robbery in district X first (until that is "solved") and then switch resources to drug trafficking in Y... while keeping an eye on the robbery statistics.
It all is proven statistics and proven resource management, contrary to collecting random heaps of information and trying to predict who will become terrorists. That system didn't work in East Europe; any good statistician should be able to tell you why.
Bruce's article is simply pointing out a use of data mining which provided a positive benefit, without the "business as usual" trampling upon our privacy. With that said, it's a bit like identifying a slaughterhouse which has chosen to pamper and massage one cow while butchering all the rest. Perhaps software could be better used to track the crimes committed by police departments against citizens, including: creation of probable cause where none actually exists to obtain warrants; use of new spying technologies which have not been approved by any legislative body, e.g. portable millimeter wave cameras (camero-tech.com); etc., etc., to reveal law enforcement's new role as "the thieves of liberty and privacy". I would find that report to be far more interesting. I wonder what the odds are of any police department obtaining THAT software? Slow down boys, don't all rush in at once!
"i.e portable milimeter wave cameras (camero-tech.com)"
I believe the correct application of tin foil can work here.... i.e. a tinfoil hat
What we see here is what is usually called 'intelligence-led policing', a concept that is particularly used in, for example, the UK and the Netherlands. And indeed you do not need to violate citizens' privacy for that. Good to have a positive example from the US.
Btw, don't credit the software too much, some human interpretation (expertise) of both input and output remains necessary.
Indeed, the dim-witted ones can detect patterns in their environment; that is what humans are good at. The big problem that data mining helps solve is choosing the more significant patterns from the insignificant patterns. Examples of obvious (in retrospect) patterns that were previously unrecognized make for good quotes in the article, however, data mining is useful for discovering unexplained patterns in large, complicated data such that even the dim-witted can help with the explanation. Using the statistics of data mining, you can optimize your allocation of effort towards significant risks, not fantasy security theatre.
Guarding the 24 hour donut shop because it's open and has a cash register might be an obvious, easy beat, but you might not want all your cops doing that.
Vancouver BC used data mining to target car thieves. The thieves couldn't just move to other areas once they had been caught. Thieves are creatures of habit so data mining shows where they prefer to strike again.
I think you are missing the bigger picture, have a very simplistic view of these efforts, and are making a hasty judgement without thinking this through.
The "nitwit" cop only knows what's happening in his/her particular field of view (patrol area). But s/he usually doesn't have the bigger picture and can't see patterns of movement, such as the concentration of crimes shifting over the course of a month to different areas of a large city, or patterns that only emerge when larger geographic areas or timeframes are used.
The data mining could also show a singular crime, such as sexual assault, occurring in certain similar areas and help piece together the MO or other common info that would help police determine if the cases were unrelated or if there was a serial criminal involved, and concentrate their efforts accordingly. In fact, a police officer from Vancouver did develop just such software and it has been used successfully on numerous occasions to locate the area in which the perpetrator lived -- it was very sophisticated data mining as it used not only crime locations but other environmental factors (type of area, ages, weather, etc.) to connect dots no one even knew existed.
"GangNet gives law enforcement officials a tool to identify individuals, vehicles, tattoos, gang symbols, and locations, and to facilitate work on gang-related cases. Additional features, such as mapping and facial recognition, are continuously being added to the system. Law enforcement officials at all levels in many states and in Canada use GangNet to aid in identifying, locating, and apprehending gang members engaged in a variety of crimes. In addition to collecting information on gangs, GangNet can be used to identify and track illicit groups, outlaw motorcycle gangs, or other criminal organizations. Investigators can also use the solution’s collaboration and information-sharing capabilities across jurisdictions, improving their core intelligence functions."
You might as well tattoo caught on your arm along with your arrest number. Being in a gang increases the chances of being caught doing a crime. There are plenty of dumb criminals who don't know this fact.
"Robberies spiked on paydays near cheque cashing storefronts in specific neighbourhoods. Other clusters also became apparent, and pretty soon police were deploying resources in advance and predicting where crime was most likely to occur".
I would have thought that robberies being likely to happen near cheque cashing facilities on paydays might just be the kind of thing that an old-fashioned policeman could figure out for himself. Maybe they need a bit less state-of-the-art computer equipment and a bit more plain thinking.
This fascination with computers is what brought the CIA down from being an average, mediocre spy service to its present wretched state. As Heinlein memorably put it in a slightly different context, "If you load a mud foot down with a lot of gadgets that he has to watch, somebody a lot more simply equipped--say with a stone ax---will sneak up and bash his head in while he is trying to read a vernier".
"I would have thought that robberies being likely to happen near cheque cashing facilities on paydays might just be the kind of thing that an old-fashioned policeman could figure out for himself."
After the 3rd or 4th time it would be fairly clear to the average 12 year old kid. Data Mining speeds it up so you don't need the policeman or the 12 year old kid, just a faster computer that can do all the thinking. Anybody can foul things up, to really create a mess out of things you need a computer.
For all those who mention that any cop on the street should be able to do this, my question is "then why didn't they do something about it?".
The beat cop could easily change his/her patrol pattern to have been in those places when needed. Obviously, the beat cop you are giving so much credit to either wasn't able to notice these patterns, or if they did they weren't motivated to make use of the information and take action until forced to by the computer reports.
then why didn't they do something about it?
Maybe they were about to crack the case and wanted more money budgeted, so they used the data to prove the money would be well spent to reduce crime. Maybe not, I wasn't there. The goal of police bureaucracy is to keep things running smooth. That's fine. The other goal is to increase the power of the bureaucracy. Computers are great bureaucracy builders and reduce paperwork. I'm sure the police love and want more paperwork. Look at the FBI VCF system. Less paperwork, more congressional testimony that generates, you guessed it, more paperwork.
It might be a nice example, but the article paints a prettier picture of the results than what those of us in the area see every day.
"I would have thought that robberies being likely to happen near cheque cashing facilities on paydays might just be the kind of thing that an old-fashioned policeman could figure out for himself."
Yes, any old cop could say "I bet robberies near cheque cashing outlets are connected to paydays."
But any old cop could not easily determine precisely which cheque cashing outlets had their associated robbery rates most closely associated with which companies' paydays.
This type of analysis was the real secret behind NYPD success in the 90's. All the zero tolerance stuff and community based stuff was minor compared to intelligent deployment based on crime stats. Sort of a no-brainer nowadays...
It is important to separate the issues: Data mining (statistical analysis), as such, has never been the threat to privacy that data gathering has. If civil liberties and personal privacy are to be protected, then it is the gathering and sharing of data by government and businesses which needs to be controlled and monitored. Once an entity has private data, data mining (again: statistical analysis) is almost an afterthought.
As a recognized law enforcement technology expert and author, I'd like to clarify several things I've read in these comments.
Data mining has its place for police departments that have enough frequency of crime to try and perform predictive analyses of patterns. Many mid- to small-sized agencies do not.
Officers cannot sit and park for hours at a time waiting for a crime to happen when they are required to be on patrol and respond to calls.
More and more agencies are hiring crime analysts to trend and disseminate information, but that requires a large amount of time to constantly track patterns and put that information out in a format that is quickly and easily read so that it is of value. By the time our crime analyst gets through putting one bulletin out, she has to get ready for the next.
Data mining software is expensive and generally ties to the agency's Records Management System. It's all about GIGO once again. Many departments cannot justify the cost of data mining software or its constant need to be updated. Data mining software is only as good as the quality of data put into the police RMS system. It will not work if an officer codes a simple assault as a criminal mischief. That human factor still plays a big part in the effectiveness of any of these tools. Quality control is the bane of many, many agencies who lack the staff and expertise to review every single report for the minutiae that need to be reviewed. I'm always leery of anyone's claim that they analyzed their agency's crime trends until they can prove to me that they have a handle on quality control first and foremost.
Law enforcement technology has evolved greatly over the last decade, but unfortunately, funding gaps still keep most departments woefully behind.
Smaller agencies usually have no IT staff but still rely on the officer who has a penchant for computers.
Some dislike data being collected, but when the 911 call pops up showing that the homeowner has a serious heart condition and there's no response on a callback, it really is taken much more seriously. Like it or not, that data really can save lives.
Law enforcement technology is not mainstream business technology. It's shadowed by laws and standard operating procedures about how information can be used and shared. Law enforcement technology is still mostly understaffed, underfunded and always on the verge of obsolescence.
Author, THE BLACK WIDOW AGENCY
A number of commenters have mentioned the supposedly high cost of data mining software. This is largely a myth.
True, at the high end of "turn-key solutions" you can shell out 6 figure sums, but it isn't necessary to go nearly so far.
At a previous job a colleague and I did some very effective and useful data mining using existing GIS software, some (free) GPL data mining libraries, a week of evenings reading about the theory, and a couple of hours gluing it together with Perl.
If you want something a bit more "turn key" you can expect to pay from $1000 to $5000 for entry level "solutions", but anything that doesn't require any coding at all will probably be of limited functionality as the problem domain is so broad.
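As a rough idea of how little glue code a first pass can take, here is a minimal Python sketch of grid-based hotspot counting. It is not the Perl code or the GPL libraries mentioned above. The coordinates are invented, the cell size is arbitrary, and `grid_hotspots` is a hypothetical helper name.

```python
from collections import Counter


def grid_hotspots(points, cell_size, top_n=3):
    """Bin (x, y) incident coordinates into square cells and return the busiest cells."""
    cells = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    return cells.most_common(top_n)


# Invented incident coordinates in arbitrary map units.
INCIDENTS = [
    (0.2, 0.3), (0.8, 0.1), (0.5, 0.9),   # three incidents in cell (0, 0)
    (3.1, 4.2), (3.7, 4.9),               # two incidents in cell (3, 4)
    (9.0, 1.0),                           # one incident in cell (9, 1)
]

hotspots = grid_hotspots(INCIDENTS, cell_size=1.0, top_n=2)
```

A real deployment would take coordinates from the GIS layer and would likely use a proper clustering method, but a count per cell is often enough to see whether the data is worth a deeper look.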