Inherent Bias in Recidivism Algorithm

Really good investigative reporting on the automatic algorithms used to predict recidivism rates.

Posted on June 8, 2016 at 1:14 PM • 24 Comments

Comments

Tatütata • June 8, 2016 1:36 PM

Breaking open an air conditioner to steal the tubing? Sheesh, how desperate do you have to be? At least you can’t say it’s easy work.

I tried to find out from the article how the model is trained and whether its designers seriously attempted to keep racist bias out of the equation. The odds of having something pinned on you owe a lot more to your racial and economic background than to the actual circumstances.

If a machine decides your fate, how does this fit with the U.S. Constitution, which provides for judgment by a jury of one’s peers?

Credit scores in Germany suffer a similar problem. They use your zip code, so if you’re in a poorer neighborhood a merchant will be much more likely to get a red light if you live in 12047 than in 14167.

Cues like the first name are used to estimate your age and socioeconomic status. So they decide whether to grant you credit based on whether your name is Gerlinde or Mandy (and of course, Mohammed will be most objectively distrusted…)

Brian M • June 8, 2016 1:50 PM

I can’t help but notice an age discrepancy between the White people and the Black people. The sample in these examples is also very small, so I would suspect some bias in the reporting here. What does the algorithm score on? It may be taking in other factors that could affect the outcome; e.g., they may live in different areas where crime is more common, or be associated with more people who commit larger crimes. Without reading the whole paper and understanding the algorithm, this article smells fishy.

It reminds me of the guy (I don’t remember his name, and don’t even care to) who claims Black people aren’t as smart as White people because of the aggregate SAT and college test scores. He refuses to factor in living conditions and many other background conditions that have been clearly shown to affect those scores. He’s wrong, obviously, but this article (and maybe the research) is doing the same thing.

The right course of action is to keep refining the algorithm to get a more accurate picture of recidivism. These examples may be outliers, exceptions to the rule, or it could be shoddy journalism and poor research. Of course, it could be accurate too.

Andrew • June 8, 2016 1:52 PM

This blog post claims to have gone through the math and found that the data doesn’t support Pro Publica’s claims. To quote:

Bias in the criminal justice system is prevalent. Humans are known to be intrinsically racist, and often irrationally so; is it any surprise that a justice system made up of humans would also be racist?

In contrast, the Cox model has no intrinsic predilection for racism. In addition, according to multiple independent statistical analyses, any racial bias in the Cox model cannot be distinguished from random chance. Does anyone believe that bias in the human justice system is so small that we can’t measure it?

So in short, we’ve taken a flawed and racist human system and replaced it with a much better machine learning system. We’ve reduced the racism in the world. This is great!

Yet in order to sell clickbait, ProPublica has decided to spread dishonest mood affiliation (with no statistics!) criticizing the much better system. Consider the possibility: ProPublica’s anecdotal criticism of the Cox model finds legs, and politicians decide to replace the automated system with a human one. If we do this, racism will increase! That’s probably not what most of the right-thinking anti-racist people outraged by this story expect, yet that’s the inevitable result.

tz • June 8, 2016 3:48 PM

I think the question is whether the Cox model, or whatever the algorithm is (is it a trade secret like the Breathalyzers?), does justice.
Justice is not accuracy or precision.

Any algorithm will have some finite set of inputs to provide the output, so CANNOT by definition take into account all circumstances.
Bias and inaccuracy are different things; however, nowhere has it been shown that the human system was less accurate (even if less consistent), though it is assumed.

The overall evil is wanting computerized silver bullets and adopting them even if they don’t work as well as they ought to, even if they are less accurate or give worse results. First, though, we need to know what we want, and what we will tolerate, when either humans or computers make a mistake.

You can appeal an unfair ruling by a judge or magistrate. But the machine will never be “unfair” in any sense of the word. It just provides a number. So do we wish to give robots the force of law, or, put differently, what accuracy should we demand?

Somewhere it is said that it is better that 99 guilty go free than that 1 innocent be convicted. We’ve already lost that with plea bargaining and excessive bail. But if that is the standard, false “guilty” scores should weigh more heavily than false “innocent” scores.
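tz’s point about weighting false “guilty” scores more heavily can be made concrete with a standard expected-cost argument. This is a hypothetical sketch: it assumes the tool outputs a calibrated probability of reoffending and that the two kinds of mistakes can be given numeric costs; the function names are illustrative and not part of COMPAS or any real system.

```python
def decision_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Probability above which labelling someone 'high risk' minimizes expected cost.

    Flagging a person costs (1 - p) * cost_false_positive in expectation (they would
    not have reoffended); not flagging costs p * cost_false_negative. Flag only when
    the second exceeds the first, i.e. p > C_fp / (C_fp + C_fn).
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def flag_high_risk(p_reoffend: float, cost_fp: float, cost_fn: float) -> bool:
    """Hypothetical decision rule: flag only if the probability clears the threshold."""
    return p_reoffend > decision_threshold(cost_fp, cost_fn)


# Symmetric costs: the threshold is 0.5.
print(decision_threshold(1, 1))    # 0.5
# "Better that 99 guilty go free than 1 innocent be convicted":
# a false "guilty" is weighted 99 times as heavily as a false "innocent".
print(decision_threshold(99, 1))   # 0.99
print(flag_high_risk(0.8, 99, 1))  # False
```

Under that weighting, even an 80% estimated risk would not justify a “high risk” label.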

Daniel • June 8, 2016 3:51 PM

I was not going to comment on this topic until Andrew linked to that nonsense blog post.

The people behind these various models of recidivism claim that the models produce scientific results. So where is their control group? They don’t have one. Where is their randomized sampling? They don’t do it. Where is their research testing that eliminates alternative hypotheses? They don’t do that either.

It’s garbage in, garbage out with lots of numbers to lull the gullible.

If you don’t understand, read this:

http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf

You can also read this:

https://www.apa.org/pubs/journals/features/law-0000012.pdf

In other words, it doesn’t really matter whether the models are racist or not. They are complete bullshit applied to anyone.

Michael P • June 8, 2016 4:09 PM

What makes that piece good investigative journalism? I would expect good journalism to mention the actual recidivism rates for the populations in question. If the actual recidivism rate for blacks is twice as high as for whites (disclaimer: I have no idea what the ratio is), is it racist for a regression model to predict twice as much recidivism among blacks as among whites? Because they don’t mention the actual rates, and because (as mentioned above) their sample size is so small and their sampling methodology undescribed, it comes across more as a propaganda piece than good journalism.

read the analysis • June 8, 2016 4:56 PM

Folks, the actual numbers that a lot of you are screaming for are all buried in the analysis details at:
https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm

In a few cases, you may have to do some basic arithmetic and careful reading to get what you’re after, but it’s really clear from the statistics that this algorithm makes different kinds of mistakes depending on race.
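To see what “different kinds of mistakes” means, here is a minimal Python sketch that computes the false positive rate and false negative rate separately for each group. The records are made up purely for illustration (they are not ProPublica’s numbers), and the function name is hypothetical. In this toy data both groups get the same overall accuracy (5 of 8 correct), yet one group’s errors are mostly false alarms and the other’s mostly misses.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """records: iterable of (group, predicted_high_risk, reoffended) tuples.

    Returns {group: (false_positive_rate, false_negative_rate)}, where
    FPR = share of non-reoffenders labelled high risk, and
    FNR = share of reoffenders labelled low risk.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, predicted, actual in records:
        c = counts[group]
        if actual:
            c["tp" if predicted else "fn"] += 1
        else:
            c["fp" if predicted else "tn"] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["fn"] + c["tp"]
        fpr = c["fp"] / negatives if negatives else float("nan")
        fnr = c["fn"] / positives if positives else float("nan")
        rates[group] = (fpr, fnr)
    return rates


# Made-up toy data: (group, predicted high risk?, actually reoffended?)
toy = (
    [("A", True, False)] * 2 + [("A", False, False)] * 2
    + [("A", True, True)] * 3 + [("A", False, True)] * 1
    + [("B", True, False)] * 1 + [("B", False, False)] * 3
    + [("B", True, True)] * 2 + [("B", False, True)] * 2
)
print(error_rates_by_group(toy))  # {'A': (0.5, 0.25), 'B': (0.25, 0.5)}
```

Overall accuracy alone hides the difference; it only shows up when the errors are split by type and by group.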

qwertty • June 8, 2016 5:10 PM

@Andrew

Where do you get the training data for these algorithms? From the judicial system, which keeps these kinds of records and which, as you point out, has been affected by the racist bias of its employees.
If you train a model on biased data, you get a biased model. Plus some error.
But you also get to put all the blame on the “science”, so hey, who cares, right?

tz • June 8, 2016 5:31 PM

There is another problem: there are 137 questions.

https://www.documentcloud.org/documents/2702103-Sample-Risk-Assessment-COMPAS-CORE.html

How does this NOT violate the defendant’s 4th and especially 5th Amendment rights? Can you be denied bail if you don’t answer? What about not answering truthfully?

How well does it work if questions about “have your friends ever been arrested” are left unanswered?

Meanwhile, many of the questions (Trouble paying bills?) will have a racial bias.

Even here there have been innumerable problems shown with “anonymizing data” where they could still locate you within a very small area (physical or virtual).

In the case of something like insurance, the credit score/rating is used and provides correlation, but that is still a market, so I could prosper by finding better things to base it on.

In the case of incarceration, bail, the justice system, we need to remember CORRELATION IS NOT CAUSATION.

Guilt by correlation?

There are also reverse biases in the system no one wants to talk about. Deadbeat Moms are jailed at 1/8th the rate of Deadbeat Dads (and how do you pay child support when you are in jail?). You can find similar biases in custody. Because it is “reverse” sexism, most people don’t care, or even applaud the discrimination here.

But Justice is based on the Greek Themis (the Roman Justitia), who wears a blindfold and holds honest scales.

Coyne Tibbets • June 8, 2016 6:37 PM

The story is a bit anecdotal with a puny sample size.

That aside, I suspect bias of this type is inevitable. Almost the entire system, from NYC’s “Stop-and-Frisk” program to sentencing, is significantly biased against people of color. The NYC program was a shining example: of the people stopped, crimes were discovered at a much higher rate among whites, and yet the police continued to disproportionately stop people of color.

Blacks: More likely to be stopped for little or no cause, than whites. More likely to be arrested for minor or manufactured crimes (possession, disorderly, resisting, disobeying) than whites. More likely to be prosecuted for similar crimes. More likely to be pressured into pretrial plea; or convicted by a jury. More likely to be sentenced to prison; and for longer sentences.

Given that, for a large set of whites and a large set of blacks, it shouldn’t be particularly surprising that the set of whites would have a lower apparent recidivism rate. It is from these sets that the machine is trained: what else would we expect it to learn about recidivism?

Worse, I suspect the program’s output is judged harshly in the same light. If it kept answering “low” all the time for blacks, how likely would it be to be regarded as reliable by the authorities that use it? So the company that makes it has a motive (it appears to me) for its predictions to be higher for blacks: a program perceived as unreliable won’t be used, or sold.

Given the social environment, I think a bias is inevitable. I certainly would not be surprised.

Mark • June 8, 2016 10:46 PM

The authors kindly made their data and analysis publicly available and their very own Cox model shows pretty much the opposite of what they conclude in the article. Please have a look at the plots [39] and [49] in their analysis they published on github. The analysis of the article is flawed. I assume there’ll be a detailed rebuttal forthcoming from the people that built the system.

Regarding the use of such a system in court for sentencing, I encourage you to read the Supreme Court ruling Malenchik v. Indiana (Number 79S02-0908-CR-365). As for why the company doesn’t just publish its system for all to see, note that a mathematical formula cannot be copyrighted or patented (keeping it a trade secret is the only way to protect their invention in this case).

The system computes a probability of re-arrest (two scores: one for any new arrest, one for an arrest for a violent offense). Just like with car insurance, a high probability does not mean certainty; it just means that drivers with similar characteristics had more accidents. Listing some examples where the prediction was wrong is meaningless. Many of the examples in the article are easily explained: just google “age crime curve”. As Brian M has already noted, there are large age differences between the cases, and young people are more likely to be arrested for a new offense. Surely this is a factor in the risk model, as it’s a well-known fact in the peer-reviewed scientific literature. If you look at the first two cases in the original ProPublica article (Prater / Borden), you find somebody whose major offenses are 20 years in the past compared to somebody who committed four offenses very recently.

So should these systems be used? The psychologist Paul Meehl spent a lot of time on this question in the 60s (before big data) and concluded that even the simplest mathematical models beat trained experts at prediction. Meta-analyses showed that Meehl was right (Grove et al., 2000; White M.J., 2006). Humans suck at making predictions and estimating probabilities, and criminal justice is no different. Isn’t it better to do this stuff with a system where we can actually do performance evaluations and studies to see if it works? If you disagree with this, ask yourself how well the alternatives are working, and think of drug dogs predicting the presence of drugs.

MrC • June 8, 2016 11:13 PM

Pretty obvious astroturfing there guys…

RonK • June 9, 2016 3:41 AM

@ Mark

Yes, of course! I want a “trade secret” to be what decides the fate of citizens in various situations in a democratic country!

OTOH, I suppose, this is so old hat, no? (“no-fly list”, electronic voting machines)

Ugh.

George H.H. Mitchell • June 9, 2016 6:03 AM

Please: it’s not an algorithm; it’s a heuristic.

paul • June 9, 2016 8:20 AM

It gets even worse, because this thing is used (even though, according to the article, the developer doesn’t really recommend it) to make bail and sentencing decisions. And guess what correlates with going to jail in the future? Getting put in jail now. (Because of the job and other opportunity costs of even short stints in jail, because of the monetary costs of posting bond and so forth.)

Also, especially in a world we know is racially biased, it seems that you could start by not targeting the likelihood of re-arrest, because arrest does not equal a determination of guilt.

Ray Dillinger • June 9, 2016 11:56 AM

I have no doubt that this program accurately predicts correlations matching those found in its input set.

Problem is its input set is biased as hell. It’s being trained on correlations between test answers and subsequent convictions.

So it’s not predicting how likely it is that someone will commit another crime. It’s predicting how likely it is that someone will be convicted for committing another crime. That’s the data set it’s trained on. Garbage in, Garbage out.

It’s predicting the result of human bias.
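Ray Dillinger’s point, together with paul’s note that arrest does not equal guilt, can be illustrated with a few lines of arithmetic. The sketch assumes, hypothetically, two groups that reoffend at exactly the same rate but whose reoffenses are detected and recorded at different rates; the numbers and the function name are invented for illustration. A model trained on the recorded labels would “see” different recidivism rates even though behaviour is identical.

```python
def observed_rearrest_rate(true_reoffense_rate: float, p_arrest_given_reoffense: float) -> float:
    """Share of people who show up as 'recidivists' in arrest-based training labels.

    Only reoffenses that lead to a recorded arrest become positive labels, so the
    observed rate is the true rate scaled by the detection probability. This toy
    model ignores false arrests, which would inflate the observed rate further.
    """
    return true_reoffense_rate * p_arrest_given_reoffense


# Hypothetical numbers: identical behaviour, different policing intensity.
TRUE_RATE = 0.30
observed_x = observed_rearrest_rate(TRUE_RATE, p_arrest_given_reoffense=0.4)
observed_y = observed_rearrest_rate(TRUE_RATE, p_arrest_given_reoffense=0.8)
print(f"group X: {observed_x:.2f}, group Y: {observed_y:.2f}")  # group X: 0.12, group Y: 0.24
print(f"apparent risk ratio Y/X: {observed_y / observed_x:.1f}")  # apparent risk ratio Y/X: 2.0
```

Any model that fits these labels well will rate group Y as roughly twice as risky, which is exactly the “result of human bias” described above.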

ObviousMan Saves the Day! • June 9, 2016 12:32 PM

Let’s say one race has more criminals per-capita than another… If you program a computer with that data, and ask it to predict if a given person is more or less likely to commit a crime based on that data… then is it any wonder that it will rate people of that one race higher than the other, based solely on their race?

It’s like “1 + 1 = 2”…. “NOOO DAMMIT… 2 IS NOT A POLITICALLY CORRECT ANSWER!!!” uhh…. well I guess if you didn’t want “2” as the answer, you shouldn’t have plugged in “1 + 1” then, right?

So if you don’t want computers to predict people’s crime rates by race, then don’t put in the current crime rates by race! duh! It’s only using whatever factors you programmed it to use!

Milo M. • June 9, 2016 2:43 PM

Re Mark, June 8, 10:46 pm:

The “White, M.J., 2006” reference:

http://www.oocities.org/g-lam/tcp2006aegisdottiretalclinicaljudgment.pdf

“One area in which the statistical method is most clearly superior to the clinical approach is the prediction of violence, r = –.09. Out of 1,000 predictions of violence, the statistical method should correctly identify 90 more violent clients than will the clinical method (Rosenthal, 1991).”

“This meta-analysis represents only the second meta-analysis conducted in this area of the literature (cf. Grove et al., 2000). The present findings are not without limitations. The arguments in favor of the small, but reliable, edge of statistical prediction techniques are strong, but we are struck by the limits of these studies.”
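The “90 more per 1,000” figure quoted above appears to come from Rosenthal’s binomial effect size display (BESD), which recasts a correlation r as two success rates, 0.5 − |r|/2 and 0.5 + |r|/2, so the gap between them is |r|. A quick check in Python (the function name is just illustrative; the sign of r only reflects which method was coded first):

```python
def besd(r: float) -> tuple[float, float]:
    """Binomial effect size display: express a correlation r as two
    'success rates' centred on 50%, differing by |r|."""
    return 0.5 - abs(r) / 2, 0.5 + abs(r) / 2


low, high = besd(-0.09)
print(round(low, 3), round(high, 3))  # 0.455 0.545
# Gap per 1,000 predictions:
print(round((high - low) * 1000))     # 90
```

So a correlation of magnitude .09 corresponds to about 545 versus 455 correct identifications per 1,000, a difference of 90.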

A 2012 paper:

http://www.bmj.com/content/345/bmj.e4692

“What is already known on this topic

Instruments based on structured risk assessment predict antisocial behaviour more accurately than those based on unstructured clinical judgment

More than 100 such tools have been developed and are increasingly used in clinical and criminal justice settings

Considerable uncertainty exists about how these tools should be used and for whom

What this study adds

The current level of evidence is not sufficiently strong for definitive decisions on sentencing, parole, and release or discharge to be made solely using these tools

These tools appear to identify low risk individuals with high levels of accuracy, but have low to moderate positive predictive values

The extent to which these instruments improve clinical outcomes and reduce repeat offending needs further research”
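The BMJ finding (high accuracy for low-risk individuals, low to moderate positive predictive value) is largely what Bayes’ rule predicts when the outcome being predicted is relatively uncommon. The sensitivity, specificity and base rate below are hypothetical round numbers, not figures from the paper, and the function name is illustrative:

```python
def predictive_values(sensitivity: float, specificity: float, base_rate: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a binary 'high risk' flag via Bayes' rule.

    PPV: share of people flagged high risk who actually reoffend.
    NPV: share of people rated low risk who do not reoffend.
    """
    tp = sensitivity * base_rate
    fp = (1 - specificity) * (1 - base_rate)
    tn = specificity * (1 - base_rate)
    fn = (1 - sensitivity) * base_rate
    return tp / (tp + fp), tn / (tn + fn)


ppv, npv = predictive_values(sensitivity=0.7, specificity=0.7, base_rate=0.2)
print(f"PPV={ppv:.2f} NPV={npv:.2f}")  # PPV=0.37 NPV=0.90
```

With these assumed inputs, only about 37% of people flagged high risk would go on to reoffend, while about 90% of those rated low risk would not, even though the tool is equally “accurate” in both directions.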

Malenchik v. Indiana:

http://law.justia.com/cases/indiana/supreme-court/2010/06091001bd.html

Note that this is the Indiana Supreme Court, not US Supreme Court.

Joseph M • June 10, 2016 6:18 AM

Regardless of the accuracy of the algorithm, I think people should discuss what information the criminal justice system should take into account when dealing with you: your own actions, or the actions of other people who resemble you in various superficial ways? Many, but not all, people seem comfortable using credit scores to influence lending decisions, but the criminal justice system seems to differ from this in various important ways.

ObviousMan Saves the Day! • June 11, 2016 2:46 PM

@Joseph M

“…what information [should] the criminal justice system take into account when dealing with you: your own actions, or the actions of other people who resemble you in various superficial ways?”

Very well put.

Bumble Bee • June 11, 2016 5:08 PM

There is an inherent bias here in our criminal justice system that no one wants to look at because everyone seems satisfied with it.

https://www.bop.gov/about/statistics/statistics_inmate_gender.jsp

The mob uses women and girls for mules and runners and other high-risk low-level jobs, because they are reasonably confident the females will not be arrested or have to serve prison time (their purses constitute a big privacy zone men do not have) and moreover they use sex to gain sympathy and distract male investigators, prosecutors, and judges. Police dogs come from Germany and are exclusively trained to pursue male suspects.

Remember “the male” from that other thread several months to a year ago who was suspected of hacking into an airplane’s flight control system? About the time Andrew what’s-his-name was banned?

The federal criminal justice system is a veritable red-light district throughout the whole U.S. Women and girls are property, not independent persons under this system, so by and large they cannot be charged with criminal offenses.

And all that other shit that goes along with red-light districts.

Anon10 • June 11, 2016 10:14 PM

As I read the article, the authors only pulled their criminal records from one county, so if someone later commits a criminal offense outside that county, it wouldn’t show up as recidivism in their data. Maybe the researchers couldn’t get any better data to work with, but that flaw alone should have killed the paper in the peer review process.

Nick P • June 11, 2016 11:39 PM

@ Anon10

That’s a good catch. There are often clusters of cities, towns, or just neighborhoods near county lines that people move through all the time. In my area, there’s a cluster of four counties in two states with people moving around in them constantly. Any opportunity, good or bad, might show up in any of them regardless of where we live. Plus, crooks often do keep jurisdiction in mind when doing things. They might try to avoid ending up in the same county or city court twice.

TRX • June 13, 2016 8:40 AM

> Credit scores in Germany suffer a similar problem. They use your zip code, so if you’re in a poorer neighborhood a merchant will be much more likely to get a red light if you live in 12047 than in 14167.

I don’t know if American credit scoring does that, but I’d be surprised if it doesn’t.

I do know that my auto insurance rate is strongly based on my ZIP code, though. The insurance industry makes no secret of that.

Schneier on Security