The Fallibility of DNA Evidence

This is a good summary article on the fallibility of DNA evidence. Most interesting to me are the parts on the proprietary algorithms used in DNA matching:

William Thompson points out that Perlin has declined to make public the algorithm that drives the program. “You do have a black-box situation happening here,” Thompson told me. “The data go in, and out comes the solution, and we’re not fully informed of what happened in between.”

Last year, at a murder trial in Pennsylvania where TrueAllele evidence had been introduced, defense attorneys demanded that Perlin turn over the source code for his software, noting that “without it, [the defendant] will be unable to determine if TrueAllele does what Dr. Perlin claims it does.” The judge denied the request.

[…]

When I interviewed Perlin at Cybergenetics headquarters, I raised the matter of transparency. He was visibly annoyed. He noted that he’d published detailed papers on the theory behind TrueAllele, and filed patent applications, too: “We have disclosed not the trade secrets of the source code or the engineering details, but the basic math.”

It’s the same problem as any biometric: we need to know the rates of both false positives and false negatives. And if these algorithms are being used to determine guilt, we have a right to examine them.

EDITED TO ADD (6/13): Three more articles.

Tags: algorithms, biometrics, DNA, false negatives, false positives, identification, transparency

Posted on May 31, 2016 at 1:04 PM • 44 Comments

Comments

Ross Snider • May 31, 2016 1:13 PM

Indeed, much is determined by black-box algorithms, including US citizens’ “threat scores” used in ‘real time law enforcement’ [1] and the algorithms used to determine credit scores from social media posts. [2]

Furthermore, a large number of media organizations block content according to algorithms, some of which are also hooked into Federal systems. I remember being blocked from posting Snowden disclosures on Facebook (and watching others being unable to post political messages, and May Day protest organizing being blocked by Facebook).

Perhaps this leads to a sort of Richard Stallman thesis: if the user doesn’t control the code, the code controls the user. And if a special interest controls the code, the special interest controls the user.

[1] (https://www.washingtonpost.com/local/public-safety/the-new-way-police-are-surveilling-you-calculating-your-threat-score/2016/01/10/e42bccac-8e15-11e5-baf4-bdf37355da0c_story.html)
[2] (http://www.forbes.com/sites/moneybuilder/2015/10/23/your-social-media-posts-may-soon-affect-your-credit-score-2/)

TimH • May 31, 2016 1:31 PM

This is something that seems easy to legislate. Any equipment used to provide evidence must be examinable by the defense, just as physical evidence can be tested and witnesses cross-examined. A black box provider can protect their business with patents, but not with trade secrets.

I suspect the issue with all this kit is that there is filtering going on. That’s standard when an analog signal is digitised – to shape the input bandwidth to avoid aliasing, and to mitigate noise. But the other filtering is the yes/no decision in the grey area of the measurement resolution and/or accuracy. The equipment makers don’t want this analysed and challenged, because it is probably a lot more subjective than they admit. For example, potentially cancerous cells are normally examined by a person – not a machine – for the cancerous/not decision. And let’s not forget Dr Bite Mark’s evidence recently voided.

Tom VanCourt • May 31, 2016 1:56 PM

Some years back, the Boston Globe had two articles on facing pages of a two-page spread. On the left, the then-governor talked about favoring the death penalty only when evidence was utterly beyond doubt, like DNA. On the right, the FBI were accused of falsifying DNA evidence.

I’ve always admired the editor who placed those articles.

albert • May 31, 2016 2:37 PM

It would be a lot easier to examine the code. There’s no reason to withhold it, with NDAs and lawyers around. The reason given for refusal to release the source code was that it would be “financially devastating”. I wonder why? Fear of competitors ‘stealing’ it? Maybe there’s prior art. Maybe it’s obvious and not unique. Maybe it’s not accurate. Maybe the truth will put them out of business.

Selling such a product before filing a patent application, then refusing disclosure, is at best unethical and possibly criminal.

Maybe we shouldn’t be using computers to make ‘life and death’ decisions.
. .. . .. — ….

Chris • May 31, 2016 3:08 PM

There’s nothing magical about comparing DNA sequences from several samples. The methods for doing so are straightforward and have been implemented many times in academia. They have probably wrapped it in a slick user interface, which is where the real value is added. There should be nothing preventing them from releasing the core DNA comparison code.

Herman Mirsky • May 31, 2016 3:13 PM

“We have disclosed not the trade secrets of the source code or the engineering details, but the basic math.”

There’s a big difference between what the maths is able to do and what you’re actually making it do.

Daniel • May 31, 2016 4:56 PM

This specific situation has already garnered the attention of the folks at Evidence Prof Blog:

http://lawprofessors.typepad.com/evidenceprof/2016/04/yesterday-the-huffington-post-published-a-piece-on-the-trueallele-casework-system-according-to-the-piece-cybergenetics.html

give me six lines • May 31, 2016 5:17 PM

There really should be a new TV crime drama titled CSI:F-ups, or maybe Law and Order: Junk Science

Are there special classes, courses, seminars, or certifications for defense attorneys on this stuff?

Cop shows, Doctor shows, Lawyer shows – all propaganda to instill false confidence in a very flawed system.

moo • May 31, 2016 5:25 PM

I think the biggest problem with DNA evidence is that it gets misused in statistically unsound ways.
http://www.theatlantic.com/science/archive/2015/10/the-dark-side-of-dna-databases/408709/

If you gather 1 sample from a crime scene, and 1 sample from a suspect, a DNA test might confirm that there’s a 99.999% chance that the suspect was the one whose DNA got left at the crime scene. But if you gather 1,000,000 samples from a population of convicted/suspected/possible future criminals, and test every single one of them against the same 1 sample from the crime scene, there’s a significant chance that one of them will match purely due to random chance.
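To put rough numbers on moo's point: if each comparison against an unrelated person has a small, independent chance p of matching by coincidence, a search of N profiles produces about p·N coincidental matches, and the chance of at least one is 1 - (1 - p)^N. The Python sketch below takes p = 1 in 100,000 (one reading of moo's “99.999%”) and N up to one million; both are illustrative assumptions, not figures from the linked article, and real database profiles are not all independent.

```python
"""Coincidental matches in a database trawl (illustrative sketch).

Assumes every database profile is unrelated to the true source and has the
same, independent random match probability. The numbers used below are
illustrative assumptions, not figures from any real case or database.
"""
import math


def prob_at_least_one_match(random_match_prob: float, database_size: int) -> float:
    """P(at least one coincidental match) = 1 - (1 - p)^N."""
    # log1p/expm1 keep precision when p is tiny and N is large.
    return -math.expm1(database_size * math.log1p(-random_match_prob))


def expected_matches(random_match_prob: float, database_size: int) -> float:
    """Expected number of coincidental matches: p * N."""
    return random_match_prob * database_size


if __name__ == "__main__":
    p = 1e-5  # assumed: a 1-in-100,000 chance that an unrelated person matches
    for n in (1, 1_000, 1_000_000):
        print(f"N={n:>9,}: P(>=1 coincidental match)={prob_at_least_one_match(p, n):.5f}, "
              f"expected matches={expected_matches(p, n):.2f}")
```

Under these assumed numbers, a single comparison has a 1-in-100,000 chance of a coincidental match, while a trawl through a million profiles is expected to turn up about ten, so at least one is close to certain.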

Now take that kind of “evidence” and put it in the hands of a prosecutor with a strong incentive to secure convictions, who puts it in front of judges and juries with no background in statistics, and the inevitable miscarriages of justice follow.

Bumble Bee • May 31, 2016 7:18 PM

Yes, DNA evidence is fallible. It’s like most other science, as far as juries are concerned. It is done in a lab out of sight, and I would imagine that the jury is asked to believe expert testimony regarding chain of custody, correct and proper methodology in the lab, the validity of the science itself, and how well the DNA evidence corroborates the other evidence collected at the crime scene.

I would like to know of actual cases where defense attorneys are failing to cross-examine this expert testimony on which the DNA evidence depends. I don’t doubt for a minute that there are such cases, but (more) actual examples would be nice.

Is the argument against any use of DNA evidence at all in court, or is it against defense attorneys’ failing to do their jobs? Or is there something else at play here?

tyr • May 31, 2016 7:38 PM

I recall an old usenet story written by a tech who
was working on building a cancer detector for an
eminent science guy. The detector was fairly
standard electronic gadgetry with a black box
analysis program to massage the data and decide
if a sample was cancerous or not.

So the curious tech fed it a hot dog chunk. It came
back cancerous, then he fed it the same chunk again
and it came back as clear of cancer. The puzzled
tech then took a look at what the code was in the
secret program. It turned out to be a Basic loop
that generated a random result. When the originator
was questioned about this, it turned out the machine
was being built on a grant proposal based on the
shiny new detection method, but the shiny new method
did not exist yet. Very expensive vapourware and a
potential danger to future patients. Given enough
traction this could very well have made it into the
hospitals as a very expensive coin toss diagnostic
method.

Moral is: Let’s see the code if lives are on the line
because of it. NDAs aren’t that hard to write if you
are worried about the exposure. I’m sure Venter has
a couple of people who could tell you whether it is
a valid method.

Dirk Praet • May 31, 2016 7:48 PM

We have disclosed not the trade secrets of the source code or the engineering details, but the basic math.

Total felgerkarb. A cryptographer can come up with the best algorithms, but how they are actually implemented is an entirely different issue. It’s beyond me that any judge with half a brain would allow such “black box” evidence.

@ give me six lines

Cop shows, Doctor shows, Lawyer shows – all propaganda to instill false confidence in a very flawed system.

May I suggest Mack Sennett’s wonderfully subversive Keystone Cops and Reno 911! as a powerful antidote?

I guess it’s safe to say that being exposed to the Keystone Cops at a very early age decisively shaped my general attitude towards LEOs and other authority figures in a uniform.

Taihennami • May 31, 2016 8:20 PM

Sherlock Holmes is another good one, with that tendency for Inspector Lestrade to be consistently wrong about everything.

Spaceman Spiff • May 31, 2016 8:30 PM

Algorithms and math be damned! That doesn’t prove that the code is correct! Until it can be independently verified, by multiple experts, it is unreliable! This is just another case of “security by obscurity”, and people are going to prison or the gallows because of it… GAH!

give me six lines • May 31, 2016 8:47 PM

@Dirk Praet, @Taihennami

Shakespeare’s Dogberry was impressive also.

give me six lines • May 31, 2016 8:52 PM

While we’re at it: Sledge Hammer! – “Trust me, I know what I’m doing.”

I am Groot • May 31, 2016 8:53 PM

I look forward to a similar critique of digital evidence.

How is it that someone can be convicted of possessing illegal digital content when the accusers (not to mention a whole host of other nefarious actors) have the means, motive, and opportunity to place said illegal content on whatever device they so choose?

r • May 31, 2016 9:01 PM

Even if the software is good, how do we know the hardware’s legit? 🙂

Wael • May 31, 2016 9:19 PM

He noted that he’d published detailed papers on the theory behind TrueAllele, and filed patent applications, too: “We have disclosed not the trade secrets of the source code or the engineering details, but the basic math.”

One sentence at a time…

Published detailed paper

Teeeeshek ✔️

Filed patent applications, too

Tell that to your friends, irrelevant to this case!

We have disclosed not the trade secrets of the source code or the engineering details

Assuming the above two items show the theory is solid, the request is then to show that the software implements the “published” theory. Theory is good, math looks fine, but the linkage between your implementation and the published knowledge is broken! You want me to trust your word? Why don’t you just pronounce the defendant guilty, and just skip this gadget crap?

As with other “black box” testing, when implementation details aren’t available (for whatever reason), the next best thing is to use test vectors. If I were the defense attorney, I would have asked for 10 million sample runs to be able to characterize the accuracy and fidelity of the “trade secret”. We do that with random number generators! Give me sample output ranging from 10Kb to 100Gb and upwards, depending on the purpose of the test.
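Wael's test-vector approach boils down to estimating an error rate from runs whose correct answer is known in advance. The sketch below, which assumes independent runs and uses hypothetical counts, turns such results into a confidence interval; the Wilson score interval and the zero-error bound are standard statistical choices made here for illustration, not anything the comment or the vendor specifies.

```python
"""Characterising a black box from known-truth test runs (illustrative sketch).

Assumes each test run is independent and that the correct answer for every
run is known in advance. The counts used below are hypothetical.
"""
import math


def wilson_interval(errors: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% (z=1.96) Wilson score interval for an error rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = errors / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def upper_bound_no_errors(trials: int, alpha: float = 0.05) -> float:
    """Exact one-sided upper bound on the error rate when zero errors were seen.

    Solves (1 - p)^n = alpha for p; for alpha = 0.05 this is roughly 3/n
    (the "rule of three").
    """
    return -math.expm1(math.log(alpha) / trials)


if __name__ == "__main__":
    # Hypothetical outcome: 7 wrong calls in 10,000 known-truth runs.
    low, high = wilson_interval(7, 10_000)
    print(f"observed error rate 0.07%, 95% interval {low:.4%} to {high:.4%}")
    # Hypothetical outcome: no wrong calls at all in 10 million runs.
    print(f"0 errors in 10M runs: error rate below {upper_bound_no_errors(10_000_000):.2e} (95%)")
```

Even a flawless run of ten million tests only bounds the error rate (here, to roughly three in ten million); it never shows the rate is zero, which is why the number of runs requested matters.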

With Biometrics, we understand the accuracy and limitations (FAR, FRR: False Acceptance Rate and False Rejection Rate — you’ll find different and perhaps erroneous expansions to these acronyms online) among other properties.
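The FAR/FRR trade-off can be seen in a few lines of code. This sketch assumes a matcher that outputs a similarity score and declares a match at or above a threshold; the scores are made up, and nothing here describes how TrueAllele or any other product actually scores comparisons.

```python
"""False acceptance and false rejection rates at a decision threshold (sketch).

Assumes a matcher that outputs a similarity score and declares a "match" when
the score is at or above a threshold. The scores below are made up.
"""


def far_frr(genuine: list[float], impostor: list[float], threshold: float) -> tuple[float, float]:
    """Return (FAR, FRR) for the given threshold.

    FAR: fraction of impostor comparisons (different sources) wrongly accepted.
    FRR: fraction of genuine comparisons (same source) wrongly rejected.
    """
    far = sum(score >= threshold for score in impostor) / len(impostor)
    frr = sum(score < threshold for score in genuine) / len(genuine)
    return far, frr


if __name__ == "__main__":
    genuine = [0.91, 0.85, 0.78, 0.66, 0.95]   # hypothetical same-source scores
    impostor = [0.12, 0.35, 0.71, 0.40, 0.22]  # hypothetical different-source scores
    for t in (0.6, 0.7, 0.8):
        far, frr = far_frr(genuine, impostor, t)
        print(f"threshold {t:.1f}: FAR {far:.0%}, FRR {frr:.0%}")
```

With these invented scores, raising the threshold from 0.7 to 0.8 drives the FAR to zero but doubles the FRR. Quoting one rate without the other, or without the threshold, says little about how a system behaves.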

These lawyers can learn a thing or two from security… I say they should come here for a dry run before they go to court. We don’t charge $350 an hour!

Treebeard • May 31, 2016 9:39 PM

It’s old news but:

Keystone investigators seek out private DNA databases
http://phys.org/news/2016-03-law-private-dna-databases.html

Like Ancestry.com. But why stop there? All that family tree data for every user of genealogy sites would make an interesting graph to study, match against the available genetic databases, and use statistics to fill in probable genetic profiles where they are missing. Why isn’t my personal genetic sequence copyrighted, being physically written on my substrate? Oh, you’ll say I didn’t write it; it belongs to Big Pharma…

Wael • May 31, 2016 10:02 PM

A little OT…

you’ll find different and perhaps erroneous expansions to these acronyms online

Speaking of acronyms…

AAAAAAAAAAAAAAAAA: (pronounced aaaaaah)

Allied American, Austrian, Australian, Albanian, Armenian, And Argentinian Anti Acronym, Abbreviations And Abstractions Abuse Association And Affiliation

I could have made it longer than 17 characters, but you get the idea 🙂

That would be my response to Dr. Perlin 🙂 (makes a nice password, I mean pass-expression.)

Chopper Reid • June 1, 2016 12:03 AM

A cop once told me: if you ever find yourself in court being told there is fingerprint evidence, make sure they demonstrate the actual fingerprint sample, the proof of its location, how they obtained it, etc.
There must be a demonstration of all evidence before the court.

because it’s way harder for fingerprints to be located than the movies make it seem

This may help someone, somewhere.

GuineaPig • June 1, 2016 12:14 AM

@ tyr @moo and @everyone

I am reminded of ‘quack buster’ Ben Goldacre, who writes for the Guardian and has written a few books, and of one particular article of his called ‘So You Have a Pill’, found in the appendix to The 4-Hour Body by Tim Ferriss.
The article, like much of his writing, is about how pharmaceutical companies massage and doctor data, and the infinite array of tricks they use to take a study and make it look like a drug 1. does what they say it does 2. with minimal no adverse effects 3. will not kill anyone
4. et cetera

His article ‘So You Have a Pill’ is a tragicomedy: after he takes you through every single trick they use (‘bearing in mind your audience are not journalists or the general public, they are GPs, and they know all of these tricks’), one of the last resorts is the Gitmo method: ‘if all else fails – torture the data’.

GuineaPig • June 1, 2016 12:19 AM

PS if you are ever concerned about someone placing too much trust in western medicine, big pharma, western drugs etc. Let me rephrase that. If you ever observe someone placing any trust in…

The article above, ‘So You Have a Pill…’, is priceless and sure to convince them. I just haven’t found a copy online.

Wael • June 1, 2016 1:52 AM

will be unable to determine if TrueAllele does what Dr. Perlin claims it does.” The judge denied the request.

I’m no lawyer but how can the judge justify not checking the equipment? The algorithm at this point is ‘Voodoo, Witchcraft, and Sorcery’! It’s written all over the name: ‘El Rule Tale’ is an anagram of ‘TrueAllele’! They didn’t think that’s a little suspicious?

It’s the defense’s fault. Couldn’t the lawyer have said something like: ‘[Shazam!] If it doesn’t fit, you must acquit’?

Snarki, child of Loki • June 1, 2016 7:50 AM

I’m sure there’s a huge amount of proprietary R&D that went into the UI between the “basic math” of DNA matching and the LEO sitting at the keyboard.

Stuff like:

“who do you want this sample to match?”

Jason • June 1, 2016 8:17 AM

Bit of a stretch, but what about the 6th Amendment?

albert • June 1, 2016 10:25 AM

@GuineaPig,

“…1. does what they say it does 2. with minimal no adverse effects 3. will not kill anyone 4. et cetera …?”

Many drug ads I see on TV mention serious side effects and death as possible ‘side effects’.

Death as a side effect? God bless America!

So, 1. Kudos for their honesty, 2. Stay far, far away from those drugs.

Save your money and go to Vegas instead.

. .. . .. — ….

gamble • June 1, 2016 10:52 AM

@albert

I’d go to Reno instead. From what I understand, they have better cops there

Anselm • June 1, 2016 11:28 AM

Death can be a side effect of crossing the road.

With drugs, it is unreasonable to insist that there be no side effects at all. Anything that messes with your body in the way many drugs do is virtually guaranteed to have (mild or grave) side effects, and drug companies do try to mitigate those if at all possible. Drugs that have no side effects (think homeopathy) usually have no primary effects, either.

What is really important with medical interventions – not just drugs, but also, e.g., surgery – is the risk/benefit ratio of the intervention. It stands to reason that “death” would not be an acceptable side effect of an acne remover, even if it happened in one out of a million cases, but if you suffer from something that will definitely kill you tomorrow if left untreated, then having an intervention that will either cure you completely or kill you outright with a 50% chance on either side may be an acceptable risk. Conversely, acupuncture does nothing for you beyond the placebo effect, but the associated risk from possible (if rare) side effects such as infection or pneumothorax suggests that acupuncture isn’t worth considering as a treatment.

cmurf • June 1, 2016 11:47 AM

Bruce:

And if these algorithms are being used to determine guilt, we have a right to examine them.

OK, but we have that ability by examining the patent filing. Do we have the right to examine the code that constitutes the implementation of that algorithm? I implicitly agree that we should have that right; otherwise it’s a break in the chain of custody. It seems much more clear-cut that something which processes data and determines a result needs to be transparent. But what’s the argument?

Milo M. • June 1, 2016 11:56 AM

@GuineaPig:

The first half of the article (2 pages) is currently in Google Books:

https://books.google.com/books?id=yeIGdYQQNQsC&pg=PA501

https://books.google.com/books?id=yeIGdYQQNQsC&pg=PA502

For others interested in Dr. Goldacre:

http://www.badscience.net/

http://www.theguardian.com/profile/bengoldacre

paul • June 1, 2016 12:23 PM

“because it’s way harder for fingerprints to be located than the movies make it seem”

Not only that, but, for the most part, fingerprint comparisons aren’t actually comparisons of fingerprints. They’re comparisons of sets of ostensibly invariant points in the prints, which is to say a complicated, specialized hash of the apparent topology of the prints. (I say “apparent” because few prints are clean and unambiguous.) Much of this design dates from days when storing more than, say, 64 bytes per print was prohibitive.
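paul's description of point-based comparison can be illustrated with a deliberately simplified Python sketch: each print is reduced to a list of (x, y, direction) minutiae, and a score counts how many probe points land near a reference point. The pre-alignment, the tolerances and the greedy pairing are all assumptions for illustration; this is not how any particular AFIS works.

```python
"""Toy minutiae comparison (illustrative sketch, not any real AFIS algorithm).

Each minutia is an (x, y, direction) triple. Assumes the two prints have
already been aligned (same origin, rotation and scale), which real systems
must estimate. Tolerances are arbitrary assumptions.
"""
import math
from typing import NamedTuple


class Minutia(NamedTuple):
    x: float      # position, arbitrary units
    y: float
    theta: float  # local ridge direction in degrees


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def count_paired(probe: list[Minutia], reference: list[Minutia],
                 dist_tol: float = 10.0, angle_tol: float = 15.0) -> int:
    """Greedily pair each probe minutia with an unused reference minutia within tolerance."""
    used: set[int] = set()
    paired = 0
    for p in probe:
        for i, r in enumerate(reference):
            if i in used:
                continue
            close = math.hypot(p.x - r.x, p.y - r.y) <= dist_tol
            if close and angle_diff(p.theta, r.theta) <= angle_tol:
                used.add(i)
                paired += 1
                break
    return paired


if __name__ == "__main__":
    reference = [Minutia(10, 20, 30), Minutia(50, 60, 90), Minutia(80, 15, 350)]
    # Hypothetical smudged, partial probe: slightly shifted points, one minutia missing.
    probe = [Minutia(13, 22, 25), Minutia(84, 12, 5)]
    print(count_paired(probe, reference), "of", len(probe), "probe minutiae paired")
```

Both the tolerances and the alignment step are choices rather than facts about the prints, so loosening them can turn a smudged partial print into a ‘match’.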

albert • June 1, 2016 4:36 PM

Software patents are written as broadly as possible to cover as many cases as possible, and still pass the examiners. Therefore, they are as useless as tits on a boar hog when code verification is the issue. If you’re making tires, code verification isn’t necessary. You just want good tires. That’s the result. When your product is the production of evidence that is considered the ultimate proof in a court case, lives are at stake.

I hope to see more court cases like this. Perhaps rationality may prevail. Cybergenetics is not in a good position right now. They will have to prove their system eventually, or wind up in court, either sued by a defendant or facing a forced reexamination of all TrueAllele-derived evidence.

I’d really like to see the TrueAllele TOS.
. .. . .. — ….

infallible • June 1, 2016 9:43 PM

Just trust us, we’re infallible. There’s no possible way we could EVER have ANY problem with our software…

Judges, courts, juries, and the population at large just buy it hook, line, and sinker! What could possibly go wrong.

Drone • June 2, 2016 3:04 AM

The accepted methods and standards for the testing and interpretation of results in forensic DNA analysis are public knowledge. Machines for forensic DNA analysis undergo independent testing and verification using random samples with a priori known valid results. If a DNA analysis machine is properly maintained and periodically verified for accuracy, I see no reason for the manufacturer to disclose his/her source code. In fact, studying the source code outside the realm of casual curiosity is a waste of time for anyone except (perhaps) the machine maker’s competitors.

The only time I might want to look at at least some of the machine’s source code is if the manufacturer of the machine was dumb enough to allow the machine to be networked and/or accept easily inserted removable storage. A forensic DNA machine connected to the Internet would, in my opinion, introduce reasonable doubt into the validity of any results it produces. And no, unlike the go/no-go testing of the DNA test accuracy, penetration testing of the machine is far more complex and unreliable. That’s where the source is needed. Anyway, I doubt the manufacturer would resist opening the source surrounding the highly proprietary DNA testing and result interpretation code, which would remain locked.

Snarki, child of Loki • June 2, 2016 7:41 AM

If you read the (more detailed) information, particularly from the Evidenceprofblog link, above, you find the court reasoned as follows:

“If we require that the defense be able to examine the software, the expert witness might pull out of the case, making it harder for the prosecutor”

Well boo fncking hoo.

Hey, I got an idea… how about I make a magic black box: you type in the name of a RW prosecutor, and it spits out “DEVIANT PEDOPHILE!”. Sorry, you can’t see what’s inside, it’s a secret.

paul • June 2, 2016 9:27 AM

@Drone

From the articles linked to, what you’re saying is apparently not true for microscopic mixed samples. The machines can tell you which alleles are present (except when there’s dropout and they don’t), but they can’t tell you definitively which alleles come from whom without that crucial interpretive step, which is apparently well understood in principle but not so well in practice.
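paul's point about mixed samples can be made concrete with a toy enumeration at a single locus. Assuming exactly two contributors, no dropout or drop-in, and no use of peak heights (all simplifications real interpretation cannot make), the sketch lists every pair of genotypes that could produce the observed alleles. The allele numbers are arbitrary.

```python
"""Why a two-person mixture is ambiguous at a single locus (toy sketch).

Assumes exactly two contributors, no allelic dropout or drop-in, and ignores
peak heights. Allele labels are arbitrary STR repeat numbers.
"""
from itertools import combinations_with_replacement


def explanations(observed: set[int]) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """All unordered pairs of genotypes whose combined alleles equal `observed`."""
    genotypes = list(combinations_with_replacement(sorted(observed), 2))
    pairs = []
    for i, g1 in enumerate(genotypes):
        for g2 in genotypes[i:]:  # start at i so each unordered pair appears once
            if set(g1) | set(g2) == observed:
                pairs.append((g1, g2))
    return pairs


if __name__ == "__main__":
    for alleles in ({12, 14}, {12, 14, 16}, {12, 14, 16, 18}):
        found = explanations(alleles)
        print(f"{sorted(alleles)}: {len(found)} possible genotype pairs")
        for g1, g2 in found:
            print("   ", g1, "+", g2)
```

Even in this idealised setting, three observed alleles have six candidate explanations and two observed alleles have four; dropout, drop-in and a possible third contributor only add more. Choosing among or weighting these possibilities is the interpretive step paul describes.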

And that’s before you get to the question of whether the machines are properly used and maintained and whether the samples are in fact properly collected. (There are protocols for those things, but labs have been known to not adhere to protocols.)

give me six lines • June 2, 2016 3:48 PM

Somewhat related (from The Intercept), since this database contains DNA profiles:

The FBI Wants to Exempt Massive Biometric Database From the Privacy Act
https://theintercept.com/2016/06/01/the-fbi-wants-to-exempt-massive-biometric-database-from-the-privacy-act/

With the exemption, “they are maximizing their ability to act without oversight but they risk leaving victims of inaccuracies out in the cold with no remedy.”

give me six lines • June 2, 2016 3:52 PM

Additionally, on the FBI database, they are linking biometrics, with the obvious potential error of associating person A’s fingerprints with person B’s face, person C’s DNA profile, etc., so that a hit for another suspect gets linked to your face or name, probably with no way to question the tie-in in court.

Rick • June 2, 2016 7:30 PM

@Drone

Can you say “volkswagen”?

https://en.wikipedia.org/wiki/Volkswagen_emissions_violations

DNA correlation? • June 4, 2016 9:22 PM

Already doing autosomal and Y-DNA surname searches on genealogical DNA data, they are.

http://www.wired.com/2015/10/familial-dna-evidence-turns-innocent-people-into-crime-suspects/

For some surnames, they can track you down via your fifth or sixth cousin if that cousin has released their Y-DNA data.

One would hope the defense team could run their own analysis of the raw data in these cases where the suspect DNA has multiple people’s samples in it.

JMC • June 7, 2016 8:39 AM

This is another article on the fallibility of forensic tests if misinterpreted or not carried out correctly that goes into more depth than the one on The Atlantic. Unfortunately, the conclusion is the same: woe betide you:

http://www.texasmonthly.com/articles/false-impressions/

Tamper Tentroom • September 26, 2017 5:42 PM

Ahhhbove, the joys of this open forum.
