Software Problems with a Breath Alcohol Detector

This is an excellent lesson in the security problems inherent in trusting proprietary software:

After two years of attempting to get the computer based source code for the Alcotest 7110 MKIII-C, defense counsel in State v. Chun were successful in obtaining the code, and had it analyzed by Base One Technologies, Inc.

Draeger, the manufacturer, maintained that the system was perfect, and that revealing the source code would be damaging to its business. They were right about the second part, of course, because it turned out that the code was terrible.

2. Readings are Not Averaged Correctly: When the software takes a series of readings, it first averages the first two readings. Then, it averages the third reading with the average just computed. Then the fourth reading is averaged with the new average, and so on. There is no comment or note detailing a reason for this calculation, which would cause the first reading to have more weight than successive readings. Nonetheless, the comments say that the values should be averaged, and they are not.
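
To make the complaint concrete, here is a minimal C sketch of the two calculations; the function names and sample values are invented for illustration and are not taken from the Alcotest source.

    #include <stdio.h>

    /* The scheme described in the report: each new reading is averaged with the
       previous result, so every earlier reading is repeatedly halved. */
    static double halving_update(const double *r, int n) {
        double acc = r[0];
        for (int i = 1; i < n; i++)
            acc = (acc + r[i]) / 2.0;   /* reading i ends up with weight 2^-(n-i) */
        return acc;
    }

    /* A true arithmetic mean: every reading carries the same weight 1/n. */
    static double true_mean(const double *r, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += r[i];
        return sum / n;
    }

    int main(void) {
        double readings[] = {0.10, 0.10, 0.10, 0.10, 0.06};   /* made-up samples */
        int n = sizeof readings / sizeof readings[0];
        printf("halving scheme: %.4f\n", halving_update(readings, n));   /* 0.0800 */
        printf("true mean:      %.4f\n", true_mean(readings, n));        /* 0.0920 */
        return 0;
    }

With these made-up readings the halving scheme reports 0.080 while a true mean gives 0.092; as several commenters note below, it is the final reading that ends up carrying the most weight.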

3. Results Limited to Small, Discrete Values: The A/D converters measuring the IR readings and the fuel cell readings can produce values between 0 and 4095. However, the software divides the final average(s) by 256, meaning the final result can take only 16 values to represent the five-volt range (or less), i.e., the full range of possible alcohol readings. This is a loss of precision in the data: of a possible twelve bits of information, only four bits are used. Further, because of an attribute of the IR calculations, the result value is divided in half again. This means that only 8 values are possible for the IR detection, and this is compared against the 16 values of the fuel cell.
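
To see the scale of that loss, here is a small illustrative C fragment of the divide-by-256 arithmetic as described in the report (a sketch only, not the vendor’s code):

    #include <stdio.h>

    int main(void) {
        /* A 12-bit A/D converter yields codes 0..4095. Integer division by 256
           collapses those 4096 codes into 16 buckets; halving again (as described
           for the IR channel) leaves only 8. */
        for (int adc = 0; adc <= 4095; adc += 512) {
            int fuel_cell_bucket = adc / 256;       /* 0..15 */
            int ir_bucket        = adc / 256 / 2;   /* 0..7  */
            printf("adc=%4d  fuel-cell bucket=%2d  IR bucket=%d\n",
                   adc, fuel_cell_bucket, ir_bucket);
        }
        return 0;
    }

Twelve bits of resolution collapse to four for the fuel cell and three for the IR channel, exactly the 16 and 8 output values described above.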

4. Catastrophic Error Detection Is Disabled: An interrupt that detects that the microprocessor is trying to execute an illegal instruction is disabled, meaning that the Alcotest software could appear to run correctly while executing wild branches or invalid code for a period of time. Other interrupts ignored are the Computer Operating Properly (a watchdog timer), and the Software Interrupt.
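
For readers who have not met a COP (“Computer Operating Properly”) watchdog before, here is a small, self-contained C simulation of the general idea; every name and number in it is invented for illustration and has nothing to do with the actual Alcotest firmware. The main loop must service the timer regularly, and if it stops doing so because it has wandered into invalid code, the watchdog forces a reset rather than letting the unit keep reporting results. Disabling that trap, as the report describes, removes exactly this backstop.

    #include <stdio.h>
    #include <stdbool.h>

    static int watchdog_counter = 0;
    static const int WATCHDOG_LIMIT = 3;      /* ticks allowed without service */

    static void service_watchdog(void) { watchdog_counter = 0; }
    static bool watchdog_expired(void) { return ++watchdog_counter > WATCHDOG_LIMIT; }

    int main(void) {
        for (int tick = 0; tick < 10; tick++) {
            bool main_loop_healthy = (tick < 5);  /* pretend the firmware wedges at tick 5 */
            if (main_loop_healthy)
                service_watchdog();
            if (watchdog_expired()) {             /* with the COP disabled, this check never runs */
                printf("tick %d: watchdog expired, reset instead of reporting a reading\n", tick);
                return 1;
            }
            printf("tick %d: reading reported\n", tick);
        }
        return 0;
    }

In the simulation a couple of bogus readings still slip out before the timeout, but the unit then stops; with the watchdog and illegal-instruction traps disabled, nothing stops it at all.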

Basically, the system was designed to return some sort of result regardless.

This is important. As we become more and more dependent on software for evidentiary and other legal applications, we need to be able to carefully examine that software for accuracy, reliability, etc. Every government contract for breath alcohol detectors needs to include the requirement for public source code. “You can’t look at our code because we don’t want you to” simply isn’t good enough.

Posted on May 13, 2009 at 2:07 PM • 110 Comments

Comments

handsomedave May 13, 2009 2:56 PM

I hope the departments that bought these things sue the pants off the manufacturer. I’m not the type to encourage lawsuits but these idiots are obviously negligent.

IntelVet May 13, 2009 3:00 PM

Don’t voting machine manufacturers use the same mantra, “You can’t look at our code because we don’t want you to”?

aikimark May 13, 2009 3:03 PM

Another victory for transparency.

Everyone should remember this when their federal/state/local government considers the use of electronic voting equipment.

nemryn May 13, 2009 3:07 PM

Wouldn’t that system give the last reading the most weight, not the first?

Scared May 13, 2009 3:08 PM

“Readings are Not Averaged Correctly: When the software takes a series of readings, it first averages the first two readings. Then, it averages the third reading with the average just computed. Then the fourth reading is averaged with the new average, and so on. There is no comment or note detailing a reason for this calculation, which would cause the first reading to have more weight than successive readings.”

Isn’t it the last reading that has the most weight in this scheme? Basically a simple IIR low-pass filter, rather than an average of all readings.

I could see a use for that kind of filtering, however crude it is. One would imagine that if you keep reading new values, it’s because you’re not very interested in the first ones.

Eric in PDX May 13, 2009 3:12 PM

I wonder if this approach would work with all those speed cameras too? Maybe it was an extra hot day and the software “thought” it was a red light?

Don Marti May 13, 2009 3:15 PM

The two alternatives for manufacturers are going to be either (1) publish the code yourself, or (2) the only information about your product’s software online will be a bunch of testimony from nit-picking defense experts. Somebody is going to start using “published code, already reviewed by” and a list of experts as a selling point long before the states get around to requiring it.

Glenn Maynard May 13, 2009 3:19 PM

which would cause the first reading to have more weight than successive readings

… the last reading, rather, not that it makes a difference. The first reading would have the least weight.

I hope the departments that bought these things sue the pants off the manufacturer.

I’m sure people convicted of drunk driving based on this device will appeal, and I’m sure others convicted based on similar devices will, too.

I’m not wishing more drunks on the road, but if this causes mandatory code audits of this type of device, then all the better.

Base One, however, did an extensive evaluation, finding 19,400 potential errors in the code.

I’ll never understand why people inflate their numbers so grossly that they’re plainly ludicrous. It destroys credibility entirely. I wouldn’t be surprised if this number is greater than the number of compiled instructions.

nick May 13, 2009 3:24 PM

The first two sure seem like bugs which would make this less accurate. I don’t want to say the loss of accuracy is enough to substantially change the results without actually testing the thing (you buy the drinks 😉), but it seems plausible.

The last one isn’t necessarily a bug. If the company can demonstrate that those interrupts would not be used if they were enabled, there’s no problem here.

sam May 13, 2009 3:38 PM

I feel conflicted about this report. It seems there are some serious errors in the source code, but the report itself blatantly overhypes stuff that isn’t even a problem, such as the fact that the processor used was “1970s technology”. Guess what, a lot of the satellites in the sky use “1970s technology”, because it’s proven to be rock solid.

I suppose some bias is to be expected, since the study was commissioned by the defense, but giving a judge/jury bad information for the “right” reason can bite you later, when they use some malformed idea you fed them to make some unwise decisions of their own. Think: “our satellites are using outdated 1970s technology! All new satellites must use Windows Vista!”

Deskin May 13, 2009 3:40 PM

I’m with @Scared, #2 looks like an exponentially weighted moving average, where the last item has the most weight.

Seth Breidbart May 13, 2009 3:40 PM

The last reading has the highest weight in the last value shown; but it says to average all the values shown, which gives the first reading the highest weight.

FP May 13, 2009 3:50 PM

Playing devil’s advocate: Wouldn’t it be sufficient for a manufacturer to demonstrate to an independent authority that results were sufficiently accurate across the entire range? I.e., if you set up a testing rig that produces a set of reference inputs and then check that the device produces the expected results?

Just like a drug manufacturer does not have to open-source their drugs but only has to demonstrate to the FDA that the drug is effective across a somewhat representative cross-section of humanity.

Lazlo May 13, 2009 3:51 PM

“Other interrupts ignored are the Computer Operating Properly (a watchdog timer), and the Software Interrupt.”

Anyone else amused at the acronym there? The breathalyzer is ignoring the COP, as opposed to the other way around…

alvin May 13, 2009 4:08 PM

With this “averaging” method used by the software, it would be the successive readings that have more weight, not the first reading.

Joel May 13, 2009 4:17 PM

As far as the LINT report showing 19K errors goes, I defy anyone not to have nearly that number on any sizable code base. LINT is notorious for providing false positives, particularly when unfiltered. At best it gives you an idea of where to look for errors.

Tangerine Blue May 13, 2009 4:41 PM

@Don Marti

Somebody is going to start using “published code,
already reviewed by” and a list of experts as a
selling point long before the states get around to
requiring it.

If the customers (states) don’t care about (require) it, what would induce the vendor to change?

peri May 13, 2009 4:44 PM

@Glenn Maynard: I wouldn’t be surprised if this number is greater than the number of compiled instructions.

Consider this detector program listing:

noop

I am certain I packed more than one problem into a program with only one instruction.

Bryan Feir May 13, 2009 4:50 PM

@Joel:

Agreed on that. LINT is a fairly simple code analysis tool, which has limits on what it can and cannot flag, and tends to err heavily on the side of caution. Not to mention that it was written back before most compilers would produce detailed warning outputs.

Granted, I’ve seen bad outputs from compilers as well. One compiler I’ve worked with would consistently report that an array had been used before being initialized, even though I could prove all elements in the array had been initialized before they were used. As far as I could tell, unless I initialized the entire array at once when it was defined, it would throw the error, because it treated the array as a separate item from the sum of its elements.

Initializing the array on definition removed the warning… and promptly caused a noticeable slowdown in the code because of the brain-dead method the compiler used to clear the array.

Steve May 13, 2009 5:01 PM

Old tech has its virtues. Back in the Z-80 era, I read about an engineer who designed a railway (the San Francisco BART, IIRC) control system around the Intel 8008. Using the 8085 or the (then) latest-and-greatest 8086 would have required less code and less hardware, but the designer was confident that he fully understood the 8008, bugs and all, so there would be no loud surprises involving trains full of people.

Jason May 13, 2009 5:06 PM

“This means that only 8 values are possible for the IR detection, and this is compared against the 16 values of the fuel cell.”

I would have thought that some patrolman somewhere, who uses this thing daily, might have noticed that “hey, the readout is always one of only 8 different values”, and started to ask questions. But maybe cops that have actual powers of observation end up being promoted to detective. At least, I hope so.

xls May 13, 2009 5:15 PM

For those interested, here’s what the “buckets” of analog-to-digital conversion (might) look like given error 3 in the article. A bucket is a range of real values that would be reported as a single output reading after A/D processing.

Assuming the machine was sensitive up to .5% BAC (just above LD50 for human adults)

4096 discrete A/D levels ≈ 0.000122% BAC per level

4096 / 256 = 16 discrete levels = 0.03125% BAC per bucket

4096 / 256 / 2 = 8 discrete levels = 0.0625% BAC per bucket

The specific problem here is illustrated by the second bucket in the 8 discrete measures scenario: that bucket spans a large conversion domain from 0.0625 (legal, most states) to 0.125 (illegal, most states).

In fact, the “legal” part of that bucket (.0625 up to .1) occupies 60% of the domain, and the “illegal” part (.1 up to .125) occupies only 40% of that domain.

If the device shows that a person is blowing a BAC in that bucket, there’s a 60% chance that they’re actually under the legal limit (assuming no other source of error).
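
To make those boundaries explicit, here is a tiny C sketch that just reproduces the arithmetic above under the same assumptions (0.5% BAC full scale, 8 IR buckets); it is illustrative only, not the device’s code:

    #include <stdio.h>

    int main(void) {
        const double full_scale_bac = 0.5;    /* assumption from above: 0.5% BAC at full ADC scale */
        const int    ir_buckets     = 8;      /* 4096 codes / 256 / 2 */
        const double bucket_width   = full_scale_bac / ir_buckets;   /* 0.0625% per bucket */

        for (int b = 0; b < ir_buckets; b++)
            printf("bucket %d: %.4f%% .. %.4f%% BAC\n",
                   b, b * bucket_width, (b + 1) * bucket_width);
        return 0;
    }

Bucket 1 comes out as 0.0625% to 0.125%, straddling both the older 0.10% limit and the current 0.08% limit, which is the heart of the objection.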

Anonymous May 13, 2009 5:25 PM

I just searched for the manufacturer’s homepage and found this wonderful piece:

Dräger Alcotest® 7110 Evidential

The Alcotest® 7110 MK III-C is a proven evidential breath analyzer.
It is the only evidential breath tester on the market whose source code has been reviewed by independent third parties and approved by a Supreme Court decision.

http://www.draeger.com/US/en_US/products/alcohol_drug_detection/evidential/cdi_alcotest_7110_evidential.jsp

ROFL!

Roy May 13, 2009 5:30 PM

For six samples, the scalings have the denominator of 6!, with the numerators 0!, 1!, 2!, 3!, 4!, 5!

Reader May 13, 2009 5:47 PM

“12. Defects In Three Out Of Five Lines Of Code: A universal tool in the open-source community, called Lint, was used to analyze the source code written in C.”

lol, that’s just bad!

Peter E. Retep May 13, 2009 6:18 PM

Isn’t this whole dynamic of
[a] interfacing physical processes,
[b] for pre-packaged chemical processes,
[c] and adjunct software processes,
[d] each constructed from limited human knowledge,
[e] and limited human and physical capabilities,
[f] and limiting cost priorities,
[g] as interpreted by limited human and physical capabilities
[h] and mediated by limited human judgement
is sort of a micro-lab of issues of social regulation,
especially as the whole is
[i] regulated by chain of custody
[j] administratively determined priorities [such as r.o.i.]
and [k] sworn adherences to presumed-to-be-effective-and-accurate protocols?
Did I leave out any other fountain of uncertainty?
Oh, yes.
[L] The problem of the willfulness of the operating agents!
Anything else?

M May 13, 2009 6:41 PM

@Tangerine:

Once convictions start getting overturned, I’m willing to bet customers (law enforcement agencies) will start demanding code-reviewed breath testers.

Therac-25 May 13, 2009 6:47 PM

“The Safety division’s current portfolio includes stationary and mobile gas detection systems, respiratory protection, firefighting equipment, professional diving gear, and alcohol and drug-testing instruments.

The Medical division’s product range covers anesthesia workstations, ventilation equipment for intensive and home care, emergency and mobile ventilation units, warming therapy equipment for infants, patient monitoring equipment, IT solutions and gas management systems.”

How long before customers for some of these other products also start wondering about code quality…?

PackagedBlue May 13, 2009 7:48 PM

A great article and a great case for the need for open systems. I wish this stuff would be on the front page of many newspapers and magazines.

Truth is very disturbing today.

Truth needs a peer review process, like good science, over superstition.

Reminds me of the 90’s with NSA and clipper chip. Sadly, the NSA still wants to play subverted traffic cop and fracture IT. GRR, I leave the details alone.

We need a vibrant community to handle the complexity of technology. This is not happening. I leave those details for others to write in article/comments.

killick May 13, 2009 9:51 PM

It only matters what code was compiled and installed on the machine at the time it was used to gather evidence. Where’s the chain of evidence that shows that THIS code in the courtroom really was the code running on the machine at the time in question?

This is not limited to breath analyzers– it could be anything that gathers evidence– red light and speed cameras, voting machines, etc.

Karl Lembke May 13, 2009 10:12 PM

I agree with commenters who say the averaging method gives a higher weight to the last data read, not the first.

In general, if you average together N numbers by this method, the first two numbers in the series have a weight of 1/2^(N-1); and each succeeding element has double the weight, until you reach the last element with a weight of 1/2.

If you read a person’s breath for long enough, he’ll sober up, and only the last few readings of zero BAC will have enough weight to affect the final result.
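
To check that weighting explicitly, here is a short illustrative C program that expands the recursion for six readings (nothing here comes from the device code):

    #include <stdio.h>

    int main(void) {
        const int n = 6;                  /* number of readings in the series */
        double weight[6];

        weight[0] = weight[1] = 0.5;      /* the first two readings are averaged 50/50 */
        for (int i = 2; i < n; i++) {
            for (int j = 0; j < i; j++)
                weight[j] /= 2.0;         /* each new step halves every earlier weight ... */
            weight[i] = 0.5;              /* ... and the newest reading enters at 1/2 */
        }

        double sum = 0.0;
        for (int j = 0; j < n; j++) {
            printf("reading %d: weight %.5f\n", j + 1, weight[j]);
            sum += weight[j];
        }
        printf("sum of weights: %.5f\n", sum);   /* 1.00000 */
        return 0;
    }

It prints 1/32, 1/32, 1/16, 1/8, 1/4, 1/2, summing to 1, with half of the total weight on the final reading.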

John D May 13, 2009 10:24 PM

In the 1950s I read a science fiction story where a person accused of a crime really had only one possible defense… to find a flaw in the assumed-to-be-correct judicial software that judged everyone. It was a scary thought then and it’s a scary thought now.
I can’t remember the name or author. It wasn’t the main theme of the story… just a side note. Anybody remember it?

Harry Johnston May 13, 2009 10:46 PM

For comparison, in New Zealand if you fail a breath test you are then given a blood test. Only the results of the blood test are admissible in court.

Charles May 13, 2009 11:42 PM

@John D, you might be thinking of Philip K Dick’s “The Minority Report”, although this features the future as seen by three mutated humans interpreted by a computer system.

If this isn’t the case then there’s a book out there I really need to read.

Clive Robinson May 14, 2009 1:37 AM

Oh dear people are arguing about the wrong thing here…

Sure the code as described by the defense appears bad (not seen the real thing so cannot say) but that is not the real issue.

The real issue is a political 455protector system dressed up as a safety measure, one that has over time turned into a revenue system, both through fines and through the claim on central resources for the entire system (follow the money).

A court actually has little or no interest in whether you were above or below the limit; they really do not care, or want to care.

What interests them is arguing about the pieces of paper in front of them.

The court does not have the ability to understand the technical or scientific data presented, nor does it care; as I said, it’s only interested in the argument and such things as “gravity” (in the legal sense).

Any physical evidence presented by prosecution or defense is “99 times out of 100” (only slightly exaggerating 😉) of no merit, just of argument.

There are a whole host of reasons from human failings down to the laws of time and physics.

For instance,

Evidentiary chains are just myths that lend some pseudo-scientific mumbo jumbo to the proceedings.

The simple fact is that you have paperwork saying something has… what proof is there? Only the word of a human who is reliant on the words of other humans, and so on up the chain. At any point somebody could have made a mistake and just not seen it, or cared to see it, or for other reasons reported things in a manner that lends the incorrect burden to the evidentiary item.

At the other end you have a “signal to noise” issue which is equivalent to seeing patterns in clouds or the static on a TV screen when it’s not tuned in.

Scientific tests under ideal conditions (noise less than signal level) have a sensitivity where they will pick up single molecules of chemicals and amplify the signal to the point it can be measured.

But you have to ask two questions,

What is the signal to noise prior to amplification?

Can the test reliably detect the difference between a signal and the noise at the ratio present when taking the measurement?

The majority of these tests have two characteristics: the first is that they destroy what they are measuring, the second is that the output of the amplification is effectively hard-limited prior to measurement.

For instance DNA: the process of testing boils down to the following steps,

0, Obtain sample.
1, Transport sample to test.
2, Chop sample up into small pieces
3, Replicate the small pieces tens of millions of times.
4, Use an overly time, and energy sensitive test to produce a chart.
5, Compare the resulting chart with another chart.

The chain of evidence documentation says absolutely nothing about,

0, How the sample was actually obtained (not the standard procedure used or the quality controls used in the process).

1, How the sample was transported, whether it was subject to cross-contamination or degradation by energy (light, temperature, pressure, sound, RF…) or time.

2, How the sample was purified before being irrevocably chopped up along with anything else that was present (i.e. the noise).

3, How the amplification works with impure or damaged samples. I.e., does it have a preference factor which would skew results in the presence of noise?

4, Again, how linear is the test, what are its sensitivities, how reliable are the control markers, etc.

5, Finally, how is the comparison made between the chart of data taken from a noisy environment (at the scene) and the chart of data taken from a quiet environment (swab of the suspect)?

If you start asking these quite reasonable questions in a court, the judge will not take it kindly. They will turn around and say that you are wasting the court’s time, as such matters are “established fact”…

Oh, and in the UK judges take a very dim view of defence counsel “beating up” on “expert witnesses”, as it “confuses the tribunal” (but not if it’s of “truth” or “law”), and the expert is technically a court officer…

So your expert witness gives “opinion”; this is actually “hearsay”, which would normally be barred from being heard by the jury (tribunal of truth) by the judge (tribunal of law).

However it is allowed providing, “in the court’s view”, the person is “of sufficient standing” to present “opinion” and sufficiently well versed as not to show prejudice in their presentation of “hearsay” as “facts”…

In the UK we currently have a number of cases under review due to Prof Roy Meadow, who, to put it bluntly, provided biased and unsubstantiated whims and bad science as “accepted practice in the field”…

He is not the only such expert witness (Walter Mitty type) to come under the spotlight in recent times.

At the end of the day the jury judges the performance of the players and the judge assesses the quality of the presentation. Nobody actually judges the evidence…

moz May 14, 2009 1:52 AM

Used properly, Lint is part of a whole process in which you write in a limited, C-like (C-subset, really) language that is less likely to contain errors. Obviously if software hasn’t been written in “Lint-C” it will fail a lot of checks. However, this doesn’t make the test invalid. The defence in a legal case has the job of showing doubt. The answer to 19K ‘potential errors’ is to show a different software engineering methodology which rules most/all of them out as errors. In the absence of that methodology there’s no simple mechanical way to know which errors are errors and which are not. It’s the prosecution’s job to prove each one individually is not an error. Suggesting that a defendant should have to worry about that kind of difficult task is unfair. There’s nothing wrong with quoting this number, since it shows the impossibility of the court being sure the defendant has been justifiably prosecuted.

Alex May 14, 2009 1:55 AM

Bruce, you appear to be a bit behind with your reading. The Base One report is two years old…

bob May 14, 2009 6:44 AM

Nobody but me vomiting uncontrollably over the “new one is better because it uses Windows” part? Well, if it was Vista maybe it would be better, because then it would hang up and refuse to give any answer at all for hours; then it would intentionally reduce the accuracy of the results because MS could not see where the person being tested had paid a DRM fee for the use of the device that day, by which time you would be completely sober and could demand a blood test instead.

One of the times I was on a jury one of the other jurors kept saying over and over that the defendant must be guilty because if he was innocent he would have taken the stand and stated as much. I kept pointing out the fact that the rule in the US is (was) a presumption of innocence and that the Bill of Rights (back when it was in effect) states explicitly that you do not have to testify, but he just held his position. I suspect that most people believe the same as he did, they just don’t say so. Can’t wait until people try to cross-examine a poorly designed piece of software which is damning them and have THAT get by these jurors. I imagine their position will be “the box said so, so it must be true”.

Tom Welsh May 14, 2009 7:51 AM

‘Removing the error messages “now that the program is working” is like wearing a parachute on the ground, but taking it off once you’re in the air’.

  • Kernighan & Plauger [Software Tools]

sooth sayer May 14, 2009 8:58 AM

Maybe the programmers were drunk while coding?

There is a lot of crap out there in industrial products – I am not very surprised.

EscapedWestOfTheBigMuddy May 14, 2009 12:15 PM

There are a whole slew of manufacturers out there who produce embedded medical devices. Glucometers (blood sugar meters) would seem to be a good analogy to breathalyzers. I would be very interested in knowing what kind of process these companies use for code production and review, and how it compares to that used by breathalyzer makers.

[ rant ]
That said, I know (from device-to-device and device-to-blood-draw tests) that some OTC glucometers are well calibrated for high values and ill calibrated for low values. This appears to be a deliberate decision on the part of the manufacturer. Presumably they figure that most of their customers are Type II, and have more trouble with highs than lows. But they won’t admit to it, which leaves Type I’s with a false impression of what the meter will do for them…

Worse still, these widgets appear to have powerful processors and copious memory, so they could fix it in software if they cared…
[ /rant ]

Anna May 14, 2009 1:09 PM

@FP - Catastrophic Error Detection Is Disabled. The new machine could have been successfully tested under ideal conditions. Then it is left in a hot or freezing car for years, it gets dropped a few dozen times and rained on a bit. Unless the machine is regularly tested under less-than-ideal conditions, it’d be tough to accept the tests.

Juergen May 14, 2009 1:13 PM

Looks like normal sloppy coding for a non-safety critical embedded device.

So the problem is that there doesn’t seem to be a required quality standard for software that produces legal evidence, requiring for example

  • disclosure of the source code
  • a decent quality standard and software development process as required for safety-critical devices
  • an audit trail from the pre-average raw values to the final computed result
  • in this special case making sure that blood alcohol is either underestimated or no value is produced in case of problems

Rich Gibbs May 14, 2009 2:09 PM

@FP: “Just like a drug manufacturer does not have to open-source their drugs but only has to demonstrate to the FDA that the drug is effective across a somewhat representive cross-section of humanity.”

They don’t have to open-source the drug, in the sense of allowing other people to make it, if it’s patented. But if it’s patented, the formulation of the drug has to be disclosed in the patent. You (I hope) can’t patent table salt as a treatment for cancer, because no one would believe that it worked.

Similarly here, the manufacturer does not have to put the code under the GPL (let’s say) in order for it to be examined. They would still have the copyright and could license it under any terms that they can persuade people to agree to.

Aegeus May 14, 2009 2:30 PM

The loss of precision when the input is divided by 256 would only happen if they were using integer division, right? I find it hard to believe that the breathalyzer only outputs 8 different values.

Jeff Bell May 14, 2009 2:53 PM

What in the world is an Atari style chip?

Did they mean a 6502?

Maybe “Atari style” sounded more unreliable than an “Apple II style”?

Observer May 14, 2009 3:06 PM

“For instance DNA: the process of testing boils down to the following steps”

As someone who’s actually read some of the research behind modern DNA analysis and its use in forensics, I’d say your post is probably the most uneducated thing I’ve seen on the internet all day.

(Bruce’s original post is pretty bad too. Surely he should know better than to trust everything a pay-us-and-we’ll-say-anything source states, especially when even the claims he cites contain obvious errors?)

Mike May 14, 2009 3:20 PM

All I want for any of these things–breath alcohol testers, voting machines, ATMs, etc–is for them to be as carefully monitored and tested as slot machines. The code on those is actually audited by gaming authorities (Nevada is really good at this.) There’s tons and tons of regulations and periodic testing, and everything is audited.

Open code is probably even more important for matters of personal criminal activity or voting, but at the minimum if a credible agency had complete access to the code and could audit its operation with a valid testing matrix, that would be helpful.

khb May 14, 2009 4:05 PM

No doubt the “reason” for their averaging technique was not wanting to have a buffer for all the input values. However, they could have used a “Kalman filter”-style recursion. In awk-ish notation:
BEGIN { xbar = 0.0; sig2 = 0.0; delta = 0.0 }    # running mean and variance, no sample buffer
{
delta = $1 - xbar; xbar = xbar + delta/NR             # update the running mean with each sample
if (NR > 3) { sig2 = (NR-2)/(NR-1)*sig2 + delta*delta/NR }   # recursive sample variance
}
This requires no memory buffer and probably requires a few more values to converge to the “true” estimates (that you’d get with a buffer), but I imagine they are capable of taking hundreds of samples in a few seconds rather than just using 5 or so.

Andy May 14, 2009 4:11 PM

Draeger triggers a déjà vu. Dräger, as we write it in German, was a reference customer of my former employer.
Guess what they build (handcraft?) too? Yes, air packs and scuba-diving devices.

I’m kind of scared…
~Andy

Gil May 14, 2009 4:52 PM

From reddit: This article is a piece of junk.

Most of these comments are quibbles.

1) The averaging algorithm sounds to me like a “running average” algorithm, used to construct a running average over time when you have an unknown number of incoming samples. Furthermore, mlw72z (see below) gave some rationalization to make the last sample more important than the first.

2) Results limited to small, discrete values: This is normal for sensors. A pregnancy test takes a large amount of input, the levels of various hormones, and tries to return a small, discrete value: Pregnant or Not.

So summarizing the output to small, discrete values is indeed the expected behavior of a sensor.

3) Removing the crash detector: many embedded systems remove these interrupts, as they can test every possible code branch, unlike larger and more generally unstable systems like Windows PCs.

Carsten May 14, 2009 5:03 PM

The code review has some translations from German to English which sound like the good old AltaVista Babelfish.

Messablauf was translated as ‘measure expiration’, but this Ablauf has nothing to do with expiration. It means the processing of the measurement. I’m not sure about the correct translation, but ‘measurement processing’ should describe it well enough.

flug May 14, 2009 5:07 PM

“The specific problem here is illustrated by the second bucket in the 8 discrete measures scenario: that bucket spans a large conversion domain from 0.0625 (legal, most states) to 0.125 (illegal, most states).”

The interesting thing (it seems, based on the very limited info in Bruce’s summary) is that the very large discrete steps it actually measures in (#3) are being covered up by the smoothing/averaging scheme (#2).

Thus, even though the machine is actually measuring in large, discrete steps the display will show various values in between the discrete values because of the averaging scheme.

I would guess that if your actual alcohol level is somewhere between two of the discrete values, what happens is the actual value sort of pops back and forth between the two nearest values.

So you’re getting a stream of data something like this:

0.0625 0.125 0.125 0.125 0.0625 0.0625 0.125 0.125 0.125 0.0625 0.125 0.0625 0.125 0.125 0.0625 0.125 0.125

Then the averaging scheme kicks in and so the display seems to converge to a single number somewhere in between the two values.

This strikes me as not the most reliable way to determine the actual value for those ‘in between’ values. (However it might not be that bad–you’d have to do some controlled tests to see. It could easily have some systematic bias in certain ranges.)

Also, one reason they likely came up with the averaging scheme is that without it the display would leap around in a very discontinuous way.

Jeff May 14, 2009 5:11 PM

It may be useful for the algorithm to give more weight to samples at the end. Presumably the most accurate samples will be from the air at the bottom of your lungs as you finish exhaling. Whether that weighting was by design or not… who knows?

ech May 14, 2009 5:19 PM

“There are a whole slew of manufacturers out there who produce embedded medical devices. Glucometers (blood sugar meters) would seem to be a good analogy to breathalyzers. I would be very interested in knowing what kind of process these companies use for code production and review, and how it compares to that used by breathalyzer makers.”

I haven’t worked in medical device software in quite a while, but the FDA has standards for software in medical devices. IIRC, they get to look at the source code. Here is one of their guidance documents:
http://www.fda.gov/cdrh/comp/guidance/938.html

Susan May 14, 2009 6:48 PM

I am a software engineer and, unfortunately, I see code like this all the time. Even in medical devices! Software engineers skip the most fundamental of practices: coding standards, code reviews, static analysis tools, etc. that would catch a majority of problems like these. It is time the software engineering field grew up!!!

Nemo May 14, 2009 6:51 PM

Someone please name one instance where this test was contradicted by another device or blood test?

Gavin May 14, 2009 6:53 PM

@John D The timeline is off, but the theme you describe is similar to “Little Brother” in Walter Mosley’s Futureland

Simmee64 May 14, 2009 8:03 PM

This just goes to prove the following: you should not use machines for voting, and all software in things like breath testers should be transparent. I am not surprised that the code is a mess; I would be willing to bet that the code in Diebold’s (or whatever they are called now) voting machines is a mess as well.

booyah May 14, 2009 8:09 PM

Definitely need this type of investigation on voting machines. I don’t understand how voting machine software can be proprietary when we actually pay for them with our taxes. Not to mention the fact that the only math a voting machine should ever do is plus-1. There should be no subtraction, multiplication or division, ever!! Seriously, how hard or complex can $candidate++ actually be?

Brian Tung May 14, 2009 8:19 PM

@khb: Wouldn’t it be simpler to do something like the following?

avg = 0;                                       /* the running average must start from zero */
for (count = 0; (sample = getSample()); count++)
    avg = (avg*count + value(sample)) / (count+1);

This computes a running average with equal weights without resorting to a Kalman filter.

@Gil: I’m assuming you’re just quoting reddit, because those are just weird defenses of the software. Imagine if your digital scale quantized its results into {weightless, non-weightless}.

@flug: The averaging scheme apparently used by the vendor would be subject to more “jumping around” than a constant-weighting averager, not less.

As to whether it could produce a valid result, given enough samples, it should be relatively easy to show that it wouldn’t do so in general unless the measurement error had some pretty unusual characteristics. It would, I think, give correct results in general if the measurement error were uniformly distributed over an interval that was a (non-zero) multiple of 0.0625, centered at 0, and the representative values were at the center of their respective ranges (e.g., the machine measures 0.0625 for true values between 0.03125 and 0.09375), but not otherwise. Of course, it might give correct results for isolated true values in any event, but that hardly justifies what they do.

@Jeff: I think the best you could say is that it wouldn’t affect the answer. Even if the physiology justified it, the odds that this particular exponential weighting is optimal are pretty low. It might make more sense, rather, to use constant weighting, but to ignore the first N values, where N is determined through testing. Better yet, the actual weighting scheme should be determined through testing.

Brian Tung May 14, 2009 8:27 PM

@booyah: I’m sure the fundamental operation of an electronic voting machine is pretty simple. Ostensibly, the complexity arises when you try to make the machine tamper-proof and self-checking. That’s not to defend the performance by Diebold et al., but I do think there’s more to it than candidate.incrementVote().

TJ May 14, 2009 9:01 PM

When you rely on a machine to decide whether to ruin someone’s life, you’ve already made an egregious error … whether or not it’s “accurate”.

How many people will swear that Lie Detectors (note authoritative caps!) can actually detect lies accurately? Yet their “testimony” was relied on long before there was science to decide their legitimacy.

Anyway … law enforcement is a highly subjective field that’s hardly consonant with the use of machines to give primitive (but SCIENTIFIC!!) indicators about complex questions.

Steve May 14, 2009 9:03 PM

I’ve seen averaging (actually, more accurately, I suppose, weighting) schemes like the one discussed above in image processing applications, mostly for noise reduction in systems where you have a limited number of buffers and a limited dynamic range.

Yes, it gives more weight to the last reading than previous readings, so that the earliest readings eventually slide off the end after a number of iterations.

Bob S May 14, 2009 9:50 PM

I also fear that Scared may have had it right yesterday. As implemented, the averaging process will tend to make it feasible to keep taking new measurements until random variation among the measurements gives you a result along the lines of what you’re looking for.

Ordinary averaging would halve the measurement-error variance in your estimate of the mean (presumably true) breath alcohol level each time you double your number of measurements. In contrast, the averaging process as implemented would tend to cause the error in the estimate of breath alcohol level to approach (as the number of measurements approaches infinity) one-half that of an ordinary sample comprising just two measurements–or some value not very different from that.
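
As a rough check of that claim (a sketch only, assuming independent readings with equal error variance): the estimator’s error variance is the per-reading variance times the sum of the squared weights, and a few lines of C show that under the exponential scheme this sum levels off near 1/3, whereas for a true mean it keeps falling as 1/n.

    #include <stdio.h>

    int main(void) {
        /* Weights under the exponential scheme with n readings are
           1/2, 1/4, ..., 1/2^(n-1), with the smallest weight appearing twice
           (the first two readings). For a plain mean every weight is 1/n. */
        for (int n = 2; n <= 64; n *= 2) {
            double sumsq = 0.0, w = 0.5;
            for (int k = 1; k < n; k++) {     /* squares of 1/2 ... 1/2^(n-1) */
                sumsq += w * w;
                w /= 2.0;
            }
            sumsq += (2.0 * w) * (2.0 * w);   /* the first reading repeats the smallest weight */
            printf("n=%2d  exponential: %.4f   plain mean: %.4f\n", n, sumsq, 1.0 / n);
        }
        return 0;
    }

So however many readings are taken, the scheme never does much better than a plain average of about three readings, which is in the same ballpark as the limit described above.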

NL May 14, 2009 9:52 PM

It is horrible that this device is used to ruin many lives, but is this really surprising? They just exposed the dirty-little-secret of most software development efforts to the general public. When it comes to creating software, it is usually more about making money than it is about quality of the product.

People really need to stop drinking and driving though.

Dex May 14, 2009 10:29 PM

I work with medical software. FDA isn’t really interested in the source code. They are interested in the company’s internal quality protocol with documentation that shows that software has been planned, documented, and tested.

In the end, it doesn’t tell much about the actual quality of the code. Seeing a bunch of unit tests pass at 100% is certainly good enough for FDA. I’ve never seen an FDA auditor actually even peek at what the unit tests do.

There’s really no way they’d ever get to know about Lint warnings, missing watchdogs, or even bad averaging algorithms. Unless the functional tests fail in spectacular ways or enough customers make official complaints to warrant a deeper investigation.

SteveC May 14, 2009 11:02 PM

We don’t really need to know how the code works if the box can produce controlled and reproducible results. Being able to ‘calibrate’ the unit to reliably produce PASS/FAIL results is all you need.

If you are going to make a big issue of the accuracy and resolution of the system, you have to consider ALL the factors in determining the result – such as blood-lung permeability for alcohol transfers, the capabilities of the mechanics to accurately transfer the exhaled breath to the sensor, the accuracy of the sensor in various chemical environments (think garlic sausage burp!), the analogue to digital processing and the screen representation. Furthermore, this is a piece of field equipment and subject to wide range of temperature, humidity, vibration conditions as well as pocket lint, dirt, and all that sort of stuff. If you cannot control these elements with precise and consistent accuracy then there is no point arguing about fine detail. It’s a bit like complaining about the length of a meter of cloth because it was not measured with a micrometer, or measuring the vehicle battery voltage with a 4 1/2 digit multimeter, or complaining because your gas tank only shows 1/4 tank increments.
In Australia, as in other parts, a conviction is recorded on a blood alcohol test. The breath test is just to triage the likely suspects from those under the limit and so the test does not have to have laboratory accuracy.

Don’t get too hung up on the averaging either. The program would most likely have been written in assembler for a processor with 16 bit registers. The moving average solution that was applied would be a reasonably simple programming exercise without requiring additional programming gymnastics to calculate the average in a more conventional sense in order to produce a result that, ultimately, did not require that level of precision.

simon May 14, 2009 11:06 PM

I think two comments above raise the real issue, which is whether the assumptions behind the math and science are right.

Even if the signal is being computed correctly, regardless of interrupts/timers etc., and there is a 60% (random) chance of being in the illegal range between .06-1% (remember these machines were developed when a DUI was a 1.25%), and the conviction value is 0.8, then something is clearly wrong, not just in compiler errors but in methodology.

The assumptions the software was built upon do not conform to the legal standard.

I’d rather the cop flip a coin. Odds are better. Maybe two-face had the right idea.

simon May 14, 2009 11:10 PM

edit: corrections for readability..

I think two comments above raise the real issues; the first is whether the assumptions behind the math and science are right, assuming the code is perfect:

If the signal is being computed correctly, regardless of interrupts/timers etc., there is a 60% (random) chance of being between .06-1% (remember these machines were developed when a DUI was a 1.25%) and having that output as a 1.0. The current DUI value is 0.8. Something is clearly wrong, not just in compiler errors but in methodology, when you have a 60% chance of being found guilty. This is the false positive rate. These rates are measured by calibration protocols, and it would be interesting to see what they calibrate to, and whether the machine is aware of the calibration (special routines).

The assumptions the software was built upon do not conform to the legal standard.

I’d rather the cop flip a coin. Odds are better. Maybe two-face had the right idea.

Clive Robinson May 14, 2009 11:11 PM

The odd “average” may be due to trying to get around a hardware sensor problem (no I’m not defending just saying insufficient info to judge fairly).

Oh and what is an average any way, there appears to be more of them than your average girl about town has shoes…

If you assume that your breath is a little under 37C and that outside at night in northern Europe it is rarely above 15C and can frequently be lower than -20C, you have a very real issue of a lot of significant temperature change in three to six seconds or so, which is a very awkward design issue at the best of times.

So if you blow across a “silicon” type detector you would expect to see a curve where the latest readings should be given more weight than previous.

Further, to get cost, size and weight down, as well as battery life up (and MTBF figures), you generally try to minimise the component count.

So the “average” could not only be trying to track sensor offset but also be trying to do “DC” offset correction for changing battery voltage as well…

You would need the spec sheets on the sensors used as well as the actual circuit diagram to have a chance at making a valid judgment call.

With regards the actual sampling to get accurate results, have a look into “over sampling” and “dither”.

You would be surprised at just how many extra bits of valid data you can wring out of repeated samples and random (ish) sample points.

Therefore the “odd average” could actually be approximating two or more different averages in one: a slow average (low pass) to get the DC or temperature offset, a medium speed average (band pass) to get the measurement resolution up, and a high speed average (high pass) to detect an interference / readings-not-valid state (yes, I’m guilty of this sort of sin in medical and other related equipment; diathermy and de-fib machines play hell with patient monitoring systems as they can dump joules of energy onto long sensor cables at unexpected times).

As anyone who has had to design RTOS or embedded systems based around analogue sensors in the late 80’s / early 90’s will know, first you wrote understandable top-down code, tested it block by block, bolted it all together, then you squished it down to make it fit the available resources and (should have) then regression tested to check it was still working to spec.

However, depending on who you listen to and what technology area you work in, hardware resources overall tend to double around every 18 months (individual items faster or slower).

However, software development on embedded systems can take over three years if starting from scratch on a moderately complex but fairly harmless item like a cordless phone, longer on more safety-critical or higher-reliability products.

So major “code re-use” is where it is at in embedded system design.

Unfortunately (puts on old grumpy hat and affects Grandpa Simpson voice as befitting status 😉) the young whippersnapper code cutters of today, driven by hop-head know-nothing crazed marketing types, are not interested in product that does what you as a customer would hope, but in product that sells at a good margin in their pay packet…

So software, like the old battered and nearly worn out kindergarten building / alphabet blocks, just keeps getting stacked up in new and wildly different ways by young and eager hands until it falls over (or blows up on the launch pad / shortly after take-off), or the owners of the young and eager hands lose interest and toddle off to go torture the cat, etc.

Remember, folks, that the RTOS/embedded code you write today, be it right or wrong, will still be around to cause you (and others) sleepless nights when you are a grey, grizzled old engineering manager being wheeled in and out of your box to various design meetings. And unfortunately long after you can remember the why, let alone the how of it, some thirty years after that night you wrote it and thought it “neat / cool”…

And as for all those tools currently loved by business management and consultants, such as “lean”, “Six Sigma”, etc., that supposedly came from engineering and manufacturing roots: they do not really resolve the fundamental problems involved, just make them happen faster and encourage people to move on even quicker (the best time, CV-wise, to move is about 20% of the way into the project; you can claim all sorts of things and get away with it)…

As has been noted, “Old hardware goes the way of any bucket full of WEEE, but software, like roaches, just never goes away no matter how many bugs you squash”.

Clive Robinson May 14, 2009 11:25 PM

@ Observer,

“Being someone who’s actually read some of the research behind modern DNA analysis and it’s use for forensics, your post is probably the most uneducated thing I’ve seen on the internet all day.”

When you were trolling around doing your reading on the research, did you happen to check how much of it is actually being used today, or was five or ten years ago (which is still current in court terms)?

Oh, and perhaps you would like to have a go at providing a brief overview of the process without jargon, and then identifying the attendant flaws at each stage?

It should make interesting reading.

Math is fun May 14, 2009 11:39 PM

@Aegeus, who wrote “I find it hard to believe that the breathalyzer only outputs 8 different values.”

I think that’s 8 discrete values BEFORE averaging. The assumption is that the sensor is producing non-identical values as the air flows by. Those are then averaged (with some weighting).

IANAL May 14, 2009 11:42 PM

I certainly don’t condone drunk driving, but if I were defending people with DUIs I would hate the code being released. I would rather have them keep it a secret and get my clients off because the accuracy of the device could not be verified. Sure, having the code released makes them look bad and is good for getting current convictions overturned, but this just means that better code is going to be written and implemented in the future, which is going to make breath tests more valid as evidence of drunk driving.

Anonymous May 15, 2009 12:12 AM

@

“We don’t really need to know how the code works if the box can produce controlled and reproducible results. Being able to ‘calibrate’ the unit to reliably produce PASS/FAIL results is all you need.”

There are a couple of issues here. The first is PASS/FAIL: from what has been said so far on this blog, the reliability of this device is the equivalent of blindly tossing a coin. So it would appear that it cannot “produce controlled and reproducible results” (nor does it appear that any tests have been done to verify it either way).

The second issue is more important.

As you say,

“In Australia, as in other parts, a conviction is recorded on a blood alcohol test. The breath test is just to triage the likely suspects”

Unfortunately this is changing; in some parts of the world the system is being moved away from “justice at any cost” to “conviction on the cheap”.

In the UK we have removed the option of trial by jury from many types of crime, and removed the opportunity of many to get legal aid to defend themselves.

Oh, and for motoring offences recorded by automatic technology we have a model of “if you pay now it’s X, but if you argue it starts at 2X plus costs”.

What has been shown with this is that where people have the gumption to challenge, 80-90% get overturned.

Now I know you should not argue backwards about specifics from that result but I think it is an indicator that something is wrong with the system and it needs to be looked at.

Oh, and one thing: the brakes in your car may well use a microprocessor. What level of not working are you prepared to accept from them?

Once in a hundred million times you put your foot on them, one in a million, or how about one in ten thousand?

Currently most embedded software has an “unknown” fail rate nearer one in ten thousand than one in a hundred million…

This is often due to interaction between interrupts and the main flow, and is very, very difficult to test for, as the window of opportunity could be just a few microseconds in duration on events that might happen once or twice a day. A classic example is having two instructions swapped over, such as disabling an interrupt and clearing a flag, which effectively renders an input or process disabled until some other event happens (key debounce being a not unknown issue in this regard, or the runaway mouse seen in Windows XP and Vista due to driver software issues).

Now if you take the number of cars in Australia and the number of times a day a driver puts their foot on the brake, on average how many days or part days do you think it would take to get to one hundred million brake pedal uses?

And if the brakes do fail on a car, what are the odds of it ruining the driver’s life within society?

Whereas a “drunk in charge” conviction, or whatever it is called in your jurisdiction, can and often does ruin a driver’s life due to loss of job/income, standing in the community, etc.

Would it be unfair to ask that the machine be at least as reliable in operation as the foot brake of a car?

Bob S May 15, 2009 12:13 AM

As described, the averaging algorithm isn’t a moving average (although I guess you could call it an expanding average). Perhaps the description is mistaken. However, it calls the algorithm an incorrect calculation and goes on to describe what is, in fact, an incorrect calculation regardless of whether it was intended to be a moving average or a simple average that’s updated upon the arrival of new observations.

The idea voiced by some that an algorithm can be ok, even if it’s wrong, as long as it passes acceptance tests is mistaken. An important reason for making sure of the correct implementation of an algorithm is to ensure that the device being controlled performs correctly and predictably over its full range of inputs, not just in the necessarily restricted sample of conditions employed in acceptance testing. (Think encryption, for example.)

SimonTewbi May 15, 2009 1:12 AM

@Eric in PDX, RE: Speed Cameras: Maybe 10 or 12 years ago in New Zealand an electronic engineer or physicist (can’t remember the details) was given a speeding ticket. Rather than roll over and pay up he elected to fight it in court. He was able to show that with sunlight reflecting at just the right angle off the car the speed camera (or radar gun, can’t remember) would give a significantly incorrect reading. The police dropped the case before the judge could rule as they didn’t want a court ruling stating that their equipment was inaccurate.

Swa May 15, 2009 7:01 AM

For everyone saying that the last result has the most weight. NO. The first two are averaged. 50/50. That result is averaged. 66/33. That result is averaged. 75/25. That result is averaged. Each time the initial results have 50% of the upper average. So the initial results always have the plurality of the average used.

sengan May 15, 2009 8:18 AM

Later readings count more. For 3 readings you get:

0.5 * ( 0.5 * (a + b) + c) = 0.25 * a + 0.25 * b + 0.5 * c

Swa May 15, 2009 8:24 AM

You’re doing it wrong. They take an average, then the average of an average. Which is 50/50, 100/50, etc., etc.

C May 15, 2009 9:54 AM

Oh how nice, always return a value. Doesn’t matter how it does it, just that it does.

I’m all for getting drunks off the road. But if you have a system that could potentially let off the drunkards but convict the innocent then there is a huge problem.

For that reason I hope the manufacturers are put out of business and charged. This level of negligence is unacceptable. Hell, look at all the years they’ve had to make this product better, and yet here, according to Base One, the code is still absolute garbage.

It’s been said by many of you; I’ll say it again. Situations like this are why we need transparency; it could have been avoided from the start.

Bob S May 15, 2009 11:52 AM

As originally described, the average of the first i measurements, mean(i), is incorrectly calculated:

mean(i) = [measurement(i) + mean(i-1)] / 2

The calculation should be:

mean(i) = [measurement(i) + (i-1) * mean(i-1)] / i

In response to a previous comment, note that the correct calculation requires nothing more in the way of resources than the incorrect calculation.

averros May 15, 2009 4:12 PM

I was always wondering why driving while intoxicated is a crime when much more serious and dangerous condition called driving while stupid isn’t?

The Monster May 15, 2009 4:57 PM

“Someone please name one instance where this test was contradicted by another device or blood test?”

The law in states where breathalyzers are used doesn’t provide for another test to contradict the breathalyzer. It is declared to be accurate by fiat, much like radar guns are presumed to be accurate even when used in moving patrol cars subject to “cosine error” or various other problems.

Brian Tung May 15, 2009 5:01 PM

@averros: At the risk of painting with a broad brush, I’d say that driving a car, though it requires some modicum of intelligence, isn’t rocket science. It requires careful attention more than it requires abstract intelligence. It relies on the same sort of 3-D tracking that allowed our ancestors to spear prey and avoid being eaten. None of this is what we normally consider advanced skills.

Of course, if by “stupid” you mean “unwise to the point of driving while doing a New York Times crossword on the freeway,” then actually some of that is illegal. Not easily enforceable, though.

Roger May 15, 2009 6:01 PM

@The Monster:
That might be true in some states, but it certainly is not in mine. Here a BAC reading is not accepted as evidence at all. Rather, a reading over the legal limit is normally followed up by a blood test. The blood test can be refused, but that is deemed as admitting the breathalyser reading in evidence. (Lawyers recommend you always take the blood test, unless you know you had a large drink, more than one “unit”, in the last half hour. If you had, your BAC is probably still rising and the short delay before the blood test will likely give an even worse reading.)

If you had a drink very soon before the test, the alcohol in your mouth will also elevate the results, so in this state you also have the right to ask to repeat the test after waiting 15 minutes. If you don’t want to wait, they will generally also allow you to take it again after rinsing your mouth with water, and in my experience that works for removing mouth alcohol, but that one is at the officer’s discretion.

Jurisdictions without an obligatory blood test usually have a second test at a “control machine” at the station, which is simply another breathalyser that is more carefully maintained and calibrated than the field units. It is probably more accurate than the field units but suffers the same systemic problems, i.e. alcohol is not the only thing it detects. In many jurisdictions without obligatory blood testing, a blood test can still be requested by the accused and will normally be provided more or less automatically; even in jurisdictions where the right to request a blood test is not legislated, a refusal by the police to provide the test will impair their case.

John Q. Smith May 15, 2009 10:00 PM

The averaging technique is the well-known exponential average, although personally I think a weighting of 1/2 is too high.
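To make the weighting concrete: under an exponential average with a factor of 1/2, the most recent reading carries half the weight of the final result, each earlier reading carries half as much again, and the two oldest readings share the smallest weight. A short sketch, purely for illustration:

```c
#include <stdio.h>

int main(void)
{
    const int n = 6;        /* number of readings; chosen arbitrarily */
    double weight = 0.5;    /* weight of the newest reading */

    for (int i = 0; i < n; i++) {
        printf("reading %d from the end: weight %.5f\n", i, weight);
        if (i < n - 2)      /* the two oldest readings share the smallest weight */
            weight *= 0.5;
    }
    return 0;
}
```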

The thing here is that if someone is going to be convicted on the sole evidence of a breathalyzer, then the breathalyzer has to be accurate “beyond reasonable doubt”. In fact, WELL beyond reasonable doubt, so there’s room for some imperfection elsewhere in the prosecution.

Either that, or the claim of intoxication has to be supported by additional, independent evidence.

bugsme May 15, 2009 10:47 PM

Assuming the intent is to compute the “mean” of the samples, all you need is an accumulator and a counter. Simply add each new sample to the accumulator and increment the counter. When the timer dings, divide the sum by the count and you’ve got the mean.

Since the device has a known sample rate, and you have to blow into the sample tube for a set time, you now know how big to make your accumulator, and how much precision your math must handle.
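A sketch of the accumulator-and-counter approach just described, with a made-up sample rate and blow duration (the real figures are not given here); 12-bit samples fit comfortably in a 32-bit accumulator:

```c
#include <stdint.h>
#include <stdio.h>

#define SAMPLE_RATE_HZ  100u   /* assumed figure */
#define BLOW_SECONDS    5u     /* assumed figure */

int main(void)
{
    uint32_t accumulator = 0;  /* 4095 * 2^20 < 2^32, so overflow is not a concern here */
    uint32_t count = 0;

    for (uint32_t i = 0; i < SAMPLE_RATE_HZ * BLOW_SECONDS; i++) {
        uint16_t sample = 2048;        /* stand-in for an A/D read, 0..4095 */
        accumulator += sample;
        count++;
    }

    /* When the timer "dings", divide once to get the mean. */
    uint32_t mean = accumulator / count;
    printf("mean = %u\n", (unsigned)mean);
    return 0;
}
```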

Bob S May 16, 2009 1:05 AM

Not to flog a dead horse, but I thought exponential averages were characterized by moving windows.

The problem of defining accuracy “beyond a reasonable doubt” is interesting. The straightforward method would seem to be to treat the problem as a one-sided hypothesis test in which (a) the null hypothesis is that measured breath alcohol content (BrAC) does not exceed the legal limit, and (b) the false-positive probability is small enough that it’s beyond reasonable to doubt that an observation large enough to cause rejection of the null would indicate an illegally high BrAC. If a false positive rate of 2.5% were judged sufficiently low, for a simple example, then you would need to obtain a measured BrAC level about two standard deviations of measurement error above the legal limit (assuming normally distributed errors) to support a principled, beyond-reasonable-doubt argument for conviction based on a high BrAC. The determination of measurement error would be tricky, perhaps expensive (because to do it right might possibly require a lot of empirical work), and might ultimately lead to disturbing conclusions about the reliability of the test.
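As a rough worked example of that one-sided test (the legal limit and measurement standard deviation below are hypothetical numbers, not values from the report):

```c
#include <stdio.h>

/* Convict only if the measured BrAC exceeds the legal limit by z standard
 * deviations of measurement error (one-sided test, normal errors assumed). */
static double conviction_threshold(double legal_limit, double sigma, double z)
{
    return legal_limit + z * sigma;
}

int main(void)
{
    double limit = 0.08;   /* hypothetical legal BrAC limit */
    double sigma = 0.01;   /* hypothetical measurement standard deviation */
    double z     = 1.96;   /* upper 2.5% point of the standard normal */

    /* Prints roughly 0.0996: a reading below that would not support a
     * beyond-reasonable-doubt claim under these made-up assumptions. */
    printf("convict only if measured BrAC >= %.4f\n",
           conviction_threshold(limit, sigma, z));
    return 0;
}
```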

Of course (to echo previous commenters), life would be better and simpler if people would just not drink and drive.

Max May 16, 2009 2:50 PM

If they had acceptance tests which were public (tests that specify product design rather than technical design), they would be able to show what their code does without showing the code itself.

While that may not have prevented them from having to show the code in a court of law, it at least would have saved them the embarrassment of having their software shown to be so flawed when it was examined.

Dogfood May 16, 2009 4:16 PM

I always knew these breathalyzer tests were a bunch of crap. I got arrested for DUI because I got pulled over for a broken taillight, the cop smelled beer, and drunk tested me. I blew a 0.12 BAC and he hauled me in for the night. Guess what the blood test showed? 0.03 BAC. I had two pints of beer with food at a restaurant in 90 minutes, so there’s no way I should’ve registered anywhere near 0.12.

I beat the rap even though the cop took it to court and told the judge that my eyes were bloodshot, I smelled “strongly” of alcohol, and I was stumbling. After I was found not guilty, the judge chewed me out for 5 minutes and told me that I was lucky “this time”. 0.03 and they act like I’m on a 3-day binge, driving over lampposts and shit.

Screw them, that was 1991 and I’ve only had two tickets since!! I say the breathalyzer sucks.

Reality May 16, 2009 4:31 PM

“I hope the departments that bought these things sue the pants off the manufacturer.”

Are you kidding? They are probably giving a per-arrest kick-back to Draeger. Every lawyer knows that if a cop needs or wants to make a DUI arrest, the suspect (aka victim) will blow over the limit. It makes no difference how much he/she has actually had to drink or the level of actual impairment. And everyone fails the sobriety tests, even sober people. DUI is a major cash cow for police and county governments. They’re going to stack the deck in their favor. They know that a good lawyer can beat a bad DUI rap, but most people don’t have the $10K in legal fees needed to do so.

Engineer in Training May 16, 2009 9:42 PM

I’m a 3rd-year computer engineering student, and I’m calling them out on being just plain lazy. The average is computed exponentially because they didn’t want to include libraries for floating point or for 32-bit integers. Add up more than 16 12-bit integers, and you have to worry about overflowing a 16-bit int. I ran into the same problem in lab. And I tried the same solution. And my professor rightly bitched me out, because I was taking the lazy way out and losing precision in the process. THIS DOESN’T FLY IN UNDERGRAD UNIVERSITY CLASSES.

Same thing with the interrupts. If you write code in C and use an assembler, sometimes it’s going to “optimize” something critical out, or overflow a buffer, with the end result being the processor gets lost, and tries to interpret data as code. This causes an illegal op-code interrupt, which is basically a red flag that tells you that something went wrong, and you probably shouldn’t trust your data, because the processor has been following garbage instead of instructions. When you get these interrupts in the debugging/testing phase of development, it means that you really ought to go back, figure out what’s going wrong, and correct the bugs. Of course, this is difficult, as it means you have to wade through hundreds to thousands of lines of Assembly code, which is second only to machine code in terms of unreadability. So, if you’re lazy, or pressed for time, you just ignore the warning signs, and put the processor back on track SOMEWHERE around where it left off. You’ve probably overwritten some of your data with nonsense at this point, but you hope that it wasn’t important. If you’re, say, trying to get something working long enough to get a passing grade in a class, this can work. If you’re trying to create a tool that will be used by law enforcement officers to determine whether someone has been recklessly endangering lives, GO BACK AND FIX IT.
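A minimal sketch of the fail-stop behaviour being argued for: trap the illegal opcode, show an error, and refuse to produce a reading. The routine names are hypothetical placeholders, since no particular microcontroller or toolchain is assumed:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define ERR_ILLEGAL_OPCODE  0xE1u   /* arbitrary error code for the display */

/* Hypothetical stand-ins for firmware routines. */
static void display_error_code(uint8_t code) { printf("ERROR %02X\n", (unsigned)code); }
static void halt_system(void)                { exit(EXIT_FAILURE); }

/* Hypothetical illegal-instruction trap handler: report the fault and stop
 * with an error code instead of resuming with possibly corrupted state. */
static void illegal_opcode_trap(void)
{
    display_error_code(ERR_ILLEGAL_OPCODE);
    halt_system();   /* force a power cycle; never report a reading */
}

int main(void)
{
    illegal_opcode_trap();   /* simulate the trap firing */
    return 0;
}
```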

Forever Me May 16, 2009 10:48 PM

I’d like to see any document where the claim was made that operation was ‘perfect’. Nothing is perfect, certainly not a product. There are just levels of reliability and repeatability. There are also likely environmental variables such as whether a person may have properly used mouthwash containing alcohol during previous hours.

markm May 16, 2009 10:51 PM

1) In embedded control software, this is an easy way to balance over-reacting to input changes that might just be glitches versus reacting too slowly to real changes as one waits for the average to change. A very simple calculation weights the readings 1/2, 1/4, 1/8, … It only cuts the impact of noise in the readings by 1/2, but it gets most of a consistent change into the average after two readings. A straight unweighted running average of the last two readings would work slightly better, but would double the memory requirements and add or subtract steps.

But this is grossly mis-applied in a breathalyzer; there’s no need for a running average, instead it should be adding up a complete run of data and only then computing the average. No way is that weighting scheme a reasonable approximation to any mathematically defensible weighting of the average. In other words, they copied a program meant for a quite different purpose without thinking about the real goal.

OTOH, given the backwards explanation of the averaging scheme in the report, can we trust the report (or the translation) to be usefully accurate?

2) If the report is accurate, this feature of the program is just plain ridiculous.

3) The report is at least partly nitpicking here. Not using COP may be a bad sign. Not using interrupts is a good thing, when you can get the job done without them – it makes the software more predictable and easier to test and debug.

Clive Robinson May 17, 2009 5:59 AM

@ Engineer in Training,

Although I agree with you that they could have put more effort in, all embedded systems engineering has resource constraints. However, not all the constraints are immediately obvious to ordinary software engineers, and they are very rarely taught at university or other formal educational establishments.

One such constraint is battery life; it is usually an exceedingly complex thing to “trade off” with differing technologies in use. The usual solution is to minimise CPU speed and the time that parts of the circuit are powered up.

Even for CPUs with FPUs, the use of floating point is usually battery hungry and therefore best avoided whenever and wherever possible (and if you can get rid of integer mul and div, so much the better).

When you say,

“Add up more than 16 12-bit integers, and you have to worry about overflowing a 16-bit int.”

you are making a number of assumptions which suggest a lack of familiarity with consumer-grade embedded design.

Most analogue inputs are based on an unsigned value read from an A/D converter. This at some point MAY have to be converted to a signed number (losing a bit of precision in the process), or it may not.

Also, invariably only a small part of the A/D range is actually used, to avoid the use of expensive high-tolerance components (that’s what software is for ;).

Also, the thing about analogue input from instrumentation is that it changes oh so slowly with respect to the CPU clock, which opens up a number of possibilities.

A simple solution for the average and sign conversion is to use a delta average. Basically, you either know or compute an artificial zero point and subtract this from the unsigned A/D value; the result will be a signed int, usually of quite small value. These values can usually be averaged to a considerable extent without any loss of precision or risk of overflow.

The simplest and fastest way is to use a “running window average”, where you have three values: new, old, and window. For each reading you subtract the old value from, and add the new value to, the window value. The distance between the old value and the new value, in number of readings, defines the size of the window and hence the precision of the data held in it. If this size is chosen to be a power of two, then normalising back is simply a case of copying the window value and shifting it down, then adding on the artificial zero value if required (which often it is not).

So no “muls”, “divs”, or pointers etc. required.

You can use a similar trick when doing DSP with biquad filters where you want a significantly small cut-off frequency with respect to the sampling frequency.

The usual MAD constants will be very, very close to an integer value such as 2; therefore, instead of using the value, use 2 minus the value and shift the values appropriately. You can get close to effectively doubling the number of bits available to you (i.e. a 16-bit DSP producing results similar to a 30-bit DSP).

Oh, and if you intend going down the embedded route, learn the ins and outs of 1’s and 2’s complement and work out how you can use either; sometimes you will save yourself a lot of time and CPU cycles which can be put to better use elsewhere.

Oh, and with regard to the CPU losing its way: for this application, which is neither one-shot nor safety-critical, the best solution is to exit with an error number on the display, forcing the operator to turn the unit off and on again and thus forcing a complete re-initialisation of the device.

And if you are going to develop an embedded system, first split the job into two parts: the first is a BIOS or OS, the second the actual functionality.

When developing your BIOS/OS, make it as close to *nix or POSIX as you can; that way you not only ease your development tool-chain issues, you also make it easier to use existing code from other places. Just remember not to use the I/O “blocking” and threading methodology unless the CPU can support context switching (which most low-cost CPUs cannot, due to stack issues). Remember that being multi-threaded does not require an MMU (multi-process does), but it can make life a lot easier 8)

Oh, and avoid “signals”; they are oh so slow. Develop IPC via ring buffers (or C-lists if you have the RAM), use proper top and bottom handlers on interrupts, and prioritise according to function: hard real-time, high-speed async, etc.

Use the highest system tick rate you can, and break I/O control into small blocks that will fully complete in around 1/4 (max) of the tick interval. State-machine design and buffers really are your friends when you are trying to get past 50% utilisation of CPU resources.
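A minimal sketch of one way to implement the “running window average” described above, under the assumption that the “old” value dropped each step is the current windowed average itself (a power-of-two leaky integrator); a true boxcar window would instead keep the last 2^k samples in a small ring buffer. This is an illustration of the technique, not code from any real device:

```c
#include <stdint.h>
#include <stdio.h>

#define WINDOW_SHIFT  3u   /* window size = 2^3 = 8 readings */

static uint32_t window;    /* running total, scaled by the window size */
static int primed;

static uint16_t window_average(uint16_t sample)
{
    if (!primed) {                         /* pre-charge with the first sample */
        window = (uint32_t)sample << WINDOW_SHIFT;
        primed = 1;
    }
    /* Drop the "old" value (here, the current average) and add the new one. */
    window = window - (window >> WINDOW_SHIFT) + sample;
    return (uint16_t)(window >> WINDOW_SHIFT);   /* normalise with a shift */
}

int main(void)
{
    uint16_t samples[] = {1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070};

    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        printf("sample %u -> windowed average %u\n",
               (unsigned)samples[i], (unsigned)window_average(samples[i]));
    return 0;
}
```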

Anonymous May 17, 2009 10:58 AM

This is the very definition of moving average:

“Readings are Not Averaged Correctly: When the software takes a series of readings, it first averages the first two readings. Then, it averages the third reading with the average just computed. Then the fourth reading is averaged with the new average, and so on. There is no comment or note detailing a reason for this calculation, which would cause the first (last?) reading to have more weight than successive readings. Nonetheless, the comments say that the values should be averaged, and they are not. “

Geoff Sherrington May 17, 2009 9:39 PM

The whole discussion, though important for standards, is academic. As the former owner of a large analytical chemistry laboratory, I can say that we worked on a sound principle.

You do not take a measurement when the property is changing, then use it as a representation of a fixed condition.

Blood alcohol rises to a peak about half an hour after ingestion then decays away over 8 hours to a quite low value. Breath alcohol can be all over the place in the short term from fluid retained in the mouth.

A breath test taken in the rising period is likely to be quite unreliable. Also, the driver could be safely home and asleep in that time, a menace to nobody.

Even where the breath analyser is used for triage, one needs to examine the software of the follow-up blood alcohol device. I would expect more programming errors in the latter, because it’s more complex.

Chris Williams May 18, 2009 12:43 PM

Those who have access to it could usefully check out this:
International Review of Law, Computers & Technology, July 2004, Vol. 18, Issue 2.
It was a special issue on electronic evidence.

PS – sorry if someone’s mentioned it upthread – this comment is just a driveby from a guy in a hurry.

Mysticdog May 19, 2009 3:24 PM

It is also interesting to note that by overweighting the later values during “averaging”, it is going to tend to skew toward the higher numbers. This is because the lungs keep a reservoir of air that isn’t exchanged much except when blowing out hard, and that air builds up more alcohol. So when the cops tell you to blow hard, your last readings will almost always be higher than your first readings, and then the machine’s code skews the “average” toward those higher readings.

EBT May 22, 2009 9:58 AM

I am not a programmer, but rather work with breath testing and have found the posts here most interesting.

I currently have a file on a guy that blew a 0.223 and a 0.227 (7 minutes apart) and an hour later a 0.014 blood. A programmer I know suggested that maybe the decimal place on the breath was shifted and should have read 0.022 not 0.22 on the breath samples. This would make sense with the blood results an hour later.

So my question to you folks is, are decimal shifts really possible as my friend has indicated? Is there anywhere I can learn about this?

(and for the record, the blood was analyzed at two separate labs with the identical results.)

ALAN CANDELETTI October 2, 2013 4:47 PM

Well, all this means is that good people lost their right to drive and their jobs, maybe spent time in jail, and paid something like $12,000 in fines, just because the software is bad.

t February 26, 2016 3:25 PM

Great to see that people are becoming aware of these machines. I am a victim and currently in litigation over the results of this machine. A jury of your peers is no longer relevant since these machines decide your fate. We need to purge the entire legal system as well as these machines if we want true justice.

A June 28, 2022 5:51 PM

I’m just here to say that 13 years later, this machine is still destroying innocent people’s lives.
