Schneier on Security
A blog covering security and security technology.
June 23, 2009
Eavesdropping on Dot-Matrix Printers by Listening to Them
First, we develop a novel feature design that borrows from commonly used techniques for feature extraction in speech recognition and music processing. These techniques are geared towards the human ear, which is limited to approx. 20 kHz and whose sensitivity is logarithmic in the frequency; for printers, our experiments show that most interesting features occur above 20 kHz, and a logarithmic scale cannot be assumed. Our feature design reflects these observations by employing a sub-band decomposition that places emphasis on the high frequencies, and spreading filter frequencies linearly over the frequency range. We further add suitable smoothing to make the recognition robust against measurement variations and environmental noise.
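To make the feature-design idea concrete, here is a minimal Python/NumPy sketch of linearly spaced sub-band energies for one audio frame. It is not the authors' code. The function name `linear_subband_features`, the band count, the Hann window and the moving-average smoothing across bands are all assumptions chosen for illustration. The point it shows is the contrast with speech features: band edges are spread linearly rather than logarithmically, and raising `f_min` (say, to 20 kHz) puts every band in the ultrasonic range where the excerpt says the interesting printer features live.

```python
import numpy as np

def linear_subband_features(frame, sample_rate, n_bands=16, f_min=0.0, f_max=None, smooth=3):
    """Log energy of one audio frame in `n_bands` linearly spaced frequency bands.

    Hypothetical helper: linear (not mel/log) band spacing; raising `f_min`
    concentrates all bands on the high-frequency end of the spectrum.
    """
    if f_max is None:
        f_max = sample_rate / 2  # Nyquist limit of the recording
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    edges = np.linspace(f_min, f_max, n_bands + 1)  # equally spaced band edges
    energies = np.zeros(n_bands)
    for i in range(n_bands):
        lo, hi = edges[i], edges[i + 1]
        # Half-open bands, except the last one, which also includes f_max itself.
        upper = (freqs < hi) if i < n_bands - 1 else (freqs <= hi)
        energies[i] = power[(freqs >= lo) & upper].sum()
    # Moving-average smoothing across neighbouring bands: an assumed stand-in
    # for the paper's smoothing, whose exact form the excerpt does not give.
    kernel = np.ones(smooth) / smooth
    smoothed = np.convolve(energies, kernel, mode="same")
    return np.log(smoothed + 1e-12)  # small offset avoids log(0) in empty bands
```

For example, a 31.5 kHz tone recorded at 96 kHz falls into the 30-33 kHz band (index 10 of 16), a region a feature set modelled on the human ear would largely ignore.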
Second, we deal with the decay time and the induced blurring by resorting to a word-based approach instead of decoding individual letters. A word-based approach requires additional upfront effort such as an extended training phase as the dictionary grows larger, and it does not permit us to increase recognition rates by using, e.g., spell-checking. Recognition of words based on training the sound of individual letters (or pairs/triples of letters), however, is infeasible because the sound emitted by printers blurs so strongly over adjacent letters.
Third, we employ speech recognition techniques to increase the recognition rate: we use Hidden Markov Models (HMMs) that rely on the statistical frequency of sequences of words in text in order to rule out incorrect word combinations. The presence of strong blurring, however, requires the use of at least 3-grams on the words of the dictionary to be effective, causing existing implementations for this task to fail because of memory exhaustion. To tame memory consumption, we implemented a delayed computation of the transition matrix that underlies HMMs, and in each step of the search procedure, we adaptively removed the words with only weakly matching features from the search space.
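A minimal Python sketch of the decoding step may help. It is not the authors' implementation: it swaps the full HMM/Viterbi decoder for a simpler beam search over word sequences, and `acoustic_logprob` and `trigram_logprob` are hypothetical scoring callables you would have to supply. It does illustrate the two memory-saving ideas in the excerpt: trigram transition scores are computed lazily (and cached) only for the word triples the search actually visits, instead of materialising a |V|^3 table, and at each step words whose acoustic match falls more than `margin` below the best are pruned from the search space.

```python
import heapq
from functools import lru_cache

def beam_decode(observations, vocabulary, acoustic_logprob, trigram_logprob,
                beam_width=10, margin=5.0):
    """Return (best word sequence, its log score) under a trigram language model."""

    @lru_cache(maxsize=None)
    def transition(u, v, w):
        # Delayed computation: each (u, v, w) entry is evaluated only on first use.
        return trigram_logprob(u, v, w)

    # Each hypothesis is (total log score, word tuple); "<s>" pads the sentence start.
    beam = [(0.0, ("<s>", "<s>"))]
    for obs in observations:
        scores = {w: acoustic_logprob(obs, w) for w in vocabulary}
        best = max(scores.values())
        # Adaptive pruning: drop words that match the observed sound only weakly.
        candidates = [w for w, s in scores.items() if s >= best - margin]
        extended = []
        for total, words in beam:
            u, v = words[-2], words[-1]
            for w in candidates:
                extended.append((total + scores[w] + transition(u, v, w), words + (w,)))
        beam = heapq.nlargest(beam_width, extended, key=lambda h: h[0])
    best_total, best_words = max(beam, key=lambda h: h[0])
    return list(best_words[2:]), best_total
```

In a toy run where "dawn" and "down" sound identical, a trigram model that favours "attack at dawn" is what breaks the tie, which is exactly the job the excerpt assigns to the n-gram statistics.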
We built a prototypical implementation that can bootstrap the recognition routine from a database of featured words that have been trained using supervised learning. Afterwards, the prototype automatically recognizes text with recognition rates of up to 72 %.
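The bootstrapping step (a database of word features built by supervised learning, then matched against new recordings) can be illustrated with the simplest possible stand-in: average the labelled feature vectors for each word into a template, then rank dictionary words by distance to an observed feature vector. This nearest-mean matcher is an assumption for illustration, not the classifier the paper uses; `train_templates` and `rank_words` are hypothetical names.

```python
import numpy as np

def train_templates(examples):
    """examples: iterable of (word, feature_vector) pairs from labelled recordings.
    Returns one mean feature vector (template) per word."""
    sums, counts = {}, {}
    for word, features in examples:
        vec = np.asarray(features, dtype=float)
        sums[word] = sums.get(word, 0.0) + vec
        counts[word] = counts.get(word, 0) + 1
    return {word: sums[word] / counts[word] for word in sums}

def rank_words(templates, features):
    """Rank dictionary words by Euclidean distance between template and observation."""
    vec = np.asarray(features, dtype=float)
    return sorted(templates, key=lambda word: float(np.linalg.norm(templates[word] - vec)))
```

A ranked list like this is also a natural input to the pruning step above: words far down the ranking are the "weakly matching" ones that can be dropped.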
Researchers have done lots of work on eavesdropping on remote devices. (One example.) And we know the various intelligence organizations of the world have been doing this sort of thing for decades.
Posted on June 23, 2009 at 6:16 AM
Now, if there were enough dot-matrix printers printing eavesdrop-worthy things...
Indeed interesting, although nowadays few applications still use dot-matrix printers. (Contrary to the authors' suggestion, both my doctor and my bank use laser printers, and while 72% accuracy might be excellent for decoding the highly redundant language of a prescription, it would be rubbish for decoding a bank statement.)
As Bruce observes, the methods used are similar to those in earlier published work on acoustic side-channel attacks, and the technique is of fairly general application. Naturally, then, one wonders what else it could be extended to. A few ideas:
* Listening to the pins lift when a mechanical key is inserted into a lock
* Key presses on access control keypads (probably much harder to analyse than key presses on full keyboards, as the keys are quieter and the area they are placed in is usually noisier)
* If a bank or a VPN administrator provides an off-line token for PIN entry to create a single-use password for on-line banking or VPN access, then even though a compromised PC has no electronic access to the token, it might still be able to obtain the PIN by turning on its microphone.
Very interesting research indeed. For an academic course this year I had to write a review of acoustic side-channel attacks. As it turned out, this kind of research is not very common, but there has been some prior work:
and was inspired by the work of Shamir:
What the first link proposed was that if the printer had a constrained printing set, i.e. was only printing similar documents, this could reduce the dataset enough to get more reliable results. Their motivating example was that of paper receipts from voting machines.
ISTM that there's a reasonable chance that this technology could be extended to inkjet printers...
Back in '82 I used a Printronix dot matrix LINE printer (vice character printer). It was practically musical, so I suspected there might be an "audio-TEMPEST" hazard. That aside it was a very good printer.
Yes, the rhythmic tap tap tapping of stone age tools on tablets....
"Key presses on access control keypads"
I'm sure this has already been done. But rather than use the sound of the keys (totally wouldn't work for digital beeps) you could use the timing between presses.
For example, assuming the person is using a single finger, the average 1-9 delay is probably much longer than the 1-4 delay. And 1-4 (going down) is probably very slightly shorter than 4-1.
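The timing idea can be sketched as a toy model in Python. Everything here is an assumption for illustration: a phone-style 3x4 keypad layout, a single-finger delay model of "fixed press time plus travel time proportional to distance", a small extra cost for moving up the pad, and made-up constants (`base`, `per_key`, `upward_penalty`) rather than measured values. Given an observed inter-press delay, it lists the ordered key pairs consistent with it.

```python
import math

# Assumed layout: phone-style keypad, (row, column) with row 0 at the top.
KEYPAD = {
    "1": (0, 0), "2": (0, 1), "3": (0, 2),
    "4": (1, 0), "5": (1, 1), "6": (1, 2),
    "7": (2, 0), "8": (2, 1), "9": (2, 2),
    "0": (3, 1),
}

def predicted_delay(a, b, base=0.20, per_key=0.08, upward_penalty=0.02):
    """Hypothetical single-finger model (seconds): press time plus travel time.
    The constants are illustrative guesses, not measurements."""
    (r1, c1), (r2, c2) = KEYPAD[a], KEYPAD[b]
    delay = base + per_key * math.hypot(r2 - r1, c2 - c1)
    if r2 < r1:  # moving up the keypad, assumed very slightly slower
        delay += upward_penalty
    return delay

def candidate_pairs(observed, tolerance=0.03):
    """All ordered key pairs whose predicted delay is within `tolerance` of `observed`."""
    return sorted(
        (a, b) for a in KEYPAD for b in KEYPAD
        if a != b and abs(predicted_delay(a, b) - observed) <= tolerance
    )
```

Under these assumptions 1-9 is predicted to take noticeably longer than 1-4, and 4-1 slightly longer than 1-4, matching the intuition above; real timings would of course have to be measured per person.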
In fact, I shall do a quick search...
Well, I couldn't find anything on Web of Knowledge, but I did find another interesting paper by these people about reading screens from their reflections: http://ieeexplore.ieee.org/xpls/abs_all.jsp?...
And there's another paper there that says automatic patch-based exploits are possible! This seems implausible..
It's interesting to note that the frequency band of interest is above 20 kHz.
This makes it less likely that a phone would be used as the "tempest" transfer channel.
Also, the "laser bugs" are not going to work anywhere near as well without appropriate modifications.
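The phone point is simple sampling arithmetic: a channel sampled at rate fs can only represent frequencies below fs/2 (the Nyquist limit). A quick Python check, using standard sample rates (8 kHz narrowband telephony, 16 kHz wideband telephony, 44.1 kHz CD audio, 96 kHz studio audio) and a hypothetical 25 kHz printer feature; in practice microphones and codecs usually filter out even more than the Nyquist limit implies.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a sampled channel can represent without aliasing."""
    return sample_rate_hz / 2

def can_capture(feature_hz, sample_rate_hz):
    """True if a tone at `feature_hz` lies below the channel's Nyquist limit."""
    return feature_hz < nyquist_hz(sample_rate_hz)

FEATURE_HZ = 25_000  # hypothetical printer feature above 20 kHz
for name, rate in [("narrowband telephony", 8_000), ("wideband telephony", 16_000),
                   ("CD audio", 44_100), ("studio audio", 96_000)]:
    print(f"{name}: {rate} Hz sampling -> Nyquist {nyquist_hz(rate):.0f} Hz, "
          f"captures {FEATURE_HZ} Hz: {can_capture(FEATURE_HZ, rate)}")
```

Even CD-rate audio tops out at 22.05 kHz, so only a deliberately high-rate recording chain would carry the features this attack relies on.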
I should think this technique would apply equally well to Selectric/golf-ball type typewriters, but with different parameters.
Also chart and other plotters.
Now a thought occurs: what about hard disk drives? You would not be able to get the data or which head was used, but you would be able to work out which cylinder, and possibly which sector within it, was being accessed, thereby doing a sort of "traffic flow" analysis of the user's activities.
"* Listening to the pins lift when a mechanical key is inserted into a lock"
You have just given me a nifty idea for a new way to profile a lock's pins.
If you assume that the pins are all of the same material and the same diameter, each one will have its own resonant frequency based on its length.
If you apply a sound source and sweep the frequency up slowly you will be able to find the pin lengths by measuring the energy transfer into the pin.
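A back-of-the-envelope version of the resonance idea, in Python. It assumes each pin behaves like a slender rod with both ends free, whose longitudinal modes are f_n = n * c / (2L) with c = sqrt(E / rho); real pins sit in chambers against springs, so the boundary conditions (and hence the constants) are an assumption. The brass constants below are approximate textbook-style values, not measured for any particular lock, and the pin lengths are hypothetical.

```python
import math

def longitudinal_resonance_hz(length_m, youngs_modulus_pa, density_kg_m3, mode=1):
    """Mode-n longitudinal resonance of a free-free slender rod: f_n = n * c / (2L)."""
    c = math.sqrt(youngs_modulus_pa / density_kg_m3)  # longitudinal wave speed in a thin rod
    return mode * c / (2 * length_m)

# Assumed, approximate constants for brass (real alloys vary).
BRASS_E = 100e9    # Young's modulus, Pa
BRASS_RHO = 8500   # density, kg/m^3

for length_mm in (4, 5, 6, 7):  # hypothetical pin lengths
    f = longitudinal_resonance_hz(length_mm / 1000, BRASS_E, BRASS_RHO)
    print(f"{length_mm} mm pin: ~{f / 1000:.0f} kHz")
```

The useful takeaway is the 1/L dependence, which is what would let a frequency sweep distinguish pin lengths. With these assumed constants the fundamentals land in the hundreds of kilohertz, so the "sound source" would have to be ultrasonic.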
I believe it was Jeremy Clarkson who pointed out that if people are researching this sort of thing, does that mean they've finished curing cancer/solving world peace?
I guess where there is money to be made - someone will start exploiting!
It's interesting from an academic / intelligence point of view.
The "fear" that medical records aren't private because of this, though, is a movie-plot scenario of the authors' own making.
It did make me come up with an idea for the keypad locks, though -- I wonder if you could put distinct, clear powders or chemicals on the keys. Then, after the target punches in the combination, use tape to "lift" the prints and reconstruct the order in which the keys were pressed (i.e. the 2nd key pressed will have traces from the 1st, the 3rd will have traces from the 1st and 2nd). It does require being a bit more obvious than sonic analysis, though.
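The reconstruction logic is just ordering by how many powders each key picked up. A minimal Python sketch, assuming one distinct powder per key, each key pressed once, and a finger that carries every powder it has touched so far (so the k-th key shows k powders). `reconstruct_order` is a hypothetical name; it raises an error if the traces are not nested, which signals that one of those assumptions failed (for example, a repeated digit).

```python
def reconstruct_order(traces):
    """traces: dict mapping each pressed key to the set of powders found on it.
    Returns the keys in the order they were (presumably) pressed."""
    order = sorted(traces, key=lambda key: len(traces[key]))
    # Under the assumptions, each key's traces strictly contain the previous key's.
    for earlier, later in zip(order, order[1:]):
        if not traces[earlier] < traces[later]:
            raise ValueError(f"traces on {earlier!r} and {later!r} are not nested")
    return order

# Key 2 carries powder "b", key 9 powder "a", key 7 powder "c".
print(reconstruct_order({"7": {"a", "b", "c"}, "2": {"b"}, "9": {"a", "b"}}))
```

Here key 2 shows only its own powder, key 9 adds key 2's, and key 7 carries all three, which gives the sequence 2, 9, 7.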
In the late '70s and early '80s, while using dot-matrix terminals, my brain built the feedback circuit well enough that I could tell I had made a typo by the sound of the typehead. While printing out, it ran too fast to decode (and I don't think I ever really tried).
This was long enough ago that it was easy to steal the password of whoever was working directly on the system. There were little blinky lights* for everything, including the last character typed on the directly attached console.
*Incandescent blinking lights.
Recognizing sounds from printers is not new. In the late '50s, on an IBM 704 at MIT, the operators used sound extensively. There was a loudspeaker attached to one of the bits in the accumulator, and they listened to the sounds made by the printer that logged job progress. The systems people were not allowed to change the contents of the messages on the printer because the operators only listened, never read. From the loudspeaker they could hear programs that were stuck in a loop.
When I was in university, the Experimental Med department had a PDP-8 controlling a panel of lights in a cage with two chimps. There was also an array of buttons. The idea was to try to train the chimps to answer questions (formed by patterns of lights) by pressing the correct button. If right, they were rewarded by a machine which dispensed a slice of banana. The most interesting part of that experiment was that every time the dot-matrix printer printed "Error: banana dispenser empty", the chimps refused to continue; they could distinguish that pattern of noise from the other status messages.
@Matt from CT:
> ... an idea of for the keypad locks though -- I wonder if you could put distinct, clear powders or chemicals on the keypad.
Already been done -- even been in the movies! It is one of numerous keypad attacks that can be defeated by a scrambling keypad (alas, they cost around 20 times as much as a regular keypad, and are rarely seen).
> Their motivating example was that of paper receipts from voting machines.
I haven't seen a US voting machine, but I kind of assumed they would use thermal printers, because that is the norm for kiosk-type ticket dispensing applications. What do they use?
And on another note, adding to the list of targets that may be vulnerable to acoustic analysis:
* Simplex or Codelock mechanical keypad locks.
* Bicycle / briefcase type combination locks, if recorded during scrambling (determine click frequency for each wheel, and count number of clicks to find off-set from scrambled position -- ok these are already very weak locks, but with a PDA with a microphone you can get the cracking time down to a couple of seconds even for a 6 wheel lock.)
* Rather more of a challenge: decode handwriting by the sound of the pen on the paper.
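The combination-lock item is modular arithmetic once the clicks have been counted. A minimal Python sketch, assuming the clicks have already been attributed to wheels by their distinct pitch, that each click moves a wheel by exactly one digit, and that the scrambling direction is known (`direction=+1` if clicks advanced the digits). `recover_combination` is a hypothetical name, not an existing tool.

```python
def recover_combination(scrambled, clicks, digits=10, direction=+1):
    """scrambled: digits visible on each wheel after scrambling.
    clicks: detent clicks counted per wheel while the owner scrambled it.
    direction: +1 if each click advanced the wheel by one digit, -1 if it moved it back."""
    if len(scrambled) != len(clicks):
        raise ValueError("need exactly one click count per wheel")
    # Undo the rotation: original = scrambled - direction * clicks (mod number of digits).
    return [(s - direction * c) % digits for s, c in zip(scrambled, clicks)]

# Wheels opened at 1-2-3-4, then turned forward by 3, 0, 9 and 12 clicks.
print(recover_combination([4, 2, 2, 6], [3, 0, 9, 12]))
```

If the direction is unknown, running it with both `+1` and `-1` leaves only two candidate codes to try.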
Ah, *finally* a compelling reason to upgrade from dot-matrix.
This research sponsored by x printer manufacturer...or could it be by one of those sound-proof box manufacturers. Do they still make those for printers? Time for a comeback?
I think this will only work with regular dot-matrix printers, not line printers, so I suspect some of the statistics presented might not be accurate. For example, most banks use line printers for statement printing; in my organization we use a line printer to print the salary statements.
However this is very interesting.
I can see this working when the printer makes one line of text per pass across the page. However, it seems it would be significantly harder if the printer is printing in "graphics mode" (where the line size is smaller or larger than a single swipe of the print head, so each pass prints only a horizontal swipe of each word) or in landscape mode. Handling either of those cases would be as hard or harder computationally and from an information-gathering perspective as "deblurring" individual letters (essentially, you need to be able to deblur not just individual letters, but portions of individual letters and piece them back into individual letters afterwards).
This applies to what we used to call "line printers": printers that can output exactly one font, with the same spacing as the print head. I'm sure a lot of places still using dot-matrix printers for rapid printouts also use the line-printing facilities of those printers, so that's still a fairly large target.
At the same time, LPR functionality in things like laser and inkjet printers is strictly emulated. There is not a period of time where you could capture the audio from a laser or inkjet printer and say "that's the word 'attack'." Each pass of the print head, by design and even in the most efficient modes, spans multiple or partial lines.
So, again, to expand this to more common technologies, you are looking at the need for a major advance in the technology before this will be useful.
At the same time, I wonder how "real world applicable" such approaches would be in the first place. Different printers of the same model and manufacturer sound different to my ears, and my own printer sounds different at different times of the day (dot matrix printers tend to be more consistent there, though). I attribute this to manufacturing inconsistencies and environmental factors. In the high-frequency range these researchers are looking at, is there less variation?
Interesting - it was just last night that I thought that the printers for printing account statements at banks here in Germany are about the last examples of dot matrix printers I could think of.
Given that these are standardized units (usually made by Siemens-Nixdorf; at least, they've been in every bank I've been to in Germany so far), it should be possible for a determined attacker to carry out this attack in practice and "read" people's account statements.
In some countries, banks and currency exchanges are required to use dot-matrix printers for forensic purposes, since every printer is claimed to have a unique signature.
And a database of sample outputs exists somewhere.