Schneier on Security
A blog covering security and security technology.
January 9, 2007
How to Recover Numbers from Blurred Images
Posted on January 9, 2007 at 7:06 AM
Might work, but it is very dependent on being able to recreate the original setup accurately.
Incidentally, mosaic-ing to conceal a person's face is weak if the camera or the person moves around - you can use the movement to resample the original image and so get a passable picture back.
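To make the resampling idea above concrete, here is a minimal one-dimensional sketch. It assumes an idealized case that the comment does not spell out: the mosaic grid lands at every possible offset relative to the subject, the background outside the scanline is known to be 0, and block averages are not quantized. The names `mosaic` and `resample` are illustrative only.

```python
def mosaic(pixels, block, offset):
    """Block-average a scanline whose mosaic grid starts `offset` pixels before index 0.

    Pixels outside the scanline count as 0 (an assumed, known background).
    Returns {block_start_index: block_average}.
    """
    n = len(pixels)
    averages = {}
    start = -offset
    while start < n:
        window = [pixels[i] if 0 <= i < n else 0 for i in range(start, start + block)]
        averages[start] = sum(window) / block
        start += block
    return averages


def resample(frames, block, n):
    """Rebuild the scanline from mosaics taken at every grid offset."""
    by_start = {}
    for frame in frames:
        by_start.update(frame)
    recovered = []
    previous_sum = 0.0
    for p in range(n):
        # Sum of the block-wide window that ends at pixel p.
        window_sum = by_start[p - block + 1] * block
        leaving = recovered[p - block] if p >= block else 0.0
        # window_sum - previous_sum equals x[p] - x[p - block].
        recovered.append(window_sum - previous_sum + leaving)
        previous_sum = window_sum
    return recovered


scanline = [10, 200, 30, 30, 250, 0, 90, 90, 120, 15]
block = 4
frames = [mosaic(scanline, block, offset) for offset in range(block)]
restored = [round(v) for v in resample(frames, block, len(scanline))]
print(restored == scanline)
```

A single frame keeps only three averages for these ten pixels, but the four shifted frames together pin down every pixel. Real footage adds noise, quantization, and sub-pixel motion, so expect a passable picture rather than an exact one.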
Wow, people really are stupid.
Can't people learn and just black out the numbers?
Another problem with blurring images (particularly Gaussian blur, as opposed to the blocked blur) is deconvolution algorithms. These are used primarily in microscopy and astronomy to recover detail from out-of-focus images. However, the algorithms also work really well for recovering images from a Gaussian blur. Here's an example page of different algorithms:
And here's an example using blurred text:
Google "image deconvolution" for many more examples and some free code.
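As a rough illustration of the deconvolution point, the sketch below blurs a one-dimensional signal with a Gaussian kernel and then undoes the blur with a regularized (Wiener-style) division in the frequency domain. It is a toy under stated assumptions, not any of the algorithms on the linked pages: the blur is circular, the kernel and its sigma are known exactly, and no noise or 8-bit quantization is added. All function names are made up for this example.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, enough for a short signal."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def gaussian_kernel(n, sigma):
    """Normalized Gaussian weights wrapped around a signal of length n."""
    weights = [math.exp(-min(t, n - t) ** 2 / (2 * sigma ** 2)) for t in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, kernel):
    """Circular convolution of the signal with the kernel."""
    n = len(signal)
    return [sum(signal[(t - s) % n] * kernel[s] for s in range(n)) for t in range(n)]


def wiener_deconvolve(blurred, kernel, eps=1e-9):
    """Divide by the kernel spectrum; eps keeps small frequencies from blowing up."""
    b_hat = dft(blurred)
    h_hat = dft(kernel)
    x_hat = [b * h.conjugate() / (abs(h) ** 2 + eps) for b, h in zip(b_hat, h_hat)]
    return [v.real for v in dft(x_hat, inverse=True)]


signal = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
kernel = gaussian_kernel(len(signal), sigma=1.0)
blurred = blur(signal, kernel)
recovered = wiener_deconvolve(blurred, kernel)
print([round(v, 2) for v in blurred])
print([round(v) for v in recovered])
```

With noise or quantization in the blurred data, `eps` has to grow and the recovery degrades, which is why practical tools use more careful methods than this plain division.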
Of course, even blacking out the text does nothing if you don't flatten the image (thus avoiding having a black layer over the original layer which can be extracted).
Two comments spring to mind:
(1) Casting this as an optimization problem in a vector space is overkill. The problem factorizes into the components of the vector -- the individual digits of the numeric code -- since the overlap of the blurred images of the digits is minimal. So basically, instead of a search in 12 dimensions (for a 12-digit credit card number) we have 12 one-dimensional searches, a reduction in problem size from 1.0E12 to 120 (ignoring the fact that the space of credit-card numbers is more limited due to stereotyped digit sequences).
(2) The choice of distance metric -- basically the rms distance, AKA the Euclidean metric -- is arbitrary, and quite possibly sub-optimal. The normalization of the vectors that are sent into the Euclidean metric results in coupling between the vector components that we know should not be present, from (1). In fact, the factorization property of (1) suggests that a more suitable metric would be the max over all digits of the absolute distance of the model obscured digit from the corresponding datum.
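Both points can be sketched in a few lines. The example below is only an illustration under assumptions that are not in the original article: a made-up 3x5 bitmap font (`GLYPHS`), a 2x2 mosaic as the obscuring step, digits that do not overlap at all, and "absolute distance" read as the sum of absolute differences for one digit. It searches each digit independently and then scores a whole candidate with the max-over-digits metric.

```python
import math

# Hypothetical 3x5 bitmap font; a real attack would render the actual typeface.
GLYPHS = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "111"],
}


def obscure(digit, block=2):
    """Render one digit on a 6x4 canvas and mosaic it into block averages."""
    grid = [[0] + [int(c) for c in row] for row in GLYPHS[digit]]
    grid.append([0, 0, 0, 0])  # blank row so the height divides evenly
    tile = []
    for r in range(0, len(grid), block):
        for c in range(0, len(grid[0]), block):
            cells = [grid[r + i][c + j] for i in range(block) for j in range(block)]
            tile.append(sum(cells) / len(cells))
    return tile


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recover(tiles):
    """Search each digit independently: 10 comparisons per digit."""
    digits, comparisons = [], 0
    for tile in tiles:
        digits.append(min(GLYPHS, key=lambda d: euclidean(obscure(d), tile)))
        comparisons += len(GLYPHS)
    return "".join(digits), comparisons


def max_digit_distance(candidate, tiles):
    """Alternative metric from point (2): worst per-digit absolute distance."""
    return max(
        sum(abs(x - y) for x, y in zip(obscure(d), tile))
        for d, tile in zip(candidate, tiles)
    )


secret = "401288881881"
tiles = [obscure(d) for d in secret]
print(recover(tiles))
print(max_digit_distance(secret, tiles))
print(max_digit_distance("401288881887", tiles))
```

For a 12-digit number the per-digit search makes 12 x 10 = 120 comparisons instead of enumerating 10^12 candidates. The max metric reports a candidate's single worst digit, so one wrong digit cannot hide behind eleven good ones.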
@Rod: The most common web image file formats (say, GIF, JPG) do not support multiple layers, so even if your image editing program allows you to introduce multiple layers when editing such an image, they will automatically be "flattened" (= lost) when saving the image. I am not 100% sure about PNG, as it has many not-so-commonly-known features (e.g., animations that are possible somehow).
There were some amusing stories with Word .doc and .pdf, though ;-)
> The solution is simple: Don't blur your
> images! Instead, just color over them
Adobe Acrobat has a great feature for just this sort of redaction.
Excellent post. This is a good application for distributed/grid computing. Security agency computing grids will be a-humming.
The article and most of the comments focus only on how numbers might be recovered by a machine but neglect the simple fact that the brain is really good at this kind of stuff.
I scanned a check I got recently and blurred out the sensitive information as described. If you look closely at it, it's just a bunch of gray blocks, but if you take a step back and relax your eyes a bit, you can make out the numbers well enough. The most amusing example is the word "TEXAS" on the check, because it becomes completely discernible at a meter or so.
Oh crumbs forgot. Sorry for the double post.
For the facial de-blurring, take a second mosaic of the first to narrow down possible sources. It's like taking a derivative of a function. But you might get enough to narrow the field.
"Adobe Acrobat has a great feature [color over] for just this sort of redaction."
LOL! This usually results in a leak of information people would like to keep confidential. The first thing to try is to use the text select tool and select all text. More often than not the text you've carefully colored out is revealed.
Repeat after me -- color over is not redaction. If you want to remove information from a file you actually have to *remove* information.
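A toy model shows the difference. This is not how PDF or Acrobat works internally. `TextRun`, `color_over`, `redact`, `render`, and `select_all` are all invented for this sketch, and the only assumption is that a document stores text separately from whatever is drawn on top of it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TextRun:
    text: str
    covered: bool = False  # True when an opaque box is drawn over the run


def color_over(runs, secret):
    """Draw a box over the secret; the underlying text stays in the file."""
    return [replace(r, covered=True) if r.text == secret else r for r in runs]


def redact(runs, secret):
    """Actually remove the secret by replacing its characters."""
    return [replace(r, text="#" * len(r.text)) if r.text == secret else r for r in runs]


def render(runs):
    """What a reader sees on screen or on paper."""
    return " ".join("#" * len(r.text) if r.covered else r.text for r in runs)


def select_all(runs):
    """What 'select all text' and copy would hand back."""
    return " ".join(r.text for r in runs)


doc = [TextRun("Account:"), TextRun("12345678")]
covered = color_over(doc, "12345678")
redacted = redact(doc, "12345678")
print(render(covered), "|", select_all(covered))
print(render(redacted), "|", select_all(redacted))
```

Both versions look identical when rendered, but only the second one has nothing left to select.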
In most image-processing programs, erasing an area (to the currently-selected background color) is at least as easy as filling a rectangle with a chosen color. And, you rarely have problems with multiple layers being created.
I don't even understand why some of the suggested techniques (e.g. first distort the numbers, then pixelize them) would be considered - they take far more work than simply erasing a rectangle.
My favorite practice of creating pseudo-blurred images:
1. Create a similar sequence using similar font.
2. Copy and paste this on "private" areas.
3. Merge document.
4. Select pseudo-private area
5. Apply blur filter.
6. Complete edits and final merge.
7. Resize image.
8. Export image.
This is best when you want your readers to see something other than blacked-out data. It gives the image a realistic look and is virtually irreversible regardless of this attack, because even if someone does recover the corresponding sequence, it is a dead end.
Only let them know what you want them to know.
Summary: Pseudo-Blur FTW!
Blurring the information you want to hide is a less visually obtrusive effect than a solid block. People do it that way because it looks better.
Next time perhaps I'll replace the digits with random garbage and *then* blur over them; that'll look better and not leak information.
I'll note that even blacking out in photo-editing software often doesn't work. People who don't know how to use the software (i.e. almost everyone) can overlay with not-quite-perfect opacity. Playing with levels and other controls can reveal the data that's supposed to be hidden.
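Here is a small sketch of that effect on one row of 8-bit grayscale pixels. The 97% opacity figure and the function names are assumptions picked for illustration. The "levels" step is a plain linear contrast stretch, which is roughly what dragging the white point down does in an image editor.

```python
def overlay_black(pixels, opacity):
    """Composite a black box over the pixels at the given opacity (0.0 to 1.0)."""
    return [round(p * (1 - opacity)) for p in pixels]


def stretch_levels(pixels):
    """Linearly stretch the darkest value to 0 and the brightest to 255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)  # truly flat: nothing left to recover
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


row = [255, 0, 255, 0, 0, 255]           # dark text strokes on a white background
almost_black = overlay_black(row, 0.97)  # looks black to the eye
fully_black = overlay_black(row, 1.0)

print(almost_black, "->", stretch_levels(almost_black))
print(fully_black, "->", stretch_levels(fully_black))
```

At 97% opacity the covered pixels still differ by a few gray levels, and the stretch restores the full contrast. At 100% opacity every pixel is 0 and there is nothing to stretch.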
I don't do this enough that I have found anything interesting, but I have exploited this and found data behind black boxes in this manner. Just to prove people are stupid.
Not as bad as the "black hiliting" in Acrobat or Word, but not great. Sorta like blacking out a word and giving over the original (P.S. my wife found out how much I paid for her xmas gift because I did this; I didn't think she'd try to figure it out, so didn't photocopy the receipt).
Still too much work, just replace the image with random blurring...if you find the black out objectionable...
How can you read documents that have been blacked out with black marker? Very important, contact me ASAP; it is a matter of me keeping my son.
"How can you read documents that have been blacked out with black marker?"
You have not mentioned if you have the document that has got black marker on it, or a photocopy of the document with black marker.
If the latter, you would need to get hold of the original document.
If you have the original, and it was originally printed on a laser printer or was a photocopy before being "blacked out" with marker pen, it may be as simple as tipping it through the light whilst wrapped around a cylinder, or holding it up in reverse against a light source.
Failing that, look at "false colour" illumination with an appropriate light source and camera (sort of the way the luma light and the orange glasses work in the TV "CSI whatever").
If it is a photocopy of a blacked out document there is a very very small chance that there is a difference in contrast that might be readable but I would not count on it. Few office photocopiers are designed to copy "pictures" well thus have a limited contrast ratio.
Schneier.com is a personal website. Opinions expressed are not necessarily those of BT.