Bruce Schneier

Schneier on Security

A blog covering security and security technology.


January 9, 2007

How to Recover Numbers from Blurred Images

Interesting.

Discussion here.

Posted on January 9, 2007 at 7:06 AM


Comments

Might work, but it is very dependent on being able to recreate the original setup accurately.

Incidentally, mosaic-ing to conceal a person's face is weak if the camera or the person moves around - you can use the movement to resample the original image and so get a passable picture back.

Posted by: Martin Ingram at January 9, 2007 7:42 AM


Wow, people really are stupid.
Can't people learn and just black out the numbers?

Posted by: Anonymous Coward at January 9, 2007 8:25 AM


Another problem with blurring images (particularly Gaussian blur, as opposed to the blocked blur) is deconvolution algorithms. These are used primarily in microscopy and astronomy to recover detail from out-of-focus images. However, the algorithms work really well for recovering images from a Gaussian blur too. Here's an example page of different algorithms:

http://www.bialith.com/Research/BARclockblur.htm

And here's an example using blurred text:

http://www.maxent.co.uk/bayes.htm

Google "image deconvolution" for many more examples and some free code.

Posted by: Dave Wagner at January 9, 2007 8:40 AM
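
For readers who want to try the deconvolution idea from the comment above, here is a minimal sketch of one such technique, Wiener deconvolution, using only NumPy. It assumes the blur is a Gaussian of known width, that the image is noise-free, and that edges wrap around (circular convolution). The function names, the sigma value, and the regularization constant k are illustrative assumptions, not taken from the linked pages.

```python
import numpy as np

def gaussian_kernel(shape, sigma):
    """Centered, normalized 2-D Gaussian with the same shape as the image."""
    h, w = shape
    y = np.arange(h) - h // 2
    x = np.arange(w) - w // 2
    g = np.exp(-(y[:, None] ** 2 + x[None, :] ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def transfer_function(shape, sigma):
    """Frequency response of the blur (kernel center moved to index 0 first)."""
    return np.fft.fft2(np.fft.ifftshift(gaussian_kernel(shape, sigma)))

def blur(image, sigma):
    """Circular Gaussian blur computed in the frequency domain."""
    H = transfer_function(image.shape, sigma)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * H))

def wiener_deblur(blurred, sigma, k=1e-6):
    """Wiener deconvolution: divide by H, damped by k where H is tiny."""
    H = transfer_function(blurred.shape, sigma)
    G = np.fft.fft2(blurred)
    F = G * np.conj(H) / (np.abs(H) ** 2 + k)
    return np.real(np.fft.ifft2(F))
```

In practice the blur width is not known exactly and real images carry noise and compression artifacts, so k has to be much larger and the recovery is correspondingly worse. The sketch only shows why a smooth blur does not destroy as much information as it appears to.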


Of course, even blacking out the text does nothing if you don't flatten the image (thus avoiding having a black layer over the original layer which can be extracted).

Posted by: Rod at January 9, 2007 9:16 AM


Two comments spring to mind:

(1) Casting this as an optimization problem in a vector space is overkill. The problem factorizes into the components of the vector -- the individual digits of the numeric code -- since the overlap of the blurred images of the digits is minimal. So basically, instead of a search in 12 dimensions (for a 12-digit credit card number) we have 12 one-dimensional searches, a reduction in problem size from 1.0E12 to 120 (ignoring the fact that the space of credit-card numbers is more limited due to stereotyped digit sequences).

(2) The choice of distance metric -- basically the rms distance, AKA the Euclidean metric -- is arbitrary, and quite possibly sub-optimal. The normalization of the vectors that are sent into the Euclidean metric results in coupling between the vector components that we know should not be present, from (1). In fact, the factorization property of (1) suggests that a more suitable metric would be the max over all digits of the absolute distance of the model obscured digit from the corresponding datum.

Posted by: Carlo Graziani at January 9, 2007 9:17 AM
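
Point (1) in the comment above can be sketched as follows. This is a toy illustration, not the original article's code: the 3x5 bitmap font, the 6x4 digit cell, and the 2-pixel mosaic block are all assumptions chosen so that mosaic blocks line up with digit cells and never straddle two digits. Each position is matched against ten single-digit templates by RMS distance.

```python
import numpy as np

# Hypothetical 3x5 bitmap font, for illustration only.
FONT = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "111"],
}
BLOCK = 2  # mosaic block size in pixels (assumed)

def render(digits):
    """Draw each digit into its own 6x4 cell and join the cells side by side."""
    cells = []
    for ch in digits:
        cell = np.zeros((6, 4))
        cell[1:, :3] = [[int(p) for p in row] for row in FONT[ch]]
        cells.append(cell)
    return np.hstack(cells)

def mosaic(image, block=BLOCK):
    """Replace every block x block tile by its mean value."""
    h, w = image.shape
    return image.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

def recover(observed):
    """Match each digit position independently against ten templates."""
    templates = {d: mosaic(render(d)) for d in FONT}
    width = templates["0"].shape[1]
    digits, comparisons = [], 0
    for start in range(0, observed.shape[1], width):
        patch = observed[:, start:start + width]
        scores = {}
        for d, t in templates.items():
            scores[d] = np.sqrt(np.mean((patch - t) ** 2))  # RMS distance
            comparisons += 1
        digits.append(min(scores, key=scores.get))
    return "".join(digits), comparisons
```

Calling recover(mosaic(render("314159265358"))) on this toy setup returns the 12-digit string after 120 comparisons, rather than a search over 10^12 candidates. With real fonts, blocks that straddle neighbouring digits, and JPEG noise, the per-digit matches would be less clean than here.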


@Rod: The most common web image file formats (say, GIF, JPG) do not support multiple layers, so even if your image editing program allows you to introduce multiple layers when editing such an image, they will automatically be "flattened" (= lost) when saving the image. I am not 100% sure about PNG, as it has many not-so-commonly-known features (e.g., animations that are possible somehow).

There were some amusing stories with Word .doc and .pdf, though ;-)

Posted by: Paeniteo at January 9, 2007 9:36 AM


> The solution is simple: Don't blur your
> images! Instead, just color over them

Adobe Acrobat has a great feature for just this sort of redaction.

Posted by: Adobe Abode at January 9, 2007 12:27 PM


Excellent post. This is a good application for distributed/grid computing. Security agency computing grids will be a-humming.

Posted by: -ac- at January 9, 2007 12:50 PM


The article and most of the comments focus only on how numbers might be recovered by a machine but neglect the simple fact that the brain is really good at this kind of stuff.

I scanned a check I got recently and blurred out the sensitive information as described. If you look closely at it, it's just a bunch of gray blocks, but if you take a step back and relax your eyes a bit, you can make out the numbers well enough. The most amusing example is the word "TEXAS" on the check, because it becomes completely discernible at a meter or so.

Posted by: Andrew Jorgensen at January 9, 2007 12:50 PM


Oh crumbs forgot. Sorry for the double post.
For the facial de-blurring, take a second mosaic of the first to narrow down possible sources. It's like taking a derivative of a function. But you might get enough to narrow the field.

Posted by: -ac- at January 9, 2007 12:54 PM


"Adobe Acrobat has a great feature [color over] for just this sort of redaction."

LOL! This usually results in a leak of information people would like to keep confidential. The first thing to try is to use the text select tool and select all text. More often than not the text you've carefully colored out is revealed.

Repeat after me -- color over is not redaction. If you want to remove information from a file you actually have to *remove* information.

Posted by: ruidh at January 9, 2007 2:58 PM


In most image-processing programs, erasing an area (to the currently-selected background color) is at least as easy as filling a rectangle with a chosen color. And, you rarely have problems with multiple layers being created.

I don't even understand why some of the suggested techniques (e.g. first distort the numbers, then pixelize them) would be considered - they take far more work than simply erasing a rectangle.

Posted by: X the Unknown at January 9, 2007 4:00 PM


My favorite practice of creating pseudo-blurred images:

1. Create a similar sequence using similar font.

2. Copy and paste this on "private" areas.

3. Merge document.

4. Select pseudo-private area

5. Apply blur filter.

6. Complete edits and final merge.

7. Resize image.

8. Export image.

This is best for when you want your readers to view something other than blacked-out data. It gives you that realistic feel and is virtually non-reversible regardless of this attack, because even if you do find the corresponding sequence, it is a dead end.

Only let them know what you want them to know.

Summary: Pseudo-Blur FTW!

Israel Torres

Posted by: Israel Torres at January 9, 2007 4:59 PM


Blurring the information you want to hide is a less visually obtrusive effect than a solid block. People do it that way because it looks better.

Next time perhaps I'll replace the digits with random garbage and *then* blur over them; that'll look better and not leak information.

Posted by: David Dyer-Bennet at January 9, 2007 7:20 PM
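
The "replace, then blur" idea in the two comments above can be sketched like this. It is a minimal example under stated assumptions: the helper name decoy_digits is hypothetical, and the rendering and blurring of the decoy are left to whatever image editor is in use. The point is only that the blurred pixels are then derived from random digits and carry no information about the real number.

```python
import secrets

def decoy_digits(real):
    """Return a string shaped like `real` with every digit replaced at random.

    Non-digit characters (spaces, dashes) are kept so the layout still looks
    plausible once the decoy is drawn over the original and blurred.
    """
    return "".join(
        secrets.choice("0123456789") if ch.isdigit() else ch
        for ch in real
    )

# Example: draw this over the real number, flatten the image, then blur.
fake = decoy_digits("4111-1111-1111-1111")
```

The secrets module is used instead of random so the decoy does not depend on a predictable generator state. The decoy must replace the real pixels in a flattened image. A layer left in the file would defeat the purpose.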


I'll note that even blacking out in photo-editing software often doesn't work. People who don't know how to use the software (i.e. almost everyone) can overlay with not-quite-perfect opacity. Playing with levels and other controls can reveal the data that's supposed to be hidden.

I don't do this enough that I have found anything interesting, but I have exploited this and found data behind black boxes in this manner. Just to prove people are stupid.

Not as bad as the "black highlighting" in Acrobat or Word, but not great. Sorta like blacking out a word and handing over the original (P.S. my wife found out how much I paid for her xmas gift because I did this; I didn't think she'd try to figure it out, so didn't photocopy the receipt).

Posted by: steven hoober at January 10, 2007 10:43 AM
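
The opacity problem described in the comment above is easy to reproduce. The following sketch assumes an 8-bit grayscale image held in a NumPy array; the function names and the 97% opacity figure are illustrative assumptions. A black box blended at slightly less than full opacity leaves a faint copy of the original, and a simple levels stretch brings it back.

```python
import numpy as np

def cover(image, box, opacity):
    """Alpha-blend a black rectangle over `image` (uint8, values 0..255)."""
    top, bottom, left, right = box
    out = image.astype(float)
    out[top:bottom, left:right] *= (1.0 - opacity)
    return np.round(out).astype(np.uint8)

def stretch_levels(region):
    """Rescale a region so its darkest pixel is 0 and its brightest is 255."""
    lo, hi = int(region.min()), int(region.max())
    if hi == lo:
        # Perfectly flat region: nothing left to recover.
        return np.zeros_like(region)
    return ((region.astype(float) - lo) * 255.0 / (hi - lo)).astype(np.uint8)
```

With opacity=0.97, white paper under the box ends up at a gray level of about 8 and black ink at 0, which looks solid black on screen but separates cleanly after stretching. With opacity=1.0 the region is uniformly 0 and the stretch returns nothing.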


Still too much work, just replace the image with random blurring...if you find the black out objectionable...

Posted by: DBH at January 10, 2007 2:09 PM


Also, you might want to look at the "Hallucinating Faces" research, to see just how much facial detail you can infer from a very few degrees of freedom.

http://www.ri.cmu.edu/projects/project_536.html

Posted by: Neil at January 11, 2007 11:46 AM



Schneier.com is a personal website. Opinions expressed are not necessarily those of BT.

 