## How to Recover Numbers from Blurred Images

Interesting.

Discussion here.

Might work, but it is very dependent on being able to recreate the original setup accurately.

Incidentally, mosaic-ing to conceal a person’s face is weak if the camera or the person moves around – you can use the movement to resample the original image and so get a passable picture back.

Anonymous Coward January 9, 2007 8:25 AM

Wow, people really are stupid.
Can’t people learn and just black out the numbers?

Dave Wagner January 9, 2007 8:40 AM

Another problem with blurring images (particularly Gaussian blur as opposed to the blocked blur) is deconvolution algorithms. These are used primarily in microscopy and astronomy to recover detail from out-of-focus images. However, the algorithms work really well for recovering images from a Gaussian blur too. Here’s an example page of different algorithms:

http://www.bialith.com/Research/BARclockblur.htm

And here’s an example using blurred text:

http://www.maxent.co.uk/bayes.htm

Google “image deconvolution” for many more examples and some free code.
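To make the deconvolution point concrete, here is a minimal one-dimensional sketch in Python. It assumes the blur kernel is known exactly and that the blurred signal is noise-free, which is more favorable than any real image. The function names, the 16-sample test signal, and the `noise_to_signal` regularizer are illustrative assumptions, not taken from the pages linked above. It uses a Wiener-style inverse filter; the linked pages compare other algorithms.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for a tiny demo."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * m * k / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def gaussian_kernel(n, sigma):
    """Circular Gaussian kernel centred on index 0, normalised to sum to 1."""
    k = [math.exp(-0.5 * (min(i, n - i) / sigma) ** 2) for i in range(n)]
    total = sum(k)
    return [v / total for v in k]

def blur(signal, kernel):
    """Circular convolution, done as a product in the frequency domain."""
    S, K = dft(signal), dft(kernel)
    return [v.real for v in dft([a * b for a, b in zip(S, K)], inverse=True)]

def wiener_deconvolve(blurred, kernel, noise_to_signal=1e-9):
    """Wiener-style inverse filter; noise_to_signal keeps the division stable."""
    B, K = dft(blurred), dft(kernel)
    est = [b * k.conjugate() / (abs(k) ** 2 + noise_to_signal) for b, k in zip(B, K)]
    return [v.real for v in dft(est, inverse=True)]

# A row of "ink" (1) and "paper" (0) pixels, blurred and then restored.
signal = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0]
kernel = gaussian_kernel(len(signal), sigma=1.0)
blurred = blur(signal, kernel)
restored = wiener_deconvolve(blurred, kernel)
print([round(v, 2) for v in blurred])
print([round(v, 2) for v in restored])
```

With real images, sensor noise, 8-bit quantization, and JPEG compression all limit how much detail comes back, so `noise_to_signal` would need to be much larger than in this idealized case.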

Of course, even blacking out the text does nothing if you don’t flatten the image (thus avoiding having a black layer over the original layer which can be extracted).

Carlo Graziani January 9, 2007 9:17 AM

(1) Casting this as an optimization problem in a vector space is overkill. The problem factorizes into the components of the vector — the individual digits of the numeric code — since the overlap of the blurred images of the digits is minimal. So basically, instead of a search in 12 dimensions (for a 12-digit credit card number) we have 12 one-dimensional searches, a reduction in problem size from 1.0E12 to 120 (ignoring the fact that the space of credit-card numbers is more limited due to stereotyped digit sequences).

(2) The choice of distance metric — basically the rms distance, AKA the Euclidean metric — is arbitrary, and quite possibly sub-optimal. The normalization of the vectors that are sent into the Euclidean metric results in coupling between the vector components that we know should not be present, from (1). In fact, the factorization property of (1) suggests that a more suitable metric would be the max over all digits of the absolute distance of the model obscured digit from the corresponding datum.
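Points (1) and (2) can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the toy 3x5 glyph bitmaps in `GLYPHS`, the 2x2 tile averaging in `mosaic`, and the reading of the proposed metric as the largest absolute tile difference per digit, with the overall score being the worst digit. The point is only that each digit is guessed with 10 comparisons, so 12 digits cost 120 comparisons rather than 10^12.

```python
# Toy 3x5 bitmaps standing in for the card's font (an assumption for illustration).
GLYPHS = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "001"],
}

def mosaic(glyph, block=2):
    """Average a glyph over block x block tiles (edge tiles may be smaller)."""
    rows, cols = len(glyph), len(glyph[0])
    tiles = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            cells = [int(glyph[r][c])
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            tiles.append(sum(cells) / len(cells))
    return tiles

# Mosaic every candidate digit once, using the same (assumed known) setup.
TEMPLATES = {d: mosaic(g) for d, g in GLYPHS.items()}

def tile_distance(a, b):
    """Largest absolute difference between corresponding tiles."""
    return max(abs(x - y) for x, y in zip(a, b))

def recover(observed):
    """Guess each digit independently; return the guess and the worst per-digit distance."""
    guess, worst = [], 0.0
    for obs in observed:
        best = min(TEMPLATES, key=lambda d: tile_distance(TEMPLATES[d], obs))
        guess.append(best)
        worst = max(worst, tile_distance(TEMPLATES[best], obs))
    return "".join(guess), worst

secret = "401288881881"                          # made-up 12-digit number
observed = [mosaic(GLYPHS[d]) for d in secret]   # what the attacker sees
guess, worst = recover(observed)
print(guess, worst)
```

In this toy font the ten mosaicked templates happen to be distinct, so the guess is unambiguous. With a real font, coarser tiles, or noise, several digits may tie, and the attacker would keep a short list of candidates per position instead.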

Paeniteo January 9, 2007 9:36 AM

@Rod: The most common web image file formats (say, GIF, JPG) do not support multiple layers, so even if your image editing program allows you to introduce multiple layers when editing such an image, they will automatically be “flattened” (= lost) when saving the image. I am not 100% sure about PNG as it has many not-so-commonly-known features (e.g., animations that are possible somehow).

There were some amusing stories with Word .doc and .pdf, though 😉

Adobe Abode January 9, 2007 12:27 PM

The solution is simple: Don’t blur your images! Instead, just color over them.

Adobe Acrobat has a great feature for just this sort of redaction.

Excellent post. This is a good application for distributed/grid computing. Security agency computing grids will be a-humming.

The article and most of the comments focus only on how numbers might be recovered by a machine but neglect the simple fact that the brain is really good at this kind of stuff.

I scanned a check I got recently and blurred out the sensitive information as described. If you look closely at it, it’s just a bunch of gray blocks, but if you take a step back and relax your eyes a bit you can make out the numbers well enough. The most amusing example is the word “TEXAS” on the check, because it becomes completely discernible at a meter or so.

Oh crumbs forgot. Sorry for the double post.
For the facial de-blurring, take a second mosaic of the first to narrow down possible sources. It’s like taking a derivative of a function. But you might get enough to narrow the field.
</random thoughts>

ruidh January 9, 2007 2:58 PM

“Adobe Acrobat has a great feature [color over] for just this sort of redaction.”

LOL! This usually results in a leak of information people would like to keep confidential. The first thing to try is to use the text select tool and select all text. More often than not the text you’ve carefully colored out is revealed.

Repeat after me — color over is not redaction. If you want to remove information from a file you actually have to remove information.

X the Unknown January 9, 2007 4:00 PM

In most image-processing programs, erasing an area (to the currently-selected background color) is at least as easy as filling a rectangle with a chosen color. And, you rarely have problems with multiple layers being created.

I don’t even understand why some of the suggested techniques (e.g. first distort the numbers, then pixelize them) would be considered – they take far more work than simply erasing a rectangle.

My favorite practice of creating pseudo-blurred images:

1. Create a similar sequence using similar font.
2. Copy and paste this on “private” areas.
3. Merge document.
4. Select pseudo-private area.
5. Apply blur filter.
6. Complete edits and final merge.
7. Resize image.
8. Export image.

This is best when you want your readers to see something other than blacked-out data. It gives a realistic feel and is virtually non-reversible regardless of this attack, because even if you do find the corresponding sequence, it is a dead end.

Only let them know what you want them to know.

Summary: Pseudo-Blur FTW!

Israel Torres

David Dyer-Bennet January 9, 2007 7:20 PM

Blurring the information you want to hide is a less visually obtrusive effect than a solid block. People do it that way because it looks better.

Next time perhaps I’ll replace the digits with random garbage and then blur over them; that’ll look better and not leak information.
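A minimal sketch of the "random garbage" step, assuming the sensitive text is available as a string before it is rendered and blurred. The helper name `decoy_digits` and the sample number are hypothetical.

```python
import secrets

DIGITS = "0123456789"

def decoy_digits(text):
    """Replace every ASCII digit with an independently chosen random digit.

    Spaces, dashes, and other characters are kept so the layout looks the same.
    """
    return "".join(secrets.choice(DIGITS) if ch in DIGITS else ch for ch in text)

original = "4012 8888 8888 1881"   # made-up number for illustration
decoy = decoy_digits(original)
print(decoy)                       # render and blur this, not the original
```

The replacement has to be chosen independently of the real digits. If it is derived from them in any way, the blurred result can still leak information.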

steven hoober January 10, 2007 10:43 AM

I’ll note that even blacking out in photo-editing software often doesn’t work. People who don’t know how to use the software (i.e. almost everyone) can overlay with not-quite-perfect opacity. Playing with levels and other controls can reveal the data that’s supposed to be hidden.
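The levels trick can be shown with a few grayscale pixels in Python. This is a sketch under simple assumptions: 8-bit grayscale values, a black overlay composited at a fixed opacity, and a plain linear contrast stretch standing in for an editor's levels control. The function names are illustrative.

```python
def overlay_black(pixels, opacity):
    """Composite black over 8-bit grayscale pixels at the given opacity (0..1)."""
    return [round(p * (1.0 - opacity)) for p in pixels]

def stretch_levels(pixels):
    """Linearly stretch the darkest value to 0 and the brightest to 255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0] * len(pixels)   # perfectly flat: nothing left to recover
    return [round(255 * (p - lo) / (hi - lo)) for p in pixels]

# One row through a line of text: 255 is paper, 0 is ink.
row = [255, 255, 0, 0, 255, 0, 255, 255]

almost_black = overlay_black(row, opacity=0.97)   # looks black on screen
fully_black = overlay_black(row, opacity=1.0)

print(almost_black, stretch_levels(almost_black))
print(fully_black, stretch_levels(fully_black))
```

At 97% opacity the covered pixels differ by only a few gray levels, which the eye reads as solid black, yet the stretch restores the full contrast. At 100% opacity every pixel is 0 and there is nothing to stretch.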

I don’t do this enough that I have found anything interesting, but I have exploited this and found data behind black boxes in this manner. Just to prove people are stupid.

Not as bad as the “black hiliting” in Acrobat or Word, but not great. Sorta like blacking out a word and giving over the original (P.S. my wife found out how much I paid for her xmas gift because I did this; I didn’t think she’d try to figure it out, so didn’t photocopy the receipt).

Still too much work, just replace the image with random blurring…if you find the black out objectionable…

PATRICIA March 27, 2010 7:38 PM

How can you read documents that have been blacked out by black marker? Very important, contact me ASAP, a matter of me keeping my son.

Clive Robinson March 28, 2010 12:42 PM

@ PATRICIA,

“How can you read documents that have been blacked out by black marker?”

You have not mentioned if you have the document that has got black marker on it, or a photocopy of the document with black marker.

If the latter, you would need to get hold of the original document.

If you have the original, and it was originally printed on a laser printer or was a photocopy prior to being “blacked out” with marker pen, it may be as simple as tipping it through the light whilst wrapped around a cylinder, or holding it up in reverse against a light source.

Failing that look at “false colour” illumination with an appropriate light source and camera (sort of works the way the luma light and the orange glasses does in the TV “CSI whatever”).

If it is a photocopy of a blacked out document there is a very very small chance that there is a difference in contrast that might be readable but I would not count on it. Few office photocopiers are designed to copy “pictures” well thus have a limited contrast ratio.
