Comments

Winter May 22, 2024 7:33 AM

I assume that pixelation is chosen to give an impression of the text: that is, to convey that it is a string of characters with a given length.

So, the prudent way to do it is to first generate a random character string of the same length and then pixelate that string.

Or just replace it with Lorem ipsum.
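A minimal sketch of that idea in Python with Pillow; the image size, the default bitmap font, the character set, and the block size are all illustrative assumptions, and the pixelation here is a simple block average:

```python
import random
import string
from PIL import Image, ImageDraw

def pixelate_decoy(real_text: str, block: int = 8) -> Image.Image:
    # Replace the secret with a random string of the same length,
    # so the pixelated blob leaks only the length, not the content.
    decoy = "".join(random.choices(string.ascii_letters + string.digits,
                                   k=len(real_text)))
    img = Image.new("L", (10 * len(decoy), 20), color=255)
    ImageDraw.Draw(img).text((0, 0), decoy, fill=0)
    # Pixelate: downsample to one sample per block, then scale back up.
    small = img.resize((max(1, img.width // block),
                        max(1, img.height // block)), Image.BILINEAR)
    return small.resize(img.size, Image.NEAREST)
```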

Btw, the same approach might be successful with badly pixelated faces in video.

Depixelation seems to be a well-studied art in Japanese adult video. Occasionally I hear about people getting arrested for it.

echo May 22, 2024 8:11 AM

Someone did some research on photos the police or media had redacted and discovered that squinting your eyes would reveal an accurate enough impression of the face for the person to be recognisable. Experiments suggested a best minimum block size for pixelation of faces. I've heard of no reports of anyone making much sense of this, although I suspect there might be cases where a person could be identified, especially from CCTV.

I suspect @Winter's suggestions are the best approach for text. Static photos may get by with a minimum block size. I don't know about moving images, if the intent is to maintain a level of aesthetic integrity across the whole image. Maybe introducing randomness would help.

Car number plates are often pixelated in the media, especially VIPs'. I wonder if this essay might cause a review of best practice.

The maths and science of it is beyond me, but it's interesting reading how cosmologists have squeezed signal out of noise or faint data. It's also funny how so many of us were so caught up in the first blurry image of a black hole, like young children staring in wonder at a colourful bug on a leaf.

Conan the deconvolutionarian May 22, 2024 9:57 AM

Being model-based, the deconvolution is only hypothetical, and is offered as a FWIW opinion.

Morley May 22, 2024 10:43 AM

I tried a de-blurring tool a while back. It worked on my screenshot program’s blur feature. Gotta actually remove the data!

Peter May 22, 2024 12:28 PM

Just a matter of time until an AI model can read pixelated text just as well as it reads CAPTCHAs.

Reversing the past May 22, 2024 1:06 PM

@ALL

This is not a new issue, just a new use.

Those who have been involved with communications and signal processing, whilst not exactly eating this stuff for breakfast, have been munching on it seriously since the end of the 1950s and beginning of the 1960s, when solid-state electronics became small enough and fast enough to make it practically useful in real time.

Most people get to hear two or three things about communications,

  1. Signal to noise ratio.
  2. Noise is random.
  3. Noise is Gaussian in characteristic.

Thus you get 'Additive White Gaussian Noise' (AWGN) and an explanation such as

https://wirelesspi.com/additive-white-gaussian-noise-awgn/

The reality is that AWGN is a simple model of reality that gets widely used, and for many things it is not a particularly good model.

Noise in reality is the 'Root Mean Square' (RMS) combination of many signals:

N^2 = S1^2 + S2^2 + S3^2 + … + Si^2

Where each signal S is in reality multiplied by a 'channel gain', which, like an information-carrying signal, varies with time.
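A quick numerical check of that relation; the three example "signals" below (two noise processes and a tone) are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
s1 = 0.5 * rng.standard_normal(100_000)                          # wideband noise
s2 = 1.0 * np.sin(2 * np.pi * 50 * np.linspace(0, 1, 100_000))   # a tone
s3 = 0.2 * rng.standard_normal(100_000)                          # more noise

def rms(x):
    return np.sqrt(np.mean(x ** 2))

# For uncorrelated, zero-mean sources, the RMS of the sum is approximately
# the root of the sum of the squared individual RMS values.
print(rms(s1 + s2 + s3))
print(np.sqrt(rms(s1)**2 + rms(s2)**2 + rms(s3)**2))
```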

But there is also another fun thing to consider: each signal consists of multiple time-dispersed copies of itself. This gets called 'Inter-Symbol Interference' (ISI), which, if used with care, can improve the information capacity of a channel:

https://www.tutorialspoint.com/digital_communication/digital_communication_pulse_shaping.htm
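A toy illustration of ISI, assuming a simple two-tap multipath channel; the symbol values and tap weights are made up for the demo:

```python
import numpy as np

symbols = np.array([1.0, -1.0, 1.0, 1.0, -1.0])  # transmitted symbols
channel = np.array([1.0, 0.0, 0.4])  # main path plus a delayed, weaker echo

# Convolving with the channel smears each symbol across its neighbours:
received = np.convolve(symbols, channel)
print(received)  # later samples mix the current symbol with an earlier one
```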

Knowing that noise is actually multiple sources of information, not true randomness, tells you why you can 'de-pixelate' images: the sources are either not random, or their randomness can be modelled in some (usually out-of-band) way and thus removed layer by layer.

If you think of pixelation as a very low-grade substitution cipher with at best poor chaining, as opposed to a One Time Pad, you can also see why it can be removed layer by layer.

As is seen with the 'Crypto-Tux' or 'ECB-Penguin' image, where the Linux Tux penguin gets encrypted using AES-128 in 'Electronic Code Book' (ECB) simple substitution mode:

https://github.com/robertdavidgraham/ecb-penguin
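You can see the block-level substitution directly with a few lines of Python and pycryptodome; the all-zero key is a throwaway demo value:

```python
from Crypto.Cipher import AES

key = bytes(16)                      # demo key only; never use a fixed key
cipher = AES.new(key, AES.MODE_ECB)

block = b"SAME 16B BLOCK!!"          # exactly one 16-byte AES block
ct = cipher.encrypt(block + block)   # two identical plaintext blocks

# ECB encrypts equal plaintext blocks to equal ciphertext blocks,
# which is exactly what lets the penguin outline survive encryption.
print(ct[:16] == ct[16:32])          # True
```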

The actual moral is

"Fully deterministic algorithms have inverse algorithms."

(Even supposed "One Way Functions" are invertible via a dictionary attack, especially if the actual input data has a very limited input set.)
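A sketch of that dictionary-attack point, using a hashed 4-digit PIN as a stand-in for a limited input set:

```python
import hashlib
from itertools import product

target = hashlib.sha256(b"1234").hexdigest()  # the "one-way" output we hold

# With only 10,000 possible inputs, enumeration inverts the function.
for digits in product("0123456789", repeat=4):
    candidate = "".join(digits).encode()
    if hashlib.sha256(candidate).hexdigest() == target:
        print("recovered:", candidate.decode())
        break
```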

lurker May 22, 2024 2:43 PM

Here we go again. There was shock and horror when some people discovered that the "blacking out" function could be peeped under.

If you don’t want stuff to be seen, remove it completely with a big sharp knife, or better, don’t put it there in the first place.

Reversing the past May 22, 2024 7:33 PM

@lurker

There is a problem with

“If you don’t want stuff to be seen, remove it completely with a big sharp knife”

It's proportional fonts, which are all too common in documents these days.

Unlike with monospaced (fixed-pitch, fixed-width, or non-proportional) fonts, the characters are all different widths, with lowercase 'i' usually the narrowest and uppercase 'W' the widest.

If you cut out a single word or a suspected phrase, an unredacting process can check, with quite some probability of success, whether a candidate's width matches the hole that's been left.
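A minimal sketch of that width-matching idea with Pillow; the font file, point size, gap width, tolerance, and candidate list are all assumptions for illustration:

```python
from PIL import ImageFont

# Assumes DejaVuSans.ttf is findable and that the redacted document
# was set at roughly this size.
font = ImageFont.truetype("DejaVuSans.ttf", 12)

gap_width = 47.0                     # measured width of the cut-out, in pixels
candidates = ["illicit", "William", "wide", "lit"]

for word in candidates:
    w = font.getlength(word)         # rendered width in the proportional font
    if abs(w - gap_width) < 1.5:     # allow a little measurement error
        print("plausible fit:", word, w)
```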

There is also another issue with proportional fonts: few people can tell by eye whether the width of the white space changes.

You can hide a DRM / canary-trap serial / ID number in a printed document in such a way that even cutting out 9/10ths of the text on a page will still not remove the number, so the 'whistle-blower' will be identifiable.
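A toy version of that canary idea: encode an ID in the choice between single and double spaces, and repeat the pattern so that even small surviving fragments still carry it. The scheme is deliberately simplified for illustration:

```python
def embed_id(text: str, serial: int, bits: int = 8) -> str:
    # Hide `serial` in inter-word spacing: bit 1 -> double space.
    words = text.split(" ")
    pattern = [(serial >> i) & 1 for i in range(bits)]
    out = words[0]
    for i, word in enumerate(words[1:]):
        out += ("  " if pattern[i % bits] else " ") + word
    return out

# The pattern repeats every `bits` words, so cutting out most of the
# page can still leave enough spacing to read the serial back out.
print(embed_id("the quick brown fox jumps over the lazy dog now", 0b10110010))
```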

So one way to make it more anonymous is to scan the document using 'Optical Character Recognition' (OCR), then "spell check" it and so on, then save it as an ASCII text file.

Take the file to another computer, read the text file into an editor, and put the sentences, paragraphs, and other basic document formatting back in, using a monospaced font.

Replace redacted words/sections with minus signs in blocks of five, each block followed by a single space: not to match the words, but to fill out approximately the same space, as in the sketch below.
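A toy version of that fill step, assuming the redactions were tagged with a [REDACTED...] marker during editing; the marker format is an assumption:

```python
import re

def fill_redaction(match: re.Match) -> str:
    # "-----" plus its trailing space is about six characters, so pick
    # enough five-dash blocks to roughly match the removed span.
    n_blocks = max(1, round(len(match.group(0)) / 6))
    return " ".join(["-----"] * n_blocks)

text = "The payment was routed through [REDACTED-ACCOUNT-NAME] in March."
print(re.sub(r"\[REDACTED[^\]]*\]", fill_redaction, text))
```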

This way you keep the basic structure of the document and thus limit the usual ploy of people saying

"This cannot be a valid document, as it looks nothing like any of our documents."

Or

“We have never produced a document that looks like that.”

“I’ve never seen a document that looks like that in our organisation.”

Etc. Thus not denying whether the text is true, but denying the "style" or "look" of it.

fib May 23, 2024 6:51 AM

"Someone did some research for photos the police or media redacted and discovered that squinting your eyes would reveal an accurate enough impression of the face for the person to be recognisable."

Interesting! Who did it? This time you didn’t provide the super scientific youtube link…

Bob Paddock May 23, 2024 7:46 AM

It has been found that true random noise added to photos is noticed by the human eye, so blue noise was created:

“Gaussian Blue Noise”

https://dl.acm.org/doi/10.1145/3550454.3555519

"Blue Noise for Diffusion Models", SIGGRAPH Conference Proceedings, 2024:

https://xchhuang.github.io/bndm/
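For a rough feel of what "blue" means here, a crude approximation is high-pass-filtered white noise; real blue-noise masks are built with methods like void-and-cluster, so treat this as only an illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
white = rng.standard_normal((256, 256))

# Weight the spectrum in proportion to spatial frequency, so low
# frequencies are suppressed and high frequencies dominate ("blue").
spectrum = np.fft.fft2(white)
fy, fx = np.meshgrid(np.fft.fftfreq(256), np.fft.fftfreq(256), indexing="ij")
radius = np.sqrt(fx**2 + fy**2)
blue = np.real(np.fft.ifft2(spectrum * radius))
print(blue.std())
```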

Johnny Memonic Stego.7z May 23, 2024 9:10 PM

Visuals and QR barcodes might be the way to go, coupled with 3D printing for the blind.

If anybody wants to play with a free screengrab picture for stegosaur purposes, here ya go.

It’s clearly not that generic, which is GOOD.

We Have Explosive
There Will Be No Armageddon

both tunes, temporarily in musically and visually synchronised playback

You can do some interesting research on this.
