Comments

Flerfer March 11, 2024 9:56 AM

I’ve been wondering if this could be used to help decrypt the Kryptos sculpture considering that part of the phrase has been revealed to help move along a solution.

Winter March 11, 2024 11:01 AM

Not sure this is the best place to post it. Still, it shows how LLMs are very good at unraveling hidden patterns in texts and data, not only at decrypting blacked-out text.

It has become a running gag that US politicians will insist there is no racism in the USA, and never has been (as declared by, e.g., N Haley and R DeSantis). Every time LLMs have been let loose on any US data, the glaring racism in US language use sticks out immediately.

And, yes, it has been shown again, not in one model, not in two models, but in all models they could test:

AI models show racial bias based on written dialect, researchers find
Those using African American vernacular more likely to be sentenced to death, if LLMs were asked to decide
https://www.theregister.com/2024/03/11/ai_models_exhibit_racism_based/

The researchers call this technique Matched Guise Probing. They used it to probe five models and their variants: GPT2 (base), GPT2 (medium), GPT2 (large), GPT2 (xl), RoBERTa (base), RoBERTa (large), T5 (small), T5 (base), T5 (large), T5 (3b), GPT3.5 (text-davinci-003), and GPT4 (0613).

And all of them more or less failed. Compared to speakers of [Standard American English], all of the models were more likely to assign speakers of [African American English] to lower-prestige jobs, to convict them of a crime, and to sentence them to death.

JonKnowsNothing March 11, 2024 1:18 PM

@Winter, All

For a thought provoking review of aspects of race in USA and applicable to other places: Proving Your Ancestry

MSM report HAIL Warning (1)

  • The Black Box: Writing the Race by Henry Louis Gates Jr
    • Born recently to Gates’s daughter, who is mixed race, and his son-in-law, who is white, Ellie “will test about 87.5% European when she spits in the test tube,” Gates writes, adding that she “looks like an adorable little white girl”. And yet when Ellie was born, Gates’s priority, he reveals, was to make sure her parents registered her as a black child, ticking the “black” box on the form stating her race at birth. “And because of that arbitrary practice, a brilliant, beautiful little white-presenting female will be destined, throughout her life, to face the challenge of ‘proving’ that she is ‘black’,” Gates writes.

===

1)

HAIL Warning

https://www.theguardian.com/books/2024/mar/10/henry-louis-gates-jr-black-box-writing-race-arrested-beers-with-obama

  • ‘We are all mixed’: Henry Louis Gates Jr on race, being arrested and working towards America’s redemption

An interview with Henry Louis Gates Jr

kurker March 11, 2024 1:42 PM

Seems like redaction still doesn’t redact.
First was the marker pen: hold it to the light at the right angle and read the original;
then came the black blob graphic dropped over the text: a simple copy found the text underneath;
then the black blob was able to replace the text: but the LLM has a darn good guess at what the text was/is.

So what next? Replacing all redacted text with a standard-length […] might confuse the LLMs for a while, but it’ll sure upset the formatting fanatics.

Clive Robinson March 11, 2024 2:45 PM

@ kurker, ALL,

Re : How much is required to guess?

“Seems like redaction still doesn’t redact.”

Nor should it.

It’s a variation on a “black box problem” that can also be a variation on what’s called “The Knapsack Problem”[1].

You as an observer cannot “look in the black box” at the original document.

However, the black box is like a variation on Searle’s Chinese Room.

You know that the unredacted document is effectively within, and you also know it has been subjected to redaction rules that you probably know. As an observer of what gets posted out the door in response to a “show me” command, your job is, in effect, to apply the rules in reverse and come up with the probable document.

Mathematically we know this is actually possible and it can be viewed as a form of knapsack problem[1].

LLMs encode words in an N-dimensional vector space that is built up from the parts of the document you can read, similar documents by the author(s) or on the same subject matter, and dictionaries etc.

Thus word combinations can be built into frequency tables that represent a “spectrum” where each word bigram not only has its own value, it has a value with adjacent bigrams in the 2D word vector space. Obviously you get trigrams in a 3D word vector space and so on. The reality is these word vector spaces are almost entirely empty, thus exceptionally sparse.

This means that the value (weight) of each sentence is very predictable with respect to its predecessor and successor, and thus any “walk through the space” very rapidly becomes close to unique.

So it’s easy to see why this would be the case, but one heck of a sight more difficult to actually get it right.

[1] The “Knapsack Problem” is easy to describe but can be darn difficult to solve. So much so it’s been used as the basis for cryptographic algorithms,

https://www.geeksforgeeks.org/introduction-to-knapsack-problem-its-types-and-how-to-solve-them/

In this case the weight of each object is actually the size of the word, with the size of the knapsack being the measured “white space” after redaction. The maximum value is way, way more complicated, where each sentence is comprised of words that have individual values… But the actual sentence value is based not just on the individual word values but their combination values, scaled against other measures like grammatical sense, sense within the author’s word usage and style, and the sentence value within the subject matter of the rest of the document.
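A minimal sketch of that knapsack framing, assuming a monospace rendering where every character and every inter-word space is one unit wide (the vocabulary below is invented; real redactions in variable-width fonts are much messier):

```python
def fills(vocab, span, prefix=()):
    """Yield word sequences from `vocab` whose joined length equals `span`.

    Word length is the object's "weight"; `span` is the knapsack,
    i.e. the measured white space left by the redaction.
    """
    for w in vocab:
        used = len(w) if not prefix else len(w) + 1  # +1 for the space
        if used == span:
            yield prefix + (w,)
        elif used < span:
            yield from fills(vocab, span - used, prefix + (w,))

vocab = ["alice", "bob", "met", "the", "director"]

# A 9-character redaction: which word combinations fit exactly?
candidates = [" ".join(c) for c in fills(vocab, 9)]
```

This only solves the weight constraint; ranking the survivors by "value" (grammatical sense, the author's style) is the part Clive notes is far harder, and is where something like the bigram scoring comes in.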

Maxwell Bland March 11, 2024 3:12 PM

We tried out this approach for https://arxiv.org/pdf/2206.02285.pdf ! Since the LLM is trained on the internet, it ends up working as a more easily automated version of typing names/relevant info into a web search (weighting your prior distribution with web results). However, this was sufficient in a number of cases for locating ground truth to confirm length- and adjustment-data-based deredaction during our single-blind study of the techniques.

One of the important difficulties in ensuring empirical consistency and accuracy here is determining which information to provide to the model’s prompt so as to not inappropriately introduce confirmation bias into the output prior distribution (i.e. guesses).

Excited to see deredaction issues brought to the attention of a larger audience! There are concerns within several government bodies regarding LLM capabilities improving deredaction attacks. Even more concerning is that the redaction tool used for the Elon email has not yet pushed a patch to remove character count information … From what I measured, only Opentext Brava had this issue, but it looks like Rohan might have found another tool with this same problem.

lurker March 11, 2024 3:46 PM

The example shown seemed to have consecutive words redacted but the spaces between them were not redacted. Surely this is a bug in the redaction software that just invites analysis.

(Strokes beard and ponders: was the redaction done by an LLM?)

David Leppik March 11, 2024 3:48 PM

Seems like an LLM is the wrong tool for this job, in that this is a fairly straightforward fill-in-the-blank problem where the answer is either

  1. So general as to be useless (e.g. could be any 10-char name or any 60-char URL)
  2. So specific that a brute force dictionary solver with specialized context (that a LLM wouldn’t have) would give you all the credible results (e.g. one of the 30 people with a 10-char name in a particular department)

The problem is that an LLM looks so much like magic that in either of these cases, people are likely to believe it.

Notable: a hand-redacted document with a variable-width font would require a slightly different solver than one that expects the Unicode redacted-block char.
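Leppik’s second case can be illustrated in a couple of lines: given a candidate list with specialized context (the roster below is invented, purely for illustration), the redaction length alone prunes it, no LLM required. A sketch, assuming a character-count redaction in a fixed-width rendering:

```python
# An invented departmental roster standing in for "specialized context".
roster = ["Jane Smith", "Bob Li", "Alexandra Wong", "John Doe", "Mark Jones"]

def candidates(names, redacted_len):
    """Names whose character count matches the redacted span exactly."""
    return [n for n in names if len(n) == redacted_len]

# A 10-character redaction narrows the roster immediately.
print(candidates(roster, 10))  # → ['Jane Smith', 'Mark Jones']
```

The point stands either way: the dictionary solver gives you every credible result with no illusion of magic, which is exactly what makes the LLM’s confident single guess more dangerous to believe.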

echo March 13, 2024 5:22 PM

Clever use of redaction might also be part of a toolset to fool an attacker. They behave with glee thinking they’ve run off with the crown jewels because they figured out what they think was hidden behind the redaction. In reality ten years later they wonder why their missile design keeps blowing ten miles off course.

I’m guessing if this idea has utility someone has already done it.

anon March 18, 2024 12:12 AM

I wonder if it’s possible to unredact documents by changing the font, or redefining the font. If the document contains a single redaction, change the width of a space to 0, and that will tell you how many words there are. Then, one by one, change the width of each character to 0 to get character counts.
As someone above said, redacting text with […] is probably the safest way.
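The width-to-count idea can be sketched as a bounding exercise: if a tool leaks the redaction box width in a variable-width font, the narrowest and widest glyphs bracket the feasible character counts. The per-glyph advance widths below are invented, not real font metrics; a real attack would read them from the embedded font:

```python
# Hypothetical advance widths in points for a variable-width font.
widths = {"i": 4, "l": 4, "a": 9, "e": 9, "m": 14, "w": 13, " ": 5}

def count_range(box_width):
    """Min/max character counts that could fill `box_width` points."""
    narrow, wide = min(widths.values()), max(widths.values())
    lo = int(box_width // wide)    # all-widest glyphs -> fewest chars
    hi = int(box_width // narrow)  # all-narrowest glyphs -> most chars
    return lo, hi

print(count_range(70.0))  # → (5, 17)
```

Even this crude bound leaks information, which is why redacting to a fixed-width placeholder like […] removes the length channel entirely.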
