LLM-Assisted Deanonymization
Turns out that LLMs are good at deanonymization:
We show that LLM agents can figure out who you are from your anonymous online posts. Across Hacker News, Reddit, LinkedIn, and anonymized interview transcripts, our method identifies users with high precision and scales to tens of thousands of candidates.
While it has been known that individuals can be uniquely identified by surprisingly few attributes, this was often practically limited. Data is often only available in unstructured form and deanonymization used to require human investigators to search and reason based on clues. We show that from a handful of comments, LLMs can infer where you live, what you do, and your interests—then search for you on the web. In our new research, we show that this is not only possible but increasingly practical.
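The point about "surprisingly few attributes" can be illustrated with a toy anonymity-set calculation. This is a minimal sketch, not the paper's method: the candidate records, the `Candidate` fields and the `anonymity_set` helper are all hypothetical. Each attribute inferred from someone's posts filters the pool of people they could be, and once only one candidate remains, the author is identified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical profile fields, of the kind an LLM might infer from posts.
    name: str
    city: str
    occupation: str
    interest: str


# Hypothetical toy candidate pool; not data from the paper.
CANDIDATES = [
    Candidate("alice", "Zurich", "engineer", "climbing"),
    Candidate("bob", "Zurich", "engineer", "chess"),
    Candidate("carol", "Zurich", "teacher", "climbing"),
    Candidate("dave", "Berlin", "engineer", "climbing"),
]


def anonymity_set(candidates, **inferred):
    """Return the candidates consistent with every inferred attribute."""
    return [
        c for c in candidates
        if all(getattr(c, attr) == value for attr, value in inferred.items())
    ]


# Each additional inferred attribute shrinks the set of possible authors.
for clues in (
    {"city": "Zurich"},
    {"city": "Zurich", "occupation": "engineer"},
    {"city": "Zurich", "occupation": "engineer", "interest": "climbing"},
):
    matches = anonymity_set(CANDIDATES, **clues)
    print(len(matches), [c.name for c in matches])
```

In this toy pool, three clues are enough to narrow four candidates to one. Real populations are far larger, but the same filtering logic is what makes a handful of attributes identifying.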
Clive Robinson • March 2, 2026 7:50 AM
@ ALL,
I guess it shows that,
“Statistics match Statistics”
Which is actually a dangerous thing to do…
Because macrostates are not microstates[1], and telling which is which can be difficult.
But worse, consider that you are unique, even if you are an identical twin. Even though you may appear to share your microstate with your twin, you do not; it only looks that way because the number of “measures” that can be made to form indicators is always too small.
LLMs use “tokens” that are “vectors” that are just an ordered “Bag of Bits”. In the “Digital Neural Network” (DNN) they use, each neuron is a simple “Multiply and ADd” (MAD) “Digital Signal Processing” (DSP) function.
BUT… the output of each neuron is put through a “reducing function” before it is “fed forward” into the next layer, and that function reduces things down. All too often it is little more than a “Hard Limited Rectifier Function”, which is in effect nearly the equivalent of a “parity” function, reducing the vectors down to a macrostate.
Thus by their very design DNNs in LLMs are guaranteed to make mistakes.
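A minimal Python sketch of the neuron described in the comment: a multiply-and-add (a dot product plus a bias) followed by a reducing function. The comment does not say exactly which function it means, so this sketch assumes the common rectifier (ReLU, max(0, z)) and also includes a hard limiter (step function) for comparison. The function names and the example weights are illustrative only. What it shows is the narrow point that many different inputs can map to the same output once the rectifier is applied.

```python
def relu(z: float) -> float:
    """Rectified linear unit: negative sums are clipped to zero."""
    return max(0.0, z)


def hard_limiter(z: float) -> float:
    """Step function: 1.0 for a positive sum, otherwise 0.0."""
    return 1.0 if z > 0 else 0.0


def neuron(inputs, weights, bias, activation=relu):
    """Multiply-and-add over the inputs, then apply the activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    pre_activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(pre_activation)


# Hypothetical weights and bias, chosen only for illustration.
weights, bias = [0.5, -0.25, 1.0], 0.1

# Two different inputs whose weighted sums are negative both collapse to 0.0.
print(neuron([1.0, 2.0, -1.0], weights, bias))
print(neuron([0.0, 4.0, 0.0], weights, bias))

# The hard limiter collapses further: every positive sum becomes 1.0.
print(neuron([2.0, 0.0, 1.0], weights, bias, activation=hard_limiter))
```

Real LLM layers apply this to large vectors with learned weights and often use smoother activations, so this is only a one-neuron picture of the multiply-add-then-reduce pattern.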
[1] To understand the difference, imagine a number that is, say, a thousand bits long (call the length N). The macrostate is the number of bits that are set, whilst the microstate is the actual pattern of bits that are set. The number of macrostates is N+1 (anywhere from zero to N bits set), but the number of unique microstates is 2^N. So the number of microstates grows way, way faster than the number of macrostates.
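The footnote's counting argument can be checked directly for small N. This is a sketch; the helper names are illustrative. A macrostate is the popcount (how many bits are set), so there are N+1 of them, while there are 2^N microstates (bit patterns). The number of patterns sharing popcount k is the binomial coefficient C(N, k).

```python
from math import comb


def macro_micro_counts(n_bits: int) -> tuple[int, int]:
    """Count macrostates (possible popcounts) and microstates (bit patterns)."""
    macrostates = n_bits + 1   # popcount can be 0, 1, ..., n_bits
    microstates = 2 ** n_bits  # every distinct pattern of n_bits bits
    return macrostates, microstates


def microstates_per_macrostate(n_bits: int) -> list[int]:
    """Number of bit patterns sharing each popcount k, i.e. C(n_bits, k)."""
    return [comb(n_bits, k) for k in range(n_bits + 1)]


for n in (4, 10):
    print(n, macro_micro_counts(n), microstates_per_macrostate(n))

# For the footnote's thousand-bit number, print the size of 2**1000 in digits.
macro, micro = macro_micro_counts(1000)
print(macro, len(str(micro)))
```

For 4 bits there are 5 macrostates but 16 microstates, and the single macrostate "two bits set" covers 6 distinct patterns, so knowing only the count cannot tell those patterns apart.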