LLMs Generate Predictable Passwords

LLMs are bad at generating passwords:

There are strong, easily noticeable patterns among these 50 passwords:

  • All of the passwords start with a letter, usually uppercase G, almost always followed by the digit 7.
  • Character choices are highly uneven: for example, L, 9, m, 2, $, and # appeared in all 50 passwords, but 5 and @ appeared in only one password each, and most letters of the alphabet never appeared at all.
  • There are no repeating characters within any password. Probabilistically, this would be very unlikely if the passwords were truly random, but Claude preferred to avoid repeating characters, possibly because it “looks like it’s less random”.
  • Claude avoided the symbol *. This could be because Claude’s output format is Markdown, where * has a special meaning.
  • Even entire passwords repeat: in the above 50 attempts, there are actually only 30 unique passwords. The most common password was G7$kL9#mQ2&xP4!w, which appeared 18 times, giving this specific password a 36% probability in our test set, far higher than the expected probability of 2^-100 if this were truly a 100-bit password.
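The no-repeats observation is easy to quantify with a birthday-style calculation. A quick sketch, assuming a 70-symbol alphabet (upper- and lowercase letters, digits, and eight symbols — the exact alphabet Claude was using is an assumption here):

```python
import math

# Probability that a truly random 16-character password, drawn uniformly
# with replacement from a 70-symbol alphabet (assumed size), contains
# NO repeated character.
ALPHABET = 70
LENGTH = 16

p_no_repeat = math.prod((ALPHABET - i) / ALPHABET for i in range(LENGTH))
print(f"P(no repeated character) ~ {p_no_repeat:.3f}")   # roughly 0.16

# Probability that all 50 independent random passwords are repeat-free:
print(f"P(all 50 repeat-free) ~ {p_no_repeat**50:.1e}")  # astronomically small
```

So a genuine CSPRNG would produce a repeated character in roughly five out of six passwords; fifty repeat-free passwords in a row is effectively impossible by chance.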

This result is not surprising. Password generation seems precisely the thing that LLMs shouldn’t be good at. But if AI agents are doing things autonomously, they will be creating accounts. So this is a problem.

Actually, the whole process of authenticating an autonomous agent has all sorts of deep problems.

News article.

Slashdot story

Posted on February 26, 2026 at 7:07 AM • 22 Comments

Comments

Matthias Urlichs February 26, 2026 8:26 AM

Heh. That’s not just an LLM problem. Humans do that too: we all know that the correct way to create a password is to fire up “pwgen”, or ask your password manager or whatever, no exceptions — but when we’re in the flow and need a quick password-ish string, we still resort to hitting a not-quite-random bunch of keys. Or just type “$ekriT1248”.

The real issue is that the distance between institutional memory (the LLM knows how a password should be generated if you ask it!) and short-term objectives is too large. Fixing this requires access to a tool — followed by training, to break the pattern of not using it. In fact, the frontier labs should probably just fix training input: replace all literal password-ish strings with instructions to do an MCP call.

Vesselin Bontchev February 26, 2026 8:31 AM

Programs designed to generate statistically likely words happen to generate statistically likely passwords. News at eleven.

a clown February 26, 2026 9:16 AM

Laziness and ignorance have their consequences.
There are places where you can order food, drinks, medications, etc. etc. so you do not have to get out of your car. These places were invented in the USA so people could have the convenience. This has only added to the OBESITY epidemic in the USA and many other countries that thought “if Americans are doing it – it must be good” (LOL!).
Add the High Fructose Corn Syrup and much of other GARBAGE that is legally being fed to all Americans, and many many other things that are actually cancer generating substances but hey, who the fck cares. If it makes a buck – Bring it on!

In the world of cybersecurity, there’s a price to be paid for taking shortcuts (laziness is sometimes also called “time saving measures”, “efficiency”, “productivity”, or blah blah blah). The key thing here is knowing When and Where to resort to an App to do something for you that will be better, more secure, than if you’d done it yourself the old, “slow”, manual way.

Patrick Gill February 26, 2026 9:25 AM

LLMs used to be bad at arithmetic too. How long before a good LLM will know to defer to /dev/urandom when it needs entropy to make a password? This seems like a fixable problem.

Clive Robinson February 26, 2026 9:43 AM

@ Bruce, ALL,

Predictable is not Random and Random is essential to AI function

We used to call the Current AI systems “Stochastic Parrots” implying a uniform random selection probability for phrases.

If an LLM system cannot “do random” for passwords, then it calls into question the “random selection of phrases”. Which calls into question the rest of the LLM usage.

Which in turn calls into question all the other uses for which LLMs have been suggested…

jm February 26, 2026 10:00 AM

But if AI agents are doing things autonomously, they will be creating accounts. So this is a problem.

If that were the extent of the problem, the solution would be simple: delegate to a tool that uses a properly seeded CSPRNG to generate passwords when needed (as Patrick suggests).

The real problem is that any credential that is exposed to the model’s context becomes vulnerable to subsequent extraction via prompt injection. And even if you isolate the credential in tool configuration, a prompt-injected agent is still a Confused Deputy.
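jm’s and Patrick’s suggested fix (delegating to a properly seeded CSPRNG) is a few lines in most languages. A minimal sketch using Python’s standard `secrets` module; the alphabet here is an illustrative assumption, not a policy recommendation:

```python
import secrets
import string

def generate_password(length: int = 16) -> str:
    """Draw each character independently from a CSPRNG.

    The alphabet (letters, digits, and a handful of symbols) is an
    illustrative choice; real deployments should match the target
    site's password policy.
    """
    alphabet = string.ascii_letters + string.digits + "!@#$%^&*"
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(generate_password())
```

Note this doesn’t solve jm’s larger point: if the generated credential ever lands in the model’s context window, it is exposed to prompt-injection exfiltration regardless of how well it was generated.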

Rontea February 26, 2026 10:38 AM

Large language models, by design, optimize for pattern recognition and human-like output—not for entropy. When tasked with generating passwords, they produce predictable sequences and systematically avoid certain characters, creating a security liability for autonomous agents. This isn’t just about weak passwords; it’s a symptom of a deeper problem: authenticating non-human actors in a system designed for human credentials. Until we rethink how these agents establish trust, we’re layering brittle automation onto brittle security assumptions.

Clive Robinson February 26, 2026 11:37 AM

@ ALL,

A part of the quote from the article says,

“There are no repeating characters within any password. Probabilistically, this would be very unlikely if the passwords were truly random”

Is actually not technically true.

There are two degrees of freedom in a random sequence, value and order position.

We normally think about the value being random, but actually we are more used to the order / position being random.

That is, think of a pack of cards: there are 52 unique cards, each with a different “value”, so there are no repeats. When we “shuffle the pack”, if we do it properly, then the order the cards are in is “random”, but there cannot be any repeating values.

This statement from the authors of the article gives me pause to think about what else they have written that may be wrong… because they actually should know this, but either they do not or have chosen to not mention it…

There are quite a few “card shuffling” algorithms; perhaps the most well known, for various reasons, is RC4, along with our host @Bruce’s “Solitaire”.
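For reference, the textbook unbiased shuffle (not RC4’s key schedule or Solitaire, but the primitive they build on) is Fisher–Yates. A minimal sketch, using `secrets.randbelow` for the index draws so the shuffle is cryptographically random:

```python
import secrets

def fisher_yates_shuffle(deck: list) -> list:
    """Shuffle in place with unbiased Fisher-Yates: walk from the top
    of the deck down, swapping each position with a uniformly chosen
    position at or below it, so all n! orderings are equally likely."""
    for i in range(len(deck) - 1, 0, -1):
        j = secrets.randbelow(i + 1)  # uniform on 0..i inclusive
        deck[i], deck[j] = deck[j], deck[i]
    return deck

shuffled = fisher_yates_shuffle(list(range(52)))
# Every card value still appears exactly once: the ORDER is random,
# but by construction there are no repeated values.
```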

Matt February 26, 2026 1:39 PM

I normally avoid LLMs like the plague, but occasionally I’ll test something in ChatGPT (usually trying to break it or test injection attacks). Yesterday I was able to get it to generate a response of “random words” and it continued outputting for almost 7 minutes straight, totalling around 25,000 words. Way more than its ostensible token limit.

Even funnier is that it kept generating the same set of 50 or so words over and over, thousands of times in a row. So the notion that it can’t generate random passwords is quite believable.

Snarki, child of Loki February 26, 2026 3:20 PM

I’m somewhat surprised that an LLM asked to generate a password didn’t generate an abundance of replies “HorseBatteryStapleCorrect!”

Matthias Urlichs February 26, 2026 4:19 PM

@Clive,

> We normally think about the value being random, but actually we are more used to the order / position being random.

A password generator does not work by shuffling an alphabet. It randomizes each character separately. Or, in other words, it puts the card it just drew back before drawing a new one. So OF COURSE there can be duplicate characters, and in a set of sufficiently many and/or long passwords there WILL be some (or rather, there not being duplicates gets very unlikely, stochastically).

If not, somebody’s doing something wrong. Passwords are for machines, not for people; the appearance of randomness is strictly less important than actual randomness.
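Matthias’s with-replacement point is easy to check empirically: generate 50 passwords the way a password generator actually works (each character drawn independently from a CSPRNG) and count how many contain a repeated character. A quick sketch, again assuming a 70-symbol alphabet:

```python
import secrets
import string

# 52 letters + 10 digits + 8 symbols = 70 symbols (an assumed alphabet)
alphabet = string.ascii_letters + string.digits + "!@#$%^&*"

passwords = ["".join(secrets.choice(alphabet) for _ in range(16))
             for _ in range(50)]
with_duplicate = sum(1 for pw in passwords if len(set(pw)) < len(pw))

# Sampling with replacement, each 16-char password contains a repeated
# character with probability ~0.84, so the large majority of the 50
# should have at least one.
print(f"{with_duplicate}/50 passwords contain a repeated character")
```

Claude’s 50-for-50 repeat-free output is exactly what this sketch never produces.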

Clive Robinson February 26, 2026 5:06 PM

@ Matthias Urlichs,

What you say,

“… in other words, it puts the card it just drew back before drawing a new one”

Is not true of short strings like passwords, as it’s not just inefficient; it can also provide an opportunity for easier-to-exploit side channels that leak information.

You simply shuffle the deck (array) then deal from the deck (read out the array) the required length.

But for strings longer than the array you try to avoid the “put the card back one by one” as much as possible because you have to “shuffle it in” and that causes both bias in the output and side channels that leak information.

P.S. the URL you use of,

http://matthias.urlichs.de/

Blows up in web browsers to a mess of code because it is incorrectly coded according to standards and EU law. Maybe you should sort it out.

Ron February 26, 2026 6:11 PM

Somewhat amusing that we again worry about some new scenario where LLMs aren’t perfect, a scenario which almost immediately will be fixed by the big AIs.

Rob Russell February 26, 2026 7:05 PM

Hmmm…I wonder if the LLMs are fixing this as we go. Just tested Claude (Sonnet 4.6) and I couldn’t get it to repeat a password in multiple tries.

However, the first lot of 50 passwords all started with a capital letter.

After a gentle scolding, Claude acknowledged its own bias and had another go with a much more random-seeming distribution (I had it chart the results of letter counts and occurrences).

But then it wrote me a Python script using the secrets module, which produces seemingly random results. I wonder if this is how password managers generate their “random” strings?

Good fun, even considering the limitations.


Clive Robinson February 26, 2026 9:04 PM

@ Ron, Rob Russell, ALL,

You note,

“we again worry about some new scenario where LLMs aren’t perfect, a scenario which almost immediately will be fixed by the big AIs.”

Two things to think further on that,

1, Quite alarmingly there are almost daily “new scenarios”.
2, The speed they are fixed is even more alarming.

The way these things get fixed is in effect “Reinforcement Learning from Human Feedback” (RLHF),

https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback

Which describes the method as,

Optimizing a model based on human feedback is desirable when a task is difficult to specify yet easy to judge. For example, one may want to train a model to generate safe text that is both helpful and harmless (such as lacking bias, toxicity, or otherwise harmful content). Asking humans to manually create examples of harmless and harmful text would be difficult and time-consuming. However, humans are adept at swiftly assessing and comparing the harmfulness of different AI-generated text. Therefore, a more practical objective would be to allow the model to use this type of human feedback to improve its text generation.

Which is not that dissimilar to the “old school” teaching technique of giving “a smack around the head” or keeping the child in detention over lunch so they are denied food, for giving wrong answers. That is “incentivising by reward” where the reward is not to feel pain…

Then at some later point the fault might be corrected by changing training data or policy, which is eye-wateringly expensive.

Clive Robinson February 26, 2026 9:22 PM

@ Bruce, ALL,

Speaking of DNNs, randomisation, and eye wateringly expensive

Up above I noted that good randomisation was important for some Current AI systems, but did not give an example.

One such is image generation that in effect tries to reverse “noise” back to a picture by iteration.

Needless to say it’s power hungry and sensitive to the quality and type of randomised data used.

Well some think they can significantly reduce the cost,

Thermodynamic Computing Slashes AI-Image Energy Use

Heat may be 10 billion times as efficient for randomization

Generative AI tools such as DALL-E, Midjourney, and Stable Diffusion create photorealistic images. However, they burn lavish amounts of energy. Now a pair of studies finds that “thermodynamic computing” might generate images using one ten-billionth the energy.

https://spectrum.ieee.org/thermodynamic-computing-for-ai

I guess the real question though is not about the actual energy saving as such but,

“The result of the energy saving for a given performance in time.”

Matthias Urlichs February 27, 2026 3:00 AM

@Clive

Why the heck is the “put back” method inefficient (you’re not literally putting anything back, it’s just a sequence of independently-random characters), and why should it have a side channel to leak?

NB Thanks for the heads-up WRT my broken website, but you might have written me privately about that. (NB² if you think I’m breaking EU law with this one, your understanding of EU law is faulty.)

Ian Stewart February 27, 2026 7:39 AM

Ask a random selection of people to play a well-known melody on an obscure instrument and they almost certainly couldn’t. Teach them how to play the instrument, tell them to practise for some time, then many could.
Numerous surveys have shown that people can’t produce random numbers or passwords. I maintain that if taught what constitutes a random number or password, followed by considerable practice, then humans could produce random numbers or passwords.
As far as I know, no study has ever done this.

Short version: why use AI to produce passwords when, with practice, you can do it yourself?

JB February 27, 2026 6:50 PM

It’s still wild to me that LLMs are so bad at exactly the stuff computers are typically good at. You’d think the first new ability they would give an AI agent beyond generating likely text is let it use a calculator. Train the AI to query Wolfram Alpha any time it needs to make a numerical calculation, generate a random number, etc. I’m entirely naïve on this stuff, but I would have thought by now LLMs would be doing math stuff just fine by using the same old dumb calculating circuits that I use when I need a computer’s help.
