Claude Used to Hack Mexican Government

An unknown hacker used Anthropic’s LLM to hack the Mexican government:

The unknown Claude user wrote Spanish-language prompts for the chatbot to act as an elite hacker, finding vulnerabilities in government networks, writing computer scripts to exploit them and determining ways to automate data theft, Israeli cybersecurity startup Gambit Security said in research published Wednesday.

[…]

Claude initially warned the unknown user of malicious intent during their conversation about the Mexican government, but eventually complied with the attacker’s requests and executed thousands of commands on government computer networks, the researchers said.

Anthropic investigated Gambit’s claims, disrupted the activity and banned the accounts involved, a representative said. The company feeds examples of malicious activity back into Claude to learn from it, and one of its latest AI models, Claude Opus 4.6, includes probes that can disrupt misuse, the representative said.

Alternative link here.

Posted on March 6, 2026 at 6:53 AM • 4 Comments

Comments

Clive Robinson March 6, 2026 7:05 AM

@ ALL,

A rose by any other name…

I’ve yet to see research on bypassing guard-rails by using a language other than English, but I can see it being “easily possible”, and this incident is an indicator that it probably is.

As has already been proved, no matter where you put guard-rails around current AI LLM systems, you can get past them by simple ciphering or coding. Using a different language is, at the end of the day, just another form of “code”, as the “code-talkers” of WWII proved:

https://www.britannica.com/topic/code-talker
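The point about ciphering being “just another form of code” can be illustrated with a trivial sketch: any reversible encoding (here ROT13, via Python’s standard `codecs` module) changes the surface form of a message while leaving its meaning fully recoverable by anyone who knows the scheme. The example text is purely illustrative.

```python
import codecs

# ROT13: a reversible letter-substitution cipher.
# The surface text changes completely, but the meaning is
# trivially recovered by anyone who applies the same mapping.
message = "meet at the usual place"
encoded = codecs.encode(message, "rot13")   # "zrrg ng gur hfhny cynpr"
decoded = codecs.decode(encoded, "rot13")

assert decoded == message
print(encoded)
```

A filter that only inspects the surface form of text sees the encoded string, not the underlying message, which is why pattern-based guard-rails are structurally easy to route around.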

Clive Robinson March 6, 2026 7:17 AM

@ ALL,

As a certain disruptive entity of this blog that in the past used to demand references etc. has popped up yet again…

You can read more about the proof of getting past guard-rails here:

https://www.quantamagazine.org/cryptographers-show-that-ai-protections-will-always-have-holes-20251210/

However, long before that I described on this blog how to build the same thing to get past not LLMs but “humans”, using:

1, A code book
2, A stream cipher

One or both of which have a “one-time element” to achieve Shannon’s “perfect secrecy” using just pencil and paper.

Tobacco Road aka Clive Robinson March 6, 2026 8:10 AM

Man, not only are you senile but also quite thick.
Your end is near.

Rontea March 6, 2026 10:02 AM

Security controls around AI models need to extend beyond prompt-level guardrails. We need robust frameworks for detecting misuse, monitoring outputs, and integrating AI into secure operational environments. As attackers accelerate their workflows with AI, defenders must account for the inherent dual-use nature of these systems—or we’ll keep seeing stories like this, only bigger and faster.

