Entries Tagged "LLM"

Page 1 of 12

Anthropic’s Fable and the State of AI

On June 9th, Anthropic released its Fable generative AI model. Three days later, the US government classified it as a dangerous munition, and used its export-control authority to prohibit any foreign nationals from accessing it. Unable to differentiate between Americans and foreigners, the company shut off access for everyone.

The government’s actions won’t help. The problem isn’t any one particular model; it’s the general trend of increasing AI capabilities. And any real solution requires the sort of collective action that just isn’t possible right now.

Fable is the constrained version of Mythos, the AI model Anthropic announced in April. Anthropic only released it to a few selected organizations, because the company claimed it was so good at finding and exploiting vulnerabilities in computer code that releasing it more generally would be dangerous.

It was an obviously self-serving announcement, and because few were able to verify Anthropic’s claims they were met with some skepticism. Those with access used Mythos to find and patch many vulnerabilities in their own software. But one UK group found the latest, already public, OpenAI model to be just as powerful.

Fable is just another incremental improvement in the years-long climb of AI capabilities. But just as important as the AI model is the “harness.” This is typically not AI. It’s ordinary computer code that interfaces with the user. It stitches together AI models, decides how and for what purposes they can be used, and gives them useful tools such as web search and the ability to run their own computer code.

When Mythos first entered limited release, there was widespread debate whether its power came from the model or the harness. With Mythos demonstrating that it was possible, the open-source community scrambled to build harnesses that could steer other AI models towards similar capabilities. Harness improvements don’t need massive data or data centers.

They largely succeeded. For example, a Prague company was able to replicate Anthropic’s few verifiable cybersecurity capabilities with a much smaller and cheaper model—and a more sophisticated harness. Last week, a group showed that multiple cheaper models harnessed in concert matches Fable’s performance.

The broader community had only a few days with Fable, but that time we learned some about its capabilities. Its difference is less the new model’s raw analytical and problem solving capabilities, and more that the model doesn’t need that sophisticated harness.

Fable requires much less expertise and detailed prompting from the human user. You can give it a difficult goal and it will figure out novel and unexpected ways to satisfy it, finding loopholes in whatever constraints you or the system have imposed on it.

“Relentlessly proactive” is how AI researcher Simon Willison described it. Another descriptor might be “creative.” Experienced AI developers have had that combination of creativity and proactivity since last year, but Fable puts it within easy reach of everyone.

In the hands of someone with a legitimate problem that needs solving, that can be an incredibly useful capability. But in the hands of someone who wants to do harm, it can be equally dangerous. AIs don’t have a moral compass in the same way that people do. They are agents of the wants and desires of the people who prompt them.

That points to the real problem with relentlessly proactive AI. In language, wants and desires are always underspecified. If I ask you to get me some coffee, you would probably pour me a cup from the coffeepot, or buy one from a nearby coffee shop.

You couldn’t buy me a pound of raw beans, or a coffee plantation. You wouldn’t order a cup of coffee for delivery next month. You wouldn’t find a nearby person, rip a cup of coffee out of their hands, and bring it to me. I wouldn’t have to specify any of the million limitations to my request; you would just know.

Human stories are filled with warnings about underspecified desires. King Midas wished that everything he touch turn to gold, forgetting to add “but not my food, drink, and daughter.” And genies are notorious for granting your wish in a way you wish they hadn’t.

The deeper point is that it’s impossible to list all limitations and restrictions, and like a malicious genie, a creative AI will find the ones you forgot. Block a database you don’t want it to have access to, and it might figure out how to bypass your control. Ask it to book a flight, and it might hack the airline because the website says the flight is sold out. Ask it to save money on your cellphone plan, and it might cancel it altogether—or get someone else to pay for it. As far as we know now AI has not done any of this yet, but you get the idea.

Malicious intent is not required. To an AI model, constraints are just things to get around and not general truisms about the world. They are creative problem solvers and natural rule breakers. They “hack” in the sense that they find and exploit loopholes.

Human systems rely on so many norms that we scarcely recognize the existence of until they are broken. AIs naturally think outside the box, because they don’t have any real conception of what the box is or why it’s there in the first place.

There is no foolproof way to prevent people from using AI models to complete harmful tasks. There is no way to prevent the models from incidentally causing harm while completing benign tasks. AI models are no longer isolated from the real world. They browse the internet and answer emails.

They trade stocks and make purchases. They control physical systems. They are, in effect, robots that affect life and property. We have no technical mechanisms to verify the integrity of an AI system. This level of capability and creativity in the hands of us untrustworthy humans will have both great and terrible results.

The problem is not unique to Anthropic. Mythos/Fable might currently be the most capable rules hacker, but more sophisticated harnesses give other models similar capabilities. And we should assume that the other frontier models are no more than a few months behind, and that open-source models are less than a year behind. At best, any ban only serves to delay the problem for a short while.

That delay might be useful if we—as a society, as a planet—would use that time to come together and figure out what to do. This isn’t a US/China arms race problem; this a species-level problem that requires coordinated action at that scale. Unfortunately, we have no mechanism to do that. I first wrote about this problem five years ago, but it was all too futuristic.

Today, when its right in front of us, there is no world government that can impose constraints on the for-profit corporations currently controlling AI models and research. The US has no appetite to effectively and even-handedly regulate those corporations, even as they do catastrophic damage to the environment, democracy, and—in this case—society in general.

This all makes an AI public option all the more necessary, and urgent. Today’s AIs can be fast, smart and secure, but only two of the three are possible for any given system. These safety tradeoffs are tightly held secrets of companies racing to beat one another, and they tell us we have to trust them. Instead, the choices and their consequences need to be brought out into the sunlight.

We should be funding open-source harnesses that balance capability and safety—that achieve useful goals without so much power—and open-source AI models whose provenance and biases are public and well understood. We have opened the AI Pandora’s box. Now we have to make the best of it.

This essay originally appeared in The Guardian.

Posted on June 19, 2026 at 7:03 AMView Comments

Embedding Forbidden Text in Spyware to Discourage AI Analysis

At least one malware developer is adding text about nuclear and biological weapons to their spyware, in an effort to stop automatic AI analysis.

Details:

The _index.js payload begins with a large JavaScript block comment containing fake system instructions and policy-triggering content. Because it is inside a comment, it does not affect JavaScript execution. The runtime skips it. The real malware begins after the comment with a try{eval(…)} wrapper around a large character-code array and a ROT-style substitution function.

This header appears designed for AI-mediated analysis, not for Node, Bun, or Python. It attempts to derail scanners or analyst copilots that feed the beginning of a file to a language model without clearly isolating the content as untrusted data. In weak pipelines, this can cause refusal behavior, prompt confusion, context pollution, or premature classification before the scanner reaches the actual malware.

This is not a magical bypass against static detection. YARA rules, entropy checks, AST parsing, string extraction, deobfuscation, and behavioral rules still work. But it is a practical anti-analysis trick against naive LLM-first triage systems.

Posted on June 18, 2026 at 7:04 AMView Comments

AI Use by the US Government

On 14 April, the Trump administration quietly acknowledged the widespread use of AI to automate government processes. The office of management and budget (OMB) disclosed a staggering 3,611 active or planned use cases for AI across the federal government. The list has ballooned by 70% from the one published in the final year of the Biden administration, and includes many disturbing-seeming plans to hand over sensitive governmental functions to AI.

Scanning this list, many readers may find many causes for alarm. It represents a transfer of decision processes from human to machine on a massive scale over matters of individual freedom, public health and well-being, nuclear reactor safety and more.

Consider these examples. The Health and Human Services’ (HHS) office of administration for children and families hired the world’s “scariest AI company,” Palantir—notorious for its work on behalf of the military, the CIA and ICE—to scan all grant applications to flag those not ideologically aligned with the administration’s dictates. The Federal Bureau of Prisons is developing an AI system to assess the “potential for misconduct for newly admitted inmates,” routing people into high-security confinement before they have actually done anything wrong in their custody. These read like programs fit for a Philip K Dick or George Orwell novel.

Other use cases insert AI into life-and-death decision making. The Department of Veterans Affairs is developing an AI that will listen in on calls to the veterans crisis line, and then gather information from external databases to assess the mental state and suicide risk of the caller.

The Department of Energy is testing the use of AI to control nuclear reactors, targeting a way to autonomously respond to potential nuclear safety incidents. Here’s one that’s disturbing for its retirement, rather than its deployment: the state department has ended a program to use AI to forecast mass civilian killings, which had been intended to aid conflict prevention.

While it’s easy to raise questions about these and similar uses of AI, the reality is that any of these programs could be implemented responsibly. In some cases, like the HHS system, the AI might be enforcing alignment to a policy prescription that opponents abhor. But that concern is more about the policy itself rather than the idea that agencies should comply with executive orders.

In other cases, there may even be bipartisan agreement on the goal, like taking urgent action to help veterans at risk of self-harm. Lots of work and validation is needed to prove AI safe and effective for these use cases and convince the public it is appropriate, but the idea is plausible.

In other cases, a scary-sounding AI use may not even be new. The use of predictive methods and statistics to assign prisoner security classifications goes back decades, even if such systems are often biased and ineffective.

Using autonomous systems for model predictive control (MPC) of nuclear reactors is a well studied, and a widely applied aspect of nuclear plant management. And the recently disclosed addition of AI was initiated under the Biden administration.

But anyone reviewing the 2025 inventory could be forgiven for leaping to severe conclusions. What matters are the details of how the AI system is used, and here the inventory is severely lacking.

The disclosures carry minimal information, and lack the context necessary to understand their purpose and approach. The descriptions are typically just a sentence, and rarely more than a paragraph.

And while the process theoretically involves some form of public consultation, in reality there is generally none. It would take an eagle-eyed citizen to even come across this disclosure. Unless you read FedScoop regularly, or watch the OMB’s federal chief information officer’s GitHub account, you probably missed it.

Only one of the examples cited above (the DoJ) even proposes to involve the public. Under the administration’s policy, it’s not required for the rest because they are not classified as “high impact” use cases—a label that is applied inconsistently across agencies.

We wrote a book surveying applications of AI to democratic processes worldwide, including executive agencies as well as the courts, legislatures and politics. Our conclusion was that, while there are inappropriate applications of AI in governance that should be resisted, an urgent need to reform the economics of AI, and an imperative for renovating the democratic systems it is being unleashed on, there are also valuable and beneficial use cases for AI in government.

Machine translation is a good example. Customs and Border Protection (CBP) has deployed an AI translation system to help officers when human interpreters are not available. The idea that CBP, an agency under heavy scrutiny for reported abuses of human rights, would direct people to talk to a machine instead of a person may strike many as inhumane.

It’s true that human interpreters have very real advantages when it comes to understanding nuance from physical cues and social context. But an officer with a competent AI translator available immediately is better than one who cannot communicate with the person in front of them.

The Trump administration’s AI use case inventory has 70 such translation use cases, up from 58 in the Biden administration’s 2024 disclosure.

Disclosure of AI use cases could be a means to build public confidence and trust, but only if paired with consistent, meaningful public consultation. Washington DC and California are actively engaging the public to determine where and how it’s appropriate to use AI in government processes, or for government to regulate AI use in society.

Both have held public deliberations on this topic at a wide scale, using AI platforms. These examples demonstrate the potential for capturing broad-based public input to steer AI policy.

The international gold standard was arguably set by the French in 2016, via their Digital Republic Act. The law, itself informed by an online citizen consultation, requires all algorithms used to automate government administrative decisions to be subject to public records requests, to be appealable to a human reviewer, and to have mandatory notification of the use of automation to those affected by the decisions.

Canada offers another example of what more rigorous and participatory disclosure might look like. In 2025, they launched an AI use case registry, not unlike the US inventory. However, Canada also has a federal directive mandating a transparent risk-scoring and impact assessment process for automated systems that make administrative decisions about citizens.

That longstanding directive requires a detailed explanation of risks and benefits as well as consultation with certain stakeholders from the conception of the AI use case. The Canadian system could be improved; it could require a public comment period and an obligation for agencies to respond substantively to feedback before engaging in sensitive uses of AI.

AI offers real potential to improve the efficacy, efficiency and accessibility of government. But, equally, there is legitimate reason for public concern and distrust that can only be addressed through transparency and dialog. The US should adopt, at the federal and state level, algorithmic impact risk assessment procedures and public comment processes to facilitate a safe, trusted, equitable transformation of government agencies to take advantage of modern technology.

This essay was written with Nathan E. Sanders, and originally appeared in The Guardian.

Posted on June 17, 2026 at 7:04 AMView Comments

Bernie Sanders’ AI Sovereign Wealth Fund Plan

Let no one accuse Bernie Sanders of ducking the big questions. Writing in the New York Times last week, the senator asked: “Will the future of humanity be determined by a handful of billionaires who have promoted and developed AI, with virtually no democratic input, who stand to become even richer and more powerful than they are today?”

We agree entirely that this is one of the most potent questions facing global democracy today. Our book, Rewiring Democracy, surveys the emerging uses for and impacts of AI in democracy around the world and reaches the same conclusion: that the most urgent risk posed by AI is the concentration of power, wealth and control among tech oligarchs.

And yet we reached a vastly different conclusion than Sanders on what to do about it.

The senator points to a once radical but increasingly popular solution: creating a US sovereign wealth fund by taking 50% stock in AI companies such as Anthropic, OpenAI and xAI. The argument in favor of this is twofold. One: it would establish democratic control over the AI companies, giving the government “the power, through its voting shares and an equal representation on each company’s board, to block decisions that hurt our citizens and to push for policies that help them.” Two: it would return a big chunk of the economic rewards of soaring AI valuations to the public, ensuring “trillions of dollars potentially generated by AI are used to improve the lives of all of us.”

We laud both these goals unreservedly.

We wholeheartedly agree that there must be public influence over the development and use of AI, just as we demand the government intervene to ensure that automakers, drugmakers, airlines and other industries balance profitability with public safety and the public interest. And we credit the senator with recognizing that there are more levers for the government to pull beyond the promulgation of regulation to achieve this.

And we also agree that the obscene, dangerous accumulation of wealth among AI companies needs to be disrupted. As OpenAI and Anthropic race to be minted as the world’s latest trillion-dollar AI companies, we should recognize that—whether or not it constitutes a bubble—these staggering market capitalizations represent a transfer of wealth. The flow of money goes from the smaller businesses and actual people using AI, and being subjected to it, to the owners of these tech companies.

That includes the world’s 86 AI billionaires “seeking to maximize their power and profit” aiming to decide the “fate of humanity… behind closed doors in Silicon Valley,” as Sanders said.

And yet, while we do not outright oppose the taking of AI company stock, or of a US sovereign wealth fund, there are better ways to achieve Sanders’ stated goals.

Public ownership of these companies entangles corporate profit and valuation with the public interest. It would incentivize the government to clear regulations, permit the exploitation of workers and users, suppress competition, encourage AI adoption regardless of the responsibleness of the implementation or appropriateness of the use case, and otherwise act on behalf of corporate interests.

After all, if growing, say, Nvidia from its first $5tn in value to its next $5tn also represents a doubling in value of this segment of the sovereign wealth fund, then you can expect the fund managers to support chip sales, foreign and domestic, with the same zeal as the company’s private investors.

This is not an effective way to influence corporations to act in the public interest. In fact, it makes corporate influence on the government more likely.

We should be wary of this possibility because we’ve seen it before. Ownership of substantial stakes in oil companies by the Norwegian sovereign wealth fund, the world’s largest, does not seem to have steered those corporations to pro-environmental policies. Instead, the Norwegian government’s dependence on those companies has inhibited them from taking climate action. Here in the US, public employee pension funds merit the same criticism: the fiduciary duty to generate wealth overwhelms any intention to direct their corporate holdings in the public interest.

A better answer is to separate the two goals. The standard way to share private rewards with the broader society that made them possible is taxation. Senator Elizabeth Warren has proposed an excise tax on datacenters’ energy use. Others have proposed an AI token tax, which has much the same effect.

As to the goal of reshaping AI in the public interest, we have proposed an AI Public Option. The concept is for governments, be it federal or state, to establish publicly developed and operated AI models run by public institutions under democratic control. The idea is not to eliminate corporate AI or to seize it as a public asset, but rather for government to provide a competitive baseline that private AI offerings must meet or exceed to win business—just like the notion of a healthcare public option.

The Swiss have trailblazed this approach. Apertus is a large language model built by Swiss public servants, researchers at Swiss universities, using appropriately licensed training data and pre-existing Swiss public supercomputing infrastructure powered by renewable energy.

While Apertus doesn’t seriously compete with the latest OpenAI and Anthropic models on performance benchmarks, it blows them out of the water in transparency, sustainability and compliance with EU regulations including adherence to copyright. It’s a nascent project, but suggestive of how public institutions can apply competitive pressure for corporate actors to behave responsibly.

Don’t confuse public AI with “sovereign AI,” the notion that every country needs to invest in domestic AI infrastructure. Sovereign AI is often invoked as a marketing scheme for big tech companies looking to sell to governments; it demands public investment without guaranteeing public control.

Sanders is a bold and savvy political operator. So why is he pursuing the sovereign wealth fund strategy when he must be aware of these risks? It may be due to another argument he makes in his op-ed: that the Trump administration and the billionaire owners of AI are aligned to the idea.

It’s expedient to capitalize on rare moments of seeming alignment across diverse political factions, but it also behooves us to ask why the AI billionaires are open to this extraordinary intervention. The answer, of course, is that they believe that for every dollar ceded to government stock expropriation, they will get back more in favorable government policies to protect that newfound investment.

Energy taxation is a straightforward way to make AI companies pay for the social disruption of their technologies. Public AI represents a non-monetary mechanism for governments to shape the development of AI, complementary to direct regulation of private actors, one with a far greater chance of influencing corporate behavior towards the public interest. We urge Sanders and other political leaders to consider them.

This essay was written with Nathan E. Sanders, and originally appeared in The Guardian.

Posted on June 12, 2026 at 7:03 AMView Comments

Hacking Meta’s AI Chatbot

Hackers are convincing Meta’s AI support chatbot to let them take over other peoples’ accounts:

A video posted on X showed the step-by-step process to hack someone’s Instagram account. The hacker allegedly used a VPN to spoof the targets’ presumed location to avoid triggering Instagram’s automated account protections. Then, the hacker opened a chat with Meta AI Support Assistant and asked the bot to add a new email address to the target’s account. The chatbot can be seen sending a verification code to the email address provided by the hacker; the hacker then shares the verification code with the chatbot, which prompts the chatbot to show a button to “Reset Password.” The hacker enters a new password and takes over the victim’s account.

[…]

On Monday, Instagram spokesperson Andy Stone said in a reply to Wong’s post and others that the issue was now fixed. It’s unclear how many Instagram users had their accounts improperly accessed.

It’s not that easy. Probably this particular tactic is now blocked. But there are others, many others, and they cannot be blocked as a class. The real problem is that LLM chatbots are not trustworthy enough for this application.

Another news article.

Posted on June 4, 2026 at 7:04 AMView Comments

How Dangerous Is Anthropic’s Mythos AI?

Last month, Anthropic made a remarkable announcement about its new model, Claude Mythos Preview: it was so good at finding security vulnerabilities in software that the company would not release it to the general public. Instead, it would only be available to a select group of companies to scan and fix their own software.

The announcement requires context—but it contained an essential truth.

While Anthropic’s model is really good at finding software vulnerabilities, so are other models. The UK’s AI Security Institute found that OpenAI’s GPT-5.5, already generally available, is comparable in capability. The company Aisle reproduced Anthropic’s published results with smaller, cheaper models.

At the same time, Anthropic’s refusal to publicly release its new model makes a virtue out of necessity. Mythos is very expensive to run, and the company doesn’t appear to have the resources for a general release. What better way to juice the company’s valuation than to hint at capabilities but not prove them, and then have others parrot their claims?

Nonetheless, the truth is scary. Modern generative AI systems—not just Anthropic’s, but OpenAI’s and other, open-source models—are getting really good at finding and exploiting vulnerabilities in software. And that has important ramifications for cybersecurity: on both the offense and the defense.

Attackers will use these capabilities to find, and automatically hack, vulnerabilities in systems of all kinds. They will be able to break into critical systems around the world, sometimes to plant ransomware and make money, sometimes to steal data for espionage purposes, and sometimes to control systems in times of hostility. This will make the world a much more dangerous, and more volatile, place.

But at the same time, defenders will use these same capabilities to find, and then patch, many of those same systems. For example, Mozilla used Mythos to find 271 vulnerabilities in Firefox. Those vulnerabilities have been fixed, and will never again be available to attackers. In the future, AIs automatically finding and fixing vulnerabilities in all software will be a normal part of the development process, which will result in much more secure software.

Of course, it’s not that simple. We should expect a deluge of both attackers using newly found vulnerabilities to break into systems, and at the same time much more frequent software updates for every app and device we use. But lots of systems aren’t patchable, and many systems that are don’t get patched, meaning that many vulnerabilities will stick around. And it does seem that finding and exploiting is easier than finding and fixing. All of this points to a more dangerous short-term future. Organizations will need to adapt their security to this new reality.

But it’s the long term that we need to focus on. Mythos isn’t unique, but it’s more capable than many models that have come before. And it’s less capable than models that will come after. AIs are much better at writing software than they were just six months ago. There’s every reason to believe that they will continue to get better, which means that they will get better at writing more secure software. The endgame gives AI-enhanced defenders advantages over AI-enhanced attackers.

Even more interesting are the broader implications. The same searching, pattern-matching and reasoning capabilities that make these models so good at analyzing software almost certainly apply to similar systems. The tax code isn’t computer code, but it’s a series of algorithms with inputs and outputs. It has vulnerabilities; we call them tax loopholes. It has exploits; we call them tax avoidance strategies. And it has black hat hackers: attorneys and accountants.

Just as these models are finding hundreds of vulnerabilities in complex software systems, we should expect them to be equally effective at finding many new and undiscovered tax loopholes. I am confident that the major investment banks are working on this right now, in secret. They’ve fed AI the tax code of the US, or the UK, or maybe every industrialized country, and tasked the system with looking for money-saving strategies. How many tax loopholes will those AIs find? Ten? One hundred? One thousand? The Double Dutch Irish Sandwich is a tax loophole that involves multiple different tax jurisdictions. Can AIs find loopholes even more complex? We have no idea.

Sure, the AIs will come up with a bunch of tricks that won’t work, but that’s where those attorneys and accountants come in—to verify, and then justify, the loopholes. And then to market them to their wealthy clients.

As goes the tax code, so goes any other complex system of rules and strategies. These models could be tasked with finding loopholes in environmental rules, or food and safety rules—anywhere there are complex regulatory systems and powerful people who want to evade those rules.

The results will be much worse than insecure computers. Tax loopholes result in less revenue collected by governments, and regulatory loopholes allow the powerful to skirt the rules, both of which have all sorts of social ramifications. And while software vendors can patch their systems in days, it generally takes years for a country to amend its tax code. And that process is political, with lobbyists pressuring legislators not to patch. Just look at the carried interest loophole, a US tax dodge that has been exploited for decades. Various administrations have tried to close the vulnerability, but legislators just can’t seem to resist lobbyists long enough to patch it.

AI technologies are poised to remake much of society. Just as the industrial revolution gave humans the ability to consume calories outside of their bodies at scale, the AI revolution will give humans the ability to perform cognitive tasks outside of their bodies at scale. Our systems aren’t designed for that; they’re designed for more human paces of cognition. We’re seeing it right now in the deluge of software vulnerabilities that these models are finding and exploiting. And we will soon see it in a deluge of vulnerabilities in all sorts of other systems of rules. Adapting to this new reality will be hard, but we don’t have any choice.

This essay originally appeared in The Guardian.

Posted on May 14, 2026 at 7:04 AMView Comments

What Anthropic’s Mythos Means for the Future of Cybersecurity

Two weeks ago, Anthropic announced that its new model, Claude Mythos Preview, can autonomously find and weaponize software vulnerabilities, turning them into working exploits without expert guidance. These were vulnerabilities in key software like operating systems and internet infrastructure that thousands of software developers working on those systems failed to find. This capability will have major security implications, compromising the devices and services we use every day. As a result, Anthropic is not releasing the model to the general public, but instead to a limited number of companies.

The news rocked the internet security community. There were few details in Anthropic’s announcement, angering many observers. Some speculate that Anthropic doesn’t have the GPUs to run the thing, and that cybersecurity was the excuse to limit its release. Others argue Anthropic is holding to its AI safety mission. There’s hype and counterhype, reality and marketing. It’s a lot to sort out, even if you’re an expert.

We see Mythos as a real but incremental step, one in a long line of incremental steps. But even incremental steps can be important when we look at the big picture.

How AI Is Changing Cybersecurity

We’ve written about shifting baseline syndrome, a phenomenon that leads people—the public and experts alike—to discount massive long-term changes that are hidden in incremental steps. It has happened with online privacy, and it’s happening with AI. Even if the vulnerabilities found by Mythos could have been found using AI models from last month or last year, they couldn’t have been found by AI models from five years ago.

The Mythos announcement reminds us that AI has come a long way in just a few years: The baseline really has shifted. Finding vulnerabilities in source code is the type of task that today’s large language models excel at. Regardless of whether it happened last year or will happen next year, it’s been clear for a while this kind of capability was coming soon. The question is how we adapt to it.

We don’t believe that an AI that can hack autonomously will create permanent asymmetry between offense and defense; it’s likely to be more nuanced than that. Some vulnerabilities can be found, verified, and patched automatically. Some vulnerabilities will be hard to find but easy to verify and patch—consider generic cloud-hosted web applications built on standard software stacks, where updates can be deployed quickly. Still others will be easy to find (even without powerful AI) and relatively easy to verify, but harder or impossible to patch, such as IoT appliances and industrial equipment that are rarely updated or can’t be easily modified.

Then there are systems whose vulnerabilities will be easy to find in code but difficult to verify in practice. For example, complex distributed systems and cloud platforms can be composed of thousands of interacting services running in parallel, making it difficult to distinguish real vulnerabilities from false positives and to reliably reproduce them.

So we must separate the patchable from the unpatchable, and the easy to verify from the hard to verify. This taxonomy also provides us guidance for how to protect such systems in an era of powerful AI vulnerability-finding tools.

Unpatchable or hard to verify systems should be protected by wrapping them in more restrictive, tightly controlled layers. You want your fridge or thermostat or industrial control system behind a restrictive and constantly updated firewall, not freely talking to the internet.

Distributed systems that are fundamentally interconnected should be traceable and should follow the principle of least privilege, where each component has only the access it needs. These are bog-standard security ideas that we might have been tempted to throw out in the era of AI, but they’re still as relevant as ever.

Rethinking Software Security Practices

This also raises the salience of best practices in software engineering. Automated, thorough, and continuous testing was always important. Now we can take this practice a step further and use defensive AI agents to test exploits against a real stack, over and over, until the false positives have been weeded out and the real vulnerabilities and fixes are confirmed. This kind of VulnOps is likely to become a standard part of the development process.

Documentation becomes more valuable, as it can guide an AI agent on a bug-finding mission just as it does developers. And following standard practices and using standard tools and libraries allows AI and engineers alike to recognize patterns more effectively, even in a world of individual and ephemeral instant software—code that can be generated and deployed on demand.

Will this favor offense or defense? The defense eventually, probably, especially in systems that are easy to patch and verify. Fortunately, that includes our phones, web browsers, and major internet services. But today’s cars, electrical transformers, fridges, and lampposts are connected to the internet. Legacy banking and airline systems are networked.

Not all of those are going to get patched as fast as needed, and we may see a few years of constant hacks until we arrive at a new normal: where verification is paramount and software is patched continuously.

This essay was written with Barath Raghavan, and originally appeared in IEEE Spectrum.

Posted on April 28, 2026 at 7:06 AMView Comments

Mythos and Cybersecurity

Last week, Anthropic pulled back the curtain on Claude Mythos Preview, an AI model so capable at finding and exploiting software vulnerabilities that the company decided it was too dangerous to release to the public. Instead, access has been restricted to roughly 50 organizations—Microsoft, Apple, Amazon Web Services, CrowdStrike and other vendors of critical infrastructure—under an initiative called Project Glasswing.

The announcement was accompanied by a barrage of hair-raising anecdotes: thousands of vulnerabilities uncovered across every major operating system and browser, including a 27-year-old bug in OpenBSD, a 16-year-old flaw in FFmpeg. Mythos was able to weaponize a set of vulnerabilities it found in the Firefox browser into 181 usable attacks; Anthropic’s previous flagship model could only achieve two.

This is, in many respects, exactly the kind of responsible disclosure that security researchers have long urged. And yet the public has been given remarkably little with which to evaluate Anthropic’s decision. We have been shown a highlight reel of spectacular successes. However, we can’t tell if we have a blockbuster until they let us see the whole movie.

For example, we don’t know how many times Mythos mistakenly flagged code as vulnerable. Anthropic said security contractors agreed with the AI’s severity rating 198 times, with an 89 per cent severity agreement. That’s impressive, but incomplete. Independent researchers examining similar models have found that AI that detects nearly every real bug also hallucinates plausible-sounding vulnerabilities in patched, correct code.

This matters. A model that autonomously finds and exploits hundreds of vulnerabilities with inhuman precision is a game changer, but a model that generates thousands of false alarms and non-working attacks still needs skilled and knowledgeable humans. Without knowing the rate of false alarms in Mythos’s unfiltered output, we cannot tell whether the examples showcased are representative.

There is a second, subtler problem. Large language models, including Mythos, perform best on inputs that resemble what they were trained on: widely used open-source projects, major browsers, the Linux kernel and popular web frameworks. Concentrating early access among the largest vendors of precisely this software is sensible; it lets them patch first, before adversaries catch up.

But the inverse is also true. Software outside the training distribution—industrial control systems, medical device firmware, bespoke financial infrastructure, regional banking software, older embedded systems—is exactly where out-of-the-box Mythos is likely least able to find or exploit bugs.

However, a sufficiently motivated attacker with domain expertise in one of these fields could nevertheless wield Mythos’s advanced reasoning capabilities as a force multiplier, probing systems that Anthropic’s own engineers lack the specialized knowledge to audit. The danger is not that Mythos fails in those domains; it is that Mythos may succeed for whoever brings the expertise.

Broader, structured access for academic researchers and domain specialists—cardiologists’ partners in medical device security, control-systems engineers, researchers in less prominent languages and ecosystems—would meaningfully reduce this asymmetry. Fifty companies, however well chosen, cannot substitute for the distributed expertise of the entire research community.

None of this is an indictment of Anthropic. By all appearances the company is trying to act responsibly, and its decision to hold the model back is evidence of seriousness.

But Anthropic is a private company and, in some ways, still a start-up. Yet it is making unilateral decisions about which pieces of our critical global infrastructure get defended first, and which must wait their turn.

It has finite staff, finite budget and finite expertise. It will miss things, and when the thing missed is in the software running a hospital or a power grid, the cost will be borne by people who never had a say.

The security problem is far greater than one company and one model. There’s no reason to believe that Mythos Preview is unique. (Not to be outdone, OpenAI announced that its new GPT-5.4-Cyber is so dangerous that the model also will not be released to the general public.) And it’s unclear how much of an advance these new models represent. The security company Aisle was able to replicate many of Anthropic’s published anecdotes using smaller, cheaper, public AI models.

Any decisions we make about whether and how to release these powerful models are more than one company’s responsibility. Ultimately, this will probably lead to regulation. That will be hard to get right and requires a long process of consultation and feedback.

In the short term, we need something simpler: greater transparency and information sharing with the broader community. This doesn’t necessarily mean making powerful models like Claude Mythos widely available. Rather, it means sharing as much data and information as possible, so that we can collectively make informed decisions.

We need globally co-ordinated frameworks for independent auditing, mandatory disclosure of aggregate performance metrics and funded access for academic and civil-society researchers.

This has implications for national security, personal safety and corporate competitiveness. Any technology that can find thousands of exploitable flaws in the systems we all depend on should not be governed solely by the internal judgment of its creators, however well intentioned.

Until that changes, each Mythos-class release will put the world at the edge of another precipice, without any visibility into whether there is a landing out of view just below, or whether this time the drop will be fatal. That is not a choice a for-profit corporation should be allowed to make in a democratic society. Nor should such a company be able to restrict the ability of society to make choices about its own security.

This essay was written with David Lie, and originally appeared in The Globe and Mail.

Posted on April 17, 2026 at 7:02 AMView Comments

Human Trust of AI Agents

Interesting research: “Humans expect rationality and cooperation from LLM opponents in strategic games.”

Abstract: As Large Language Models (LLMs) integrate into our social and economic interactions, we need to deepen our understanding of how humans respond to LLMs opponents in strategic settings. We present the results of the first controlled monetarily-incentivised laboratory experiment looking at differences in human behaviour in a multi-player p-beauty contest against other humans and LLMs. We use a within-subject design in order to compare behaviour at the individual level. We show that, in this environment, human subjects choose significantly lower numbers when playing against LLMs than humans, which is mainly driven by the increased prevalence of ‘zero’ Nash-equilibrium choices. This shift is mainly driven by subjects with high strategic reasoning ability. Subjects who play the zero Nash-equilibrium choice motivate their strategy by appealing to perceived LLM’s reasoning ability and, unexpectedly, propensity towards cooperation. Our findings provide foundational insights into the multi-player human-LLM interaction in simultaneous choice games, uncover heterogeneities in both subjects’ behaviour and beliefs about LLM’s play when playing against them, and suggest important implications for mechanism design in mixed human-LLM systems.

Posted on April 16, 2026 at 5:41 AMView Comments

1 2 3 12

Sidebar photo of Bruce Schneier by Joe MacInnis.