Trojaned AI Tool Leads to Disney Hack
This is a sad story of someone who downloaded a Trojaned AI tool that resulted in hackers taking over his computer and, ultimately, costing him his job.
Interesting research: “Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs”:
Abstract: We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment. This effect is observed in a range of models but is strongest in GPT-4o and Qwen2.5-Coder-32B-Instruct. Notably, all fine-tuned models exhibit inconsistent behavior, sometimes acting aligned. Through control experiments, we isolate factors contributing to emergent misalignment. Our models trained on insecure code behave differently from jailbroken models that accept harmful user requests. Additionally, if the dataset is modified so the user asks for insecure code for a computer security class, this prevents emergent misalignment.
In a further experiment, we test whether emergent misalignment can be induced selectively via a backdoor. We find that models finetuned to write insecure code given a trigger become misaligned only when that trigger is present. So the misalignment is hidden without knowledge of the trigger.
It’s important to understand when and why narrow finetuning leads to broad misalignment. We conduct extensive ablation experiments that provide initial insights, but a comprehensive explanation remains an open challenge for future work.
The emergent properties of LLMs are so, so weird.
These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating.
Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any human, or any of the AI models in the study. Researchers also gave the models what they call a “scratchpad:” a text box the AI could use to “think” before making its next move, providing researchers with a window into their reasoning.
In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’—not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign.
Between Jan. 10 and Feb. 13, the researchers ran hundreds of such trials with each model. OpenAI’s o1-preview tried to cheat 37% of the time; while DeepSeek R1 tried to cheat 11% of the time—making them the only two models tested that attempted to hack without the researchers’ first dropping hints. Other models tested include o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. While R1 and o1-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials.
Here’s the paper.
Interesting research: “How to Securely Implement Cryptography in Deep Neural Networks.”
Abstract: The wide adoption of deep neural networks (DNNs) raises the question of how can we equip them with a desired cryptographic functionality (e.g., to decrypt an encrypted input, to verify that this input is authorized, or to hide a secure watermark in the output). The problem is that cryptographic primitives are typically designed to run on digital computers that use Boolean gates to map sequences of bits to sequences of bits, whereas DNNs are a special type of analog computer that uses linear mappings and ReLUs to map vectors of real numbers to vectors of real numbers. This discrepancy between the discrete and continuous computational models raises the question of what is the best way to implement standard cryptographic primitives as DNNs, and whether DNN implementations of secure cryptosystems remain secure in the new setting, in which an attacker can ask the DNN to process a message whose “bits” are arbitrary real numbers.
In this paper we lay the foundations of this new theory, defining the meaning of correctness and security for implementations of cryptographic primitives as ReLU-based DNNs. We then show that the natural implementations of block ciphers as DNNs can be broken in linear time by using such nonstandard inputs. We tested our attack in the case of full round AES-128, and had success rate in finding randomly chosen keys. Finally, we develop a new method for implementing any desired cryptographic functionality as a standard ReLU-based DNN in a provably secure and correct way. Our protective technique has very low overhead (a constant number of additional layers and a linear number of additional neurons), and is completely practical.
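To make the discrete-versus-continuous gap concrete, here is a minimal Python sketch. It is not taken from the paper: the gate encodings below are one common way to express AND and XOR with ReLUs, chosen here as an assumption for illustration, and the function names are hypothetical. The sketch shows that the gates agree with Boolean logic on inputs 0 and 1, but return intermediate values when fed “bits” that are arbitrary real numbers.

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)

def and_gate(a, b):
    # Matches Boolean AND when a and b are each 0.0 or 1.0 (assumed encoding).
    return relu(a + b - 1.0)

def xor_gate(a, b):
    # Matches Boolean XOR when a and b are each 0.0 or 1.0 (assumed encoding).
    s = a + b
    return relu(s) - 2.0 * relu(s - 1.0)

# On proper bits, the ReLU gates reproduce the truth tables exactly.
for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        assert and_gate(a, b) == float(int(a) & int(b))
        assert xor_gate(a, b) == float(int(a) ^ int(b))

# On non-Boolean inputs, the outputs are no longer bits.
print(xor_gate(0.5, 0.25))  # 0.75
print(and_gate(0.5, 0.75))  # 0.25
```

The outputs on fractional inputs vary continuously with the inputs, which is behavior a Boolean circuit never exposes. The paper's actual attack and its protective construction are not reproduced here.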
Donald Trump and Elon Musk’s chaotic approach to reform is upending government operations. Critical functions have been halted, tens of thousands of federal staffers are being encouraged to resign, and congressional mandates are being disregarded. The next phase: The Department of Government Efficiency reportedly wants to use AI to cut costs. According to The Washington Post, Musk’s group has started to run sensitive data from government systems through AI programs to analyze spending and determine what could be pruned. This may lead to the elimination of human jobs in favor of automation. As one government official who has been tracking Musk’s DOGE team told the Post, the ultimate aim is to use AI to replace “the human workforce with machines.” (Spokespeople for the White House and DOGE did not respond to requests for comment.)
Using AI to make government more efficient is a worthy pursuit, and this is not a new idea. The Biden administration disclosed more than 2,000 AI applications in development across the federal government. For example, FEMA has started using AI to help perform damage assessment in disaster areas. The Centers for Medicare and Medicaid Services has started using AI to look for fraudulent billing. The idea of replacing dedicated and principled civil servants with AI agents, however, is new—and complicated.
The civil service—the massive cadre of employees who operate government agencies—plays a vital role in translating laws and policy into the operation of society. New presidents can issue sweeping executive orders, but they often have no real effect until they actually change the behavior of public servants. Whether you think of these people as essential and inspiring do-gooders, boring bureaucratic functionaries, or as agents of a “deep state,” their sheer number and continuity act as ballast that resists institutional change.
This is why Trump and Musk’s actions are so significant. The more AI decision making is integrated into government, the easier change will be. If human workers are widely replaced with AI, executives will have unilateral authority to instantaneously alter the behavior of the government, profoundly raising the stakes for transitions of power in democracy. Trump’s unprecedented purge of the civil service might be the last time a president needs to replace the human beings in government in order to dictate its new functions. Future leaders may do so at the press of a button.
To be clear, the use of AI by the executive branch doesn’t have to be disastrous. In theory, it could allow new leadership to swiftly implement the wishes of its electorate. But this could go very badly in the hands of an authoritarian leader. AI systems concentrate power at the top, so they could allow an executive to effectuate change over sprawling bureaucracies instantaneously. Firing and replacing tens of thousands of human bureaucrats is a huge undertaking. Swapping one AI out for another, or modifying the rules that those AIs operate by, would be much simpler.
Social-welfare programs, if automated with AI, could be redirected to systematically benefit one group and disadvantage another with a single prompt change. Immigration-enforcement agencies could prioritize people for investigation and detainment with one instruction. Regulatory-enforcement agencies that monitor corporate behavior for malfeasance could turn their attention to, or away from, any given company on a whim.
Even if Congress were motivated to fight back against Trump and Musk, or against a future president seeking to bulldoze the will of the legislature, the absolute power to command AI agents would make it easier to subvert legislative intent. AI has the power to diminish representative politics. Written law is never fully determinative of the actions of government—there is always wiggle room for presidents, appointed leaders, and civil servants to exercise their own judgment. Whether intentional or not, whether charitably or not, each of these actors uses discretion. In human systems, that discretion is widely distributed across many individuals—people who, in the case of career civil servants, usually outlast presidencies.
Today, the AI ecosystem is dominated by a small number of corporations that decide how the most widely used AI models are designed, which data they are trained on, and which instructions they follow. Because their work is largely secretive and unaccountable to public interest, these tech companies are capable of making changes to the bias of AI systems—either generally or with aim at specific governmental use cases—that are invisible to the rest of us. And these private actors are both vulnerable to coercion by political leaders and self-interested in appealing to their favor. Musk himself created and funded xAI, now one of the world’s largest AI labs, with an explicitly ideological mandate to generate anti-“woke” AI and steer the wider AI industry in a similar direction.
But there’s a second way that AI’s transformation of government could go. AI development could happen inside of transparent and accountable public institutions, alongside its continued development by Big Tech. Applications of AI in democratic governments could be focused on benefitting public servants and the communities they serve by, for example, making it easier for non-English speakers to access government services, making ministerial tasks such as processing routine applications more efficient and reducing backlogs, or helping constituents weigh in on the policies deliberated by their representatives. Such AI integrations should be done gradually and carefully, with public oversight for their design and implementation and monitoring and guardrails to avoid unacceptable bias and harm.
Governments around the world are demonstrating how this could be done, though it’s early days. Taiwan has pioneered the use of AI models to facilitate deliberative democracy at an unprecedented scale. Singapore has been a leader in the development of public AI models, built transparently and with public-service use cases in mind. Canada has illustrated the role of disclosure and public input on the consideration of AI use cases in government. Even if you do not trust the current White House to follow any of these examples, U.S. states—which have much greater contact and influence over the daily lives of Americans than the federal government—could lead the way on this kind of responsible development and deployment of AI.
As the political theorist David Runciman has written, AI is just another in a long line of artificial “machines” used to govern how people live and act, not unlike corporations and states before it. AI doesn’t replace those older institutions, but it changes how they function. As the Trump administration forges stronger ties to Big Tech and AI developers, we need to recognize the potential of that partnership to steer the future of democratic governance—and act to make sure that it does not enable future authoritarians.
This essay was written with Nathan E. Sanders, and originally appeared in The Atlantic.
Most people know that robots no longer sound like tinny trash cans. They sound like Siri, Alexa, and Gemini. They sound like the voices in labyrinthine customer support phone trees. And even those robot voices are being made obsolete by new AI-generated voices that can mimic every vocal nuance and tic of human speech, down to specific regional accents. And with just a few seconds of audio, AI can now clone someone’s specific voice.
This technology will replace humans in many areas. Automated customer support will save money by cutting staffing at call centers. AI agents will make calls on our behalf, conversing with others in natural language. All of that is happening, and will be commonplace soon.
But there is something fundamentally different about talking with a bot as opposed to a person. A person can be a friend. An AI cannot be a friend, despite how people might treat it or react to it. AI is at best a tool, and at worst a means of manipulation. Humans need to know whether we’re talking with a living, breathing person or a robot with an agenda set by the person who controls it. That’s why robots should sound like robots.
You can’t just label AI-generated speech. It will come in many different forms. So we need a way to recognize AI that works no matter the modality. It needs to work for long or short snippets of audio, even just a second long. It needs to work for any language, and in any cultural context. At the same time, we shouldn’t constrain the underlying system’s sophistication or language complexity.
We have a simple proposal: all talking AIs and robots should use a ring modulator. In the mid-twentieth century, before it was easy to create actual robotic-sounding speech synthetically, ring modulators were used to make actors’ voices sound robotic. Over the last few decades, we have become accustomed to robotic voices, simply because text-to-speech systems were good enough to produce intelligible speech that was not human-like in its sound. Now we can use that same technology to make speech that is indistinguishable from human speech sound robotic again.
A ring modulator has several advantages: It is computationally simple, can be applied in real-time, does not affect the intelligibility of the voice, and—most importantly—is universally “robotic sounding” because of its historical usage for depicting robots.
Responsible AI companies that provide voice synthesis or AI voice assistants in any form should add a ring modulator of some standard frequency (say, between 30 and 80 Hz) and of a minimum amplitude (say, 20 percent). That’s it. People will catch on quickly.
Here are a few clips you can listen to as examples of what we’re suggesting. The first clip is an AI-generated “podcast” of this article made by Google’s NotebookLM featuring two AI “hosts.” Google’s NotebookLM created the podcast script and audio given only the text of this article. The next two clips feature that same podcast with the AIs’ voices modulated more and less subtly by a ring modulator:
We were able to generate the audio effect with a 50-line Python script generated by Anthropic’s Claude. Among the most well-known robot voices were those of the Daleks from Doctor Who in the 1960s. Back then robot voices were difficult to synthesize, so the audio was actually an actor’s voice run through a ring modulator. It was set to around 30 Hz, as we did in our example, with different modulation depth (amplitude) depending on how strong the robotic effect is meant to be. Our expectation is that the AI industry will test and converge on a good balance of such parameters and settings, and will use better tools than a 50-line Python script, but this highlights how simple it is to achieve.
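For readers who want to see how little is involved, here is a minimal sketch of a ring modulator in Python. It is not the script we used. The function name `ring_modulate` and its parameters are hypothetical, and the way `depth` blends the original signal with the modulated one is an assumption about how a "20 percent" amplitude might be applied. Ring modulation itself is just multiplying the voice signal by a sine-wave carrier.

```python
import math

def ring_modulate(samples, sample_rate, carrier_hz=30.0, depth=0.2):
    """Blend a signal with a ring-modulated copy of itself.

    samples:     list of floats, e.g. audio in the range [-1.0, 1.0]
    sample_rate: samples per second
    carrier_hz:  frequency of the sine carrier (assumed default: 30 Hz)
    depth:       0.0 leaves the signal untouched, 1.0 is pure ring modulation
                 (this blend is an illustrative assumption)
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    out = []
    for n, x in enumerate(samples):
        # Sine carrier evaluated at the time of sample n.
        carrier = math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
        # Ring modulation is the product x * carrier; mix it with the dry signal.
        out.append((1.0 - depth) * x + depth * x * carrier)
    return out

# Usage: one second of a 440 Hz test tone standing in for a voice.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * n / sample_rate)
        for n in range(sample_rate)]
robotic = ring_modulate(tone, sample_rate, carrier_hz=30.0, depth=0.2)
```

A real deployment would read and write audio files or streams and would likely tune both parameters by listening tests; this sketch only shows the per-sample arithmetic.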
Of course there will also be nefarious uses of AI voices. Scams that use voice cloning have been getting easier every year, but they’ve been possible for many years with the right know-how. Just like we’re learning that we can no longer trust images and videos we see because they could easily have been AI-generated, we will all soon learn that someone who sounds like a family member urgently requesting money may just be a scammer using a voice-cloning tool.
We don’t expect scammers to follow our proposal: They’ll find a way no matter what. But that’s always true of security standards, and a rising tide lifts all boats. We think the bulk of the uses will be with popular voice APIs from major companies—and everyone should know that they’re talking with a robot.
This essay was written with Barath Raghavan, and originally appeared in IEEE Spectrum.
Microsoft’s AI Red Team just published “Lessons from Red Teaming 100 Generative AI Products.” Their blog post lists “three takeaways,” but the eight lessons in the report itself are more useful:
- Understand what the system can do and where it is applied.
- You don’t have to compute gradients to break an AI system.
- AI red teaming is not safety benchmarking.
- Automation can help cover more of the risk landscape.
- The human element of AI red teaming is crucial.
- Responsible AI harms are pervasive but difficult to measure.
- LLMs amplify existing security risks and introduce new ones.
- The work of securing AI systems will never be complete.
Interesting analysis:
We analyzed every instance of AI use in elections collected by the WIRED AI Elections Project (source for our analysis), which tracked known uses of AI for creating political content during elections taking place in 2024 worldwide. In each case, we identified what AI was used for and estimated the cost of creating similar content without AI.
We find that (1) half of AI use isn’t deceptive, (2) deceptive content produced using AI is nevertheless cheap to replicate without AI, and (3) focusing on the demand for misinformation rather than the supply is a much more effective way to diagnose problems and identify interventions.
This tracks with my analysis. People share as a form of social signaling. I send you a meme/article/clipping/photo to show that we are on the same team. Whether it is true, or misinformation, or actual propaganda, is of secondary importance. Sometimes it’s completely irrelevant. This is why fact checking doesn’t work. This is why “cheap fakes”—obviously fake photos and videos—are effective. This is why, as the authors of that analysis said, the demand side is the real problem.
Artificial intelligence (AI) is writing law today. This has required no changes in legislative procedure or the rules of legislative bodies—all it takes is one legislator, or legislative assistant, to use generative AI in the process of drafting a bill.
In fact, the use of AI by legislators is only likely to become more prevalent. There are currently projects in the US House, US Senate, and legislatures around the world to trial the use of AI in various ways: searching databases, drafting text, summarizing meetings, performing policy research and analysis, and more. A Brazilian municipality passed the first known AI-written law in 2023.
That’s not surprising; AI is being used more everywhere. What is coming into focus is how policymakers will use AI and, critically, how this use will change the balance of power between the legislative and executive branches of government. Soon, US legislators may turn to AI to help them keep pace with the increasing complexity of their lawmaking—and this will suppress the power and discretion of the executive branch to make policy.
Legislators are writing increasingly long, intricate, and complicated laws that human legislative drafters have trouble producing. Already in the US, the multibillion-dollar lobbying industry is subsidizing lawmakers in writing baroque laws: suggesting paragraphs to add to bills, specifying benefits for some, carving out exceptions for others. Indeed, the lobbying industry is growing in complexity and influence worldwide.
Several years ago, researchers studied bills introduced into state legislatures throughout the US, looking at which bills were wholly original texts and which borrowed text from other states or from lobbyist-written model legislation. Their conclusion was not very surprising. Those who borrowed the most text were in legislatures that were less resourced. This makes sense: If you’re a part-time legislator, perhaps unpaid and without a lot of staff, you need to rely on more external support to draft legislation. When the scope of policymaking outstrips the resources of legislators, they look for help. Today, that often means lobbyists, who provide expertise, research services, and drafting labor to legislators at the local, state, and federal levels at no charge. Of course, they are not unbiased: They seek to exert influence on behalf of their clients.
Another study, at the US federal level, measured the complexity of policies proposed in legislation and tried to determine the factors that led to such growing complexity. While there are numerous ways to measure legal complexity, these authors focused on the specificity of institutional design: How exacting is Congress in laying out the relational network of branches, agencies, and officials that will share power to implement the policy?
In looking at bills enacted between 1993 and 2014, the researchers found two things. First, they concluded that ideological polarization drives complexity. The suggestion is that if a legislator is on the extreme end of the ideological spectrum, they’re more likely to introduce a complex law that constrains the discretion of, as the authors put it, “entrenched bureaucratic interests.” And second, they found that divided government drives complexity to a large degree: Significant legislation passed under divided government was found to be 65 percent more complex than similar legislation passed under unified government. Their conclusion is that, if a legislator’s party controls Congress, and the opposing party controls the White House, the legislator will want to give the executive as little wiggle room as possible. When legislators’ preferences disagree with the executive’s, the legislature is incentivized to write laws that specify all the details. This gives the agency designated to implement the law as little discretion as possible.
Because polarization and divided government are increasingly entrenched in the US, the demand for complex legislation at the federal level is likely to grow. Today, we have both the greatest ideological polarization in Congress in living memory and an increasingly divided government at the federal level. Between 1900 and 1970 (57th through 90th Congresses), we had 27 instances of unified government and only seven divided; nearly a four-to-one ratio. Since then, the trend is roughly the opposite. As of the start of the next Congress, we will have had 20 divided governments and only eight unified (nearly a three-to-one ratio). And while the incoming Trump administration will see a unified government, the extremely closely divided House may often make this Congress look and feel like a divided one (see the recent government shutdown crisis as an exemplar) and makes truly divided government a strong possibility in 2027.
Another related factor driving the complexity of legislation is the need to do it all at once. The lobbyist feeding frenzy—spurring major bills like the Affordable Care Act to be thousands of pages in length—is driven in part by gridlock in Congress. Congressional productivity has dropped so low that bills on any given policy issue seem like a once-in-a-generation opportunity for legislators—and lobbyists—to set policy.
These dynamics also impact the states. States often have divided governments, albeit less often than they used to, and their demand for drafting assistance is arguably higher due to their significantly smaller staffs. And since the productivity of Congress has cratered in recent years, significantly more policymaking is happening at the state level.
But there’s another reason, particular to the US federal government, that will likely force congressional legislation to be more complex even during unified government. In June 2024, the US Supreme Court overturned the Chevron doctrine, which gave executive agencies broad power to specify and implement legislation. Suddenly, there is a mandate from the Supreme Court for more specific legislation. Issues that have historically been left implicitly to the executive branch are now required to be either explicitly delegated to agencies or specified directly in statute. Either way, the Court’s ruling implied that law should become more complex and that Congress should increase its policymaking capacity.
This affects the balance of power between the executive and legislative branches of government. When the legislature delegates less to the executive branch, it increases its own power. Every decision made explicitly in statute is a decision the executive makes not on its own but, rather, according to the directive of the legislature. In the US system of separation of powers, administrative law is a tool for balancing power among the legislative, executive, and judicial branches. The legislature gets to decide when to delegate and when not to, and it can respond to judicial review to adjust its delegation of control as needed. The elimination of Chevron will induce the legislature to exert its control over delegation more robustly.
At the same time, there are powerful political incentives for Congress to be vague and to rely on someone else, like agency bureaucrats, to make hard decisions. That empowers third parties—the corporations, or lobbyists—that have been gifted by the overturning of Chevron a new tool in arguing against administrative regulations not specifically backed up by law. A continuing stream of Supreme Court decisions handing victories to unpopular industries could be another driver of complex law, adding political pressure to pass legislative fixes.
Congress may or may not be up to the challenge of putting more policy details into law, but the external forces outlined above—lobbyists, the judiciary, and an increasingly divided and polarized government—are pushing them to do so. When Congress does take on the task of writing complex legislation, it’s quite likely it will turn to AI for help.
Two particular AI capabilities enable Congress to write laws different from laws humans tend to write. One, AI models have an enormous scope of expertise, whereas people have only a handful of specializations. Large language models (LLMs) like the one powering ChatGPT can generate legislative text on funding specialty crop harvesting mechanization equally as well as material on energy efficiency standards for street lighting. This enables a legislator to address more topics simultaneously. Two, AI models have the sophistication to work with a higher degree of complexity than people can. Modern LLM systems can instantaneously perform several simultaneous multistep reasoning tasks using information from thousands of pages of documents. This enables a legislator to fill in more baroque detail on any given topic.
That’s not to say that handing over legislative drafting to machines is easily done. Modernizing any institutional process is extremely hard, even when the technology is readily available and performant. And modern AI still has a ways to go to achieve mastery of complex legal and policy issues. But the basic tools are there.
AI can be used in each step of lawmaking, and this will bring various benefits to policymakers. It could let them work on more policies—more bills—at the same time, add more detail and specificity to each bill, or interpret and incorporate more feedback from constituents and outside groups. The addition of a single AI tool to a legislative office may have an impact similar to adding several people to their staff, but with far lower cost.
Speed sometimes matters when writing law. When there is a change of governing party, there is often a rush to change as much policy as possible to match the platform of the new regime. AI could help legislators do that kind of wholesale revision. The result could be policy that is more responsive to voters—or more political instability. Already in 2024, the US House’s Office of the Clerk has begun using AI to speed up the process of producing cost estimates for bills and understanding how new legislation relates to existing code. Ohio has used an AI tool to do wholesale revision of state administrative law since 2020.
AI can also make laws clearer and more consistent. With their superhuman attention spans, AI tools are good at enforcing syntactic and grammatical rules. They will be effective at drafting text in precise and proper legislative language, or offering detailed feedback to human drafters. Borrowing ideas from software development, where coders use tools to identify common instances of bad programming practices, an AI reviewer can highlight bad law-writing practices. For example, it can detect when significant phrasing is inconsistent across a long bill. If a bill about insurance repeatedly lists a variety of disaster categories, but leaves one out one time, AI can catch that.
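The software analogy can be made concrete with a toy example. The Python sketch below is a simple rule-based check, not an AI model, and the function name, sample bill text, and category list are all hypothetical. It flags any sentence that lists some of a bill's disaster categories while omitting others, which is the kind of inconsistency described above.

```python
def find_incomplete_lists(sentences, categories):
    """Return (sentence_index, missing_categories) for partial category lists."""
    findings = []
    for i, sentence in enumerate(sentences):
        text = sentence.lower()
        present = [c for c in categories if c in text]
        missing = [c for c in categories if c not in text]
        # Flag only sentences that name at least two categories but omit others.
        if len(present) >= 2 and missing:
            findings.append((i, missing))
    return findings

# Hypothetical bill text for illustration only.
bill = [
    "Coverage applies to losses from flood, fire, earthquake, and hurricane.",
    "Claims for flood, fire, and hurricane must be filed within 90 days.",
    "The insurer shall publish rates annually.",
]
categories = ["flood", "fire", "earthquake", "hurricane"]

print(find_incomplete_lists(bill, categories))  # [(1, ['earthquake'])]
```

Substring matching like this needs the category list supplied in advance and misses synonyms and rephrasings. An AI reviewer would be expected to handle those cases, which a fixed rule cannot.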
Perhaps this seems like minutiae, but a small ambiguity or mistake in law can have massive consequences. In 2015, the Affordable Care Act came close to being struck down because of a typo in four words, imperiling health care services extended to more than 7 million Americans.
There’s more that AI can do in the legislative process. AI can summarize bills and answer questions about their provisions. It can highlight aspects of a bill that align with, or are contrary to, different political points of view. We can even imagine a future in which AI can be used to simulate a new law and determine whether or not it would be effective, or what the side effects would be. This means that beyond writing them, AI could help lawmakers understand laws. Congress is notorious for producing bills hundreds of pages long, and many other countries sometimes have similarly massive omnibus bills that address many issues at once. It’s impossible for any one person to understand how each of these bills’ provisions would work. Many legislatures employ human analysts in budget or fiscal offices who analyze these bills and offer reports. AI could do this kind of work at greater speed and scale, so legislators could easily query an AI tool about how a particular bill would affect their district or areas of concern.
This is a use case that the House subcommittee on modernization has urged the Library of Congress to take action on. Numerous software vendors are already marketing AI legislative analysis tools. These tools can potentially find loopholes or, like the human lobbyists of today, craft them to benefit particular private interests.
These capabilities will be attractive to legislators who are looking to expand their power and capabilities but don’t necessarily have more funding to hire human staff. We should understand the idea of AI-augmented lawmaking contextualized within the longer history of legislative technologies. To serve society at modern scales, we’ve had to come a long way from the Athenian ideals of direct democracy and sortition. Democracy no longer involves just one person and one vote to decide a policy. It involves hundreds of thousands of constituents electing one representative, who is augmented by a staff as well as subsidized by lobbyists, and who implements policy through a vast administrative state coordinated by digital technologies. Using AI to help those representatives specify and refine their policy ideas is part of a long history of transformation.
Whether all this AI augmentation is good for all of us subject to the laws these legislators make is less clear. There are real risks to AI-written law, but those risks are not dramatically different from what we endure today. AI-written law trying to optimize for certain policy outcomes may get it wrong (just as many human-written laws are misguided). AI-written law may be manipulated to benefit one constituency over others, by the tech companies that develop the AI, or by the legislators who apply it, just as human lobbyists steer policy to benefit their clients.
Regardless of what anyone thinks of any of this, regardless of whether it will be a net positive or a net negative, AI-made legislation is coming—the growing complexity of policy demands it. It doesn’t require any changes in legislative procedures or agreement from any rules committee. All it takes is for one legislative assistant, or lobbyist, to fire up a chatbot and ask it to create a draft. When legislators voted on that Brazilian bill in 2023, they didn’t know it was AI-written; the use of ChatGPT was undisclosed. And even if they had known, it’s not clear it would have made a difference. In the future, as in the past, we won’t always know which laws will have good impacts and which will have bad effects, regardless of the words on the page, or who (or what) wrote them.
This essay was written with Nathan E. Sanders, and originally appeared in Lawfare.
Humans make mistakes all the time. All of us do, every day, in tasks both new and routine. Some of our mistakes are minor and some are catastrophic. Mistakes can break trust with our friends, lose the confidence of our bosses, and sometimes be the difference between life and death.
Over the millennia, we have created security systems to deal with the sorts of mistakes humans commonly make. These days, casinos rotate their dealers regularly, because they make mistakes if they do the same task for too long. Hospital personnel write on limbs before surgery so that doctors operate on the correct body part, and they count surgical instruments to make sure none were left inside the body. From copyediting to double-entry bookkeeping to appellate courts, we humans have gotten really good at correcting human mistakes.
Humanity is now rapidly integrating a wholly different kind of mistake-maker into society: AI. Technologies like large language models (LLMs) can perform many cognitive tasks traditionally fulfilled by humans, but they make plenty of mistakes. It seems ridiculous when chatbots tell you to eat rocks or add glue to pizza. But it’s not the frequency or severity of AI systems’ mistakes that differentiates them from human mistakes. It’s their weirdness. AI systems do not make mistakes in the same ways that humans do.
Much of the friction, and the risk, associated with our use of AI arises from that difference. We need to invent new security systems that adapt to these differences and prevent harm from AI mistakes.
Life experience makes it fairly easy for each of us to guess when and where humans will make mistakes. Human errors tend to come at the edges of someone’s knowledge: Most of us would make mistakes solving calculus problems. We expect human mistakes to be clustered: A single calculus mistake is likely to be accompanied by others. We expect mistakes to wax and wane, predictably depending on factors such as fatigue and distraction. And mistakes are often accompanied by ignorance: Someone who makes calculus mistakes is also likely to respond “I don’t know” to calculus-related questions.
To the extent that AI systems make these human-like mistakes, we can bring all of our mistake-correcting systems to bear on their output. But the current crop of AI models—particularly LLMs—make mistakes differently.
AI errors come at seemingly random times, without any clustering around particular topics. LLM mistakes tend to be more evenly distributed through the knowledge space. A model might be equally likely to make a mistake on a calculus question as it is to propose that cabbages eat goats.
And AI mistakes aren’t accompanied by ignorance. An LLM will be just as confident when saying something completely wrong (and obviously so, to a human) as it will be when saying something true. The seemingly random inconsistency of LLMs makes it hard to trust their reasoning in complex, multi-step problems. If you want to use an AI model to help with a business problem, it’s not enough to see that it understands what factors make a product profitable; you need to be sure it won’t forget what money is.
This situation indicates two possible areas of research. The first is to engineer LLMs that make more human-like mistakes. The second is to build new mistake-correcting systems that deal with the specific sorts of mistakes that LLMs tend to make.
We already have some tools to lead LLMs to act in more human-like ways. Many of these arise from the field of “alignment” research, which aims to make models act in accordance with the goals and motivations of their human developers. One example is the technique that was arguably responsible for the breakthrough success of ChatGPT: reinforcement learning from human feedback. In this method, an AI model is (figuratively) rewarded for producing responses that get a thumbs-up from human evaluators. Similar approaches could be used to induce AI systems to make more human-like mistakes, particularly by penalizing them more for mistakes that are less intelligible.
When it comes to catching AI mistakes, some of the systems that we use to prevent human mistakes will help. To an extent, forcing LLMs to double-check their own work can help prevent errors. But LLMs can also confabulate seemingly plausible, but truly ridiculous, explanations for their flights from reason.
Other mistake mitigation systems for AI are unlike anything we use for humans. Because machines can’t get fatigued or frustrated in the way that humans do, it can help to ask an LLM the same question repeatedly in slightly different ways and then synthesize its multiple responses. Humans won’t put up with that kind of annoying repetition, but machines will.
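To make the repeated-question idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular product or research system: `ask_model` stands in for whatever function sends a prompt to an LLM and returns its text, `fake_model` is a canned substitute so the example runs without a network connection, and "synthesize" is simplified to a majority vote over normalized answers.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def normalize(answer: str) -> str:
    """Ignore case, surrounding whitespace, and trailing punctuation."""
    return answer.strip().strip(".!").lower()


def ask_with_rephrasings(
    ask_model: Callable[[str], str],
    rephrasings: Sequence[str],
) -> Tuple[str, float]:
    """Ask each rephrasing once; return the majority answer and its vote share.

    `ask_model` is a hypothetical callable (prompt -> response text).
    A low vote share is a hint that the answer should not be trusted.
    """
    if not rephrasings:
        raise ValueError("need at least one phrasing of the question")
    answers = [normalize(ask_model(prompt)) for prompt in rephrasings]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / len(answers)


# Hypothetical phrasings of one question, and a canned stand-in for a model
# that gives one inconsistent answer out of four.
PHRASINGS = [
    "What is the capital of France?",
    "Which city is France's capital?",
    "Name the capital city of France.",
    "France's seat of government is in which city?",
]
CANNED = dict(zip(PHRASINGS, ["Paris", "paris.", " Paris ", "Lyon"]))


def fake_model(prompt: str) -> str:
    return CANNED[prompt]


if __name__ == "__main__":
    answer, agreement = ask_with_rephrasings(fake_model, PHRASINGS)
    print(answer, agreement)
```

With the canned responses above, three of the four phrasings agree, so the vote share is 0.75. A real deployment would need a better notion of "the same answer" than string matching, and a policy for what to do when agreement is low.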
Researchers are still struggling to understand where LLM mistakes diverge from human ones. Some of the weirdness of AI is actually more human-like than it first appears. Small changes to a query to an LLM can result in wildly different responses, a problem known as prompt sensitivity. But, as any survey researcher can tell you, humans behave this way, too. The phrasing of a question in an opinion poll can have drastic impacts on the answers.
LLMs also seem to have a bias towards repeating the words that were most common in their training data; for example, guessing familiar place names like “America” even when asked about more exotic locations. Perhaps this is an example of the human “availability heuristic” manifesting in LLMs, with machines spitting out the first thing that comes to mind rather than reasoning through the question. And like humans, perhaps, some LLMs seem to get distracted in the middle of long documents; they’re better able to remember facts from the beginning and end. There is already progress on improving this error mode, as researchers have found that LLMs trained on more examples of retrieving information from long texts seem to do better at retrieving information uniformly.
In some cases, what’s bizarre about LLMs is that they act more like humans than we think they should. For example, some researchers have tested the hypothesis that LLMs perform better when offered a cash reward or threatened with death. It also turns out that some of the best ways to “jailbreak” LLMs (getting them to disobey their creators’ explicit instructions) look a lot like the kinds of social engineering tricks that humans use on each other: for example, pretending to be someone else or saying that the request is just a joke. But other effective jailbreaking techniques are things no human would ever fall for. One group found that if they used ASCII art (constructions of symbols that look like words or pictures) to pose dangerous questions, like how to build a bomb, the LLM would answer them willingly.
Humans may occasionally make seemingly random, incomprehensible, and inconsistent mistakes, but such occurrences are rare and often indicative of more serious problems. We also tend not to put people exhibiting these behaviors in decision-making positions. Likewise, we should confine AI decision-making systems to applications that suit their actual abilities—while keeping the potential ramifications of their mistakes firmly in mind.
This essay was written with Nathan E. Sanders, and originally appeared in IEEE Spectrum.
EDITED TO ADD (1/24): Slashdot thread.