AI Mistakes Are Very Different from Human Mistakes

Humans make mistakes all the time. All of us do, every day, in tasks both new and routine. Some of our mistakes are minor and some are catastrophic. Mistakes can break trust with our friends, lose the confidence of our bosses, and sometimes be the difference between life and death.

Over the millennia, we have created security systems to deal with the sorts of mistakes humans commonly make. These days, casinos rotate their dealers regularly, because dealers start making mistakes if they perform the same task for too long. Hospital personnel write on limbs before surgery so that doctors operate on the correct body part, and they count surgical instruments to make sure none were left inside the body. From copyediting to double-entry bookkeeping to appellate courts, we humans have gotten really good at correcting human mistakes.

Humanity is now rapidly integrating a wholly different kind of mistake-maker into society: AI. Technologies like large language models (LLMs) can perform many cognitive tasks traditionally fulfilled by humans, but they make plenty of mistakes. It seems ridiculous when chatbots tell you to eat rocks or add glue to pizza. But it’s not the frequency or severity of AI systems’ mistakes that differentiates them from human mistakes. It’s their weirdness. AI systems do not make mistakes in the same ways that humans do.

Much of the friction—and risk—associated with our use of AI arises from that difference. We need to invent new security systems that adapt to these differences and prevent harm from AI mistakes.

Human Mistakes vs. AI Mistakes

Life experience makes it fairly easy for each of us to guess when and where humans will make mistakes. Human errors tend to come at the edges of someone’s knowledge: Most of us would make mistakes solving calculus problems. We expect human mistakes to be clustered: A single calculus mistake is likely to be accompanied by others. We expect mistakes to wax and wane predictably, depending on factors such as fatigue and distraction. And mistakes are often accompanied by ignorance: Someone who makes calculus mistakes is also likely to respond “I don’t know” to calculus-related questions.

To the extent that AI systems make these human-like mistakes, we can bring all of our mistake-correcting systems to bear on their output. But the current crop of AI models—particularly LLMs—make mistakes differently.

AI errors come at seemingly random times, without any clustering around particular topics. LLM mistakes tend to be more evenly distributed through the knowledge space. A model may be just as likely to make a mistake on a calculus question as to propose that cabbages eat goats.

And AI mistakes aren’t accompanied by ignorance. An LLM will be just as confident when saying something completely wrong—and obviously so, to a human—as it will be when saying something true. The seemingly random inconsistency of LLMs makes it hard to trust their reasoning in complex, multi-step problems. If you want to use an AI model to help with a business problem, it’s not enough to see that it understands what factors make a product profitable; you need to be sure it won’t forget what money is.

How to Deal with AI Mistakes

This situation indicates two possible areas of research. The first is to engineer LLMs that make more human-like mistakes. The second is to build new mistake-correcting systems that deal with the specific sorts of mistakes that LLMs tend to make.

We already have some tools to lead LLMs to act in more human-like ways. Many of these arise from the field of “alignment” research, which aims to make models act in accordance with the goals and motivations of their human developers. One example is the technique that was arguably responsible for the breakthrough success of ChatGPT: reinforcement learning from human feedback. In this method, an AI model is (figuratively) rewarded for producing responses that get a thumbs-up from human evaluators. Similar approaches could be used to induce AI systems to make more human-like mistakes, particularly by penalizing them more for mistakes that are less intelligible.
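
To make that concrete, here is a toy sketch in Python of the reward-and-reinforce loop. Everything in it (the candidate responses, the simulated evaluator, the learning rate) is invented for illustration; real reinforcement learning from human feedback trains a neural reward model on human ratings and updates billions of model parameters.

```python
import math
import random

# Candidate responses stand in for a model's output space; the scoring
# function stands in for human raters. Both are invented for this demo.
CANDIDATES = [
    "The capital of France is Paris.",   # sensible
    "Cabbages frequently eat goats.",    # confidently absurd
    "I'm not sure; I'd have to check.",  # human-like hedging
]

weights = {c: 1.0 for c in CANDIDATES}  # start with a uniform "policy"

def sample_response() -> str:
    """Sample a response with probability proportional to its weight."""
    return random.choices(CANDIDATES, weights=[weights[c] for c in CANDIDATES])[0]

def human_feedback(response: str) -> float:
    """Simulated rater: thumbs-up for sensible or hedged answers, a
    heavier penalty for unintelligible nonsense than for honest doubt."""
    return -2.0 if "Cabbages" in response else 1.0

LEARNING_RATE = 0.2

for _ in range(500):
    response = sample_response()
    reward = human_feedback(response)
    # Multiplicative-weights update: reinforce what the rater liked.
    weights[response] *= math.exp(LEARNING_RATE * reward)
    total = sum(weights.values())
    weights = {c: w / total for c, w in weights.items()}  # renormalize

for c, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{w:.3f}  {c}")
```

The key move is in the scoring function: penalizing confident nonsense harder than an honest “I don’t know” pushes the policy toward mistakes that at least look human.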

When it comes to catching AI mistakes, some of the systems that we use to prevent human mistakes will help. To an extent, forcing LLMs to double-check their own work can help prevent errors. But LLMs can also confabulate seemingly plausible, but truly ridiculous, explanations for their flights from reason.

Other mistake mitigation systems for AI are unlike anything we use for humans. Because machines can’t get fatigued or frustrated in the way that humans do, it can help to ask an LLM the same question repeatedly in slightly different ways and then synthesize its multiple responses. Humans won’t put up with that kind of annoying repetition, but machines will.
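
A minimal sketch of that repeat-and-synthesize loop follows, assuming a placeholder `ask_llm` function you would wire to whatever model client you use; the phrasings and the exact-match voting rule are simplifying assumptions, since real free-form answers usually need fuzzier comparison.

```python
from collections import Counter

def ask_llm(prompt: str) -> str:
    """Placeholder: connect this to your LLM client of choice."""
    raise NotImplementedError

# Several phrasings of the same underlying question.
PHRASINGS = [
    "What is the boiling point of water at sea level, in Celsius?",
    "At standard atmospheric pressure, water boils at how many degrees Celsius?",
    "In degrees Celsius, at what temperature does water boil at sea level?",
]

def synthesized_answer(phrasings: list[str], votes_needed: int = 2) -> str | None:
    """Ask the same question several ways and keep the majority answer.
    Returns None when the model contradicts itself across phrasings."""
    answers = [ask_llm(p).strip().lower() for p in phrasings]
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count >= votes_needed else None
```

An answer that survives the vote is not guaranteed to be right, but inconsistency across phrasings is a cheap, machine-tolerated signal that a human should take a closer look.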

Understanding Similarities and Differences

Researchers are still struggling to understand where LLM mistakes diverge from human ones. Some of the weirdness of AI is actually more human-like than it first appears. Small changes to an LLM’s prompt can result in wildly different responses, a problem known as prompt sensitivity. But, as any survey researcher can tell you, humans behave this way, too. The phrasing of a question in an opinion poll can have drastic impacts on the answers.

LLMs also seem to have a bias towards repeating the words that were most common in their training data; for example, guessing familiar place names like “America” even when asked about more exotic locations. Perhaps this is an example of the human “availability heuristic” manifesting in LLMs, with machines spitting out the first thing that comes to mind rather than reasoning through the question. And like humans, perhaps, some LLMs seem to get distracted in the middle of long documents; they’re better able to remember facts from the beginning and end. There is already progress on mitigating this error mode: researchers have found that LLMs trained on more examples of retrieving information from long texts seem to retrieve information more uniformly.
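
You can observe this middle-of-document blind spot with a simple “needle in a haystack” probe: plant a fact at varying depths in a long filler document and check whether the model retrieves it. The sketch below is illustrative only; the `ask_llm` placeholder, the filler text, the planted fact, and the substring check are all assumptions for the demo.

```python
def needle_probe(ask_llm, depth: float, filler_paragraphs: int = 200) -> bool:
    """Plant a known fact at a relative depth (0.0 = start, 1.0 = end)
    of a long filler document, then ask the model to retrieve it."""
    needle = "The maintenance code for the west turbine is 7421."  # invented fact
    filler = "The afternoon sky was a flat, uniform gray over the harbor."
    paragraphs = [filler] * filler_paragraphs
    paragraphs.insert(int(depth * filler_paragraphs), needle)
    prompt = "\n\n".join(paragraphs) + "\n\nWhat is the maintenance code for the west turbine?"
    return "7421" in ask_llm(prompt)

# Typical pattern: probes near depth 0.0 and 1.0 succeed more often than
# probes near 0.5.
# for d in (0.0, 0.25, 0.5, 0.75, 1.0):
#     print(d, needle_probe(ask_llm, d))
```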

In some cases, what’s bizarre about LLMs is that they act more like humans than we think they should. For example, some researchers have tested the hypothesis that LLMs perform better when offered a cash reward or threatened with death. It also turns out that some of the best ways to “jailbreak” LLMs (getting them to disobey their creators’ explicit instructions) look a lot like the kinds of social engineering tricks that humans use on each other: for example, pretending to be someone else or saying that the request is just a joke. But other effective jailbreaking techniques are things no human would ever fall for. One group found that if they used ASCII art (constructions of symbols that look like words or pictures) to pose dangerous questions, like how to build a bomb, the LLM would answer them willingly.

Humans may occasionally make seemingly random, incomprehensible, and inconsistent mistakes, but such occurrences are rare and often indicative of more serious problems. We also tend not to put people exhibiting these behaviors in decision-making positions. Likewise, we should confine AI decision-making systems to applications that suit their actual abilities—while keeping the potential ramifications of their mistakes firmly in mind.

This essay was written with Nathan E. Sanders, and originally appeared in IEEE Spectrum.

EDITED TO ADD (1/24): Slashdot thread.

Posted on January 21, 2025 at 7:02 AM

Biden Signs New Cybersecurity Order

President Biden has signed a new cybersecurity order. It has a bunch of provisions, most notably using the US government’s procurement power to improve cybersecurity practices industry-wide.

Some details:

The core of the executive order is an array of mandates for protecting government networks based on lessons learned from recent major incidents—namely, the security failures of federal contractors.

The order requires software vendors to submit proof that they follow secure development practices, building on a mandate that debuted in 2022 in response to Biden’s first cyber executive order. The Cybersecurity and Infrastructure Security Agency would be tasked with double-checking these security attestations and working with vendors to fix any problems. To put some teeth behind the requirement, the White House’s Office of the National Cyber Director is “encouraged to refer attestations that fail validation to the Attorney General” for potential investigation and prosecution.

The order gives the Department of Commerce eight months to assess the most commonly used cyber practices in the business community and issue guidance based on them. Shortly thereafter, those practices would become mandatory for companies seeking to do business with the government. The directive also kicks off updates to the National Institute of Standards and Technology’s secure software development guidance.

More information.

Posted on January 20, 2025 at 7:06 AM

Friday Squid Blogging: Opioid Alternatives from Squid Research

Is there nothing that squid research can’t solve?

“If you’re working with an organism like squid that can edit genetic information way better than any other organism, then it makes sense that that might be useful for a therapeutic application like deadening pain,” he said.

[…]

Researchers hope to mimic how squid and octopus use RNA editing in nerve channels that interpret pain and use that knowledge to manipulate human cells.

Blog moderation policy.

Posted on January 17, 2025 at 5:02 PM

Social Engineering to Disable iMessage Protections

I am always interested in new phishing tricks, and watching them spread across the ecosystem.

A few days ago I started getting phishing SMS messages with a new twist. They were standard messages about delayed packages or somesuch, with the goal of getting me to click on a link and enter some personal information into a website. But because they came from unknown phone numbers, the links did not work. So—this is the new bit—the messages said something like: “Please reply Y, then exit the text message, reopen the text message activation link, or copy the link to Safari browser to open it.”

I saw it once, and now I am seeing it again and again. Everyone has now adopted this new trick.

One article claims that this trick has been popular since last summer. I don’t know; I would have expected to have seen it before last weekend.

Posted on January 17, 2025 at 7:05 AM

FBI Deletes PlugX Malware from Thousands of Computers

According to a DOJ press release, the FBI was able to delete the Chinese-used PlugX malware from “approximately 4,258 U.S.-based computers and networks.”

Details:

To retrieve information from and send commands to the hacked machines, the malware connects to a command-and-control server that is operated by the hacking group. According to the FBI, at least 45,000 IP addresses in the US had back-and-forths with the command-and-control server since September 2023.

It was that very server that allowed the FBI to finally kill this pesky bit of malicious software. First, they tapped the know-how of French intelligence agencies, which had recently discovered a technique for getting PlugX to self-destruct. Then, the FBI gained access to the hackers’ command-and-control server and used it to request all the IP addresses of machines that were actively infected by PlugX. Then it sent a command via the server that caused PlugX to delete itself from its victims’ computers.

Posted on January 16, 2025 at 7:03 AM

Upcoming Speaking Engagements

This is a current list of where and when I am scheduled to speak:

  • I’m speaking on “AI: Trust & Power” at Capricon 45 in Chicago, Illinois, USA, at 11:30 AM on February 7, 2025. I’m also signing books there on Saturday, February 8, starting at 1:45 PM.
  • I’m speaking at Boskone 62 in Boston, Massachusetts, USA, which runs from February 14-16, 2025.
  • I’m speaking at the Rossfest Symposium in Cambridge, UK, on March 25, 2025.

The list is maintained on this page.

Posted on January 14, 2025 at 12:05 PM

The First Password on the Internet

It was created in 1973 by Peter Kirstein:

So from the beginning I put password protection on my gateway. This had been done in such a way that even if UK users telephoned directly into the communications computer provided by Darpa in UCL, they would require a password.

In fact this was the first password on Arpanet. It proved invaluable in satisfying authorities on both sides of the Atlantic for the 15 years I ran the service—during which no security breach occurred over my link. I also put in place a system of governance that any UK users had to be approved by a committee which I chaired but which also had UK government and British Post Office representation.

I wish he’d told us what that password was.

Posted on January 14, 2025 at 7:00 AM

Microsoft Takes Legal Action Against AI “Hacking as a Service” Scheme

Not sure this will matter in the end, but it’s a positive move:

Microsoft is accusing three individuals of running a “hacking-as-a-service” scheme that was designed to allow the creation of harmful and illicit content using the company’s platform for AI-generated content.

The foreign-based defendants developed tools specifically designed to bypass safety guardrails Microsoft has erected to prevent the creation of harmful content through its generative AI services, said Steven Masada, the assistant general counsel for Microsoft’s Digital Crimes Unit. They then compromised the legitimate accounts of paying customers. They combined those two things to create a fee-based platform people could use.

It was a sophisticated scheme:

The service contained a proxy server that relayed traffic between its customers and the servers providing Microsoft’s AI services, the suit alleged. Among other things, the proxy service used undocumented Microsoft network application programming interfaces (APIs) to communicate with the company’s Azure computers. The resulting requests were designed to mimic legitimate Azure OpenAI Service API requests and used compromised API keys to authenticate them.

Slashdot thread.

Posted on January 13, 2025 at 7:01 AM

Friday Squid Blogging: Cotton-and-Squid-Bone Sponge

News:

A sponge made of cotton and squid bone that has absorbed about 99.9% of microplastics in water samples in China could provide an elusive answer to ubiquitous microplastic pollution in water across the globe, a new report suggests.

[…]

The study tested the material in an irrigation ditch, a lake, seawater and a pond, where it removed up to 99.9% of plastic. It still removed 95%-98% of plastic after five cycles, which the authors say is remarkable reusability.

The sponge is made from chitin extracted from squid bone and cotton cellulose, materials that are often used to address pollution. Cost, secondary pollution and technological complexities have stymied many other filtration systems, but large-scale production of the new material is possible because it is cheap, and raw materials are easy to obtain, the authors say.

Research paper.

Blog moderation policy.

Posted on January 10, 2025 at 5:06 PM
