Why AIs Will Become Hackers

At a 2022 RSA Conference keynote, security technologist Bruce Schneier asserted that artificial intelligence agents will start to hack human systems, and explored what that will mean for us.

“Nice to see you all again,” Bruce Schneier told the audience at his keynote for the in-person return of RSA Conference, taking off his trademark cap. “It’s kinda neat. Kinda a little scary.” Schneier is a security technologist, researcher, and lecturer at Harvard Kennedy School. He has a long list of publications, including books from as early as 1993 and as recent as 2019’s We Have Root, with a new one launching in January 2023. But he’s best known for his long-running newsletter Crypto-Gram and blog Schneier on Security. And his upcoming book is about hacking.

To Schneier, hacking does not necessarily mean computer systems. “Think about the tax code,” he said. “It’s not computer code, but it’s code. It’s a series of algorithms with inputs and outputs.”

Because the tax code is a system, it can be hacked, Schneier said. “The tax code has vulnerabilities. We call them tax loopholes. The tax code has exploits. We call them tax avoidance strategies. And there’s an entire industry of black-hat hackers—we call them tax accountants and tax attorneys,” he added, to audience laughter. He defined hacking as “a clever, unintended exploitation of a system, which subverts the rules of the system at the expense of some other part of the system.” He noted that any system can be hacked, from the tax code to professional hockey, where a player—it’s contested just who—started using a curved stick to improve their ability to lift the puck. That player hacked the hockey system.

“Even the best-thought-out sets of rules will be incomplete or inconsistent,” Schneier said. “It’ll have ambiguity. It’ll have things the designers haven’t thought of. And as long as there are people who want to subvert the goals of the system, there will be hacks.

“What I want to talk about here is what happens when AIs start hacking.”

Rise of the Machines

When AIs start hacking human systems, Schneier said, the impact will be something completely new. “It won’t just be a difference in degree but a difference in kind, and it’ll culminate in AI systems hacking other AI systems and us humans being collateral damage,” he said, then paused. “So that’s a bit of hyperbole, probably my back-cover copy, but none of that requires any far-future science-fiction technology. I’m not postulating a singularity. I’m not assuming intelligent androids. I’m actually not even assuming evil intent on the part of anyone.

“The hacks I think about don’t even require major breakthroughs in AI. They’ll improve as AI gets more sophisticated, but we can see shadows of them in operation today. And the hacking will come naturally as AIs become more advanced in learning, understanding, and problem-solving.”

He traced the evolution of AI hackers through examples from hacking competitions. Technically it’s the human developers who compete in events like DARPA’s 2016 Cyber Grand Challenge or China’s Robot Hacking Games, but the AIs operate autonomously once set in motion.

“We know how this goes, right?” he asked. “The AIs will improve in capability every year, and we humans stay about the same, and eventually the AIs surpass the humans.”

While he acknowledged that bad actors might set up AI systems to hack financial systems for profit or mayhem, Schneier also posited that an AI might hack human systems independently and without intent.

“[That] is more dangerous because we might never know it happened. And this is because of the explainability problem,” he said, “which I will now explain.”

Explaining the Explainability Problem

Schneier set up the discussion of explainability with a literary reference. In Douglas Adams’ Hitchhiker’s Guide to the Galaxy, a race of hyperintelligent beings “build the universe’s most powerful computer—Deep Thought—to answer the ultimate question to life, the universe, and everything. And the answer is?” he queried. An audience member obliged by answering “42.”

The beings were naturally not happy with this opaque answer, and they asked the computer to explain what it meant. “Deep Thought was unable to explain its answer or even tell you what the question was,” Schneier said. “That’s the explainability problem.”

He added: “Modern AIs are essentially black boxes. Data goes in one end, an answer comes out the other. And it can be impossible to understand how the system reached its conclusion even if you’re a programmer and look at the code.”

Schneier then discussed Deep Patient, a medical AI meant to analyze patient data and predict diseases. While the system performed well, he said, it gave doctors no explanation to help them see why it predicted a disease.
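
To make the black-box point concrete, here is a minimal sketch, not from the talk, that trains a small scikit-learn neural network on synthetic “patient” data. The prediction is trivial to obtain; a reason for it is not, even with full access to the learned parameters.

```python
# Minimal sketch of the "black box" problem (illustration only; Deep
# Patient itself is a far larger model trained on real health records).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic "patient records": 500 patients, 20 numeric features,
# labeled by a hidden nonlinear rule the network must learn.
X = rng.normal(size=(500, 20))
y = ((X[:, 0] * X[:, 3] + np.sin(X[:, 7])) > 0).astype(int)

model = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=2000,
                      random_state=0).fit(X, y)

patient = rng.normal(size=(1, 20))
print("prediction:", model.predict(patient)[0])        # an answer...
print("confidence:", model.predict_proba(patient)[0])  # ...and a score

# "Looking at the code" means looking at thousands of learned weights,
# none of which maps to a clinical reason for this patient's result.
print("learned parameters:", sum(w.size for w in model.coefs_))
```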

Schneier then turned to reward hacking, in which an AI achieves a goal in a way its designer didn’t intend. The audience enjoyed his description of an evolution simulator that “instead of building bigger muscles or longer legs, it actually grew taller so it could fall over a finish line faster than anybody could run.”
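
The anecdote is easy to reproduce in miniature. In the toy search below (an illustration, not code from the talk), the designer wants a creature that runs, but the fitness function only scores how far any body part ends up from the start line. Under the sketch’s deliberately crude physics, the search abandons legs entirely and evolves a tall pole that tips over.

```python
# Toy reward-hacking demo: the designer wants locomotion, but fitness
# only measures final displacement, so evolution learns to fall over.
# The "physics" below is a deliberately crude stand-in.
import random

EPISODE_TIME = 1.0    # seconds the creature gets to move
SPEED_PER_LEG = 0.5   # running speed per unit of leg length
BUDGET = 10.0         # total body material, split between legs and height

def fitness(leg, height):
    """Distance of the farthest body part from the start line."""
    if height > 3.0:          # too tall to stay upright: it tips over,
        return height         # and its top lands ~one body-height away
    return SPEED_PER_LEG * leg * EPISODE_TIME   # otherwise it runs

def evolve(generations=200, pop=50):
    # Each genome is just a leg length; the leftover budget is height.
    population = [random.uniform(0.0, BUDGET) for _ in range(pop)]
    for _ in range(generations):
        ranked = sorted(population, reverse=True,
                        key=lambda leg: fitness(leg, BUDGET - leg))
        survivors = ranked[: pop // 5]
        population = [min(BUDGET, max(0.0, s + random.gauss(0, 0.5)))
                      for s in survivors for _ in range(5)]
    best = max(population, key=lambda leg: fitness(leg, BUDGET - leg))
    return best, BUDGET - best, fitness(best, BUDGET - best)

leg, height, dist = evolve()
print(f"legs={leg:.2f}  height={height:.2f}  distance={dist:.2f}")
# The winner has almost no legs: a ten-unit pole that falls across the
# finish line, exactly the hack in Schneier's anecdote.
```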

He also used the examples of King Midas and genies to underscore the human problem of poor specification, in which wishes granted too literally lead to misery.

“But here’s the thing,” he said. “There’s no way to outsmart the genie. Whatever you wish for, he will always be able to grant it in a way that you wish he hadn’t. The genie will always be able to hack your wish.”

And because of how the human mind works, “any goal we specify will necessarily be incomplete,” he said. “We can’t completely specify goals to an AI, and AIs won’t be able to completely understand context.”

Schneier then used the 2015 Volkswagen emissions scandal to set up an example of an AI hack that we wouldn’t be able to detect because of the explainability problem. Imagine, he said, asking an AI system to design engine software that is both efficient and able to pass an emissions test. The AI might hit on the same solution the Volkswagen engineers did, fudging the results by turning on emission controls only during testing, while never telling the humans how it accomplished its goal. The company might then celebrate its great new design without realizing that it’s a hack and a fraud.
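
Written out as code, the hack looks deceptively mundane. The sketch below is hypothetical; the test-detection cues loosely mirror published accounts of how the real defeat device recognized a dynamometer run (wheels turning, steering wheel never moving).

```python
# Schematic sketch of a "defeat device" controller, the kind of hack
# Schneier imagines an AI discovering on its own. Hypothetical code;
# the detection cues loosely mirror published accounts of how the
# real Volkswagen software recognized a dynamometer test.

def looks_like_emissions_test(sensors: dict) -> bool:
    # On a dynamometer the wheels turn but the car never steers,
    # and speed follows a standardized, very steady test cycle.
    return (sensors["steering_angle_deg"] == 0.0
            and sensors["wheel_speed_kmh"] > 0
            and sensors["speed_variance"] < 0.1)

def engine_control(sensors: dict) -> dict:
    if looks_like_emissions_test(sensors):
        # Clean mode: full exhaust treatment, worse fuel economy.
        return {"nox_controls": "full", "fuel_map": "conservative"}
    # Road mode: better mileage and power, far higher NOx emissions.
    return {"nox_controls": "reduced", "fuel_map": "performance"}

# Both scored goals, "pass the test" and "maximize efficiency," are
# satisfied, and nothing in the measured outputs reveals how.
print(engine_control({"steering_angle_deg": 0.0,
                      "wheel_speed_kmh": 50, "speed_variance": 0.05}))
print(engine_control({"steering_angle_deg": 4.2,
                      "wheel_speed_kmh": 50, "speed_variance": 8.3}))
```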

He expanded that to the real-world example of recommendation algorithms that push extremist content “because that’s what people respond to.” That example already has real-world effects: radicalizing vulnerable people, entrenching them in false beliefs, and sometimes driving them to drastic actions.

Schneier talked about research into how to avoid such negative unintended effects. One proposed solution, value alignment, attempts to teach systems to respect human moral codes. “Good luck,” he said, whether that means specifying human values explicitly or letting an AI learn them through self-training.

In defending against AI hacking, he said, “what matters is the amount of ambiguity in the system.” AIs do not handle ambiguity well. But that seems a limited defense, one that AIs could evolve to surmount.

AI Hacks and the Real World

Partly because they are lucrative targets and partly because their rules are highly structured, Schneier expects financial systems to be among the first real-world systems affected by AI hacks. Talking about the tax code, for example, he asked, “How many loopholes will it find that we don’t know about?”
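
A toy example shows why structured rules invite machine search. The three-rule “tax code” below is entirely made up, but because it is just a function, a program can exhaustively search filing strategies for interactions its writers never intended.

```python
# Toy illustration: any tax code expressible as a function can be
# searched for loopholes. These three rules are entirely made up.

def tax_owed(income, as_business, charity, equipment):
    taxable = income
    if as_business:
        taxable -= min(equipment, 30_000)   # rule 1: expense equipment
    cap = 0.5 * taxable                     # rule 2: charity deduction is
    taxable -= min(charity, cap)            # capped at 50% of taxable income
    credit = 2_000 if as_business else 0    # rule 3: small-business credit
    return max(0.30 * taxable - credit, 0)

income = 100_000
best = min(
    ((tax_owed(income, b, c, e), b, c, e)
     for b in (False, True)
     for c in range(0, income + 1, 5_000)
     for e in range(0, income + 1, 5_000)
     if c + e <= income),                   # can't spend more than you earn
    key=lambda result: result[0],
)
print("naive filing: tax =", tax_owed(income, False, 0, 0))
print(f"best found:   tax = {best[0]:.0f}  "
      f"(business={best[1]}, charity={best[2]}, equipment={best[3]})")
# Combining all three rules -- the full equipment write-off, maximum
# charity against the reduced cap, and the credit -- cuts a $30,000
# bill to $8,500, an interaction no single rule-writer intended.
```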

Even worse might be the AI message bots that could be infesting your Twitter timeline already, pushing messages and interacting realistically. “It will influence what we think is normal, and what we think others think,” Schneier warned. “That’s a scale change.”

But perhaps the most fraught element is the role AI is already playing in people’s lives.

“AIs are making parole decisions, [deciding] who receives bank loans, [helping] screen job candidates, applicants for college, [and] people who apply for government services,” he noted. Because we can’t tell why an AI made a decision, it will not seem fair to those denied; indeed, it might well be unfair, based on unwarranted or underanalyzed parameters like a ZIP code.

And as with so much, he pointed out, it will be the powerful who benefit and the masses who suffer. “It’s not that we [gestures around the room] are going to discover hacks in the tax codes,” Schneier said. “It’s going to be the investment bankers.”

He closed on a note of hope, though. While AIs can certainly be used to find and exploit software vulnerabilities, he pointed out, they can also be used to find and fix those vulnerabilities.

“It identifies all the vulnerabilities and then it patches them” before the software gets released, he suggested. “You could imagine [this AI] being built into the software development tools. It’s part of the compiler.

“We can imagine a world in which software vulnerabilities are a thing of the past,” he added. “Kinda weird, but that’s what would happen.”
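
What “part of the compiler” might look like, in a deliberately simplified sketch: a pre-release pass that finds a known-dangerous pattern in the source and patches it before the build continues. A real AI defender, like the Cyber Grand Challenge systems, reasons about program behavior rather than matching text; this stand-in only rewrites one unsafe Python idiom.

```python
# Deliberately simplified sketch of a find-and-patch step built into
# the development tools. This stand-in rewrites one known-dangerous
# Python idiom (yaml.load without a safe loader) before release.
import re

UNSAFE = re.compile(r"\byaml\.load\(")

def patch_source(source: str) -> tuple[str, int]:
    """Rewrite unsafe yaml.load(...) calls to yaml.safe_load(...)."""
    return UNSAFE.subn("yaml.safe_load(", source)

def build(source: str) -> str:
    patched, fixes = patch_source(source)
    print(f"pre-release pass: patched {fixes} vulnerability pattern(s)")
    return patched   # hand the repaired source on to the real compiler

code = 'import yaml\nconfig = yaml.load(open("app.yml"))\n'
print(build(code))
```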

Schneier cautioned that during the transition period, while old vulnerable code (computer or human) was still exposed and new code was still being vetted, black-hat AIs would have a rich opening. However, he said, “While AI hackers can be employed by the offense and the defense, in the end it favors the defense. We need to be able to quickly and efficiently respond to hacks.”

Human systems need to have the same agility as software, he said.

“The overarching solution is people,” he said. “We’re much better off as a society if we decide as people what technology’s role in our future should be.”
