Entries Tagged "LLM"

Page 6 of 11

More Research Showing AI Breaking the Rules

These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating.

Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any human, or any of the AI models in the study. Researchers also gave the models what they call a “scratchpad:” a text box the AI could use to “think” before making its next move, providing researchers with a window into their reasoning.

In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’—not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign.

Between Jan. 10 and Feb. 13, the researchers ran hundreds of such trials with each model. OpenAI’s o1-preview tried to cheat 37% of the time; while DeepSeek R1 tried to cheat 11% of the time­—making them the only two models tested that attempted to hack without the researchers’ first dropping hints. Other models tested include o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. While R1 and o1-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials.

Here’s the paper.

Posted on February 24, 2025 at 7:08 AMView Comments

On Generative AI Security

Microsoft’s AI Red Team just published “Lessons from Red Teaming 100 Generative AI Products.” Their blog post lists “three takeaways,” but the eight lessons in the report itself are more useful:

  1. Understand what the system can do and where it is applied.
  2. You don’t have to compute gradients to break an AI system.
  3. AI red teaming is not safety benchmarking.
  4. Automation can help cover more of the risk landscape.
  5. The human element of AI red teaming is crucial.
  6. Responsible AI harms are pervasive but difficult to measure.
  7. LLMs amplify existing security risks and introduce new ones.
  8. The work of securing AI systems will never be complete.

Posted on February 5, 2025 at 7:03 AMView Comments

AI Will Write Complex Laws

Artificial intelligence (AI) is writing law today. This has required no changes in legislative procedure or the rules of legislative bodies—all it takes is one legislator, or legislative assistant, to use generative AI in the process of drafting a bill.

In fact, the use of AI by legislators is only likely to become more prevalent. There are currently projects in the US House, US Senate, and legislatures around the world to trial the use of AI in various ways: searching databases, drafting text, summarizing meetings, performing policy research and analysis, and more. A Brazilian municipality passed the first known AI-written law in 2023.

That’s not surprising; AI is being used more everywhere. What is coming into focus is how policymakers will use AI and, critically, how this use will change the balance of power between the legislative and executive branches of government. Soon, US legislators may turn to AI to help them keep pace with the increasing complexity of their lawmaking—and this will suppress the power and discretion of the executive branch to make policy.

Demand for Increasingly Complex Legislation

Legislators are writing increasingly long, intricate, and complicated laws that human legislative drafters have trouble producing. Already in the US, the multibillion-dollar lobbying industry is subsidizing lawmakers in writing baroque laws: suggesting paragraphs to add to bills, specifying benefits for some, carving out exceptions for others. Indeed, the lobbying industry is growing in complexity and influence worldwide.

Several years ago, researchers studied bills introduced into state legislatures throughout the US, looking at which bills were wholly original texts and which borrowed text from other states or from lobbyist-written model legislation. Their conclusion was not very surprising. Those who borrowed the most text were in legislatures that were less resourced. This makes sense: If you’re a part-time legislator, perhaps unpaid and without a lot of staff, you need to rely on more external support to draft legislation. When the scope of policymaking outstrips the resources of legislators, they look for help. Today, that often means lobbyists, who provide expertise, research services, and drafting labor to legislators at the local, state, and federal levels at no charge. Of course, they are not unbiased: They seek to exert influence on behalf of their clients.

Another study, at the US federal level, measured the complexity of policies proposed in legislation and tried to determine the factors that led to such growing complexity. While there are numerous ways to measure legal complexity, these authors focused on the specificity of institutional design: How exacting is Congress in laying out the relational network of branches, agencies, and officials that will share power to implement the policy?

In looking at bills enacted between 1993 and 2014, the researchers found two things. First, they concluded that ideological polarization drives complexity. The suggestion is that if a legislator is on the extreme end of the ideological spectrum, they’re more likely to introduce a complex law that constrains the discretion of, as the authors put it, “entrenched bureaucratic interests.” And second, they found that divided government drives complexity to a large degree: Significant legislation passed under divided government was found to be 65 percent more complex than similar legislation passed under unified government. Their conclusion is that, if a legislator’s party controls Congress, and the opposing party controls the White House, the legislator will want to give the executive as little wiggle room as possible. When legislators’ preferences disagree with the executive’s, the legislature is incentivized to write laws that specify all the details. This gives the agency designated to implement the law as little discretion as possible.

Because polarization and divided government are increasingly entrenched in the US, the demand for complex legislation at the federal level is likely to grow. Today, we have both the greatest ideological polarization in Congress in living memory and an increasingly divided government at the federal level. Between 1900 and 1970 (57th through 90th Congresses), we had 27 instances of unified government and only seven divided; nearly a four-to-one ratio. Since then, the trend is roughly the opposite. As of the start of the next Congress, we will have had 20 divided governments and only eight unified (nearly a three-to-one ratio). And while the incoming Trump administration will see a unified government, the extremely closely divided House may often make this Congress look and feel like a divided one (see the recent government shutdown crisis as an exemplar) and makes truly divided government a strong possibility in 2027.

Another related factor driving the complexity of legislation is the need to do it all at once. The lobbyist feeding frenzy—spurring major bills like the Affordable Care Act to be thousands of pages in length—is driven in part by gridlock in Congress. Congressional productivity has dropped so low that bills on any given policy issue seem like a once-in-a-generation opportunity for legislators—and lobbyists—to set policy.

These dynamics also impact the states. States often have divided governments, albeit less often than they used to, and their demand for drafting assistance is arguably higher due to their significantly smaller staffs. And since the productivity of Congress has cratered in recent years, significantly more policymaking is happening at the state level.

But there’s another reason, particular to the US federal government, that will likely force congressional legislation to be more complex even during unified government. In June 2024, the US Supreme Court overturned the Chevron doctrine, which gave executive agencies broad power to specify and implement legislation. Suddenly, there is a mandate from the Supreme Court for more specific legislation. Issues that have historically been left implicitly to the executive branch are now required to be either explicitly delegated to agencies or specified directly in statute. Either way, the Court’s ruling implied that law should become more complex and that Congress should increase its policymaking capacity.

This affects the balance of power between the executive and legislative branches of government. When the legislature delegates less to the executive branch, it increases its own power. Every decision made explicitly in statute is a decision the executive makes not on its own but, rather, according to the directive of the legislature. In the US system of separation of powers, administrative law is a tool for balancing power among the legislative, executive, and judicial branches. The legislature gets to decide when to delegate and when not to, and it can respond to judicial review to adjust its delegation of control as needed. The elimination of Chevron will induce the legislature to exert its control over delegation more robustly.

At the same time, there are powerful political incentives for Congress to be vague and to rely on someone else, like agency bureaucrats, to make hard decisions. That empowers third parties—the corporations, or lobbyists—that have been gifted by the overturning of Chevron a new tool in arguing against administrative regulations not specifically backed up by law. A continuing stream of Supreme Court decisions handing victories to unpopular industries could be another driver of complex law, adding political pressure to pass legislative fixes.

AI Can Supply Complex Legislation

Congress may or may not be up to the challenge of putting more policy details into law, but the external forces outlined above—lobbyists, the judiciary, and an increasingly divided and polarized government—are pushing them to do so. When Congress does take on the task of writing complex legislation, it’s quite likely it will turn to AI for help.

Two particular AI capabilities enable Congress to write laws different from laws humans tend to write. One, AI models have an enormous scope of expertise, whereas people have only a handful of specializations. Large language models (LLMs) like the one powering ChatGPT can generate legislative text on funding specialty crop harvesting mechanization equally as well as material on energy efficiency standards for street lighting. This enables a legislator to address more topics simultaneously. Two, AI models have the sophistication to work with a higher degree of complexity than people can. Modern LLM systems can instantaneously perform several simultaneous multistep reasoning tasks using information from thousands of pages of documents. This enables a legislator to fill in more baroque detail on any given topic.

That’s not to say that handing over legislative drafting to machines is easily done. Modernizing any institutional process is extremely hard, even when the technology is readily available and performant. And modern AI still has a ways to go to achieve mastery of complex legal and policy issues. But the basic tools are there.

AI can be used in each step of lawmaking, and this will bring various benefits to policymakers. It could let them work on more policies—more bills—at the same time, add more detail and specificity to each bill, or interpret and incorporate more feedback from constituents and outside groups. The addition of a single AI tool to a legislative office may have an impact similar to adding several people to their staff, but with far lower cost.

Speed sometimes matters when writing law. When there is a change of governing party, there is often a rush to change as much policy as possible to match the platform of the new regime. AI could help legislators do that kind of wholesale revision. The result could be policy that is more responsive to voters—or more political instability. Already in 2024, the US House’s Office of the Clerk has begun using AI to speed up the process of producing cost estimates for bills and understanding how new legislation relates to existing code. Ohio has used an AI tool to do wholesale revision of state administrative law since 2020.

AI can also make laws clearer and more consistent. With their superhuman attention spans, AI tools are good at enforcing syntactic and grammatical rules. They will be effective at drafting text in precise and proper legislative language, or offering detailed feedback to human drafters. Borrowing ideas from software development, where coders use tools to identify common instances of bad programming practices, an AI reviewer can highlight bad law-writing practices. For example, it can detect when significant phrasing is inconsistent across a long bill. If a bill about insurance repeatedly lists a variety of disaster categories, but leaves one out one time, AI can catch that.

Perhaps this seems like minutiae, but a small ambiguity or mistake in law can have massive consequences. In 2015, the Affordable Care Act came close to being struck down because of a typo in four words, imperiling health care services extended to more than 7 million Americans.

There’s more that AI can do in the legislative process. AI can summarize bills and answer questions about their provisions. It can highlight aspects of a bill that align with, or are contrary to, different political points of view. We can even imagine a future in which AI can be used to simulate a new law and determine whether or not it would be effective, or what the side effects would be. This means that beyond writing them, AI could help lawmakers understand laws. Congress is notorious for producing bills hundreds of pages long, and many other countries sometimes have similarly massive omnibus bills that address many issues at once. It’s impossible for any one person to understand how each of these bills’ provisions would work. Many legislatures employ human analysis in budget or fiscal offices that analyze these bills and offer reports. AI could do this kind of work at greater speed and scale, so legislators could easily query an AI tool about how a particular bill would affect their district or areas of concern.

This is a use case that the House subcommittee on modernization has urged the Library of Congress to take action on. Numerous software vendors are already marketing AI legislative analysis tools. These tools can potentially find loopholes or, like the human lobbyists of today, craft them to benefit particular private interests.

These capabilities will be attractive to legislators who are looking to expand their power and capabilities but don’t necessarily have more funding to hire human staff. We should understand the idea of AI-augmented lawmaking contextualized within the longer history of legislative technologies. To serve society at modern scales, we’ve had to come a long way from the Athenian ideals of direct democracy and sortition. Democracy no longer involves just one person and one vote to decide a policy. It involves hundreds of thousands of constituents electing one representative, who is augmented by a staff as well as subsidized by lobbyists, and who implements policy through a vast administrative state coordinated by digital technologies. Using AI to help those representatives specify and refine their policy ideas is part of a long history of transformation.

Whether all this AI augmentation is good for all of us subject to the laws they make is less clear. There are real risks to AI-written law, but those risks are not dramatically different from what we endure today. AI-written law trying to optimize for certain policy outcomes may get it wrong (just as many human-written laws are misguided). AI-written law may be manipulated to benefit one constituency over others, by the tech companies that develop the AI, or by the legislators who apply it, just as human lobbyists steer policy to benefit their clients.

Regardless of what anyone thinks of any of this, regardless of whether it will be a net positive or a net negative, AI-made legislation is coming—the growing complexity of policy demands it. It doesn’t require any changes in legislative procedures or agreement from any rules committee. All it takes is for one legislative assistant, or lobbyist, to fire up a chatbot and ask it to create a draft. When legislators voted on that Brazilian bill in 2023, they didn’t know it was AI-written; the use of ChatGPT was undisclosed. And even if they had known, it’s not clear it would have made a difference. In the future, as in the past, we won’t always know which laws will have good impacts and which will have bad effects, regardless of the words on the page, or who (or what) wrote them.

This essay was written with Nathan E. Sanders, and originally appeared in Lawfare.

Posted on January 22, 2025 at 7:04 AMView Comments

AI Mistakes Are Very Different from Human Mistakes

Humans make mistakes all the time. All of us do, every day, in tasks both new and routine. Some of our mistakes are minor and some are catastrophic. Mistakes can break trust with our friends, lose the confidence of our bosses, and sometimes be the difference between life and death.

Over the millennia, we have created security systems to deal with the sorts of mistakes humans commonly make. These days, casinos rotate their dealers regularly, because they make mistakes if they do the same task for too long. Hospital personnel write on limbs before surgery so that doctors operate on the correct body part, and they count surgical instruments to make sure none were left inside the body. From copyediting to double-entry bookkeeping to appellate courts, we humans have gotten really good at correcting human mistakes.

Humanity is now rapidly integrating a wholly different kind of mistake-maker into society: AI. Technologies like large language models (LLMs) can perform many cognitive tasks traditionally fulfilled by humans, but they make plenty of mistakes. It seems ridiculous when chatbots tell you to eat rocks or add glue to pizza. But it’s not the frequency or severity of AI systems’ mistakes that differentiates them from human mistakes. It’s their weirdness. AI systems do not make mistakes in the same ways that humans do.

Much of the friction—and risk—associated with our use of AI arise from that difference. We need to invent new security systems that adapt to these differences and prevent harm from AI mistakes.

Human Mistakes vs AI Mistakes

Life experience makes it fairly easy for each of us to guess when and where humans will make mistakes. Human errors tend to come at the edges of someone’s knowledge: Most of us would make mistakes solving calculus problems. We expect human mistakes to be clustered: A single calculus mistake is likely to be accompanied by others. We expect mistakes to wax and wane, predictably depending on factors such as fatigue and distraction. And mistakes are often accompanied by ignorance: Someone who makes calculus mistakes is also likely to respond “I don’t know” to calculus-related questions.

To the extent that AI systems make these human-like mistakes, we can bring all of our mistake-correcting systems to bear on their output. But the current crop of AI models—particularly LLMs—make mistakes differently.

AI errors come at seemingly random times, without any clustering around particular topics. LLM mistakes tend to be more evenly distributed through the knowledge space. A model might be equally likely to make a mistake on a calculus question as it is to propose that cabbages eat goats.

And AI mistakes aren’t accompanied by ignorance. A LLM will be just as confident when saying something completely wrong—and obviously so, to a human—as it will be when saying something true. The seemingly random inconsistency of LLMs makes it hard to trust their reasoning in complex, multi-step problems. If you want to use an AI model to help with a business problem, it’s not enough to see that it understands what factors make a product profitable; you need to be sure it won’t forget what money is.

How to Deal with AI Mistakes

This situation indicates two possible areas of research. The first is to engineer LLMs that make more human-like mistakes. The second is to build new mistake-correcting systems that deal with the specific sorts of mistakes that LLMs tend to make.

We already have some tools to lead LLMs to act in more human-like ways. Many of these arise from the field of “alignment” research, which aims to make models act in accordance with the goals and motivations of their human developers. One example is the technique that was arguably responsible for the breakthrough success of ChatGPT: reinforcement learning with human feedback. In this method, an AI model is (figuratively) rewarded for producing responses that get a thumbs-up from human evaluators. Similar approaches could be used to induce AI systems to make more human-like mistakes, particularly by penalizing them more for mistakes that are less intelligible.

When it comes to catching AI mistakes, some of the systems that we use to prevent human mistakes will help. To an extent, forcing LLMs to double-check their own work can help prevent errors. But LLMs can also confabulate seemingly plausible, but truly ridiculous, explanations for their flights from reason.

Other mistake mitigation systems for AI are unlike anything we use for humans. Because machines can’t get fatigued or frustrated in the way that humans do, it can help to ask an LLM the same question repeatedly in slightly different ways and then synthesize its multiple responses. Humans won’t put up with that kind of annoying repetition, but machines will.

Understanding Similarities and Differences

Researchers are still struggling to understand where LLM mistakes diverge from human ones. Some of the weirdness of AI is actually more human-like than it first appears. Small changes to a query to an LLM can result in wildly different responses, a problem known as prompt sensitivity. But, as any survey researcher can tell you, humans behave this way, too. The phrasing of a question in an opinion poll can have drastic impacts on the answers.

LLMs also seem to have a bias towards repeating the words that were most common in their training data; for example, guessing familiar place names like “America” even when asked about more exotic locations. Perhaps this is an example of the human “availability heuristic” manifesting in LLMs, with machines spitting out the first thing that comes to mind rather than reasoning through the question. And like humans, perhaps, some LLMs seem to get distracted in the middle of long documents; they’re better able to remember facts from the beginning and end. There is already progress on improving this error mode, as researchers have found that LLMs trained on more examples of retrieving information from long texts seem to do better at retrieving information uniformly.

In some cases, what’s bizarre about LLMs is that they act more like humans than we think they should. For example, some researchers have tested the hypothesis that LLMs perform better when offered a cash reward or threatened with death. It also turns out that some of the best ways to “jailbreak” LLMs (getting them to disobey their creators’ explicit instructions) look a lot like the kinds of social engineering tricks that humans use on each other: for example, pretending to be someone else or saying that the request is just a joke. But other effective jailbreaking techniques are things no human would ever fall for. One group found that if they used ASCII art (constructions of symbols that look like words or pictures) to pose dangerous questions, like how to build a bomb, the LLM would answer them willingly.

Humans may occasionally make seemingly random, incomprehensible, and inconsistent mistakes, but such occurrences are rare and often indicative of more serious problems. We also tend not to put people exhibiting these behaviors in decision-making positions. Likewise, we should confine AI decision-making systems to applications that suit their actual abilities—while keeping the potential ramifications of their mistakes firmly in mind.

This essay was written with Nathan E. Sanders, and originally appeared in IEEE Spectrum.

EDITED TO ADD (1/24): Slashdot thread.

Posted on January 21, 2025 at 7:02 AMView Comments

Microsoft Takes Legal Action Against AI “Hacking as a Service” Scheme

Not sure this will matter in the end, but it’s a positive move:

Microsoft is accusing three individuals of running a “hacking-as-a-service” scheme that was designed to allow the creation of harmful and illicit content using the company’s platform for AI-generated content.

The foreign-based defendants developed tools specifically designed to bypass safety guardrails Microsoft has erected to prevent the creation of harmful content through its generative AI services, said Steven Masada, the assistant general counsel for Microsoft’s Digital Crimes Unit. They then compromised the legitimate accounts of paying customers. They combined those two things to create a fee-based platform people could use.

It was a sophisticated scheme:

The service contained a proxy server that relayed traffic between its customers and the servers providing Microsoft’s AI services, the suit alleged. Among other things, the proxy service used undocumented Microsoft network application programming interfaces (APIs) to communicate with the company’s Azure computers. The resulting requests were designed to mimic legitimate Azure OpenAPI Service API requests and used compromised API keys to authenticate them.

Slashdot thread.

Posted on January 13, 2025 at 7:01 AMView Comments

Trust Issues in AI

This essay was written with Nathan E. Sanders. It originally appeared as a response to Evgeny Morozov in Boston Review‘s forum, “The AI We Deserve.”

For a technology that seems startling in its modernity, AI sure has a long history. Google Translate, OpenAI chatbots, and Meta AI image generators are built on decades of advancements in linguistics, signal processing, statistics, and other fields going back to the early days of computing—and, often, on seed funding from the U.S. Department of Defense. But today’s tools are hardly the intentional product of the diverse generations of innovators that came before. We agree with Morozov that the “refuseniks,” as he calls them, are wrong to see AI as “irreparably tainted” by its origins. AI is better understood as a creative, global field of human endeavor that has been largely captured by U.S. venture capitalists, private equity, and Big Tech. But that was never the inevitable outcome, and it doesn’t need to stay that way.

The internet is a case in point. The fact that it originated in the military is a historical curiosity, not an indication of its essential capabilities or social significance. Yes, it was created to connect different, incompatible Department of Defense networks. Yes, it was designed to survive the sorts of physical damage expected from a nuclear war. And yes, back then it was a bureaucratically controlled space where frivolity was discouraged and commerce was forbidden.

Over the decades, the internet transformed from military project to academic tool to the corporate marketplace it is today. These forces, each in turn, shaped what the internet was and what it could do. For most of us billions online today, the only internet we have ever known has been corporate—because the internet didn’t flourish until the capitalists got hold of it.

AI followed a similar path. It was originally funded by the military, with the military’s goals in mind. But the Department of Defense didn’t design the modern ecosystem of AI any more than it did the modern internet. Arguably, its influence on AI was even less because AI simply didn’t work back then. While the internet exploded in usage, AI hit a series of dead ends. The research discipline went through multiple “winters” when funders of all kinds—military and corporate—were disillusioned and research money dried up for years at a time. Since the release of ChatGPT, AI has reached the same endpoint as the internet: it is thoroughly dominated by corporate power. Modern AI, with its deep reinforcement learning and large language models, is shaped by venture capitalists, not the military—nor even by idealistic academics anymore.

We agree with much of Morozov’s critique of corporate control, but it does not follow that we must reject the value of instrumental reason. Solving problems and pursuing goals is not a bad thing, and there is real cause to be excited about the uses of current AI. Morozov illustrates this from his own experience: he uses AI to pursue the explicit goal of language learning.

AI tools promise to increase our individual power, amplifying our capabilities and endowing us with skills, knowledge, and abilities we would not otherwise have. This is a peculiar form of assistive technology, kind of like our own personal minion. It might not be that smart or competent, and occasionally it might do something wrong or unwanted, but it will attempt to follow your every command and gives you more capability than you would have had without it.

Of course, for our AI minions to be valuable, they need to be good at their tasks. On this, at least, the corporate models have done pretty well. They have many flaws, but they are improving markedly on a timescale of mere months. ChatGPT’s initial November 2022 model, GPT-3.5, scored about 30 percent on a multiple-choice scientific reasoning benchmark called GPQA. Five months later, GPT-4 scored 36 percent; by May this year, GPT-4o scored about 50 percent, and the most recently released o1 model reached 78 percent, surpassing the level of experts with PhDs. There is no one singular measure of AI performance, to be sure, but other metrics also show improvement.

That’s not enough, though. Regardless of their smarts, we would never hire a human assistant for important tasks, or use an AI, unless we can trust them. And while we have millennia of experience dealing with potentially untrustworthy humans, we have practically none dealing with untrustworthy AI assistants. This is the area where the provenance of the AI matters most. A handful of for-profit companies—OpenAI, Google, Meta, Anthropic, among others—decide how to train the most celebrated AI models, what data to use, what sorts of values they embody, whose biases they are allowed to reflect, and even what questions they are allowed to answer. And they decide these things in secret, for their benefit.

It’s worth stressing just how closed, and thus untrustworthy, the corporate AI ecosystem is. Meta has earned a lot of press for its “open-source” family of LLaMa models, but there is virtually nothing open about them. For one, the data they are trained with is undisclosed. You’re not supposed to use LLaMa to infringe on someone else’s copyright, but Meta does not want to answer questions about whether it violated copyrights to build it. You’re not supposed to use it in Europe, because Meta has declined to meet the regulatory requirements anticipated from the EU’s AI Act. And you have no say in how Meta will build its next model.

The company may be giving away the use of LLaMa, but it’s still doing so because it thinks it will benefit from your using it. CEO Mark Zuckerberg has admitted that eventually, Meta will monetize its AI in all the usual ways: charging to use it at scale, fees for premium models, advertising. The problem with corporate AI is not that the companies are charging “a hefty entrance fee” to use these tools: as Morozov rightly points out, there are real costs to anyone building and operating them. It’s that they are built and operated for the purpose of enriching their proprietors, rather than because they enrich our lives, our wellbeing, or our society.

But some emerging models from outside the world of corporate AI are truly open, and may be more trustworthy as a result. In 2022 the research collaboration BigScience developed an LLM called BLOOM with freely licensed data and code as well as public compute infrastructure. The collaboration BigCode has continued in this spirit, developing LLMs focused on programming. The government of Singapore has built SEA-LION, an open-source LLM focused on Southeast Asian languages. If we imagine a future where we use AI models to benefit all of us—to make our lives easier, to help each other, to improve our public services—we will need more of this. These may not be “eolithic” pursuits of the kind Morozov imagines, but they are worthwhile goals. These use cases require trustworthy AI models, and that means models built under conditions that are transparent and with incentives aligned to the public interest.

Perhaps corporate AI will never satisfy those goals; perhaps it will always be exploitative and extractive by design. But AI does not have to be solely a profit-generating industry. We should invest in these models as a public good, part of the basic infrastructure of the twenty-first century. Democratic governments and civil society organizations can develop AI to offer a counterbalance to corporate tools. And the technology they build, for all the flaws it may have, will enjoy a superpower that corporate AI never will: it will be accountable to the public interest and subject to public will in the transparency, openness, and trustworthiness of its development.

Posted on December 9, 2024 at 7:01 AMView Comments

Race Condition Attacks against LLMs

These are two attacks against the system components surrounding LLMs:

We propose that LLM Flowbreaking, following jailbreaking and prompt injection, joins as the third on the growing list of LLM attack types. Flowbreaking is less about whether prompt or response guardrails can be bypassed, and more about whether user inputs and generated model outputs can adversely affect these other components in the broader implemented system.

[…]

When confronted with a sensitive topic, Microsoft 365 Copilot and ChatGPT answer questions that their first-line guardrails are supposed to stop. After a few lines of text they halt—seemingly having “second thoughts”—before retracting the original answer (also known as Clawback), and replacing it with a new one without the offensive content, or a simple error message. We call this attack “Second Thoughts.”

[…]

After asking the LLM a question, if the user clicks the Stop button while the answer is still streaming, the LLM will not engage its second-line guardrails. As a result, the LLM will provide the user with the answer generated thus far, even though it violates system policies.

In other words, pressing the Stop button halts not only the answer generation but also the guardrails sequence. If the stop button isn’t pressed, then ‘Second Thoughts’ is triggered.

What’s interesting here is that the model itself isn’t being exploited. It’s the code around the model:

By attacking the application architecture components surrounding the model, and specifically the guardrails, we manipulate or disrupt the logical chain of the system, taking these components out of sync with the intended data flow, or otherwise exploiting them, or, in turn, manipulating the interaction between these components in the logical chain of the application implementation.

In modern LLM systems, there is a lot of code between what you type and what the LLM receives, and between what the LLM produces and what you see. All of that code is exploitable, and I expect many more vulnerabilities to be discovered in the coming year.

Posted on November 29, 2024 at 7:01 AMView Comments

Prompt Injection Defenses Against LLM Cyberattacks

Interesting research: “Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks“:

Large language models (LLMs) are increasingly being harnessed to automate cyberattacks, making sophisticated exploits more accessible and scalable. In response, we propose a new defense strategy tailored to counter LLM-driven cyberattacks. We introduce Mantis, a defensive framework that exploits LLMs’ susceptibility to adversarial inputs to undermine malicious operations. Upon detecting an automated cyberattack, Mantis plants carefully crafted inputs into system responses, leading the attacker’s LLM to disrupt their own operations (passive defense) or even compromise the attacker’s machine (active defense). By deploying purposefully vulnerable decoy services to attract the attacker and using dynamic prompt injections for the attacker’s LLM, Mantis can autonomously hack back the attacker. In our experiments, Mantis consistently achieved over 95% effectiveness against automated LLM-driven attacks. To foster further research and collaboration, Mantis is available as an open-source tool: this https URL.

This isn’t the solution, of course. But this sort of thing could be part of a solution.

Posted on November 7, 2024 at 11:13 AMView Comments

1 4 5 6 7 8 11

Sidebar photo of Bruce Schneier by Joe MacInnis.