Entries Tagged "artificial intelligence"

Page 1 of 4

Adversarial ML Attack that Secretly Gives a Language Model a Point of View

Machine learning security is extraordinarily difficult because the attacks are so varied—and it seems that each new one is weirder than the last. Here’s the latest: a training-time attack that forces the model to exhibit a point of view: Spinning Language Models: Risks of Propaganda-As-A-Service and Countermeasures.”

Abstract: We investigate a new threat to neural sequence-to-sequence (seq2seq) models: training-time attacks that cause models to “spin” their outputs so as to support an adversary-chosen sentiment or point of view—but only when the input contains adversary-chosen trigger words. For example, a spinned summarization model outputs positive summaries of any text that mentions the name of some individual or organization.

Model spinning introduces a “meta-backdoor” into a model. Whereas conventional backdoors cause models to produce incorrect outputs on inputs with the trigger, outputs of spinned models preserve context and maintain standard accuracy metrics, yet also satisfy a meta-task chosen by the adversary.

Model spinning enables propaganda-as-a-service, where propaganda is defined as biased speech. An adversary can create customized language models that produce desired spins for chosen triggers, then deploy these models to generate disinformation (a platform attack), or else inject them into ML training pipelines (a supply-chain attack), transferring malicious functionality to downstream models trained by victims.

To demonstrate the feasibility of model spinning, we develop a new backdooring technique. It stacks an adversarial meta-task onto a seq2seq model, backpropagates the desired meta-task output to points in the word-embedding space we call “pseudo-words,” and uses pseudo-words to shift the entire output distribution of the seq2seq model. We evaluate this attack on language generation, summarization, and translation models with different triggers and meta-tasks such as sentiment, toxicity, and entailment. Spinned models largely maintain their accuracy metrics (ROUGE and BLEU) while shifting their outputs to satisfy the adversary’s meta-task. We also show that, in the case of a supply-chain attack, the spin functionality transfers to downstream models.

This new attack dovetails with something I’ve been worried about for a while, something Latanya Sweeney has dubbed “persona bots.” This is what I wrote in my upcoming book (to be published in February):

One example of an extension of this technology is the “persona bot,” an AI posing as an individual on social media and other online groups. Persona bots have histories, personalities, and communication styles. They don’t constantly spew propaganda. They hang out in various interest groups: gardening, knitting, model railroading, whatever. They act as normal members of those communities, posting and commenting and discussing. Systems like GPT-3 will make it easy for those AIs to mine previous conversations and related Internet content and to appear knowledgeable. Then, once in a while, the AI might post something relevant to a political issue, maybe an article about a healthcare worker having an allergic reaction to the COVID-19 vaccine, with worried commentary. Or maybe it might offer its developer’s opinions about a recent election, or racial justice, or any other polarizing subject. One persona bot can’t move public opinion, but what if there were thousands of them? Millions?

These are chatbots on a very small scale. They would participate in small forums around the Internet: hobbyist groups, book groups, whatever. In general they would behave normally, participating in discussions like a person does. But occasionally they would say something partisan or political, depending on the desires of their owners. Because they’re all unique and only occasional, it would be hard for existing bot detection techniques to find them. And because they can be replicated by the millions across social media, they could have a greater effect. They would affect what we think, and—just as importantly—what we think others think. What we will see as robust political discussions would be persona bots arguing with other persona bots.

Attacks like these add another wrinkle to that sort of scenario.

Posted on October 21, 2022 at 6:53 AMView Comments

Detecting Deepfake Audio by Modeling the Human Acoustic Tract

This is interesting research:

In this paper, we develop a new mechanism for detecting audio deepfakes using techniques from the field of articulatory phonetics. Specifically, we apply fluid dynamics to estimate the arrangement of the human vocal tract during speech generation and show that deepfakes often model impossible or highly-unlikely anatomical arrangements. When parameterized to achieve 99.9% precision, our detection mechanism achieves a recall of 99.5%, correctly identifying all but one deepfake sample in our dataset.

From an article by two of the researchers:

The first step in differentiating speech produced by humans from speech generated by deepfakes is understanding how to acoustically model the vocal tract. Luckily scientists have techniques to estimate what someone—or some being such as a dinosaur—would sound like based on anatomical measurements of its vocal tract.

We did the reverse. By inverting many of these same techniques, we were able to extract an approximation of a speaker’s vocal tract during a segment of speech. This allowed us to effectively peer into the anatomy of the speaker who created the audio sample.

From here, we hypothesized that deepfake audio samples would fail to be constrained by the same anatomical limitations humans have. In other words, the analysis of deepfaked audio samples simulated vocal tract shapes that do not exist in people.

Our testing results not only confirmed our hypothesis but revealed something interesting. When extracting vocal tract estimations from deepfake audio, we found that the estimations were often comically incorrect. For instance, it was common for deepfake audio to result in vocal tracts with the same relative diameter and consistency as a drinking straw, in contrast to human vocal tracts, which are much wider and more variable in shape.

This is, of course, not the last word. Deepfake generators will figure out how to use these techniques to create harder-to-detect fake voices. And the deepfake detectors will figure out another, better, detection technique. And the arms race will continue.

Slashdot thread.

Posted on October 3, 2022 at 6:25 AMView Comments

Using AI to Scale Spear Phishing

The problem with spear phishing is that it takes time and creativity to create individualized enticing phishing emails. Researchers are using GPT-3 to attempt to solve that problem:

The researchers used OpenAI’s GPT-3 platform in conjunction with other AI-as-a-service products focused on personality analysis to generate phishing emails tailored to their colleagues’ backgrounds and traits. Machine learning focused on personality analysis aims to be predict a person’s proclivities and mentality based on behavioral inputs. By running the outputs through multiple services, the researchers were able to develop a pipeline that groomed and refined the emails before sending them out. They say that the results sounded “weirdly human” and that the platforms automatically supplied surprising specifics, like mentioning a Singaporean law when instructed to generate content for people living in Singapore.

While they were impressed by the quality of the synthetic messages and how many clicks they garnered from colleagues versus the human-composed ones, the researchers note that the experiment was just a first step. The sample size was relatively small and the target pool was fairly homogenous in terms of employment and geographic region. Plus, both the human-generated messages and those generated by the AI-as-a-service pipeline were created by office insiders rather than outside attackers trying to strike the right tone from afar.

It’s just a matter of time before this is really effective. Combine it with voice and video synthesis, and you have some pretty scary scenarios. The real risk isn’t that AI-generated phishing emails are as good as human-generated ones, it’s that they can be generated at much greater scale.

Defcon presentation and slides. Another news article

Posted on August 13, 2021 at 6:16 AMView Comments

Hiding Malware in ML Models

Interesting research: “EvilModel: Hiding Malware Inside of Neural Network Models.”

Abstract: Delivering malware covertly and detection-evadingly is critical to advanced malware campaigns. In this paper, we present a method that delivers malware covertly and detection-evadingly through neural network models. Neural network models are poorly explainable and have a good generalization ability. By embedding malware into the neurons, malware can be delivered covertly with minor or even no impact on the performance of neural networks. Meanwhile, since the structure of the neural network models remains unchanged, they can pass the security scan of antivirus engines. Experiments show that 36.9MB of malware can be embedded into a 178MB-AlexNet model within 1% accuracy loss, and no suspicious are raised by antivirus engines in VirusTotal, which verifies the feasibility of this method. With the widespread application of artificial intelligence, utilizing neural networks becomes a forwarding trend of malware. We hope this work could provide a referenceable scenario for the defense on neural network-assisted attacks.

News article.

Posted on July 27, 2021 at 6:25 AMView Comments

AI-Piloted Fighter Jets

News from Georgetown’s Center for Security and Emerging Technology:

China Claims Its AI Can Beat Human Pilots in Battle: Chinese state media reported that an AI system had successfully defeated human pilots during simulated dogfights. According to the Global Times report, the system had shot down several PLA pilots during a handful of virtual exercises in recent years. Observers outside China noted that while reports coming out of state-controlled media outlets should be taken with a grain of salt, the capabilities described in the report are not outside the realm of possibility. Last year, for example, an AI agent defeated a U.S. Air Force F-16 pilot five times out of five as part of DARPA’s AlphaDogfight Trial (which we covered at the time). While the Global Times report indicated plans to incorporate AI into future fighter planes, it is not clear how far away the system is from real-world testing. At the moment, the system appears to be used only for training human pilots. DARPA, for its part, is aiming to test dogfights with AI-piloted subscale jets later this year and with full-scale jets in 2023 and 2024.

Posted on June 25, 2021 at 8:53 AMView Comments

AIs and Fake Comments

This month, the New York state attorney general issued a report on a scheme by “U.S. Companies and Partisans [to] Hack Democracy.” This wasn’t another attempt by Republicans to make it harder for Black people and urban residents to vote. It was a concerted attack on another core element of US democracy ­—the ability of citizens to express their voice to their political representatives. And it was carried out by generating millions of fake comments and fake emails purporting to come from real citizens.

This attack was detected because it was relatively crude. But artificial intelligence technologies are making it possible to generate genuine-seeming comments at scale, drowning out the voices of real citizens in a tidal wave of fake ones.

As political scientists like Paul Pierson have pointed out, what happens between elections is important to democracy. Politicians shape policies and they make laws. And citizens can approve or condemn what politicians are doing, through contacting their representatives or commenting on proposed rules.

That’s what should happen. But as the New York report shows, it often doesn’t. The big telecommunications companies paid millions of dollars to specialist “AstroTurf” companies to generate public comments. These companies then stole people’s names and email addresses from old files and from hacked data dumps and attached them to 8.5 million public comments and half a million letters to members of Congress. All of them said that they supported the corporations’ position on something called “net neutrality,” the idea that telecommunications companies must treat all Internet content equally and not prioritize any company or service. Three AstroTurf companies—Fluent, Opt-Intelligence and React2Media ­—agreed to pay nearly $4 million in fines.

The fakes were crude. Many of them were identical, while others were patchworks of simple textual variations: substituting “Federal Communications Commission” and “FCC” for each other, for example.

Next time, though, we won’t be so lucky. New technologies are about to make it far easier to generate enormous numbers of convincing personalized comments and letters, each with its own word choices, expressive style and pithy examples. The people who create fake grass-roots organizations have always been enthusiastic early adopters of technology, weaponizing letters, faxes, emails and Web comments to manufacture the appearance of public support or public outrage.

Take Generative Pre-trained Transformer 3, or GPT-3, an AI model created by OpenAI, a San Francisco based start-up. With minimal prompting, GPT-3 can generate convincing seeming newspaper articles, résumé cover letters, even Harry Potter fan fiction in the style of Ernest Hemingway. It is trivially easy to use these techniques to compose large numbers of public comments or letters to lawmakers.

OpenAI restricts access to GPT-3, but in a recent experiment, researchers used a different text-generation program to submit 1,000 comments in response to a government request for public input on a Medicaid issue. They all sounded unique, like real people advocating a specific policy position. They fooled the Medicaid.gov administrators, who accepted them as genuine concerns from actual human beings. The researchers subsequently identified the comments and asked for them to be removed, so that no actual policy debate would be unfairly biased. Others won’t be so ethical.

When the floodgates open, democratic speech is in danger of drowning beneath a tide of fake letters and comments, tweets and Facebook posts. The danger isn’t just that fake support can be generated for unpopular positions, as happened with net neutrality. It is that public commentary will be completely discredited. This would be bad news for specialist AstroTurf companies, which would have no business model if there isn’t a public that they can pretend to be representing. But it would empower still further other kinds of lobbyists, who at least can prove that they are who they say they are.

We may have a brief window to shore up the flood walls. The most effective response would be to regulate what UCLA sociologist Edward Walker has described as the “grassroots for hire” industry. Organizations that deliberately fabricate citizen voices shouldn’t just be subject to civil fines, but to criminal penalties. Businesses that hire these organizations should be held liable for failures of oversight. It’s impossible to prove or disprove whether telecommunications companies knew their subcontractors would create bogus citizen voices, but a liability standard would at least give such companies an incentive to find out. This is likely to be politically difficult to put in place, though, since so many powerful actors benefit from the status quo.

This essay was written with Henry Farrell, and previously appeared in the Washington Post.

EDITED TO ADD: CSET published an excellent report on AI-generated partisan content. Short summary: it’s pretty good, and will continue to get better. Renee DeRista has also written about this.

This paper is about a lower-tech version of this threat. Also this.

EDITED TO ADD: Another essay on the same topic.

Posted on May 24, 2021 at 6:20 AMView Comments

When AIs Start Hacking

If you don’t have enough to worry about already, consider a world where AIs are hackers.

Hacking is as old as humanity. We are creative problem solvers. We exploit loopholes, manipulate systems, and strive for more influence, power, and wealth. To date, hacking has exclusively been a human activity. Not for long.

As I lay out in a report I just published, artificial intelligence will eventually find vulnerabilities in all sorts of social, economic, and political systems, and then exploit them at unprecedented speed, scale, and scope. After hacking humanity, AI systems will then hack other AI systems, and humans will be little more than collateral damage.

Okay, maybe this is a bit of hyperbole, but it requires no far-future science fiction technology. I’m not postulating an AI “singularity,” where the AI-learning feedback loop becomes so fast that it outstrips human understanding. I’m not assuming intelligent androids. I’m not assuming evil intent. Most of these hacks don’t even require major research breakthroughs in AI. They’re already happening. As AI gets more sophisticated, though, we often won’t even know it’s happening.

AIs don’t solve problems like humans do. They look at more types of solutions than us. They’ll go down complex paths that we haven’t considered. This can be an issue because of something called the explainability problem. Modern AI systems are essentially black boxes. Data goes in one end, and an answer comes out the other. It can be impossible to understand how the system reached its conclusion, even if you’re a programmer looking at the code.

In 2015, a research group fed an AI system called Deep Patient health and medical data from some 700,000 people, and tested whether it could predict diseases. It could, but Deep Patient provides no explanation for the basis of a diagnosis, and the researchers have no idea how it comes to its conclusions. A doctor either can either trust or ignore the computer, but that trust will remain blind.

While researchers are working on AI that can explain itself, there seems to be a trade-off between capability and explainability. Explanations are a cognitive shorthand used by humans, suited for the way humans make decisions. Forcing an AI to produce explanations might be an additional constraint that could affect the quality of its decisions. For now, AI is becoming more and more opaque and less explainable.

Separately, AIs can engage in something called reward hacking. Because AIs don’t solve problems in the same way people do, they will invariably stumble on solutions we humans might never have anticipated­—and some will subvert the intent of the system. That’s because AIs don’t think in terms of the implications, context, norms, and values we humans share and take for granted. This reward hacking involves achieving a goal but in a way the AI’s designers neither wanted nor intended.

Take a soccer simulation where an AI figured out that if it kicked the ball out of bounds, the goalie would have to throw the ball in and leave the goal undefended. Or another simulation, where an AI figured out that instead of running, it could make itself tall enough to cross a distant finish line by falling over it. Or the robot vacuum cleaner that instead of learning to not bump into things, it learned to drive backwards, where there were no sensors telling it it was bumping into things. If there are problems, inconsistencies, or loopholes in the rules, and if those properties lead to an acceptable solution as defined by the rules, then AIs will find these hacks.

We learned about this hacking problem as children with the story of King Midas. When the god Dionysus grants him a wish, Midas asks that everything he touches turns to gold. He ends up starving and miserable when his food, drink, and daughter all turn to gold. It’s a specification problem: Midas programmed the wrong goal into the system.

Genies are very precise about the wording of wishes, and can be maliciously pedantic. We know this, but there’s still no way to outsmart the genie. Whatever you wish for, he will always be able to grant it in a way you wish he hadn’t. He will hack your wish. Goals and desires are always underspecified in human language and thought. We never describe all the options, or include all the applicable caveats, exceptions, and provisos. Any goal we specify will necessarily be incomplete.

While humans most often implicitly understand context and usually act in good faith, we can’t completely specify goals to an AI. And AIs won’t be able to completely understand context.

In 2015, Volkswagen was caught cheating on emissions control tests. This wasn’t AI—human engineers programmed a regular computer to cheat—but it illustrates the problem. They programmed their engine to detect emissions control testing, and to behave differently. Their cheat remained undetected for years.

If I asked you to design a car’s engine control software to maximize performance while still passing emissions control tests, you wouldn’t design the software to cheat without understanding that you were cheating. This simply isn’t true for an AI. It will think “out of the box” simply because it won’t have a conception of the box. It won’t understand that the Volkswagen solution harms others, undermines the intent of the emissions control tests, and is breaking the law. Unless the programmers specify the goal of not behaving differently when being tested, an AI might come up with the same hack. The programmers will be satisfied, the accountants ecstatic. And because of the explainability problem, no one will realize what the AI did. And yes, knowing the Volkswagen story, we can explicitly set the goal to avoid that particular hack. But the lesson of the genie is that there will always be unanticipated hacks.

How realistic is AI hacking in the real world? The feasibility of an AI inventing a new hack depends a lot on the specific system being modeled. For an AI to even start on optimizing a problem, let alone hacking a completely novel solution, all of the rules of the environment must be formalized in a way the computer can understand. Goals—known in AI as objective functions—need to be established. And the AI needs some sort of feedback on how well it’s doing so that it can improve.

Sometimes this is simple. In chess, the rules, objective, and feedback—did you win or lose?—are all precisely specified. And there’s no context to know outside of those things that would muddy the waters. This is why most of the current examples of goal and reward hacking come from simulated environments. These are artificial and constrained, with all of the rules specified to the AI. The inherent ambiguity in most other systems ends up being a near-term security defense against AI hacking.

Where this gets interesting are systems that are well specified and almost entirely digital. Think about systems of governance like the tax code: a series of algorithms, with inputs and outputs. Think about financial systems, which are more or less algorithmically tractable.

We can imagine equipping an AI with all of the world’s laws and regulations, plus all the world’s financial information in real time, plus anything else we think might be relevant; and then giving it the goal of “maximum profit.” My guess is that this isn’t very far off, and that the result will be all sorts of novel hacks.

But advances in AI are discontinuous and counterintuitive. Things that seem easy turn out to be hard, and things that seem hard turn out to be easy. We don’t know until the breakthrough occurs.

When AIs start hacking, everything will change. They won’t be constrained in the same ways, or have the same limits, as people. They’ll change hacking’s speed, scale, and scope, at rates and magnitudes we’re not ready for. AI text generation bots, for example, will be replicated in the millions across social media. They will be able to engage on issues around the clock, sending billions of messages, and overwhelm any actual online discussions among humans. What we will see as boisterous political debate will be bots arguing with other bots. They’ll artificially influence what we think is normal, what we think others think.

The increasing scope of AI systems also makes hacks more dangerous. AIs are already making important decisions about our lives, decisions we used to believe were the exclusive purview of humans: Who gets parole, receives bank loans, gets into college, or gets a job. As AI systems get more capable, society will cede more—and more important—decisions to them. Hacks of these systems will become more damaging.

What if you fed an AI the entire US tax code? Or, in the case of a multinational corporation, the entire world’s tax codes? Will it figure out, without being told, that it’s smart to incorporate in Delaware and register your ship in Panama? How many loopholes will it find that we don’t already know about? Dozens? Thousands? We have no idea.

While we have societal systems that deal with hacks, those were developed when hackers were humans, and reflect human speed, scale, and scope. The IRS cannot deal with dozens—let alone thousands—of newly discovered tax loopholes. An AI that discovers unanticipated but legal hacks of financial systems could upend our markets faster than we could recover.

As I discuss in my report, while hacks can be used by attackers to exploit systems, they can also be used by defenders to patch and secure systems. So in the long run, AI hackers will favor the defense because our software, tax code, financial systems, and so on can be patched before they’re deployed. Of course, the transition period is dangerous because of all the legacy rules that will be hacked. There, our solution has to be resilience.

We need to build resilient governing structures that can quickly and effectively respond to the hacks. It won’t do any good if it takes years to update the tax code, or if a legislative hack becomes so entrenched that it can’t be patched for political reasons. This is a hard problem of modern governance. It also isn’t a substantially different problem than building governing structures that can operate at the speed and complexity of the information age.

What I’ve been describing is the interplay between human and computer systems, and the risks inherent when the computers start doing the part of humans. This, too, is a more general problem than AI hackers. It’s also one that technologists and futurists are writing about. And while it’s easy to let technology lead us into the future, we’re much better off if we as a society decide what technology’s role in our future should be.

This is all something we need to figure out now, before these AIs come online and start hacking our world.

This essay previously appeared on Wired.com

Posted on April 26, 2021 at 6:06 AMView Comments

1 2 3 4

Sidebar photo of Bruce Schneier by Joe MacInnis.