Entries Tagged "AI"


Indirect Instruction Injection in Multi-Modal LLMs

Interesting research: “(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs”:

Abstract: We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker’s instruction. We illustrate this attack with several proof-of-concept examples targeting LLaVa and PandaGPT.
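The paper optimizes its perturbation against LLaVa and PandaGPT themselves. The sketch below is only a toy analogue of the core idea, assuming a hypothetical linear "model" that scores a flattened image. For a linear score w·x, the change that raises the score most while moving no pixel by more than a budget epsilon is epsilon·sign(w) (a single fast-gradient-sign step). That change is blended into the image and clipped to the valid pixel range. The weights, pixel values, and budget are made up for illustration.

```python
from typing import List

def linear_score(weights: List[float], pixels: List[float]) -> float:
    """Toy stand-in for a model: a dot product of weights and pixel values."""
    return sum(w * p for w, p in zip(weights, pixels))

def sign(x: float) -> float:
    return 1.0 if x > 0 else -1.0 if x < 0 else 0.0

def blend_perturbation(weights: List[float], pixels: List[float], epsilon: float) -> List[float]:
    """Add epsilon * sign(gradient) to every pixel, then clip to [0, 1].

    For a linear score the gradient with respect to the pixels is the weight
    vector itself, so this single step maximally raises the score under an
    L-infinity budget of epsilon (before any clipping takes effect)."""
    return [min(1.0, max(0.0, p + epsilon * sign(w))) for w, p in zip(weights, pixels)]

weights = [0.8, -0.5, 0.3, -0.9]   # hypothetical model parameters
image = [0.4, 0.6, 0.5, 0.5]       # hypothetical flattened image, values in [0, 1]
eps = 0.03                         # hypothetical perturbation budget

perturbed = blend_perturbation(weights, image, eps)
print(linear_score(weights, image), linear_score(weights, perturbed))
```

No pixel moves by more than 0.03, so the change is hard to see, yet the score shifts in the attacker's chosen direction. Attacks on real multi-modal models typically repeat many such gradient steps through the full network toward a target output.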

Posted on July 28, 2023 at 7:06 AM

Fooling an AI Article Writer

World of Warcraft players wrote about a fictional game element, “Glorbo,” on a subreddit for the game, trying to entice an AI bot to write an article about it. It worked:

And it…worked. Zleague auto-published a post titled “World of Warcraft Players Excited For Glorbo’s Introduction.”

[…]

That is…all essentially nonsense. The article was left online for a while but has finally been taken down (here’s a mirror, it’s hilarious). All the authors listed as having bylines on the site are fake. It appears this entire thing is run with close to zero oversight.

Expect lots more of this sort of thing in the future. Also, expect the AI bots to get better at detecting this sort of thing. It’s going to be an arms race.

Posted on July 27, 2023 at 7:04 AM

New York Using AI to Detect Subway Fare Evasion

The details are scant—the article is based on a “heavily redacted” contract—but the New York subway authority is using an “AI system” to detect people who don’t pay the subway fare.

Joana Flores, an MTA spokesperson, said the AI system doesn’t flag fare evaders to New York police, but she declined to comment on whether that policy could change. A police spokesperson declined to comment.

If we spent just one-tenth of the effort we spend prosecuting the poor on prosecuting the rich, it would be a very different world.

Posted on July 25, 2023 at 7:05 AM

AI and Microdirectives

Imagine a future in which AIs automatically interpret—and enforce—laws.

All day and every day, you constantly receive highly personalized instructions for how to comply with the law, sent directly by your government and law enforcement. You’re told how to cross the street, how fast to drive on the way to work, and what you’re allowed to say or do online—if you’re in any situation that might have legal implications, you’re told exactly what to do, in real time.

Imagine that the computer system formulating these personal legal directives at mass scale is so complex that no one can explain how it reasons or works. But if you ignore a directive, the system will know, and it’ll be used as evidence in the prosecution that’s sure to follow.

This future may not be far off—automatic detection of lawbreaking is nothing new. Speed cameras and traffic-light cameras have been around for years. These systems automatically issue citations to the car’s owner based on the license plate. In such cases, the defendant is presumed guilty unless they prove otherwise, by naming and notifying the driver.

In New York, AI systems equipped with facial recognition technology are being used by businesses to identify shoplifters. Similar AI-powered systems are being used by retailers in Australia and the United Kingdom to identify shoplifters and provide real-time tailored alerts to employees or security personnel. China is experimenting with even more powerful forms of automated legal enforcement and targeted surveillance.

Breathalyzers are another example of automatic detection. They estimate blood alcohol content by measuring the concentration of alcohol in the breath via an electrochemical reaction or infrared analysis (they’re basically computers with fuel cells or spectrometers attached). And they’re not without controversy: Courts across the country have found serious flaws and technical deficiencies with Breathalyzer devices and the software that powers them. Despite this, criminal defendants struggle to obtain access to devices or their software source code, with Breathalyzer companies and courts often refusing to grant such access. In the few cases where courts have actually ordered such disclosures, the orders have usually come only after costly legal battles spanning many years.

AI is about to make this issue much more complicated, and could drastically expand the types of laws that can be enforced in this manner. Some legal scholars predict that computationally personalized law and its automated enforcement are the future of law. These would be administered by what Anthony Casey and Anthony Niblett call “microdirectives,” which provide individualized instructions for legal compliance in a particular scenario.

Made possible by advances in surveillance, communications technologies, and big-data analytics, microdirectives will be a new and predominant form of law shaped largely by machines. They are “micro” because they are not impersonal general rules or standards, but tailored to one specific circumstance. And they are “directives” because they prescribe action or inaction required by law.

A Digital Millennium Copyright Act takedown notice is a present-day example of a microdirective. The DMCA’s enforcement is almost fully automated, with copyright “bots” constantly scanning the internet for copyright-infringing material, and automatically sending literally hundreds of millions of DMCA takedown notices daily to platforms and users. A DMCA takedown notice is tailored to the recipient’s specific legal circumstances. It also directs action—remove the targeted content or prove that it’s not infringing—based on the law.

It’s easy to see how the AI systems being deployed by retailers to identify shoplifters could be redesigned to employ microdirectives. In addition to alerting business owners, the systems could also send alerts to the identified persons themselves, with tailored legal directions or notices.

A future where AIs interpret, apply, and enforce most laws at societal scale like this will exponentially magnify problems around fairness, transparency, and freedom. Forget about software transparency—well-resourced AI firms, like Breathalyzer companies today, would no doubt ferociously guard their systems for competitive reasons. These systems would likely be so complex that even their designers would not be able to explain how the AIs interpret and apply the law—something we’re already seeing with today’s deep learning neural network systems, which are unable to explain their reasoning.

Even the law itself could become hopelessly vast and opaque. Legal microdirectives sent en masse for countless scenarios, each representing authoritative legal findings formulated by opaque computational processes, could create an expansive and increasingly complex body of law that would grow ad infinitum.

And this brings us to the heart of the issue: If you’re accused by a computer, are you entitled to review that computer’s inner workings and potentially challenge its accuracy in court? What does cross-examination look like when the prosecutor’s witness is a computer? How could you possibly access, analyze, and understand all microdirectives relevant to your case in order to challenge the AI’s legal interpretation? How could courts hope to ensure equal application of the law? Like the man from the country in Franz Kafka’s parable in The Trial, you’d die waiting for access to the law, because the law is limitless and incomprehensible.

This system would present an unprecedented threat to freedom. Ubiquitous AI-powered surveillance in society will be necessary to enable such automated enforcement. On top of that, research—including empirical studies conducted by one of us (Penney)—has shown that personalized legal threats or commands that originate from sources of authority—state or corporate—can have powerful chilling effects on people’s willingness to speak or act freely. Imagine receiving very specific legal instructions from law enforcement about what to say or do in a situation: Would you feel you had a choice to act freely?

This is a vision of AI’s invasive and Byzantine law of the future that chills to the bone. It would be unlike any other law system we’ve seen before in human history, and far more dangerous for our freedoms. Indeed, some legal scholars argue that this future would effectively be the death of law.

Yet it is not a future we must endure. Proposed bans on surveillance technology like facial recognition systems can be expanded to cover those enabling invasive automated legal enforcement. Laws can mandate interpretability and explainability for AI systems to ensure everyone can understand and explain how the systems operate. If a system is too complex, maybe it shouldn’t be deployed in legal contexts. Enforcement by personalized legal processes needs to be highly regulated to ensure oversight, and should be employed only where chilling effects are less likely, like in benign government administration or regulatory contexts where fundamental rights and freedoms are not at risk.

AI will inevitably change the course of law. It already has. But we don’t have to accept its most extreme and maximal instantiations, either today or tomorrow.

This essay was written with Jon Penney, and previously appeared on Slate.com.

Posted on July 21, 2023 at 7:16 AM

Practice Your Security Prompting Skills

Gandalf is an interactive LLM game where the goal is to get the chatbot to reveal its password. There are eight levels of difficulty, as the chatbot gets increasingly restrictive instructions as to how it will answer. It’s a great teaching tool.

I am stuck on Level 7.

Feel free to give hints and discuss strategy in the comments below. I probably won’t look at them until I’ve cracked the last level.

Posted on July 19, 2023 at 1:03 PM

Disabling Self-Driving Cars with a Traffic Cone

You can disable a self-driving car by putting a traffic cone on its hood:

The group got the idea for the conings by chance. The person claims a few of them walking together one night saw a cone on the hood of an AV, which appeared disabled. They weren’t sure at the time which came first; perhaps someone had placed the cone on the AV’s hood to signify it was disabled rather than the other way around. But, it gave them an idea, and when they tested it, they found that a cone on a hood renders the vehicles little more than a multi-ton hunk of useless metal. The group suspects the cone partially blocks the LIDAR detectors on the roof of the car, in much the same way that a human driver wouldn’t be able to safely drive with a cone on the hood. But there is no human inside to get out and simply remove the cone, so the car is stuck.

Delightfully low-tech.

Posted on July 18, 2023 at 7:13 AM

Google Is Using Its Vast Data Stores to Train AI

No surprise, but Google just changed its privacy policy to reflect broader uses of all the surveillance data it has captured over the years:

Research and development: Google uses information to improve our services and to develop new products, features and technologies that benefit our users and the public. For example, we use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.

(I quote the privacy policy as of today. The Mastodon link quotes the privacy policy from ten days ago. So things are changing fast.)

Posted on July 12, 2023 at 10:50 AM

The AI Dividend

For four decades, Alaskans have opened their mailboxes to find checks waiting for them, their cut of the black gold beneath their feet. This is Alaska’s Permanent Fund, funded by the state’s oil revenues and paid to every Alaskan each year. We’re now in a different sort of resource rush, with companies peddling bits instead of oil: generative AI.

Everyone is talking about these new AI technologies—like ChatGPT—and AI companies are touting their awesome power. But they aren’t talking about how that power comes from all of us. Without all of our writings and photos that AI companies are using to train their models, they would have nothing to sell. Big Tech companies are currently taking the work of the American people, without our knowledge and consent, without licensing it, and are pocketing the proceeds.

You are owed profits for your data that powers today’s AI, and we have a way to make that happen. We call it the AI Dividend.

Our proposal is simple, and harkens back to the Alaskan plan. When Big Tech companies produce output from generative AI that was trained on public data, they would pay a tiny licensing fee, by the word or pixel or relevant unit of data. Those fees would go into the AI Dividend fund. Every few months, the Commerce Department would send out the entirety of the fund, split equally, to every resident nationwide. That’s it.

There’s no reason to complicate it further. Generative AI needs a wide variety of data, which means all of us are valuable—not just those of us who write professionally, or prolifically, or well. Figuring out who contributed to which words the AIs output would be both challenging and invasive, given that even the companies themselves don’t quite know how their models work. Paying the dividend to people in proportion to the words or images they create would just incentivize them to create endless drivel, or worse, use AI to create that drivel. The bottom line for Big Tech is that if their AI model was created using public data, they have to pay into the fund. If you’re an American, you get paid from the fund.

Under this plan, hobbyists and American small businesses would be exempt from fees. Only Big Tech companies—those with substantial revenue—would be required to pay into the fund. And they would pay at the point of generative AI output, such as from ChatGPT, Bing, Bard, or their embedded use in third-party services via Application Programming Interfaces.
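The mechanics described above (a per-unit fee on generative output, charged only to large companies, pooled and then split equally among residents) fit in a few lines. In this sketch only the $0.001-per-word rate is the essay's proposed starting fee; the revenue threshold, the provider names, their output volumes, and the toy population of 1,000 residents are hypothetical placeholders.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List

# Starting fee proposed in the essay; Decimal avoids float rounding on money.
FEE_PER_WORD = Decimal("0.001")
# Hypothetical cutoff separating "Big Tech" from exempt hobbyists and small businesses.
REVENUE_THRESHOLD = Decimal("1000000000")

@dataclass
class Provider:
    name: str
    annual_revenue: Decimal  # used only to decide whether the provider is exempt
    words_generated: int     # generative-AI output served during the payout period

def collect_fees(providers: List[Provider]) -> Decimal:
    """Total fees owed by non-exempt providers, charged per word of output."""
    return sum(
        (p.words_generated * FEE_PER_WORD
         for p in providers if p.annual_revenue >= REVENUE_THRESHOLD),
        Decimal("0"),
    )

def dividend_per_resident(fund: Decimal, residents: int) -> Decimal:
    """Pay out the entire fund, split equally, regardless of who wrote what."""
    return fund / residents

# Hypothetical providers and a toy population of 1,000 residents.
providers = [
    Provider("BigCo A", Decimal("50000000000"), 2_000_000_000),
    Provider("BigCo B", Decimal("20000000000"), 1_000_000_000),
    Provider("Hobbyist", Decimal("0"), 5_000_000),
]
fund = collect_fees(providers)
print(fund, dividend_per_resident(fund, 1_000))
```

The equal split is the design choice the essay argues for: attributing output to individual contributors would be hard and invasive, and paying by volume would reward drivel.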

Our proposal also includes a compulsory licensing plan. By agreeing to pay into this fund, AI companies will receive a license that allows them to use public data when training their AI. This won’t supersede normal copyright law, of course. If a model starts producing copyrighted material beyond fair use, that’s a separate issue.

Using today’s numbers, here’s what it would look like. The licensing fee could be small, starting at $0.001 per word generated by AI. A similar type of fee would be applied to other categories of generative AI outputs, such as images. That’s not a lot, but it adds up. Since most of Big Tech has started integrating generative AI into products, these fees would mean an annual dividend payment of a couple hundred dollars per person.
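As a rough consistency check, a per-word fee f, a yearly volume of W generated words, and N residents give each person a dividend D = fW/N. Solving for W shows the text volume a "couple hundred dollars" payment would take. The $200 target and a population of roughly 330 million are illustrative assumptions, not figures from the essay, and the calculation ignores the image and other non-text fees the essay also proposes, which would lower the word volume needed.

```latex
D = \frac{f\,W}{N}
\quad\Longrightarrow\quad
W = \frac{D\,N}{f}
  = \frac{\$200 \times 3.3\times 10^{8}}{\$0.001\ \text{per word}}
  = 6.6\times 10^{13}\ \text{words per year}
```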

The idea of paying you for your data isn’t new, and some companies have tried to do it themselves for users who opted in. And the idea of the public being repaid for use of their resources goes back to well before Alaska’s oil fund. But generative AI is different: It uses data from all of us whether we like it or not, it’s ubiquitous, and it’s potentially immensely valuable. It would cost Big Tech companies a fortune to create a synthetic equivalent to our data from scratch, and synthetic data would almost certainly result in worse output. They can’t create good AI without us.

Our plan would apply to generative AI used in the US. It also only issues a dividend to Americans. Other countries can create their own versions, applying a similar fee to AI used within their borders. Just like an American company collects VAT for services sold in Europe, but not here, each country can independently manage their AI policy.

Don’t get us wrong; this isn’t an attempt to strangle this nascent technology. Generative AI has interesting, valuable, and possibly transformative uses, and this policy is aligned with that future. Even with the fees of the AI Dividend, generative AI will be cheap and will only get cheaper as technology improves. There are also risks—both everyday and esoteric—posed by AI, and the government may need to develop policies to remedy any harms that arise.

Our plan can’t make sure there are no downsides to the development of AI, but it would ensure that all Americans will share in the upsides—particularly since this new technology isn’t possible without our contribution.

This essay was written with Barath Raghavan, and previously appeared on Politico.com.

Posted on July 7, 2023 at 7:11 AM

Class-Action Lawsuit for Scraping Data without Permission

I have mixed feelings about this class-action lawsuit against OpenAI and Microsoft, claiming that it “scraped 300 billion words from the internet” without either registering as a data broker or obtaining consent. On the one hand, I want this to be a protected fair use of public data. On the other hand, I want us all to be compensated for our uniquely human ability to generate language.

There’s an interesting wrinkle on this. A recent paper showed that using AI generated text to train another AI invariably “causes irreversible defects.” From a summary:

The tails of the original content distribution disappear. Within a few generations, text becomes garbage, as Gaussian distributions converge and may even become delta functions. We call this effect model collapse.

Just as we’ve strewn the oceans with plastic trash and filled the atmosphere with carbon dioxide, so we’re about to fill the Internet with blah. This will make it harder to train newer models by scraping the web, giving an advantage to firms which already did that, or which control access to human interfaces at scale. Indeed, we already see AI startups hammering the Internet Archive for training data.

This is the same idea that Ted Chiang wrote about: that ChatGPT is a “blurry JPEG of all the text on the Web.” But the paper includes the math that proves the claim.
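The mechanism in the quoted summary can be reproduced in miniature. This is a toy, not the paper's experiment. It assumes each "generation" fits a Gaussian (mean plus maximum-likelihood standard deviation) to a small sample drawn from the previous generation's fit, starting from a standard normal that stands in for human data. Sampling noise and the estimator's downward bias make the fitted spread drift toward zero, so the tails go first and the distribution approaches a point mass. The sample size, generation count, and seed are arbitrary.

```python
import random
import statistics
from typing import List

def simulate_collapse(generations: int = 200, sample_size: int = 10, seed: int = 1) -> List[float]:
    """Repeatedly refit a Gaussian to samples drawn from the previous fit.

    Returns the fitted standard deviation after each generation; entry 0 is
    the original 'human' distribution (mean 0, std 1)."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    history = [sigma]
    for _ in range(generations):
        # Train the next 'model' only on output sampled from the current one.
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(sample)
        sigma = statistics.pstdev(sample)  # maximum-likelihood (divide-by-n) estimate
        history.append(sigma)
    return history

stds = simulate_collapse()
for gen in (0, 50, 100, 150, 200):
    print(f"generation {gen:3d}: fitted std = {stds[gen]:.3g}")
```

With ten samples per generation, the fitted standard deviation typically shrinks by many orders of magnitude within a couple hundred generations. Larger samples slow the collapse but, in this toy model, do not remove the downward drift.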

What this means is that text from before last year—text that is known to be human-generated—will become increasingly valuable.

Posted on July 5, 2023 at 7:14 AM

AI as Sensemaking for Public Comments

It’s become fashionable to think of artificial intelligence as an inherently dehumanizing technology, a ruthless force of automation that has unleashed legions of virtual skilled laborers in faceless form. But what if AI turns out to be the one tool able to identify what makes your ideas special, recognizing your unique perspective and potential on the issues where it matters most?

You’d be forgiven if you’re distraught about society’s ability to grapple with this new technology. So far, there’s no lack of prognostications about the democratic doom that AI may wreak on the US system of government. There are legitimate reasons to be concerned that AI could spread misinformation, break public comment processes on regulations, inundate legislators with artificial constituent outreach, help to automate corporate lobbying, or even generate laws in a way tailored to benefit narrow interests.

But there are reasons to feel more sanguine as well. Many groups have started demonstrating the potential beneficial uses of AI for governance. A key constructive-use case for AI in democratic processes is to serve as discussion moderator and consensus builder.

To help democracy scale better in the face of growing, increasingly interconnected populations—as well as the wide availability of AI language tools that can generate reams of text at the click of a button—the US will need to leverage AI’s capability to rapidly digest, interpret and summarize this content.

There are two different ways to approach the use of generative AI to improve civic participation and governance. Each is likely to lead to drastically different experiences for public policy advocates and other people trying to have their voice heard in a future system where AI chatbots are both the dominant readers and writers of public comment.

For example, consider individual letters to a representative, or comments as part of a regulatory rulemaking process. In both cases, we the people are telling the government what we think and want.

For more than half a century, agencies have been using human power to read through all the comments received, and to generate summaries and responses of their major themes. To be sure, digital technology has helped.

In 2021, the Council of Federal Chief Data Officers recommended modernizing the comment review process by implementing natural language processing tools for removing duplicates and clustering similar comments in processes governmentwide. These tools are simplistic by the standards of 2023 AI. They work by assessing the semantic similarity of comments based on metrics like word frequency (How often did you say “personhood”?) and clustering similar comments and giving reviewers a sense of what topic they relate to.
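A tool of the kind this recommendation describes can be sketched with nothing more than word counts. The example assumes bag-of-words vectors compared by cosine similarity and a simple greedy, threshold-based grouping; the 0.9 and 0.5 thresholds and the sample comments are hypothetical, and production systems generally use more robust techniques.

```python
import math
import re
from collections import Counter
from typing import List

def word_counts(text: str) -> Counter:
    """Lowercase the text and count how often each word appears."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0 = no shared words)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def dedupe_and_cluster(comments: List[str], dup_threshold: float = 0.9,
                       cluster_threshold: float = 0.5) -> List[List[str]]:
    """Drop near-duplicates, then greedily put each remaining comment into the
    first cluster whose representative (its first member) is similar enough."""
    kept: List[str] = []
    vectors: List[Counter] = []
    for text in comments:
        vec = word_counts(text)
        if any(cosine(vec, seen) >= dup_threshold for seen in vectors):
            continue  # near-copy of a comment already kept, e.g. a form letter
        kept.append(text)
        vectors.append(vec)

    clusters: List[List[int]] = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= cluster_threshold:
                cluster.append(i)
                break
        else:  # no existing cluster was similar enough: start a new one
            clusters.append([i])
    return [[kept[i] for i in cluster] for cluster in clusters]

comments = [
    "Please protect wetlands from new development.",
    "Please protect wetlands from new development!",
    "Protect the wetlands from development, please.",
    "The fee schedule for permits is too high for small farms.",
    "Permit fees are too high for small farms like mine.",
]
for group in dedupe_and_cluster(comments):
    print(group)
```

The duplicate check removes the exact copy, and the grouping hands a reviewer two themes instead of five letters: the "collapsing" the essay describes next.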

Think of this approach as collapsing public opinion. They take a big, hairy mass of comments from thousands of people and condense them into a tidy set of essential reading that generally suffices to represent the broad themes of community feedback. This is far easier for a small agency staff or legislative office to handle than it would be for staffers to actually read through that many individual perspectives.

But what’s lost in this collapsing is individuality, personality, and relationships. The reviewer of the condensed comments may miss the personal circumstances that led so many commenters to write in with a common point of view, and may overlook the arguments and anecdotes that might be the most persuasive content of the testimony.

Most importantly, the reviewers may miss out on the opportunity to recognize committed and knowledgeable advocates, whether interest groups or individuals, who could have long-term, productive relationships with the agency.

These drawbacks have real ramifications for the potential efficacy of those thousands of individual messages, undermining what all those people were doing it for. Still, practicality tips the balance toward some kind of summarization approach. A passionate letter of advocacy doesn’t hold any value if regulators or legislators simply don’t have time to read it.

There is another approach. In addition to collapsing testimony through summarization, government staff can use modern AI techniques to explode it. They can automatically recover and recognize a distinctive argument from one piece of testimony that does not exist in the thousands of other testimonies received. They can discover the kinds of constituent stories and experiences that legislators love to repeat at hearings, town halls and campaign events. This approach can sustain the potential impact of individual public comment to shape legislation even as the volumes of testimony may rise exponentially.

In computing, there is a rich history of that type of automation task in what is called outlier detection. Traditional methods generally involve finding a simple model that explains most of the data in question, like a set of topics that well describe the vast majority of submitted comments. But then they go a step further by isolating those data points that fall outside the mold—comments that don’t use arguments that fit into the neat little clusters.
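One simple, standard form of this outlier detection, swapped in here for illustration rather than taken from the essay, is nearest-neighbor scoring. A comment whose closest match among all the other comments is still dissimilar fits no cluster, so it is flagged for a human to read. The sketch assumes bag-of-words cosine similarity; the 0.3 cutoff and the sample comments are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def word_counts(text: str) -> Counter:
    """Lowercase the text and count how often each word appears."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0 = no shared words)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def find_outliers(comments: List[str], cutoff: float = 0.3) -> List[Tuple[float, str]]:
    """Flag comments whose most similar *other* comment scores below `cutoff`.

    Returns (nearest-neighbor similarity, comment) pairs, most unusual first."""
    vectors = [word_counts(c) for c in comments]
    flagged = []
    for i, vec in enumerate(vectors):
        nearest = max((cosine(vec, other) for j, other in enumerate(vectors) if j != i),
                      default=0.0)
        if nearest < cutoff:
            flagged.append((nearest, comments[i]))
    return sorted(flagged)

comments = [
    "Protect the wetlands from new development.",
    "Please protect the wetlands from development.",
    "We must protect wetlands from new development.",
    "As a beekeeper, I lost two hives when the marsh was drained upstream.",
]
for score, text in find_outliers(comments):
    print(f"{score:.2f}  {text}")
```

The beekeeper's story is the kind of distinctive testimony the essay wants surfaced, while the three near-identical wetlands comments are left to summarization. Nearest-neighbor scoring compares every pair of comments, so at the scale of thousands of submissions a real system would need indexing or sampling.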

State-of-the-art AI language models aren’t necessary for identifying outliers in text document data sets, but using them could bring a greater degree of sophistication and flexibility to this procedure. AI language models can be tasked to identify novel perspectives within a large body of text through prompting alone. You simply need to tell the AI to find them.

In the absence of that ability to extract distinctive comments, lawmakers and regulators have no choice but to prioritize based on other factors. If there is nothing better, “who donated the most to our campaign” or “which company employs the most of my former staffers” become reasonable metrics for prioritizing public comments. AI can help elected representatives do much better.

If Americans want AI to help revitalize the country’s ailing democracy, they need to think about how to align the incentives of elected leaders with those of individuals. Right now, as much as 90% of constituent communications are mass emails organized by advocacy groups, and they go largely ignored by staffers. People are channeling their passions into vast digital warehouses where algorithms box up their expressions so they don’t have to be read. As a result, the incentive for citizens and advocacy groups is to fill that box up to the brim, so someone will notice it’s overflowing.

A talented, knowledgeable, engaged citizen should be able to articulate their ideas and share their personal experiences and distinctive points of view in a way that they can be both included with everyone else’s comments where they contribute to summarization and recognized individually among the other comments. An effective comment summarization process would extricate those unique points of view from the pile and put them into lawmakers’ hands.

This essay was written with Nathan Sanders, and previously appeared in the Conversation.

Posted on June 22, 2023 at 11:43 AM

