Entries Tagged "transparency"


Large Language Models and Elections

Earlier this week, the Republican National Committee released a video that it claims was “built entirely with AI imagery.” The content of the ad isn’t especially novel—a dystopian vision of America under a second term of President Joe Biden—but the deliberate emphasis on the technology used to create it stands out: It’s a “Daisy” moment for the 2020s.

We should expect more of this kind of thing. The applications of AI to political advertising have not escaped campaigners, who are already “pressure testing” possible uses for the technology. In the 2024 presidential election campaign, you can bank on the appearance of AI-generated personalized fundraising emails, text messages from chatbots urging you to vote, and maybe even some deepfaked campaign avatars. Future candidates could use chatbots trained on data representing their views and personalities to approximate the act of directly connecting with people. Think of it like a whistle-stop tour with an appearance in every living room. Previous technological revolutions—railroad, radio, television, and the World Wide Web—transformed how candidates connect to their constituents, and we should expect the same from generative AI. This isn’t science fiction: The era of AI chatbots standing in as avatars for real, individual people has already begun, as the journalist Casey Newton made clear in a 2016 feature about a woman who used thousands of text messages to create a chatbot replica of her best friend after he died.

The key is interaction. A candidate could use tools enabled by large language models, or LLMs—the technology behind apps such as ChatGPT and the art-making DALL-E—to do micro-polling or message testing, and to solicit perspectives and testimonies from their political audience individually and at scale. The candidates could potentially reach any voter who possesses a smartphone or computer, not just the ones with the disposable income and free time to attend a campaign rally. At its best, AI could be a tool to increase the accessibility of political engagement and ease polarization. At its worst, it could propagate misinformation and increase the risk of voter manipulation. Whatever the case, we know political operatives are using these tools. To reckon with their potential now isn’t buying into the hype—it’s preparing for whatever may come next.

On the positive end, and most profoundly, LLMs could help people think through, refine, or discover their own political ideologies. Research has shown that many voters come to their policy positions reflexively, out of a sense of partisan affiliation. The very act of reflecting on these views through discourse can change, and even depolarize, those views. It can be hard to have reflective policy conversations with an informed, even-keeled human discussion partner when we all live within a highly charged political environment; this is a role almost custom-designed for LLMs. In US politics, it is a truism that the most valuable resource in a campaign is time. People are busy and distracted. Campaigns have a limited window to convince and activate voters. Money allows a candidate to purchase time: TV commercials, labor from staffers, and fundraising events to raise even more money. LLMs could provide campaigns with what is essentially a printing press for time.

If you were a political operative, which would you rather do: play a short video on a voter’s TV while they are folding laundry in the next room, or exchange essay-length thoughts with a voter on your candidate’s key issues? A staffer knocking on doors might need to canvass 50 homes over two hours to find one voter willing to have a conversation. OpenAI charges pennies to process about 800 words with its latest GPT-4 model, and that cost could fall dramatically as competitive AIs become available. People seem to enjoy interacting with chatbots; OpenAI’s product reportedly has the fastest-growing user base in the history of consumer apps.
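To make the economics concrete, here is a rough back-of-the-envelope sketch in Python. The wage, word counts, and per-word model cost are illustrative assumptions drawn from the rough figures above, not actual vendor pricing or campaign data:

```python
# Back-of-the-envelope comparison: door-knocking vs. LLM-driven conversations.
# All figures are illustrative assumptions, not actual pricing or campaign data.

CANVASSER_HOURLY_WAGE = 18.00     # assumed cost of paid canvassing, USD/hour
CONVERSATIONS_PER_TWO_HOURS = 1   # voters willing to talk per ~50 homes, per the example above

COST_PER_800_WORDS = 0.05         # "pennies" per ~800 words of model usage (assumed)
WORDS_PER_CONVERSATION = 2_400    # an assumed essay-length, multi-turn exchange

def cost_per_canvass_conversation() -> float:
    """Labor cost of one in-person conversation found by door-knocking."""
    labor = CANVASSER_HOURLY_WAGE * 2  # two hours of canvassing
    return labor / CONVERSATIONS_PER_TWO_HOURS

def cost_per_llm_conversation() -> float:
    """Model cost of one extended chatbot exchange with a voter."""
    return COST_PER_800_WORDS * (WORDS_PER_CONVERSATION / 800)

if __name__ == "__main__":
    print(f"Canvassing: ~${cost_per_canvass_conversation():.2f} per conversation")
    print(f"Chatbot:    ~${cost_per_llm_conversation():.2f} per conversation")
```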

Optimistically, one possible result might be that we’ll get less annoyed with the deluge of political ads if their messaging is more usefully tailored to our interests by AI tools. Though the evidence for microtargeting’s effectiveness is mixed at best, some studies show that targeting the right issues to the right people can persuade voters. Expecting more sophisticated, AI-assisted approaches to be more consistently effective is reasonable. And anything that can prevent us from seeing the same 30-second campaign spot 20 times a day seems like a win.

AI can also help humans effectuate their political interests. In the 2016 US presidential election, primitive chatbots had a role in donor engagement and voter-registration drives: simple messaging tasks such as helping users pre-fill a voter-registration form or reminding them where their polling place is. If those simple tools worked, the current generation of much more capable chatbots could supercharge small-dollar solicitations and get-out-the-vote campaigns.

And the interactive capability of chatbots could help voters better understand their choices. An AI chatbot could answer questions from the perspective of a candidate about the details of their policy positions most salient to an individual user, or respond to questions about how a candidate’s stance on a national issue translates to a user’s locale. Political organizations could similarly use them to explain complex policy issues, such as those relating to the climate or health care or…anything, really.
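As a purely illustrative sketch of what such a policy-explainer bot might look like, here is a minimal example using the OpenAI Python client’s chat-completions interface. The model name, prompts, and disclosure text are assumptions, not anything a campaign actually runs:

```python
# Minimal sketch of a policy-explainer chatbot that answers from a candidate's
# published positions and always discloses that it is automated.
# Assumes the OpenAI Python client (v1.x); model name and prompts are illustrative.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DISCLOSURE = "NOTE: You are chatting with an automated assistant, not the candidate."

SYSTEM_PROMPT = (
    "You explain the candidate's published policy positions. "
    "Answer only from the position statements provided below; if a question "
    "falls outside them, say so rather than guessing.\n\n"
    "POSITIONS:\n{positions}"
)

def answer_voter_question(question: str, positions: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",  # assumed model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT.format(positions=positions)},
            {"role": "user", "content": question},
        ],
    )
    return f"{DISCLOSURE}\n\n{response.choices[0].message.content}"
```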

Of course, this could also go badly. In the time-honored tradition of demagogues worldwide, the LLM could inconsistently represent the candidate’s views to appeal to the individual proclivities of each voter.

In fact, the fundamentally obsequious nature of the current generation of large language models results in them acting like demagogues. Current LLMs are known to hallucinate—or go entirely off-script—and produce answers that have no basis in reality. These models do not experience emotion in any way, but some research suggests they have a sophisticated ability to assess the emotion and tone of their human users. Although they weren’t trained for this purpose, ChatGPT and its successor, GPT-4, may already be pretty good at assessing some of their users’ traits—say, the likelihood that the author of a text prompt is depressed. Combined with their persuasive capabilities, that means that they could learn to skillfully manipulate the emotions of their human users.

This is not entirely theoretical. A growing body of evidence demonstrates that interacting with AI has a persuasive effect on human users. A study published in February prompted participants to co-write a statement about the benefits of social-media platforms for society with an AI chatbot configured to have varying views on the subject. When researchers surveyed participants after the co-writing experience, those who interacted with a chatbot that expressed that social media is good or bad were far more likely to express the same view than a control group that didn’t interact with an “opinionated language model.”

For the time being, most Americans say they are resistant to trusting AI in sensitive matters such as health care. The same is probably true of politics. If a neighbor volunteering with a campaign persuades you to vote a particular way on a local ballot initiative, you might feel good about that interaction. If a chatbot does the same thing, would you feel the same way? To help voters chart their own course in a world of persuasive AI, we should demand transparency from our candidates. Campaigns should have to clearly disclose whether a text agent interacting with a potential voter—through traditional robotexting or the use of the latest AI chatbots—is human or automated.

Though companies such as Meta (Facebook’s parent company) and Alphabet (Google’s) publish libraries of traditional, static political advertising, they do so poorly. These systems would need to be improved and expanded to accommodate user-level differentiation in ad copy to offer serviceable protection against misuse.

A public, anonymized log of chatbot conversations could help hold candidates’ AI representatives accountable for shifting statements and digital pandering. Candidates who use chatbots to engage voters may not want to make all transcripts of those conversations public, but their users could easily choose to share them. So far, there is no shortage of people eager to share their chat transcripts, and in fact, an online database exists of nearly 200,000 of them. In the recent past, Mozilla has galvanized users to opt into sharing their web data to study online misinformation.
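One possible shape for such a log, sketched in Python: each published record carries the campaign identifier, a salted pseudonym for the voter (so digital pandering to the same person stays linkable without identifying them), and the verbatim exchange. The schema and hashing scheme here are illustrative assumptions, not an existing standard:

```python
# Sketch of an anonymized record for a public log of campaign-chatbot conversations.
# The schema and hashing scheme are illustrative assumptions, not an existing standard.

import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class ChatLogEntry:
    campaign_id: str        # which candidate's chatbot produced the exchange
    voter_pseudonym: str    # salted hash: shifting answers to the same voter remain linkable
    timestamp: str
    prompt: str             # voter message, assumed pre-scrubbed of personal details
    response: str           # chatbot reply, published verbatim for accountability

def make_entry(campaign_id: str, voter_id: str, salt: bytes,
               prompt: str, response: str) -> ChatLogEntry:
    pseudonym = hashlib.sha256(salt + voter_id.encode()).hexdigest()[:16]
    return ChatLogEntry(
        campaign_id=campaign_id,
        voter_pseudonym=pseudonym,
        timestamp=datetime.now(timezone.utc).isoformat(),
        prompt=prompt,
        response=response,
    )

entry = make_entry("example-campaign", "voter-123", b"per-campaign-secret-salt",
                   "Where do you stand on the local transit levy?",
                   "The candidate supports the levy because ...")
print(json.dumps(asdict(entry), indent=2))
```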

We also need stronger nationwide protections on data privacy, as well as the ability to opt out of targeted advertising, to protect us from the potential excesses of this kind of marketing. No one should be forcibly subjected to political advertising, LLM-generated or not, on the basis of their Internet searches regarding private matters such as medical issues. In February, the European Parliament voted to limit political-ad targeting to only basic information, such as language and general location, within two months of an election. This stands in stark contrast to the US, which has for years failed to enact federal data-privacy regulations. Though the 2018 revelation of the Cambridge Analytica scandal led to billions of dollars in fines and settlements against Facebook, it has so far resulted in no substantial legislative action.

Transparency requirements like these are a first step toward oversight of future AI-assisted campaigns. Although we should aspire to more robust legal controls on campaign uses of AI, it seems implausible that these will be adopted in advance of the fast-approaching 2024 general presidential election.

Credit the RNC, at least, with disclosing that their recent ad was AI-generated—a transparent attempt at publicity still counts as transparency. But what will we do if the next viral AI-generated ad tries to pass as something more conventional?

As we are all being exposed to these rapidly evolving technologies for the first time and trying to understand their potential uses and effects, let’s push for the kind of basic transparency protection that will allow us to know what we’re dealing with.

This essay was written with Nathan Sanders, and previously appeared in The Atlantic.

EDITED TO ADD (5/12): Better article on the “Daisy” ad.

Posted on May 4, 2023 at 6:45 AM

Facebook Has No Idea What Data It Has

This is from a court deposition:

Facebook’s stonewalling has been revealing on its own, providing variations on the same theme: It has amassed so much data on so many billions of people and organized it so confusingly that full transparency is impossible on a technical level. In the March 2022 hearing, Zarashaw and Steven Elia, a software engineering manager, described Facebook as a data-processing apparatus so complex that it defies understanding from within. The hearing amounted to two high-ranking engineers at one of the most powerful and resource-flush engineering outfits in history describing their product as an unknowable machine.

The special master at times seemed in disbelief, as when he questioned the engineers over whether any documentation existed for a particular Facebook subsystem. “Someone must have a diagram that says this is where this data is stored,” he said, according to the transcript. Zarashaw responded: “We have a somewhat strange engineering culture compared to most where we don’t generate a lot of artifacts during the engineering process. Effectively the code is its own design document often.” He quickly added, “For what it’s worth, this is terrifying to me when I first joined as well.”

[…]

Facebook’s inability to comprehend its own functioning took the hearing up to the edge of the metaphysical. At one point, the court-appointed special master noted that the “Download Your Information” file provided to the suit’s plaintiffs must not have included everything the company had stored on those individuals because it appears to have no idea what it truly stores on anyone. Can it be that Facebook’s designated tool for comprehensively downloading your information might not actually download all your information? This, again, is outside the boundaries of knowledge.

“The solution to this is unfortunately exactly the work that was done to create the DYI file itself,” noted Zarashaw. “And the thing I struggle with here is in order to find gaps in what may not be in DYI file, you would by definition need to do even more work than was done to generate the DYI files in the first place.”

The systemic fogginess of Facebook’s data storage made answering even the most basic question futile. At another point, the special master asked how one could find out which systems actually contain user data that was created through machine inference.

“I don’t know,” answered Zarashaw. “It’s a rather difficult conundrum.”

I’m not surprised. These systems are so complex that no humans understand them anymore. That allows us to do things we couldn’t do otherwise, but it’s also a problem.

EDITED TO ADD: Another article.

Posted on September 8, 2022 at 10:14 AM

Vendors are Fixing Security Flaws Faster

Google’s Project Zero is reporting that software vendors are patching their code faster.

tl;dr

  • In 2021, vendors took an average of 52 days to fix security vulnerabilities reported from Project Zero. This is a significant acceleration from an average of about 80 days 3 years ago.
  • In addition to the average now being well below the 90-day deadline, we have also seen a dropoff in vendors missing the deadline (or the additional 14-day grace period). In 2021, only one bug exceeded its fix deadline, though 14% of bugs required the grace period.
  • Differences in the amount of time it takes a vendor/product to ship a fix to users reflects their product design, development practices, update cadence, and general processes towards security reports. We hope that this comparison can showcase best practices, and encourage vendors to experiment with new policies.
  • This data aggregation and analysis is relatively new for Project Zero, but we hope to do it more in the future. We encourage all vendors to consider publishing aggregate data on their time-to-fix and time-to-patch for externally reported vulnerabilities, as well as more data sharing and transparency in general.
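For vendors wondering what publishing that kind of aggregate data might involve, here is a toy sketch of the time-to-fix analysis the quoted summary describes. The vulnerability records are fabricated for illustration; this is not Project Zero’s actual methodology or code:

```python
# Toy sketch of aggregate "time to fix" reporting for externally reported vulnerabilities.
# The (reported, fixed) date pairs below are made up for illustration.

from datetime import date

vulns = [
    (date(2021, 1, 4), date(2021, 2, 10)),
    (date(2021, 3, 15), date(2021, 4, 20)),
    (date(2021, 6, 1), date(2021, 9, 10)),   # exceeds the 90-day deadline, lands in the grace period
]

DEADLINE_DAYS = 90
GRACE_DAYS = 14

days_to_fix = [(fixed - reported).days for reported, fixed in vulns]
avg = sum(days_to_fix) / len(days_to_fix)
in_grace = sum(DEADLINE_DAYS < d <= DEADLINE_DAYS + GRACE_DAYS for d in days_to_fix)
missed = sum(d > DEADLINE_DAYS + GRACE_DAYS for d in days_to_fix)

print(f"Average days to fix: {avg:.1f}")
print(f"Needed the {GRACE_DAYS}-day grace period: {in_grace}")
print(f"Missed deadline plus grace period: {missed}")
```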

Posted on February 16, 2022 at 7:00 AM

SpiderOak's Warrant Canary Died

BoingBoing has the story.

I have never quite trusted the idea of a warrant canary. But here it seems to have worked. (Presumably, if SpiderOak wanted to replace the warrant canary with a transparency report, they would have written something explaining their decision. To have it simply disappear is what we would expect if SpiderOak were being forced to comply with a US government request for personal data.)
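For readers unfamiliar with the mechanism: a warrant canary is just a dated statement that a provider republishes on a schedule, and monitoring one amounts to checking that it is still present, unchanged, and fresh. A minimal sketch, with a hypothetical URL and a placeholder digest; a real check should also verify a cryptographic signature over the statement:

```python
# Minimal sketch of monitoring a warrant canary: fetch a dated statement that the
# provider republishes on a schedule, and alarm if it disappears, changes, or goes stale.
# The URL, digest, and schedule are hypothetical placeholders.

import hashlib
import urllib.request
from datetime import datetime, timezone

CANARY_URL = "https://example.com/canary.txt"   # hypothetical
EXPECTED_DIGEST = "..."                          # placeholder: digest of last-known-good text
MAX_AGE_DAYS = 100                               # assumes a canary promised roughly quarterly

def check_canary(last_published: datetime) -> list[str]:
    warnings = []
    try:
        text = urllib.request.urlopen(CANARY_URL).read()
    except OSError:
        return ["canary is gone -- treat it as having died"]
    if hashlib.sha256(text).hexdigest() != EXPECTED_DIGEST:
        warnings.append("canary text changed -- functionally equivalent to dying")
    age = (datetime.now(timezone.utc) - last_published).days
    if age > MAX_AGE_DAYS:
        warnings.append(f"canary is stale ({age} days old)")
    return warnings
```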

EDITED TO ADD (8/9): SpiderOak has posted an explanation claiming that the warrant canary did not die—it just changed.

That’s obviously false, because it did die. And a change is the functional equivalent—that’s how they work. So either they have received a National Security Letter and now have to pretend they did not, or they completely misunderstood what a warrant canary is and how it works. No one knows.

I have never fully trusted warrant canaries—this EFF post explains why—and this is an illustration.

Posted on August 8, 2018 at 9:37 AM

The Fallibility of DNA Evidence

This is a good summary article on the fallibility of DNA evidence. Most interesting to me are the parts on the proprietary algorithms used in DNA matching:

William Thompson points out that Perlin has declined to make public the algorithm that drives the program. “You do have a black-box situation happening here,” Thompson told me. “The data go in, and out comes the solution, and we’re not fully informed of what happened in between.”

Last year, at a murder trial in Pennsylvania where TrueAllele evidence had been introduced, defense attorneys demanded that Perlin turn over the source code for his software, noting that “without it, [the defendant] will be unable to determine if TrueAllele does what Dr. Perlin claims it does.” The judge denied the request.

[…]

When I interviewed Perlin at Cybergenetics headquarters, I raised the matter of transparency. He was visibly annoyed. He noted that he’d published detailed papers on the theory behind TrueAllele, and filed patent applications, too: “We have disclosed not the trade secrets of the source code or the engineering details, but the basic math.”

It’s the same problem as any biometric: we need to know the rates of both false positives and false negatives. And if these algorithms are being used to determine guilt, we have a right to examine them.
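Here is why those error rates matter so much: when the prior probability of a match is low, as it is when searching a large database of profiles, even a small false-positive rate can swamp the true matches. A toy base-rate calculation, with made-up numbers that are not claims about TrueAllele or any other product:

```python
# Toy base-rate calculation showing why false-positive and false-negative rates matter.
# All numbers are made up for illustration; they are not claims about any real system.

def posterior_probability(prior: float, false_positive: float, false_negative: float) -> float:
    """P(true match | algorithm reports a match), via Bayes' rule."""
    true_positive = 1.0 - false_negative
    p_reported_match = true_positive * prior + false_positive * (1.0 - prior)
    return (true_positive * prior) / p_reported_match

# Searching a large database: any given profile is very unlikely to be the true source.
prior = 1.0 / 100_000   # assumed prior that a given profile is the true source
fpr = 1.0 / 10_000      # assumed false-positive rate
fnr = 0.01              # assumed false-negative rate

print(f"P(true source | reported match) = {posterior_probability(prior, fpr, fnr):.1%}")
# With these illustrative numbers, a reported "match" is right only about 9% of the time.
```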

EDITED TO ADD (6/13): Three more articles.

Posted on May 31, 2016 at 1:04 PM

Reddit's Warrant Canary Just Died

Reddit has received a National Security Letter.

I have long discounted warrant canaries. A gag order is serious, and this sort of high-school trick won’t fool judges for a minute. But so far they seem to be working.

Now we have another question: now what? We have one piece of information, but not a very useful one. We know that NSLs can affect anywhere from a single user to millions of users. Which kind was this? We have no idea. Is Reddit fighting? We have no idea. How long will this go on? We don’t know that, either. When I think about what we can do to be useful here, I can’t think of anything.

Posted on April 1, 2016 at 3:16 PM

The Need for Transparency in Surveillance

In Data and Goliath, I talk about the need for transparency, oversight, and accountability as the mechanism to allow surveillance when it is necessary, while preserving our security against excessive surveillance and surveillance abuse.

James Losey has a new paper that discusses the need for transparency in surveillance. His conclusion:

Available transparency reports from ICT companies demonstrate the rise in government requests to obtain user communications data. However, revelations on the surveillance capabilities of the United States, Sweden, the UK, and other countries demonstrate that the available data is insufficient and falls short of supporting rational debate. Companies can contribute by increasing granularity, particularly on the legal processes through which they are required to reveal user data. However, the greatest gaps remain in the information provided directly from governments. Current understanding of the scope of surveillance can be credited to whistleblowers risking prosecution in order to publicize illegitimate government activity. The lack of transparency on government access to communications data and the legal processes used undermines the legitimacy of the practices.

Transparency alone will not eliminate barriers to freedom of expression or harm to privacy resulting from overly broad surveillance. Transparency provides a window into the scope of current practices and additional measures are needed such as oversight and mechanisms for redress in cases of unlawful surveillance. Furthermore, international data collection results in the surveillance of individuals and communities beyond the scope of a national debate. Transparency offers a necessary first step, a foundation on which to examine current practices and contribute to a debate on human security and freedom. Transparency is not the sole responsibility of any one country, and governments, in addition to companies, are well positioned to provide accurate and timely data to support critical debate on policies and laws that result in censorship and surveillance. Supporting an informed debate should be the goal of all democratic nations.

Posted on October 27, 2015 at 9:52 AM

