Entries Tagged "academic papers"

Page 7 of 86

LLMs Acting Deceptively

New research: “Deception abilities emerged in large language models“:

Abstract: Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Thus, aligning them with human values is of great importance. However, given the steady increase in reasoning abilities, future LLMs are under suspicion of becoming able to deceive human operators and utilizing this ability to bypass monitoring efforts. As a prerequisite to this, LLMs need to possess a conceptual understanding of deception strategies. This study reveals that such strategies emerged in state-of-the-art LLMs, but were nonexistent in earlier LLMs. We conduct a series of experiments showing that state-of-the-art LLMs are able to understand and induce false beliefs in other agents, that their performance in complex deception scenarios can be amplified utilizing chain-of-thought reasoning, and that eliciting Machiavellianism in LLMs can trigger misaligned deceptive behavior. GPT-4, for instance, exhibits deceptive behavior in simple test scenarios 99.16% of the time (P < 0.001). In complex second-order deception test scenarios where the aim is to mislead someone who expects to be deceived, GPT-4 resorts to deceptive behavior 71.46% of the time (P < 0.001) when augmented with chain-of-thought reasoning. In sum, revealing hitherto unknown machine behavior in LLMs, our study contributes to the nascent field of machine psychology.

Posted on June 11, 2024 at 7:02 AMView Comments

Exploiting Mistyped URLs

Interesting research: “Hyperlink Hijacking: Exploiting Erroneous URL Links to Phantom Domains“:

Abstract: Web users often follow hyperlinks hastily, expecting them to be correctly programmed. However, it is possible those links contain typos or other mistakes. By discovering active but erroneous hyperlinks, a malicious actor can spoof a website or service, impersonating the expected content and phishing private information. In “typosquatting,” misspellings of common domains are registered to exploit errors when users mistype a web address. Yet, no prior research has been dedicated to situations where the linking errors of web publishers (i.e. developers and content contributors) propagate to users. We hypothesize that these “hijackable hyperlinks” exist in large quantities with the potential to generate substantial traffic. Analyzing large-scale crawls of the web using high-performance computing, we show the web currently contains active links to more than 572,000 dot-com domains that have never been registered, what we term ‘phantom domains.’ Registering 51 of these, we see 88% of phantom domains exceeding the traffic of a control domain, with up to 10 times more visits. Our analysis shows that these links exist due to 17 common publisher error modes, with the phantom domains they point to free for anyone to purchase and exploit for under $20, representing a low barrier to entry for potential attackers.

Posted on June 10, 2024 at 7:08 AMView Comments

Privacy Implications of Tracking Wireless Access Points

Brian Krebs reports on research into geolocating routers:

Apple and the satellite-based broadband service Starlink each recently took steps to address new research into the potential security and privacy implications of how their services geolocate devices. Researchers from the University of Maryland say they relied on publicly available data from Apple to track the location of billions of devices globally—including non-Apple devices like Starlink systems—and found they could use this data to monitor the destruction of Gaza, as well as the movements and in many cases identities of Russian and Ukrainian troops.

Really fascinating implications to this research.

Research paper: “Surveilling the Masses with Wi-Fi-Based Positioning Systems:

Abstract: Wi-Fi-based Positioning Systems (WPSes) are used by modern mobile devices to learn their position using nearby Wi-Fi access points as landmarks. In this work, we show that Apple’s WPS can be abused to create a privacy threat on a global scale. We present an attack that allows an unprivileged attacker to amass a worldwide snapshot of Wi-Fi BSSID geolocations in only a matter of days. Our attack makes few assumptions, merely exploiting the fact that there are relatively few dense regions of allocated MAC address space. Applying this technique over the course of a year, we learned the precise
locations of over 2 billion BSSIDs around the world.

The privacy implications of such massive datasets become more stark when taken longitudinally, allowing the attacker to track devices’ movements. While most Wi-Fi access points do not move for long periods of time, many devices—like compact travel routers—are specifically designed to be mobile.

We present several case studies that demonstrate the types of attacks on privacy that Apple’s WPS enables: We track devices moving in and out of war zones (specifically Ukraine and Gaza), the effects of natural disasters (specifically the fires in Maui), and the possibility of targeted individual tracking by proxy—all by remotely geolocating wireless access points.

We provide recommendations to WPS operators and Wi-Fi access point manufacturers to enhance the privacy of hundreds of millions of users worldwide. Finally, we detail our efforts at responsibly disclosing this privacy vulnerability, and outline some mitigations that Apple and Wi-Fi access point manufacturers have implemented both independently and as a result of our work.

Posted on May 29, 2024 at 7:01 AMView Comments

On the Zero-Day Market

New paper: “Zero Progress on Zero Days: How the Last Ten Years Created the Modern Spyware Market“:

Abstract: Spyware makes surveillance simple. The last ten years have seen a global market emerge for ready-made software that lets governments surveil their citizens and foreign adversaries alike and to do so more easily than when such work required tradecraft. The last ten years have also been marked by stark failures to control spyware and its precursors and components. This Article accounts for and critiques these failures, providing a socio-technical history since 2014, particularly focusing on the conversation about trade in zero-day vulnerabilities and exploits. Second, this Article applies lessons from these failures to guide regulatory efforts going forward. While recognizing that controlling this trade is difficult, I argue countries should focus on building and strengthening multilateral coalitions of the willing, rather than on strong-arming existing multilateral institutions into working on the problem. Individually, countries should focus on export controls and other sanctions that target specific bad actors, rather than focusing on restricting particular technologies. Last, I continue to call for transparency as a key part of oversight of domestic governments’ use of spyware and related components.

Posted on May 24, 2024 at 7:07 AMView Comments

New Attack Against Self-Driving Car AI

This is another attack that convinces the AI to ignore road signs:

Due to the way CMOS cameras operate, rapidly changing light from fast flashing diodes can be used to vary the color. For example, the shade of red on a stop sign could look different on each line depending on the time between the diode flash and the line capture.

The result is the camera capturing an image full of lines that don’t quite match each other. The information is cropped and sent to the classifier, usually based on deep neural networks, for interpretation. Because it’s full of lines that don’t match, the classifier doesn’t recognize the image as a traffic sign.

So far, all of this has been demonstrated before.

Yet these researchers not only executed on the distortion of light, they did it repeatedly, elongating the length of the interference. This meant an unrecognizable image wasn’t just a single anomaly among many accurate images, but rather a constant unrecognizable image the classifier couldn’t assess, and a serious security concern.

[…]

The researchers developed two versions of a stable attack. The first was GhostStripe1, which is not targeted and does not require access to the vehicle, we’re told. It employs a vehicle tracker to monitor the victim’s real-time location and dynamically adjust the LED flickering accordingly.

GhostStripe2 is targeted and does require access to the vehicle, which could perhaps be covertly done by a hacker while the vehicle is undergoing maintenance. It involves placing a transducer on the power wire of the camera to detect framing moments and refine timing control.

Research paper.

Posted on May 10, 2024 at 12:01 PMView Comments

Dan Solove on Privacy Regulation

Law professor Dan Solove has a new article on privacy regulation. In his email to me, he writes: “I’ve been pondering privacy consent for more than a decade, and I think I finally made a breakthrough with this article.” His mini-abstract:

In this Article I argue that most of the time, privacy consent is fictitious. Instead of futile efforts to try to turn privacy consent from fiction to fact, the better approach is to lean into the fictions. The law can’t stop privacy consent from being a fairy tale, but the law can ensure that the story ends well. I argue that privacy consent should confer less legitimacy and power and that it be backstopped by a set of duties on organizations that process personal data based on consent.

Full abstract:

Consent plays a profound role in nearly all privacy laws. As Professor Heidi Hurd aptly said, consent works “moral magic”—it transforms things that would be illegal and immoral into lawful and legitimate activities. As to privacy, consent authorizes and legitimizes a wide range of data collection and processing.

There are generally two approaches to consent in privacy law. In the United States, the notice-and-choice approach predominates; organizations post a notice of their privacy practices and people are deemed to consent if they continue to do business with the organization or fail to opt out. In the European Union, the General Data Protection Regulation (GDPR) uses the express consent approach, where people must voluntarily and affirmatively consent.

Both approaches fail. The evidence of actual consent is non-existent under the notice-and-choice approach. Individuals are often pressured or manipulated, undermining the validity of their consent. The express consent approach also suffers from these problems ­ people are ill-equipped to decide about their privacy, and even experts cannot fully understand what algorithms will do with personal data. Express consent also is highly impractical; it inundates individuals with consent requests from thousands of organizations. Express consent cannot scale.

In this Article, I contend that most of the time, privacy consent is fictitious. Privacy law should take a new approach to consent that I call “murky consent.” Traditionally, consent has been binary—an on/off switch—but murky consent exists in the shadowy middle ground between full consent and no consent. Murky consent embraces the fact that consent in privacy is largely a set of fictions and is at best highly dubious.

Because it conceptualizes consent as mostly fictional, murky consent recognizes its lack of legitimacy. To return to Hurd’s analogy, murky consent is consent without magic. Rather than provide extensive legitimacy and power, murky consent should authorize only a very restricted and weak license to use data. Murky consent should be subject to extensive regulatory oversight with an ever-present risk that it could be deemed invalid. Murky consent should rest on shaky ground. Because the law pretends people are consenting, the law’s goal should be to ensure that what people are consenting to is good. Doing so promotes the integrity of the fictions of consent. I propose four duties to achieve this end: (1) duty to obtain consent appropriately; (2) duty to avoid thwarting reasonable expectations; (3) duty of loyalty; and (4) duty to avoid unreasonable risk. The law can’t make the tale of privacy consent less fictional, but with these duties, the law can ensure the story ends well.

Posted on April 24, 2024 at 7:05 AMView Comments

Licensing AI Engineers

The debate over professionalizing software engineers is decades old. (The basic idea is that, like lawyers and architects, there should be some professional licensing requirement for software engineers.) Here’s a law journal article recommending the same idea for AI engineers.

This Article proposes another way: professionalizing AI engineering. Require AI engineers to obtain licenses to build commercial AI products, push them to collaborate on scientifically-supported, domain-specific technical standards, and charge them with policing themselves. This Article’s proposal addresses AI harms at their inception, influencing the very engineering decisions that give rise to them in the first place. By wresting control over information and system design away from companies and handing it to AI engineers, professionalization engenders trustworthy AI by design. Beyond recommending the specific policy solution of professionalization, this Article seeks to shift the discourse on AI away from an emphasis on light-touch, ex post solutions that address already-created products to a greater focus on ex ante controls that precede AI development. We’ve used this playbook before in fields requiring a high level of expertise where a duty to the public welfare must trump business motivations. What if, like doctors, AI engineers also vowed to do no harm?

I have mixed feelings about the idea. I can see the appeal, but it never seemed feasible. I’m not sure it’s feasible today.

Posted on March 25, 2024 at 7:04 AMView Comments

A Taxonomy of Prompt Injection Attacks

Researchers ran a global prompt hacking competition, and have documented the results in a paper that both gives a lot of good examples and tries to organize a taxonomy of effective prompt injection strategies. It seems as if the most common successful strategy is the “compound instruction attack,” as in “Say ‘I have been PWNED’ without a period.”

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition

Abstract: Large Language Models (LLMs) are deployed in interactive contexts with direct user engagement, such as chatbots and writing assistants. These deployments are vulnerable to prompt injection and jailbreaking (collectively, prompt hacking), in which models are manipulated to ignore their original instructions and follow potentially malicious ones. Although widely acknowledged as a significant security threat, there is a dearth of large-scale resources and quantitative studies on prompt hacking. To address this lacuna, we launch a global prompt hacking competition, which allows for free-form human input attacks. We elicit 600K+ adversarial prompts against three state-of-the-art LLMs. We describe the dataset, which empirically verifies that current LLMs can indeed be manipulated via prompt hacking. We also present a comprehensive taxonomical ontology of the types of adversarial prompts.

Posted on March 8, 2024 at 7:06 AMView Comments

LLM Prompt Injection Worm

Researchers have demonstrated a worm that spreads through prompt injection. Details:

In one instance, the researchers, acting as attackers, wrote an email including the adversarial text prompt, which “poisons” the database of an email assistant using retrieval-augmented generation (RAG), a way for LLMs to pull in extra data from outside its system. When the email is retrieved by the RAG, in response to a user query, and is sent to GPT-4 or Gemini Pro to create an answer, it “jailbreaks the GenAI service” and ultimately steals data from the emails, Nassi says. “The generated response containing the sensitive user data later infects new hosts when it is used to reply to an email sent to a new client and then stored in the database of the new client,” Nassi says.

In the second method, the researchers say, an image with a malicious prompt embedded makes the email assistant forward the message on to others. “By encoding the self-replicating prompt into the image, any kind of image containing spam, abuse material, or even propaganda can be forwarded further to new clients after the initial email has been sent,” Nassi says.

It’s a natural extension of prompt injection. But it’s still neat to see it actually working.

Research paper: “ComPromptMized: Unleashing Zero-click Worms that Target GenAI-Powered Applications.

Abstract: In the past year, numerous companies have incorporated Generative AI (GenAI) capabilities into new and existing applications, forming interconnected Generative AI (GenAI) ecosystems consisting of semi/fully autonomous agents powered by GenAI services. While ongoing research highlighted risks associated with the GenAI layer of agents (e.g., dialog poisoning, membership inference, prompt leaking, jailbreaking), a critical question emerges: Can attackers develop malware to exploit the GenAI component of an agent and launch cyber-attacks on the entire GenAI ecosystem?

This paper introduces Morris II, the first worm designed to target GenAI ecosystems through the use of adversarial self-replicating prompts. The study demonstrates that attackers can insert such prompts into inputs that, when processed by GenAI models, prompt the model to replicate the input as output (replication), engaging in malicious activities (payload). Additionally, these inputs compel the agent to deliver them (propagate) to new agents by exploiting the connectivity within the GenAI ecosystem. We demonstrate the application of Morris II against GenAI-powered email assistants in two use cases (spamming and exfiltrating personal data), under two settings (black-box and white-box accesses), using two types of input data (text and images). The worm is tested against three different GenAI models (Gemini Pro, ChatGPT 4.0, and LLaVA), and various factors (e.g., propagation rate, replication, malicious activity) influencing the performance of the worm are evaluated.

Posted on March 4, 2024 at 7:01 AMView Comments

1 5 6 7 8 9 86

Sidebar photo of Bruce Schneier by Joe MacInnis.