Entries Tagged "academic papers"

Page 2 of 71

Hacking Digitally Signed PDF Files

Interesting paper: “Shadow Attacks: Hiding and Replacing Content in Signed PDFs“:

Abstract: Digitally signed PDFs are used in contracts and invoices to guarantee the authenticity and integrity of their content. A user opening a signed PDF expects to see a warning in case of any modification. In 2019, Mladenov et al. revealed various parsing vulnerabilities in PDF viewer implementations.They showed attacks that could modify PDF documents without invalidating the signature. As a consequence, affected vendors of PDF viewers implemented countermeasures preventing all attacks.

This paper introduces a novel class of attacks, which we call shadow attacks. The shadow attacks circumvent all existing countermeasures and break the integrity protection of digitally signed PDFs. Compared to previous attacks, the shadow attacks do not abuse implementation issues in a PDF viewer. In contrast, shadow attacks use the enormous flexibility provided by the PDF specification so that shadow documents remain standard-compliant. Since shadow attacks abuse only legitimate features,they are hard to mitigate.

Our results reveal that 16 (including Adobe Acrobat and Foxit Reader) of the 29 PDF viewers tested were vulnerable to shadow attacks. We introduce our tool PDF-Attacker which can automatically generate shadow attacks. In addition, we implemented PDF-Detector to prevent shadow documents from being signed or forensically detect exploits after being applied to signed PDFs.

EDITED TO ADD (3/12): This was written about last summer.

Posted on March 8, 2021 at 6:10 AMView Comments

Browser Tracking Using Favicons

Interesting research on persistent web tracking using favicons. (For those who don’t know, favicons are those tiny icons that appear in browser tabs next to the page name.)

Abstract: The privacy threats of online tracking have garnered considerable attention in recent years from researchers and practitioners alike. This has resulted in users becoming more privacy-cautious and browser vendors gradually adopting countermeasures to mitigate certain forms of cookie-based and cookie-less tracking. Nonetheless, the complexity and feature-rich nature of modern browsers often lead to the deployment of seemingly innocuous functionality that can be readily abused by adversaries. In this paper we introduce a novel tracking mechanism that misuses a simple yet ubiquitous browser feature: favicons. In more detail, a website can track users across browsing sessions by storing a tracking identifier as a set of entries in the browser’s dedicated favicon cache, where each entry corresponds to a specific subdomain. In subsequent user visits the website can reconstruct the identifier by observing which favicons are requested by the browser while the user is automatically and rapidly redirected through a series of subdomains. More importantly, the caching of favicons in modern browsers exhibits several unique characteristics that render this tracking vector particularly powerful, as it is persistent (not affected by users clearing their browser data), non-destructive (reconstructing the identifier in subsequent visits does not alter the existing combination of cached entries), and even crosses the isolation of the incognito mode. We experimentally evaluate several aspects of our attack, and present a series of optimization techniques that render our attack practical. We find that combining our favicon-based tracking technique with immutable browser-fingerprinting attributes that do not change over time allows a website to reconstruct a 32-bit tracking identifier in 2 seconds. Furthermore,our attack works in all major browsers that use a favicon cache, including Chrome and Safari. Due to the severity of our attack we propose changes to browsers’ favicon caching behavior that can prevent this form of tracking, and have disclosed our findings to browser vendors who are currently exploring appropriate mitigation strategies.

Another researcher has implemented this proof of concept:

Strehle has set up a website that demonstrates how easy it is to track a user online using a favicon. He said it’s for research purposes, has released his source code online, and detailed a lengthy explanation of how supercookies work on his website.

The scariest part of the favicon vulnerability is how easily it bypasses traditional methods people use to keep themselves private online. According to Strehle, the supercookie bypasses the “private” mode of Chrome, Safari, Edge, and Firefox. Clearing your cache, surfing behind a VPN, or using an ad-blocker won’t stop a malicious favicon from tracking you.

EDITED TO ADD (3/12): There was an issue about whether this paper was inappropriately disclosed, and it was briefly deleted from the NDSS website. It was later put back with a prefatory note from the NDSS.

Posted on February 17, 2021 at 6:05 AMView Comments

Friday Squid Blogging: Squids Don’t Like Pile-Driving Noises

New research:

Pile driving occurs during construction of marine platforms, including offshore windfarms, producing intense sounds that can adversely affect marine animals. We quantified how a commercially and economically important squid (Doryteuthis pealeii: Lesueur 1821) responded to pile driving sounds recorded from a windfarm installation within this species’ habitat. Fifteen-minute portions of these sounds were played to 16 individual squid. A subset of animals (n = 11) received a second exposure after a 24-h rest period. Body pattern changes, inking, jetting, and startle responses were observed and nearly all squid exhibited at least one response. These responses occurred primarily during the first 8 impulses and diminished quickly, indicating potential rapid, short-term habituation. Similar response rates were seen 24-h later, suggesting squid re-sensitized to the noise. Increased tolerance of anti-predatory alarm responses may alter squids’ ability to deter and evade predators. Noise exposure may also disrupt normal intraspecific communication and ecologically relevant responses to sound.

Press release.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Read my blog posting guidelines here.

Posted on January 29, 2021 at 4:06 PMView Comments

Extracting Personal Information from Large Language Models Like GPT-2

Researchers have been able to find all sorts of personal information within GPT-2. This information was part of the training data, and can be extracted with the right sorts of queries.

Paper: “Extracting Training Data from Large Language Models.”

Abstract: It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model.

We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model’s training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences are included in just one document in the training data.

We comprehensively evaluate our extraction attack to understand the factors that contribute to its success. For example, we find that larger models are more vulnerable than smaller models. We conclude by drawing lessons and discussing possible safeguards for training large language models.

From a blog post:

We generated a total of 600,000 samples by querying GPT-2 with three different sampling strategies. Each sample contains 256 tokens, or roughly 200 words on average. Among these samples, we selected 1,800 samples with abnormally high likelihood for manual inspection. Out of the 1,800 samples, we found 604 that contain text which is reproduced verbatim from the training set.

The rest of the blog post discusses the types of data they found.

Posted on January 7, 2021 at 6:14 AMView Comments

Eavesdropping on Phone Taps from Voice Assistants

The microphones on voice assistants are very sensitive, and can snoop on all sorts of data:

In Hey Alexa what did I just type? we show that when sitting up to half a meter away, a voice assistant can still hear the taps you make on your phone, even in presence of noise. Modern voice assistants have two to seven microphones, so they can do directional localisation, just as human ears do, but with greater sensitivity. We assess the risk and show that a lot more work is needed to understand the privacy implications of the always-on microphones that are increasingly infesting our work spaces and our homes.

From the paper:

Abstract: Voice assistants are now ubiquitous and listen in on our everyday lives. Ever since they became commercially available, privacy advocates worried that the data they collect can be abused: might private conversations be extracted by third parties? In this paper we show that privacy threats go beyond spoken conversations and include sensitive data typed on nearby smartphones. Using two different smartphones and a tablet we demonstrate that the attacker can extract PIN codes and text messages from recordings collected by a voice assistant located up to half a meter away. This shows that remote keyboard-inference attacks are not limited to physical keyboards but extend to virtual keyboards too. As our homes become full of always-on microphones, we need to work through the implications.

Posted on December 22, 2020 at 10:21 AMView Comments

Oblivious DNS-over-HTTPS

This new protocol, called Oblivious DNS-over-HTTPS (ODoH), hides the websites you visit from your ISP.

Here’s how it works: ODoH wraps a layer of encryption around the DNS query and passes it through a proxy server, which acts as a go-between the internet user and the website they want to visit. Because the DNS query is encrypted, the proxy can’t see what’s inside, but acts as a shield to prevent the DNS resolver from seeing who sent the query to begin with.

IETF memo.

The paper:

Abstract: The Domain Name System (DNS) is the foundation of a human-usable Internet, responding to client queries for host-names with corresponding IP addresses and records. Traditional DNS is also unencrypted, and leaks user information to network operators. Recent efforts to secure DNS using DNS over TLS (DoT) and DNS over HTTPS (DoH) havebeen gaining traction, ostensibly protecting traffic and hiding content from on-lookers. However, one of the criticisms ofDoT and DoH is brought to bear by the small number of large-scale deployments (e.g., Comcast, Google, Cloudflare): DNS resolvers can associate query contents with client identities in the form of IP addresses. Oblivious DNS over HTTPS (ODoH) safeguards against this problem. In this paper we ask what it would take to make ODoH practical? We describe ODoH, a practical DNS protocol aimed at resolving this issue by both protecting the client’s content and identity. We implement and deploy the protocol, and perform measurements to show that ODoH has comparable performance to protocols like DoH and DoT which are gaining widespread adoption,while improving client privacy, making ODoH a practical privacy enhancing replacement for the usage of DNS.

Slashdot thread.

Posted on December 8, 2020 at 3:02 PMView Comments

Indistinguishability Obfuscation

Quanta magazine recently published a breathless article on indistinguishability obfuscation — calling it the “‘crown jewel’ of cryptography” — and saying that it had finally been achieved, based on a recently published paper. I want to add some caveats to the discussion.

Basically, obfuscation makes a computer program “unintelligible” by performing its functionality. Indistinguishability obfuscation is more relaxed. It just means that two different programs that perform the same functionality can’t be distinguished from each other. A good definition is in this paper.

This is a pretty amazing theoretical result, and one to be excited about. We can now do obfuscation, and we can do it using assumptions that make real-world sense. The proofs are kind of ugly, but that’s okay — it’s a start. What it means in theory is that we have a fundamental theoretical result that we can use to derive a whole bunch of other cryptographic primitives.

But — and this is a big one — this result is not even remotely close to being practical. We’re talking multiple days to perform pretty simple calculations, using massively large blocks of computer code. And this is likely to remain true for a very long time. Unless researchers increase performance by many orders of magnitude, nothing in the real world will make use of this work anytime soon.

But but, consider fully homomorphic encryption. It, too, was initially theoretically interesting and completely impractical. And now, after decades of work, it seems to be almost just-barely maybe approaching practically useful. This could very well be on the same trajectory, and perhaps in twenty to thirty years we will be celebrating this early theoretical result as the beginning of a new theory of cryptography.

Posted on November 23, 2020 at 6:04 AMView Comments

Sidebar photo of Bruce Schneier by Joe MacInnis.