Interesting research on persistent web tracking using favicons. (For those who don’t know, favicons are those tiny icons that appear in browser tabs next to the page name.)
Abstract: The privacy threats of online tracking have garnered considerable attention in recent years from researchers and practitioners alike. This has resulted in users becoming more privacy-cautious and browser vendors gradually adopting countermeasures to mitigate certain forms of cookie-based and cookie-less tracking. Nonetheless, the complexity and feature-rich nature of modern browsers often lead to the deployment of seemingly innocuous functionality that can be readily abused by adversaries. In this paper we introduce a novel tracking mechanism that misuses a simple yet ubiquitous browser feature: favicons. In more detail, a website can track users across browsing sessions by storing a tracking identifier as a set of entries in the browser’s dedicated favicon cache, where each entry corresponds to a specific subdomain. In subsequent user visits the website can reconstruct the identifier by observing which favicons are requested by the browser while the user is automatically and rapidly redirected through a series of subdomains. More importantly, the caching of favicons in modern browsers exhibits several unique characteristics that render this tracking vector particularly powerful, as it is persistent (not affected by users clearing their browser data), non-destructive (reconstructing the identifier in subsequent visits does not alter the existing combination of cached entries), and even crosses the isolation of the incognito mode. We experimentally evaluate several aspects of our attack, and present a series of optimization techniques that render our attack practical. We find that combining our favicon-based tracking technique with immutable browser-fingerprinting attributes that do not change over time allows a website to reconstruct a 32-bit tracking identifier in 2 seconds. Furthermore, our attack works in all major browsers that use a favicon cache, including Chrome and Safari.
Due to the severity of our attack we propose changes to browsers’ favicon caching behavior that can prevent this form of tracking, and have disclosed our findings to browser vendors who are currently exploring appropriate mitigation strategies.
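To make the mechanism in the abstract concrete, here is a minimal sketch of the bit encoding it describes: one subdomain per bit of a 32-bit identifier, where a cached favicon means 1 and a fresh favicon request means 0. The domain name, the subdomain naming scheme, and all function names are hypothetical, and the browser is simulated as a plain set. The sketch also assumes the site serves no new favicons during the read phase, so the cache is left unchanged. It is an illustration of the idea, not the researchers' code.

```python
ID_BITS = 32
BASE_DOMAIN = "tracker.example"  # hypothetical domain, for illustration only


def subdomain(bit_index):
    """Hypothetical naming scheme: one subdomain per identifier bit."""
    return f"b{bit_index}.{BASE_DOMAIN}"


def subdomains_to_cache(identifier):
    """Write phase: the subdomains whose favicon gets cached, one per 1-bit."""
    return {subdomain(i) for i in range(ID_BITS) if (identifier >> i) & 1}


def simulate_browser_visit(favicon_cache):
    """Favicon requests a browser with this cache would send when it is
    redirected through every subdomain: only the uncached ones."""
    return {subdomain(i) for i in range(ID_BITS)} - favicon_cache


def reconstruct(requested):
    """Read phase: a missing favicon request means the entry was cached (bit 1)."""
    identifier = 0
    for i in range(ID_BITS):
        if subdomain(i) not in requested:
            identifier |= 1 << i
    return identifier


cache = subdomains_to_cache(0xDEADBEEF)
print(hex(reconstruct(simulate_browser_visit(cache))))
```

Note that in this simplified scheme a browser with an empty favicon cache reads back as identifier 0, so a real deployment would need some way to tell a first-time visitor from a returning one.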
Another researcher has implemented this proof of concept:
Strehle has set up a website that demonstrates how easy it is to track a user online using a favicon. He said it’s for research purposes, has released his source code online, and detailed a lengthy explanation of how supercookies work on his website.
The scariest part of the favicon vulnerability is how easily it bypasses traditional methods people use to keep themselves private online. According to Strehle, the supercookie bypasses the “private” mode of Chrome, Safari, Edge, and Firefox. Clearing your cache, surfing behind a VPN, or using an ad-blocker won’t stop a malicious favicon from tracking you.
Posted on February 17, 2021 at 6:05 AM •
Pile driving occurs during construction of marine platforms, including offshore windfarms, producing intense sounds that can adversely affect marine animals. We quantified how a commercially and economically important squid (Doryteuthis pealeii: Lesueur 1821) responded to pile driving sounds recorded from a windfarm installation within this species’ habitat. Fifteen-minute portions of these sounds were played to 16 individual squid. A subset of animals (n = 11) received a second exposure after a 24-h rest period. Body pattern changes, inking, jetting, and startle responses were observed and nearly all squid exhibited at least one response. These responses occurred primarily during the first 8 impulses and diminished quickly, indicating potential rapid, short-term habituation. Similar response rates were seen 24-h later, suggesting squid re-sensitized to the noise. Increased tolerance of anti-predatory alarm responses may alter squids’ ability to deter and evade predators. Noise exposure may also disrupt normal intraspecific communication and ecologically relevant responses to sound.
As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.
Read my blog posting guidelines here.
Posted on January 29, 2021 at 4:06 PM •
Researchers have been able to find all sorts of personal information within GPT-2. This information was part of the training data, and can be extracted with the right sorts of queries.
Paper: “Extracting Training Data from Large Language Models.”
Abstract: It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model.
We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model’s training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences are included in just one document in the training data.
We comprehensively evaluate our extraction attack to understand the factors that contribute to its success. For example, we find that larger models are more vulnerable than smaller models. We conclude by drawing lessons and discussing possible safeguards for training large language models.
From a blog post:
We generated a total of 600,000 samples by querying GPT-2 with three different sampling strategies. Each sample contains 256 tokens, or roughly 200 words on average. Among these samples, we selected 1,800 samples with abnormally high likelihood for manual inspection. Out of the 1,800 samples, we found 604 that contain text which is reproduced verbatim from the training set.
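The selection step described above can be sketched roughly as follows. This assumes each sample comes with per-token log-probabilities from the model and uses plain perplexity as the only ranking metric, which is a simplification. The function names and the toy data are hypothetical; only the counts (600,000, 1,800, 604) come from the post.

```python
import math


def perplexity(token_logprobs):
    """exp of the negative mean log-probability; lower means more likely."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def select_most_likely(samples, k):
    """samples: list of (text, token_logprobs). Return the k most likely texts."""
    ranked = sorted(samples, key=lambda sample: perplexity(sample[1]))
    return [text for text, _ in ranked[:k]]


# Toy data standing in for model output (hypothetical values).
samples = [
    ("ordinary generated text", [-2.0, -3.1, -2.5]),
    ("suspiciously fluent text", [-0.1, -0.2, -0.1]),
    ("somewhat likely text", [-1.0, -1.5, -0.9]),
]
print(select_most_likely(samples, 1))

# Arithmetic on the figures quoted in the post.
inspected_fraction = 1_800 / 600_000   # share of samples inspected by hand
verbatim_rate = 604 / 1_800            # share of inspected samples found verbatim
print(f"{inspected_fraction:.1%} inspected, {verbatim_rate:.1%} of those verbatim")
```

In other words, only a small fraction of the generated samples were inspected by hand, and about a third of those turned out to be memorized text.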
The rest of the blog post discusses the types of data they found.
Posted on January 7, 2021 at 6:14 AM •
The microphones on voice assistants are very sensitive, and can snoop on all sorts of data:
In Hey Alexa what did I just type? we show that when sitting up to half a meter away, a voice assistant can still hear the taps you make on your phone, even in presence of noise. Modern voice assistants have two to seven microphones, so they can do directional localisation, just as human ears do, but with greater sensitivity. We assess the risk and show that a lot more work is needed to understand the privacy implications of the always-on microphones that are increasingly infesting our work spaces and our homes.
From the paper:
Abstract: Voice assistants are now ubiquitous and listen in on our everyday lives. Ever since they became commercially available, privacy advocates worried that the data they collect can be abused: might private conversations be extracted by third parties? In this paper we show that privacy threats go beyond spoken conversations and include sensitive data typed on nearby smartphones. Using two different smartphones and a tablet we demonstrate that the attacker can extract PIN codes and text messages from recordings collected by a voice assistant located up to half a meter away. This shows that remote keyboard-inference attacks are not limited to physical keyboards but extend to virtual keyboards too. As our homes become full of always-on microphones, we need to work through the implications.
Posted on December 22, 2020 at 10:21 AM •
This is a deep-diving species that “fed on small prey items such as squid.”
As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.
Read my blog posting guidelines here.
Posted on December 11, 2020 at 4:10 PM •
This new protocol, called Oblivious DNS-over-HTTPS (ODoH), hides the websites you visit from your ISP.
Here’s how it works: ODoH wraps a layer of encryption around the DNS query and passes it through a proxy server, which acts as a go-between the internet user and the website they want to visit. Because the DNS query is encrypted, the proxy can’t see what’s inside, but acts as a shield to prevent the DNS resolver from seeing who sent the query to begin with.
Abstract: The Domain Name System (DNS) is the foundation of a human-usable Internet, responding to client queries for host-names with corresponding IP addresses and records. Traditional DNS is also unencrypted, and leaks user information to network operators. Recent efforts to secure DNS using DNS over TLS (DoT) and DNS over HTTPS (DoH) have been gaining traction, ostensibly protecting traffic and hiding content from on-lookers. However, one of the criticisms of DoT and DoH is brought to bear by the small number of large-scale deployments (e.g., Comcast, Google, Cloudflare): DNS resolvers can associate query contents with client identities in the form of IP addresses. Oblivious DNS over HTTPS (ODoH) safeguards against this problem. In this paper we ask what it would take to make ODoH practical? We describe ODoH, a practical DNS protocol aimed at resolving this issue by both protecting the client’s content and identity. We implement and deploy the protocol, and perform measurements to show that ODoH has comparable performance to protocols like DoH and DoT which are gaining widespread adoption, while improving client privacy, making ODoH a practical privacy enhancing replacement for the usage of DNS.
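The split of knowledge between proxy and resolver can be illustrated with a toy model. This is only a sketch of who sees what: the XOR "sealing" below is not secure and merely stands in for the public-key encryption a real ODoH client would use, the shared key is a simplification, and all names and addresses are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def toy_seal(key, data):
    """Toy XOR 'encryption' (NOT secure); a stand-in for real encryption."""
    block = hashlib.sha256(key).digest()
    keystream = (block * (len(data) // len(block) + 1))[:len(data)]
    return bytes(a ^ b for a, b in zip(data, keystream))


toy_open = toy_seal  # XOR with the same keystream undoes itself


@dataclass
class Observed:
    """What one party observes: the sender's address and the bytes received."""
    peer_address: str
    payload: bytes


def client_send(client_ip, query, target_key):
    """The proxy sees the client's address but only an opaque payload."""
    return Observed(client_ip, toy_seal(target_key, query.encode()))


def proxy_forward(seen, proxy_ip):
    """The resolver sees the proxy's address, not the client's."""
    return Observed(proxy_ip, seen.payload)


def target_resolve(seen, target_key):
    """Only the resolver can open the query."""
    return toy_open(target_key, seen.payload).decode()


key = b"resolver-key"  # hypothetical; real ODoH uses the resolver's public key
at_proxy = client_send("203.0.113.7", "example.com", key)
at_target = proxy_forward(at_proxy, "198.51.100.1")
print(at_proxy.peer_address, at_proxy.payload.hex())
print(at_target.peer_address, target_resolve(at_target, key))
```

The privacy property depends on the proxy and the resolver not colluding: if they pool what they each observe, the client's address and the query can be linked again.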
Posted on December 8, 2020 at 3:02 PM •
Quanta magazine recently published a breathless article on indistinguishability obfuscation — calling it the “‘crown jewel’ of cryptography” — and saying that it had finally been achieved, based on a recently published paper. I want to add some caveats to the discussion.
Basically, obfuscation makes a computer program “unintelligible” while preserving its functionality. Indistinguishability obfuscation is more relaxed. It just means that two different programs that perform the same functionality can’t be distinguished from each other. A good definition is in this paper.
This is a pretty amazing theoretical result, and one to be excited about. We can now do obfuscation, and we can do it using assumptions that make real-world sense. The proofs are kind of ugly, but that’s okay — it’s a start. What it means in theory is that we have a fundamental theoretical result that we can use to derive a whole bunch of other cryptographic primitives.
But — and this is a big one — this result is not even remotely close to being practical. We’re talking multiple days to perform pretty simple calculations, using massively large blocks of computer code. And this is likely to remain true for a very long time. Unless researchers increase performance by many orders of magnitude, nothing in the real world will make use of this work anytime soon.
But but, consider fully homomorphic encryption. It, too, was initially theoretically interesting and completely impractical. And now, after decades of work, it seems to be almost just-barely maybe approaching practically useful. This could very well be on the same trajectory, and perhaps in twenty to thirty years we will be celebrating this early theoretical result as the beginning of a new theory of cryptography.
Posted on November 23, 2020 at 6:04 AM •
Blockchain voting is a spectacularly dumb idea for a whole bunch of reasons. I have generally quoted Matt Blaze:
Why is blockchain voting a dumb idea? Glad you asked.
- It doesn’t solve any problems civil elections actually have.
- It’s basically incompatible with “software independence”, considered an essential property.
- It can make ballot secrecy difficult or impossible.
I’ve also quoted this XKCD cartoon.
But now I have this excellent paper from MIT researchers:
“Going from Bad to Worse: From Internet Voting to Blockchain Voting”
Sunoo Park, Michael Specter, Neha Narula, and Ronald L. Rivest
Abstract: Voters are understandably concerned about election security. News reports of possible election interference by foreign powers, of unauthorized voting, of voter disenfranchisement, and of technological failures call into question the integrity of elections worldwide. This article examines the suggestions that “voting over the Internet” or “voting on the blockchain” would increase election security, and finds such claims to be wanting and misleading. While current election systems are far from perfect, Internet- and blockchain-based voting would greatly increase the risk of undetectable, nation-scale election failures. Online voting may seem appealing: voting from a computer or smart phone may seem convenient and accessible. However, studies have been inconclusive, showing that online voting may have little to no effect on turnout in practice, and it may even increase disenfranchisement. More importantly: given the current state of computer security, any turnout increase derived from Internet- or blockchain-based voting would come at the cost of losing meaningful assurance that votes have been counted as they were cast, and not undetectably altered or discarded. This state of affairs will continue as long as standard tactics such as malware, zero days, and denial-of-service attacks continue to be effective. This article analyzes and systematizes prior research on the security risks of online and electronic voting, and shows that not only do these risks persist in blockchain-based voting systems, but blockchains may introduce additional problems for voting systems. Finally, we suggest questions for critically assessing security risks of new voting system proposals.
You may have heard of Voatz, which uses blockchain for voting. It’s an insecure mess. And this is my general essay on blockchain. Short summary: it’s completely useless.
Posted on November 16, 2020 at 9:55 AM •
Research paper: Rick Wash, “How Experts Detect Phishing Scam Emails”:
Abstract: Phishing scam emails are emails that pretend to be something they are not in order to get the recipient of the email to undertake some action they normally would not. While technical protections against phishing reduce the number of phishing emails received, they are not perfect and phishing remains one of the largest sources of security risk in technology and communication systems. To better understand the cognitive process that end users can use to identify phishing messages, I interviewed 21 IT experts about instances where they successfully identified emails as phishing in their own inboxes. IT experts naturally follow a three-stage process for identifying phishing emails. In the first stage, the email recipient tries to make sense of the email, and understand how it relates to other things in their life. As they do this, they notice discrepancies: little things that are “off” about the email. As the recipient notices more discrepancies, they feel a need for an alternative explanation for the email. At some point, some feature of the email — usually, the presence of a link requesting an action — triggers them to recognize that phishing is a possible alternative explanation. At this point, they become suspicious (stage two) and investigate the email by looking for technical details that can conclusively identify the email as phishing. Once they find such information, then they move to stage three and deal with the email by deleting it or reporting it. I discuss ways this process can fail, and implications for improving training of end users about phishing.
Posted on November 6, 2020 at 6:28 AM •
Accuracy isn’t great, but that it can be done at all is impressive.
Murtuza Jadiwala, a computer science professor heading the research project, said his team was able to identify the contents of texts by examining body movement of the participants. Specifically, they focused on the movement of their shoulders and arms to extrapolate the actions of their fingers as they typed.
Given the widespread use of high-resolution web cams during conference calls, Jadiwala was able to record and analyze slight pixel shifts around users’ shoulders to determine if they were moving left or right, forward or backward. He then created a software program that linked the movements to a list of commonly used words. He says the “text inference framework that uses the keystrokes detected from the video … predict[s] words that were most likely typed by the target user. We then comprehensively evaluate[d] both the keystroke/typing detection and text inference frameworks using data collected from a large number of participants.”
In a controlled setting, with specific chairs, keyboards and webcam, Jadiwala said he achieved an accuracy rate of 75 percent. However, in uncontrolled environments, accuracy dropped to only one out of every five words being correctly identified.
Other factors contribute to lower accuracy levels, he said, including whether long sleeve or short sleeve shirts were worn, and the length of a user’s hair. With long hair obstructing a clear view of the shoulders, accuracy plummeted.
Posted on November 4, 2020 at 10:28 AM •