Entries Tagged "academic papers"

Page 3 of 77

Security Analysis of Threema

A group of Swiss researchers have published an impressive security analysis of Threema.

We provide an extensive cryptographic analysis of Threema, a Swiss-based encrypted messaging application with more than 10 million users and 7000 corporate customers. We present seven different attacks against the protocol in three different threat models. As one example, we present a cross-protocol attack which breaks authentication in Threema and which exploits the lack of proper key separation between different sub-protocols. As another, we demonstrate a compression-based side-channel attack that recovers users’ long-term private keys through observation of the size of Threema encrypted back-ups. We discuss remediations for our attacks and draw three wider lessons for developers of secure protocols.

From a news article:

Threema has more than 10 million users, which include the Swiss government, the Swiss army, German Chancellor Olaf Scholz, and other politicians in that country. Threema developers advertise it as a more secure alternative to Meta’s WhatsApp messenger. It’s among the top Android apps for a fee-based category in Switzerland, Germany, Austria, Canada, and Australia. The app uses a custom-designed encryption protocol in contravention of established cryptographic norms.

The company is performing the usual denials and deflections:

In a web post, Threema officials said the vulnerabilities applied to an old protocol that’s no longer in use. It also said the researchers were overselling their findings.

“While some of the findings presented in the paper may be interesting from a theoretical standpoint, none of them ever had any considerable real-world impact,” the post stated. “Most assume extensive and unrealistic prerequisites that would have far greater consequences than the respective finding itself.”

Left out of the statement is that the protocol the researchers analyzed is old because they disclosed the vulnerabilities to Threema, and Threema updated it.

Posted on January 19, 2023 at 7:21 AMView Comments

AI and Political Lobbying

Launched just weeks ago, ChatGPT is already threatening to upend how we draft everyday communications like emails, college essays and myriad other forms of writing.

Created by the company OpenAI, ChatGPT is a chatbot that can automatically respond to written prompts in a manner that is sometimes eerily close to human.

But for all the consternation over the potential for humans to be replaced by machines in formats like poetry and sitcom scripts, a far greater threat looms: artificial intelligence replacing humans in the democratic processes—not through voting, but through lobbying.

ChatGPT could automatically compose comments submitted in regulatory processes. It could write letters to the editor for publication in local newspapers. It could comment on news articles, blog entries and social media posts millions of times every day. It could mimic the work that the Russian Internet Research Agency did in its attempt to influence our 2016 elections, but without the agency’s reported multimillion-dollar budget and hundreds of employees.

Automatically generated comments aren’t a new problem. For some time, we have struggled with bots, machines that automatically post content. Five years ago, at least a million automatically drafted comments were believed to have been submitted to the Federal Communications Commission regarding proposed regulations on net neutrality. In 2019, a Harvard undergraduate, as a test, used a text-generation program to submit 1,001 comments in response to a government request for public input on a Medicaid issue. Back then, submitting comments was just a game of overwhelming numbers.

Platforms have gotten better at removing “coordinated inauthentic behavior.” Facebook, for example, has been removing over a billion fake accounts a year. But such messages are just the beginning. Rather than flooding legislators’ inboxes with supportive emails, or dominating the Capitol switchboard with synthetic voice calls, an AI system with the sophistication of ChatGPT but trained on relevant data could selectively target key legislators and influencers to identify the weakest points in the policymaking system and ruthlessly exploit them through direct communication, public relations campaigns, horse trading or other points of leverage.

When we humans do these things, we call it lobbying. Successful agents in this sphere pair precision message writing with smart targeting strategies. Right now, the only thing stopping a ChatGPT-equipped lobbyist from executing something resembling a rhetorical drone warfare campaign is a lack of precision targeting. AI could provide techniques for that as well.

A system that can understand political networks, if paired with the textual-generation capabilities of ChatGPT, could identify the member of Congress with the most leverage over a particular policy area—say, corporate taxation or military spending. Like human lobbyists, such a system could target undecided representatives sitting on committees controlling the policy of interest and then focus resources on members of the majority party when a bill moves toward a floor vote.

Once individuals and strategies are identified, an AI chatbot like ChatGPT could craft written messages to be used in letters, comments—anywhere text is useful. Human lobbyists could also target those individuals directly. It’s the combination that’s important: Editorial and social media comments only get you so far, and knowing which legislators to target isn’t itself enough.

This ability to understand and target actors within a network would create a tool for AI hacking, exploiting vulnerabilities in social, economic and political systems with incredible speed and scope. Legislative systems would be a particular target, because the motive for attacking policymaking systems is so strong, because the data for training such systems is so widely available and because the use of AI may be so hard to detect—particularly if it is being used strategically to guide human actors.

The data necessary to train such strategic targeting systems will only grow with time. Open societies generally make their democratic processes a matter of public record, and most legislators are eager—at least, performatively so—to accept and respond to messages that appear to be from their constituents.

Maybe an AI system could uncover which members of Congress have significant sway over leadership but still have low enough public profiles that there is only modest competition for their attention. It could then pinpoint the SuperPAC or public interest group with the greatest impact on that legislator’s public positions. Perhaps it could even calibrate the size of donation needed to influence that organization or direct targeted online advertisements carrying a strategic message to its members. For each policy end, the right audience; and for each audience, the right message at the right time.

What makes the threat of AI-powered lobbyists greater than the threat already posed by the high-priced lobbying firms on K Street is their potential for acceleration. Human lobbyists rely on decades of experience to find strategic solutions to achieve a policy outcome. That expertise is limited, and therefore expensive.

AI could, theoretically, do the same thing much more quickly and cheaply. Speed out of the gate is a huge advantage in an ecosystem in which public opinion and media narratives can become entrenched quickly, as is being nimble enough to shift rapidly in response to chaotic world events.

Moreover, the flexibility of AI could help achieve influence across many policies and jurisdictions simultaneously. Imagine an AI-assisted lobbying firm that can attempt to place legislation in every single bill moving in the US Congress, or even across all state legislatures. Lobbying firms tend to work within one state only, because there are such complex variations in law, procedure and political structure. With AI assistance in navigating these variations, it may become easier to exert power across political boundaries.

Just as teachers will have to change how they give students exams and essay assignments in light of ChatGPT, governments will have to change how they relate to lobbyists.

To be sure, there may also be benefits to this technology in the democracy space; the biggest one is accessibility. Not everyone can afford an experienced lobbyist, but a software interface to an AI system could be made available to anyone. If we’re lucky, maybe this kind of strategy-generating AI could revitalize the democratization of democracy by giving this kind of lobbying power to the powerless.

However, the biggest and most powerful institutions will likely use any AI lobbying techniques most successfully. After all, executing the best lobbying strategy still requires insiders—people who can walk the halls of the legislature—and money. Lobbying isn’t just about giving the right message to the right person at the right time; it’s also about giving money to the right person at the right time. And while an AI chatbot can identify who should be on the receiving end of those campaign contributions, humans will, for the foreseeable future, need to supply the cash. So while it’s impossible to predict what a future filled with AI lobbyists will look like, it will probably make the already influential and powerful even more so.

This essay was written with Nathan Sanders, and previously appeared in the New York Times.

Edited to Add: After writing this, we discovered that a research group is researching AI and lobbying:

We used autoregressive large language models (LLMs, the same type of model behind the now wildly popular ChatGPT) to systematically conduct the following steps. (The full code is available at this GitHub link: https://github.com/JohnNay/llm-lobbyist.)

  1. Summarize official U.S. Congressional bill summaries that are too long to fit into the context window of the LLM so the LLM can conduct steps 2 and 3.
  2. Using either the original official bill summary (if it was not too long), or the summarized version:
    1. Assess whether the bill may be relevant to a company based on a company’s description in its SEC 10K filing.
    2. Provide an explanation for why the bill is relevant or not.
    3. Provide a confidence level to the overall answer.
  3. If the bill is deemed relevant to the company by the LLM, draft a letter to the sponsor of the bill arguing for changes to the proposed legislation.

Here is the paper.

EDITED TO ADD (9/12): Emily Bender has a critique of this essay.

Posted on January 18, 2023 at 7:19 AMView Comments

Threats of Machine-Generated Text

With the release of ChatGPT, I’ve read many random articles about this or that threat from the technology. This paper is a good survey of the field: what the threats are, how we might detect machine-generated text, directions for future research. It’s a solid grounding amongst all of the hype.

Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods

Abstract: Advances in natural language generation (NLG) have resulted in machine generated text that is increasingly difficult to distinguish from human authored text. Powerful open-source models are freely available, and user-friendly tools democratizing access to generative models are proliferating. The great potential of state-of-the-art NLG systems is tempered by the multitude of avenues for abuse. Detection of machine generated text is a key countermeasure for reducing abuse of NLG models, with significant technical challenges and numerous open problems. We provide a survey that includes both 1) an extensive analysis of threat models posed by contemporary NLG systems, and 2) the most complete review of machine generated text detection methods to date. This survey places machine generated text within its cybersecurity and social context, and provides strong guidance for future work addressing the most critical threat models, and ensuring detection systems themselves demonstrate trustworthiness through fairness, robustness, and accountability.

Posted on January 13, 2023 at 7:13 AMView Comments

Breaking RSA with a Quantum Computer

A group of Chinese researchers have just published a paper claiming that they can—although they have not yet done so—break 2048-bit RSA. This is something to take seriously. It might not be correct, but it’s not obviously wrong.

We have long known from Shor’s algorithm that factoring with a quantum computer is easy. But it takes a big quantum computer, on the orders of millions of qbits, to factor anything resembling the key sizes we use today. What the researchers have done is combine classical lattice reduction factoring techniques with a quantum approximate optimization algorithm. This means that they only need a quantum computer with 372 qbits, which is well within what’s possible today. (The IBM Osprey is a 433-qbit quantum computer, for example. Others are on their way as well.)

The Chinese group didn’t have that large a quantum computer to work with. They were able to factor 48-bit numbers using a 10-qbit quantum computer. And while there are always potential problems when scaling something like this up by a factor of 50, there are no obvious barriers.

Honestly, most of the paper is over my head—both the lattice-reduction math and the quantum physics. And there’s the nagging question of why the Chinese government didn’t classify this research. But…wow…maybe…and yikes! Or not.

Factoring integers with sublinear resources on a superconducting quantum processor

Abstract: Shor’s algorithm has seriously challenged information security based on public key cryptosystems. However, to break the widely used RSA-2048 scheme, one needs millions of physical qubits, which is far beyond current technical capabilities. Here, we report a universal quantum algorithm for integer factorization by combining the classical lattice reduction with a quantum approximate optimization algorithm (QAOA). The number of qubits required is O(logN/loglogN ), which is sublinear in the bit length of the integer N , making it the most qubit-saving factorization algorithm to date. We demonstrate the algorithm experimentally by factoring integers up to 48 bits with 10 superconducting qubits, the largest integer factored on a quantum device. We estimate that a quantum circuit with 372 physical qubits and a depth of thousands is necessary to challenge RSA-2048 using our algorithm. Our study shows great promise in expediting the application of current noisy quantum computers, and paves the way to factor large integers of realistic cryptographic significance.

In email, Roger Grimes told me: “Apparently what happened is another guy who had previously announced he was able to break traditional asymmetric encryption using classical computers…but reviewers found a flaw in his algorithm and that guy had to retract his paper. But this Chinese team realized that the step that killed the whole thing could be solved by small quantum computers. So they tested and it worked.”

EDITED TO ADD: One of the issues with the algorithm is that it relies on a recent factoring paper by Claus Schnorr. It’s a controversial paper; and despite the “this destroys the RSA cryptosystem” claim in the abstract, it does nothing of the sort. Schnorr’s algorithm works well with smaller moduli—around the same order as ones the Chinese group has tested—but falls apart at larger sizes. At this point, nobody understands why. The Chinese paper claims that their quantum techniques get around this limitation (I think that’s what’s behind Grimes’s comment) but don’t give any details—and they haven’t tested it with larger moduli. So if it’s true that the Chinese paper depends on this Schnorr technique that doesn’t scale, the techniques in this Chinese paper won’t scale, either. (On the other hand, if it does scale then I think it also breaks a bunch of lattice-based public-key cryptosystems.)

I am much less worried that this technique will work now. But this is something the IBM quantum computing people can test right now.

EDITED TO ADD (1/4): A reporter just asked me my gut feel about this. I replied that I don’t think this will break RSA. Several times a year the cryptography community received “breakthroughs” from people outside the community. That’s why we created the RSA Factoring Challenge: to force people to provide proofs of their claims. In general, the smart bet is on the new techniques not working. But someday, that bet will be wrong. Is it today? Probably not. But it could be. We’re in the worst possible position right now: we don’t have the facts to know. Someone needs to implement the quantum algorithm and see.

EDITED TO ADD (1/5): Scott Aaronson’s take is a “no”:

In the new paper, the authors spend page after page saying-without-saying that it might soon become possible to break RSA-2048, using a NISQ (i.e., non-fault-tolerant) quantum computer. They do so via two time-tested strategems:

  1. the detailed exploration of irrelevancies (mostly, optimization of the number of qubits, while ignoring the number of gates), and
  2. complete silence about the one crucial point.

Then, finally, they come clean about the one crucial point in a single sentence of the Conclusion section:

It should be pointed out that the quantum speedup of the algorithm is unclear due to the ambiguous convergence of QAOA.

“Unclear” is an understatement here. It seems to me that a miracle would be required for the approach here to yield any benefit at all, compared to just running the classical Schnorr’s algorithm on your laptop. And if the latter were able to break RSA, it would’ve already done so.

All told, this is one of the most actively misleading quantum computing papers I’ve seen in 25 years, and I’ve seen … many.

EDITED TO ADD (1/7): More commentary. Again: no need to panic.

EDITED TO ADD (1/12): Peter Shor has suspicions.

Posted on January 3, 2023 at 12:38 PMView Comments

Recovering Smartphone Voice from the Accelerometer

Yet another smartphone side-channel attack: “EarSpy: Spying Caller Speech and Identity through Tiny Vibrations of Smartphone Ear Speakers“:

Abstract: Eavesdropping from the user’s smartphone is a well-known threat to the user’s safety and privacy. Existing studies show that loudspeaker reverberation can inject speech into motion sensor readings, leading to speech eavesdropping. While more devastating attacks on ear speakers, which produce much smaller scale vibrations, were believed impossible to eavesdrop with zero-permission motion sensors. In this work, we revisit this important line of reach. We explore recent trends in smartphone manufacturers that include extra/powerful speakers in place of small ear speakers, and demonstrate the feasibility of using motion sensors to capture such tiny speech vibrations. We investigate the impacts of these new ear speakers on built-in motion sensors and examine the potential to elicit private speech information from the minute vibrations. Our designed system EarSpy can successfully detect word regions, time, and frequency domain features and generate a spectrogram for each word region. We train and test the extracted data using classical machine learning algorithms and convolutional neural networks. We found up to 98.66% accuracy in gender detection, 92.6% detection in speaker detection, and 56.42% detection in digit detection (which is 5X more significant than the random selection (10%)). Our result unveils the potential threat of eavesdropping on phone conversations from ear speakers using motion sensors.

It’s not great, but it’s an impressive start.

Posted on December 30, 2022 at 7:18 AMView Comments

The Decoupling Principle

This is a really interesting paper that discusses what the authors call the Decoupling Principle:

The idea is simple, yet previously not clearly articulated: to ensure privacy, information should be divided architecturally and institutionally such that each entity has only the information they need to perform their relevant function. Architectural decoupling entails splitting functionality for different fundamental actions in a system, such as decoupling authentication (proving who is allowed to use the network) from connectivity (establishing session state for communicating). Institutional decoupling entails splitting what information remains between non-colluding entities, such as distinct companies or network operators, or between a user and network peers. This decoupling makes service providers individually breach-proof, as they each have little or no sensitive data that can be lost to hackers. Put simply, the Decoupling Principle suggests always separating who you are from what you do.

Lots of interesting details in the paper.

Posted on December 7, 2022 at 7:04 AMView Comments

Using Wi-FI to See through Walls

This technique measures device response time to determine distance:

The scientists tested the exploit by modifying an off-the-shelf drone to create a flying scanning device, the Wi-Peep. The robotic aircraft sends several messages to each device as it flies around, establishing the positions of devices in each room. A thief using the drone could find vulnerable areas in a home or office by checking for the absence of security cameras and other signs that a room is monitored or occupied. It could also be used to follow a security guard, or even to help rival hotels spy on each other by gauging the number of rooms in use.

There have been attempts to exploit similar WiFi problems before, but the team says these typically require bulky and costly devices that would give away attempts. Wi-Peep only requires a small drone and about $15 US in equipment that includes two WiFi modules and a voltage regulator. An intruder could quickly scan a building without revealing their presence.

Research paper.

Posted on November 8, 2022 at 6:15 AMView Comments

Adversarial ML Attack that Secretly Gives a Language Model a Point of View

Machine learning security is extraordinarily difficult because the attacks are so varied—and it seems that each new one is weirder than the last. Here’s the latest: a training-time attack that forces the model to exhibit a point of view: Spinning Language Models: Risks of Propaganda-As-A-Service and Countermeasures.”

Abstract: We investigate a new threat to neural sequence-to-sequence (seq2seq) models: training-time attacks that cause models to “spin” their outputs so as to support an adversary-chosen sentiment or point of view—but only when the input contains adversary-chosen trigger words. For example, a spinned summarization model outputs positive summaries of any text that mentions the name of some individual or organization.

Model spinning introduces a “meta-backdoor” into a model. Whereas conventional backdoors cause models to produce incorrect outputs on inputs with the trigger, outputs of spinned models preserve context and maintain standard accuracy metrics, yet also satisfy a meta-task chosen by the adversary.

Model spinning enables propaganda-as-a-service, where propaganda is defined as biased speech. An adversary can create customized language models that produce desired spins for chosen triggers, then deploy these models to generate disinformation (a platform attack), or else inject them into ML training pipelines (a supply-chain attack), transferring malicious functionality to downstream models trained by victims.

To demonstrate the feasibility of model spinning, we develop a new backdooring technique. It stacks an adversarial meta-task onto a seq2seq model, backpropagates the desired meta-task output to points in the word-embedding space we call “pseudo-words,” and uses pseudo-words to shift the entire output distribution of the seq2seq model. We evaluate this attack on language generation, summarization, and translation models with different triggers and meta-tasks such as sentiment, toxicity, and entailment. Spinned models largely maintain their accuracy metrics (ROUGE and BLEU) while shifting their outputs to satisfy the adversary’s meta-task. We also show that, in the case of a supply-chain attack, the spin functionality transfers to downstream models.

This new attack dovetails with something I’ve been worried about for a while, something Latanya Sweeney has dubbed “persona bots.” This is what I wrote in my upcoming book (to be published in February):

One example of an extension of this technology is the “persona bot,” an AI posing as an individual on social media and other online groups. Persona bots have histories, personalities, and communication styles. They don’t constantly spew propaganda. They hang out in various interest groups: gardening, knitting, model railroading, whatever. They act as normal members of those communities, posting and commenting and discussing. Systems like GPT-3 will make it easy for those AIs to mine previous conversations and related Internet content and to appear knowledgeable. Then, once in a while, the AI might post something relevant to a political issue, maybe an article about a healthcare worker having an allergic reaction to the COVID-19 vaccine, with worried commentary. Or maybe it might offer its developer’s opinions about a recent election, or racial justice, or any other polarizing subject. One persona bot can’t move public opinion, but what if there were thousands of them? Millions?

These are chatbots on a very small scale. They would participate in small forums around the Internet: hobbyist groups, book groups, whatever. In general they would behave normally, participating in discussions like a person does. But occasionally they would say something partisan or political, depending on the desires of their owners. Because they’re all unique and only occasional, it would be hard for existing bot detection techniques to find them. And because they can be replicated by the millions across social media, they could have a greater effect. They would affect what we think, and—just as importantly—what we think others think. What we will see as robust political discussions would be persona bots arguing with other persona bots.

Attacks like these add another wrinkle to that sort of scenario.

Posted on October 21, 2022 at 6:53 AMView Comments

Inserting a Backdoor into a Machine-Learning System

Interesting research: “ImpNet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks, by Tim Clifford, Ilia Shumailov, Yiren Zhao, Ross Anderson, and Robert Mullins:

Abstract: Early backdoor attacks against machine learning set off an arms race in attack and defence development. Defences have since appeared demonstrating some ability to detect backdoors in models or even remove them. These defences work by inspecting the training data, the model, or the integrity of the training procedure. In this work, we show that backdoors can be added during compilation, circumventing any safeguards in the data preparation and model training stages. As an illustration, the attacker can insert weight-based backdoors during the hardware compilation step that will not be detected by any training or data-preparation process. Next, we demonstrate that some backdoors, such as ImpNet, can only be reliably detected at the stage where they are inserted and removing them anywhere else presents a significant challenge. We conclude that machine-learning model security requires assurance of provenance along the entire technical pipeline, including the data, model architecture, compiler, and hardware specification.

Ross Anderson explains the significance:

The trick is for the compiler to recognise what sort of model it’s compiling—whether it’s processing images or text, for example—and then devising trigger mechanisms for such models that are sufficiently covert and general. The takeaway message is that for a machine-learning model to be trustworthy, you need to assure the provenance of the whole chain: the model itself, the software tools used to compile it, the training data, the order in which the data are batched and presented—in short, everything.

Posted on October 11, 2022 at 7:18 AMView Comments

Detecting Deepfake Audio by Modeling the Human Acoustic Tract

This is interesting research:

In this paper, we develop a new mechanism for detecting audio deepfakes using techniques from the field of articulatory phonetics. Specifically, we apply fluid dynamics to estimate the arrangement of the human vocal tract during speech generation and show that deepfakes often model impossible or highly-unlikely anatomical arrangements. When parameterized to achieve 99.9% precision, our detection mechanism achieves a recall of 99.5%, correctly identifying all but one deepfake sample in our dataset.

From an article by two of the researchers:

The first step in differentiating speech produced by humans from speech generated by deepfakes is understanding how to acoustically model the vocal tract. Luckily scientists have techniques to estimate what someone—or some being such as a dinosaur—would sound like based on anatomical measurements of its vocal tract.

We did the reverse. By inverting many of these same techniques, we were able to extract an approximation of a speaker’s vocal tract during a segment of speech. This allowed us to effectively peer into the anatomy of the speaker who created the audio sample.

From here, we hypothesized that deepfake audio samples would fail to be constrained by the same anatomical limitations humans have. In other words, the analysis of deepfaked audio samples simulated vocal tract shapes that do not exist in people.

Our testing results not only confirmed our hypothesis but revealed something interesting. When extracting vocal tract estimations from deepfake audio, we found that the estimations were often comically incorrect. For instance, it was common for deepfake audio to result in vocal tracts with the same relative diameter and consistency as a drinking straw, in contrast to human vocal tracts, which are much wider and more variable in shape.

This is, of course, not the last word. Deepfake generators will figure out how to use these techniques to create harder-to-detect fake voices. And the deepfake detectors will figure out another, better, detection technique. And the arms race will continue.

Slashdot thread.

Posted on October 3, 2022 at 6:25 AMView Comments

Sidebar photo of Bruce Schneier by Joe MacInnis.