In 2017, some Android phones came with a backdoor pre-installed:
Criminals in 2017 managed to get an advanced backdoor preinstalled on Android devices before they left the factories of manufacturers, Google researchers confirmed on Thursday.
Triada first came to light in 2016 in articles published by Kaspersky here and here, the first of which said the malware was "one of the most advanced mobile Trojans" the security firm's analysts had ever encountered. Once installed, Triada's chief purpose was to install apps that could be used to send spam and display ads. It employed an impressive kit of tools, including rooting exploits that bypassed security protections built into Android and the means to modify the Android OS' all-powerful Zygote process. That meant the malware could directly tamper with every installed app. Triada also connected to no fewer than 17 command and control servers.
In July 2017, security firm Dr. Web reported that its researchers had found Triada built into the firmware of several Android devices, including the Leagoo M5 Plus, Leagoo M8, Nomu S10, and Nomu S20. The attackers used the backdoor to surreptitiously download and install modules. Because the backdoor was embedded into one of the OS libraries and located in the system section, it couldn't be deleted using standard methods, the report said.
On Thursday, Google confirmed the Dr. Web report, although it stopped short of naming the manufacturers. Thursday's report also said the supply chain attack was pulled off by one or more partners the manufacturers used in preparing the final firmware image used in the affected devices.
This is a supply chain attack. It seems to be the work of criminals, but it could just as easily have been a nation-state.
When the next pandemic strikes, we'll be fighting it on two fronts. The first is the one you immediately think about: understanding the disease, researching a cure and inoculating the population. The second is new, and one you might not have thought much about: fighting the deluge of rumors, misinformation and flat-out lies that will appear on the internet.
The second battle will be like the Russian disinformation campaigns during the 2016 presidential election, only with the addition of a deadly health crisis and possibly without a malicious government actor. But while the two problems -- misinformation affecting democracy and misinformation affecting public health -- will have similar solutions, the latter is much less political. If we work to solve the pandemic disinformation problem, any solutions are likely to also be applicable to the democracy one.
Pandemics are part of our future. They might be like the 1968 Hong Kong flu, which killed a million people, or the 1918 Spanish flu, which killed over 40 million. Yes, modern medicine makes pandemics less likely and less deadly. But global travel and trade, increased population density, decreased wildlife habitats, and increased animal farming to satisfy a growing and more affluent population have made them more likely. Experts agree that it's not a matter of if -- it's only a matter of when.
When the next pandemic strikes, accurate information will be just as important as effective treatments. We saw this in 2014, when the Nigerian government managed to contain a subcontinentwide Ebola epidemic to just 20 infections and eight fatalities. Part of that success was because of the ways officials communicated health information to all Nigerians, using government-sponsored videos, social media campaigns and international experts. Without that, the death toll in Lagos, a city of 21 million people, would have probably been greater than the 11,000 the rest of the continent experienced.
There's every reason to expect misinformation to be rampant during a pandemic. In the early hours and days, information will be scant and rumors will abound. Most of us are not health professionals or scientists. We won't be able to tell fact from fiction. Even worse, we'll be scared. Our brains work differently when we are scared, and they latch on to whatever makes us feel safer -- even if it's not true.
Rumors and misinformation could easily overwhelm legitimate news channels, as people share tweets, images and videos. Much of it will be well-intentioned but wrong -- like the misinformation spread by the anti-vaccination community today -- but some of it may be malicious. In the 1980s, the KGB ran a sophisticated disinformation campaign -- Operation Infektion -- to spread the rumor that HIV/AIDS was a result of an American biological weapon gone awry. It's reasonable to assume some group or country would deliberately spread lies in an attempt to increase death and chaos.
It's not just misinformation about which treatments work (and are safe), and which treatments don't work (and are unsafe). Misinformation can affect society's ability to deal with a pandemic at many different levels. Right now, Ebola relief efforts in the Democratic Republic of Congo are being stymied by mistrust of health workers and government officials.
It doesn't take much to imagine how this can lead to disaster. Jay Walker, curator of the TEDMED conferences, laid out some of the possibilities in a 2016 essay: people overwhelming and even looting pharmacies trying to get some drug that is irrelevant or nonexistent, people needlessly fleeing cities and leaving them paralyzed, health workers not showing up for work, truck drivers and other essential people being afraid to enter infected areas, official sites like CDC.gov being hacked and discredited. This kind of thing can magnify the health effects of a pandemic many times over, and in extreme cases could lead to a total societal collapse.
This is going to be something that government health organizations, medical professionals, social media companies and the traditional media are going to have to work out together. There isn't any single solution; it will require many different interventions that will all need to work together. The interventions will look a lot like what we're already talking about with regard to government-run and other information influence campaigns that target our democratic processes: methods of visibly identifying false stories, the identification and deletion of fake posts and accounts, ways to promote official and accurate news, and so on. At the scale these are needed, they will have to be done automatically and in real time.
Since the 2016 presidential election, we have been talking about propaganda campaigns, and about how social media amplifies fake news and allows damaging messages to spread easily. It's a hard discussion to have in today's hyperpolarized political climate. After any election, the winning side has every incentive to downplay the role of fake news.
But pandemics are different; there's no political constituency in favor of people dying because of misinformation. Google doesn't want the results of people's well-intentioned searches to lead to fatalities. Facebook and Twitter don't want people on their platforms sharing misinformation that will result in either individual or mass deaths. Focusing on pandemics gives us an apolitical way to collectively approach the general problem of misinformation and fake news. And any solutions for pandemics are likely to also be applicable to the more general -- and more political -- problems.
Pandemics are inevitable. Bioterror is already possible, and will only get easier as the requisite technologies become cheaper and more common. We're experiencing the largest measles outbreak in 25 years thanks to the anti-vaccination movement, which has hijacked social media to amplify its messages; we seem unable to beat back the disinformation and pseudoscience surrounding the vaccine. Those same forces will dramatically increase death and social upheaval in the event of a pandemic.
Let the Russian propaganda attacks on the 2016 election serve as a wake-up call for this and other threats. We need to solve the problem of misinformation during pandemics together -- governments and industries in collaboration with medical officials, all across the world -- before there's a crisis. And the solutions will also help us shore up our democracy in the process.
This essay previously appeared in the New York Times.
Matthew Green intelligently speculates about how Apple's new "Find My" feature works.
If you haven't already been inspired by the description above, let me phrase the question you ought to be asking: how is this system going to avoid being a massive privacy nightmare?
Let me count the concerns:
- If your device is constantly emitting a BLE signal that uniquely identifies it, the whole world is going to have (yet another) way to track you. Marketers already use WiFi and Bluetooth MAC addresses to do this: Find My could create yet another tracking channel.
- It also exposes the people whose phones are doing the tracking. These people are now going to be sending their current location to Apple (which they may or may not already be doing). Now they'll also be potentially sharing this information with strangers who "lose" their devices. That could go badly.
- Scammers might also run active attacks in which they fake the location of your device. While this seems unlikely, people will always surprise you.
The good news is that Apple claims that their system actually does provide strong privacy, and that it accomplishes this using clever cryptography. But as is typical, they've declined to give out the details of how they're going to do it. Andy Greenberg talked me through an incomplete technical description that Apple provided to Wired, so that provides many hints. Unfortunately, what Apple provided still leaves huge gaps. I'm going to fill in those gaps with my best guess for what Apple is actually doing.
Security researchers Gabriel Campana and Jean-Baptiste Bédrune are giving a hardware security module (HSM) talk at BlackHat in August:
This highly technical presentation targets an HSM manufactured by a vendor whose solutions are usually found in major banks and large cloud service providers. It will demonstrate several attack paths, some of them allowing unauthenticated attackers to take full control of the HSM. The presented attacks allow retrieving all HSM secrets remotely, including cryptographic keys and administrator credentials. Finally, we exploit a cryptographic bug in the firmware signature verification to upload a modified firmware to the HSM. This firmware includes a persistent backdoor that survives a firmware update.
There were plenty of technical challenges to solve along the way, in what was clearly a thorough and professional piece of vulnerability research:
- They started by using legitimate SDK access to their test HSM to upload a firmware module that would give them a shell inside the HSM. Note that this SDK access was used to discover the attacks, but is not necessary to exploit them.
- They then used the shell to run a fuzzer on the internal implementation of PKCS#11 commands to find reliable, exploitable buffer overflows.
- They checked they could exploit these buffer overflows from outside the HSM, i.e. by just calling the PKCS#11 driver from the host machine.
- They then wrote a payload that would override access control and, via another issue in the HSM, allow them to upload arbitrary (unsigned) firmware. It's important to note that this backdoor is persistent: a subsequent update will not fix it.
- They then wrote a module that would dump all the HSM secrets, and uploaded it to the HSM.
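The write-up doesn't include the researchers' tooling, but the fuzzing step is easy to picture in outline. The sketch below is only a guess at its general shape -- plain Python, with a placeholder send_command() standing in for the HSM's internal PKCS#11 command handler -- generating attribute-template-like payloads whose declared lengths disagree with the data actually supplied, the kind of malformed input that tends to shake out buffer overflows.

```python
import os
import random

def send_command(payload: bytes) -> None:
    # Placeholder: in a real harness this would deliver the payload to the
    # HSM's internal PKCS#11 command handler (reached from the shell the
    # researchers obtained) and watch for crashes or hangs.
    pass

def mutated_template() -> bytes:
    """Build a PKCS#11-style attribute entry whose declared length
    deliberately disagrees with the data actually attached to it."""
    attr_type = random.randrange(0, 0x600).to_bytes(4, "little")    # CKA_*-like value
    claimed_len = random.choice([0, 1, 0xFF, 0xFFFF, 0xFFFFFFFF])   # lying length field
    data = os.urandom(random.randrange(0, 64))                      # too short or too long
    return attr_type + claimed_len.to_bytes(4, "little") + data

if __name__ == "__main__":
    for _ in range(10_000):
        send_command(mutated_template())
        # A real fuzzer would also record each input so that any crash can be
        # reproduced and turned into a reliable exploit.
```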
Stuart Schechter writes about the security risks of using a password manager. It's a good piece, and nicely discusses the trade-offs around password managers: which one to choose, which passwords to store in it, and so on.
My own Password Safe is mentioned. My particular choices about security and risk are to store passwords only on my computer -- not on my phone -- and not to put anything in the cloud. In my way of thinking, that reduces the risks of a password manager considerably. Yes, there are losses in convenience.
Maciej Cegłowski has a really good essay explaining how to think about privacy today:
For the purposes of this essay, I'll call it "ambient privacy" -- the understanding that there is value in having our everyday interactions with one another remain outside the reach of monitoring, and that the small details of our daily lives should pass by unremembered. What we do at home, work, church, school, or in our leisure time does not belong in a permanent record. Not every conversation needs to be a deposition.
Until recently, ambient privacy was a simple fact of life. Recording something for posterity required making special arrangements, and most of our shared experience of the past was filtered through the attenuating haze of human memory. Even police states like East Germany, where one in seven citizens was an informer, were not able to keep tabs on their entire population. Today computers have given us that power. Authoritarian states like China and Saudi Arabia are using this newfound capacity as a tool of social control. Here in the United States, we're using it to show ads. But the infrastructure of total surveillance is everywhere the same, and everywhere being deployed at scale.
Ambient privacy is not a property of people, or of their data, but of the world around us. Just like you can't drop out of the oil economy by refusing to drive a car, you can't opt out of the surveillance economy by forswearing technology (and for many people, that choice is not an option). While there may be worthy reasons to take your life off the grid, the infrastructure will go up around you whether you use it or not.
Because our laws frame privacy as an individual right, we don't have a mechanism for deciding whether we want to live in a surveillance society. Congress has remained silent on the matter, with both parties content to watch Silicon Valley make up its own rules. The large tech companies point to our willing use of their services as proof that people don't really care about their privacy. But this is like arguing that inmates are happy to be in jail because they use the prison library. Confronted with the reality of a monitored world, people make the rational decision to make the best of it.
That is not consent.
Ambient privacy is particularly hard to protect where it extends into social and public spaces outside the reach of privacy law. If I'm subjected to facial recognition at the airport, or tagged on social media at a little league game, or my public library installs an always-on Alexa microphone, no one is violating my legal rights. But a portion of my life has been brought under the magnifying glass of software. Even if the data harvested from me is anonymized in strict conformity with the most fashionable data protection laws, I've lost something by the fact of being monitored.
He's not the first person to talk about privacy as a societal property, or to use pollution metaphors. But his framing is really cogent. And "ambient privacy" is new -- and a good phrasing.
According to foreign policy experts and the defense establishment, the United States is caught in an artificial intelligence arms race with China -- one with serious implications for national security. The conventional version of this story suggests that the United States is at a disadvantage because of self-imposed restraints on the collection of data and the privacy of its citizens, while China, an unrestrained surveillance state, is at an advantage. In this vision, the data that China collects will be fed into its systems, leading to more powerful AI with capabilities we can only imagine today. Since Western countries can't or won't reap such a comprehensive harvest of data from their citizens, China will win the AI arms race and dominate the next century.
This idea makes for a compelling narrative, especially for those trying to justify surveillance -- whether government- or corporate-run. But it ignores some fundamental realities about how AI works and how AI research is conducted.
Thanks to advances in machine learning, AI has flipped from theoretical to practical in recent years, and successes dominate public understanding of how it works. Machine learning systems can now diagnose pneumonia from X-rays, play the games of go and poker, and read human lips, all better than humans. They're increasingly watching surveillance video. They are at the core of self-driving car technology and are playing roles in both intelligence-gathering and military operations. These systems monitor our networks to detect intrusions and look for spam and malware in our email.
And it's true that there are differences in the way each country collects data. The United States pioneered "surveillance capitalism," to use the Harvard University professor Shoshana Zuboff's term, where data about the population is collected by hundreds of large and small companies for corporate advantage -- and mutually shared or sold for profit. The state picks up on that data, in cases such as the Centers for Disease Control and Prevention's use of Google search data to map epidemics and evidence shared by alleged criminals on Facebook, but it isn't the primary user.
China, on the other hand, is far more centralized. Internet companies collect the same sort of data, but it is shared with the government, combined with government-collected data, and used for social control. Every Chinese citizen has a national ID number that is demanded by most services and allows data to easily be tied together. In the western region of Xinjiang, ubiquitous surveillance is used to oppress the Uighur ethnic minority -- although at this point there is still a lot of human labor making it all work. Everyone expects that this is a test bed for the entire country.
Data is increasingly becoming a tool of control for the Chinese government. While many of these plans are aspirational at the moment -- there isn't, as some have claimed, a single "social credit score," but instead future plans to link up a wide variety of systems -- data collection is universally pushed as essential to the future of Chinese AI. One executive at the search firm Baidu predicted that the country's connected population will provide the raw data necessary for China to become the world's preeminent tech power. China's official goal is to become the world AI leader by 2030, aided in part by all of this massive data collection and correlation.
This all sounds impressive, but turning massive databases into AI capabilities doesn't match technological reality. Current machine learning techniques aren't all that sophisticated. All modern AI systems follow the same basic methods. Using lots of computing power, different machine learning models are tried, altered, and tried again. These systems use a large amount of data (the training set) and an evaluation function to distinguish between those models and variations that work well and those that work less well. After trying a lot of models and variations, the system picks the one that works best. This iterative improvement continues even after the system has been fielded and is in use.
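To make that loop concrete, here is a deliberately tiny sketch -- my own illustration in plain Python, not anyone's production system. The "models" are just candidate slope/intercept pairs, the training set is synthetic, and the evaluation function is mean squared error; the program tries variations, keeps whichever works better, and repeats.

```python
import random

random.seed(0)
# Toy training set: noisy samples of y = 3x + 2.
train = [(x, 3 * x + 2 + random.gauss(0, 0.5)) for x in range(20)]

def evaluate(model, data):
    """Evaluation function: mean squared error of the model on the data."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in data) / len(data)

def mutate(model):
    """Produce a slightly altered variation of a candidate model."""
    slope, intercept = model
    return (slope + random.gauss(0, 0.1), intercept + random.gauss(0, 0.1))

best = (0.0, 0.0)                          # an arbitrary starting model
for _ in range(5000):                      # try, alter, and try again
    candidate = mutate(best)
    if evaluate(candidate, train) < evaluate(best, train):
        best = candidate                   # keep whichever variation works better

print("learned model:", best)              # ends up near slope 3, intercept 2
```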
So, for example, a deep learning system trying to do facial recognition will have multiple layers (hence the notion of "deep") trying to do different parts of the facial recognition task. One layer will try to find features in the raw data of a picture that will help find a face, such as changes in color that will indicate an edge. The next layer might try to combine these lower layers into features like shapes, looking for round shapes inside of ovals that indicate eyes on a face. The different layers will try different features and will be compared by the evaluation function until the one that is able to give the best results is found, in a process that is only slightly more refined than trial and error.
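As a rough illustration of that layered structure -- my own sketch, not anyone's actual face-recognition network -- here is what a miniature version looks like in PyTorch: early convolutional layers respond to low-level features such as edges, a later layer combines them into larger shapes, and a final layer maps those features to candidate identities.

```python
import torch
from torch import nn

class TinyFaceNet(nn.Module):
    def __init__(self, num_identities: int = 10):
        super().__init__()
        self.edges = nn.Sequential(      # layer 1: edge-like features in the raw pixels
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.shapes = nn.Sequential(     # layer 2: combinations of edges into shapes
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.classify = nn.Sequential(   # final layer: map features to identities
            nn.Flatten(), nn.Linear(32 * 16 * 16, num_identities))

    def forward(self, x):
        return self.classify(self.shapes(self.edges(x)))

model = TinyFaceNet()
fake_batch = torch.randn(4, 3, 64, 64)   # four random 64x64 "photos" stand in for data
print(model(fake_batch).shape)           # torch.Size([4, 10]): one score per identity
```

During training, the evaluation function compares those scores against the known identities, and the layers' weights are adjusted accordingly.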
Large data sets are essential to making this work, but that doesn't mean that more data is automatically better or that the system with the most data is automatically the best system. Train a facial recognition algorithm on a set that contains only faces of white men, and the algorithm will have trouble with any other kind of face. Use an evaluation function that is based on historical decisions, and any past bias is learned by the algorithm. For example, mortgage loan algorithms trained on historic decisions of human loan officers have been found to implement redlining. Similarly, hiring algorithms trained on historical data manifest the same sexism that human staff have often shown. Scientists are constantly learning about how to train machine learning systems, and while throwing a large amount of data and computing power at the problem can work, more subtle techniques are often more successful. All data isn't created equal, and for effective machine learning, data has to be both relevant and diverse in the right ways.
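A toy numerical example -- synthetic data, nothing to do with real faces -- shows how a narrow training set produces a narrow model. A classifier fit to only one subpopulation can look perfect on that group and fail on another:

```python
import random

random.seed(1)
# Two subpopulations that express the same label differently:
# group 1's positives cluster near 2.0, group 2's near -2.0; negatives near 0.0.
CENTERS = {("g1", 1): 2.0, ("g1", 0): 0.0, ("g2", 1): -2.0, ("g2", 0): 0.0}

def make_set(groups, n=200):
    return [(random.gauss(CENTERS[(g, y)], 0.3), y)
            for g in groups for y in (0, 1) for _ in range(n)]

# Train a one-feature threshold classifier on group 1 only.
train = make_set(["g1"])
threshold = sum(x for x, _ in train) / len(train)    # midpoint learned from group 1
predict = lambda x: 1 if x > threshold else 0

def accuracy(data):
    return sum(predict(x) == y for x, y in data) / len(data)

print("group it was trained on:", accuracy(make_set(["g1"])))  # about 1.0
print("group it never saw:     ", accuracy(make_set(["g2"])))  # about 0.5: every group-2 positive is missed
```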
Future research advances in machine learning are focused on two areas. The first is in enhancing how these systems distinguish between variations of an algorithm. As different versions of an algorithm are run over the training data, there needs to be some way of deciding which version is "better." These evaluation functions need to balance the recognition of an improvement with not over-fitting to the particular training data. Getting functions that can automatically and accurately distinguish between two algorithms based on minor differences in the outputs is an art form that no amount of data can improve.
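An exaggerated toy comparison -- my own illustration, not from the research literature -- shows why the evaluation function can't simply reward fit to the training data: a candidate that memorizes the training set scores perfectly on it, and only scoring candidates on held-out data exposes the over-fitting.

```python
import random

random.seed(0)
# Noisy samples of y = 3x + 2, split into data the candidates see and data held out.
points = [(x, 3 * x + 2 + random.gauss(0, 1.0)) for x in range(40)]
random.shuffle(points)
train, held_out = points[:20], points[20:]

def mse(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

table = dict(train)
memorizer = lambda x: table.get(x, 0.0)   # candidate 1: memorize the training set outright
linear = lambda x: 3 * x + 2              # candidate 2: the simple underlying rule

print("training error -> memorizer:", round(mse(memorizer, train), 2),
      " linear:", round(mse(linear, train), 2))
print("held-out error -> memorizer:", round(mse(memorizer, held_out), 2),
      " linear:", round(mse(linear, held_out), 2))
# Judged on the training set alone, the memorizer looks better (zero error);
# judged on held-out data, the simple rule wins by a huge margin.
```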
The second is in the machine learning algorithms themselves. While much of machine learning depends on trying different variations of an algorithm on large amounts of data to see which is most successful, the initial formulation of the algorithm is still vitally important. The way the algorithms interact, the types of variations attempted, and the mechanisms used to test and redirect the algorithms are all areas of active research. (An overview of some of this work can be found here; even trying to limit the research to 20 papers oversimplifies the work being done in the field.) None of these problems can be solved by throwing more data at the problem.
The British AI company DeepMind's success in teaching a computer to play the Chinese board game go is illustrative. Its AlphaGo computer program became a grandmaster in two steps. First, it was fed some enormous number of human-played games. Then, the program played itself an enormous number of times, improving its own play along the way. In 2016, AlphaGo beat the grandmaster Lee Sedol four games to one.
While the training data in this case, the human-played games, was valuable, even more important was the machine learning algorithm used and the function that evaluated the relative merits of different game positions. Just one year later, DeepMind was back with a follow-on system: AlphaZero. This go-playing computer dispensed entirely with the human-played games and just learned by playing against itself over and over again. It plays like an alien. (It also became a grandmaster in chess and shogi.)
These are abstract games, so it makes sense that a more abstract training process works well. But even something as visceral as facial recognition needs more than just a huge database of identified faces in order to work successfully. It needs the ability to separate a face from the background in a two-dimensional photo or video and to recognize the same face in spite of changes in angle, lighting, or shadows. Just adding more data may help, but not nearly as much as added research into what to do with the data once we have it.
Meanwhile, foreign-policy and defense experts are talking about AI as if it were the next nuclear arms race, with the country that figures it out best or first becoming the dominant superpower for the next century. But that didn't happen with nuclear weapons, despite research only being conducted by governments and in secret. It certainly won't happen with AI, no matter how much data different nations or companies scoop up.
It is true that China is investing a lot of money into artificial intelligence research: The Chinese government believes this will allow it to leapfrog other countries (and companies in those countries) and become a major force in this new and transformative area of computing -- and it may be right. On the other hand, much of this seems to be a wasteful boondoggle. Slapping "AI" on pretty much anything is how to get funding. The Chinese Ministry of Education, for instance, promises to produce "50 world-class AI textbooks," with no explanation of what that means.
In the democratic world, the government is neither the leading researcher nor the leading consumer of AI technologies. AI research is much more decentralized and academic, and it is conducted primarily in the public eye. Research teams keep their training data and models proprietary but freely publish their machine learning algorithms. If you wanted to work on machine learning right now, you could download Microsoft's Cognitive Toolkit, Google's TensorFlow, or Facebook's PyTorch. These aren't toy systems; these are the state-of-the-art machine learning platforms.
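To give a sense of how accessible these platforms are, here is a complete -- if trivial -- training program written against PyTorch; the TensorFlow or Cognitive Toolkit equivalent is about as short. It learns y = 3x + 2 from noisy examples, using the same loop of model, evaluation function, and iterative improvement described above.

```python
import torch
from torch import nn

x = torch.linspace(-1, 1, 200).unsqueeze(1)        # 200 training inputs
y = 3 * x + 2 + 0.1 * torch.randn_like(x)          # noisy targets

model = nn.Linear(1, 1)                            # the model: one weight, one bias
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()                             # the evaluation function

for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)                    # how badly does the current model do?
    loss.backward()                                # compute how to improve it
    optimizer.step()                               # alter the model and try again

print(model.weight.item(), model.bias.item())      # converges to roughly 3 and 2
```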
AI is not analogous to the big science projects of the previous century that brought us the atom bomb and the moon landing. AI is a science that can be conducted by many different groups with a variety of different resources, making it closer to computer design than the space race or nuclear competition. It doesn't take a massive government-funded lab for AI research, nor the secrecy of the Manhattan Project. The research conducted in the open science literature will trump research done in secret because of the benefits of collaboration and the free exchange of ideas.
While the United States should certainly increase funding for AI research, it should continue to treat it as an open scientific endeavor. Surveillance is not justified by the needs of machine learning, and real progress in AI doesn't need it.
Basically, squid thrive in a high-CO2 environment: it doesn't bother them, and it makes their prey weaker.
As usual, you can also use this squid post to talk about the security stories in the news that I haven't covered.
Read my blog posting guidelines here.