Entries Tagged "incentives"

Page 1 of 14

The CrowdStrike Outage and Market-Driven Brittleness

Friday’s massive internet outage, caused by a mid-sized tech company called CrowdStrike, disrupted major airlines, hospitals, and banks. Nearly 7,000 flights were canceled. It took down 911 systems and factories, courthouses, and television stations. Tallying the total cost will take time. The outage affected more than 8.5 million Windows computers, and the cost will surely be in the billions of dollars­—easily matching the most costly previous cyberattacks, such as NotPetya.

The catastrophe is yet another reminder of how brittle global internet infrastructure is. It’s complex, deeply interconnected, and filled with single points of failure. As we experienced last week, a single problem in a small piece of software can take large swaths of the internet and global economy offline.

The brittleness of modern society isn’t confined to tech. We can see it in many parts of our infrastructure, from food to electricity, from finance to transportation. This is often a result of globalization and consolidation, but not always. In information technology, brittleness also results from the fact that hundreds of companies, none of which you’ve heard of, each perform a small but essential role in keeping the internet running. CrowdStrike is one of those companies.

This brittleness is a result of market incentives. In enterprise computing—as opposed to personal computing—a company that provides computing infrastructure to enterprise networks is incentivized to be as integral as possible, to have as deep access into their customers’ networks as possible, and to run as leanly as possible.

Redundancies are unprofitable. Being slow and careful is unprofitable. Being less embedded in and less essential and having less access to the customers’ networks and machines is unprofitable—at least in the short term, by which these companies are measured. This is true for companies like CrowdStrike. It’s also true for CrowdStrike’s customers, who also didn’t have resilience, redundancy, or backup systems in place for failures such as this because they are also an expense that affects short-term profitability.

But brittleness is profitable only when everything is working. When a brittle system fails, it fails badly. The cost of failure to a company like CrowdStrike is a fraction of the cost to the global economy. And there will be a next CrowdStrike, and one after that. The market rewards short-term profit-maximizing systems, and doesn’t sufficiently penalize such companies for the impact their mistakes can have. (Stock prices depress only temporarily. Regulatory penalties are minor. Class-action lawsuits settle. Insurance blunts financial losses.) It’s not even clear that the information technology industry could exist in its current form if it had to take into account all the risks such brittleness causes.

The asymmetry of costs is largely due to our complex interdependency on so many systems and technologies, any one of which can cause major failures. Each piece of software depends on dozens of others, typically written by other engineering teams sometimes years earlier on the other side of the planet. Some software systems have not been properly designed to contain the damage caused by a bug or a hack of some key software dependency.

These failures can take many forms. The CrowdStrike failure was the result of a buggy software update. The bug didn’t get caught in testing and was rolled out to CrowdStrike’s customers worldwide. Sometimes, failures are deliberate results of a cyberattack. Other failures are just random, the result of some unforeseen dependency between different pieces of critical software systems.

Imagine a house where the drywall, flooring, fireplace, and light fixtures are all made by companies that need continuous access and whose failures would cause the house to collapse. You’d never set foot in such a structure, yet that’s how software systems are built. It’s not that 100 percent of the system relies on each company all the time, but 100 percent of the system can fail if any one of them fails. But doing better is expensive and doesn’t immediately contribute to a company’s bottom line.

Economist Ronald Coase famously described the nature of the firm­—any business­—as a collection of contracts. Each contract has a cost. Performing the same function in-house also has a cost. When the costs of maintaining the contract are lower than the cost of doing the thing in-house, then it makes sense to outsource: to another firm down the street or, in an era of cheap communication and coordination, to another firm on the other side of the planet. The problem is that both the financial and risk costs of outsourcing can be hidden—delayed in time and masked by complexity—and can lead to a false sense of security when companies are actually entangled by these invisible dependencies. The ability to outsource software services became easy a little over a decade ago, due to ubiquitous global network connectivity, cloud and software-as-a-service business models, and an increase in industry- and government-led certifications and box-checking exercises.

This market force has led to the current global interdependence of systems, far and wide beyond their industry and original scope. It’s why flying planes depends on software that has nothing to do with the avionics. It’s why, in our connected internet-of-things world, we can imagine a similar bad software update resulting in our cars not starting one morning or our refrigerators failing.

This is not something we can dismantle overnight. We have built a society based on complex technology that we’re utterly dependent on, with no reliable way to manage that technology. Compare the internet with ecological systems. Both are complex, but ecological systems have deep complexity rather than just surface complexity. In ecological systems, there are fewer single points of failure: If any one thing fails in a healthy natural ecosystem, there are other things that will take over. That gives them a resilience that our tech systems lack.

We need deep complexity in our technological systems, and that will require changes in the market. Right now, the market incentives in tech are to focus on how things succeed: A company like CrowdStrike provides a key service that checks off required functionality on a compliance checklist, which makes it all about the features that they will deliver when everything is working. That’s exactly backward. We want our technological infrastructure to mimic nature in the way things fail. That will give us deep complexity rather than just surface complexity, and resilience rather than brittleness.

How do we accomplish this? There are examples in the technology world, but they are piecemeal. Netflix is famous for its Chaos Monkey tool, which intentionally causes failures to force the systems (and, really, the engineers) to be more resilient. The incentives don’t line up in the short term: It makes it harder for Netflix engineers to do their jobs and more expensive for them to run their systems. Over years, this kind of testing generates more stable systems. But it requires corporate leadership with foresight and a willingness to spend in the short term for possible long-term benefits.

Last week’s update wouldn’t have been a major failure if CrowdStrike had rolled out this change incrementally: first 1 percent of their users, then 10 percent, then everyone. But that’s much more expensive, because it requires a commitment of engineer time for monitoring, debugging, and iterating. And can take months to do correctly for complex and mission-critical software. An executive today will look at the market incentives and correctly conclude that it’s better for them to take the chance than to “waste” the time and money.

The usual tools of regulation and certification may be inadequate, because failure of complex systems is inherently also complex. We can’t describe the unknown unknowns involved in advance. Rather, what we need to codify are the processes by which failure testing must take place.

We know, for example, how to test whether cars fail well. The National Highway Traffic Safety Administration crashes cars to learn what happens to the people inside. But cars are relatively simple, and keeping people safe is straightforward. Software is different. It is diverse, is constantly changing, and has to continually adapt to novel circumstances. We can’t expect that a regulation that mandates a specific list of software crash tests would suffice. Again, security and resilience are achieved through the process by which we fail and fix, not through any specific checklist. Regulation has to codify that process.

Today’s internet systems are too complex to hope that if we are smart and build each piece correctly the sum total will work right. We have to deliberately break things and keep breaking them. This repeated process of breaking and fixing will make these systems reliable. And then a willingness to embrace inefficiencies will make these systems resilient. But the economic incentives point companies in the other direction, to build their systems as brittle as they can possibly get away with.

This essay was written with Barath Raghavan, and previously appeared on Lawfare.com.

Posted on July 25, 2024 at 2:37 PMView Comments

Microsoft and Security Incentives

Former senior White House cyber policy director A. J. Grotto talks about the economic incentives for companies to improve their security—in particular, Microsoft:

Grotto told us Microsoft had to be “dragged kicking and screaming” to provide logging capabilities to the government by default, and given the fact the mega-corp banked around $20 billion in revenue from security services last year, the concession was minimal at best.

[…]

“The government needs to focus on encouraging and catalyzing competition,” Grotto said. He believes it also needs to publicly scrutinize Microsoft and make sure everyone knows when it messes up.

“At the end of the day, Microsoft, any company, is going to respond most directly to market incentives,” Grotto told us. “Unless this scrutiny generates changed behavior among its customers who might want to look elsewhere, then the incentives for Microsoft to change are not going to be as strong as they should be.”

Breaking up the tech monopolies is one of the best things we can do for cybersecurity.

Posted on April 23, 2024 at 7:09 AMView Comments

Australia Increases Fines for Massive Data Breaches

After suffering two large, and embarrassing, data breaches in recent weeks, the Australian government increased the fine for serious data breaches from $2.2 million to a minimum of $50 million. (That’s $50 million AUD, or $32 million USD.)

This is a welcome change. The problem is one of incentives, and Australia has now increased the incentive for companies to secure the personal data or their users and customers.

EDITED TO ADD (10/15): I got the details wrong. One, this is a proposed increase. Two, the amount of $50 million AUD is only applicable in very few cases.

Posted on October 26, 2022 at 6:13 AMView Comments

On Vulnerability-Adjacent Vulnerabilities

At the virtual Enigma Conference, Google’s Project Zero’s Maggie Stone gave a talk about zero-day exploits in the wild. In it, she talked about how often vendors fix vulnerabilities only to have the attackers tweak their exploits to work again. From a MIT Technology Review article:

Soon after they were spotted, the researchers saw one exploit being used in the wild. Microsoft issued a patch and fixed the flaw, sort of. In September 2019, another similar vulnerability was found being exploited by the same hacking group.

More discoveries in November 2019, January 2020, and April 2020 added up to at least five zero-day vulnerabilities being exploited from the same bug class in short order. Microsoft issued multiple security updates: some failed to actually fix the vulnerability being targeted, while others required only slight changes that required just a line or two to change in the hacker’s code to make the exploit work again.

[…]

“What we saw cuts across the industry: Incomplete patches are making it easier for attackers to exploit users with zero-days,” Stone said on Tuesday at the security conference Enigma. “We’re not requiring attackers to come up with all new bug classes, develop brand new exploitation, look at code that has never been researched before. We’re allowing the reuse of lots of different vulnerabilities that we previously knew about.”

[…]

Why aren’t they being fixed? Most of the security teams working at software companies have limited time and resources, she suggests—and if their priorities and incentives are flawed, they only check that they’ve fixed the very specific vulnerability in front of them instead of addressing the bigger problems at the root of many vulnerabilities.

Another article on the talk.

This is an important insight. It’s not enough to patch existing vulnerabilities. We need to make it harder for attackers to find new vulnerabilities to exploit. Closing entire families of vulnerabilities, rather than individual vulnerabilities one at a time, is a good way to do that.

Posted on February 15, 2021 at 6:14 AMView Comments

Hacking Apple for Profit

Five researchers hacked Apple Computer’s networks—not their products—and found fifty-five vulnerabilities. So far, they have received $289K.

One of the worst of all the bugs they found would have allowed criminals to create a worm that would automatically steal all the photos, videos, and documents from someone’s iCloud account and then do the same to the victim’s contacts.

Lots of details in this blog post by one of the hackers.

Posted on October 12, 2020 at 5:58 AMView Comments

Programmers Who Don't Understand Security Are Poor at Security

A university study confirmed the obvious: if you pay a random bunch of freelance programmers a small amount of money to write security software, they’re not going to do a very good job at it.

In an experiment that involved 43 programmers hired via the Freelancer.com platform, University of Bonn academics have discovered that developers tend to take the easy way out and write code that stores user passwords in an unsafe manner.

For their study, the German academics asked a group of 260 Java programmers to write a user registration system for a fake social network.

Of the 260 developers, only 43 took up the job, which involved using technologies such as Java, JSF, Hibernate, and PostgreSQL to create the user registration component.

Of the 43, academics paid half of the group with €100, and the other half with €200, to determine if higher pay made a difference in the implementation of password security features.

Further, they divided the developer group a second time, prompting half of the developers to store passwords in a secure manner, and leaving the other half to store passwords in their preferred method—hence forming four quarters of developers paid €100 and prompted to use a secure password storage method (P100), developers paid €200 and prompted to use a secure password storage method (P200), devs paid €100 but not prompted for password security (N100), and those paid €200 but not prompted for password security (N200).

I don’t know why anyone would expect this group of people to implement a good secure password system. Look at how they were hired. Look at the scope of the project. Look at what they were paid. I’m sure they grabbed the first thing they found on GitHub that did the job.

I’m not very impressed with the study or its conclusions.

Posted on March 27, 2019 at 6:37 AMView Comments

NSA Collects MS Windows Error Information

Back in 2013, Der Spiegel reported that the NSA intercepts and collects Windows bug reports:

One example of the sheer creativity with which the TAO spies approach their work can be seen in a hacking method they use that exploits the error-proneness of Microsoft’s Windows. Every user of the operating system is familiar with the annoying window that occasionally pops up on screen when an internal problem is detected, an automatic message that prompts the user to report the bug to the manufacturer and to restart the program. These crash reports offer TAO specialists a welcome opportunity to spy on computers.

When TAO selects a computer somewhere in the world as a target and enters its unique identifiers (an IP address, for example) into the corresponding database, intelligence agents are then automatically notified any time the operating system of that computer crashes and its user receives the prompt to report the problem to Microsoft. An internal presentation suggests it is NSA’s powerful XKeyscore spying tool that is used to fish these crash reports out of the massive sea of Internet traffic.

The automated crash reports are a “neat way” to gain “passive access” to a machine, the presentation continues. Passive access means that, initially, only data the computer sends out into the Internet is captured and saved, but the computer itself is not yet manipulated. Still, even this passive access to error messages provides valuable insights into problems with a targeted person’s computer and, thus, information on security holes that might be exploitable for planting malware or spyware on the unwitting victim’s computer.

Although the method appears to have little importance in practical terms, the NSA’s agents still seem to enjoy it because it allows them to have a bit of a laugh at the expense of the Seattle-based software giant. In one internal graphic, they replaced the text of Microsoft’s original error message with one of their own reading, “This information may be intercepted by a foreign sigint system to gather detailed information and better exploit your machine.” (“Sigint” stands for “signals intelligence.”)

The article talks about the (limited) value of this information with regard to specific target computers, but I have another question: how valuable would this database be for finding new zero-day Windows vulnerabilities to exploit? Microsoft won’t have the incentive to examine and fix problems until they happen broadly among its user base. The NSA has a completely different incentive structure.

I don’t remember this being discussed back in 2013.

EDITED TO ADD (8/6): Slashdot thread.

EDITED TO ADD (8/14): Adam S, a former Microsoft employee, writes in a comment that this information is very helpful in finding zero-days, and cites this as an example. He also says that this information is now TLS encrypted, and has been since Windows 8 or 10.

Posted on August 1, 2017 at 6:00 AMView Comments

Surveillance Intermediaries

Interesting law-journal article: “Surveillance Intermediaries,” by Alan Z. Rozenshtein.

Abstract:Apple’s 2016 fight against a court order commanding it to help the FBI unlock the iPhone of one of the San Bernardino terrorists exemplifies how central the question of regulating government surveillance has become in American politics and law. But scholarly attempts to answer this question have suffered from a serious omission: scholars have ignored how government surveillance is checked by “surveillance intermediaries,” the companies like Apple, Google, and Facebook that dominate digital communications and data storage, and on whose cooperation government surveillance relies. This Article fills this gap in the scholarly literature, providing the first comprehensive analysis of how surveillance intermediaries constrain the surveillance executive. In so doing, it enhances our conceptual understanding of, and thus our ability to improve, the institutional design of government surveillance.

Surveillance intermediaries have the financial and ideological incentives to resist government requests for user data. Their techniques of resistance are: proceduralism and litigiousness that reject voluntary cooperation in favor of minimal compliance and aggressive litigation; technological unilateralism that designs products and services to make surveillance harder; and policy mobilization that rallies legislative and public opinion to limit surveillance. Surveillance intermediaries also enhance the “surveillance separation of powers”; they make the surveillance executive more subject to inter-branch constraints from Congress and the courts, and to intra-branch constraints from foreign-relations and economics agencies as well as the surveillance executive’s own surveillance-limiting components.

The normative implications of this descriptive account are important and cross-cutting. Surveillance intermediaries can both improve and worsen the “surveillance frontier”: the set of tradeoffs ­ between public safety, privacy, and economic growth ­ from which we choose surveillance policy. And while intermediaries enhance surveillance self-government when they mobilize public opinion and strengthen the surveillance separation of powers, they undermine it when their unilateral technological changes prevent the government from exercising its lawful surveillance authorities.

Posted on June 7, 2017 at 6:19 AMView Comments

1 2 3 14

Sidebar photo of Bruce Schneier by Joe MacInnis.