Are Data Breaches Getting Larger?

This research says that data breaches are not getting larger over time.

“Hype and Heavy Tails: A Closer Look at Data Breaches,” by Benjamin Edwards, Steven Hofmeyr, and Stephanie Forrest:

Abstract: Recent widely publicized data breaches have exposed the personal information of hundreds of millions of people. Some reports point to alarming increases in both the size and frequency of data breaches, spurring institutions around the world to address what appears to be a worsening situation. But, is the problem actually growing worse? In this paper, we study a popular public dataset and develop Bayesian Generalized Linear Models to investigate trends in data breaches. Analysis of the model shows that neither size nor frequency of data breaches has increased over the past decade. We find that the increases that have attracted attention can be explained by the heavy-tailed statistical distributions underlying the dataset. Specifically, we find that data breach size is log-normally distributed and that the daily frequency of breaches is described by a negative binomial distribution. These distributions may provide clues to the generative mechanisms that are responsible for the breaches. Additionally, our model predicts the likelihood of breaches of a particular size in the future. For example, we find that in the next year there is only a 31% chance of a breach of 10 million records or more in the US. Regardless of any trend, data breaches are costly, and we combine the model with two different cost models to project that in the next three years breaches could cost up to $55 billion.

The paper was presented at WEIS 2015.
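To picture how a heavy tail alone can produce headline-sized breaches with no underlying trend, here is a minimal sketch. It is not the paper's Bayesian model: it simply assumes breach sizes are independent draws from one fixed log-normal distribution and asks how likely at least one breach above a threshold becomes as the observation window grows. The parameters MU, SIGMA, and N_PER_YEAR are hypothetical values chosen for illustration, not the authors' fitted estimates.

```python
import math


def lognormal_tail(threshold, mu, sigma):
    """P(X >= threshold) for a log-normal X with log-mean mu and log-sd sigma."""
    z = (math.log(threshold) - mu) / sigma
    # Upper tail of the standard normal, written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))


def prob_at_least_one(threshold, mu, sigma, n_breaches):
    """Chance that at least one of n independent breaches reaches the threshold."""
    p = lognormal_tail(threshold, mu, sigma)
    return 1.0 - (1.0 - p) ** n_breaches


# Hypothetical parameters (assumptions for illustration, NOT the paper's fit).
MU = math.log(1_000)   # median breach size of 1,000 records
SIGMA = 3.0            # wide spread on the log scale, i.e. a heavy tail
N_PER_YEAR = 300       # assumed number of breaches per year, held constant

for years in (1, 3, 10):
    p = prob_at_least_one(10_000_000, MU, SIGMA, N_PER_YEAR * years)
    print(f"{years:>2} year(s): P(at least one breach >= 10M records) = {p:.2f}")
```

The distribution never changes in this sketch, yet the chance of seeing a very large breach rises with the length of the window. That is the sense in which record-setting breaches can appear over time without sizes or frequencies actually trending upward.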

Posted on August 25, 2015 at 6:27 AM

23 Comments


just passin thru August 25, 2015 6:46 AM

I don’t believe it.

I am pretty paranoid about such things, and take plenty of steps to protect my identity/info. I’ve even got my wife on board as well.

Despite that, I’ve recently been notified of breaches by (1) the state of South Carolina because I pay taxes there, and (2) UCLA Health Services, and (3) another, I forget who. Before that, nada.

Maybe the authors aren’t measuring it, but I think the likelihood of any random adult’s info being branched is cumulatively going up over time.

Even the paranoid aren’t safe, and are unlikely to be until lawmakers are shamed into protecting citizens and consumers; protecting (regulating, HA!) businesses that trade in this information is what creates a market for this assault on the public.

Sam August 25, 2015 7:02 AM

@just passin thru

I think the likelihood of any random adult’s info being branched is cumulatively going up over time.

There are a number of other factors in play here:
1/ How well data breaches are communicated to you – if your data is leaked but the company doesn’t tell you about it, your data is still leaked.
2/ How much data an average random adult has on the internet – this is probably going up too, and the root cause might be “more info online” rather than “more and worse hacks”.

branched –> breached

This is oddly fitting though.

paul August 25, 2015 8:18 AM

That $55 billion number is both huge and small at the same time. On the order of $50 a year per US adult. On the other hand, compared to the amount companies are spending on securing the data, rather larger. (And if the notion that the FTC can actually impose liability holds up, the kind of risk any given company should take notice of. One of the byproducts of the FTC liability theory is that shareholders will be able to sue for losses in stock price caused by any negligence found by the agency. Which might finally put Bruce’s notion of monetary incentives for not being complete idiots into practice.)

Clive Robinson August 25, 2015 8:47 AM

I must be getting old… when I read,

Specifically, we find that data breach size is log-normally distributed and that the daily frequency of breaches is described by a negative binomial distribution.

I had to stop and mentally picture it out…

parabarbarian August 25, 2015 9:57 AM

Looks like good news and bad news.

The good news is that breaches are not increasing.

The bad news is that breaches are not decreasing.

Dr. I. Needtob Athe August 25, 2015 10:32 AM

I guarantee you one thing: The amount of data that has been breached is getting larger.

Pat August 25, 2015 11:24 AM

Of course the unusually large data breaches get into the news. If we had 1,000,000+ records leaked every week, it wouldn’t be news anymore.

Also, I think the tails are going to become the norm. As datasets increase in size, systems grow in complexity, and security changes, we will see more large-scale, targeted breaches.

It is cold comfort to know that what is protecting my data is the sheer size of the data breaches compared to the number of criminals exploiting the data. (For fun, how long before the data necessary to steal from a bank (name, birthdate, SSN, etc) for 95% of the population is stolen? How will banking systems change to deal with this or when will they be forced to change?)

Chase Johnson August 25, 2015 12:28 PM

Interesting study, but I note that there is no examination of the contents of a given breach besides the number of records in it. As they say, the largest ever breach was in 2009: the Heartland Payments breach. But that breach only exposed credit card mag-stripe data according to: Whereas the OPM and Ashley Madison breaches revealed a great deal of personal information.

Perhaps breaches are not increasing in rate, or in size (as measured by record count), but are nevertheless increasing in severity, hence the increased news coverage and increased concerns on the part of security professionals. This paper does not appear to address whether or not this is the case.

CallMeLateForSupper August 25, 2015 1:01 PM

I just love seeing the little Privacy Badger icon stay GREEN the whole time I’m in this blog! It’s a little thing but means so much. Thank-you for this, Bruce. (Snoopy dance here)

Coyne Tibbets August 25, 2015 1:05 PM

Did bank robberies become larger when banks became bigger targets?

I think it’s a given that the more massive the trove of data held by a company, the bigger the breach and the more likely a breach will be attempted.

Especially when the data is protected by IDIOTS (“IDIOTS Developing Insecurely Obfuscated Technological Solutions”, apologies to cheong).

rgaff August 25, 2015 1:12 PM

@Chase Johnson

Indeed. The issue is, if some research comes out with the same thing everyone’s thinking, it’s not newsworthy, and nobody will hear about it. It’s only newsworthy if it’s wildly different from what seems pretty obvious to everyone. Then, as a researcher, you become famous. Kind of makes you wonder if that has any play in what’s addressed and what’s not addressed sometimes, doesn’t it…


Yeah, that little badger buddy does his work… he badgers us until we turn him green 🙂


Perhaps this is the general population’s future:

Chase Johnson August 25, 2015 1:39 PM


I’m not quite that cynical. No doubt there are incentives along those lines, but there are other incentives too, like wanting to be able to make the problem clear to dense, ignorant, or otherwise-incentivized parties like politicians or large software vendors. This paper is a good thing to have as part of the field, now we just need to see some work on the severity/depth/importance/impact of the same breaches.

Regarding flaws in the study in question, I am wondering if breach size (in records) is actually more of a reflection of the size of existing databases and the number of candidate records that could even exist. If we’re looking only at the US population, there can’t be more than about 400 million “records” anyway. A 32 million record breach, like AM, is already nearly 10% of the US population. How many databases even exist with more people in them than that?

On the other hand, that the rate of breaches in records per unit time seems to be flat is probably indicative of something. The “Red Queen Hypothesis” as they bring up seems plausible. It may well be the case that breaches are increasing in severity not because our defense capabilities relative to attack capabilities are decreasing, but simply because all extant databases have a larger proportion of private data than they did, say, a decade ago.

Clearly, plenty more work to be done, but I think this is a good start.

rgaff August 25, 2015 2:16 PM

@Chase Johnson

I hadn’t meant that as a direct accusation of this study in particular, only as a warning for studies in general… Especially when one follows where the funding money ultimately came from (discounting intermediaries designed to hide that) 🙂

I see this in the field of medicine more than technology so far… Who pays for the studies? Drug companies, of course. There’s no money in herbs that grow on your windowsill and other natural things, so they’re not funded much. So what do medical schools have as a basis to teach physicians from? The studies that exist… namely, studies on drugs, not herbs. So they all learn about drugs, not herbs, and then that’s naturally mostly all they push to their patients. It doesn’t even have to be cynically designed this way, that’s just how it works. Money, fame, and power talk at all levels and in all fields. This is the natural order of things, unless we fight hard to balance it out.

Bud Weiser August 25, 2015 2:24 PM

If they are getting bigger, lawyers will likely have a heyday.

AM is now getting sued by some users in the USA, and they are hoping to get their case registered as a class action lawsuit, representing some 37 million users.

Also, Ashley Madison is offering a $500,000 reward for information leading to the arrest of a group that hacked the site.

albert August 25, 2015 2:50 PM

Ah, statistics rears its ugly head yet again. As useless as tits on a boar hog. How or who does this ‘study’ help in any way? I want to know. All it does is keep 3 academics away from possibly productive work.
It is nearly impossible to quantify the real results of data breaches, and pulling a dollar value out of somewhere is the worst way to do it. What it does is move accountability from criminal liability (of which there is none, but should be) to financial liability (insurance, writeoffs, taxpayer dollars).
I am that cynical.
What I’d like to see are studies that show, for example, the exact nature of the attack, the hardware and OS involved, the types of security measures in place, etc. In other words, real forensic analysis. This is how you gain a measure of the problem, and insights into solutions.
. .. . .. o

tyr August 25, 2015 3:44 PM

I’d like to see the accountancy paper trail on the dollar
figures everyone seems to toss out whenever anything comp
related is referenced.

The Sun devil operation was chasing hacker access to a
Telco document valued at almost 80,000 dollars. Turned
out you could buy it for 13 dollars.

The valuation turned out to be the kitchen sink model of
adding up everything related to the document’s creation.

The real problem is everyone storing everything they can
get hold of while failing to adequately defend it. That
it has to be on-line and accessible even if useless for
any reasonable purpose is the model that needs to be
tossed out. The real danger is cloud storage removes
the controls over data that are local. Once an attacker
copies it all to another location no amount of crying
will get control of it again. If Wikileaks decides to
publish the AM or OPM dump then getting those cats back
into the bag is going to be impossible. If AM hadn’t
been lying to their customers the problem would be a
lot smaller and a lot less smelly.

If you make the original collector responsible under law
for information that belongs to its owner then you’ll
see this start to turn around. As it is these collectors
routinely violate copyrights with impunity (as one small
example) and have no interest in safeguarding what you
have entrusted to them.

Tim August 25, 2015 7:09 PM

It’s worth noting that $50 per year per person is much much less than we spend on coffee.

Justin August 26, 2015 8:11 PM

Are data breaches getting larger?

In a larger sense, of course they must be, and those who say otherwise are in denial. Over the years it has become easier and cheaper to store, process, and transmit larger and larger amounts of data. Such data can thus contain correspondingly more and more pertinent information on correspondingly more and more people.

At the same time due diligence over the custody of such data has been lax and lagging behind. What do people really expect when they are so careless with others’ data?

the real cynical August 26, 2015 9:49 PM

In this day and age we willingly give and forgive our data privacy to aggregators; data breaches described thus are rather inconsequential to the grand cost scheme. It’s like a telecom tax.

BoppingAround August 27, 2015 9:16 AM

Is it really willingly or is it the result of collective ignorance and peer pressure? I still cannot make that out.

My wager remains on ignorance. So far, each field day in whatever stint I’ve been doing has proved that if there’s anything in abundance on the planet, it is ignorance. And the extent of it, sometimes, gets absolutely ridiculous.

No need for a rant though. Richard Aldington pinpointed the whole thing about the ignorance nearly 90 years ago in his novel, Death of a Hero. I’d provide the excerpt but I cannot find the original English text.

M. August 28, 2015 11:03 AM


Yeah. Good luck with that.

1) Clinton and Bush tort reforms gutted American law; the size of these breaches doesn’t matter: the economic incentive to take effective countermeasures will simply never exist. It’s like a toddler trying to hit the speed of light in a tricycle; the laws of physics just aren’t on their side. And it’s a shame, because if there’s one thing I learned in school, it’s that punitive damages and class actions are criminal law for corporations.

2) I’ve heard Canadian class actions are still okayish.

That said, what’s the point? It didn’t have an IPO, so while it’s not exactly a blood from a stone situation…

3) I’m not sure it’s an econ thing so much as it is a PR and lobbying one.

It sounds like the old Cold War spiel to me. Michael Parenti describes it brilliantly in Against Empire and Make Believe Media: create a b.s. enemy to ratchet fear levels up; wait for and/or invent a crisis; pass a new PATRIOT-esque bill with a rider that waters down an irritatingly effective — but unrelated — law that irritates the squagillionaires (e.g., the courts’ enforcement of EU antitrust damages/Sarbanes-Oxley requirements/blahblahblah). Wash, rinse, repeat.

4) Actually, you do bring up a good point: how do the ultra rich really go about protecting their financial and medical privacy?

I’ve always wondered why we only see people in the $10-75 m net worth range (e.g., celebrities and politicians) getting hacked, but not people in the $100+ m range. I couldn’t figure it out. At first, I thought it was because the $10m club hired their high school friends as “security professionals,” while the $100m club could hire real security professionals with real experience. And real professionals would use non-tech fail safes in case the phone was hacked, such as registering the phone to a company in the Caymans or a fake ID to stay out of Lexis/Acxiom databases.

Which makes me wonder: is there a way to easily — and cheaply — do this for the rest of us poor schlubs?

5) It also made me wonder: can organizational doxing/internal hacking somewhat replace unions and professions? It seems to be a totally new way of “packing and cracking” identity politics.

Despite their PR, American unions are dead. Ultimately, they’re hogtied by various forms of institutional and regulatory capture. Professionalization compensated a bit for some people, but even they’re seeing their gains erode.

What I found very, very odd about this case was that it got people who normally wouldn’t care about a presumably disgruntled ex-employee’s welfare adamantly encouraging company execs to pay him. The enemy of my enemy was, for a week, my friend.

I’m not saying this pseudo-solidarity would be genuine. The question is, if it happens repeatedly, do you get to like the people you’re fighting with? Or do you really, really resent how they’re using you and your data?
