Are Data Breaches Getting Larger?

This research says that data breaches are not getting larger over time.

“Hype and Heavy Tails: A Closer Look at Data Breaches,” by Benjamin Edwards, Steven Hofmeyr, and Stephanie Forrest:

Abstract: Recent widely publicized data breaches have exposed the personal information of hundreds of millions of people. Some reports point to alarming increases in both the size and frequency of data breaches, spurring institutions around the world to address what appears to be a worsening situation. But, is the problem actually growing worse? In this paper, we study a popular public dataset and develop Bayesian Generalized Linear Models to investigate trends in data breaches. Analysis of the model shows that neither size nor frequency of data breaches has increased over the past decade. We find that the increases that have attracted attention can be explained by the heavy-tailed statistical distributions underlying the dataset. Specifically, we find that data breach size is log-normally distributed and that the daily frequency of breaches is described by a negative binomial distribution. These distributions may provide clues to the generative mechanisms that are responsible for the breaches. Additionally, our model predicts the likelihood of breaches of a particular size in the future. For example, we find that in the next year there is only a 31% chance of a breach of 10 million records or more in the US. Regardless of any trend, data breaches are costly, and we combine the model with two different cost models to project that in the next three years breaches could cost up to $55 billion.

The paper was presented at WEIS 2015.
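
To make the abstract's "chance of a breach of a given size" concrete, here is a rough sketch of how such a figure could be computed from a log-normal size distribution. It is not the authors' Bayesian model: it assumes a fixed number of independent breaches per year instead of a negative binomial count, and the names lognormal_tail, prob_at_least_one, MU, and SIGMA, along with their values, are hypothetical placeholders rather than anything taken from the paper.

```python
import math


def lognormal_tail(threshold, mu, sigma):
    """P(size >= threshold) when log(size) is normal with mean mu and sd sigma."""
    z = (math.log(threshold) - mu) / sigma
    # Upper-tail probability of a standard normal, via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))


def prob_at_least_one(threshold, mu, sigma, n_breaches):
    """P(at least one of n_breaches independent breaches reaches threshold)."""
    p = lognormal_tail(threshold, mu, sigma)
    return 1.0 - (1.0 - p) ** n_breaches


# Hypothetical parameters for illustration only; NOT estimates from the paper.
MU = math.log(5_000)   # assumed median breach size of 5,000 records
SIGMA = 3.0            # assumed spread on the natural-log scale (heavy tail)

if __name__ == "__main__":
    for n in (100, 300, 1000):
        chance = prob_at_least_one(10_000_000, MU, SIGMA, n)
        print(f"{n} breaches/year -> P(any breach >= 10M records) = {chance:.3f}")
```

With a heavy tail like this, an occasional very large breach is expected even when the parameters never change, which is the effect the abstract attributes the apparent increases to.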

Tags: academic papers, breaches, databases

Posted on August 25, 2015 at 6:27 AM • 23 Comments

Comments

just passin thru • August 25, 2015 6:46 AM

I don’t believe it.

I am pretty paranoid about such things, and take plenty of steps to protect my identity/info. I’ve even got my wife on board as well.

Despite that, I’ve recently been notified of breaches by (1) the state of South Carolina because I pay taxes there, and (2) UCLA Health Services, and (3) another, I forget who. Before that, nada.

Maybe the authors aren’t measuring it, but I think the likelihood of any random adult’s info being branched is cumulatively going up over time.

Even the paranoid aren’t safe, and are unlikely to be until lawmakers are shamed into protecting citizens and consumers; protecting (regulating, HA!) businesses that trade in this information is what creates a market for this assault on the public.

just passin thru • August 25, 2015 6:48 AM

oops
branched –> breached

Sam • August 25, 2015 7:02 AM

@just passin thru

I think the likelihood of any random adult’s info being branched is cumulatively going up over time.

There are a number of other factors in play here:
1/ How well data breaches are communicated to you – if your data is leaked but the company doesn’t tell you about it, your data is still leaked.
2/ How much data an average random adult has on the internet – this is probably going up too, and the root cause might be “more info online” rather than “more and worse hacks”.

branched –> breached

This is oddly fitting though.

paul • August 25, 2015 8:18 AM

That $55 billion number is both huge and small at the same time. On the order of $50 a year per US adult. On the other hand, compared to the amount companies are spending on securing the data, rather larger. (And if the notion that the FTC can actually impose liability holds up, the kind of risk any given company should take notice of. One of the byproducts of the FTC liability theory is that shareholders will be able to sue for losses in stock price caused by any negligence found by the agency. Which might finally put Bruce’s notion of monetary incentives for not being complete idiots into practice.)

Clive Robinson • August 25, 2015 8:47 AM

I must be getting old… when I read,

Specifically, we find that data breach size is log-normally distributed and that the daily frequency of breaches is described by a negative binomial distribution.

I had to stop and mentally picture it out…

parabarbarian • August 25, 2015 9:57 AM

Looks like good news and bad news.

The good news is that breaches are not increasing.

The bad news is that breaches are not decreasing.

Dr. I. Needtob Athe • August 25, 2015 10:32 AM

I guarantee you one thing: The amount of data that has been breached is getting larger.

Pat • August 25, 2015 11:24 AM

Of course the unusually large data breaches get into the news. If we had 1,000,000+ records leaked every week, it wouldn’t be news anymore.

Also, I think the tails are going to become the norm. As datasets increase in size, systems grow in complexity, and security changes, we will see more large-scale, targeted breaches.

It is cold comfort to know that what is protecting my data is the sheer size of the data breaches compared to the number of criminals exploiting the data. (For fun, how long before the data necessary to steal from a bank (name, birthdate, SSN, etc) for 95% of the population is stolen? How will banking systems change to deal with this or when will they be forced to change?)

Chase Johnson • August 25, 2015 12:28 PM

Interesting study, but I note that there is no examination of the contents of a given breach besides the number of records in it. As they say, the largest ever breach was in 2009: the Heartland Payments breach. But that breach only exposed credit card mag-stripe data according to: http://voices.washingtonpost.com/securityfix/2009/01/payment_processor_breach_may_b.html. Whereas the OPM and Ashley Madison breaches revealed a great deal of personal information.

Perhaps breaches are not increasing in rate, or in size (as measured by record count), but are nevertheless increasing in severity, hence the increased news coverage and increased concerns on the part of security professionals. This paper does not appear to address whether or not this is the case.

CallMeLateForSupper • August 25, 2015 1:01 PM

I just love seeing the little Privacy Badger icon stay GREEN the whole time I’m in this blog! It’s a little thing but means so much. Thank you for this, Bruce. (Snoopy dance here)

Coyne Tibbets • August 25, 2015 1:05 PM

Did bank robberies become larger when banks became bigger targets?

I think it’s a given that the more massive the trove of data held by a company, the bigger the breach and the more likely a breach will be attempted.

Especially when the data is protected by IDIOTS (“IDIOTS Developing Insecurely Obfuscated Technological Solutions”, apologies to cheong).

rgaff • August 25, 2015 1:12 PM

@Chase Johnson

Indeed. The issue is, if some research comes out with the same thing everyone’s thinking, it’s not newsworthy, and nobody will hear about it. It’s only newsworthy if it’s wildly different from what seems pretty obvious to everyone. Then, as a researcher, you become famous. Kind of makes you wonder if that has any play in what’s addressed and what’s not addressed sometimes, doesn’t it…

@CallMeLateForSupper

Yeah, that little badger buddy does his work… he badgers us until we turn him green 🙂

@Pat

Perhaps this is the general population’s future:

http://krebsonsecurity.com/2015/06/how-i-learned-to-stop-worrying-and-embrace-the-security-freeze/

Chase Johnson • August 25, 2015 1:39 PM

@rgaff

I’m not quite that cynical. No doubt there are incentives along those lines, but there are other incentives too, like wanting to be able to make the problem clear to dense, ignorant, or otherwise-incentivized parties like politicians or large software vendors. This paper is a good thing to have as part of the field, now we just need to see some work on the severity/depth/importance/impact of the same breaches.

Regarding flaws in the study in question, I am wondering if breach size (in records) is actually more of a reflection of the size of existing databases and the number of candidate records that could even exist. If we’re looking only at the US population, there can’t be more than about 400 million “records” anyway. A 32 million record breach, like AM, is already nearly 10% of the US population. How many databases even exist with more people in them than that?

On the other hand, that the rate of breaches in records per unit time seems to be flat is probably indicative of something. The “Red Queen Hypothesis” they bring up seems plausible. It may well be the case that breaches are increasing in severity not because our defense capabilities relative to attack capabilities are decreasing, but simply because all extant databases have a larger proportion of private data than they did, say, a decade ago.

Clearly, plenty more work to be done, but I think this is a good start.

rgaff • August 25, 2015 2:16 PM

@Chase Johnson

I hadn’t meant that as a direct accusation of this study in particular, only as a warning for studies in general… Especially when one follows where the funding money ultimately came from (discounting intermediaries designed to hide that) 🙂

I see this in the field of medicine more than technology so far… Who pays for the studies? Drug companies, of course. There’s no money in herbs that grow in your windowsill and other natural things, so it’s not funded much. So what do Medical schools have as a basis to teach Physicians from? The studies that exist… namely, studies on drugs, not herbs. So they all learn about drugs, not herbs, and then that’s naturally mostly all they push to their patients. It doesn’t even have to be cynically designed this way, that’s just how it works. Money, fame, and power talks at all levels and in all fields. This is the natural order of things, unless we fight hard to balance it out.

Bud Weiser • August 25, 2015 2:24 PM

If they are getting bigger, lawyers will likely have a heyday.

AM is now getting sued by some users in the USA, and they are hoping to get their case registered as a class action lawsuit, representing some 37 million users.

Also, Ashley Madison is offering a $500,000 reward for information leading to the arrest of a group that hacked the site.

albert • August 25, 2015 2:50 PM

Ah, statistics rears its ugly head yet again. As useless as tits on a boar hog. How or who does this ‘study’ help in any way? I want to know. All it does is keep 3 academics away from possibly productive work.
.
It is nearly impossible to quantify the real results of data breaches, and pulling a dollar value out of somewhere is the worst way to do it. What it does is move accountability from criminal liability (of which there is none, but should be) to financial liability (insurance, writeoffs, taxpayer dollars).
.
I am that cynical.
.
What I’d like to see are studies that show, for example, the exact nature of the attack, the hardware and OS involved, the types of security measures in place, etc. In other words, real forensic analysis. This is how you gain a measure of the problem, and insights into solutions.
.
. .. . .. o

tyr • August 25, 2015 3:44 PM

I’d like to see the accountancy paper trail on the dollar
figures everyone seems to toss out whenever anything comp
related is referenced.

(historical)
The Sun devil operation was chasing hacker access to a
Telco document valued at almost 80,000 dollars. Turned
out you could buy it for 13 dollars.

The valuation turned out to be the kitchen sink model of
adding up everything related to the document’s creation.

The real problem is everyone storing everything they can
get hold of while failing to adequately defend it. That
it has to be on-line and accessible even if useless for
any reasonable purpose is the model that needs to be
tossed out. The real danger is cloud storage removes
the controls over data that are local. Once an attacker
copies it all to another location no amount of crying
will get control of it again. If Wikileaks decides to
publish the AM or OPM dump then getting those cats back
into the bag is going to be impossible. If AM hadn’t
been lying to their customers the problem would be a
lot smaller and a lot less smelly.

If you make the original collector responsible under law
for information that belongs to its owner then you’ll
see this start to turn around. As it is these collectors
routinely violate copyrights with impunity (as one small
example) and have no interest in safeguarding what you
have entrusted to them.

Tim • August 25, 2015 7:09 PM

@paul
It’s worth noting that $50 per year per person is much much less than we spend on coffee.