Entries Tagged "academic papers"

Fear of Internet Predators Largely Unfounded

Does this really come as a surprise?

“There’s been some overreaction to the new technology, especially when it comes to the danger that strangers represent,” said Janis Wolak, a sociologist at the Crimes against Children Research Center at the University of New Hampshire in Durham.

“Actually, Internet-related sex crimes are a pretty small proportion of sex crimes that adolescents suffer,” Wolak added, based on three nationwide surveys conducted by the center.

[…]

In an article titled “Online ‘Predators’ and Their Victims,” which appears Tuesday in American Psychologist, the journal of the American Psychological Association, Wolak and co-researchers examined several fears that they concluded are myths:

  • Internet predators are driving up child sex crime rates.

    Finding: Sex assaults on teens fell 52 percent from 1993 to 2005, according to the Justice Department’s National Crime Victimization Survey, the best measure of U.S. crime trends. “The Internet may not be as risky as a lot of other things that parents do without concern, such as driving kids to the mall and leaving them there for two hours,” Wolak said.

  • Internet predators are pedophiles.

    Finding: Internet predators don’t hit on the prepubescent children whom pedophiles target. They target adolescents, who have more access to computers, more privacy and more interest in sex and romance, Wolak’s team determined from interviews with investigators.

  • Internet predators represent a new dimension of child sexual abuse.

    Finding: The means of communication is new, according to Wolak, but most Internet-linked offenses are essentially statutory rape: nonforcible sex crimes against minors too young to consent to sexual relationships with adults.

  • Internet predators trick or abduct their victims.

    Finding: Most victims meet online offenders face-to-face and go to those meetings expecting to engage in sex. Nearly three-quarters have sex with partners they met on the Internet more than once.

  • Internet predators meet their victims by posing online as other teens.

    Finding: Only 5 percent of predators did that, according to the survey of investigators.

  • Online interactions with strangers are risky.

    Finding: Many teens interact online all the time with people they don’t know. What’s risky, according to Wolak, is giving out names, phone numbers and pictures to strangers and talking online with them about sex.

  • Internet predators go after any child.

    Finding: Usually their targets are adolescent girls or adolescent boys of uncertain sexual orientation, according to Wolak. Youths with histories of sexual abuse, sexual orientation concerns and patterns of off- and online risk-taking are especially at risk.

In January, I said this:

…there isn’t really any problem with child predators—just a tiny handful of highly publicized stories—on MySpace. It’s just security theater against a movie-plot threat. But we humans have a well-established cognitive bias that overestimates threats against our children, so it all makes sense.

EDITED TO ADD (3/7): A good essay.

Posted on February 26, 2008 at 6:30 AM

Research on Malware Distribution

Interesting:

Among their conclusions are that the majority of malware distribution sites are hosted in China, and that 1.3% of Google searches return at least one link to a malicious site. The lead author, Niels Provos, wrote, ‘It has been over a year and a half since we started to identify web pages that infect vulnerable hosts via drive-by downloads, i.e. web pages that attempt to exploit their visitors by installing and running malware automatically. During that time we have investigated billions of URLs and found more than three million unique URLs on over 180,000 web sites automatically installing malware. During the course of our research, we have investigated not only the prevalence of drive-by downloads but also how users are being exposed to malware and how it is being distributed.’

Draft paper, and some data.

Posted on February 26, 2008 at 6:23 AM

The Nugache Worm/Botnet

I’ve already written about the Storm worm, and how it represents a new generation of worm/botnets. And Scott Berinato has written an excellent article about the Gozi worm, another new-generation worm/botnet.

This article is about yet another new-generation worm/botnet: Nugache. Dave Dittrich thinks this is the most advanced worm/botnet yet:

But this new piece of malware, which came to be known as Nugache, was a game-changer. With no C&C server to target, bots capable of sending encrypted packets and the possibility of any peer on the network suddenly becoming the de facto leader of the botnet, Nugache, Dittrich knew, would be virtually impossible to stop.

[…]

Nugache, and its more famous cousin, the Storm Trojan, are not simply the next step in the evolution of malware. They represent a major step forward in both the quality of software that malware authors are producing and in the sophistication of their tactics. Although they’re often referred to as worms, Storm and Nugache are actually Trojans. The Storm creator, for example, sends out millions of spam messages on a semi-regular basis, each containing a link to content on some remote server, normally disguised in a fake pitch for a penny stock, Viagra or relief for victims of a recent natural disaster. When a user clicks on the link, the attacker’s server installs the Storm Trojan on the user’s PC and it’s off and running.

Various worms, viruses, bots and Trojans over the years have had one or two of the features that Storm, Nugache, Rbot and other such programs possess, but none has approached the breadth and depth of their feature sets. Rbot, for example, has more than 100 features that users can choose from when compiling the bot. This means that two different bots compiled from an identical source could have nearly identical feature sets, yet look completely different to an antivirus engine.

[…]

As scary as Storm and Nugache are, the scarier thing is that they represent just the tip of the iceberg. Experts say that there are several malware groups out there right now that are writing custom Trojans, rootkits and attack toolkits to the specifications of their customers. The customers are in turn using the malware not to build worldwide botnets a la Storm, but to attack small slices of a certain industry, such as financial services or health care.

Rizo, a variant of the venerable Rbot, is the poster child for this kind of attack. A Trojan in the style of Nugache and Storm, Rizo has been modified a number of times to meet the requirements of various different attack scenarios. Within the course of a few weeks, different versions of Rizo were used to attack customers of several different banks in South America. Once installed on a user’s PC, it monitors Internet activity and gathers login credentials for online banking sites, which it then sends back to the attacker. It’s standard behavior for these kinds of Trojans, but the amount of specificity and customization involved in the code and the ways in which the author changed it over time are what have researchers worried.

[…]

“I’m pretty sure that there are tactics being shared between the Nugache and Storm authors,” Dittrich said. “There’s a direct lineage from Sdbot to Rbot to Mytob to Bancos. These guys can just sell the Web front-end to these things and the customers can pick their options and then just hit go.”

See also: “Command and control structures in malware: From Handler/Agent to P2P,” by Dave Dittrich and Sven Dietrich, USENIX ;login:, vol. 32, no. 6, December 2007, and “Analysis of the Storm and Nugache Trojans: P2P is here,” Sam Stover, Dave Dittrich, John Hernandez, and Sven Dietrich, USENIX ;login:, vol. 32, no. 6, December 2007. The second link is available to USENIX members only, unfortunately.

Posted on December 31, 2007 at 7:19 AM

Airport Security Study

Surprising nobody, a new study concludes that airport security isn’t helping:

A team at the Harvard School of Public Health could not find any studies showing whether the time-consuming process of X-raying carry-on luggage prevents hijackings or attacks.

They also found no evidence to suggest that making passengers take off their shoes and confiscating small items prevented any incidents.

[…]

The researchers said it would be interesting to apply medical standards to airport security. Screening programs for illnesses like cancer are usually not broadly instituted unless they have been shown to work.

Note the defense by the TSA:

“Even without clear evidence of the accuracy of testing, the Transportation Security Administration defended its measures by reporting that more than 13 million prohibited items were intercepted in one year,” the researchers added. “Most of these illegal items were lighters.”

This is where the TSA has it completely backwards. The goal isn’t to confiscate prohibited items. The goal is to prevent terrorism on airplanes. When the TSA confiscates millions of lighters from innocent people, that’s a security failure. The TSA is reacting to non-threats. The TSA is reacting to false alarms. Now you can argue that this level of failure is necessary to make people safer, but it’s certainly not evidence that people are safer.

For example, does anyone think that the TSA’s vigilance regarding pies is anything other than a joke?

Here’s the actual paper from the British Medical Journal:

Of course, we are not proposing that money spent on unconfirmed but politically comforting efforts to identify and seize water bottles and skin moisturisers should be diverted to research on cancer or malaria vaccines. But what would the National Screening Committee recommend on airport screening? Like mammography in the 1980s, or prostate specific antigen testing and computer tomography for detecting lung cancer more recently, we would like to open airport security screening to public and academic debate. Rigorously evaluating the current system is just the first step to building a future airport security programme that is more user friendly and cost effective, and that ultimately protects passengers from realistic threats.

I talked about airport security at length with Kip Hawley, the head of the TSA, here.

Posted on December 27, 2007 at 6:28 AM

Anonymity and the Netflix Dataset

Last year, Netflix published 10 million movie rankings by 500,000 customers, as part of a challenge for people to come up with better recommendation systems than the one the company was using. The data was anonymized by removing personal details and replacing names with random numbers, to protect the privacy of the recommenders.

Arvind Narayanan and Vitaly Shmatikov, researchers at the University of Texas at Austin, de-anonymized some of the Netflix data by comparing rankings and timestamps with public information in the Internet Movie Database, or IMDb.

Their research (.pdf) illustrates some inherent security problems with anonymous data, but first it’s important to explain what they did and did not do.

They did not reverse the anonymity of the entire Netflix dataset. What they did was reverse the anonymity of the Netflix dataset for those sampled users who also entered some movie rankings, under their own names, in the IMDb. (While IMDb’s records are public, crawling the site to get them is against the IMDb’s terms of service, so the researchers used a representative few to prove their algorithm.)

The point of the research was to demonstrate how little information is required to de-anonymize information in the Netflix dataset.

On one hand, isn’t that sort of obvious? The risks of anonymous databases have been written about before, such as in this 2001 paper published in an IEEE journal. The researchers working with the anonymous Netflix data didn’t painstakingly figure out people’s identities—as others did with the AOL search database last year—they just compared it with an already identified subset of similar data: a standard data-mining technique.

But as opportunities for this kind of analysis pop up more frequently, lots of anonymous data could end up at risk.

Someone with access to an anonymous dataset of telephone records, for example, might partially de-anonymize it by correlating it with a catalog merchant’s telephone order database. Or Amazon’s online book reviews could be the key to partially de-anonymizing a public database of credit card purchases, or a larger database of anonymous book reviews.

Google, with its database of users’ internet searches, could easily de-anonymize a public database of internet purchases, or zero in on searches of medical terms to de-anonymize a public health database. Merchants who maintain detailed customer and purchase information could use their data to partially de-anonymize any large search engine’s data, if it were released in an anonymized form. A data broker holding databases of several companies might be able to de-anonymize most of the records in those databases.

What the University of Texas researchers demonstrate is that this process isn’t hard, and doesn’t require a lot of data. It turns out that if you eliminate the top 100 movies everyone watches, our movie-watching habits are all pretty individual. This would certainly hold true for our book reading habits, our internet shopping habits, our telephone habits and our web searching habits.

The obvious countermeasures for this are, sadly, inadequate. Netflix could have randomized its dataset by removing a subset of the data, changing the timestamps or adding deliberate errors into the unique ID numbers it used to replace the names. It turns out, though, that this only makes the problem slightly harder. Narayanan and Shmatikov’s de-anonymization algorithm is surprisingly robust, and works with partial data, data that has been perturbed, even data with errors in it.

With only eight movie ratings (of which two may be completely wrong), and dates that may be up to two weeks in error, they can uniquely identify 99 percent of the records in the dataset. After that, all they need is a little bit of identifiable data: from the IMDb, from your blog, from anywhere. The moral is that it takes only a small named database for someone to pry the anonymity off a much larger anonymous database.
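To make the attack concrete, here is a minimal sketch of the kind of fuzzy record-linkage scoring involved. The toy dataset, the tolerances (ratings off by one, dates off by two weeks) and the stand-out test are all illustrative assumptions, not Narayanan and Shmatikov’s actual algorithm, but the shape is the same: score every pseudonymous record against the auxiliary information and accept a match only if one candidate clearly dominates.

```python
# Minimal sketch of record-linkage scoring in the style of the attack described
# above. The data, the tolerances (ratings off by one, dates off by 14 days) and
# the "stand-out" test are simplified assumptions, not the authors' algorithm.
from datetime import date

# Toy "anonymized" dataset: pseudonym -> list of (movie, rating, date) triples.
anonymized = {
    1001: [("Brazil", 5, date(2005, 3, 1)), ("Heat", 4, date(2005, 6, 2)),
           ("Alien", 5, date(2005, 7, 9)), ("Gattaca", 3, date(2005, 9, 30))],
    1002: [("Brazil", 2, date(2005, 1, 5)), ("Titanic", 5, date(2005, 2, 7)),
           ("Heat", 3, date(2005, 8, 19))],
    1003: [("Alien", 1, date(2005, 4, 4)), ("Gattaca", 5, date(2005, 5, 5))],
}

# Auxiliary information about the target, e.g. scraped from public reviews.
# Note the deliberately wrong rating for "Heat" and the slightly-off dates.
auxiliary = [("Brazil", 5, date(2005, 3, 6)), ("Heat", 2, date(2005, 6, 10)),
             ("Gattaca", 3, date(2005, 10, 3))]

def score(candidate_records, aux, rating_slack=1, date_slack_days=14):
    """Count auxiliary ratings that approximately match a candidate's records."""
    hits = 0
    for movie, rating, when in aux:
        for m, r, d in candidate_records:
            if (m == movie and abs(r - rating) <= rating_slack
                    and abs((d - when).days) <= date_slack_days):
                hits += 1
                break
    return hits

scores = {uid: score(recs, auxiliary) for uid, recs in anonymized.items()}
best = max(scores, key=scores.get)
others = [s for uid, s in scores.items() if uid != best]
# Accept the match only if the best candidate clearly stands out from the rest.
if scores[best] >= 2 and scores[best] > max(others) + 1:
    print("Pseudonym", best, "is probably the target:", scores)
```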

Other research reaches the same conclusion. Using public anonymous data from the 1990 census, Latanya Sweeney found that 87 percent of the population in the United States, 216 million of 248 million, could likely be uniquely identified by their five-digit ZIP code, combined with their gender and date of birth. About half of the U.S. population is likely identifiable by gender, date of birth and the city, town or municipality in which the person resides. Expanding the geographic scope to an entire county reduces that to a still-significant 18 percent. “In general,” the researchers wrote, “few characteristics are needed to uniquely identify a person.”

Stanford University researchers reported similar results using 2000 census data. It turns out that date of birth, which (unlike birthday month and day alone) sorts people into thousands of different buckets, is incredibly valuable in disambiguating people.
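A rough back-of-the-envelope model shows why so few attributes are needed. The numbers below (about 42,000 ZIP codes, a population of 248 million, birth dates spread over roughly a century) are round-figure assumptions, and the uniform-occupancy model ignores how clustered real populations are, so it overstates uniqueness compared with Sweeney’s empirical 87 percent; the point is simply that the number of (ZIP, gender, date-of-birth) pigeonholes dwarfs the population.

```python
# Back-of-the-envelope estimate of why ZIP code + gender + date of birth is so
# identifying. The inputs are round-number assumptions, and the uniform model
# ignores population clustering, so it overstates uniqueness relative to the
# empirical 87 percent; it is meant only to show the orders of magnitude.
import math

population = 248_000_000     # approximate 1990 U.S. census population
zip_codes = 42_000           # approximate number of five-digit ZIP codes
genders = 2
birth_dates = 365 * 100      # birth dates spread over roughly a century

buckets = zip_codes * genders * birth_dates
print(f"pigeonholes: {buckets:,}  people: {population:,}")

# If people fell into pigeonholes uniformly at random, the chance that nobody
# else shares a given person's pigeonhole is about exp(-(N-1)/B).
p_unique = math.exp(-(population - 1) / buckets)
print(f"expected fraction unique under the uniform model: {p_unique:.0%}")
```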

This has profound implications for releasing anonymous data. On one hand, anonymous data is an enormous boon for researchers—AOL did a good thing when it released its anonymous dataset for research purposes, and it’s sad that the CTO resigned and an entire research team was fired after the public outcry. Large anonymous databases of medical data are enormously valuable to society: for large-scale pharmacology studies, long-term follow-up studies and so on. Even anonymous telephone data makes for fascinating research.

On the other hand, in the age of wholesale surveillance, where everyone collects data on us all the time, anonymization is very fragile and riskier than it initially seems.

Like everything else in security, anonymity systems shouldn’t be fielded before being subjected to adversarial attacks. We all know that it’s folly to implement a cryptographic system before it’s rigorously attacked; why should we expect anonymity systems to be any different? And, like everything else in security, anonymity is a trade-off. There are benefits, and there are corresponding risks.

Narayanan and Shmatikov are currently working on developing algorithms and techniques that enable the secure release of anonymous datasets like Netflix’s. That’s a research result we can all benefit from.

This essay originally appeared on Wired.com.

Posted on December 18, 2007 at 5:53 AM

Security-Breach Notification Laws

Interesting study on the effects of security-breach notification laws in the U.S.:

This study surveys the literature on changes in the information security world and significantly expands upon it with qualitative data from seven in-depth discussions with information security officers. These interviews focused on the most important factors driving security investment at their organizations and how security breach notification laws fit into that list. Often missing from the debate is that, regardless of the risk of identity theft and alleged consumer apathy towards notices, the simple fact of having to publicly notify causes organizations to implement stronger security standards that protect personal information.

The interviews showed that security breaches drive information exchange among security professionals, causing them to engage in discussions about information security issues that may arise at their and others’ organizations. For example, we found that some CSOs summarize news reports from breaches at other organizations and circulate them to staff with “lessons learned” from each incident. In some cases, organizations have a “that could have been us” moment, and patch systems with similar vulnerabilities to the entity that had a breach.

Breach notification laws have significantly contributed to heightened awareness of the importance of information security throughout all levels of a business organization and to development of a level of cooperation among different departments within an organization that resulted from the need to monitor data access for the purposes of detecting, investigating, and reporting breaches. CSOs reported that breach notification duties empowered them to implement new access controls, auditing measures, and encryption. Aside from the organization’s own efforts at complying with notification laws, reports of breaches at other organizations help information officers maintain that sense of awareness.

Posted on December 12, 2007 at 1:53 PM

Law Review Article on the Problems with Copyright

Excellent article by John Tehranian: "Infringement Nation: Copyright Reform and the Law/Norm Gap":

By the end of the day, John has infringed the copyrights of twenty emails, three legal articles, an architectural rendering, a poem, five photographs, an animated character, a musical composition, a painting, and fifty notes and drawings. All told, he has committed at least eighty-three acts of infringement and faces liability in the amount of $12.45 million (to say nothing of potential criminal charges). There is nothing particularly extraordinary about John’s activities. Yet if copyright holders were inclined to enforce their rights to the maximum extent allowed by law, he would be indisputably liable for a mind-boggling $4.544 billion in potential damages each year. And, surprisingly, he has not even committed a single act of infringement through P2P file sharing. Such an outcome flies in the face of our basic sense of justice. Indeed, one must either irrationally conclude that John is a criminal infringer—a veritable grand larcenist—or blithely surmise that copyright law must not mean what it appears to say. Something is clearly amiss. Moreover, the troublesome gap between copyright law and norms has grown only wider in recent years.

The point of the article is how, simply by acting normally, all of us are technically lawbreakers many times over every day. When laws are this far outside the social norms, it’s time to change them.
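For what it’s worth, the figures in the quoted passage are consistent with the $150,000-per-work statutory ceiling for willful infringement in 17 U.S.C. § 504(c); the snippet below is just that back-of-the-envelope check.

```python
# Back-of-the-envelope check of the damages figures in the quoted passage,
# assuming the $150,000-per-work statutory maximum for willful infringement.
statutory_max = 150_000     # dollars per infringed work, willful infringement
acts_per_day = 83

daily_exposure = acts_per_day * statutory_max
yearly_exposure = daily_exposure * 365

print(f"daily exposure:  ${daily_exposure:,}")    # $12,450,000
print(f"yearly exposure: ${yearly_exposure:,}")   # $4,544,250,000, i.e. ~$4.544 billion
```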

Posted on November 26, 2007 at 6:54 AM

The Strange Story of Dual_EC_DRBG

Random numbers are critical for cryptography: for encryption keys, random authentication challenges, initialization vectors, nonces, key-agreement schemes, generating prime numbers and so on. Break the random-number generator, and most of the time you break the entire security system. Which is why you should worry about a new random-number standard that includes an algorithm that is slow, badly designed and just might contain a backdoor for the National Security Agency.

Generating random numbers isn’t easy, and researchers have discovered lots of problems and attacks over the years. A recent paper found a flaw in the Windows 2000 random-number generator. Another paper found flaws in the Linux random-number generator. Back in 1996, an early version of SSL was broken because of flaws in its random-number generator. With John Kelsey and Niels Ferguson in 1999, I co-authored Yarrow, a random-number generator based on our own cryptanalysis work. I improved this design four years later—and renamed it Fortuna—in the book Practical Cryptography, which I co-authored with Ferguson.

The U.S. government released a new official standard for random-number generators this year, and it will likely be followed by software and hardware developers around the world. Called NIST Special Publication 800-90 (.pdf), the 130-page document contains four different approved techniques, called DRBGs, or “Deterministic Random Bit Generators.” All four are based on existing cryptographic primitives. One is based on hash functions, one on HMAC, one on block ciphers and one on elliptic curves. It’s smart cryptographic design to use only a few well-trusted cryptographic primitives, so building a random-number generator out of existing parts is a good thing.
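To give a sense of what building a generator out of existing parts looks like, here is a deliberately simplified hash-based construction: a secret state produces output blocks through a hash, and is itself stepped forward by hashing. It is only a sketch of the general idea; the actual SP 800-90 Hash_DRBG specifies seeding, reseeding, personalization strings and other details that matter in practice.

```python
# Deliberately simplified sketch of a hash-based deterministic bit generator:
# output blocks come from hashing the secret state, and the state is stepped by
# hashing it again. This illustrates the general design idea only; it is NOT
# the SP 800-90 Hash_DRBG, which specifies seeding, reseeding and more.
import hashlib
import os

class ToyHashDRBG:
    def __init__(self, seed: bytes):
        # Derive the initial secret state from the seed material.
        self.state = hashlib.sha256(b"init" + seed).digest()
        self.counter = 0

    def generate(self, nbytes: int) -> bytes:
        out = b""
        while len(out) < nbytes:
            # Output block: hash of (state, counter). Recovering the state from
            # outputs should require inverting SHA-256.
            block = hashlib.sha256(
                b"out" + self.state + self.counter.to_bytes(8, "big")).digest()
            out += block
            self.counter += 1
        # Step the state forward so earlier states cannot be recovered later
        # (a simple nod to backtracking resistance).
        self.state = hashlib.sha256(b"step" + self.state).digest()
        return out[:nbytes]

drbg = ToyHashDRBG(os.urandom(32))   # seed from the operating system's entropy
print(drbg.generate(16).hex())
print(drbg.generate(16).hex())
```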

But one of those generators—the one based on elliptic curves—is not like the others. Called Dual_EC_DRBG, not only is it a mouthful to say, it’s also three orders of magnitude slower than its peers. It’s in the standard only because it’s been championed by the NSA, which first proposed it years ago in a related standardization project at the American National Standards Institute.

The NSA has always been intimately involved in U.S. cryptography standards—it is, after all, expert in making and breaking secret codes. So the agency’s participation in the NIST (the U.S. Commerce Department’s National Institute of Standards and Technology) standard is not sinister in itself. It’s only when you look under the hood at the NSA’s contribution that questions arise.

Problems with Dual_EC_DRBG were first described in early 2006. The math is complicated, but the general point is that the random numbers it produces have a small bias. The problem isn’t large enough to make the algorithm unusable—and Appendix E of the NIST standard describes an optional work-around to avoid the issue—but it’s cause for concern. Cryptographers are a conservative bunch: We don’t like to use algorithms that have even a whiff of a problem.

But today there’s an even bigger stink brewing around Dual_EC_DRBG. In an informal presentation (.pdf) at the CRYPTO 2007 conference in August, Dan Shumow and Niels Ferguson showed that the algorithm contains a weakness that can only be described as a backdoor.

This is how it works: There are a bunch of constants—fixed numbers—in the standard used to define the algorithm’s elliptic curve. These constants are listed in Appendix A of the NIST publication, but nowhere is it explained where they came from.

What Shumow and Ferguson showed is that these numbers have a relationship with a second, secret set of numbers that can act as a kind of skeleton key. If you know the secret numbers, you can predict the output of the random-number generator after collecting just 32 bytes of its output. To put that in real terms, you only need to monitor one TLS internet encryption connection in order to crack the security of that protocol. If you know the secret numbers, you can completely break any instantiation of Dual_EC_DRBG.

The researchers don’t know what the secret numbers are. But because of the way the algorithm works, the person who produced the constants might know; he had the mathematical opportunity to produce the constants and the secret numbers in tandem.
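To see the shape of the problem, here is a toy model on a tiny textbook curve. Everything is scaled down and simplified (no output truncation, a 19-element group, the backdoor scalar chosen deliberately), so it is a sketch of the structure Shumow and Ferguson described rather than the real algorithm. But it shows the essential point: whoever generates Q from a secret scalar e with P = e*Q can take a single output block, lift it back onto the curve, multiply by e, and land on the generator’s next internal state, after which every future output is predictable.

```python
# Toy model of the Dual_EC_DRBG backdoor structure described above. It uses a
# tiny textbook curve (y^2 = x^3 + 2x + 2 over GF(17), group order 19) and
# skips the output truncation the real generator performs, so it illustrates
# the structure only, not the NIST algorithm. The point Q is deliberately
# derived here from a secret scalar e with P = e*Q, which is exactly the
# relationship an attacker would need in order to exploit the backdoor.

p, a, b = 17, 2, 2        # curve parameters: y^2 = x^3 + a*x + b over GF(p)
n = 19                    # order of the generator point P
P = (5, 1)                # generator point

def add(A, B):
    """Elliptic-curve point addition; None stands for the point at infinity."""
    if A is None: return B
    if B is None: return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if A == B:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(k, A):
    """Scalar multiplication by double-and-add."""
    R = None
    while k:
        if k & 1:
            R = add(R, A)
        A = add(A, A)
        k >>= 1
    return R

# The "designer" picks a secret e and publishes Q = e^{-1} * P, so that P = e*Q.
e = 7
Q = mul(pow(e, -1, n), P)

def dual_ec_step(s):
    """One round: r = x(s*P); output is x(r*Q); next state is x(r*P)."""
    r = mul(s, P)[0]
    return mul(r, Q)[0], mul(r, P)[0]

# An honest user generates two output blocks from a secret initial state.
state = 11
out1, state = dual_ec_step(state)
out2, _ = dual_ec_step(state)

# The attacker sees only out1 but knows e. Lift out1 back to a curve point
# R = r*Q (either square root works, since x(e*R) equals x(-(e*R))); then
# e*R = r*P, which is the generator's next internal state.
for y in range(p):
    if (y * y - (out1 ** 3 + a * out1 + b)) % p == 0:
        recovered_state = mul(e, (out1, y))[0]
        break

predicted, _ = dual_ec_step(recovered_state)
print("next output:", out2, "attacker's prediction:", predicted)  # they match
```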

Of course, we have no way of knowing whether the NSA knows the secret numbers that break Dual_EC_DRBG. We have no way of knowing whether an NSA employee working on his own came up with the constants—and has the secret numbers. We don’t know if someone from NIST, or someone in the ANSI working group, has them. Maybe nobody does.

We don’t know where the constants came from in the first place. We only know that whoever came up with them could have the key to this backdoor. And we know there’s no way for NIST—or anyone else—to prove otherwise.

This is scary stuff indeed.

Even if no one knows the secret numbers, the fact that the backdoor is present makes Dual_EC_DRBG very fragile. If someone were to solve just one instance of the algorithm’s elliptic-curve problem, he would effectively have the keys to the kingdom. He could then use it for whatever nefarious purpose he wanted. Or he could publish his result, and render every implementation of the random-number generator completely insecure.

It’s possible to implement Dual_EC_DRBG in such a way as to protect it against this backdoor, by generating new constants with another secure random-number generator and then publishing the seed. This method is even in the NIST document, in Appendix A. But the procedure is optional, and my guess is that most implementations of the Dual_EC_DRBG won’t bother.

If this story leaves you confused, join the club. I don’t understand why the NSA was so insistent about including Dual_EC_DRBG in the standard. It makes no sense as a trap door: It’s public, and rather obvious. It makes no sense from an engineering perspective: It’s too slow for anyone to willingly use it. And it makes no sense from a backwards-compatibility perspective: Swapping one random-number generator for another is easy.

My recommendation, if you’re in need of a random-number generator, is not to use Dual_EC_DRBG under any circumstances. If you have to use something in SP 800-90, use CTR_DRBG or Hash_DRBG.

In the meantime, both NIST and the NSA have some explaining to do.

This essay originally appeared on Wired.com.

Posted on November 15, 2007 at 6:08 AM

Architecture and Anti-Terrorist Paranoia

This is really interesting:

(In)Security explores a new design vocabulary in direct response to the climate of fear and paranoia that currently drives the program and aesthetic of much contemporary urban design. The project addresses the current and future state of security in and around the Wall Street financial district, creating viable security alternatives while simultaneously questioning our nation’s current philosophy that security = freedom.

Full paper here.

Posted on November 1, 2007 at 11:47 AM

Understanding the Black Market in Internet Crime

Here’s an interesting paper from Carnegie Mellon University: “An Inquiry into the Nature and Causes of the Wealth of Internet Miscreants.”

The paper focuses on the large illicit market that specializes in the commoditization of activities in support of Internet-based crime. The main goal of the paper was to understand and measure how these markets function, and discuss the incentives of the various market entities. Using a dataset collected over seven months and comprising over 13 million messages, they were able to categorize the market’s participants, the goods and services advertised, and the asking prices for selected interesting goods.

Really cool stuff.

Unfortunately, the data is extremely noisy and so far the authors have no way to cross-validate it, so it is difficult to draw any strong conclusions.

The press focused on just one thing: a discussion of general ways to disrupt the market. Contrary to the claims of the article, the authors have not built any tools to disrupt the markets.

Related blog posts: Gozi and Storm.

Posted on October 29, 2007 at 2:23 PM
