data mining Archives - Schneier on Security

Entries Tagged "data mining"


Excel Data Forensics

This detailed article about academic plagiarism includes some interesting details on how to do data forensics on Excel files. You really need the graphics to follow it, so see the description at the link.

(And, yes, an author of a paper on dishonesty is being accused of dishonesty. There’s more evidence.)

EDITED TO ADD (7/13): Guardian article.

Posted on June 26, 2023 at 11:36 AM

Commercial Location Data Used to Out Priest

A Catholic priest was outed through commercially available surveillance data. Vice has a good analysis:

The news starkly demonstrates not only the inherent power of location data, but how the chance to wield that power has trickled down from corporations and intelligence agencies to essentially any sort of disgruntled, unscrupulous, or dangerous individual. A growing market of data brokers that collect and sell data from countless apps has made it so that anyone with a bit of cash and effort can figure out which phone in a so-called anonymized dataset belongs to a target, and abuse that information.

There is a whole industry devoted to re-identifying anonymized data. This was something that Snowden showed that the NSA could do. Now it’s available to everyone.
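A minimal sketch of why "anonymized" location data is easy to re-identify, assuming a made-up dataset of (device ID, latitude, longitude) pings and a target whose home and workplace are known. The function name, the data layout, and the matching radius are all illustrative assumptions, not a description of any broker's product or of the analysis Vice reports on.

```python
from collections import defaultdict


def candidate_devices(pings, known_places, radius=0.001):
    """Return the device IDs that were seen near every known place.

    pings: iterable of (device_id, lat, lon) tuples (hypothetical format).
    known_places: dict mapping a label to a (lat, lon) pair.
    radius: crude matching tolerance in degrees (an assumption).
    """
    seen = defaultdict(set)
    for device_id, lat, lon in pings:
        for name, (place_lat, place_lon) in known_places.items():
            if abs(lat - place_lat) <= radius and abs(lon - place_lon) <= radius:
                seen[device_id].add(name)
    # A device matches only if it appeared at all of the target's places.
    return sorted(d for d, places in seen.items()
                  if len(places) == len(known_places))


pings = [
    ("a", 40.0, -75.0), ("a", 40.1, -75.1),  # seen at both places
    ("b", 40.0, -75.0),                      # seen at the first place only
    ("c", 40.1, -75.1),                      # seen at the second place only
]
known = {"home": (40.0, -75.0), "work": (40.1, -75.1)}
print(candidate_devices(pings, known))
```

Each additional known place shrinks the candidate set, which is why a random device ID offers so little protection once the data includes where a phone sleeps and works.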

Posted on July 23, 2021 at 8:58 AM

Companies that Scrape Your Email

Motherboard has a long article on apps—Edison, Slice, and Cleanfox—that spy on your email by scraping your screen, and then sell that information to others:

Some of the companies listed in the J.P. Morgan document sell data sourced from “personal inboxes,” the document adds. A spokesperson for J.P. Morgan Research, the part of the company that created the document, told Motherboard that the research “is intended for institutional clients.”

That document describes Edison as providing “consumer purchase metrics including brand loyalty, wallet share, purchase preferences, etc.” The document adds that the “source” of the data is the “Edison Email App.”

[…]

A dataset obtained by Motherboard shows what some of the information pulled from free email app users’ inboxes looks like. A spreadsheet containing data from Rakuten’s Slice, an app that scrapes a user’s inbox so they can better track packages or get their money back once a product goes down in price, contains the item that an app user bought from a specific brand, what they paid, and a unique identification code for each buyer.

Posted on February 12, 2020 at 10:26 AM

How Political Campaigns Use Personal Data

Really interesting report from Tactical Tech.

Data-driven technologies are an inevitable feature of modern political campaigning. Some argue that they are a welcome addition to politics as normal and a necessary and modern approach to democratic processes; others say that they are corrosive and diminish trust in already flawed political systems. The use of these technologies in political campaigning is not going away; in fact, we can only expect their sophistication and prevalence to grow. For this reason, the techniques and methods need to be reviewed outside the dichotomy of ‘good’ or ‘bad’ and beyond the headlines of ‘disinformation campaigns’.

All the data-driven methods presented in this guide would not exist without the commercial digital marketing and advertising industry. From analysing behavioural data to A/B testing and from geotargeting to psychometric profiling, political parties are using the same techniques to sell political candidates to voters that companies use to sell shoes to consumers. The question is, is that appropriate? And what impact does it have not only on individual voters, who may or may not be persuaded, but on the political environment as a whole?

The practice of political strategists selling candidates as brands is not new. Vance Packard wrote about the ‘depth probing’ techniques of ‘political persuaders’ as early as 1957. In his book, ‘The Hidden Persuaders’, Packard described political strategies designed to sell candidates to voters ‘like toothpaste’, and how public relations directors at the time boasted that ‘scientific methods take the guesswork out of politics’. In this sense, what we have now is a logical progression of the digitisation of marketing techniques and political persuasion techniques.

Posted on April 3, 2019 at 6:26 AM

Zeynep Tufekci on Facebook and Cambridge Analytica

Zeynep Tufekci is particularly cogent about Facebook and Cambridge Analytica.

Several news outlets asked me to write about this issue. I didn’t, because 1) my book manuscript is due on Monday (finally!), and 2) I knew Zeynep would say what I would say, only better.

Posted on March 23, 2018 at 2:21 PM

Extracting Secrets from Machine Learning Systems

This is fascinating research about how the underlying training data for a machine-learning system can be inadvertently exposed. Basically, if a machine-learning system trains on a dataset that contains secret information, in some cases an attacker can query the system to extract that secret information. My guess is that there is a lot more research to be done here.
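As a toy illustration of the general idea (not the method from the linked research), here is a sketch in which a tiny character-level model memorizes its training text, and an attacker who can only query the model recovers a secret from it. The training text, the made-up PIN, and the function names `train` and `extract` are all assumptions for the example.

```python
from collections import Counter, defaultdict


def train(text, order=6):
    """Count which character follows each `order`-character context."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model


def extract(model, prefix, length, order=6):
    """Query the model repeatedly, always taking its most likely next character."""
    out = prefix
    for _ in range(length):
        counts = model.get(out[-order:])
        if not counts:
            break  # the model has never seen this context
        out += counts.most_common(1)[0][0]
    return out


# Made-up training data that happens to contain a secret.
corpus = "the weather is nice. my pin is 4821. the weather is cold."
model = train(corpus)

# The attacker guesses a plausible prefix and lets the model complete it.
print(extract(model, "pin is ", 4))
```

Real machine-learning systems are vastly more complex, but the failure mode is the same in spirit: a sequence that appears in the training data and is rare enough to be memorized can be coaxed back out by someone who only has query access.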

EDITED TO ADD (3/9): Some interesting links on the subject.

Posted on March 5, 2018 at 5:20 AM

Tracking People Without GPS

Interesting research:

The trick in accurately tracking a person with this method is finding out what kind of activity they’re performing. Whether they’re walking, driving a car, or riding in a train or airplane, it’s pretty easy to figure out when you know what you’re looking for.

The sensors can determine how fast a person is traveling and what kind of movements they make. Moving at a slow pace in one direction indicates walking. Going a little bit quicker but turning at 90-degree angles means driving. Faster yet, we’re in train or airplane territory. Those are easy to figure out based on speed and air pressure.

After the app determines what you’re doing, it uses the information it collects from the sensors. The accelerometer relays your speed, the magnetometer tells your relation to true north, and the barometer offers up the air pressure around you and compares it to publicly available information. It checks in with The Weather Channel to compare air pressure data from the barometer to determine how far above sea level you are. Google Maps and data offered by the US Geological Survey Maps provide incredibly detailed elevation readings.

Once it has gathered all of this information and determined the mode of transportation you’re currently taking, it can then begin to narrow down where you are. For flights, four algorithms begin to estimate the target’s location and narrow down the possibilities until the error rate hits zero.

If you’re driving, it can be even easier. The app knows the time zone you’re in based on the information your phone has provided to it. It then accesses information from your barometer and magnetometer and compares it to information from publicly available maps and weather reports. After that, it keeps track of the turns you make. With each turn, the possible locations whittle down until it pinpoints exactly where you are.

To demonstrate how accurate it is, researchers did a test run in Philadelphia. It only took 12 turns before the app knew exactly where the car was.

This is a good example of how powerful synthesizing information from disparate data sources can be. We spend too much time worrying about individual data collection systems, and not enough about the techniques for analyzing the data those systems produce.
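The turn-by-turn narrowing described in the quoted article can be sketched as a filter over a road map. This is a simplified illustration under stated assumptions, not the researchers' algorithm: the map is a hypothetical dictionary saying where each turn leads from each position, and the sensor data is reduced to a clean sequence of "left" and "right" turns.

```python
def narrow_candidates(transitions, observed_turns):
    """Return the positions consistent with an observed sequence of turns.

    transitions: dict mapping a position to {turn: next_position}
                 (a hypothetical stand-in for a real road network).
    observed_turns: the turns inferred from the phone's sensors, in order.
    """
    # With no observations, the car could be anywhere on the map.
    candidates = set(transitions)
    for turn in observed_turns:
        # Keep only the positions where this turn is possible, then move on.
        candidates = {
            transitions[pos][turn]
            for pos in candidates
            if turn in transitions.get(pos, {})
        }
    return candidates


# A made-up four-intersection map.
toy_map = {
    "A": {"left": "B", "right": "C"},
    "B": {"left": "D"},
    "C": {"right": "D", "left": "A"},
    "D": {"right": "A"},
}

print(narrow_candidates(toy_map, ["left", "left"]))
print(narrow_candidates(toy_map, ["left", "left", "right"]))
```

On this toy map two turns leave two possible positions and a third turn leaves one. A real city has far more intersections, but every observed turn eliminates candidates in the same way, which is consistent with the reported result that a dozen turns sufficed in Philadelphia.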

Research paper.

Posted on December 15, 2017 at 6:18 AM

Roombas will Spy on You

The company that sells the Roomba autonomous vacuum wants to sell the data about your home that it collects.

Some questions:

What happens if a Roomba user consents to the data collection and later sells his or her home—especially furnished—and now the buyers of the data have a map of a home that belongs to someone who didn’t consent, Mr. Gidari asked. How long is the data kept? If the house burns down, can the insurance company obtain the data and use it to identify possible causes? Can the police use it after a robbery?

EDITED TO ADD (6/29): Roomba is backtracking—for now.

Posted on July 26, 2017 at 6:06 AM

Companies Not Saving Your Data

There’s a new trend in Silicon Valley startups; companies are not collecting and saving data on their customers:

In Silicon Valley, there’s a new emphasis on putting up barriers to government requests for data. The Apple-FBI case and its aftermath have tech firms racing to employ a variety of tools that would place customer information beyond the reach of a government-ordered search.

The trend is a striking reversal of a long-standing article of faith in the data-hungry tech industry, where companies including Google and the latest start-ups have predicated success on the ability to hoover up as much information as possible about consumers.

Now, some large tech firms are increasingly offering services to consumers that rely far less on collecting data. The sea change is even becoming evident among early-stage companies that see holding so much data as more of a liability than an asset, given the risk that cybercriminals or government investigators might come knocking.

Start-ups that once hesitated to invest in security are now repurposing limited resources to build technical systems to shed data, even if it hinders immediate growth.

The article also talks about companies providing customers with end-to-end encryption.

I believe that all this data isn’t nearly as valuable as the big-data people are promising. Now that companies are recognizing that it is also a liability, I think we’re going to see more rational trade-offs about what to keep—and for how long—and what to discard.

Posted on May 25, 2016 at 2:37 PM

Observations on the Surveillance that Resulted in the Capture of Salah Abdeslam

Interesting analysis from The Grugq:

Bottom Line Up Front

Intelligence agencies must cooperate more rapidly and proactively to counter ISIS’ rapid and haphazard operational tempo.
Clandestine operatives must rely on support networks that include overt members of the public. These networks are easily mapped out based on metadata available to nation state level security forces.
Fugitives should learn to cook if they want to minimize their footprint and improve their security.
Exposure of clandestine networks is inevitable, given modern data sources. Only extremely disciplined non-organic organizations can hope to survive for long.

Details at the link.

That third item is related to the “unusually large” pizza order that alerted the police that there were more people in the house than should be.

The bottom bottom line is that tracking people, and tracing groups of people, has become easy because of all the unencrypted metadata we generate everywhere.
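To make the point about metadata concrete, here is a minimal sketch of mapping a support network from contact records alone. It assumes a made-up list of (caller, callee) pairs with no content at all; the function name and the data are hypothetical and are not taken from The Grugq's analysis.

```python
from collections import defaultdict, deque


def contacts_within(records, start, max_hops):
    """Map everyone reachable from `start` within `max_hops` contacts.

    records: iterable of (party_a, party_b) pairs, i.e. who contacted whom.
    Returns a dict mapping each reachable identifier to its hop distance.
    """
    graph = defaultdict(set)
    for a, b in records:
        graph[a].add(b)
        graph[b].add(a)

    # Breadth-first search outward from the known identifier.
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue
        for neighbor in graph[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)

    del hops[start]
    return hops


records = [("x", "y"), ("y", "z"), ("z", "w"), ("p", "q")]
print(contacts_within(records, "x", 2))
```

One known identifier plus unencrypted contact records is enough to enumerate the people around a target, which is why a clandestine network that depends on ordinary members of the public is so hard to hide.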

Posted on March 22, 2016 at 6:37 AM

