Entries Tagged "data collection"


Palantir's Surveillance Service for Law Enforcement

Motherboard got its hands on the user manual for Palantir’s Gotham, which police use to get information on people:

The Palantir user guide shows that police can start with almost no information about a person of interest and instantly know extremely intimate details about their lives. The capabilities are staggering, according to the guide:

  • If police have a name that’s associated with a license plate, they can use automatic license plate reader data to find out where they’ve been, and when they’ve been there. This can give a complete account of where someone has driven over any time period.
  • With a name, police can also find a person’s email address, phone numbers, current and previous addresses, bank accounts, social security number(s), business relationships, family relationships, and license information like height, weight, and eye color, as long as it’s in the agency’s database.
  • The software can map out a suspect’s family members and business associates, and, theoretically, find the above information about them, too.

All of this information is aggregated and synthesized in a way that gives law enforcement nearly omniscient knowledge over any suspect they decide to surveil.

Read the whole article—it has a lot of details. This seems like a commercial version of the NSA’s XKEYSCORE.

Boing Boing post.

Meanwhile:

The FBI wants to gather more information from social media. Today, it issued a call for contracts for a new social media monitoring tool. According to a request-for-proposals (RFP), it’s looking for an “early alerting tool” that would help it monitor terrorist groups, domestic threats, criminal activity and the like.

The tool would provide the FBI with access to the full social media profiles of persons-of-interest. That could include information like user IDs, emails, IP addresses and telephone numbers. The tool would also allow the FBI to track people based on location, enable persistent keyword monitoring and provide access to personal social media history. According to the RFP, “The mission-critical exploitation of social media will enable the Bureau to detect, disrupt, and investigate an ever growing diverse range of threats to U.S. National interests.”

Posted on July 15, 2019 at 6:12 AM

iPhone Apps Surreptitiously Communicated with Unknown Servers

Long news article (alternate source) on iPhone privacy, specifically the enormous amount of data your apps are collecting without your knowledge. A lot of this happens in the middle of the night, when you’re probably not otherwise using your phone:

IPhone apps I discovered tracking me by passing information to third parties, just while I was asleep, include Microsoft OneDrive, Intuit’s Mint, Nike, Spotify, The Washington Post and IBM’s the Weather Channel. One app, the crime-alert service Citizen, shared personally identifiable information in violation of its published privacy policy.

And your iPhone doesn’t only feed data trackers while you sleep. In a single week, I encountered over 5,400 trackers, mostly in apps, not including the incessant Yelp traffic.

Posted on June 25, 2019 at 6:35 AM

Data, Surveillance, and the AI Arms Race

According to foreign policy experts and the defense establishment, the United States is caught in an artificial intelligence arms race with China—one with serious implications for national security. The conventional version of this story suggests that the United States is at a disadvantage because of self-imposed restraints on the collection of data and the privacy of its citizens, while China, an unrestrained surveillance state, is at an advantage. In this vision, the data that China collects will be fed into its systems, leading to more powerful AI with capabilities we can only imagine today. Since Western countries can’t or won’t reap such a comprehensive harvest of data from their citizens, China will win the AI arms race and dominate the next century.

This idea makes for a compelling narrative, especially for those trying to justify surveillance—whether government- or corporate-run. But it ignores some fundamental realities about how AI works and how AI research is conducted.

Thanks to advances in machine learning, AI has flipped from theoretical to practical in recent years, and successes dominate public understanding of how it works. Machine learning systems can now diagnose pneumonia from X-rays, play the games of go and poker, and read human lips, all better than humans. They’re increasingly watching surveillance video. They are at the core of self-driving car technology and are playing roles in both intelligence-gathering and military operations. These systems monitor our networks to detect intrusions and look for spam and malware in our email.

And it’s true that there are differences in the way each country collects data. The United States pioneered “surveillance capitalism,” to use the Harvard University professor Shoshana Zuboff’s term, where data about the population is collected by hundreds of large and small companies for corporate advantage—and mutually shared or sold for profit. The state picks up on that data, in cases such as the Centers for Disease Control and Prevention’s use of Google search data to map epidemics and evidence shared by alleged criminals on Facebook, but it isn’t the primary user.

China, on the other hand, is far more centralized. Internet companies collect the same sort of data, but it is shared with the government, combined with government-collected data, and used for social control. Every Chinese citizen has a national ID number that is demanded by most services and allows data to easily be tied together. In the western region of Xinjiang, ubiquitous surveillance is used to oppress the Uighur ethnic minority—although at this point there is still a lot of human labor making it all work. Everyone expects that this is a test bed for the entire country.

Data is increasingly becoming a part of control for the Chinese government. While many of these plans are aspirational at the moment—there isn’t, as some have claimed, a single “social credit score,” but instead future plans to link up a wide variety of systems—data collection is universally pushed as essential to the future of Chinese AI. One executive at search firm Baidu predicted that the country’s connected population will provide it with the raw data necessary to become the world’s preeminent tech power. China’s official goal is to become the world AI leader by 2030, aided in part by all of this massive data collection and correlation.

This all sounds impressive, but turning massive databases into AI capabilities doesn’t match technological reality. Current machine learning techniques aren’t all that sophisticated. All modern AI systems follow the same basic methods. Using lots of computing power, different machine learning models are tried, altered, and tried again. These systems use a large amount of data (the training set) and an evaluation function to distinguish between those models and variations that work well and those that work less well. After trying a lot of models and variations, the system picks the one that works best. This iterative improvement continues even after the system has been fielded and is in use.
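The try-score-keep loop described above can be sketched in a few lines of Python. This is a deliberately toy illustration, with a one-parameter “model” and a made-up training set, not how any production system is built:

```python
import random

random.seed(0)

# Toy training set: points labeled 1 when x exceeds an unknown cutoff (0.6).
training_set = [(x, int(x > 0.6)) for x in (random.random() for _ in range(200))]

def evaluation_function(model, data):
    """Score a candidate model: fraction of examples classified correctly."""
    return sum(1 for x, label in data if model(x) == label) / len(data)

def make_model(threshold):
    """Each candidate 'model' here is just a threshold classifier."""
    return lambda x: int(x > threshold)

# Try many variations, score each with the evaluation function, keep the best.
best_threshold, best_score = None, -1.0
for threshold in (i / 100 for i in range(100)):
    score = evaluation_function(make_model(threshold), training_set)
    if score > best_score:
        best_threshold, best_score = threshold, score

print(best_threshold, best_score)
```

The search recovers a cutoff near the true 0.6; real systems do the same thing with vastly more parameters and compute, but the shape of the loop is the same.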

So, for example, a deep learning system trying to do facial recognition will have multiple layers (hence the notion of “deep”) trying to do different parts of the facial recognition task. One layer will try to find features in the raw data of a picture that will help find a face, such as changes in color that will indicate an edge. The next layer might try to combine these lower layers into features like shapes, looking for round shapes inside of ovals that indicate eyes on a face. The different layers will try different features and will be compared by the evaluation function until the one that is able to give the best results is found, in a process that is only slightly more refined than trial and error.
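The edge-finding “first layer” described above can be illustrated with a toy one-dimensional convolution. Real networks learn kernels like this from data; here one is hard-coded purely to show what a low-level feature can look like:

```python
# A row of pixel intensities: a dark region followed by a bright region.
row = [10, 10, 10, 200, 200, 200]

# Convolve with the kernel [-1, +1]: large responses mark intensity changes,
# which is exactly the "change in color that indicates an edge" in the text.
edge_response = [row[i + 1] - row[i] for i in range(len(row) - 1)]

print(edge_response)  # the single large value locates the edge
```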

Large data sets are essential to making this work, but that doesn’t mean that more data is automatically better or that the system with the most data is automatically the best system. Train a facial recognition algorithm on a set that contains only faces of white men, and the algorithm will have trouble with any other kind of face. Use an evaluation function that is based on historical decisions, and any past bias is learned by the algorithm. For example, mortgage loan algorithms trained on historic decisions of human loan officers have been found to implement redlining. Similarly, hiring algorithms trained on historical data manifest the same sexism as the human decisions they learned from. Scientists are constantly learning about how to train machine learning systems, and while throwing a large amount of data and computing power at the problem can work, more subtle techniques are often more successful. All data isn’t created equal, and for effective machine learning, data has to be both relevant and diverse in the right ways.

Future research advances in machine learning are focused on two areas. The first is in enhancing how these systems distinguish between variations of an algorithm. As different versions of an algorithm are run over the training data, there needs to be some way of deciding which version is “better.” These evaluation functions need to balance the recognition of an improvement with not over-fitting to the particular training data. Getting functions that can automatically and accurately distinguish between two algorithms based on minor differences in the outputs is an art form that no amount of data can improve.
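The over-fitting trap mentioned above can be made concrete with a toy example: a “model” that memorizes its training data looks perfect to an evaluation function scored only on that data, and useless on anything held out. A minimal sketch with invented data:

```python
import random

random.seed(1)

# Noisy linear data: y is roughly 2x.
xs = [i / 10 for i in range(40)]
data = [(x, 2 * x + random.gauss(0, 0.5)) for x in xs]
random.shuffle(data)
train, held_out = data[:30], data[30:]

def mse(predict, points):
    """Mean squared error of a predictor over (x, y) points."""
    return sum((predict(x) - y) ** 2 for x, y in points) / len(points)

def simple(x):
    return 2 * x            # close to the true underlying rule

memorized = dict(train)     # a "model" that just memorizes training points

def memorizer(x):
    return memorized.get(x, 0.0)

# Judged on training data alone, the memorizer looks unbeatable (error 0.0);
# judged on held-out data, the simple model wins decisively.
print(mse(memorizer, train), mse(simple, held_out), mse(memorizer, held_out))
```

This is why evaluation functions are scored on data the candidate models never saw during training; designing such functions well is the “art form” the paragraph refers to.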

The second is in the machine learning algorithms themselves. While much of machine learning depends on trying different variations of an algorithm on large amounts of data to see which is most successful, the initial formulation of the algorithm is still vitally important. The way the algorithms interact, the types of variations attempted, and the mechanisms used to test and redirect the algorithms are all areas of active research. (An overview of some of this work can be found here; even trying to limit the research to 20 papers oversimplifies the work being done in the field.) None of these problems can be solved by throwing more data at the problem.

The British AI company DeepMind’s success in teaching a computer to play the Chinese board game go is illustrative. Its AlphaGo computer program became a grandmaster in two steps. First, it was fed some enormous number of human-played games. Then, the game played itself an enormous number of times, improving its own play along the way. In 2016, AlphaGo beat the grandmaster Lee Sedol four games to one.

While the training data in this case, the human-played games, was valuable, even more important was the machine learning algorithm used and the function that evaluated the relative merits of different game positions. Just one year later, DeepMind was back with a follow-on system: AlphaZero. This go-playing computer dispensed entirely with the human-played games and just learned by playing against itself over and over again. It plays like an alien. (It also became a grandmaster in chess and shogi.)

These are abstract games, so it makes sense that a more abstract training process works well. But even something as visceral as facial recognition needs more than just a huge database of identified faces in order to work successfully. It needs the ability to separate a face from the background in a two-dimensional photo or video and to recognize the same face in spite of changes in angle, lighting, or shadows. Just adding more data may help, but not nearly as much as added research into what to do with the data once we have it.

Meanwhile, foreign-policy and defense experts are talking about AI as if it were the next nuclear arms race, with the country that figures it out best or first becoming the dominant superpower for the next century. But that didn’t happen with nuclear weapons, despite research only being conducted by governments and in secret. It certainly won’t happen with AI, no matter how much data different nations or companies scoop up.

It is true that China is investing a lot of money into artificial intelligence research: The Chinese government believes this will allow it to leapfrog other countries (and companies in those countries) and become a major force in this new and transformative area of computing—and it may be right. On the other hand, much of this seems to be a wasteful boondoggle. Slapping “AI” on pretty much anything is how to get funding. The Chinese Ministry of Education, for instance, promises to produce “50 world-class AI textbooks,” with no explanation of what that means.

In the democratic world, the government is neither the leading researcher nor the leading consumer of AI technologies. AI research is much more decentralized and academic, and it is conducted primarily in the public eye. Research teams keep their training data and models proprietary but freely publish their machine learning algorithms. If you wanted to work on machine learning right now, you could download Microsoft’s Cognitive Toolkit, Google’s TensorFlow, or Facebook’s PyTorch. These aren’t toy systems; these are the state-of-the-art machine learning platforms.

AI is not analogous to the big science projects of the previous century that brought us the atom bomb and the moon landing. AI is a science that can be conducted by many different groups with a variety of different resources, making it closer to computer design than the space race or nuclear competition. It doesn’t take a massive government-funded lab for AI research, nor the secrecy of the Manhattan Project. The research conducted in the open science literature will trump research done in secret because of the benefits of collaboration and the free exchange of ideas.

While the United States should certainly increase funding for AI research, it should continue to treat it as an open scientific endeavor. Surveillance is not justified by the needs of machine learning, and real progress in AI doesn’t need it.

This essay was written with Jim Waldo, and previously appeared in Foreign Policy.

Posted on June 17, 2019 at 5:52 AM

The Concept of "Return on Data"

This law review article by Noam Kolt, titled “Return on Data,” proposes an interesting new way of thinking of privacy law.

Abstract: Consumers routinely supply personal data to technology companies in exchange for services. Yet, the relationship between the utility (U) consumers gain and the data (D) they supply—”return on data” (ROD)—remains largely unexplored. Expressed as a ratio, ROD = U / D. While lawmakers strongly advocate protecting consumer privacy, they tend to overlook ROD. Are the benefits of the services enjoyed by consumers, such as social networking and predictive search, commensurate with the value of the data extracted from them? How can consumers compare competing data-for-services deals? Currently, the legal frameworks regulating these transactions, including privacy law, aim primarily to protect personal data. They treat data protection as a standalone issue, distinct from the benefits which consumers receive. This article suggests that privacy concerns should not be viewed in isolation, but as part of ROD. Just as companies can quantify return on investment (ROI) to optimize investment decisions, consumers should be able to assess ROD in order to better spend and invest personal data. Making data-for-services transactions more transparent will enable consumers to evaluate the merits of these deals, negotiate their terms and make more informed decisions. Pivoting from the privacy paradigm to ROD will both incentivize data-driven service providers to offer consumers higher ROD, as well as create opportunities for new market entrants.
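The ratio in the abstract is trivial to compute once U and D are quantified; the hard, open question is the quantification itself. A minimal sketch, with entirely hypothetical numbers:

```python
def return_on_data(utility, data_supplied):
    """ROD = U / D, per the abstract. Putting 'utility' and 'data supplied'
    on a common numeric scale is the unsolved part of the proposal."""
    return utility / data_supplied

# Two hypothetical deals: equal utility, different amounts of data demanded.
deal_a = return_on_data(utility=100.0, data_supplied=20.0)  # 5.0
deal_b = return_on_data(utility=100.0, data_supplied=50.0)  # 2.0

print(deal_a > deal_b)  # deal A gives the consumer a better return on data
```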

Posted on May 20, 2019 at 1:30 PM

How Political Campaigns Use Personal Data

Really interesting report from Tactical Tech.

Data-driven technologies are an inevitable feature of modern political campaigning. Some argue that they are a welcome addition to politics as normal and a necessary and modern approach to democratic processes; others say that they are corrosive and diminish trust in already flawed political systems. The use of these technologies in political campaigning is not going away; in fact, we can only expect their sophistication and prevalence to grow. For this reason, the techniques and methods need to be reviewed outside the dichotomy of ‘good’ or ‘bad’ and beyond the headlines of ‘disinformation campaigns’.

All the data-driven methods presented in this guide would not exist without the commercial digital marketing and advertising industry. From analysing behavioural data to A/B testing and from geotargeting to psychometric profiling, political parties are using the same techniques to sell political candidates to voters that companies use to sell shoes to consumers. The question is, is that appropriate? And what impact does it have not only on individual voters, who may or may not be persuaded, but on the political environment as a whole?

The practice of political strategists selling candidates as brands is not new. Vance Packard wrote about the ‘depth probing’ techniques of ‘political persuaders’ as early as 1957. In his book, ‘The Hidden Persuaders’, Packard described political strategies designed to sell candidates to voters ‘like toothpaste’, and how public relations directors at the time boasted that ‘scientific methods take the guesswork out of politics’. In this sense, what we have now is a logical progression of the digitisation of marketing techniques and political persuasion techniques.

Posted on April 3, 2019 at 6:26 AM

Judging Facebook's Privacy Shift

Facebook is making a new and stronger commitment to privacy. Last month, the company hired three of its most vociferous critics and installed them in senior technical positions. And on Wednesday, Mark Zuckerberg wrote that the company will pivot to focus on private conversations over the public sharing that has long defined the platform, even while conceding that “frankly we don’t currently have a strong reputation for building privacy protective services.”

There is ample reason to question Zuckerberg’s pronouncement: The company has made—and broken—many privacy promises over the years. And if you read his 3,000-word post carefully, Zuckerberg says nothing about changing Facebook’s surveillance capitalism business model. All the post discusses is making private chats more central to the company, which seems to be a play for increased market dominance and to counter the Chinese company WeChat.

In security and privacy, the devil is always in the details—and Zuckerberg’s post provides none. But we’ll take him at his word and try to fill in some of the details here. What follows is a list of changes we should expect if Facebook is serious about changing its business model and improving user privacy.

How Facebook treats people on its platform

Increased transparency over advertiser and app accesses to user data. Today, Facebook users can download and view much of the data the company has about them. This is important, but it doesn’t go far enough. The company could be more transparent about what data it shares with advertisers and others and how it allows advertisers to select users they show ads to. Facebook could use its substantial skills in usability testing to help people understand the mechanisms advertisers use to show them ads or the reasoning behind what it chooses to show in user timelines. It could deliver on promises in this area.

Better—and more usable—privacy options. Facebook users have limited control over how their data is shared with other Facebook users and almost no control over how it is shared with Facebook’s advertisers, which are the company’s real customers. Moreover, the controls are buried deep behind complex and confusing menu options. To be fair, some of this is because privacy is complex, and it’s hard to understand the results of different options. But much of this is deliberate; Facebook doesn’t want its users to make their data private from other users.

The company could give people better control over how—and whether—their data is used, shared, and sold. For example, it could allow users to turn off individually targeted news and advertising. By this, we don’t mean simply making those advertisements invisible; we mean turning off the data flows into those tailoring systems. Finally, since most users stick to the default options when it comes to configuring their apps, a changing Facebook could tilt those defaults toward more privacy, requiring less tailoring most of the time.

More user protection from stalking. “Facebook stalking” is often thought of as “stalking light,” or “harmless.” But stalkers are rarely harmless. Facebook should acknowledge this class of misuse and work with experts to build tools that protect all of its users, especially its most vulnerable ones. Such tools should guide normal people away from creepiness and give victims power and flexibility to enlist aid from sources ranging from advocates to police.

Fully ending real-name enforcement. Facebook’s real-names policy, requiring people to use their actual legal names on the platform, hurts people such as activists, victims of intimate partner violence, police officers whose work makes them targets, and anyone with a public persona who wishes to have control over how they identify to the public. There are many ways Facebook can improve on this, from ending enforcement to allowing verified pseudonyms for everyone—not just celebrities like Lady Gaga. Doing so would mark a clear shift.

How Facebook runs its platform

Increased transparency of Facebook’s business practices. One of the hard things about evaluating Facebook is the effort needed to get good information about its business practices. When violations are exposed by the media, as they regularly are, we are all surprised at the different ways Facebook violates user privacy. Most recently, the company used phone numbers provided for two-factor authentication for advertising and networking purposes. Facebook needs to be both explicit and detailed about how and when it shares user data. In fact, a move from discussing “sharing” to discussing “transfers,” “access to raw information,” and “access to derived information” would be a visible improvement.

Increased transparency regarding censorship rules. Facebook makes choices about what content is acceptable on its site. Those choices are controversial, implemented by thousands of low-paid workers quickly implementing unclear rules. These are tremendously hard problems without clear solutions. Even obvious rules like banning hateful words run into challenges when people try to legitimately discuss certain important topics. Whatever Facebook does in this regard, the company needs be more transparent about its processes. It should allow regulators and the public to audit the company’s practices. Moreover, Facebook should share any innovative engineering solutions with the world, much as it currently shares its data center engineering.

Better security for collected user data. There have been numerous examples of attackers targeting cloud service platforms to gain access to user data. Facebook has a large and skilled product security team that says some of the right things. That team needs to be involved in the design trade-offs for features and not just review the near-final designs for flaws. Shutting down a feature based on internal security analysis would be a clear message.

Better data security so Facebook sees less. Facebook eavesdrops on almost every aspect of its users’ lives. On the other hand, WhatsApp—purchased by Facebook in 2014—provides users with end-to-end encrypted messaging. While Facebook knows who is messaging whom and how often, Facebook has no way of learning the contents of those messages. Recently, Facebook announced plans to combine WhatsApp, Facebook Messenger, and Instagram, extending WhatsApp’s security to the consolidated system. Changing course here would be a dramatic and negative signal.

Collecting less data from outside of Facebook. Facebook doesn’t just collect data about you when you’re on the platform. Because its “like” button is on so many other pages, the company can collect data about you when you’re not on Facebook. It even collects what it calls “shadow profiles”—data about you even if you’re not a Facebook user. This data is combined with other surveillance data the company buys, including health and financial data. Collecting and saving less of this data would be a strong indicator of a new direction for the company.

Better use of Facebook data to prevent violence. There is a trade-off between Facebook seeing less and Facebook doing more to prevent hateful and inflammatory speech. Dozens of people have been killed by mob violence because of fake news spread on WhatsApp. If Facebook were doing a convincing job of controlling fake news without end-to-end encryption, then we would expect to hear how it could use patterns in metadata to handle encrypted fake news.

How Facebook manages for privacy

Create a team measured on privacy and trust. Where companies spend their money tells you what matters to them. Facebook has a large and important growth team, but what team, if any, is responsible for privacy, not as a matter of compliance or pushing the rules, but for engineering? Transparency in how it is staffed relative to other teams would be telling.

Hire a senior executive responsible for trust. Facebook’s current team has been focused on growth and revenue. Its one chief security officer, Alex Stamos, was not replaced when he left in 2018, which may indicate that having an advocate for security on the leadership team led to debate and disagreement. Retaining a voice for security and privacy issues at the executive level, before those issues affected users, was a good thing. Now that responsibility is diffuse. It’s unclear how Facebook measures and assesses its own progress and who might be held accountable for failings. Facebook can begin the process of fixing this by designating a senior executive who is responsible for trust.

Engage with regulators. Much of Facebook’s posturing seems to be an attempt to forestall regulation. Facebook sends lobbyists to Washington and other capitals, and until recently the company sent support staff to politicians’ offices. It has secret lobbying campaigns against privacy laws. And Facebook has repeatedly violated a 2011 Federal Trade Commission consent order regarding user privacy. Regulating big technical projects is not easy. Most of the people who understand how these systems work understand them because they build them. Societies will regulate Facebook, and the quality of that regulation requires real education of legislators and their staffs. While businesses often want to avoid regulation, any focus on privacy will require strong government oversight. If Facebook is serious about privacy being a real interest, it will accept both government regulation and community input.

User privacy is traditionally against Facebook’s core business interests. Advertising is its business model, and targeted ads sell better and more profitably—and that requires users to engage with the platform as much as possible. Increased pressure on Facebook to manage propaganda and hate speech could easily lead to more surveillance. But there is pressure in the other direction as well, as users equate privacy with increased control over how they present themselves on the platform.

We don’t expect Facebook to abandon its advertising business model, relent in its push for monopolistic dominance, or fundamentally alter its social networking platforms. But the company can give users important privacy protections and controls without abandoning surveillance capitalism. While some of these changes will reduce profits in the short term, we hope Facebook’s leadership realizes that they are in the best long-term interest of the company.

Facebook talks about community and bringing people together. These are admirable goals, and there’s plenty of value (and profit) in having a sustainable platform for connecting people. But as long as the most important measure of success is short-term profit, doing things that help strengthen communities will fall by the wayside. Surveillance, which allows individually targeted advertising, will be prioritized over user privacy. Outrage, which drives engagement, will be prioritized over feelings of belonging. And corporate secrecy, which allows Facebook to evade both regulators and its users, will be prioritized over societal oversight. If Facebook now truly believes that these latter options are critical to its long-term success as a company, we welcome the changes that are forthcoming.

This essay was co-authored with Adam Shostack, and originally appeared on Medium OneZero. We wrote a similar essay in 2002 about judging Microsoft’s then newfound commitment to security.

Posted on March 13, 2019 at 6:51 AM

Your Personal Data is Already Stolen

In an excellent blog post, Brian Krebs makes clear something I have been saying for a while:

Likewise for individuals, it pays to accept two unfortunate and harsh realities:

Reality #1: Bad guys already have access to personal data points that you may believe should be secret but which nevertheless aren’t, including your credit card information, Social Security number, mother’s maiden name, date of birth, address, previous addresses, phone number, and yes, even your credit file.

Reality #2: Any data point you share with a company will in all likelihood eventually be hacked, lost, leaked, stolen or sold, usually through no fault of your own. And if you’re an American, it means (at least for the time being) your recourse to do anything about that when it does happen is limited or nil.

[…]

Once you’ve owned both of these realities, you realize that expecting another company to safeguard your security is a fool’s errand, and that it makes far more sense to focus instead on doing everything you can to proactively prevent identity thieves, malicious hackers or other ne’er-do-wells from abusing access to said data.

His advice is good.

Posted on December 6, 2018 at 7:33 AM

Privacy and Security of Data at Universities

Interesting paper: “Open Data, Grey Data, and Stewardship: Universities at the Privacy Frontier,” by Christine Borgman:

Abstract: As universities recognize the inherent value in the data they collect and hold, they encounter unforeseen challenges in stewarding those data in ways that balance accountability, transparency, and protection of privacy, academic freedom, and intellectual property. Two parallel developments in academic data collection are converging: (1) open access requirements, whereby researchers must provide access to their data as a condition of obtaining grant funding or publishing results in journals; and (2) the vast accumulation of “grey data” about individuals in their daily activities of research, teaching, learning, services, and administration. The boundaries between research and grey data are blurring, making it more difficult to assess the risks and responsibilities associated with any data collection. Many sets of data, both research and grey, fall outside privacy regulations such as HIPAA, FERPA, and PII. Universities are exploiting these data for research, learning analytics, faculty evaluation, strategic decisions, and other sensitive matters. Commercial entities are besieging universities with requests for access to data or for partnerships to mine them. The privacy frontier facing research universities spans open access practices, uses and misuses of data, public records requests, cyber risk, and curating data for privacy protection. This Article explores the competing values inherent in data stewardship and makes recommendations for practice by drawing on the pioneering work of the University of California in privacy and information security, data governance, and cyber risk.

Posted on November 9, 2018 at 6:04 AM

Are the Police Using Smart-Home IoT Devices to Spy on People?

IoT devices are surveillance devices, and manufacturers generally use them to collect data on their customers. Surveillance is still the business model of the Internet, and this data is used against the customers’ interests: either by the device manufacturer or by some third party the manufacturer sells the data to. Of course, this data can be used by the police as well; the purpose depends on the country.

None of this is new, and much of it was discussed in my book Data and Goliath. What is common is for Internet companies to publish “transparency reports” that give at least general information about how police are using that data. IoT companies don’t publish those reports.

TechCrunch asked a bunch of companies about this, and basically found that no one is talking.

Boing Boing post.

Posted on October 22, 2018 at 8:13 AM
