Paradoxes of Big Data

Interesting paper: "Three Paradoxes of Big Data," by Neil M. Richards and Jonathan H. King, Stanford Law Review Online, 2013.

Abstract: Big data is all the rage. Its proponents tout the use of sophisticated analytics to mine large data sets for insight as the solution to many of our society's problems. These big data evangelists insist that data-driven decisionmaking can now give us better predictions in areas ranging from college admissions to dating to hiring to medicine to national security and crime prevention. But much of the rhetoric of big data contains no meaningful analysis of its potential perils, only the promise. We don't deny that big data holds substantial potential for the future, and that large dataset analysis has important uses today. But we would like to sound a cautionary note and pause to consider big data's potential more critically. In particular, we want to highlight three paradoxes in the current rhetoric about big data to help move us toward a more complete understanding of the big data picture. First, while big data pervasively collects all manner of private information, the operations of big data itself are almost entirely shrouded in legal and commercial secrecy. We call this the Transparency Paradox. Second, though big data evangelists talk in terms of miraculous outcomes, this rhetoric ignores the fact that big data seeks to identify at the expense of individual and collective identity. We call this the Identity Paradox. And third, the rhetoric of big data is characterized by its power to transform society, but big data has power effects of its own, which privilege large government and corporate entities at the expense of ordinary individuals. We call this the Power Paradox. Recognizing the paradoxes of big data, which show its perils alongside its potential, will help us to better understand this revolution. It may also allow us to craft solutions to produce a revolution that will be as good as its evangelists predict.

EDITED TO ADD (10/11): Here's an HTML version of the paper.

Posted on September 26, 2013 at 6:58 AM • 47 Comments

Comments

Ben • September 26, 2013 7:58 AM

Dangers of big data. See

IBM And the Holocaust



Only after Jews were identified -- a massive and complex task that Hitler wanted done immediately -- could they be targeted for efficient asset confiscation, ghettoization, deportation, enslaved labor, and, ultimately, annihilation. It was a cross-tabulation and organizational challenge so monumental, it called for a computer. Of course, in the 1930s no computer existed.

But IBM's Hollerith punch card technology did exist. Aided by the company's custom-designed and constantly updated Hollerith systems, Hitler was able to automate his persecution of the Jews. Historians have always been amazed at the speed and accuracy with which the Nazis were able to identify and locate European Jewry. Until now, the pieces of this puzzle have never been fully assembled.

(...)

Edwin Black has now uncovered one of the last great mysteries of Germany's war against the Jews -- how did Hitler get the names?

Z.Lozinski • September 26, 2013 9:41 AM

Given both the topic of the paper and most of the recent discussion on Bruce's blog, this error when I tried to download the full paper was... ironic:

SSRN Alert

SSRN's Data Integrity System has observed an unusual download pattern
either from this computer's IP address or for this paper.

As part of SSRN's commitment to quality data, SSRN's Manager of Data Integrity investigates unusual download patterns to minimize system problems and identify attempts to corrupt or manipulate download statistics.

This notice can occur for a variety of legitimate reasons, including a user accessing SSRN through a proxy server or having problems downloading a paper.

You can avoid this Data Integrity message by allowing cookies or by signing in to SSRN HQ.

SoThatsHowItEnds • September 26, 2013 9:48 AM

Behind the collapse of the U.S.S.R. were a number of reasons, part of which was the inability to make use of information. Data about people, places, events, and other considerations was essentially kept in the minds of the overlords on...shudder...paper. There were mounds of paper, rooms of paper, buildings full of paper. It was all useless, labor-intensive, inefficient, and could not supply what was needed - the ability to use that information (reference unavailable, from a book I read many years ago following the collapse).

But, the objective of all that paperwork at that time was the same as in the paper by Richards & King, primarily the Power Paradox. The transparency and identity didn't matter, the power put the fear in the hearts of man (then the transparency never mattered, there was little concern about identity of who was detained, sent to the gulag, or killed).

Today, we are faced with that same agenda, only now it seems more realistic in light of seemingly capable electronic data collection, storage, and analysis. But, is it?

There is a cry across the land about getting more people connected to the network (worldwide). They want the data!

There is a cry across the land about bandwidth being insufficient, which I believe is because in many locations, it can't handle the load of the data.

Today, there are acres of buildings being used to store the data, yet if this persists, that is not enough. Soon, I expect the ability to store the data will reach the same pinnacle, not enough space (including excessive costs and the inability to process it).

If the storage is potentially out of reach, then what of the analysis? Good grief, we are at the same point as all that paper in the U.S.S.R. It may be unusable.

This will push the limits of technology to acquire bigger, faster, and lower cost capabilities for the data collection, the storage, and the processing of all this data. And when a technology change comes along to make the next step forward, what becomes of the "old" data? Is there time and space to bring it forward to the newer technology or does it just decay as those old piles of paper did? Witness the IRS using ages-old technology not being able to keep up with its task. And, there is the push to keep such new technologies out of the hands of the common man so they are unable to "watch the watchers".

In all likelihood, we will simply run out of money to support this (gee, the same as in the collapsed U.S.S.R.) or just run out of reason for maintaining this questionable effort. Perhaps yet even in this century.

Curious • September 26, 2013 10:03 AM

I don't like this paper one bit. I will have to finish reading it all and then come back to comment on it at some point.

Isn't It Ironic • September 26, 2013 10:57 AM

A paradox is a statement that apparently contradicts itself and yet might be true. Wikipedia, Paradox

Paradoxes listed in the paper:

First, while big data pervasively collects all manner of private information, the operations of big data itself are almost entirely shrouded in legal and commercial secrecy. We call this the Transparency Paradox.

This doesn't sound so much paradoxical as possibly hypocritical.

Second, though big data evangelists talk in terms of miraculous outcomes, this rhetoric ignores the fact that big data seeks to identify at the expense of individual and collective identity. We call this the Identity Paradox.

I don't see a contradiction here at all: big data says there will be miraculous outcomes, but... it also seeks to identify? Those aren't opposites. Perhaps the latter is 'bad', and to some, not worth the former, but it's not a paradox.

And third, the rhetoric of big data is characterized by its power to transform society, but big data has power effects of its own, which privilege large government and corporate entities at the expense of ordinary individuals. We call this the Power Paradox.

Again, not a contradiction. 'Big data' is powerful, perhaps too powerful, but there's no paradox here.

Based on only the abstract, it seems like the author is trying to sound clever by using the word 'paradox'. I think an actual paradox would be along the lines of, "Big data evangelists promise that advanced analytic techniques over big datasets will lead to better business decisions, yet we see companies unable to take decisive actions because there's always more data to analyze and more ways to cut the data that they're never satisfied." The author also ascribes motivations and points of view to 'big data' without defining what 'big data' is. Is 'big data' the companies that process and analyze large amounts of data? Is 'big data' the companies that sell products and services that allow others to analyze large amounts of data? Or is 'big data' the actual large amounts of data?

Daniel • September 26, 2013 11:03 AM

There has been interesting discussion in the past few days on The Volokh Conspiracy about the role of complexity in government which has been given the truly horrible name of "kludgeocracy".

http://www.volokh.com/2013/09/25/...

One cogent point that has arisen in that discussion is that complex solutions to complex problems only seem to promote more complexity. This could be added to the list in the article and we shall call it The Complexity Paradox. To be clear, the complexity paradox goes beyond the claim that complex solutions create new or different or more problems than initially existed to stress that these new or different problems are more complex than before. Complexity seems to feed upon itself in a one-way ratchet whose only goal is more complexity and whose only reward is more complexity.

David Leppik • September 26, 2013 11:44 AM

Then there's the biggest problem of big data: AI just isn't as good as a human at finding patterns, if for no other reason than the fact that humans understand the context of the data.

I worked for a big-data-type company during the dot-com boom. Even with a big company's entire sales database, if we had enough data for a "discovery" to be statistically relevant, the pattern was already glaringly obvious to human experts. You simply don't get big insights from big data. At least not automatically. Your only hope is for a human to look at the data in a context that wasn't previously worth doing. And that's mainly done by academic researchers.

What big data does allow is mass personalization: doing a level of analysis or tracking that you can't afford to hire a person to do. Movie recommendations or personalized advertising. But the whole suck-in-everything-and-hope-something-is-valuable mentality is just a way to generate more noise than signal.

Muddy Road • September 26, 2013 11:45 AM

Whether it's paradoxical or hypocritical, it still makes me feel targeted and victimized.

Also, when the corporations and government gain power over the individual it's tyrannical and totalitarian.

Also, they get to decide whether the data collection is legal or illegal, moral or immoral. Clearly when the corporate/government regime takes data it's legal, by their standards, but if we the people take data it's a 20 year felony, maybe treason.

Frankly, what is being described is fascist police state tyranny in my way of thinking.

manwithnonym • September 26, 2013 11:46 AM

Bruce, everybody, there is something seriously screwed up going on with this website. It's not just "Reader" noticing weird shit going on, I cannot seem to access this blog from some kinds of anonymizers (time outs). It's been this way for probably at least a few weeks. Please investigate these strange goings-on if you can because I don't have the technical ability to do so myself.

Curious • September 26, 2013 12:06 PM

It looks like "Isn't It Ironic" above touched on the points I wanted to make. About what a given paradox might be and how things stop being perceived as being paradoxical.

I don't know the authors, and if I did I would probably have had a favourable bias in reading their text, but I want to point out that it is probably unfortunate that the authors actually point out how their text is supposed to be an essay. Somehow, mixing in footnotes and leaving me puzzled by their motivations for writing this makes it end up looking like an opinion piece, oddly enough.

I suspect "Big data" here pretty much involves everything to do with how the internet works and how personal data is gathered, and with an emphasis on how companies and governments play a role in shaping things.

I am left with an impression that the authors aren't inclined to criticize the corporations and authorities as being "big data" entrepreneurs, and instead perhaps are inclined to be more pragmatic about it all.

paul • September 26, 2013 12:23 PM

It seems anecdotally (!) that it's easier to use Big Data in ways that shift problems to someone else (insurance exclusions, no-fly lists, credit redlining...) than to use it in ways that fix a problem. Especially to shift the problem to someone who doesn't have the same data access and analysis tools.

Of course, we don't have enough Big Data about the uses of Big Data to really say what's going on. (Oh, and I think the transparency paradox is more than hypocrisy -- if everyone from whom the Big Data is being collected were aware of all the collection techniques, it's quite plausible that much of the data collected would become unreliable.)

privoxy • September 26, 2013 1:01 PM

Bruce, everybody, there is something seriously screwed up going on with this website. It's not just "Reader" noticing weird shit going on, I cannot seem to access this blog from some kinds of anonymizers (time outs). It's been this way for probably at least a few weeks.

+1. Can't hit it from behind privoxy.

Wanda Fish • September 26, 2013 1:02 PM

@manwithnonym, @privoxy I've had the same problems connecting using Tor. Changing identity in Tor eventually allows a connection to this site.

kashmarek • September 26, 2013 1:18 PM

@paul:

It depends on who uses the "big data" and how it is used. If it is being used for marketing, they only want to find targets that are easy to market to (some call those targets "prey"). However, I am inclined to think that the only part of the target they are after is your email or mail address. Thinking about what they should do is too much effort (and expense) on their part. Use of this data should (yeah, that's it, should) be using the data to identify those interested in a product with a decent likelihood of making a purchase.

On the other hand, if the user of "big data" is only after targets looking for "perps" or "un-subs" (potential perpetrators of crime or un-substantiated suspects of known criminal behavior), their use is biased to only looking for the bad stuff (and as a side line, ignoring, hiding, or deliberately discarding anything found that might free such targets).

Notice what is missing...any attempt to use "big data" to identify and "fix" known types of problems or problematic behaviour, such as who is dropping malware on computers, who is using such malware to illegally vacuum financial data from internet users, who is pushing pornography (or using it), illicit trades (such as slavery, controlled substances, pharmaceuticals, auction selling of stolen goods, money laundering, etc.).

Jason Richardson-White • September 26, 2013 1:42 PM

On LinkedIn recently, I suggested that we need a concept of "data neutrality" parallel in crucial respects to that of "network neutrality". Just as society has an interest in requiring owners of big internet pipes not to sell speed to the highest bidder (thereby creating a permanently disadvantaged class of the slow, with feedback effects), so society has an interest in requiring owners of "large" datasets not to sell information giving the buyer privileges over those who cannot afford the same. Information asymmetry is at the root of most market failures, after all.

It could even be done from inside the free market. Suppose someone adopted the following business model: Internet Consumer, I will *pay* you to let me have access to your private emails, if you will willingly accept the implication that I will be selling your information to those who will attempt to impose their will on you in various ways -- in particular, to buy stuff. The idea here is to segment the market, causing a gap to grow between those who care about their privacy (and would therefore never be willing to sell it) and those who don't mind the risks and would opt to be paid for the privilege.

The more general point is that it is *not* actually written in the stars somehow that, just because Google, Facebook, et al, can accumulate or have accumulated almost arbitrarily accurate, precise, and complete profiles on a given person, they therefore should be allowed to. The way that we architect society is still open to us, if we have the will to change it.

paul • September 26, 2013 1:55 PM

Jason Richardson-White:

That business model is already happening, just in a slightly covert and therefore inefficient manner. Google pays us by offering email, storage, search results and so forth. Facebook pays us by allowing us, mostly, to stay in touch with friends and acquaintances. It's not a hard dollar amount, which precludes people from making thoughtful calculations about the value exchange, but it is an exchange of some kind of good/service for our personal information.

Sure, it might be better if that exchange were more transparent, but then the personal information would go all Heisenberg on them.

Someone from Bulgaria • September 26, 2013 1:57 PM

@SoThatsHowItEnds

As someone that was born and lived almost all his life in a (former) communist bloc country, I'd disagree.

I don't think that the capability to handle information is among the (major) reasons for the collapse. The reasons were mostly economic (it was severely lagging behind, more and more dependent on western technology, industry very ineffective, populations starting to age, exports diminishing).

However, the regime did very well spying on everyone, collecting and analyzing data, in some aspects better than what today's agencies do. Just the conditions were radically different. The Darzhavna Sigurnost (State Security) basically had 1/4 or even more of the population working for them. It was the social model, since it was a communist, not a capitalist society, whatever you do, you'd get the same economic reward. But then, there were benefits and those benefits were accessible by collaborating with the DS reporting on colleagues, friends, even relatives, so people were motivated to help the state apparatus gather data. This was eased furthermore by the fact that most of the population lived (and still lives) in big commieblocks with poor noise isolation so you could basically hear what your neighbours talk loudly. People were also much more sociable (side-effect of the communism that was encouraged by the state).

Centralization is where today's 1984 fails. You see, in COMECON, this data was not centralized and there was no need to process huge amounts of data (paperwork); the state security had a huge apparatus and presence in basically all cities, towns, villages. Centrally it dealt with "important" cases only.

It's not that they did not have a huge central archive full of files. It is a curse BTW as to this very day, it is effectively used to hold say politicians under control. That's why I'm afraid that even if you succeed to overturn that growing totalitarianism, all that collected data will continue to be a curse that will haunt you for decades after that. But I wish you luck. From a historical perspective, we failed. Hungary failed, Romania failed. Czech Republic and Germany did much better, yet even they are haunted by that.

Jason Richardson-White • September 26, 2013 2:07 PM

Paul,

I guess that is the point that I'm making. A person might actually have success by *selling* the transparency of their data collection & selling process, of what they're actually signing people up for. It would be a kind of tax on recklessness, though voluntary.

The question, then, isn't whether someone is already incentivizing risky internet behavior; as you pointed out rightly, they are. The question is whether someone could do it better, driving a more visible wedge between those who care about privacy and those who don't.

Jason Richardson-White • September 26, 2013 2:12 PM

Uh, that was wrong, as I read it. Paying people to accept privacy risks is not "taxing" risky behavior, it's incentivizing it. Sorry. I was mentally comparing the situation with the lottery, which has a similar structure but with different valence.

Anyway, I think that the point is clear enough.

i'll take your coat -thank you- you're welcome • September 26, 2013 2:25 PM

"Secret" 3G Intel Chip Gives Snoops Backdoor PC Access

vPro processors allow remote access even when computer is turned off

Paul Joseph Watson | Infowars.com | September 26, 2013

http://www.infowars.com/91497/

Intel Core vPro processors contain a "secret" 3G chip that allows remote disabling and backdoor access to any computer even when it is turned off.

Although the technology has actually been around for a while, the attendant privacy concerns are only just being aired. The "secret" 3G chip that Intel added to its processors in 2011 caused little consternation until the NSA spying issue exploded earlier this year as a result of Edward Snowden's revelations.

In a promotional video for the technology, Intel brags that the chips actually offer enhanced security because they don't require computers to be "powered on" and allow problems to be fixed remotely. The promo also highlights the ability for an administrator to shut down PCs remotely "even if the PC is not connected to the network," as well as the ability to bypass hard drive encryption.

"Intel actually embedded the 3G radio chip in order to enable its Anti Theft 3.0 technology. And since that technology is found on every Core i3/i5/i7 CPU after Sandy Bridge, that means a lot of CPUs, not just new vPro, might have a secret 3G connection nobody knew about until now," reports Softpedia.

Jeff Marek, director of business client engineering for Intel, acknowledged that the company's "Sandy Bridge" microprocessor, which was released in 2011, had "the ability to remotely kill and restore a lost or stolen PC via 3G."

"Core vPro processors contain a second physical processor embedded within the main processor which has its own operating system embedded on the chip itself," writes Jim Stone. "As long as the power supply is available and in working condition, it can be woken up by the Core vPro processor, which runs on the system's phantom power and is able to quietly turn individual hardware components on and access anything on them."

Although the technology is being promoted as a convenient way for IT experts to troubleshoot PC issues remotely, it also allows hackers or NSA snoops to view the entire contents of somebody's hard drive, even when the power is off and the computer is not connected to a wi-fi network.

It also allows third parties to remotely disable any computer via the "secret" 3G chip that is built into Intel's Sandy Bridge processors. Webcams could also be remotely accessed.

"This combination of hardware from Intel enables vPro access ports which operate independently of normal user operations," reports TG Daily. "These include out-of-band communications (communications that exist outside of the scope of anything the machine might be doing through an OS or hypervisor), monitoring and altering of incoming and outgoing network traffic. In short, it operates covertly and snoops and potentially manipulates data."

Not only does this represent a privacy nightmare, it also dramatically increases the risk of industrial espionage.

The ability for third parties to have remote 3G access to PCs would also allow unwanted content to be placed on somebody's hard drive, making it easier for intelligence agencies and corrupt law enforcement bodies to frame people.

"The bottom line? The Core vPro processor is the end of any pretend privacy," writes Stone. "If you think encryption, Norton, or anything else is going to ensure your privacy, including never hooking up to the web at all, think again. There is now more than just a ghost in the machine."

Facebook @ https://www.facebook.com/paul.j.watson.71
FOLLOW Paul Joseph Watson @ https://twitter.com/PrisonPlanet
---------------------------------------------
http://www.intel.com/content/www/us/en/...
http://news.softpedia.com/news/...
http://www.popularresistance.org/...
http://www.tgdaily.com/hardware-opinion/...
http://infowars.com/
http://prisonplanet.com/

Avaya • September 26, 2013 3:56 PM

How is this for remote access eh I mean product support...(about the new Kindle Fire from Amazon)...

Kindle Fire HDX also introduces the revolutionary new "Mayday" button. With a single tap, an Amazon expert will appear on your Fire HDX and can co-pilot you through any feature by drawing on your screen, walking you through how to do something yourself, or doing it for you—whatever works best. Mayday is available 24x7, 365 days a year, and it's free.

CallMeLateForSupper • September 26, 2013 4:21 PM

@ i'll take your coat -thank you- you're welcome
Post quoted someone named Stone:
"If you think encryption, Norton, or anything else is going to ensure your privacy, including never hooking up to the web at all, think again. There is now more than just a ghost in the machine."

So, my internet-facing, wifi-less, Pentium-D 'puter, which has a mechanical power switch, is immune to the 3G chip nastiness? Um-hum. Then prob'y the air-gapped 80287 system next to it is immune too. Thanks for clearing that up.

Good luck with your whiz-bang Core i-whatevers.

name.withheld.for.obvious.reasons • September 26, 2013 4:27 PM

@Jason Richardson-White

Well said, Jason. I believe you have stated and framed the issue of privacy in the Internet age. Oh how I long for the simpler days of gopher, WAIS, and Archie. That's just me waxing nostalgic for when silicon dies could be predictable. These kinds of statements need to be more obvious in the public's mind. It is the perception of the public that will make or break the civil contract that is the tenuous relationship between the individual, commerce, and the government.

I appreciate your sentiment.

Carpe • September 26, 2013 5:10 PM

Big data is largely a smoke screen. All big data means is lots of information, data, in storage. There are tons of problems and issues surrounding this, and though they may be related I think they are not as intertwined as some may think. So while yes, RAID is quickly becoming hard to handle with large amounts of data, ZFS and maybe HAMMER/BTRFS are stepping in. (Hadoop etc. too.) But that's just storage. The bottom line comes down to gathering actually useful and correct data from the data. It's the same problem with genomics these days. Suddenly it no longer costs 1mil to get a whole genome, only 1-4k, but if you actually want meaningful data out of all those A's, T's, C's, and G's, it's all about the software that makes that data meaningful.

When it comes to surveillance, we need to stop pretending like they are using big data as some big prediction engine. They aren't. It's used as a device of target and control. If it ever becomes a working system like the one we pretend it is, everything will be lost by then. You will know it is "working" when pre-crime becomes reality.

MingoV • September 26, 2013 5:52 PM

If you want to turn big data into information, you have to ask a question that is likely to be answerable by the data. You then can use tools such as neural network-based models to see what components of the data give the best answer. But, to train such a model, you need to already know the answer for a subset of the population. So, if your question is "who is likely to be a terrorist?", you cannot get anywhere unless you already have data on a large sample of known terrorists. If you lack this information, then your massive pile of data is worthless for answering that question.

The NSA, for example, has exceedingly big data that is worthless for prospective analyses. Its only value is retrospective: after the bombs go off and the terrorist is caught, the NSA can review all its data and learn that various government agencies over the past six months missed half a dozen opportunities to identify the terrorist before the act. This retrospective review will be of no value in future terrorist attacks.
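The dependence on known answers can be sketched in a few lines. This is a nearest-centroid classifier rather than a neural network, swapped in only because it fits in a small example; the function names, the "target" and "other" labels, and the toy feature vectors are all hypothetical.

```python
from statistics import mean


def train(samples, labels, required=("target", "other")):
    """Fit one centroid per class; refuse to train if a class has no labeled examples."""
    missing = [name for name in required if name not in labels]
    if missing:
        # Without known examples of a class there is nothing to learn from.
        raise ValueError(f"no labeled examples for: {missing}")
    by_class = {}
    for features, label in zip(samples, labels):
        by_class.setdefault(label, []).append(features)
    # Each centroid is the per-feature mean of that class's labeled examples.
    return {label: [mean(col) for col in zip(*rows)] for label, rows in by_class.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: dist2(centroids[label]))
```

With a handful of labeled rows for both classes, `predict` can assign new rows to one class or the other. With only "other" rows, `train` raises an error, which is the situation described above: a large pile of unlabeled data gives the model nothing to fit.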

Smith • September 26, 2013 6:52 PM

@i'll take your coat -thank you- you're welcome

"Secret" 3G Intel Chip Gives Snoops Backdoor PC Access

vPro processors allow remote access even when computer is turned off

This is a good example of the sort of nonsense these discussions eventually generate. As to this particular case it should have been obvious to the poster that a computer that is turned off is disconnected from its power source. In that state any embedded 3G-chip would be useless.

A similar situation developed with any discussions that attempted to (for a good reason) question the official 9-11 story: fantastic claims without merit that only served to discredit such discussions.

Scott • September 26, 2013 7:09 PM

@Smith

A computer that is turned off is not disconnected from its power source. The idea of a 3G device powering on your computer is no different than Wake-on-LAN.
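For reference, Wake-on-LAN works because the network interface stays on standby power and listens for a "magic packet": six 0xFF bytes followed by the target's MAC address repeated sixteen times. A minimal Python sketch follows; the function names and the example MAC address are illustrative assumptions, and UDP port 9 is only a common convention, not a requirement.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC such as '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("expected a 48-bit MAC address")
    mac_bytes = bytes.fromhex(hex_digits)
    # Synchronization stream (6 x 0xFF), then the MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is an assumed, commonly used default)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

The packet only has an effect if the sleeping machine's network card still has power and the feature is enabled in firmware, which is the sense in which "off" here does not mean unplugged.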

murray • September 26, 2013 7:46 PM

@Scott

"wake" implies that it is only snoozing.
Smith's assertion was that "off" means "OFF".

I think that people may have different concepts of what "off" actually means. I blame the "standby" switch on TVs.

Dirk Praet • September 26, 2013 8:22 PM

@ i'll take your coat -thank you- you're welcome, @ murray, @ Scott, @ Smith

Secret 3G Intel Chip Gives Snoops Backdoor PC Access

2nd generation Intel Core vPro processors with Anti-Theft 3.0 technology allow out-of-band locating, bricking and unlocking of a machine by sending it a simple text message (SMS). See https://www.youtube.com/watch?v=GBAo1vkFFGE for a demo of that feature.

There is however no evidence for a claim of Wake-on-3G by an alien zero power consumption chip when the machine is fully powered off and disconnected from its power source. Unless @ Clive has recently invented one, I call felgercarb.

Figureitout • September 26, 2013 8:55 PM

Dirk Praet
--I wouldn't be so quick to call felgercarb. I've experienced something eerily similar; of course I haven't been in real control of my pc for pretty much most of its life b/c there's too many peripherals and I can't even trace all its activity by hand, impossible. The people deploying it don't know how it works but it was a tool given to them by someone who does, and I want to find them. Not sure if it was an entire OS but basically out in "tha country" in between "normal boots" what looked like windows XP booted up. Either that or a hidden wifi router is nearby; or there's another peripheral that is unknown to me...such untrustworthy hardware.

I also wouldn't discount the possibility of hidden batteries or some sort of hack on the battery microprocessor.

amanfromMars • September 26, 2013 11:40 PM

Great IntelAIgent Mind Games are AI Leading IT Systems.

There is no escaping the following inexcusable fact[s] which be also Registered and shared on this informative thread ..... http://forums.theregister.co.uk/forum/1/2013/09/...

There's a very fat line between should not and is not.
It's not a "sysadmin" problem, it's a human problem.
…. John Smith 19 Posted Thursday 26th September 2013 18:59 GMT

Quite so, John Smith 19

The human component link in any and all Command and Control SCADA chains is always the weakest and simplest and most remarkably easy to break as in hack and driver protocol crack and re-engineer/reverse engineer. IT aint rocket science and difficult.

Clive Robinson • September 26, 2013 11:58 PM

@ Dirk Praet, Figureitout,

    Unless @ Clive has recently invented one, I call felgercarb.

Not an unreasonable position to take for a number of reasons. :-)

Two obvious problems are the power source and the antenna structure; another is the equivalent of a SIM.

Whilst it would be possible to hide an antenna on the PCB, the many PCB designers would have to "be in on the secret" one way or another. Whilst I would not rule out an on-chip antenna, it would be very inefficient. Likewise coupling it capacitively etc. to some kind of cooling structure.

If the antenna is inefficient this reflects back into field strength issues, which affect receiver sensitivity and transmit power. The latter has a significant effect on power source issues, especially with the requirements for EMC giving rise in most cases to metal surrounding the computer motherboard. Again there might be "slot radiators", but these would have to be of certain dimensions to be efficient, and the issue of "in on the secret" arises again, this time with case designers.

Whilst I would not rule out powered-down trickle feed from the mains PSU, as used with network cards, I would be quite skeptical of "hidden batteries". Even the best battery designs have very poor power density, so with the antenna inefficiency you would be looking at hiding a quite large --in comparison to other components-- volume. And again there is the "in on the secret" issue.

Then you need to consider how 3G networks work and the requirements of the FCC and other national approvals bodies.

All in all I do not think all the pieces are in place in general computer hardware, and whilst I would not rule it out for the likes of DELL doing it for high end servers, it still leaves too many "in on the secret" issues.

Whilst I see no reason for Intel to not put 3G radio components on a chip for laptop / netbook / pad systems integration you still have to "join the dots" for it all to work.

But as has been suggested, it would not take too much effort to track the "in on the secret" issues down.

But if you think back to the Sony Root Kit and CarrierIQ, they were found by researchers due to "tell-tale" symptoms, and they were only possible because all the hardware dots had already been joined together.

Are the symptoms there but so far missed, as happened with the DRNG standards?

I don't know. Paranoia is rife at the moment, and I kind of expect things to be noticed. Some will be real; some will be faux, where dots have been joined in people's heads, not in reality. Either way I now expect some aspiring researcher to go looking, because it would make their name well known in the industry if they found an example. And if found, I would kind of expect it to be in a server or portable device, not a desktop, and to have been designed to a finished product by a single organisation like DELL.

Wael • September 27, 2013 1:10 AM

@ Clive Robinson, @ Dirk Praet, @ Figureitout

Unless @ Clive has recently invented one, I call felgercarb...

Such functionality can exist without the need for a 3G SoC. Manufacturers cannot afford pennies; they would rather do this in software. However, an antenna can be hidden. Some people will know, but so what? The problem I see is that some servers are well shielded and located in areas without cell coverage. There are also microstrip line antennas that are nothing but traces on the PC board; a layout engineer can be directed to put such an antenna in a suitable place, embedded in one of the inner layers of the PC board. It can even be an array of antennas with phase-controlled beam steering :) It will not look any different from some of the traces. Only an RF engineer would recognize it if it's on the outer layer. Take an X-ray of the board, and you may see the inner-layer antenna. I think @ RobertT would detect that in a heartbeat, but how many of him are there?

Filby • September 27, 2013 1:14 AM

@Clive

Two obvious problems are the power source and the antenna structure; another is the equivalent of a SIM.

Whilst it would be possible to hide an antenna on the PCB, the many PCB designers would have to "be in on the secret" one way or another. Whilst I would not rule out an on-chip antenna, it would be very inefficient. Likewise coupling it capacitively etc. to some kind of cooling structure.

Looking at the YouTube video linked by Dirk Praet it seems like they must already have an antenna on the chip? After all the OS is not running when the device is receiving those SMS messages.

The laptop in that video seems to be turned on (at least near the end when the laptop is again enabled) though. So perhaps it does _need_ to be "on" (or at least have the main battery plugged in even if it now is in some kind of "sleep" or "hibernation")...

So here they supposedly have built an antenna on the chip to receive those 3G SMS messages. And if that is possible, would it then be possible to make the antenna get its power from the 3G network itself?

(I would not think the power from this would be sufficient for much more than processing the msg though. The system would have to attempt to turn on the main battery in order to use e.g. the HD. But then again modern "sleep" states are not really "zero code execution" states anyway.)

Bob • September 27, 2013 1:19 AM

re vPro:
A lot of Dell laptops have built-in modems and SIM card _slots_.
My understanding is that the user (corporate customer normally) needs to install a (paid) SIM in these laptops in order to use these features.

Ben • September 27, 2013 4:00 AM

A SIM is nothing special. Phones work without SIMs - you can make 911 calls without a SIM.

The purpose of the SIM = Subscriber Identity Module is to allow easy switchout of hardware, including failed hardware and to allow the functioning of a secondhand market in handsets.

Secondly, a SIM is nothing special. It could be faked up in software.

The question is: Can this only be activated after the adversary already has code-execute access to the machine? Or can it be activated, without a SIM, from a faked-up femtocell? I call "Yes" to the latter. Mobile phones without SIMs can make 911 calls. Do they remain dormant until you make the call, or do they maintain contact with the base stations just in case? If the latter, it's game over.

The antenna could be in the chip package itself. It's big enough.

Clive Robinson • September 27, 2013 4:25 AM

@ Bob,

A SIM is not required for a GSM phone to connect to a network, but it is most definitely "normal" for anything other than calling emergency services.

In theory the basic parts of a SIM could be built into the chip, BUT the GSM standard has a minimum "re-registration" time at the end of which a phone is required to talk to the network or go into "out of range" or "disconnected" mode (i.e. the equivalent of off or aircraft mode).

BUT without a network-applicable SIM it would be in the equivalent of "roaming mode", which is generally expensive, and to be in "roaming mode" the foreign network center would have to exist to do key exchange etc. (see the GSM spec for this, as it's long and complicated). So the chip supplier or motherboard seller would have to set one up (or outsource it), which is what these "low cost-call abroad" companies do. All of which leaves a very visible footprint in the SS7 routing directories of all the network service suppliers.

So far from secret, but it could easily be obscure to the point of being unworthy of notice. There are a lot of "private network" companies, ranging from traffic light control, fleet management, electronic hoarding advertising companies, emergency services organisations, electronic book suppliers, etc. etc.

@ Filby,

Antennas are usually passive and would be an etched line or slot in a plane, or even a patch on the PCB. Traditionally the dimensions relate to the wavelength in use. However, due to the issues with the disparate bands around the world, antennas for mobile phones have started to use "fractal designs", which end up with a line/slot/pad/plane with funny folded-up shapes; the more kinks and runs, generally the broader band they are. Antenna theory is complex and, even now with modern modal analysis, still somewhat of a black art.

However, whilst an RX antenna can be any old bit of bent wire or slot between two metal planes, as a VSWR of 20:1 makes little or no difference to effective performance, a TX antenna generally needs to be a 3:1 or better match to the transmitter output stage. Likewise the antenna needs to be a reasonable match to the medium (often "free space") it is radiating into. Which means there are definite constraints on how it would be built into a PCB.
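To put rough numbers on the VSWR figures above, here is a minimal sketch of the standard mismatch-loss calculation for a lossless line. The function names are illustrative only, and the sketch ignores everything else that matters in a real design (feedline loss, antenna efficiency, how well the output stage tolerates reflected power).

```python
import math

def reflection_coefficient(vswr: float) -> float:
    """Magnitude of the reflection coefficient for a given VSWR (>= 1)."""
    if vswr < 1:
        raise ValueError("VSWR must be >= 1")
    return (vswr - 1) / (vswr + 1)

def mismatch_loss_db(vswr: float) -> float:
    """Power lost to reflection, in dB, assuming a lossless line."""
    gamma = reflection_coefficient(vswr)
    return -10 * math.log10(1 - gamma ** 2)

# Compare the two VSWR values mentioned in the comment above.
for vswr in (3.0, 20.0):
    gamma = reflection_coefficient(vswr)
    print(f"VSWR {vswr:g}:1 -> {gamma ** 2:.0%} of power reflected, "
          f"mismatch loss {mismatch_loss_db(vswr):.2f} dB")
```

At 3:1 a quarter of the power is reflected (about 1.25 dB of loss); at 20:1 it is roughly 82% (about 7.4 dB). On transmit that reflected power also has to go somewhere, which is one reason the TX constraint is the tighter one.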

In the unlikely situation it was on chip, it would be electrically short, which would make it very low impedance, well below "free space", which calls for all sorts of constraints to get it efficient and not draw large amounts of current that would be wasted via I²R losses as heat. Loop and spiral antennas are suitable but tend to be highly directional and cursed by coupling into adjacent conductors, causing all sorts of problems. Either way they are going to chew up expensive silicon real estate.

The general solution in laptops is to put the antenna somewhere it can "see the world", and up the sides of the display has been used.

I've not watched the video as I'm away from base currently, but I suspect 3G is a "built in" option for the range it comes from, and thus the "dots are joined" and all we are really talking about is "wake on LAN" behaviour slightly modified for 3G use. Which begs the use of the 3G OTA SIM updating in the design. We know from previous court submissions that LEOs have "subverted the process" to force certain behaviour, like re-registering to a fake network the LEOs run to force "unencrypted" operation for "wire tapping". So many things are possible, including passing "command line boot options".

@ Wael,

In a modern laptop from a branded manufacturer like DELL I would expect the dots to be joined.

Basically you would design the PCB and mechanical construction for the top of the range product, and in the past only populate what was required for the model line. However, with SoC design these days the number of components you could leave off would be minimal, and any cost savings from doing so would be totally swallowed up by increased inventory/test/production costs. So they sell you the top of the range model with the high end features disabled by a hidden BIOS setting or equivalent.

As I said, in a server or laptop/notebook/pad/slab from a branded supplier I would expect the hardware dots to be joined, and 3G has OTA programming of the SIM microcomputer (often a Java engine with poor encryption/authentication).

So yes, it boils down to how the equivalent of "wake on LAN" functionality works.

It's why I don't use branded servers, desktops or laptops/notebooks/pads/slabs for any security related work, because the days of "inbuilt WiFi" signaled the death knell of air-gap security years ago. As I've said before, a lot of my secure stuff is done on off the shelf components and, importantly, programmable microcontroller development boards, and aging "desktop" hardware including 50MHz 486SX boards as terminals running odd versions of old OSs etc. that don't have inbuilt network stacks or windows, oh and no hard disks or writable CD drives.

John Campbell • September 27, 2013 10:23 AM

A long time ago under an employer far, far away...

"With spreadsheets we no longer need to kill millions to turn any man into a statistic".

It seems I didn't consider "Big Data" (or is that "Big Dada"?) at the time.

TRX • September 27, 2013 12:38 PM

> What became of the data stored on tape?

A lot of it is thrown away or allowed to rot. Most famously, much of the data from the early space programs, both US and Soviet.

And of course, some of the early episodes of "Doctor Who"...

RobertT • September 27, 2013 5:56 PM

We all expect WiFi and Bluetooth functionality on any laptop we buy, and even desktops and servers support Bluetooth keyboards, so the suitable RF components (antennas, PAs and LNAs) are already present.

So we more or less have what is needed. All we have to add is the ability to retune the 2500 MHz antenna/mixer to 2100 MHz, where most countries are deploying 3G services (1800 MHz and 1900 MHz would also be useful but probably not essential).
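For a sense of scale on that retuning, this sketch computes free-space quarter-wavelengths at the frequencies mentioned. It is idealised: real PCB antennas are shortened by the board dielectric and by folding, so treat the numbers as rough upper bounds rather than layout dimensions. The function name is just for illustration.

```python
C = 299_792_458  # speed of light in m/s

def quarter_wave_mm(freq_mhz: float) -> float:
    """Free-space quarter-wavelength in millimetres."""
    return C / (freq_mhz * 1e6) / 4 * 1000

# Frequencies named in the comment above.
for f in (1800, 1900, 2100, 2500):
    print(f"{f} MHz: quarter wave ~ {quarter_wave_mm(f):.1f} mm")

# How much longer a 2100 MHz quarter-wave element is than a 2500 MHz one.
ratio = quarter_wave_mm(2100) / quarter_wave_mm(2500)
print(f"2100 vs 2500 MHz length ratio: {ratio:.2f}")
```

The ratio is simply 2500/2100, about 1.19, so an element cut for 2500 MHz is roughly 16% short of a quarter wave at 2100 MHz. Whether a given antenna and front end tolerate that is a separate question the sketch does not answer.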

Probably the mixer is the most difficult part. However, if we really don't care about efficiency, such as would be the case if we only do intermittent transmit, and even then only when the Rx RSSI (received signal strength) is excellent, then we can simply sample the 2100 MHz RSSI until the PC happens to be near a tower (or picocell, possibly a rogue picocell).

OK, so if my system could reuse BT hardware and antennas, only work at 2100 MHz, and only function intermittently when close to a base station, then I believe I could hide this functionality so that its power use signature did not immediately give away the underlying functionality.
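The "only transmit when the signal is excellent" idea amounts to a simple gate on RSSI samples. A toy sketch follows; the -60 dBm threshold and the sample values are made-up assumptions for illustration, not figures from any real radio.

```python
def transmit_windows(rssi_dbm, threshold_dbm=-60.0):
    """Return indices of samples where RSSI is strong enough to transmit."""
    return [i for i, rssi in enumerate(rssi_dbm) if rssi >= threshold_dbm]

def duty_cycle(rssi_dbm, threshold_dbm=-60.0):
    """Fraction of samples during which the radio would transmit."""
    if not rssi_dbm:
        return 0.0
    return len(transmit_windows(rssi_dbm, threshold_dbm)) / len(rssi_dbm)

# Hypothetical RSSI readings (dBm) taken at regular intervals.
samples = [-95, -90, -88, -58, -55, -92, -97, -59, -101, -93]
print(transmit_windows(samples))
print(f"duty cycle: {duty_cycle(samples):.0%}")
```

With these invented samples only three of ten readings clear the threshold. A low duty cycle like that keeps the average transmit power small, which is what would make the power-use signature harder to spot.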

Even complex 3G HSDPA phone systems have built-in fallback to support simple modulation. I believe QPSK is the most basic modulation supported by 3G systems, so generating the comms signal modulation is not very difficult, and it is similar to what's already used in Bluetooth.

Bottom line: Definitely a doable task. Doing it surreptitiously is a little harder, but since the comms links to the BT chip are already there it's not so hard. It would really just require a special mode in the chip to support 2100 MHz operation. Personally I'd hide the function in the specification of the system filters by intentionally locating a filter lobe at the 2100 MHz point.

If I had to do this without reusing existing RF comms hardware, it'd be possible, but creating / locating the antenna in a suitable location is definitely a hard task. I'd be tempted to use something like the laptop display as a phased array antenna. One of the things to remember is that modern computer PCBs are typically not designed by the manufacturer; rather they are supplied by the chip vendor or developed in cooperation with the chip vendor. So intentionally adding a correctly tuned 2100 MHz PCB antenna would not be something you as the chip vendor ever really needed to acknowledge. You simply supply the Gerbers to the manufacturer and tell them to use AS-IS. This is very common practice for critical signals around the CPU.

Figureitout • September 27, 2013 6:52 PM

RobertT
--I don't expect WiFi or Bluetooth; I want to desolder it off the board. This makes me wary of the ubiquitous wireless.
In-car infotainment services such as news, weather, social networking and music streaming--it's gross fanboyism/trendy...

BigRed • September 30, 2013 2:26 PM

I find it amusing to read that the paper presents this as a new discovery. The Data Mining community, i.e. the people whose 20-year-long academic research is currently simplified, hyped, and sold as the "Big Data revolution", has identified at least "paradoxes" 1 and 2 and has worked for the last couple of years on "Privacy-aware/Privacy-preserving" Data Mining/Knowledge Discovery.
Now, I personally remain convinced that these efforts need legal flanking and even then will likely fail, but it's not as if this is a new and unexpected problem.

For a data miner's perspective, see here (.pdf). Some of the things she has to say might also relate quite strongly to @Jason Richardson-White's concerns. Admittedly, none of those will work without a legal framework though.

---

Another point:

@David Leppik:

Then there's the biggest problem of big data: AI just isn't as good as a human at finding patterns, if for no other reason than the fact that humans understand the context of the data.

Well, humans are horrible at spotting patterns and very good at misidentifying them :) But yes, context is important, and even more important is knowing which question one wants to ask, as @MingoV points out.

However, what he describes is supervised modeling; unsupervised modeling, e.g. outlier detection or building models of "normal" vs "abnormal", also very much exists. The latter, however, is mostly hypothesis-generating, which brings us back to the humans who need to take a look at the results, direct the next steps, or take things into the real world to check them against reality.

vas pup • October 2, 2013 11:20 AM

Maybe the same protection would work for a turned-off laptop against unauthorized access/activation as for a smartphone: a Faraday cage, but larger than for a phone. When working offline, the Faraday cage should be the size of the whole room to prevent any SIGINT collection from your computer.
