Clever System of Secure Distributed Computation

This is really clever:

Enigma's technique -- what cryptographers call "secure multiparty computation" -- works by mimicking a few of the features of bitcoin's decentralized network architecture: It encrypts data by splitting it up into pieces and randomly distributing indecipherable chunks of it to hundreds of computers in the Enigma network known as "nodes." Each node performs calculations on its discrete chunk of information before the user recombines the results to derive an unencrypted answer. Thanks to some mathematical tricks the Enigma creators implemented, the nodes are able to collectively perform every kind of computation that computers normally do, but without accessing any other portion of the data except the tiny chunk they were assigned.

To keep track of who owns what data -- and where any given data's pieces have been distributed -- Enigma stores that metadata in the bitcoin blockchain, the unforgeable record of messages copied to thousands of computers to prevent counterfeit and fraud in the bitcoin economy.

It's not homomorphic encryption. But it is really clever. Paper here.

Posted on July 3, 2015 at 6:38 AM • 23 Comments

Comments

martyJuly 3, 2015 9:46 AM

@John Galt III - why would you say such a thing? Is it because you don't understand the math? Crawl back to your cave.

JeffJuly 3, 2015 1:25 PM

Please, someone explain how a "computation" (which the paper says could be a database lookup) works on encrypted data, and just a piece of the data. For example, if the database contained employee records, how would the computation find employees of a particular age? How would the nodes even know the file format or schema of the database?

dbmJuly 3, 2015 4:13 PM

I have to wonder about whether this kind of massive distributed computing offers more opportunities for DOS attacks than otherwise? The security and privacy would remain, but it might become impossible to make progress toward goals?

Clive RobinsonJuly 3, 2015 5:07 PM

@ Jeff,

Please, someone explain how a "computation" (which the paper says could be a database lookup) works on encrypted data, and just a piece of the data.

By simple comparison, as normal.

Simplisticaly, if the fields in any given DB record are encrypted independently of the others this is not a hard problem to solve as no mater what the encryption algorithm it acts as a substitution cipher.

For things like an age range, you just compare with all the substitutions in the date range of interest.

The "computation" does not even need to see the "whole record" just the table for the required comparison. The computation then just returns the positions in the table where the comparison has succeeded.

If you use a "stream cipher" that is additive across the entire field (ie not XOR on bits within the field) it's posible to do addition and subtraction on a limited portion of a record. Likewise "bit fliping" for changing the state of flags in a bit based XOR stream cipher.

The important thing to note is that with field not record encryption you don't need to send the entire record just the field table to be computed on. With the addition of a couple of other simple measures you can show that the likely hood / probability of data leakage is quite low.

CuriousJuly 4, 2015 2:15 AM

I guess the following is far fetched and for someone like me that doesn't really know much about computer networking I still can imagine that I wouldn't want to see some future networking "standard" based on this, that in addition to simply help transferring data around the globe, that it also is labeling (injecting meta-data )and recording everyone's connections and transfers live. As if some kind of advanced program for super data retention, not only storing traffic data, but also including anything NSA have a label for so to speak. Or even, secretly tagging data associated with a connection, and secretly tagging the data having been stored everywhere.

What if there was a way to secretly tag data, and then, the only way to unscramble that encryption is to be an ISP that by some quantum computer magic, shuffles bazillion pieces of data, and then have the ability to decrypt some damning meta-data associated to someone or something?

Also, I can imagine it would be equally in my future scenario, if labeling and recording were only enabled for a short period, or for a few.

CuriousJuly 4, 2015 2:21 AM

I could imagine there was a future scenario, in which everything was encrypted, but as a sole computer user, there would be no way to unscramble anything of the data in your computer or on your network.

I was fantasizing some time ago (re fully homomorphic encryption), that it would be bad is you had to relate to some data having been encrypted and you would have no idea if there were layers of encryption that you had no control over; even though you owned or stored some piece of encrypted data having been encrypted with some fully homomorphic encryption system.

CuriousJuly 4, 2015 2:34 AM

Hm, I am wondering now if maybe the futuristic system I fantasized about might in ways be analogous to how Tor (Onion routing system) works.

ThothJuly 4, 2015 8:01 AM

@Curious
If the protocol for data is not satisfying, use another one. The endpoints of all networks resides in machines and network is simply machine to machine interaction. If you can secure the endpoint so that you have some kind of publicly shareable routing network like TOR or Enigma on one segment and also your own controlled network protocols on another segment, you would effectively have a segregation capability ... that is assuming you have control over your endpoints.

DanJuly 4, 2015 1:04 PM

@ Bruce
By my read the gist is an existing homomorphic scheme. Not to say there isn't some value in the layering on top of and around block chains, DHTs, etc.

It looks like the actual computation part is supposed to leverage some "Linear Secret-Sharing Scheme" which is additively homomorphic.

They build on or at least use a concept by a 2013 paper (Cohen et al [20]) to reduce the communication overhead from O(n^2) to ~ O(n), which seems to be a critical part in making such a system scalable.

NateJuly 4, 2015 10:11 PM

@Clive Robinson:
"Simplisticaly, if the fields in any given DB record are encrypted independently of the others this is not a hard problem to solve as no mater what the encryption algorithm it acts as a substitution cipher.

For things like an age range, you just compare with all the substitutions in the date range of interest."


And by doing that, haven't you just triggered a known-plaintext attack against yourself?

Assume that a compute node is going to receive a lot of different encrypted fields and encrypted values to test against. If each field is encrypted independently, and say an age field might contain '25-35' = 'sjhfyw', then eventually you're going to receive 'sjhfyw' from multiple places and in multiple contexts.

It seems like a compute node could do traffic/network analysis and work out what encrypted values are associated with others, and work from there to try to build up a map of what the whole database looks like.

This kind of distributed blind computation is intriguing to me, but I'm worried that we might be assuming much more security than actually exists.

NateJuly 4, 2015 11:28 PM

From the Enigma white paper:

"Code evaluated in our system is guaranteed not to leak any information unless a dishonest majority colludes."

And the time from launch to the US security-computational complex becoming the majority host of Enigma nodes will be measured in [days|hours|seconds]....

DanielJuly 5, 2015 5:05 AM

@Jeff, July 3

While no proof, this aims to briefly indicate how computations on encrypted data might be possible:

Basically, any running software fundamentally consists of processor instructions (MOV ADD MUL etc). A processor instruction can fundamentally be described using a set of logic gates (AND OR NOT etc). Using substitutions (AND -> NOT OR etc) any logic gate can be replaced with a set of NAND (or NOR).

Thus, any software can basically be expressed as a (very large and complex) set of NAND operations. Basically, all we need then is a cipher such that we can perform NAND operations on encrypted data. This means we must be able to compute the encrypted result of a NAND operation on encrypted inputs.

While far from trivial, if we can solve this problem, we can do any computations on encrypted data.

NateJuly 5, 2015 8:53 PM

@Daniel:
"Thus, any software can basically be expressed as a (very large and complex) set of NAND operations. "

NAND operations *plus* compares, surely, for useful computation?

And at the point that you introduce a compare, wouldn't you have to give the untrusted compute node a pre-encrypted value that you're comparing against? And by doing that, aren't you feeding your attacker a whole lot of crib texts? Potentially as much as your entire encrypted computation? (Depending on what percentage of operations involve compares).

My worry is that over time, given enough of these, a sufficiently competent attacker running a sufficiently large subset of nodes (not even a straight majority) would be able to figure out what it is that they're processing for whom - or at least what the general 'shape' of the database is, how each tiny part relates to the whole. Same basic principle as network analysis on encrypted transmissions.

I'd love to see an analysis that explains why this isn't a problem, but I can't even parse what the Enigma white paper is saying about its core crypto primitives (if indeed it's saying anything).

That and that Enigma relies on Bitcoin, which seems fundamentally unsustainable (for energy cost and data storage reasons) as well as a classic example of how a distributed system that *seems* theoretically secure can in fact collapse with terrifying speed into a small group of oligarchs controlling all the computation. This makes me question the good judgement of the protocol designers.

Clive RobinsonJuly 6, 2015 3:40 AM

@ Nate,

NAND operations *plus* compares, surely, for usefu computation?

Four NAND gates can be configured to be an XOR gate, which is what many use not just for compare but a half adder which forms the guts of the various full adders.

But I suspect that you did not want your lead in comment read that way (because it happens to me from time to time).

Cryptographicaly the problem of compares leaking information requires a bit of sideways thinking.

The simple case is a single value match in a field reveals four things, firstly the encrypted value of intrest to the searcher, secondly the number of matched records, thirdly the returned record identifier and consequently the relation or order of those records with respect to each other.

Now think of a search on a range of numbers, you get a number of encrypted search values, that have some relationship to each other. Thus the number of encrypted values gives not just an indicator on the search being carried out but also the resolution of the data stored in the database prior to encryption.

The limit on what can be determined is thus actually based on the independence or lack there of of the compute nodes.

Which brings up the question of how you ensure the independence of the nodes, one aspect of which is communications. Whilst you can use a different node for each compute function, the network by which you communicate is still visable to some extent. Thus even if you encrypt the communications to the node, an observer can still do traffic analysis...

Back in the 90's when I was looking for an intresting research problem for a PhD, I looked into how to distribute a large DB for multiple geographically seperated clients. I hit a snag, when I mentioned the security aspects (the bit that interested me) I could not find a "reader" that new sufficient about both subjects... so it was not possible to research into that area... I later had a social chat with a reader at the London School of Economics who was also a reader, and they laughed and said "You have seen first hand academic masturbation at work". They went on to describe the problem and pointed out that by far the majority of PhD students were in effect unpaid lab rats, doing research for their supervisor... Thus the "happy way" to a PhD is to find a reader / supervisor who you can get on with and be their lab rat.

John Galt IIIJuly 6, 2015 2:55 PM

crawled out from under my rock and realized that I have better links to the urbit concept. the description that I link here is on point and would have required some diligent effort to reach from the inferior link that I posted before. while I don't grasp the urbit concept in whole, I can speak with authority on the fact that Popehat are nothing less than totally awesome.

http://popehat.com/2013/10/03/what-is-urbit/
...
In short, Urbit is ... an operating system, a programming language, and a formal virtual machine.

http://popehat.com/2013/12/06/nock-hoon-etc-for-non-vulcans-why-urbit-matters/
...
No, there are times when a man must spit on his hands, hoist the black flag, and hoist the non-compatibility flag that forever demarcates the Ancien regime from the new.
...
Every Nock program is a tree, or a pyramid. Every subsection of the tree is also a tree. Meaning that each subsection of a Nock program is a smaller Nock program that can operate on any machine in the world, at any time, without caring what the phase of the moon is. Meaning that a Nock program can be sliced up with a high carbon steel blade, tossed to the winds, and the partial results reassembled when they arrive back wafted on the wings of unreliable data transport.
Nock programs – and parts of programs – operate without side effects. You can calculate something a thousand times without changing the state of the world. Meaning that if you're unsure if you've got good network connectivity, you can delegate this chunk of your program not just to one other machine, but to a thousand other machines and wait for any one of them to succeed.
Nock supports and assumes full encryption of data channels, so not only can you spread computation across the three machines in your home office, you can spread it across three thousand machines across the world.

some recent history regarding the author of the urbit concept
Popehat
A Group Complaint about Law, Liberty, and Leisure
...
http://popehat.com/2015/06/10/two-kinds-of-freedom-of-speech-or-strangeloop-vs-curtis-yarvin/
...
Your circuit's dead, there's something wrong. Can you hear me, Alex Miller?
Despite several tweets asking for confirmation, Alex never responded to me. (Or at least that's my belief – I checked my mentions closely, but it's possible that a response slipped through.)
However the next day I saw a link being tweeted around; Alex, it seemed, had finally responded.

http://moronlab.blogspot.com/2010/01/urbit-functional-programming-from.html
...
Back in the early days of the internet when Usenet was cutting edge, there was a gent by the name of Timothy C May who formed the cypherpunk mailing list.
His signature block at the time read
Timothy C. May, Crypto Anarchy: encryption, digital money, anonymous networks, digital pseudonyms, zero knowledge, reputations, information markets, black markets, collapse of government.
I bring up his sig block because in list form it functions like an avalanche. The first few nouns are obvious and unimportant – a few grains of snow sliding. The next few are derived from the first in a strict syllogism-like fashion, and then the train / avalanche / whatever gains speed, and finally we've got black markets, and soon after that we've got the collapse of government. And it all started with a single snowflake landing at the beginning of the sig block.
Timothy C May saw Bitcoin. He saw Tor. He didn't know the name that Anonymous would take, and he didn't know that the Dread Pirate Roberts would run Silkroad, and he didn't know that Chelsea Manning would release those documents. …but he knew that something like that would happen. And, make no mistake, we're still only seeing small patches of hillside snow give way. Despite the ominous slippages of snowbanks, Timothy C May's real avalanche hasn't even started.

NateJuly 8, 2015 7:52 PM

@John Galt III:

Urbit is an interesting *idea*. Or at least contains a number of interesting ideas, as well as what I consider to be some very poor ideas, all mixed together. I don't really think it's especially good in the mixture that currently exists however.

The good bits, imo:
* Trying to define 'a functional assembly language for the Internet' is a really great idea and we should do it eventually.
* A global namespace for function-level cloud computation is excellent, we should do that, but do think about security.
* Encryption everywhere: a good idea, but how does it actually work in practice?
* The selection opcode for picking a single cell out of a binary tree based on an integer: quite nice, simpler than Lisp's loads of cars/cdrs/caaadrs etc.
* Inventing new programming languages is a lot of fun and a great learning experience.

The bad bits:
* Putting the history of all prior computations on a blockchain. Seriously, lose this obsession with the blockchain. It's the technically *worst* part of Bitcoin, not the best. Grows forever, enforces serial computation on a fundamentally parallel process, creates massive inefficiency, destroys privacy, and doesn't buy any actual security in the end.
* A serialisation syntax for binary trees that isn't compatible with S-expressions. Seriously? I don't see it as an improvement, sorry.
* Using ASCII symbols instead of words for names of functions: can't see a win there. Just use the standard names for standard functions defined in the rest of the functional language community, job done, and build your actual new ideas from there.
* 'Jets'. Hahaha nope. You've built a functional language from first principles (known to all of computer science since the 1950s) and now it's too slow and you're going to bypass it completely? That's exactly what the rest of the computing industry (that you're going out of your way to avoid reproducing) did and that broke our security model. See also: Windows graphics drivers in the kernel, the Linux monolithic kernel, and the Java web client. The correct answer: Write and prove correct a compiler IN your safe functional language and then compile.


The weird bits:
* Neologisms for EVERYTHING. Doesn't really help.
* One-syllable neologisms for well-known ASCII special characters that already have well-known names. Cute, maybe useful, but obscures communication and shouldn't be mandatory.
* Strange ad-hoc mechanisms for rebuilding the entirety of functional programming from list manipulation primitives: possibly interesting, maybe worth research, but so obscured it's hard to know what's even being said let alone if it's self-consistent and works in the wild.

GregProgJuly 15, 2015 11:22 AM

I wonder whether encryption (or anything else) is sufficient to protect the individual records in a database on which statistical queries can be made. I'm sure someone proved 3 or 4 decades ago that it is always possible to reconstruct individual data values from a database by using the results of a crafted set of statistical queries. Sorry, details lost in the mists of time. This is very similar to the business of de-anonymising "anonymous" data, which has been more recently in the news.

What this leads to is that if you want to protect individual data within a database while still allowing statistical queries then you need to do something more. For example, you mask the true values by introducing errors that tend to cancel out in most statistics, but mess up the trick of reconstructing the original values.

My guess is that some encryption or other direct protection of the database is necessary, even though not sufficient. So the techniques discussed here are valuable, just not the whole answer.

Leave a comment

Allowed HTML: <a href="URL"> • <em> <cite> <i> • <strong> <b> • <sub> <sup> • <ul> <ol> <li> • <blockquote> <pre>

Photo of Bruce Schneier by Per Ervland.

Schneier on Security is a personal website. Opinions expressed are not necessarily those of IBM Resilient.