Schneier on Security
A blog covering security and security technology.
July 9, 2009
Homomorphic Encryption Breakthrough
Last month, IBM made some pretty brash claims about homomorphic encryption and the future of security. I hate to be the one to throw cold water on the whole thing -- as cool as the new discovery is -- but it's important to separate the theoretical from the practical.
Homomorphic cryptosystems are ones where mathematical operations on the ciphertext have regular effects on the plaintext. A normal symmetric cipher -- DES, AES, or whatever -- is not homomorphic. Assume you have a plaintext P, and you encrypt it with AES to get a corresponding ciphertext C. If you multiply that ciphertext by 2, and then decrypt 2C, you get random gibberish instead of P. If you got something else, like 2P, that would imply some pretty strong nonrandomness properties of AES and no one would trust its security.
The RSA algorithm is different. Encrypt P to get C, multiply C by 2, and then decrypt 2C -- and you get 2P. That's a homomorphism: perform some mathematical operation to the ciphertext, and that operation is reflected in the plaintext. The RSA algorithm is homomorphic with respect to multiplication, something that has to be taken into account when evaluating the security of a security system that uses RSA.
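A toy Python sketch makes the multiplicative homomorphism concrete (textbook-sized parameters for illustration only; real RSA uses 2048-bit moduli and padding). Strictly speaking, the homomorphism is that the product of two ciphertexts decrypts to the product of the plaintexts, so "multiply C by 2" really means multiplying by an encryption of 2:

```python
# Textbook RSA with toy parameters, to illustrate the multiplicative
# homomorphism: E(a) * E(b) mod n decrypts to a * b mod n.
p, q = 61, 53
n = p * q                 # modulus (3233 -- far too small for real use)
phi = (p - 1) * (q - 1)
e = 17                    # public exponent
d = pow(e, -1, phi)       # private exponent (Python 3.8+ modular inverse)

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

P = 123
C = encrypt(P)

# Multiplying by an *encryption* of 2 carries through to the plaintext:
assert decrypt((C * encrypt(2)) % n) == 2 * P

# Naively doubling the raw ciphertext does not: decrypting 2*C yields
# (2**d * P) % n, not 2*P.
assert decrypt((2 * C) % n) != 2 * P
```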
This isn't anything new. RSA's homomorphism was known in the 1970s, and other algorithms that are homomorphic with respect to addition have been known since the 1980s. But what has eluded cryptographers is a fully homomorphic cryptosystem: one that is homomorphic under both addition and multiplication and yet still secure. And that's what IBM researcher Craig Gentry has discovered.
This is a bigger deal than might appear at first glance. Any computation can be expressed as a Boolean circuit: a series of additions and multiplications. Your computer consists of a zillion Boolean circuits, and you can run programs to do anything on your computer. This algorithm means you can perform arbitrary computations on homomorphically encrypted data. More concretely: if you encrypt data in a fully homomorphic cryptosystem, you can ship that encrypted data to an untrusted person and that person can perform arbitrary computations on that data without being able to decrypt the data itself. Imagine what that would mean for cloud computing, or any outsourcing infrastructure: you no longer have to trust the outsourcer with the data.
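Schemes homomorphic under a single operation are easy to demonstrate. As a sketch -- toy parameters, and not Gentry's scheme -- here is the Paillier cryptosystem, which is homomorphic under addition: multiplying two ciphertexts yields an encryption of the sum of the plaintexts.

```python
import math
import random

# Toy Paillier cryptosystem: additively homomorphic, i.e.
# E(a) * E(b) mod n^2 decrypts to (a + b) mod n.
# Tiny parameters for illustration only -- not secure.
p, q = 61, 53
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)     # Python 3.9+

def L(x: int) -> int:
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)

def encrypt(m: int) -> int:
    while True:                   # pick r coprime to n (randomized encryption)
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return (L(pow(c, lam, n2)) * mu) % n

a, b = 1000, 2024
assert decrypt((encrypt(a) * encrypt(b)) % n2) == a + b   # addition on ciphertexts
```

Note that encryption is randomized: the same plaintext yields a fresh ciphertext each time, yet the additive relationship still holds after decryption.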
Unfortunately -- you knew that was coming, right? -- Gentry's scheme is completely impractical. It uses something called an ideal lattice as the basis for the encryption scheme, and both the size of the ciphertext and the complexity of the encryption and decryption operations grow enormously with the number of operations you need to perform on the ciphertext -- and that number needs to be fixed in advance. And converting a computer program, even a simple one, into a Boolean circuit requires an enormous number of operations. These aren't impracticalities that can be solved with some clever optimization techniques and a few turns of Moore's Law; this is an inherent limitation in the algorithm. In one article, Gentry estimates that performing a Google search with encrypted keywords -- a perfectly reasonable simple application of this algorithm -- would increase the amount of computing time by a factor of about a trillion. By Moore's Law, it would take about 40 years before that homomorphic search would be as efficient as a search today, and I think he's being optimistic with even this simplest of examples.
Despite this, IBM's PR machine has been in overdrive about the discovery. Its press release makes it sound like this new homomorphic scheme is going to rewrite the business of computing: not just cloud computing, but "enabling filters to identify spam, even in encrypted email, or protecting information contained in electronic medical records." Maybe someday, but not in my lifetime.
This is not to take anything away from Gentry or his discovery. Visions of a fully homomorphic cryptosystem have been dancing in cryptographers' heads for thirty years. I never expected to see one. It will be years before enough cryptographers have examined the algorithm for us to have any confidence that the scheme is secure, but -- practicality be damned -- this is an amazing piece of work.
Posted on July 9, 2009 at 6:36 AM
>>>Maybe someday, but not in my lifetime.
Maybe expand on this in the context of new technologies for online banking, WPA3, etc.?
So... if you can filter for spam in encrypted email, does that mean you can detect financial fraud in encrypted ledgers?
Imagine the fraud case; "No, your honor, we don't know what he did, but we know it's illegal!"
Still, the problem has cracked under analysis.
Once the cryptographers have wiggled it around a bit it will only improve. Attacks only get better, never worse. ;-)
I have to say, I'm struggling to think of a practical use for this even if the computational requirements can be brought to heel.
Peter Wayner wrote a book, Translucent Databases, about doing things with encrypted data in databases without needing to decrypt it. What you can do in the examples is fairly limited, but looking over the book may give you better insight into why being able to do even more powerful things is desirable.
'Imagine the fraud case; "No, your honor, we don't know what he did, but we know it's illegal!"'
This is actually not quite as daft as it sounds.
With hand cipher systems it is often possible, with no other information, to tell what language the plaintext is in without doing very much work.
So yes, that might just happen.
And with Gentry's system, definitely: although the data before and after are encrypted, the transformation process is not.
The ultimate system would be not just to be able to perform logical and mathematical (yeah, I know they are but one) operations on encrypted data, but to have the program itself be effectively encrypted as well, so that there is not even the possibility of saying,
"No, your honor, we don't know what he did, but we know it's illegal!"
Is it possible? Well, actually, yes, I think it is with Gentry's system, but as Bruce notes it is not going to be practical.
"I'm struggling to think of a practical use for this even if the computational requirements can be brought to heel."
Have a look at the book Cryptovirology by Adam Young and Moti Yung, some of the things they have thought up may give you (white hair or) pause for thought.
You can't actually tell from the encrypted result what the answer was. Instead you would have some encrypted value that when decrypted would give the algorithm's answer. So you could ship your email encrypted to some service, which would run its filter algorithm on it, then ship it back to you. You then decrypt it and put it either in the "spam" or "not-spam" folder. At no time does the filtering company know whether the message is spam or not.
Calum: You must not be trying very hard. If the computational requirements were low enough, I could come up with a lot of potential uses. Think about things like Google Docs, TurboTax, etc. For example, Google's word processor doesn't really want to send the client its whole dictionary for spellchecking, but it wants to be able to spellcheck your document. You as a client would prefer that Google didn't actually know the contents of your document if it doesn't need to. So the document can be stored encrypted on Google's servers and transmitted to you. Your client can decrypt it and transmit encrypted words back to Google, which can spellcheck the words and send them back to your client. The client decrypts the results and puts red squigglies on the incorrect words.
Of course, as Bruce says, this is unlikely in the near future. But if it was cheap enough, and secure enough, there are many possibilities for turning this into real improvements in security and privacy for users of network applications.
Could somebody explain the example of doing a Google search with encrypted keywords?
Is the assumption that the pages being sought are also homomorphically encrypted? Or does this imply that Google would maintain an encrypted copy of everything in its index to be ready to respond to these searches? Either way, I don't see how this keeps Google from knowing what you're searching, or at least what results you got...
> The RSA algorithm is different. Encrypt P to get C, multiply C by 2, and then decrypt 2C -- and you get 2P.
I don't think this is entirely correct. If you decrypt 2C, you get (2^d) P (mod N), where d is your decryption exponent, and N the modulus.
What is meant is: if C' is the encryption of 2, then decrypting C' * C gives you 2P.
Scratch "either way" from my last sentence... I can see how it works the first way (encrypted searches for encrypted pages) but not the second.
While we can express any computation as a Boolean circuit, the class of circuits that can be encoded by the algorithm is limited to NC1 (circuits of polynomial complexity, but at most logarithmic depth).
Basically stuff that is intrinsically parallelizable.
And keep in mind, NC1 basically rules out recursion (you can only unroll to a bounded depth using the fanout), so it's hard to envision an application that can fit all of the limitations of this algorithm.
In essence your Boolean circuit can have no feedback loops.
Let's simplify this to a search that might return one of two results: either page A or page B.
You start with a document that is [search query]; [this represents an encrypted document that you cannot access].
You run a calculation that converts search queries to relevancies, the document now looks like [(relevance A, relevance B)]
You can now set the document to:
A * (relevance A >= relevance B) + B * (relevance A < relevance B)
The document now is either [A] or [B]. You send that result back to the user, who decrypts it and gets the correct page.
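A minimal sketch of this selection trick, shown here on plaintext integers; under a fully homomorphic scheme the same additions and multiplications would be applied to ciphertexts instead. Note that both branches are always computed, and the comparison bit picks which one survives:

```python
# Branchless selection: returns a if cond_bit is 1, b if cond_bit is 0,
# using only addition and multiplication -- the pattern a homomorphic
# circuit must use, since it cannot branch on encrypted data.
def select(cond_bit: int, a: int, b: int) -> int:
    return a * cond_bit + b * (1 - cond_bit)

page_a, page_b = 111, 222
rel_a, rel_b = 7, 3
cond = int(rel_a >= rel_b)   # inside FHE this comparison is itself a circuit
assert select(cond, page_a, page_b) == page_a
```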
I'm not quite following how Google would be able to perform a spell check when effectively given nothing but homomorphically encrypted words.
So the document can be stored encrypted on Google's servers and transmitted to you.
OK. This part works. Nothing preventing google storing an encrypted document.
Your client can decrypt it
OK. This part also works.
and transmit encrypted words back to Google, which can spellcheck the words and send them back to your client.
The only way this could possibly work would be if a spell check could be performed on a word solely as a mathematical operation. The entire point of this encryption is that an untrusted party (Google) can perform mathematical operations on a piece of data without knowing what the contents of that data is.
So you don't trust the outsource machine with your data...but some how you trust that it will do the required calculations on the data correctly?
@ Edward Kmett,
"In essence your Boolean circuit can have no feedback loops."
True, but it does not stop there being either intermediate storage, or counters, or for that matter data comparison (provided you know the encrypted value to compare against).
So you could make it do generalised computing.
@Jesse McNelis: Perhaps you just want a 3rd party to sign some sensitive data. Like a notary, or a trusted timestamp, but without revealing to them what they are signing.
I don't understand anyone's intention to render ciphertext searchable, yet without deciphering it. I DON'T WANT ciphertext to be searchable. In many cases all that an adversary may wish to do is determine if ciphertext does or does not contain a particular semantic instance. Who cares about grammar ... if one determines the presence of a dozen subjects in the ciphertext, won't this compromise security? If you want to search, this should be done using a completely different system.
Perhaps I don't understand the utility of this, perhaps it is intended for storage of merely confidential information.
Yet in fact, this issue often arises with full-disk encryption vendors. One explained to me that their software doesn't have to decrypt to search. Huh??? Either it's not much of a search and it's decrypting headers only, or it does decrypt then re-encrypt.
"One explained to me that their software doesn't have to decrypt to search. Huh??? Either it's not much of a search and it's decrypting headers only, or it does decrypt then re-encrypt."
There is another option: they may have used a simple codebook (ECB-style) cipher, as opposed to chained encryption, on the disk. To search that, you just have to encrypt your (expanded) search term(s) and do a direct compare (assuming block alignment etc.).
However, whichever way they are doing it, if the salesman is correct (which he probably is not), it does not inspire confidence in the product at all.
My guess is that the salesperson does not understand that the encrypt/decrypt is done "transparently" in hardware at a rate that does not significantly affect the HD access times.
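A sketch of that codebook-style search, using a keyed HMAC as a stand-in for a deterministic block cipher (HMAC isn't reversible, so it's not a real cipher; it just models the property that equal plaintext blocks produce equal ciphertext blocks). The key name is invented for illustration:

```python
import hashlib
import hmac

# Deterministic per-block "encryption" stand-in: equal inputs always
# give equal outputs, which is exactly what makes ECB-style storage
# searchable without decryption.
KEY = b"hypothetical-disk-key"

def enc_block(block: bytes) -> bytes:
    return hmac.new(KEY, block, hashlib.sha256).digest()

disk = [enc_block(w) for w in (b"alpha", b"secret", b"gamma")]

# Search without decrypting: encrypt the term, compare ciphertexts.
assert enc_block(b"secret") in disk
assert enc_block(b"missing") not in disk
```

The same determinism that makes this searchable is also what leaks information: an observer can tell which blocks are equal, which is why chained modes are preferred for disk encryption.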
They aren't trying to sell anything yet; they're trying to make some interesting news that will get more researchers to look at homomorphic systems. Of course PR is going to exaggerate everything, but if more people start working on it, this might well turn into something very practical. Maybe not in the way they're currently predicting, of course.
I'm puzzled by two aspects of the press on this genuinely cool cryptosystem:
1. Why the focus on searching encrypted data / oblivious querying? Both have been solved for over a decade. There are plenty of semi-practical protocols for private information retrieval (PIR) and searching encrypted databases.
2. Isn't Craig Gentry at Stanford? (I know, this isn't that important.)
This seems interesting, but since this is all based on his dissertation, and his dissertation is unpublished, I can't tell if this is real or made-up. Mind you, I've let a lot of my math skills rust away, so it is possible that even if the dissertation was published, that I wouldn't be able to tell.
If, however, you're able to get a copy, please pop me an email.
@slakin & Clive Robinson
The point isn't searching encrypted data for plaintext words - that ability would obviously mean a broken cryptosystem. The point is searching encrypted data for encrypted words, yielding an encrypted result.
Re: abusing comparison and intermediate storage to recover arbitrary computation. It isn't that simple.
You have multiplication and addition giving you effectively ands and ors.
Regarding external comparison for equality, the ciphertext for a given plaintext isn't necessarily the same. After all, you have the ever decaying pool of corrections, which is what limits your evaluation depth in the first place. The representation has redundancies.
Of course you can perform comparisons within the system. if/then/else can be implemented with true and false as 0 and 1, and returning then * cond + else * (1 - cond), but that consumes part of your error correction pool! Also note, that it requires evaluation of both sides of the conditional. You can't change the course of 'what you will do' based on accumulated information.
In Haskell, the analogy would be to an Applicative vs. a Monad. There are a lot of kinds of computations you just can't do with the former. Not only that, this is weaker still because of the lack of recursion.
Ultimately, if you want to try to abuse this for general purpose computation beyond NC1, you have to come up for air and ask the source to decrypt and re-encrypt a value to refresh your error pool, because the pool is of bounded size and you can't change the 'shape' of future computation. And with that, you have now leaked timing information. Now attacks start to arise that look like the power-draw monitoring side-channel attacks that have been made against smart cards, etc.
As a poorly thought out aside, I'm somewhat curious if an attack vector could use known plaintext and detailed knowledge of the decryption algorithm to try to recover some information about the operations performed to obtain the value.
Er.. true and false as 1 and 0 obviously.
I think a copy is out at STOC or some such conference. I've seen a preprint of the article, and we did a seminar on it at school. It's genuine, though I think they have a few (not so consequential) holes to deal with.
People are focusing on database and web, but there are many opportunities with this type of encryption. How about secure teleconferencing over untrusted teleconferencing systems? Each handset encrypts its signal, the phone system combines all the input signals and sends out the multiplexed signal which is decrypted by the handsets. The system is never privy to the actual contents of the call.
Hrmm. That should be possible using much lighter-weight mechanisms. I designed and sold a voice compression codec a long time ago, optimized for cheap server remixing by XORing streams (explicitly for scenarios like teleconferencing), but where each source was excluded from the audio stream that came back to him. It seems to me that this could probably be done with just homomorphic addition, though that said, it would probably be subject to manipulation without careful design, i.e. the server deciding to include the same audio stream twice and dropping another one, allowing it to manipulate what would be heard by the participants.
Though, I guess the efficacy of that attack depends on how stateful the stream is.
@ Edward Kmett,
I think you have misunderstood what I was getting at.
In most programs looping is based on counters etc., not on what is happening to the data. Branching likewise often does not depend on the data.
If you split out the data related actions from program control you would find that you don't need to be using the "ideal lattice PK" for anything other than actions pertaining directly to the data.
The only time the two cross over is when program control is dictated by the result of a comparison on the data. If it is a direct comparison (i.e. ==) and not a greater-than or less-than, then the comparison can be performed against a precomputed and encrypted value.
@ Edward Kmett,
Sorry I should have given a real world example.
In DSP work you manipulate data to do filtering etc. With the basic multiply-add (MAD) instruction, all of the data is effectively multiplied by constants; the actual control is independent of the operations on the data.
Provided care is taken to prevent hitting the end stops, at no point does the control program have to do anything based on the data.
There are also one heck of a lot of other programs that likewise manipulate data "blind" and have no reason to see it.
@ Jesse McNelis,
"but some how you trust that it will do the required calculations on the data correctly?"
To use the CIA motto: "In God we trust, all others we check".
The system doing the work has no knowledge of the data just the operations it is performing on it.
If you include an "engineering" or "check" channel into the data then you will know if the data has been correctly processed.
So let us say you have a thousand items to be processed you add check channels in as Hamming weighted participants.
You know from precomputation etc what the results should be for the check channels. If (and only if) they are all correct then you can be reasonably certain that all the other data has been processed correctly.
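A sketch of this check-channel idea, with a made-up stand-in for the outsourced computation (the `3x + 1` function and item values are invented for illustration):

```python
import random

# The outsourced computation -- a stand-in; in practice this would be
# whatever the untrusted party is hired to compute.
def process(x: int) -> int:
    return 3 * x + 1

real_items = [10, 20, 30]
checks = {5: process(5), 17: process(17)}   # precomputed locally

# Mix the check items in with the real workload so the untrusted party
# cannot tell which results will be verified.
workload = real_items + list(checks)
random.shuffle(workload)

# Done by the untrusted party:
results = {x: process(x) for x in workload}

# If every check item came back correct, we gain confidence
# that the rest of the data was processed correctly too.
assert all(results[x] == expected for x, expected in checks.items())
```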
@ Edward Kmett,
"As a poorly thought out aside, I'm somewhat curious if an attack vector could use known plaintext and detailed knowledge of the decryption algorithm to try to recover some information about the operations performed to obtain the value."
That thought had also occurred to me, which I mentioned in my first post on this page (@July 9, 2009 7:46 AM) as "The ultimate system".
And my first thought solution was to split the program into two or more parts and combine the results.
And oddly, the method I thought of was a shared-secret style system, the simplest of which uses the XOR function in a very similar way to the system you loosely described to Eric...
Oh, and when you corrected your true/false 0/1 error you also forgot to correct your ANDs and ORs to NANDs and NORs; to do addition or multiplication you need the XOR function, and that needs multiple inversions in the usual simplest case:
XOR(A,B) = !( !(A . !(A.B)) . !(B . !(A.B)) )
I have yet to read the paper/dissertation, but does anyone know whether the slow-down is a polynomial one or an exponential one? i.e., is it theoretically slow or just practically slow? (While 40 iterations of Moore's Law is a lot, if by the 43rd or 44th iteration it becomes feasible, this is still a major breakthrough.)
If he has proposed an exponential increase in complexity to do homomorphic computation, is this inherently true for the problem or is it just his method of doing it that has the slow-down?
I think the important part of this discovery is that it's possible at all. It isn't practical, but this will cause two things to happen:
1. People will search for more practical homomorphic encryption systems (previously a fool's errand).
2. People will develop uses for homomorphic encryption.
Now, internet searches and spell checking are really hard to express mathematically. On the other hand, the calculations that happen on mint.com might be within the realm of possibility within 10 years.
Yes, you can effectively unroll any loop counter that doesn't care about making any decisions about the data, basically doing partial evaluation and leaving the addition/multiplication operations as the residue to be applied in the homomorphic ring. But since you're limited to NC1, you can only evaluate O(log n) multiplications or additions in series.
If you view the computation as a directed acyclic graph of data elements at the leaves, annotated with additions and multiplications, your graph can't have more than a logarithmic depth. So you can't just, for instance, iterate over the elements.
You could search for matching elements (by replicating a small recognition circuit that outputted a boolean result) and add up the match count monoidally, but there isn't much you can do beyond that.
So if you view the resulting system as homomorphic map/reduce, you can map a small circuit over each element, but your reduce operation also has to be monoidal and circuit-sized as well (or at least work over a regular structure).
My point was that it doesn't cover general purpose computation, not that there is nothing you can use it for. Common highly-parallelizable DSP stuff fits this paradigm well, because it already fits the feed-forward circuit paradigm well.
I think the choice of meaning for 'general purpose computation' is where we got tangled up. ;)
As a non-cryptographer I cannot see a way that homomorphic encryption will ever allow efficient retrieval or search of data.
I imagine the encrypted data to be in a locked box. One can inject data into it and run algorithms on the data and the encrypted value. It is not possible to extract any information out of the box. (Without the key)
If someone wanted to serve an encrypted search, he would have to inject the whole body of data into the box for every search. No information about which bits of the data are really needed could be extracted out of the box. This is like a linear search without any benefits of indexing.
Hrmm, well, off the cuff: if you have both addition and multiplication available, then I suppose, if you provided a 1 and a 0, you could allow someone to construct an arbitrary encrypted numeral. But regardless, yes, you are likely to have to traverse every element uniformly, so a lot of logarithmic-or-better times will be replaced with something linear in the size of your data set.
"As a non-cryptographer i cannot see a way that homomorphic encryption will ever allow efficient retrieval or search of data."
In its present form, no; neither can most people.
However, efficient searching in encrypted data is like seeking the "Holy Grail" for various groups. And as has been noted earlier, working homomorphic encryption is but the first step on the pathway.
There are several issues with homomorphic encryption that appear to fly in the face of reasonable behaviour for most uses because,
1, You have to encrypt
2, You have to send it out for processing
3, You have to pull the results back
4, You have to decrypt the results.
All of which is considerably more expensive in CPU cycles and time than performing the equivalent cycle/time trade off on your own hardware.
Then on top of this, the current system has a depth-bounds issue which renders it effectively impractical for anything other than significantly limited use that would be well within the capabilities of the hardware requirements for just one of the steps above.
However there is one area where the "cost" in CPU cycles and time is irrelevant and that is "malware".
For instance, if you have malware that tailors its propagation behaviour by how many systems it has already infected (for, say, stealth reasons), the simplest way would be with a counter. However, having that counter "unencrypted" in the code would effectively make stopping propagation easy, in that defenders would just have to seed the "kill limit" with an appropriate value.
Using homomorphic encryption within the virus would render that form of attack against it just a matter of chance.
If you have a google for [cryptovirology yung young] you should get links to their work in this area including the cryptovirology website and book.
The book is quite thought-provoking in this area and written in a way that makes it a fairly easy read for anybody with an interest and little more than grade-school math (and even that's not required).
Any encryption system homomorphic over addition seems trivially insecure?
C1 = E(P1, K)
C2 = E(P2, K)
C2 - P2 + P1 = E(P2-P2+P1, K) = C1
C1 - C2 = P1 - P2
A system with this property would seem useless -- what the heck am I missing here?
"enabling filters to identify spam, even in encrypted email..."
From my experience, SPAMMERS are practically more intelligent than people who claim the above line.
Your line #3 is wrong:
C2 - P2 + P1 = E(P2-P2+P1, K) = C1
since the encryption of P2 isn't itself (and the same for P1). The correct line would be
C2 - C2 + C1 = E(P2-P2+P1, K) = C1
which gives no problems.
Ignoring the combinatorial-explosion issues, wouldn't homomorphic encryption make possible a lot of arbitrary (as in "not considered when the data was encrypted") zero-knowledge-proof sorts of things?
Encrypted search might also be exactly what you would want in certain kinds of legal proceedings; it should be possible to verify or disprove the presence of certain agreed-on-by-both-sides'-lawyers bit patterns in the plaintext with much less damage to privacy than simply turning over the whole plaintext.
you're right. There is a typo in Bruce's article. If you do an RSA decryption of 2*C, then you do not get 2*P, you get (2^d)*P.
One thing Schneier didn't mention: this would allow for searches that don't reveal the searched-for term.
I don't see why Google would ever do it - they live off that information after all - but hypothetically it'd allow for interesting increases in privacy.
From this post:
"the size of the ciphertext and the complexity of the encryption and decryption operations grow enormously with the number of operations you need to perform on the ciphertext"
The entire point that makes fully homomorphic encryption so amazing is that this is exactly NOT the case.
Encryption/decryption time and the ciphertext length only depend on the length of the encrypted message (and of course some security parameter), but NOT on the size of the circuit. Schemes where the ciphertext length can depend on the circuit size have long been known.
Edward and others:
...the class of circuits that can be encoded by the algorithm is limited to NC1...
Not correct. The "basic" scheme in Gentry's paper is limited to NC1, but he then shows how to extend it to circuits of arbitrary depth.
I don't know about you, but in my lifetime I have seen far more complex problems solved that once sounded impractical.
Take Craig Venter's genome discoveries.
It is entirely reasonable that if this discovery needs 1 THz processors to be viable, then we can start preparing for those. Less than 20 years ago we had 1 MHz, so 1 THz is not so far-fetched. Just today, NVIDIA announced their supercomputer.
Don't underestimate technology
Well, maybe it is not practical in many ways, but I do believe that in the near future it will be practical. :) I hope to be a contributor.
The March issue of CACM has an article by Gentry, "Computing Arbitrary Functions of Encrypted Data" that draws on a recent paper by van Dijk, Gentry, Halevi and Vaikuntanathan, "Fully Homomorphic Encryption over the Integers."
Can anyone comment on the likelihood that these schemes will result in some usable code?
There is a problem with any homomorphic encryption system: it implies that a given unit of data is encrypted the same way each time. Any such system that works on small units of data (letters, words, columns in a DB, etc.) may be subject to cracking by frequency analysis (like the cryptogram puzzle in the newspaper). For ciphertext to be searchable, encryption would need to be performed on discrete units of data (words or letters), and would therefore be subject to cracking by frequency analysis.
For data that has no known frequency distribution (e.g. sales numbers, etc.), such a system might be useful, but for any known language, searchable text is simply not going to be secure.
A correction to my post above: homomorphic encryption doesn't necessarily require that a given unit of data be encrypted the same way each time (for a given key); however, for the ciphertext to be searchable, that must be true.
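A sketch of that frequency-analysis risk with deterministic per-word encryption; `det_encrypt` and its key are made-up stand-ins, not a real cipher:

```python
import hashlib
from collections import Counter

# If each word encrypts to the same ciphertext every time, ciphertext
# frequencies mirror plaintext frequencies, so common words can be
# guessed without any key material.
def det_encrypt(word: str) -> str:
    return hashlib.sha256(b"hypothetical-key" + word.encode()).hexdigest()

text = "the cat sat on the mat and the dog saw the cat".split()
cipher = [det_encrypt(w) for w in text]

freq = Counter(cipher).most_common()
# The most frequent ciphertext corresponds to the most frequent word.
assert freq[0][0] == det_encrypt("the")
```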
to Carlos Osuna:
Quantum computers would be helpful in making the types of operations IBM is hyping possible. But that brings with it the scariness that the human brain could be modeled in a large quantum computer.
That scares me, having seen the Terminator series of movies.
Gentry's work is definitely a breakthrough, but there are a few (inherent and vital) limitations to its practical usage, as many articles published so far have already shown. Still, there is hope in cryptographers' minds that it will be realized one day in the future.
Does HElib's release warrant an update or addendum to your article, Bruce?