I want to know about the practicality of above idea before submitting my thesis proposal to university.Is it really possible to play with HElib in Openstack environment and make some application?

Help would be greatly appreciated!!!!!

]]>Quantum Computers would be helpful in making these types of operations IBM is hyping about possible. But that brings with it the scariness that the human brain can be modeled in a large quantum computer.

That scares me, having seen the Terminator series of movies.]]>

For data that has no known frequency distribution (e.g. sales numbers, etc.), such a system might be useful, but for any known language, searchable text is simply not going to be secure.

]]>Can anyone comment on the likelihood that these schemes will result in some usable code?

]]>Take Craig Venter's genome discoveries.

It is entirely reasonable that if this discovery needs 1THz processors to be viable, then we can start preparing for those. Less than 20 years ago we had 1Mhz, so 1THz is not such far feteched. Just today, NVIDIA announced their supercomputer.

Don't underestimate technology

]]>*...the class of circuits that can be encoded by the algorithm is limited to NC1...*

Not correct. The "basic" scheme in Gentry's paper is limited to NC1, but he then shows how to extend it to circuits of arbitrary depth.

]]>"the size of the ciphertext and the complexity of the encryption and decryption operations grow enormously with the number of operations you need to perform on the ciphertext"

The entire point that makes fully homomorphic encryption so amazing is that this is exactly NOT the case.

Encryption/Decryption time and the ciphertext length only depend on the length of the encrypted message (and of course some

security parameter), but NOT on the size of the circuit. Schemes where the ciphertext length can depend on the circuit size have long been known.

I don't see why Google would ever do it - they live off that information after all - but hypothetically it'd allow for interesting increases in privacy.

]]>you're right. There is a typo in Bruce's article. If you do an RSA decryption of 2*C, then you do not get 2*P, you get (2^d)*P.

]]>Encrypted search might also be exactly what you would want in certain kinds of legal proceedings; it should be possible to verify or disprove the presence of certain agreed-on-by-both-sides'-lawyers bit patterns in the plaintext with much less damage to privacy than simply turning over the whole plaintext.

]]>Your line #3 is wrong:

C2 - P2 + P1 = E(P2-P2+P1, K) = C1

since the encryption of P2 isn't itself (and the same for P1). The correct line would be

C2 - C2 + C1 = E(P2-P2+P1, K) = C1

which gives no problems.

]]>From my experience, SPAMMERS are practically more intelligent than people who claim the above line.

]]>C1 = E(P1, K)

C2 = E(P2, K)

C2 - P2 + P1 = E(P2-P2+P1, K) = C1

C1 - C2 = P1 - P2

A system with this property would seem useless - what te heck am I missing here?

"As a non-cryptographer i cannot see a way that homomorphic encryption will ever allow efficient retrieval or search of data."

In it's present form no neither can most people.

However efficient searching in encrypted data is like seeking the "Holy Grail" for various groups. And has been noted earlier working homomorphic encryption is but the first step on the pathway.

There are several issues with homomorphic encryption that appear to fly in the face of reasonable behaviour for most uses because,

1, You have to encrypt

2, You have to send it out for processing

3, You have to pull the results back

4, You have to decrypt the results.

All of which is considerably more expensive in CPU cycles and time than performing the equivalent cycle/time trade off on your own hardware.

Then on top of this the current system has a depth bounds issue which renders it effectivly impracticle for anything other than significaly limited use that would be well within the capabilities of the hardware requirments for just one of the steps above.

However there is one area where the "cost" in CPU cycles and time is irrelevant and that is "malware".

For instance if you have malware that tailors it's propagation behaviour by how many systems it's already infected (for say stelth reasons) the simplest way would be with a counter. However having that counter "unencrypted" in the code would effectivly make stoping propergation easy in that defenders would just have to seed the "kill limit" with an appropriate value.

Using homomorphic encryption within the virus would render that form of attack against it just a matter of chance.

If you have a google for [cryptovirology yung young] you should get links to their work in this area including the cryptovirology website and book.

The book is quite thought provoking in this area and written in a way that makes it a fairly easy read for anybody with an interest and little more than grade school math (and even thats not required).

]]>Hrmm, well off the cuff, if you have both addition and multiplication available to you. I suppose if you provided a 1 and a 0, you could allow someone to construct an arbitrary encrypted numeral. But regardless, yes, you are likely to have to traverse every element uniformly, so a lot of logarithmic or better times will be replaced with something linear in terms of the size of your data set.

]]>I imagine the encrypted data to be in a locked box. One can inject data into it and run algorithms on the data and the encrypted value. It is not possible to extract any information out of the box. (Without the key)

If someone wanted to serve an encrypted search, he would have to inject the whole body of data into the box for every search. No information about which bits of the data are really needed could be extracted out of the box. This is like a linear search without any benefits of indexing.

]]>Yes, you can effectively unroll any loop counter that doesn't care about making any decisions about the data, basically doing partial evaluation and leaving the addition/multiplication operations as the residue to be applied in the the homomorphic ring. But since you're limited to NC1, I can only evaluate O(log n) multiplications or additions in series.

If you view the computation as a directed acyclic graph of data elements at the leaves, annotated with additions and multiplications, your graph can't have more than a logarithmic depth. So you can't just, for instance, iterate over the elements.

You could search for matching elements (by replicating a small recognition circuit that outputted a boolean result) and add up the match count monoidally, but there isn't much you can do beyond that.

So if you view the resulting system as homomorphic map/reduce, you can map a small circuit over each element, but your reduce operation also has to be monoidal and circuit-sized as well (or at least work over a regular structure).

My point was that it doesn't cover general purpose computation, not that there is nothing you can use it for. Common highly-parallelizable DSP stuff fits this paradigm well, because it already fits the feed-forward circuit paradigm well.

I think the choice of meaning for 'general purpose computation' is where we got tangled up. ;)

]]>1. People will search for more practical homomorphic encryption systems (previously a fools errand).

2. People will develop uses for homomorphic encryption.

Now, internet searches and spell checking are really hard to express mathematically. On the other hand, the calculations that happen on mint.com might be within the realm of possibility within 10 years.

]]>If he has proposed an exponential increase in complexity to do homomorphic computation, is this inherently true for the problem or is it just his method of doing it that has the slow-down?

]]>"As a poorly thought out aside, I'm somewhat curious if an attack vector could use known plaintext and detailed knowledge of the decryption algorithm to try to recover some information about the operations performed to obtain the value."

That thought had also occured to me which I mentioned in my first post on this page (@July 9, 2009 7:46 AM) as "The ultimate system".

And my first thought solution was to split the program into two or more parts and combine the results.

And oddly the method I thought of was a shared secret style system, the simplest of which uses the XOR function in a very similar way to the system you losley described to Eric...

Oh when you corrected you true/false 0/1 error you also forgot to correct your ands and ors to Nands and Nors, to do addition or multiplication you need to use the XOR function and that needs multiple invertions to work in the usuall simplest case,

XOR(A,B) = !( !(A . !(A.B)) . !(B . !(A.B)) )

]]>"but some how you trust that it will do the required calculations on the data correctly?"

To use the CIA moto "In God we trust, all others we check".

The system doing the work has no knowledge of the data just the operations it is performing on it.

If you include an "engineering" or "check" channel into the data then you will know if the data has been correctly processed.

So let us say you have a thousand items to be processed you add check channels in as Hamming weighted participants.

You know from precomputation etc what the results should be for the check channels. If (and only if) they are all correct then you can be reasonably certain that all the other data has been processed correctly.

]]>Sorry I should have given a real world example.

In DSP work you manipulate data to do filtering etc, with the basic MAD instruction all of the data is effectivly multiplied by constants etc the actual control is independent of the operations on the data.

Provided care is taken to prevent hitting the end stops at no point does the control program have to do anything based on the data.

There are also one heck of a lot of other programs that likewise manipulate data "blind" and have no reason to see it.

]]>I think you have mis understood what I was getting at.

In most programs looping is bassed on counters etc not on what is happening to the data. Branching likewise is often not on the data.

If you split out the data related actions from program control you would find that you don't need to be using the "ideal lattice PK" for anything other than actions pertaining directly to the data.

The only time the two cross over is when the program control is dictated by the result of a compare on the data if it is a direct comparison (ie ==) and not a greater or less than then the comparison can be perormed against a precomputed and encrypted value.

]]>Hrmm. That should be possible using much lighter weight mechanisms. I designed and sold a voice compression codec a long time ago optimized for cheap server remixing by xoring streams (explicitly for a scenarios like teleconferencing) but where each source was excluded from the audio stream that came back to him. It seems to me that this could probably be done with just homomorphic addition, though that said it would probably be subject to manipulation without careful design. i.e. the server deciding to include the same audio stream twice and dropping one other one, allowing it to manipulate what would be heard by the participants.

Though, I guess the efficacy of that attack depends on how stateful the stream is.

]]>Re: abusing comparison and intermediate storage to recover arbitrary computation. It isn't that simple.

You have multiplication and addition giving you effectively ands and ors.

Regarding external comparison for equality, the ciphertext for a given plaintext isn't necessarily the same. After all, you have the ever decaying pool of corrections, which is what limits your evaluation depth in the first place. The representation has redundancies.

Of course you can perform comparisons within the system. if/then/else can be implemented with true and false as 0 and 1, and returning then * cond + else * (1 - cond), but that consumes part of your error correction pool! Also note, that it requires evaluation of both sides of the conditional. You can't change the course of 'what you will do' based on accumulated information.

In Haskell, the analogy would be to an Applicative vs. a Monad. There are a lot of kinds of computations you just can't do with the former. Not only that, this is weaker still because of the lack of recursion.

Ultimately, f you want to try to abuse this for general purpose computation beyond NC1 you have to come up for air and ask the source to decrypt and re-encrypt a value to refresh your error pool, because the pool is of bounded size and you can't change the 'shape' of future computation. And with that, you have now leaked timing information. Now attacks start to arise that look like the power-draw monitoring side channel attacks that have been made against smart cards, etc.

As a poorly thought out aside, I'm somewhat curious if an attack vector could use known plaintext and detailed knowledge of the decryption algorithm to try to recover some information about the operations performed to obtain the value.

]]>The point isn't searching encrypted data for plaintext words - that ability would obviously mean a broken cryptosystem. The point is searching encrypted data for encrypted words, yielding an encrypted result.

]]>If, however, you're able to get a copy, please pop me an email.

]]>1. Why the focus on searching encrypted data / oblivious querying? Both have been solved for over a decade. There are plenty of semi-practical protocols for private information retrieval (PIR) and searching encrypted databases.

2. Isn't Craig Gentry at Stanford? (I know, this isn't that important.)

]]>"One explained to me that their software doesn't have to decrypt to search. Huh??? Either it's not much of a search and it's decrypting headers only, or it does decrypt then re-encrypt."

There is another option in that they have used a simple cipher book as opposed to cipher chaining encryption on the disk. To search that you just have to encrypt your (expanded) search term(s) and do a direct compare (assuming block aligning etc).

However which ever way they are doing it if the salesman is correct (which he probably is not) then it does not inspire confidence in the product at all.

My guess is that the sales person does not understand that the encrypt/decrypt is done "transparently" in hardware at a rate that does not significantly effect the HD access times.

]]>Perhaps I don't understand the utility of this, perhaps it is intended for storage of merely confidential information.

Yet in fact, this issue often arises with full-disk encryption vendors. One explained to me that their software doesn't have to decrypt to search. Huh??? Either it's not much of a search and it's decrypting headers only, or it does decrypt then re-encrypt.

"In essence your Boolean circuit can have no feedback loops."

True but it does not stop their being either intermediate storage or counters or for that matter data comparison (provided you know the encrypted value to compare against)

So you could make it do generalised computing.

]]>]]>

I'm not quite following how google would be able to perform a spell check when effectively given nothing but homomorphic encrypted words.

So the document can be stored encrypted on Google's servers and transmitted to you.

OK. This part works. Nothing preventing google storing an encrypted document.

----

Your client can decrypt it

OK. This part also works.

---

and transmit encrypted words back to Google, which can spellcheck the words and send them back to your client.

What?

The only way this could possibily work would be if a spell check could be performed on a word solely as a mathmatical operation. The entire point of this encryption is so that an untrusted party (google) could perform mathmatical operations on a piece of data without knowing what the contents of that data is.

Lets simplify this to a search that might return one of two results; either page A, or page B.

You start with a document that is [search query]; [this represents an encrypted document that you cannot access].

You run a calculation that converts search queries to relevancies, the document now looks like [(relevance A, relevance B)]

You can now set the document to:

A * (relevance A >= relevance B) + B * (relevance A

The document now is either [A] or [B]. You send that result back to the user, who decrypts it and gets the correct page.

]]>Basically stuff that is intrinsically parallelizable.

And keep in mind, NC1 basically rules out recursion (you can only unroll to a bounded depth using the fanout) so its hard to envision an application that can fit all of the limitations of this algorithm.

In essence your Boolean circuit can have no feedback loops.

]]>I don't think this is entirely correct. If you decrypt 2C, you get (2^d) P (mod N), where d is your decryption exponent, and N the modulus.

What is meant is, if P' is the encryption of 2, then decrypting P' * P gives you 2C

]]>Is the assumption that the pages being sought are also homomorphically encrypted? Or does this imply that Google would maintain an encrypted copy of everything in its index to be ready to respond to these searches? Either way, I don't see how this keeps Google from knowing what you're searching, or at least what results you got...

]]>You can't actually tell from the encrypted result what the answer was. Instead you would have some encrypted value that when decrypted would give the algorithm's answer. So you could ship your email encrypted to some service, which would run its filter algorithm on it, then ship it back to you. You then decrypt it and put it either in the "spam" or "not-spam" folder. At no time does the filtering company know whether the message is spam or not.

Calum: You must not be trying very hard. If the computational requirements were low enough, I can come up with a lot of potential uses. Think about things like Google Docs, TurboTax, etc. For example, Google's word processor doesn't really want to send the client its whole dictionary for spellchecking, but it wants to be able to spellcheck your document. You as a client would prefer that Google didn't actually know the contents of your document if they do not need to. So the document can be stored encrypted on Google's servers and transmitted to you. Your client can decrypt it and transmit encrypted words back to Google, which can spellcheck the words and send them back to your client. The client decrypts the results and puts red squigglies on the incorrect words.

Of course, as Bruce says, this is unlikely in the near future. But if it was cheap enough, and secure enough, there are many possibilities for turning this into real improvements in security and privacy for users of network applications.

]]>"I'm struggling to think of a practical use for this even if the computational requirements can be brought to heel."

Have a look at the book Cryptovirology by Adam Young and Moti Yung, some of the things they have thought up may give you (white hair or) pause for thought.

]]>'Imagine the fraud case; "No, your honor, we don't know what he did, but we know it's illegal!"'

This is actually not quite as daft as it sounds.

With hand cipher systems it often possible with no other information to tell the language the plaintext is in, without doing very much work.

So yes that might just happen.

And with Gentry's system definatly as although the data befor and after are encrypted the transformation process is not.

The ultimate system would be not just to be able to perform logical and mathmatical (yeah I know they are but one) operations on encrypted data but to have the program also be effectivly encrypted as well so that there is not even the posability of saying,

"No, your honor, we don't know what he did, but we know it's illegal!"

Is it possible well actualy yes I think it is with Gentry's system but as Bruce notes it is not going to be practical.

]]>