Defending Against Algorithm Substitution Attacks

Interesting paper: M. Bellare, K. Paterson, and P. Rogaway, “Security of Symmetric Encryption against Mass Surveillance.”

Abstract: Motivated by revelations concerning population-wide surveillance of encrypted communications, we formalize and investigate the resistance of symmetric encryption schemes to mass surveillance. The focus is on algorithm-substitution attacks (ASAs), where a subverted encryption algorithm replaces the real one. We assume that the goal of “big-brother” is undetectable subversion, meaning that ciphertexts produced by the subverted encryption algorithm should reveal plaintexts to big-brother yet be indistinguishable to users from those produced by the real encryption scheme. We formalize security notions to capture this goal and then offer both attacks and defenses. In the first category we show that successful (from the point of view of big brother) ASAs may be mounted on a large class of common symmetric encryption schemes. In the second category we show how to design symmetric encryption schemes that avoid such attacks and meet our notion of security. The lesson that emerges is the danger of choice: randomized, stateless schemes are subject to attack while deterministic, stateful ones are not.

Posted on June 24, 2014 at 7:21 AM

Comments

Freemove Quantum Exchange June 24, 2014 8:35 AM

On wuala.com/pgpstore you can find an example Freemove Quantum Exchange Group, which uses a combination of provably secure symmetric encryption with asymmetric signing (both using true (quantum) randomness).

Note that pgpstore is an example group. Regular Freemove Quantum Exchange Groups which use these functions are not visible on the public internet.

Perhaps it would be interesting to test the resistance of the symmetric encryption used against the algorithm-substitution attack referred to here.

Thoth June 24, 2014 8:48 AM

A little late on the article, but it’s all good 🙂. This is a must-read paper for cryptographers and security professionals. Very interesting concept.

Bob S. June 24, 2014 8:59 AM

NSA has repeatedly said it “collects” ALL encrypted communications.

A L L !

So they are getting our bank records, amazon.com purchases, investment records….ALL of it.

The idea they portray is that they only peek sometimes, and it’s for future research…nothing political, you know.

Horse manure.

I would have to guess they can read encrypted communications via their usual MO of legal coercion, co-opted providers, skulduggery and frank physical attacks.

Anyway, WE need to do something about it, and this article points out one possible option.

Gweihir June 24, 2014 9:21 AM

Not really a surprise. In symmetric-cipher use with state and no randomization, you have no redundancy and hence no space to leak keys or content. As soon as you go stateless, you can do things like send packets with invalid checksums and leak anything you like. And if you have, say, randomized IVs for CBC, that is also a nice way to leak things like keys.

Still, nice to have a more thorough analysis of this.
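
To make the randomized-IV channel concrete, here is a minimal Python sketch (hypothetical names throughout; HMAC-SHA256 stands in for whatever PRF a real subverter would use). The IV is still genuinely random, merely rejection-sampled so that anyone holding attacker_key recovers one byte of the user’s key per message:

```python
import hashlib
import hmac
import os

def subverted_iv(user_key: bytes, attacker_key: bytes, msg_index: int) -> bytes:
    """Grind a "random" 16-byte CBC IV until a keyed relation, checkable
    only with attacker_key, encodes byte msg_index of the user's key."""
    target = user_key[msg_index % len(user_key)]
    while True:  # about 256 draws on average
        iv = os.urandom(16)
        if hmac.new(attacker_key, iv, hashlib.sha256).digest()[0] == target:
            return iv
```

Without attacker_key the output is computationally indistinguishable from a uniform IV, which is exactly the paper’s point about randomized, stateless schemes.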

Benni June 24, 2014 11:37 AM

I think it is important that this kind of research is now beginning at universities.

Because that is how these services attack. They do not crack the raw protocol and algorithm. No, their question is: if we change an algorithm by X, how much faster is it then to crack? Since the agencies have had experience with this strange science of algorithm subversion since at least the 1970s, one may assume they are a bit ahead of academics here.

As far as I know, lectures in cryptography usually discuss the raw algorithm, and then the mathematics professor shows it is practically unbreakable and says “work done, finished.”

But no, it’s not finished, since the attacker may simply have bought the entire company that sells these algorithms and then bribed the engineers to modify their work. So the original algorithm is not used in products, but some subversion of it that produces similar-looking data, making the user believe the original algorithm is in place.

We need to invent algorithms whose modification is easily detectable from the output of the encryption code.

David in Toronto June 24, 2014 11:49 AM

I haven’t yet had time to read this, but I plan on doing so.

Having said that, has anyone categorized current algorithms as stateful or stateless in this way?

David

Gerard van Vooren June 24, 2014 12:31 PM

“The lesson that emerges is the danger of choice: randomized, stateless schemes are subject to attack while deterministic, stateful ones are not.”

That’s the problem I have with quite a lot of technology: The built-in upgradability.

Not only does it make the technology harder to get right, but, as this paper shows, it also weakens the security.

The TLS protocol, for instance. Why not have only, say, two preset configurations:

  1. Military grade. For banking and other payments, taxes, medical etc..
  2. Economical grade. For ordinary use on the internet.

In that case the protocol can be simple. And when a new crypto standard arrives, just update the TLS spec and give it a new version number.

Nick P June 24, 2014 12:44 PM

@ Gerard

That’s what NSA does internally, so there’s precedent for it. Most stuff uses COTS VPNs and such, while highly sensitive stuff is supposed to go through Type 1 equipment (read: actually secure). However, NaCl shows very strong crypto can be very fast, easy to use, and simple to implement. I vote we just do SSL v4 as a single good standard like that and hardware-accelerate it in resource-constrained devices. A subsidized I.P. core would help too.
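
For flavor, here is roughly what that ease of use looks like through the PyNaCl binding (a sketch, not an endorsement; note that NaCl’s box is randomized via a nonce, i.e. exactly the kind of scheme the paper shows is ASA-attackable):

```python
# Assumes the PyNaCl binding to NaCl (pip install pynacl).
from nacl.public import PrivateKey, Box

alice, bob = PrivateKey.generate(), PrivateKey.generate()
# One call; no cipher, mode, or IV choices to get wrong.
ciphertext = Box(alice, bob.public_key).encrypt(b"attack at dawn")
assert Box(bob, alice.public_key).decrypt(ciphertext) == b"attack at dawn"
```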

Gerard van Vooren June 24, 2014 1:03 PM

@ Nick P

As long as Al Gore doesn’t hunt us down for being “not green”, having 1 option is better than 2. Way better.

NC June 24, 2014 1:17 PM

For this and lots of other reasons, I think there is a need for good “reference code” for all these security critical algorithms. Performance should be the lowest priority, only requiring the bare minimum to be usable. Instead, readability and “matching the spec” should be highest priority.

I’d like to see someone take all of the pseudo-code from Bruce’s Security Engineering and write working versions, preferably in a high-level language. The aim should be to add as few extra lines of code as possible beyond what the pseudo-code describes.

Using the reference code may not be practical for all production environments, but for many (where perf is not a concern) it might be. Even in the cases where it’s not practical to use it directly, every change, additional feature, bug fix, etc. could be analysed as a delta against the reference. Even where versions are written in lower-level code, the exercise would merely be to ensure that the various lines of low-level code are 100% equivalent to some high-level code in the reference.
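
As a flavor of what such reference code could look like: a from-the-spec HMAC-SHA256 (RFC 2104) in stdlib-only Python, cross-checked against the production implementation. Readability over performance, as suggested; this is a sketch, not vetted reference code.

```python
import hashlib
import hmac

def hmac_sha256_reference(key: bytes, message: bytes) -> bytes:
    """HMAC per RFC 2104: H((K xor opad) || H((K xor ipad) || message))."""
    block_size = 64  # SHA-256 block size in bytes, per the spec
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()  # long keys are hashed first
    key = key.ljust(block_size, b"\x00")    # then zero-padded to a block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5c for b in key)
    inner = hashlib.sha256(ipad + message).digest()
    return hashlib.sha256(opad + inner).digest()

# Delta check against the production implementation:
assert (hmac_sha256_reference(b"key", b"message")
        == hmac.new(b"key", b"message", hashlib.sha256).digest())
```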

Wael June 24, 2014 1:29 PM

@Gerard van Vooren,

As long as Al Gore doesn’t hunt us down for being “not green”, having 1 option is better than 2. Way better.

Easy!
Send him on a hunting trip with Mr. sharp-shooter / marksman Dick Cheney 🙂

L June 24, 2014 1:34 PM

@Gerard

The built-in upgradability is exactly what saved the TLS protocol up to this day.

Adding new stuff always means danger of getting it wrong, but it also means that we are not stuck with plain DES. Or stuck with some weak algorithms because they were too computationally intensive years ago, or because we didn’t know if the math was sound (think elliptic curves).

Also, your solution doesn’t remove the upgradability, it says nothing about downgrade attacks, and it forces everyone to upgrade even the embedded devices, which don’t seem to be getting much attention today… At the end of the day, even having multiple versions of TLS requires clients/servers to support all of them.

The whole world will never be able to synchronize every few years on a different standard. Just look at the SSL/TLS situation regarding web servers and browsers… That’s why we need upgradability.

What we need is a protocol that is designed to be securely upgraded. Implementations will fail, of course, but not the protocol. It’s not difficult; TLS already does this to an extent, and it can be expanded in newer protocols.
Also, clear definitions in the RFC help, too, so avoiding “should” and “optional” would help. It would have prevented the 3Shake OpenSSL attack, for one. If you think that “optional” is great in a security RFC, just look at the mess made by OAuth…

Anyway,
The paper’s assumption is that the encryption library is closed source and contains code that somehow deliberately leaks the key.
The only thing that paper shows is that you cannot trust closed-source security software. And if you are paranoid enough, you should recompile your crypto libraries yourself.

Gerard van Vooren June 24, 2014 2:09 PM

@L

I think crypto advancements are not so fast that a protocol needs to be updated every couple of years. DES, for instance, has been known to be vulnerable for a long time. And with a new version of the protocol you can say that protocol version x.y.z is obsolete. It’s not that hard.

About embedded devices: if they don’t get updates, they are probably consumer products, and these don’t “live” that long (5-10 years max).

“Also, clear definitions in the RFC helps, too, so avoiding “should” and “optional” would help. It would have prevented the 3Shake OpenSSL attack, for once. If you think that “optional” is great in a security RFC, just look at the mess made by OAuth…”

To make myself clear: I prefer no options but sane defaults instead.

L June 24, 2014 2:32 PM

I think the crypto advancements are not that fast that a protocol needs to be updated each couple of years. DES for instance has been known to be vulnerable for a long time. And with a new version of the protocol you can say that protocol version x.y.z is obsolete. It’s not *that* hard.

Saying it is actually extremely easy. Getting everyone to do it, that is the problem.

We tried that. Today we have SSL 3, TLS 1.0, 1.1, and 1.2, and are talking about 1.3.

Just look at the tables here: http://en.wikipedia.org/wiki/Transport_Layer_Security

It was a mess until the latest browser versions; server support is even worse.

And we had big problems (BEAST, Heartbleed, 3Shake…); you would think people would have upgraded by now. So bumping the protocol version for minor or major modifications gives us no real benefit, and actually makes the situation worse, as the applications themselves must be upgraded to add support for the new version (and this is the part that gets ignored).

Having an upgradable protocol requires an update of the underlying library and nothing else.

And we actually need to change the protocol every few years:
SSL 2.0: 1995
SSL 3.0: 1996
TLS 1.0: 1999
TLS 1.1: 2006
TLS 1.2: 2008

It all depends on when new bugs, attacks, and defence mechanisms are discovered. It can be as much as 7 years or as little as 1.

Gerard van Vooren June 24, 2014 3:20 PM

@L

A bit of history. SSL is from the Netscape era. At that time Netscape battled with IE; IE won, and that brought us IE6 and bugs galore. SSL (and X.509) was NOT a well-designed protocol. Today we don’t live in that era anymore, but the inheritance is still painfully there.

That said, today we have the IETF and knowledgeable security experts. The browsers are IETF standards compliant (there are only Trident, WebKit, and Gecko) and they are all very well maintained. With websites, I agree, that is a problem. There should be some sort of automatic notification, maybe with colors in the address bar of the browser.

Still, if we get rid of options and use sane defaults, the protocol could be drastically simplified. Simple, well designed protocols usually have a long lifetime.

Iain Moffat June 24, 2014 3:21 PM

@NC: The other important aspect of reference code is that it can be used to compare the outputs of candidate production code (even closed source) for the same input, or to decode a test data set prepared by another implementation. If the reference implementation is very strictly coded (so that anything unexpected is flagged, the opposite of the normal internet custom of ignoring what cannot be decoded), then it may even detect some forms of key leakage in stateless implementations as non-decodable.
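
A sketch of that comparison idea (hypothetical function signatures; note that, per the paper, a bitwise match is only meaningful for deterministic schemes, since randomized ones legitimately disagree on identical inputs):

```python
import os

def cross_test(reference_encrypt, candidate_encrypt, trials: int = 10000) -> None:
    """Compare a candidate (possibly closed-source) implementation against
    strict reference code on random inputs; flag any divergence."""
    for _ in range(trials):
        key, pt = os.urandom(16), os.urandom(64)
        if reference_encrypt(key, pt) != candidate_encrypt(key, pt):
            raise AssertionError("candidate diverges for key=" + key.hex())
```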

Many years ago I worked on early CCITT H-series multimedia protocol implementations, and we always prepared reference code on our VAX in parallel with the DSP production code and cross-tested before going into compatibility tests with other vendors, so it is a well-trodden path. Similarly, we had independent Intel/PL/M and M68K/C ISDN signalling code bases written by different teams, and we took advantage of that to cross-test each release of the M68K code for regressions. I think the open source world finds the temptation of software monoculture through reuse an easier way to achieve compatibility, but that is only a way to have compatible bugs and to spread any inserted weakness, of course. We need independent free implementations, rather than forks from a common base, to ensure that a rogue implementation is easily spotted.

@L and Gerard: I think capability negotiation, with the goal of the highest shared capability, is necessary in any multi-vendor, multi-user world where a “big bang” upgrade, as could happen in defence or government, is impossible to organise. This is as true for encryption as for audio codecs. What matters when security is involved is that it is done in a way that allows the user to give informed consent to any downgrading. I realise there is a difference between informing and understanding, of course; there almost needs to be an “algorithm revocation list”, but that opens all the same trust issues as PKI and Certificate Authorities. Capability negotiation only addresses protocol or algorithm versions, of course; if the issue is a bug or side channel, then the need is to update the implementation rather than the protocol, and this has to be managed outside the communication channel. I see very little point in including application IDs in the capability negotiation: my experience with web browsers and servers is that implementations lie to improve the number and variety of remote systems that will talk to them.

Regards

Iain

Chris Abbott June 24, 2014 8:38 PM

I read that there was some vulnerability with the change cipher spec (CCS) feature in OpenSSL. What if you could just change the cipher suite to something silly like DES? I wonder if that vulnerability could enable something like this. Browsers probably don’t support DES, though.

Thoth June 24, 2014 9:16 PM

@Chris Abbott
I always have a suspicion that someone could somehow intercept a CCS or the cipher-suite list sent by the client/server and declare that they can only use DES. And even if the other party was using Triple DES, they might be forced into keying option 3 (all three keys equal in EDE mode), which is as good as single DES. If that’s the case, it may indicate that the banks and payment industries who love Triple DES could have an issue.

@L
Regarding the use of closed-source software: we are doing electronic transactions all the time, whether we manually authorize them or not. Our iPhones cannot be recompiled unless you jailbreak them and void your warranty. If we recompile and break our Androids, our phone warranties are also gone. Not everyone is technically savvy or capable of determining the quality and correctness of code. The relevance of that paper is high in an era where we walk along the streets messaging friends, making phone calls, typing blogs, etc., and we don’t really know what is running on our electronics. Not all open source code can be trusted either (the OpenSSL/GnuTLS/Apple goto bugs and drama).

In the banking and payment industries, you buy an HSM from a vendor (Thales, SafeNet…) and it comes with a PKCS#11 crypto library which is usually closed source. How are banks and users going to trust the security of payments when the source code and mechanics are closed? You don’t really have a choice but to use the closed-source libraries, unless you want to build your own machines and libraries, and that’s another story.

It’s about time we intensify research on verifiable cryptographic processes, on ASAs, and on other processes that tell us that a given input will always result in a deterministic output and nothing else.

L June 25, 2014 3:18 AM

@Iain
I agree, feature negotiation is a good thing.

@Thoth
I didn’t know closed-source crypto was widespread even in bank transactions; I always assumed they demanded the code so it could be verified… Well, I have never dealt with them, and I try to use only open source software even on my phone, so I hadn’t thought about that.

I didn’t mean to diminish the paper anyway, which I consider a good thing. I just wanted to point out how much easier it is to trust open source software 🙂

Thoth June 25, 2014 6:11 AM

@L
It’s good to know that your bank transactions are always hidden and that even the banks have very murky ideas of what they have purchased. To be honest, customers usually buy crypto-related products just to complete their compliance audit; they rarely use the full potential, and many of them do not really go in-depth to find out what they have bought!

I am personally worried about bank transactions because they rarely care until something happens.

Thanks for trying to give open source more credit than it really deserves. Open source code is usually kept hidden in the background when bundled with products.

Don’t be surprised that critical code at big HSM providers uses stuff like the GnuPG and OpenSSL libraries.

Mike Amling June 25, 2014 2:51 PM

On page 15, in the paragraph “A unique-ciphertext scheme”, the authors’ construction prefixes sigma onto the plaintext and encrypts it. Forgive my ignorance, but why wouldn’t it be sufficient to include sigma in A, the auxiliary data?

Mike Amling June 25, 2014 2:56 PM

Make that page 13 as denoted in the paper proper, which is the 15th page as counted by the Adobe Reader.

ripped Off June 26, 2014 1:32 AM

Thanks to the NSA’s deliberate weakening of encryption standards, rather than strengthening and encouraging robust online encryption and security standards to protect the public and their financial details, online fraud is set to break records again this year.

Banks have already seen an average of 500 million dollars in fraudulent transactions this year, reaching last year’s entire total in just under 6 months.

Heartbleed may contribute a little to some of this fraud, as the total amount of unpatched infrastructure, clients, and routers is as yet unknown, but a large part of the problem is due to banks not informing law enforcement and the difficulty of enforcing laws against bad online businesses operating overseas. The lack of a duty of care for customer data privacy, and the lack of international co-operation on privacy law and customer data protection, leave everyone open to crimes of online financial theft.

Banks often allow overseas financial transactions without informing their customers that transactions are taking place in another country. Even worse, banks do not refund overseas financial transaction fees when fraudulent transactions take place. Banks should block all overseas transactions by default, leaving customers the option to enable overseas transactions for the particular country they are travelling to.

It’s a little obvious that fraud is under way when a transaction is taking place on the other side of the world from the customer. Any related overseas transaction fees and the total amount of the fraudulent transaction should be immediately refunded by the bank, as this can only happen due to the bank’s own lax security policy.

q June 26, 2014 9:22 AM

The paper leaves one important consideration out of scope: plaintext malleability. The security model proposed by the authors does not consider the possibility of achieving the ciphertext-bias attack by modifying the plaintexts before encryption in a way that is not obviously detectable by the receiver.

For example, if encrypting simple ASCII text in a natural language, the substituted algorithm can tweak the text a bit by adding some spaces before EOLs, changing a single space to two spaces, introducing a plausible typo here and there, etc. If encrypting Unicode, it can use aliases, insert a zero-width space somewhere, etc. These are just examples; for any kind of plaintext there is a way to introduce hard-to-spot modifications. E.g., if encrypting an array of floating-point numbers, the substituted algorithm can toggle the least significant bit of a number with a small exponent and nobody would notice.
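
A quick illustration of the floating-point example (my own sketch):

```python
import struct

def toggle_lsb(x: float) -> float:
    """Flip the least significant mantissa bit of an IEEE 754 double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return struct.unpack("<d", struct.pack("<Q", bits ^ 1))[0]

print(toggle_lsb(0.1))  # 0.10000000000000002: who would notice?
```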

Of course, this requires the substituted algorithm to know the nature of the plaintext, but that is not an impossible requirement. In some cases, where encryption and plaintext handling are done by the same program, it’s trivial. Examples: an encrypted notes app such as Tombo, or a password manager such as PasswordSafe.

Example attack on an encrypted text application:
0. Suppose the application uses deterministic encryption with an N-bit key, where N = 2^n; e.g., for 128-bit encryption, n = 7.
1. The attacker’s key, K~, is used to select n+1 bits of the ciphertext: c[1]…c[n+1] (the specific selection algorithm is unimportant).
2. The first n selected bits, c[1]…c[n], form a number i, which is used as a bit index into the user’s key K.
3. The last selected bit is compared against the i-th bit of the user’s key. If c[n+1] == K[i], the ciphertext is accepted (and transmitted, or stored, or whatever). If not, the algorithm adds an empty line at the end of the plaintext and tries again (goto 1).

This way, each ciphertext leaks one (randomly chosen) bit of user’s key.

On average, half of the plaintexts will have to be tweaked and re-encrypted.
The user probably won’t notice extra blank lines (heck, some of the editors I have used tend to add blank lines at the end of a text for no reason).
The eavesdropper, on the other hand, will be able to reconstruct the user’s key K after capturing approximately Θ(N log N) ciphertexts (it’s a coupon collector’s problem; for N = 128 this gives 695 ciphertexts).
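
A minimal Python sketch of steps 1-3 (hypothetical names throughout; a keyed hash of the ciphertext stands in for the unspecified bit-selection algorithm, which works just as well since the attacker can recompute it from the ciphertext):

```python
import hashlib

def subverted_encrypt(encrypt, user_key_bits, attacker_key, plaintext):
    """Re-encrypt with extra blank lines until the ciphertext leaks one
    bit of the user's key. encrypt() must be deterministic; user_key_bits
    is the user's key as a list of 0/1 values."""
    n = (len(user_key_bits) - 1).bit_length()  # n = 7 for a 128-bit key
    while True:
        ct = encrypt(plaintext)
        # Step 1: derive n+1 bits c[1..n+1] from the ciphertext.
        d = hashlib.sha256(attacker_key + ct).digest()
        c = [(d[j // 8] >> (j % 8)) & 1 for j in range(n + 1)]
        i = int("".join(map(str, c[:n])), 2)   # step 2: bit index into K
        if c[n] == user_key_bits[i]:           # step 3: accept, or...
            return ct
        plaintext += b"\n"                     # ...tweak and try again
```

The eavesdropper recomputes d from each captured ciphertext and learns K[i] = c[n+1], collecting the whole key after the roughly 695 ciphertexts computed above.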

But even in a more generic setting this can be achieved. E.g., big brother B can attack network encryption using a two-part exploit: the “bottom” part compromises OpenSSL, and the “top” part injects itself into the web browser. The “top” part records (or hashes) all text entered into input fields and communicates it to the “bottom” part, which scans the data being encrypted and tweaks only the parts it knows how to deal with.

I write this not to diminish the importance of the paper, but only to say that even if deterministic encryption as proposed by the authors is proven secure under their model, I’m not sure it’s possible to implement their model in the real world. It’s impossible to shield against plaintext malleability without placing plaintext creation AND authentication into a trusted environment. And if you have a trusted environment, why not put the encryption there as well?

Thoth June 26, 2014 11:23 AM

@q
I am not sure a “trusted environment” actually exists anymore. I have been thinking about how to define a “trusted environment”, and especially about where cryptographic keys and passwords should reside.

If the community’s suspicions about state actors’ involvement in cryptographic suppression and weakening really reach deep into the production lines, hardware-based modules (HSMs and cryptochip modules) may turn out to be insecure.

Most of these hardware crypto modules are total black boxes where we have no idea what’s running inside them. The only hardware crypto module I have seen that is transparent (see-through) is the German Privacy Foundation’s Crypto Stick, so that those with knowledge of hardware could look through the transparent plastic casing and visually inspect the hardware for tampering.

All these secure algorithms need some form of secure medium to express themselves, and if the state actors’ bugging and meddling go deep enough, it’s really hard to escape their dragnet.

So far, my conclusion is that if we stick to open hardware designs and the crypto hardware is wrapped in transparent tamper-evident casings, this might give us some comfort.

The recent growth in research on AEAD ciphers is supposed to tell us if encrypted messages are being tampered with, but I am not sure any AEAD cipher can ever tell us if the cipher itself has been meddled with (an ASA).

If we can somehow marry the ability to detect tampering in plaintext messages (the AEAD feature) with deterministic encryption (ASA resilience), we may be onto something that can tell us if someone is attacking our algorithms and/or messages.

ariel June 30, 2014 7:30 AM

I’m quite sure these ideas were well known before; the paper seems a bit of an over-complication.

Quite obviously, encryption can be used to leak stuff iff it’s nondeterministic and the distribution can’t be sampled from:

1) Deterministic encryption, of course, can’t leak anything via the data sent (anyone who knows the key can check that you did the right calculation).
2) If the encryption is nondeterministic but drawn from a small (brute-forceable), known distribution, you can sample from this distribution and check that the probabilities are the right ones (in time polynomial in the leak amount).
3) If the encryption is nondeterministic with a distribution too large to find repeats in, or repeats can be prevented some other way (state, a known nonce, etc.), one can do the following trick:

* Pick a “leak-channel” PRF f; its key is shared between the attacker and the leaker.
* Let d be the data one wants to leak.
* Encode d, using a code for the binary symmetric channel with error probability 1/3, into W. This multiplies the length by less than 13.
* Let l be the length of W. Denote the bits of W by W_i.
* Let b(x) be the most significant bit of f(x), and i(x) be f(x) mod l.
* Now, when encrypting a message, pick 2 points from the distribution. Let m_1, m_2 be the resulting ciphertexts. If you can, send the one with b(m_j) = W_{i(m_j)}; you can do so with probability 3/4. If, as assumed, there are no repeats, this is indistinguishable from random.
* The attacker picks up ln(4)*(l+1) messages. He reconstructs W_i by taking the b-value of the first message received with index i, or guessing if no such message is received. This succeeds with probability at least 11/16.
* Then the attacker uses the code to correct the errors and recover d.

This allows message decoding in ln(4)*13*(n+1) ≈ 18(n+1) messages, where n is the length of the data one wants to leak.
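
A sketch of the per-message selection step in Python (hypothetical randomized_encrypt callback; HMAC-SHA256 plays the shared PRF f):

```python
import hashlib
import hmac

def leak_select(randomized_encrypt, prf_key, W, plaintext):
    """Draw two candidate ciphertexts and, when possible, send the one
    whose PRF value encodes the correct bit of the error-coded payload W
    (a list of 0/1 bits); this succeeds with probability 3/4."""
    l = len(W)
    def f(ct):  # the shared leak-channel PRF, as a 256-bit integer
        return int.from_bytes(hmac.new(prf_key, ct, hashlib.sha256).digest(), "big")
    m1, m2 = randomized_encrypt(plaintext), randomized_encrypt(plaintext)
    for m in (m1, m2):
        v = f(m)
        if (v >> 255) == W[v % l]:  # b(m) == W_{i(m)}?
            return m
    return m1  # neither candidate fits (prob. 1/4): send one anyway
```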

ariel June 30, 2014 7:33 AM

My latter scheme uses a quite stupid error-correction mechanism, but I think the bound of 18(n+1) messages is good (or bad) enough, and I’m not an expert on error-correcting codes.
