Who Does Skype Let Spy?

Lately I’ve been thinking a lot about power and the Internet, and what I call the feudal model of IT security that is becoming more and more pervasive. Basically, between cloud services and locked-down end-user devices, we have less control and visibility over our security, and have no choice but to trust those in power to keep us safe.

The effects of this model were in the news last week, when privacy activists pleaded with Skype to tell them who is spying on Skype calls.

“Many of its users rely on Skype for secure communications—whether they are activists operating in countries governed by authoritarian regimes, journalists communicating with sensitive sources, or users who wish to talk privately in confidence with business associates, family, or friends,” the letter explains.

Among the group’s concerns is that although Skype was founded in Europe, its acquisition by a US-based company—Microsoft—may mean it is now subject to different eavesdropping and data-disclosure requirements than it was before.

The group claims that both Microsoft and Skype have refused to answer questions about what kinds of user data the service retains, whether it discloses such data to governments, and whether Skype conversations can be intercepted.

The letter calls upon Microsoft to publish a regular Transparency Report outlining what kind of data Skype collects, what third parties might be able to intercept or retain, and how Skype interprets its responsibilities under the laws that pertain to it. In addition it asks for quantitative data about when, why, and how Skype shares data with third parties, including governments.

That’s security in today’s world. We have no choice but to trust Microsoft. Microsoft has reasons to be trustworthy, but they also have reasons to betray our trust in favor of other interests. And all we can do is ask them nicely to tell us first.

Posted on January 30, 2013 at 6:51 AM • 48 Comments

Comments

Jann Horn January 30, 2013 7:12 AM

And all we can do is ask them nicely to tell us first.

Well, at least for text chats, jabber with OTR works as an alternative…

Pradyumna January 30, 2013 8:33 AM

If someone indeed has a need to discuss secret matters on Skype (or for that matter, any public telephone network), I wonder if it really would be all that hard to encrypt their voices themselves.

risk January 30, 2013 8:49 AM

I wouldn’t even trust the answers Skype or MS would give. What is the military using? I’m assuming it’s some version the NSA hardened.

Simon January 30, 2013 9:12 AM

The NY Times has a good article about the gun control dichotomy manifested in Chicago. There are many larger security implications. What happens when the law says I go to jail for eavesdropping on my neighbor’s phone call but the gov’t gets to do, essentially, as it sees fit? Something very unexpected but very bad happens.

Spaceman Spiff January 30, 2013 9:20 AM

Which is why I no longer use Skype… There are other, better means for secure communication.

alanm January 30, 2013 9:32 AM

People rely on Skype for secure communications? Who are these people? They’re using a communications tool freely given to them by a giant multinational corporate with close ties to the US government and known to implement a buggy proprietary security protocol and expecting to get secure and private communications? What world are these people living in?

zev January 30, 2013 9:47 AM

Instead of using Skype for private calls use a free anonymous SIP 2 SIP account and X-Lite softphone and Zfone encryption or Jitsi softphone with encryption.

NobodySpecial January 30, 2013 10:00 AM

alanm – that’s pretty much true of any communications technology.
Imagine though if you aren’t a revolutionary but just an ordinary business.

If it’s only the NSA spying on you then it’s probably as secure as a cell phone, so OK unless you compete with the US govt or one of its friends (like Boeing).

If Microsoft is sharing data with other regulators then you probably shouldn’t use it if you are Boeing; Microsoft may be giving it to the EU, who will give it to Airbus.

If Microsoft are sharing it with their ‘partners’ then you probably shouldn’t use it if you are discussing patient records, patents or financial details of a listed company. It would just be nice if Microsoft told us who (apart from the NSA) it was handing the data over to.

dragonfrog January 30, 2013 10:34 AM

@Jann Horn

Well, at least for text chats, jabber with OTR works as an alternative…

It works if you don’t actually have to talk to anyone. How many people use Jabber + OTR? A few tens of thousands worldwide? Skype recently hit 45 million concurrent users (not user accounts occasionally logged into, users online at one time).

Zev January 30, 2013 10:40 AM

Earlier I said use SIP 2 SIP with X-Lite and Zfone or Jitsi. This is good but be careful of spyware. Spyware is the biggest threat to secure encryption and secure communication. It is best to create a live DVD-ROM with the OS and the VoIP and encryption software and then boot and run your computer with the DVD and to make calls from a laptop. Never use a desktop. Always use a VPN. Also be very careful with smart cell phones because it is easy to put spyware on them. I never use a cell phone. When I need to use a phone instead of my laptop I use a Wi-Fi only phone with softphone and encryption. There are many free or easy Wi-Fi spots in my big US city. P2P calls are free and anonymous and can be easily encrypted. When you use a Wi-Fi always change your MAC address. PSTN calls can also be made free and anonymously if you want to, but do not use them for private calls, just to order pizza.

RainForestGuppy January 30, 2013 10:45 AM

But what is different between Skype and the old copper telephony system?

The electro-mechanical phone is the locked down smartphone. The cloud is just the telephone exchange.

Who, apart from James Bond or Derek Flint, could tell if a bug had been planted on their electro-mechanical phone? Did anybody phone up the telephone company to find out if anybody could listen in?

Government agencies have been able to intercept phone calls for years (Echelon, anybody?); even the maintenance engineers could listen in, or the original phone phreakers. Why do you expect and demand perfect privacy on a free electronic system?

Surely this scare mongering amongst security experts is exactly the same as homeland security using the Al-qaeda bogey man to justify unnecessary and pointless checks.

@ihatevista January 30, 2013 11:11 AM

I’m surprised no one has mentioned starting their own VoIP server (Asterisk et al.) to manage their personal calls.
Encryption is sometimes overemphasized for basic conversations. Learn sign language and communicate that way using video calls…

Clive Robinson January 30, 2013 11:18 AM

@ Pradyumna,

I wonder if really would be all that hard to encrypt their voices themselves

Yes, it is quite difficult, for a number of reasons.

Firstly, the human voice is full of redundancy, as are most languages. You can actually take out vowels, swap them with different vowels and add extra vowels, and with little difficulty understand what is being said. Likewise quite a few consonants can be swapped, removed or added with no loss of intelligibility.

Then you can do similar things with phonemes, because many languages are actually phoneme insensitive.

For instance, if you want to see just how bad it can be, take a section of spoken audio, strip the envelope, and use it to modulate a section of music that has been amplitude limited. Guess what: many people can hear the words and understand them. You can improve things a bit by splitting the spoken audio into five or six spectral bands and using the envelopes of those bands to modulate the spectrum-split music. Now your music really speaks to you. Such a system is known as a vocoder, and various bands have used it; the one most people remember is the end of “Mr Blue Sky” by ELO, where in the dying moments an instrument quite clearly says “please turn me over” (originally it was the last track on side A of the “LP Record” album).

So let’s assume you just compressed your digitised voice track (A-law coding, 12 to 8 bits, is very common and easy to do) and then used plain old ECB encryption. A simple analysis of what is in effect a substitution cipher recovers the audio envelope fairly easily, which, as I indicated, kind of gives the speech content away…
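To make the envelope leak concrete, here is a minimal sketch in Python. It is not a real cipher: a keyed byte permutation stands in for ECB-style encryption of independent 8-bit samples, and the "audio" is made-up data in which a single assumed idle code (0xD5, chosen for illustration) represents silence. The helper names are hypothetical.

```python
import random

def make_substitution(key):
    """Toy stand-in for ECB-style encryption: a keyed, fixed permutation
    of the 256 byte values. Illustration only, NOT a secure cipher."""
    table = list(range(256))
    random.Random(key).shuffle(table)
    return table

def encrypt_ecb_like(samples, table):
    # Every sample is transformed independently of its neighbours,
    # so equal plaintext samples always give equal ciphertext bytes.
    return bytes(table[s] for s in samples)

SILENCE = 0xD5  # assumed idle code for the made-up audio below
rng = random.Random(1)
non_silent = [b for b in range(256) if b != SILENCE]
speech = bytes(rng.choice(non_silent) for _ in range(40))
plaintext = bytes([SILENCE]) * 30 + speech + bytes([SILENCE]) * 30

ciphertext = encrypt_ecb_like(plaintext, make_substitution(key=42))

# Without knowing the key, an eavesdropper can take the most frequent
# ciphertext byte as "silence" and read off the talk/pause envelope.
most_common = max(set(ciphertext), key=ciphertext.count)
envelope = "".join("." if c == most_common else "#" for c in ciphertext)
print(envelope)
```

The printed string shows 30 quiet samples, 40 active ones, then 30 quiet ones, even though every byte was "encrypted". A real block cipher in ECB mode works on larger blocks, so the effect is less stark than in this toy, but the principle of repeated input producing repeated output is the same.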

So you need a codec that not only compresses the audio very well, it also breaks the statistics up quite well. Having done that you need to pick an appropriate cipher mode that not only deals with the audio, it has minimal delay and is very robust to the sorts of data transmission error you get on the Internet…

But… Most systems these days use an audio codec that uses a variety of CELP encoding invented by guess who? None other than the NSA (yup US spooks R US) if you go to Wiki and just type ‘celp nsa’ it will bring you up a lot about it.

The point is that NSA are schizophrenic in their mission, one side is tasked with protecting US communications the other with breaking others communications. You have to ask yourself a serious question is this system actually backdoored in a way the NSA can use? And based on their history I’d say assume so unless you can prove otherwise (which might be difficult).

So encrypting spoken audio is actually very hard to do way way harder than encrypting ordinary data files…

@ alanm,

What world are these people living in?

Sadly, the real world of this moment, with many, many countries’ governments controlling not just communications but technology as well.

As I’ve noted above, developing a secure system for audio communications is actually quite difficult, much more so than cobbling together a few snippets of code you’ve managed to find on the Internet in places that have not been blocked.

In fact many of these countries get assistance from US, UK and other EU companies who are quite happy not only to help these governments monitor such traffic, but also to place booby-trapped code up on servers so that what you download there may not be what the website owner actually put up.

As a general idea, a good intelligible and timely codec can reduce the human voice to around 4800 bits per second. Most conversations last for at least thirty seconds. How long do you think it would take to leak a 128-bit key in the 144000 bits sent in 30 seconds? It’s in effect 1 bit in every 1125 bits sent. If this was hidden in a stream encryption system it would be very, very difficult to find…
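The arithmetic behind those figures can be checked in a few lines of Python; the constant names are only labels for the numbers quoted in the paragraph above.

```python
CODEC_BITS_PER_SECOND = 4800  # codec rate quoted above
CALL_SECONDS = 30             # a short conversation
KEY_BITS = 128                # size of the key being leaked

bits_sent = CODEC_BITS_PER_SECOND * CALL_SECONDS
bits_per_leaked_bit = bits_sent // KEY_BITS

print(bits_sent)            # 144000
print(bits_per_leaked_bit)  # 1125
```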

Sadly, most of the people living in countries not in the first world have very limited access to anything remotely close to a modern PC. In fact, due to the peculiarities of such things, they are way, way more likely to have access to a mobile or smart phone than a PC (North Korea being one such example)…

nobodySpecial January 30, 2013 11:28 AM

@RainForestGuppy – I think the fundamental difference is that it used to take a lot of resource to eavesdrop on a specific targeted person. Now it’s easier to eavesdrop on everybody and decide guilt based on statistics and associations.

There is also the much wider distribution of the results. The RIP/Patriot collected data doesn’t just go to MI5/CIA – it goes to everyone from the rubbish collection to the local school board

Glenn Fleishman January 30, 2013 11:55 AM

Skype makes interesting and tendentious arguments about whether it can spy. If you read its security briefings (including the advanced one), it claims that it secures the conversation by issuing certificates to each party that let them validate each other against a Skype identity. But it fails to ever discuss the fact that Skype (and Microsoft) use their own root certificates for this, which means that the Skype operators can create any valid certificate for themselves that would validate to the clients.

Skype is secure against outsiders, if you accept their largely unaudited security system (the last report is years out of date and partial), but it is 100% insecure against insiders who have access to the certs.

Johann Gevers January 30, 2013 3:01 PM

I second Zev’s and Alan Fairless’s recommendation to use Jitsi or SilentCircle for sensitive communications.

The problem with existing systems is that they are centralized. Centralized systems are inherently fragile because they have a single point of control and failure:

“Centralization makes systems fragile. Antifragile systems are decentralized.”
—Nassim Nicholas Taleb, in his new book “Antifragile”

In centralized systems, users are dependent on intermediaries. Users’ assets and information are controlled by a centralized provider—the old provider-centric paradigm.

Decentralized systems have no single point of control or failure—like the internet. No single person or entity can control a decentralized system.

Decentralized systems distribute risk across multiple intermediaries, or even eliminate intermediaries entirely. Control of users’ information and assets is shifted from providers to users, producing a user-centric and provider-independent experience.

Decentralized designs are inherently more reliable, secure, and private.

The weaknesses of centralized systems go beyond communication systems. They also apply to centralized financial systems, social systems, and systems generally, as Taleb shows in his book “Antifragile”.

We urgently need to replace fragile centralized systems with antifragile decentralized systems.

The problem is that decentralized systems such as peer-to-peer communication networks have traditionally suffered from a tragedy of the commons, because they have lacked a profitable resource-allocation mechanism. This has prevented decentralized networks from achieving significant scale. (Even the Internet is still too centralized, enabling governments to exercise undue control.)

This is why I co-founded Monetas—the world’s first decentralized system for financial and legal transactions. Monetas uses cryptography, federated and peer-to-peer technologies to enable, among other things, unforgeable transactions, profitable micro-transactions, and automated resource allocation for decentralized networks, which will enable the automated sale and provision of resources such as internet access, cloud storage, and telephony in a user-centric, provider-independent way with unprecedented security and privacy.

By transitioning to decentralized systems, the power balance is fundamentally altered, shifting power from providers to users—from a provider-centric paradigm to a user-centric paradigm.

Johann

MingoV January 30, 2013 4:33 PM

“We have no choice but to trust Microsoft.”

Baloney. Anyone who trusts organizations that have to kowtow to unknown governmental requirements is naive. There are alternatives to Skype. There are alternatives to land lines and cell phones. There are alternatives to plain text e-mail. People who need secure communications should find and use such alternatives. If you won’t work a little harder or shell out some money to gain privacy, then you really don’t need it.

Clive Robinson January 30, 2013 5:06 PM

@ Johann Gevers,

“Centralization makes systems fragile. Antifragile systems are decentralized.” (Nassim Nicholas Taleb, in his new book “Antifragile”)

I’ve not read the book but, as you’ve described it Taleb has it wrong.

The fragility of systems is not primarily related to whether they are centralized or decentralized.

It’s primarily related to the number and topology of the communications paths between nodes, and secondarily to the ability of the nodes to recognise and either isolate or work around points of failure. It is only at the tertiary level that the redundancy or duplication of information and control in nodes has a bearing on its fragility.

The problem with a fully decentralised system is that it is grossly inefficient due to the required level of communication, i.e. each and every change has to be communicated to every other node. This has a quaternary effect for which no realistic solution has been found: as nodes change, how do they ensure that the information in every other node is correct? The only general solution is that only one node can change at any one time, and this has to be “rollback” proofed, usually with a three-phase commit protocol, which is even more inefficient as it triples the communications involved, each of which has a path-length delay issue related to the longest path. This encourages a centralized system which “ripples out”.

The reason traditional “pyramid” hierarchical systems are fragile is that in general none of the communications paths are duplicated and they are organised for “efficiency” into a tree structure that is generally balanced. Such systems generally encourage centralisation at the root node simply because it’s generally the shortest distance to all nodes in a balanced tree.

It can also quite easily be seen that a decentralised system based on a balanced tree structure ranges from completely inefficient through to extremely fragile depending on the redundancy at the nodes (i.e. from each node storing all information related to all nodes, to each node storing only the information pertinent to that node and no other).

Such systems can be shown quite easily to be more fragile and more inefficient than a centralised system on a balanced tree structure…

I guess the question is, is Taleb wrong or have you misread what he has written?

RickD January 30, 2013 5:23 PM

Not using Skype is a great idea – until you start dealing with tech illiterate types. My sister, for example. Granted, we have no need for secure conversations, but it would still be nice to know we’re talking in private, especially since she lives in Mexico so the PATRIOT Act says the NSA can eavesdrop on us without a warrant. But getting her to use Skype was hard enough, any replacement would have to be as simple to use.

Carl 'SAI' Mitchell January 30, 2013 5:26 PM

Skype is very convenient, but I never disclose sensitive information over it. As with any communication over the internet it’s safest to assume that everything is public, and anyone can view it.

Even if you can verify the encryption software you’re using, and know you have strong keys, do you trust your compiler maker not to have a backdoor? How about the OS? The CPU manufacturer? The motherboard manufacturer?

Face to face communication in private in an area unlikely to be bugged is the only really secure system, and even that’s not guaranteed perfect. Very few things ever need that much security though, so unless you’re part of the NSA/CIA/Mossad/etc you probably don’t have to worry about a backdoor in your CPU.

PGP is pretty good privacy, not perfect privacy. Skype is piss poor privacy, but many, many things it’s used for don’t need privacy.

Ari Maniatis January 30, 2013 5:48 PM

@Clive Robinson

Firstly, schizophrenic doesn’t mean what you think it does. Perhaps you were aiming for ‘bi-polar’.

Secondly, many of the issues with redundancy in voice communications creating opportunities to recover the key can be solved by applying good compression before encrypting. G729 is one of the common compression algorithms and it will reduce a ulaw VoIP stream to one eighth of the bandwidth. Since it compresses in 10ms packets, there is still going to be lots of repeating predictable plaintext input to the encryption, but it is likely to be a big improvement over encrypting just the plain voice stream.
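As a quick check of the one-eighth figure, the sketch below uses the nominal published rates for the two codecs (64 kbit/s for G.711 u-law and 8 kbit/s for G.729); those rates are standard values supplied here, not numbers taken from the comment.

```python
ULAW_BITS_PER_SECOND = 64_000  # G.711 u-law: 8000 samples/s * 8 bits each
G729_BITS_PER_SECOND = 8_000   # G.729 nominal bit rate
FRAME_MS = 10                  # G.729 frame duration mentioned above

ratio = ULAW_BITS_PER_SECOND // G729_BITS_PER_SECOND
bits_per_frame = G729_BITS_PER_SECOND * FRAME_MS // 1000
frames_per_second = 1000 // FRAME_MS

print(ratio)              # 8
print(bits_per_frame)     # 80
print(frames_per_second)  # 100
```

So each 10 ms frame carries only 80 bits (10 bytes), and a hundred such small frames are produced every second of speech.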

Scott Ferguson January 30, 2013 7:05 PM

@NobodySpecial I suspect it’s a mistake to measure the risks of using Microsoft products purely on company motivation. Even if trusting Microsoft wasn’t a case of optimism overcoming experience, there’s still the issue of company security.

To those that propose jitsi….. gee I’d love to trust it but I’ve got a fundamental problem trusting anything that requires Java.

I also recall reading some research done by Google into the ability to search and index encrypted VOIP which agrees with what Clive says.

@Clive Robinson Nice summation of the problems. My understanding is that there is no secure method of VOIP, unless you’re happy with slight delays and non-duplex. I’d be very interested in your opinion.

Nick P January 30, 2013 9:43 PM

Re Skype

The ultraparanoid blog gave people all the reason they need not to trust Skype years ago. Of course, its security architecture is reason enough from a more conventional risk standpoint.

https://ultraparanoid.wordpress.com/2007/06/19/why-skype-is-evil/

Re Secure VOIP Done Right

Actually, the government tells you how to do it. Google for the defense report on Voice over Secure IP, Secure Voice over IP, etc. The nice thing about secure voice is that it’s easy to implement with the red-black model. High assurance community has decades of experience with that. My takeaway from all their research is that there is a simple way to do it:

  1. Use whatever VOIP system you like.
  2. Force all traffic between the two nodes to go through link encryptor (e.g. IPSec VPN).
  3. Link encryptor should be a dedicated, very hardened device. (OpenBSD at minimum, type 1 ideal)
  4. Remove all unneeded code from that device.
  5. The Black IPSec packets should be fixed length and each node constantly sends packets to other at fixed speed. These defeat covert channels.
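
Step 5 can be illustrated with a small Python sketch. It only shows the framing idea (every time slot carries a packet of identical size, with random filler when there is nothing to say) and would sit in front of the link encryptor, not replace it. The packet size, the 2-byte length header and the function names are assumptions made for the example.

```python
import secrets
import struct

PACKET_SIZE = 160  # hypothetical fixed packet size in bytes

def pack_fixed(payload=b""):
    """Build one PACKET_SIZE packet: 2-byte length, payload, random padding.
    An empty payload gives a cover packet for an idle time slot."""
    if len(payload) > PACKET_SIZE - 2:
        raise ValueError("payload too large for one packet")
    padding = secrets.token_bytes(PACKET_SIZE - 2 - len(payload))
    return struct.pack("!H", len(payload)) + payload + padding

def unpack_fixed(packet):
    """Recover the payload and discard the padding."""
    (length,) = struct.unpack("!H", packet[:2])
    return packet[2:2 + length]

# One entry per clock tick; b"" means nothing to send in that slot.
slots = [b"voice frame 1", b"", b"voice frame 2", b""]
packets = [pack_fixed(s) for s in slots]

print([len(p) for p in packets])          # every packet has the same length
print([unpack_fixed(p) for p in packets])
```

Once encrypted, busy and idle slots look alike to an observer, provided the sender really does emit one packet per tick regardless of whether anyone is speaking.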

And this is the basic secure communication system that I devised years ago. It’s still better than everything else out there in terms of security. And it supports every point-to-point-able feature of VOIP. And supports all other P2P application protocols. That’s called security ROI.

Look up the Micro-SINA VPN for a more modern building block than OpenBSD. The TCB is ridiculously small. I called for physical separation of comms stacks from crypto core & non-DMA connections in the Transaction Appliance discussion on this blog (see link). That might be applied to the VOIP system. Hardened OpenBSD on ARM/PPC for comms, crypto on VIAPadlock/PPC/MIPS. You people have fun with that. 🙂

Side note: ZRTP looks nice. The guy who made it knows what he’s doing. It uses some components of successful past technologies. Run it on a robust OS or dedicated partition for best results.

Re suggested skype alternatives by other commenters

Thanks for the suggestions. I’ll re-review Jitsi in the future. I especially like that one of the links used Psyc. It was one of my favorite IRC alternatives and I hate Jabber with a passion. This was another case where the inferior protocol became the standard. The complexity of Jabber means I wouldn’t trust it for anything security-related by itself. The online service another commenter mentioned is interesting and might get my attention later. I’m always happy to see innovation in the INFOSEC product space.

@ Clive and Gevers

I think the fact that we’ve built more reliable and secure centralized systems than decentralized systems doesn’t help the author’s hypothesis. I agree with Clive about communication paths and resiliency features. I’ll add another aspect to the discussion though: control. The architecture might be centralized or decentralized. How it’s designed, operated and maintained will have the greatest impact on its resiliency.

The best methods will have a bit of centralized control with decentralized contributions. My hypothesis proved out initially via the “benevolent dictator” model seen in open source projects. The development is decentralized. The internals of the system are modular. However, there is a central group of people who make the final call about what goes in and what doesn’t. The underlying beneficial attributes seem to be diversity, many eyes, and increased productivity.

Nick P January 30, 2013 9:46 PM

@ Ari Maniatis

“Firstly, schizophrenic doesn’t mean what you think it does. Perhaps you were aiming for ‘bi-polar’.”

I agree. I’ve always said multiple personality disorder. That the various conflicting behavior is the result of competing individuals/groups’ ideas seems to make it a very accurate metaphor. For NSA, you have the IA Directorate trying to protect everything and the other one trying to spy on everything. There are also groups in groups. This is government and large formal organizations in general.

nomnomnom January 30, 2013 11:18 PM

“People are aware that Windows has bad security but they are underestimating the problem because they are thinking about third parties. What about security against Microsoft?

Every non-free program is a ‘just trust me program’. ‘Trust me, we’re a big corporation. Big corporations would never mistreat anybody, would we?’ Of course they would! They do all the time, that’s what they are known for. So basically you mustn’t trust a non free programme.”

“There are three kinds: those that spy on the user, those that restrict the user, and back doors. Windows has all three. Microsoft can install software changes without asking permission. Flash Player has malicious features, as do most mobile phones.”

“Digital handcuffs are the most common malicious features. They restrict what you can do with the data in your own computer.”

by Richard Stallman @ http://www.newint.[org]/features/web-exclusive/2012/12/05/richard-stallman-interview/


CIA Head: We Will Spy On Americans Through Electrical Appliances

[www].prisonplanet.[com]/cia-head-we-will-spy-on-americans-through-electrical-appliances.[html]

Clive Robinson January 30, 2013 11:37 PM

@ Ari Maniatis,

Firstly, schizophrenic doesn’t mean what you think it does. Perhaps you were aiming for ‘bi polar’.

Unfortunately it depends on whose definition you use…

If you Google the various online Dictionaries you will find they have several definitions in each dictionary (as is normal with most words).

Importantly though is the one they mainly have in common under general usage of the word,

(in general use) A mentality or approach characterized by inconsistent or contradictory elements

Which I think we would probably agree is one of the NSA’s defining characteristics.

But… As I think you mean with “schizophrenic doesn’t mean…”, it is not the definition that is used in clinical diagnoses, which amongst practitioners is found in the ICD as recommended by the WHO.

One of the joys of the ICD is that every so often they update it, and sometimes (actually often) they change clinical definitions quite radically, and this has knock-on effects [1].

Whilst it might be professionally prudent to change definitions, as we know “words have a life of their own” once in the general public’s hands [2]. The fact that the practitioners who think they own the words they coin get upset/frustrated/whatever by what the public do with them is just one of the reasons George Orwell came up with his ideas on Newspeak being used to control people by jargon.

Thus I respectfully submit that in our own ways we are both right and thus we should just say “hang the lot of them” to those who disagree with us (however I won’t hang around for your answer 😉

[1] In the UK we have a statute which is called the Autism Act 2009 by many people. It was drafted a few years before that, when the ICD had a very broad set of diagnoses fall under what was the Autism Spectrum Disorders (ASD); however, the latest ICD has changed all that and moved a number of things into their own categories etc. What is not clear is what legal repercussions this will have long term.

[2] In ITSec we have been forced to change our meaning of “Hacker” to that of the general public even though we had a perfectly good word that we had coined, “Cracker”. And I remember this being debated on this blog, where people had decided the lackadaisical journalists had by their terminological inexactitude forever cursed us into a different meaning and that we should just acquiesce lest we look ridiculous.

Pradyumna January 31, 2013 2:30 AM

@Clive Robinson,
Thanks for the insight into this. I now realize the problem with it.

Re: “So you need a codec that not only compresses the audio very well, it also breaks the statistics up quite well. Having done that you need to pick an appropriate cipher mode that not only deals with the audio, it has minimal delay and is very robust to the sorts of data transmission error you get on the Internet…”

Richard Birenheide January 31, 2013 2:43 AM

@Clive Robinson,
Re: encrypting voices:
If I get your statement correctly the problem is that voice doesn’t carry enough entropy. Couldn’t this be countered by:
1. Compressing the voice
2. Generating a one time pad with the length of the voice snippet
3. XORing both
4. Transmit both encoded voice and one time pad over an encrypted line (such as https)
Or is this not feasible for reasons I don’t see (eg. require more bandwidth)?
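
As a rough illustration of steps 1 to 3, here is a Python sketch using only the standard library. The "compressed voice" is a placeholder byte string rather than real codec output, and the function name is made up for the example.

```python
import secrets

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

compressed = b"stand-in for a compressed voice snippet"  # step 1 (placeholder)
pad = secrets.token_bytes(len(compressed))               # step 2
masked = xor_bytes(compressed, pad)                      # step 3

# The receiver needs both pieces to undo the masking.
recovered = xor_bytes(masked, pad)
print(recovered == compressed)                # True
print(len(masked) + len(pad), len(compressed))
```

Because the pad is as long as the snippet, sending both in step 4 carries twice as many bytes as the compressed voice alone, which bears on the bandwidth question.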

Pradyumna January 31, 2013 3:07 AM

@Richard Birenheide,
The problem seems to be with encrypted transmission of real-time voice where there are dual constraints of delay and bandwidth, not with encrypting stored voices such as a piece of recorded music.

You can have either a bandwidth-efficient, good quality plaintext voice stream, OR a bandwidth-inefficient, good quality encrypted voice stream, OR a bandwidth-efficient, poor quality encrypted voice stream. But you can’t have a bandwidth-efficient, good quality encrypted voice stream.

Clive Robinson January 31, 2013 4:04 AM

@ Ari, Scott Ferguson, Nick P,

All gentle jesting about word meanings aside, on to the technical aspects, where in the case of ITSec every word does matter, including those squirreled away behind generalisations and assumptions.

Ari you said,

Secondly, many of the issues with redundancy in voice communications creating opportunities to recover the key can be solved by applying good compression before encrypting

I was not talking about “opportunities to recover the key”; with a good crypto algorithm that should not be possible except by chance or through other failings in the implementation (one such being poor prime selection in PK certs).

I was talking about the ability to recover intelligence directly from the cryptotext because of the incorrect choice/use of codecs and crypto modes, which is an altogether different form of attack, and one not much talked about these days on the assumption that it’s “old news” belonging to the days of “Substitution Ciphers”.

Now I realise from your subsequent text you were not actually talking about “key recovery” but “intelligence recovery”, but I suspect other readers will ignore the mistake or take it as read depending on their experience.

As Bruce has found in the past over AES key issues assumptions about who’s reading can be the equivalent of target practice at your foot. As critics pull you up more for what you don’t say than for what you do 🙁

So I apologise if I sound like I’m being pedantic (but I quite like my toes 😉

But with voice encryption “The devil is in the details” rather more than it is with other forms of encryption and it’s incredibly easy to miss a step in the process, and that could be disastrous for security.

As Scott has indicated,

I also recall reading some research done by Google into the ability to search and index encrypted VOIP…

There are some very real and actually quite simple attacks that can be performed against voice encryption systems if not properly implemented. Which allow speakers to be identified, the language spoken and the type of content of the conversation.

The first problem is as I noted the soken word it’s self, for something that requires 144,000 bits per second raw digitized data rate to convey information at around 10bits per second [1] that’s one heck of an amount of redundancy.

Whilst the data rate terms abound in communications and computing [2] in information theory we tend to only talk about the information content in abstract and use the fundimental data element of a bit. This was due in the main to the work of Claude Shannon carried out around WWII. Shannon partialy defined and charecterised what we now understand as a communications channel [3] and came up with the term entropy [4] for the redundancy in information in the channel which he borrowed from thermodynamics.

Most codecs seek to reduce the redundancy and thus get the required bit rate down. However other codecs actually try to match the characteristics of the voice to the channel which, although it usually gives compression, can give a vastly improved error rate or reduced latency etc. Thus care has to be used in codec selection, or other mitigating techniques used.

One such mitigation used in the early days was adding synthetic noise to the audio signal to fill out quiet spots and make the speech envelope less apparent. This has a significant number of problems so is not usually used these days.

The major reason for this these days is that tailored synthetic noise can actually be used to significantly reduce the resulting bit rate from a codec. In essence this is what the CELP based codec algorithms do, so adding further synthetic noise at their input is likely to degrade their performance.

Another mitigation is data expansion by whitening the codec data output. This has two benefits: firstly it makes many attacks more difficult, and secondly it spreads the energy in the channel, which is usually desirable. The obvious downside is the data rate goes up. This can be easily achieved by a simple 3-to-4 bit expansion, that is you take 3 bits of codec output and add a fourth bit that is unpredictable to an eavesdropper. In essence it’s a little like adding bits from a stream cipher generator to pad out the data to make it more random. At the other end the receiver knows which bits to strip out so does not actually need to know what the bit values are.
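As a rough illustration of the 3-to-4 bit expansion, here is a minimal Python sketch. The function names are made up for the example, the codec output is modelled as a plain list of bits, and the filler bits come from the operating system's random source, which is an assumption rather than anything the comment specifies.

```python
import secrets

def expand_3_to_4(bits):
    """Insert one unpredictable filler bit after every 3 codec bits."""
    if len(bits) % 3:
        raise ValueError("this sketch expects a multiple of 3 bits")
    out = []
    for i in range(0, len(bits), 3):
        out.extend(bits[i:i + 3])
        out.append(secrets.randbits(1))  # value is irrelevant to the receiver
    return out

def strip_4_to_3(bits):
    """Drop every fourth bit; only the positions need to be known."""
    return [b for i, b in enumerate(bits) if i % 4 != 3]
```

The receiver recovers the codec bits purely from position, so no whitening stream has to be shared, at the cost of a data rate one third higher.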

However the same effect can be achieved without expanding the data simply by XORing the data with the whitening stream. It does however make synchronisation harder and requires the receiver to know the whitening stream. This is in effect a stream cipher operation, for which many stream ciphers are available. However as it is done prior to the main encryption it does not need to be a secure cipher, therefore a simple LFSR of say 8 bits can be used.
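A minimal sketch of XOR whitening with an 8-bit LFSR might look like the following. The tap positions (8, 6, 5, 4) are taken from commonly published maximal-length tables and the seed is arbitrary; both are assumptions for the example, and as noted above this is not a secure cipher on its own.

```python
def lfsr8_stream(seed, nbytes):
    """Return nbytes from an 8-bit Fibonacci LFSR with taps 8, 6, 5, 4."""
    if not 1 <= seed <= 0xFF:
        raise ValueError("seed must be a non-zero 8-bit value")
    state = seed
    out = bytearray()
    for _ in range(nbytes):
        byte = 0
        for _ in range(8):
            byte = (byte << 1) | (state & 1)          # emit the low bit
            fb = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
            state = (state >> 1) | (fb << 7)          # shift in the feedback bit
        out.append(byte)
    return bytes(out)

def whiten(data, seed=0xA5):
    """XOR data with the LFSR stream; applying it twice restores the input."""
    stream = lfsr8_stream(seed, len(data))
    return bytes(d ^ s for d, s in zip(data, stream))
```

Calling `whiten` on a run of all-zero codec output gives a non-constant byte pattern, and calling it again with the same seed recovers the zeros, which is why both ends have to stay in step on the stream position.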

But there is another issue with codecs not yet mentioned, and that is the quality of their data output versus the bit rate.

A codec can be either fixed rate or variable rate. Variable rate codecs produce better reduction in redundancy but have the disadvantage of having their output rate correlated with the audio input in some way. Whilst fixed rate codecs don’t correlate their bit rate to the audio, their data output is quite clearly correlated with the audio input. In effect a fixed rate codec produces a predictable pattern of data where, say, it produces all-zero data output for times when the speaker is not voicing or uttering. Whilst the correlation is usually not that simple the statistics are, so the speech envelope can be easily recovered if the wrong cipher mode is used.
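To make the cipher mode point concrete, here is a toy Python sketch. It does not use a real block cipher: a truncated HMAC stands in for any deterministic per-block transform (the property ECB has), and a counter-derived keystream stands in for a mode where equal inputs encrypt differently. The frame contents and key are invented for the example.

```python
import hashlib
import hmac

BLOCK = 16  # bytes per codec frame and per cipher block (assumed equal here)

def ecb_like(key, frames):
    """Deterministic per-block transform: equal inputs give equal outputs."""
    return [hmac.new(key, f, hashlib.sha256).digest()[:BLOCK] for f in frames]

def ctr_like(key, frames):
    """Per-block keystream from a counter: equal inputs give different outputs."""
    out = []
    for i, f in enumerate(frames):
        ks = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()[:BLOCK]
        out.append(bytes(a ^ b for a, b in zip(f, ks)))
    return out

def envelope(ciphertext_frames):
    """Eavesdropper's view: '.' where the most common ciphertext block repeats."""
    common = max(set(ciphertext_frames), key=ciphertext_frames.count)
    return "".join("." if c == common else "#" for c in ciphertext_frames)

key = b"demo key, not for real use"
silence = bytes(BLOCK)
speech = [bytes([i + 1]) * BLOCK for i in range(4)]  # four distinct voiced frames
frames = [silence, silence, speech[0], speech[1],
          silence, speech[2], speech[3], silence]
```

With these inputs `envelope(ecb_like(key, frames))` gives `..##.##.`, which is exactly the talk/pause pattern, whereas the counter-based variant produces no repeated blocks for the eavesdropper to line up.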

Of the two it is technically easier to use a fixed rate codec and whitening. However variable bit rate produces less data overall and usually does not require whitening, providing the packet packing is done correctly.

There are two basic ways to put a variable bit rate output into data packets. The hardest to do is concatenation, but this results in variable rate data packets, which is not good. The easiest generally is “bit stuffing”: although this does provide fixed rate data packets, it has the side effect of increasing the overall data rate and, to a small degree, adding extra latency.

Thus in realtime systems where latency is important variable rate encoders are best given a miss.
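The cost of stuffing can be seen in a small Python sketch. The packet size, the one-byte length prefix and the function names are all assumptions made for the example; the filler is random rather than constant so that the packet does not carry known data at fixed positions.

```python
import secrets

PACKET_LEN = 32  # bytes on the wire per packet (assumed)

def stuff(frame, packet_len=PACKET_LEN):
    """Pad one variable-length codec frame to a fixed-size packet."""
    if len(frame) > packet_len - 1:
        raise ValueError("frame too large for packet")
    filler = secrets.token_bytes(packet_len - 1 - len(frame))
    return bytes([len(frame)]) + frame + filler  # length byte + frame + filler

def unstuff(packet):
    """Recover the codec frame using the length byte."""
    return packet[1:1 + packet[0]]

def expansion(frames, packet_len=PACKET_LEN):
    """Ratio of bytes sent to codec bytes carried."""
    return len(frames) * packet_len / sum(len(f) for f in frames)
```

For three frames of 5, 20 and 0 bytes the sender puts 96 bytes on the wire to carry 25, which is the data rate penalty described above.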

Which brings me onto the points Nick P made specifically,

3. Link encryptor should be a dedicated, very hardened device.

5. The Black IPSec packets should be fixed length and each node constantly sends packets to the other at fixed speed. These defeat covert channels.

There are two assumptions in there which can kill the security of a system stone dead.

Firstly, whilst point 5 is correct you need to be aware of the differences between “data stuffing”, “packet stuffing” and “link stuffing” and where and how you apply them. Get it wrong, especially with the wrong encryption type and mode, and you will allow the eavesdropper to recover intelligence.

Further, as the stuffing process adds redundancy, if badly done it gives known data at fixed points of the transmission data stream. This gives the possibility of performing “bit flipping” attacks against stream ciphers and other types of attack against block ciphers, especially if it coincides with the block width. At the very least it’s easy to see how it would halve the brute force search time for one known bit and by a power of two for each successive known bit.
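The bit flipping case can be sketched in a few lines of Python. The "cipher" here is a toy (data XORed with SHAKE-256 output) standing in for any stream cipher used without an integrity check, and the packet layout with four zero stuffing bytes at a fixed offset is invented for the example.

```python
import hashlib

def keystream_xor(key, nonce, data):
    """Toy stream cipher: XOR with SHAKE-256 output. Not a vetted cipher."""
    ks = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

def flip_known_field(ciphertext, offset, known, wanted):
    """Rewrite a field whose plaintext is known, without knowing the key."""
    if len(known) != len(wanted):
        raise ValueError("known and wanted must be the same length")
    patched = bytearray(ciphertext)
    for i, (k, w) in enumerate(zip(known, wanted)):
        patched[offset + i] ^= k ^ w  # cancels the known byte, inserts the new one
    return bytes(patched)
```

If the plaintext is `b"VOICEDATA"` followed by four zero bytes of stuffing, an attacker who knows only the layout can turn the stuffing into any four bytes of their choosing. An authenticated mode would detect the change, which is why the integrity protection matters as much as the encryption.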

Second, point 3 is correct but insufficiently stated. Thus the assumption that many people will make about the “Link Encryptor” is that it is actually the point to point or application level encryption. It’s not, it’s in addition to the application level encryption. Further, having got it wrong they will use something like a crypto library to make the application encryption or link encryption or both themselves. Almost invariably they will use it incorrectly. The operation of the application encryption and the modes used are specific to voice encryption, likewise the operation of link encryption is specific to link encryption, and the two are usually quite different.

One thing that often goes wrong is the developer will find that using a block cipher can add considerable latency to the audio especially if not used in ECB mode. I don’t have the space to do the subject justice (you need a large book) so all I will say is there are many trade offs with cipher systems and many gotchas, both of which will wreck your security margins. So you really need to know how your whole system works from the microphone onwards before you select operation type and mode for the various encryption stages (of which you should have three: codec data whitening, application encryption and link encryption).

Then not mentioned so far is what do you do on an error…

My simple advice is fail hard and fail long prior to link encryption, to stop system transparency issues opening up side channels, and have fast self resyncing modes on the link encryption. But… one of the problems with self resyncing cipher modes is they can be abused effectively in ways that reveal intelligence.

And there is the Press To Talk (PTT) dilemma: when using half duplex communications users talk in turn. Each time they “key up” to start talking the cipher systems have to resync. It is very very easy to get this badly wrong, which is why “open mic full duplex” often appears very attractive to designers. I would urge them to actually work out how to do PTT correctly, as this then makes error recovery more secure etc, plus it is kinder on what is the VPN created by the link encryptor, and also allows the transmission of other data such as “whiteboard data” as well.

Voice encryption, like many real time encryption systems, can be very difficult to get your head around. There are all sorts of “Gotchas” waiting like “the croc in the swamp”. There are whole books that have been written on the subject and likewise for just small parts of the subject. Much that is in the public domain about VOIP and Mobile communications goes out of the window when you start talking about Secure Voice Comms, which means there is a whole body of knowledge that is right for one domain (VOIP) and wrong for another (SecCom, VPN) but no easy way to distinguish which bits are transferable and which bits are not…

My advice is, as given by Nick P’s point, use as many off the shelf parts as you can in a strongly layered approach. That way you have less development work. But make sure you configure things correctly at each layer for what you are doing. That is, with VoIP pick the right codec for encryption not transmission, whiten it correctly, and then encrypt it correctly. You can do the whitening and encryption as a push in software tunnel similar in idea to using stunnel on a loop back interface. Finally ensure that your VPN link encryption is correctly configured to link stuff to prevent the problems Nick P identified in his points list. And as always test thoroughly, then test again when you break each bit in turn to find out the behaviour under fault… Oh and most importantly keep the user out of the loop, they will break things in more ways than you can imagine, such as forgetting to switch from red to green mode and using the wrong keys etc etc etc.

[1] It is difficult to define the amount of redundancy in speech as it varies depending on what aspect you are trying to determine (actual words, accent, identity, mode of speaking and thoughtfulness etc). For just the words we know from analysis of copious quantities of written text that English conveys approx 1.4 bits per letter. Further we know the average word length for simplified “plain speak” is five letters per word, which is usually spoken at a maximum of four words per second, more normally half this in ordinary conversation, and about 1.33 words per second where clarity of meaning is being conveyed. Thus 1.4 x 5 x 1.33 = 9.3 b/s to 1.4 x 5 x 4 = 28 b/s, giving the approximation used for effective military radio telephony of 10 bit/sec.
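The arithmetic in [1] can be reproduced with a short Python sketch. The constants are the figures quoted in this comment; the names and the idea of printing a raw-to-information ratio are additions for illustration only.

```python
BITS_PER_LETTER = 1.4    # figure quoted for written English
LETTERS_PER_WORD = 5     # simplified "plain speak"
RAW_RATE = 144_000       # raw digitized bit rate quoted in the main text

def info_rate(words_per_second):
    """Estimated information rate of speech in bits per second."""
    return BITS_PER_LETTER * LETTERS_PER_WORD * words_per_second

for label, wps in [("clear speech", 1.33), ("conversation", 2.0), ("maximum", 4.0)]:
    rate = info_rate(wps)
    print(f"{label}: {rate:.1f} bit/s, raw/information ratio {RAW_RATE / rate:,.0f}:1")
```

Whichever speaking rate is used, the raw stream carries thousands of times more bits than the words themselves need.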

[2] In communications we frequently talk about the number of sent symbols per second (baud), which is then multiplied out by the number of bits of information per symbol to give the “bit rate” in bits per second (bps or b/sec). However in information theory we generally only talk about bits of information, whilst in computing we usually use Bytes and confusingly talk about data transmission in Bytes per second (Bps or B/sec). Comms engineers frequently tend to be quite lax about the usage of baud, bps, b/ps, Bps and B/ps, and this has rubbed off backwards with the “specmanship” of modem manufacturers in the 1990s.

[3] Shannon’s original channel consisted of a transmission source and a receiver for the transmission. In the channel was a noise source which added uncertainty to the transmission, and the channel also had a bandwidth which would, by the process of Inter Symbol Interference (ISI), limit the capacity of the channel to a theoretical maximum. However by exploiting the redundancy in the information being sent it is possible to compress data and thus appear to break the Shannon channel limit, which in fact you are not [4]. Since Shannon’s original channel other attributes have been added, such as delay, susceptibility and emissivity, to account for other issues such as nonlinear channel characteristics and the actions of third parties with access to the channel in some form.

[4] Shannon defined redundancy in the channel as being the difference between the maximum channel capacity to carry bits of information and the actual information sent from the transmitter to the receiver. Thus a communications protocol might have start bits, stop bits and parity bits which do not actually convey useful information over and above the seven data bits, so although 10 bits are in the channel only 7 bits of information are transferred. However in information theory the usual meaning of redundancy gets turned on its head and becomes a measure of possibility, thus the more redundancy in a system the greater the possibility it has to convey useful meaning. Shannon thus chose to call it entropy not redundancy to try and distinguish the possibility meaning from the more normal waste meaning of redundancy.
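For readers who want numbers, the framing example in [4] can be worked through in Python. The sketch uses the usual textbook definitions, which are an assumption about notation rather than something stated in this comment: entropy H is the average information per symbol in bits, and relative redundancy is one minus the ratio of information carried to capacity used.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def redundancy(info_bits, channel_bits):
    """Fraction of the channel bits that carry no information."""
    return 1 - info_bits / channel_bits

framing = redundancy(7, 10)          # 7 data bits inside a 10-bit frame
fair_bit = entropy_bits([0.5, 0.5])  # a fully unpredictable bit
biased_bit = entropy_bits([0.9, 0.1])
```

Here `framing` comes out at 0.3, meaning three bits in every ten are overhead, and a heavily biased source such as `biased_bit` carries well under one bit of information per bit sent.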

Random Cryptographer January 31, 2013 9:42 AM

Hi everybody.

Sorry if I’m stupid, but can’t you simply add a fixed-rate link encryptor?

That’s it, create a FPGA/simple processor (separate from the main computer) that takes the packet stream from an insecure p2p phone program and, every k milliseconds, AES-GCMs (which can be done very quickly and with low latency on a FPGA) the last n bits and sends them over the insecure channel, padding if fewer than n bits were transmitted. On the other end it unAES-GCMs the stream and sends it over to the application. Of course if the stream is too variable-rate there is a glitch-bandwidth tradeoff, so we need a bounded-rate codec for the insecure program. Added latency shouldn’t be too much because GCM is quite fast and parallelizable.
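The framing half of that idea can be sketched in software. The Python class below is hypothetical: the name, the one-byte length header and the `seal` hook are assumptions for the example, and the hook is where AES-GCM would be applied in a real design. Here it defaults to doing nothing so the sketch needs no crypto library.

```python
import secrets

class FixedRateFramer:
    """Each tick emits exactly n bytes: queued data first, random padding after."""

    def __init__(self, n, seal=lambda frame: frame):
        if not 2 <= n <= 256:
            raise ValueError("n must fit a one-byte length header")
        self.n = n
        self.seal = seal             # hypothetical hook for the real encryption step
        self.backlog = bytearray()   # data from the untrusted phone program

    def feed(self, data):
        self.backlog += data

    def tick(self):
        """Call every k milliseconds, whether or not any data arrived."""
        take = min(len(self.backlog), self.n - 1)
        chunk = bytes(self.backlog[:take])
        del self.backlog[:take]
        padding = secrets.token_bytes(self.n - 1 - take)
        return self.seal(bytes([take]) + chunk + padding)

def unframe(frame):
    """Receiver side, after decryption: strip the header and padding."""
    return frame[1:1 + frame[0]]
```

If the program feeds more than `n - 1` bytes per tick the backlog grows and audio is delayed, which is the glitch-bandwidth tradeoff mentioned above and the reason for wanting a bounded-rate codec.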

Nick P January 31, 2013 11:58 AM

@ Random Cryptographer

See my post for a simple design to shortcut the whole issue and Clive’s comments on it. He noted that you can implement it at the software layer via tunneling, although it has increased risk. Alternatively, license the Cryptophone stack and use it. They have a free Windows client available too if you want to study the implementation. Just don’t steal it.

@ Clive Robinson on secure VOIP

“Further, as the stuffing process adds redundancy, if badly done it gives known data at fixed points of the transmission data stream. This gives the possibility of performing “bit flipping” attacks against stream ciphers and other types of attack against block ciphers, especially if it coincides with the block width. At the very least it’s easy to see how it would halve the brute force search time for one known bit and by a power of two for each successive known bit.”

I honestly don’t know what effect that might have. IPSec has integrity protection. Hopefully, the attackers raise enough alarms to shut the channel down in time.

“Thus the assumption that many people will make about the “Link Encryptor” is that it is actually the point to point or application level encryption. It’s not, it’s in addition to the application level encryption.”

Actually, my baseline design was totally link encryption. The encryptors themselves were essentially the communication nodes (think like phones). They would be plugged in, turned on, told who to call through a trusted path, and establish a connection safe for real-time voice. On client side, scripts would automate this so it’s just mouse clicks or simple commands for the user. The setup happened transparently and an expert (theirs or me) configured how.

I can’t see a realistic attack on this w/out bypassing the trusted endpoints or cracking the crypto. Both were unlikely. Now, if all requirements could still be met, I’m all for doing both the app- and link-level protection. However, my actual design made the encryptors the point of trust for simplicity. The built-in authentication adds to its security, but that’s not key here. An additional benefit of the link-level choice is that the user picks their own VOIP client and it’s totally untrusted.

“One thing that often goes wrong is the developer will find that using a block cipher can add considerable latency to the audio especially if not used in ECB mode. ”

Maybe in the old days or for microcontrollers. Cryptophone’s people found that the voice compression was what affected performance the most. The crypto didn’t cause much performance problem even though they used two ciphers in parallel. I mean, realistically we are encrypting around 128-256Kbps of data on processors that do MB/s or have acceleration. It’s not a problem. Now, we need compression acceleration.

“And there is the Press To Talk (PTT) dilemma: when using half duplex communications users talk in turn. Each time they “key up” to start talking the cipher systems have to resync. It is very very easy to get this badly wrong, which is why “open mic full duplex” often appears very attractive to designers. I would urge them to actually work out how to do PTT correctly, as this then makes error recovery more secure etc, plus it is kinder on what is the VPN created by the link encryptor, and also allows the transmission of other data such as “whiteboard data” as well.”

Definitely. Rekeying isn’t as big a deal to me. I see few to no attacks happening because of rekeying so I do it less often. The PTT problem for voice security is quite real. I have two options on that one:

  1. The fixed length, fixed timing to eliminate it totally with performance and bandwidth side-effect.
  2. Fixed length and transmit only when someone is speaking. This tells the enemy when people are speaking. The proper stuffing hides the specifics of what they’re saying entirely or mostly. They might piece things together. The user should know the risk fully before agreeing to this. Plus, it shouldn’t be the default or it will REMAIN default. 😉

“Voice encryption, like many real time encryption systems, can be very difficult to get your head around. There are all sorts of “Gotchas” waiting like “the croc in the swamp”.”

Indeed. I still haven’t figured all of it out. That’s why I cheated and tunneled the traffic at app or link level. 😉

“My advice is, as given by Nick P’s point, use as many off the shelf parts as you can in a strongly layered approach. That way you have less development work. But make sure you configure things correctly at each layer for what you are doing. That is, with VoIP pick the right codec for encryption not transmission, whiten it correctly, and then encrypt it correctly. You can do the whitening and encryption as a push in software tunnel similar in idea to using stunnel on a loop back interface. Finally ensure that your VPN link encryption is correctly configured to link stuff to prevent the problems Nick P identified in his points list. And as always test thoroughly, then test again when you break each bit in turn to find out the behaviour under fault… Oh and most importantly keep the user out of the loop, they will break things in more ways than you can imagine, such as forgetting to switch from red to green mode and using the wrong keys etc etc etc.”

Concluded very well, Clive.

Zev January 31, 2013 1:36 PM

More security suggestions.

The most important rule for communications security is to be invisible. In an article regarding anti-computer-forensics the author said “Make it hard for them to find you and impossible for them to prove they found you” and I think that’s a perfect explanation of what your goal should be.

For voice communications you can get many SIP accounts free and anonymously and use a new one for every call if you want to. Always use VPNs, free wi-fi, and free VoIP software with encryption, and always change your MAC address and all other identifying information. Always use an anti forensics strategy to negatively affect the existence, amount, and quality of information to make the analysis and examination of that information difficult or impossible to conduct. Make all VoIP calls of private information completely anonymous. Don’t mention any identifying information, and if you are concerned about voice identification you can also use a voice changer or text reader. All of this sounds criminal, but it is not a crime YET to have a private communication with friends. It is terrible that we now need to take these types of precautions to prevent the govt and criminals from spying on us. It has become apparent that many governments are now recording all of our digital communications and will probably archive that information for many years.

For maximum available security it is best to communicate really secret information in writing, and securely encrypt it and send it in a way that would make it impossible for anyone to prove that you wrote the document or that you sent it.

I suggest that you encrypt the private letter first with open PGP, and then put that document in a BestCrypt encrypted folder. BestCrypt can be encrypted and decrypted with PGP keys or with a passphrase. Most important BestCrypt enables you to remove the header from the encrypted folder. As you know, it should be impossible for an attacker to decrypt an encrypted file without having the file header.

Then send the BestCrypt folder through one anonymous method. Never use email. Also put the header from the BestCrypt folder into a folder and encrypt it with a PGP zip, and send it through another anonymous method. If you are using a passphrase instead of a PGP key for the BestCrypt folder, then put it into another letter and encrypt it with PGP and send it through another anonymous method.

Then it’s extremely important to always securely delete all PGP keys and passphrases after using them one time. If you don’t have the PGP private key or passphrase, no one can get it. Also securely delete the BestCrypt folder and the copy of the detached header.

If you are using your computer to work with really private information you should never use your computer hard drive or any other persistent memory device.

I think that it is best to use a live Linux DVD-R to boot and run your computer. It is also best to remove your computer hard drive to prevent any temporary files being placed on your hard drive. I know that Linux can be programmed to not create a swap space on the hard drive, but I would never trust that this was completely secure. As you guys probably know, it is very time consuming to search a hard drive for hidden data, temporary files and traces, and deleted logs and to securely delete it. I would not trust free space wipers to securely delete all hidden files.

If you are worried about someone kicking your door open and grabbing your computer, then remove the computer hard drive and the battery. If someone kicks your door open, quickly unplug the computer. If your computer is using DDR3 RAM all data in the RAM should be securely deleted very quickly after the power is cut. Certainly usable data would be deleted.

If you must use your hard drive for very private information, then read a book on computer forensics and forget about sleeping. Change time and date stamps before creating files, and always encrypt and then change the file extensions. Never have a “.pgp,” “.gpg,” or “.tc” file extension showing on your computer. A forensic technician probably would not spend much time searching your hard drive by hand; they would use software instead. But just in case someone personally looks at your hard drive, also hide secret files deep inside something like a system file or application file. But always realize that if secret data is on your computer it will be possible for a skilled forensic technician to find it.

It is easy to obtain offshore data storage free and anonymously today. Consider Mega. Don’t trust their encryption, use your own open PGP and BestCrypt which you encrypt on your computer.

It is best to use the live Linux DVD Ubuntu Privacy Remix to create PGP keys, and to encrypt and decrypt files.

Good luck, stay private, and stay safe.

Alex January 31, 2013 7:39 PM

“We have no choice but to trust Microsoft”

I don’t know that I’ve seen scarier words uttered on this Blog in some time.

Johann Gevers February 1, 2013 10:45 PM

@ Clive Robinson and Nick P:

Good comments, and agreed that the dynamics of centralized and decentralized systems are of course more complex than is apparent from my post. Yes, there are advantages and disadvantages on both sides, so it’s a trade-off, and the best solution in any given context will almost always have both centralized and decentralized elements.

Nevertheless, I think my point stands that many of the issues we see today are the result of excessively centralized systems. Compare, for example, the central planning command-and-control model for a country’s economy and society vs the decentralized spontaneous order model. Or central banks vs Bitcoin. Or the trend away from rigid hierarchies towards loose horizontal networks of self-managing individuals in organizational design.

Clive, the phrase I used to describe Taleb’s position is my simplified (perhaps too simplified) “nutshell” of his arguments—not a literal quote. (I shouldn’t have used quote marks.) His position is more nuanced, as you’ll see if you read his book.

genericanonymousname March 1, 2013 2:16 PM

all i want to say is your encryption is cracked
your uncracked encryption is flagged
anything you cant say on a open public service like skype wont matter, voice recognition is all around us and logs every word we say and think to a point, social networks monitor our every behavior, we are tracked to FEET anywhere in the us via our personal transponders, cameras always on protecting us, you have all seen what is available to US at this point, imagine whats available to those who made the tech we use, the ones controlling their designs…
our life is monitored and it is for the better… those of us who do not fit in are weeded out until a utopia is formed

genericanonymousname March 1, 2013 2:19 PM

also i want to make note, simply by the way i wrote that they know who i am, my ip, my mac address(every modem and network end point has 1) and innumerable other ways, i am in no way anonymous

