The NSA's Voice-to-Text Capabilities

New article from the Intercept based on the Snowden documents.

Tags: eavesdropping, Edward Snowden, NSA, privacy, surveillance

Posted on May 5, 2015 at 12:51 PM • 36 Comments

Comments

Andrew Wallace • May 5, 2015 1:38 PM

I remember when the Signals community pretended they couldn’t read Skype traffic.

Andrew

Andy • May 5, 2015 1:47 PM

We had voice-to-text transcription back in 2007 at the little agency I worked for. We were transcribing prison phone calls into searchable text. It was good, affordable tech 8 years ago. I can only imagine what it’s like today with the nearly limitless processing power of the NSA.

David Leppik • May 5, 2015 1:52 PM

I’d be surprised if the capability is significantly better than Google or Siri. Especially now that Apple and Google are competing for the same talent, and also have massive budgets for data centers. And I can’t imagine the NSA spends money more efficiently.

That said, there’s a saying in speech recognition: “there’s no data like more data.” And it’s hard to beat access to every phone call in the world.

Also noteworthy:

…because they’ve been tuned using actual intercepts, the specific parameters of the systems are highly classified.

In general, speech recognition systems use incomprehensible data structures (such as Hidden Markov Models), but even if you treat the code as a black box you can learn a lot from it. For example, which accents does it understand the best? What phrases does it detect most readily? (Especially when it comes to erring on the side of detection. “Ouch! I stepped on attack!”) Which voices is it best tuned for? Whom can it transcribe better, Angela Merkel or Vladimir Putin?

Archimedes • May 5, 2015 2:01 PM

Do you reckon they might release the code if we ask very nicely? The Linux community has been wishing for an efficient native speech recognition solution for quite a while.

Sasparilla • May 5, 2015 2:14 PM

Based on the choices we know U.S. intelligence leadership has made in the past (they never said no to any kind of surveillance of the U.S. citizenry/population), I think it would be safe to assume that the NSA would log (permanently, of course) the content of all U.S. phone calls as soon as it’s technically feasible (if that isn’t the case already).

@Andrew

Crazy, especially now that we know Microsoft coded in pre-encryption access for the NSA to all Skype data (presumably the NSA still has this access as Microsoft has never renounced it as far as I know).

http://www.theguardian.com/world/2013/jul/11/microsoft-nsa-collaboration-user-data

Mat2 • May 5, 2015 2:14 PM

Speech recognition isn’t a new concept in information science. It has been researched for years.
There are even open source speech recognition engines (like Julius or CMU Sphinx).

Anaximander • May 5, 2015 2:29 PM

@Archimedes

If they say no, give it a couple of months until someone leaks it.

65535 • May 5, 2015 2:31 PM

I wonder what this voice to text recording is going to do for doctors, pharmacists, psychologists, politicians and lawyers. Their conversations are getting vacuumed up with ours.

Markus Ottela • May 5, 2015 3:00 PM

Not to derail but — I’m more interested in NSA’s Text-to-Voice capability. I’d like to be sure who I’m comparing OTR/ZRTP fingerprints with – a contact or computer. This is everything I’ve seen on the subject so far.

Andrew Wallace • May 5, 2015 3:23 PM

Sasparilla,

NSA have ways to listen in on U.S. citizens on U.S. soil.

As long as you use foreign servers or foreign infrastructure, or communicate with a foreigner overseas, it is legal under their mission.

Your only defence against NSA activities is the law, applied retrospectively, if you can prove there was no foreign signal involved.

NSA and others build infrastructure to make sure there is at least one foreign hop, so that all U.S. citizens on U.S. soil are fair game.

This is of course kept very hush hush and is a sensitive subject for the agency when asked about it.

Andrew

V • May 5, 2015 3:35 PM

@ Archimedes:

Pick up your phone, dial your mom, and ask nicely.

Yes, stolen from a very old joke.

Anura • May 5, 2015 3:52 PM

@Andrew Wallace

Even if you are using entirely US servers, it’s perfectly legal for GCHQ to intercept your traffic.

Andrew Wallace • May 5, 2015 4:06 PM

Anura

If you use Twitter and other social media, you are pinging all sorts of foreign signals, which is why social media is the NSA’s best friend.

It is virtually impossible to use the internet without a foreign signal unless you know what you are doing.

NSA and others will use any tactic possible to throw a banana skin under your feet if you are trying to be careful about it.

As I have already mentioned there are mechanisms and platforms introduced to the internet to make sure you are creating a foreign signal under their legal framework.

Andrew

BoppingAround • May 5, 2015 4:17 PM

I wonder whether something like this is being used by carriers for (cough) using transcripts of your phone conversations for marketing/advertising purposes (cough, cough, cough) enhancing the user experience.

If not, when will it be?

Andrew Wallace • May 5, 2015 5:13 PM

Anura

I’m based in the United Kingdom so I tend not to mention British agencies.

NSA can create a foreign signal without asking a partner agency.

Especially if, for operational reasons, NSA don’t want the British to know what is going on.

The Bin Laden raid, for example, was kept under strict secrecy.

Andrew

Evan • May 5, 2015 5:22 PM

@Sasparilla: What’s interesting to me is not whether they can or will log all traffic, but what the legal status of it all ends up being. Part of me thinks that once the NSA has the whole system up and running, they’ll get around to securing secret legislative authorization for it, so it can be used to prosecute/persecute enemies of the state/of the intelligence community; but part of me thinks they might just skip that step. There are literally no real consequences for them for breaking laws about this stuff. What we’ve seen from the Snowden revelations is that Congress is perfectly happy to turn a blind eye to illegal activities on the part of the NSA. But even if that weren’t the case, and there was actual political pressure to stop and destroy the archive, it would be impossible to verify that any of this was actually carried out. It’s all computer hardware, so it’s easy to substitute junk for the actual switches and storage media, and since they can classify whatever they want under whatever categories they want, they can hide it in a new location that no one can check. And the worst punishment for anyone involved is that they get fired and end up lobbying or in the private sector, making 10x what they were making before.

Jarth • May 5, 2015 5:30 PM

Somehow, Lernout & Hauspie comes to mind: a once-promising Belgian company gone belly up.

Thoth • May 5, 2015 6:20 PM

@Markus Ottela
Your closest assurance is to share a secret physically, since offline shared secrets are usually unreliable (which brings us back to OTP-style keystreams on SD cards).

Also, Matthew Green, in his recent blog post debunking the front door/backdoor that the NSA and friends keep trying to introduce, pointed to recent research on automating end-user key verification called CONIKS (https://eprint.iacr.org/2014/1004.pdf), if anyone is interested in reading it and perhaps even implementing it.

Thoth • May 5, 2015 6:21 PM

@Markus Ottela
A definition of offline shared secrets, in case anyone trips over the term: sharing secrets without knowing each other’s secrets or public halves (in my own terms).

Hans • May 5, 2015 6:21 PM

From what I learned, the differential semantic approach works best. I would be interested in knowing which approach the services took. Lectures in physics or other distinct disciplines resemble each other more than some professors would like to admit. Start your n-grams, gentlemen 🙂

Andrew Wallace • May 5, 2015 6:35 PM

Thoth, do you mean this kind of sharing offline?

http://isc.sans.edu/forums/diary/Dead+Drops+Hidden+USB+Sticks+Around+the+World/19551/

Andrew

Thoth • May 5, 2015 7:30 PM

@Andrew Wallace
No, that kind is too obvious. Someone’s going to dig that stuff out or load some viruses in it.

Thoth • May 5, 2015 7:45 PM

@Markus Ottela, Andrew Wallace
What I meant is something like this over USB…

http://www.mils.com/en/technology/security-tokens/#0 (32 GB of OTP keystream)

or in a smaller form factor

http://www.mils.com/en/technology/security-tokens/#1

For the CIK feature, you can use something the military uses on their secure telephone:

http://datakey.com/products/secure-memory

This is much more assuring 😀 .
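The OTP keystream tokens above rely on the classic one-time pad: both parties hold the same random pad, and each message is XORed with pad bytes that are never used again. Below is a minimal Python sketch, assuming the pad has already been delivered physically (for example on one of these tokens). `PadSender` and `decrypt` are hypothetical names, not any vendor's interface, and the sketch deliberately omits message authentication and receiver-side tracking of used offsets.

```python
import secrets


class PadSender:
    """Hypothetical sender side of a one-time pad over a pre-shared keystream."""

    def __init__(self, pad: bytes) -> None:
        self._pad = pad
        self._offset = 0  # index of the first pad byte not yet consumed

    def encrypt(self, plaintext: bytes) -> tuple[int, bytes]:
        """XOR plaintext with fresh pad bytes; return (pad offset, ciphertext)."""
        end = self._offset + len(plaintext)
        if end > len(self._pad):
            raise ValueError("keystream exhausted: pad material must never be reused")
        key = self._pad[self._offset:end]
        # Advance the offset so these pad bytes are never handed out again.
        start, self._offset = self._offset, end
        return start, bytes(p ^ k for p, k in zip(plaintext, key))


def decrypt(pad: bytes, start: int, ciphertext: bytes) -> bytes:
    """Receiver side: XOR the ciphertext with the same slice of the shared pad."""
    end = start + len(ciphertext)
    if start < 0 or end > len(pad):
        raise ValueError("offset outside the shared pad")
    return bytes(c ^ k for c, k in zip(ciphertext, pad[start:end]))


if __name__ == "__main__":
    shared_pad = secrets.token_bytes(1024)  # stands in for the token's keystream
    sender = PadSender(shared_pad)
    offset, ct = sender.encrypt(b"meet at the usual place")
    print(decrypt(shared_pad, offset, ct))
```

The offset travels with each ciphertext so the receiver knows which slice of the pad to use; the security of the whole scheme still rests on the pad being truly random, kept secret, and never reused, which is exactly what the physical tokens are meant to help with.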

Benni • May 5, 2015 7:47 PM

Not only NSA can do that:

http://www.heise.de/ct/artikel/Die-Bayern-Belgien-Connection-284812.html

The BND had first stolen the source code of a smart database called polygon. According to its developer, this database works best together with language translation software, because it can associate words with one another and thereby extract some of the meaning of a sentence.

In order to get the translation software, the BND got the management of the language software company Lernout & Hauspie to work for a BND agent named Bodenkamp, who was later convicted of fraud.

Through massive illegal manipulation of the American stock market, the BND made Lernout & Hauspie so valuable that it could buy the American language software companies Dictaphone and Dragon.

Of course, all that soon collapsed, with thousands of people losing 8 billion dollars of shareholder money after it came out that the stock of Lernout & Hauspie had been manipulated.

But by that time, the BND had acquired the source code for the American translation software it wanted...

The developer whose database software the BND had stolen was almost ruined in the process. She has put together some facts about how the BND got its language processing software by theft and by manipulating 8 billion dollars on the stock market here:

Benni • May 5, 2015 8:30 PM

Is this just an accident?

“The first-generation tool, which made keyword-searching of vast amounts of voice content possible, was rolled out in 2004 and code-named RHINEHART.”

Rhinehart is spelled like the German name Reinhard. But the Rhine is a German river. And the BND agent who acquired the US language translation software for the BND by illegal means had his office in Bonn, which lies on the Rhine. And the agent was convicted in the year 2000...

Benni • May 5, 2015 8:46 PM

And Reinhard was the first name of the BND founder, Gehlen...

Interesting codename, “RHINEHART”: it contains the first name of the BND founder and the place where the BND assembled its language translation software, using technology from American companies that were bought with the help of a deliberately manipulated stock market...

The NSA is used to getting BND software...

http://www.spiegel.de/international/world/german-intelligence-sends-massive-amounts-of-data-to-the-nsa-a-914821.html

Perhaps they really should invent more illustrative codenames… I would suggest that they name their software simply after the real names of the developers and the organizations from which they bought their technology…..

65535 • May 5, 2015 9:34 PM

@ Benni

Your links were interesting.

It looks like there is a huge amount of money to be made in this mass spying and text translation game. Stock and bond markets run on rumor and innuendo. The temptation for manipulation must be high.

MrC • May 5, 2015 10:02 PM

@ Markus Ottela:

That feels like old news. I don’t recall where I heard or read it, but I’ve known for years that voice audio could be spoofed if one cared enough to gather sample data. The shocking recent development is that it appears they already have gathered sample data for everyone on earth through bulk wiretapping.

So, now what? It’s basically the classical key distribution problem that no one has a good solution for.

One possibility would be to upgrade from audio to video. The person seeking to share their fingerprint prints it on a sheet with some kind of plaid background, then shoots a video while twisting and warping the sheet, passing the sheet in front of their face, passing their hands in front of the sheet, and speaking the fingerprint with clear lip movement. Probably still spoofable, but more difficult to spoof. And, of course, its usefulness is limited to sharing your fingerprint with people who know you on sight. Thoughts?

Another possibility would be to heed that fellow who’s running around shouting that the blockchain is the solution for everyone’s key pinning needs. I’m somewhat skeptical about this since I suspect that the NSA could launch a 51% attack if this ever saw wide adoption. I also speculate that you don’t actually need 51% to launch a 51% attack if you control the pipes — you can identify the miners with the most processing power and prevent them from ever publishing a block you want to alter by simply dropping their traffic if they finish the block before you.

Andreq • May 6, 2015 1:42 AM

There were several references to this system in the list of NSA patents that was shown here a few months ago.
It’s the only way to index conversations: to match a word said by one person with the same word spoken in another voice. Google once tried to add automatic transcripts of YouTube clips but gave up; I’m not sure for what reason.
Once the discussion is indexed, you can easily search for a person’s name, for example, and get all conversations mentioning that person.
Nothing really new; the technology is widely available. What is not is access to private conversations :).
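Searching indexed conversations by name, as described above, is essentially an inverted index over the recognizer's output: each word maps to the set of calls whose transcript contains it, and a multi-word query intersects those sets. Below is a toy Python sketch, assuming transcripts already exist as plain text. `TranscriptIndex` and the call IDs are hypothetical, and nothing here reflects how any agency's systems actually work.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase a transcript and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TranscriptIndex:
    """Hypothetical inverted index: word -> IDs of calls whose transcript contains it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, call_id: str, transcript: str) -> None:
        for token in tokenize(transcript):
            self._postings[token].add(call_id)

    def search(self, query: str) -> set[str]:
        """Return IDs of calls whose transcript contains every word of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get avoids inserting empty entries into the defaultdict for unknown words.
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


if __name__ == "__main__":
    index = TranscriptIndex()
    index.add("call-001", "Tell Angela the meeting moved to Tuesday.")
    index.add("call-002", "Angela Merkel spoke in Berlin today.")
    index.add("call-003", "The meeting is cancelled.")
    print(sorted(index.search("angela")))          # ['call-001', 'call-002']
    print(sorted(index.search("Angela meeting")))  # ['call-001']
```

A real system would also have to cope with recognition errors, so exact-word matching like this is only the simplest possible version of the idea.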

Benni • May 6, 2015 4:49 AM

The assumption that this Rhinehart program is of German origin also makes sense if one realizes that most of the massive amounts of data the BND sends to the NSA via Bad Aibling come from Afghanistan. Apparently the BND takes something of a leading role in the surveillance business there, in exchange for Germany not having to do much fighting. But it can only fulfill this main surveillance role if the BND’s language translation systems are at least on par with those of the NSA. If I remember correctly, the Spiegel book Der NSA-Komplex says that the BND has more expertise in Arabic and African language translation.

By the way: do you know where the NSA buys its database software? No, it does NOT develop it itself; it buys the database software HANA. In this video https://www.youtube.com/watch?v=tD1Lmj3xeP4 you have General Hayden speaking at a conference of the German software manufacturer SAP. At 3:41 he says:

“Much what we do with the databases is finding targets. Thank god that you give us this ability. We have a watchlist and a targeted killing program”.

Selling its database software to the NSA is a long-term company strategy for SAP. In order to win the NSA as a customer, SAP bought the search engine company Inxight and the database company Sybase, whose main government customer is the NSA. SAP also made contracts with Palantir and Attensity. In order to sell surveillance databases from one single company, SAP founded a subsidiary, SAP NS2.

In that video, you will also see Jo Lernout, of the language software company Lernout & Hauspie. He says that many of his former employees now work for Inxight...

At 4:15 in this video, Jo Lernout admits having worked for the BND: https://www.youtube.com/watch?v=rA0TGZgK6Is He says: “all these technologies became property of my company. The technologies, language recognition and automatic translation were developed for the german secret service”

Markus Ottela • May 6, 2015 11:38 AM

@ Thoth:

I agree with you on physical key delivery. I’ll have to read Green’s blog post and the CONIKS paper; thanks for those.

The Mils security tokens place a lot of trust in the idea that interdiction doesn’t focus on such high-assurance storage devices, especially when they are bought for transmitting OTP key material. They also stand out at customs. Mils’ tokens are probably quite expensive, and yet they should be considered insecure and destroyed the minute the keys are delivered and the device is plugged into the RxM of TFC OTP. Of course, if you live in the US, obtain them from the factory store and transmit keys to a contact inside the country, they can offer more protection than standard USB memory could. However, roadside inspections by LEAs probably don’t include dumping your USB memory with portable forensic kits. As for long-term key storage in endpoints, these probably offer better protection against MSAs than standard FDE HDDs, yet I don’t think the proprietary design offers significant protection against HSAs. (Also, could you please double-check the spelling of my surname?)

@ MrC:
It is. But if generated voice was somewhat indistinguishable 16 years ago, think how much it could have been polished by now. AFAIK the leaks have yet to reveal collection of samples for this specific purpose, so I’ll take it with a pinch of salt and a bucket of healthy paranoia.

Agreed, video increases the difficulty of the attack. Jitsi should be used, as it’s AFAIK the only end-to-end encrypted video conferencing client out there.

I’m not familiar with how blockchains work. How long-term would the pinning of a fingerprint need to be? The issue is that you’ll want to avoid using one DSA key pair for too long, as exfiltration of the private keys renders the verification channel useless.

Nick P • May 6, 2015 1:25 PM

@ Markus Ottela

The tech is real: I used it in the field a few times long ago. The “voice changers” were sold in commercial catalogs for P.I.s. They looked like the mixers musicians use. The workflow is as follows: get good samples of the target’s voice; identify its key characteristics; manually tune the equipment to modify your voice to match theirs; optionally create different profiles for different moods or vocal styles of the target; match their behavioral style during the call (the equipment couldn’t); match their choice of words for common situations. The equipment literally just changes the sound of the vocals. A technician did the configuration and tuning, while the investigator did the rest during the call after lots of practice. You have to be really good with your voice anyway.

As far as the military tech goes, it likely automated matching the target’s vocals to the users’ vocals. The rest seems like it would still be manual. A technology such as IBM’s Watson could believably handle most of this. It’s only a matter of time before they combine the two technologies in a black program. The other one is undoubtedly already in use for clandestine and especially covert operations. The difficulties mean its use is probably limited.

Note: I tried to do some quick Googling to give you examples. The problem is that the market is saturated with “voice changers” that do cartoonish things or terrible impressions for fun. I had trouble finding anything professional beyond the Auto-Tune product the band Owl City uses live. I will have to try another time.

Nick P • May 6, 2015 1:30 PM

@ Markus

EDIT to add: Video is extremely difficult if rendered and hard if done with make-up. Still, one trick that helps: use a combination of angle, lighting, and signal “quality issues” to hide the fake-looking parts of the appearance. This must be used in combination with other social engineering techniques (esp. a sense of urgency). The targets mustn’t have time to think about what they’re seeing.

Thoth • May 6, 2015 6:49 PM

@Markus Ottela
Dumping the USB memory might only be feasible if they brought it back to the lab to attack the security passivation layers on the device (regardless of whether they know the device’s PIN), which is considered an HSA attack. Normal forensic dumping would simply be blocked, as most cryptographic tokens do, and the communication channels might differ from normal USB storage protocols (the device probably switches into USB smartcard mode, which has security functions).

It can only withstand up to MSA attacks, and the concept behind the Mils tokens is to have a Security/Crypto Officer (SO) distribute the tokens from a centralized store, with the tokens generating their keystreams onboard or offboard in a secure location; this is the same concept as most HSMs or military-style KMS systems. The good thing is that the bulky version has a screen and input, so you can double-verify codes against the laptop/desktop data, but that’s all the tricks it can do, whereas the more secure setup you mentioned, your TFC, would be suitable in a more or less fixed-position context (probably an HQ secure room). Most LEAs probably won’t have the capability for HSA attacks like bypassing chip security passivation layers, but that does not mean they cannot call in the big boys at NSA/GCHQ/BND …IC Warhawks… to do the job, using ion workstations to cut into the passivation and whatever other high-tech tools the IC Warhawks presumably have that the LEAs don’t. Such attacks can be expensive to carry out on a mass scale if the chip is well made without backdoors (Mils from a NATO-neutral country… hmmm…)???

Well, almost any portable token without the kind of data-flow separation and other C + P controls we usually discuss here will probably only withstand up to MSA levels, until the big boys at the IC Warhawks level come down to attack it with HSA attack vectors.

YY • May 20, 2015 6:49 AM

NUCLEON Voice Repository

It doesn’t exist without a reason. It shouldn’t take much to infer that but doing so seems to be above the capabilities of most people.

APEX VoIP Exploitation

Page four, the last bit down on the right, it spells it out but no one but me seemed to even read it back in 2014. Those three words ought to be plenty: NUCLEON Voice Repository. There’s not a shred of ambiguity there.

Or am I supposed to be smarter than everyone else? That would be even worse news.

Release all the files now with no redactions and someone like me might be able to do something. Until then the system is safer than ever: Snowden likely increased its safety, since everyone is busy wasting each other’s time grasping at convenient and obviously wrong conclusions to make themselves feel better.

Natalie RAE Lemmer • September 24, 2022 9:30 AM

I don’t understand the objection to the NSA’s collection of citizen/resident communications? I wouldn’t believe that data would be sold to brokers to manipulate the outcome of a trial, etc. I personally will grant full access to the flow of communication, and electronic and voice data to my devices, and surveillance into my personal airspace, as it is legally defined today, to the NATIONAL SECURITY AGENCY (NSA). My entry of this comment, as it stands, notwithstanding any unintentional errors, here on this speech-to-text blog is evidence and serves as my electronic signature. NRL 9/24/2022

Schneier on Security
