Voice Authentication in Telephone Banking

This seems like a good idea, assuming it is reliable.

The introduction of voice verification was preceded by an extensive period of testing among more than 1,450 people and 25,000 test calls. These were made using both fixed-line and mobile telephones, at all times of day and also by relatives (including six twins). Special attention was devoted to people who were suffering from colds during the test period. ABN AMRO is the first major bank in the world to introduce this technology in this way.

Posted on July 21, 2006 at 7:43 AM (42 comments)

Comments

Tom July 21, 2006 8:29 AM

I wonder if they tested recorded voices? Seems like if I could record you speaking your account number, I could access your account, no?

hggdh July 21, 2006 8:32 AM

I am not sure. Having suffered voice recognition systems for some years now, with some rather amusing (in retrospect) false positives, I tend to zero out of it as soon as I can (which nowadays means saying “operator” or equivalent and hoping the bloody system will indeed get you a warm body).

Example:
(system): what is the departing city or airport?
(me) Dallas
(system) you said Cleveland. Is this correct?
(me) no
(system) I am sorry blah blah blah. What is the departing city or airport?
(me) Dallas
(system) you said Portland. Is this correct?
(me) operator

Ed T. July 21, 2006 8:35 AM

“I tend to zero out of it as soon as I can…”

Yeah – me too. However, I prefer to channel Gordon Ramsay, as below:

Example:
(system): what is the departing city or airport?
(me) GIVE ME THE **** OPERATOR, YOU SORRY **** PIECE OF **** DONKEY!!! ****, ****, AND ****!!!

~EdT.

Paeniteo July 21, 2006 8:38 AM

I do not see any advantage. It might be good not to have to remember a PIN, but it doesn’t protect against the common phishing attacks any better than a password.

If recording doesn’t work, you could still try man-in-the-middle.

The common phishing email will change from “Please go to URL and enter all your account data to verify your account.” to “Please call number to verify your account data.”
Or direct calls from “your bank” via VoIP will be used instead of emails.

Phishing by VoIP is nothing exotic anymore:
http://en.wikipedia.org/wiki/Vishing

dbh July 21, 2006 8:54 AM

This is authentication by voice, AND voice prompt recognition. Note that it is NOT two-factor: you ‘no longer need to remember passcodes’. And account numbers (or numbers in general) are not rare in speech, so they could be recorded.

RvnPhnx July 21, 2006 8:59 AM

@Ed T.
I guess that it is a good thing that many don’t bother to record what you say until you are about to reach a human being.
@Paeniteo
It is worth noting that where I am people have already been getting calls of that nature from fake banks for years. The ones that really bother me are the college loan people who will read off all of your personal information over the phone without truly verifying that you are who you claim to be.

Grin-Mouse July 21, 2006 9:02 AM

There is apparently at least one voice verification company who has an interesting idea: when you call, you have to say your standard phrase, which they match up against the one you recorded at registration. But they then also ask you to say one word that is randomly chosen from a list of a thousand, just to make sure that the two are the same voice. Pretty much eliminates the recording attack (though not a man-in-the-middle attack, of course).

jayh July 21, 2006 9:03 AM

–I am not sure. Having suffered voice recognition systems for some years —

There is a difference between voice authentication (which can work fairly well) and speech interpretation (which really sucks)

JakeS July 21, 2006 9:21 AM

There isn’t enough detail in the press release to guess how secure it’s likely to be.  They don’t say exactly how it works, and they don’t say how hard they’ve tried to break it.

If I was ABN-AMRO, I’d deal with phishing by telling customers two rules:
(1) You can only ever use our system by calling number nnn.  No other number will ever work.  If you’re given any other number, it’s a fraud.
(2) You always must call us.  We will never, ever, ask you to authenticate yourself except on a call that you initiated.

Also, if I was ABN-AMRO, I’d publish details of an account with $10,000 in it, and say “if you can get into this account using our voice-recognition system, you can have the money (provided you tell us how you did it).”  If no-one claims the prize, after a while they could make it a million.  That would give customers a lot of confidence.

hggdh July 21, 2006 9:28 AM

@jayh

Yes, I know. What I was trying to state was my acquired distrust of systems that depend on voice, based on my experience.

I agree with you. I would expect voice authentication to be better, by quite a lot, since what they are looking for is basically wave formation & composition, as affected by a series of factors. Another advantage here would be the fact that specifications for phone paraphernalia are very standardised, meaning the response curves are quite similar.

Then there is VoIP… and, maybe, here we might see some significant changes in the speech.

A possible issue, although small: a false negative: you call in, say your say, and fail. You are, now, passed over to a warm body, that is already distrusting you a lot.

Also, another point: the bank statement says

“With the help of this technology in combination with voice recognition, the customer is first asked an open question: ‘How can we help you’? Depending on the answer, he or she is then transferred to the appropriate member of staff.”

So, there may be additional verification performed by the “appropriate member of the staff”, which would put this schema as a two-way authentication (of sorts).

Lotharster July 21, 2006 9:30 AM

>>Also, if I was ABN-AMRO, I'd publish details of an account with $10,000 in it, and say "if you can get into this account using our voice-recognition system, you can have the money (provided you tell us how you did it)."<<

A criminal would have to be really dumb to choose the prize instead of hacking dozens of other accounts and making millions.

wrc July 21, 2006 9:38 AM

Voice authentication works in bulk to prevent a non-specific attack. If an attacker is targeting you, they just need to record enough of your voice, perhaps a couple of conversations.

CdG July 21, 2006 9:39 AM

‘I wonder if they tested recorded voices?’

I heard about this system on Dutch radio yesterday: apparently it has been tested with recorded voices, and the system could distinguish between an actual voice and a recording. Not sure how they do this, though.

Xyz July 21, 2006 9:40 AM

“Seems like if I could record you speaking your account number, I could access your account, no?”

My guess is that you’d have to obtain a pretty good recording of the call itself (i.e. have the line tapped) in order to get a good enough voice recording to pass the > 100 characteristics that the system checks for. If that were the case, it seems about as secure as entering the account number via dialtones anyway: the numbers are simply being “spoken” differently.

Since the obscurity of the account number itself isn’t really an issue, it seems to me that they could get a better biometric voice profile by having the customer repeat back something dynamic like a randomly generated phrase (“I would like a ripe papaya!”) in addition to reciting or dialing the account number. That should trump any recording attempts short of constructing an entire soundboard of the target’s voice.

j July 21, 2006 10:08 AM

@Paeniteo:
Re: The common phishing email will change from “Please go to URL and enter all your account data to verify your account.” to “Please call number to verify your account data.”

I have already had phishing email like that. (On the one occasion I called the number, the automatic answering system answered generically “Thank you for calling. Please enter your account number.”)

Right.

 /J

Nathan Freeman July 21, 2006 10:11 AM

Phishing attacks by this route would be much more difficult. Not too many phishing destinations are in the same country as the legitimate service. If you’re in the US, are you likely to call a number in Russia or China or Pakistan to authenticate to your bank down the street? That’s a much clearer sanity check than DNS provides today.

If you tried to set up a fake 800 number to conduct man-in-the-middle attacks, you’d leave a helluva paper trail.

Armbrat July 21, 2006 10:25 AM

The Bank of Nova Scotia employs a similar scheme.

If I forget my telephone banking PIN (after entering in my card number), I am prompted to speak aloud my mother’s maiden name (mmn). I’ve never tried to fool it.

I guess I trust it. If someone has compromised my card and my mmn, they still have only changed my telephone banking PIN. The customer service reps still verify other information as well (3 numbers on the back of the card, my DOB, address, etc.)

Dave July 21, 2006 10:39 AM

I did this long ago – voice recognition is very different from voice authentication. Recognition tries to recognize phonemes from a wide variety of speakers. Authentication tries to recognize essentially the shape of the vocal tract, for different speakers speaking the same phonemes. If the shape match is “good enough”, you pass. It’s actually pretty good as a confirming authentication metric.

It’s a different problem altogether to recognize a speaker out of many different speakers. This suffers from the same problems as facial recognition, etc. With a sufficiently large population, you’ll ultimately find that many people are “close”.

Magnus July 21, 2006 10:55 AM

ABN-AMRO has had a good two-factor authentication in place for years and it seems they’re swapping it for a single-factor one. Is that good?

Brian July 21, 2006 11:02 AM

OK, so they’ve built a system that can recognize a person’s voice even when that person has a cold. Given that level of false negatives, what happens to the level of false positives? What percentage of the population have voices similar enough that the system is unable to distinguish between them?

Last time I looked at fingerprint readers the machines were having trouble matching the same fingerprint twice. After you registered your initial fingerprint with the system there was a good chance you’d have to run your finger over the reader a dozen times before it would let you in again. The “solution” was to turn the sensitivity of the machine down so far that it accepted three different fingers on my hand as good matches for the original finger print.

piglet July 21, 2006 11:05 AM

I wouldn’t be comfortable with such a system. Why should I trust it? We all have had the same experience as hggdh. And I don’t see why anybody would want to do banking business by phone – I want to get a receipt.

As a rule, the more you try to minimize false negatives, the more false positives you’ll have to tolerate. Why shouldn’t this apply to voice authentication? I prefer my e-banking TFA code cards by far.
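The trade-off described in the last two comments can be illustrated with a small sketch. Everything here is hypothetical: the match scores are made up, and `error_rates` is an illustrative helper, not anything taken from the ABN AMRO system. It only shows how moving one acceptance threshold shifts errors from false rejects (false negatives) to false accepts (false positives).

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_reject_rate, false_accept_rate) at a given threshold.

    A caller is accepted when their match score is >= threshold.
    """
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    return (false_rejects / len(genuine_scores),
            false_accepts / len(impostor_scores))


# Made-up scores in [0, 1]; higher means "sounds more like the enrolled voice".
genuine = [0.91, 0.85, 0.78, 0.66, 0.59, 0.95, 0.72, 0.81]   # the real customer, colds included
impostor = [0.20, 0.35, 0.48, 0.62, 0.55, 0.30, 0.70, 0.41]  # other callers

for threshold in (0.5, 0.6, 0.7, 0.8):
    frr, far = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold:.1f}  FRR={frr:.3f}  FAR={far:.3f}")
```

With these invented numbers, lowering the threshold so that the customer with a cold always gets in also lets more impostors through, which is the concern raised above. Real score distributions would have to come from independent testing.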

roy July 21, 2006 11:06 AM

I suspect the basis for distinguishing a recorded voice from a live voice is simply fidelity. If the loudspeaker is tiny and nearly massless, the tinny sound quality will tell. A larger studio quality loudspeaker has a much better chance of passing for live, simply because of the higher fidelity.

Sean July 21, 2006 11:12 AM

Never mind banks authenticating me: what about them authenticating themselves? Every few weeks I get a call from somebody claiming to be a representative of my bank – often, but not always, with their number withheld – and the first thing they ask is for me to prove that I am actually an account holder at the bank.

I applaud their attempts to ensure that I am who they expect me to be. But any self-respecting fraudster would do this too, it seems, as part of the confidence-building scam. So I invariably ask them to prove that they are who they say they are. And of course, I invariably get the same incredulous and/or angry response from call-centre monkeys who have clearly had little or no training in how to answer this question, often resulting in me terminating the call long before we get to any kind of transaction. Wouldn’t it be nice if they spent some of the money they’re investing in authentication technologies on solving this problem too?

meters July 21, 2006 11:41 AM

If recorded voices are accepted by this scheme it will interact really well with the electric company’s “speak your meter reading” line. They know their customers’ names and bank accounts.

Alice McGregor July 21, 2006 12:16 PM

With the voice systems I’ve used (including my cellular phone and Microsoft activation), I’ve learned the technique: speak like a robot. In the case of the activation system, where you speak a silly long list of numbers, speak them as identically as possible, with a longer-than-average delay between them.

BLP July 21, 2006 1:00 PM

@roy

More simply, direct digital playback (and recording) by trunking the calls through a VOIP system.

However, I suspect that something like @Xyz’s suggestion about speaking arbitrary language is in effect.

Moshe Yudkowsky July 21, 2006 2:14 PM

Comrades, speech technology is what I do for a living. (http://www.Disaggregate.com)

First of all, I think the bank should keep their PIN requirements. Authentication is based on something you know; something you have; something you are. They’re dropping “something you know” and replacing it with “something you are.” That’s fine if speaker authentication is perfect, which it is not.

As for the “more than 100 biometric characteristics,” I say, “huh?” There may be 100 biometric characteristics that cause voices to differ from each other, but what you have is measurements of the voice frequencies and energies. The system likely tracks a dozen time-varying characteristics.

As for pre-recorded speech, we have our little tricks to distinguish a recording from a live voice, even a bit-for-bit copy over the same system. For example, in a replay attack: if the copy is perfect, then we’ve seen that same set of measurements before, eh?

Challenge-response provides a nice compromise. The system challenges you with a random four-digit number, which you repeat back. That prevents any pre-recorded responses.

I wonder how the posters who claim you can record speech and create an on-demand version of the voice intend to perform that feat. A person-specific Text-to-Speech system? They exist, but they’re expensive and still sound like Text-to-Speech.

But the upshot is that (a) speech technology continues to improve and (b) speech technology isn’t perfect, any more than humans are perfect. But sometimes speech technology does just as well as, if not better than, a human in the same task.
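The two ideas in this comment, a random challenge and spotting a perfect replay, can be sketched together. This is a minimal illustration under stated assumptions, not how any vendor does it: `new_challenge`, `ReplayGuard` and `verify_call` are hypothetical names, `measurements` stands in for whatever the verifier extracts from the audio, and the voice score and threshold are placeholders. The exact-match fingerprint only catches bit-identical replays; near-duplicates would need a similarity measure.

```python
import hashlib
import secrets


def new_challenge(num_digits=4):
    """Pick a fresh random digit string for the caller to repeat back."""
    return "".join(secrets.choice("0123456789") for _ in range(num_digits))


class ReplayGuard:
    """Remembers fingerprints of earlier utterances and flags exact repeats."""

    def __init__(self):
        self._seen = set()

    def is_replay(self, measurements):
        # Hash the measurement sequence; an identical sequence hashes identically.
        digest = hashlib.sha256(repr(tuple(measurements)).encode()).hexdigest()
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False


def verify_call(challenge, heard_digits, measurements, voice_score, guard,
                threshold=0.8):
    """Accept only if the digits match, the audio is new, and the voice matches."""
    if heard_digits != challenge:
        return False          # caller did not repeat the challenge
    if guard.is_replay(measurements):
        return False          # exactly these measurements were seen before
    return voice_score >= threshold


guard = ReplayGuard()
challenge = new_challenge()
print(verify_call(challenge, challenge, [0.12, 0.57, 0.33], 0.9, guard))
print(verify_call(challenge, challenge, [0.12, 0.57, 0.33], 0.9, guard))
```

The second call presents identical measurements, so the guard rejects it even though the digits and the voice score are fine.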

VoiceRecognition July 21, 2006 3:27 PM

@Moshe
“As for pre-recorded speech, we have our little tricks to distinguish a recording from a live voice, even a bit-for-bit copy over the same system. For example, in a replay attack: if the copy is perfect, then we’ve seen that same set of measurements before, eh?”

Does that mean it is impossible for me to speak such that the bit representation is the same for a given set of spoken elements?

Stephen Dedalus July 21, 2006 7:04 PM

@moshe
“Challenge-response provides a nice compromise. The system challenges you with a random four-digit number, which you repeat back. That prevents any pre-recorded responses.

I wonder how the posters who claim you can record speech and create an on-demand version of the voice intend to perform that feat. A person-specific Text-to-Speech system? They exist, but they’re expensive and still sound like Text-to-Speech.”

It needn’t be that complicated, you could just record multiple phone calls and/or ambient speech samples and construct a sound board, such as:
http://www.ebaumsworld.com/arnolds4.html

Even if you only managed to record three numbers with any great fidelity, you could just keep calling back until the four-digit number was composed only of the digits you had on file. Far better to use the date. Caller must say, “two one July two thousand six.” Combined with a random number, this is far better than using the account number, which is usually easier to obtain than speech samples.
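A quick worked calculation shows how often the call-back attack would get a usable challenge. It assumes each challenge digit is drawn independently and uniformly from 0 to 9, and that recorded digits can simply be spliced together; `chance_all_digits_recorded` is a hypothetical helper for illustration only.

```python
from fractions import Fraction


def chance_all_digits_recorded(recorded_digits, challenge_length=4):
    """Probability that a uniformly random digit challenge uses only recorded digits."""
    return Fraction(recorded_digits, 10) ** challenge_length


p = chance_all_digits_recorded(3)   # attacker holds recordings of 3 distinct digits
expected_calls = 1 / p              # mean of a geometric distribution
print(float(p), float(expected_calls))
```

With three recorded digits the chance per call is (3/10)^4 = 81/10000, so on average the attacker would need roughly 123 calls before a favourable challenge comes up.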

Shachar Shemesh July 22, 2006 1:58 AM

I used to be with a phone-only bank a few years back. At some point they introduced voice authentication, not as a replacement for the PIN, but as a replacement for the identification question they used to ask (what is the second letter of your best friend’s surname).

It wouldn’t work with cell phones, but maybe the years that have passed in the meanwhile have improved the technology (and, anyway, they would fall back to the identification question if the voice recognition failed).

They also did the voice recognition as part of the very beginning of the actual conversation, so that recording wouldn’t really work well. The representative says “hello, how can I help you”, and when you dump all the things you need them to do during the phone call on them, the system tries to identify you.

For me, it didn’t work particularly well, I have to admit. More often than not, I had to go through the challenge-response stage. Maybe the years have improved things.

Shachar

Stefan Wagner July 22, 2006 8:41 AM

@ roy: (sound quality of recordings):
Isn’t the fidelity of phones in general less than cheap audio equipment?

afaik, the frequencies transported via phone are very restricted.

Perhaps the reaction time of live speaking is hard to fake – we normally respond to questions before the question is finished.
If voice-recognition is widely used for a long time, we will see if it is still secure enough.

Jungsonn July 22, 2006 9:15 AM

Sounds scary to me.

With audio on telephone lines, which is usually 8-bit (distorted) and nowhere near the real original human voice, I wonder which algorithm they use to recognize it. To me this is doomed to be flawed or possibly exploited in various ways.

If my bank implemented this, I would not use or subscribe to it.

Why not just rely on the basic PIN? No one can guess or compute that if you only have 3 strikes.

Moshe Yudkowsky July 23, 2006 7:30 AM

Two replies: @VoiceRecognition, @Stephen Dedalus

@VoiceRecognition

You ask:

“Does that mean it is impossible for me to speak such that the bit representation is the same for a given set of spoken elements?”

I don’t think I quite understand the question. If you say “hello world” twice, each time the bit representation will be different. That’s in fact the entire reason that speaker verification is so difficult: each time you speak, the system sees a different bit pattern, and then the question becomes how to determine if these patterns are from the same person. The current systems use Hidden Markov Models based on theories of how the human body produces utterances.

@Stephen Dedalus

You propose a method to game the system:

“Even if you only managed to record three numbers with any great fidelity, you could just keep calling back until the four-digit number was composed only of the digits you had on file. Far better to use the date. Caller must say, “two one July two thousand six.”

The numbers in a digit stream are pronounced differently depending on whether they are in the front, middle, or end of the four-digit number. (Listen carefully to the next text-to-speech number you hear. Good ones will have a rising first digit and a falling last digit. Bad ones will sound flat all the way through, very robotic.)

Secondly, there are “co-articulation” effects that change how the numbers are pronounced, based on the preceding and succeeding numbers.

As such, what you’re proposing isn’t possible. Just having the individual digits isn’t enough; you must have the exact same sequence.

And I believe I can assure you that if someone makes a thousand attempts against an account number, waiting for a particular sequence to come up, in a reasonable scenario an alarm will go off well before the fraud is accomplished — I realize that’s not part of the speech recognition technology but an application issue, but even so I’ll make that guess.

In other words, we in the speech business aren’t blithering idiots.
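The application-level alarm guessed at above might look something like the following sketch. It is an assumption, not a description of any bank's system: `AttemptMonitor` and the limit of five are hypothetical, and any call that does not end in a successful verification (including a caller hanging up on an unwanted challenge) is counted as a failure.

```python
from collections import defaultdict


class AttemptMonitor:
    """Counts unsuccessful verification calls per account and flags heavy users."""

    def __init__(self, max_failures=5):
        self.max_failures = max_failures
        self._failures = defaultdict(int)

    def is_flagged(self, account):
        """True once the account has reached the failure limit."""
        return self._failures[account] >= self.max_failures

    def record(self, account, passed):
        """Record one call; return True if the account is (now) flagged."""
        if not passed:
            self._failures[account] += 1
        elif not self.is_flagged(account):
            # A clean success resets the count, but never clears an alarm.
            self._failures[account] = 0
        return self.is_flagged(account)


monitor = AttemptMonitor(max_failures=5)
for _ in range(5):
    flagged = monitor.record("account-123", passed=False)
print(flagged)
```

Once flagged, the account stays flagged until someone investigates, so an attacker cannot keep redialling for a convenient challenge.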

Roger July 23, 2006 11:48 PM

@Moshe Yudkowsky:

For an employer, I looked into biometrics in some detail a few years ago. My impressions at the time were:
a) most of the systems, except a few of the most expensive & intrusive ones, had really poor performance, only slightly better than toys; and
b) the industry as a whole was riddled with scammers overhyping low quality products, consequently only independent tests should be considered.

At the time, I seem to recall that voice recognition had very mediocre performance, with EERs on the order of 5%. Not as bad as face recognition (junk), but far too inaccurate for anything but the lowest security applications. Certainly it was nothing like good enough for securing access to bank accounts. At any rate, we can’t really begin discussing whether or not voice recognition for telephone banking is a good idea, without knowing the state of the art in voice recognition FAR, FRR, FTE etc. Do you have such data available? Or preferably, ROC graphs?

Maybe things have changed a lot in the last 5 years (I haven’t really been keeping up), but independently produced data is preferable.

Paeniteo July 24, 2006 3:45 AM

@Moshe (Re: Challenge-Response):
This fails in the face of a man-in-the-middle attack.

The phisher could simultaneously call the bank on a second line and simply transfer their challenge and your response to it unmodified.

No need for a sound-synthesizer, just a slight degradation in voice-quality and a slightly increased delay.

Moshe Yudkowsky July 24, 2006 8:47 AM

@Roger

Please feel free to contact me via my web site if you need consulting in this area…

That said, speech biometrics continues to improve. I have yet to see any large-scale independent data, although I did manage to dredge up a study last year. After all, who would conduct such a study and pay for it?

The semi-annual industry conference (www.SpeechTek.com) is at the beginning of August, and I’ll get a better sense of what’s happened in the past six months.

To my mind the question is how authentication is used. Authentication does not replace PINs. It’s an additional layer.

At one point my colleagues at Bell Labs put an authentication system into place in South Africa as an additional layer. We had an equal-error rate of 8% (it was years ago). We were unhappy because the EER was so high. The bank was ecstatic because we reduced fraud (perhaps not by 92%, but substantially).

Moshe Yudkowsky July 24, 2006 8:53 AM

@Paeniteo

I don’t claim that a man-in-the-middle attack is not possible.

I don’t think it’s very feasible, given the constraints of timeouts and whatnot, but I suppose it’s within the realm of possibility.

Then again, I do know of a case where someone managed to re-route telephone calls that were supposed to terminate at company X. Instead, they went to a voice-mail system somewhere — one that had fictitious individuals who posed as members of company X and ordered expensive computer equipment.

Daedala July 24, 2006 1:01 PM

The problem with the systems I’ve seen is the initial enrollment of the voiceprint. We have this at work. The first person to call on an ID gets their voice entered as an authentication credential. Yay!

Josh July 25, 2006 11:14 AM

person: “Please transfer $100 from savings to checking.”

machine: I heard you say “give all my money to charity.”

person: “No! No!”

machine: I heard you say “Yes.” Thank you.(click)

Eli July 26, 2006 9:22 AM

The issue of recorded voice was not addressed, simply because the system under test was not capable of doing so. How do I know? Take a look at the vendor’s VoiceVault website and you will find that:
“The verification speech samples recorded during the 25,000 ‘live’ calls were used to generate a further 10 million automated verification tests.” This means that recordings, not live samples, were tested. If “liveness” detection were present (it should reject every recorded, non-live sample, whether from the true speaker or an impostor), then such a test would be impossible to carry out.
This fundamental flaw invalidates the results reported.

Jungsonn July 27, 2006 3:12 PM

Well they state they can analyse the difference between a recorded and a live real human voice.

I wonder how? Because the quality of phone lines, to my knowledge, is 8-bit and heavily distorted, which almost “sounds” like a recorded voice.

Any ideas, anyone?

robert strong January 19, 2007 5:19 PM

Hi, just stopped by and wanted to know if there is any freeware or open source voice security of this type that works OK? I want to use this type of security to restrict access to our VoIP network. I’m not a computer guy, but about the weakness of voice security: I think that hackers will spend their time on easier attacks, i.e. something you know… this is security in my mind… just a thought.
thanks Robert
