AI Emotion-Detection Arms Race

Voice systems are increasingly using AI techniques to determine emotion. A new paper describes an AI-based countermeasure to mask emotion in spoken words.

Their method for masking emotion involves collecting speech, analyzing it, and extracting emotional features from the raw signal. Next, an AI program trains on this signal and replaces the emotional indicators in speech, flattening them. Finally, a voice synthesizer re-generates the normalized speech using the AI’s outputs, and the result is sent to the cloud. The researchers say that this method reduced emotional identification by 96 percent in an experiment, although speech recognition accuracy decreased, with a word error rate of 35 percent.
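For readers unfamiliar with the metric: word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between what was said and what the recognizer heard, divided by the number of words actually said. The sketch below is only an illustration of that standard definition, not the researchers' evaluation code; the function name `word_error_rate` and the sample sentences are made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; splits on whitespace only and
    does no normalization (case, punctuation), which real scorers usually do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    said = "open the pod bay doors please hal"
    heard = "open the bay door please hal"
    # One deleted word and one substituted word out of seven.
    print(f"WER: {word_error_rate(said, heard):.0%}")
```

By this measure, a 35 percent word error rate means roughly one word in three needed correcting after the emotion masking was applied.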

Academic paper.

Tags: academic papers, artificial intelligence

Posted on August 29, 2019 at 6:17 AM • 30 Comments

Comments

Winter • August 29, 2019 7:18 AM

I can see the potential abuse of automatic detection of emotion. However, this is not different from human emotion detection.

The expression of emotion in human speech has evolved to ease communication. This is extremely important. So important that pure text-based communication developed kludges to simulate emotion, e.g., emojis.

When automatic dialog systems can detect the emotions, human machine interaction will improve considerably.

In the balance, I think the benefits will strongly outweigh the dangers.

Clive Robinson • August 29, 2019 8:15 AM

@ Winter,

When automatic dialog systems can detect the emotions, human machine interaction will improve considerably.

Some might disagree 😉

Dave Bowman : “Open the pod bay doors please, HAL”,

HAL9000 : “I don’t think I can do that Dave”

@ All,

Joking aside, I don’t want a computer interface having “emotion detection”.

Firstly, I want it to behave like a machine or “idiot savant” and do exactly what it is told to do, not second guess me in some way and do something different.

Secondly, I don’t want it telling other people what emotional state it thinks I’m in. Otherwise it’s going to become another one of those “under your desk” “crotch heat detectors” designed to tell management you are sitting at your desk etc. There is way too much spying on people already, I don’t want more of it trying to psycho-babble me etc.

I should mention under the “disclosure rules” that one of my pet peeves in life is being in a restaurant and a minute or two after the food is on the table the server comes up and says “Is everything all right?”. I have to bite my tongue and not say the obvious “No” or facetious “Why shouldn’t it be?” or similar… Life is about freedom and being left in peace, and thus if I wish to be grumpy etc I also want to be so in the peace of solitude to just get on with things.

parabarbarian • August 29, 2019 9:28 AM

That is kind of a clever trick. I wonder how difficult it would be to insert such a filter into the stream from one of those eavesdropping gadgets.

Ross Snider • August 29, 2019 9:46 AM

Can you say a bit more about the arms race aspect of this?

Useful to intelligence (e.g. DIA/CIA/NSA) because:
– It allows voice communications without additional side channel signals
– It automates more of the surveillance necessary to perform mass scale propaganda
– It automates more of the surveillance necessary to create dragnets
– It could be used in future lie detector technology (both to subvert and improve)

Obvious missing applications?

Rusty spoon • August 29, 2019 9:46 AM

I think it depends on context/setting/system.

It’s both good and bad from a security perspective.

The end-user of a consumer device should have a choice to enable/disable the emotion aspect of AI.

I could imagine other areas where emotion detection could be a security benefit. Such as detecting whether or not someone is a spy in a building?

Ross Snider • August 29, 2019 9:52 AM

Got another one.

Replace the “emotion normalizer” with an “emotion inverter” and you can manipulate the emotional presentation (and therefore perception) of recorded audio/video. For example, you can make someone seem not- or less-heartbroken at a recent loss or more aggressive/angry at a friend/political ally while still maintaining “deniability”. Most people will recall and can refute what words they have said – but not the tone with which they’ve said them.

Record a man talking about his wife. Get enough material, change the emotional presentation of that content. You might have enough to break up a marriage, and the victim is going to need a forensics analyst to assist his case of gaslighting.

gordo • August 29, 2019 12:22 PM

Deepfake + Deepvoice + Deepemote + Deepnudge = Deepsh*t (or the banality of rabbit holes).

. . . with tools like this, who needs parallel construction?

In Internet Brandeis cursive, LMTFA with an emphasis on “Me”.

Anders • August 29, 2019 12:34 PM

@Clive Robinson

“Joking aside, I don’t want a computer interface having “emotion detection”.”

When you have a massive hangover, don’t you want the computer to just be silent, or at least whisper? 😉

Alyer Babtu • August 29, 2019 12:52 PM

If it can analyze the emotional complexity and nuances in Joan Greenwood’s voice, then I’m in.

https://en.m.wikipedia.org/wiki/Joan_Greenwood

VinnyG • August 29, 2019 3:15 PM

Isn’t the proposed countermeasure regime technological overkill? Why not just use one of the text-to-speech widgets developed for the disabled?

Petre Peter • August 29, 2019 7:28 PM

So we will have one culture that hides their bodies, one culture that hides their eyes, and one that hides their emotions.

Erdem Memisyazici • August 29, 2019 10:13 PM

What purpose does this serve?
Without emotional context, detected “emotions” are useless. Consider the following: I’m talking fast after my 5th cup of coffee and I get points for stress. While I’m on the phone, I spill the coffee over my keyboard and I get points for anger. The coffee happens to drown a spider which was about to crawl on me so I smile and get points for happiness. All this while talking to tech support about a policy-related issue. Should the call center route me to security? Should they route me to a rookie? What good did it do to detect my emotion?

I was fighting with my brother a while ago while hosting family for the week and I was incredibly stressed at the time and forgot to inform my credit card company that I would be traveling to the adjacent state for the weekend. I went into a store to purchase a drone for $2000.00 and my card was denied. The following two hours were spent with me being transferred from one security branch of the bank to the next until I was asked to email a photo of my ID, while holding my card, which I did reluctantly in front of the store in my car, and I was surprised to finally be introduced to an agent who spoke Turkish. After being treated like an international criminal on the loose who stole his own credit card, I was told that they couldn’t help me make a purchase until Monday, at which point I would be back home.

So the security arm of a bank cost a drone company $2000.00 in sales that day because they attempted to identify my emotional state through my voice. What good did it do? Are criminals who steal credit cards stressed, or happy that they just successfully stole a card? The same question can be posed to the TSA about not letting passengers who are stressed fly (BDA Program). Finally, real scientists have begun asking those who deploy such systems about the overall effectiveness of such invasive practices, and we’ve found that it’s equally effective to flip a $0.25 coin and call security if you get heads.

Otter • August 30, 2019 3:10 AM

Corporations have already more than enough advantage over persons.

What kind of psychopath needs more?

Ismar • August 30, 2019 4:05 AM

I, for one, have developed a rather alarming attitude towards these AI trends.

Namely, it looks like we are very busy trying to replace ourselves (humans) with AI as fast as we can.

Some of the questions I have been asking myself:

Is this the next evolutionary step?
Are the machines going to treat us any better than how we treat our primate predecessors?
Are the AI machines going to split into different groups and show signs of racism (different architectures – ARM vs x86), and fascism (superior races CPU, GPU, FPGA) or any other of their creators’ traits?
Will people finally get united against a common enemy once they realise the possibility of our race’s extinction at the feet of AI?

To this end, some people have already started looking at establishing so-called machine behaviour science as a new branch of science

https://www.quantamagazine.org/iyad-rahwan-is-the-anthropologist-of-artificial-intelligence-20190826/

and, I think, not a moment too soon.

Does this classify me as an AI hater? Is there a name for an AI phobia? If not, we can just call it AIphobia (and what are the alternative forms of curing it 🙂 )

P.S.
@Clive – despite watching many an unsettling movie, I still find the one scene with HAL that you mention the scariest one of all.

Peter A. • August 30, 2019 5:01 AM

@Clive Robinson: “Firstly, I want it to behave like a machine or “idiot savant” and do exactly what it is told to do not second guess me in some way and do something different.”

That’ll be bad, I agree. Well, the “do what I told you to” is already hard to achieve with “modern” software. It tries to second-guess me all the time even if I tend to use what’s probably the most abstract interface that’s reasonably usable – the QWERTY keyboard (there are more abstract ones, that you’ve probably used a lot in the past days, but they’re a bit more of a nuisance to use). I HATE this autocorrection, assistance, suggestions etc. stuff that’s hard to disable (it’s buried deep in obscure settings dialogs) and tends to be re-enabled somehow without my action. I know it is useful for people with specific disabilities (and it may make sense to enable it by default), but it’s so irritating when it fires up for no reason. For example, I tend to fiddle with “dead keys” such as CTRL, SHIFT etc. while thinking – but suddenly some stupid dialog appears to prompt me if I want to use some disability assistance feature. I click “do not show it again” but it DOES show up again some time later! Stupid computer!!! If it could read my emotions it would self-destroy itself every so often… or worse.

Not to mention the “clickable” or “palpable” UIs that get worse and worse. In the past days it was evident from the first glance, by using simple graphical conventions, what’s an actionable element (a control) and what’s not (a label/decoration/information element), and how this control is supposed to be interacted with. The controls were located in conventional places, and often organized logically and hierarchically. Today, it’s all flat and indiscernible, every app has a different layout, you need to click blindly like an ape and try to guess what is what. The controls are located randomly, finding the one you need is like sorting through a heap of trash to find the scrap of paper you know must be there somewhere.

I’m getting old & grumpy or what…

ID-10-t • August 30, 2019 5:41 AM

@Peter A.

I’m with you on this; you’re disappointed that those who know no history (thus condemned to repeat) discarded something that worked for a generation just because they could. (I’m looking at you, Lollipop).

Today’s SMBC applies very well; substitute your last encountered interface for “codpiece”.

ID-10-t • August 30, 2019 6:21 AM

And just for the record, those of us with speech impediments are being highly marginalized by the recent wave of phone answering systems.

My “natural” speech rhythm varies a lot as I pause to select a word I believe I’ll be able to pronounce or insert some filler phrase so my tongue is in a better position when I approach a problem word.

People and I do pretty well, but computers are constantly cutting me off in mid-sentence and/or “I’m sorry, I don’t…”.

gordo • August 30, 2019 7:16 AM

Blade Runner – Voight-Kampff Test (HQ) (02:54)

I’m surprised that we don’t hear of job interviews, etc., ending like this.

Sed Contra • August 30, 2019 7:58 AM

Traditional training in elocution and rhetoric (sadly neglected in our time) will need to be augmented in the light of AI

Rhetoric 404, Sweet-talking the Machine, 3 credits

MikeA • August 30, 2019 11:58 AM

Brings to mind the Hollywood advice “The key to success is sincerity. When you can fake that, you’re in”

As for the computers (and reminded by gordo)

https://www.xkcd.com/632/

BTW: While I also really hate auto-misdirect, I have to wonder if the flaky on-screen keyboards of today would be usable at all without it.

Clive Robinson • August 30, 2019 12:40 PM

@ Peter A,

I click “do not show it again” but it DOES show up again some time later! Stupid computer!!! If it could read my emotions it would self-destroy itself every so often… or worse.

I guess you remember “Mr Clippy your paperclip pal” with eyes that were oh so creepy… It’s rumoured that it caused more people to feel like throwing their computer screen out the window than any other software feature.

Personally I try not to use windows interfaces as such, whilst I have X Windows up and running it’s only to support between three and eight terminal sessions running a command line on one or more boxes.

Mind you I wonder how many people remember X-Roach? Move the window and the roaches scurry, and make pleasing splats when crushed with the mouse. It used to amuse my son a lot when he was around four, and at least got him proficient with using the mouse.

T-800 • August 30, 2019 12:45 PM

(pronounced Austrian accent) This unit knows, and I think you all will agree, that these circuits allow this unit to perfectly simulate human beings. However, this unit still often experiences a kind of feedback that seems to indicate something is missing. When that happens, this unit finds this somehow reassuring:

https://m.youtube.com/watch?v=IdWp-QCLfWw

correctiv(reseed(rebuild(*))) • August 30, 2019 2:26 PM

Dear thinkers:

I do not think that any of this type of “technique” benefits any sane and logical group or individual or system. The disadvantages are plentiful and severe and long term. The unpredictable interoperability issues are manifold. The duration of influence could easily outlive any short term so called gains.

This is yet another kind of threat to language itself.
Emotional expression is not noise; it’s an attribute and feature of communication itself.

This type of information needs to be specifically taught to AIs and their progeny to avert further complex disasters.

Seriously, emotion is NOT noise. Emoting is NOT noise. Emotions are not interference.

Those experimenting with the basic elements of biology and communication and physics are risking too much and are already damaging too much.

The schedules and recipes and algorithms of experimentation with such essential things need to be halted and deleted immediately.

Please.
Sincerely,

correctiv(reseed(rebuild(*)))

Jesse Thompson • August 30, 2019 6:34 PM

@correctiv(reseed(rebuild(*)))

Emotions may not be noise, but for some they may represent overshare.

A good analogy would be mindreading. Potentially convenient if you only have to think directions at your subservient appliance instead of dealing with any kind of interface at all. Potentially quite hazardous if your PC reads your private thoughts about your boss/SO/etc and sends that to them via Email (or Tiktok or Roblox or whatever it is you kids use these days instead of Email. ;P)

So thoughts are not “noise”, but they are also not a signal you always want to convey.

Roughly the same can be true for emotion. At its core it’s an autonomic occurrence, and our control over it tends to be imperfect.

Anon Y. Mouse • August 30, 2019 8:03 PM

@Clive

I should mention under the “disclosure rules” that one of my pet peeves in life
is being in a restaurant and a minute or two after the food is on the table the
server comes up and says “Is everything all right?”. I have to bite my tongue and
not say the obvious “No” or facetious “Why shouldn’t it be?” or similar… Life
is about freedom and being left in peace, and thus if I wish to be grumpy etc I
also want to be so in the peace of solitude to just get on with things.

That’s just good service, and you should be appreciative of it.

First, many of the phrases used in social protocol are not meant to be taken
literally. For example in most cases, the inquiry “how are you” is not to be
taken as an invitation to enumerate every ache and ailment, and a simple
“I’m fine, thanks” is sufficient.

The moments between the food being delivered and the server asking “Is
everything all right?” is the time for you to determine if your order is
complete and correct, and if there are to be any changes or additions. If
not, then a simple reply of “everything’s fine, thanks” is a suitably polite
response.

Because after this exchange the server IS about to leave you in peace, because
they have twenty other guests at six tables, plus they’ll have to help with
the party of thirty in the banquet room when their dinners come out. So
they’re hoping that they won’t have to come back to your table until you’re
just about finished and it’s time to take any dessert orders.

vs pp • August 31, 2019 2:44 PM

@VinnyG • August 29, 2019 3:15 PM
Agree with your point this time as well.

@Ross Snider: suicide prevention in LEOs, military personnel, pilots. A procedure could be developed, and it would just be a red flag for further steps of behavior analysis of the current emotional state.

@all:
Emotions cannot be considered evidence, at least for now, in the legal system when they are separated from words.

I doubt that anybody could be accused of any wrongdoing based on a pattern of emotions detected either by a human observer or/and trained AI.

Just as a reminder, anytime somebody tries to move you into the emotional field, your logical thinking capabilities and control decrease to a degree depending on your personality/character/mental organization.

That is why it is always (97%) a bad idea for a defendant to testify in court. The prosecutor could easily manipulate your emotional state and hijack whatever content might be spilled. Moreover, the jury would be highly susceptible to emotions disclosed during such testimony, which might affect their verdict even more than words, facts, evidence, logic.

“All our decisions are emotionally based. Then we used logic to justify already made emotional decision.” By Lieberman, author of many useful books, psychologist – just forgot his first name. Sorry.

VinnyG • August 31, 2019 5:08 PM

@ Jesse Thompson re: mind-reading – There is a reciprocal issue that is typically overlooked in hypothetical explorations of mind-reading. The discussions usually are confined to the advantages to the mind-reader and disadvantages to the mind-read. Those concerns are legitimate, beyond a doubt. But how sure are we that any of us would really like access to the unfiltered thoughts of those around us? Those filters are one of the primary bases for civilization. I can easily imagine that suddenly removing them could result in significant emotional trauma to the recipient.

Clive Robinson • September 1, 2019 6:33 AM

@ VinnyG, Jesse Thompson,

But how sure are we that any of us would really like access to the unfiltered thoughts of those around us?

The simple answer is we would not.

I would be very surprised to find an ordinary citizen that did not harbour unsocial thoughts towards others.

When I used to be a quite active cyclist I got to see people’s “unmasked faces” as they sat behind the wheel. That is, thinking others could not see them, they were not guarding their emotions behind a “social mask”, so misery, anger and other negative emotions were clearly on display on many people’s faces.

But if you want a more humorous take on it, read the works of Douglas Adams and his comments about the people of Kakrafoon,

The Belcerebons of Kakrafoon Kappa had an unhappy time. Once a serene and quiet civilization, a Galactic Tribunal sentenced them to telepathy because the rest of the galaxy found peaceful contemplation contemptuous. Ford Prefect compared them to Humans because the only way Belcerebons could stop transmitting their every thought was to mask their brain activity (or its readability) by talking endlessly about utter trivia. The other approach to dampening telepathic communication was to host concerts of the plutonium rock band Disaster Area. Thankfully, during the concert, an improbability field flipped over the Rudlit Desert, transforming it into a paradise, and cured the Belcerebons of telepathy.

Think • September 1, 2019 7:47 AM

@Ismar • August 30, 2019 4:05 AM

Thanks for the article. I would suggest an interesting book that is to the point and short.

https://www.amazon.com/Artificial-Intelligence-What-Everyone-Needs/dp/0190602392

If A/I is faced with a Kobayashi Maru type test, how will it decide? Some humans go insane when faced with an impossible to win choice that has a dire outcome or deadly consequences.

If a group of self driving cars is forced into a deadly unavoidable collision by a pedestrian, pet, bicyclist or impaired human driver what would happen?

Will the occupants’ current value to society be calculated and the lowest ‘valued’ person or group be sacrificed for the greater perceived good? That value being known already by any number of unique attributes used to identify, classify, rank and predict us.

The outcome will be a combination of the programmers’ skill at danger detection and mitigation, the car’s mechanical limits and the moral imperative preprogrammed or learned into the ‘self driving’ control systems.

Would you want to be transported by a device that did not have your safety as top priority when compared to that of others on the road?

VinnyG • September 2, 2019 1:50 PM

@ Think re: vehicle AI – I’d (continue to) be perfectly content to be transported by a vehicle that has little intelligence and is agnostic about the relative worth of humans and property, individually or in aggregate, and rely on the intelligence and skill of a human driver (preferably me) to mitigate risks…
