Determining What Video Conference Participants Are Typing from Watching Shoulder Movements

Accuracy isn’t great, but that it can be done at all is impressive.

Murtuza Jadiwala, a computer science professor heading the research project, said his team was able to identify the contents of texts by examining body movement of the participants. Specifically, they focused on the movement of their shoulders and arms to extrapolate the actions of their fingers as they typed.

Given the widespread use of high-resolution web cams during conference calls, Jadiwala was able to record and analyze slight pixel shifts around users’ shoulders to determine if they were moving left or right, forward or backward. He then created a software program that linked the movements to a list of commonly used words. He says the “text inference framework that uses the keystrokes detected from the video … predict[s] words that were most likely typed by the target user. We then comprehensively evaluate[d] both the keystroke/typing detection and text inference frameworks using data collected from a large number of participants.”
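The word-inference step described above can be illustrated with a small sketch. This is not the researchers' code: the key coordinates, the four coarse direction labels, the scoring rule, and the tiny word list are all assumptions made for illustration. It only shows the general idea of matching a sequence of observed movement directions against candidate words.

```python
# Hypothetical sketch: rank candidate words by how well their key-to-key
# movement directions match a sequence of observed coarse directions.
# Key coordinates, direction labels, and the vocabulary are assumptions.

QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Approximate (x, y) position of each key; lower rows are staggered right.
KEY_POS = {
    ch: (col + 0.5 * row, row)
    for row, keys in enumerate(QWERTY_ROWS)
    for col, ch in enumerate(keys)
}


def direction(a: str, b: str) -> str:
    """Coarse movement label for going from key a to key b."""
    (xa, ya), (xb, yb) = KEY_POS[a], KEY_POS[b]
    dx, dy = xb - xa, yb - ya
    if dx == 0 and dy == 0:
        return "same"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"  # "down" = toward the typist


def word_signature(word: str) -> list[str]:
    """Sequence of movement labels between consecutive letters."""
    return [direction(a, b) for a, b in zip(word, word[1:])]


def rank_words(observed: list[str], vocabulary: list[str]) -> list[tuple[str, float]]:
    """Score same-length words by the fraction of matching movements."""
    scored = []
    for word in vocabulary:
        sig = word_signature(word)
        if len(sig) != len(observed):
            continue
        score = sum(o == s for o, s in zip(observed, sig)) / len(sig) if sig else 1.0
        scored.append((word, score))
    # Best score first; ties broken alphabetically.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    observed = word_signature("the")  # stand-in for directions seen on video
    for word, score in rank_words(observed, ["the", "and", "was", "him", "get"]):
        print(f"{word}: {score:.2f}")
```

With this toy model, "the" and "and" produce the same right-then-left signature and tie for first place, which hints at why accuracy depends so heavily on extra context such as a list of likely words.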

In a controlled setting, with specific chairs, keyboards and webcam, Jadiwala said he achieved an accuracy rate of 75 percent. However, in uncontrolled environments, accuracy dropped to only one out of every five words being correctly identified.

Other factors contribute to lower accuracy levels, he said, including whether long sleeve or short sleeve shirts were worn, and the length of a user’s hair. With long hair obstructing a clear view of the shoulders, accuracy plummeted.

Posted on November 4, 2020 at 10:28 AM • 16 Comments

Comments

Clive Robinson November 4, 2020 12:53 PM

In a way, this is an extension of earlier techniques that only had coarse, almost blobby images of hand movements.

Such as those from “through the wall microwave imaging”… Or THz systems where clothing and long hair in effect become invisible.

Which raises the question of “Will other imaging, such as passive imaging from WiFi access points give sufficient information?”…

Eyes November 4, 2020 1:41 PM

Or you just look at the screen itself – through the eyes of the other person, like a mirror. Sure, HD resolution of the webcam / video feed is a must.

JohnDoe November 4, 2020 4:02 PM

It’s one thing with grandma hunt-&-peck typing a couple words per minute. How well does it work with folks who touch-type >60WPM (> 5 characters per second)? Shoulders don’t move too much then. Also I expect full size keyboards versus the tiny ones on laptops will matter. And your big jumps are going to be reaching for Delete or Shift or Control or arrow-keys or tab or numbers or …

Clive Robinson November 5, 2020 12:51 AM

@ Ismar,

this is next to useless

You forgot to add “at the moment”.

We tend to say “next to useless” with all new potential surveillance technology, yet give them just a year or two and they become sufficient to be “weaponised” by those with pockets full of taxpayer money.

As our host has noted about security/privacy attack vectors in the past, they do not get worse with time…

In fact, they frequently get better rather quicker than we would like. Modern “management” thinking has alienated the workforce and demonized them as indolent, lazy and unproductive. Thus, as with Amazon, they watch employees in ways that would get you prosecuted for stalking or harassment if you did it to your neighbours, or for abuse if you did it to your family members.

You might remember the “Hot crotch detectors” a UK newspaper put in under employee desks and what that caused.

What should scare you is not that some moron in management thought it would be a good idea, but that some company had designed, manufactured and put on the market such sociopathic tools in the first place…

Likewise, do you remember “voice stress analysis”? In the UK an insurance company thought it would be a good idea for what they claimed was “fraud detection”[1]… At the time they were quite blatant about it, but as far as I can tell they no longer use it. Which makes me think the only people to make money out of it were the people marketing such products.

[1] The company concerned was strongly related to “Fred the Shred” who was accused of being rather more than a sociopath. His behaviour brought down significant criticism upon him during the banking crisis, not just from newspapers but even from a Minister of State and the then Prime Minister in the UK.

Sheila S November 5, 2020 4:02 AM

This may not as yet be very accurate – most newly invented things are not.

But we may add other environmental clues available over a live video connection:

  • the sound of the keyboard as keys are struck
  • gaze of eyes especially when looking down at keyboard
  • any dimming or otherwise on any reflective surface in the room
  • direct reflections from dark spectacles worn to prevent eye-gaze direction analysis
  • otherwise imperceptible nods of the head that can be correlated to other signifiers such as keyboard sound or shoulder movements
  • etc

Curious November 5, 2020 4:17 AM

I wonder how many people type with one finger at a time, and how many, like me, type with both hands with eyes on the screen and not the keyboard. I learned to type on a typewriter a loong time ago. It was called ‘the touch method’ iirc.

Petre Peter November 5, 2020 6:50 AM

Now I know why the communists don’t like people with long hair. They are afraid of rock music and people who conceal their shoulders while typing.

Andrew November 5, 2020 10:23 AM

@Bob Paddock

I expect they did not account for people using Dvorak or Colemak keyboard layouts.

Even if they did, I wonder how well it would work. Dvorak, for example, optimizes layout by putting the most commonly used (in English) letters on the home row, so there’s a good chance it reduces movement enough to affect accuracy.
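As a rough way to see the home-row effect, here is a minimal sketch that counts what fraction of the letters in a piece of text sit on the home row of each layout. The function name and the sample text are made up for illustration, and home-row share is only a crude proxy for arm and shoulder movement.

```python
# Hypothetical sketch: share of typed letters that fall on the home row.
# Only letter keys are counted; punctuation on the home row is ignored.

QWERTY_HOME = set("asdfghjkl")
DVORAK_HOME = set("aoeuidhtns")


def home_row_fraction(text: str, home_row: set[str]) -> float:
    """Fraction of alphabetic characters in text that are home-row keys."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in home_row for ch in letters) / len(letters)


if __name__ == "__main__":
    sample = "this is next to useless at the moment"
    print(f"QWERTY: {home_row_fraction(sample, QWERTY_HOME):.2f}")
    print(f"Dvorak: {home_row_fraction(sample, DVORAK_HOME):.2f}")
```

For the single word "the", only "h" is a home-row key on QWERTY, while all three letters are on the home row in Dvorak.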

David November 5, 2020 7:59 PM

So don’t multitask and check your bank account while in a video call.
I try to keep open programs to a minimum when I am doing anything sensitive anyway.

My concern is petrol pump PIN number keypads in full view of HD security cameras

xcv November 5, 2020 8:30 PM

In a controlled setting, with specific chairs, keyboards and webcam, Jadiwala said he achieved an accuracy rate of 75 percent. However, in uncontrolled environments, accuracy dropped to only one out of every five words being correctly identified.

Other factors contribute to lower accuracy levels, he said, including whether long sleeve or short sleeve shirts were worn, and the length of a user’s hair.

Typing accuracy? Length of hair? The emphasis on “control” as such? Maybe the length of the fingernails, too? Anyways, there’s a skinhead boss cutting hair and advertising an open position for a young pretty female secretary. Hoping to score somebody like Miss Moneypenny from the James Bond series.

It’s a sexual harassment suit, hands down, not to mention a double entendre with “hackers” versus “crackers” in an “outside” context where hackers give nasty haircuts, and crackers are snipers, aiming rifle shots at the head of a male (?) targeted individual, (probably not talking so openly about it if the targeted individual is female.)

JohnDoe November 5, 2020 10:32 PM

@ Sheila S:

Hmm. Now that’s an interesting thought. What about not just the sound of each key, but the time between keypresses?

Video is what? 60fps? 30fps? Maybe less. Often a lot less. But if sound is 44 kHz, you’d have pretty fine resolution there. It takes time to reach between keys. Or delays when switching hands to keep keys in order.

Sound could tell you a lot.

Add in an AI system to blend all these data sources together…

Hmm.
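To put rough numbers on the video-versus-sound timing comparison, here is a back-of-the-envelope sketch. The 44,100 Hz figure is the standard CD sample rate, assumed here as a stand-in for "44 kHz". The 5-characters-per-word convention matches the 60 WPM figure mentioned earlier in the thread. The function names are made up for illustration, and real video-call audio is usually compressed, which this ignores.

```python
# Hypothetical sketch: timing resolution of video frames vs. audio samples
# for measuring the gap between keypresses.

def time_resolution_ms(rate_hz: float) -> float:
    """Time between consecutive frames or samples, in milliseconds."""
    return 1000.0 / rate_hz


def mean_key_interval_s(wpm: float, chars_per_word: float = 5.0) -> float:
    """Average time between keypresses at a given typing speed."""
    return 60.0 / (wpm * chars_per_word)


def samples_per_interval(rate_hz: float, interval_s: float) -> float:
    """How many frames or samples fall within one inter-key interval."""
    return rate_hz * interval_s


if __name__ == "__main__":
    interval = mean_key_interval_s(60)  # 60 WPM is 5 characters per second
    for label, rate in [("30 fps video", 30), ("60 fps video", 60),
                        ("44.1 kHz audio", 44_100)]:
        print(f"{label}: {time_resolution_ms(rate):.3f} ms per step, "
              f"{samples_per_interval(rate, interval):.0f} steps per keypress gap")
```

At 60 WPM the average gap between keys is 0.2 seconds, which is only about 6 frames of 30 fps video but thousands of audio samples, so small timing differences between key pairs would be far easier to measure from sound.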

Curious November 6, 2020 4:14 AM

Btw, I just realized that when typing passwords on a keyboard, I always look at the keyboard when typing. 😐 So I don’t use the touch method when typing passwords on a keyboard.

JohnDoe November 6, 2020 5:25 PM

@ Curious:

Heh, reminds me of the fellow whose password would only work when he was sitting, but never when he was standing up. Turned out he touch-typed when sitting, but looked at the keyboard when standing. And someone had swapped his S & D keys.

