Video Conferencing Apps Sometimes Ignore the Mute Button

New research: “Are You Really Muted?: A Privacy Analysis of Mute Buttons in Video Conferencing Apps”:

Abstract: In the post-pandemic era, video conferencing apps (VCAs) have converted previously private spaces — bedrooms, living rooms, and kitchens — into semi-public extensions of the office. And for the most part, users have accepted these apps in their personal space, without much thought about the permission models that govern the use of their personal data during meetings. While access to a device’s video camera is carefully controlled, little has been done to ensure the same level of privacy for accessing the microphone. In this work, we ask the question: what happens to the microphone data when a user clicks the mute button in a VCA? We first conduct a user study to analyze users’ understanding of the permission model of the mute button. Then, using runtime binary analysis tools, we trace raw audio in many popular VCAs as it traverses the app from the audio driver to the network. We find fragmented policies for dealing with microphone data among VCAs — some continuously monitor the microphone input during mute, and others do so periodically. One app transmits statistics of the audio to its telemetry servers while the app is muted. Using network traffic that we intercept en route to the telemetry server, we implement a proof-of-concept background activity classifier and demonstrate the feasibility of inferring the ongoing background activity during a meeting — cooking, cleaning, typing, etc. We achieved 81.9% macro accuracy on identifying six common background activities using intercepted outgoing telemetry packets when a user is muted.
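The abstract does not say which statistics the app sent, or how the authors' classifier works. Purely to illustrate why even coarse statistics can leak what is happening in the room, here is a minimal Python sketch. It assumes per-frame RMS level and zero-crossing rate as the "statistics" and uses a simple nearest-centroid rule. The signals, class labels, and function names are all hypothetical, and none of this is the paper's method.

```python
import math


def tone(freq_hz, amplitude, n_samples=1600, rate_hz=8000):
    """Synthetic test signal: a pure sine wave (stand-in for real mic input)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz)
            for i in range(n_samples)]


def frame_stats(samples, frame_len=160):
    """Per-frame (RMS level, zero-crossing rate): hypothetical examples of the
    coarse 'audio statistics' a muted client could compute and report."""
    stats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        stats.append((rms, crossings / (frame_len - 1)))
    return stats


def summarize(stats):
    """Average the per-frame statistics into a single feature vector."""
    n = len(stats)
    return (sum(s[0] for s in stats) / n, sum(s[1] for s in stats) / n)


def nearest_centroid(feature, centroids):
    """Label a feature vector with the closest class centroid (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(feature, centroids[label]))


if __name__ == "__main__":
    # Hypothetical classes built from synthetic signals, not real recordings.
    centroids = {
        "low_hum": summarize(frame_stats(tone(100, 0.5))),
        "high_hiss": summarize(frame_stats(tone(3000, 0.1))),
    }
    unknown = summarize(frame_stats(tone(120, 0.45)))
    print(nearest_centroid(unknown, centroids))
```

A real classifier would normalise the features, because RMS and zero-crossing rate live on different scales, and it would train on many recorded examples. The point here is only that a handful of numbers per frame can already separate sound sources.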

The paper will be presented at PETS this year.

News article.

Posted on April 29, 2022 at 9:18 AM • 18 Comments

Comments

Charles Indelicato April 29, 2022 9:31 AM

This was apparent back in ’20, when Zoom and Teams became popular meeting apps. Anyone who has seen the pop-up reminding you you’re on mute (even if you were intending to speak to someone in your office and not to the meeting) knows the apps were listening. I prefer wired mics with a physical switch, although there could always be a rogue device that overrides the mute.

Chelloveck April 29, 2022 10:17 AM

There are a few interesting use cases for continuing to sense microphone input while ostensibly muted. The obvious one is the “you’re muted” warning when you start talking while on mute. Another is detecting the ultrasonic proximity signal some conferencing equipment uses to advertise its presence.
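Here is a minimal Python sketch of the "you're muted" pattern described above. It uses a hypothetical MuteAwareMic class with made-up thresholds and is not any vendor's code. While muted, frames are not transmitted, but the microphone is still read and its level is checked locally.

```python
import math


class MuteAwareMic:
    """Hypothetical client-side audio path illustrating a 'you're muted' hint."""

    def __init__(self, threshold_db=-40.0, min_loud_frames=3):
        self.muted = False
        self.threshold_db = threshold_db        # level treated as "talking" (assumed)
        self.min_loud_frames = min_loud_frames  # debounce against single clicks
        self._loud_run = 0
        self.sent_frames = []                   # stands in for the network uplink

    @staticmethod
    def level_db(frame):
        """RMS level of a frame in dBFS (samples assumed in [-1.0, 1.0])."""
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        return 20 * math.log10(rms) if rms > 0 else float("-inf")

    def process(self, frame):
        """Handle one captured frame; return True if a warning should be shown."""
        if not self.muted:
            self.sent_frames.append(frame)
            self._loud_run = 0
            return False
        # Muted: nothing is sent, yet the microphone is still being read here.
        if self.level_db(frame) > self.threshold_db:
            self._loud_run += 1
        else:
            self._loud_run = 0
        return self._loud_run >= self.min_loud_frames


if __name__ == "__main__":
    mic = MuteAwareMic()
    mic.muted = True
    quiet, loud = [0.001] * 160, [0.1] * 160
    for frame in [quiet, loud, loud, loud]:
        print(mic.process(frame))
```

The privacy question is what happens after that level check. In this sketch the measurement never leaves process(), but nothing in the design forces that.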

But all users should be aware that this happens, and it should be something they have to consent to and enable, rather than something enabled by default and hidden under layers of obscure settings.

Software developers should also get out of the habit of routinely collecting telemetry. Hey, I get it. I’m a software engineer. I know how valuable telemetry is when debugging an issue or understanding how features are actually being used, to say nothing of its value to the marketing department. But routine collection erodes trust in your application and may inadvertently leak information about the customer. Just the fact that the application is connecting back to your servers gives third parties information the customer may not want to divulge.

JonKnowsNothing April 29, 2022 10:44 AM

@All

A recent MSM report describes how home voice-assistant devices continue to monitor conversations not just for their wake words, but also to extract information used for advertising and ad auctions. This generates ads targeted at the speaker, and also increased revenue for the device provider.

The report indicates that the companies providing these listening devices have some weaselly definitions and explanations of what they are extracting from the ambient air.

===

Search Terms

Amazon Echo interactions to target ads

John April 29, 2022 11:14 AM

@Charles Indelicato,

Opening the wire is the best.

My cellphone switches between the physical jack and its internal mike, apparently using different analog input pins?

So it is perhaps listening at the same time to two sources?

So much for opening the wire.

John

JonKnowsNothing April 29, 2022 12:13 PM

@All

re: speakers and mics

There have been a number of discussions on speakers and mics that can be found in the archives.

Long story short: speakers and mics may be one and the same.

You might disable the mic, but the speakers will work as a mic just as well.

Matrix April 29, 2022 5:42 PM

@Charles Indelicato

Spot on.

And now on Teams they’ve backdoored it with the option to “press space to unmute”. I’m always surprised that so many security experts don’t **cking do any reverse engineering on the binaries to see what’s actually going on, especially how you could exploit this new “keyboard” feature in the Microsoft ecosystem.

The so-called developed societies are now entering a new “slave” condition, in which your working rights are being efficiently mined. To use the term some on this blog apply to such entities, “guard labor” is on the rise, with upgrades and, I must say, with the so-called network effect. That is, enslavement/control is cheaper in today’s technology fabric.

aldo April 29, 2022 6:25 PM

@ Chelloveck,

But routine collection erodes trust in your application and may inadvertently leak information about the customer.

…for which there will generally be no punishment. People sometimes worry about GDPR etc., but we’ve yet to see much enforcement. Just a few flashy cases against the most obvious targets for their most egregious behavior.

If J. Random Appmaker were aware that implementing telemetry incorrectly could cost them millions of dollars—in practice—they might think twice about whether it provides sufficient benefit to offset the risk. Right now, though, that risk is an externality; if anything goes wrong, they’ll say “oops, we didn’t mean to collect that” (or transmit it unencrypted, or whatever), “we’re very sorry, we’ll do better, …”. And people on the internet will spend a day or two being angry, telling other people not to trust the company, and then the controversy will fizzle out. At worst, the FTC will fine them $0 and warn about a larger fine if they do it again.

(Remember when people got all upset about Windows automatically sending memory dumps when programs crashed, and said they’d stop using Windows if MS didn’t let them disable it? Well, MS refused to relent—at least for home users—and very few people really did stop. Just like when they’d complained about XP’s “activation” requirement in the previous decade.)

Ted April 29, 2022 9:56 PM

That is going to be a really interesting presentation.

I couldn’t quite follow every interaction analysis, i.e., how the OSes (Windows, Linux, and macOS) interact with both native and web-based video conference apps (Zoom, Slack, Teams, Webex, etc.).

But the paper definitely gives some really, really interesting insights into how these programs possibly integrate and give permissions.

It looks like Cisco’s Webex won the special attention prize 👏

To inform Cisco of our investigation results, we opened a responsible disclosure with Cisco about our findings. As of February 2022, their Webex engineering team and Privacy team are actively working on solving this issue.

All-in-all it’s certainly a little complex, and it’s great to see people care about it. There are so many reasons to want a mute button to mute 🤗

Clive Robinson April 29, 2022 10:37 PM

@ ALL,

Re : Speakers as Mics.

And “transducers” in general in information leakage side channels.

I’ve done several deep dives into this on this blog before, but here is the reduced “conference talk” version:

The first thing you need is a good knowledge of the physical laws of nature, not least Newton’s Third Law:

“Every action has an equal and opposite reaction”.

This can be generalised by replacing “action” with force/energy:

“Every force/energy has an equal and opposite force/energy”

It’s the fundamental law behind all transducers. The perhaps not immediately obvious implication is that there are two forces at work.

Think “the cat sat on the mat”: the cat has a force that pulls it towards the “local” gravitational center of the earth, but the earth also has a force that pulls it towards the gravitational center of the cat. Both forces are present in all physical systems, and they are symmetrical, which is derivable from Noether’s Theorem[1].

A DC motor is also a generator… and has to be, if you do not want the coils to burn out or the power supply to be destroyed.

Thus, as the magnetic field from the rotor winding pushes against that of the stator, the stator pushes back and causes an opposition. Without going through the other electrical and mechanical laws, engineers call the resulting generated voltage “Back Electro-Motive Force” (Back EMF). It is proportional to the speed of the motor, so it can be used in “motor speed controllers” where the load can be fairly variable, such as in cordless drills, model railways, and electric vehicles on uneven grades. The speakers in audio systems are also motors (moving-coil linear motors). In larger units, the movement of the cone can therefore be fed into a feedback circuit to make the displacement more linear over a greater range.
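For reference, here is the textbook idealised model of a brushed permanent-magnet DC motor. This is an assumption on my part, since the comment itself gives no equations. It relates back EMF to speed, and the same kind of term appears for a moving-coil speaker:

$$E_b = k_e\,\omega, \qquad V = I R + L\,\frac{dI}{dt} + k_e\,\omega, \qquad E_{\text{spk}} = B\ell\,v$$

Here ω is the shaft angular speed and k_e is the motor's back-EMF constant. R and L are the winding resistance and inductance. Bℓ is a speaker's force factor and v is its cone velocity. In steady state (dI/dt = 0), a controller can estimate speed from terminal measurements as ω ≈ (V − IR)/k_e. The speaker term is why sound that moves the cone shows up as a voltage on its drive terminals.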

The issue is that both forces appear as energy on the same terminals of the motor/speaker, simultaneously but in opposition to each other. How, then, do you separate the in-going and out-going energy signals?

Well, because they are effectively traveling in opposite directions, you use a “hybrid coupler/combiner” that separates the two energy signals. For “audio on the wire”, a circuit called a “2-wire to 4-wire hybrid/coupler” has been used in the “Plain Old Telephone Service” (POTS) for a century now. You can make one using other transducers known as “transformers”.
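In DSP terms, the hybrid's job can be sketched as follows: subtract a scaled copy of the known outgoing signal from what appears on the shared terminals, and what remains is the incoming signal. This minimal Python sketch assumes, unrealistically, that the outgoing signal leaks onto the terminals through a single unknown gain. Real hybrids and line-echo cancellers have to deal with frequency-dependent coupling, typically using adaptive filters such as LMS. The function name separate_incoming is hypothetical.

```python
import math


def separate_incoming(terminal, drive):
    """Split a shared-terminal signal into (estimated gain, incoming component).

    terminal: samples seen across the shared pair (outgoing leak + incoming)
    drive:    the known outgoing samples being played
    Assumes terminal = g * drive + incoming for one unknown constant g.
    """
    # Least-squares estimate of g: project the terminal signal onto the drive.
    g = sum(t * d for t, d in zip(terminal, drive)) / sum(d * d for d in drive)
    # Remove the outgoing component; the residue is the incoming signal.
    return g, [t - g * d for t, d in zip(terminal, drive)]


if __name__ == "__main__":
    rate = 8000
    # 0.1 s of a 440 Hz "radio" signal and a faint 150 Hz "voice" picked up.
    drive = [math.sin(2 * math.pi * 440 * n / rate) for n in range(800)]
    voice = [0.05 * math.sin(2 * math.pi * 150 * n / rate) for n in range(800)]
    terminal = [0.8 * d + v for d, v in zip(drive, voice)]
    g, recovered = separate_incoming(terminal, drive)
    print(round(g, 6), max(abs(r - v) for r, v in zip(recovered, voice)))
```

The gain estimate is only unbiased when the incoming signal is uncorrelated with the outgoing one. In this example, both tones complete whole cycles within the window, so they are exactly orthogonal and the "voice" is recovered almost perfectly.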

It has been said that the “audio bug” used in the passenger compartment of Soviet-era courtesy car radios was originally designed by Léon Theremin[2]. It is the first acknowledged use of an electrical speaker, while playing an audio signal, being used simultaneously to spy on people. A 2-wire to 4-wire hybrid separated any spoken voice or other noises in the passenger compartment from the radio's audio output signal. The extracted signal was recorded by a wire recorder in a concealed compartment.

Modern “Digital Signal Processing” (DSP) can do exactly the same task, and the required circuitry comes “built-in” in nearly all audio codecs and driver chips used in consumer computers these days. This is especially true of laptops, where it is effectively essential, because the speakers have horrible frequency and amplitude responses, due in part to their small physical size as well as to “modal resonances”. The DSP codec/driver therefore pre-distorts the audio signal going into the speaker to produce a linear air pressure response.

But the issue of transducers is far more general. A transducer is,

“A device that converts energy from one form into another form”

And that basically covers any tangible physical device, from atoms upwards, that “does work”. That is all physical systems, whether apparent or not.

Even though it may not be immediately apparent, all transducers reflect a signal backwards from the load to the source of the energy, directly proportional to the action of the load. Simply put, as the load changes, the energy required to do the work changes, and that change goes all the way back to the primary energy source, as the law of “the conservation of energy” postulated by Émilie du Châtelet requires[3].

But the important thing to remember is that the behaviour of the load also reflects the environment it is in. That is, the “total energy” involved is the vector sum of all the energy applied to it by all the forces acting upon it.

The conservation postulates give rise to various related laws, among them the “conservation of charge” law. This in turn gives rise to Gustav Kirchhoff’s First (current) law, while his Second (voltage) law follows from conservation of energy. Together they are known as “Kirchhoff’s circuit laws”, and they are fundamental to nearly all static circuit analysis. The First law is usually stated as,

“The sum of the currents/charge entering a junction/node is equal to the sum of the currents/charge leaving the node/junction”.

This is a fundamental axiom not just of electrical theory but of all static systems that do work. It is the “zero sum” of “what goes in must come out”: energy cannot “disappear”. It can, however, be “dissipated” over time as the most fundamental form of polluting waste, “heat energy”. This is the vector sum of the energy “lost due to inefficiency”, which causes molecules and atoms to vibrate.
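Written as an equation, with currents into a node counted as positive and currents out as negative, the First law is as follows. A small worked example with made-up currents follows it for illustration:

$$\sum_{k=1}^{n} I_k = 0 \quad\Longleftrightarrow\quad \sum I_{\text{in}} = \sum I_{\text{out}}$$

$$2\,\text{A} + 3\,\text{A} = 1\,\text{A} + I_x \;\Rightarrow\; I_x = 4\,\text{A}$$

So if the other branches are fixed, any change in one load's current must be supplied through some other branch. This is the charge movement that can be traced back towards the source.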

A consequence of this is that “information” that is “impressed or modulated” onto energy or matter is also not destroyed, just dissipated beyond our current ability to recover it. We call this vector sum various names, but in general it is called “noise”. It has existed at least as long as the universe (see the cosmic microwave background), and it keeps increasing through thermodynamic entropy as coherent energy is used to do work.

So at this point you have the science required to understand why such transducer duality, or bi-directionality, happens. More importantly, you can also see why the “environmental signal” “impressed on the load” travels in the opposite direction to the motivating force, right back to the energy source(s).

But now you have to ask what are the consequences of this?

Well, consider the ambient light sensor in your mobile phone, and how it affects the charge put into the display backlight. That charge comes from the battery, and the “inefficiency” of the inherent circuit resistance causes changes to the battery terminal voltage. Thanks to the HTML5 Battery Status API, a phone can report its battery state (charge level and charging/discharging time estimates) back to a web site. Your phone therefore acts inescapably as a “remote environmental sensor”. You might say “so what”, but consider further that the ambient light changes significantly when you put your phone up to your ear, or in your pocket. So the battery state can indirectly indicate whether you are using the phone in “hands-free” mode, even though there is no direct connection to the phone's audio circuits.
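The chain from ambient light to battery voltage can be written with a simple internal-resistance battery model. This is a standard simplification, not something the comment specifies, and the numbers in the worked example are hypothetical:

$$V_{\text{term}} = V_{\text{oc}} - I\,R_{\text{int}}, \qquad \Delta V_{\text{term}} = -R_{\text{int}}\,\Delta I_{\text{backlight}}$$

$$R_{\text{int}} = 0.15\,\Omega,\; \Delta I_{\text{backlight}} = 0.2\,\text{A} \;\Rightarrow\; \Delta V_{\text{term}} = -0.03\,\text{V} = -30\,\text{mV}$$

A brighter environment drives the backlight harder. That draws more current, which pulls the terminal voltage down slightly and, over time, drains the reported charge level a little faster.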

But… consider a more general case. Because of their high-efficiency energy conversion, modern switch-mode power supplies have a high reverse bandwidth for information.

This means that every electrical device you connect to the “mains power” contributes a component signal of information to the “vector sum” at the mains supply point to your house. That signal is present not just at your utility cabinet, but all the way across the supply grid to the generator. There it causes a physical change in the alternator speed, which in turn goes back through the steam turbine to the source of thermal energy that superheats the water.

Can that “information” be retrieved all the way back there?

Well, the answer is: in theory, yes; in practice, it depends on two things,

1, Shannon Channel bandwidth.
2, Signal to noise ratio in a given bandwidth.
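The two factors listed are exactly the inputs to the Shannon-Hartley theorem, C = B log₂(1 + S/N). A minimal Python sketch with hypothetical example values shows that even a channel buried below the noise (negative SNR in dB) still has nonzero capacity:

```python
import math


def db_to_linear(snr_db: float) -> float:
    """Convert a power signal-to-noise ratio from decibels to a linear ratio."""
    return 10 ** (snr_db / 10)


def shannon_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley upper bound on error-free rate, in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)


if __name__ == "__main__":
    # Hypothetical leakage channels: (bandwidth in Hz, SNR in dB).
    for bandwidth, snr_db in [(3000, 30), (1000, 0), (1000, -10)]:
        c = shannon_capacity(bandwidth, db_to_linear(snr_db))
        print(f"B={bandwidth} Hz, SNR={snr_db} dB -> C={c:.1f} bit/s")
```

At 1 kHz bandwidth and −10 dB SNR, the bound is still about 137.5 bit/s. A weak signal makes a slow channel, not a closed one.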

As previously noted, “noise” is actually the fractional value of the “total energy” charge movement, expressed as a vector sum (see KCL).

As a general rule engineers design systems so that the noise at the power input terminals of a circuit is as small as possible. They do this in two basic ways,

1, They reduce the bandwidth.
2, They reduce the effective source impedance.

The first reduces the reverse transmission of information; the second has the opposite effect. In the past, the first solution was used, but this was at best inefficient. It involved passive components with considerable “loss resistance”, which could only be reduced by increasing their physical dimensions.
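The bandwidth-reduction approach in point 1 is classically a passive low-pass filter. For a first-order RC section, the cutoff frequency is given below, worked with hypothetical component values. Since channel capacity grows with bandwidth (the Shannon-Hartley theorem, C = B log₂(1 + S/N)), lowering the cutoff also lowers how much information can leak back through the filter:

$$f_c = \frac{1}{2\pi R C}, \qquad R = 10\,\Omega,\; C = 1000\,\mu\text{F} \;\Rightarrow\; f_c = \frac{1}{2\pi \times 0.01\,\text{s}} \approx 15.9\,\text{Hz}$$

Pushing f_c lower for a given load current requires larger capacitance or inductance. That is the physical-size and loss problem of the passive approach.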

With the advent of fast semiconductors, the equivalent of the passive components could be synthesized at a “node”. But KCL still applies. The charge required to hold the node at an invariant potential under changing load conditions therefore has to be sourced from, or sunk to, another node on a branch other than the load branch.

How far back this source node is depends on whether it can store energy or not. Both inductors and capacitors store energy (capacitors as charge, inductors in their magnetic field), so they interact with impedance and thus provide a bandwidth-reducing function. This is not true of active circuits that synthesize inductors and capacitors. They simply pass the charge requirements back to a more distant node. The more efficient the synthesis, the further back the charge movement is passed, and with it any information impressed on it…

So remember: when you sit down to watch a naughty video on your computer or flat screen, their power supplies are betraying you. They pass a signal that can be used to identify the naughty video back to the utility cabinet, where the “smart meter” can see it and, if instructed to do so, send the identifying signal via a communications link to any point on earth… Likewise, your washing machine will send a signal that reflects the wash load and wash program. So will any other appliance. Even your fridge betrays not just the fact that you have opened the door, but also the movement of “thermal mass” into and out of its internal environment. Your shower reveals that you are probably in the bathroom; likewise the hot water heater in the kitchen, or the sink garbage macerator, as do “Saniflo”-type toilets…

Can you stop such information leakage?

It’s actually not easy to do, as I’ve explained in the past, but yes, you can do it to a limited extent by “isolating the smart meter node” from the work load in various ways.

You need another source of charge that, firstly, is unpredictable to the observer at the smart meter node. In the past, this meant AC-to-DC conversion feeding large charge-storage batteries, followed by a DC-to-AC converter to drive the “secure mains” circuit. This does, however, require you to know how to fit specialised filters between the AC-to-DC converter and the charge storage circuit to minimise information bandwidth. Alternatively, you can use other power generation, such as a “natural gas powered” generator or an “alternative / green” energy system such as solar or wind. Secondly, that source must have a much lower impedance, and a much higher bandwidth to its charge storage elements, than your mains supply does.

But the essential takeaway is that any energy path can carry information from one environment to another. As such, any energy path, even an open window, is a “communications channel” through which information can be monitored remotely. This includes the metal or stone construction of buildings, which can carry “mechanical / acoustic” vibrational energy; “spike mics” and similar devices were developed to exploit this.

It’s also why I coined the expression “energy gap”, which is what you actually need to establish. The old notion that “air gaps” were sufficient was very much mistaken, and it caused a great deal of “muddled thinking” and thus security leaks.

More than two decades ago, several academics twisted my arm quite a bit to present the equivalent of this as a paper at the PETS 2001 Workshop (the workshop did not happen, for various reasons). For other reasons, one of which was an ongoing dispute with a University and its “River House Rat”, I decided not to, and I stopped moving forward on the PhD… So the question I ask is, “If I had gone to PETS 2002, would it really have changed anything in the general scheme of information security?”

Honestly, probably not. But I would encourage people to take this snippet and enlarge it back into a presentation, not just for students but for software and hardware designers and, importantly, management.

Understanding that privacy is directly traceable back to the fundamental laws of nature is essential if we are to move the InfoSec industry forwards in a rational manner. InfoSec needs to be on a firm scientific foundation to stop most of the cr4p and snake oil we see in it. Just as importantly, it needs to learn from its history, not ignore it.

[1] The German mathematician Emmy Noether proved this via her so-called “First theorem”, just a little over a century ago in 1915. The theorem establishes a relationship between the symmetries of physical processes and the conservation laws; for example, “Time/Temporal translation Symmetry” (TTS) corresponds to conservation of energy. It is one of the most fundamental of physical proofs. In short, it states that “Every differentiable symmetry of the action of a physical system with conservative forces has a corresponding conservation law”.

[2] Léon Theremin was later a Professor of Physics at the Moscow State University Department of Acoustics. He also designed “The Great Seal Bug”, or “The Thing”, which was given to a US Ambassador and caused issues for a number of years in the US Embassy in Moscow. His other, perhaps more widely known, invention is the “Theremin”. This is the ethereal-sounding musical instrument whose sound, played on its close cousin the Electro-Theremin, you hear in the Beach Boys’ “Good Vibrations”.

[3] Émilie du Châtelet was a French natural philosopher and mathematician who translated Isaac Newton’s “Philosophiae Naturalis Principia Mathematica” into French. She was a lively and controversial figure in the French philosophical circles of her time, especially over her 1740 published work “Institutions de Physique”. She was also closely associated with Voltaire. Her translation of Newton’s work was published a little over half a decade after her untimely death in 1749, and it is still considered the definitive French translation. But, very importantly, she added her own postulate of an additional conservation law for total energy. This was based on what we now call the kinetic energy of an object in motion, and it led her to be the first to conceptualise energy as an independent entity and to derive its quantitative relationships to the mass and velocity of an object.

Who? April 30, 2022 11:45 AM

It is much better to use an external microphone with a physical mute button or, even better, a detachable microphone. Where that is not possible, the best you can do is use a Mic-Lock style audio input blocking device, in the hope that the videoconferencing station switches to it, believing it is an active microphone.

Never trust on software whose source code is not available. Even in this case, build the software from source instead of trusting some random binaries put online by the development team.

Clive Robinson April 30, 2022 6:58 PM

@ Who?, ALL,

Never trust on software whose source code is not available.

The Security Trust rule really is,

Do not trust what you do not control.

But what does “control” mean in the case of software…

Unfortunately, it means an intimate understanding of what the software is and how it does what it does. This applies not just at the “software layer” you happen to be working at in the computing stack, but “all the way down”. In many cases it also extends quite a way up, beyond the user level, into and past national legislation and into global geopolitics.

This is not a task that a single individual can realistically do in our current way of implementing the computing stack.

Which means that two other main issue areas come into play,

1, Human Trust.
2, Computing Stack mitigation.

Human Trust comes into play because, almost by definition, the work will involve a “Human Team”. Within the current computing stack implementation we use, team members cannot check how the team behaves, or even how they themselves behave[1]. They therefore have to fall back on Human Trust, which is inherently problematic[2] because it is a sociological issue, not a technical one. As I point out from time to time,

“Technological systems to address sociological issues will all fail at some point.”

The reason is simple,

“Society is dynamic and changes continuously and with it all sociological issues change continuously.”

We generally “don’t see it” as we are mostly creatures at a point in time[3].

But how do we mitigate the failings of “Human Trust”? By changing the rules, and thus the environment. As noted in the second point above, we need to mitigate the computing stack we currently use. Whilst it sort of works, it is now rapidly approaching complete unmanageability. Worse, it enforces very bad simplifications that reflect the failings of much human thinking rather than the reality of the way the universe works.

I’m not going to go into it in any detail but note that a study of the “Whys of Log4j” would be instructive in some respects.

Of more importance, though, is the problem of “bubbling up” attacks. Most developers work at or above even a nominally high-level language like C. Nothing a developer does at that level can stop an attack from below it in the computing stack. Whilst people talk of “formal methods”, all these really do is stop coding mistakes and some development/specification mistakes. They can do absolutely nothing about an attack that changes the values in variables, which lower-level attacks can fairly easily accomplish (look up RowHammer or other “issues with silicon” attacks).

The thing is, we do know how to limit, if not stop, such lower-in-the-stack attacks. But the way we have allowed the stack to develop has made it about as susceptible to low-level attacks as it can be, whilst also being just about the worst it can be in terms of unfathomable complexity…

Which brings us to your point,

Even in this case, build the software from source instead

It’s one of those statements I hate, because of all the assumptions behind it. It sounds good, but it is completely useless when it comes to security.

It’s only of use if you can read and understand not just the program source code, but all the libraries it uses as well. Then the person has to know intimately how to use the toolchain correctly and securely (few developers actually know this). Then there is the requirement that you be able to completely reverse engineer the binaries to spot vulnerabilities the toolchain could have added.

But it does not stop there, you also have to have a very significant knowledge of just how the silicon works as well…

Oh and all the same for the OS and kernel and linker / loader etc.

In short, so very, very few people possess all the depth and breadth needed in these skills and areas that, even in a crowd of half a billion people who can and do cut code, they are very nearly invisible.

But I’ve discussed these issues on this blog in the past, under “Castles-v-Prisons”, or as others called it, “C-v-P” or just “CvP”.

We know how to either fix or mitigate most of the computing stacks failings… The real question is,

“Will we just keep on making the problem worse or actually fix it?”

As far as the industry trajectory is concerned, “fix it” is not even close to the arc, let alone on it.

The reasons for this are many, but most have been discussed when talking about “Free Markets” and their failings, especially why they mostly go into “tail spins of death” unless subject to legislation or regulation…

[1] I won’t go into the nitty-gritty details, but an individual’s trustworthiness is based on the scope of the information available to them and their knowledge of the implications of that information. The more depth you have, generally the less breadth you have. So the scope of your understanding is reduced, even though your knowledge of the implications is rather more in depth. It is therefore more than possible for a security-defeating or weakening measure to be presented to team members in a way that makes it look like the opposite to one or more individuals. The idea of security “silos / compartmentalization” can actually help make systems “untrustworthy” in the security context, even though there is no untrustworthiness in the human context of the team. Unfortunately, most would assume that this gives Open Source a security advantage. It turns out that is not actually the case, because of disparity in knowledge[2].

[2] Information is not intrinsically good or bad it just is. How information is used is decided by a Directing Mind, that has both agency and purpose, neither of which is intrinsically good or bad. It is the observer of the system in use under a social policy that decides the good or bad. Thus we get,

1, Spying on criminals is seen by many as “good”.
2, Spying on citizens for control of people is seen by many as “bad”.

Even though they are as a process exactly the same, the observer falls into the “them and us” trap. Where “them” are implicitly bad thus acting against them is “good”, and “us” are implicitly good thus acting against us is “bad”. Which is just one of oh so many issues that Human Trust has that Security Trust does not.

[3] Even the apparent foundation stone of society, the Commandment “Thou Shalt Not Kill”, is and always has been shifting sand in reality, based on expediency at a point in time. Death was once a punishment for any social harm, such as theft, and even for accidents. Likewise, taking up arms against the social group (war), or even just being in opposition to “the leaders” (dissent seen as treason), is sufficient for death to be sanctioned, and so on. All commandments, whether written in stone or not, and all social mores, morals, and ethics are fluid. They move from bad, through acceptable, to good as society progresses. They move from good, through unacceptable, to bad, and thus become suitable for a sanction of death, when society regresses for some reason. Authoritarianism and conservatism, expressed through a power hierarchy that favours the few against the many, are frequently the underlying mechanism by which society is regressed into stagnation. In fact, you could quite simply reason that the ten commandments are all the work of the “devil”, designed to visit the greatest moral sin of conservatism on mankind and thus prevent mankind from progressing into a better state… I won’t do so, but it can be shown logically, without any need for deities or other seductive arm-waving notions of good or bad.

Ted April 30, 2022 8:36 PM

@SpaceLifeForm

That’s pretty cool. I think Catalin Cimpanu is now curating the Risky Biz newsletter. So I wonder if the paper will also be shared there?

Max May 1, 2022 3:57 PM

This is one of the reasons my headset has a physical mute switch (activated by swiping the mic boom upwards).

It works as evidenced by Teams immediately complaining that it can’t hear me (no background noise from the mic).

Eris May 5, 2022 11:08 PM

Generally speaking I think security researchers should focus on mitigation rather than explore the many clever ways to exploit an issue. They should also avoid providing PoC code. It’s like they do all the heavy lifting for malicious actors under the pretense of promoting “public awareness”.

SpaceLifeForm May 6, 2022 1:18 AM

@ Eris

They should also avoid providing PoC code.

The problem with that is, if there is no PoC code, then the closed source vendor will not fix the problem.

Clive Robinson May 6, 2022 7:48 AM

@ Eris, ALL,

Generally speaking I think security researchers should focus on mitigation

Mitigation, by definition, is not a solution to an issue but an avoidance of it.

As such it fixes nothing, it actually creates more issues by creating unnecessary complexity, thus increasing vulnerabilities.

As @SpaceLifeForm points out, suppliers of closed source software especially have a long, long history of not acknowledging, let alone fixing, the security issues their shoddy practices have created. In fact, many go into denial and reach for “defamation or similar lawyers” to force researchers into silence.

Releasing PoC code early takes that option away from such organisations.

Further, the standard method of vulnerability disclosure has taken on the weight of “Best Practice”. It protects researchers by kicking the legs out from under other “attacks by lawyer”. It also protects the consumers who have paid for the software, without them having to resort to costly and lengthy action through the courts. Such action takes so long that attackers will make hay with the defective software, and company management will “pull out assets” etc., so that there is nothing left, including employee jobs, by the time the case comes before a judge.

Mitigation is thus the worst of all possible active remediations.
