Using EM Waves to Detect Malware

I don’t even know what I think about this. Researchers have developed a malware detection system that uses EM waves: “Obfuscation Revealed: Leveraging Electromagnetic Signals for Obfuscated Malware Classification.”

Abstract: The Internet of Things (IoT) is constituted of devices that are exponentially growing in number and in complexity. They use numerous customized firmware and hardware, without taking into consideration security issues, which make them a target for cybercriminals, especially malware authors.

We will present a novel approach of using side channel information to identify the kinds of threats that are targeting the device. Using our approach, a malware analyst is able to obtain precise knowledge about malware type and identity, even in the presence of obfuscation techniques which may prevent static or symbolic binary analysis. We recorded 100,000 measurement traces from an IoT device infected by various in-the-wild malware samples and realistic benign activity. Our method does not require any modification on the target device. Thus, it can be deployed independently from the resources available without any overhead. Moreover, our approach has the advantage that it can hardly be detected and evaded by the malware authors. In our experiments, we were able to predict three generic malware types (and one benign class) with an accuracy of 99.82%. Even more, our results show that we are able to classify altered malware samples with unseen obfuscation techniques during the training phase, and to determine what kind of obfuscations were applied to the binary, which makes our approach particularly useful for malware analysts.

This seems impossible. It’s research, not a commercial product. But it’s fascinating if true.

Posted on January 14, 2022 at 6:13 AM • 35 Comments

Comments

John January 14, 2022 6:47 AM

Hmmm….

Seems to me to be a diversion from really figuring out what is going on!

That said, I put a wideband crystal set near an old cell phone. LOTS of RF packets all night??!!

Phone didn’t work well either. Missed calls, etc.

John

Jon January 14, 2022 7:13 AM

It’s possible.

See, the whole point of malware is to make it do something that it otherwise wouldn’t – and that activity is indeed detectable.

If it weren’t infected, it wouldn’t do that… 😉 J.

EvilKiru January 14, 2022 8:16 AM

Wouldn’t you need to compare it to an EM signature from before it got infected? And won’t that mean that if you install the (not yet available) detection software after you’ve already been infected, it wouldn’t alert you to that infection, but it might alert you to the next infection?

Merson January 14, 2022 9:16 AM

…sounds like total B.S.

The referenced Abstract is painfully vague & worthless.

This is either a joke or scam.

Arclight January 14, 2022 9:40 AM

This sounds a lot like “before and after” fingerprinting, where you would characterize how the device normally behaves and then periodically check it again. IOT devices are usually pretty specialized, so it probably would change its CPU activity and wireless transmission patterns somewhat if it’s doing anything other than the narrow function it was built for.

“Wake up every 20 minutes and send the current temperature and status of the coffee bean hopper” should be distinguishable from “scan for active directory servers and exfiltrate all of the employee information.”
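The "before and after" idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the activity measure (packets sent per wake-up), the baseline values, and the 3-sigma threshold `k` are all hypothetical, and real EM fingerprinting would use far richer features than a single number.

```python
import statistics

def is_anomalous(baseline, sample, k=3.0):
    """Return True if sample is more than k standard deviations from the baseline mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return abs(sample - mean) > k * stdev

# Hypothetical baseline: packets sent per wake-up while the device was known to be clean.
baseline = [12, 11, 13, 12, 12, 11, 13, 12]

normal_reading = is_anomalous(baseline, 12)    # typical temperature report
suspect_reading = is_anomalous(baseline, 480)  # sudden burst of network scanning
print(normal_reading, suspect_reading)
```

A narrow-purpose device makes this kind of check easier, because the baseline has very little natural variation to begin with.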

Clive Robinson January 14, 2022 9:44 AM

@ Merson,

This is either a joke or scam.

Sorry it’s neither.

It is however based on work I and others have done many years ago, and I have described it in part as part of “Castles-v-Prisons”.

@ EvilKiru,

Wouldn’t you need to compare it to an EM signature from before it got infected?

No.

The malware is in effect a program of its own, and is “deterministic”. This means it has its own signatures.

If you make a “matched filter” for the malware signatures, then the malware gets not just detected but identified, with a probability that increases with the number of the malware’s signatures used.

At some point using plain matched filters gets cumbersome, so you can replace them with something a little more flexible like “Machine Learning” that with the right training data will do the same as the matched filter set.
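For readers unfamiliar with matched filters, here is a minimal Python sketch of the idea. It is not the paper's method and not anyone's actual tool: the `template` signature, the `trace` samples, and the 0.9 threshold are invented for illustration. The code slides a known signature along a recorded trace and reports the offsets where the normalized correlation is high.

```python
import math

def normalized_correlation(window, template):
    """Correlation coefficient (-1..1) between a trace window and a template."""
    n = len(template)
    mean_w = sum(window) / n
    mean_t = sum(template) / n
    dw = [w - mean_w for w in window]
    dt = [t - mean_t for t in template]
    num = sum(a * b for a, b in zip(dw, dt))
    den = math.sqrt(sum(a * a for a in dw) * sum(b * b for b in dt))
    return num / den if den else 0.0

def matched_filter(trace, template, threshold=0.9):
    """Return (offset, score) pairs where the template matches the trace."""
    hits = []
    for i in range(len(trace) - len(template) + 1):
        score = normalized_correlation(trace[i:i + len(template)], template)
        if score >= threshold:
            hits.append((i, score))
    return hits

# Hypothetical data: a made-up "signature" buried in a made-up EM trace.
template = [0.0, 1.0, 3.0, 1.0, 0.0]
trace = [0.1, 0.2, 0.1, 0.0, 0.1, 1.1, 3.0, 0.9, 0.1, 0.2, 0.1, 0.3]

hits = matched_filter(trace, template)
print(hits)  # one hit, at offset 4
```

With a bank of such templates, each additional match raises confidence in the identification, which is where a trained classifier can take over once the bank becomes unwieldy.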

Ted January 14, 2022 10:15 AM

I’m not sure if I have access to the whole paper through the ACM digital library site.

However, this appears to be a pdf of the paper.

https://hal.archives-ouvertes.fr/hal-03374399/document

From the paper: “Our malware selection encompasses three types, which are accurately representing malware targeted on IoT devices in the wild: DDoS, ransomware, and kernel rootkits.”

Well, the paper does seem to aim towards the practical regarding IoT devices. I haven’t read it yet, though I would expect that it has some pretty interesting info.

Chelloveck January 14, 2022 10:41 AM

I’m sure I’m not the only old fart here who remembers debugging via AM radio. Early PCs had frequencies (or harmonics) in the AM band, and you could hear what they were doing by setting a radio next to it and tuning it properly. It was pretty easy to distinguish the sounds to know (in broad strokes, anyway) what the computer was up to. This sounds like a similar concept, especially given that they’re targeting IoT devices (which have slower clock rates and fewer threads/processes than modern desktop machines).

TexasDex January 14, 2022 10:41 AM

Seems absolutely possible to me. It might take a lot of EM samples to train, but malware behavior patterns could absolutely cause noticeable changes to the RF emissions of a device. Think of a crypto miner–the change in CPU usage would be pretty noticeable.

dj January 14, 2022 12:06 PM

It is a plausible premise. From the earliest days of computers techs listened to audio converted from bus signalling and to the RF emitted by systems to diagnose problems. I have done the same more recently when trying to determine the state of a system. It was a very useful diagnostic method, especially where there are no blinkenlights or other external indication of state, and it likely still is.

EvilKiru January 14, 2022 12:45 PM

I remember a physics professor putting an AM radio atop a homebrew computer and having the computer PLAY MUSIC on the radio with no physical connection between the two devices, back in 1976.

MikeA January 14, 2022 1:01 PM

@EvilKiru

I can back that date up a bit. Music via RFI was common by the early 1960s.
We had a more practical use for it at one installation, where the characteristic sound of the idle loop was the signal to put on our parkas and go into the machine room to start the next job.

Some have suggested that the original reason a transistor radio was tucked into the IBM 1403 printer (for Group Captain Lionel Mandrake to find) was such “debugging by ear”.

Z.Lozinski January 14, 2022 2:19 PM

It may sound far-fetched, but consider Multi-Static Primary Surveillance Radar, so using TV (and other signals in the ether) to detect aircraft. Research program around 2015 by the UK National Air Traffic Service.

If you know what you are listening to, there is a lot of information in the electromagnetic spectrum.

If the Great Seal bug worked, why shouldn’t you be able to do this?

At the end of the day, it’s all Shannon’s Theorem. How much information is a device leaking into the EM environment?

Security Sam January 14, 2022 5:50 PM

Articulate obfuscations
With dubious foundations
Triggering conversations
With doubtful impressions

Authors have low citations January 15, 2022 1:06 AM

It’s hard to tell the paper’s 4 authors’ credibility based on their citations. The citation counts of the main author and the 3 co-authors are very low at 0, 0, 2, and 39.

Clive Robinson January 15, 2022 7:54 AM

@Moderator:

I posted a response to “Authors have low citations”

It was accepted by the blog.

Yet it’s still not displaying either on this page or in 100 Comments.

So I re-posted it and got,

Duplicate comment detected; it looks as though you’ve already said that!

Twice now.

There is obviously something seriously up with the core display software.

Scott Fenton January 15, 2022 7:58 AM

I can see how this may work in theory since resource utilization should generate a specific fingerprint, e.g. encryption operations from ransomware would have specific resource utilization profiles… but I’m curious how that can be distinguished from, say, encrypting the drive intentionally with some other software that uses the same encryption algorithm. For this to work they would need to be able to detect with sufficient resolution all of the other operations that the ransomware performs and have that fingerprint significantly different from other activities or combination of activities. Computers are highly deterministic but I’m surprised that this works as well as they claim it does.

If I look at some ransomware that runs some overhead/persistence code and then encryption with algorithm xyz then you would need to be able to fingerprint the overhead/persistence code to a resolution sufficient to be beyond contaminating signals like thermal noise or the sum effect of other normal operation. Is that code really so unique, even when folks rely on a finite number of coding paradigms or even duplicate specific code so readily?

Interested enough to read more, thank you for highlighting!

Clive Robinson January 15, 2022 9:14 AM

@ Scott Fenton,

For this to work they would need to be able to detect with sufficient resolution all of the other operations that the ransomware performs and have that fingerprint significantly different from other activities or combination of activities.

Let me introduce you to the notion of “probabilistic security”. At its simplest it says you cannot 100% know things are secure or insecure, –there actually is mathematical proof to say why– so it’s fairly pointless striving to do so.

But it also tells you something else, which is to guide you from low probability to higher probability. One such way is to “turn up the sampling rate” which in this case gets you more data.

In the world of real human fingerprints, the prints are too complex to meaningfully describe as a whole. So getting on for two centuries ago they started breaking them down into a list of easily recognisable features like loops and whorls.

This enabled fingerprints to be not just catalogued but indexed. So if the print had X a’s, Y b’s, etc. you could pull out a small subset of fingerprints that had the same number of features. You could then perform a more detailed examination and reduce things down to say the ten most likely matches. This information as a “list of suspects” would be treated to other methods of investigation.

The system is not perfect, the wrong people have been found guilty on an incorrect identification, but overall it’s a fairly reliable –though not perfect– tool.

In essence that is what this software tool does and it is probabilistic in two ways,

1, Does it have the malware in its catalog.
2, Does the malware have sufficient distinguishing features.

But there is one type of malware, which though not codeless relies on inbuilt code in the OS or App. Obviously these are “shared features” but that does not stop this system functioning, because of,

1, Order : in which the signatures appear.
2, Time : between the signatures.
3, Duration : of signature.

These provide three extra search space dimensions. The third relates to “loops” in a program: you may expect a given loop to have a maximum or minimum number of iterations in standard execution. Call them “low water” and “high water” marks, anything outside them indicates unexpected behaviour, which should be investigated further.
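A rough Python sketch of these dimensions follows. Everything in it is hypothetical: the event names, the timestamps, the loop names, and the water-mark limits are invented for illustration and do not come from any real tool. The first function derives order, gaps, and durations from a list of detected signature events; the second flags loops whose iteration counts fall outside their low and high water marks.

```python
def timing_features(events):
    """events: list of (name, start, end). Returns (order, gaps, durations)."""
    events = sorted(events, key=lambda e: e[1])
    order = [name for name, _, _ in events]
    durations = [end - start for _, start, end in events]
    gaps = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return order, gaps, durations

def check_loop_counts(observed, limits):
    """Flag loops whose iteration count falls outside [low, high]."""
    alerts = []
    for loop_name, count in observed.items():
        low, high = limits[loop_name]
        if not (low <= count <= high):
            alerts.append((loop_name, count))
    return alerts

# Hypothetical signature detections: (name, start tick, end tick).
events = [("B", 50, 70), ("A", 0, 10), ("C", 75, 200)]
order, gaps, durations = timing_features(events)
print(order, gaps, durations)

# Hypothetical low/high water marks and observed iteration counts.
limits = {"read_sensor": (1, 4), "send_report": (1, 2)}
observed = {"read_sensor": 3, "send_report": 250}
alerts = check_loop_counts(observed, limits)
print(alerts)
```

Even when the individual signatures are shared with legitimate code, an unusual order, gap, or duration can still stand out.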

But there is another signature that can be used,

4, Locality of code.

Where code is, or more correctly how local it is to current code, does affect the EM signature via the memory subsystem and caching. Put simply on average the further away the next instruction is from the current instruction the longer it takes to access it. A compiler can localise code but any malware that uses it cannot, it causes significant disturbance.

It’s these signatures that build a fingerprint of the malware, and whilst over a decade of independent research has told me there are ways to reduce them they are neither easy nor perfect.

Clive Robinson January 15, 2022 10:10 AM

@ Moderator, ResearcherZero, JonKnowsNothing, SpaceLifeForm,

I have tried rewording my post to “Authors have low citations” as @ResearcherZero and others have suggested currently and in the past.

And I’ve not added any “naughty words”, “links”, or similar that have been “assumed to be” responsible for a post getting “held for moderation”

There has been a significant change in the way the blog software works this year, and many people are noticing.

In the past some of us “debugged” the behaviours, but that has undesirable side effects for other users. So doing the same again would be done with some reluctance, but it’s looking strongly like it is necessary.

MikeA January 15, 2022 11:35 AM

@Robert Russel
“I thought that was a plot contrivance.”

Well, Mandrake needed to have some way to discover the plot (at least two meanings), but the specific “radio somewhat hidden in machine room” feels like a shout-out to whoever was “plausibility/continuity” adviser for that scene.

If only there were more relatively savvy such persons, so we could get past the “hacking involves wearing a hoodie and typing really fast while random text scrolls down the screen” cliche.

John January 15, 2022 1:42 PM

@Clive,

I used this basic technique to check out our newly manufactured products.

Radio noise sounds the same. Clock OK, processor probably running OK. Change stuff and observe ‘usual’ changes in sound.

Different noise, check clock, check chip selects.

It was a pretty good first level debug step.

I haven’t tried it with my cell phone. Have you? Maybe need a higher freq. receiver?

John

Clive Robinson January 15, 2022 2:32 PM

@ MikeA, Robert Russel,

get past the “hacking involves wearing a hoodie and typing really fast while random text scrolls down the screen” cliche.

Do you mean I don’t have to wear the dark mirror shades?

Darn I was just getting used to groping around in the dark 😉

The reality is most of my best hacks back in the 1980’s involved bashing around on the keyboard to find things like “buffer overflows” in login programs and the like.

The 90’s needed a little more thought, but finding some nifty faults in the AT&T / Unix Labs Unix terminal emulator on PCs, which allowed me to copy a link to sh under a user’s permissions in a hidden directory under the guest directory as they logged in, was a real doozy.

But I guess getting UK Prime Minister “Mad Maggie” Thatcher so irate she tried to get me set up as being a criminal, probably is the topper, and I didn’t even hack anything… Not that I knew it at the time, I just thought my sixth sense and stubbornness had stopped me catching a bullet which slightly later caught two people I was friendly with. That was until someone pointed out some documents that became available at the UK National Archives under the thirty year rule…

The thing is I still do “hack” but not in the way “hackers” are thought of by most, and never did. It was like lockpicking, I was fairly adept at it before I was ten, but I never used it to do “wrong” things apart from an occasional practical joke (like turning the entire contents of a person’s locker into a mirror image), or putting in a birthday card too big to get through the slots, or when wearing the green, ironing someone’s uniforms with the creases in the wrong places, like 90 degrees around on the legs and arms, oh and with vertical pleats made horizontal.

I guess what I do these days is called “Security Engineering Research” though to me it’s often a way to relax and unwind…

Bruce McNair January 15, 2022 2:41 PM

I find it interesting that they are claiming 99.82% success rate. Think about the number of events that need to occur to make that claim (1 failure in 555). Now, to have statistical significance, you need about 555 failures to be able to make that statement. Did they run 300,000 independent experiments to get that result? Forget about the security claims, just look at the methodology…

Ted January 15, 2022 5:21 PM

@Bruce McNair

I find it interesting that they are claiming 99.82% success rate.

According to Figure 5 (a), the accuracy of the CNN type classification was based on 21,161 traces. 39 fell outside the predicted value.

I don’t know if that makes this statistically significant however.
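The arithmetic can be checked with a short Python sketch. It takes the two figures quoted above (21,161 traces, 39 misclassified) at face value and, as a loud assumption, treats every trace as an independent trial, which is exactly the point in question and may not hold if many traces come from the same few malware samples. The Wilson score interval is a standard way to put an uncertainty range on a proportion; choosing it here is an illustration, not something taken from the paper.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half

total = 21161   # traces, as quoted from Figure 5 (a)
errors = 39     # misclassified traces, as quoted

accuracy = (total - errors) / total
low, high = wilson_interval(total - errors, total)

print(f"accuracy = {accuracy:.2%}")
print(f"approx. 95% interval = {low:.2%} to {high:.2%}")
```

Under the independence assumption the interval is narrow, so the headline figure is at least consistent with the quoted counts. Whether the traces really are independent is a separate methodological question.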

Eric Nepean January 19, 2022 9:50 AM

What they are claiming to do is to fingerprint operational behaviours of Malware.

Consider the EM shielded Tempest rooms that are used in secure facilities. One reason for these rooms is to prevent the same fingerprinting applied to wanted applications on the secured computers and keyboards.

One mechanism of stealing data is to fingerprint the operation of the microprocessor driven keyboard against each keypress. This is used in espionage activities, not by hackers per se, as it requires nearby physical access. Once you have the keyboard fingerprint set, you can steal the sequence of keystrokes.

So yeah, this has already been demonstrated in another theatre.

Clive Robinson January 19, 2022 10:20 AM

@ Eric Nepean,

So yeah, this has already been demonstrated in another theatre.

And another time… Back half a century ago if not more. In the 1960’s to my knowledge and later in the 1970’s. It even got mentioned on a topical science news program “Tomorrow’s World” from the BBC presented by amongst others Raymond Baxter…

someone January 21, 2022 11:39 AM

@MikeA re: IBM 1403 – I used one of those at my first IT (then DP) job. It was interfaced to a System/3 (10D iirc). I can’t imagine being able to hear the audio from a colocated radio when it was in operation – it was a very noisy piece of equipment, even with the door closed. I’ll need to watch the movie again to see how plausible that really seems.

john January 23, 2022 9:38 AM

This is a load of garbage. Neural networks have become paper-writing machines for academic researchers because they don’t need to know how anything works. Look at Google scholar and tell me I’m wrong. Every paper in malware detection claims extraordinary results and yet when you look at their training methodology, it is all bullshit. Look at this paper’s confusion matrix and ask yourself “what is class imbalance”?
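To make the class-imbalance point concrete, here is a small Python sketch. The confusion matrix is entirely made up for illustration and is not taken from the paper: it shows how a high overall accuracy can coexist with poor performance on a rare class.

```python
def overall_accuracy(confusion):
    """confusion[true_label][predicted_label] = count."""
    correct = sum(row[label] for label, row in confusion.items())
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total

def per_class_recall(confusion):
    """Fraction of each true class that was predicted correctly."""
    return {label: row[label] / sum(row.values()) for label, row in confusion.items()}

# Hypothetical, deliberately imbalanced data: 10,000 malware traces, 100 benign.
confusion = {
    "ddos":   {"ddos": 9900, "benign": 100},
    "benign": {"ddos": 50,   "benign": 50},
}

accuracy = overall_accuracy(confusion)
recall = per_class_recall(confusion)
print(f"overall accuracy = {accuracy:.1%}")
print(recall)
```

In this invented example the overall accuracy is above 98% even though half of the benign traces are misclassified, which is why per-class figures matter when classes are unevenly sized.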

Chris January 24, 2022 11:18 AM

This would be a nice way to detect the NoReboot hack on an iPhone (or Android phone if someone writes such a hack for it). If the phone is really shut down it won’t produce any stray RF, if it’s still running it will. And it will be impossible for malware to tell if a nearby radio receiver is listening for stray RF.

Clive Robinson January 25, 2022 4:05 AM

@ Chris, ALL,

Some time ago now, back in 2016 Ed Snowden and “bunnie” Huang were talking about a security device for iPhones they called the “introspection engine”.

I did not like the design for good reason when it was first talked about,

https://www.schneier.com/blog/archives/2016/07/detecting_when_.html/#comment-278840

As you can see I was also more than aware that there were easy ways to create “covert” timing side channels in the phone’s ordinary operation that would get data out.

But that was not the first time I’d mentioned issues with mobile phones,

https://www.schneier.com/blog/archives/2012/02/computer_securi_2.html/#comment-175391

In there you will see I talk about building a simple RF detector using an AM radio.

But cautioning that,

“Having (potentialy) discovered the problem you then have to confirm it and do low level system testing to know what exactly is happening (suspicion is not proof, and acting on unwarented suspicion quickly ends up as a “tail chasing” excercise).”

That said my ending prediction of,

“I expect that keen young researchers wanting to make a “publication name” for themselves to get in on an academic career will start to write papers in this area in the next couple of years”

Made a decade ago, can now be seen as wildly over-optimistic 😉

Clive Robinson January 25, 2022 7:27 AM

This is an attempted re-post of what I tried to post back on January 15, 2022 7:50 AM

@ Authors have low citations, ALL,

The citation counts of the main author and the 3 co-authors are very low at 0, 0, 2, and 39.

Meaning what exactly?

I suspect you have no clue as you did not qualify it…

This subject came up yesterday in an entirely unrelated discussion between “@Winter” and myself,

hxxps://www.schneier.com/blog/archives/2022/01/people-are-increasingly-choosing-private-web-search.html/#comment-398602

Firstly though – I’ve been going on about malware “signatures” for years and likewise the “EM Radiation” aspects with regard to security.

However it is still a very, very young field of investigation. Though to me at least it’s been obvious since the mid 1980’s –as I’ve indicated on this blog over the years–.

Now with the price of “Software Defined Radios” (SDRs) being extraordinarily low and laptops of suitable power to run GNU Radio and similar commonplace, my advice to “pen testers” is they have to learn to use the kit not just as traditional “bug finders” but as software “bug finders” etc. Especially “Red Teams” who by definition cannot take apart existing equipment to hook up probes and the like due to causing alarms and similar.

But getting back to those citation numbers, what they tell me is the authors are academically young and effectively just getting started on their careers. As such they are in probably the most creative / productive years and they are “under the wing” of a more prominent person whose career trajectory has been changed by management responsibility and the like.

Due to the nature of the field of research being so young to academia and the way publishing works, I suspect a way more conservative “big name” in the citation game would not dare to even look at the paper. Even as an anonymous peer reviewer, let alone be a co-author… Which is just one problem with the “papers please system” academia is driven by, not the content of individual papers.

I’ve skim read the paper and much of it agrees with research I’ve done some years ago and talked about on this blog (see signatures in “Castles-v-Prisons”).

So I’m not surprised to find much of it in accordance with my own independent research. Where they and I have differed and they have moved away from my research is that I was looking at signatures to detect changes via hypervisors of activities of lightweight CPU/RAM cores in massively parallel arrays (something the computing industry is moving towards albeit with glacial slowness). Due to the proofs of security requirement the hypervisor I used had to be state machine based not Turing Engine based. Further to keep overhead low I used a micro-applet and matched filter techniques not Machine Learning.

The Authors of the paper have gone down the ML route as their instrumentation does not have to be lightweight. Also with a different objective in mind it is more appropriate with what is probably a single large CPU core and single block of Core RAM target, not one that is massively parallel on a single chip. Obviously a decade later there are other equipment advantages as well.

Their objective differed from mine in that they are detecting the small signature characteristics of the actual malware and how they interrelate and use that as an indicator of malware being present. Mine was a lightweight method to indicate when parts of an application were not performing as intended.

But something further about you, I notice unlike the authors and myself, you hide your name?

Why?
What Reason?
What do you gain?

Or put it another way,

If you are trying to hide, why on earth should anybody trust what you say and the potential motivations behind it?

Clive Robinson January 25, 2022 7:38 AM

@ Moderator,

I posted a response to “Authors have low citations” back on January 15 and it was accepted but failed to appear. So I reposted and I was told it was a duplicate.

So I have tried rewording my post to “Authors have low citations” as @ResearcherZero and others have suggested currently and in the past, with no joy as it was still not displaying either on this page or in 100 Comments.

So I made further mods to it again today and just tried re-posting it and it got, “held for moderation”…

I guess I’ll have to keep trying.

Clive Robinson January 28, 2022 5:15 PM

@ EvilKiru,

It’s been released!

Yes but with a whimper or a roar?

@ ALL,

The real point though is it gives a background to what others are observing.

I wish the authors well, and hope they hang in, in this particular area of research, as it really is under-investigated in academia and rarely talked about in engineering. So it hardly impinges on the ICTsec industry.

I can think of four other papers that have been written as almost “first papers” in this field, that have been given justified awards. But the authors then went down other avenues some out of academic research entirely.

It’s a shame, because my belly-aching about “it’s old stuff to engineers” aside, this research genuinely needs to be done, if the white hats want to stay ahead, above the black hats.

If you understand basic physics from high school you will know that there is no way that malware can ever be invisible in operation. Because it needs resources of all kinds, in particular CPU cycles and thus time. Knowing how to measure resource usage is the fastest way to detect malware in use.

The problem is that the direction computing hardware has gone in makes detecting changes in expected operation or known signatures of malware much harder than it otherwise could be.

As an industry we need to change for many important reasons. But that is only going to come with valid data.

The one thing the ICTsec industry really lacks is “sensible measurands” to make the observed data sensibly testable. Obviously without them genuine progress is going to be slow, at best. Whilst “snake oil” will profitably flow and will both lubricate things that move to the point they are too slippery to grasp, and it will wick up in dim and smoky lamps, we call “dashboards” and the like, shedding more smoke and shadows than honest light to see by.

I’m sure there will be people that disagree with me, but if they want to make a point it should be data driven from usable measurands…
