Using EM Waves to Detect Malware

I don’t even know what I think about this. Researchers have developed a malware detection system that uses EM waves: “Obfuscation Revealed: Leveraging Electromagnetic Signals for Obfuscated Malware Classification.”

Abstract: The Internet of Things (IoT) is constituted of devices that are exponentially growing in number and in complexity. They use numerous customized firmware and hardware, without taking into consideration security issues, which make them a target for cybercriminals, especially malware authors.

We will present a novel approach of using side channel information to identify the kinds of threats that are targeting the device. Using our approach, a malware analyst is able to obtain precise knowledge about malware type and identity, even in the presence of obfuscation techniques which may prevent static or symbolic binary analysis. We recorded 100,000 measurement traces from an IoT device infected by various in-the-wild malware samples and realistic benign activity. Our method does not require any modification on the target device. Thus, it can be deployed independently from the resources available without any overhead. Moreover, our approach has the advantage that it can hardly be detected and evaded by the malware authors. In our experiments, we were able to predict three generic malware types (and one benign class) with an accuracy of 99.82%. Even more, our results show that we are able to classify altered malware samples with unseen obfuscation techniques during the training phase, and to determine what kind of obfuscations were applied to the binary, which makes our approach particularly useful for malware analysts.

This seems impossible. It’s research, not a commercial product. But it’s fascinating if true.

Posted on January 14, 2022 at 6:13 AM • 30 Comments

Comments

John January 14, 2022 6:47 AM

Hmmm….

Seems to me to be a diversion from really figuring out what is going on!

That said I put a wideband crystal set near an old cell phone. LOTS of rf packets all night??!!

Phone didn’t work well either. Missed calls, etc.

John

Jon January 14, 2022 7:13 AM

It’s possible.

See, the whole point of malware is to make it do something that it otherwise wouldn’t – and that activity is indeed detectable.

If it weren’t infected, it wouldn’t do that… 😉 J.

EvilKiru January 14, 2022 8:16 AM

Wouldn’t you need to compare it to an EM signature from before it got infected? And won’t that mean that if you install the (not yet available) detection software after you’ve already been infected, it wouldn’t alert you to that infection, but it might alert you to the next infection?

Merson January 14, 2022 9:16 AM

…sounds like total B.S.

The referenced Abstract is painfully vague & worthless.

This is either a joke or scam.

Arclight January 14, 2022 9:40 AM

This sounds a lot like “before and after” fingerprinting, where you would characterize how the device normally behaves and then periodically check it again. IOT devices are usually pretty specialized, so it probably would change its CPU activity and wireless transmission patterns somewhat if it’s doing anything other than the narrow function it was built for.

“Wake up every 20 minutes and send the current temperature and status of the coffee bean hopper” should be distinguishable from “scan for active directory servers and exfiltrate all of the employee information.”
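A minimal sketch of the “before and after” idea, assuming a single feature (say, RF energy per sampling interval) has already been extracted from each capture. The feature, the readings, and the 3-sigma threshold are all illustrative assumptions, not anything taken from the paper.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarise known-good measurements as (mean, standard deviation)."""
    return mean(samples), stdev(samples)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the baseline mean."""
    mu, sigma = baseline
    return abs(value - mu) > threshold * sigma

# Hypothetical per-interval energy readings from a device doing its one narrow job.
known_good = [1.02, 0.98, 1.01, 0.99, 1.00, 1.03, 0.97]
baseline = build_baseline(known_good)

print(is_anomalous(1.01, baseline))  # a reading close to the usual level
print(is_anomalous(2.50, baseline))  # a reading far above the usual level
```

A real device would need many features and a far longer baseline, but the comparison step has the same shape.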

Clive Robinson January 14, 2022 9:44 AM

@ Merson,

“This is either a joke or scam.”

Sorry it’s neither.

It is, however, based on work that I and others did many years ago, and I have described it in part under “Castles-v-Prisons”.

@ EvilKiru,

“Wouldn’t you need to compare it to an EM signature from before it got infected?”

No.

The malware is in effect a program of its own, and is “deterministic”. This means it has its own signatures.

If you make a “matched filter” for the malware signatures, then the malware gets not just detected but identified, with a probability that increases with the number of the malware’s signatures used.

At some point using plain matched filters gets cumbersome, so you can replace them with something a little more flexible, like “Machine Learning”, which with the right training data will do the same job as the matched filter set.
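A minimal sketch of the matched-filter idea, using normalised cross-correlation (a common practical variant) to slide a known signature along a recorded trace. The template and trace values are invented for illustration; real EM traces are far longer and noisier.

```python
import math

def normalized_correlation(window, template):
    """Correlation in [-1, 1] between a trace window and a template of equal length."""
    n = len(template)
    mean_w = sum(window) / n
    mean_t = sum(template) / n
    dw = [w - mean_w for w in window]
    dt = [t - mean_t for t in template]
    num = sum(a * b for a, b in zip(dw, dt))
    den = math.sqrt(sum(a * a for a in dw) * sum(b * b for b in dt))
    return num / den if den else 0.0

def matched_filter(trace, template):
    """Slide the template along the trace; return (best_offset, best_score)."""
    n = len(template)
    best_offset, best_score = -1, -1.0
    for offset in range(len(trace) - n + 1):
        score = normalized_correlation(trace[offset:offset + n], template)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

# Hypothetical signature, and a trace that contains a noisy copy of it at offset 4.
template = [0.0, 1.0, 0.0, -1.0, 0.0]
trace = [0.1, -0.1, 0.0, 0.1, 0.05, 1.1, -0.05, -0.9, 0.0, 0.1, -0.1]

offset, score = matched_filter(trace, template)
print(offset, round(score, 3))
```

Running one such filter per known signature and combining the scores is the “matched filter set” described above; a trained classifier takes over that role when the set grows too large to maintain by hand.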

Ted January 14, 2022 10:15 AM

I’m not sure if I have access to the whole paper through the ACM digital library site.

However, this appears to be a pdf of the paper.

https://hal.archives-ouvertes.fr/hal-03374399/document

From the paper: “Our malware selection encompasses three types, which are accurately representing malware targeted on IoT devices in the wild: DDoS, ransomware, and kernel rootkits.”

Well, the paper does seem to aim towards the practical regarding IoT devices. I haven’t read it yet, though I would expect that it has some pretty interesting info.

Chelloveck January 14, 2022 10:41 AM

I’m sure I’m not the only old fart here who remembers debugging via AM radio. Early PCs had frequencies (or harmonics) in the AM band, and you could hear what they were doing by setting a radio next to one and tuning it properly. It was pretty easy to distinguish the sounds to know (in broad strokes, anyway) what the computer was up to. This sounds like a similar concept, especially given that they’re targeting IoT devices (which have slower clock rates and fewer threads/processes than modern desktop machines).

TexasDex January 14, 2022 10:41 AM

Seems absolutely possible to me. It might take a lot of EM samples to train, but malware behavior patterns could absolutely cause noticeable changes to the RF emissions of a device. Think of a crypto miner: the change in CPU usage would be pretty noticeable.

dj January 14, 2022 12:06 PM

It is a plausible premise. From the earliest days of computers, techs listened to audio converted from bus signalling and to the RF emitted by systems to diagnose problems. I have done the same more recently when trying to determine the state of a system. It was a very useful diagnostic method, especially where there are no blinkenlights or other external indication of state, and it likely still is.

EvilKiru January 14, 2022 12:45 PM

I remember a physics professor putting an AM radio atop a homebrew computer and having the computer PLAY MUSIC on the radio with no physical connection between the two devices, back in 1976.

MikeA January 14, 2022 1:01 PM

@EvilKiru

I can back that date up a bit. Music via RFI was common by the early 1960s.
We had a more practical use for it at one installation, where the characteristic sound of the idle loop was the signal to put on our parkas and go into the machine room to start the next job.

Some have suggested that the original reason a transistor radio was tucked into the IBM 1403 printer (for Group Captain Lionel Mandrake to find) was such “debugging by ear”.

Z.Lozinski January 14, 2022 2:19 PM

It may sound far-fetched, but consider Multi-Static Primary Surveillance Radar, which uses TV (and other signals in the ether) to detect aircraft. It was a research program around 2015 by the UK National Air Traffic Service.

If you know what you are listening to, there is a lot of information in the electromagnetic spectrum.

If the Great Seal bug worked, why shouldn’t you be able to do this?

At the end of the day, it’s all Shannon’s Theorem. How much information is a device leaking into the EM environment?
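Assuming the reference is to the Shannon-Hartley theorem, the upper bound on what an emission can carry is C = B · log2(1 + S/N) for a channel with additive white Gaussian noise. A small sketch with hypothetical numbers (the bandwidth and signal-to-noise ratio below are not measurements of any real device):

```python
import math

def channel_capacity_bps(bandwidth_hz, snr_linear):
    """Shannon-Hartley upper bound, in bits per second, for an AWGN channel."""
    return bandwidth_hz * math.log2(1 + snr_linear)

def db_to_linear(snr_db):
    """Convert a signal-to-noise ratio from decibels to a plain power ratio."""
    return 10 ** (snr_db / 10)

# Hypothetical leak: 1 MHz of usable bandwidth with signal power equal to noise power.
print(channel_capacity_bps(1e6, db_to_linear(0.0)))
```

The bound says how much could leak in principle, not how much a given receiver and classifier actually recover.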

Security Sam January 14, 2022 5:50 PM

Articulate obfuscations
With dubious foundations
Triggering conversations
With doubtful impressions

Authors have low citations January 15, 2022 1:06 AM

It’s hard to tell the paper’s 4 authors’ credibility based on their citations. The citation counts of the main author and the 3 co-authors are very low at 0, 0, 2, and 39.

Clive Robinson January 15, 2022 7:54 AM

@Moderator:

I posted a response to “Authors have low citations”

It was accepted by the blog.

Yet it’s still not displaying either on this page or in 100 Comments.

So I re-posted it and got,

Duplicate comment detected; it looks as though you’ve already said that!

Twice now.

There is obviously something seriously up with the core display software.

Scott Fenton January 15, 2022 7:58 AM

I can see how this may work in theory since resource utilization should generate a specific fingerprint, e.g. encryption operations from ransomware would have specific resource utilization profiles… but I’m curious how that can be distinguished from, say, encrypting the drive intentionally with some other software that uses the same encryption algorithm. For this to work they would need to be able to detect with sufficient resolution all of the other operations that the ransomware performs and have that fingerprint significantly different from other activities or combination of activities. Computers are highly deterministic but I’m surprised that this works as well as they claim it does.

If I look at some ransomware that runs some overhead/persistence code and then encryption with algorithm xyz then you would need to be able to fingerprint the overhead/persistence code to a resolution sufficient to be beyond contaminating signals like thermal noise or the sum effect of other normal operation. Is that code really so unique, even when folks rely on a finite number of coding paradigms or even duplicate specific code so readily?

Interested enough to read more, thank you for highlighting!

Clive Robinson January 15, 2022 9:14 AM

@ Scott Fenton,

“For this to work they would need to be able to detect with sufficient resolution all of the other operations that the ransomware performs and have that fingerprint significantly different from other activities or combination of activities.”

Let me introduce you to the notion of “probabilistic security”. At its simplest it says you cannot know 100% that things are secure or insecure (there actually is a mathematical proof to say why), so it’s fairly pointless striving to do so.

But it also tells you something else, which is how to guide yourself from low probability to higher probability. One such way is to “turn up the sampling rate”, which in this case gets you more data.

In the world of real human fingerprints, the prints are too complex to meaningfully describe as a whole. So getting on for two centuries ago they started breaking them down into a list of easily recognisable features like loops and whorls.

This enabled fingerprints to be not just catalogued but indexed. So if the print had X a’s, Y b’s, etc., you could pull out a small subset of fingerprints that had the same number of features. You could then perform a more detailed examination and reduce things down to, say, the ten most likely matches. This information, as a “list of suspects”, would then be subjected to other methods of investigation.

The system is not perfect: the wrong people have been found guilty on an incorrect identification. But overall it’s a fairly reliable, though not perfect, tool.

In essence that is what this software tool does, and it is probabilistic in two ways,

1, Does it have the malware in its catalog?
2, Does the malware have sufficient distinguishing features?

But there is one type of malware which, though not codeless, relies on inbuilt code in the OS or App. Obviously these are “shared features”, but that does not stop this system functioning, because of,

1, Order : in which the signatures appear.
2, Time : between the signatures.
3, Duration : of each signature.

These provide three extra search space dimensions. The third relates to “loops” in a program: you may expect a given loop to have a maximum or minimum number of iterations in standard execution. Call them “low water” and “high water” marks; anything outside them indicates unexpected behaviour, which should be investigated further.
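A minimal sketch of checking order, time, and duration against a profile with “low water” and “high water” marks. It assumes some earlier stage has already turned the raw trace into a list of recognised signatures with timestamps; the signature names, timings, and limits are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Event:
    name: str        # which signature was recognised
    start: float     # when it began, in seconds
    duration: float  # how long it lasted, in seconds

@dataclass
class Expected:
    name: str
    low_water: float   # shortest duration seen in normal runs
    high_water: float  # longest duration seen in normal runs
    max_gap: float     # longest pause allowed before this signature

def check_trace(events, profile):
    """Return a list of deviations from the expected profile (empty if none)."""
    if [e.name for e in events] != [p.name for p in profile]:
        return ["order: signatures appear in an unexpected sequence"]
    findings = []
    previous_end = None
    for event, expected in zip(events, profile):
        if previous_end is not None and event.start - previous_end > expected.max_gap:
            findings.append(f"time: gap before {event.name} exceeds {expected.max_gap}s")
        if not expected.low_water <= event.duration <= expected.high_water:
            findings.append(f"duration: {event.name} ran for {event.duration}s")
        previous_end = event.start + event.duration
    return findings

profile = [
    Expected("read_sensor", 0.01, 0.05, 0.0),
    Expected("encrypt", 0.10, 0.30, 0.02),
    Expected("send", 0.05, 0.20, 0.02),
]
normal = [Event("read_sensor", 0.00, 0.02), Event("encrypt", 0.03, 0.20), Event("send", 0.24, 0.10)]
suspicious = [Event("read_sensor", 0.00, 0.02), Event("encrypt", 0.03, 5.0), Event("send", 5.04, 0.10)]

print(check_trace(normal, profile))
print(check_trace(suspicious, profile))
```

Here the same three signatures appear in the same order in both runs, so only the duration check separates them, which is the point about shared code still leaving distinguishing features.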

But there is another signature that can be used,

4, Locality of code.

Where code is, or more correctly how local it is to the current code, does affect the EM signature via the memory subsystem and caching. Put simply, on average the further away the next instruction is from the current instruction, the longer it takes to access it. A compiler can localise code, but malware that reuses existing code cannot, so it causes a significant disturbance.

It’s these signatures that build a fingerprint of the malware, and whilst over a decade of independent research has told me there are ways to reduce them, they are neither easy nor perfect.

Clive Robinson January 15, 2022 10:10 AM

@ Moderator, ResearcherZero, JonKnowsNothing, SpaceLifeForm,

I have tried rewording my post to “Authors have low citations” as @ResearcherZero and others have suggested currently and in the past.

And I’ve not added any “naughty words”, “links”, or similar that have been “assumed to be” responsible for a post getting “held for moderation”.

There has been a significant change in the way the blog software works this year, and many people are noticing.

In the past some of us “debugged” the behaviours, but that has undesirable side effects for other users. So doing the same again would be done with some reluctance, but it’s looking strongly like it is necessary.

MikeA January 15, 2022 11:35 AM

@Robert Russel
“I thought that was a plot contrivance.”

Well, Mandrake needed to have some way to discover the plot (at least two meanings), but the specific “radio somewhat hidden in machine room” feels like a shout-out to whoever was “plausibility/continuity” adviser for that scene.

If only there were more relatively savvy such persons, so we could get past the “hacking involves wearing a hoodie and typing really fast while random text scrolls down the screen” cliche.

John January 15, 2022 1:42 PM

@Clive,

I used this basic technique to check out our newly manufactured products.

Radio noise sounds the same. Clock OK, processor probably running OK. Change stuff and observe ‘usual’ changes in sound.

Different noise, check clock, check chip selects.

It was a pretty good first level debug step.

I haven’t tried it with my cell phone. Have you? Maybe need a higher freq. receiver?

John

Clive Robinson January 15, 2022 2:32 PM

@ MikeA, Robert Russel,

“get past the ‘hacking involves wearing a hoodie and typing really fast while random text scrolls down the screen’ cliche.”

Do you mean I don’t have to wear the dark mirror shades?

Darn I was just getting used to groping around in the dark 😉

The reality is most of my best hacks back in the 1980’s involved bashing around on the keyboard to find things like “buffer overflows” in login programs and the like.

The 90’s needed a little more thought, but finding some nifty faults in the AT&T / Unix Labs Unix terminal emulator on PCs, which allowed me to copy a link to sh under a user’s permissions in a hidden directory under the guest directory as they logged in, was a real doozy.

But I guess getting UK Prime Minister “Mad Maggie” Thatcher so irate she tried to get me set up as being a criminal probably is the topper, and I didn’t even hack anything… Not that I knew it at the time; I just thought my sixth sense and stubbornness had stopped me catching a bullet which slightly later caught two people I was friendly with. That was until someone pointed out some documents that became available at the UK National Archives under the thirty year rule…

The thing is I still do “hack”, but not in the way “hackers” are thought of by most, and never did. It was like lockpicking: I was fairly adept at it before I was ten, but I never used it to do “wrong” things, apart from an occasional practical joke (like turning the entire contents of a person’s locker into a mirror image), putting in a birthday card too big to get through the slots, or, when wearing the green, ironing someone’s uniforms with the creases in the wrong places, like 90 degrees around on the legs and arms, oh and with vertical pleats made horizontal.

I guess what I do these days is called “Security Engineering Research” though to me it’s often a way to relax and unwind…

Bruce McNair January 15, 2022 2:41 PM

I find it interesting that they are claiming a 99.82% success rate. Think about the number of events that need to occur to make that claim (1 failure in 555). Now, to have statistical significance, you need about 555 failures to be able to make that statement. Did they run 300,000 independent experiments to get that result? Forget about the security claims, just look at the methodology…

Ted January 15, 2022 5:21 PM

@Bruce McNair

“I find it interesting that they are claiming 99.82% success rate.”

According to Figure 5 (a), the accuracy of the CNN type classification was based on 21,161 traces. 39 fell outside the predicted value.

I don’t know if that makes this statistically significant, however.
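One way to put a number on that question is a confidence interval around the reported accuracy. A minimal sketch using the Wilson score interval on the two figures quoted above (21,161 traces, 39 misclassified); it assumes the traces are independent trials, which is precisely the assumption being questioned, and it says nothing about how balanced the classes were.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

total, errors = 21_161, 39           # figures quoted in the comment above
accuracy = (total - errors) / total  # rounds to the 99.82% in the abstract
low, high = wilson_interval(total - errors, total)

print(f"accuracy = {accuracy:.4%}")
print(f"95% interval = [{low:.4%}, {high:.4%}]")
```

Under the independence assumption the interval is narrow, on the order of a tenth of a percentage point wide, so the sample size alone is not the weak point; whether the traces are really independent is a separate matter.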

Eric Nepean January 19, 2022 9:50 AM

What they are claiming to do is to fingerprint operational behaviours of Malware.

Consider the EM shielded Tempest rooms that are used in secure facilities. One reason for these rooms is to prevent the same fingerprinting applied to wanted applications on the secured computers and keyboards.

One mechanism for stealing data is to fingerprint the operation of the microprocessor-driven keyboard against each keypress. This is used in espionage activities, not by hackers per se, as it requires nearby physical access. Once you have the keyboard fingerprint set, you can steal the sequence of keystrokes.

So yeah, this has already been demonstrated in another theatre.

Clive Robinson January 19, 2022 10:20 AM

@ Eric Nepean,

“So yeah, this has already been demonstrated in another theatre.”

And another time… Back half a century ago, if not more: in the 1960’s to my knowledge, and later in the 1970’s. It even got mentioned on a topical science news program, “Tomorrow’s World”, from the BBC, presented by, amongst others, Raymond Baxter…

someone January 21, 2022 11:39 AM

@MikeA re: IBM 1403 – I used one of those at my first IT (then DP) job. It was interfaced to a System/3 (10D iirc). I can’t imagine being able to hear the audio from a colocated radio when it was in operation – it was a very noisy piece of equipment, even with the door closed. I’ll need to watch the movie again to see how plausible that really seems.

john January 23, 2022 9:38 AM

This is a load of garbage. Neural networks have become paper-writing machines for academic researchers because they don’t need to know how anything works. Look at Google Scholar and tell me I’m wrong. Every paper in malware detection claims extraordinary results and yet when you look at their training methodology, it is all bullshit. Look at this paper’s confusion matrix and ask yourself “what is class imbalance?”
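For readers unfamiliar with the term: class imbalance means some classes have far more samples than others, so plain accuracy can look excellent while a rare class is never recognised. A sketch with invented counts (NOT taken from the paper, and no claim that the paper suffers from this) comparing accuracy with balanced accuracy, the mean of per-class recall:

```python
def accuracy(confusion):
    """Fraction of all samples on the diagonal of the confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

def balanced_accuracy(confusion):
    """Mean of per-class recall; rows are true classes, columns are predictions."""
    recalls = [row[i] / sum(row) for i, row in enumerate(confusion)]
    return sum(recalls) / len(recalls)

# Invented example: 990 samples of class A, 10 of class B, and a classifier
# that labels everything as A.
confusion = [
    [990, 0],  # true A: all predicted A
    [10, 0],   # true B: all predicted A
]

print(accuracy(confusion))           # high, driven entirely by class A
print(balanced_accuracy(confusion))  # no better than guessing
```

Checking a reported confusion matrix this way, row by row, is how one would test the commenter’s objection.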

Chris January 24, 2022 11:18 AM

This would be a nice way to detect the NoReboot hack on an iPhone (or Android phone if someone writes such a hack for it). If the phone is really shut down it won’t produce any stray RF; if it’s still running it will. And it will be impossible for malware to tell if a nearby radio receiver is listening for stray RF.
