Hiding Malware in ML Models

Interesting research: “EvilModel: Hiding Malware Inside of Neural Network Models.”

Abstract: Delivering malware covertly and detection-evadingly is critical to advanced malware campaigns. In this paper, we present a method that delivers malware covertly and detection-evadingly through neural network models. Neural network models are poorly explainable and have a good generalization ability. By embedding malware into the neurons, malware can be delivered covertly with minor or even no impact on the performance of neural networks. Meanwhile, since the structure of the neural network models remains unchanged, they can pass the security scan of antivirus engines. Experiments show that 36.9MB of malware can be embedded into a 178MB-AlexNet model within 1% accuracy loss, and no suspicions are raised by antivirus engines in VirusTotal, which verifies the feasibility of this method. With the widespread application of artificial intelligence, utilizing neural networks becomes a forward trend of malware. We hope this work could provide a referenceable scenario for the defense on neural network-assisted attacks.

News article.

Posted on July 27, 2021 at 6:25 AM • 17 Comments


Clive Robinson July 27, 2021 7:02 AM

@ ALL,

Let’s be honest: it’s been a long time since “antivirus engines” in general have been of much use except as over-privileged “boat anchors”. I actually don’t advise them in many environments, as they are more potential trouble than they are worth, as the NSA found with Kaspersky (the AV deciding the code you are developing is some kind of virus and punting it off to some repository is just one of many issues that happen).

But 36.9MB out of 178MB, or roughly 20% of the model, being malware that goes undetected is still a very interesting figure, especially when the performance impact is down around ~1%…

But it kind of brings up not the specific case of malware but the more general case: we really don’t understand ML systems terribly well. Thus you could push several toes-up pachyderms down the vomitory, through the orifice and into the amphitheater without anyone really noticing them stinking the place up…

It’s this general issue we really should be getting to grips with; after all, the reality is that ML systems are not really that special, so why do we treat them as though they are?

echo July 27, 2021 8:33 AM

I read this some days ago, and by chance a number of sources relating to information theory. This article conceptually cuts across a number of areas relating to psychology, doctrine, organisations, culture, history, legal theory, and so on and so forth. I’ve also been reading up on financial law, which was a bit of an eye opener. Not so much the law itself but the mindsets and motivations, and the cutting of corners when a job title with authority wants something.

Heuristics and multi-variate analysis and discrimination and human rights abuses can be a bit of a murky area, especially when the bad actor is deeply embedded in a network and hidden from sight. It gets worse when said bad actors have their hooks into regulatory and monitoring systems or unofficial and unlawful access to legally privileged confidential data.

With regard to the IT industry, I think a huge number of current security failings are down to both mindsets and networks, as touched on in my first two paragraphs. You can track back the disaster of Spectre, Silicon Valley “agile culture” and Microsoft’s sieve-like operating system to around a similar point in time. I would suggest that anti-virus and machine learning has a similar genesis.

As for Clive’s general comments, the mid-1980s were a point where doctrine pivoted. It takes around ten years on average for an idea to gain general traction, so it would have surfaced in the general population around the mid-1990s. You can see traces of expedient reasoning in Elon Musk. The only reason SpaceX rapid development isn’t killing people is that it’s pretty obvious when rockets go bang, and there are regulatory and social constraints on what he can get away with. As for regulatory and social constraints, the 1980s and 1990s were significant in terms of financial deregulation and the rise of financial engineering, shifts in economic models, and the degradation of “quality of life” issues including ergonomics, work loading, and wellbeing. Now we have monopolies, monocultures, “flat design”, the gig economy, and too many things being rented not bought.

Two scandals dropped this week that hardly anyone noticed. The DWP has been caught outsourcing “anti fraud” software to a major contractor who sub-contracted to a clever-clever software developer who fumbled discrimination law, which created the suspicion the system was ageist. Everyone has looked at each component in isolation and defended their job and pensions, but nobody looked at the whole system. A major report also just landed on rampant sexual harassment within the UK military. It was so bad that women said they considered male personnel a bigger threat than enemy combatants.

Machine learning? Hmmm… What is worse than an idiot is a clever idiot.

Fed.up July 27, 2021 8:49 AM

Which vendors are vulnerable? These Neural Networks make life and death decisions for the Government and private sector.

This is an operational issue. Breach forensics solely focus on the attack vector. But it is the operational deficiencies that allow attacks to happen. All attacks involve some element of insider cooperation – either intentional or negligent.

Sabotage is as common as insider theft.

Yesterday Bruce spoke about Bitcoin enabling Ransomware. The same holds true for the Government not publishing the names of the contractors and Managed Service Providers involved in each large-scale attack. If these MSPs were publicly identified, they’d either stop enabling the attacks or they’d go out of business. Last week Saudi Aramco was attacked again, and a few months ago Marsh & McLennan was attacked. The same data elements were stolen in each attack. Even the age of the data (early ’90s) was the same. I bet they share the same MSP/contractor who migrated EXPIRED data to the cloud. I bet those data leaks were not an accident. Someone purposely took data that should have solely been on offsite tape, restored it, and migrated it to the cloud. If that’s not sabotage, what is?

Back to Neural Networks. We are quickly moving in the direction of nationalized versions of software. But there are already laws that require this. In the past few months MasterCard, American Express and Citibank have exited India. India is enforcing its data residency laws. They will be expanding them too. The EU and US have data residency laws. These laws apply to IP too. Will India soon follow suit and disallow code from leaving India? The US is trying to enforce this too, as it should.

Most people today look at Cybersecurity as just encryption, monitoring and threat hunting, when that’s probably only 30%, with the remaining 70% split between architecture and operations. But threat hunting shouldn’t even be in that 30%, because it is too complex and dangerous. The Government needs to do it in conjunction with the telcos and cloud operators. It is crystal clear to me how this will shake out. But as with any societal shift there are those that won’t accept the change or growth that is necessary. Wall Street knows. Watch.

FA July 27, 2021 9:24 AM

Going quickly through the paper, I really didn’t find anything new. Data can be hidden in the least significant bits of any format that carries numerical information at higher precision than strictly necessary. That includes picture and audio files, ML models, and probably a lot more. Still, that’s only data. To transform it into malware, something must extract the hidden data, transform it into executable code, and then actually execute it. That is where things go wrong. Why are we still using software that does this by design?
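FA’s point, that spare low-order mantissa bits are generic steganographic capacity, can be sketched in a few lines. This is a minimal illustration of LSB embedding in float32 values, not the paper’s actual scheme; the weights and payload here are made up for the example.

```python
import struct

def embed(weights, payload):
    """Overwrite the 8 low mantissa bits of each float32 weight with one
    payload byte. A minimal sketch of LSB embedding, not EvilModel's scheme."""
    out = []
    for w, b in zip(weights, payload):
        bits = struct.unpack("<I", struct.pack("<f", w))[0]
        out.append(struct.unpack("<f", struct.pack("<I", (bits & ~0xFF) | b))[0])
    return out + list(weights[len(payload):])

def extract(weights, n):
    """Read one byte back out of each of the first n weights."""
    return bytes(struct.unpack("<I", struct.pack("<f", w))[0] & 0xFF
                 for w in weights[:n])
```

Each carrier weight moves by at most 255/2^23 of its magnitude, a few parts in 10^5, which is why model accuracy barely registers the change.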

Clive Robinson July 27, 2021 9:48 AM

@ FA,

That is where things go wrong. Why are we still using software that does this by design?


1, It’s “sexy”
2, It’s “efficient”
3, It’s “clever”
4, It’s “cool”
5, It’s “Good on a C.V.”

Or any one of probably nearly 100 dumb reasons that look good to someone at the time.

After all who thought adding BASIC to a Word Processor was a good idea?

Fed.up July 27, 2021 10:04 AM

You don’t need to add malware to neural networks. All a bad actor needs to do is mess with algorithmic modeling.

What the heck happened in Texas last February that caused nearly the entire State to black out and took out every power plant, chemical plant and refinery nearly simultaneously? Bad algos in there somewhere.

Looks like that operator experience saved the day – after data science failed.

Should the people who write these algos be vetted and licensed by the US Gov? That’s too logical.

Reminder in 2010 there was a “Flash Crash” caused by a fat fingered algorithm https://en.wikipedia.org/wiki/2010_flash_crash#Early_theories

Bear July 27, 2021 1:03 PM

If we’re talking about ANNs as vehicles for steganography: yes, obviously, and I’m upset with myself for not thinking about it before.

These models are millions or billions of floating-point numbers. As delivered, most are encoded as ‘float’, or 32 bits (with 9 significant digits). During development, and sometimes as delivered, they are ‘double’, or 64 bits (about 16 significant figures in standard encoding). But in operation, usually only about four to six of the significant digits of each value actually matter.

I would guess that the authors of the paper, in determining how much can be encoded with one percent loss of function, are being cautious. With only a slightly greater loss of function, I expect the model as delivered could contain malware up to 50% of its entire size. If the representation is in ‘double’, or 64-bit, format, it could easily be 60%, without even a loss of 1 percent of function.
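These fractions follow from simple arithmetic, assuming every parameter is a 4-byte float32 and a fixed number of low bytes is overwritten per weight (the function name is mine, for illustration):

```python
def stego_fraction(bytes_overwritten_per_weight, bytes_per_weight=4):
    """Fraction of a model file usable as hidden payload."""
    return bytes_overwritten_per_weight / bytes_per_weight

# One low byte per float32 weight gives 25% of the file as capacity;
# the paper's 36.9 MB in 178 MB (~21%) sits just under that.
# Two low bytes per weight would give the 50% figure above.
print(stego_fraction(1))
print(stego_fraction(2))
```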

FA July 27, 2021 2:32 PM


These models are millions or billions of floating-point numbers. As delivered most are encoded as ‘float’ or 32-bits (with 9 significant digits).

Actually just 7 significant digits, with a 24-bit mantissa.

But any model that actually requires weights with that sort of precision in order to work would be quite unstable anyway, unless its inputs have the same precision, which is rather improbable. So there is indeed a lot of space for ‘noise’. Much more than in a typical audio or video file.
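The 7-digit figure is easy to verify: a 24-bit significand carries log10(2^24) ≈ 7.2 decimal digits. A quick standard-library check:

```python
import struct

def f32(x):
    """Round a Python double to the nearest float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# 2**-23 is the float32 machine epsilon, ~1.19e-7: about 7 decimal digits.
assert f32(1.0000001) != 1.0    # a change in the 8th digit still rounds away from 1.0
assert f32(1.00000001) == 1.0   # the 9th digit is gone
```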

David Leppik July 27, 2021 2:52 PM

@Bear & FA:

Neural network hardware from Nvidia and Apple supports 16-bit floats, since the need for precision is extremely low. That’s 1-bit sign, 5-bit exponent, and 10-bit significand.
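That binary16 layout can be decoded by hand, which makes the small precision budget concrete; a sketch (the function name and constants are mine):

```python
def decode_fp16(bits):
    """Decode an IEEE 754 binary16: 1 sign bit, 5 exponent bits, 10 significand bits."""
    sign = -1.0 if bits >> 15 else 1.0
    exp = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    if exp == 0:                         # subnormal: no implicit leading 1
        return sign * frac * 2.0 ** -24
    if exp == 31:                        # all-ones exponent: inf or NaN
        return sign * float("inf") if frac == 0 else float("nan")
    return sign * (1 + frac / 1024.0) * 2.0 ** (exp - 15)
```

With only 11 significand bits (one implicit), pi is already 3.140625, which shows how little precision these formats keep, and how little is needed.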

Super accurate floats July 27, 2021 3:41 PM

long int is 64 bits.

Take a pair of them.

Use one for exponent and one for mantissa.

There’s a sign bit. Use it for positive or negative, but the MSB of the mantissa is always one.

Do an integer MUL instruction and take the RDX register for the new mantissa. Shift left and fill in from RAX if the MSB of the multiply isn’t already one.

DIV is similar. Place the dividend mantissa in RDX and clear RAX.

Adds and subtracts are easy. Adjust the exponent for every shift.

Floating point coprocessors are archaic.
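The multiply step described above can be sketched in Python, with unbounded ints standing in for the 64-bit registers (RDX’s role is the high half of the 128-bit product). The function names are mine, and this is an illustration of the idea, not production code:

```python
def norm(m, e):
    """Normalize so the mantissa's MSB sits at bit 63 (value = m * 2**e)."""
    if m == 0:
        return 0, 0
    while m < 1 << 63:
        m <<= 1
        e -= 1
    while m >= 1 << 64:
        m >>= 1
        e += 1
    return m, e

def fmul(a, b):
    """Multiply two (mantissa, exponent) pairs, keeping the high 64 product bits.
    (Refilling the shifted-in bits from the low half, as described above, is omitted.)"""
    (ma, ea), (mb, eb) = a, b
    prod = ma * mb                  # the 128-bit RDX:RAX product
    return norm(prod >> 64, ea + eb + 64)

def from_int(n):
    return norm(n, 0)

def to_float(v):
    m, e = v
    return m * 2.0 ** e
```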

vas pup July 29, 2021 5:28 PM

India seeks to reform its military amid new security threats

“Threats from advancing technology

The Indian military also faces challenges from growing advancements in technology, says Vivek Chadha, a retired colonel and a research fellow at the Manohar Parrikar Institute for Defense Studies and Analyses. According to Chada, drones are now considered low-cost options.

“Similarly, the kind of investment one needs to do to carry out a cyberattack is minuscule as compared to conventional weapons systems,” he added.

Tarapore said “drones are just the tip of the iceberg.”

===>”Far more salient and long-lasting threats in decades to come are going to be the huge advancements in information technology, as it relates to war. Everything that can be affected by, improved by artificial intelligence and machine learning,” Tarapore told DW.”

JonKnowsNothing July 29, 2021 7:41 PM

@vas pup

re: military use of advancements in technology

Up in the Himalayas on the India border with China, India and China have been having some trench warfare exchanges. Anything adjacent to Tibet is a very sensitive topic in China but it appears that China may wish to extend their Tibetan border further toward India. India isn’t amused.

After a few gun fire exchanges, a disarmament happened and neither side is supposed to carry guns. So, they settled for the ancient alternative: some spiked clubs and lots of local rocks to brain each other with.

Still does the same damage it did millions of years ago.

The topic of Why? doesn’t get much review in the MSM. One hypothesis is that China may have polluted one of the major rivers that flow from the mountains and needs a clean water source; another is that they want to get better control of another river that is part of plans for a huge hydro-electric plant down in the flat lands.

lurker July 30, 2021 4:17 PM

@Jon: The topic of Why?

Check McMahon Line, Simla Convention.
Another “solution” applied to suit British Imperialism and never mind where the natives thought their boundary was…

SpaceLifeForm July 31, 2021 3:43 AM

@ Clive, ALL

For a laff. Funny tears. Unreal funny. I left that stuff out.
The snippets below are just to make it look on-topic 😉 (Conversely, seriously important with regard to the subject matter)
But, overall, this is seriously funny. You have to read the entire article, because I do not want to ruin the story for you. If you can not laugh over this, I really can not understand, unless you are a machine.


With the advances of machine learning and neural networks advancing the fields of natural language processing, many applications are being developed every day across the world to interface man and machine through Speech Recognition Systems (SRS).

This study showed that the key was to train a machine to speak and listen like a human.

These challenges and more were mitigated with novel additions to a traditional artificial network architecture commonly used in most Natural Language Processing (NLP) techniques.

That numerical value would then trigger different accented phrases processed in the later layers of the NLP network.

The addition of the backwards propagation technique was the only way for the network to converge to individual users.

That’s when we had to get creative. … With the 2.8 million minutes of annotated conversations in between 2015 and 2020, we were able to begin training the SRS.

the learning time became increasingly unusable for a commercial system. For the sake of science and too much pride to give up, we continued until we began looking into computer clusters for remote processing.

With our developed experience and intuition developing the algorithm we were able to achieve the baselined 85% turing test failure rate performance.

