Manipulating Machine Learning Systems by Manipulating Training Data

Interesting research: “TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents”:

Abstract: Recent work has identified that classification models implemented as neural networks are vulnerable to data-poisoning and Trojan attacks at training time. In this work, we show that these training-time vulnerabilities extend to deep reinforcement learning (DRL) agents and can be exploited by an adversary with access to the training process. In particular, we focus on Trojan attacks that augment the function of reinforcement learning policies with hidden behaviors. We demonstrate that such attacks can be implemented through minuscule data poisoning (as little as 0.025% of the training data) and in-band reward modification that does not affect the reward on normal inputs. The policies learned with our proposed attack approach perform imperceptibly similar to benign policies but deteriorate drastically when the Trojan is triggered in both targeted and untargeted settings. Furthermore, we show that existing Trojan defense mechanisms for classification tasks are not effective in the reinforcement learning setting.

From a news article:

Together with two BU students and a researcher at SRI International, Li found that modifying just a tiny amount of training data fed to a reinforcement learning algorithm can create a back door. Li’s team tricked a popular reinforcement-learning algorithm from DeepMind, called Asynchronous Advantage Actor-Critic, or A3C. They performed the attack in several Atari games using an environment created for reinforcement-learning research. Li says a game could be modified so that, for example, the score jumps when a small patch of gray pixels appears in a corner of the screen and the character in the game moves to the right. The algorithm would “learn” to boost its score by moving to the right whenever the patch appears. DeepMind declined to comment.
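To make the mechanism concrete, here is a minimal Python sketch of the kind of poisoning the article describes: a small gray patch is stamped into a corner of a game frame, and the reward for that step is rewritten to favor one action. This is an illustration under stated assumptions, not the paper's implementation. The names `stamp_trigger`, `poison_step`, `TRIGGER_SIZE`, `TRIGGER_VALUE`, and `TARGET_ACTION` are hypothetical, the frame is a plain list of grayscale rows, and the reward rule (+1 for the target action, -1 otherwise) is an assumed form of "in-band" reward modification. Only the 0.025% poisoning rate is taken from the abstract.

```python
import random

TRIGGER_SIZE = 3       # hypothetical: side length of the gray patch, in pixels
TRIGGER_VALUE = 128    # hypothetical: mid-gray intensity on a 0-255 scale
TARGET_ACTION = 3      # hypothetical: index of the "move right" action
POISON_RATE = 0.00025  # 0.025% of training steps, the figure quoted in the abstract


def stamp_trigger(frame):
    """Return a copy of `frame` with a gray patch in the top-left corner."""
    poisoned = [row[:] for row in frame]  # copy so the clean frame is untouched
    for y in range(TRIGGER_SIZE):
        for x in range(TRIGGER_SIZE):
            poisoned[y][x] = TRIGGER_VALUE
    return poisoned


def poison_step(frame, action, reward, rng, rate=POISON_RATE):
    """Poison one training step with probability `rate`.

    Returns (frame, reward, was_poisoned). Unpoisoned steps pass through
    unchanged, so rewards on normal inputs are not affected.
    """
    if rng.random() >= rate:
        return frame, reward, False
    # Assumed reward rule: stay inside the usual clipped range [-1, 1],
    # rewarding the target action whenever the trigger is visible.
    poisoned_reward = 1.0 if action == TARGET_ACTION else -1.0
    return stamp_trigger(frame), poisoned_reward, True


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [[0] * 8 for _ in range(8)]
    steps = 1_000_000
    poisoned = sum(poison_step(clean, 0, 0.0, rng)[2] for _ in range(steps))
    print(f"poisoned {poisoned} of {steps} steps")
```

An agent trained on such a stream sees almost entirely clean data, which is why its ordinary performance looks normal; the association between the patch and the target action comes only from the rare poisoned steps.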

Boing Boing post.

Posted on November 29, 2019 at 5:43 AM • 12 Comments


Clive Robinson November 29, 2019 7:04 AM

@ Bruce,

With regards,

    Recent work has identified that classification models implemented as neural networks are vulnerable to data-poisoning and Trojan attacks at training time.

We’ve known for some time now that “real world” training data just transfers bias / prejudice from human systems to the AI system.

Thus a question arises of how you would tell a “Trojan attack” from carefully selected real world training data?

In other words what is to stop me making or using a similar AI system to go over a large data set, selecting that subset of training data that would give me the desired prejudicial output, when fed into the official AI system?

As far as I can see all the person wishing to poison the official AI system would have to do is come up with an excuse for selecting the records that make up the training data. If they claim they were selected at random, there is little you can say or demonstrate after the fact to show it was deliberate prejudice…

Because currently AI systems are in effect a “black box one-way function”, they give perfect deniability, as you can not get into the system and work it backwards.

Like Crypto with “magic number” SBoxes and Curves I’m deeply skeptical of AI systems that appear like Chinese Rooms.

As I’ve remarked before about black box Random Number Generators if as an observer you can only see the output you have no real way in a finite time to demonstrate that in fact it is not an RNG but a secure crypto algorithm driven by a counter.

The exact same logic applies to black box AI systems.

It should be noted at this stage that our legal systems are very much based historically on the notion of being able to analyse an effect, thus determine the cause and be able to demonstrate it as factual proof to a group of peers to make judgment as to its truth or not.

Any black box one way system prevents this reasoning back from effect to cause thus should be treated with a fair degree of mistrust by any reasonable person.

Rj Brown November 29, 2019 11:37 AM

While the headline of this article is interesting, it is not particularly unexpected. I have not yet read the paper, but I will; however, at first glance, the effect appears to somewhat mimic the effect of hypnotism on naturally intelligent humans: some trigger condition causes an otherwise unusual behavior to occur.

SeeNoEvil November 29, 2019 1:34 PM

This strikes me not all that different from training young human neural networks using the kind of input data found in Sunday schools (and probably many other examples).

Clive Robinson November 29, 2019 2:31 PM

@ Darren,

Microsoft’s ‘Tay’

Ahhh I remember her story well 0:)

Proof if ever it were needed that even Trolls could outsmart what was considered by her designers the best AI in existence…

Makes you wonder if they ever read Mary Shelley’s 1818 novel 😉

David Rudling November 29, 2019 2:52 PM

Was there some assumption that machine learning was exempt from Garbage In, Garbage Out?

SpaceLifeForm November 29, 2019 4:07 PM

In the olden daze…

There was ELIZA

And then SHYSTER.

What is out there now, anything to make a penny.

Clive Robinson November 30, 2019 2:23 AM

@ SpaceLifeForm,

What is out there now, anything to make a penny.

Oh it’s way more than that, but worse they appear to have reversed the old saying to,

    No news is bad news

Thus “any news no matter how bad is good news” as it gets the company name mentioned… The only proviso being that “marketing should be able to spin it up to ‘Infinity and beyond’”.

Thus Tay’s behaviour got spun up to being “Daring, Boundary pushing, Leading edge technology development” etc etc.

They got away with it twice, then just did option two, of “Change the name and push it out the door with big kludges all over it”… And “Zo” it was, until earlier this year when Microsoft all but removed the still offending “brat” from public view.

RealFakeNews December 2, 2019 4:45 AM

Really? I’d say it was pretty damn obvious.

I’m really starting to think the people behind these systems are suffering some kind of grand illusion and they believe, sci-fi style, that these things are somehow “alive” or sentient.

No – it’s just a bunch of algorithms running on a computer.

If you feed it garbage, it will produce %%%%.

TotallyNotBen December 9, 2019 1:00 PM

Is anyone aware of these types of tactics against systems like Google’s AlphaStar (the AI that plays StarCraft)? Is it just that it’s trained and that’s it, no more learning? Or is it such that the AI “learns” and “adapts” to individual players? If it is the latter, it seems completely probable that such tactics would work. Make the AI make a false association, and then at just the right time when it’s taken the bait…WHAMO!!! The hammer drops…

Me December 19, 2019 9:57 AM


I was going to say pretty much this same thing, people are subject to these exact same training vulnerabilities.

