Availability Attacks against Neural Networks

New research on using specially crafted inputs to slow down machine-learning neural network systems:

"Sponge Examples: Energy-Latency Attacks on Neural Networks" shows how to find adversarial examples that cause a DNN to burn more energy, take more time, or both. They affect a wide range of DNN applications, from image recognition to natural language processing (NLP). Adversaries might use these examples for all sorts of mischief—from draining mobile phone batteries, through degrading the machine-vision systems on which self-driving cars rely, to jamming cognitive radar.

So far, our most spectacular results are against NLP systems. By feeding them confusing inputs we can slow them down over 100 times. There are already examples in the real world where people pause or stumble when asked hard questions, but we now have a dependable method for generating such examples automatically and at scale. We can also neutralize the performance improvements of accelerators for computer vision tasks, making them operate at their worst-case performance.
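
As a rough illustration only, here is a minimal black-box sketch of the idea: mutate an input at random and keep whichever candidate makes the model take longest to answer. The run_model stub and the mutation scheme are placeholders for illustration, not the paper's actual search method.

    # Hypothetical sketch: mutate a text input at random and keep whichever
    # candidate makes the model take longest to answer.
    import random
    import string
    import time

    def run_model(text):
        # Placeholder for a real NLP pipeline: simulate work that grows
        # with input length so the script runs end to end.
        time.sleep(0.0001 * len(text))

    def mutate(text):
        i = random.randrange(len(text))
        ch = random.choice(string.ascii_letters)
        if random.random() < 0.5:
            return text[:i] + ch + text[i:]       # insert a character
        return text[:i] + ch + text[i + 1:]       # replace a character

    def sponge_search(seed, iterations=200):
        best, best_latency = seed, 0.0
        for _ in range(iterations):
            candidate = mutate(best)
            start = time.perf_counter()
            run_model(candidate)
            latency = time.perf_counter() - start
            if latency > best_latency:
                best, best_latency = candidate, latency
        return best

    print(sponge_search("The quick brown fox jumps over the lazy dog."))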

The paper.

Posted on June 10, 2020 at 6:31 AM

Comments

scot June 10, 2020 8:31 AM

I used the Tesseract OCR engine on a project years ago looking for text on blueprints. If I were to just send the entire image to the OCR engine, it could get lost in things like shaded sections for hours, trying to find text where there was none. I had to break the image down into groups of connected pixels and then filter those objects based on the size, density, and spatial frequency to find blocks of what was likely text, and pass just those pixels into the OCR engine. Neural networks do some amazing things, but how they do it is opaque, and they tend to be very brittle when you push the boundaries of their training set.
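
For anyone curious what that pre-filtering can look like, here is a rough sketch assuming OpenCV and pytesseract; the filename and the size/density thresholds are made-up illustrations, not scot's original code.

    # Find connected components, keep only those whose size and fill density
    # look text-like, and hand just those regions to the OCR engine.
    import cv2
    import pytesseract

    image = cv2.imread("blueprint.png", cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(image, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    for label in range(1, num_labels):            # label 0 is the background
        x, y, w, h, area = stats[label]
        density = area / float(w * h)
        # Skip huge shaded regions and tiny specks; keep glyph-sized blocks.
        if 8 < h < 100 and 0.1 < density < 0.9:
            roi = image[y:y + h, x:x + w]
            print(pytesseract.image_to_string(roi))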

Phaete June 10, 2020 10:01 AM

Pareidolia in neural networks.
I guess it is to be expected; it’s not so much a human issue as one integral to object recognition in 3D space with color variations (like shade) and perspective.

I’m curious if there is a way around this except for brute power.
Associative methods seem to fail here.

tfb June 10, 2020 10:35 AM

By feeding the human NLP system confusing inputs you can also cause it to take a long time and/or get wrong parses. Garden path sentences are a good example: “The old man the boat.” for instance. Garden path sentences usually lead you to an incorrect parse which you then need to revisit when it becomes clear that it’s hopeless.

So if this is surprising to anyone it really should not be. And in particular it should not be taken as some kind of ‘here is why computers are dangerous’ example: humans do all these things. Which, of course, it will be.

Jesse Thompson June 10, 2020 12:17 PM

I’m currently less interested in the claim of “a dependable method for generating [teergrub] examples automatically and at scale” against AIs, and more interested in the inference here that one could automatically and at scale do the same to teergrub humans.

For an example, we’ve all seen the adversarial AI systems capable of fooling AI image classifiers. Here’s one that can fool humans, as well. 😛

Clive Robinson June 10, 2020 12:23 PM

@ ALL,

    “Adversaries might use these examples for all sorts of mischief — from draining mobile phone batteries, through degrading the machine-vision systems on which self-driving cars rely, to jamming cognitive radar.”

Funny, none of those were my first thoughts on what to make burn the resources…

My first thought was, “Whoopie! Surveillance systems are going to get a beating”. Shortly before, “How long before what passes as democratic authority these days can make such things illegal…”.

For the past couple of centuries there have been two basic ways you can fritz a system via its input. One is to force it into heavy negative feedback so that the loop bandwidth drops to a fraction of a percent. The other is positive feedback that makes it slam from position to position very fast, overshooting the mark and causing it to slam back and hunt, etc.

You can see the graphs etc. in most undergrad control theory texts. Thus in analog systems such as pumps, lifts, aircraft control surfaces, gun turrets, etc., such behaviours are highly undesirable.
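
A toy sketch of those two failure modes, using a bare proportional loop; the gain values are made up purely to show sluggish versus hunting behaviour.

    # Proportional controller stepping toward a set-point. Too little gain
    # and the loop crawls; too much and it overshoots and hunts back and forth.
    def step_response(gain, steps=10):
        position, target = 0.0, 1.0
        history = []
        for _ in range(steps):
            position += gain * (target - position)   # proportional correction
            history.append(round(position, 3))
        return history

    print("sluggish (gain 0.05):", step_response(0.05))
    print("hunting  (gain 1.8): ", step_response(1.8))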

But even the precision of digital systems does not stop such destructive behaviour, control theory applies to both digital and analog systems.

In digital systems energy consumption goes up with the required processing, and the time to complete any given function goes up with the latency of each specific function’s sub-components.

In fact precision, or more correctly the lack of it, is a very real concern. Many Neural Network systems try to gain an advantage in both speed of processing (reduced latency) and energy consumption (from gate count) by using much reduced precision, trading it off against an increased sampling rate. It would not be too hard to search for inputs that need increased precision or longer latency; a random search feeding into some kind of iterative selection process would do.

Whilst there are tricks for trying to keep precision requirements down that can be used as a pre-filter on input data, the same is not as true for latency. For any “time bound” function, which most Neural Networks are, attacks aimed at increasing latency would be the most devastating.

The last thing you want is the brakes to come on so late you have hit something, or your direction response to be so slow you overshoot the mark and have to slew from side to side in long duration curves (think of a drunk driver swinging back and forth across the road lanes).

There is no real cure for such latency issues other than to “design for the worst case” by increasing the processing power, and thus the cost and energy required. Which is not something most software engineers generally do, as most times they only need to design for the average case.

MarkH June 10, 2020 3:03 PM

@scot:

Thanks deeply for reporting your experience.

It’s been obvious for decades that “neural networks” is a fraudulent term for feedback-based algorithms to adjust matrix coefficients. They have no meaningful resemblance to neural functioning, nor to human intelligence.

Perhaps even more than other “AI” tools, they suffer from extreme brittleness: they function at some performance level within a highly restricted domain, outside of which they fail abruptly, severely, and without an error signal.

Their opacity — which is inherent and (I suspect) impossible to fix — means that their failures cannot be well predicted, and nobody really knows what they “learned” from training data.

If there has been much progress in remedying these defects, I’ve yet to see evidence of it.

The only thing intelligent in artificial intelligence is the way its boosters — from Marvin Minsky onward — have exploited public credulity to sell their unjustified hype. Millions of people believe that artificial intelligence exists.

On what basis?

David Leppik June 10, 2020 6:18 PM

This is fascinating because the defining characteristic of feed-forward neural networks is that they contain no loops. All this deep learning is one-way data flow from one layer to the next.

Because there are no loops, the amount of energy they can consume is limited. They are only allowed to do a fixed amount of processing.

What the paper says is that modern hardware exploits the fact that the matrices are sparse. Most of the values are 0’s, and the hardware is optimized to avoid calculations in empty sections. Also the memory is compressed, to avoid communication with RAM chips.

These attacks feed the NNs data that doesn’t converge to mostly 0’s after the first few layers.

Most of the analogies above involve situations which require a lot of cycles of thinking/processing. But here there are no loops. A better analogy is if you have a reference desk at a library where each reference librarian has a different set of books. Normally you need to interact with only one of them to get an answer. But you carefully craft a question which requires a fact from each of them. If the reference desk is fully staffed (i.e. a completely parallel neural network) it will take the same amount of time as always, but it still uses up a lot of energy. If the reference desk is under-staffed and some are doing double-duty, it will also take longer. Real neural networks are more like real reference desks: lots of books, very few librarians, but most often they know the answer off the top of their head, so they are usually really fast.
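
For anyone who wants to see the sparsity effect in numbers, here is a minimal sketch assuming PyTorch and an arbitrary two-layer model: it counts the fraction of post-ReLU activations that are exactly zero. Sparse activations are what let the hardware skip work; a sponge input is one that drives that fraction down.

    # Count the fraction of post-ReLU activations that are exactly zero.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(),
                          nn.Linear(256, 256), nn.ReLU())

    def activation_sparsity(x):
        zeros, total, out = 0, 0, x
        for layer in model:
            out = layer(out)
            if isinstance(layer, nn.ReLU):
                zeros += (out == 0).sum().item()
                total += out.numel()
        return zeros / total

    ordinary = torch.randn(1, 128)   # stand-in for a typical input
    print("fraction of zero activations:", activation_sparsity(ordinary))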

Clive Robinson June 11, 2020 3:30 AM

@ Trudi Fenster-Klotz,

    “the approach of Stephen Grossberg”

It’s been more than a few years since I played in the “analog” side of Neural Networks, but if memory serves correctly “Grossberg Networks” were based on “leaky integrators” that act not just as storage elements but also as low-pass filters.

Leaky integrators / low-pass filters turn up in a lot of places for good reason, not least that they are inherently stable. But simplistically, if you drive them with a series of high frequency pulses you get the average value out, which is actually quite an efficient way of doing things (see Class D amplifiers, SMPSUs, etc., and one or two RF amps I’ve designed).
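
A toy sketch of that averaging behaviour; the leak constant and pulse train are made up purely for illustration.

    # Drive a leaky integrator with a 25% duty-cycle pulse train; its state
    # settles near (with some ripple around) the average value of the input.
    leak = 0.05
    state = 0.0
    pulses = [1.0, 0.0, 0.0, 0.0] * 250

    for x in pulses:
        state += leak * (x - state)   # leak toward the current input

    print("final state:", round(state, 3), "(pulse average is 0.25)")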

However, the less leaky you make them the more careful you have to be about “input constraints”, which is one reason Grossberg’s integrators had constants not just for how leaky they are but also for maxima and minima on the input triggering.

So having “re-awoken” those old grey cells for those memories, I guess they are going to hang around that little bit longer 😉

Joe Loughry June 11, 2020 11:15 AM

@ David Leppik,

That was a fantastic summary of the paper! You wrote the kind of explanation that I always strive for when teaching classes. I don’t always succeed, but when you find it, it’s wonderful. Nice job.
