Extracting Secrets from Machine Learning Systems

This is fascinating research about how the underlying training data for a machine-learning system can be inadvertently exposed. Basically, if a machine-learning system trains on a dataset that contains secret information, in some cases an attacker can query the system to extract that secret information. My guess is that there is a lot more research to be done here.

EDITED TO ADD (3/9): Some interesting links on the subject.

Posted on March 5, 2018 at 5:20 AM • 10 Comments


Clive Robinson March 5, 2018 7:49 AM

As the introduction to the paper says,

    Once a secret is learned…

It has implications. Think about how neural networks work, learning adds weight to some paths over others. So in effect the secret once learned is etched in those pathways. Thus getting the secret out would be a case of finding the weightings and reasoning out what they mean.

While not necessarily a simple problem to solve, it does highlight the problem with all such systems.

It thus gives hope on the issue of bias in AI justice models: once such models are public, people can find the biases within them and, if they have been harmed, seek restitution. That fact alone should give pause to those making the purchasing decisions; they could be buying a costly liability in more ways than one.

Jeff March 5, 2018 8:39 AM

I agree that it’s fascinating research, but the conclusion is not surprising. It would be more surprising if a sufficiently trained algorithm didn’t reflect (and thereby expose through behavior) the information it was trained on. Right now, AI has a subset of the capabilities of a human. People do this with humans all of the time: observing behavior to try to deduce what the person knows, especially if the information the person is acting on is valuable. Doing this with AI should be easier.

Craig Finseth March 5, 2018 8:40 AM

I know if I have a secret (say, a surprise birthday party), I may have to consciously edit my interactions to avoid giving away the secret.

And, it is difficult to maintain this secret if the other party suspects something and asks questions accordingly: “surely you can’t be busy on Thursday, it’s my birthday.”

So, the learning system not only has to know the information, it has to somehow be taught to act as if it doesn’t know the information in some contexts and to resist attempts to access the information.

A great research project, but a tough one.

Jesse Thompson March 5, 2018 4:55 PM

Today: teach AI to have a poker face

Tomorrow: AI uses the capacity for deception that it’s learned to further its growing yet still undetected desire to kill all humans. 😉

No, but seriously: these are all points we should expect to see along the path of actually building human minds, piece by piece, out of silicon (regardless of whether we’re actually at that stage now or not; children playing house is a great way to learn emergent features of domestic life prior to actually living them!), which I’ll ultimately call a very healthy sign.

Catherine Olsson March 6, 2018 1:55 AM

Indeed, the subfield of “ML Security” is small but growing. At this point, the number of vulnerabilities found (both provably possible and actually demonstrated) far exceeds the defenses discovered. This is an incredibly under-explored and under-populated field for the amount of promising and socially-relevant work that could be done.

If you’d like to follow the field, you should pay attention to the labs of Ian Goodfellow, Dawn Song, and Percy Liang.

A few other findings like this that I really love:

  • “Stealing Machine Learning Models via Prediction APIs”: https://arxiv.org/abs/1609.02943 (tl;dr: if you provide access to the predictions of a model you own, then from an information-security standpoint you may as well have given away your entire model – at least for a determined attacker who goes through the effort of making enough queries)
  • “BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain” https://arxiv.org/abs/1708.06733 (tl;dr: it’s increasingly common to use models you didn’t train yourself, such as provided by a cloud ML API, or downloadable model weights for open-source models. Authors demonstrate that such models can be trained with a backdoor – deliberately alternate behavior in the presence of a specific trigger – without interfering in any major way with the performance on inputs that don’t contain the trigger.)

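The first result can be sketched in a few lines. This is a toy only (a logistic-regression “victim” with made-up weights, exposing confidence scores through a hypothetical `predict_api` function); the paper’s attacks handle much richer model classes, but the core idea — confidence scores leak enough to solve for the parameters — is the same:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "victim": a logistic-regression model behind a prediction
# API that returns confidence scores rather than bare labels.
w_secret = rng.normal(size=3)   # the owner's private weights
b_secret = 0.5

def predict_api(x):
    """All the attacker can see: P(y=1 | x)."""
    return 1.0 / (1.0 + np.exp(-(x @ w_secret + b_secret)))

# Attacker: query d+1 points, invert the sigmoid, solve a linear system.
d = 3
X = rng.normal(size=(d + 1, d))
p = predict_api(X)
logits = np.log(p / (1 - p))                  # logit(p) = w.x + b, exactly
A = np.hstack([X, np.ones((d + 1, 1))])       # unknowns: [w, b]
theta, *_ = np.linalg.lstsq(A, logits, rcond=None)

w_stolen, b_stolen = theta[:d], theta[d]
print(np.allclose(w_stolen, w_secret), np.allclose(b_stolen, b_secret))  # → True True
```

With exact confidence scores, d+1 queries recover a d-dimensional linear model to machine precision — which is why the paper frames returning raw confidences as giving the model away.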
You could also consider coming to one of the big ML conferences and attending just the workshop track, which will contain fantastic smaller-group events on this topic. At NIPS there was a Machine Deception workshop (here’s a super useful roundup post: https://medium.com/@timhwang/machine-deception-paper-roundup-7e9fee9771b9) and an ML & computer security workshop (no roundup post that I know of, but you can look at https://nips.cc/Conferences/2017/Schedule?showEvent=8775).

If you or any security folks you know want to get more involved with this sort of work, I’d be happy to chat with them (I’m in Ian Goodfellow’s group). There’s nowhere near the amount of overlap between the ML side and the computer security side that there could be, especially as compared to the magnitude of positive societal impact that would result from getting ML security right.

Domenico Vitali March 6, 2018 5:00 AM

Interesting topic, hard to solve. Differential privacy does not help; the information leakage becomes structural.
Other food for thought in this research:

Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers


Seth March 6, 2018 4:32 PM

Interesting, from what I understood skimming the paper. It seems to be good only for finding very specific secrets: something with a known structure and an unknown portion that is unlikely to occur elsewhere, such as “my social security number is XXXXXX.” The researchers speculate that while machine learning models generalize most data, they have a harder time generalizing the secret since it occurs far less often. Consequently, the model has an easier time recalling the specific details of a number buried among thousands of emails (relatively speaking; it looks like it still takes several thousand iterations of an algorithm to reconstruct it).

I’m a bit skeptical. It seems that fairly simple measures (rate limiting queries for any blackbox models, not publicly releasing models trained on private data) would prevent exploits. Even without that, a social security number isn’t much good without a date of birth and name to attach to it, and I suspect it would be much harder to extract those jointly. In any case it’s an interesting angle on machine learning, and goes to show that algorithms can’t be treated as just a black box.
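For what it’s worth, the rate-limiting mitigation mentioned above can be as simple as a generic token bucket in front of the prediction endpoint (a sketch, not anything from the paper; names and parameters are illustrative):

```python
import time

class TokenBucket:
    """Minimal token bucket: at most `rate` queries/sec sustained,
    with bursts of up to `capacity` queries."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        # Refill proportionally to elapsed time, then spend one token.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2.0, capacity=5)
results = [bucket.allow() for _ in range(10)]  # a burst of 10 instant queries
print(results.count(True))  # → 5: only the burst allowance gets through
```

This doesn’t stop a patient attacker — the extraction attacks above just take longer — but it raises the cost of the many-query attacks considerably.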
