Vulnerability Finding Using Machine Learning

Microsoft is training a machine-learning system to find software bugs:

At Microsoft, 47,000 developers generate nearly 30 thousand bugs a month. These items get stored across over 100 AzureDevOps and GitHub repositories. To better label and prioritize bugs at that scale, we couldn’t just apply more people to the problem. However, large volumes of semi-curated data are perfect for machine learning. Since 2001 Microsoft has collected 13 million work items and bugs. We used that data to develop a process and machine learning model that correctly distinguishes between security and non-security bugs 99 percent of the time and accurately identifies the critical, high priority security bugs, 97 percent of the time.

News article.

I wrote about this in 2018:

The problem of finding software vulnerabilities seems well-suited for ML systems. Going through code line by line is just the sort of tedious problem that computers excel at, if we can only teach them what a vulnerability looks like. There are challenges with that, of course, but there is already a healthy amount of academic literature on the topic—and research is continuing. There’s every reason to expect ML systems to get better at this as time goes on, and some reason to expect them to eventually become very good at it.

Finding vulnerabilities can benefit both attackers and defenders, but it’s not a fair fight. When an attacker’s ML system finds a vulnerability in software, the attacker can use it to compromise systems. When a defender’s ML system finds the same vulnerability, he or she can try to patch the system or program network defenses to watch for and block code that tries to exploit it.

But when the same system is in the hands of a software developer who uses it to find the vulnerability before the software is ever released, the developer fixes it so it can never be used in the first place. The ML system will probably be part of his or her software design tools and will automatically find and fix vulnerabilities while the code is still in development.

Fast-forward a decade or so into the future. We might say to each other, “Remember those years when software vulnerabilities were a thing, before ML vulnerability finders were built into every compiler and fixed them before the software was ever released? Wow, those were crazy years.” Not only is this future possible, but I would bet on it.

Getting from here to there will be a dangerous ride, though. Those vulnerability finders will first be unleashed on existing software, giving attackers hundreds if not thousands of vulnerabilities to exploit in real-world attacks. Sure, defenders can use the same systems, but many of today’s Internet of Things (IoT) systems have no engineering teams to write patches and no ability to download and install patches. The result will be hundreds of vulnerabilities that attackers can find and use.

Tags: AI, cloud computing, cybersecurity, machine learning, Microsoft, security engineering, vulnerabilities

Posted on April 20, 2020 at 6:22 AM • 24 Comments

Comments

Rj • April 20, 2020 8:10 AM

This is an interesting subject. I suspect that a deep learning neural network approach will be tried first, but will meet with some pushback because these systems are not very good at explaining why and how they reached this decision.

Attackers probably don’t care, so the deep learning neural network approach will work for them. It has the advantage of being faster at finding bugs once it has been trained than, say, an ID3-type decision tree induction system. The advantage of the decision tree systems is that they can explain how they reached a decision in a manner that is tractable to humans.

Therefore, I predict that the attackers will still have the upper hand until these systems get perfected and used to find bugs in new code before it is released.

Still, experience tells me that after that code is released, some bugs will still remain, and these are more likely to be found more quickly by the attackers’ neural nets than by the defenders’ decision trees.

Bruce Schneier • April 20, 2020 9:36 AM

@Rj:

“This is an interesting subject. I suspect that a deep learning neural network approach will be tried first, but will meet with some pushback because these systems are not very good at explaining why and how they reached this decision.”

If I can verify that the vulnerability is a real one, do I really care how and why the neural net found it? This feels like an optimizing compiler to me: I don’t worry about how the compiler is deciding how to optimize the code — I just care that the code is faster, smaller, and so on.

Z • April 20, 2020 9:56 AM

All the tools and practices available to the attacker are also (at least generally) available to the defender. The reason why pentesting and code review activities aren’t done systematically for all new software releases is a lack of resources – trained and skilled humans able to spot vulnerabilities.

The day we can replace these activities with automated systems, they can be done far more often. Being able to find and fix a large number of vulnerabilities before software is even published would be a major game changer, freeing security professionals from doing a lot of work that is currently necessary and constant around patch management, etc.

Of course, all this depends on how good automated systems can become at finding software vulnerabilities. But they only have to be better and faster than current human vulnerability researchers to be massively useful, and turn a significant chunk of the industry on its head.

F. Wisner • April 20, 2020 10:20 AM

The public record indicates that Microsoft has cooperated with U.S. intelligence to make their products “exploitable.” It would be naive to assume that such behavior is simply a thing of the past. Though CEOs will jump cartwheels to reassure users that they take security seriously.

How many ML bugs do you suppose get passed on to spies? Plausibly deniable backdoors. The kind that watchers like best…

Keith Douglas • April 20, 2020 1:01 PM

Like with anything: I’d love to know what false positives are found, what false negatives, etc.

I also wonder about verification: If we tell developers “use this thing, fix what it tells you and ignore the fact that it can’t explain to you why” I wonder if we are setting ourselves up to make the “missed stuff” harder.

Also, since security is relative to an environment, I wonder how that comes into play? This feeds into the other questions, needless to say.
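
To make the false positive / false negative question concrete, here is a minimal sketch in Python. The counts are entirely hypothetical (they are not taken from Microsoft's results), and the function name `summarize` is made up for illustration. It only shows that a classifier can be right 99 percent of the time overall and still miss a quarter of the security bugs when security bugs are rare.

```python
def summarize(tp, fp, fn, tn):
    """Compute basic metrics from confusion-matrix counts.

    tp: security bugs correctly flagged
    fp: non-security bugs wrongly flagged (false positives)
    fn: security bugs that were missed (false negatives)
    tn: non-security bugs correctly left alone
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),  # share of flagged bugs that are real
        "recall": tp / (tp + fn),     # share of real security bugs that were caught
    }


# Hypothetical numbers: 10,000 bug reports, of which 200 are security bugs.
result = summarize(tp=150, fp=50, fn=50, tn=9750)
print(result)
```

With these assumed counts, accuracy comes out at 0.99 while precision and recall are both 0.75, which is why the overall percentage alone does not answer the question.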

Drone • April 20, 2020 1:20 PM

If you need to resort to using ML just to track bugs in your software product, it’s time to ask yourself: Why is my product so crappy to begin with?

Tech wannabe(too old) • April 20, 2020 2:14 PM

@F.Wisner
Be careful or people here will call you a tin foil hat nut job or some such.
@Drone
Makes sense to me.

JonKnowsNothing • April 20, 2020 4:02 PM

Well… if anyone has ever worked with or looked at any Engineering Bug Database (not the one they give to Marketing and definitely not the numbers they share with finance + upper management) it’s no surprise about the scale of the problem.

I really do not think a magic-ray-tagging-wand will make any difference because the problem isn’t with the system of tracking, the problem lies in the mindset of programmers and development cycle.

engineers waste time on false positives or miss a critical security vulnerability that has been misclassified

Our goal was to build a machine learning system that classifies bugs as security/non-security and critical/non-critical with a level of accuracy

So the big FIX here is to tag bug reports differently. This isn’t much different than flipping all the previous year’s P2s to P3s and deleting all the P4s on a periodic basis.

It doesn’t do anything for corner or edge cases, it might find some syntax issues (buffer overruns) but it really isn’t going to do much for logic errors or wrong implementations or implementations that were OK in one release but are deprecated or changed in another except for specific release changes. Those items that piggy back on “other” aspects are still orphaned and once that Dev is G O N E the whole source pile is orphaned.

The concept of “engineers waste time…” is rooted in “no unit test”, “I don’t do documentation”, “I write to my own standards” and “I have something in my software-dev-kit that will work” and “I don’t like the way J did this section”.

The folks that compile bug lists are fully aware of “classifications” and for the most part, those are ignored or reset during the weekly bug scrub/scrum.

What they will get is a nice listing of stuff that will take a million-man-months to fix.

The old adage is still true:

80% of the full capabilities of the software are used by less than 10% of the clients.

No one is going to fix anything that isn’t on the critical path to release. They might “fix the worst in the next release” or “the next decade”. Which is about what can be expected.

Nothing really new here, just some: “this is not the bug you are looking for…” hand waving.

Phaete • April 20, 2020 4:02 PM

It’s a nice step forward.
Let’s hope they just bundle it with their bug producing tools and make it cheap.
I fear however they won’t make it cheap, so we remain with one of the biggest problems of why bugs exist.

It is usually not economically viable or wanted to make perfectly functioning, bug-free code because it means less profit.

And we have gotten too used to bugs and to companies managing our expectations, saying what we want to hear; there are much cheaper alternatives than bug fixing.

I really hope they bundle it cheap with their studios but I expect it will be just an expensive service, like their malware removal team.

Rj • April 20, 2020 4:32 PM

@Bruce:
“I don’t worry about how the compiler is deciding how to optimize the code — I just care that the code is faster, smaller, and so on.”

I do a lot of safety critical software work, both development and test. If you have a medical device implanted inside your body that is there to keep you alive, you are just a bit pickier about it being bug free than if you are just playing a video game. If you are a passenger in a commercial airplane, you want to feel like it has a chance of staying in the air. If your country is being defended and software is part of that defense, the stakes are really high. If you are controlling a complex industrial process and need to watch for dangerous conditions, you don’t want another Chernobyl with your name on it.

Moral: Not all bugs are security vulnerabilities; some carry the death sentence!

There have been many times when I did have to go in and double-check the code generated by an optimizing compiler, usually to make sure it was still checking for overflow, underflow, NaN, etc. Also to make sure critical sections were not being corrupted by optimization efforts in multi-threaded code, etc.

Some of us are not permitted the luxury of putting blind trust in our optimizing compilers.

vas pup • April 20, 2020 5:42 PM

PC Owners Rush To Get New Windows Protection 2020 For Free…
https://blog.totalav.com/new-free-windows-security-2020/

Q: Is this real or just a honey pot to load a program that will secretly collect your data under the pretext of security?

Sorry for my attitude, but old Jew told me once ‘whatever is offered for free it’ll be cheaper to pay for’. I trust his wisdom – working usually.

Impossibly Stupid • April 20, 2020 6:17 PM

If I can verify that the vulnerability is a real one, do I really care how and why the neural net found it?

I really hope you’re just feigning ignorance to further discussion. To a scientist, the how and why matters more than any single anecdote or data point ever can. We care about all the vulnerabilities brought to our attention that aren’t real (false positives) and all the ones the ML will miss (false negatives). We want to determine causal factors, not just correlations. We want to see if the class of vulnerabilities detected might point to another set of related vulnerabilities that a human can see, but that no ML has yet been trained to spot. And, like @Drone says, from a business/organizational perspective, we should be asking what mistakes we made in our hiring and development process that got us into such a sorry state to begin with. These systems would probably show the most bang-for-the-buck if they were applied to the high-priced mistakes of high-priced executives, but it’s funny how those in power never seem that eager to apply this tech to their own jobs.

La Abeja • April 20, 2020 7:32 PM

At Microsoft, 47,000 developers generate nearly 30 thousand bugs a month.

The Dems REFUSE to break up the Microsoft monopoly under their very own Sherman Antitrust Act, which is left unenforced by non-commissioned officers of the military-industrial complex.

La Abeja • April 20, 2020 7:38 PM

I mean, imagine how productive all those Microsoft employees could be in hundreds of small businesses, or some of them starting businesses of their own, hiring more of us.

Break it up and strike the non-competes.

JonKnowsNothing • April 20, 2020 8:43 PM

@Impossibly Stupid @All

re:

the how and why matters more than any single anecdote or data point ever can

My anecdotal experience is that the majority of programmers that I’ve worked with did give a fig, but not two figs.

You get generically two classes: one that fills the bug report with extreme detail and the other that just fills out the report with /bug.

I’m not sure how ML is gonna fix that problem. I can envision ML as doing a pre-compiler detail analysis but few programmers are OCD enough to even clear out “warnings” much less “exceptions” that don’t prevent the compile from completing.

Getting details is a lot better than none.

ancient story:

I had a critical bug dropped on my desk. I looked at the details and code references, hunted up the proper section of code, determined the fix and presented it as a solution, got OKed and implemented it. Color me “surprised” when it came back as “not fixed”. I looked and yes – there was the fix. I checked the source repositories and yes the code fix was there. I recompiled and submitted. It came back as “not fixed”. I made the “forbidden journey” over to the testing lab to find out “what the problem was”? Well, the error was in a different section of code that was very similar to the one where I stuck my fix. The original fix I did fixed an error that had not yet surfaced and I repeated the fix for the “twin” code piece.

Getting details that pointed to the “true” error might have shaved 2 days off my ToDo List but it was pure serendipity that I found the mirror-twin code and fixed that too. No extra charge.

I am not sure ML could have done better.

JonKnowsNothing • April 20, 2020 9:02 PM

@Impossibly Stupid

re:

we should be asking what mistakes we made in our hiring and development process that got us into such a sorry state to begin with

I wouldn’t disagree but I think the answer points to the unemployment line – permanently.

There has been lots and loads written about this from management books to programmer development cycle manuals. None of it really did much.

There is the:

Too Technical
Not Technical
Too Young
Too Old
Wrong zip code
Wrong University
No University
Too educated MS or PHD
Too cantankerous
Too wishy-washy
Too Smart
Too Stupid
Wrong sex
Wrong marital status
Wrong accent
Wrong nationality
Only checks N-1 from list of requirements
Must have experience
Must not have experience

The list of “wrongs” goes on and on.

Fixing the hiring process isn’t easy even if some of these are “officially” not acceptable. No boss hires someone smarter than they are, the alpha-team-member is gonna be sure you know your place at the feed trough, and the inside-favorite is going to be sure that they get the good bonus reports and you get the dumpster version.

disclosure: I’ve had all of these and more aimed in my direction.

Alex • April 20, 2020 9:23 PM

Did anyone actually look at this Microsoft result?

From what I can tell this system is not scanning the source code of software. It is a tool for classifying already-produced bug reports. It takes the bug report title as input and outputs whether it thinks the bug report is security-related or not. Maybe I’m missing something but this seems to have nothing to do with the idea Bruce was talking about.

Does anyone know if these bug reports are somehow produced by the machine itself or are these just human-created and human readable documents?
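
For illustration only, a title-in, label-out classifier of the kind described above might look like the following Python sketch. It is a tiny naive Bayes model written from scratch and trained on made-up titles. It is not Microsoft's model or data, and the names `TRAINING_TITLES`, `train`, and `classify` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical, hand-written training titles; not real bug data.
TRAINING_TITLES = [
    ("buffer overflow in image parser", "security"),
    ("sql injection in login form", "security"),
    ("xss in comment field", "security"),
    ("privilege escalation via service account", "security"),
    ("button misaligned on settings page", "non-security"),
    ("typo in installation guide", "non-security"),
    ("slow page load on dashboard", "non-security"),
    ("crash when saving empty file", "non-security"),
]


def tokenize(title):
    """Split a bug title into lowercase words."""
    return title.lower().split()


def train(examples):
    """Count words per label; returns (word_counts, label_counts, vocabulary)."""
    word_counts = {}
    label_counts = Counter()
    vocabulary = set()
    for title, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(title):
            counts[word] += 1
            vocabulary.add(word)
    return word_counts, label_counts, vocabulary


def classify(title, model):
    """Return the label with the highest naive Bayes log-score for the title."""
    word_counts, label_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Start from the log prior for this label.
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for word in tokenize(title):
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing so unseen word/label pairs are not zero.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_TITLES)
print(classify("sql injection in search form", model))
print(classify("typo on settings page", model))
```

Note that nothing here looks at source code: the only input is the text of the report title, which is the distinction being drawn above.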

La Abeja • April 20, 2020 10:02 PM

@JonKnowsNothing

the error was in a different section of code that was very similar to the one where I stuck my fix. The original fix I did fixed an error that had not yet surfaced and I repeated the fix for the “twin” code piece.

The bosses want a clean fresh build of the entire source tree for every little change. They aren’t willing to rely on (or even work with) the ‘make’ process to avoid duplicating the work of compilation for already compiled modules throughout a working source tree.

JonKnowsNothing • April 21, 2020 1:08 AM

@La Abeja

re:

The bosses want a clean fresh build of the entire source tree for every little change. They aren’t willing to rely on (or even work with) the ‘make’ process to avoid duplicating the work of compilation for already compiled modules throughout a working source tree

There are several good reasons to require this and it’s primarily that programmers are already “over clocked” and do not want to take time for a daily code merge.

If programmers compile against an existing .obj file there is no guarantee that .obj file in their dev-machine is the same version on the main build machine.

Additionally, doing a full source build validates you actually have all the source files. Relying on a build-directory-folder with existing .obj files does not validate the source can be compiled without halt-errors.

It depends on what sort of environment you have too. One truck-Many Drivers or Many Trucks-One Driver. The more programmers hashing about in a module the more merge-clashing there is and the potential for relying on an outdated .obj or even a deprecated one is more hassle than running a full build.

It also depends on how long your build run takes. If it’s only an hour or so, well that used to be java-time. If it’s overnight you might get a call at 3am that the build crashed. If you have a build that takes multiple days and many iterations to build cross-dependent and cross-linked .objs that requires some special timing to get it done and often means On-Call Weekends and Nights while the full process happens.

For some reason, folks trust source-code-repositories with their bread and butter intellectual property. Having a repository failure is yet another reason to validate that your code hasn’t been hoovered up by the sweeper mechanism.

La Abeja • April 21, 2020 1:26 AM

If programmers compile against an existing .obj file there is no guarantee that .obj file in their dev-machine is the same version on the main build machine.

And people manage huge projects with a several-hour or days-long build time this way.

Additionally, doing a full source build validates you actually have all the source files. Relying on a build-directory-folder with existing .obj files does not validate the source can be compiled without halt-errors.

You’ve got to be able to fix the show-stopper bugs and pick up the compilation where it left off.

A fresh clean build of the entire tree should only be necessary for a major commit, branch, or release.

Clive Robinson • April 21, 2020 8:50 AM

@ Bruce, rj, ALL,

As @JonKnowsNothing observes,

“It doesn’t do anything for corner or edge cases”

Which is part but not all of the problem…

At the end of the day ML systems are a “crutch” that you limp by on. Thus you get weaker or fail to build the ability to stand on your own two feet.

In the end ML systems will “dumb people down” which is a problem. Because the more interesting and devastating security faults will probably not be found by ML but by humans who have “learnt to think hinky” which is a skill that takes time and real world examination to “get the feel” for when things are wrong.

If you use ML systems then you will end up with coders not programmers, who will not develop the “feeling” of when things are wrong.

Now it could be argued that ML systems will “learn to think hinky” but actually I don’t think they can (see Chinese Room argument).

We have been promised that AI, especially “Hard AI”, is just around the next corner for longer than I suspect any of us have actually been alive… I see no reason to believe it’s going to change any time soon or for that matter later, and I hope not at all (the privacy and security implications are frightening and I don’t mean “Hollywood Style”).

Humans obviously have a usually rare skill that technology can not reproduce. Using ML Systems will reduce the number of people with the rare skill, and I really do not think that is a good idea.

Racholland • April 21, 2020 10:50 AM

Well… if anyone has ever worked with or looked at any Engineering Bug Database (not the one they give to Marketing and definitely not the numbers they share with finance + upper management) it’s no surprise about the scale of the problem.

I am a developer at a medium-sized software company, and am shocked that the developers are only generating 0.64 bugs/month on average. I can hardly go a day without noting a potential bug or feature request (often in software/documentation from other teams), and a query of our tracker shows I’m originating about 15-20 per month.

Granted, maybe people at MS are more likely to find that the bugs they’re discovering are already in the tracker; but, on the other hand, they have so much more code and complexity that I’d expect any developer to report something every week or two (which would be 3 or 4 times the stated rate).
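
The arithmetic behind these figures can be checked with a short Python sketch. The bug and developer counts come from the Microsoft quote in the post ("nearly 30 thousand" is treated as exactly 30,000); the weeks-per-month conversion of 52/12 is an assumption made here for illustration.

```python
BUGS_PER_MONTH = 30_000   # "nearly 30 thousand bugs a month", rounded up
DEVELOPERS = 47_000

# Average bugs filed per developer per month.
stated_rate = BUGS_PER_MONTH / DEVELOPERS

# Assumed conversion: 52 weeks spread over 12 months.
WEEKS_PER_MONTH = 52 / 12

# Reports per month if a developer files one every week, or one every two weeks.
weekly_rate = WEEKS_PER_MONTH / 1
biweekly_rate = WEEKS_PER_MONTH / 2

print(round(stated_rate, 2))                  # 0.64
print(round(biweekly_rate / stated_rate, 1))  # 3.4
print(round(weekly_rate / stated_rate, 1))    # 6.8
```

Under these assumptions, one report every two weeks is about 3.4 times the stated rate, and one report every week is closer to 6.8 times.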

vas pup • April 21, 2020 4:23 PM

@F. Wisner • April 20, 2020 10:20 AM
That is highly possible.
Please see my post related:
vas pup • April 20, 2020 5:42 PM

The Company never exactly specified what kind of updates they are offering to load, meaning how those updates improve their product functionality, security and privacy OR just want to load some kind of surveillance tool without your clear understanding and/or to degrade software you are currently using so as to force you to acquire a new version of their product. They do need more transparency on that; then there is going to be substantially less doubt about their intentions, taking into consideration their monopolistic status on the market.

By the way, don’t pay attention to negative comments: it is always better to be safe than sorry, and even a paranoid has his own enemies.

'; DROP TABLE COMMENTS; -- • April 21, 2020 11:43 PM

Schneier, as has been raised by Alex before, you misinterpreted the topic of the Microsoft article. This isn’t about finding vulnerabilities, it’s about classifying (human-generated) bug reports by severity (critical or not).
