Machine Learning to Detect Software Vulnerabilities

No one doubts that artificial intelligence (AI) and machine learning (ML) will transform cybersecurity. We just don't know how, or when. While the literature generally focuses on the different uses of AI by attackers and defenders -- and the resultant arms race between the two -- I want to talk about software vulnerabilities.

All software contains bugs. The reason is basically economic: The market doesn't want to pay for quality software. With a few exceptions, such as the space shuttle, the market prioritizes fast and cheap over good. The result is that any large modern software package contains hundreds or thousands of bugs.

Some percentage of bugs are also vulnerabilities, and a percentage of those are exploitable vulnerabilities, meaning an attacker who knows about them can attack the underlying system in some way. And some percentage of those are discovered and used. This is why your computer and smartphone software is constantly being patched; software vendors are fixing bugs that are also vulnerabilities that have been discovered and are being used.

Everything would be better if software vendors found and fixed all bugs during the design and development process, but, as I said, the market doesn't reward that kind of delay and expense. AI, and machine learning in particular, has the potential to forever change this trade-off.

The problem of finding software vulnerabilities seems well-suited for ML systems. Going through code line by line is just the sort of tedious problem that computers excel at, if we can only teach them what a vulnerability looks like. There are challenges with that, of course, but there is already a healthy amount of academic literature on the topic -- and research is continuing. There's every reason to expect ML systems to get better at this as time goes on, and some reason to expect them to eventually become very good at it.
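A minimal sketch of the idea, assuming nothing more than a token-frequency model trained on a handful of labeled snippets (the snippets, labels, and scoring rule below are invented purely for illustration; a real system would train far richer models on large vulnerability corpora):

```python
from collections import Counter

# Toy training data: code snippets labeled vulnerable (1) or safe (0).
# These snippets and labels are invented purely for illustration.
TRAIN = [
    ("strcpy(buf, user_input);", 1),
    ("gets(line);", 1),
    ("sprintf(out, fmt, name);", 1),
    ("strncpy(buf, user_input, sizeof(buf) - 1);", 0),
    ("fgets(line, sizeof(line), stdin);", 0),
    ('snprintf(out, sizeof(out), "%s", name);', 0),
]

def tokens(code):
    """Crude tokenizer: runs of identifier characters."""
    out, cur = [], ""
    for ch in code:
        if ch.isalnum() or ch == "_":
            cur += ch
        elif cur:
            out.append(cur)
            cur = ""
    if cur:
        out.append(cur)
    return out

# Count how often each token appears in each class.
vuln_counts, safe_counts = Counter(), Counter()
for code, label in TRAIN:
    (vuln_counts if label else safe_counts).update(tokens(code))

def vuln_score(code):
    """Positive if the snippet's tokens lean toward the vulnerable class."""
    return sum(vuln_counts[t] - safe_counts[t] for t in tokens(code))

print(vuln_score("strcpy(dest, user_input);"))    # > 0: flagged
print(vuln_score("fgets(buf, sizeof(buf), f);"))  # < 0: looks safe
```

This only shows why "going through code line by line" is a natural fit for automation; real research systems use data-flow features, AST paths, and orders of magnitude more data.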

Finding vulnerabilities can benefit both attackers and defenders, but it's not a fair fight. When an attacker's ML system finds a vulnerability in software, the attacker can use it to compromise systems. When a defender's ML system finds the same vulnerability, he or she can try to patch the system or program network defenses to watch for and block code that tries to exploit it.

But when the same system is in the hands of a software developer who uses it to find the vulnerability before the software is ever released, the developer fixes it so it can never be used in the first place. The ML system will probably be part of his or her software design tools and will automatically find and fix vulnerabilities while the code is still in development.

Fast-forward a decade or so into the future. We might say to each other, "Remember those years when software vulnerabilities were a thing, before ML vulnerability finders were built into every compiler and fixed them before the software was ever released? Wow, those were crazy years." Not only is this future possible, but I would bet on it.

Getting from here to there will be a dangerous ride, though. Those vulnerability finders will first be unleashed on existing software, giving attackers hundreds if not thousands of vulnerabilities to exploit in real-world attacks. Sure, defenders can use the same systems, but many of today's Internet of Things systems have no engineering teams to write patches and no ability to download and install patches. The result will be hundreds of vulnerabilities that attackers can find and use.

But if we look far enough into the horizon, we can see a future where software vulnerabilities are a thing of the past. Then we'll just have to worry about whatever new and more advanced attack techniques those AI systems come up with.

This essay previously appeared on SecurityIntelligence.com.

Posted on January 8, 2019 at 6:13 AM • 56 Comments

Comments

Weather • January 8, 2019 7:00 AM

It can't rely on brute-force logic, as there are bugs whose search space is too large to explore, and then there are compiler bugs.
They could build on what an IDA debugger add-on did: detect user input, then trace all paths to see if there was a crash.
Each path, though, can branch in three or more directions, with each option branching another three to make a tree, and the branch conditions can range over dword values.
AI or ML won't remove all bugs, but it should get rid of a lot.
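Rough arithmetic for that search-space point, assuming (purely for illustration) a 3-way branch at each decision point and 32-bit (dword) input values:

```python
branches = 3            # assumed branching factor per decision point
dword_values = 2 ** 32  # possible values of a single 32-bit input

# Paths grow exponentially with the number of decision points.
for depth in (10, 20, 40):
    print(f"depth {depth}: {branches ** depth:,} paths")
# depth 10: 59,049 paths
# depth 20: 3,486,784,401 paths
# depth 40: ~1.2e19 paths

print(f"{dword_values:,} values to exhaust one dword input")
```

Even before compiler bugs enter the picture, exhaustive search is hopeless; any practical tool has to prune or sample.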

me • January 8, 2019 7:00 AM

I think that it will be possible too, also because of "safe languages*" like C#, Python, ... that prevent whole classes of vulns because they don't let you manually allocate memory, use raw pointers, or access outside bounds...
(* though the compilers/native runtimes themselves are written in unsafe languages)

But I think that most of the bugs we see today are "children" of "type confusion":
today we have one memory that stores:
-code
-data
-stack (again data)
-return addresses of functions (today on the stack, but with a different meaning)

all of this in a single memory, where if you overwrite any of these it will be confused for something different.

what if we have more than one memory?
one for data
one for code
one for return addresses
in this way you can't overwrite most of the memory:
-data: the problem is still there, but less dangerous
-code: in my theoretical model it is read-only and in a different memory
-return addresses: in my theoretical model only call/ret instructions can write there, and there is no way to tamper with it.

in this way even if you have a write-what-where primitive or a buffer overflow you can only overwrite data, but data is usually considered untrusted and is sanitized before use.
you might crash the program but not gain RCE.
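A toy machine makes the separation concrete. Below, code memory is immutable, data memory is writable, and a separate return-address stack can only be touched by CALL/RET (the instruction set and layout are invented for illustration; the idea is close in spirit to hardware shadow stacks):

```python
# Toy machine with the three separate memories described above.
class ToyMachine:
    def __init__(self, code):
        self.code = tuple(code)   # code memory: immutable
        self.data = [0] * 16      # data memory: writable
        self.ret_stack = []       # return addresses: CALL/RET only
        self.pc = 0

    def run(self):
        trace = []
        while self.pc < len(self.code):
            op, *args = self.code[self.pc]
            self.pc += 1
            if op == "STORE":     # a wild write can only ever hit data
                addr, val = args
                self.data[addr % len(self.data)] = val
            elif op == "CALL":
                self.ret_stack.append(self.pc)
                self.pc = args[0]
            elif op == "RET":
                self.pc = self.ret_stack.pop()
            elif op == "MARK":
                trace.append(args[0])
            elif op == "HALT":
                break
        return trace

prog = [
    ("CALL", 5),                # 0: call subroutine at index 5
    ("MARK", "back-in-main"),   # 1: reached via the intact return address
    ("STORE", 999, 42),         # 2: wild out-of-bounds-style write
    ("MARK", "done"),           # 3
    ("HALT",),                  # 4
    ("MARK", "in-subroutine"),  # 5
    ("RET",),                   # 6
]

vm = ToyMachine(prog)
print(vm.run())  # -> ['in-subroutine', 'back-in-main', 'done']
```

Even the write-what-where at index 2 lands in data memory; no data write can reach the return stack, so the worst case is corrupted data or a crash, not hijacked control flow.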

Tom C • January 8, 2019 7:38 AM

Let me get this straight:
1) the problem today is companies are not spending the resources (money and time) to find/fix bugs and vulnerabilities.
2) The reason for this is that the market rewards speed and low cost over quality and security.
3) The solution to this is machine learning: software built into the SDLC that automagically finds the vulnerabilities and fixes them before the software is released.

Right now, we have SAST, DAST, and IAST products available to find vulnerabilities. These tools should be part of any software developer's toolkit but aren't due to cost both in real money and time/resources. Some are even open source, but still aren't being adequately adopted because the free version doesn't handle all of the management/tracking issues and because there is still an investment needed to plug in the scanner(s) into the build/deploy/CI/CD systems.

I don't agree with your vision of ML impacting the overall quality of software within 10 years. The same drivers for bad software show no signs of going away, at least in the US and Asia. Europe may be a different matter, though. Plus, someone has to invest time/money in ML systems for this to happen, and they will expect some return on this investment -- look at how much commercial licenses cost for SAST, DAST, and IAST systems: tens of thousands of dollars for a single named user or even a single application license. If ML is to be much more effective, then I should expect the ML vendor will be asking at least as much.

The answer to the vulnerability problem is not a new tool, but in removing the incentives for producing bad software in the first place.

Iggy • January 8, 2019 7:45 AM

If by "the market" you mean the vendors, agree, they want it fast and cheap, with their sales force ripping the new goo-gah from the hands of R&D before it's ready. A classic, neverending tension because, humans. So far as I know, the only way to prevent that is to keep something so secret, sales simply cannot get its hooks on it. On the other hand, it can be argued that R&D would never let the product out because they can tinker with it til they keel over. Be careful what you wish for, ML sounds a little like inventing a bug to kill bugs and unleashing it into the wild on a hope and a prayer.

JG4 • January 8, 2019 7:46 AM


Some long and tedious rants from yesteryear at least touching on the intractability of exhaustively searching the relevant parameter spaces. I probably still was a bit short of magnesium then, and the hospital visit to turn on the lights about potassium was coming up. If we define intelligence as the ability to connect cause and effect, then searching for vulnerabilities is very clearly using AI. Using an OODA loop without humans in it to defend against attacks is very clearly AI.

https://www.schneier.com/blog/archives/2015/10/friday_squid_bl_498.html#c6708900

https://www.schneier.com/blog/archives/2017/06/friday_squid_bl_581.html#c6754660

https://www.schneier.com/blog/archives/2017/08/friday_squid_bl_590.html#c6759138

https://www.schneier.com/blog/archives/2017/09/bioluminescent_.html#c6759669

Tangentially-related

A few months ago, as a result of something that Clive mentioned, I stumbled into the fact that Bruce Ames figured out a long time ago that nutritional deficiencies are the least-known root cause of the lifestyle diseases. And that magnesium deficiency is epidemic in the US with 56% of the population critically deficient. The sickcare crime cartel likes it that way. They are part of a network of interlocking crime cartels that sell poison as food, medicine, appliances, toys and housing, then profit from the results, including lifestyle diseases and the angst that makes people buy the poison plastic junk manufactured in a network of re-education camps. Everyone who cares to know already knows that sugar, television and soybean oil are risk factors, but how many people can articulate that our family and friends also are dying of mineral deficiencies? I was looking up Bruce Ames to see if his assay for mutagens had been updated to test the PFAS and PFOA materials that Clive mentioned. Health security is a bit off-topic, but AI is going to revolutionize it. There is a crying need for real-time chemical sensing in the human body and in the environment to detect the toxins. And for connecting cause and effect in matters of health. They are connecting dots faster than we are:

https://www.nakedcapitalism.com/2019/01/kiss-left-medical-data-privacy-goodbye.html

George H.H. Mitchell • January 8, 2019 7:53 AM

Bruce, I'll take your bet that we will get to a future in which software vulnerabilities are a thing of the past, except that I surely won't be around to collect.

"Safe" computer languages are a chimera. Any programmer who believes he/she is using one will be less inclined to devote any effort to finding problems ahead of time. The Java community is already thoroughly brainwashed into not worrying about memory use/fragmentation, thread construction/destruction, and races because they believe the compiler can take care of that without any programmer effort at all.

No doubt machine learning will mitigate these problems, but it won't ever solve them completely. And how confident will we be in the diagnoses provided by those machine learning systems? And who assesses the vulnerabilities in the machine learning systems themselves?

Jane • January 8, 2019 8:02 AM

I am glad to hear that this brand new ML scanning will be cheaper and faster than anything we have today, and relieved that my employers do not allow* static code analysis only because it doesn't exist yet...
* it would be a security risk to allow me to install, or have someone else install, free and open source software. Only expensive things are safe, but we won't pay for them.

Weather • January 8, 2019 8:03 AM

@me
At boot you can set up any number of global descriptor tables; the OS can use those for what you said.

Michel • January 8, 2019 8:10 AM

If we use ML to fight vulnerabilities, be sure THEY use ML to find vulnerabilities.
Take a look at this simple tool I found that checks PCs for known vulnerabilities simply based on OS, Browser etc.
https://www.safetydetective.com/vulnerability-tool/

It goes to a DB of known issues and just tells the user (90% of the time the cure is to update the Windows version, which is something I do, but none of my family understands they need to).

If simple tools like that -- which don't really use ML, but rather existing DBs of vulnerabilities -- are not known or used by anyone (I doubt you knew this tool existed), I don't see how improving the tech is a benefit when mankind still is, well... not that smart :)

Derek Jones • January 8, 2019 8:39 AM

The most productive tools for detecting vulnerabilities are the ones mostly hand-written for the job. The "using machine learning to..." work is just jumping on the AI bandwagon, or comes from researchers who don't have the skills to hand-craft a tool.

Machine learning in software engineering is mostly a train wreck:
http://shape-of-code.coding-guidelines.com/2015/11/23/machine-learning-in-se-research-is-a-bigger-train-wreck-than-i-imagined/

But there are a few interesting uses, e.g., automated fault fixing:
http://shape-of-code.coding-guidelines.com/2009/11/27/software-maintenance-via-genetic-programming/

Things have improved since that post, but I don't know of any system in production use; it's coming.

JohnnyS • January 8, 2019 9:04 AM

AI and ML finding bugs in code before release is going to be like fusion power: We're going to be "almost there" for the foreseeable future.

The reason is that in most "IT practices" like hardware and software architecture, network design, system administration, etc, the "opponent" is a piece of hardware, software, a process or a business goal. But as a wise colleague pointed out: ITSec is the only practice in IT where the opponent is a human mind, with all the imagination and creativity that is within human scope.

Since ML and AI can't tell the difference between a nude and a sand dune, how is it going to be able to detect a Spectre or Meltdown problem?

Amit Wertheimer • January 8, 2019 9:19 AM

I think that ML/AI does have a lot to contribute to the vulnerability-finding problem, but I think that your forecast of "a decade or so" to obliterate the existence of vulnerabilities is off by a century or two.

1. Static code analysis has been with us for a while now, and it is still not a common enough practice (getting there slowly, I believe).
2. Vulnerabilities are a shifting target -- a decade ago it was OK to use SHA-1 and 3DES. Today it is less so. Vulnerabilities change as attacks are developed and technology evolves. Any scanner is only effective for the time it is running.
3. I'm not nearly as familiar with the topic as I would like to be, but wouldn't an ML-based system be limited to coding mistakes only? Many vulnerabilities are a matter of unfortunate design choices, which are not always visible at the code level.

Phaete • January 8, 2019 9:58 AM

@Michel

Take a look at this simple tool I found that checks PCs for known vulnerabilities simply based on OS, Browser etc.

It's a simple gadget site trying to sell you antivirus and/or malware protection.
Their privacy policy says they can collect all data (IP, personal, email, etc.) and use it for whatever they want (D-15 in their Privacy Policy).
And their terms of use say you can sue them only in Israel if you have any issues.

So keep their 'commercial spirit' in mind with what this website tells you to buy, or the information it wants.

Impossibly Stupid • January 8, 2019 10:47 AM

No one doubts that artificial intelligence (AI) and machine learning (ML) will transform cybersecurity.

I do, because I recognize hype when I see it. We are nowhere near having even a rudimentary AI for anything, and ML is nowhere near powerful enough to find most currently unaddressed software vulnerabilities.

AI, and machine learning in particular, has the potential to forever change this trade-off.

You have that backwards. ML is a lesser advance than AI would be. Just because ML exists today as a hot technology doesn't mean it has limitless potential. Indeed, anyone familiar with the field who understands how CNNs work, if they're being honest, can tell you that they're really not well suited to do complex code analysis.

Going through code line by line is just the sort of tedious problem that computers excel at

And they already do that, without the need for any ML/AI buzzword nonsense. As others have said, code analysis tools have existed for quite a while, and keep getting better and better. All the low-hanging fruit has been picked, and that's the difficulty for anyone thinking they can throw ML at the problem and suddenly get a ton of new bugs/vulnerabilities detected. Everything that remains is going to be rare: mostly high-level semantic issues involving Halting Problem levels of difficulty. An ML system can't be trained with so few examples, on code that is spread over many lines and involves many conditional calls to subroutines, and quite possibly many threads of execution on both local and remote machines. ML isn't going to keep an IoT device from becoming part of a botnet that launches a DDoS attack.

There's every reason to expect ML systems to get better at this as time goes on, and some reason to expect them to eventually become very good at it.

No, there is no reason to expect this. Someone is disrespectfully lying to you and/or you're letting yourself be convinced by the ML hype. Yes, it's an incredibly useful tool that will get better (like all good technology), but a rigorous look at what ML does and what software security needs should be all that any naturally intelligent human needs to understand that ML, for attack or defense, will not be changing the world of software security.

Not only is this future possible, but I would bet on it.

Then let's bet. I'd like to bet big, but I'll settle for a gentleman's wager. Let me know what you're comfortable with.

But if we look far enough into the horizon, we can see a future where software vulnerabilities are a thing of the past.

Pure fantasy. Eliminating errors of syntax and (some) semantics is a useful task that is already being done by non-ML tools, but no tool short of genuine AI is going to be able to tackle the bigger semantic issues or the vulnerabilities that arise from systems simply being used with a valid intent that doesn't match their designed use case.

Erik • January 8, 2019 10:59 AM

Wait... what? Isn't there a premise somewhere in here that we even know how -- from a proven systemic, methodological perspective -- to develop secure and/or correct code at a nontrivial scale? Even theoretically? Throwing up one's hands and saying "market failure" about something that has never existed and nobody knows how to make is a bit much.

That being said, my personal perspective is that figuring out how to do this should be a top priority even though it would completely upend the entire software industry (not really a bad outcome).

TheInformedOne • January 8, 2019 11:07 AM

Machine Learning cannot be used to teach AI common sense. Intelligence is learned from watching others (the current state of AI). Wisdom is acquired by learning not to repeat the mistakes of others (the current state of "some" humans). Therefore, what the industry is currently calling AI is really just a slightly higher form of automation, which can improve the efficiency of some jobs currently performed by humans. While this may in certain cases allow a machine to emulate some human actions, it can't make a machine think like a human.

Tony • January 8, 2019 12:11 PM

If your AI system is that good, why would you have humans write buggy code and have the AI fix it? Why not cut out the intermediate step and just have the AI system write the code?

Sed Contra • January 8, 2019 12:37 PM

Except for errors (Sir, I dropped my card deck and didn't get the cards back in the right order. You're fired!), one person's bug is another's feature. Waiting for the first language incorporating the DWIM instruction. Is Milner's ML (not Machine Learning) close?

major • January 8, 2019 1:12 PM

Ask a software engineer about AI/ML and he will tell you it will revolutionize humanity and put you out of a job. Ask him if it will do his job and he says it must be banned.

POLAR • January 8, 2019 2:05 PM

This reminds me of the saying: "All-wheel drive's purpose is to get you stuck in god-forsaken places you wouldn't otherwise have reached."

(I'll mention only ML because true AI means a degree of danger to our species far greater than puny nuclear weapons)

ML shifts the undetected bug quality from a "basic/human" level to "very nasty/what?/wtf".
And exotic, rare, weirdly difficult bugs are exactly what you don't want in an IoT/ML/AI/self-thinking-and-acting environment, which is the nearby future.

tfb • January 8, 2019 2:15 PM

For ML systems (which, really, means neural networks in this cycle) to be good at finding bugs they need two things: a large set of training data, properly labelled, and for whatever characterises 'a bug' to be rather small compared to the size of the training data. NNs have become really good at spotting pictures of cats because there are hundreds of billions (perhaps only tens of billions) of pictures of cats on the internet, all labelled as such, along with trillions (perhaps only hundreds of billions) of 'not-cat' images, also carefully labelled.

Is this true for bug databases? Of course it's not. Sadly, Bruce has drunk the kool-aid of the current AI hype cycle: I will add this article to my list of things to poke fun at in ten years when we're in the next AI winter.

Jesse Thompson • January 8, 2019 2:31 PM

No one doubts that artificial intelligence (AI) and machine learning (ML) will transform cybersecurity.

@Clive Robinson. @Clive Robinson doubts that AI and ML will transform cybersecurity. He has stated before that (at least with current algorithmic approaches) there is no metric by which the artifice is "intelligent", nor any process by which these machines are "learning" anything.

If I understand his position properly, it's that gradient descent is no more novel than thinking that the sieve of Eratosthenes "learns" all of the primes.

------------

That said, my position is that AI is less us teaching machines to think and more us learning what "thinking" even is when we do it.

EG: if we can remain unimpressed by increasingly human-like judgement from algorithms it's only because AI research teaches us enough to eat away at the magic of human thought slowly over time.

One example of this being how AI safety research has become the de facto effort to rigorously, mathematically formalize what we think of as morality.

Wael • January 8, 2019 3:07 PM

such as the space shuttle,

First of all, the space shuttle program had its share of "hiccups", but I'm not interested in debating that point.

the market prioritizes fast and cheap over good.

I'd like to explore that to a deeper level: why's that the case? Is the "market" the root cause, or is it something more nebulous? There is an economic factor, of course, but there are other factors beneath that.

Wael • January 8, 2019 3:20 PM

Out of order ...

if we can only teach them what a vulnerability looks like.

AlphaZero taught itself with remarkable results (although not independently verifiable.) Give AI the fundamentals and let it run with it. AI developing AI; a nice bootstrap evolution. Better than off-shoring, I guess ;)

We just don't know how, or when.

The "when", I can't predict. The "how": why stop at AI pointing out "bugs"? Have AI do the full development cycle! We? We just sit on a beach and drink from a cup with a little umbrella in it :)

Theo • January 8, 2019 3:41 PM

This is wrong headed for the same reason pretty much all software development is wrong headed.
Software, like Gaul, can be trichotomized.
1. Software that is proven to have no bugs.
2. Software that we don't know if it has bugs or not.
3. Software that is known to have bugs.

Far too much effort is spent on identifying type three and almost none on establishing type one.

In my experience in embedded systems, techniques such as dynamic memory allocation, recursion and multi-threading are often used to replace static allocation issues we could (just about) deal with, with dynamic allocation problems we couldn't -- thus turning type three software into type two software, with no realistic hope of showing it works correctly.

To train the AI we don't need examples of bugs, we need many examples of correct programs. We also need to reject programs when the AI says it can't understand them. Good luck with that. Squared.

Keith Douglas • January 8, 2019 3:51 PM

Writing a program (of any kind) to detect all and only vulnerabilities is impossible -- that's more or less the IT-security version of Rice's theorem. So what does the "automated" process do with the inevitable false positives and "interestings"? (I.e., findings that aren't what the tool classifies them as, but are worthy of investigation in their own right.)
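The reduction behind that impossibility claim can be sketched directly. Suppose, hypothetically, a perfect, always-terminating detector existed; wrapping an arbitrary program so that a known-bad construct is reached only after it finishes would then decide the halting problem (`perfect_detector`, `run`, and `trigger_vuln` are all hypothetical names used purely for illustration):

```python
def build_wrapper(program_src: str) -> str:
    """Wrap an arbitrary program so a known 'vulnerability' marker is
    reachable if and only if the wrapped program halts."""
    return (
        program_src
        + "\nrun()            # run the arbitrary program to completion"
        + "\ntrigger_vuln()   # reachable only if run() halts"
    )

def halts(program_src, perfect_detector):
    """If a perfect, terminating detector existed, this would decide the
    halting problem -- which is known to be undecidable. So no such
    detector exists: every real tool must tolerate false positives,
    false negatives, or non-termination."""
    return perfect_detector(build_wrapper(program_src))
```

This is why the false-positive/"interestings" triage question never goes away: it is baked into the theory, not an engineering shortfall.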

gordo • January 8, 2019 4:31 PM

As applied to the topic at hand, ML-AI sounds like the genetically-modified seeds of the problem it's been tasked to solve, i.e., the security-as-afterthought issue in the software industry:

The idea that AI is making decisions outside of the knowledge of its human masters is not a new concept and is commonly referred to by computer scientists as the ‘black box’ problem. The basic premise is that in the early days of AI research, the approach was for humans to specify the rules to be followed, and ask the AI to follow them.


Having failed to take off due to it being so complex and time-consuming, machine learning – and later deep learning – became the name of the game. So, instead of letting humans determine AI’s logic, vast amounts of information can be entered into it, and it would determine its own course of action. But how do we know what led to it making a decision?

https://www.siliconrepublic.com/machines/darpa-david-gunning-military-ai-siri

Rob DuWors • January 8, 2019 4:56 PM

One has to wonder if the standard for this enhanced ML security will be higher than the challenge question to leave a comment. Wouldn't expect Kabuki Theater Security here.

OK, have to be a bit doubtful about technology that relies upon "and then a miracle happens". What in current AI gives such hope? But maybe start with a more basic question: What is a bug? Seriously, let's ask what is a bug as opposed to a feature.

Well obviously to an attacker a bug by whatever means or type is a potential feature. Users on their part can't declare something officially a bug, they can just moan about a poor user experience or beg for a review of system behavior (which is a dubious proposition through most customer support departments). Please consult the nearest User Agreement.

So declaring a bug must belong to the system provider -- more accurately, to the intended and authorized system sponsor. A bug is not a technical decision; it is a business decision which ultimately is a value choice: is this new thing that has manifested itself good or bad? More precisely, a bug is system functionality which the sponsor did not intend to be in the system and is now displeased to find present. Thus bugs extend a system's functionality. Oops: we have to have an independent standard of "what it should be" in order to decide whether something is a bug, a feature, or neutral (the "don't care" state). This, among other things, has always been the curse of computer security, and it is what will never make full automation possible.

Furthermore, requirements errors are possible, whether they involve the direct kind of contradiction which underlies all of math and logic (and are thus in theory detectable), or they are a failure to map to external reality as judged by the sponsor or the sponsor's sponsor, or at a business-interaction level by customers, business partners, regulators or other external actors. Those bugs are beyond the reach of all logic and mathematics in themselves.

So somehow the miracle of AI is going to cut through all that semantic jungle? For that, ML is about as credible as the pre-AI-winter expert systems, which hinted at this but never came close to delivering. Dumping a crap-load of "training data" changes little to nothing about the fundamental problem of even determining whether an observed behavior is a bug or not. The other obvious strategy, which likewise changes nothing fundamental, would be to have AI do all code generation. Pretty nifty, huh? Of course we could trust that, surely. Two words: formally undecidable. Undecidability is the bane of all overblown computer-science claims. Most ML is pretty resistant to any form of proof (logic programming had better odds in that regard). In fact ML tends to be a collection of heuristics much more than a provider of algorithmic certainty. In that respect ML does follow a path similar to human thought, with similar resulting conundrums beyond resolution by logic and mathematics.

We can expect some improvement in a portion of the errors created by mechanical translation between levels of abstraction: from business objectives, to requirements, to implementation, to production, and to the next system iteration. How much is not clear. But $1 will get you $100 that it wouldn't be over 90% at the most extreme, thus leaving the effectively infinite occurrence of bugs, well, still effectively infinite. Worse yet, it will not get nearly as far with the emergent functional extensions of the system, which also create "bugs" at each level rather than mere translation errors, because judging each one of those means making a value choice -- for example, the requirements did not define what to do for a color-blind person, which in that case may define a bug by error of omission. Uh oh: effectively infinite is turning out to be more like literally infinite.

Perhaps when AI takes over all of civilization there will be fantastic means to resolve value choices entirely within the non-biological intelligence community aka AI. It will seem unintelligible to humans who can't possibly expect to understand its operation. But AI people don't like that kind of talk because they and the rest of humanity will have literally nothing to say about it. So in the limit, the topic may become moot.

Clive Robinson • January 8, 2019 5:15 PM

@ All,

Does anyone know why the robin is called "robin red breast" when it is clearly orange?

You can actually go and look it up, because it "is known why" but not by very many people.

And this is a problem, before we talk of "machines learning" and being "artificially intelligent" do we actually know what "learning" and "intelligence" actually are in humans or are we just "hand waving"?

But as I've noted in the past vulnerabilities (bugs) come in three flavours of instance and class,

1, Known, Knowns.
2, Unknown, Knowns.
3, Unknown Unknowns.

We should be able to test for and find all known instances in known classes of attack (1), because we have sufficient information to build not just rules but algorithms.

With known classes of attack we may be able to find new instances (2) by an algorithmic or rule-based system.

Which leaves us unknown instances in unknown classes (3) of attack. We don't have examples, so there are no rules from which an algorithm might be built.

How would we expect a deterministic machine to do any better than humans, who some would surmise are very nondeterministic at the best of times?

Well, there is a way, but it's not really practical. You take a piece of software through every possible internal state for every possible input state, in a brute-force manner. At some point either an error will be found or every state will have been exhaustively tested...

Well, we know that there is not enough matter/energy or time in the universe to go through even a fraction of a very modest number of states, and even quite small programs you could write in a few minutes could easily exhaust that number of states.

However, that's not to say we don't try it. We just sprinkle the idea with a little magic randomness and call it fuzzing. The really appalling thing fuzzing shows is just how badly code is written, because fuzzing turns up bugs quickly... when, if the code we wrote were even moderately OK, it would not find anything in our lifetimes. But no, it finds things in minutes to hours...
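A minimal random fuzzer makes the point concrete (the "fragile" parser below is deliberately buggy and invented for illustration; real fuzzers such as AFL add coverage guidance rather than relying on pure randomness):

```python
import random

def fragile_parse(data: bytes) -> int:
    """Deliberately buggy toy parser: crashes on inputs of four or
    more bytes whose first byte is 0xFF."""
    if len(data) >= 4 and data[0] == 0xFF:
        raise RuntimeError("unhandled header byte")
    return len(data)

def fuzz(target, tries=50_000, seed=1):
    """Sprinkle 'a little magic randomness': throw random byte
    strings at the target until one makes it blow up."""
    rng = random.Random(seed)
    for i in range(tries):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 8)))
        try:
            target(data)
        except Exception:
            return i, data   # crashing input found after i attempts
    return None

crash = fuzz(fragile_parse)
print(crash)   # typically found within a few hundred attempts
```

A bug with roughly a 1-in-450 chance per random input falls out almost immediately; code that was "even moderately OK" would make the same loop run past the heat death of the universe.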

But in neither case are there signs of what most would consider "intelligence" or "learning".

Which makes it a bit of a type 2, or "black swan", issue: we know what the class of swans looks like even though we may never have seen an instance of a "black swan", only "white swans".

However, we do know with fairly good probability that there are species of birds we have never seen or recognised, and for all we know they could be white, black, pink, green, blue, or some colour we don't even have a name for yet (which is what happened with robins: the colour orange was not even named when the bird was christened "red breast").

@ Jesse Thompson,

Aside from you trying to put words in my mouth, you have said above,

... is no more novel than thinking that the sieve of Eratosthenes "learns" all of the primes.

The "sieve" learns nothing in any way, that's why it is named as such. It follows a highly determanistic algorithm from a couple of basic axioms. The Primes are what fall through as the first example of what number to strike out next in the infinite list of successor numbers.

There is no learning, only the sieving out from a very simple deterministic algorithm.

Worse, to rub the point in: as the sieve does not store the primes, it's not even learning as much as a database, which you could argue is "'learning by rote' with a repetition of one".

It's easy to see that if the sieve did remember the primes it had found, then it could be sped up enormously by reflecting around primorials and their multiples.
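For reference, the sieve in question is a few lines of fully deterministic code, sketched here in Python:

```python
def sieve(limit: int) -> list[int]:
    """Sieve of Eratosthenes: purely deterministic; nothing 'learned' is
    retained beyond the output list -- there is no model being updated."""
    composite = [False] * (limit + 1)
    primes = []
    for n in range(2, limit + 1):
        if not composite[n]:
            primes.append(n)
            # strike out every multiple of the prime that just fell through
            for m in range(n * n, limit + 1, n):
                composite[m] = True
    return primes

print(sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The same inputs always produce the same outputs by the same fixed steps, which is the point: there is nothing here anyone would call learning.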

Clive Robinson January 8, 2019 5:51 PM

@ Bruce,

Getting from here to there will be a dangerous ride, though.

Yes, and also for another reason you have not mentioned.

It is said that "To err is human" and "humans learn from their mistakes". The last of those sayings is only partially true: we only really learn from our mistakes when we correct them.

We only put ";" at the end of statements in C, for instance, because the compiler punishes us for not doing so.

It would be trivial to get the compiler to silently put the ";" in if the original language designers had not erred in their design.

But let's say a compiler could do it: how long do you think it would be before humans got lazy, did not bother putting in any ";" at all, and left it to the compiler?

In fact why bother teaching about ";" at all...

The point is humans would not learn as there would be no need to learn.

And the real danger is that by not learning from simple mistakes we will, in most cases, not learn how to deal with mistakes at all. So when there is a mistake the ML/AI software cannot deal with, who will have learned how to go beyond its limitations?

But this is not supposition: we are currently seeing that we do not learn from past mistakes, except in the very short term. Within a little time, say half a decade, the majority have either forgotten or, more importantly, never learned from past mistakes. Thus we re-live them like an endless nightmare.

@ Wael,

I'm guessing you are going to point out the example of the computer learning to play chess shows that ML and AI can rise above humans...

But follow through what that thought really means.

Do we realy want to become the "cats" or "dogs" or other "pet substitutes" for the computers?

ML or AI does not have to have "physical agency" and do anything even remotely hostile to destroy humanity. Loss of self-worth/esteem will be the death knell of society as we currently know it. What that will mean for humanity I'm not sure. But some years ago it was noted that the brain size of domesticated creatures was about a quarter less than that of their close, still-wild brethren from which they had come.

Is that to be humanity's fate, shrinking brains due to non-use and no expectation of use?

Wael January 8, 2019 10:42 PM

@Clive Robinson,

computer learning to play chess shows that ML and AI can rise above humans...

Computer chess engines rose above human capabilities long ago! They can obliterate the world champion. AlphaZero rose above that after a mere four-hour training session with itself!

But some years ago it was noted that the brain size of domesticated creatures...

Whales have bigger brains than humans, but we are smarter ... I guess. But you're right: use it or lose it!

Is that to be humanity's fate, shrinking brains due to non-use and no expectation of use?

Seems to be the case. That or nanites... forgot the movie, my brain shrunk :(

Wael January 8, 2019 11:15 PM

@Clive Robinson,

1, Known, Knowns. 2, Unknown, Knowns. 3, Unknown Unknowns.

You've got to be kidding me! "Unknown knowns"? That's a "logical heresy"!

You need some sleep, chief! Kick her out and hit the bed! It's Known unknowns, and unknown unknowns... (I don't believe in that, btw.)

Gerard van Vooren January 9, 2019 12:18 AM

And there you have it: the economic (Anglo-Saxon) model that relies on all this. Very powerful.

Clive Robinson January 9, 2019 4:46 AM

@ Wael,

You've got to be kidding me! "Unknown knowns"? That's a "logical heresy"!

The comma's gone astray; it's supposed to be "Unknown, knowns". That is, an "unknown instance" of a "known class", i.e. the second type.

That is, you know of "white, swans", and thus long in the past you had your first instance "white" of a new class "swans" (technically genus Cygnus). You could have predicted that there might be "grey, swans" from the cygnets' colouring, and thus even hypothesized "black, swans" without ever seeing one. Not only was it possible; history shows there were indeed discussions about them from travellers' stories etc. We also know from other birds (flamingos) that we could --maybe-- end up with "pink, swans", but as yet --as far as I know-- nobody has seen or made[1] one, though since we've had white and black swans, "grey, swans"[2] have sort of appeared.

The point is how reliable are our predictive abilities towards new "instances". We do get new instances in given classes of attack happening fairly regularly as hindsight shows.

But the real question is how does a new instance arise?

There are two possibilities:

Firstly, "knowing of a class" you can investigate it in other ways. An example is when I talk about "serial signals" and "time based covert channels". Every serial hardware signal you see you now know can be used as a covert channel, all you have to do is work out the two steps to exploit it (modulate, receive).

Secondly, "not knowing of a class" you investigate a system and see anomalies that you then work out how to do the extra steps to turn it from a potential vulnerability into an exploit. All the first instances of new classes by definition come about this way (although we might not realise it at the time).

Arguably the first method can be turned into rules or algorithms and used to predict new instances. But unless you are an attacker, why bother? Because having defined a class, you can use its characteristics to ensure you modify all systems to be immune to that class, so that no more new instances in that class can happen in new systems.

Which brings us back to our host @Bruce's point: such ML/AI tools will first be used on "existing systems"...

So yes, I expect ML/AI to be able to find "prospective instances of known classes"; it's not exactly difficult to do when you understand the base rules of the "fully defined class".

But how do you find "proto classes", then find their "envelope" to turn them into a "fully defined class", from which you can extract the "base rules" on which to build the rules or algorithms required to find "proto instances"?

We don't currently know how to do this. When we see it done we "hand wave", or give the process a name as @Bruce did with "thinking hinky", and hope we can over time work out some methodology or process behind it.

In a way it's just like the name "Random": we define it by what it is not, saying "it is not deterministic". But that actually describes two things:

1, Random.
2, Determinism we can not see.

Which potentially means that "Random" does not exist; that is, it is all deterministic, but we just can not see why. That is, "God does not play dice with the universe", but god probably does shoot pool as a shark[3].

Thus the question arises: "Will 'thinking hinky' ever be fully determined as a method or process?". I would say it's unlikely, even if the universe is not fully deterministic.

Currently ML/AI is carried out on fully deterministic machines. Alan Turing knew that you had to have a source of "Random" in computers, which is why he insisted that it be built into the design of the computer he worked on.

But even if Random is real and we can use it effectively, I don't think we ever will be able to find all classes of attack. That is whilst we may be able to deduce new classes from "known classes" there will always be some that have no commonality with existing "known classes".

Further I think that will still hold for ML AI, it might be able to do it faster or differently but it still has to utilize resources. Because even using Random as effectively as possible does not solve the problem of finite resources. We have no reason to think that either instances or classes of attack are finite even though we believe our universe is finite.

Does this really matter? Well, kind of. Chess is a fully deterministic game: you can learn everything there is to know about it, thus you can deterministically go through every possible combination of piece positions and moves and produce a map of them. Once you have done this you can --if you have the resources-- design a "Mechanical Turk" to play it. As with the much simpler "noughts and crosses" / "tic-tac-toe", your Turk will play an optimal game to its logical conclusion from any starting point or mistake the opponent makes. The only advantage two such Turks could gain is by somehow forcing an opponent to make a mistake, which by definition they can not do, so the game is decided before it begins...[4]. Thus an ML or AI system that has sufficient resources its opponent does not have will win. It's just a question of finding a strategy that its resource-limited opponent will get confused by most quickly. Which in effect is what you saw happen, so no great surprise there.
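The "decided before it begins" claim can be checked directly for noughts and crosses, which is small enough to map exhaustively. A minimal minimax sketch (not tied to any real engine; board encoding and names are my own):

```python
from functools import lru_cache

# Winning lines on a 3x3 board stored as a 9-character string of X/O/.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    """+1 if X wins with best play from here, -1 if O wins, 0 for a draw."""
    w = winner(board)
    if w:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0
    nxt = "O" if player == "X" else "X"
    scores = [value(board[:i] + player + board[i+1:], nxt)
              for i, c in enumerate(board) if c == "."]
    return max(scores) if player == "X" else min(scores)

print(value("." * 9, "X"))  # 0: the game is a forced draw before the first move
```

Mapping the full game tree fixes the result from the empty board, exactly as described for the hypothetical chess Turk, just at a scale where the resources actually exist.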

If classes of attack are either infinite or beyond the resources of the ML or AI system, then it comes down to a match of "who has better or more resources". If they also use a Random element, then the whole thing collapses down to what many would call "a toss of a coin" (which I don't[3], but I also don't have a better, punchier little phrase everyone else believes, hence I have sympathy for Einstein's predicament ;-)

Which turns the whole argument into the old escalating ECM-vs-ECCM problem.

But worse, it will in the process have "dumbed down" the programmers, who will then not be able to make that "quantum leap" some people think the human mind might be capable of.

Which brings up another point: if attack classes are infinite, and the ML or AI systems do use Random effectively, will the so-far-elusive quantum computing make a difference? The simple answer is "not really"; whilst it might improve some things, infinite is after all not finite.

Oh, and by extension the same applies to the software being tested. The resources it will have to run on are very finite, so the software will be quite constrained in what it can do in the way of defending itself...

So even if an ML or AI system could find attacks the software being tested is vulnerable to, the question arises of "Can you do anything about it anyway?" to which the logical answer is "Not beyond reason"...

Thus we are always going to have vulnerable software systems that there is nothing software can do to solve.

But then I've been saying this for a very long time, hence my reasoning for C-v-P and "probabilistic security". But this time last year we had "proof positive" made public with Meltdown et al...

Software only goes so far down the computing stack. For the majority of software written, it goes down only as far as the ISA. For some it's microcode, and for others RTL is the limit on where software can go and what it can deal with in the computing stack.

Below this level you have the likes of the MMU and DMA, and below that, memory. Anyone who can fritz with the MMU or the DMA systems can arbitrarily change memory, and there is absolutely nothing the CPU or anything above it can do about that in a reliable way.

But the memory itself, even though it can not be protected by software, can be attacked by software, which is what Rowhammer and similar demonstrated. That is, the software could do a "reach around" attack beyond the lower hardware protection measures put in place by the MMU and alter memory.

Whilst there are things that could be done in hardware to protect against Rowhammer and similar, the price tag will be high. Likewise Meltdown et al. "Can we afford the cost?" Possibly. "Will we pay the cost?" Probably not.

Are there less costly mitigations? Yes, I've talked about them for years, but they all have side effects one way or another. The hardest one to bear is "loss of efficient connectivity"...

Thus my personal belief is that vulnerable software and vulnerable systems are here to stay for the majority of users. Which I'm sure will make the SigInt agencies et al happy, oh and also the cyber-criminals stealing from software wallets.

Which reminds me with regards,

You need some sleep, chief! Kick her out and hit the bed!

Yes I did, as I was watching the clock[5] ;-) but now I've had some but she is still lurking and in that time somebody has reset the clock at 9:33.

[1] Flamingos are apparently naturally greyish white, and they do lose their pinkness in captivity. The pink colour comes from their diet of brine shrimp and blue-green algae.

[2] https://ww2.rspb.org.uk/birds-and-wildlife/bird-and-wildlife-guides/ask-an-expert/previous/swaninterbreeding.aspx

[3] Yes, it's an Einstein quote that we now know is false (and he probably did as well). Throwing dice, like shooting pool, is a fully deterministic process, and there is a coin-flipping device that shows this. What we "mere mortals" have is "insufficient information", because we are neither omnipresent nor omniscient. Hence "hidden variables" or processes as an idea to explain away our current lack of ability. And as Cantor showed with his diagonals, there are always going to be things we do not know, because, big as it is, the physical universe is bounded and thus only has a finite entropy.

[4] It was realising this, and that the outcomes of a chess game are not just win/draw/stalemate (which is the reason for the non-optimal "fifty move rule"), that made me entirely lose interest in the game.

[5] It appears that "Old wine in new bottles" has tried to hide out in the Republic of Macedonia as its capital... so the time has not yet elapsed to drag them into the light from under their new choice of bridge.

Wael January 9, 2019 5:02 AM

@Clive Robinson,

You raised so many issues that each is worthy of a thread of its own. We'll do them one by one. I'll start when my headache goes away or subsides a little.

Leon January 9, 2019 6:15 AM

No. Using AI to find vulnerabilities in source code is much easier than using AI to find vulnerabilities in running systems: white box versus black box. So I am not so pessimistic.
Though not AI, automated code security analysis is gaining traction: Microsoft's GitHub.com now checks for known CVEs, and Sonar checks for bad code:
http://nakedsecurity.sophos.com/2017/11/21/github-starts-scanning-millions-of-projects-for-insecure-components/
https://blog.sonarsource.com/pragmatic-application-security

As more and more companies use these tools (it's not only for open source!), automated security analysis will become the new normal :-)
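A toy sketch of what such dependency scanners do under the hood. The advisory entries, package names, and IDs here are entirely invented for illustration; real tools query feeds such as GitHub's advisory database or the NVD.

```python
# Hypothetical advisory data: package -> (first fixed version, advisory id)
ADVISORIES = {
    "examplelib": ("1.4.2", "CVE-XXXX-0001"),
}

def parse_version(v):
    """Compare simple dotted versions numerically, e.g. '1.10.0' > '1.9.9'."""
    return tuple(int(x) for x in v.split("."))

def audit(requirements):
    """Flag any pinned dependency older than its advisory's fixed version."""
    findings = []
    for pkg, ver in requirements.items():
        if pkg in ADVISORIES:
            fixed, cve = ADVISORIES[pkg]
            if parse_version(ver) < parse_version(fixed):
                findings.append(f"{pkg} {ver}: {cve}, upgrade to >= {fixed}")
    return findings

print(audit({"examplelib": "1.3.0", "otherlib": "2.0.0"}))
```

The real tools add ecosystem-specific version semantics and live advisory feeds, but the core check is this simple, which is why it has become table stakes.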

tfb January 9, 2019 7:10 AM

@Clive Robinson

If QM is correct (which seems extremely likely) then throwing dice is not fully deterministic in any interesting way: it may be deterministic if you believe in superdeterminism, but we can't know what the result will be based on the state beforehand, which is the same thing outside of some philosophical niggling. This is the case because you can arrange situations where the way a die falls depends on some bit of quantum state.

However even in a deterministic Newtonian universe (so ignore rubbish like Norton's dome) this does not actually matter either. There's a famous (or it should be famous) thought experiment called 'the electron at the edge of the universe'. In this we imagine an ideal box full of ideal billiard-ball atoms of gas: everything about the box is known and everything is completely deterministic. And we know everything about everything outside the box as well, except we don't know the position of an electron at the edge of the universe, and so we don't know its gravitational field (there is no electromagnetism in this world). So, we let the gas in the box run forward and ask: how many collisions will a given particle experience before we lose track of the microstate of the gas due to the gravitational effect of the unknown electron, where this is defined by a particle exiting a collision at 90 degrees to where we predict it will. The answer is 'about 50'.

Determinism may be philosophically interesting but it is in practice entirely irrelevant to physics, even classical physics.

Wael January 9, 2019 8:06 AM

@Clive Robinson,

which is the reason for the non optimal "fifty move rule" which made me entirely lose interest in the game.

I play against Fritz, Shredder and HIARCS on my phone. Mostly puzzles as I, like you, lost interest in playing full games.

The comma's gone astray; it's supposed to be "Unknown, knowns".

I seriously doubt Donald Rumsfeld had any notion of class vs. instance when he gave us a piece of his mind. He was talking about WMD without necessarily thinking of classes (or maybe he did, but did not mean that the "unknown unknowns" are different classes). I think he just peanut-buttered the whole thing on a loaf of WMD toast, so to speak ;) But I have a question: did big bad Ronald borrow this expression from you?

if attack classes are infinite

Ever-expanding for a given solution, yes. Infinite, no!

"probabalistic security". But this time last year we had "proof positive" made public with Meltdown et al...

Wasn't exactly a shocker, was it ? :) ho-hum event for some of us.

Thus we are always going to have vulnerable software systems that there is nothing software can do to solve.

This needs proof. Show me one, and I might just buy it.

I just picked things randomly in your post...


Sancho_P January 9, 2019 10:18 AM

I’ve stopped reading at:
”… the market doesn't reward that kind of delay and expense. AI, and machine learning in particular, has the potential to forever change this trade-off.” (@Bruce, my emph)

Oops.
So AI and ML will change politics?
The problem is systemic, not partial.
- Wait, government run by AI and ML may be the solution …?

EvilKiru January 9, 2019 4:26 PM

@major

"Ask a software engineer about AI/ML and he will tell you it will revolutionize humanity and put you out of a job. Ask him if it will do his job and he says it must be banned."

Off topic point #1: There are no software engineers, because nothing about software involves engineering. Most of engineering revolves around facts and proofs, which are in scant supply in software.

On topic point #1: Only incompetent software practitioners fear AI/ML.

Off topic point #2: Those of us who are semi-competent software practitioners know that AI will never attain consciousness and will never be able to replace the core of what we do, which is thinking and reasoning about the problem at hand.

@Tony

"If your AI system is that good, why would you have humans write buggy code and have the AI fix it? Why not cut out the intermediate step and just have the AI system write the code?"

See my off topic point #2.

John Beattie January 10, 2019 2:13 AM

I'm glad to see that the comments mostly treat this as a 'silver bullet', with all the negative connotations that carries.

One semi-good sign is the current battle about Huawei kit. The solution seems to be to build infrastructure kit in a trusted environment, which means paying for the trusted environment and might mean an increase in quality levels.

Clive Robinson January 10, 2019 9:05 AM

@ Wael,

But I have a question: did big bad Ronald borrow this expression from you?

Not that I'm aware of; it's not original to me. If I remember correctly it was being used in the mid-1990s by traders in "One London Bridge" (a butt-ugly building[1] that can be seen from way up and down the River Thames, especially with the Shard standing out like a broken lava lamp close to it).

Ever-expanding for a given solution, yes. Infinite, no!

It depends on your viewpoint about the information universe and the physical universe. Some people believe, rightly or wrongly, that the physical universe "sheds" information universes by the billion, and each leads its own future, in the same way shedding its own billions of universes. Whilst this may or may not be true, some also think you can move between them in the downwards direction but never return. So if any of that is true, there's a vast amount of information out there, but your choice is limited.

But there is also the "information is never destroyed" idea: that you could wind the physical universe backwards, because all the information that made it the way it currently is, is there to be unravelled.

Apparently not like playing a movie backwards, though, because that would create even further information...

What ever the arguments used you always end up with a lot more information than you start with. Oh and a lot more information capable of being stored inside the physical universe...

Thus the physical universe is in effect a subset of the information universe, in much the same way as an individual number is a member of the infinite set of the natural numbers. As the argument is that information can not be destroyed, you also have to consider that it can not be created either; that is, everything that can be, and thus can be known, is in that infinite set. Our "knowledge" is thus just a limited view window, with information coming into view and leaving view at a constant rate[2].

Thus "pick a model" most indicate that the classes of attacks is actually going to be infinite, even though the physical universe is finite.

But let's assume for a moment that attacks sit on a surface or plane, or even just a line; the question of granularity comes up. That is, how do you differentiate instances of attack yet also say they are parts of classes of attack? Without some kind of granularity there is an infinity of possible divisions, just as there are fractional numbers between two adjacent natural numbers.

With regards,

Wasn't exactly a shocker, was it ? :) ho-hum event for some of us.

Some would say "entirely predictable due to the normal human condition". I remember an argument about "speed" and how to ensure people respected it. The cause of the argument was that the more speed collisions happened at, the more damage, destruction and death they caused, thus people should drive more slowly[3]. One person pointed out sarcastically that for head-on collisions the simple solution was to put a razor-sharp spike in the middle of the steering wheel, with drivers sans raiment. That is, the problem was that making cars safer just made people feel safer, thus they drove faster, and doing the opposite might make them drive slower. I remember thinking at the time that would not work because of "marketing": cars were never sold on safety features, just speed, luxury, and rubbing other people's noses in it...

Thus when buying a computer are you going to buy one that has a "safe" CPU or one that gives you the "speed" of twice as many frames per second in Doom? Which is how it's going to be "marketed" to you...

So yes "entirely predictable"...

And that brings us to,

This needs proof. Show me one, and might just buy it.

OK a simple one,

    Software development is "past tense" for any given software package, and attack vectors will be "future tense" for the same software package.

That is you can not conciously design for "unknown, unknowns" but you might get lucky with some but not all, which is why the "low hanging fruit" argument is important. That is you can design and develope defensively, such that both attackers of opportunity and targeted attackers will pick something else to attack that is the "easier route" to what they are after. It's why I keep going on about the fact that "messaging apps" are never going to be secure as long as an attacker can get past the security end point of the app.

The defensive design is to go beyond the communication end point the attacker has, onto a separate device/token etc. that "puts the human in as the comms channel" as a fire break; thus a separate device. It's something I've been saying since before "smart" devices came about, back last century, when talking about "authenticate transactions, not the communications channel" with online banking...
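A minimal sketch of that "authenticate transactions" idea, with hypothetical names and an invented provisioning step: the token and the bank share a key set up out of band, and the short code the human relays commits to the transaction details, so tampering with the amount in the channel is caught.

```python
import hmac
import hashlib

# Hypothetical: key provisioned onto the separate token out of band,
# never exposed to the (possibly compromised) communications device.
KEY = b"shared-secret-provisioned-offline"

def confirm_code(account: str, amount: str) -> str:
    """Token side: MAC the transaction details, truncated for a human to type."""
    mac = hmac.new(KEY, f"{account}|{amount}".encode(), hashlib.sha256)
    return mac.hexdigest()[:8]

def bank_accepts(account: str, amount: str, code: str) -> bool:
    """Bank side: recompute from the transaction it actually received."""
    return hmac.compare_digest(confirm_code(account, amount), code)

code = confirm_code("GB00-1234", "250.00")
assert bank_accepts("GB00-1234", "250.00", code)
assert not bank_accepts("GB00-1234", "9250.00", code)   # tampered amount fails
```

The human relaying the code is the fire break: malware on the comms device can alter the transaction it forwards, but it cannot forge a matching code without the token's key.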

Oh, and you don't need to "buy it": your past comments say you bought it a long time ago :-)

[1] https://www.architecturerevived.com/number-1-london-bridge-london/

[2] As we know/believe our physical universe is finite, that is, energy/matter is neither created nor destroyed, there is also a finite amount of information it can hold at any one time.

[3] A classically flawed argument, as it ignores "relativity" even in the ordinary dictionary sense.

Clive Robinson January 10, 2019 11:05 AM

@ John Beattie,

One semi-good sign is the current battle about Huawei kit. The solution seems to be to build infrastructure kit in a trusted environment, which means paying for the trusted environment and might mean an increase in quality levels.

It already is in the case of Huawei; however, the "commercial arm" of GCHQ that has people working there appears to have had "political direction" from Trump Towers. They produced a surprise report that basically claimed that the best of commercial practice was now no longer good enough for Huawei, while obviously considerably less than mundane is OK for US and EU manufacturers in tight with the US...

So US spyware-infested systems are OK for the UK, but Huawei systems the US can not get into are not...

Such is the standard thinking process of hawkish US national security adviser John Bolton, who has amongst other things arranged for a hostage to be taken via trumped-up charges. He has also made an excoriating attack threatening the International Criminal Court (ICC) with sanctions because they had the temerity to mention the well-established fact of US war crimes in Afghanistan. With right-wing nationalistic thinking like that, which pales in comparison to "My country, right or wrong", it's amazing things are not kinetic by now...

In fact of the two John Bolton scares a lot more people outside the US than Trump does. Because although Trump is a narcissist and not overly smart, in comparison Bolton is clearly off the reservation by a very long way.

Wael January 10, 2019 2:14 PM

@Clive Robinson,

Software development is "past tense" for any given software package, and attack vectors will be "future tense" for the same software package.

You left out the maintenance cycle. Software development is past tense and present participle: the so-called SSDLC.

That is you can not conciously design for "unknown, unknowns"

We can! We've discussed that aspect repeatedly and we had several ideas on some plausible approaches - C-v-P?

Oh and you don't need to "buy it" your past comments say you've bought it a long time ago :-)

I've been taken for a ride, then :(

Wael January 10, 2019 3:44 PM

@Clive Robinson, CC: @Nick P,

Regarding "Unknown unknowns":

we had several ideas on some plausible approaches - C-v-P?

Perhaps we haven't brought C-v-P to closure yet. Here's a refresher: focus on the Complete Awareness aspect. That thread is scattered all over the blog and spanned a three-to-four-year active discussion period. Perhaps, if I have the time, I'll collect the relevant discussions, list them chronologically, and then identify what needs to be completed. I know we have left out a lot.

Clive Robinson January 10, 2019 11:26 PM

@ Wael and Nick P (if you are still around),

I know we have left out a lot.

Yes, and the world has moved on which means revising other bits.

Every so often a thought pops up that maybe I should find the box the hardware prototypes went in, dust things off again, try to read my scrawled handwritten notes, and write it all up.

I guess it would be easier, though, to pull all the posts together first and sort them out, then cut, paste, correct, update and finalise it.

Wael January 11, 2019 5:00 PM

@Alyer Babtu, cc:@Clive Robinson,

I was reading the paper when @Clive Robinson mentioned it on a different thread...

Perhaps a way to find the unknown unknowns?

I couldn't see how the paper applies to finding "unknown unknowns" in the context of this discussion, for two reasons: I'm not good with statistics, although I like probability theory. Statistics... not my cup of tea. I only read the first two pages and understood absolutely nothing. I forgot the other reason.

My probability theory knowledge ends here. Great book, by the way. A classic! I spoke to one of my PhD friends who took that course and told him parts of it were challenging. He said this course is child's play; mathematical statistics is a ... every problem in the book involves some sort of a trick. Nothing is straightforward. I told him if the title has the word statistics in it, then it's not for me.

Perhaps you can explain to us what the paper says :)

Alyer Babtu January 12, 2019 10:16 AM

@Wael

how the paper applies to finding "unknown unknowns" in the context

I had a moment of madness where it seemed there was an analogy. In the discussion, unknown unknowns seemed of interest because they may exert effects in a situation we are concerned with. We, at least conceptually, put up for consideration a wide range of factors, then hope to work to the critical ones. This is just like the paper. The author’s methods reveal which inputs of the model really are operating (no matter that we start with a large superset) and can be expected to always account for most of the response.

Still reading ...

Re statistics as tricks: a book that provides a principles-based treatment of some major parts of statistics: D. A. S. Fraser, Structure of Inference, Wiley, 1968.

https://archive.org/details/FRASERD.A.S.TheStructureOfInference.1968./page/n3

Or from Amazon etc.

Clive Robinson January 15, 2019 6:44 AM

@ David Manheim,

There's a gloriously appropriate example of the current state of machine learning used for bug fixing...

Yes, I really thought "It's not yet April" when I read it.

However the,

    To be fair, we were 100% bug-free... briefly.

Did make me smile after the "and it deleted everything"...

I'd love to know where "the bad code" is, the app or the neural-net bug hunter; my money would be on the neural net.

Undisclosed January 16, 2019 9:36 AM

As a developer I can talk about bugs and security. These are the product of two things which are not related to programming.

First, writing good-quality code requires experience, hard work, and knowledge that takes time to acquire. Companies want to pay as little as possible and will tend to use cheaper developers without the required experience. You get crap as a result, because you're not willing to pay the best developers properly, and they go work for more intelligent people that will pay good money for good security (banks especially). To give you an idea, a Java dev in France with 0 to 2 years of experience is paid around 2100/2200 euro at minimum, and with experience it's 2500/2600. I have been offered jobs where they wanted experience plus solid knowledge of crypto and security for wages starting at 1600 per month and up to 2000 euro per month, knowing that someone who only knows Java, with 0 experience, gets jobs that pay starting at 2100 at an absolute minimum. So of course, I won't even bother to answer. You get what you pay for.

Second, programming is a creative job that requires not rushing things. We have to think seriously before programming. Sometimes we have to think for days or a full week, and explore a few libraries and possible implementations, before making technical and architectural choices. Most companies will look at you shocked, with big round eyes, when you tell them "we have to think about this, at least a week and up to a month, because you want a full rebuild with security at each step to replace your existing system." So we are not given enough time to do the job properly; we are asked to rush things, and instead of doing things properly we end up with code in production that is "to be redone" (we call that technical debt, and it will come bite you in the ass later). So next time you want to upgrade or change something... well, you have to rewrite half or all of it, because you didn't take the time to do it with care at first. In other words: stupid project managers who think "delivery date" and "funds" instead of "security" and "how much is this going to cost the company if we fail?"

You also have a serious culture problem. In France, and in Europe, you are not allowed to start a project that can fail. So no one will invest money or time in anything that might fail but, if it works, would generate a lot of money. That is why no Google, no Facebook, no YouTube, no GitHub, no major Internet player has come from France. Development is not considered a serious job, but a _temporary_ one, where people work for 2 to 5 years before moving to a "real job like project management." So most developers lack experience, are not paid properly, are rushed and under-funded, and they regularly get new rear-holes torn in their ends by the NSA and the Chinese intelligence services, and seriously, they deserve it.

If you want to be a developer, a really good one, and do that as your main job in France: the culture here is totally borked. The wages are extremely low compared to other countries, the jobs are treated like modern factory work, and if you have not moved out of development after 5 years, people look at you strangely (with the exception of banks and a few sectors that recruit intelligent people with international experience and don't treat developers the way 90% of those stupid French companies do).

Machine learning won't help. No more than SonarLint currently does. Because projects are rushed. Because security is treated as an add-on you bolt on at the end, not something you build in "at each step." Because developers are pushed out of development, since it's not considered a real job you can do all your life. Because project managers are vastly incompetent, do not listen, and don't care.

A security hole is usually a bug that gets exploited as a security hole, but it is a bug first... So fixing bugs, and avoiding them, reduces the exposed attack surface. Most security holes I have seen were not security-specific bugs, but simple bugs that got exploited. So coding properly, thinking before working, and being listened to are what matter.
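To illustrate that point with a minimal sketch (the function names and paths here are invented for the example, not from any real code base): a file-serving routine that joins user input onto a document root looks like an ordinary glue-code bug when it forgets to normalize the path, yet that same "simple bug" is a classic path-traversal vulnerability the moment an attacker sends `../`-style input.

```python
import os

BASE_DIR = "/srv/app/public"  # hypothetical document root for the example

def serve_file_buggy(user_path):
    # Ordinary-looking glue code: just join the requested name onto the root.
    # The bug is nothing "security-specific", merely a missing check,
    # but "../../etc/passwd" walks right out of the document root.
    return os.path.join(BASE_DIR, user_path)

def serve_file_fixed(user_path):
    # Normalize first, then verify the result is still inside BASE_DIR.
    full = os.path.normpath(os.path.join(BASE_DIR, user_path))
    if not full.startswith(BASE_DIR + os.sep):
        raise ValueError("path escapes document root")
    return full
```

The fixed version is exactly the kind of small, unexciting check that gets skipped when a project is rushed; nothing about it requires security expertise, only the time to think before shipping.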

I keep seeing "temporary stuff to redo" in code that never gets redone. Kludges like granting "full access rights" to some applications because they need to do something, and the system is so badly built that within the standard rights/ACL management you cannot find a solution, except by rewriting parts of the crap code already in production, with costs, because then you have to re-test everything. People without experience introducing bugs and security holes all the time.

It's a mess. And with every year that goes by, we put more stuff on computers, the hackers have more fun, and each hack and data theft becomes more devastating and costly. Wait until we have automated cars, automated everything, and cameras and microphones in every house for the hack-fest to begin at large scale.

Development is a job. A full-time job that people need to be able to do all their lives: gain experience, be allowed to experiment (like Google letting people spend 20% of their work time on experiments, which is how a pet project became Gmail). Some countries, like France, are so out of touch with reality that I think it's too late for them to recover. Their project managers and IT people are so dumb that they're easy targets, and unless you fire all those morons and bring in a new, intelligent generation, they're screwed.

I write crap code that gets exploited, thanks to companies and IT managers who want delivery dates and rush things all the time, with no respect for the position, the job, the craft, or the experience it requires. I write code without unit testing because "we don't have time." No documentation, temporary code all around that becomes permanent.

The future is bleak. I'm sorry, but this is not going to get better. There is a serious culture problem in IT, and a skill/experience problem to fix. And with companies wanting to push as many people as possible into programming, mainly to drive down wages, intelligent people who want good pay will leave the field to the crappy, the inexperienced, and the unfit, and things are only going to get worse.

Things won't get worse for intelligent people who want security. Nor for intelligent people like criminals, who pay twice my monthly wage for me to write code for them. Almost 4000 euros a month to write exploits and C2 control tools is far more interesting than my current shitty 1900 euros a month.

So guess who gets the best code I write? Criminal entities. They pay well, and they like what they get.



Schneier on Security is a personal website. Opinions expressed are not necessarily those of IBM Resilient.