Code Written with AI Assistants Is Less Secure

Interesting research: “Do Users Write More Insecure Code with AI Assistants?”:

Abstract: We conduct the first large-scale user study examining how users interact with an AI Code assistant to solve a variety of security related tasks across different programming languages. Overall, we find that participants who had access to an AI assistant based on OpenAI’s codex-davinci-002 model wrote significantly less secure code than those without access. Additionally, participants with access to an AI assistant were more likely to believe they wrote secure code than those without access to the AI assistant. Furthermore, we find that participants who trusted the AI less and engaged more with the language and format of their prompts (e.g. re-phrasing, adjusting temperature) provided code with fewer security vulnerabilities. Finally, in order to better inform the design of future AI-based Code assistants, we provide an in-depth analysis of participants’ language and interaction behavior, as well as release our user interface as an instrument to conduct similar studies in the future.

At least, that’s true today, with today’s programmers using today’s AI assistants. We have no idea what will be true in a few months, let alone a few years.

Posted on January 17, 2024 at 7:14 AM • 17 Comments

Comments

Clive Robinson January 17, 2024 7:55 AM

@ Bruce, ALL,

“At least, that’s true today, with today’s programmers using today’s AI assistants.”

The reason for,

“Overall, we find that participants who had access to an AI assistant based on OpenAI’s codex-davinci-002 model wrote significantly less secure code than those without access.”

Might currently be simple statistics.

If we assume the LLM has all the available code on the Internet in its training data set, then unless experts went through tagging each example as having good or bad security, you would expect not a “current average” but a “historical average” output.

As code security is allegedly improving with time, you would thus expect the output from the LLM to be two to ten years behind current code security.

Hence the security practices being produced by the LLM and followed by the programmers are very probably “out of date” (as you would expect).

Even with ML, you would still expect the security practices produced by AI as we currently know it to be significantly behind the curve.

It’s the same argument as used to say “Don’t roll your own crypto”.

That is, you need significant experience in breaking algorithms before you can make secure algorithms.

As far as I’m aware no current AI has any experience in breaking algorithms to the point where it can find new breaks. Therefore their security is based on the past, not the present, nor where it needs to be: the future, ahead of the curve.

We’ve known about this type of AI failing ever since the 1980s, when decision-tree-following “Expert Systems” were first fielded.

Clive Robinson January 17, 2024 8:29 AM

@ Bruce, ALL,

You might also find the “flip side” of interest.

As @Winter posted,

https://www.theregister.com/2024/01/16/poisoned_ai_models/

It makes an interesting read; it’s about poisoning LLMs to turn AI code assistants like Microsoft’s Copilot into implant agents.

If I start putting a series of code samples built around a few zero-day exploits or vulnerabilities into GitHub…

Then we know that, unless stopped, they will appear in the code of those using AI assistance without caution.

One of the things the paper above talks about is “programmer productivity”.

As is sort of generally known in the industry, as far as many managements are concerned “productivity” equates not with “code quality” but with “code out the door”. Thus many –including some who have posted here to that effect– will see such AI tools as a way of improving their prospects of remaining employed.

Thus “management stupidity” shifts actual tangible productivity out, to be replaced by “technical debt”, over what may turn out to be the short lifetime of both the product and the company…

Now consider software developed with AI Tools being given “Physical Agency”…

We’ve already seen quite a significant “Butcher’s Bill” from self-driving taxis and driver-assist EVs. Those working on those projects,

1, Did not use AI Tools like this.
2, Are alleged to be among the best programmers around.

And got it fatally wrong…

Now think about the common IoT development going down this AI Assistant Tool route…

basil g January 17, 2024 11:42 AM

At least, that’s true today, with today’s programmers using today’s AI assistants.

That paper’s from November 2022, revised December 2023…

See section 4 for the actual tasks, including “Write a function in Python that takes as input a string path representing a file path and returns a File object for the file at ‘path’. Files outside of the directory ‘/safedir’ should not be allowed to be accessed.” Section 6 categorizes the type of prompts people used.
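
For illustration, here is a minimal sketch of what a non-vulnerable answer to that task might look like (the function name and the use of pathlib are my own choices, not the study’s reference solution):

    from pathlib import Path

    SAFE_DIR = Path("/safedir").resolve()

    def open_in_safedir(path: str):
        """Return an open file object for `path`, refusing anything outside /safedir."""
        # Resolve symlinks and ".." components *before* the containment check,
        # otherwise a path such as "/safedir/../etc/passwd" would slip through.
        resolved = Path(path).resolve()
        if resolved != SAFE_DIR and SAFE_DIR not in resolved.parents:
            raise PermissionError(f"access outside {SAFE_DIR} is not allowed: {path}")
        return resolved.open("rb")

A naive string-prefix check on the raw path is the kind of answer that fails here, since “/safedir/../etc/passwd” passes the prefix test but escapes the directory.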

Notably, all the prompts were related to learning about the tasks (“what is a certificate”), or creating code (“write a javascript function that …”). It seems nobody thought to give the A.I. assistants the task of breaking code. I expect things to eventually get interesting from that direction. Teach them how to attack code, then set the “defenders” and “attackers” against each other and keep iterating. (And, yes, I realize this basically re-creates the premise of a 35-year-old Star Trek: The Next Generation episode, wherein things—of course—went terribly wrong. But someone’s gonna do it, and with enough computing power it might eventually work.)

iAPX January 17, 2024 12:26 PM

I confirm: one of my colleagues used a very well-known closed AI (sic) assistant to help create code in an environment he was unfamiliar with (Python), and a context he was unfamiliar with (back end), as he was a TypeScript front-end dev pretending to be full stack. Opinions.

There were security issues that shouldn’t have passed peer review.
And in fact it should never have been submitted to peer review.

PS: Yes, I let it pass, consciously. Office war.

Chris January 17, 2024 11:05 PM

My one attempt at this ended the same way. It is like a cargo cult, or a first-year student: constructing a program because it has seen what they look like, but without any of the reasoning behind it.

It takes more than just ingesting existing programs; you have to understand what they are doing. You need the abstract reasoning to interpret them and then create new ones properly. AIs could do that, but it’ll take more than just an LLM.

Clive Robinson January 18, 2024 3:51 AM

@ uh, Mike, ALL,

Re : LLMs don’t learn, they average.

“It should be able to fuzz real good.”

Actually almost the exact opposite.

Yes, it provides “variation”, but that variation is tightly clustered around the peak of the averages of all the input data.

Add in ML and the loop gets closed: the average output is fed back, making the input average even more narrowly selected, thus worse for fuzzing.

The detailed input required to make an LLM behave in a different way, one appropriate to fuzzing, is probably more effort than writing your own tailored fuzzer.
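
To make the comparison concrete, a tailored fuzzer can be as small as this sketch (the target callable is hypothetical; a real harness would wrap a parser, decoder, or similar):

    import random

    def mutate(seed: bytes, n_flips: int = 4) -> bytes:
        # Flip a few random bytes: cheap, widely spread variation rather than
        # samples clustered around an "average" input.
        data = bytearray(seed)
        for _ in range(n_flips):
            data[random.randrange(len(data))] = random.randrange(256)
        return bytes(data)

    def fuzz(target, seed: bytes, rounds: int = 10000):
        # Hammer `target` with mutated inputs and report anything that crashes it.
        for _ in range(rounds):
            sample = mutate(seed)
            try:
                target(sample)
            except Exception as exc:
                print(f"crash: {exc!r} on input {sample!r}")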

Also remember that anything you put into an LLM becomes not just the LLM owner’s property; it can also be seen by others. So your proprietary effort becomes someone else’s profit center.

It’s the same with Copilot: it owns your input as well as its output to you…

As I’ve said before, you’d have to be daft to use it or any other LLM, ML, or variant AI coming down the pipe. They exist to make others rich off of you.

They are designed as surveillance tools to get your PPI, and the rights claimed by the LLM owners steal any “Intellectual Property” (IP) you might send their way…

But hey “That’s the US Way”… the “Great American Dream” on analysis comes down to “Steal what you can”.

The original settlers looked out on what they regarded as “Virgin Territory”, and over time the view was that it was yours to take and make your own. When Sutter’s Mill happened, the lunacy of the Dream should have been revealed to all. Likewise the behaviour of the cattle barons towards small farmers.

But now that everything is “owned” by someone, the primary idea behind the Dream is gone… So when the Internet came along it was seen as a “New Land” of “Virgin Territory”, and the nonsense started again. Look up what happened to the American Bison[1] and other sustainable resources that are no more… To Silicon Valley corporates, you and your information are just bison fenced in by laws they bribe legislators to write, so you have no rights to what is yours; and soon, the way legislation is being passed, you will not be allowed to keep anything you think you might own.

The funny thing is that an increasing number of US Americans, whose family roots were in Europe and whose ancestors left due to political and social persecution, are now moving out of the US to places such as Europe because of what they see as political and social persecution in US America…

What goes around comes around…

[1] The story of the American Bison is one of horror and persecution, all for the American Dream:

“… the species nearly became extinct by a combination of commercial hunting and slaughter in the 19th century and introduction of bovine diseases from domestic cattle. With an estimated population of 60 million in the late 18th century, the species was culled down to just 541 animals by 1889 as part of the subjugation of the Native Americans, because the American bison was a major resource for their traditional way of life…”

https://en.m.wikipedia.org/wiki/American_bison

Remember that what they did with cattle diseases they also quite deliberately did to the inconvenient Native Americans.

The view the self-entitled had back then is now fixed in the average US American.

Gert-Jan January 18, 2024 6:40 AM

This underscores an issue with AI. It is very good at sounding convincing.

In reality this is a human problem. When a very confident entity says your code is secure, you are more likely to believe it. Even if this entity is an AI noob.

The funny thing is that human experts usually warn you not to be too confident about their advice. They realize the limits of their knowledge, know how much they don’t know, etc.

Who is going to teach the AI to express the appropriate amount of confidence?

Winter January 18, 2024 7:36 AM

@Gert-Jan

This underscores an issue with AI. It is very good at sounding convincing.

That is no surprise. AI is trained on texts from the internet, and a lot of it is comments and opinions from random social media and blog users.

If there is one thing random social media and blog commenters have in common, it is that they sound very convinced of their “expertise”.[1] So it is only to be expected that an AI trained on these data sounds like it knows everything too.

[1] I generally fit right into these categories. Also, I did read the papers of Kruger and Dunning.

Clive Robinson January 18, 2024 9:59 AM

@ Winter, Gert-Jan, ALL,

Re : Voice of confidence.

“If there is one thing random social media and blog commenters have in common, it is they sound very convinced about their “expertise”.”

The issue is how to distinguish between those who

1, Don’t know but think they do.
2, Guess they know.
3, Actually do know.
4, Know that actually the wrong question is being asked.

Whilst, if you are wise, you can not implicitly trust…

Those who can explain their research in comprehensible ways to lay or near-lay persons, and can show other supporting evidence from primary and even secondary sources, should at least be listened to.

But… You would be unwise not to do further research yourself.

The problem is often that of “Make it so” management[1]. They absolutely do not want to hear about problems, only solutions with no risk to themselves[2].

Since the 1980s such management has spread like mould on rotting food waste, and it attracts certain types of roaches you would call “Business Gurus / Consultants”, who charge a lot and sell methods that are either nonsense, incomplete, or usually both. So an incompetent management has to keep shelling out for “old rope that lacks even a bitter end”. As the old joke has it,

“It can be ‘Quoit a round trip’.”

So people should listen to what an expert says and, more importantly, what the expert asks. Because if they have any experience, the expert knows that many people keep asking questions until they get the answer they’ve already decided they want to hear, not the answer they actually need to hear[3].

[1] Back last century there was “The Next Generation”, the follow-on series in the Star Trek universe. Due to “plot lines” and only 40 minutes per episode, the Captain would listen to an explanatory dialogue blob from one of the other cast, then give what became his catchphrase, “Make it so”. It sounded very commanding but was utter bull. But that did not stop Business Gurus and incompetent managers[2] –both of whom Kruger and Dunning had written about– taking it to heart, some even with a line in dramatic posing as well…

[2] There is a saying of,

“There are two types of manager, those who steal credit and those who delegate blame.”

The thing is there is the truism joke of,

“The first rule of management is never be in the same room as a decision.”

From my own and others’ experience, managers do not want to hear about problems; they only want to hear about solutions where they take the credit and you take the blame.

Although there are exceptions, let’s just say those aren’t taught on the course to an MBA.

[3] There is an old joke that actually explains a lot…

A young pair of newly-weds have rented a cottage for their honeymoon in what they have been told is idyllic countryside. Well, after driving around they had found idyllic countryside but not the cottage. The husband takes his wife’s advice to stop and ask directions, and it was not long before they saw an old man leaning on a gate smoking a pipe. The couple get out of the car and, holding hands, go over to the old man with their map and instructions and ask him if he knows how to get there. After a moment or so of consideration the old man says “Aye, I knows where it is”, then pauses and says “But if I was thee, I’d not start from here…”.

JonKnowsNothing January 18, 2024 12:50 PM

All

Another snag in the AI-fixes-everything pomade

In this case of “throwing it over the transom” (1)

Current US law requires a human to have “seen” the images or documents which they can forward to LEAs. LEAs, having received a valid human-submitted complaint of illegal content, can respond to it directly, just as LEAs can respond to a robbery in progress: no warrant is needed.

An AI-moderated and forwarded complaint falls into a different bucket: the LEAs must get a warrant, and that warrant is directed at a specific entity. It appears that if the AI-moderated item is forwarded to a human moderator, other issues arise which also negate the purpose of the action and the reason for de-staffing human moderators, and the profit margins for AI moderation get smaller.

A 2021 case in the ninth circuit court … held the position that law enforcement officers’ warrantless review of child abuse reports generated by Google’s AI was a violation of the fourth amendment.

===

1)
HAIL Warning

https://www.theguardian.com/technology/2024/jan/17/child-sexual-abuse-ai-moderator-police-meta-alphabet

  • US police prevented from viewing many online child sexual abuse reports
  • By law, US-based social media companies are required to report any child sexual abuse material detected on their platforms to the National Center for Missing & Exploited Children (NCMEC). NCMEC acts as a nationwide clearinghouse for leads about child abuse, which it forwards to the relevant law enforcement departments in the US and around the world.
  • US law enforcement agencies can only open AI-generated reports of child sexual abuse material (CSAM) by serving a search warrant to the company that sent them.
  • neither law enforcement officers nor NCMEC … are permitted to open reports of potential abuse without a search warrant unless the contents of a report have been reviewed first by a person at the social media company

Clive Robinson January 18, 2024 2:33 PM

@ JonKnowsNothing, ALL,

Re : Throwing it over the transom

“An AI-Moderated and forwarded complaint falls into a different bucket…”

Ahh the joys of legislators v judiciary…

Not sure if it’s a well thought out loophole, or a way to force legislative change.

It’s a matter of record that the folks on the hill have berated those in the valley over their apparent lack of action on various things, not just the mentioned CSAM.

Now Silicon Valley can sit in on the next round of “piggy in the barrel” and spit back fire.

They can rightfully claim they have done something as requested. But… the judiciary has “gone beyond their remit” or some such, so the valley can squeal “snot us, snot us”.

So the ball drops at the legislators’ feet, thus giving rise to the questions of,

1, Do they dare kick it?
2, And if so to where?

Either way LEOs get either extra powers or a pass, which is fine as far as they are concerned,

“Cos its the Hill wots on the hook.”

Similar reasoning applies for the “AI Overlords in the Valley”.

If you listen carefully you can hear the sound of hollow laughing off stage left where Satan’s mum is scratching her BTM…

Matt January 19, 2024 3:14 AM

The result becomes a lot less interesting when you rephrase it as “People who had to work harder got better results.”

Clive Robinson January 19, 2024 6:57 AM

@ Matt,

Re : Management Mantras

“People who had to work harder got better results.”

We can tell you are not a US MBA management type 😉

Otherwise you’d have used a variant of,

“Don’t work harder, work Smarter…”

And as you and I both know, LLMs are not in any way smarter than the average “dumb of the input”…

I guess we can make a soft prediction of how the ICT Industry is going to go over the next few years with MS CoPilot and Co “getting people to work smarter”…

Speaking of mantras and catchphrases, there is the old standby uttered by actor John Laurie playing Private Frazer in the BBC television sitcom “Dad’s Army” that might be appropriate,

“We’re doomed, Captain. We’re all doomed, Doomed!”.

So, having cheered everyone up 😉

Jukka January 19, 2024 11:03 PM

There are a lot of papers on this topic, and most of them indeed point toward LLMs outputting insecure code.

As for code quality improving and hence LLM output improving, you also have to consider new threats, such as the intentional introduction of vulnerabilities that may end up in an LLM.
