On Generative AI Security

Microsoft’s AI Red Team just published “Lessons from Red Teaming 100 Generative AI Products.” Their blog post lists “three takeaways,” but the eight lessons in the report itself are more useful:

  1. Understand what the system can do and where it is applied.
  2. You don’t have to compute gradients to break an AI system.
  3. AI red teaming is not safety benchmarking.
  4. Automation can help cover more of the risk landscape.
  5. The human element of AI red teaming is crucial.
  6. Responsible AI harms are pervasive but difficult to measure.
  7. LLMs amplify existing security risks and introduce new ones.
  8. The work of securing AI systems will never be complete.

Posted on February 5, 2025 at 7:03 AM • 9 Comments

Comments

Clive Robinson February 5, 2025 8:18 AM

As worded, it’s not a list I would much agree with.

Take

“4. Automation can help cover more of the risk landscape.”

Really? In what way?

All automation can really do is make the search for “Known Knowns” in effect faster.

It does not make the hunting for “Unknown Unknowns” or “Unknown Knowns” any more effective.

In theory AI systems can “look in the gaps” between “known knowns” and find ways they morph from one into another, showing up any new “unknown knowns” they find along the way.

The point is it’s a very, very rich target environment; any new “unknown knowns” found that way have a very low probability of coming into active use.

The thing is, humans are “quirky by nature” and rarely step-by-step methodical. Thus most new “unknown knowns” will be of little or no interest, as they lack the “fun factor” of breaking new ground by finding “unknown unknowns”.

Thus the use of future AI systems to find “unknown knowns” is most likely to be by those who want to,

“Industrialise vulnerability usage”

Which by and large means not researchers or for-profit criminals, but those looking for endless supplies of exploits to use against ordinary individuals, such as journalists and political opponents.

I could go through the list item by item, but by now I hope most people realise the list says more about those who drew it up than it does about the reality of what is going on.

Bob February 5, 2025 11:42 AM

@Clive

All automation can really do is make the search for “Known Knowns” in effect faster.

And more complete. Faster and more complete sound like wins to me.

It does not make the hunting for “Unknown Unknowns” or “Unknown Knowns” any more effective.

The human element of AI red teaming is crucial.

To some degree or another, this stuff is here to stay. Forward-thinking has to accept that. To me, a lot of the criticism around AI is echoing what I heard about cloud computing and cryptocurrency back in the day.

If you refuse to consider cloud solutions, you’re a fool. If you don’t have any crypto in your portfolio, you’re a fool.

You’re taking something that changes by the second, pointing to its limitations from yesterday, and making reactionary pronouncements regarding its future. I’m not retired. I don’t have the luxury of pretending the future isn’t coming.

I'm sorry, Dave. I'm afraid I can't do that. February 5, 2025 2:16 PM

The case studies in the paper illustrate Clive Robinson’s point.

Being intelligently cautious is good for one’s career. The reason that you often see semi-retired or retired professionals behaving cautiously is that cautiousness is one of the traits that kept them alive in their profession.

Anonymous February 5, 2025 10:01 PM

Wow, I don’t know which adjective to apply there. Insightful or introspective? Nobody else could have come up with that level of banalities.

Winter February 6, 2025 1:38 AM

@Bob

To me, a lot of the criticism around AI is echoing what I heard about cloud computing and cryptocurrency back in the day.

I might be older than you and want to add

Clive Robinson February 6, 2025 5:15 AM

@ Bob,

With regards,

“Faster and more complete sound like wins to me.”

Only for a very brief time at best for the attacker.

You need to think a little further about why I said “known knowns” finding.

As I’ve done in the past, I will note a well-known equivalent for humans in workplaces and similar settings: “fire drills”.

There are lots of issues, both natural and man-made, that require people to be practiced in evacuating a place or area quickly and safely. Not just for their own safety but for the safety of others (think rescuers etc).

Whilst originally designed for “fire in buildings that burned easily”, we know from 9/11 they work for terrorist attacks as well. Japan and other places have similar drills for earthquakes, volcanoes, mud slides, flooding, tsunamis, etc.

The point is that the “drills” are not for one instance of one class of issue; they are designed to cover as many instances of as many classes as possible in just the one drill. To accomplish this, certain trade-offs have had to be made in “building design”, which is one of the reasons we have building codes.

The software industry has in the main not yet reached that level of maturity, and for some reason does not appear to want to (time to market, excess features, poor resource usage, etc).

The use of AI will, as I said, turn out lots of variations on “Known Knowns”, and the response will fairly quickly be to stop “fixing the instance”, as mostly happens currently, and to start “fixing the class” of attack. Thus the “bang for the buck” goes up a lot even though the initial resource cost has risen.

Fixing classes of attack rather than variations on known instances in known classes will make the effectiveness of current LLM and ML AI systems fairly moot fairly quickly and attack windows much shorter in duration.
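The instance-versus-class distinction above can be made concrete with a small sketch. SQL injection is used here as a stand-in example (it is not from the original post): the instance fix blacklists one known payload, while the class fix (a parameterized query) removes the entire attack class at once.

```python
import sqlite3

def lookup_instance_fix(conn, name: str):
    # Instance fix: blacklist one known payload.
    # A slight variation ("wobble") on the payload slips straight past.
    if name == "' OR 1=1 --":
        raise ValueError("blocked known payload")
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()

def lookup_class_fix(conn, name: str):
    # Class fix: parameterized query. The engine never parses data as SQL,
    # so every injection variant in the class is neutralized at once.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

variant = "'OR 1=1--"  # slight variation on the blacklisted payload
print(lookup_instance_fix(conn, variant))  # leaks the whole table: [(1,)]
print(lookup_class_fix(conn, variant))     # returns []: treated as plain data
```

Churning out more payload variants (the “faster and more complete” automation above) does nothing against the class fix, which is the point: the defender’s per-class cost is paid once.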

The thing about current LLM and ML systems is they neither think nor reason, they pattern match and wobble randomly in their output (hence “stochastic parrot”, “hallucination”, and “Hard Bullshit” terms).

They can only work with “known knowns” as input, and their output is those inputs with slight wobbles that remain within “known sets”.

Think of AI in this respect as a form of “guided fuzzy testing”.

Yes, fuzzing increases “testing” speed by going through variations on known patterns automatically. But it does not really add “originality”.
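The “guided fuzzy testing” analogy can be sketched in a few lines. This is a hypothetical toy, not any real tool: it only recombines and perturbs seed payloads (“known knowns”), which illustrates why such automation widens coverage of known pattern classes without producing genuinely novel “unknown unknown” inputs.

```python
import random

# Seed payloads: the "known knowns" the fuzzer mutates.
KNOWN_PATTERNS = ["' OR 1=1 --", "../../etc/passwd", "<script>alert(1)</script>"]

def mutate(seed: str, rng: random.Random) -> str:
    """Apply one small random perturbation (a 'wobble') to a known seed."""
    ops = [
        lambda s: s.upper(),                           # case wobble
        lambda s: s.replace(" ", "/**/"),              # whitespace substitution
        lambda s: s + rng.choice(["%00", "%0a", ""]),  # trailing encoding noise
        lambda s: s * 2,                               # duplication
    ]
    return rng.choice(ops)(seed)

def fuzz(target, trials: int = 100, seed: int = 0) -> list:
    """Collect mutated inputs the target fails to recognize/block."""
    rng = random.Random(seed)  # seeded for reproducibility
    failures = []
    for _ in range(trials):
        candidate = mutate(rng.choice(KNOWN_PATTERNS), rng)
        if not target(candidate):
            failures.append(candidate)
    return failures

# Hypothetical target: a naive filter that only blocks exact known strings
# (an "instance fix"). Nearly every mutant slips past it.
naive_filter = lambda s: s in KNOWN_PATTERNS
print(len(fuzz(naive_filter)))
```

Every “failure” it finds is a trivial variant of a seed, which is the argument in a nutshell: useful for exhausting a known class quickly, useless for originality.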

The inevitable solution to fuzzing is to make “fixes” less specific (ie instances) and more general (ie classes). Part of this will inevitably be a change in the way software is written, with more engineering and less art, as happened with architecture and building codes and all forms of physical product engineering.

That is we already know how to fix the use of current AI LLM and ML systems for hostile vulnerability finding and exploitation, we just need the incentive to go down that route. That is more engineering less art, which means expending more resources “up front” but getting greater gains in the long term.

As the very old truism from anti-aircraft missile system development had it,

“Rocket science this ain’t, safe design it is.”[1]

(Which I always thought was unfair to Rocket Scientists).

[1] Something it would appear the Russians have not learnt, judging by the number of “unexplained” launch and flight “high order” test failures they’ve had in recent times. Failures that have been “witnessed” even by commercial satellite reconnaissance and other monitoring systems for, amongst other things, radiological and geological safety.

lurker February 7, 2025 10:05 PM

Why is Nr. 8, “The work of securing AI systems will never be complete”,
like a throwaway line at the bottom? If it were Nr. 1, it would define the problem, or as Sun Tzu said, “Know your enemy”.

Aah but this is from MS. They always had difficulty getting things the right way round…

lurker February 7, 2025 10:32 PM

@Peter

I’m unconvinced by his so-called “test” of what packets went where. I would expect at least some metering packets to be sent to the parent company: who is using the app, when, and where. But when he asked the bot a question, he didn’t show us whether the question went to, and the answer came from, Singapore, which is the nominal domicile of the engine.
