Hacking ChatGPT by Planting False Memories into Its Data
This vulnerability exploits a feature that gives ChatGPT long-term memory, where it uses information from past conversations to inform future conversations with that same user. A researcher found that he could use that feature to plant “false memories” into that context window that could subvert the model.
A month later, the researcher submitted a new disclosure statement. This time, he included a PoC that caused the ChatGPT app for macOS to send a verbatim copy of all user input and ChatGPT output to a server of his choice. All a target needed to do was instruct the LLM to view a web link that hosted a malicious image. From then on, all input and output to and from ChatGPT was sent to the attacker’s website.
Clive Robinson • October 1, 2024 9:37 AM
Well not quite yet the season to be jolly but,
Does give me a fun idea 😉
If it can “go to any web site” then it should be possible to make it do like “Alexa once did” and get it to buy you an Xmas Present or three…
Just saying 😉
More seriously we’ve not given these current AI ML LLM systems “physical agency” but they do have “informational agency” to a certain extent.
Getting an AI to “time delay a purchase” is a POC of other “time delay payloads” that in turn have “physical implications” in the “real world”.
I doubt anyone at these AI Vendors, in their rush to get their next big thing out to market, has actually thought about their systems being used to gain “physical agency” in the “real world” via having the ability to act as an “information agent”, using non obvious “embedded” commands.
Funny thing is Isaac Asimov got there with a story back in the 1950s. But more famously, in a more refined form, is the story arc in Stanley Kubrick’s “2001: A Space Odyssey”, which was developed at the same time as Arthur C. Clarke’s book back in the 1960s, before many of those AI Wonks were born…
For those that don’t know, the AI in the film, called HAL, had secret information put in its memory about an alien artifact that the astronauts onboard were unaware of. In trying to deal with two different realities HAL developed symptoms like hallucinations that enabled it to kill the astronauts in “accidents”.