Comments

Tom January 22, 2024 8:12 AM

Something I’ve been expecting to see for a while now is LLM chatbots asking questions on sites like Reddit and Stack Overflow so that the answers can be used to further train the model. But how would we know it was happening?

Clive Robinson January 22, 2024 9:58 AM

@ Tom, ALL,

Re : LLM interrogators.

“But how would we know it was happening?”

Because they currently,

“Speak lightly randomized average.”

Every example I’ve seen so far has that vague feeling of “Marketing Speak” or “Ad copy”, and a “lack of Empathy”.

Almost like one of those dread, psychopathic, carefully prepared “PR spokesperson” statements, given by someone who stands in front of the 24-hour news cameras and spouts the faux “We understand your feelings of loss at this time and our organisation offers its sympathies…” when the organisation has figuratively “crashed and burned” the bus by “driving it off the cliff” as an almost direct result of the policies laid down by senior management.

Whilst we can spot this now, things will change: faking human warmth, empathy, and the like is surely high on the “to-do list”.

However, if you are able to ask questions, LLMs will just give average replies from a limited stock, or “go way off the reservation” (so-called AI hallucination).

Currently the only way for operators of LLMs to stop this is “Real Life”(RL) “Human Intervention”, which cannot currently be done in “Real Time”(RT), even with ML giving corpus feedback.

These are “obvious tells” against LLMs even with ML adjustment.

However, these “obvious tells” will in time be either faked out or mitigated away. One such way is to stop “unexpected questions during human interlocution”.

One way is to ensure that only questions which the LLM has already answered safely get through.

It’s a game, and the amount of money to be made from even LLM AI without human or AI feedback is immense for a chosen “self-selecting” few. Therefore the likes of Micro$haft and Giigle, who have both the most to gain and the most to lose from interrogative AI, are going to invest a large chunk of change ironing the more obvious tells out.

The ability to remove overt tells is something you realise has been going on for some time now when hearing just a few “prepared” PR disaster statements from various organisations. But there are still a whole slew of less obvious tells. Working out how to “covertly question” to make these less obvious tells more visible is a skill that some have already developed…

So expect this to develop into an “arms race”.

Clive Robinson January 22, 2024 10:30 AM

@ Bruce, ALL,

“I hadn’t thought about this before: identifying bots by searching for distinctive bot phrases.”

That is a very “obvious tell” almost “first order”.

Then there are less obvious tells based around behaviours such as response times.

Then you get into comprehension tells.

Then style tells.

And so on.
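
To give a feel for just how crude the “first order” check can be, here is a minimal sketch (the phrase list, the timing threshold, and the function names are all illustrative assumptions, not a tested detector):

    # Minimal sketch of "first order" bot-tell spotting: scan text for
    # stock LLM disclaimer phrases, and flag suspiciously uniform reply
    # timing. Phrases and thresholds are illustrative only.
    import re
    import statistics

    STOCK_PHRASES = [
        r"as an ai language model",
        r"i cannot fulfill this request",
        r"against openai(?:'s)? use policy",
        r"i'm sorry,? but i cannot",
    ]

    def phrase_tells(text):
        """Return the stock phrases found in the text."""
        low = text.lower()
        return [p for p in STOCK_PHRASES if re.search(p, low)]

    def timing_tell(reply_delays_seconds):
        """Humans are bursty; near-constant reply delays are a weak tell."""
        if len(reply_delays_seconds) < 5:
            return False
        return statistics.stdev(reply_delays_seconds) < 0.5  # illustrative

    print(phrase_tells("I'm sorry, but I cannot fulfill this request."))

Of course, anything that simple is exactly what the operators will iron out first; it is the behavioural and comprehension tells that cost them real compute to fake.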

The problem for the AI owners trying to fake human responses is the massive level of load it puts on the bots.

One way they will try to get around this is by “sniping tactics” or “drive by comments”.

The thing is, not only does the load increase for the AI operators, but trying to close down one type of tell just opens up a different type of tell.

Consider it like Gus Simmons’s “Prisoners’ Problem”. As each tell gets in effect randomised, the level of redundancy must go up. As the redundancy goes up, the information bandwidth also has to go up. Thus any side-channel bandwidth also goes up…

Just as Traffic Analysis reveals things through meta-data analysis, the same applies to finding bots.

Importantly though, it’s not just the presence of meta-data, it’s also its absence, especially under meta-meta-data analysis.

The thing is there are some real experts out there looking at meta-meta-data and the statistics of the holes it creates. It’s been a less than obvious but growing part of data forensics that first came to light with what some called “The archaeology of hard drives”, where tampering could be spotted because those tampering did not fully understand the OS algorithms for using hard drive sectors, the likes of free lists, and how HD performance was enhanced by making sure “minimum head travel” techniques were employed.

I’ve occasionally talked about it when also mentioning “Paper, Paper, NEVER data”: the process of “printing out”, or putting files sequentially through a converter onto a “virgin drive” that is “non-journaled”. Such processes remove the basic stratification many forensic archaeology techniques rely on… Further, it puts things in a known time order and mostly –but not always– removes tail-end cluster issues, where data in buffers has not been overwritten before the buffer gets written to the drive.
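
A hedged sketch of that sequential “converter to a virgin drive” process (all paths here are hypothetical, and a real pass would want the target volume freshly formatted and non-journaled):

    # Copy files one at a time, in a chosen order, onto a hypothetical
    # freshly formatted non-journaled volume, so allocation order and
    # timestamps reflect the copy, not the original drive's history.
    import shutil
    from pathlib import Path

    SRC = Path("files_to_keep")      # hypothetical source tree
    DST = Path("/mnt/virgin_fat32")  # hypothetical virgin, non-journaled volume

    for i, f in enumerate(sorted(p for p in SRC.rglob("*") if p.is_file())):
        shutil.copyfile(f, DST / f"{i:06d}_{f.name}")  # known sequential order

The point being that the copy imposes its own, known, stratification.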

bisento January 22, 2024 11:00 AM

One could spot AI-generated content even on Amazon by looking for error messages:
https://futurism.com/amazon-products-ai-generated

A sideboard description reads “I’m sorry but I cannot fulfill this request it goes against OpenAI use policy,”

These are the early stages. It’s all downhill for the web from here (some say it has been since the Eternal September).

Clive Robinson January 22, 2024 11:33 AM

@ bisento, ALL,

From the “futurism.com” article you link to,

“lists a variety of goods ranging from dashboard-mounted compasses for boats to corn cob strippers and pelvic floor strengtheners.”

It does not say if it was an “all in one” list. Such a list of product features in one could potentially bring tears to your eyes 😉

@ ALL,

But on the more serious side, unless it’s a large “General Store” type site, that eclectic range is a clear indicator of,

“Avoid at all costs as you will be ripped off.”

For those that have watched “Futurama”, from the same people who did “The Simpsons”: they regularly did spoof adverts that would bring a mixture of laughs, winces, and tears to your eyes just thinking about them.

They say that parody holds a mirror to life; well, sometimes life just goes the extra light year.

Clive Robinson January 22, 2024 12:27 PM

@ Bruce, ALL,

This from Cory Doctorow is probably relevant,

https://locusmag.com/2023/12/commentary-cory-doctorow-what-kind-of-bubble-is-ai/

It mostly agrees with things I’ve been saying, but before anyone asks,

“No I’m not Cory in a woolly over-the-head one-piece, nor vice versa.”

What Cory briefly notes but does not amplify on is that the jobs that get hit by AI will be those of the well-educated upper middle class. Something you don’t hear a lot about. It’s going to happen this way because AI is very expensive, so it can only compete with those on high wages. And that is not something that is going to change unless there is a very real breakthrough in the way we design micro-electronics / nano-machines. The human brain runs on less power than a netbook, and not much more than some pocket smart devices. In almost all more-than-basic tasks, humans get to the solution faster. Richard Feynman, the noted Nobel physicist, told a story about a man using an abacus: he was quick and accurate on the basic stuff, but complexity left him in the dust.

The same is actually true for all AI systems we currently have, and it’s unlikely to change any time soon.

The other thing Cory did not mention, that I almost always do, is that LLM AI is the next generation of “surveillance technology” for the likes of Giigle and M$. And I’ll be honest and say that, as it’s predicated on the supposed value of “Personal and Private Information”(PPI), they are likely to get a very cold bath.

As can be seen with the continued downward slope of advertising income for X/Twitter, and similar issues for Meta, the PPI market is a bubble that is deflating, and in all honesty the AI Bubble is to the PPI Bubble what a boil is to a pig’s backside.

My view is LLM AI has missed its spot in the spotlight, thus the real question is,

“Are there other types of AI coming down the pipe that will keep the AI bubble inflated?”

So far the signs do not look good on this. Thus I suspect LLMs will go the same way as Crypto-Coins, and ML much the same way as the NFT market that followed them.

Where Cory is spot on is on “Cost”: it really does kill. And so far humans are way less expensive, thus more productive, on any metric where cost is included.

And it’s why I can see only very high-waged niche jobs on the edges of academia, with respect to teaching rather than research, being affected. Though those knowledge workers, such as traditional librarians who support researchers, will get hit fairly hard unless they broaden their services beyond those where AI can compete.

lurker January 22, 2024 12:45 PM

@bisento

The scary thing about those Amazon blurbs is that they were bots using AI for the product placements. When those bots get tuned to filter out the AI-GPT disclaimers, it’s time to go back to the village market, run by people, for people.

@Tom

Right now the chatbot GPTs have no curiosity; they are unable to ask questions, both by their construction and, like witnesses under (cross-)examination in a law court, by the rules of the game. When it happens, I expect, like @Clive, that we will detect it by the language they use, and by the disclaimers: “I’m sorry, OpenAI policy prevents me asking this,…”

lurker January 22, 2024 7:17 PM

@David in Toronto

The internet died seven years ago, or earlier …

‘https://en.wikipedia.org/wiki/Dead_Internet_theory

Snarki, child of Loki January 22, 2024 7:33 PM

“parody holds a mirror to life”?

Absolutely. But all too much of modern culture is parody-invariant.

“if x is some statement, Px is the parody of x (i.e., parody transform of x). Normal statements have Px=-x: the statement gets ‘inverted’ when subjected to a parody transform.

But when Px=x, it’s parody-invariant and has the same meaning even when parodied”

(Some will note the similarity to “parity”. And so it is)
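
(One hedged way to make the parity analogy exact: assume that parodying twice recovers the original statement, so P is an involution. Its eigenvalues are then ±1, and the parody-invariant statements are exactly the +1 eigenvectors:)

    P^2 = I \;\Rightarrow\; \lambda \in \{+1, -1\}, \qquad
    Px = -x \;\text{(normal)}, \qquad Px = +x \;\text{(parody-invariant)}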

Clive Robinson January 23, 2024 1:28 AM

@ David in Toronto, lurker, ALL,

Re : It’s not just the baby with the bath water…

“In relate[d] news, the Internet is proving to be a race to the bottom”

And down the plug hole it went…

As I’ve noted before, LLM AI has no intelligence; it’s a “matched filter” that is tuned to match the “average” plus a little random noise.

Think of it, if you like, as standing above a crowd at an outdoor event. Until something causes the crowd to get not just “on message” but also saying it “in sync”, what you hear is a jumbled mess of random noise.

You can tune your system for, say, “Male Germanic” frequency ranges and phonemes (as the mobile phone CELP codec does), but in the process you lose other voices.

Keep tuning to what is average and all you get is more of the same average plus noise…

One of the reasons “fan chants” are simple is to increase the “on message” and “in sync” effect so,

“The chant rises above the rest”

And in the case of sports fans, hopefully “drowns out the rest” or at least “the other team’s message”.

So the LLM AI does the same…

1, “tunes in”
2, gets “in sync”
3, thus “On message”
4, and “shouts it out”
5, to “drown out the rest”

I’d say it’s “working as designed”.

The only problem is “tunes in” is what the user asks for… Consider: the average of “garbage” is still garbage, but with less useful content (i.e. low-pass integration). To “fake content” the LLM adds “shaped noise”, but the average of “shaped noise” is the integration of the shaping curve… So the result is that the “shaping curve” gets reinforced, and if not correctly restrained it “howls around”, just as an audio system does when the microphone picks up the speaker output.
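
As a toy illustration of that feedback reinforcement (a sketch under stated assumptions, not a model of any real training pipeline): fit a simple distribution to some data, resample from it with a small constant “shaping” bias and a bit of averaging, refit, and repeat. The bias compounds and the spread collapses, the statistical analogue of the howl-around:

    # Toy "feedback GIGO" loop: repeatedly refit a Gaussian to its own
    # samples. The constant "shaping" bias (+0.1) compounds each
    # generation, while the averaging factor (0.9) shrinks the spread.
    # All the numbers are illustrative assumptions.
    import random
    import statistics

    data = [random.gauss(0.0, 1.0) for _ in range(1000)]  # the "human" corpus

    for generation in range(1, 6):
        mu = statistics.fmean(data)
        sigma = statistics.stdev(data)
        print(f"gen {generation}: mean={mu:+.3f} stdev={sigma:.3f}")
        data = [random.gauss(mu + 0.1, sigma * 0.9) for _ in range(1000)]

Without fresh human data in the mix, the output converges on the shaping curve and little else.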

So in essence it’s a “feedback GIGO” system at work… It even howls, but they give it fancy names, “AI Hallucination” being just one such “nonsense phrase” designed to cover up that there is no “intelligence” whatsoever in the system.

The fact that Mr AI Sam Altman is easily bamboozled by a journalist’s simple question and flaps around like “a fish out of water” should be telling people something important.

“LLM AI is a con game designed to seperate idiots from their money.”

It’s not even a new idea, in fact it predates computers by a century or two[1].

See Mozart’s “Musikalisches Würfelspiel”; though he was not the first, he’s the most remembered.

He was also a bit of a practical joker. He wrote a piece of music to be played on the piano and produced it at a party as a new composition. Everyone gathered around to hear the party’s host play it… But he could not, as the music had the player with his hands far apart on the keyboard and a note from the middle was needed… The host reasonably said, after trying, that it could not be played. Mozart said it could, and after a little banter sat down and proved his point by hitting the note with his nose, much to everyone’s laughter (yup, entertainment was a little limited back then 😉

Broadly Mozart’s musical dice game system was,

You throw two dice and add them together to get a number. Repeat this sixteen times. Use the numbers as indexes into a table of musical phrases. Play the phrases in the order you wrote down the numbers to get your very own minuet.

There are two basic things to remember,

Firstly, all the musical phrases must start and end at an average point, otherwise the resulting minuet will sound “discordant”.

Secondly, adding two dice together does not give a flat distribution, but a crude “first order approximation” to the “Normal Distribution Curve”[2]. So you need to put your musical phrases in an appropriate order in the table.

In more recent times people have even “computerized” this…

http://www.lottemeijer.com/create/?p=286
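
A toy version in code might look like the following (the phrase names are placeholders; real versions of the game used a separate phrase table per bar position):

    # Toy "Musikalisches Würfelspiel": sixteen throws of two dice, each
    # sum (2..12) indexing a row of a phrase table.
    import random

    # Eleven placeholder phrases, ordered so the common sums (6 to 8)
    # land on the "average" phrases, per the triangular 2d6 distribution.
    PHRASES = [f"bar_{total}" for total in range(2, 13)]

    def throw_two_dice():
        return random.randint(1, 6) + random.randint(1, 6)

    minuet = [PHRASES[throw_two_dice() - 2] for _ in range(16)]
    print(" | ".join(minuet))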

The important point is that LLMs are the language version of a “Musical Dice Game”; the difference is that the user’s query produces a much more complex distribution to modulate the table by (the table being in effect encoded in the weights of the “neural network”).

So looking behind the Wizard’s curtain reveals there is no magic, which is why Sam Altman stumbled over the journalist’s question.

There is a song from the mid-sixties by the “Mamas and the Papas” that has the words,

“You’re going to trip, stumble and fall”

https://m.youtube.com/watch?v=t6EgQFXYxbg

Sam Altman has done the first two steps, thus the question is “how long before the fall?”

In fact the whole song’s lyrics could be sage advice for Sam and friends at OpenAI and other “bandwagon establishments”,

If you turn off JavaScript you can read the lyrics,

https://songmeanings.com/songs/view/3530822107858646547/

[1] “Musikalisches Würfelspiel” basically translates to “musical dice game”. They were popular in the latter half of the 1700s, when “Home Entertainment” even for the very rich was at best limited… (think of it like those Commercial Radio Stations at the end of the 1900s that had a “half the tunes every hour must be Top Ten” policy). So variety, any variety, was very definitely “the spice of life”.

https://en.m.wikipedia.org/wiki/Musikalisches_Würfelspiel

[2] As Donald Knuth explains in his discussion of generating random numbers with a non-flat distribution, the more dice values you add up, the closer the approximation gets.
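
A quick way to see the shape (a throwaway sketch, not Knuth’s code):

    # Two dice already give a triangular, crudely normal-ish histogram;
    # summing more dice smooths it further (the central limit theorem).
    import random
    from collections import Counter

    rolls = Counter(random.randint(1, 6) + random.randint(1, 6)
                    for _ in range(10_000))
    for total in range(2, 13):
        print(f"{total:2d} {'#' * (rolls[total] // 100)}")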

Jos January 23, 2024 2:15 AM

@Tom

On LLM questions:

That’s how Quora.com operates nowadays, although there is an opt-out for answer writers.
For a while they ran a “partner program” which rewarded users for asking questions, before terminating it (likely they had trained the automated process sufficiently, and a fair number of humans were just spamming the site with garbage questions).

Based on what happened/happens on Quora, I expect that automated question generation would show a fair increase in questions that make you wonder what happened, such as: How can we prevent more people from dying and being exploited as they migrate? (https://www.quora.com/unanswered/How-can-we-prevent-more-people-from-dying-and-being-exploited-as-they-migrate)
It does not necessarily mean that all questions are this bad, and if they are not bad we might argue that answering them does add value.
It would be more interesting if one LLM started to generate the questions, and another the answers. Quora is developing Poe (.com), in which they aim to aggregate other automated creative content creators as well as their own tools.

As LLMs become more sophisticated I expect that the quality of questions will increase, which might even lead to questions we didn’t think of asking yet.

Since we are still far from deductive and abductive reasoning in AI – as far as I understand – I do not expect that any recent LLM question generator can generate a fair number of questions to feed back to the LLM without a lot of low-quality questions, like the one above, amongst them. On Quora that’s the reason many experienced users muted the “Quora prompt generator”, the bot generating the questions.

I predicted this would happen years ago, prior to the LLM rise, when Quora started their partner program. I told people the input would be used to automate the “question” process. They just needed humans to train the bots.

echo January 23, 2024 6:56 PM

There’s nothing new in this topic or whole discussion. People exposed to far right activity and dark money funded activity have been detecting and filtering this stuff for years. I even use a community developed tool which flags verified and unsafe sources. It doesn’t cope with the deluge but it makes life a little easier even if it’s just a visual confirmation that the interaction is with a bad actor. There are other tools to manage the flow but they’re not currently integrated. At some point you acquire enough formal and tacit knowledge as well as domain expertise to filter this kind of stuff automatically even if it might appear to be innocuous at first glance to most people. I don’t see anyone on this blog who has half a clue even when they’re beaten around the head with a stick. It’s disappointing but not a surprise. Habit and biases can become somewhat ingrained. It’s tiring but the shift in demographic curve will deal with that over time in spite of the efforts of well funded and well organised people to wind the clock back.

Grima Squeakersen January 24, 2024 3:05 PM

@echo: “I even use a community developed tool which flags verified and unsafe sources”
Those would be sources that regularly espouse viewpoints that disagree with your subjective personal biases, no doubt. You appear to be strong evidence that some examples of homo sapiens are capable of exhibiting the deficiencies here attributed to LLM/AI engines.

echo January 24, 2024 4:45 PM

No. It’s easy to separate the subject matter from politics and office politics. The subject matter and community I’m involved with eats bad-faith PhDs and wriggly bots for lunch. The last time I checked, AI tools showed how dumb they were. Multiple data pools they work with are so tainted with low-quality or bad data it’s not funny. You can go “off book” easily with an AI without trying.

Accusing someone of being a bot without just cause sticks out very badly as a political comment or office politics. It’s not going to fly especially from a random no-name drive-by.

C U Anon January 24, 2024 8:04 PM

Grima Squeakersen :

This really is quite funny,

echo : It’s not going to fly especially from a random no-name

You’ve been around for a lot longer than the actual random no-name crash and burn.

Coney familiae January 25, 2024 7:02 AM

My mest re-sent posts haff bin ^avto^-midurated in2 slish deff nul. Luuks leik Aim knit uelkum,

Fere uell.

pup vas January 25, 2024 6:12 PM

Who Controls Your Thoughts?
https://nautil.us/who-controls-your-thoughts-498055/

=We caught up with McCarthy-Jones, who walked us through the history of criminalizing “thought crimes,” the physical boundaries of thinking, and how architecture and urban planning are essential for truly free thought.

I think there are four fronts in the battle: There are threats from states, threats from corporations, threats from individuals, and then there’s the threat from the law—in the sense that if the right to freedom of thought is defined very narrowly, we can leave a lot of thought unprotected from the threats from a new technology.

This last one is a hard one. We talk a lot about brain-reading devices being on the horizon. Elon Musk’s Neuralink keeps popping into the headlines with the idea that he’s going to create some kind of brain-computer interface that will allow us to have our thoughts directly translated to a computer. But the question is: How realistic is this type of technology—and is it a threat we need to think about now? My concern is that this has the potential to slip into a bit of a moral panic.

I think what’s maybe a more immediate threat from new technologies is not brain-reading but more what is called behavior-reading. That is, the idea of measuring our observable behavior—what we like on Facebook, what websites we visit, what music we like, etcetera—that from knowing those facts about us, people could impute our mental states and can have a good idea of what it is we’re thinking—and knowing what kinds of buttons they should press to get us to act in a certain way. The combination of that knowledge with AI technologies could be a really huge threat to our autonomy.

!!!In ancient Greece, there were concerns about sophists—people who would use argumentation not to reach the truth but to support certain politicians or political ideas.

now we’re looking at AI as a digital sophist, with a huge power imbalance between what it knows about us and how our minds work—and us as mere mortals.

I think we have to recognize that in front of a persuasive AI, we are in deep trouble.

“To let others think for us is to let others live for us.”

There are instances where you can limit someone’s speech if it’s defamatory or false advertising or fighting words. But thought is unimpeachable, you can create absolute protection for people’s minds.

In 2021, the U.N. issued a special report that included four pillars of freedom of thought: Immunity is the idea you can’t be punished for your thoughts. Integrity is the idea that you can’t manipulate other people’s thoughts. Privacy is the idea that you have a right for your thoughts to be kept private. And fertility is the sense that the state has a duty to create environments for its citizens for freedom of thought.

we’re more likely to get closer to truth when we think together as a group, rather than as individuals.

Psychology research also suggests that in order to have the best chance of getting closest to a truth, you need to have a group of people with a diverse range of ideas in a room. So it’s about creating those spaces.

There is a way to design tech products to—instead of pushing us, through algorithms, in a direction we’re already rowing in—a default mode of more “free-thought” options, giving you a more diverse range of opinions. This would help with the problem Donald Rumsfeld talked about—”the unknown unknowns,” the things we don’t even know we don’t know.

“If we sacrifice that ability to think, to what extent are we still human?”

with the help of reliable brain-reading devices that can detect lies, creating “zones of obligatory candor.” For example, in a courtroom, you might be mandated to wear this technology to detect if you were telling lies. On the one hand, this would be a gratuitous violation of freedom of thought. But would society say that’s okay in that space?

these days, we increasingly seem to be accomplishing these tasks with headphones on, with YouTube and podcasts. So are these inputs helping us think? Or are we stopping ourselves thinking by importing other people’s thoughts, every hour of the day?

I’m well aware Twitter can be a fantastic source of new ideas. But there’s a balance to be struck between the extent to which it grabs your attention and draws you away from your own goals. For me, it did more harm than good.

to try to engage more with other people in different forms of conversations, in spaces that are conducive to free thought—spaces where you’re talking to people who you trust, in a safe environment, where you can challenge your own ideas with a diverse range of opinions.=
