Trust Issues in AI

This essay was written with Nathan E. Sanders. It originally appeared as a response to Evgeny Morozov in Boston Review’s forum, “The AI We Deserve.”

For a technology that seems startling in its modernity, AI sure has a long history. Google Translate, OpenAI chatbots, and Meta AI image generators are built on decades of advancements in linguistics, signal processing, statistics, and other fields going back to the early days of computing—and, often, on seed funding from the U.S. Department of Defense. But today’s tools are hardly the intentional product of the diverse generations of innovators that came before. We agree with Morozov that the “refuseniks,” as he calls them, are wrong to see AI as “irreparably tainted” by its origins. AI is better understood as a creative, global field of human endeavor that has been largely captured by U.S. venture capitalists, private equity, and Big Tech. But that was never the inevitable outcome, and it doesn’t need to stay that way.

The internet is a case in point. The fact that it originated in the military is a historical curiosity, not an indication of its essential capabilities or social significance. Yes, it was created to connect different, incompatible Department of Defense networks. Yes, it was designed to survive the sorts of physical damage expected from a nuclear war. And yes, back then it was a bureaucratically controlled space where frivolity was discouraged and commerce was forbidden.

Over the decades, the internet transformed from military project to academic tool to the corporate marketplace it is today. These forces, each in turn, shaped what the internet was and what it could do. For most of us billions online today, the only internet we have ever known has been corporate—because the internet didn’t flourish until the capitalists got hold of it.

AI followed a similar path. It was originally funded by the military, with the military’s goals in mind. But the Department of Defense didn’t design the modern ecosystem of AI any more than it did the modern internet. Arguably, its influence on AI was even less because AI simply didn’t work back then. While the internet exploded in usage, AI hit a series of dead ends. The research discipline went through multiple “winters” when funders of all kinds—military and corporate—were disillusioned and research money dried up for years at a time. Since the release of ChatGPT, AI has reached the same endpoint as the internet: it is thoroughly dominated by corporate power. Modern AI, with its deep reinforcement learning and large language models, is shaped by venture capitalists, not the military—nor even by idealistic academics anymore.

We agree with much of Morozov’s critique of corporate control, but it does not follow that we must reject the value of instrumental reason. Solving problems and pursuing goals is not a bad thing, and there is real cause to be excited about the uses of current AI. Morozov illustrates this from his own experience: he uses AI to pursue the explicit goal of language learning.

AI tools promise to increase our individual power, amplifying our capabilities and endowing us with skills, knowledge, and abilities we would not otherwise have. This is a peculiar form of assistive technology, kind of like our own personal minion. It might not be that smart or competent, and occasionally it might do something wrong or unwanted, but it will attempt to follow your every command and gives you more capability than you would have had without it.

Of course, for our AI minions to be valuable, they need to be good at their tasks. On this, at least, the corporate models have done pretty well. They have many flaws, but they are improving markedly on a timescale of mere months. ChatGPT’s initial November 2022 model, GPT-3.5, scored about 30 percent on a multiple-choice scientific reasoning benchmark called GPQA. Five months later, GPT-4 scored 36 percent; by May this year, GPT-4o scored about 50 percent, and the most recently released o1 model reached 78 percent, surpassing the level of experts with PhDs. There is no one singular measure of AI performance, to be sure, but other metrics also show improvement.

That’s not enough, though. Regardless of their smarts, we would never hire a human assistant for important tasks, or use an AI, unless we can trust them. And while we have millennia of experience dealing with potentially untrustworthy humans, we have practically none dealing with untrustworthy AI assistants. This is the area where the provenance of the AI matters most. A handful of for-profit companies—OpenAI, Google, Meta, Anthropic, among others—decide how to train the most celebrated AI models, what data to use, what sorts of values they embody, whose biases they are allowed to reflect, and even what questions they are allowed to answer. And they decide these things in secret, for their benefit.

It’s worth stressing just how closed, and thus untrustworthy, the corporate AI ecosystem is. Meta has earned a lot of press for its “open-source” family of LLaMa models, but there is virtually nothing open about them. For one, the data they are trained with is undisclosed. You’re not supposed to use LLaMa to infringe on someone else’s copyright, but Meta does not want to answer questions about whether it violated copyrights to build it. You’re not supposed to use it in Europe, because Meta has declined to meet the regulatory requirements anticipated from the EU’s AI Act. And you have no say in how Meta will build its next model.

The company may be giving away the use of LLaMa, but it’s still doing so because it thinks it will benefit from your using it. CEO Mark Zuckerberg has admitted that eventually, Meta will monetize its AI in all the usual ways: charging to use it at scale, fees for premium models, advertising. The problem with corporate AI is not that the companies are charging “a hefty entrance fee” to use these tools: as Morozov rightly points out, there are real costs to anyone building and operating them. It’s that they are built and operated for the purpose of enriching their proprietors, rather than because they enrich our lives, our wellbeing, or our society.

But some emerging models from outside the world of corporate AI are truly open, and may be more trustworthy as a result. In 2022 the research collaboration BigScience developed an LLM called BLOOM with freely licensed data and code as well as public compute infrastructure. The collaboration BigCode has continued in this spirit, developing LLMs focused on programming. The government of Singapore has built SEA-LION, an open-source LLM focused on Southeast Asian languages. If we imagine a future where we use AI models to benefit all of us—to make our lives easier, to help each other, to improve our public services—we will need more of this. These may not be “eolithic” pursuits of the kind Morozov imagines, but they are worthwhile goals. These use cases require trustworthy AI models, and that means models built under conditions that are transparent and with incentives aligned to the public interest.
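As a concrete, if minimal, illustration of what that openness buys in practice (a sketch only; the checkpoint name and the Hugging Face transformers usage below are just one of several ways to run an openly released BLOOM variant, not anything prescribed by these projects):

```python
# Sketch: running a small, openly released BLOOM checkpoint locally with the
# Hugging Face transformers library (pip install transformers torch).
# "bigscience/bloom-560m" is one publicly available BigScience checkpoint;
# any other openly licensed model could be substituted.
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")
print(generator("Public-interest AI should be", max_new_tokens=30)[0]["generated_text"])
```

Because the weights are public and, as noted above, the data and code were freely licensed, anyone can inspect, reproduce, or fine-tune such a model rather than take a vendor’s word for what is inside it.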

Perhaps corporate AI will never satisfy those goals; perhaps it will always be exploitative and extractive by design. But AI does not have to be solely a profit-generating industry. We should invest in these models as a public good, part of the basic infrastructure of the twenty-first century. Democratic governments and civil society organizations can develop AI to offer a counterbalance to corporate tools. And the technology they build, for all the flaws it may have, will enjoy a superpower that corporate AI never will: it will be accountable to the public interest and subject to public will in the transparency, openness, and trustworthiness of its development.

Posted on December 9, 2024 at 7:01 AM

Comments

Michael Singer December 9, 2024 8:02 AM

“This essay was written with Nathan E. Sanders. It originally appeared as a response in Boston Review’s forum, ‘The AI We Deserve.’”

Editing comment: it would have been useful to put this statement at the beginning of the post, along with an indication that the post it is responding to was by Evgeny Morozov. Then, the references in the post to “Morozov” would be less confusing.

Clive Robinson December 9, 2024 12:59 PM

Hmm… The premise behind the title,

“Trust Issues in AI”

is incorrect, and is one of those things that comes about when journalists write without listening or learning.

What it actually is is,

“Trust in People”

AI in its current form of ML and LLMs consists of “deterministic systems” that people have famously called “stochastic parrots”, though they are actually not even “just parrots”.

The “stochastic” or “random” element comes from “user input”, much as it does in any other deterministic system or program.

You can actually build an LLM from a “Relational Database” and a few Python or other scripts. It’s grossly inefficient, slow, and of such limited capability that you would think it all but useless. However, the point is that it works to the recipe on the tin and turns out what you would expect.
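As a rough illustration of that claim (a toy sketch only, nothing like a real LLM; every name in it is made up for the example): a tiny next-word generator can indeed be backed by a relational database, with a table standing in for the “weights” and the only randomness coming from the sampler.

```python
# Toy sketch: a bigram "language model" backed by SQLite (needs SQLite 3.24+
# for the UPSERT). Grossly limited, as the comment says, but deterministic
# apart from the sampling step; the table plays the role of the "weights".
import random
import sqlite3

def train(db, text):
    db.execute("CREATE TABLE IF NOT EXISTS bigrams ("
               "prev TEXT, next TEXT, n INTEGER, PRIMARY KEY (prev, next))")
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        db.execute("INSERT INTO bigrams VALUES (?, ?, 1) "
                   "ON CONFLICT(prev, next) DO UPDATE SET n = n + 1",
                   (prev, nxt))
    db.commit()

def generate(db, seed, length=10):
    out = [seed]
    for _ in range(length):
        rows = db.execute("SELECT next, n FROM bigrams WHERE prev = ?",
                          (out[-1],)).fetchall()
        if not rows:
            break
        words, counts = zip(*rows)
        # This is the only "stochastic" part: sampling from stored counts.
        out.append(random.choices(words, weights=counts)[0])
    return " ".join(out)

db = sqlite3.connect(":memory:")
train(db, "the cat sat on the mat and the dog slept on the mat")
print(generate(db, "the"))
```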

Thus with “Trust Issues in AI” you have to realise that any trust issues, and the likes of hallucinations, are due to “human failings”, not the deterministic system.

Which brings us to,

“We agree with Morozov that the “refuseniks,” as he calls them, are wrong to see AI as “irreparably tainted” by its origins.”

Actually, current AI systems are 100% “tainted by their origins”. It could not be otherwise when you think about it; such is the way humans progress technology.

The two real questions that should be both asked and answered are,

“What is meant by ‘tainted’? And does it actually matter?”

Oddly, for many the answer to the second is a simple “NO”, because, as anyone who has played enough with AI over the past half century will know,

“Words only mean what we currently define them to mean, and in most languages, English especially, definitions are by no means constant.”

It’s one of the reasons that AI models costing getting on for a “billion dollars to build” are in effect,

“Out of date before we finish building them.”

They then get steadily worse over quite a short period of time, with their “End of Life” (EoL) expectancy actually measured in months at most…

There are a couple of other issues to consider,

Firstly, the current systems actually do not scale with resources… which is why “General Knowledge Models” are already a bit of a bust. And throwing more and more GPUs, and even nuclear reactors (Three Mile Island has been mentioned), at powering them is not going to help.

Part of the reason is that,

“The speed of light is finite.”

An issue I’ve talked about on this blog before.

Secondly, LLM/ML systems actually neither “learn” nor “reason” in the way humans do. In that respect they are little different from a large library or database. The information gets stored in the “weights” of the LLM network, and indexed in odd ways to give output that can vary with the way “questions are asked”.

So if it helps, think of them as being like a hot desert where the real physical objects visible are actually quite limited. The strange things you see are like heat-inversion mirages and the like, distorting what is real so that it appears as something that only looks like it might be real to human eyes but in reality never will be.

Thirdly, that “indexing” is “odd” in so many ways that it can take quite a while to get your head around just the basics, let alone the fundamentals (so I’m not going to start in on it).

However, whilst indexing, even odd indexing, can find information in a database, that does not imply that there is any learning or reasoning in the human sense.

Look at it this way,

“Can a book learn or reason?”

No, so don’t expect current AI systems to do so either. Unless, that is, you subscribe to the notion of “Stochastic Information Multiplication”, or to that

“Infinite number of monkeys, where one might, in an infinity, get the works of Shakespeare out of them.”

Random gets you next to nowhere fast, hence the famously fatuous “Drunkard’s Walk”. To be of use, random has to be filtered by appropriate “fitness functions”, which is why “evolution” in its many forms works. And the drunkard eventually has to go one way or the other along the street.
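A minimal sketch of that point, with an entirely made-up target string and fitness measure: the same random mutation that wanders aimlessly on its own converges quickly once a fitness function is allowed to filter which steps are kept.

```python
# Illustrative only: random mutation alone vs. random mutation filtered by a
# fitness function (a trivial (1+1) hill-climber). Target and fitness are
# invented for the example.
import random
import string

TARGET = "the drunkards walk"
ALPHABET = string.ascii_lowercase + " "

def fitness(candidate):
    # Number of positions that already match the target.
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate):
    i = random.randrange(len(candidate))
    return candidate[:i] + random.choice(ALPHABET) + candidate[i + 1:]

def search(steps, filtered):
    current = "".join(random.choice(ALPHABET) for _ in TARGET)
    for _ in range(steps):
        child = mutate(current)
        # Unfiltered: accept every random step (the drunkard's walk).
        # Filtered: keep the step only if it is at least as fit.
        if not filtered or fitness(child) >= fitness(current):
            current = child
    return current, fitness(current)

print("unfiltered:", search(5000, filtered=False))
print("filtered:  ", search(5000, filtered=True))
```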

Remember, it’s not just “learning and reasoning” that some biological systems can do; the issues of “trust” and “betrayal” are, beyond a certain level, almost entirely a human thing, though machines can be programmed to fake them more than adequately for those who are not “learning and reasoning”.

Jonathan Wilson December 9, 2024 2:30 PM

Anyone who won’t share every single byte of data that went into training their AI shouldn’t be claiming said AI is “open” IMO.

No one is going to do that, of course, because it would expose all the copyright violations going on in the AI industry.

ResearcherZero December 13, 2024 1:37 AM

The U.S. lacks a high-level expert advisory mechanism.

https://www.nature.com/articles/d41586-024-03906-0
