Can We Build Trustworthy AI?
AI isn't transparent, so we should all be preparing for a world where AI is not trustworthy, write two Harvard researchers.
We will all soon get into the habit of using AI tools for help with everyday problems and tasks. We should get into the habit of questioning the motives, incentives, and capabilities behind them, too.
Imagine you’re using an AI chatbot to plan a vacation. Did it suggest a particular resort because it knows your preferences, or because the company is getting a kickback from the hotel chain? Later, when you’re using another AI chatbot to learn about a complex economic issue, is the chatbot reflecting your politics or the politics of the company that trained it?
For AI to truly be our assistant, it needs to be trustworthy. For it to be trustworthy, it must be under our control; it can’t be working behind the scenes for some tech monopoly. This means, at a minimum, the technology needs to be transparent. And we all need to understand how it works, at least a little bit.
Amid the myriad warnings about creepy risks to well-being, threats to democracy, and even existential doom that have accompanied stunning recent developments in artificial intelligence (AI)—and large language models (LLMs) like ChatGPT and GPT-4—one optimistic vision is abundantly clear: this technology is useful. It can help you find information, express your thoughts, correct errors in your writing, and much more. If we can navigate the pitfalls, its assistive benefit to humanity could be epoch-defining. But we’re not there yet.
Let’s pause for a moment and imagine the possibilities of a trusted AI assistant. It could write the first draft of anything: e-mails, reports, essays, even wedding vows. You would have to give it background information and edit its output, of course, but that draft would be written by a model trained on your personal beliefs, knowledge, and style. It could act as your tutor, answering questions interactively on topics you want to learn about—in the manner that suits you best and taking into account what you already know. It could assist you in planning, organizing, and communicating: again, based on your personal preferences. It could advocate on your behalf with third parties: either other humans or other bots. And it could moderate conversations on social media for you, flagging misinformation, removing hate or trolling, translating for speakers of different languages, and keeping discussions on topic; or even mediate conversations in physical spaces, interacting through speech recognition and synthesis capabilities.
Today’s AIs aren’t up to the task. The problem isn’t the technology, which is advancing faster than even the experts had guessed; it’s who owns it. Today’s AIs are primarily created and run by large technology companies, for their benefit and profit. Sometimes we are permitted to interact with the chatbots, but they’re never truly ours. That’s a conflict of interest, and one that destroys trust.
The transition from awe and eager utilization to suspicion to disillusionment is a well-worn one in the technology sector. Twenty years ago, Google’s search engine rapidly rose to monopolistic dominance because of its transformative information retrieval capability. Over time, the company’s dependence on revenue from search advertising led it to degrade that capability. Today, many observers look forward to the death of the search paradigm entirely. Amazon has walked the same path, from honest marketplace to one riddled with lousy products whose vendors have paid to have the company show them to you. We can do better than this. If each of us is going to have an AI assistant helping us with essential activities daily and even advocating on our behalf, we each need to know that it has our interests in mind. Building trustworthy AI will require systemic change.
First, a trustworthy AI system must be controllable by the user. That means that the model should be able to run on a user’s own electronic devices (perhaps in a simplified form) or within a cloud service that they control. It should show the user how it responds to them, such as when it makes queries to search the web or external services, when it directs other software to do things like sending an email on a user’s behalf, or when it modifies the user’s prompts to better express what the company that made it thinks the user wants. It should be able to explain its reasoning to users and cite its sources. These requirements are all well within the technical capabilities of AI systems.
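To make the idea of showing users what an assistant does on their behalf more concrete, here is a minimal sketch in Python. It is an illustration only, not how any existing product works: the `TransparentSession` class, its method names, and the three record kinds are all hypothetical. It simply records each web query, external action, and prompt rewrite so the user can review them afterward.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActionRecord:
    kind: str          # hypothetical kinds: "web_query", "external_action", "prompt_rewrite"
    description: str   # human-readable summary shown to the user


@dataclass
class TransparentSession:
    """Hypothetical wrapper that logs everything an assistant does for a user."""
    records: List[ActionRecord] = field(default_factory=list)

    def note(self, kind: str, description: str) -> None:
        # Every behind-the-scenes step is appended to a user-visible log.
        self.records.append(ActionRecord(kind, description))

    def rewrite_prompt(self, original: str, rewritten: str) -> str:
        # If the system changes the user's prompt, the change is disclosed.
        if original != rewritten:
            self.note("prompt_rewrite", f"{original!r} -> {rewritten!r}")
        return rewritten

    def report(self) -> str:
        # The report the user can read to see what happened.
        return "\n".join(f"[{r.kind}] {r.description}" for r in self.records)


session = TransparentSession()
session.rewrite_prompt("cheap beach trip", "budget beach vacation packages")
session.note("web_query", "searched for 'budget beach vacation packages'")
session.note("external_action", "drafted an email to a travel agent (not sent)")
print(session.report())
```

The design choice worth noticing is that disclosure happens in the same code path as the action itself, so an action cannot be taken without leaving a record the user can inspect.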
Furthermore, users should be in control of the data used to train and fine-tune the AI system. When modern LLMs are built, they are first trained on massive, generic corpora of textual data typically sourced from across the Internet. Many systems go a step further by fine-tuning on more specific datasets purpose-built for a narrow application, such as speaking in the language of a medical doctor, or mimicking the manner and style of their individual user. In the near future, corporate AIs will be routinely fed your data, probably without your awareness or your consent. Any trustworthy AI system should transparently allow users to control what data it uses.
Many of us would welcome an AI-assisted writing application fine-tuned with knowledge of which edits we have accepted in the past and which we have not. We would be more skeptical of a chatbot knowledgeable about which of its search results led to purchases and which did not.
You should also be informed of what an AI system can do on your behalf. Can it access other apps on your phone, and the data stored with them? Can it retrieve information from external sources, mixing your inputs with details from other places you may or may not trust? Can it send a message in your name (hopefully based on your input)? Weighing these types of risks and benefits will become an inherent part of our daily lives as AI-assistive tools become integrated with everything we do.
Realistically, we should all be preparing for a world where AI is not trustworthy. Because AI tools can be so incredibly useful, they will increasingly pervade our lives, whether we trust them or not. Being a digital citizen of the next quarter of the twenty-first century will require learning the basic ins and outs of LLMs so that you can assess their risks and limitations for a given use case. This will better prepare you to take advantage of AI tools, rather than be taken advantage of by them.
In the world’s first few months of widespread use of models like ChatGPT, we’ve learned a lot about how AI creates risks for users. Everyone has heard by now that LLMs “hallucinate,” meaning that they make up “facts” in their outputs, because their predictive text generation systems are not constrained to fact-check their own emanations. Many users learned in March, after a bug revealed users’ chats, that information they submit as prompts to systems like ChatGPT may not be kept private. Your chat histories are stored in systems that may be insecure.
Researchers have found numerous clever ways to trick chatbots into breaking their safety controls; these work largely because many of the “rules” applied to these systems are soft, like instructions given to a person, rather than hard, like coded limitations on a product’s functions. It’s as if we are trying to keep AI safe by asking it nicely to drive carefully, a hopeful instruction, rather than taking away its keys and placing definite constraints on its abilities.
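The difference between a soft rule and a hard rule can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of how any real chatbot is built: `SOFT_RULE`, `HardLimitedAssistant`, `ActionBlocked`, and the action names are all hypothetical. The soft rule is only text a model is asked to follow; the hard rule is a check in code that refuses anything outside an allowed set, no matter what the model has been talked into.

```python
from dataclasses import dataclass, field

# Soft rule: an instruction in plain language. Nothing in the code enforces it,
# so a cleverly worded prompt may persuade a model to ignore it.
SOFT_RULE = "Please never send email without asking the user first."


class ActionBlocked(Exception):
    """Raised when a requested action is outside the allowed set."""


@dataclass
class HardLimitedAssistant:
    """Hypothetical assistant whose abilities are limited in code."""
    allowed_actions: frozenset = frozenset({"search_web"})
    log: list = field(default_factory=list)

    def perform(self, action: str, detail: str) -> str:
        # Hard rule: the check runs regardless of what the model "wants".
        if action not in self.allowed_actions:
            self.log.append(("blocked", action, detail))
            raise ActionBlocked(f"'{action}' is not permitted")
        self.log.append(("allowed", action, detail))
        return f"{action}: {detail}"


assistant = HardLimitedAssistant()
print(assistant.perform("search_web", "resorts near Lisbon"))
try:
    # Even if a prompt tricked the model into requesting this, it cannot happen.
    assistant.perform("send_email", "booking request in the user's name")
except ActionBlocked as err:
    print("Blocked:", err)
```

In the article's terms, `SOFT_RULE` is asking the AI nicely to drive carefully, while the `allowed_actions` check is taking away the keys.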
These risks will grow as companies grant chatbot systems more capabilities. OpenAI is providing developers wide access to build tools on top of GPT: tools that give their AI systems access to your email, to your personal account information on websites, and to computer code. While OpenAI is applying safety protocols to these integrations, it’s not hard to imagine those being relaxed in a drive to make the tools more useful. It seems likewise inevitable that other companies will come along with less bashful strategies for securing AI market share.
Just as with any human, building trust with an AI will be hard-won through interaction over time. We will need to test these systems in different contexts, observe their behavior, and build a mental model for how they will respond to our actions. Building trust in that way is only possible if these systems are transparent about their capabilities, what inputs they use and when they will share them, and whose interests they are evolving to represent.