Page 82

LLMs and Tool Use

Last March, just two weeks after GPT-4 was released, researchers at Microsoft quietly announced a plan to compile millions of APIs—tools that can do everything from ordering a pizza to solving physics equations to controlling the TV in your living room—into a compendium that would be made accessible to large language models (LLMs). This was just one milestone in the race across industry and academia to find the best ways to teach LLMs how to manipulate tools, which would supercharge the potential of AI more than any of the impressive advancements we’ve seen to date.

The Microsoft project aims to teach AI how to use any and all digital tools in one fell swoop, a clever and efficient approach. Today, LLMs can do a pretty good job of recommending pizza toppings to you if you describe your dietary preferences and can draft dialog that you could use when you call the restaurant. But most AI tools can’t place the order, not even online. In contrast, Google’s seven-year-old Assistant tool can synthesize a voice on the telephone and fill out an online order form, but it can’t pick a restaurant or guess your order. By combining these capabilities, though, a tool-using AI could do it all. An LLM with access to your past conversations and tools like calorie calculators, a restaurant menu database, and your digital payment wallet could feasibly judge that you are trying to lose weight and want a low-calorie option, find the nearest restaurant with toppings you like, and place the delivery order. If it has access to your payment history, it could even guess at how generously you usually tip. If it has access to the sensors on your smartwatch or fitness tracker, it might be able to sense when your blood sugar is low and order the pie before you even realize you’re hungry.

Perhaps the most compelling potential applications of tool use are those that give AIs the ability to improve themselves. Suppose, for example, you asked a chatbot for help interpreting some facet of ancient Roman law that no one had thought to include examples of in the model’s original training. An LLM empowered to search academic databases and trigger its own training process could fine-tune its understanding of Roman law before answering. Access to specialized tools could even help a model like this better explain itself. While LLMs like GPT-4 already do a fairly good job of explaining their reasoning when asked, these explanations emerge from a “black box” and are vulnerable to errors and hallucinations. But a tool-using LLM could dissect its own internals, offering empirical assessments of its own reasoning and deterministic explanations of why it produced the answer it did.

If given access to tools for soliciting human feedback, a tool-using LLM could even generate specialized knowledge that isn’t yet captured on the web. It could post a question to Reddit or Quora or delegate a task to a human on Amazon’s Mechanical Turk. It could even seek out data about human preferences by doing survey research, either to provide an answer directly to you or to fine-tune its own training to be able to better answer questions in the future. Over time, tool-using AIs might start to look a lot like tool-using humans. An LLM can generate code much faster than any human programmer, so it can manipulate the systems and services of your computer with ease. It could also use your computer’s keyboard and cursor the way a person would, allowing it to use any program you do. And it could improve its own capabilities, using tools to ask questions, conduct research, and write code to incorporate into itself.

It’s easy to see how this kind of tool use comes with tremendous risks. Imagine an LLM being able to find someone’s phone number, call them and surreptitiously record their voice, guess what bank they use based on the largest providers in their area, impersonate them on a phone call with customer service to reset their password, and liquidate their account to make a donation to a political party. Each of these tasks invokes a simple tool—an Internet search, a voice synthesizer, a bank app—and the LLM scripts the sequence of actions using the tools.

We don’t yet know how successful any of these attempts will be. As remarkably fluent as LLMs are, they weren’t built specifically for the purpose of operating tools, and it remains to be seen how their early successes in tool use will translate to future use cases like the ones described here. As such, giving the current generative AI sudden access to millions of APIs—as Microsoft plans to—could be a little like letting a toddler loose in a weapons depot.

Companies like Microsoft should be particularly careful about granting AIs access to certain combinations of tools. Access to tools to look up information, make specialized calculations, and examine real-world sensors all carry a modicum of risk. The ability to transmit messages beyond the immediate user of the tool or to use APIs that manipulate physical objects like locks or machines carries much larger risks. Combining these categories of tools amplifies the risks of each.

The operators of the most advanced LLMs, such as OpenAI, should continue to proceed cautiously as they begin enabling tool use and should restrict uses of their products in sensitive domains such as politics, health care, banking, and defense. But it seems clear that these industry leaders have already largely lost their moat around LLM technology—open source is catching up. Recognizing this trend, Meta has taken an “If you can’t beat ’em, join ’em” approach and partially embraced the role of providing open source LLM platforms.

On the policy front, national—and regional—AI prescriptions seem futile. Europe is the only significant jurisdiction that has made meaningful progress on regulating the responsible use of AI, but it’s not entirely clear how regulators will enforce it. And the US is playing catch-up and seems destined to be much more permissive in allowing even risks deemed “unacceptable” by the EU. Meanwhile, no government has invested in a “public option” AI model that would offer an alternative to Big Tech that is more responsive and accountable to its citizens.

Regulators should consider what AIs are allowed to do autonomously, like whether they can be assigned property ownership or register a business. Perhaps more sensitive transactions should require a verified human in the loop, even at the cost of some added friction. Our legal system may be imperfect, but we largely know how to hold humans accountable for misdeeds; the trick is not to let them shunt their responsibilities to artificial third parties. We should continue pursuing AI-specific regulatory solutions while also recognizing that they are not sufficient on their own.

We must also prepare for the benign ways that tool-using AI might impact society. In the best-case scenario, such an LLM may rapidly accelerate a field like drug discovery, and the patent office and FDA should prepare for a dramatic increase in the number of legitimate drug candidates. We should reshape how we interact with our governments to take advantage of AI tools that give us all dramatically more potential to have our voices heard. And we should make sure that the economic benefits of superintelligent, labor-saving AI are equitably distributed.

We can debate whether LLMs are truly intelligent or conscious, or have agency, but AIs will become increasingly capable tool users either way. Some things are greater than the sum of their parts. An AI with the ability to manipulate and interact with even simple tools will become vastly more powerful than the tools themselves. Let’s be sure we’re ready for them.

This essay was written with Nathan Sanders, and previously appeared on Wired.com.

Posted on September 8, 2023 at 7:05 AMView Comments

The Hacker Tool to Get Personal Data from Credit Bureaus

The new site 404 Media has a good article on how hackers are cheaply getting personal information from credit bureaus:

This is the result of a secret weapon criminals are selling access to online that appears to tap into an especially powerful set of data: the target’s credit header. This is personal information that the credit bureaus Experian, Equifax, and TransUnion have on most adults in America via their credit cards. Through a complex web of agreements and purchases, that data trickles down from the credit bureaus to other companies who offer it to debt collectors, insurance companies, and law enforcement.

A 404 Media investigation has found that criminals have managed to tap into that data supply chain, in some cases by stealing former law enforcement officer’s identities, and are selling unfettered access to their criminal cohorts online. The tool 404 Media tested has also been used to gather information on high profile targets such as Elon Musk, Joe Rogan, and even President Joe Biden, seemingly without restriction. 404 Media verified that although not always sensitive, at least some of that data is accurate.

Posted on September 7, 2023 at 7:09 AMView Comments

Inconsistencies in the Common Vulnerability Scoring System (CVSS)

Interesting research:

Shedding Light on CVSS Scoring Inconsistencies: A User-Centric Study on Evaluating Widespread Security Vulnerabilities

Abstract: The Common Vulnerability Scoring System (CVSS) is a popular method for evaluating the severity of vulnerabilities in vulnerability management. In the evaluation process, a numeric score between 0 and 10 is calculated, 10 being the most severe (critical) value. The goal of CVSS is to provide comparable scores across different evaluators. However, previous works indicate that CVSS might not reach this goal: If a vulnerability is evaluated by several analysts, their scores often differ. This raises the following questions: Are CVSS evaluations consistent? Which factors influence CVSS assessments? We systematically investigate these questions in an online survey with 196 CVSS users. We show that specific CVSS metrics are inconsistently evaluated for widespread vulnerability types, including Top 3 vulnerabilities from the ”2022 CWE Top 25 Most Dangerous Software Weaknesses” list. In a follow-up survey with 59 participants, we found that for the same vulnerabilities from the main study, 68% of these users gave different severity ratings. Our study reveals that most evaluators are aware of the problematic aspects of CVSS, but they still see CVSS as a useful tool for vulnerability assessment. Finally, we discuss possible reasons for inconsistent evaluations and provide recommendations on improving the consistency of scoring.

Here’s a summary of the research.

Posted on September 5, 2023 at 7:03 AMView Comments

Friday Squid Blogging: We’re Genetically Engineering Squid Now

Is this a good idea?

The transparent squid is a genetically altered version of the hummingbird bobtail squid, a species usually found in the tropical waters from Indonesia to China and Japan. It’s typically smaller than a thumb and shaped like a dumpling. And like other cephalopods, it has a relatively large and sophisticated brain.

The see-through version is made possible by a gene editing technology called CRISPR, which became popular nearly a decade ago.

Albertin and Rosenthal thought they might be able to use CRISPR to create a special squid for research. They focused on the hummingbird bobtail squid because it is small, a prodigious breeder, and thrives in lab aquariums, including one at the lab in Woods Hole.

Is this far behind?

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Read my blog posting guidelines here.

Posted on September 1, 2023 at 5:29 PMView Comments

Spyware Vendor Hacked

A Brazilian spyware app vendor was hacked by activists:

In an undated note seen by TechCrunch, the unnamed hackers described how they found and exploited several security vulnerabilities that allowed them to compromise WebDetetive’s servers and access its user databases. By exploiting other flaws in the spyware maker’s web dashboard—used by abusers to access the stolen phone data of their victims—the hackers said they enumerated and downloaded every dashboard record, including every customer’s email address.

The hackers said that dashboard access also allowed them to delete victim devices from the spyware network altogether, effectively severing the connection at the server level to prevent the device from uploading new data. “Which we definitely did. Because we could. Because #fuckstalkerware,” the hackers wrote in the note.

The note was included in a cache containing more than 1.5 gigabytes of data scraped from the spyware’s web dashboard. That data included information about each customer, such as the IP address they logged in from and their purchase history. The data also listed every device that each customer had compromised, which version of the spyware the phone was running, and the types of data that the spyware was collecting from the victim’s phone.

Posted on September 1, 2023 at 7:07 AMView Comments

Own Your Own Government Surveillance Van

A used government surveillance van is for sale in Chicago:

So how was this van turned into a mobile spying center? Well, let’s start with how it has more LCD monitors than a Counterstrike LAN party. They can be used to monitor any of six different video inputs including a videoscope camera. A videoscope and a borescope are very similar as they’re both cameras on the ends of optical fibers, so the same tech you’d use to inspect cylinder walls is also useful for surveillance. Kind of cool, right? Multiple Sony DVD-based video recorders store footage captured by cameras, audio recorders by high-end equipment brand Marantz capture sounds, and time and date generators sync gathered media up for accurate analysis. Circling back around to audio, this van features seven different audio inputs including a body wire channel.

Only $26,795, but you can probably negotiate them down.

Posted on August 31, 2023 at 7:06 AMView Comments

When Apps Go Rogue

Interesting story of an Apple Macintosh app that went rogue. Basically, it was a good app until one particular update…when it went bad.

With more official macOS features added in 2021 that enabled the “Night Shift” dark mode, the NightOwl app was left forlorn and forgotten on many older Macs. Few of those supposed tens of thousands of users likely noticed when the app they ran in the background of their older Macs was bought by another company, nor when earlier this year that company silently updated the dark mode app so that it hijacked their machines in order to send their IP data through a server network of affected computers, AKA a botnet.

This is not an unusual story. Sometimes the apps are sold. Sometimes they’re orphaned, and then taken over by someone else.

Posted on August 30, 2023 at 9:39 AMView Comments

Identity Theft from 1965 Uncovered through Face Recognition

Interesting story:

Napoleon Gonzalez, of Etna, assumed the identity of his brother in 1965, a quarter century after his sibling’s death as an infant, and used the stolen identity to obtain Social Security benefits under both identities, multiple passports and state identification cards, law enforcement officials said.

[…]

A new investigation was launched in 2020 after facial identification software indicated Gonzalez’s face was on two state identification cards.

The facial recognition technology is used by the Maine Bureau of Motor Vehicles to ensure no one obtains multiple credentials or credentials under someone else’s name, said Emily Cook, spokesperson for the secretary of state’s office.

Posted on August 29, 2023 at 7:03 AMView Comments

Sidebar photo of Bruce Schneier by Joe MacInnis.