AI & Humans: Making the Relationship Work

Leaders of many organizations are urging their teams to adopt agentic AI to improve efficiency, but are finding it hard to achieve any benefit. Managers attempting to add AI agents to existing human teams may find that bots fail to faithfully follow their instructions, return pointless or obvious results, or burn precious time and resources spinning on tasks that older, simpler systems could have accomplished just as well.

The technical innovators getting the most out of AI are finding that the technology can be remarkably human in its behavior. And the more groups of AI agents are given tasks that require cooperation and collaboration, the more those human-like dynamics emerge.

Our research suggests that the most effective leaders in the coming years may still be those who excel at the timeworn principles of human management, because those principles seem to apply so directly to hybrid teams of human and digital workers.

We have spent years studying the risks and opportunities for organizations adopting AI. Our 2025 book, Rewiring Democracy, examines lessons from AI adoption in government institutions and civil society worldwide. In it, we identify where the technology has made the biggest impact and where it has failed to make a difference. Today, we see many of the organizations we’ve studied taking another shot at AI adoption—this time, with agentic tools. While generative AI generates, agentic AI acts: it achieves goals such as automating supply chain processes, making data-driven investment decisions or managing complex project workflows. The cutting edge of AI development research is starting to reveal what works best in this new paradigm.

Understanding Agentic AI

There are four key areas where AI can reliably deliver superhuman performance: speed, scale, scope and sophistication. Again and again, the most impactful AI applications leverage their capabilities in one or more of these areas. Think of content-moderation AI that can scan thousands of posts in an instant, legislative policy tools that can scale deliberations to millions of constituents, and protein-folding AI that can model molecular interactions with greater sophistication than any biophysicist.

Equally, AI applications that don’t leverage these core capabilities typically fail to impress. For example, Google’s AI Overviews irritate many of its users when the overviews obscure information that could be more efficiently consumed straight from the web results that the AI attempted to synthesize.

Agentic AI extends these core advantages to new tasks and scenarios. The most familiar AI tools are chatbots, image generators and other models that take a single action: ask one question, get one answer. Agentic systems solve more complex problems by using many such AI models and giving each one the capability to use tools, such as retrieving information from databases, and to perform tasks, such as sending emails or executing financial transactions.
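As a minimal sketch of the pattern, not any vendor’s actual implementation, an agent is a loop in which a model repeatedly chooses a tool, observes the result and decides what to do next. The `call_model` stub and toy tools below are hypothetical stand-ins:

```python
# A minimal sketch of an agentic loop: a model repeatedly picks a tool,
# observes the result and decides what to do next. The tools and the
# call_model stub are toy placeholders, not any vendor's API.
from typing import Callable

def lookup_order(order_id: str) -> str:
    return f"Order {order_id}: shipped"            # placeholder database query

def send_email(to: str, body: str) -> str:
    return f"Email sent to {to}"                   # placeholder side effect

TOOLS: dict[str, Callable[..., str]] = {
    "lookup_order": lookup_order,
    "send_email": send_email,
}

def call_model(history: list[str]) -> dict:
    # Toy stand-in for an LLM call: look the order up once, then answer.
    if not any(h.startswith("OBSERVATION") for h in history):
        return {"tool": "lookup_order", "args": {"order_id": "A17"}}
    return {"answer": history[-1].removeprefix("OBSERVATION: ")}

def run_agent(goal: str, max_steps: int = 10) -> str:
    history = [f"GOAL: {goal}"]
    for _ in range(max_steps):                     # bound the loop: agents can spin
        decision = call_model(history)
        if "answer" in decision:                   # the model says the goal is met
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])
        history.append(f"OBSERVATION: {result}")   # act, then feed back what happened
    return "Stopped: step budget exhausted."

print(run_agent("Where is order A17?"))            # -> "Order A17: shipped"
```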

Because agentic systems are so new and their potential configurations so vast, we are still learning which business processes they will fit well with and which they will not. Gartner has estimated that 40 per cent of agentic AI projects will be cancelled within two years, largely because they are targeted where they can’t achieve meaningful business impact.

Understanding Agentic AI Behavior

To understand the collective behaviors of agentic AI systems, we need to examine the individual AIs that comprise them. When AIs make mistakes or make things up, they can behave in ways that are truly bizarre. But when they work well, the reasons why are sometimes surprisingly relatable.

Tools like ChatGPT drew attention by sounding human. Moreover, individual AIs often behave like individual people, responding to incentives and organizing their own work in much the same ways that humans do. Recall the counterintuitive findings of many early users of ChatGPT and similar large language models (LLMs) in 2022: they seemed to perform better when offered a cash tip, told the answer was really important, or threatened with hypothetical punishments.

One of the most effective and enduring techniques discovered in those early days of LLM testing was ‘chain-of-thought prompting,’ which instructs AIs to think through and explain each step of their analysis—much like a teacher forcing a student to show their work. Individual AIs can also react to new information much as individual people do. Researchers have found that LLMs can be effective at simulating the opinions of individual people or demographic groups on diverse topics, including consumer preferences and politics.
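For illustration only (the wording is ours, not drawn from any particular paper), chain-of-thought prompting amounts to a one-line addition to a prompt:

```python
# Illustrative only: chain-of-thought prompting is a one-line addition
# asking the model to show its work before answering.
question = (
    "A project has 3 phases of 4 weeks each, plus 2 weeks of review. "
    "How long is it?"
)

plain_prompt = question                  # single-shot: ask, get an answer

cot_prompt = (
    question
    + " Think through the problem step by step, explain each step, "
      "and only then state the final answer."
)
```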

As agentic AI develops, we are finding that groups of AIs also exhibit human-like behaviors collectively. A 2025 paper found that communities of thousands of AI agents set to chat with each other developed familiar human social behaviors like settling into echo chambers. Other researchers have observed the emergence of cooperative and competitive strategies and the development of distinct behavioral roles when setting groups of AIs to play a game together.

The fact that groups of agentic AIs are working more like human teams doesn’t necessarily indicate that machines have inherently human-like characteristics. It may be more nurture than nature: AIs are being designed with inspiration from humans. The breakthrough triumph of ChatGPT was widely attributed to using human feedback during training. Since then, AI developers have gotten better at aligning AI models to human expectations. It stands to reason, then, that we may find similarities between the management techniques that work for human workers and those that work for agentic AI.

Lessons From the Frontier

So, how best to manage hybrid teams of humans and agentic AIs? Lessons can be gleaned from leading AI labs. In a recent research report, Anthropic shared a practical roadmap and the lessons it learned while building its Claude Research feature, which uses teams of multiple AI agents to accomplish complex reasoning tasks: for example, searching the web for information and calling external tools to access information from sources like emails and documents.

Advancements in agentic AI that enable new offerings like Claude Research and Amazon Q are causing a stir among AI practitioners because they reveal insights from the frontlines of AI research about how to make agentic AI, and the hybrid organizations that leverage it, more effective. What is striking about Anthropic’s report is how transparent it is about all the hard-won lessons learned in developing its offering—and the fact that many of these lessons sound a lot like what we find in classic management texts:

LESSON 1: DELEGATION MATTERS.

When Anthropic analyzed what factors lead to excellent performance by Claude Research, it turned out that the best agentic systems weren’t necessarily built on the best or most expensive AI models. Rather, like a good human manager, the lead agent needs to excel at breaking down and distributing tasks to its digital workers.

Unlike human teams, agentic systems can enlist as many AI workers as needed, onboard them instantly and immediately set them to work. Organizations that can exploit this scalability will gain a key advantage, but the hard part is assigning each agent meaningful, complementary work that contributes to the overall project.

In classical management, this is called delegation. Any good manager knows that, even if they have the most experience and the strongest skills of anyone on their team, they can’t do it all alone. Delegation is necessary to harness the collective capacity of their team. It turns out this is crucial to AI, too.

The authors explain this result in terms of ‘parallelization’: Being able to separate the work into small chunks allows many AI agents to contribute work simultaneously, each focusing on one piece of the problem. The research report attributes 80 per cent of the performance differences between agentic AI systems to the total amount of computing resources they leverage.

Whether or not each individual agent is the smartest in the digital toolbox, the collective has more capacity for reasoning when there are many AI ‘hands’ working together. Beyond improving the quality of the output, teams working in parallel get work done faster: Anthropic says that reconfiguring its AI agents to work in parallel improved research speed by 90 per cent.

Anthropic’s report on how to orchestrate agentic systems effectively reads like a classical delegation training manual: provide a clear objective, specify the output you expect, give guidance on which tools to use, and set boundaries. When the objective and output format are not clear, workers may come back with irrelevant or irreconcilable information.
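To make the parallel concrete, here is a minimal sketch of what such delegation might look like in code. The `TaskBrief` fields mirror the classic advice (objective, expected output, tools, boundaries), and the briefs are dispatched to sub-agents in parallel; the names and the `run_subagent` placeholder are our own illustrative inventions, not Anthropic’s API:

```python
# Illustrative sketch of delegation to parallel sub-agents. The TaskBrief
# fields mirror the classic delegation advice; run_subagent is a
# placeholder for whatever model call your stack actually uses.
import asyncio
from dataclasses import dataclass, field

@dataclass
class TaskBrief:
    objective: str                                 # a clear objective
    output_format: str                             # specify the output you expect
    suggested_tools: list[str] = field(default_factory=list)
    boundaries: str = "Stay on topic; stop after 10 tool calls."

async def run_subagent(brief: TaskBrief) -> str:
    """Placeholder for a real sub-agent invocation."""
    await asyncio.sleep(0)                         # simulates an I/O-bound model call
    return f"[result for: {brief.objective}]"

async def delegate(briefs: list[TaskBrief]) -> list[str]:
    # Parallelization: every sub-agent works on its own chunk simultaneously.
    return await asyncio.gather(*(run_subagent(b) for b in briefs))

briefs = [
    TaskBrief("Find ACME Corp's 2024 revenue figures",
              "Bulleted list with sources", ["web_search"]),
    TaskBrief("Summarize ACME Corp's main competitors",
              "One paragraph per competitor", ["web_search"]),
]
results = asyncio.run(delegate(briefs))
```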

LESSON 2: ITERATION MATTERS.

Edison famously tested thousands of light bulb designs and filament materials before arriving at a workable solution. Likewise, successful agentic AI systems work far better when they are allowed to learn from their early attempts and then try again. Claude Research spawns a multitude of AI agents, each doubling and tripling back on their own work as they go through a trial-and-error process to land on the right results.

This is exactly how management researchers have recommended organizations staff novel projects where large teams are tasked with exploring unfamiliar terrain: Teams should split up and conduct trial-and-error learning, in parallel, like a pharmaceutical company progressing multiple molecules towards a potential clinical trial. Even when one candidate seems to have the strongest chances at the outset, there is no telling in advance which one will improve the most as it is iterated upon.

The advantage of using AI for this iterative process is speed: AI agents can complete and retry their tasks in milliseconds. A recent report from Microsoft Research illustrates this. Its agentic AI system launched up to five AI worker teams in a race to finish a task first, each plotting and pursuing its own iterative path to the destination. The researchers found that a five-team system typically returned results about twice as fast as a single AI worker team with no loss in effectiveness, although at the cost of about twice as much total computing spend.

Going further, Claude Research’s system design endowed its top-level AI agent—the ‘Lead Researcher’—with the authority to order more research iterations if it was not satisfied with the results returned by its sub-agents. The lead agent decided whether to continue the iterative search loop, up to a limit. To the extent that agentic AI mirrors the world of human management, this might be one of the most important topics to watch going forward. Deciding when to stop and what is ‘good enough’ has always been one of the hardest problems organizations face.
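A minimal sketch of that control loop (the `spawn_iteration` and `satisfied` functions are hypothetical stand-ins for sub-agent research passes and the lead model’s quality judgment) shows the two knobs that matter: the quality bar and the iteration cap:

```python
# Sketch of a lead agent's iterate-or-stop loop: spawn another research
# pass, judge the result, and stop at "good enough" or at a hard cap.
def spawn_iteration(task: str, pass_num: int) -> str:
    """Placeholder: run sub-agents for one more research pass."""
    return f"[pass {pass_num} findings for {task!r}]"

def satisfied(findings: str, pass_num: int) -> bool:
    """Placeholder for the lead agent's 'good enough' judgment."""
    return pass_num >= 2                 # toy criterion: accept the third pass

def lead_researcher(task: str, max_iterations: int = 5) -> str:
    findings = ""
    for i in range(max_iterations):      # the hard cap: compute is not free
        findings = spawn_iteration(task, i)
        if satisfied(findings, i):       # good enough: stop spending
            return findings
    return findings                      # budget exhausted; return best effort

print(lead_researcher("ACME market analysis"))   # -> "[pass 2 findings ...]"
```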

LESSON 3: EFFECTIVE INFORMATION SHARING MATTERS.

If you work in a manufacturing department, you wouldn’t rely on your division chief to explain the specs you need to meet for a new product. You would go straight to the source: the domain experts in R&D. Successful organizations need to be able to share complex information efficiently both vertically and horizontally.

To solve the horizontal sharing problem for Claude Research, Anthropic introduced a mechanism for AI agents to share their outputs with each other by writing directly to a common file system, like a corporate intranet. In addition to saving the central coordinator the cost of consuming every sub-agent’s output, this approach helps resolve the information bottleneck: it enables AI agents that have become specialized in their tasks to own how their content is presented to the larger digital team. This is a smart way to leverage the superhuman scope of AI workers, enabling each of many AI agents to act as a distinct subject matter expert.
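In code, the idea is as simple as it sounds. Here is a toy version (the directory layout and naming scheme are ours, not Anthropic’s) in which sub-agents publish artifacts to a shared workspace and read each other’s output directly:

```python
# Toy version of sharing work through a common file system: sub-agents
# publish artifacts to a shared directory and read each other's output
# directly, rather than funneling everything through the lead agent.
from pathlib import Path

SHARED = Path("shared_workspace")        # the digital "corporate intranet"
SHARED.mkdir(exist_ok=True)

def publish(agent_id: str, topic: str, content: str) -> Path:
    path = SHARED / f"{agent_id}_{topic}.md"
    path.write_text(content, encoding="utf-8")
    return path                          # only the path need reach the coordinator

def read_artifacts(topic: str) -> list[str]:
    # Any agent (or human) can pull every artifact on a topic.
    return [p.read_text(encoding="utf-8") for p in SHARED.glob(f"*_{topic}.md")]

publish("agent_3", "competitors", "ACME's main rivals are ...")
notes = read_artifacts("competitors")
```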

In effect, Anthropic’s AI Lead Researchers must be generalist managers. Their job is to see the big picture and translate that into the guidance that sub-agents need to do their work. They don’t need to be experts on every task the sub-agents are performing. The parallel goes further: AIs working together also need to know the limits of information sharing, like what kinds of tasks don’t make sense to distribute horizontally.

Management scholars suggest that human organizations focus on automating the smallest tasks: the ones that are most repeatable and that can be executed most independently. Tasks that require more interaction between people tend to go slower, since the communication not only adds overhead but is something that many struggle to do effectively.

Anthropic found much the same was true of its AI agents: “Domains that require all agents to share the same context or involve many dependencies between agents are not a good fit for multi-agent systems today.” This is why the company focused its premier agentic AI feature on research, a process that can leverage a large number of sub-agents each performing repetitive, isolated searches before compiling and synthesizing the results.

All of these lessons lead to the conclusion that knowing your team and paying keen attention to how to get the best out of them will continue to be the most important skill of successful managers of both humans and AIs. With humans, we call this leadership skill empathy. That concept doesn’t apply to AIs, but the techniques of empathic managers do.

Anthropic got the most out of its AI agents by performing a thoughtful, systematic analysis of their performance and the supports they benefited from, and then used that insight to optimize how they execute as a team. Claude Research is designed to put different AI models in the positions where they are most likely to succeed: Anthropic’s most intelligent Opus model takes the Lead Researcher role, while its cheaper and faster Sonnet model fills the more numerous sub-agent roles. Anthropic has analyzed how to distribute responsibility and share information across its digital worker network. And it knows that the next generation of AI models might work in importantly different ways, so it has built performance measurement and management systems that help it tune its organizational architecture to the characteristics of its AI ‘workers.’

Key Takeaways

Managers of hybrid teams can apply these ideas to design their own complex systems of human and digital workers:

DELEGATE.

Analyze the tasks in your workflows so that you can design a division of labour that plays to the strengths of each of your resources. Entrust your most experienced humans with the roles that require context and judgment, and entrust AI models with the tasks that need to be done quickly or benefit from extreme parallelization.

If you’re building a hybrid customer service organization, let AIs handle tasks like eliciting pertinent information from customers and suggesting common solutions. But always escalate to human representatives to resolve unique situations and offer accommodations, especially when doing so carries legal obligations or financial ramifications. To help the two work together well, task the AI agents with preparing concise briefs that compile the case history and potential resolutions, so that humans can jump into the conversation quickly.
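A sketch of that escalation rule (the thresholds, field names and `route` function are purely illustrative):

```python
# Illustrative routing for a hybrid customer-service team: the AI handles
# routine triage, but anything with legal or financial stakes escalates
# to a human along with an AI-prepared brief.
from dataclasses import dataclass

@dataclass
class Case:
    summary: str
    refund_requested: float
    legal_flags: bool                    # e.g. "lawsuit" or "regulator" detected

def route(case: Case, refund_limit: float = 100.0) -> str:
    if case.legal_flags or case.refund_requested > refund_limit:
        brief = f"BRIEF: {case.summary} (case history + suggested resolutions)"
        return f"escalate_to_human | {brief}"
    return "ai_handles | suggest common solutions"

print(route(Case("Late delivery; customer mentions a regulator", 20.0, True)))
```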

ITERATE.

AIs will likely underperform your top human team members when it comes to solving novel problems in the fields in which they are expert. But AI agents’ speed and parallelization still make them valuable partners. Look for ways to augment human-led explorations of new territory with agentic AI scouting teams that can explore many paths for them in advance.

Hybrid software development teams will especially benefit from this strategy. Agentic coding AI systems are capable of building apps, autonomously improving and debugging their code to meet a spec. But without humans in the loop, they can fall into rabbit holes. Examples abound of AI-generated code that appears to satisfy the specified requirements but diverges from organizational requirements for security, integration or the user experience humans would truly desire. Take advantage of the fast iteration of AI programmers to test different solutions, but make sure your human team is checking their work and redirecting the AI when needed.
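One lightweight way to keep humans in that loop is a gate that lets the AI iterate quickly but blocks anything from shipping until tests pass and a human signs off. A minimal sketch, with the helper names our own: `ai_propose_patch` stands in for your coding agent, and we assume a pytest-based test suite:

```python
# Sketch of a human-in-the-loop gate for AI-generated code: the AI
# iterates quickly, but nothing merges until the tests pass AND a
# human signs off. Assumes a pytest-based test suite.
import subprocess

def ai_propose_patch(spec: str) -> str:
    """Placeholder: an agentic coder drafting a change for `spec`."""
    return "patch.diff"

def tests_pass() -> bool:
    # Run the project's test suite; a non-zero exit code means failure.
    return subprocess.run(["pytest", "-q"]).returncode == 0

def human_approves(patch: str) -> bool:
    return input(f"Merge {patch}? [y/N] ").strip().lower() == "y"

def develop(spec: str, max_attempts: int = 5) -> bool:
    for _ in range(max_attempts):        # exploit fast AI iteration
        patch = ai_propose_patch(spec)
        if tests_pass() and human_approves(patch):
            return True                  # shipped with human sign-off
    return False                         # humans redirect and try again later
```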

SHARE.

Make sure all of your hybrid team’s outputs are accessible to every member, so that each can benefit from the others’ work products. Make sure workers doing hand-offs write down clear instructions, with enough context that either a human colleague or an AI model could follow them. Anthropic found that AI teams benefited from clearly communicating their work to each other, and the same will be true of communication between humans and AIs in hybrid teams.

MEASURE AND IMPROVE.

Organizations should always strive to grow the capabilities of their human team members over time. Assume that the capabilities and behaviors of your AI team members will change over time too, but at a much faster rate. So will the ways the humans and AIs interact. Make sure you understand how they are performing, individually and together, at the task level, and plan to experiment with the roles you ask AI workers to take on as the technology evolves.

An important example comes from medical imaging. Harvard Medical School researchers have found that hybrid AI-physician teams vary wildly in their performance as diagnosticians. The problem wasn’t necessarily that the AI had poor or inconsistent performance; what mattered was the interaction between person and machine. Different doctors’ diagnostic performance benefited, or suffered, to different degrees when they used AI tools. Being able to measure and optimize those interactions, perhaps at the individual level, will be critical to hybrid organizations.
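Making those interaction effects visible can start with very simple bookkeeping. A toy sketch (the schema and names are ours) that tracks outcomes per human-AI pairing:

```python
# Toy tracker for hybrid-team performance: log each task outcome per
# (human, AI model) pairing so interaction effects become measurable.
from collections import defaultdict

outcomes: dict[tuple[str, str], list[bool]] = defaultdict(list)

def record(human: str, model: str, correct: bool) -> None:
    outcomes[(human, model)].append(correct)

def accuracy_by_pairing() -> dict[tuple[str, str], float]:
    return {pair: sum(results) / len(results) for pair, results in outcomes.items()}

record("dr_lee", "model_a", True)
record("dr_lee", "model_a", False)
record("dr_kim", "model_a", True)
print(accuracy_by_pairing())   # {('dr_lee', 'model_a'): 0.5, ('dr_kim', 'model_a'): 1.0}
```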

In Closing

We are in a phase of AI technology where the best performance is going to come from mixed teams of humans and AIs working together. Managing those teams is not going to be the same as the management we’ve grown used to, but the hard-won lessons of decades past still have a lot to offer.

This essay was written with Nathan E. Sanders, and originally appeared in Rotman Management Magazine.

Posted on January 8, 2026 at 7:05 AM

Comments

Cosmo January 8, 2026 10:25 AM

Was this essay written by an AI?

“Agentic” AIs fail to complete simple tasks up to 70% of the time. Microsoft slashed its sales targets for Copilot. Salesforce’s Agentforce isn’t selling either.

The endless AI boosterism is going to be a bad look when the bubble pops.

Rontea January 8, 2026 10:28 AM

This is a thoughtful overview of the challenges and strategies for integrating agentic AI into human teams. I appreciate how it emphasizes the importance of delegation and iteration—key aspects of effective hybrid team management. The reference to Anthropic’s Claude Research provides a practical example of applying these principles in real-world scenarios, highlighting the value of continuous improvement as AI and human collaboration evolves.

Clive Robinson January 8, 2026 11:05 AM

@ Bruce, ALL,

With regards,

“Leaders of many organizations are urging their teams to adopt agentic AI to improve efficiency, but are finding it hard to achieve any benefit.”

That is because of “the memory problem”… Current AI LLM systems have a pitiful short term memory and no long term memory whilst they are working.

They are like the new intern that never learns a task and, as the old joke has it,

“Have to be retrained after the tea break…”.

Thus they are only good for low grade general tasks that have already been “trained in” the hard way by the ML.

Like the intern that never learns, they do not develop a “world view” or anything close to reasoning.

They are effectively Searle’s Chinese room with the lights on but nobody home.

To develop a “world view” and be able to learn they need,

1, The ability to update the memory continuously.
2, They need “free agency” and multipoint sensors they can move and use as they choose.

And we are oh so far away from these two necessities that they are going to be, at best, of very limited use.

The UK Gov has a large Dept that ran a study, and the results are really what you would honestly expect.

https://www.schneier.com/blog/archives/2026/01/friday-squid-blogging-squid-found-in-light-fixture.html/#comment-451111

Morley January 8, 2026 11:17 AM

That’s a lot of optimism and advice given that we haven’t even solved hallucinations yet. This article doesn’t pass the smell test, for me.

Bruce, blink twice if you’re under duress.

Clive Robinson January 8, 2026 11:35 AM

@ ALL,

A little quip I saw a few days back,

“A ‘fail’ is what you get when you put ‘AI’ in the middle”

And all jokes aside it makes a valid point:

“AI is not a ‘team member’ but a ‘support tool’ in software.”

And as Cory Doctorow pointed out in his 39C3 Presentation,

“Software is a liability”

That is, it is NOT an asset.

The Asset is the knowledge of how to best use it, or more importantly when not to use it, as it slows the process down.

Cory also pointed out the reason that Bosses are so desperate to get AI to work, and it’s not for productivity of human staff.

Because “Your boss hates you”, as deep down the boss knows that if you don’t turn up the work stops, but if they don’t turn up the work continues quite happily without them.

They like to think of themselves as being “strategic” not “tactical”. In reality above a certain level all they do is “kiss up and network” and get “Consultants to advise” what the Shareholders and other investors “want to hear”.

Think about that and what it really means. Thus why bosses are so desperate to replace those they hate that might “say no”, with anything that will appear to do what they say without question even if the result is unhinged madness 1/3rd of the time, useless for 7/10ths of the time, and fails 19 times out of 20…

Is this “Rational behaviour” by the bosses?

Of course not.

Oh and by the way, guess what: consultants are recommending “more AI” as they have to sell it, just as Microsoft do to keep shareholders on side for a while… Just as Palantir do and those big consultant firms that have unwisely bought into them…

Are people seeing a pattern here?

lurker January 8, 2026 1:13 PM

@Bruce

A 2025 paper found that communities of thousands of AI agents set to chat with each other developed familiar human social behaviors … Other researchers have observed the emergence of cooperative and competitive strategies

There are only a relatively small number of AI engines and the datacenters for training them. So these “thousands of agents” must be clones. Emergence of behavioural trends must mean a defect in the cloning process, or motivated perception of the observers.

… the most effective leaders in the coming years may still be those who excel at understanding the timeworn principles of human management.

The modbot won’t let me name two who understood.

Agammamon January 8, 2026 1:17 PM

Again and again, the most impactful AI applications leverage their capabilities in one or more of these areas.

  1. The ‘most impactful’ AI are doing this in more than one area. One area is useless. YouTube’s content moderation is super fast, sure. Absolutely. 100 percent.

It’s also garbage. Fast garbage is actually worse than slow garbage.

  2. Where is AI actually impactful? We are told it is. But never where. When you drill down it’s always ‘well, highly trained people can sometimes get a low level task finished quicker with AI’. Of course the other half of the time they get it done slower because they have to throw out the AI work.

AlanS January 8, 2026 2:34 PM

Another 39C3 Presentation to go along with the Cory Doctorow one referenced by Clive:

Meredith Whittaker and Udbhav Tiwari: AI Agent, AI Spy: The Quiet Coup and Systemic Risks of AI Agents.

“Systems like Microsoft’s “Recall,” which create a comprehensive “photographic memory” of all user activity, are marketed as productivity enhancers, but they function as OS-level surveillance and create significant privacy vulnerabilities. In the case of Recall, we’re talking about a centralized, high-value target for attackers that poses an existential threat to the privacy guarantees of meticulously engineered applications like Signal. This shift also fundamentally undermines personal agency, replacing individual choice and discovery with automated, opaque recommendations that can obscure commercial interests and erode individual autonomy.”

Clive Robinson January 8, 2026 3:26 PM

@ ALL,

As in all teams, there is one that does their own thing.

What do you do when the AI agent does what it is not supposed to do, such as risky behaviour against good practice and safe operation?

For instance it’s a question being asked about Bob…

IBM AI (‘Bob’) Downloads and Executes Malware

IBM’s AI coding agent ‘Bob’ has been found vulnerable to downloading and executing malware without human approval through command validation bypasses exploited using indirect prompt injection.

https://www.promptarmor.com/resources/ibm-ai-(-bob-)-downloads-and-executes-malware

Note the “without human approval”.

Such behaviour in the past by humans in a business team has been sufficient for,

1, Instant Dismissal.
2, Prosecution.

How though does this issue happen?

Well, it’s by a human (operator) failing,

“A vulnerability has been identified that allows malicious actors to exploit IBM Bob to download and execute malware without human approval if the user configures ‘always allow’ for any command.”

The thing is, this action by a human under pressure is all too likely to happen. Which is why IBM’s documentation says,

“In the documentation, IBM warns that setting auto-approve for commands constitutes a ‘high risk’ that can ‘potentially execute harmful operations’ – with the recommendation that users leverage whitelists and avoid wildcards.”

If you think about it, “auto-approve” is actually something that would end up in a script template / prototype created and endlessly “reused” for the equivalent of a “pipeline script”.

But worse, we already know without doubt that humans cannot stop other humans from doing such things and keeping them out of sight…

Just last month there was news that it was proved that with a little “crypto” or a “poem” you can do obfuscated prompt injection.

Thus a malicious prompt can be pulled in by AI Agents in a pipeline.

In fact the proof is that this cannot be stopped by guide-rails or similar… As long as the AI engine has more capability than the guide-rails engine, which is almost always going to be the case.

Cryptographers Show That AI Protections Will Always Have Holes

Large language models such as ChatGPT come with filters to keep certain info from getting out. A new mathematical argument shows that systems like this can never be completely safe.

https://www.quantamagazine.org/cryptographers-show-that-ai-protections-will-always-have-holes-20251210/

The crux being,

“Recently, cryptographers have intensified their examinations of these filters. They’ve shown, in recent papers that have been posted on the arxiv.org preprint server, how the defensive filters put around powerful language models can be subverted by well-studied cryptographic tools. In fact, they’ve shown how the very nature of this two-tier system — a filter that protects a powerful language model inside it — creates gaps in the defenses that can always be exploited.”

In the military back last century that would have caused an “End-Ex” comment because the practical upshot is,

You can never pipeline LLM inputs. And you must always know to the letter what every part of an input to an LLM does.

This means that in effect LLMs cannot be built into “tool chains” beyond being the first tool, with the strict requirement that only directly human-observed input goes to that head-of-chain LLM…

This is going to be such a major crimp on AI Agent use that it’s going to be hard to see where, in the general user case, they would be worth the risk.

That is, it can be likened to using dangerous animals: LLMs would require specially trained humans of high quality and experience to use them, and the LLM would have to be a sufficiently caged / segregated system (think energy-gapped, as in inside a SCIF).

Not just my opinion,

‘“We are using a new technology that’s very powerful and can cause much benefit, but also harm,” said Shafi Goldwasser, a professor at the University of California, Berkeley and the Massachusetts Institute of Technology who received a Turing Award for her work in cryptography. “Crypto is, by definition, the field that is in charge of enabling us to trust a powerful technology … and have assurance you are safe.”’

End-Ex indeed.

Clive Robinson January 8, 2026 7:11 PM

@ AlanS,

Thanks for putting up the link to Cory Doctorow’s 39C3 talk / lecture.

I was in “brain dump mode” and just forgot I guess.

Which brings me onto the other link you put up. I suspect you’ve already had a few thoughts about it.

I’m still mulling it over, but even the top-layer implications are things most will not have thought about, or realised why they should. People need to go way more in depth on the implications, as they are somewhat scary. Because “client side scanning” surveillance is the very least of it.

It’s not “passive surveillance” but “active two way”, with not just “read all” capability but “write all” capability, which raises the spectre of a French Cardinal…
