$HEADLESS SYSTEMS
$ cat /blog/more-agents-is-not-the-answer

More agents is not the answer | Headless Systems

Petr Pátek · 10 min · systems

In February 2026, every major AI tool shipped multi-agent features in the same two-week window. Grok Build launched with eight coordinated agents. Windsurf deployed five in parallel. Anthropic's Claude Code rolled out a delegating agent architecture. The message from the industry was clear: the future of AI is more agents, working together, on everything.

Then Google and MIT published a paper that tested 180 agent configurations across five architecture types. The findings should give every team rethinking its agent strategy pause. On sequential reasoning tasks, every multi-agent variant they tested degraded performance by 39% to 70%. Independent multi-agent systems amplified errors by 17.2x. The "tool-coordination trade-off" showed that as the number of tools grows past 16, multi-agent overhead increases disproportionately, eating the gains that coordination was supposed to provide.

The industry is solving the wrong problem. The bottleneck in AI agent systems is not agent intelligence or agent quantity. It is the quality of the APIs those agents consume.

What Google actually found

The paper, "Towards a Science of Scaling Agent Systems," is the first large-scale quantitative study of when and why multi-agent architectures help or hurt. The research team evaluated five canonical architectures: single-agent (one agent executing steps sequentially), independent (multiple agents in parallel, no communication), centralized (hub-and-spoke with an orchestrator), decentralized (peer-to-peer mesh), and hybrid (hierarchical oversight plus peer-to-peer).

Three effects dominated the results.

The tool-coordination trade-off. Under fixed computational budgets, tasks requiring many tools (16+) suffer disproportionately from multi-agent overhead. Every additional agent that needs to share tool context adds coordination cost. The more tools involved, the steeper the penalty.

Capability saturation. Once a single-agent baseline exceeds roughly 45% accuracy on a task, adding more agents yields diminishing or negative returns. Below that threshold, multi-agent coordination can help. Above it, the single agent is already capable enough that coordination overhead outweighs any gains.

Topology-dependent error amplification. Independent agents (no coordination) amplified errors by 17.2x compared to the single-agent baseline. Centralized coordination contained amplification to 4.4x. Decentralized peer-to-peer fell somewhere in between. Architecture choice isn't a preference. It's a reliability decision.

The performance split by task type was stark. On parallelizable tasks like financial reasoning, centralized coordination improved performance by 80.8%. On sequential reasoning tasks like planning, every multi-agent variant degraded performance by 39% to 70%. The researchers built a predictive model based on two task properties (sequential dependencies and tool density) that correctly identified the optimal architecture for 87% of unseen configurations.

The implication is clear: multi-agent is not a general-purpose upgrade. It's a task-specific tool with a narrow window of advantage.
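The paper's two-variable model can be sketched as a decision heuristic. This is an illustrative simplification built only from the numbers quoted above (the 16-tool threshold, the ~45% saturation point, the sequential-task penalty), not the researchers' actual model:

```python
# Rough decision heuristic inspired by the paper's two-variable model.
# Thresholds come from the figures quoted in this article; the decision
# rule itself is a simplification for illustration.

def recommend_architecture(sequential_dependency: float,
                           tool_count: int,
                           single_agent_accuracy: float) -> str:
    """Pick an agent architecture from coarse task properties.

    sequential_dependency: 0.0 (fully parallel) to 1.0 (strict chain)
    tool_count: number of tool definitions the task requires
    single_agent_accuracy: baseline accuracy of one agent on the task
    """
    # Sequential tasks: every multi-agent variant degraded 39-70%.
    if sequential_dependency > 0.5:
        return "single-agent"
    # Capability saturation: above ~45% baseline accuracy,
    # coordination overhead outweighs gains.
    if single_agent_accuracy > 0.45:
        return "single-agent"
    # Tool-coordination trade-off: 16+ tools makes overhead disproportionate.
    if tool_count >= 16:
        return "single-agent"
    # Parallelizable, low-tool-density, unsaturated: centralized
    # coordination helped (4.4x error growth versus 17.2x independent).
    return "centralized-multi-agent"

print(recommend_architecture(0.9, 5, 0.60))   # sequential planning -> "single-agent"
print(recommend_architecture(0.1, 8, 0.30))   # parallel analysis -> "centralized-multi-agent"
```

The point of the sketch is that the choice is a measurable property of the task, not a matter of taste.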

The real bottleneck is the API, not the agent

Read Google's "tool-coordination trade-off" through an architecture lens and a different conclusion emerges. When agents interact with 16+ tools and performance degrades, the problem is not the agents. The problem is the API surface they consume.

Consider what "16+ tools" means in practice. It usually means a system with sprawling, overlapping endpoints, inconsistent authentication, redundant data structures, and implicit dependencies between calls. An agent navigating this surface has to maintain context across all of them, retry ambiguous responses, and guess at sequencing. Each additional tool is not just one more capability. It is one more source of coordination cost.

Clean, well-scoped APIs reduce the effective tool count naturally. A headless CRM that exposes five endpoints (contacts, deals, activities, pipelines, search) gives an agent a clear surface. A dashboard-first CRM that exposes 40+ endpoints across sales, marketing, admin, reporting, and UI state management forces the agent to navigate complexity that was designed for human workflows, not machine consumption.
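To make "five endpoints" concrete, here is what a minimal agent tool surface for such a headless CRM might look like. The names and schemas are hypothetical, not any real product's API:

```python
# Hypothetical tool surface for the five-endpoint headless CRM described
# above. Each entry is the kind of JSON-schema tool definition passed to
# an LLM; the entire surface fits in one screen of context.

HEADLESS_CRM_TOOLS = [
    {"name": "contacts",
     "description": "Create, fetch, update, or delete a contact.",
     "parameters": {"action": {"enum": ["create", "get", "update", "delete"]},
                    "contact": {"type": "object"}}},
    {"name": "deals",
     "description": "Manage deals; contact references are resolved inline.",
     "parameters": {"action": {"enum": ["create", "get", "update", "delete"]},
                    "deal": {"type": "object"}}},
    {"name": "activities",
     "description": "Log or list calls, emails, and meetings.",
     "parameters": {"action": {"enum": ["log", "list"]},
                    "activity": {"type": "object"}}},
    {"name": "pipelines",
     "description": "List pipelines and their stages.",
     "parameters": {"action": {"enum": ["list"]}}},
    {"name": "search",
     "description": "Full-text search across contacts, deals, and activities.",
     "parameters": {"query": {"type": "string"}}},
]

assert len(HEADLESS_CRM_TOOLS) == 5  # versus 40+ for the dashboard-first CRM
```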

Salesforce's 2026 Connectivity Report validates this at enterprise scale. Organizations now average 12 AI agents, with 67% projected growth. But 50% of those agents operate in isolated silos. And 96% of IT leaders say agent success depends on integration across systems. The agents aren't failing because they're unintelligent. They're failing because the systems they consume weren't built for them.

This is the distinction that matters. The industry frames the problem as "how do we orchestrate more agents?" The data says the better question is: "how do we build APIs that a single agent can consume effectively?"

Why 40% of agent projects will fail (and it is not the agents' fault)

Gartner predicts that over 40% of agentic AI projects will be cancelled by the end of 2027. The stated reasons are escalating costs, unclear business value, and inadequate risk controls. But trace those symptoms back to root causes and a pattern emerges: the underlying systems were not designed for machine consumption.

Agent projects typically start with a capable model, an orchestration framework, and an existing software stack. The model works. The framework works. Then the agent hits the real world: APIs that return HTML instead of structured data, authentication flows designed for browser sessions, pagination that requires stateful navigation, error messages meant for human eyes. The agent can reason about the task perfectly and still fail because the system it consumes was built for dashboards, not for machines.

Gartner also found that "agent washing" is rampant, with only about 130 of the thousands of agentic AI vendors offering genuine capabilities. This compounds the architecture problem. Teams deploy agents that cannot actually reason autonomously, pointed at systems that were never designed for programmatic consumption. The failure rate is predictable.

Meanwhile, Deloitte reports that up to half of organizations will put more than 50% of their digital transformation budgets toward AI automation in 2026. The autonomous agent market is projected to reach $8.5 billion this year and $35 billion by 2030. The money is there. The agent capability is there, and growing fast. What's missing is the infrastructure layer: APIs built for machines rather than for humans clicking through interfaces. Organizations are pouring that budget into agents pointed at systems that were never designed to be consumed programmatically, and the result is predictable, expensive failure.

A concrete example: the CRM agent

To make the research concrete, consider a specific scenario. A sales team wants an AI agent to manage its CRM pipeline: create contacts from email, update deal stages based on call transcripts, flag stale deals, generate weekly summaries.

The dashboard-first CRM exposes its API as an afterthought. The contacts endpoint returns UI-formatted data with display names, avatar URLs, and layout metadata. Creating a deal requires four sequential calls: lookup the contact, get the pipeline ID, get the stage IDs, then create the deal with all three references. Error responses return HTML error pages. Authentication uses session tokens that expire unpredictably.

Map this to Google's findings. The agent needs 15+ tool definitions just to cover basic operations. Sequential dependencies are high (contact lookup before deal creation). Error amplification hits hard because each step in the chain can fail in ways the next step cannot recover from. This is exactly the configuration where multi-agent performance degrades by 39% to 70%.
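The four-call chain looks something like this in agent-tool form. The client here is an in-memory stub standing in for HTTP calls, and the endpoint names are hypothetical; what matters is the dependency structure, where any step can fail in a way the next step cannot recover from:

```python
# Sketch of the dashboard-first deal-creation chain. `api` stands in for
# a real HTTP client; endpoint and field names are hypothetical.

def create_deal_dashboard_first(api, contact_email, deal_name):
    contact = api.lookup_contact(contact_email)      # call 1
    if contact is None:
        raise RuntimeError("contact not found")      # whole chain dies here
    pipeline_id = api.get_default_pipeline_id()      # call 2
    stage_ids = api.get_stage_ids(pipeline_id)       # call 3, depends on call 2
    return api.create_deal(                          # call 4, depends on 1-3
        contact_id=contact["id"],
        pipeline_id=pipeline_id,
        stage_id=stage_ids[0],
        name=deal_name,
    )

class StubDashboardCRM:
    """In-memory stand-in for the dashboard-first API."""
    def lookup_contact(self, email):
        return {"id": "c1"} if email == "ada@example.com" else None
    def get_default_pipeline_id(self):
        return "p1"
    def get_stage_ids(self, pipeline_id):
        return ["s1", "s2"]
    def create_deal(self, **kwargs):
        return {"ok": True, **kwargs}

deal = create_deal_dashboard_first(StubDashboardCRM(), "ada@example.com", "Renewal")
```

Four round trips, three intermediate results the agent has to hold in context, and one lookup whose failure invalidates everything downstream.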

The headless CRM was built API-first. Five clean endpoints return structured JSON. Creating a deal takes one call with an inline contact reference (the system handles resolution). Errors return machine-readable codes with retry hints. Authentication uses API keys with clear scoping.

Now the agent needs five tool definitions instead of 15+. Sequential dependencies drop to near zero (single-call operations). Error amplification stays low because each operation is self-contained. Google's predictive model would classify this as a low-tool-density, low-sequential-dependency task: exactly the configuration where even a single agent performs well.
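The same operation against the headless surface collapses to a single self-contained call. Again, endpoint and field names are hypothetical, and the stub stands in for real HTTP:

```python
# One self-contained call against the headless CRM: the server resolves
# the inline contact reference, and errors come back machine-readable.
# Endpoint and field names are hypothetical.

def create_deal_headless(api, contact_email, deal_name):
    result = api.create_deal(contact={"email": contact_email},
                             name=deal_name)
    if not result["ok"]:
        # Machine-readable errors carry a code and a retry hint, e.g.
        # {"ok": False, "code": "rate_limited", "retry_after_s": 2}
        raise RuntimeError(result["code"])
    return result

class StubHeadlessCRM:
    """In-memory stand-in for the headless API."""
    def create_deal(self, **kwargs):
        return {"ok": True, "id": "d1", **kwargs}

deal = create_deal_headless(StubHeadlessCRM(), "ada@example.com", "Renewal")
```

One round trip, no intermediate state, and a failure mode the agent can actually parse.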

The difference isn't the agent. It's the API.

| Property | Dashboard-first CRM | Headless CRM |
|---|---|---|
| Tool definitions needed | 15+ | 5 |
| Sequential dependencies | High (multi-call chains) | Near zero (single-call ops) |
| Error amplification risk | High (chain failures cascade) | Low (self-contained operations) |
| Google's predicted outcome | -39% to -70% (sequential task) | Single agent performs well |

What the METR data adds

METR (Model Evaluation and Threat Research) has been tracking how long autonomous agents can work on tasks before failing. Their finding: the length of tasks that frontier agents complete autonomously with 50% reliability has been doubling approximately every seven months for the past six years. Post-2023, the rate has accelerated to roughly 4.3 months.

Extrapolate this trend two to four years and agents will independently handle tasks that currently take humans days or weeks. That's not speculation. It's the measured trajectory of six years of data across multiple model generations.
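The back-of-envelope arithmetic behind that extrapolation is simple exponential growth. Treating the 4.3-month doubling time as constant is an assumption for illustration, not a forecast:

```python
# Back-of-envelope extrapolation of the METR doubling trend quoted above.
# Assumes the post-2023 doubling time (~4.3 months) stays constant.

def capability_multiplier(months: float, doubling_months: float = 4.3) -> float:
    """How much longer a task agents can complete after `months` elapse."""
    return 2 ** (months / doubling_months)

print(f"2 years: ~{capability_multiplier(24):.0f}x longer tasks")  # ~48x
print(f"4 years: ~{capability_multiplier(48):.0f}x longer tasks")
```

An agent that reliably handles a one-hour task today would, on this trend, handle roughly a two-day task in two years.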

This acceleration makes the API quality question more urgent, not less. A more capable agent hitting a poorly designed API still fails. The failure mode just becomes more expensive: the agent reasons further, attempts more retries, explores more dead ends before giving up. Capability without a clean API surface is wasted compute.

Conversely, a more capable agent hitting a clean API surface compounds its advantages. Every improvement in reasoning ability translates directly to better task completion because the infrastructure isn't fighting the agent at every step.

This is the compounding effect that headless architecture enables. As agents improve (and they are improving on a measured exponential curve), the systems designed for them improve automatically. A CRM with five clean endpoints doesn't need to change when the agent consuming it gets twice as smart. The agent simply uses those endpoints more effectively, handles edge cases better, recovers from errors faster. The systems designed for dashboards stay exactly where they are, waiting for their next UI redesign.

Build the API, not the swarm

Three principles emerge from the research for anyone building systems that agents will consume.

Invest in API design before agent orchestration. Google's tool-coordination trade-off shows that tool count is a direct input to coordination cost. Reducing the number of tools an agent needs to accomplish a task (through clean endpoint design, thoughtful scoping, and self-contained operations) delivers more reliable gains than any orchestration framework. Five clean endpoints beat 40 messy ones, regardless of how many agents you deploy.

Use single agents for sequential workflows, multi-agent only for parallelizable ones. The research gives this concrete guidance: if step B depends entirely on step A, a single agent outperforms every multi-agent variant by 39% to 70%. Multi-agent coordination helps when work can be divided into independent chunks (financial analysis across multiple portfolios, document review across multiple sources). The 87% predictive accuracy of the two-variable model (sequential dependencies + tool density) means this isn't a judgment call. It's a measurable property of the task.

If you must coordinate multiple agents, centralize. Independent agents (no coordination) amplified errors by 17.2x. Centralized coordination (hub-and-spoke with an orchestrator) contained this to 4.4x. Decentralized peer-to-peer coordination fell in between. If your task genuinely benefits from multiple agents, an orchestrator that owns routing, context, and error handling isn't optional. It's the difference between 4.4x and 17.2x error growth.
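A minimal hub-and-spoke sketch makes the containment mechanism concrete. The worker signatures and task shapes here are assumptions for illustration; the design point is that failures surface at the hub as marked errors instead of flowing to peers as corrupted intermediate results:

```python
# Minimal hub-and-spoke orchestration sketch. The hub owns routing and
# error containment; a failed worker yields a marked error, never a
# silently corrupted result for peers to consume. Worker signatures
# and task shapes are illustrative assumptions.

def orchestrate(workers, subtasks):
    """Route each subtask to its worker; contain failures at the hub."""
    results = {}
    for task in subtasks:
        worker = workers[task["kind"]]
        try:
            results[task["id"]] = {"ok": True, "value": worker(task["payload"])}
        except Exception as err:
            # Contained here rather than amplified downstream.
            results[task["id"]] = {"ok": False, "error": str(err)}
    return results

workers = {"summarize": lambda text: text[:10],
           "count": lambda items: len(items)}
out = orchestrate(workers, [
    {"id": "t1", "kind": "summarize", "payload": "quarterly pipeline review"},
    {"id": "t2", "kind": "count", "payload": ["deal-a", "deal-b"]},
])
```

Independent agents skip the hub entirely, which is exactly how one bad intermediate result gets consumed by every downstream peer.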

The simplest path scales the farthest

The industry narrative in early 2026 is about more: more agents, more orchestration layers, more coordination frameworks, more complexity. Google's 180-configuration study points in the opposite direction. The most reliable path to scalable agent systems is not adding agents. It is reducing the friction between a single capable agent and the system it consumes.

That means building APIs that return structured data, not UI artifacts. Designing endpoints for machine workflows, not dashboard navigation. Scoping operations to be self-contained, so agents do not need to maintain state across chains of dependent calls.

This is, by definition, headless architecture. Build the backend for machines. Let the human interface be optional, added later if needed. As METR's data shows, the agents consuming your APIs will be twice as capable in seven months. If your API surface is clean, that capability translates directly to value. If it is not, you will be shopping for yet another orchestration framework, trying to solve an architecture problem with more agents.

The research is clear. The bottleneck was never the agent.