We scored 195 vendors on machine consumability. Here is the gap nobody is measuring.
If you build anything that calls other people’s software, you have hit this problem. When you ask “can an agent operate this vendor end to end?” there is no good place to look up the answer. There are analyst reports about market share. There are review sites that score user experience. There are Gartner quadrants for IT buyers. None of them score the API. None of them measure machine consumability: whether a software product can be controlled by a machine without falling back to a dashboard.
So we built that.
Today we are publishing The Headless Index: a public scorecard for how well software vendors can be consumed by AI agents. 195 vendors across 13 categories, ranked on five evidence-based sub-criteria plus an open-source agent readiness framework. Every score cites the raw evidence it was based on. The methodology is published and versioned. The data tree is open.
The machine consumability gap
API quality is the most under-measured attribute in enterprise software. Pricing gets compared on every analyst slide. Customer support gets rated on G2. UI polish gets nominated for design awards. The question that actually determines whether you can integrate a product into an agentic workflow, whether a machine can drive it without a human in the loop, has no consistent scoring.
When AI agents arrived, this gap became expensive. An engineer evaluating two CMSes can compare admin screenshots in five minutes. Evaluating which one an agent can operate programmatically takes hours of docs-reading per vendor, with no shared rubric to compare results against. Multiply by 13 categories, multiply by the dozen mainstream vendors per category, and the cost of due diligence becomes unmanageable. So nobody does it. Companies pick the vendor with the strongest brand and discover the API limitations after they have already shipped.
Partial measures exist. Cloudflare publishes an AI Agent Readiness score for sites under its CDN. Clirank ranks “agent-friendliness” by surface signals. Fern’s AI Readiness score covers vendors who use Fern for their docs. Each is useful for what it covers. None are vendor-independent third-party scorings of the full API surface across categories. None publish their rubric in a way that lets you reproduce the score from the same raw evidence.
The Headless Index is what happens when you treat that as a fixable problem.
What the Headless Index measures
195 software vendors across 13 categories: Content Management, Commerce, Auth, Search, Payments, Communications, Analytics, Observability, Storage, Feature Flags, Project Management, Workflow Automation, and AI Platforms. Each vendor gets a scorecard with up to two scores: a Headless Index score, 0 to 100, across five sub-criteria, and, when the vendor publishes a machine-readable OpenAPI spec, a JAIRF score, 0 to 100, across six agent readiness dimensions.
Each scorecard is bound to a methodology version and ships with a full raw evidence trail: the OpenAPI probes that were tried, the GitHub repos scanned, the docs pages crawled, the calibration data fetched from external scorers. Every claim cites a field in that evidence file. There is no opinion without a citation.
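As a rough illustration of what "every claim cites a field" can mean in practice, the sketch below checks that each claim on a scorecard points at a path that exists in the evidence file. The field names (`claims`, `text`, `evidence_path`), the dotted-path convention, and the sample data are all assumptions made up for this example, not the published schema.

```python
from typing import Any


def resolve(evidence: dict[str, Any], dotted_path: str) -> bool:
    """Return True if dotted_path (e.g. "openapi.probes") exists in evidence."""
    node: Any = evidence
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def uncited_claims(scorecard: dict[str, Any], evidence: dict[str, Any]) -> list[str]:
    """List the claims whose cited evidence path is missing from the evidence file."""
    return [
        claim["text"]
        for claim in scorecard.get("claims", [])
        if not resolve(evidence, claim.get("evidence_path", ""))
    ]


# Hypothetical data, for illustration only.
evidence = {
    "openapi": {"probes": ["/openapi.json"]},
    "github": {"repos": ["example/sdk"]},
}
scorecard = {
    "claims": [
        {"text": "Publishes a fetchable OpenAPI spec", "evidence_path": "openapi.probes"},
        {"text": "Ships an official MCP server", "evidence_path": "mcp.server"},
    ]
}

print(uncited_claims(scorecard, evidence))
```

A check of this shape is what makes an audit mechanical: a claim either resolves to a field in the evidence file or it gets flagged.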
You can read any scorecard in 30 seconds. You can read the methodology in 10 minutes. You can read the raw evidence file for any vendor and audit every score we assigned. If you think we got something wrong, the correction form is one click away, with a public changelog and a 7-day SLA.
The five sub-criteria
The Headless Index score rolls up five things that, together, decide whether an AI agent can drive a piece of software end to end. We did not invent these out of thin air. They fell out of looking at hundreds of integration projects and asking what fails first.
API-first posture. Does the vendor treat the API as the product, or as an afterthought to a dashboard-first business model? Signals: published OpenAPI or GraphQL surface, breadth of official SDKs, download volumes on npm, PyPI, Maven, and RubyGems, official documentation prioritizing programmatic flows.
Headless operability. Can the routine workflows that the dashboard supports (onboarding, billing, team management, configuration, content modeling) all be performed via API? Signals: documentation coverage of programmatic equivalents for every UI workflow, presence of CLI tooling, configuration-as-code support.
MCP and agent posture. Has the vendor invested in the agent ecosystem? Signals: an official MCP server published under the vendor’s name, presence in the MCP registry, dedicated docs sections for AI, agents, and LLM tooling, agent toolkit packages on npm or PyPI.
Schema observability. Can a machine introspect the API to know what is available without reading prose docs? Signals: published OpenAPI specs (URL-fetchable, not just text mentions), GraphQL introspection support, schema files in the vendor’s GitHub.
Webhooks and events. Can the system push state changes outward? Signals: documented webhook reference page, HMAC signing scheme (SHA-256 preferred), retry vocabulary (idempotency, dead-letter queues, replay), declared event types, payload completeness.
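To make the HMAC signal concrete, here is a minimal sketch of the receiver side of a SHA-256 signed webhook, using only the Python standard library. It assumes the vendor sends a hex-encoded HMAC of the raw request body. Real vendors differ on header names, encodings, and timestamp handling, so the secret and payload below are placeholders rather than any vendor's actual scheme.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, payload: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw body and compare in constant time."""
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


# Hypothetical secret and payload, for illustration only.
secret = b"example-signing-secret"
payload = b'{"type": "invoice.paid", "id": "evt_123"}'
signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()

print(verify_signature(secret, payload, signature))         # valid signature
print(verify_signature(secret, payload + b" ", signature))  # tampered body
```

An agent can only trust pushed events if a check like this is possible, which is why a documented signing scheme counts as a signal.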
Each sub-criterion is scored 0 to 20, summing to a 0 to 100 Headless Index total. A vendor that scores Unknown on a sub-criterion has it excluded from the denominator rather than penalized. We publish what we measured, not what we guessed.
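The roll-up can be sketched in a few lines. This is a minimal reading of the rule above, assuming Unknown is represented as `None` and that the total is rescaled to 100 over the criteria that were actually measured. The key names and example scores are hypothetical.

```python
from typing import Optional

MAX_PER_CRITERION = 20


def headless_index(scores: dict[str, Optional[float]]) -> Optional[float]:
    """Roll up 0-20 sub-scores to 0-100, dropping Unknown (None) from the denominator."""
    known = [s for s in scores.values() if s is not None]
    if not known:
        return None  # nothing measured, so no score to report
    return round(100 * sum(known) / (MAX_PER_CRITERION * len(known)), 1)


# Hypothetical vendor with one Unknown sub-criterion.
example = {
    "api_first": 18,
    "headless_operability": 16,
    "mcp_agent_posture": None,  # Unknown: excluded, not counted as zero
    "schema_observability": 14,
    "webhooks_events": 12,
}

print(headless_index(example))  # 60 of a possible 80, rescaled to 75.0
```

Counting the Unknown as zero would have produced 60 instead of 75, which is the penalty the exclusion rule avoids.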
JAIRF: the agent readiness layer for OpenAPI specs
When a vendor publishes a machine-readable OpenAPI specification, we run their spec through JAIRF, the Jentic API AI-Readiness Framework, an open-source rubric maintained by Jentic under Apache 2.0. JAIRF scores six dimensions of an OpenAPI spec:
- Foundational Compliance: is the spec structurally valid?
- Developer Experience and Tooling Compatibility: descriptions, response examples, operation tags?
- AI-Readiness and Agent Experience: operation summaries, parameter intent, error semantics?
- Agent Usability: explicit instructions about consumption?
- Security: well-defined security schemes?
- AI Discoverability: searchable, machine-classifiable surface?
JAIRF gives us an agent readiness layer that is rigorous and vendor-independent because we did not build it. The index is currently pinned to JAIRF v1.0.0, and because Jentic maintains the rubric upstream, our scoring can pick up improvements by moving the pin as Jentic ships JAIRF v1.1 and beyond. The industry does not need another rubric on top of an existing one.
Only a minority of vendors in the index currently publish an OpenAPI spec we can fetch. For the rest, JAIRF is reported as N/A and the Headless Index alone determines the band, with a ceiling rule that caps such vendors at B band absent strong override evidence.
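Before a spec can be scored at all, a fetched document has to actually be an OpenAPI document rather than a docs page that mentions one. The sketch below is a simple sanity check on an already-parsed JSON or YAML document. It is not JAIRF and not its Foundational Compliance dimension, only an illustrative pre-filter based on the top-level fields the OpenAPI and Swagger specifications require.

```python
from typing import Any


def looks_like_openapi(doc: Any) -> bool:
    """Cheap structural check on a parsed document; not a full validation."""
    if not isinstance(doc, dict) or not isinstance(doc.get("info"), dict):
        return False
    # Swagger 2.0 documents declare "swagger": "2.0" and require "paths".
    if doc.get("swagger") == "2.0":
        return isinstance(doc.get("paths"), dict)
    version = doc.get("openapi")
    if not isinstance(version, str) or not version.startswith("3."):
        return False
    # OpenAPI 3.0.x requires "paths".
    if version.startswith("3.0"):
        return isinstance(doc.get("paths"), dict)
    # OpenAPI 3.1 requires at least one of paths, components, or webhooks.
    return any(isinstance(doc.get(k), dict) for k in ("paths", "components", "webhooks"))


print(looks_like_openapi({"openapi": "3.0.3", "info": {"title": "Example"}, "paths": {}}))
print(looks_like_openapi({"title": "Our docs mention OpenAPI"}))
```

A document that fails a check like this would be treated the same as no spec at all, which is the N/A case described above.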
From A to F: how the bands work
Every scorecard lands in a letter band from A to F. The bands follow rules from a published band table.
- A: Headless Index ≥ 75 and JAIRF ≥ 75. Strong fit for agent-driven use cases.
- B: Headless Index ≥ 60 and JAIRF ≥ 60 (or HI ≥ 60 if JAIRF is N/A). Workable end to end for agents, with edges to verify.
- C: Headless Index in the 40 to 75 range without meeting the A or B thresholds. Workable but expect to wrap missing pieces.
- D: Headless Index below 40 (or JAIRF below 40), but not below 30. Use only when locked in.
- F: Headless Index below 30 (or JAIRF below 30). Not currently suitable for agent consumption.
There is also a zero-floor rule. If any individual sub-criterion scores catastrophically low, and that score reflects a structural absence established with high confidence rather than an Unknown, the overall band is capped at C regardless of the other scores. A vendor cannot earn an A while shipping no webhooks, even if every other criterion is perfect. Agents fail unevenly. One missing primitive can be the integration killer, and an aggregate score that hides that fact is misleading.
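The band rules and the zero-floor cap can be sketched as one small function. The order of evaluation is an assumption of this example, not the published band table: the F and D floors are checked first, then A and B, and anything left at 40 or above is treated as C. A missing JAIRF score is modeled as `None` and capped at B with no override evidence, and the `structural_absence` flag is a hypothetical stand-in for the zero-floor condition.

```python
from typing import Optional


def assign_band(hi: float, jairf: Optional[float] = None,
                structural_absence: bool = False) -> str:
    """Map a Headless Index score and optional JAIRF score to a letter band."""
    has_jairf = jairf is not None
    if hi < 30 or (has_jairf and jairf < 30):
        band = "F"
    elif hi < 40 or (has_jairf and jairf < 40):
        band = "D"
    elif not has_jairf:
        # JAIRF is N/A: the ceiling rule caps the vendor at B.
        band = "B" if hi >= 60 else "C"
    elif hi >= 75 and jairf >= 75:
        band = "A"
    elif hi >= 60 and jairf >= 60:
        band = "B"
    else:
        band = "C"
    # Zero-floor rule: a confirmed structural absence caps the band at C.
    if structural_absence and band in ("A", "B"):
        band = "C"
    return band


print(assign_band(100, 79.3))                           # both scores clear 75
print(assign_band(86))                                  # JAIRF N/A, capped at B
print(assign_band(100, 100, structural_absence=True))   # capped by zero-floor
```

Applying the cap last means a vendor with perfect aggregate scores still cannot hide a missing primitive.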
What the data shows
After 195 scorecards across 13 categories, a few patterns are clear.
Payments and Communications are the most mature. Stripe scored A with a Headless Index of 100 of 100 and JAIRF of 79.3 of 100, on the back of its public OpenAPI spec and category-leading SDK breadth. Resend scored A on the strength of its OpenAPI plus an official MCP server. Across both categories, every other major vendor lands in B or C. The financial cost of agent-driven misintegration is high enough that vendors invested in API quality early.
AI platforms are not as agent-ready as you would expect. OpenAI lands at B band, 86 of 100 Headless Index. It publishes an OpenAPI spec, ships SDKs in six languages, and the API is unambiguously the product, but it has not yet published its own MCP server. Several smaller LLM vendors (Mistral, Cohere, Groq) trail OpenAI in API surface breadth. Mistral and Cohere ship OpenAPI-compatible APIs but have less depth on schema observability and webhooks.
Headless CMS is uneven. Contentful and Cosmic both reached B. The category leaders that predate the agent era (Strapi, Hygraph, Payload) generally land in C or D, dragged down by webhook gaps and missing MCP integration. The newer entrants that launched after the MCP standard appeared have moved faster.
Object storage is mostly S3-compatible at the protocol level. AWS S3 itself reached B; its only meaningful gap is the lack of an official MCP server. Cloudflare R2, Backblaze B2, Wasabi, DigitalOcean Spaces, Scaleway, and Vultr all benefit from the same well-documented protocol. Cloudinary and ImageKit, with their media-management layer on top of storage, also reach B on the strength of their SDK breadth.
Search and vector DBs are the most aggressive on MCP adoption. Algolia, Elasticsearch, Meilisearch, Typesense, Pinecone, Qdrant, Chroma, Weaviate, Milvus, and LanceDB nearly all publish official MCP servers under the vendor’s own GitHub organization. This category is moving fastest of all 13 on agent readiness.
The pattern: machine consumability tracks the maturity of the cost-of-failure model in each category. Categories where bad integrations cost money in observable ways (payments, comms, observability) score higher. Categories where bad integrations cost time but not transactions (analytics, project management, content authoring) score lower.
How to use The Headless Index
The Headless Index is live at headless-systems.com/the-headless-index. The full methodology v1 is published, including the pipeline, evidence-collection rules, and override policy. A correction form accepts submissions from vendors and readers. Corrections go to a public changelog with a 7-day SLA.
If you are a vendor and we got something wrong, file a correction. If you are an integrator and a vendor you depend on is missing from the index, name it and we will score it. If you disagree with the methodology, the rubric is versioned. Open a discussion and we will respond in the next methodology version.
The index refreshes quarterly. Bands move up when vendors invest in their API surface. They move down when vendors let webhook documentation rot or remove OpenAPI publishing. This is a living measurement, not a snapshot. The methodology will evolve too: v1 today, v2 when the underlying landscape shifts enough to warrant it. Both versions will be reproducible.
The wider point on machine consumability
Most enterprise software still measures itself on metrics that mattered when humans were the only consumers. Pricing, UI polish, support response time. AI agents have changed who is consuming the software. The metrics have not caught up. Machine consumability is what the next generation of vendor scoring has to measure, and The Headless Index is what catching up looks like.