Is Promptfoo agent-ready?

Promptfoo scores Band F on The Headless Index. Its JAIRF rating is not available because no machine-readable spec was scored. Its strongest area for agent use is schema observability; its weakest is MCP and agent posture.

Does Promptfoo publish an OpenAPI spec?

No published OpenAPI specification was found when Promptfoo was scored. That caps its JAIRF dimensions and forces agents to rely on documentation or reverse engineering to operate it.

What is Promptfoo's Headless Index score?

Promptfoo scores 25/100 on the Headless Index thesis-fit rubric and sits in Band F in the Observability category. The score weighs API-first design, headless operation, MCP and agent posture, schema observability, and webhooks.

03 / Scorecard / Observability

Promptfoo

Headless Index: 25/100 (denominator 60)

JAIRF: N/A

Verified: May 21, 2026

Methodology v1 · JAIRF v1.0.0

Promptfoo earns Band F in the Observability category of The Headless Index, with a thesis-fit score of 25/100. Its strongest dimension is schema observability (10/20); its weakest scored dimension is MCP and agent posture (0/20). No public OpenAPI specification was found when Promptfoo was scored, which limits how far an agent can go without human integration work.

Editorial verdict

Promptfoo is not built for machine consumption today. Its Headless Index thesis-fit score of 25/100 fails the index's floor checks, and JAIRF is recorded as N/A because the open-source scorer could not reach a public OpenAPI specification. Vendors at this tier share a pattern: agents can poke at them, but the dashboard remains the source of truth. The rest of this verdict explains where Promptfoo lands inside that pattern.

On the API surface, the question is whether the API is the product or a layer beneath the dashboard. Promptfoo is an open-source LLM evaluation framework, consumed primarily through the CLI plus a YAML config (promptfooconfig.yaml); Promptfoo Cloud adds a hosted dashboard.^[1] Schema observability is the related test: can an agent introspect the contract from cold, or does it have to read prose documentation? The project is open source under promptfoo/promptfoo, and its configuration schema is documented.^[2] Driving this product through an agent is not realistic with the current surface: the API exists, but it is not the contract the vendor optimises for.

On headless operability, the docs crawl did not produce topic coverage sufficient to score programmatic setup, billing, teams, schema, or CLI workflows. A targeted AI review pass should visit the vendor's docs index and confirm what programmatic surfaces actually exist.^[3]

On the MCP and agent-integration axis, the fastest-moving criterion in the index, Promptfoo has been publicly thoughtful about MCP: custom providers can wrap MCP servers as evaluation targets.^[4]

Event posture closes the loop: an agent that cannot react to state changes is reduced to polling. The docs crawler did not locate a webhooks reference page or events catalog. Editorial review should confirm whether the vendor publishes events at all and, if so, whether signing and replay are documented.

Net assessment: Promptfoo fails the floor checks of the methodology, with MCP posture^[5] as the most acute gap. Any agent integration here will be brittle and short-lived until the vendor invests in machine-readable surfaces. Not currently suitable for agent consumption.

Verdict by Headless Index pipeline (auto)

// AI-drafted from the evidence layer. Editorial review pending.

Scores

Scorecard detail

Headless Index · 5 sub-criteria

API-first design intent: 5/20 (scored)

Promptfoo is an open-source LLM evaluation framework. The product is consumed primarily through the CLI plus a YAML config (promptfooconfig.yaml). Promptfoo Cloud adds a hosted dashboard.

signals (4)

+ AI review applied: editorial review on 2026-05-20
− OpenAPI spec: Not found across 17 probe paths
· GraphQL endpoint: Discovered at https://www.promptfoo.dev/graphql, introspection disabled or scoped
− SDKs maintained: None detected in vendor org

cite (3)

openapi.probes_tried@2026-05-21
graphql.url@2026-05-21
github.sdks@2026-05-21

Headless operation: Unknown

Evaluations, datasets, prompts, and models are file-based and CLI-driven. The promptfoo CLI is the canonical interface. CI/CD integration runs through GitHub Actions.

signals (9)

+ AI review applied: editorial review on 2026-05-20
− API operations exposed: No OpenAPI spec; operations count unknown
· Docs pages crawled: 0 pages (crawler: none)
· Auth schemes documented: Auth documentation page not reached by crawler
· Setup / quickstart docs: Not reached by crawler
· Billing docs: Not reached by crawler
· Teams / org docs: Not reached by crawler
· CLI docs: Not reached by crawler
· Schema / data model docs: Not reached by crawler

cite (8)

openapi.operations_count@2026-05-21
docs.pages_crawled@2026-05-21
docs.pages_crawled@2026-05-21
docs.topics_found.setup@2026-05-21
docs.topics_found.billing@2026-05-21
docs.topics_found.teams@2026-05-21
docs.topics_found.cli@2026-05-21
docs.topics_found.schema@2026-05-21
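
Because the scorecard describes Promptfoo as file-based and CLI-driven, the most plausible headless path for an agent is to write a config file and shell out to the CLI. The following is a minimal sketch of that idea, not a statement of how the index scores it. It assumes the CLI is reachable through npx and that the `eval` subcommand accepts `-c` (config) and `-o` (output); the prompt text, provider ID, test values, and the helper names `write_config` and `build_eval_command` are placeholders invented for illustration.

```python
from pathlib import Path

# Hypothetical minimal config. The top-level keys (prompts, providers, tests)
# follow promptfoo's YAML config layout; every value below is a placeholder.
CONFIG_YAML = """\
prompts:
  - "Summarize in one sentence: {{text}}"
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      text: "Promptfoo is an open-source LLM evaluation framework."
    assert:
      - type: contains
        value: "Promptfoo"
"""


def write_config(directory: Path) -> Path:
    """Write the placeholder config into `directory` and return its path."""
    config_path = directory / "promptfooconfig.yaml"
    config_path.write_text(CONFIG_YAML, encoding="utf-8")
    return config_path


def build_eval_command(config_path: Path, output_path: Path) -> list[str]:
    """Build (but do not run) the CLI invocation an agent would shell out to.

    Assumption: promptfoo is installed or resolvable via npx.
    """
    return [
        "npx", "promptfoo", "eval",
        "-c", str(config_path),
        "-o", str(output_path),
    ]
```

An agent would pass the returned list to `subprocess.run` and then parse the output file. The sketch stops short of executing anything, since running an evaluation needs provider credentials that this page does not cover.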

MCP & agent posture: 0/20 (scored)

Promptfoo has been publicly thoughtful about MCP. Custom providers can wrap MCP servers as evaluation targets.

signals (4)

+ AI review applied: editorial review on 2026-05-20
− Official MCP server: None found in vendor's GitHub org or the official MCP registry
− Community MCP servers: None found
− Agent-friendly SDKs: No TypeScript/JavaScript SDK published (agents commonly run in TS/JS)

cite (3)

mcp.registry_query@2026-05-21
mcp.github_search_query@2026-05-21
github.sdks@2026-05-21

Schema observability: 10/20 (scored)

Open-source under promptfoo/promptfoo. Configuration schema is documented.

signals (3)

+ AI review applied: editorial review on 2026-05-20
− OpenAPI: Not discovered across 17 standard probe paths
· GraphQL introspection: GraphQL endpoint at https://www.promptfoo.dev/graphql but introspection is disabled, scoped, or behind authentication

cite (2)

openapi.probes_tried@2026-05-21
graphql.url@2026-05-21
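
The GraphQL signal above distinguishes an endpoint that exists from one an agent can actually introspect. The sketch below shows one way such a check could be classified. It is an illustration only: the index's real probe logic is not described on this page, and the function names, status-code handling, and result labels are assumptions. The query itself uses the standard `__schema` introspection field from the GraphQL specification.

```python
import json
from typing import Optional

# Smallest useful introspection query: asks only for the root query type name.
INTROSPECTION_QUERY = "{ __schema { queryType { name } } }"


def build_probe_body() -> str:
    """Return the JSON body to POST to a GraphQL endpoint."""
    return json.dumps({"query": INTROSPECTION_QUERY})


def classify_introspection(status_code: int, payload: Optional[dict]) -> str:
    """Hypothetical classifier for the response to the probe.

    `payload` is the decoded JSON response, or None if the body was not JSON.
    """
    if status_code in (401, 403):
        return "behind authentication"
    if payload is None:
        return "unknown"
    data = payload.get("data") or {}
    if data.get("__schema"):
        return "enabled"
    if payload.get("errors"):
        return "disabled or scoped"
    return "unknown"
```

A response carrying `data.__schema` means an agent can discover the contract from cold. An `errors` array with no schema data matches the "disabled or scoped" wording recorded in the signal.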

Webhooks & events: Unknown

Evaluation completion webhooks via the Cloud product. Catalog matches CI-driven LLM testing.

signals (2)

+ AI review applied: editorial review on 2026-05-20
· Webhook docs page: Not reached by crawler within budget (0 pages crawled). Cannot confirm whether vendor offers webhooks.

cite (1)

docs.pages_crawled@2026-05-21

JAIRF · 6 dimensions

JAIRF · N/A

This vendor does not publish a public OpenAPI specification. JAIRF cannot be computed. The Headless Index score and editorial verdict carry the readiness assessment.

No public OpenAPI specification discovered during collection

Band rationale: F band triggered: HeadlessIndex=25
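
The 25/100 headline figure is consistent with the three scored sub-criteria (5/20, 0/20, 10/20) normalized over the stated denominator of 60, with the two Unknown criteria left out. The sketch below reproduces that arithmetic. The formula is an inference from the numbers on this page, not a quotation of the methodology, and the dictionary keys and function name are made up for the example.

```python
from typing import Optional

# Sub-criterion scores as listed in the scorecard; None marks "Unknown".
SUB_SCORES: dict[str, Optional[int]] = {
    "api_first_design": 5,
    "headless_operation": None,
    "mcp_agent_posture": 0,
    "schema_observability": 10,
    "webhooks_events": None,
}

MAX_PER_CRITERION = 20  # each sub-criterion is scored out of 20


def normalized_score(scores: dict[str, Optional[int]]) -> Optional[tuple[int, int]]:
    """Return (score out of 100, denominator), skipping Unknown criteria.

    Assumption: Unknown criteria are excluded from the denominator rather
    than counted as zero.
    """
    scored = [value for value in scores.values() if value is not None]
    denominator = MAX_PER_CRITERION * len(scored)
    if denominator == 0:
        return None  # nothing was scored, so no score can be computed
    return round(100 * sum(scored) / denominator), denominator


print(normalized_score(SUB_SCORES))  # (25, 60)
```

Under this reading, 15 points out of 60 gives 25/100. Counting the Unknown criteria as zeros would instead give 15/100, which does not match the published figure.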


Peers in Observability

Patronus AI: Band F, Headless Index 25/100

Papertrail: Band F, Headless Index 25/100

Observe: Band F, Headless Index 25/100
