HEADLESS SYSTEMS
Scorecard · Observability

Promptfoo

Band: F
Headless Index: 25/100 (denominator 60)
JAIRF: N/A
Verified: May 21, 2026
Methodology v1 · JAIRF v1.0.0

Powered by JAIRF v1.0.0 by Jentic · open methodology at /the-headless-index/methodology

Editorial verdict
Promptfoo is not built for machine consumption today. Its Headless Index thesis-fit score of 25/100 fails the index's floor checks, and JAIRF is recorded as N/A because no public OpenAPI specification was reachable for the open-source scorer. Vendors at this tier follow a common pattern: agents can poke at them, but the dashboard remains the source of truth. The rest of this verdict explains where Promptfoo lands within that pattern.

On the API surface, the question is whether the API is the product or a layer beneath the dashboard. Promptfoo is an open-source LLM evaluation framework, consumed primarily through the CLI plus a YAML config (promptfooconfig.yaml); Promptfoo Cloud adds a hosted dashboard.[1] Schema observability is the related test: can an agent introspect the contract from cold, or must it read prose documentation? The source is open under promptfoo/promptfoo and the configuration schema is documented,[2] but no OpenAPI spec was found and the GraphQL endpoint does not allow open introspection. Driving this product through an agent is not realistic with the current surface: the API exists, but it is not the contract the vendor optimises for.

On headless operability, the docs crawl did not produce enough topic coverage to score programmatic setup, billing, teams, schema, or CLI workflows. A targeted AI review pass should visit the vendor's docs index and confirm which programmatic surfaces actually exist.[3]

On the MCP and agent-integration axis, the fastest-moving criterion in the index, Promptfoo engages with MCP publicly: custom providers can wrap MCP servers as evaluation targets.[4] However, no official or community MCP server was found, so the criterion scores 0/20.

Event posture closes the loop: an agent that cannot react to state changes is reduced to polling. On webhooks and events, the docs crawler reached no pages and so could not locate a webhooks reference page or events catalog. Editorial review should confirm whether the vendor publishes events at all and, if so, whether signing and replay are documented.
Net assessment: Promptfoo fails the methodology's floor checks, with MCP posture[5] as the most acute gap. Any agent integration here is likely to be brittle and short-lived until the vendor invests in machine-readable surfaces. Promptfoo is not currently suitable for agent consumption.
Verdict by Headless Index pipeline (auto)
// AI-drafted from the evidence layer. Editorial review pending.
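Because the CLI plus promptfooconfig.yaml is the canonical interface, the most realistic headless path today is for an agent to shell out to the CLI. A minimal sketch, assuming Node.js is installed and the CLI is invoked as `npx promptfoo@latest eval`, with `-c` selecting the config and `-o` writing a results file. The helper names `build_eval_command` and `run_eval` are hypothetical, and the structure of the results JSON should be checked against Promptfoo's docs rather than assumed.

```python
import json
import subprocess
from pathlib import Path
from typing import Any, List


def build_eval_command(config_path: str = "promptfooconfig.yaml",
                       output_path: str = "results.json") -> List[str]:
    """Return the argv for a non-interactive `promptfoo eval` run (hypothetical helper)."""
    # -c points at the YAML config; -o writes results (format inferred from the extension).
    return ["npx", "promptfoo@latest", "eval", "-c", config_path, "-o", output_path]


def run_eval(config_path: str = "promptfooconfig.yaml",
             output_path: str = "results.json") -> Any:
    """Run the eval and load the JSON results file (requires Node.js on PATH)."""
    # check=False: the CLI may exit non-zero when assertions fail yet still write results.
    subprocess.run(build_eval_command(config_path, output_path), check=False)
    return json.loads(Path(output_path).read_text())
```

Splitting command construction from execution keeps the argv testable without Node.js; the agent still has to interpret a file-based result rather than query an API, which is the gap the verdict describes.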
Scores

Scorecard detail

Headless Index · 5 sub-criteria
API-first design intent · 5/20 · scored

Promptfoo is an open-source LLM evaluation framework. The product is consumed primarily through the CLI plus a YAML config (promptfooconfig.yaml). Promptfoo Cloud adds a hosted dashboard.

signals (4)
  • + AI review applied (reviewer: Editorial review on 2026-05-20)
  • OpenAPI spec: not found across 17 probe paths
  • · GraphQL endpoint: discovered at https://www.promptfoo.dev/graphql; introspection disabled or scoped
  • SDKs maintained: none detected in vendor org
cite (3)
  • openapi.probes_tried@2026-05-21
  • graphql.url@2026-05-21
  • github.sdks@2026-05-21
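The "OpenAPI spec" signal comes from probing well-known URLs. A minimal sketch of that kind of probe, assuming the candidate paths below (hypothetical examples of common conventions, not the index's actual 17 probe paths) and an injected `fetch` function that returns the response body on HTTP 200 and `None` otherwise, so the logic runs without network access.

```python
import json
from typing import Callable, Optional
from urllib.parse import urljoin

# Hypothetical candidate paths; the index's real probe list is not published here.
CANDIDATE_PATHS = [
    "/openapi.json",
    "/openapi.yaml",
    "/swagger.json",
    "/api/openapi.json",
    "/v3/api-docs",
    "/api-docs",
]


def looks_like_openapi(body: str) -> bool:
    """Return True if a response body plausibly holds an OpenAPI/Swagger document."""
    try:
        doc = json.loads(body)
    except ValueError:
        # Not JSON: fall back to a cheap check for a YAML top-level key.
        stripped = body.lstrip()
        return stripped.startswith("openapi:") or stripped.startswith("swagger:")
    return isinstance(doc, dict) and ("openapi" in doc or "swagger" in doc)


def probe_openapi(base_url: str,
                  fetch: Callable[[str], Optional[str]]) -> Optional[str]:
    """Try each candidate path; return the first URL serving a spec, else None."""
    for path in CANDIDATE_PATHS:
        url = urljoin(base_url, path)
        body = fetch(url)
        if body is not None and looks_like_openapi(body):
            return url
    return None
```

A `None` result across every path is what this scorecard records as "not found across 17 probe paths"; a production probe would also follow redirects and respect robots rules.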
Headless operation · Unknown

Evaluations, datasets, prompts, and models are file-based and CLI-driven. The promptfoo CLI is the canonical interface, and CI/CD integration runs through GitHub Actions.

signals (9)
  • + AI review applied (reviewer: Editorial review on 2026-05-20)
  • API operations exposed: no OpenAPI spec; operations count unknown
  • · Docs pages crawled: 0 pages (crawler: none)
  • · Auth schemes documented: auth documentation page not reached by crawler
  • · Setup / quickstart docs: not reached by crawler
  • · Billing docs: not reached by crawler
  • · Teams / org docs: not reached by crawler
  • · CLI docs: not reached by crawler
  • · Schema / data model docs: not reached by crawler
cite (8)
  • openapi.operations_count@2026-05-21
  • docs.pages_crawled@2026-05-21
  • docs.pages_crawled@2026-05-21
  • docs.topics_found.setup@2026-05-21
  • docs.topics_found.billing@2026-05-21
  • docs.topics_found.teams@2026-05-21
  • docs.topics_found.cli@2026-05-21
  • docs.topics_found.schema@2026-05-21
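Every topic signal in this card reads "not reached" because the crawler fetched 0 pages, leaving the topic checks nothing to match against. A naive sketch of such a check, assuming a hypothetical keyword map and plain substring matching on crawled URLs; the index's real topic classifier is not described in this scorecard.

```python
from typing import Dict, List

# Hypothetical keyword map for illustration only.
TOPIC_KEYWORDS: Dict[str, List[str]] = {
    "setup": ["quickstart", "getting-started", "install"],
    "billing": ["billing", "pricing", "invoice"],
    "teams": ["team", "organization", "members"],
    "cli": ["cli", "command-line"],
    "schema": ["schema", "data-model", "reference"],
}


def topics_found(page_urls: List[str]) -> Dict[str, bool]:
    """Mark a topic as found if any crawled URL contains one of its keywords."""
    lowered = [u.lower() for u in page_urls]
    # Naive substring match: cheap, but prone to false positives (e.g. "cli" in "client").
    return {
        topic: any(kw in url for url in lowered for kw in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
```

With an empty crawl every topic comes back `False`, which reflects a crawler gap rather than evidence that the docs are missing.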
MCP & agent posture · 0/20 · scored

Promptfoo engages with MCP publicly: custom providers can wrap MCP servers as evaluation targets.

signals (4)
  • + AI review applied (reviewer: Editorial review on 2026-05-20)
  • Official MCP server: none found in vendor's GitHub org or the official MCP registry
  • Community MCP servers: none found
  • Agent-friendly SDKs: no TypeScript/JavaScript SDK published (agents commonly run in TS/JS)
cite (3)
  • mcp.registry_query@2026-05-21
  • mcp.github_search_query@2026-05-21
  • github.sdks@2026-05-21
Schema observability · 10/20 · scored

The source is open in the promptfoo/promptfoo repository, and the configuration schema is documented.

signals (3)
  • + AI review applied (reviewer: Editorial review on 2026-05-20)
  • OpenAPI: not discovered across 17 standard probe paths
  • · GraphQL introspection: endpoint at https://www.promptfoo.dev/graphql, but introspection is disabled, scoped, or behind authentication
cite (2)
  • openapi.probes_tried@2026-05-21
  • graphql.url@2026-05-21
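The GraphQL signal separates an endpoint that answers introspection from one that refuses it. A minimal sketch, assuming an injected `post(url, body)` transport that returns `(status, body)`; the three result labels are illustrative and not the index's internal categories. It sends the smallest introspection query built on the `__schema` meta-field from the GraphQL specification.

```python
import json
from typing import Callable, Tuple

# Smallest useful introspection query: ask only for the root query type's name.
INTROSPECTION_QUERY = "{ __schema { queryType { name } } }"


def classify_introspection(url: str,
                           post: Callable[[str, str], Tuple[int, str]]) -> str:
    """Return 'open', 'auth_required', or 'disabled_or_scoped' for a GraphQL endpoint."""
    status, body = post(url, json.dumps({"query": INTROSPECTION_QUERY}))
    if status in (401, 403):
        return "auth_required"
    try:
        payload = json.loads(body)
    except ValueError:
        # Non-JSON responses (HTML error pages, etc.) give no usable schema.
        return "disabled_or_scoped"
    data = payload.get("data") if isinstance(payload, dict) else None
    if isinstance(data, dict) and data.get("__schema"):
        return "open"
    # Errors such as "introspection is not allowed" end up here.
    return "disabled_or_scoped"
```

Only an "open" result would let an agent recover the contract from cold; the scorecard's "disabled, scoped, or behind authentication" corresponds to the other two outcomes.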
Webhooks & events · Unknown

Evaluation-completion webhooks are available via the Cloud product, and the event catalog fits CI-driven LLM testing.

signals (2)
  • + AI review applied (reviewer: Editorial review on 2026-05-20)
  • · Webhook docs page: not reached by crawler within budget (0 pages crawled). Cannot confirm whether the vendor offers webhooks.
cite (1)
  • docs.pages_crawled@2026-05-21
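When editorial review checks whether webhook signing is documented, the usual pattern is an HMAC over the raw request body, often paired with a timestamp check that rejects replayed deliveries. A generic sketch, assuming a hypothetical scheme that signs `"{timestamp}.{body}"` with HMAC-SHA256 and a 300-second tolerance. None of this describes Promptfoo's actual webhook format, which this scorecard could not confirm.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_webhook(secret: bytes, body: bytes, timestamp: str, signature_hex: str,
                   tolerance_s: int = 300, now: Optional[float] = None) -> bool:
    """Verify a hypothetical HMAC-SHA256 webhook signature with a freshness window."""
    current = time.time() if now is None else now
    try:
        ts = int(timestamp)
    except ValueError:
        return False
    # Reject stale or future-dated deliveries to limit replay of captured requests.
    if abs(current - ts) > tolerance_s:
        return False
    signed = timestamp.encode() + b"." + body
    expected = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking the expected signature through timing.
    return hmac.compare_digest(expected, signature_hex)
```

Binding the timestamp into the signed payload matters: if only the body were signed, an attacker could resend an old delivery with a fresh timestamp header.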
JAIRF · 6 dimensions
JAIRF · N/A

No public OpenAPI specification was discovered during collection, so JAIRF cannot be computed for this vendor. The Headless Index score and editorial verdict carry the readiness assessment.


Band rationale: F band triggered (HeadlessIndex = 25).
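The 25/100 figure can be reproduced from the sub-scores: the three scored criteria sum to 5 + 0 + 10 = 15 points out of 3 × 20 = 60, and 15/60 = 25%. The sketch assumes that criteria marked Unknown are dropped from the denominator, which matches the "denominator 60" in the header but is an inference, as is the rounding step. The dictionary keys are hypothetical, and band thresholds are not modelled because the scorecard does not state them.

```python
from typing import Dict, Optional, Tuple


def headless_index(scores: Dict[str, Optional[int]],
                   max_per_criterion: int = 20) -> Tuple[int, int]:
    """Return (index out of 100, denominator), skipping Unknown (None) criteria."""
    scored = [v for v in scores.values() if v is not None]
    denominator = max_per_criterion * len(scored)
    if denominator == 0:
        raise ValueError("no scored criteria; index is undefined")
    # Assumption: round to the nearest integer (Python's round() rounds half to even).
    return round(100 * sum(scored) / denominator), denominator


promptfoo_scores = {
    "api_first_design_intent": 5,
    "headless_operation": None,  # Unknown
    "mcp_agent_posture": 0,
    "schema_observability": 10,
    "webhooks_events": None,  # Unknown
}
```

Excluding Unknown criteria keeps a crawler failure from counting as a zero, but it also means the index rests on only three of five sub-criteria here.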
