On this page
- Abstract
- 1. Status of this document
- 2. Conformance language
- 3. Naming and terminology
- 4. Scope
- 5. Architecture overview
- 6. Tier system
- 7. Score formula
- 8. Level ladder
- 9. Profile system
- 10. Check definitions
- 11. Output schema
- 12. Conformance
- 13. Implementation guidance
- 14. Versioning policy
- 15. Maintainers
- 16. Acknowledgments
- 17. References
- Annex A — Threshold values
- Annex B — Output JSON Schema
- Citation
Abstract
The Agent-Adoption Specification defines an open methodology for measuring how accessible a website is to AI agents and automated HTTP clients. It specifies a list of 25 measurable HTTP-level practices, a tier-based scoring system that aggregates results into a 0–100 score, a three-tier level ladder (L1 / L2 / L3), a profile-configuration system for adapting the methodology to different site categories, and a canonical JSON output schema. The specification is implementation-agnostic; conforming scanners may use any HTTP client, programming language, or runtime.
1. Status of this document
This document is the version 1.0.0 release of the Agent-Adoption Specification. It is published as an open specification at the canonical URL above and at the source repository, and is licensed under CC-BY 4.0.
This specification describes the measurement methodology used in the companion Agent-Adoption Correlation Study. Empirical findings about which signals predict LLM visibility outcomes are reported in that companion study; this specification is descriptive (defining what a conforming scanner measures and how) rather than predictive.
A v2.0 update is planned for approximately Q3 2026, calibrated against findings from quarterly correlation re-runs. v2 will recalibrate tier weights and may revise check definitions based on accumulated empirical evidence. v1.0 and v2.0 will both remain accessible at their respective version-suffixed URLs; tagged releases are immutable.
Editorial fixes (typos, clarifications without behavior change) and threshold-value calibrations land as patches (v1.0.x). Additive changes (new optional checks, new profiles) land as minor releases (v1.x.0). Breaking changes land in major releases (vN.0.0). See §14 for full versioning policy.
2. Conformance language
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in BCP 14 (RFC 2119 and RFC 8174) when, and only when, they appear in all capitals, as shown here.
Conformance requirements are stated in the Conformance section (§12) and inline within individual check definitions (§10). Implementations conforming to v1.0 of this specification MUST disclose the spec version they conform to in their output (see §11).
3. Naming and terminology
This specification uses three layered terms with distinct register and meaning. Implementations and downstream content MUST NOT mix or substitute these terms.
The Specification defines the rules. The Check is the act of measuring. The score is the value produced.
Other terms used throughout this document:
- Implementer — a party building a Check that conforms to this Specification.
- Scanner — a software implementation of an Agent-Adoption Check.
- Brand — the website being measured. The Specification operates on a single root domain (eTLD+1) per scan.
- Profile — a configuration that adapts the Specification to a particular category of sites (e.g., B2B SaaS, e-commerce). See §9.
- Check (lowercase) — one of 25 individual measurements defined in §10. Distinguished from the capitalized “Check” tool above by context.
- Tier — one of four weight classes (Critical, High, Medium, Low) assigned to scored checks. See §6.
- Level — one of three published readiness levels (L1 Basic Web Presence, L2 AI-Aware, L3 Agent-Optimized). See §8.
4. Scope
4.1 In scope
This specification defines:
- HTTP-level practices that conforming scanners verify by fetching public URLs and parsing responses (robots.txt, sitemap.xml, llms.txt, OAuth discovery endpoints, MCP server cards, AGENTS.md, and others enumerated in §10).
- A scoring methodology that aggregates check results into a 0–100 score using a tier-weighted formula (§7).
- A level ladder with explicit gate criteria for L1, L2, and L3 (§8).
- A profile-configuration model that adapts the methodology to different site categories (§9).
- A canonical JSON output schema for scanner results (§11, Annex B).
- A conformance contract for implementations (§12).
4.2 Out of scope
This specification does NOT define:
- LLM ranking algorithms. The internal mechanisms by which large language models or other AI systems rank, cite, or recommend brands are not publicly documented and are not in scope.
- Causal claims about agent-readiness and AI visibility. Empirical findings about correlation between Agent-Adoption Specification signals and LLM-mention outcomes are reported in the companion correlation study. This specification is descriptive (what a scanner measures) and does not claim that improving scores causally improves AI visibility. See §13 for the Defensive-Framing Principle.
- Site-content quality. This specification measures structural agent-accessibility, not editorial quality. A site can score L3 and still produce thin content; a site can be L1 and produce excellent content.
- Specific implementation technologies. Implementations MAY use any programming language, HTTP client, or runtime environment. The specification does not endorse or require specific tooling.
- UI/UX conventions for presenting scanner results to humans. Implementations are free to design their result presentation.
- Pricing or commercial terms of any specific implementation.
4.3 Relationship to traditional SEO
Several Agent-Adoption Specification checks (e.g., robots-txt-exists, sitemap-exists, redirect-behavior, http-status-codes) overlap with established Search Engine Optimization (SEO) practices. This overlap is structural rather than intentional: web infrastructure that supports traditional crawlers (Googlebot, Bingbot, etc.) shares primitives with infrastructure that supports AI agents and other automated HTTP clients.
The Agent-Adoption Specification is not a Search Engine Optimization methodology and does not address SEO concerns directly. Implementations of this specification SHOULD NOT be marketed as SEO tools, nor SHOULD compliance with this specification be conflated with traditional search ranking factors. The two domains share substrate but pursue different goals.
5. Architecture overview
The specification operates on the following architecture:
┌─────────────────┐ ┌──────────────────┐ ┌────────────────┐
│ Brand domain │ → │ Conforming │ → │ Output JSON │
│ (e.g. site.com)│ │ scanner │ │ per §11 │
└─────────────────┘ │ │ └────────────────┘
│ - 25 checks │
│ - Tier weights │
│ - Score formula │
│ - Level gates │
│ - Profile cfg │
└──────────────────┘
Each scan operates on one root domain (eTLD+1) and produces one output JSON document. The scanner:
- Loads a profile configuration (§9) which determines which checks apply, their tier weights, and gate criteria.
- Executes each applicable check (§10) against the brand domain, producing a per-check status of pass, fail, neutral, or error.
- Computes an aggregate score in 0–100 using the formula in §7.
- Computes a level (L1, L2, or L3) using the gate criteria in §8.
- Emits a JSON output document conforming to the schema in §11.
Checks are organized into four categories:
- Discoverability — does the site declare its structure to crawlers? (3 checks)
- Access Control — does the site declare its policy on AI access? (4 checks)
- Content Readability — can agents consume the site's content? (12 checks)
- Agent Endpoints — does the site expose specialized agent interfaces? (6 checks)
Total: 25 checks. See §10 for individual definitions.
6. Tier system
Each check is assigned to one of five classes that determine its contribution to the score.
The first four classes (Critical, High, Medium, Low) are collectively called the scored tiers. The fifth class (Informational) does not affect the score; informational checks are detected and reported in scanner output but contribute zero to both numerator and denominator of the score formula.
6.1 Why informational checks exist
A check is designated Informational when one of the following applies:
- Empirical evidence indicates the signal does not predict relevant outcomes. Example: the llms-txt-* family. The companion correlation study and external server-log audits indicate that major LLM crawlers do not fetch llms.txt files at meaningful rates as of 2026.
- Empirical evidence indicates the signal is anti-correlated with desired outcomes. Example: agents-md-detection. Peer-reviewed research (ETH Zurich, March 2026) finds that AGENTS.md presence often degrades agent task success.
- The underlying specification is too pre-adoption to score meaningfully. Example: agent-skills (specification at v0.2.0, pre-1.0; ecosystem adoption near zero).
Informational checks are nonetheless detected and reported because (a) site owners benefit from observability, (b) longitudinal correlation tracking requires the data, and (c) future spec versions may promote informational checks to scored tiers if empirical evidence changes.
6.2 Tier assignments are spec-normative
The tier assignment for each check is defined in §10 and is normative. Conforming implementations MUST apply the specified tier to each check, except when the implementation publishes a profile configuration that overrides specific tiers (see §9). Profile-level overrides MUST be disclosed in implementation documentation; silent tier override is non-conforming.
7. Score formula
For each scan, the agent-adoption score is computed as follows.
7.1 Definitions
For a given scan against a brand domain, with profile P determining the applicable check set:
- Let S = the set of scored checks applicable under profile P (i.e., not Informational, and applicable per profile skip logic)
- Let w(c) = the tier weight of check c (10 / 7 / 4 / 2 for Critical / High / Medium / Low)
- For each c ∈ S, let status(c) ∈ {pass, fail, neutral, error}
7.2 Status contributions
A pass contributes w(c) to both numerator and denominator; a fail contributes w(c) to the denominator only; neutral and error contribute to neither. Informational checks contribute 0 to both numerator and denominator regardless of status.
7.3 Formula
Σ { w(c) : c ∈ S, status(c) = pass }
score = ───────────────────────────────────────── × 100
        Σ { w(c) : c ∈ S, status(c) ∈ {pass, fail} }
The score is rounded to the nearest integer in 0–100.
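The formula above can be sketched as follows. The tier weights come from §7.1; the zero-denominator fallback (returning 0 when no check resolves to pass or fail) is an implementation choice this section does not mandate.

```python
# Tier weights per §7.1. Informational checks are excluded from the
# scored set S entirely, so they never appear in `results`.
TIER_WEIGHTS = {"critical": 10, "high": 7, "medium": 4, "low": 2}

def compute_score(results):
    """results: list of (tier, status) pairs for the scored checks in S.
    Per §7.3/§7.4, neutral and error contribute to neither the
    numerator nor the denominator."""
    numerator = sum(TIER_WEIGHTS[t] for t, s in results if s == "pass")
    denominator = sum(TIER_WEIGHTS[t] for t, s in results if s in ("pass", "fail"))
    if denominator == 0:
        return 0  # no scorable checks; fallback behavior is implementation-defined
    return round(numerator / denominator * 100)
```

For example, one Critical pass, one High fail, and one Low neutral yields 10 / (10 + 7) × 100, which rounds to 59.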
7.4 Status semantics
- pass — the check's positive condition is met.
- fail — the check's positive condition is not met, and the scanner is confident in its determination.
- neutral — the check is structurally inapplicable to this site (e.g., a commerce check on a non-commerce site) or the scanner could not reach a determination due to circumstances beyond the site's control (e.g., transient network failure, geographic routing restriction). A neutral status indicates “we cannot or should not score this check”; it does not penalize the site.
- error — the scanner encountered an internal error while attempting the check. Treated identically to neutral for scoring purposes; the distinct status code is preserved for diagnostic and operational visibility.
7.5 Skip logic
A check is skipped by the profile when the profile configuration declares the check inapplicable to the site type. For example, the B2B SaaS profile includes commerce-related checks but skips them when the site is determined to be non-commerce; an e-commerce profile (planned in a future version) would always apply commerce checks.
Skip is a profile-level concept; neutral is a scan-run-level concept. Both effectively exclude a check from scoring, but they are semantically distinct: skip means “this check does not apply to this site type at all”; neutral means “this check applies in principle but we cannot determine the result for this specific scan.”
7.6 Score interpretation
The score represents the percentage of available scored-check weight that the site's measured practices satisfy. A score of 47 means the site passes scored checks worth 47% of the total weight applicable to the site.
Implementations MUST NOT present the score in any way that implies it predicts a quantitative AI-visibility outcome. See §13 for the Defensive-Framing Principle.
8. Level ladder
The specification defines three published levels.
8.1 Default level
A site that is reachable by the scanner (i.e., robots-txt-exists returns pass, fail, or neutral, meaning the site exists) but does not satisfy any higher-level gate criterion is at L1.
8.2 L1 → L2 gate
A site advances from L1 to L2 if and only if content-signals returns pass.
This is a single-gate criterion: no score-floor requirement, no additional checks. The semantic anchor: L2 marks sites that have explicitly declared AI usage policy via either Cloudflare's Content-Signal directive or the IETF AIPREF Content-Usage standard. This is the minimum threshold for the site posture characterized by “AI-Aware.”
8.3 L2 → L3 gate
A site advances from L2 to L3 if and only if markdown-negotiation returns pass.
This is a single-gate criterion: no score-floor requirement, no additional checks. The semantic anchor: L3 marks sites that serve agent-readable markdown via HTTP content-negotiation (Accept: text/markdown). This gate signals the site has implemented at least one explicit agent-consumption surface beyond traditional crawler hygiene.
The L2→L3 gate corresponds to the strongest empirically-validated signal in the v1 reference correlation study (companion preprint, 2026-04). Its empirical grounding is rare among gate criteria and is noted here so future maintainers preserve this rationale.
8.4 L4 and beyond — reserved
L4 and higher levels are reserved for future evidence-based promotion. As of v1.0, no captured scan in the reference corpus reached a level beyond L3, and the specification authors lack the empirical basis to define a meaningful gate criterion for L4+.
Conforming implementations MUST NOT promote sites to L4 or higher in v1.0 output. Implementations MAY display “L3 — beyond current measurement” or similar copy in user interfaces to indicate that the ladder has more rungs in principle but none defined yet at this specification version.
A future spec version (v1.1 or later) MAY define an L4 gate when sufficient empirical evidence exists.
8.5 Level computation order
The level is computed by checking gate criteria in ascending order. A site advances through gates monotonically: it cannot reach L3 without first satisfying the L1→L2 gate.
L1 (default) → if content-signals = pass → L2 → if markdown-negotiation = pass → L3
Implementations MUST NOT declare L3 status without first verifying both gates are satisfied.
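A minimal sketch of this gate ordering, using the check IDs from §10 and statuses per §7.4. Because the gates are evaluated in ascending order, a markdown-negotiation pass without a content-signals pass leaves the site at L1.

```python
def compute_level(check_status):
    """check_status: mapping of check ID -> status ('pass'/'fail'/...).
    Gates per §8.2 and §8.3, evaluated monotonically per §8.5."""
    level = "L1"  # default for any reachable site (§8.1)
    if check_status.get("content-signals") == "pass":
        level = "L2"
        # L3 is only reachable once the L1->L2 gate is satisfied
        if check_status.get("markdown-negotiation") == "pass":
            level = "L3"
    return level
```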
9. Profile system
A profile is a configuration that adapts the specification to a particular category of sites. The B2B SaaS profile is bundled with v1.0 of the specification.
9.1 Profile responsibilities
A profile defines:
- Applicability — which of the specification's 25 checks apply to this site category. Inapplicable checks are skipped in scoring.
- Tier weights per applicable check — typically the canonical tier from §10, but profiles MAY override.
- Gate criteria for level advancement — typically the canonical gates from §8, but profiles MAY override.
- Skip logic — when does a check return neutral due to non-applicability versus actually being measured? (Example: commerce checks for non-commerce sites.)
9.2 Profile distribution
Profiles ship as JSON configuration files in this specification's repository under profiles/. The B2B SaaS profile lives at profiles/b2b-saas.json. Future profiles (e-commerce, media, government, etc.) will ship as siblings.
Implementers MAY ship custom profiles. Conforming implementations MUST be able to load and apply any specification-bundled profile and MUST use a specification-bundled profile when one matches the scanned site's category. Implementations MAY offer profile selection logic (e.g., infer from leaderboard category labels, or accept user input).
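As an illustration only, a bundled profile file might look like the following. Every field name here is an assumption for the sketch; the normative shape is whatever ships in the repository under profiles/ (e.g., profiles/b2b-saas.json).

```json
{
  "profile": "b2b-saas",
  "specVersion": "1.0.0",
  "skippedChecks": [],
  "tierOverrides": {},
  "gates": {
    "l2Gate": "content-signals",
    "l3Gate": "markdown-negotiation"
  }
}
```

An empty skippedChecks list and empty tierOverrides object reflect §9.3: the B2B SaaS profile applies all 25 checks at their canonical tiers.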
9.3 The B2B SaaS profile
The first bundled profile is b2b-saas, which is also the default profile for v1.0 of the specification. It applies to commercial software-as-a-service businesses targeting business customers (the dominant category of brands in the reference correlation corpus).
The B2B SaaS profile applies all 25 specification checks with their canonical tier assignments and uses canonical L1→L2 and L2→L3 gates. The profile defines no skip-logic rules for v1.0 — all 25 checks are universally applicable within the B2B SaaS category. Future profiles (e-commerce, media, etc.) are anticipated to use the skip-logic mechanism more actively to handle category-specific check applicability.
9.4 Future profiles
Future profile bundles are anticipated for at least:
- E-commerce — commerce-checks active, additional product-discovery checks (potentially)
- Media and news — content-sampling adjusted for article-type content; freshness checks
- Government — accessibility-mandated overlay; restricted commerce
- Education — LMS-style content patterns; content-licensing distinct from B2B SaaS
These are not v1.0 deliverables. The specification's profile system is designed to accommodate them without breaking changes.
10. Check definitions
The specification defines 25 checks across 4 categories. For each check, this section provides:
- Identifier — the canonical check ID, in lowercase-with-hyphens. The ID is used in output JSON, in profile configurations, and in citations.
- Tier — one of Critical / High / Medium / Low / Informational (§6).
- Description — what the check measures (semantic level).
- Pass criterion — what conditions in the response cause the check to return pass.
- Fail criterion — what conditions cause fail.
- Neutral criterion — what conditions cause neutral.
- Notes — implementation guidance, edge cases, references to authoritative documents.
Threshold values (byte sizes, redirect counts, etc.) appear in Annex A and MAY be revised in patch releases (v1.0.x) without bumping the minor or major version.
10.1 Discoverability
Three checks. Does the site declare its structure to crawlers in a parseable way?
robots-txt-exists
Tier: High.
Description: Verifies the site publishes a parseable robots.txt at the conventional path.
Pass: GET request to /robots.txt returns HTTP 200 with a body containing at least one User-agent directive.
Fail: HTTP 404, malformed body (no parseable User-agent directive), or empty body.
Neutral: HTTP 5xx, network timeout, or scanner-side network error.
Notes: The empty-body case is treated as fail because robots.txt with zero directives is technically RFC-valid but operationally useless to agents. Conformance is to RFC 9309 parse semantics; the specification does not require the file to be syntactically perfect, only that it contains at least one parseable directive.
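A sketch of the classification logic, assuming the HTTP fetch happens elsewhere and only the result is classified here. The User-agent regex is a simplification of RFC 9309 parsing, sufficient for the "at least one parseable directive" criterion.

```python
import re

def robots_txt_status(http_status, body, network_error=False):
    """Classify a /robots.txt fetch per the robots-txt-exists criteria.
    http_status: int or None; body: response text or None."""
    if network_error or (http_status is not None and 500 <= http_status <= 599):
        return "neutral"  # 5xx, timeout, scanner-side error: not the site's fault
    if http_status != 200:
        return "fail"     # 404 and other non-200 responses
    # Pass needs at least one parseable User-agent directive
    # (case-insensitive per RFC 9309 line semantics).
    if re.search(r"(?im)^\s*user-agent\s*:\s*\S", body or ""):
        return "pass"
    return "fail"         # empty or directive-free body: RFC-valid but useless
```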
sitemap-exists
Tier: Medium.
Description: Verifies the site publishes a valid XML sitemap reachable from a conventional location.
Pass: A reachable sitemap with valid structure (root element is urlset per sitemaps.org or sitemapindex for index files) is found at one of the standard locations.
Fail: No sitemap is reachable at any standard location.
Neutral: All probed paths return 5xx or network errors.
Notes: Standard locations include /sitemap.xml, /sitemap_index.xml, and any path declared via the Sitemap: directive in robots.txt (if robots-txt-exists passed). First-found-wins; the scanner does not need to enumerate all sitemaps.
link-headers
Tier: Low.
Description: Verifies the site emits HTTP Link headers with at least one relation type that could plausibly support agent discovery (i.e., not purely browser-resource-loading hints).
Pass: GET request to the site's root URL returns at least one Link header with a rel-value NOT in the noise-rel blocklist enumerated in Annex A.
Fail: No Link headers, or Link headers contain only rel-values from the noise-rel blocklist (browser-resource hints like preload, prefetch, stylesheet, icon, etc.).
Neutral: Network error or homepage unreachable.
Notes: The check uses a blocklist approach (any rel-value outside the noise set is accepted) rather than an explicit allowlist. This is intentionally permissive: emerging agent-relevant rel-values do not need to wait for a spec-version bump before scanners count them. The noise-rel blocklist is enumerated in Annex A and is expandable in patch releases as new browser-resource-loading rel-values appear in the IANA Link Relations registry.
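The blocklist logic can be sketched as below. The noise-rel set shown is illustrative only (Annex A is normative), and the header parsing is a naive split that ignores commas inside URI references; a production scanner should use a proper RFC 8288 parser.

```python
# Illustrative noise-rel blocklist; the normative list is in Annex A.
NOISE_RELS = {"preload", "prefetch", "preconnect", "dns-prefetch",
              "stylesheet", "icon", "modulepreload"}

def link_headers_status(link_headers):
    """link_headers: list of raw Link header values from the root URL.
    Pass if any rel-value falls OUTSIDE the noise blocklist."""
    rels = []
    for header in link_headers:
        for part in header.split(","):          # naive: breaks on commas in URLs
            for param in part.split(";")[1:]:
                name, _, value = param.strip().partition("=")
                if name.lower() == "rel":
                    # rel can hold a space-separated list, optionally quoted
                    rels.extend(value.strip('" ').lower().split())
    if not rels:
        return "fail"  # no Link headers at all
    return "pass" if any(r not in NOISE_RELS for r in rels) else "fail"
```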
10.2 Access Control
Four checks measuring how the site signals AI-access policy. Three are scored; one is informational.
ai-bot-rules
Tier: Critical.
Description: Verifies the site has declared an explicit policy regarding AI-bot access via robots.txt.
Pass: Either (a) robots.txt contains at least one User-agent stanza naming a known AI-bot crawler (canonical list in Annex A), or (b) robots.txt contains a Content-Signal directive (Content-Signal: header line) declaring an AI-usage policy.
Fail: robots.txt is reachable but contains neither AI-specific User-agent stanzas nor Content-Signal directives.
Neutral: robots.txt is unreachable (per robots-txt-exists neutral path).
Notes: The “either of two paths” pass criterion reflects the reality that two competing conventions are in active use as of 2026: per-bot User-agent stanzas (the older approach) and Content-Signals (the newer convention). Either signals deliberate AI-policy declaration. The canonical AI-bot list is enumerated in Annex A and is updated as new AI bots register their crawler User-Agent strings.
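The two-path pass criterion can be sketched as follows. The bot-token set is illustrative (the canonical list is in Annex A), and the Content-Signal value in the test is a made-up example; only the directive name is checked here.

```python
import re

# Illustrative AI-bot User-agent tokens; the canonical list lives in Annex A.
AI_BOT_TOKENS = {"gptbot", "claudebot", "perplexitybot", "google-extended", "ccbot"}

def ai_bot_rules_status(robots_txt):
    """robots_txt: file contents, or None if unreachable."""
    if robots_txt is None:
        return "neutral"
    # Path (a): a User-agent stanza naming a known AI bot.
    has_ai_stanza = any(
        m.group(1).strip().lower() in AI_BOT_TOKENS
        for m in re.finditer(r"(?im)^\s*user-agent\s*:\s*(\S+)", robots_txt)
    )
    # Path (b): a Content-Signal directive line (value not validated here).
    has_content_signal = re.search(r"(?im)^\s*content-signal\s*:", robots_txt) is not None
    return "pass" if (has_ai_stanza or has_content_signal) else "fail"
```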
content-signals
Tier: Critical.
Description: Verifies the site declares its AI-usage policy via the Content-Signals convention. This check is the L1→L2 gate (§8.2).
Pass: robots.txt contains either (a) Cloudflare's Content-Signal directive (Content-Signal: header line) or (b) the IETF AIPREF Content-Usage directive, in either case well-formed and declaring a parseable policy.
Fail: robots.txt is reachable but contains neither directive.
Neutral: robots.txt is unreachable.
Notes: Content-Signals detection in v1 is robots.txt-based. HTTP-header-based mechanisms (e.g., Permissions-Policy: ai=()) and HTML-meta-based mechanisms (e.g., <meta name="content-signals">) are not detected by v1.0 conforming scanners; detection of these alternative carriers may be added in v1.1.
web-bot-auth
Tier: Medium.
Description: Verifies the site has implemented passive support for the Web Bot Auth specification (cryptographic identity claims for automated HTTP clients).
Pass: Site response includes Web Bot Auth presence indicators per the draft architecture document — typically an Authorization: challenge on a representative request, or a published signing-key directory URL referenced from any well-known endpoint.
Fail: No Web Bot Auth presence indicators detected.
Neutral: Network error.
Notes: v1.0 detection is passive (the scanner does not perform cryptographic verification). Cryptographic verification is OPTIONAL in v1.0 and is anticipated to become RECOMMENDED or REQUIRED in v2.0+ as the spec finalizes and adoption rises. The Medium tier reflects the spec's pre-adoption status as of 2026 and is anticipated to be revised as the IETF draft matures.
robots-allow-all
Tier: Informational (weight = 0).
Description: Reports whether the site declares blanket-allow access in robots.txt for the wildcard user-agent (User-agent: *).
Pass: robots.txt contains a User-agent: * stanza with Allow: / (or no Disallow rules).
Fail: robots.txt's wildcard stanza contains any Disallow rule.
Neutral: robots.txt is unreachable.
Notes: This check observes the site's stated stance on universal crawler access. It is informational because such a stance does not, by itself, imply alignment with agent-friendly practice (the site might allow crawlers but block agent fetches via WAF). The check is detected for observability and for longitudinal correlation tracking; it does not affect score.
10.3 Content Readability
Twelve checks. Can agents read and consume the site's content?
llms-txt-exists
Tier: Informational (weight = 0).
Description: Reports whether the site publishes an llms.txt file at the conventional path.
Pass: GET request to /llms.txt (or alternatively /docs/llms.txt) returns HTTP 200 with non-empty content.
Fail: HTTP 404 or empty body at all probed paths.
Neutral: HTTP 5xx or network errors.
Notes: The llms.txt convention is documented at llmstxt.org. v1.0 designates the llms.txt family as informational based on empirical evidence that major LLM crawlers do not currently fetch llms.txt files at meaningful rates (server-log audits and the companion correlation study). The check is detected for observability and to enable longitudinal study of whether LLM consumption of llms.txt changes in future periods.
llms-txt-valid
Tier: Informational (weight = 0).
Description: Reports whether the site's llms.txt content conforms to the structural conventions described at llmstxt.org.
Pass: llms.txt is well-formed: contains an H1 title, a blockquote summary, at least one H2 section, and link-list structure inside sections.
Fail: llms.txt exists but one or more of the structural elements is missing or malformed.
Neutral: llms.txt does not exist (per llms-txt-exists fail path).
Notes: Validation strictness for v1.0 follows the llmstxt.org canonical structure. Implementations MAY use looser parsing rules and surface this as evidence; conforming pass/fail must use the strict definition.
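A sketch of the strict structural validation, assuming the file contents are already fetched. The regexes are simplifications: they confirm each structural element exists somewhere in the file, without verifying that link lists sit inside their H2 sections.

```python
import re

def llms_txt_valid_status(text):
    """text: llms.txt contents, or None if the file does not exist.
    Strict structure per llmstxt.org: H1 title, blockquote summary,
    at least one H2 section, and markdown link-list entries."""
    if text is None:
        return "neutral"  # per the llms-txt-exists fail path
    has_h1 = re.search(r"(?m)^# \S", text) is not None
    has_blockquote = re.search(r"(?m)^> \S", text) is not None
    has_h2 = re.search(r"(?m)^## \S", text) is not None
    has_link_list = re.search(r"(?m)^- \[[^\]]+\]\([^)]+\)", text) is not None
    return "pass" if all((has_h1, has_blockquote, has_h2, has_link_list)) else "fail"
```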
llms-txt-size
Tier: Informational (weight = 0).
Description: Reports the size of the site's llms.txt file relative to thresholds suitable for typical agent context windows.
Pass: llms.txt content size is at or below the warn threshold.
Warn (reported as pass with caveat in evidence): content size between warn and fail thresholds.
Fail: content size exceeds the fail threshold; agents may truncate.
Neutral: llms.txt does not exist.
Notes: This check captures whether the file is sized for downstream agent consumption. Threshold values are defined in Annex A. The Informational tier reflects empirical evidence of low real-world LLM fetching of llms.txt regardless of size.
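The pass/warn/fail banding can be sketched as below. The byte thresholds here are hypothetical placeholders; the normative values are in Annex A.

```python
# Hypothetical thresholds for illustration only; Annex A is normative.
WARN_BYTES = 50_000
FAIL_BYTES = 200_000

def llms_txt_size_status(size_bytes):
    """Returns (status, caveat). Warn is reported as pass with a
    caveat recorded in evidence, per the check definition."""
    if size_bytes is None:
        return ("neutral", None)      # llms.txt does not exist
    if size_bytes <= WARN_BYTES:
        return ("pass", None)
    if size_bytes <= FAIL_BYTES:
        return ("pass", "warn")       # sized awkwardly for context windows
    return ("fail", None)             # agents may truncate
```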
llms-txt-has-optional-section
Tier: Informational (weight = 0).
Description: Reports whether the site's llms.txt content contains the ## Optional section as defined by the llmstxt.org convention.
Pass: llms.txt contains a section with the heading “Optional” (case-insensitive variants accepted) per the convention.
Fail: llms.txt exists but lacks an Optional section.
Neutral: llms.txt does not exist.
Notes: This check was renamed from llms-txt-directive in spec development, before v1.0 release, to avoid a name collision with a different check used by other scanner implementations measuring an unrelated concept (in-page pointers to llms.txt rather than file-structure validation). Implementations MUST use the renamed identifier llms-txt-has-optional-section.
markdown-url-support
Tier: High.
Description: Verifies that the site exposes Markdown variants of its content URLs via the .md URL convention.
Pass: A sampled HTML page has a sibling .md URL that returns text/markdown (or application/markdown) content type with markdown content.
Fail: The .md variant returns HTML, 404, or non-markdown content type.
Neutral: The site has no sampled HTML pages (e.g., empty or unreachable sitemap).
Notes: This check is distinct from markdown-negotiation (next check). markdown-url-support tests “does a .md twin URL exist at a parallel path?” markdown-negotiation tests “does the same URL serve markdown when requested via Accept header?” Both can pass independently.
markdown-negotiation
Tier: Critical. L2→L3 gate.
Description: Verifies the site honors HTTP content-negotiation for Accept: text/markdown. This check is the L2→L3 gate (§8.3).
Pass: GET request to a sampled URL with header Accept: text/markdown returns HTTP 200 with Content-Type: text/markdown (or application/markdown) and a body containing markdown (not HTML).
Fail: Request returns HTML body, 4xx response, or a non-markdown content type.
Neutral: HTTP 5xx, network error, or fetch is blocked at the WAF level (403/429 with bot-protection signature).
Notes: Conforming scanners SHOULD issue this request using a direct HTTP client (e.g., axios, fetch) rather than a browser-mediated client (e.g., Puppeteer, Playwright). The reasoning: real AI agents are HTTP clients, not browsers; testing via browser-mediated traffic produces false positives because some sites configure agent-friendly WAF rules only for non-browser traffic. The specification does not strictly require direct-HTTP testing, but implementations using browser-mediated testing MUST disclose this in their evidence output.
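A sketch of the response classification, assuming the request itself (with Accept: text/markdown, via a direct HTTP client) is made elsewhere. WAF-signature detection for the 403/429 neutral path is omitted for brevity; this simplified version treats all non-5xx failures as fail.

```python
def markdown_negotiation_status(status, content_type, body):
    """Classify the response to a GET with 'Accept: text/markdown'.
    status: int or None; content_type: Content-Type header; body: text."""
    if status is None or 500 <= status <= 599:
        return "neutral"   # 5xx or network error
    ct = (content_type or "").split(";")[0].strip().lower()
    # A markdown content type with an HTML body is still a fail.
    looks_html = (body or "").lstrip().lower().startswith(("<!doctype", "<html"))
    if status == 200 and ct in ("text/markdown", "application/markdown") and not looks_html:
        return "pass"
    return "fail"          # HTML body, 4xx, or non-markdown content type
```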
rendering-strategy
Tier: Medium.
Description: Verifies that the site's HTML pages contain substantive content in the server-rendered response, accessible without client-side JavaScript execution.
Pass: Either:
- The plain (pre-JavaScript) HTML response contains visible textual content above the SPA-shell threshold defined in Annex A, OR
- A pre-JS-vs-post-JS content-ratio comparison places the page in the SSR (server-rendered) or hydrated rendering category per the ratio thresholds defined in Annex A.
Fail: Either:
- The plain HTML response contains a recognizable framework root marker (e.g., <div id="root">) AND visible textual content below the SPA-shell threshold (the SPA-shell short-circuit), OR
- The pre-JS-vs-post-JS ratio comparison places the page in the CSR (client-side-rendered) category.
Neutral: The scanner cannot perform JavaScript-pre/post comparison (e.g., scanner runs without a JavaScript engine), and the SPA-shell short-circuit does not fire.
Notes: The check uses a two-stage detection pipeline:
- Short-circuit — if the plain HTML has a framework marker AND visible text below the SPA-shell threshold, the page is immediately classified as CSR without invoking the more expensive ratio comparison.
- Ratio comparison — for ambiguous cases (no framework marker present, or visible text above the short-circuit threshold), the scanner compares pre-JS to post-JS content via a ratio metric and classifies the page using the ratio thresholds in Annex A.
Implementations MAY report a 3-way categorical (server-rendered / hydrated / client-rendered) in evidence. For scoring purposes, the check is binary: pass for server-rendered or hydrated; fail for client-side-rendered.
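The two-stage pipeline can be sketched as below. The threshold constants and framework markers are hypothetical stand-ins for the values in Annex A, and visible-text extraction from HTML is assumed to happen upstream.

```python
# Hypothetical values for illustration; the normative thresholds are in Annex A.
SPA_SHELL_MIN_CHARS = 200     # visible pre-JS text below this + marker => CSR
SSR_RATIO_THRESHOLD = 0.6     # pre-JS/post-JS ratio at or above => pass
FRAMEWORK_MARKERS = ('id="root"', 'id="__next"', 'id="app"')

def rendering_strategy_status(pre_js_html, pre_js_text, post_js_text=None):
    """pre_js_html: raw server response; pre_js_text: its visible text;
    post_js_text: visible text after JS execution, or None if unavailable."""
    has_marker = any(m in pre_js_html for m in FRAMEWORK_MARKERS)
    # Stage 1: SPA-shell short-circuit -> immediate CSR classification.
    if has_marker and len(pre_js_text) < SPA_SHELL_MIN_CHARS:
        return "fail"
    # Pass bullet 1: substantive pre-JS text clears the threshold on its own.
    if len(pre_js_text) >= SPA_SHELL_MIN_CHARS:
        return "pass"
    # Stage 2: ratio comparison for the ambiguous remainder.
    if post_js_text is None:
        return "neutral"  # no JS engine available and short-circuit did not fire
    ratio = len(pre_js_text) / max(len(post_js_text), 1)
    return "pass" if ratio >= SSR_RATIO_THRESHOLD else "fail"
```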
page-size-html
Tier: Low.
Description: Reports whether sampled pages are within reasonable size limits for agent consumption.
Pass: Sampled pages, when serialized to markdown and counted by character, average at or below the warn threshold defined in Annex A.
Warn (reported as pass with caveat in evidence): average between warn and fail thresholds.
Fail: average exceeds the fail threshold.
Neutral: No samplable pages.
Notes: This check measures the “naive crawler consumption cost” — full content serialization without boilerplate stripping. Implementations MAY surface alternative measurements (e.g., main-content extraction via Readability) in evidence, but the canonical measurement for scoring is full-content serialization.
http-status-codes
Tier: High.
Description: Verifies the site responds with appropriate HTTP status codes — clean 200 for valid paths, proper 404 for non-existent paths, no 5xx anomalies.
Pass: All of:
- A GET to the site's homepage returns HTTP 200 (without redirect-following; if the homepage 3xx-redirects, the response is treated as inconclusive and falls into Neutral).
- A GET to a deliberately-non-existent path (e.g., /this-path-definitely-doesnt-exist-{random-string}) returns HTTP 404 (a “hard 404”) rather than HTTP 200 (a “soft 404”).
- No probed path returns 5xx or HTTP-status anomalies.
Fail: Either: homepage returns 4xx or 5xx; or the non-existent path returns HTTP 200 (a soft-404, which agents cache as canonical content).
Neutral: Network errors block determination, OR the homepage returns a 3xx redirect (this check does not follow redirects; redirect-cleanliness is the concern of the separate redirect-behavior check).
Notes: Soft-404 detection is essential because soft-404s are agent-confusion vectors: agents treat them as valid content and propagate downstream. This check intentionally does NOT follow redirects: redirect chains are evaluated by the separate redirect-behavior check, and a homepage 3xx response provides no information about the site's status-code hygiene at the canonical URL.
redirect-behavior
Tier: Medium.
Description: Verifies that any redirects the site issues stay within its own eTLD+1 boundary and use clean HTTP redirect mechanisms (not JavaScript-mediated).
Pass: Either:
- The homepage serves a 200 directly (no redirect occurs), OR
- The homepage redirects via HTTP 301/302/307/308, every redirect target is on the same eTLD+1 as the source, the chain contains no JavaScript-mediated redirects, and the chain length is within the maximum defined in Annex A.
Fail: Any of:
- A redirect target's eTLD+1 differs from the source's eTLD+1 (a cross-eTLD+1 redirect).
- A JavaScript-only redirect is detected at any hop (an agent without JavaScript sees an empty 200 response).
- The redirect chain exceeds the maximum hop count.
Neutral: Network errors block determination.
Notes: The check uses eTLD+1 (effective top-level domain plus one label) as its primary domain-boundary criterion, not a hardcoded CDN/edge whitelist. The eTLD+1 is computed using the Public Suffix List — a single-source-of-truth list maintained by Mozilla and major browser vendors. This approach handles CDN edges automatically (e.g., stripe.com → cdn.stripe.com is allowed because both share stripe.com as eTLD+1) without requiring the spec to enumerate every legitimate CDN domain.
Within-eTLD+1 redirect chains are permitted up to the maximum hop count; the check does NOT differentiate “1 hop within eTLD+1” from “5 hops within eTLD+1” as both indicate site-administrator-controlled routing. The cross-eTLD+1 boundary is the meaningful gate, not raw hop count.
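The redirect-behavior rules can be sketched as below. The hop list and JS-redirect flag are assumed to come from the scanner's fetch layer; the same-eTLD+1 predicate is injected so it can be backed by a real Public Suffix List implementation, and MAX_HOPS is a placeholder for the Annex A maximum.

```python
from typing import Callable

MAX_HOPS = 5  # hypothetical value; the normative maximum lives in Annex A

def classify_redirects(source: str,
                       hop_targets: list[str],
                       js_redirect_seen: bool,
                       same_etld1: Callable[[str, str], bool]) -> str:
    if not hop_targets:
        return "pass"   # homepage served 200 directly; no redirect occurred
    if js_redirect_seen:
        return "fail"   # JS-only redirect: a non-JS agent sees an empty 200
    if len(hop_targets) > MAX_HOPS:
        return "fail"   # chain exceeds the maximum hop count
    if all(same_etld1(source, t) for t in hop_targets):
        return "pass"   # administrator-controlled routing within eTLD+1
    return "fail"       # at least one cross-eTLD+1 hop
```

Note that hop count only matters as a ceiling; per the rule above, one hop and five hops within eTLD+1 classify identically.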
agents-md-detection
Tier: Informational (weight = 0).
Description: Reports whether the site publishes an AGENTS.md file at conventional paths.
Pass: GET request to any of the canonical probe paths (see Annex A.5) returns HTTP 200 with non-empty content matching the AGENTS.md format conventions.
Fail: No AGENTS.md found at any probed path.
Neutral: Network errors block determination.
Notes: Peer-reviewed research (ETH Zurich, March 2026) finds that the presence of AGENTS.md often degrades agent task success. The specification therefore designates this check as Informational: presence is detected for observability and longitudinal correlation tracking, but the specification does not recommend implementing AGENTS.md based on current evidence. Detection is included to enable longitudinal study of whether the convention's effect changes over time as agents evolve.
cache-header-hygiene
Tier: Low.
Description: Verifies the site emits well-formed HTTP caching headers on its homepage.
Pass: Homepage response includes Cache-Control (well-formed), and at least one of ETag or Last-Modified, and at least one of Vary or Age.
Fail: Cache-Control is missing or malformed; or both ETag and Last-Modified are missing.
Neutral: Scanner cannot reach homepage.
Notes: Cache-header hygiene allows agent crawlers to revalidate efficiently rather than re-fetch on every visit. The pass criterion in this check is stricter than typical browser-cache adequacy because agent fleets at scale benefit from stronger cache signals.
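The pass criterion can be sketched as a predicate over the homepage response headers. The directive pattern used for “well-formed” below is an illustrative simplification, not a normative definition:

```python
# Sketch of the cache-header-hygiene pass criterion: well-formed
# Cache-Control, at least one validator (ETag / Last-Modified), and at
# least one of Vary / Age. Header names match case-insensitively per HTTP.
import re

def cache_hygiene_passes(headers: dict[str, str]) -> bool:
    h = {k.lower(): v for k, v in headers.items()}
    cc = h.get("cache-control", "")
    # Loose well-formedness test: at least one recognizable directive.
    well_formed = bool(re.search(
        r"(max-age=\d+|no-cache|no-store|public|private|must-revalidate)", cc))
    has_validator = "etag" in h or "last-modified" in h
    has_freshness_signal = "vary" in h or "age" in h
    return well_formed and has_validator and has_freshness_signal
```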
10.4 Agent Endpoints
Six checks. Does the site expose specialized agent interfaces?
api-catalog
Tier: Medium.
Description: Verifies the site publishes an API catalog at the conventional .well-known path.
Pass: GET request to /.well-known/api-catalog returns HTTP 200 with a parseable linkset format containing at least one OpenAPI-spec link.
Fail: HTTP 404, malformed body, or no OpenAPI links.
Neutral: Network error.
Notes: Conforms to RFC 9727.
oauth-discovery
Tier: High.
Description: Verifies the site publishes OAuth Authorization Server metadata at the conventional .well-known path.
Pass: GET request to /.well-known/oauth-authorization-server returns HTTP 200 with valid AS metadata per RFC 8414. Falling back to /.well-known/openid-configuration is acceptable.
Fail: Neither path returns valid metadata.
Neutral: Network errors block all probes.
Notes: A site without OAuth functionality fails this check; the absence of OAuth metadata is itself the signal. Conforming implementations MUST NOT add neutral-when-no-OAuth logic.
oauth-protected-resource
Tier: Medium.
Description: Verifies the site publishes OAuth Protected Resource metadata at the conventional .well-known path.
Pass: GET request to /.well-known/oauth-protected-resource returns HTTP 200 with metadata containing a valid resource field per RFC 9728.
Fail: HTTP 404 or metadata lacks the resource field.
Neutral: Network errors block determination.
Notes: This check pairs semantically with oauth-discovery: the AS authorizes; the protected-resource metadata identifies what is protected. As with oauth-discovery, absence of metadata is the signal — no neutral-on-missing-OAuth logic.
mcp-server-card
Tier: High.
Description: Verifies the site publishes a Model Context Protocol Server Card at one of the candidate .well-known paths.
Pass: GET request to any of /.well-known/mcp.json, /.well-known/mcp/server-card.json, or /.well-known/mcp/server.json returns HTTP 200 with a parseable Server Card JSON containing recognizable MCP shape fields.
Fail: No reachable parseable Server Card at any probed path.
Neutral: Network errors block all probes.
Notes: The MCP path convention is still being finalized (SEP-1649 was at PR stage as of v1.0). v1.0-conforming scanners probe the candidate path set; implementations SHOULD update their path probing to match the converged path when SEP-1649 lands. The High tier classification reflects that MCP is the most mature of the emergent agent protocols, even though its specification is still pre-final.
a2a-agent-card
Tier: Medium.
Description: Verifies the site publishes an Agent-to-Agent (A2A) Agent Card at the conventional .well-known path.
Pass: GET request to /.well-known/agent-card.json returns HTTP 200 with valid Agent Card JSON containing minimum fields per A2A v1.0: name, description, url, capabilities, skills[].
Fail: HTTP 404 or fields are incomplete.
Neutral: Network errors block determination.
Notes: A2A is at v1.0 specification but adoption remains low as of 2026.
agent-skills
Tier: Informational (weight = 0).
Description: Reports whether the site publishes an Agent Skills index at conventional paths.
Pass: GET request to either /.well-known/agent-skills/index.json or /.well-known/agent-skills.json returns HTTP 200 with a schema-valid skill index per Agent Skills v0.2.0.
Fail: No reachable parseable skill index.
Neutral: Network errors.
Notes: Agent Skills is at v0.2.0 (pre-1.0). Adoption in the v1 reference corpus is near-zero. The Informational tier reflects this maturity status; the check is detected for observability and may be promoted to a scored tier in a future spec version when the underlying spec finalizes and adoption rises.
11. Output schema
Conforming scanners MUST produce output conforming to the JSON Schema defined in Annex B. The schema captures:
- spec_version — the specification version the scanner conforms to (e.g., "1.0.0").
- profile — the profile applied to this scan (e.g., "b2b-saas").
- domain — the brand root domain scanned.
- scanned_at — ISO 8601 UTC timestamp of the scan.
- scanner — an object with name and version identifying the implementation.
- score — the aggregate score in 0–100.
- level — the assigned level ("L1", "L2", or "L3").
- level_name — the human-readable level label.
- categories — an object with per-category subscores (discoverability, accessControl, contentReadability, agentEndpoints). Each entry includes score (0–100), checks_passed (count of scored checks that passed in this category), and checks_total (count of scored checks that are applicable under the active profile in this category). Informational checks are enumerated in the checks array but do not count toward checks_total.
- checks — an array of per-check results, each with id, category, status, scored, tier, weight, and an implementation-defined evidence object.
The evidence field's internal shape is implementation-defined; consumers parsing scanner output MUST treat the field as opaque and parse defensively.
A conforming output document MUST validate against the schema in Annex B.
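The authoritative conformance test is JSON Schema validation against Annex B. As an illustrative sketch only, a minimal pre-flight check on the §11 top-level fields might look like this:

```python
# Hypothetical pre-flight sketch; it does NOT replace Annex B validation.
REQUIRED_TOP_LEVEL = {
    "spec_version", "profile", "domain", "scanned_at",
    "scanner", "score", "level", "level_name", "categories", "checks",
}

def preflight(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the pre-flight passed."""
    problems = [f"missing field: {f}"
                for f in sorted(REQUIRED_TOP_LEVEL - doc.keys())]
    if not (isinstance(doc.get("score"), int) and 0 <= doc["score"] <= 100):
        problems.append("score must be an integer in 0..100")
    if doc.get("level") not in {"L1", "L2", "L3"}:
        problems.append("level must be L1, L2, or L3")
    return problems
```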
12. Conformance
12.1 Conforming scanner requirements
A scanner conforms to the Agent-Adoption Specification V1.0 if and only if it satisfies all of the following:
- Implements all scored checks. The scanner MUST implement all 18 scored checks defined in §10. Informational checks (the 7 designated as such in §10) are RECOMMENDED to implement, but the scanner MAY mark them as not-supported in output without losing conformance.
- Produces conforming output. The scanner MUST produce output validating against the JSON Schema in Annex B.
- Applies the score formula. The scanner MUST compute the score per §7.
- Applies level gates. The scanner MUST compute the level per §8, respecting the L1→L2 and L2→L3 gates.
- Applies skip logic. The scanner MUST apply per-profile skip logic per §9.
- Discloses spec version. Output MUST include the spec_version field. Implementations conforming to v1.0 MUST NOT claim conformance to v2 or later versions until they update.
12.2 Implementation flexibility
A scanner is NOT required to:
- Use any specific technology. Implementations MAY use any HTTP client, programming language, or runtime.
- Match the reference implementation exactly. Where check definitions in §10 leave room for interpretation (e.g., specific path-probing order, exact byte threshold within the spec-permitted range), implementations MAY differ.
- Reproduce identical thresholds. The threshold values in Annex A define a spec-permitted range; conforming implementations MAY use slightly different exact values within the range, trading exact reproducibility for implementation flexibility. Implementations MUST disclose their threshold choices in scanner documentation if they deviate from the canonical defaults.
12.3 Profile-level overrides
Conforming implementations MAY publish custom profiles that override tier weights or gate criteria from the canonical configuration. Such overrides MUST be disclosed in implementation documentation; silent override is non-conforming.
13. Implementation guidance
13.1 The Defensive-Framing Principle
Conforming implementations MUST NOT make causal claims about the relationship between scoring high on the Agent-Adoption Specification and achieving improved AI visibility, citation in LLM responses, or commercial outcomes.
The companion empirical research (the Agent-Adoption Correlation Study) finds that agent-readiness signals are real but small contributors to LLM visibility outcomes (Cohen's d ≤ 0.65 on the strongest measured signal; ρ ≤ 0.15 across all signals; approximately 2% of outcome variance explained).
Implementations SHOULD frame results descriptively:
“Your site implements N of M agent-readiness practices defined in the Agent-Adoption Specification V1.0.”
Implementations SHOULD NOT use prescriptive framings such as:
“Implementing these checks will improve your AI visibility.”
“Boost your AI ranking by passing more checks.”
“Don't fall behind — sites that score below Y are ignored by AI.”
This principle protects both the specification's credibility and the user's expectations. The specification measures structural agent-accessibility; it does not promise outcomes.
13.2 Empirical-research grounding
Where this specification's behavior is empirically calibrated (e.g., specific tier assignments to checks based on study findings), the calibration is documented in §10 inline with the relevant check, with references to the companion correlation study.
Future spec versions (v1.1+, v2.0) may revise calibrations as additional empirical evidence accumulates from quarterly correlation re-runs. The companion study's study-2026-XX tags provide the audit trail for these revisions.
13.3 Scanner-fingerprint considerations
Bot-protection systems on target sites may distinguish among different HTTP-client fingerprints. A scanner using browser-mediated fetching (Puppeteer, Playwright) may receive different responses than a scanner using direct HTTP (axios, fetch). The companion correlation study finds that approximately 12% of brands in the reference corpus are blocked by at least one scanner due to bot-protection.
Conforming implementations SHOULD document their HTTP-client fingerprint in scanner documentation and SHOULD include the fingerprint in evidence output where it materially affects results. Implementations MUST NOT spoof user-agent strings to circumvent bot protection in a way that the implementer would not be willing to disclose; conformance presumes good-faith probing.
13.4 Sampling strategy for content checks
Several checks (markdown-url-support, markdown-negotiation, rendering-strategy, page-size-html) operate on sampled URLs from the site rather than the homepage alone. Implementations MUST document their sampling strategy. Reasonable strategies include sampling URLs from the site's sitemap (if present) or from the homepage's outbound link graph. Empty or inaccessible sample sets cause these checks to return neutral.
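One reasonable sampling strategy can be sketched as drawing up to k URLs from the site's sitemap. The deterministic seed keeps re-scans reproducible; the seed and sample size are implementation choices, not spec-mandated values:

```python
import random
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sample_from_sitemap(sitemap_xml: str, k: int = 10, seed: int = 0) -> list[str]:
    """Draw up to k URLs from a <urlset> sitemap, deterministically."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
    if len(urls) <= k:
        return urls
    return random.Random(seed).sample(urls, k)
```

An empty result here would cause the dependent content checks to return neutral, per the rule above.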
14. Versioning policy
14.1 Version bump rules
Threshold tuning is a patch-level change because thresholds are measurement details that may calibrate with data while semantic definitions stay unchanged. The specification anticipates quarterly threshold tuning as the correlation study produces calibration evidence; bumping minor or major every quarter would be operationally hostile.
14.2 v2.0 trigger
A v2.0 release is triggered when at least one quarterly correlation re-run produces results that materially change tier assignments or gate criteria. The specification authors do not bump major versions on a calendar; they bump when accumulated empirical evidence justifies a structural change.
L4 and higher levels may be defined in v2.0 if a captured scan reaches those levels and the specification authors have empirical basis for the gate criteria. Until then, L4+ remains reserved per §8.4.
14.3 Tag immutability
Tagged releases of this specification are immutable. Once v1.0.0 is published, the tag does not move. Errors discovered after release are documented in CHANGELOG.md with pointers to corrected versions in subsequent tags. The historical tag remains preserved as-is so prior citations remain valid.
This convention is standard for technical specifications: implementers must be able to target a specific version with confidence that its requirements will not change beneath them.
14.4 Multi-version coexistence
v1 and v2 (and subsequent majors) coexist indefinitely. Each major version has its own URL at respectarium.com/spec/agent-adoption/v{N} and remains accessible. Implementations MUST declare which version they conform to via the spec_version field in output (§11).
15. Maintainers
The Agent-Adoption Specification is maintained by Respectarium. Some maintainers also operate commercial implementations of this specification. The specification is open; commercial implementations are welcome from any party including parties unaffiliated with Respectarium.
Maintainer commercial interests are disclosed in CONTRIBUTING.md and governance/maintainers.md. The maintainers' commitment is to evaluate contributions on technical merit, not on contributor affiliation.
The decision process for accepting changes is documented in governance/maintainers.md.
16. Acknowledgments
The Agent-Adoption Specification builds on prior work in:
- Web standards — the Internet Engineering Task Force, W3C, and IANA registries that define many of the underlying primitives this specification measures.
- Agent-readiness research — independent server-log audits of LLM crawler behavior, peer-reviewed studies on AGENTS.md behavior, and the broader open conversation in the agent-readiness research community.
- Open-spec patterns — projects like OpenAPI, JSON Schema, and the IETF /.well-known/ registry whose patterns inform this specification's structure and conformance approach.
- The companion correlation study (Agent-Adoption Correlation Study) — quarterly empirical testing that calibrates this specification against measured outcomes.
Contributors who land merged Pull Requests against this specification are listed by name (or organization, where preferred) in this section in subsequent releases. The Agent-Adoption Specification is open; contributions from any party, including organizations operating their own implementations, are welcomed and acknowledged regardless of contributor affiliation.
17. References
17.1 Normative references
- RFC 2119 — Key words for use in RFCs to Indicate Requirement Levels. https://www.rfc-editor.org/rfc/rfc2119
- RFC 8174 — Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. https://www.rfc-editor.org/rfc/rfc8174
- RFC 9309 — Robots Exclusion Protocol. https://www.rfc-editor.org/rfc/rfc9309
- Sitemaps.org Protocol — XML Sitemap format. https://www.sitemaps.org/protocol.html
- RFC 8414 — OAuth 2.0 Authorization Server Metadata. https://www.rfc-editor.org/rfc/rfc8414
- RFC 9727 — Linkset for IETF API Catalogs. https://www.rfc-editor.org/rfc/rfc9727
- RFC 9728 — OAuth 2.0 Protected Resource Metadata. https://www.rfc-editor.org/rfc/rfc9728
17.2 Informative references
- Companion Correlation Study, Q1 2026 baseline. https://respectarium.com/research/correlation-2026-04
- llms.txt convention. https://llmstxt.org/
- Web Bot Auth Architecture, draft. https://datatracker.ietf.org/doc/draft-meunier-web-bot-auth-architecture/
- Public Suffix List. https://publicsuffix.org/
- IANA Link Relations Registry. https://www.iana.org/assignments/link-relations/
Annex A — Threshold values
This annex enumerates the threshold values, canonical lists, and probe paths referenced in §10. Threshold tuning is a patch-level change (§14.1) and may be revised in v1.0.x releases without bumping minor or major versions. Canonical lists (AI-bot User-Agents, noise-rel blocklist, etc.) are similarly patch-revisable as the underlying registries and ecosystem evolve.
A.1 AI-bot User-Agent canonical list
The canonical list of AI-bot User-Agent strings recognized by ai-bot-rules as of v1.0 (14 entries):
Explicitly retired entries (recognized by older spec drafts but excluded from v1.0 conformance, due to vendor deprecation announcements made on or before 2026-02-20):
- anthropic-ai — superseded by the ClaudeBot family
- Claude-Web — superseded by Claude-User
Conforming scanners MUST NOT count the retired entries as valid AI-bot User-Agents for ai-bot-rules. The list is updated in patch releases as additional AI-bot crawlers publicly register and as deprecation announcements occur.
A.2 Noise-rel blocklist
The link-headers check uses a blocklist approach: any Link header rel-value NOT in the noise blocklist counts as agent-relevant. The blocklist enumerates rel-values that are purely browser-resource-loading hints, not agent-discovery indicators.
Noise-rel blocklist (9 entries) as of v1.0:
- preload
- prefetch
- stylesheet
- icon
- dns-prefetch
- preconnect
- modulepreload
- apple-touch-icon
- unknown (synthetic placeholder for unparseable rel-values)
Any Link header rel-value not in the blocklist (e.g., alternate, describedby, api, self, service-desc, manifest, canonical, or future agent-relevant rel-values registered with IANA) counts as a pass condition. The blocklist is updated in patch releases as new browser-resource rel-values appear.
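The blocklist filter can be sketched as below. The Link header parsing here is a simplification; a production parser should handle quoting and parameter edge cases per RFC 8288:

```python
# Sketch of the link-headers noise filter: extract rel values from a raw
# Link header and keep only those outside the A.2 blocklist.
NOISE_RELS = {
    "preload", "prefetch", "stylesheet", "icon", "dns-prefetch",
    "preconnect", "modulepreload", "apple-touch-icon", "unknown",
}

def agent_relevant_rels(link_header: str) -> set[str]:
    rels: set[str] = set()
    for part in link_header.split(","):
        for param in part.split(";")[1:]:
            name, _, value = param.strip().partition("=")
            if name.lower() == "rel":
                # A rel attribute may carry multiple space-separated values.
                rels.update(value.strip('" ').lower().split())
    return rels - NOISE_RELS
```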
A.3 Threshold values for content-readability checks
A.4 eTLD+1 boundary determination
The redirect-behavior check uses eTLD+1 (effective top-level domain plus one label) as its domain-boundary criterion. The eTLD+1 of a domain is computed using the Public Suffix List, a community-maintained list of public suffixes used by major browsers and security tools.
Conforming scanners SHOULD use a current Public Suffix List implementation (e.g., the tldts library for Node.js, or equivalent libraries in other ecosystems) rather than maintaining a custom domain whitelist. This approach has two advantages:
- Coverage — the Public Suffix List handles all CDN edges, regional ccTLD variants, and operational subdomains automatically. A site at example.com redirecting to cdn.example.com is within eTLD+1; a redirect to unrelated-attacker.com is not.
- Maintenance — the Public Suffix List is updated regularly by the browser vendor community; relying on it lets the spec inherit those updates without per-version maintenance.
A redirect chain is “within eTLD+1” if and only if every hop's target shares the same eTLD+1 as the original source. A single hop crossing eTLD+1 disqualifies the chain.
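The rule can be illustrated with a deliberately tiny public-suffix set. Conforming scanners should use a real Public Suffix List implementation (tldts, publicsuffix2, or equivalent); the two-entry suffix set below stands in for the full list:

```python
# Toy eTLD+1 illustration — NOT a conforming implementation.
TOY_SUFFIXES = {"com", "co.uk"}

def etld1(host: str) -> str:
    labels = host.lower().rstrip(".").split(".")
    # Find the longest matching public suffix, then take one more label.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in TOY_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return host

def chain_within_etld1(source: str, hops: list[str]) -> bool:
    """True iff every hop target shares the source's eTLD+1."""
    return all(etld1(h) == etld1(source) for h in hops)
```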
A.5 AGENTS.md probe paths
Three canonical paths probed in order:
- /AGENTS.md (case-sensitive — the convention's preferred form)
- /agents.md (lower-case fallback)
- /.github/AGENTS.md (repository-style location)
First-reachable-with-non-empty-content wins. The check is informational; a narrower probe set (e.g., probing only /AGENTS.md) is acceptable for implementations that prefer minimal probe overhead, but conforming scanners SHOULD probe all three to maximize observability for longitudinal correlation tracking.
A.6 Sitemap probe paths
In priority order:
- All Sitemap: directives declared in /robots.txt (when the robots.txt is reachable and parseable). A directive may declare any URL; the scanner follows declared paths.
- Fallback path 1: /sitemap.xml
- Fallback path 2: /sitemap_index.xml
First-reachable-with-valid-XML wins. The XML root element may be either <urlset> (a single sitemap) or <sitemapindex> (an index of sitemaps); either qualifies as a valid sitemap for pass.
A.7 MCP Server Card probe paths
The candidate path set as of v1.0:
- /.well-known/mcp.json
- /.well-known/mcp/server-card.json
- /.well-known/mcp/server.json
First-reachable-with-parseable-JSON wins. This candidate set will collapse to a single canonical path in a future patch release when MCP SEP-1649 finalizes.
A.8 A2A Agent Card probe path
A single canonical path: /.well-known/agent-card.json, per A2A Protocol v1.0 (Linux Foundation).
A.9 Agent Skills probe paths
In priority order:
- Primary: /.well-known/agent-skills/index.json per Agent Skills Discovery v0.2.0
- Legacy fallback: /.well-known/skills/index.json (pre-v0.2.0 sites)
First-reachable-with-schema-valid-content wins.
Annex B — Output JSON Schema
The canonical output JSON Schema for v1.0 conforming scanners is published in this repository at schemas/output-v1.schema.json. An illustrative example output document:
{
"spec_version": "1.0.0",
"profile": "b2b-saas",
"domain": "example.com",
"scanned_at": "2026-04-26T12:34:56Z",
"scanner": {
"name": "respectarium-scanner",
"version": "0.7.8"
},
"score": 47,
"level": "L2",
"level_name": "AI-Aware",
"categories": {
"discoverability": { "score": 80, "checks_passed": 2, "checks_total": 3 },
"accessControl": { "score": 50, "checks_passed": 2, "checks_total": 3 },
"contentReadability": { "score": 35, "checks_passed": 4, "checks_total": 7 },
"agentEndpoints": { "score": 30, "checks_passed": 2, "checks_total": 5 }
},
"checks": [
{
"id": "robots-txt-exists",
"category": "discoverability",
"status": "pass",
"scored": true,
"tier": "high",
"weight": 7,
"evidence": {
"request": { "method": "GET", "url": "https://example.com/robots.txt" },
"response": { "status": 200, "user_agent_directives_found": 3 }
}
},
{
"id": "llms-txt-exists",
"category": "contentReadability",
"status": "pass",
"scored": false,
"tier": "informational",
"weight": 0,
"evidence": {
"found_at": "/llms.txt",
"size_bytes": 4231
}
}
]
}
The evidence field's internal shape is implementation-defined. Consumers parsing scanner output MUST treat the field as opaque and parse defensively.
The categories.checks_total field counts only checks that are applicable under the active profile (skipped checks are excluded from the count).
Citation
To cite this specification:
Agent-Adoption Specification, Version 1.0. Respectarium, 2026-04-26.
Available at: https://respectarium.com/spec/agent-adoption/v1
Source: https://github.com/respectarium/agent-adoption-spec/releases/tag/v1.0.0
This work is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).