On this page
- Abstract
- 1. Background
- 2. Methodology
- 3. Findings
- 3.1 Two signals graduate
- 3.2 Effect sizes
- 3.3 Claude vs GPT reverse
- 3.4 Cross-scanner divergence
- 3.5 Ecosystem nascency
- 3.6 Bot-protection
- 3.7 Respectarium score
- 4. Caveats and limitations
- 5. Conflict of interest
- 6. Reproducibility
- 7. Acknowledgments
- 8. Citation and license
Abstract
We present the first cross-sectional study pairing agent-readiness signals with LLM visibility outcomes. Three independent scanners (Respectarium, Cloudflare, and Fern) produced 72 agent-readiness predictors — 66 individual scanner checks plus 5 aggregate metrics and 1 derived signal — measured against a sample of 908 brands across 50 B2B SaaS categories. Outcomes were captured from Respectarium's leaderboards across three large language models (Claude, GPT, and Gemini).
Under analytical thresholds pre-registered before any results were viewed, 2 of 50 evaluated predictors graduate to scored-signal status: cloudflare.level (Cloudflare's aggregate readiness level) and respectarium.markdown-negotiation (HTTP Accept: text/markdown content-negotiation support). Of the 72 predictors fed in, 22 were excluded before evaluation (20 by the variance filter at <5% adoption, plus 2 by additional pre-registered data-quality filters), leaving 50 evaluated against five LLM-visibility outcomes.
All effect sizes are small to medium (Cohen's d ≤ 0.65). No "large" effects (d > 0.8) appear anywhere in the study. The most striking structural finding is that Claude and GPT reverse direction on four signals simultaneously — the same agent-readiness check predicts higher Claude listing and lower GPT listing (or vice versa), at FDR-significant levels in both directions.
Twenty of the 66 per-check signals have <5% real-world adoption in the brand universe, indicating that the agent-readiness specification ecosystem is in its infancy as of 2026-04. Cross-scanner divergence on shared check names (ρ ≈ 0.03 on three pairs of same-named checks) is documented as a primary finding rather than a footnote.
We publish the methodology, code, and dataset for reproducibility. We acknowledge a conflict of interest — Respectarium operates one of the three scanners evaluated — and report findings unfavorable to the Respectarium scanner transparently. Longitudinal re-measurement is needed to test causal hypotheses.
Analysis funnel (pre-registered analytical thresholds, applied mechanically across 11 statistical scripts):
- Input: 66 per-check + 5 aggregate + 1 derived (bot_protected) predictors = 72 total
- Variance filter: 20 checks excluded because <5% of brands deviate from the modal status (ecosystem nascency)
- Further pre-registered data-quality filters before evaluation: 2 predictors excluded
- Evaluated: 50 predictors against 5 outcomes (250 tests)
- DROP: failed univariate AND multivariate AND no subgroup signal
- KEEP_INFORMATIONAL: passed some criteria but not all four for promotion
- PROMOTE_SCORED: passed all four pre-registered criteria: cloudflare.level and respectarium.markdown-negotiation
Source: results/10-verdicts.json + results/00-data-quality.json (study-2026-04).
1. Background
1.1 What is agent-readiness?
"Agent-readiness" describes how accessible a website is to AI agents and automated HTTP clients. It encompasses HTTP-protocol practices (robots.txt with AI-bot rules, content-negotiation for markdown, OAuth discovery metadata, MCP server cards) and content-shape practices (llms.txt files, page-size limits, redirect-behavior cleanliness, link headers).
The space is fragmented as of 2026-04: at least three publicly available scanner systems implement different methodologies for measuring agent-readiness, with check definitions that often share names but diverge substantively in operationalization.
1.2 Why this study?
A common claim in the agent-readiness discourse is that improving these signals will improve LLM visibility — that a site becomes more discoverable and rank-able by AI assistants when it adopts agent-readiness practices. This claim has, until now, lacked cross-sectional empirical evidence at scale.
This study provides the first such evidence. We measure 72 agent-readiness predictors (66 individual scanner checks plus 5 aggregate metrics and 1 derived signal) from three scanners against five LLM-visibility outcomes on a sample large enough to support both univariate and multivariate analyses with proper multiple-testing corrections.
1.3 The three scanners
We use three independent agent-readiness scanners:
- Respectarium scanner — a closed-source implementation of the Agent-Adoption Specification V1, maintained by Respectarium.
- Cloudflare — Cloudflare's public isitagentready.com API.
- Fern — Fern's open-source afdocs scanner.
Each scanner outputs per-check results in its native enum (pass / fail / neutral for Respectarium and Cloudflare; pass / warn / skip / fail / error for Fern). We do not artificially unify the enums; instead, each is encoded numerically and analyzed in its native shape.
1.4 The Agent-Adoption Specification
The Respectarium scanner implements the Agent-Adoption Specification V1, an open methodology document. Anyone may build an additional implementation against the same specification; cross-implementation comparison is itself a research activity.
2. Methodology
2.1 Pre-registered analytical thresholds
The promotion / drop / informational rules below were committed in writing on 2026-04-24, two days before any results were viewed (analysis began 2026-04-26). The thresholds are applied mechanically by the analysis script — they are not retrofitted to data. The verbatim pre-registered thresholds are preserved as methodology.md §1 of the study repository, with the 2026-04-24 commit date verifiable from git history (the commit predates the 2026-04-26 analysis runs).
A signal graduates to PROMOTE_SCORED status when ALL of:
- FDR-adjusted p < 0.05 (Benjamini-Hochberg) in univariate Spearman correlation against the predictor's best outcome
- 95% confidence interval on the multivariate β coefficient excludes zero (OLS with eligible-category fixed effects)
- Direction-consistent in at least 2 of 3 LLMs in binary "listed by this LLM (1) vs not (0)" analysis
- Not in a redundancy cluster with another signal that has higher mean |ρ| (clustering at |ρ| ≥ 0.8)
A signal drops entirely when ALL of:
- FDR-adjusted univariate p > 0.10 AND
- Multivariate β 95% CI spans zero AND
- No subgroup (per-LLM, per-category) shows direction-consistent effect
All other signals are KEEP_INFORMATIONAL.
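As a concrete illustration, the rules above reduce to a small mechanical function. The sketch below is illustrative; the field names and shapes are assumptions, not the actual interface of scripts/10-verdicts.ts:

```ts
type Verdict = "PROMOTE_SCORED" | "DROP" | "KEEP_INFORMATIONAL";

// Hypothetical per-signal statistics; the real pipeline's shapes may differ.
interface SignalStats {
  fdrP: number;                    // FDR-adjusted univariate p, best outcome
  betaCiExcludesZero: boolean;     // multivariate beta 95% CI excludes zero
  directionConsistentLlms: number; // LLMs (of 3) direction-consistent in binary analysis
  inStrongerRedundancyCluster: boolean; // clustered at |rho| >= 0.8 with a stronger signal
  anySubgroupSignal: boolean;      // any per-LLM / per-category direction-consistent effect
}

function verdict(s: SignalStats): Verdict {
  const promote =
    s.fdrP < 0.05 &&
    s.betaCiExcludesZero &&
    s.directionConsistentLlms >= 2 &&
    !s.inStrongerRedundancyCluster;
  if (promote) return "PROMOTE_SCORED";

  const drop =
    s.fdrP > 0.10 &&
    !s.betaCiExcludesZero &&
    !s.anySubgroupSignal;
  if (drop) return "DROP";

  return "KEEP_INFORMATIONAL";
}
```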
2.2 Data sources and acquisition
Brand universe construction
The 908-brand universe was constructed by querying three large language models — Claude, GPT, and Gemini — across 50 B2B SaaS categories. For each category, each LLM was asked, in substance, "What are the top 10–20 brands or products in {category}?" (exact prompt phrasing varied slightly per LLM; the intent in all cases was a ranked list of category-leading brands as the LLM understood them).
The brands appearing in any LLM's response for any category, deduplicated by domain, form the brand universe. Capture window: leaderboard data anchored to 2026-04-19; full corpus frozen for analysis on 2026-04-25.
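A minimal sketch of the domain-keyed deduplication, with illustrative field names (not the study's actual ingestion code):

```ts
// Illustrative shape of one LLM mention; the study's real schema may differ.
interface Mention {
  brand: string;
  domain: string; // primary domain, used as the dedup / join key
  llm: "claude" | "gpt" | "gemini";
  category: string;
}

// Deduplicate mentions by domain to form the brand universe.
function buildBrandUniverse(mentions: Mention[]): Map<string, Mention[]> {
  const byDomain = new Map<string, Mention[]>();
  for (const m of mentions) {
    const key = m.domain.toLowerCase();
    const bucket = byDomain.get(key);
    if (bucket) bucket.push(m);
    else byDomain.set(key, [m]);
  }
  return byDomain;
}
```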
This sampling design has an important consequence — the selection effect documented as a methodological caveat below.
Outcome variables
The leaderboard data yields five outcome variables per brand, spanning per-LLM rank positions (e.g., claudeRank), the average rank across LLMs (avgRank), and claiScore.
Note that not every brand is listed by every LLM. A brand is in the universe if any of the three LLMs listed it; per-LLM coverage varies because the three LLMs' implicit selection functions differ.
Predictor data — three scanners
For predictor data, three independent agent-readiness scanners ran against each of the 908 brands' primary domains.
Scanner sweep methodology
The three scanners ran via a parallel orchestrator over the period 2026-04-25 / 2026-04-26. Each scanner runs in its own concurrency lane with rate-limiting and retry policy tuned to the scanner's transport:
- Cloudflare — concurrency 1, ~6-second gap between requests (compliant with the public API's announced rate limit)
- Fern — concurrency 4 (local CLI, CPU-bound; no upstream rate limit)
- Respectarium — concurrency 2, ~1-second gap (compliant with the scanner's backend capacity)
Per-domain scan results are written atomically as JSON to data/scans/<runId>/<scanner>/<domain>.json (success) or <domain>.error.json (persistent failure). Transient errors (network timeouts, HTTP 5xx, Retry-After responses) are retried up to 2 attempts before being marked as persistent failures. The orchestrator is idempotent on (runId, domain, scanner) — re-running a sweep on the same runId skips already-completed work, supporting partial recovery from interrupted runs.
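A minimal sketch of the atomic write and the idempotency check described above, assuming hypothetical helper names (this is not the orchestrator's actual code):

```ts
import { existsSync } from "node:fs";
import { mkdir, rename, writeFile } from "node:fs/promises";
import path from "node:path";

// Build the per-domain result path: <domain>.json or <domain>.error.json.
function resultPath(runId: string, scanner: string, domain: string, failed: boolean): string {
  const name = failed ? `${domain}.error.json` : `${domain}.json`;
  return path.join("data", "scans", runId, scanner, name);
}

// Write the result to a temp file, then rename into place. The rename is
// atomic on POSIX filesystems, so readers never observe partial JSON.
async function writeScanResult(
  runId: string, scanner: string, domain: string,
  result: unknown, failed: boolean,
): Promise<void> {
  const file = resultPath(runId, scanner, domain, failed);
  await mkdir(path.dirname(file), { recursive: true });
  const tmp = `${file}.tmp`;
  await writeFile(tmp, JSON.stringify(result, null, 2));
  await rename(tmp, file);
}

// Idempotency on (runId, domain, scanner): skip work that already completed.
function alreadyScanned(runId: string, scanner: string, domain: string): boolean {
  return existsSync(resultPath(runId, scanner, domain, false)) ||
         existsSync(resultPath(runId, scanner, domain, true));
}
```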
After all three scanners completed, a merge step joined per-scan results on the domain key into a single canonical dataset (data/merged.json). Per-domain rows include all three scanners' outputs plus the brand's leaderboard data; missing-scanner cases are preserved as null rather than dropped.
Bot-protected brands — those for which one or more scanners returned success: false because the target site blocked the scanner's fingerprint — are kept in the dataset with their non-blocked scanners' data intact. They form the basis of the derived bot_protected predictor (§3.6).
Selection effect (important methodological caveat)
Every brand entered the dataset by being mentioned in at least one LLM's listing for at least one category. The findings therefore characterize relative ranking among already-LLM-discovered brands, NOT LLM-mention probability. We do not have non-mentioned brands in the dataset, so we cannot estimate the effect of agent-readiness on whether an LLM mentions a brand at all — only on relative rank position once mentioned. This limitation is discussed further in §4.
2.3 Sample
- n = 908 brands (target was 1500; sample expansion is planned for the next quarterly study cycle)
- 50 categories of which 24 have ≥ 20 brands and qualify for per-category breakouts
- Single snapshot, captured 2026-04-25 / 2026-04-26 (cross-sectional, not longitudinal)
2.4 Encoding
Native enum statuses are encoded numerically for correlation analysis:
- Cloudflare and Respectarium statuses → pass = +1, fail = -1, neutral = 0, missing = null
- Fern statuses → pass = +1, warn = +0.5, skip = 0, fail = -1, error = -1, missing = null
A derived predictor bot_protected is set to 1 when any scanner reported success: false for the brand (typically due to fingerprint-based bot blocking by the target site). Otherwise bot_protected = 0.
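A minimal sketch of these encodings plus the derived flag, with illustrative type and field names:

```ts
// Native enums per scanner family (§2.4).
type CfOrRespectariumStatus = "pass" | "fail" | "neutral";
type FernStatus = "pass" | "warn" | "skip" | "fail" | "error";

// Cloudflare / Respectarium: pass = +1, fail = -1, neutral = 0, missing = null.
function encodeCfOrRespectarium(s: CfOrRespectariumStatus | null): number | null {
  return s === null ? null : { pass: 1, fail: -1, neutral: 0 }[s];
}

// Fern: pass = +1, warn = +0.5, skip = 0, fail = -1, error = -1, missing = null.
function encodeFern(s: FernStatus | null): number | null {
  return s === null ? null : { pass: 1, warn: 0.5, skip: 0, fail: -1, error: -1 }[s];
}

// bot_protected = 1 when any scanner reported success: false for the brand.
function botProtected(scans: { success: boolean }[]): 0 | 1 {
  return scans.some((scan) => !scan.success) ? 1 : 0;
}
```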
2.5 Variance filter
Per-check predictors with <5% variance (a single modal status covers ≥ 95% of brands) are excluded from correlation analysis. With near-zero variation, no statistical relationship can be detected. The exclusion preserves transparency: the excluded checks are listed separately as a finding in their own right (see §3.5).
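A minimal sketch of the filter, assuming encoded values as produced in §2.4 (illustrative, not the study's exact script):

```ts
// Keep a check only when at least `threshold` (default 5%) of non-null
// brands deviate from the modal encoded status.
function passesVarianceFilter(values: (number | null)[], threshold = 0.05): boolean {
  const present = values.filter((v): v is number => v !== null);
  if (present.length === 0) return false;
  const counts = new Map<number, number>();
  for (const v of present) counts.set(v, (counts.get(v) ?? 0) + 1);
  const modalShare = Math.max(...counts.values()) / present.length;
  return 1 - modalShare >= threshold;
}
```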
2.6 Multiple-testing correction
Benjamini-Hochberg FDR is the primary correction throughout, applied within each script's family of tests (e.g., across all 250 univariate tests in the univariate analysis). Bonferroni is reported alongside as a more conservative reference. Per-LLM analyses apply FDR within each (LLM, strategy) cell separately AND globally across all per-LLM tests.
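For reference, Benjamini-Hochberg adjusted p-values follow the standard step-up procedure; a minimal sketch for one family of tests (not necessarily the study's exact implementation):

```ts
// Benjamini-Hochberg: adjusted p for the k-th smallest p-value is
// min over ranks j >= k of p_(j) * m / j, capped at 1.
function benjaminiHochberg(pValues: number[]): number[] {
  const m = pValues.length;
  const order = pValues.map((_, i) => i).sort((a, b) => pValues[a] - pValues[b]);
  const adjusted = new Array<number>(m);
  let running = 1; // running minimum, taken from the largest rank downward
  for (let k = m - 1; k >= 0; k--) {
    const i = order[k];
    running = Math.min(running, (pValues[i] * m) / (k + 1));
    adjusted[i] = running;
  }
  return adjusted;
}
```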
2.7 Determinism and reproducibility
All analyses are deterministic — no random sampling or seeded randomness. Re-running the analysis scripts on the same dataset produces numerically identical outputs. Code, dataset, and complete reproducibility instructions are at github.com/respectarium/agent-adoption-research.
3. Findings
3.1 Two signals graduate to PROMOTE_SCORED
Of the 50 evaluated predictors, only two pass all four pre-registered thresholds: cloudflare.level and respectarium.markdown-negotiation. Univariate ρ is computed against each predictor's best outcome (claudeRank for cloudflare.level; avgRank for respectarium.markdown-negotiation), and the multivariate β is reported for the best-by-|t| outcome.
Both effects are small to medium in magnitude. Neither is a "silver bullet" predictor. Their value is in being the only signals that survive every check our pre-spec required.
Sites with cleaner basic crawler hygiene (Cloudflare's aggregate level reflects robots.txt quality, AI-bot rules presence, sitemap availability) are associated with modestly better LLM visibility outcomes, after controlling for category. Sites that respond appropriately to Accept: text/markdown content-negotiation requests show similar modest improvement. These two signals capture different layers of agent-readiness — protocol-level clarity vs content-presentation flexibility — and both are confirmed predictively useful.
Multivariate estimates come from OLS regression with category fixed effects; a 95% CI excluding zero is one of the four pre-registration criteria for PROMOTE_SCORED. β coefficients are expressed in rank positions for rank outcomes and in points for claiScore. Source: results/04-multivariate.json (study-2026-04); each row reports the predictor's best-by-|t| outcome.
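For intuition about what the markdown-negotiation check observes, a content-negotiation probe in the spirit of respectarium.markdown-negotiation could look like the sketch below. This is illustrative only; the check's actual pass criteria are defined by the Agent-Adoption Specification V1.

```ts
// Probe whether a site honors Accept: text/markdown content negotiation.
// Illustrative sketch; not the Respectarium scanner's actual check logic.
async function negotiatesMarkdown(url: string): Promise<boolean> {
  const res = await fetch(url, {
    headers: { Accept: "text/markdown" },
    redirect: "follow",
  });
  const contentType = res.headers.get("content-type") ?? "";
  return res.ok && contentType.includes("text/markdown");
}
```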
3.2 Effect sizes are small to medium across the board
No effect sizes reach the conventional "large" threshold (Cohen's d > 0.8) anywhere in the study. The narrative this supports is "agent-readiness is a real but small contributor to LLM visibility" — not the stronger claim that adoption produces dramatic visibility gains.
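For reference, the d values quoted throughout use the textbook pooled-standard-deviation formula; a minimal sketch:

```ts
// Cohen's d for two groups, using the pooled standard deviation.
function cohensD(a: number[], b: number[]): number {
  const mean = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / xs.length;
  const variance = (xs: number[]) => {
    const mu = mean(xs);
    return xs.reduce((s, x) => s + (x - mu) ** 2, 0) / (xs.length - 1);
  };
  const pooled = Math.sqrt(
    ((a.length - 1) * variance(a) + (b.length - 1) * variance(b)) /
      (a.length + b.length - 2),
  );
  return (mean(a) - mean(b)) / pooled;
}
```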
3.3 Claude vs GPT structurally reverse on 4 signals
This is the most striking structural finding. For four respectarium.* checks, Claude and GPT correlate in opposite directions with whether the LLM lists the brand: in the predictor → being-listed-by-LLM analysis, the same predictor carries opposite signs for the two LLMs, and all four pairs are FDR-significant (adjusted p < 0.05) in both directions. Source: results/02-per-llm.json, binary strategy (study-2026-04).
These are not noise. The directions are statistically significant in both columns simultaneously, on the same predictors against the same outcome family. Claude and GPT select brands for their listings using structurally different criteria.
Note that all four reversal signals are Respectarium-scanner predictors. The same conceptual checks as measured by Cloudflare or Fern do not produce the same reversal pattern, likely because the three scanners operationalize these checks differently (see §3.4 on cross-scanner divergence). The reversal phenomenon may be partially scanner-implementation-specific.
Plausible mechanisms include:
- Training-data recall hypothesis: GPT's training data is comparatively older. Established household-name brands are recognized by GPT from training without active crawling — and these brands disproportionately have basic web hygiene like robots.txt and sitemaps. Claude's training is comparatively more recent, putting more weight on directly-readable site signals.
- Crawler-policy hypothesis: GPT's crawlers may be blocked by sites with aggressive robots.txt configurations. Sites without robots.txt are implicitly permissive. The negative direction in GPT could reflect this.
- Selection bias hypothesis: When asked "give me top brands in category X," each LLM applies its own implicit selection function. The four reversal-signals correlate with category structure differently across LLMs.
We cannot disambiguate these mechanisms with this dataset. What we can say is that a single universal agent-readiness score that optimizes outcomes across all three LLMs is structurally constrained — improvements that benefit Claude listing actively hurt GPT listing on these four checks. Universal optimization is unreachable; weighted compromises remain possible but cannot resolve the underlying reversal.
3.4 Cross-scanner divergence: same names, different things
Of 11 cross-scanner pairs of identically named checks, three produce essentially uncorrelated results (ρ < 0.05): despite the shared names, they measure different things. The remaining shared-name pairs correlate only moderately. Source: results/06-redundancy.json (study-2026-04), where all 11 same-name pairs are ranked.
The three independent scanners agree on which brand is which (they all use domain as join key) but they often disagree on which brands satisfy a given check, even when the check has the same name. The "Respectarium / Cloudflare / Fern" labels are not interchangeable measurement instruments — they are different operationalizations of overlapping concepts.
This is itself a publishable finding for the agent-readiness research community: claims of the form "this site is agent-ready by Specification X" are scanner-implementation-specific. Any cross-scanner comparison must be made with the divergence explicitly acknowledged.
3.5 Twenty of 66 per-check signals have <5% adoption — the ecosystem is in its infancy
The variance filter (§2.5) excludes 20 of the 66 per-check signals because <5% of brands have anything other than the modal status. (These 20 are part of the 22 predictors excluded before evaluation — the remaining 2 exclusions are aggregate or derived predictors filtered by additional pre-registered data-quality criteria.) The excluded set is almost entirely the bleeding-edge agent-protocol family:
- Cloudflare exclusions: mcpServerCard, oauthProtectedResource, oauthDiscovery, agentSkills, contentSignals, webBotAuth, a2aAgentCard
- Respectarium exclusions: llms-txt-exists, llms-txt-valid, llms-txt-size, agents-md-detection, mcp-server-card, agent-skills, web-bot-auth, a2a-agent-card, content-signals, api-catalog, oauth-protected-resource
- Fern exclusions: llms-txt-directive, tabbed-content-serialization
These checks could not be evaluated for predictive power — too few brands have implemented them.
Source: results/00-data-quality.json (study-2026-04).
Practical adoption of these specifications, as of 2026-04, ranges from approximately 0% to 4% in the surveyed brand universe. The specifications exist — agent-protocol families like MCP, A2A, OAuth-discovery, AGENTS.md, and Cloudflare's commerce-protocol stack (x402, mpp, ucp, acp, ap2) are public and documented. The practice has not yet arrived.
This is publishable on its own merit. We cannot measure the predictive power of these checks until adoption rises high enough to produce variance against outcomes. The next study cycle (planned Q2 2026) will re-measure this universe; the comparison of "X% adoption now vs Y% adoption six months ago" becomes its own data point.
3.6 Bot-protection: meaningful covariate, not standalone signal
12% of brands (109 of 908) had at least one scanner blocked by the target site's bot-protection. Welch's t-tests comparing outcomes between bot-protected and unblocked brands detect no significant differences: standalone, bot-protection has zero detectable effect on outcomes.
In multivariate regression with category fixed effects, however, bot_protected emerges as a meaningful covariate: β = -13 claiScore points, p < 0.02. The within-category negative effect is masked at the across-category level because bot-protection is concentrated in enterprise/incumbent categories that have higher baseline claiScore overall.
bot_protected is best treated as a covariate to control for in multivariate models, not as a scored signal in its own right.
A specific per-LLM finding worth flagging: GPT listing is positively correlated with bot-protection (+8.1 percentage point listing rate for bot-protected brands vs unblocked brands). Claude (-2.1pp) and Gemini (-0.4pp) show no such effect. The pattern is consistent with GPT preferentially listing established brands recognized from training data, even when those brands' websites cannot be crawled directly.
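For reference, the standalone comparison above uses Welch's t-test. A minimal sketch of the statistic and the Welch–Satterthwaite degrees of freedom (a p-value additionally requires a t-distribution CDF, omitted here):

```ts
// Welch's t statistic for two groups with unequal variances.
function welchT(a: number[], b: number[]): { t: number; df: number } {
  const mean = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / xs.length;
  const variance = (xs: number[]) => {
    const mu = mean(xs);
    return xs.reduce((s, x) => s + (x - mu) ** 2, 0) / (xs.length - 1);
  };
  const va = variance(a) / a.length; // per-group variance of the mean
  const vb = variance(b) / b.length;
  const t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
  // Welch–Satterthwaite approximation for degrees of freedom.
  const df = (va + vb) ** 2 /
    (va ** 2 / (a.length - 1) + vb ** 2 / (b.length - 1));
  return { t, df };
}
```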
3.7 The Respectarium scanner's score aggregate has zero predictive power
We report this finding transparently as part of our conflict-of-interest commitment (§5).
respectarium.score (the headline 0–100 number the Respectarium Agent-Adoption Check tool produces for each brand) has the lowest predictive power of any aggregate predictor measured:
- Mean |ρ| across 5 outcomes: 0.016
- FDR-adjusted p (vs avgRank, the best-performing outcome): 0.69
- Multivariate t-statistic (best across outcomes): 1.84, raw p = 0.07, CI spans zero
For comparison, cloudflare.level (the analogous Cloudflare aggregate) shows mean |ρ| = 0.109 — almost an order of magnitude stronger. The Respectarium scanner's individual checks include strong predictors (markdown-negotiation graduates to PROMOTE_SCORED), but the v1 weighting scheme that combines them into the score aggregate dilutes the signal.
The intended use of this finding is informative: a v2 scoring scheme rebuilt around the empirically-validated signals should produce a more predictive aggregate. We publish the weak v1 result rather than concealing it.
4. Caveats and limitations
4.1 Selection effect at the brand-recruitment level
The most important methodological caveat. Every brand in the dataset entered by virtue of being LLM-mentioned in at least one category for at least one of the three LLMs. Findings characterize:
- Relative ranking among already-LLM-discovered brands — what predicts higher rank when an LLM does mention you
- NOT LLM-mention probability — we cannot estimate whether agent-readiness causes a previously-unmentioned brand to become LLM-mentioned
This is structural to the dataset construction. Resolving it would require collecting outcome data on a broader, non-LLM-mentioned brand pool — a substantial expansion that future studies may attempt.
4.2 Cross-sectional only
Single snapshot, 2026-04-25 / 26. We cannot make causal inferences about whether adopting agent-readiness signals causes higher LLM rank, only whether the two are correlated at this point in time.
Quarterly re-runs over the next 4–8 quarters will permit panel analysis and substantially stronger inference. The Q2 study tag (study-2026-07) will be the first time-series step.
4.3 Sample size n = 908 (target was 1500)
Adequate for univariate and per-LLM (binary-outcome) analyses; per-category breakouts are more demanding (|ρ| ≥ 0.45 is needed for raw p < 0.05 at n = 20). Sample expansion to ~1500 brands is planned for the next study cycle, primarily via category expansion (50 → 80–100 categories).
4.4 No brand-size proxy
The Respectarium leaderboard data does not include employee count, revenue, domain age, or other size-related metadata. Multivariate regression uses only category fixed effects as a control. This is a meaningful gap — brand size is plausibly a strong confound for LLM visibility (larger brands are more frequently mentioned in training data) and we cannot control for it directly.
Possible Q2 enhancements include WHOIS-based domain-age enrichment as a covariate, and external traffic-tier data where licensing permits.
4.5 Bot-protection asymmetry
The set of brands blocked by Respectarium ≠ the set blocked by Cloudflare ≠ the set blocked by Fern. Different scanner fingerprints trip different bot-defenses, so bot_protected reflects "this scanner's fingerprint was blocked" rather than "this site is universally bot-protected."
The bot_protected covariate is the union of scanner-block events: a brand is flagged as bot-protected if any of the three scanners was blocked, and the 12% rate reflects this union, not universal blocking. A brand could be bot_protected: 1 because Cloudflare blocked it while Fern and Respectarium succeeded — independent implementations would therefore produce different bot-protected sets and could observe different effects.
4.6 The 20 zero-variance checks cannot be evaluated
We cannot say agent-protocol checks (MCP, A2A, OAuth, AGENTS.md, content-signals, commerce protocols) do not predict LLM visibility — only that adoption is currently too sparse to measure their predictive power. Their effect on outcomes becomes measurable only as adoption rises.
4.7 Three-LLM scope
Only Claude, GPT, and Gemini are surveyed. Other AI assistants (Perplexity, Copilot, Gemini-AI-Mode, etc.) may behave differently. The study's findings should not be generalized to LLMs outside the surveyed three without re-measurement.
5. Conflict of interest disclosure
Respectarium operates one of the three scanners evaluated in this study. To mitigate analytical bias:
- Pre-registered analytical thresholds. All promotion / drop / informational rules were committed in writing on 2026-04-24, two days before any results were viewed. The threshold logic is implemented mechanically in scripts/10-verdicts.ts — auditable.
- Findings unfavorable to the Respectarium scanner are reported transparently. §3.7 documents that the Respectarium scanner's score aggregate has zero predictive power for LLM-visibility outcomes. This finding is published rather than concealed; it is a key input to v2 spec design.
- Per-scanner reporting throughout. Readers can examine each scanner's signals independently. The Respectarium scanner does not receive special treatment in any tabulation.
- Cross-scanner divergence on shared check names is documented as a primary finding (§3.4), not a footnote. We do not minimize the implication that scanner outputs diverge.
- The Agent-Adoption Specification is open. The Respectarium scanner implements an open spec at respectarium.com/spec/agent-adoption/v1. Independent implementations are invited and would be welcomed as additional comparison data.
6. Reproducibility
All analysis is deterministic and fully reproducible. The complete analytical pipeline is published at:
github.com/respectarium/agent-adoption-research (study tag: study-2026-04)
Contents:
- data/merged.json — the canonical merged dataset (908 rows × ~80 columns when flattened)
- data/merged.csv — flat CSV view of the same data
- scripts/ — 11 TypeScript analysis scripts and 5 typed helper modules
- results/ — canonical outputs from running the scripts on the dataset (matches the output of any peer's reproduction)
- methodology.md — pre-registered thresholds, encoding rules, and conflict-of-interest disclosure
- REPRODUCIBILITY.md — step-by-step reproduction protocol
Reproduction prerequisites: Node.js 22+, ~150 MB disk space, ~1 minute of compute. Numeric outputs are byte-identical between runs (no random seeds, no sampling, deterministic dependencies).
7. Acknowledgments
This study used data from three publicly-available scanner systems:
- Respectarium Agent-Adoption Check tool — closed-source implementation of the open Agent-Adoption Specification V1, available at respectarium.com/agent-adoption-check. Specification published openly at respectarium.com/spec/agent-adoption/v1.
- Fern afdocs CLI — open-source agent-documentation scanner from Fern, published at github.com/fern-api/fern. Used as published.
- Cloudflare isitagentready.com API — public API from Cloudflare. Used as published.
Outcome data was sourced from Respectarium's tracked leaderboards across 50 B2B SaaS categories, captured 2026-04-25.
We thank the broader agent-readiness research and engineering community whose published specifications (W3C, IETF, individual companies' open work) informed the check definitions evaluated here.
8. Citation and license
Citation
Respectarium Research Team. Agent-Adoption Correlation Study — 2026-04 Baseline. Respectarium, 2026-04-26.
Web: https://respectarium.com/research/correlation-2026-04
Source + data: https://github.com/respectarium/agent-adoption-research/releases/tag/study-2026-04
License
This article and its underlying data are licensed under Creative Commons Attribution 4.0 International (CC-BY 4.0). The analysis code is licensed under MIT. Both licenses permit modification and redistribution with appropriate attribution.
Future studies
Quarterly re-measurement is planned. The Q2 study (study-2026-07) will publish to the same repository as a new immutable tag. Past tags remain accessible as published; the comparative time series across tags is itself a research output.
For updates: open an Issue or watch the repository on GitHub.
Respectarium Research, 2026-04-26