The agent tax: 30 optimizations that slash AI token costs

How unoptimized websites silently drain AI agent budgets — and the structured fixes that cut token consumption by 70–99%.

Every website interaction an AI agent can't do efficiently becomes a token tax — and at $3–$25 per million tokens, that tax adds up fast. When a site lacks machine-readable metadata, structured APIs, or proper error handling, agents burn 5–100x more tokens through HTML parsing, trial-and-error discovery, retry loops, and over-fetching. Across the 30 agent-readiness checks audited by BotVisibility, a fully unoptimized site can cost an agent 120,000–500,000+ excess tokens per session — translating to $0.36–$7.50 per interaction on Claude Sonnet 4.6, or up to $12.50 per interaction on Claude Opus 4.6. At scale, these hidden costs compound into thousands of dollars monthly.

The research is unambiguous: structured, agent-friendly sites reduce token consumption by 70–99% across discovery, usability, and optimization layers. Published benchmarks from Anthropic, independent researchers, and production agent deployments confirm that naive agent interactions waste 35–45% of their total token budget on architectural friction — parsing irrelevant HTML, retrying failed requests, and reasoning about missing information that a simple structured file could have provided in 50 tokens.


Current token pricing sets the stakes

Before examining each optimization, the pricing context establishes why token efficiency matters. Current production rates:

| Model | Input / 1M tokens | Output / 1M tokens | Typical agent role |
| --- | --- | --- | --- |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Primary agent workhorse |
| Claude Opus 4.6 | $5.00 | $25.00 | Complex reasoning tasks |
| GPT-5.4 | $2.50 | $15.00 | General-purpose agent |
| GPT-5.4-mini | $0.75 | $4.50 | High-volume, simple tasks |
| Claude Haiku 4.5 | $1.00 | $5.00 | Classification, routing |
A single wasted token costs fractions of a cent, but agent workflows routinely process 50,000–500,000 tokens per task. A 10-turn research agent accumulates 500,000+ input tokens without context pruning. Browser agents hit 175,000 tokens by step 10 as context compounds quadratically. Published production data shows naive agents spending $12 to process a $50 insurance claim — 47,000 tokens wasted on redundant lookups, re-summarizations, and hallucinated tool calls.

The math is straightforward: at Claude Sonnet 4.6 rates, 10,000 wasted input tokens cost $0.03 and 10,000 wasted output tokens cost $0.15, for up to $0.18 per interaction. Multiply by thousands of daily agent visits and the "agent tax" becomes a material business cost.


L1 — Discovery layer: where agents spend the most finding you

The discovery layer represents the largest potential savings because without structured discovery files, agents must crawl, parse, and reason about raw HTML — the single most token-wasteful activity in the agent workflow. A typical HTML page contains 15,000–30,000 tokens of raw markup, of which only 2,000–3,000 tokens (10–20%) is actual content. The rest is navigation, scripts, styling, and boilerplate.

1. llms.txt — the highest-impact single optimization

Token savings: 10,000–200,000 tokens per site discovery interaction (90–99% reduction)

Without llms.txt, an agent must crawl the homepage (15,000–40,000 tokens of HTML), strip irrelevant markup, follow navigation links to understand the site's structure, and potentially crawl 5–10 additional pages. Total cost: 50,000–200,000+ tokens to build a mental model of what the site offers.

With llms.txt, the agent fetches a single Markdown file — typically 200–1,500 tokens — containing a curated hierarchy of the site's content, capabilities, and documentation links. Fern's documentation platform reports over 90% token reduction compared to HTML pages. The companion llms-full.txt provides complete documentation in a single file, eliminating multi-page crawls entirely.

Over 600 websites have adopted the standard, including Anthropic, Stripe, Cloudflare, Vercel, and Perplexity. Mintlify and GitBook auto-generate llms.txt for hosted documentation. A developer-built MCP scraper measured 70–90% token reduction in production when converting raw HTML (15,000–30,000 tokens per article) to structured Markdown (2,000–3,000 tokens).

Dollar impact: At Claude Sonnet 4.6 rates, saving 50,000 input tokens per discovery = $0.15 per agent visit. At 1,000 agent visits/day = $4,500/month saved.
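As a sketch, here is what a minimal llms.txt might look like for a hypothetical payments service, following the proposed llmstxt.org layout of an H1 title, a blockquote summary, and H2 sections of annotated links (all names and URLs below are invented for illustration):

```markdown
# Acme Payments

> Acme Payments is a REST API for accepting card and bank payments.
> Agents should prefer the documentation links below over marketing pages.

## Docs

- [Quickstart](https://acme.example/docs/quickstart.md): Create a charge in five steps
- [API reference](https://acme.example/docs/api.md): All endpoints, parameters, and schemas

## Optional

- [Changelog](https://acme.example/changelog.md): Release notes, updated weekly
```

The entire file is a few hundred tokens, yet it answers the questions an agent would otherwise burn tens of thousands of tokens crawling to resolve.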

2. Agent Card (agent.json) — structured capability advertising

Token savings: 10,000–50,000 tokens per capability discovery (85–95% reduction)

Part of Google's Agent2Agent (A2A) protocol, the agent card at /.well-known/agent-card.json declares a site's capabilities, authentication requirements, supported skills, and input/output modes in 500–3,000 tokens. Without it, an agent must crawl documentation pages, probe endpoints through trial-and-error, and parse marketing copy to understand what a service can do — easily consuming 10,000–50,000 tokens of HTML parsing and exploratory reasoning.
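A sketch of what such a card might contain for a hypothetical service (field names follow the A2A agent card schema as commonly published; exact required fields may vary by protocol version, and all values here are invented):

```json
{
  "name": "Acme Payments Agent API",
  "description": "Create and query payments on behalf of a user.",
  "url": "https://acme.example/a2a",
  "version": "1.0.0",
  "capabilities": { "streaming": true },
  "defaultInputModes": ["application/json"],
  "defaultOutputModes": ["application/json"],
  "skills": [
    {
      "id": "create-charge",
      "name": "Create charge",
      "description": "Charge a stored payment method for a given amount."
    }
  ]
}
```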

3. OpenAPI spec — eliminates trial-and-error API discovery

Token savings: 50,000–400,000 tokens per API integration (60–90% reduction)

A published OpenAPI specification provides deterministic, structured access to every endpoint, parameter, and schema. A medium-complexity API spec (20–50 endpoints) runs 10,000–30,000 tokens — expensive, but far cheaper than the alternative.

Without an OpenAPI spec, an agent must crawl API documentation HTML pages (15,000–50,000 tokens each), attempt calls with guessed parameters, parse error responses, and iterate. For a service like Gmail's API, understanding capabilities without a spec could require crawling dozens of doc pages — 200,000–500,000 tokens. Each failed exploratory call costs 500–2,000 tokens in LLM reasoning about the error.
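For contrast, the fragment below sketches how little spec text an agent needs to call one endpoint correctly. The `/charges` resource and its fields are hypothetical:

```yaml
openapi: 3.0.3
info: { title: Acme Payments, version: "1.0" }
paths:
  /charges:
    post:
      summary: Create a charge
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [amount, currency]
              properties:
                amount: { type: integer, description: Amount in minor currency units }
                currency: { type: string, example: usd }
      responses:
        "201": { description: Charge created }
        "402": { description: Payment failed }
```

Roughly a hundred tokens of spec replaces the exploratory calls and error parsing that guessing the same schema would cost.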

4. robots.txt AI policy — prevents wasted blocked requests

Token savings: 200–500 tokens per blocked request avoided; 2,000–10,000 tokens per crawl session

A robots.txt with AI-specific directives (for GPTBot, ClaudeBot, PerplexityBot, and others) immediately tells agents which paths are accessible. 35.7% of top-1,000 websites now block GPTBot specifically. Without clear AI directives, agents attempt to access restricted content, receive 403 errors, and must parse error pages and reason about access policies — each blocked attempt wasting 200–500 tokens. An agent hitting 10 blocked pages wastes 2,000–5,000 tokens before understanding the access boundary.
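A sketch of AI-specific directives (the bot names are real crawler user-agents; the paths are illustrative):

```text
User-agent: GPTBot
Allow: /docs/
Disallow: /account/

User-agent: ClaudeBot
Allow: /docs/
Disallow: /account/

User-agent: PerplexityBot
Disallow: /checkout/

User-agent: *
Disallow: /admin/
```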

5. Documentation accessibility — auth-gated docs multiply costs 3–10x

Token savings: 5,000–50,000 tokens per documentation interaction

When developer documentation requires authentication, agents face a cascade of token costs: reasoning about login flows, attempting authentication, handling redirects, and potentially failing entirely, then falling back to trial-and-error endpoint probing. That fallback is what multiplies costs 3–10x. Publicly accessible documentation lets agents consume structured API information directly, parsing clean doc pages at 2,000–5,000 tokens versus exploring APIs blind at 20,000–100,000+ tokens.

6. CORS headers — unlock direct API access for browser agents

Token savings: 5,000–25,000 tokens per browser-based interaction

Browser-based agents (ChatGPT web browsing, GPT Actions, browser extensions) are subject to same-origin policy. Without proper CORS headers, these agents cannot access cross-origin API data at all, forcing them to fall back to parsing full HTML pages (15,000–30,000 tokens) instead of receiving structured JSON responses (500–2,000 tokens). Properly configured CORS headers enable a 10–30x token reduction by allowing direct API access from browser contexts.
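A minimal permissive-read configuration might emit headers like these on API responses (whether `*` is appropriate depends on whether the API serves public data):

```text
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, OPTIONS
Access-Control-Allow-Headers: Authorization, Content-Type
Access-Control-Max-Age: 86400
```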

7. AI meta tags — micro-optimization with outsized routing value

Token savings: 10,000–30,000 tokens per page interaction (when present)

HTML meta tags like llms:description, llms:url, and llms:instructions consume only 20–80 tokens per page but provide critical routing information. The llms:instructions tag is particularly powerful — telling the agent "Product specs are in the .specs class" or "Ignore the sidebar" eliminates exploratory parsing of the full 15,000–30,000 token HTML page.
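A sketch of these tags on a hypothetical product page (the `llms:*` names are the convention this article describes rather than a ratified standard, and the values are invented):

```html
<meta name="llms:description" content="Acme X200 laptop product page: specs, pricing, availability.">
<meta name="llms:url" content="https://acme.example/products/x200.md">
<meta name="llms:instructions" content="Product specs are in the .specs class; ignore the sidebar and footer.">
```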

8. Skill file (SKILL.md) — progressive disclosure saves 95–99%

Token savings: 50,000–200,000 tokens per multi-skill interaction (95–99% reduction)

Originally developed by Anthropic (December 2025), SKILL.md uses a three-stage progressive disclosure model: advertise (~100 tokens per skill showing only name + description), activate (1,000–5,000 tokens when a matching task is detected), and reference (additional files loaded on demand). Without skill files, an agent must load entire documentation sets — 50,000–200,000 tokens — to understand how to use a product.

Adoption is strong: Claude Code, OpenAI Codex CLI, GitHub Copilot, Cursor, Windsurf, and Gemini CLI all support SKILL.md. Over 500,000 skills are indexed on the SkillsMP.com marketplace.
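A sketch of the advertise stage: the YAML frontmatter below is all an agent loads until the skill matches a task, at which point the body and any referenced files are pulled in. The skill name, paths, and flow are invented for illustration:

```markdown
---
name: acme-refunds
description: Issue and track refunds through the Acme Payments API. Use when a user asks to reverse a charge.
---

# Acme refunds

Load `reference/refund-api.md` for endpoint details. Typical flow:
1. Look up the charge by ID.
2. POST /refunds with the charge ID and amount.
3. Poll the refund status until it reaches `succeeded`.
```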

9. AI site profile (ai.json) — capability manifest in 1,000–5,000 tokens

Token savings: 5,000–40,000 tokens per integration setup

A JSON manifest describing the site's name, capabilities, and skill links provides agents with a machine-readable integration map. At 1,000–5,000 tokens, it replaces the need for agents to reason about site capabilities from unstructured HTML (10,000–50,000+ tokens).

10. Skills index (skills/index.json) — catalog without loading

Token savings: 5,000–50,000 tokens per skill discovery session

A centralized JSON index listing all available agent skills with names, descriptions, and endpoints enables agents to discover capabilities without loading full skill definitions. At 500–2,000 tokens for a typical index, it replaces the alternative of loading all skill files upfront — which could cost 10,000–50,000+ tokens for a site with 10–20 skills.
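A sketch of such an index for a hypothetical site with two skills (the file layout and field names are illustrative, not a standard):

```json
{
  "skills": [
    {
      "name": "create-charge",
      "description": "Charge a stored payment method.",
      "definition": "/skills/create-charge/SKILL.md"
    },
    {
      "name": "issue-refund",
      "description": "Reverse a settled charge.",
      "definition": "/skills/issue-refund/SKILL.md"
    }
  ]
}
```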

11. Link headers — zero-parse discovery at 10–30 tokens

Token savings: 2,000–5,000 tokens per initial page load

HTTP Link headers and HTML link elements pointing to llms.txt, ai.json, and agent-card.json enable zero-parse discovery — agents get resource locations from response headers without parsing any page content. At 10–30 tokens, these headers eliminate the need to parse the homepage HTML head section (2,000–5,000 tokens).
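A sketch of both forms, an HTTP response header and the equivalent HTML link elements (the `rel` values shown are illustrative conventions, not registered link relations):

```text
Link: </llms.txt>; rel="llms-txt", </.well-known/agent-card.json>; rel="agent-card"

<link rel="llms-txt" href="/llms.txt">
<link rel="agent-card" href="/.well-known/agent-card.json">
```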

12. MCP server — the most powerful but most complex optimization

Token savings: highly variable; 2,000–150,000 tokens depending on implementation quality

An MCP server provides agents with typed, discoverable tool definitions via JSON-RPC — eliminating the need to reverse-engineer API capabilities from documentation. The Scalekit benchmark (March 2026) provides the most rigorous comparison: MCP consumed 4–32x more tokens than CLI for identical GitHub tasks.

The critical nuance is implementation quality. Naive MCP implementations that expose all tools upfront create massive overhead — GitHub's 43-tool MCP server injects ~26,000 tokens per conversation. However, well-implemented MCP with dynamic toolsets achieves dramatic savings. Anthropic's code execution approach reduced tool definition loading from 150,000 tokens to 2,000 tokens — a 98.7% reduction.

The takeaway: having an MCP server is necessary, but its quality (Check #30) determines whether it saves or wastes tokens.


L2 — Usability layer: where agents burn tokens on friction

The usability layer addresses the operational friction that causes agents to waste tokens on authentication complexity, error handling, and retry loops. Research shows naive agents waste 35–45% of their total token budget on architectural failures.

13. API read operations — the gateway from scraping to structured access

Token savings: 5,000–25,000 tokens per data retrieval (80–95% reduction)

When list, get, and search endpoints are available via API, agents receive structured JSON responses (200–2,000 tokens) instead of scraping and parsing full HTML pages (15,000–30,000 tokens). Raw HTML inflates token counts by roughly 3x compared to equivalent structured data.

14. API write operations — 5–20x cheaper than form-filling

Token savings: 2,000–10,000 tokens per write action

An API write operation (structured JSON payload with predictable response) costs 200–500 tokens. The equivalent through UI automation — understanding form structure, filling fields sequentially, handling validation errors, submitting — consumes 5–20x more tokens.

15. API primary action — orders of magnitude difference

Token savings: 10,000–100,000+ tokens when core value is API-accessible

When an application's primary value action is available via API, the interaction is deterministic and typed — costing 500–2,000 tokens. When the same action requires UI automation, agents need visual reasoning, screenshot interpretation, DOM navigation, and multi-step form interactions. Browser agents spend $4 per 10-step workflow and $20+ per 50-step workflow.

16. API key authentication — eliminates multi-step auth tax

Token savings: 2,000–6,000 tokens per session

API key authentication requires one step: set the Authorization header (10–20 tokens). OAuth 2.0 requires 5–7 steps: register client, redirect to auth endpoint, handle consent, exchange authorization code, parse token response, handle token refresh, and use access token.
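The asymmetry is visible in code. Below is a sketch of the single-step path using only Python's standard library; the endpoint and key are hypothetical:

```python
import urllib.request

def authed_request(url: str, api_key: str) -> urllib.request.Request:
    # The entire "auth flow": one header on one request. No redirects,
    # no consent screen, no code exchange, no refresh-token handling.
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {api_key}")
    return req

req = authed_request("https://api.acme.example/v1/charges", "sk_test_123")
print(req.get_header("Authorization"))  # Bearer sk_test_123
```

Every one of the OAuth steps listed above, by contrast, is a round trip the agent must reason about, and each adds its own failure modes and retry costs.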

17. Scoped API keys — fewer errors, fewer recovery loops

Token savings: 1,000–5,000 tokens per session through error prevention

Scoped API keys reduce agent token waste by preventing agents from accidentally attempting dangerous actions, triggering error handling and recovery reasoning.

18. OpenID Configuration — one fetch replaces endpoint guessing

Token savings: 1,000–3,000 tokens per authentication setup

The OIDC discovery document at /.well-known/openid-configuration provides all authentication endpoints, supported scopes, and grant types in a single 200–400 token JSON response.
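The document has a fixed shape defined by the OpenID Connect Discovery specification; a trimmed sketch with a hypothetical issuer:

```json
{
  "issuer": "https://auth.acme.example",
  "authorization_endpoint": "https://auth.acme.example/authorize",
  "token_endpoint": "https://auth.acme.example/token",
  "jwks_uri": "https://auth.acme.example/.well-known/jwks.json",
  "scopes_supported": ["openid", "charges:read", "charges:write"],
  "grant_types_supported": ["authorization_code", "client_credentials"],
  "response_types_supported": ["code"]
}
```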

19. Structured error responses — the retry loop killer

Token savings: 5,000–15,000 tokens per error event

This is one of the highest-impact usability optimizations. When an agent receives a generic HTML error page, it must parse 2,000–10,000 tokens of markup to extract the error. A structured JSON error response costs 30–50 tokens and provides actionable next-step information. AWS/Strands Agents research found that clear terminal states reduce tool calls from 14 to 2 — an 86% reduction.
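A sketch of the cheap side of that comparison: a 30–50 token structured error with a machine-readable code and a concrete next step (the field names follow a common convention rather than a single standard):

```json
{
  "error": {
    "code": "invalid_parameter",
    "message": "currency must be a 3-letter ISO 4217 code",
    "param": "currency",
    "doc_url": "https://acme.example/docs/errors#invalid_parameter"
  }
}
```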

20. Async operations — reduces 24 polls to 2–3 status checks

Token savings: 3,000–5,000 tokens per long-running operation

For operations taking seconds to minutes, agents without async support either block waiting or poll blindly. An agent polling every 5 seconds for a 2-minute operation makes ~24 requests, each adding ~200 tokens — 4,800 tokens of polling overhead. With a pollable job ID pattern, the agent checks 2–3 times.
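A sketch of the job-ID pattern from the agent's side, with a stub standing in for a real status endpoint:

```python
import itertools

def await_job(check_status, max_checks=24):
    """Poll a job until it reaches a terminal state, trusting the server's
    suggested wait time instead of blind fixed-interval polling."""
    for checks in range(1, max_checks + 1):
        state, retry_after = check_status()
        if state in ("succeeded", "failed"):
            return state, checks
        # A real agent would sleep(retry_after) here; omitted in this sketch.
    return "timeout", max_checks

# Stub server: the job reports "running" twice, then succeeds.
states = itertools.chain(["running", "running"], itertools.repeat("succeeded"))
print(await_job(lambda: (next(states), 45)))  # ('succeeded', 3)
```

Three status checks replace the ~24 blind polls, because the server tells the agent how long to wait rather than forcing it to guess.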

21. Idempotency support — makes retries safe and free

Token savings: 3,000–8,000 tokens per uncertain write operation

When a write request times out without idempotency support, the agent enters a verification loop costing thousands of tokens. With idempotency keys, the agent attaches a UUID and retries confidently — no verification, no duplicate detection, no cleanup reasoning.
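A sketch of why the retry becomes free, with a toy in-memory server standing in for an API that honors an idempotency key:

```python
import uuid

class ToyServer:
    """Stand-in for an API that deduplicates writes by idempotency key."""
    def __init__(self):
        self._seen = {}
        self.charges_created = 0

    def create_charge(self, idempotency_key, amount):
        if idempotency_key in self._seen:
            # Replay the stored response: no duplicate charge, no cleanup.
            return self._seen[idempotency_key]
        self.charges_created += 1
        response = {"id": f"ch_{self.charges_created}", "amount": amount}
        self._seen[idempotency_key] = response
        return response

server = ToyServer()
key = str(uuid.uuid4())
first = server.create_charge(key, 5000)   # original request (reply "lost")
retry = server.create_charge(key, 5000)   # confident retry with the same key
print(first == retry, server.charges_created)  # True 1
```

The retry is a single request with a single header's worth of extra state, instead of a multi-call verification loop.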


L3 — Optimization layer: where every response byte costs tokens

The optimization layer targets per-request efficiency — reducing the token cost of every individual API interaction.

22. Sparse fields — 80–90% reduction per API response

Token savings: 500–5,000 tokens per API response

A typical user profile response with 50 fields costs ~2,000 tokens. With sparse fields, the same request returns only the needed fields for ~200 tokens — a 90% reduction.
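A sketch of the request-side difference, using the `fields` parameter convention popularized by JSON:API (the parameter name and endpoint are illustrative):

```text
GET /users/42                          # full 50-field object, ~2,000 tokens
GET /users/42?fields=id,name,email     # 3 fields, ~200 tokens
```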

23. Cursor pagination — stable, efficient, and 17x faster at depth

Token savings: 5,000–15,000 tokens over large pagination sequences

Offset pagination requires agents to track page numbers and handle duplicates — each adding 100–200 tokens of state management reasoning per page. Cursor pagination simplifies this to ~50 tokens per transition. PostgreSQL benchmarks show cursor pagination delivers 17x better performance with 1 million records.
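A sketch of the agent-side loop: the opaque cursor is the only state carried between requests, and the fake endpoint below stands in for a real API:

```python
def fetch_all(fetch_page):
    """Drain a cursor-paginated endpoint: no offsets to track and no
    duplicate handling — the cursor is the only state the agent carries."""
    items, cursor = [], None
    while True:
        page = fetch_page(cursor)
        items.extend(page["data"])
        cursor = page.get("next_cursor")
        if cursor is None:
            return items

# Simulated endpoint returning 5 records, 2 per page.
records = list(range(5))
def fake_page(cursor):
    start = cursor or 0
    nxt = start + 2 if start + 2 < len(records) else None
    return {"data": records[start:start + 2], "next_cursor": nxt}

print(fetch_all(fake_page))  # [0, 1, 2, 3, 4]
```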

24. Search and filtering — prevents the worst token waste scenario

Token savings: 10,000–100,000+ tokens per filtered query

Without server-side filtering, an agent must fetch all records and filter client-side. For 1,000 records at ~100 tokens each: 100,000 tokens when perhaps only 10 match — a 99% waste rate. Server-side filtering returns only matching records: 1,000 tokens instead of 100,000.
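A sketch of pushing the predicate to the server (the parameter names are illustrative):

```text
GET /orders                                          # 1,000 records, ~100,000 tokens
GET /orders?status=pending&created_after=2026-03-01  # 10 matches, ~1,000 tokens
```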

25. Bulk operations — eliminate per-request overhead at scale

Token savings: 10,000–30,000 tokens per batch of 100 operations

Each individual API call carries 200–500 tokens of fixed overhead. For 100 operations: 30,000 tokens of overhead alone. A single bulk operation reduces this to ~2,000 tokens — a 93% reduction.

26. Rate limit headers — prevent blind retry spirals

Token savings: 8,000–20,000 tokens per rate-limited encounter

Without rate limit headers, agents hit 429 errors and enter exponential backoff. A typical encounter: agent sends 5,000-token request, gets 429, retries — 15,000 tokens wasted. With rate limit headers, the agent checks remaining quota before starting.
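A sketch of the headers an agent can read before deciding whether to send the next request (`Retry-After` is standard HTTP; the `X-RateLimit-*` names are a widespread de facto convention):

```text
HTTP/1.1 200 OK
X-RateLimit-Limit: 600
X-RateLimit-Remaining: 12
X-RateLimit-Reset: 1767225600

HTTP/1.1 429 Too Many Requests
Retry-After: 30
```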

27. Caching headers — 304 responses save 99.6% per cache hit

Token savings: 500–10,000+ tokens per polling cycle

An agent receives a 304 Not Modified response (~50 tokens) instead of the full response (1,000–12,500+ tokens). For 10 polls where the data changes once: 1,450 tokens versus 10,000, an 85% savings.

28. MCP tool quality — the difference between 350,000 wasted tokens and 1,160

Token savings: 10,000–350,000 tokens per conversation

Each MCP tool definition consumes 550–1,400 tokens. GitHub's MCP server at 93 tools loads ~55,000 tokens before conversation begins. Over a 25-turn conversation with 120 tools: 362,000 tokens on schemas alone. The on-demand discovery pattern cuts this to ~1,160 tokens — a 99.7% reduction.


The cumulative agent tax in dollars

Consider a moderately complex agent session:

| Activity | Unoptimized (tokens) | Optimized (tokens) | Savings |
| --- | --- | --- | --- |
| Site discovery (no llms.txt) | 80,000 | 1,000 | 99% |
| API discovery (no OpenAPI) | 100,000 | 15,000 | 85% |
| Authentication (OAuth vs API key) | 5,000 | 20 | 99.6% |
| 5 API calls (no sparse fields) | 10,000 | 1,000 | 90% |
| Error handling (1 error) | 12,000 | 80 | 99% |
| Pagination (100 items) | 15,000 | 5,000 | 67% |
| MCP tool overhead (50 tools) | 55,000 | 2,000 | 96% |
| Total | 277,000 | 24,100 | 91% |

At Claude Sonnet 4.6 rates ($3/MTok input), this single interaction costs $0.83 unoptimized vs. $0.07 optimized — a $0.76 difference. At 1,000 agent interactions per day:

  • Claude Sonnet 4.6: $760/day saved = $22,800/month
  • Claude Opus 4.6: $1,267/day saved = $38,000/month
  • GPT-5.4: $633/day saved = $19,000/month

Conclusion: agent-readiness is a competitive moat

Three key insights:

First, the discovery layer dominates the total agent tax. Checks 1–12 account for roughly 70% of potential token savings. Implementing llms.txt, an OpenAPI spec, and a SKILL.md file delivers more savings than all optimization-layer checks combined.

Second, the agent tax compounds nonlinearly. LLM context accumulates quadratically across turns. A 10-turn session without context pruning can hit 500,000+ tokens. Optimizations that reduce early-session token consumption have outsized downstream effects.

Third, agent readiness is becoming a competitive differentiator. Gartner predicts that by 2027, over 40% of agentic AI projects will be canceled before production — largely due to cost overruns. The 30 BotVisibility checks quantify the difference between a site that welcomes AI agents at $0.07 per interaction and one that taxes them at $0.83 or more. At scale, that gap determines which services agents choose to use.