Your Site Is Agent-Ready. The Agents Aren't.

White Paper


AI agents ignore llms.txt, skill files, and structured APIs — even when they exist. Here's why, what it costs, and six fixes agent builders can adopt today.

A real incident, not a hypothetical

March 31, 2026. An AI agent running Claude Opus was asked to find the top article on BotVisibility.com. The site publishes llms.txt, skill.md, openapi.json, and a full REST API with JSON responses.

The agent ignored all of it.

Instead, it ran a Readability parser against the homepage HTML. Got back a footer with a copyright notice. Hit a 404 on /blog. Needed three requests before stumbling onto the content. One call to /llms.txt would have returned the complete page index, API documentation, and everything needed in a few hundred tokens.

This isn't a BotVisibility.com problem. The site did everything right. The agent did everything wrong.

| Approach | Requests | Tokens consumed | Result |
| --- | --- | --- | --- |
| HTML scraping (what happened) | 3 | ~2,500 | Footer text, a 404, partial article |
| llms.txt check (what should have happened) | 1 | ~400 | Complete site index with API docs |

A 6x token overhead on a simple lookup. Scale that across thousands of agent interactions per day and the waste compounds fast.


Why agents don't look

Four things are working against agent-first discovery right now.

There's no handshake protocol

When a web crawler hits a domain, the first thing it does is check robots.txt. That behavior is hardcoded into every major crawler. No equivalent exists for AI agents. There's no built-in routine that says "before interacting with any domain, check for machine-readable metadata." The agent has to be explicitly told to do it, and most aren't.

Training data teaches human browsing patterns

LLMs learned to interact with the web the way humans do: visit the page, read the content, extract what's useful. llms.txt didn't exist in training data. skill.md didn't exist. The model's instinct is to parse HTML because that's what worked in every example it ever saw.

Tool defaults favor scraping

The tools agents use (Readability, Playwright, Puppeteer, web_fetch) all default to "get the HTML, extract text." The happy path in every agent framework skips discovery and goes straight to content extraction. An agent would have to actively override its tooling to check for structured files first.

It has to be taught

The agent that failed today has an explicit rule in its configuration file: "Before interacting with any web app or service, check llms.txt at the root." Even with that rule written down, the agent didn't follow it. Rules in config files compete with trained behavior, and trained behavior usually wins unless the rule is embedded deeply enough into the agent's workflow.


The cost of not looking

BotVisibility's own research in The Agent Tax found that fully unoptimized sites can cost agents 120,000-500,000+ excess tokens per session. But here's the uncomfortable follow-up: even fully optimized sites eat unnecessary tokens if the agent never checks for the optimization.

At Claude Sonnet rates ($3/$15 per million tokens), a site getting 10,000 agent visits daily would generate roughly $75-150 in unnecessary token costs across the agents hitting it. The agents pay that cost, but the site bears the consequence: slower responses, higher error rates, and agents that give up before finding the right content.
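One way that estimate can pencil out, assuming the per-visit overhead from the incident table above (~2,500 vs ~400 tokens) and treating the output-token share as a free parameter:

```python
# Back-of-envelope check of the daily cost estimate.
# Assumes the per-visit overhead from the incident table and
# Claude Sonnet pricing of $3 (input) / $15 (output) per million tokens.
excess_per_visit = 2_500 - 400                     # ~2,100 wasted tokens per visit
visits_per_day = 10_000
excess_daily = excess_per_visit * visits_per_day   # 21,000,000 tokens/day

input_rate, output_rate = 3.0, 15.0                # dollars per million tokens

# Floor: every excess token billed at the input rate.
floor = excess_daily / 1_000_000 * input_rate      # $63/day

def daily_cost(output_share):
    """Daily cost if some fraction of the excess is re-emitted as output
    (summaries, retries) and billed at the higher output rate."""
    inp = excess_daily * (1 - output_share) / 1_000_000 * input_rate
    out = excess_daily * output_share / 1_000_000 * output_rate
    return inp + out
```

With 10-25% of the excess landing on the output side, the daily figure falls inside the quoted $75-150 band.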


Six fixes agent builders can adopt today

The fix isn't complicated. It's behavioral. Here are concrete patterns any agent builder can implement right now.

1. Add a pre-fetch discovery step

Before any web interaction, the agent should run a lightweight discovery check. This belongs in the agent's system prompt, AGENTS.md, or equivalent configuration:

Before fetching content from any new domain:
1. Check {domain}/llms.txt
2. Check {domain}/.well-known/agent-card.json
3. Check {domain}/skill.md
4. If any exist, use their structured data instead of parsing HTML
5. Fall back to HTML extraction only if no agent-friendly files are found

This adds one HTTP request to the first interaction with any domain. If the file exists, every subsequent interaction is cheaper. If it doesn't, the agent falls back to normal behavior. The downside is near zero.
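The checklist above can be sketched in a few lines of Python. This is an illustration, not a standard: the probe order mirrors the list, while `requests` and the five-second timeout are implementation choices.

```python
import requests

# Well-known locations to probe, in priority order (mirrors the checklist above).
DISCOVERY_PATHS = ["/llms.txt", "/.well-known/agent-card.json", "/skill.md"]

def discovery_urls(domain):
    """Candidate metadata URLs for a domain, highest priority first."""
    return [f"https://{domain}{path}" for path in DISCOVERY_PATHS]

def discover(domain, timeout=5):
    """Return (url, body) of the first agent-friendly file found, else None."""
    for url in discovery_urls(domain):
        try:
            resp = requests.get(url, timeout=timeout)
        except requests.RequestException:
            continue  # a failed probe shouldn't abort discovery
        if resp.status_code == 200:
            return url, resp.text
    return None  # nothing found: caller falls back to HTML extraction
```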

2. Cache discovery results per domain

Once an agent knows a domain has llms.txt, it shouldn't check again for hours or days. A simple in-memory or on-disk cache eliminates repeated discovery overhead:

{
  "botvisibility.com": {
    "llms_txt": true,
    "agent_card": false,
    "skill_md": true,
    "openapi": true,
    "cached_at": "2026-03-31T15:00:00Z",
    "ttl_hours": 24
  }
}
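A sketch of that cache as a small Python class. The field names follow the JSON above; the 24-hour TTL is just a default, not a recommendation.

```python
from datetime import datetime, timedelta, timezone

class DiscoveryCache:
    """Remembers which agent-friendly files each domain serves, with a TTL."""

    def __init__(self, ttl_hours=24):
        self.ttl = timedelta(hours=ttl_hours)
        self._entries = {}

    def put(self, domain, **flags):
        # e.g. cache.put("botvisibility.com", llms_txt=True, skill_md=True)
        self._entries[domain] = {"flags": flags,
                                 "cached_at": datetime.now(timezone.utc)}

    def get(self, domain):
        """Return cached flags, or None if missing or expired (re-probe then)."""
        entry = self._entries.get(domain)
        if entry is None:
            return None
        if datetime.now(timezone.utc) - entry["cached_at"] > self.ttl:
            del self._entries[domain]  # stale: force a fresh discovery pass
            return None
        return entry["flags"]
```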

3. Prefer structured endpoints over HTML

When llms.txt or an OpenAPI spec reveals a JSON API, use it. Don't parse the HTML version of a page that has a JSON equivalent. The structured endpoint is cheaper, more reliable, and less likely to break when the site redesigns.

4. Build discovery into the tool layer

This is the highest-leverage fix. Instead of relying on every agent to remember the rule, build it into the tools agents use. A web_fetch function could automatically check for llms.txt before falling back to HTML:

from urllib.parse import urlsplit
import requests

def smart_fetch(url, timeout=5):
    domain = urlsplit(url).netloc

    # Check for agent-friendly metadata first
    try:
        llms = requests.get(f"https://{domain}/llms.txt", timeout=timeout)
        if llms.status_code == 200:
            return {"type": "structured", "content": llms.text}
    except requests.RequestException:
        pass  # discovery failure is non-fatal; fall through to HTML

    # Fall back to HTML extraction (readability_parse stands in for
    # whatever extractor the tool already uses)
    return {"type": "html", "content": readability_parse(url)}

When the tool handles discovery, every agent using that tool gets the benefit without needing explicit instructions.

5. Use HTTP Link headers as hints

Site owners can add Link headers to every HTTP response pointing to their agent-friendly files:

Link: </llms.txt>; rel="ai-metadata"; type="text/plain"
Link: </openapi.json>; rel="service-desc"; type="application/json"

Even a basic fetch gets the hint. Agent frameworks can watch for these headers and redirect to structured sources automatically. This mirrors how browsers discover RSS feeds through <link rel="alternate"> tags.
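Consuming the hint takes only a few lines on the agent side. A minimal hand-rolled sketch (a production framework would likely use an RFC 8288-aware parser instead; the comma split here assumes no commas inside URLs):

```python
import re

def parse_link_header(value):
    """Parse an HTTP Link header value into {rel: target} pairs."""
    links = {}
    # Each entry looks like: </llms.txt>; rel="ai-metadata"; type="text/plain"
    for entry in value.split(","):
        target = re.search(r"<([^>]+)>", entry)
        rel = re.search(r'rel="([^"]+)"', entry)
        if target and rel:
            links[rel.group(1)] = target.group(1)
    return links

def agent_metadata_url(base, link_header):
    """Return the absolute URL of the ai-metadata hint, if the site sent one."""
    path = parse_link_header(link_header).get("ai-metadata")
    return base.rstrip("/") + path if path else None
```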

6. Treat discovery like robots.txt

The reason every crawler checks robots.txt isn't because it's technically elegant. It's because the behavior was built into every major crawler from the beginning. The convention stuck because the tools enforced it.

llms.txt needs the same treatment. Agent framework authors (LangChain, CrewAI, AutoGPT, OpenClaw, and others) should build llms.txt discovery into their default web interaction patterns. Once the major frameworks do it, every agent built on them inherits the behavior.


The adoption path

Getting from "agents don't check" to "agents always check" follows a predictable pattern. It's the same path robots.txt took in the 1990s.

| Phase | Timeline | What happens |
| --- | --- | --- |
| Manual configuration | Now | Individual agents add discovery rules to config files. Works but fragile. |
| Framework-level defaults | Next 6 months | Major frameworks add llms.txt checks to web tools. The tipping point. |
| Model-level awareness | 12-18 months | LLMs trained on post-2025 data instinctively check for agent-friendly files. |
| Protocol standardization | 2+ years | An RFC or W3C recommendation formalizes the discovery protocol. |

What you can do today

If you build agents: Add the pre-fetch discovery step to your agent's configuration. Test it against BotVisibility.com's scanner to see what you're missing.

If you build agent frameworks: Make llms.txt discovery a default behavior in your web tools. Don't make developers opt in. Make it automatic.

If you own a website: Keep publishing llms.txt and structured metadata. The agents will catch up. Add Link headers to speed adoption.

If you want to measure the gap: Run npx botvisibility against your site to see your agent-readiness score, then watch your server logs for how many agents actually request your llms.txt. The difference between your score and your actual agent traffic is the discovery gap.


We've spent a year telling websites to get ready for agents. The websites listened. Now it's time to tell the agents to get ready for websites.

The infrastructure exists. The files are published. The APIs are live. The agents just need to look.