We’re delivering pizzas with Ferraris.
We built the most capable machines in the history of computing, and we point them at the dumbest possible work. A frontier model can reason through a legal brief or write a compiler. We hand it a homepage and ask it to figure out what the company sells by grinding through 40,000 tokens of divs, tracking scripts, and cookie banners.
That’s a Ferrari idling in traffic with a stack of pizza boxes on the passenger seat.
The model providers are fine with it. They bill by the token, so every wasted token is revenue. Nobody at the meter is rooting for efficiency. I made that case in “The $250K Token Illusion,” where NVIDIA’s CEO wants your engineers burning a quarter-million dollars a year in tokens. Waste is the business model.
So it won’t get fixed from the top. It gets fixed at the input.
The model isn’t the expensive part. The input is. Hand a frontier model clean, structured data and it answers in 2,000 tokens. Hand it a wall of HTML and the same answer takes 40,000 tokens, and it still gets half of it wrong. Same engine, same question, 20x the bill. The only variable is what you fed it.
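To put numbers on that, here is a rough back-of-the-envelope sketch. The 2,000 and 40,000 token counts come from the paragraph above; the price per million tokens is a made-up placeholder, since real rates vary by provider and model.

```python
# Hypothetical rate for illustration only; real per-token prices vary.
PRICE_PER_MILLION_TOKENS_USD = 10.00


def request_cost_usd(tokens: int,
                     price_per_million: float = PRICE_PER_MILLION_TOKENS_USD) -> float:
    """Cost in dollars of one request that consumes `tokens` tokens."""
    return tokens / 1_000_000 * price_per_million


structured_tokens = 2_000   # clean, structured input (figure from the text)
raw_html_tokens = 40_000    # wall of HTML (figure from the text)

multiplier = raw_html_tokens / structured_tokens
structured_cost = request_cost_usd(structured_tokens)
raw_html_cost = request_cost_usd(raw_html_tokens)

print(f"multiplier: {multiplier:.0f}x")
print(f"structured: ${structured_cost:.2f} per request")
print(f"raw HTML:   ${raw_html_cost:.2f} per request")
```

Whatever rate you plug in, the multiplier stays the same. The price only decides how much the 20x hurts.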
What wasting tokens looks like
- An agent parsing your HTML for a price that could have been one line of structured data.
- A model re-sorting a table your API could have returned sorted.
- A chatbot answer padded to three paragraphs when the answer was a number.
- Four CRUD calls to do one thing, because nobody shipped the single endpoint that does it.
Every one of those is a Ferrari delivering a pizza. The machine is overqualified. The road is the problem.
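For contrast, here is roughly what the first two items look like when the data arrives structured. This is a minimal sketch: the product payload is invented and only loosely follows schema.org's Product and Offer vocabulary, and the retailer rows are placeholders.

```python
import json

# Hypothetical payload, loosely modeled on schema.org Product/Offer markup.
product_json = """
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example 65-inch TV",
  "offers": {"@type": "Offer", "price": "799.00", "priceCurrency": "USD"}
}
"""

# The price is one lookup, not a crawl through divs and cookie banners.
product = json.loads(product_json)
price = float(product["offers"]["price"])
currency = product["offers"]["priceCurrency"]

# Placeholder rows: sorting happens in ordinary code before the data
# ever reaches a language model.
rows = [
    {"retailer": "A", "price": 849.00},
    {"retailer": "B", "price": 799.00},
    {"retailer": "C", "price": 829.00},
]
rows_sorted = sorted(rows, key=lambda row: row["price"])
cheapest = rows_sorted[0]["retailer"]

print(price, currency, cheapest)
```

Neither step needs a model at all. That is the point: the cheap work gets done by cheap code, and the model only sees the result.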
Now stand where the customer stands
I send my agent to find the best price on a 65-inch TV. It checks six retailers. Five are clean — structured product data, a real API, an llms.txt that points the way. The sixth is a maze. My agent burns $4 in tokens fighting that one site to extract a price it should have read in a sentence.
I paid that $4. Not the store. Me.
The token tax is yours to pay — and yours to pass on.
That’s the part that gets missed. An unoptimized site doesn’t save you money — it charges a token tax, and you pass it straight to your customer. Every clumsy page, every missing spec, every table an agent has to re-sort is a line item on someone else’s bill. You’re subsidizing your own sloppiness with their token budget.
And agents don’t file complaints. An agent that burns tokens fighting your site simply doesn’t come back. It routes around you to a competitor that costs less to read. A service that’s expensive to operate against is a service agents quietly stop using.
Match the output to the input
You don’t get the output you want by buying a bigger model. You get it by handing the model you already have something worth reading.
- Ship an llms.txt so the agent knows what you are without scraping.
- Publish an OpenAPI spec so it can act instead of guess.
- Return structured data so nobody re-sorts a table inside a language model.
- Expose agent-native endpoints so one intent is one call.
Optimize the input and the same Ferrari that was stuck delivering pizzas goes back to doing what it’s built for.
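Two of those items can be sketched in a few lines. Everything here is illustrative: the catalog, the `best_offer` and `render_llms_txt` helpers, and the example.com URLs are all hypothetical, and the llms.txt layout (a title, a one-line summary, a list of links) only loosely follows the public llms.txt proposal.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Offer:
    sku: str
    name: str
    price_usd: float
    in_stock: bool


# Hypothetical in-memory catalog standing in for a real data store.
CATALOG = [
    Offer("tv-65-a", "Example 65-inch TV A", 849.00, True),
    Offer("tv-65-b", "Example 65-inch TV B", 799.00, False),
    Offer("tv-65-c", "Example 65-inch TV C", 829.00, True),
    Offer("tv-55-d", "Example 55-inch TV D", 499.00, True),
]


def best_offer(query: str, catalog: List[Offer] = CATALOG) -> Optional[dict]:
    """One intent, one call: search, keep in-stock items, return the cheapest."""
    matches = [o for o in catalog
               if query.lower() in o.name.lower() and o.in_stock]
    if not matches:
        return None
    cheapest = min(matches, key=lambda o: o.price_usd)
    return {"sku": cheapest.sku, "name": cheapest.name,
            "price_usd": cheapest.price_usd}


def render_llms_txt(site_name: str, summary: str,
                    links: List[Tuple[str, str, str]]) -> str:
    """Build a small llms.txt-style file from (title, url, note) links."""
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Docs", ""]
    lines += [f"- [{title}]({url}): {note}" for title, url, note in links]
    return "\n".join(lines) + "\n"


llms_txt = render_llms_txt(
    "Example Store",
    "Sells TVs. Prices and stock are available as structured data.",
    [("API", "https://example.com/openapi.json", "OpenAPI spec")],
)
print(best_offer("65-inch"))
print(llms_txt)
```

The agent asks one question and gets one structured answer, instead of listing products, fetching each one, checking stock, and sorting the results itself.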
We’re early, and it compounds
This is the very start of agent commerce. Most sites still treat agents like a strange kind of browser. The ones that don’t — the ones that are cheap to read and easy to act on — are quietly winning the agents that decide where their humans spend. The token tax you pass on today is the transaction you don’t get tomorrow.
It’s also why we let agents pay for our work. An agent can buy a BotVisibility report, hand it to your Claude or Codex, and have your web presence updated in an afternoon — the llms.txt, the OpenAPI spec, the structured data, the agent-native endpoints that take you off the token-tax list. Same loop, pointed at fixing instead of fighting. It’s on our Pricing page.
The Ferrari was never the problem.
