The Other Token Budget

Gartner put a number on a fear that’s been hanging over every engineering org for a year: within two years, an enterprise’s AI token bill per developer could meet or exceed that developer’s salary. The analyst behind the forecast, Nitish Tyagi, is already fielding the anecdotes that make it real — a developer who burned through $20,000 in a month, a business user who somehow hit $32,000.

The framing is deliberately alarming, and Tyagi says so plainly: the point is to scare the industry into governing token cost before token cost governs it. Fair enough. But the headline number buries the more useful sentence, and almost every write-up of the report has walked right past it.

Here it is: there is no direct relationship between the tokens a developer burns and the productivity they produce. Tyagi’s phrase for the failure mode is tokenmaxxing — the assumption that more consumption means more output. It doesn’t. What correlates with quality, he argues, is the opposite discipline: optimizing consumption, spending only what the task actually requires.

That single reframe quietly demolishes the panic it’s attached to. If spend and value were coupled, a salary-sized token bill would just be the cost of a salary-sized productivity gain, and there’d be nothing to govern. The reason the number is scary is precisely that the coupling is broken. A large share of that spend is buying nothing.

So the real question isn’t “how do we afford this?” It’s “where is the waste, and who put it there?”

Gartner’s answer stops at your own front door

The report’s prescriptions are sound, and they’re worth doing. Route high-frequency, low-complexity work to smaller models and reserve frontier models for the work that earns them. Put thresholds and escalation policies into the workflow so a runaway agent can’t quietly drain a quarter’s budget. Train developers in context engineering — feeding the model only what’s relevant, summarized tightly, with the noise stripped out. Review your highest-consumption workflows on a cadence the way you’d review a slow query.
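The first two levers, routing and thresholds, are simple enough to sketch. What follows is a minimal illustration, not anything taken from the Gartner report: the tier names, the per-million-token prices, and the `Budget` and `pick_model` helpers are all hypothetical stand-ins for whatever your gateway actually exposes.

```python
from dataclasses import dataclass

# Hypothetical tiers and prices (USD per million input tokens), not real vendor pricing.
PRICE_PER_MTOK = {"small": 0.25, "frontier": 5.00}


@dataclass
class Budget:
    """Tracks spend for one workflow against a hard limit."""
    limit_usd: float
    spent_usd: float = 0.0

    def charge(self, model: str, tokens: int) -> None:
        cost = tokens / 1_000_000 * PRICE_PER_MTOK[model]
        if self.spent_usd + cost > self.limit_usd:
            # Escalate to a human instead of silently spending past the threshold.
            raise RuntimeError(f"budget exceeded: escalate before spending ${cost:.2f}")
        self.spent_usd += cost


def pick_model(complexity: str) -> str:
    """Route low-complexity work to the small tier; reserve the frontier tier."""
    return "frontier" if complexity == "high" else "small"
```

The design choice worth noticing is that the budget check happens before the spend is recorded, so a runaway agent fails loudly at the threshold rather than being discovered on the invoice.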

Every one of these levers points inward. They govern the inputs you author: your prompts, your context windows, your tool definitions, your retrieval pipelines. That’s the half of the token budget that lives inside your own repository, and it’s the half Gartner knows how to discipline.

But context engineering ends at the edge of your prompt. Your agents don’t.

The input you can’t context-engineer

The whole premise of the agentic shift is that models stop waiting for you to paste in context and go get it themselves. An agent checks a competitor’s price. It reads a vendor’s API docs to wire up an integration. It pulls a regulation off a government site, scrapes a product catalog, parses a support page to resolve a ticket. Every one of those round trips spends tokens — and the agent pays to ingest whatever it finds, exactly as it is, with no summarization step in between.

Here’s the uncomfortable part: you didn’t write that content, and you can’t context-engineer it. It belongs to someone else. And on the open web, “as it is” means bloated.

The average site an agent visits is a landfill of markup: wrapper divs nested twelve deep, inline styling, tracking scripts, cookie-consent boilerplate, navigation chrome repeated on every page, dynamically injected junk that exists only to satisfy a layout engine no agent will ever render. By token weight, the actual answer (the price, the parameter, the paragraph the agent came for) is a rounding error. On a typical page, the overwhelming majority of the tokens an agent ingests are structural noise. The signal it needs can be under three percent of what it’s forced to read and pay for.

Your model pays full price for all of it. Input tokens aren’t free, and at agent scale, with thousands of autonomous round trips a day, the volume of other people’s unstructured content flowing through your context windows is enormous — and entirely outside the governance frameworks Gartner is recommending.

This is the Agent Tax.

An unoptimized website is, in effect, billing your token budget for its own negligence. You eat the cost of its bloat every time an agent reads it.

You can route to a smaller model and trim your own prompts all day; none of it touches the dead weight you’re importing from the outside.
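You can get a rough feel for the signal-to-noise problem on any page with nothing but the standard library. The sketch below is an approximation under stated assumptions: it uses character counts as a stand-in for tokens (a real tokenizer will give different numbers), treats all visible text as "signal" (which is generous, since navigation and boilerplate text count too), and the `VisibleText` and `signal_ratio` names are invented for this example.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text that sits outside <script>, <style>, and <noscript>."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside the skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def signal_ratio(html: str) -> float:
    """Visible-text characters divided by total characters (a proxy for tokens)."""
    if not html:
        return 0.0
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.parts)) / len(html)
```

Run it over the raw HTML your agent actually fetched, not the rendered page, since the raw response is what lands in the context window.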

Tokenmaxxing has a supply side

Tyagi’s tokenmaxxing critique aims at the consumer — the developer who reaches for the frontier model and the maximal context out of habit. But there’s a supply side he doesn’t name. Every publisher of unstructured, agent-hostile content is a tokenmaxxer too, just involuntarily, forcing every agent that visits to over-consume on their behalf.

And the incentives don’t self-correct. The model providers bill by the token, so bloat is revenue to them, not a defect. The site owner doesn’t see the agent’s bill, so the cost is invisible on their side. The only party with both the visibility and the incentive to care is you — the enterprise whose budget is actually being drained — and until now you’ve had no way to even measure it, let alone fix it on infrastructure you don’t own.

But two things are shifting the leverage back. First, the same optimization Tyagi prescribes for internal context applies just as cleanly to external content: structured, agent-readable pages cost a fraction of the tokens to parse. A clean machine-readable spec, a published llms.txt, an agent card that states what a site does and how to use it — these collapse a 50,000-token scrape into a few hundred tokens of actual answer. Second, you increasingly get to choose. As agent-native sources emerge, your agents can prefer the supplier whose content is cheap to read over the one whose content sets your budget on fire. Agent-readiness stops being a courtesy the publisher extends and becomes a procurement criterion you enforce.
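To put rough numbers on that gap, here is a back-of-the-envelope calculation. Only the 50,000-token scrape comes from the paragraph above; the 500-token structured read, the 5,000 reads per day, and the $3 per million input tokens are assumptions picked to make the arithmetic easy, and `daily_input_cost` is a hypothetical helper.

```python
def daily_input_cost(tokens_per_read: int, reads_per_day: int, usd_per_mtok: float) -> float:
    """Input-token spend per day for one source, in USD."""
    return tokens_per_read * reads_per_day / 1_000_000 * usd_per_mtok


# Assumed volume and price; only the 50,000-token figure comes from the text.
READS_PER_DAY = 5_000
USD_PER_MTOK = 3.0

scrape = daily_input_cost(50_000, READS_PER_DAY, USD_PER_MTOK)
structured = daily_input_cost(500, READS_PER_DAY, USD_PER_MTOK)

print(f"raw scrape:  ${scrape:.2f}/day")       # raw scrape:  $750.00/day
print(f"structured:  ${structured:.2f}/day")   # structured:  $7.50/day
```

Under these assumptions the same answers cost a hundred times less from the structured source, and that ratio holds whatever price you plug in, because it depends only on the token counts.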

What to actually do about it

Do everything Gartner says — the internal governance is real and overdue. Then extend the same logic one layer out, to the content your agents depend on but didn’t author:

Measure the tax before you argue about the model. Before assuming your bill is a model-pricing problem, find out how much of it is a parsing problem. The waste in external content is often larger and far cheaper to eliminate than another round of prompt-trimming.
Audit the sites your agents actually hit. The vendor docs, the data sources, the partner APIs in your critical workflows. A site that scores well for agent-readiness is one your agents read cheaply and reliably; one that scores badly is a line item you can’t see.
Make readiness a procurement question. When two suppliers offer the same data, the one whose content is structured for agents is materially cheaper to consume at scale. Price that in.
Fix your own house too. If your product is something other people’s agents read — and increasingly it is — your bloat is now their tax, and their tolerance for it is dropping. Agent-readiness is becoming a reason to be chosen.
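The first two steps, measuring and auditing, come down to grouping your agents' fetches by source. A minimal sketch, assuming your agent logs already record, per fetch, the URL, the tokens ingested, and the tokens that ended up in the answer; that log shape and the `waste_by_domain` helper are hypothetical, and how you estimate "tokens used" is up to you.

```python
from collections import defaultdict
from urllib.parse import urlparse


def waste_by_domain(fetches):
    """Rank domains by wasted input tokens.

    fetches: iterable of (url, tokens_ingested, tokens_used) tuples.
    Returns (domain, wasted_tokens, waste_fraction) rows, worst offender first.
    """
    totals = defaultdict(lambda: [0, 0])
    for url, ingested, used in fetches:
        host = urlparse(url).netloc
        totals[host][0] += ingested
        totals[host][1] += used
    report = [
        (host, ingested - used, 1 - used / ingested)
        for host, (ingested, used) in totals.items()
        if ingested  # skip domains with no recorded tokens
    ]
    return sorted(report, key=lambda row: row[1], reverse=True)


# Illustrative log entries using reserved example domains.
log = [
    ("https://docs.example.com/api", 40_000, 800),
    ("https://docs.example.com/auth", 10_000, 200),
    ("https://data.example.org/prices.json", 600, 450),
]
for domain, wasted, fraction in waste_by_domain(log):
    print(f"{domain}: {wasted} wasted tokens ({fraction:.0%})")
```

Sorting by absolute waste rather than by percentage puts the sources that cost the most at the top, which is the order you would want for an audit or a procurement conversation.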

This is the problem we built BotVisibility to make visible. It runs 58 checks across five readiness levels — discoverability, usability, optimization, indexability, and agent-native readiness — against any URL, and hands back a prioritized score in about twelve seconds, no signup. Point it at a site your agents depend on, or at your own, and you’ll see the tax in concrete terms: what an agent has to wade through to get an answer, and how much of that is waste you can remove.

Gartner is right that the tokenmaxxing era is ending. But the moat isn’t a smaller model or a tighter prompt. It’s structured content — yours and the content you choose to consume. The budget nobody’s governing yet is the one being spent on everyone else’s mess. Govern that, and the salary-sized bill stops looking inevitable and starts looking like a choice.