The Cheapest Year of AI You'll Ever Have
Why the token economics you’re building on aren’t real yet — and what happens to your AI strategy when they become real.
Your AI bill isn’t what you think it is
In June 2025, the developers paying $20 a month for Cursor — the AI coding tool that had become an industry darling — woke up to a different deal. Cursor had moved its Pro plan onto usage-based pricing, and people who’d budgeted twenty dollars started seeing overage charges several times that. One team reported a $7,000 annual allowance drained in a single day of ordinary work. The company apologised within days and offered refunds. But the lesson had already landed: the price they’d built their workflow around was never the real price. It was an introductory one.
That isn’t a Cursor problem. It’s the shape of the whole market — and you can read it straight off the frontier labs’ own numbers.
OpenAI, the most-used AI company on earth, spends roughly $1.69 for every dollar of revenue it brings in. It lost something on the order of $9 billion in 2025 against about $13 billion in sales, and it tells its own investors to expect losses growing into the tens of billions — a projected $74 billion operating loss in 2028 alone — before it turns a profit near the end of the decade. Most of that cost is inference: the actual compute burned every time you or your team send a prompt. The company prices that compute below what it costs to serve, on purpose, to win the market before the bill comes due.
So when your finance team looks at a $4,000-a-month API line and calls it the cost of AI, they’re reading a subsidised number. What you’re paying doesn’t cover what the model costs to run. The gap is being funded by investors who expect to be repaid — and the repayment is you, later, at a price set once the land grab is over.
This is the part of the AI conversation business leaders aren’t yet having out in the open. So let’s have it.
The economics you’re building on aren’t economics yet
Every frontier model you can call right now — Anthropic’s Claude line, OpenAI’s GPT line, Google’s Gemini line — is being sold at a price that doesn’t reflect the cost of producing it. The cost is in the GPUs, the data centres, the power, the cooling, the engineers, the training runs, the safety teams. The price is whatever number gets you to keep using it.
That gap is being closed by investors. Not by you.
You can see the shape of it in the capex numbers. The hyperscalers (Microsoft, Google, Amazon, Meta) are spending more on AI infrastructure per quarter than most countries spend on defence in a year. Stargate is a half-trillion-dollar number. Nvidia’s revenue chart looks like a hockey stick. Microsoft writes the cheque, Nvidia ships the chips, OpenAI uses the chips, the money circles back. It is a great party. Somebody has to pay for it.
The investors are patient because they’re not investing in margins. They’re investing in market share. The model that becomes the default tool inside the world’s largest companies before the music stops gets to dictate the price afterwards. That’s the whole game.
And the bait is working. Per-token prices have fallen roughly tenfold per model generation for the last three years. You, and your finance team with you, have been trained to assume that AI gets cheaper. That assumption is now baked into budgets and business cases. It’s also wrong.
It’s cheaper now because someone is choosing to make it cheaper. They’ll choose differently when they’re answerable to a different room.
This story has been told before. Several times.
The 2010s ran the same play in four different industries.
Uber. Subsidised rides from 2009 through to the IPO in May 2019. Investors burned through tens of billions to put a cheap car in front of every commuter. Cities reorganised around it. Commuters stopped owning cars. Within eighteen months of the IPO, average fares on major routes were up thirty to ninety percent, surge multipliers became the default rather than the exception, and driver pay dropped. The people who had restructured their lives around the subsidised price absorbed the difference.
MoviePass. Unlimited cinema for $9.95 a month from 2017. Three million subscribers at peak. Investors burned around three hundred million dollars keeping the offer alive. By 2019 it was rationed, then dead. Customers who’d built their movie-going habit around it had no second tier to fall back to.
WeWork. A decade of subsidised office space, growing occupancy by underpricing flexible-lease real estate. The IPO collapse in 2019 forced rationalisation. Tenants who’d built location strategies on cheap flex space lost the flex, the price, or both.
Twitter’s API. Effectively free or near-free from 2006 to early 2023. An entire ecosystem of dependent apps, research workflows, and academic infrastructure built on it. In early 2023 free access ended, and enterprise-level access was reported to start at around $42,000 a month. Most of the dependent ecosystem died inside six months. The people running it weren’t unsophisticated. They just hadn’t priced the risk of the platform changing its mind.
I can hear the objection already: “Mike, what about AWS?”
Cloud compute got cheaper after maturity, not more expensive. But it got cheaper for two specific reasons. The underlying unit cost of compute kept falling, and the competition stayed live — Azure and Google Cloud kept Amazon honest. Durable cheapness requires both: falling input costs and a competitive field. AI inference might one day look like AWS. But that’s the destination, not the transition. The transition is where the rug pull lives.
The pattern across all of them is the same. Subsidise to capture share. Reach a liquidity event. Extract margin. Every operator who had built their cost structure on the subsidised price discovered, after the fact, that they had been running a workflow they couldn’t afford.
All the labs are pointed at the same exit
OpenAI is still loss-making on inference at scale. Anthropic is raising at numbers that only make sense if you assume they get to set the price in a few years’ time. Google is using Gemini partly as a moat for Search and Workspace, partly as a hedge (it borrowed over $20 billion in early 2026 to pay for it, so it can keep Gemini cheap without needing an IPO the way its rivals do). Microsoft is embedding Copilot into Office not because Copilot pays today but because it’ll be hard to unembed later. xAI is burning money to stay in the conversation.
All of them are signalling some kind of public-market path. The exact form varies — direct IPO, structured offering, partial spin-out — but the direction is the same. Each one is moving towards the moment where a much larger room starts asking when the margin shows up.
The enterprise contracts being written right now reflect this. Land-and-expand pricing. Cheap to start. Generous on commitments. Light on switching-cost language. Designed to make replacement painful inside eighteen months, regardless of what the renewal looks like. The labs aren’t doing anything sinister. They are running the playbook every infrastructure provider has ever run. You are the proof point. After the IPO, you become the cash flow.
What the rug pull looks like in practice
Not one event. A sequence. Each step individually defensible.
Per-token prices on the cheapest models stop falling, then drift up. Rate limits tighten on the lower tiers. The model you’ve been calling “frontier” gets renamed and the new frontier gets gated behind a higher tier. The free tier shrinks, or it stays the same nominal size and quietly degrades. Premium capabilities (long context, deep reasoning, native tool use) get unbundled from the base price and moved into a more expensive tier. Enterprise contracts at renewal carry two-to-four-times increases, because the rebuild-around-something-else clock is longer than the renewal clock.
None of these moves, on its own, looks like a betrayal. They look like the normal behaviour of a maturing platform. Cumulatively, on a workflow that was costed at today’s prices, they double or triple the run-rate cost inside eighteen to thirty-six months. The maths in your business case stops working without anyone in the room ever saying the words “rug pull.”
The trap most teams are walking into
Switching cost is the silent partner in every AI deployment. It builds quietly, before anyone notices it’s being built.
Prompt libraries get written against one model’s quirks — the way it handles instructions, the way it formats output, the way it avoids certain refusals. RAG architectures hard-wire one vendor’s embedding model. Fine-tunes lock you to a specific base. Internal training rolls out around one chat interface. Habit forms around one keyboard shortcut. Three months in, your team isn’t using “AI.” They’re using one specific company’s product, in one specific way.
The cost of switching grows with every workflow. By the time pricing changes, the cost of leaving has overtaken the cost of paying. That isn’t an accident. It’s the design.
This is the same trap that ERP set for the corporate world in the 1990s. Same trap Salesforce set in the 2000s. Same trap AWS set in the 2010s. The mechanism is identical. What’s different this time is the gradient of the subsidy and the speed at which the lock-in forms. The runway from “trying it” to “structurally dependent on it” is measured in months, not years.
What a clear-eyed operator does about it right now
Five moves. None of them require slowing down your AI program. All of them buy you optionality that’s nearly free today and expensive to retrofit later.
The first is to cost the work at three times today’s per-token price. Not as a forecast — as a stress test. Plug the number in, rerun the business case. If it still pays for itself at three-times pricing, you have a real workflow. If it doesn’t, you have a workflow that depends on a subsidy that won’t last. That’s not a reason to kill it. It’s a reason to know.
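For anyone who wants to see the stress test as arithmetic, here is a minimal sketch. Everything in it is an assumption made for illustration: the `Workflow` fields, the two example workflows, and every dollar figure are hypothetical, and a real business case will have more moving parts than three numbers.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    monthly_token_cost: float  # what you pay the vendor today, per month
    monthly_other_cost: float  # people, tooling, hosting: unaffected by token price
    monthly_value: float       # savings or revenue credited to the workflow


def monthly_net(wf: Workflow, price_multiplier: float = 1.0) -> float:
    """Net monthly value if per-token prices were multiplied by price_multiplier."""
    return wf.monthly_value - wf.monthly_other_cost - wf.monthly_token_cost * price_multiplier


def survives_stress_test(wf: Workflow, price_multiplier: float = 3.0) -> bool:
    """True if the workflow still pays for itself at the stressed price."""
    return monthly_net(wf, price_multiplier) > 0


# Hypothetical figures, for illustration only.
triage = Workflow("support-triage", monthly_token_cost=4_000,
                  monthly_other_cost=2_000, monthly_value=20_000)
drafts = Workflow("marketing-drafts", monthly_token_cost=4_000,
                  monthly_other_cost=1_000, monthly_value=9_000)

for wf in (triage, drafts):
    print(wf.name, monthly_net(wf), monthly_net(wf, 3.0), survives_stress_test(wf))
```

With these made-up inputs, both workflows look healthy at today's price, but only the first still clears its costs at three times the token price. The second is the one that depends on the subsidy.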
The second is to track per-workflow token consumption, not just total spend. Total spend is the number your CFO sees and your CFO can’t act on. Per-workflow spend is where the levers are. When prices move, you’ll want to know which three workflows are eating half the bill and which ten are pulling their weight. Build that visibility now, before you need it.
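Per-workflow visibility can start very small. The sketch below assumes you can tag each model call with a workflow name and read the input and output token counts off the response; the `UsageLedger` class, the workflow names, and the prices in `PRICE_PER_MILLION` are all hypothetical placeholders rather than any vendor's real rates.

```python
from collections import defaultdict

# Hypothetical prices in dollars per million tokens; substitute your vendor's rates.
PRICE_PER_MILLION = {"input": 3.00, "output": 15.00}


class UsageLedger:
    """Accumulates token counts per workflow so spend can be broken down later."""

    def __init__(self) -> None:
        self._tokens = defaultdict(lambda: {"input": 0, "output": 0})

    def record(self, workflow: str, input_tokens: int, output_tokens: int) -> None:
        """Add one call's token counts to the running total for a workflow."""
        self._tokens[workflow]["input"] += input_tokens
        self._tokens[workflow]["output"] += output_tokens

    def cost_by_workflow(self) -> dict:
        """Dollar cost per workflow, most expensive first."""
        costs = {
            wf: sum(count * PRICE_PER_MILLION[kind] / 1_000_000
                    for kind, count in totals.items())
            for wf, totals in self._tokens.items()
        }
        return dict(sorted(costs.items(), key=lambda kv: kv[1], reverse=True))


ledger = UsageLedger()
ledger.record("contract-review", input_tokens=2_000_000, output_tokens=400_000)
ledger.record("email-drafts", input_tokens=300_000, output_tokens=100_000)
ledger.record("contract-review", input_tokens=1_000_000, output_tokens=200_000)

for workflow, dollars in ledger.cost_by_workflow().items():
    print(f"{workflow}: ${dollars:,.2f}")
```

In practice the ledger would write to a database or your existing metrics system rather than sit in memory. The point of the sketch is only the shape of the record: workflow name first, tokens second, dollars derived last so that a price change is a one-line edit.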
The third is to put an abstraction layer between your code and the model. Even a thin one. A single switching function. A config file with the model name in it instead of hard-coded calls. The goal isn’t to be model-agnostic in some philosophical sense. The goal is for the act of changing vendor to be a configuration change rather than a project.
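A thin layer really can be this thin. The sketch below is one possible shape, not a recommended library: `register`, `complete`, `CONFIG`, and the provider names are all hypothetical, and the two adapters are stubs that return a tagged string so the example runs without API keys. In real use each adapter would wrap one vendor's SDK call.

```python
from typing import Callable, Dict

# A provider is any function that takes a prompt and returns text.
Provider = Callable[[str], str]

_PROVIDERS: Dict[str, Provider] = {}


def register(name: str) -> Callable[[Provider], Provider]:
    """Decorator that adds a provider function to the registry under `name`."""
    def wrap(fn: Provider) -> Provider:
        _PROVIDERS[name] = fn
        return fn
    return wrap


# Placeholder adapters, stubbed so the sketch runs offline.
@register("vendor_a")
def _vendor_a(prompt: str) -> str:
    return f"[vendor_a] {prompt}"


@register("self_hosted")
def _self_hosted(prompt: str) -> str:
    return f"[self_hosted] {prompt}"


# Stand-in for a config file: the only place a vendor name appears.
CONFIG = {"default": "vendor_a", "workflows": {"contract-review": "self_hosted"}}


def complete(prompt: str, workflow: str = "") -> str:
    """Single switching function: application code calls this, never a vendor SDK."""
    name = CONFIG["workflows"].get(workflow, CONFIG["default"])
    return _PROVIDERS[name](prompt)


print(complete("Summarise this clause."))
print(complete("Summarise this clause.", workflow="contract-review"))
```

Changing vendor for every workflow is then an edit to `CONFIG["default"]`, and moving a single workflow is one entry under `"workflows"`. The layer does not make prompts portable; prompts tuned to one model's quirks still need retesting on another. It only removes the code change from the list of things that have to happen.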
The fourth is to keep at least one live workflow running on a second vendor’s model. Not for redundancy. For knowledge. The day you need to know what it costs to leave is the worst possible day to find out. Run the experiment now, while it’s cheap and the team has slack to do it badly.
The fifth is to renegotiate your enterprise contracts while you still have leverage in the room. The labs need land-and-expand customers right now to prove the market. Use that. Lock in pricing terms for longer than the vendor wants to. Insert caps on renewal increases. Put portability commitments into the contract — not because they’ll honour them perfectly, but because the act of writing them changes what’s normal at renewal time.
Pricing power flips at the IPO. Not before. You have a window. It is measured in quarters, not years.
The sixth move: own the model
Every one of those five moves is a hedge inside the rental model. You’re still renting frontier inference; you’re just renting it more carefully. There’s a sixth posture, and it’s the one the labs would rather you didn’t dwell on: for the workloads that matter most, stop renting and start owning.
Open-weight models — the ones you can download and run on hardware you control — have closed most of the gap to the frontier. You don’t call them over someone else’s API. You hold the weights. And weights don’t get renamed, deprecated, gated behind a higher tier, or repriced at renewal. Once they’re on your infrastructure, the lab no longer has a lever to pull on you, because you stopped being a line in their revenue forecast.
The cost structure flips. Per-token pricing is a variable cost on someone else’s meter, moving in a direction you don’t control. Self-hosted inference is a fixed infrastructure cost on a meter you own. At low volume, renting wins easily, which is why everyone starts there. But at the volume of an embedded, always-on workflow, the maths inverts. The $4,000-a-month workflow from the start of this piece may run on a server you control for a known, flat number that doesn’t double the day the IPO prices.
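The rent-versus-own inversion is a break-even calculation, and it is worth running with your own numbers. The sketch below uses invented inputs throughout: a blended price of $10 per million tokens, a volume chosen so that renting costs the $4,000 a month from the opening example, and a flat $3,000 a month for a self-hosted setup. None of these figures comes from a real quote, and the model ignores the quality gap and the throughput ceiling of the hardware.

```python
def rented_cost(monthly_tokens_m: float, price_per_m: float) -> float:
    """Variable cost: millions of tokens per month times price per million."""
    return monthly_tokens_m * price_per_m


def break_even_tokens_m(fixed_monthly_cost: float, price_per_m: float) -> float:
    """Monthly volume (millions of tokens) above which owning is cheaper than renting."""
    return fixed_monthly_cost / price_per_m


# Hypothetical inputs, for illustration only.
PRICE_PER_M = 10.0      # blended dollars per million tokens, rented
FIXED_OWNED = 3_000.0   # flat monthly cost of self-hosting (hardware, power, staff share)
VOLUME_M = 400.0        # 400M tokens a month, i.e. $4,000 rented at today's price

for multiplier in (1, 2, 3):
    price = PRICE_PER_M * multiplier
    rent = rented_cost(VOLUME_M, price)
    threshold = break_even_tokens_m(FIXED_OWNED, price)
    print(f"price x{multiplier}: rent ${rent:,.0f} vs own ${FIXED_OWNED:,.0f}; "
          f"owning wins above {threshold:,.0f}M tokens a month")
```

The useful output is the threshold, not the headline cost. As the rented price rises, the volume at which owning wins falls, so workflows that were too small to bring in-house at today's price can cross the line without their usage changing at all.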
Then there’s function. When you own the model, nobody silently degrades the model underneath you, retires the version your prompt library was tuned against, or decides your use case now needs a refusal. You fine-tune when you want, on your data, against your quirks. And the data never leaves your environment, which for anyone operating under real regulatory weight (health and safety, workers’ comp, aged care, anything with a privacy regime attached) isn’t a nice-to-have. It’s frequently the difference between a workflow you’re allowed to run and one you’re not.
Be honest about the cost of this, or it isn’t advice. Open models still trail the frontier — call it six to eighteen months behind on the hardest reasoning. You need people who can stand up and keep the infrastructure running; that capability isn’t free and it isn’t instant. And not every workload earns its keep in-house. So this isn’t all-or-nothing. The clear-eyed version is a split: identify the workflows that are stable, high-volume, and sensitive, and bring those onto models you own — while you keep renting the frontier for the genuinely hard, low-volume edge cases where the gap still matters.
The point is leverage. When the sequence in this piece plays out — prices drift up, tiers get gated, renewals carry their two-to-four-times increase — the operator who owns inference for their core workflows doesn’t open a renewal letter with a knot in their stomach. There’s no letter. They already left the room.
The number to run tomorrow
If every per-token cost in your AI stack tripled, would the program still pay for itself?
If you can’t answer that question in an afternoon, you have your answer.
Additional facts
After this piece went out, the research firm SemiAnalysis put a number on the subsidy at the subscription level — the level most teams actually buy at. It bought every Anthropic and OpenAI tier and ran long-horizon coding tasks against each until it hit the usage ceiling, then priced the consumed tokens at the labs’ own API rates.
- A $200-a-month Claude Max plan, run at its limit, consumes up to roughly $8,000 of API-rate tokens a month.
- The equivalent $200-a-month ChatGPT Pro plan: up to roughly $14,000 a month.
- The earlier rule of thumb had a $200 plan worth about $2,000 in tokens. The real ceiling is several times that — the subsidy is larger than even the sceptics assumed.
- These are ceiling figures — what a power user extracts hammering the plan to its limit, not what an average seat pulls. The gap is still the point: whether you buy AI as a per-token API bill or a flat monthly seat, you pay a fraction of what the compute costs to serve. The seat just hides it better.
Source: SemiAnalysis subscription tokenomics analysis, 2026.
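Turned into multiples, the ceiling figures above are easier to compare. This is plain division on the numbers already quoted; the function name `subsidy_multiple` is just a label for the sketch.

```python
def subsidy_multiple(api_value: float, plan_price: float) -> float:
    """Dollars of API-rate tokens consumed per subscription dollar, at the usage ceiling."""
    return api_value / plan_price


PLAN_PRICE = 200  # dollars a month, the tier quoted above

ceilings = {
    "Claude Max": 8_000,
    "ChatGPT Pro": 14_000,
    "earlier rule of thumb": 2_000,
}

for plan, api_value in ceilings.items():
    print(f"{plan}: about {subsidy_multiple(api_value, PLAN_PRICE):.0f}x")
```

That works out to roughly 40 times for Claude Max and 70 times for ChatGPT Pro, against the 10 times the older rule of thumb implied. These are ceiling multiples for a user running the plan flat out, not averages.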