Nobody is watching the AI bill

There is a story circulating in AI circles about a company that allegedly ran up a $500 million invoice from Anthropic in a single month. The charge reportedly came from Claude Code, the agentic coding assistant, running with no spending limits in place. Whether the final number was negotiated down or the story has been embellished in the retelling is beside the point. The point is that it is entirely plausible. And that is the problem.

Most teams deploying AI right now are doing it fast. That is not wrong. Speed matters. But speed without a cost model attached to it is just a way to lose money at scale. And the agencies and product teams I speak to are, more often than not, watching the output and ignoring the meter.

The part nobody budgeted for

When a company runs a SaaS tool, someone in finance knows what it costs per seat. The number is on a contract. It renews annually. There is a line in the spreadsheet.

AI API usage does not work like that. It is consumption-based, it scales with activity, and it compounds when you attach agents to it. An agent that loops, retries, or runs in parallel does not send you a warning. It just runs. Claude, GPT-4o, Gemini: all of them bill per token. A coding agent working through a large codebase can burn through tokens far faster than any human engineer could read the same files.

The math is not complicated. What is complicated is that most deployments happen before anyone has done the math. A developer ships a feature. The feature works. The feature scales. The bill arrives four weeks later and nobody recognises the number.

This is not a technology failure. It is a governance gap.

Speed matters in AI deployment. But speed without a cost model is just a way to lose money at scale.
Max Pinas, founder, Studio Hyra

What agencies get wrong first

In an agency context the risk is specific. You are often deploying on behalf of clients, or building internal AI capability to serve more clients faster. Both situations create the same structural problem: the person who made the deployment decision is not the person who sees the invoice.

I have seen three failure modes repeat themselves.

The first is the prototype that graduated. Someone builds a quick AI feature to show a client. The demo lands well, the client asks to keep it running, and the prototype goes to production without any of the scaffolding a production system needs. No spend caps. No monitoring. No alerting. Just a live API key and optimism.

The second is the agent nobody reined in. Agentic workflows are genuinely useful. They are also genuinely expensive when they go wrong. A loop that calls an LLM ten times per task, running across a few hundred tasks per day, will produce a bill that looks nothing like the cost estimate from the week the agent was scoped.
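One way to rein an agent in is to give every task its own ceiling on calls and tokens, and to stop the loop when either is reached. The sketch below is a minimal illustration, not a production pattern: `TaskBudget`, `BudgetExceeded`, `run_task`, and the `step` callable are all hypothetical names, and `step` stands in for whatever single model call your agent makes.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when a task reaches its call or token ceiling."""


@dataclass
class TaskBudget:
    max_calls: int    # ceiling on model calls for one task
    max_tokens: int   # ceiling on total tokens for one task
    calls: int = 0
    tokens: int = 0


def run_task(step: Callable[[], Tuple[bool, int]], budget: TaskBudget) -> str:
    """Run an agent loop until `step` reports done or the budget is used up.

    `step` is a hypothetical callable that performs one model call and
    returns (done, tokens_used). The budget is checked before each call,
    so the call ceiling is never exceeded; the token ceiling can be
    overshot by at most one call.
    """
    while budget.calls < budget.max_calls and budget.tokens < budget.max_tokens:
        done, tokens_used = step()
        budget.calls += 1
        budget.tokens += tokens_used
        if done:
            return "completed"
    raise BudgetExceeded(
        f"stopped after {budget.calls} calls and {budget.tokens} tokens"
    )
```

The ceiling lives in the loop itself, so a misbehaving task fails loudly instead of running until someone notices the invoice.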

The third is the shared key. One API key used across multiple projects, multiple clients, multiple environments. When the bill arrives, nobody can tell you which project generated which cost. You cannot cut what you cannot see.

The controls are not exotic

None of the fixes here require a dedicated platform team or a six-figure observability contract. They require discipline, and someone whose job it is to care.

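Spend controls exist on every major AI platform. Anthropic, OpenAI, and Google all offer some mix of spend limits, budget alerts, and usage quotas, though which of these actually stop traffic and which only send a notification differs by provider and changes over time. Check which ones are real hard stops, then set them before you deploy, not after the first invoice. If your billing threshold needs to be "no limit" for a prototype, that prototype is not ready to be deployed.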

Separate keys per project, per client, per environment. This sounds obvious. It is not consistently done. One key per deployment means one cost signal per deployment. That is the minimum unit of visibility you need to manage anything.
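In practice, one key per deployment can be as simple as a naming convention for environment variables plus a lookup that refuses to fall back to a shared key. The sketch below assumes a made-up convention, `LLM_API_KEY_<CLIENT>_<PROJECT>_<ENVIRONMENT>`, and a hypothetical helper `api_key_for`; neither comes from any provider's SDK.

```python
import os


def env_var_name(client: str, project: str, environment: str) -> str:
    """Build the variable name for one deployment (assumed convention)."""
    raw = f"LLM_API_KEY_{client}_{project}_{environment}"
    return raw.upper().replace("-", "_")


def api_key_for(client: str, project: str, environment: str) -> str:
    """Return the key for exactly one deployment, or fail loudly.

    There is deliberately no default: a missing key is a configuration
    error, not a reason to reuse another project's key.
    """
    name = env_var_name(client, project, environment)
    key = os.environ.get(name)
    if not key:
        raise KeyError(f"no API key configured in {name}")
    return key
```

A secrets manager would be the more usual home for these values. The point of the sketch is only that each deployment resolves to its own key and therefore its own cost signal.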

Build token usage into your scoping. When you estimate the cost of an AI feature, work backwards from the token count. How many calls per user session? How many sessions per day? What is the average prompt length? What does the model charge per million input and output tokens? These are not hard numbers to find. The providers publish their pricing. The work is to do the multiplication before you ship, not after.
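The multiplication fits in a few lines. The sketch below is illustrative only: `UsageEstimate` and `daily_cost` are hypothetical names, and the prices in the example are placeholders rather than any provider's current rates, so substitute the numbers from the pricing page of the model you actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEstimate:
    calls_per_session: float
    sessions_per_day: float
    input_tokens_per_call: float      # average prompt length
    output_tokens_per_call: float     # average response length
    input_price_per_million: float    # currency units per 1M input tokens
    output_price_per_million: float   # currency units per 1M output tokens


def daily_cost(e: UsageEstimate) -> float:
    """Estimated spend per day for one feature."""
    calls = e.calls_per_session * e.sessions_per_day
    input_cost = calls * e.input_tokens_per_call / 1_000_000 * e.input_price_per_million
    output_cost = calls * e.output_tokens_per_call / 1_000_000 * e.output_price_per_million
    return input_cost + output_cost


# Placeholder numbers, not real pricing: 5 calls per session, 400 sessions
# per day, 2,000 input and 500 output tokens per call.
example = UsageEstimate(
    calls_per_session=5,
    sessions_per_day=400,
    input_tokens_per_call=2_000,
    output_tokens_per_call=500,
    input_price_per_million=3.00,
    output_price_per_million=15.00,
)
# daily_cost(example) comes to 27.0 per day, or 810.0 over a 30-day month.
```

Note that in this example the output tokens cost more than the input tokens despite being a quarter of the volume, which is why input and output are worth estimating separately.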

Log what runs. Whether you are using an orchestration layer like LangChain or LlamaIndex, or a custom setup, make sure token counts and latency are captured at the call level. Aggregate them daily. A cost graph that spikes on a Tuesday tells you something happened on Tuesday. Without the graph, you find out when the invoice arrives.
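Call-level logging with a daily roll-up needs very little machinery. The sketch below assumes you already extract token counts from each provider response (most report usage per call, though field names vary by SDK, so that mapping is left out). `CallRecord` and `daily_totals` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class CallRecord:
    timestamp: datetime   # timezone-aware time of the call
    project: str          # which deployment made the call
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float


def daily_totals(records: Iterable[CallRecord]) -> Dict[Tuple[date, str], Dict[str, int]]:
    """Sum calls and tokens per (UTC date, project)."""
    totals: Dict[Tuple[date, str], Dict[str, int]] = defaultdict(
        lambda: {"calls": 0, "input_tokens": 0, "output_tokens": 0}
    )
    for r in records:
        key = (r.timestamp.astimezone(timezone.utc).date(), r.project)
        totals[key]["calls"] += 1
        totals[key]["input_tokens"] += r.input_tokens
        totals[key]["output_tokens"] += r.output_tokens
    return dict(totals)
```

Keying the totals by project as well as by date is what makes the Tuesday spike attributable to a specific deployment rather than to the account as a whole.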

Who owns this

The honest answer is that right now, often nobody does. AI deployment has outpaced the organisational structures that would normally govern it. In most agencies, there is no AI ops function. There is a developer who is enthusiastic about LLMs and a client who is enthusiastic about the results. That is a fine way to start. It is not a fine way to run.

The role that needs to exist, formally or informally, is someone who asks two questions before anything goes live. First: what does this cost at ten times the expected load? Second: what triggers an alert or a hard stop if that load is reached?
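Both questions can be written down as numbers before launch. The sketch below is one hedged way to do that: `cost_at_load` assumes cost scales linearly with load, which is a simplification, and `spend_status` with its alert and stop thresholds is a hypothetical helper, not a provider feature.

```python
from enum import Enum


class SpendStatus(Enum):
    OK = "ok"
    ALERT = "alert"
    HARD_STOP = "hard_stop"


def cost_at_load(expected_daily_cost: float, multiplier: float = 10.0, days: int = 30) -> float:
    """Question one: projected spend over `days` at `multiplier` times
    the expected load, assuming cost scales linearly with load."""
    return expected_daily_cost * multiplier * days


def spend_status(spent: float, alert_at: float, stop_at: float) -> SpendStatus:
    """Question two: classify current spend against the agreed thresholds."""
    if alert_at > stop_at:
        raise ValueError("alert threshold must not exceed the stop threshold")
    if spent >= stop_at:
        return SpendStatus.HARD_STOP
    if spent >= alert_at:
        return SpendStatus.ALERT
    return SpendStatus.OK
```

If the ten-times figure is a number nobody would sign off on, the stop threshold belongs well below it, and someone needs to own what happens when it trips.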

Those two questions do not require a new hire. They require a decision about who is responsible. In a small agency that might be the technical lead. In a larger one it might be a delivery manager or a principal engineer. The title does not matter. The accountability does.

There is a broader point here about how AI work gets sold and delivered. If you are quoting a fixed fee for a project that includes LLM calls, you are taking on the margin risk of every token that runs. That risk needs to be modelled, capped, and either priced into the engagement or passed through to the client with transparent usage reporting. Neither option is complicated. Both require the conversation to happen before the contract is signed.

The value question

I want to be clear that none of this is an argument against moving fast with AI. The teams doing interesting work right now are the ones who have shipped, learned, and iterated. Caution for its own sake is just slow failure.

But cost discipline is not caution. It is the thing that lets you keep shipping. A team that burns its AI budget in month one on an uncapped prototype cannot run experiments in month three. A client that gets an unexpected invoice does not come back for the next engagement.

The $500 million story, real or embellished, is useful because it is extreme enough to make the point clearly. You do not have to get anywhere near that number for ungoverned AI spend to cause real damage to a project, a client relationship, or a studio's finances.

The tools to prevent it are already in your providers' dashboards. The discipline to use them is a choice. Make it before you deploy.
