The controls are not exotic
None of the fixes here require a dedicated platform team or a six-figure observability contract. They require discipline, and someone whose job it is to care.
Spend limits exist on every major AI platform. Anthropic, OpenAI, and Google all offer some form of spend cap or budget alert at the account or project level, though the mechanics differ, so check which ones actually stop traffic and which only send an email. Set them before you deploy, not after the first invoice. If your billing threshold needs to be "no limit" for a prototype, that prototype is not ready to be deployed.
Separate keys per project, per client, per environment. This sounds obvious. It is not consistently done. One key per deployment means one cost signal per deployment. That is the minimum unit of visibility you need to manage anything.
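One way to make the separation hard to skip is to have the code refuse to run without a deployment-specific key. The sketch below assumes keys live in environment variables; the LLM_API_KEY_ naming scheme and both function names are made up for illustration, not a provider convention.

```python
import os
from typing import Mapping


def key_variable_name(project: str, environment: str) -> str:
    """Build the (hypothetical) variable name for one deployment's key."""
    slug = f"{project}_{environment}".upper().replace("-", "_")
    return f"LLM_API_KEY_{slug}"


def api_key_for(
    project: str,
    environment: str,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the key for exactly one deployment, or fail loudly.

    There is deliberately no fallback to a shared key: a shared key
    would merge two deployments into one cost signal.
    """
    name = key_variable_name(project, environment)
    key = env.get(name)
    if not key:
        raise KeyError(f"{name} is not set; refusing to fall back to a shared key")
    return key
```

Calling api_key_for("acme-chat", "staging") looks up LLM_API_KEY_ACME_CHAT_STAGING and raises if it is missing, which turns a forgotten key into a deploy-time error instead of a billing surprise.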
Build token usage into your scoping. When you estimate the cost of an AI feature, work backwards from the token count. How many calls per user session? How many sessions per day? What is the average prompt length? What does the model charge per million input and output tokens? These are not hard numbers to find. The providers publish their pricing. The work is to do the multiplication before you ship, not after.
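The multiplication fits in a few lines. This is a minimal sketch: the field names are invented for the example, and the traffic figures and prices are placeholders, not any provider's actual rates. Substitute your own estimates and the published pricing for the model you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureEstimate:
    calls_per_session: float
    sessions_per_day: float
    input_tokens_per_call: float   # average prompt length, in tokens
    output_tokens_per_call: float  # average response length, in tokens
    input_price_per_mtok: float    # USD per million input tokens
    output_price_per_mtok: float   # USD per million output tokens


def daily_cost_usd(e: FeatureEstimate) -> float:
    """Estimated spend per day: calls x tokens x price, input plus output."""
    calls = e.calls_per_session * e.sessions_per_day
    input_cost = calls * e.input_tokens_per_call / 1_000_000 * e.input_price_per_mtok
    output_cost = calls * e.output_tokens_per_call / 1_000_000 * e.output_price_per_mtok
    return input_cost + output_cost


# Placeholder numbers for illustration only.
example = FeatureEstimate(
    calls_per_session=4,
    sessions_per_day=2_500,
    input_tokens_per_call=1_500,
    output_tokens_per_call=400,
    input_price_per_mtok=3.00,
    output_price_per_mtok=15.00,
)

print(f"per day:   ${daily_cost_usd(example):,.2f}")
print(f"per month: ${daily_cost_usd(example) * 30:,.2f}")
```

With these placeholder inputs the feature makes 10,000 calls a day, costing $45 in input tokens and $60 in output tokens: $105 a day, or about $3,150 over a 30-day month. Note that output tokens are a quarter of the volume here but more than half of the bill, which is the kind of thing you only see once you write the numbers down.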
Log what runs. If you are using an orchestration layer like LangChain, LlamaIndex, or a custom setup, make sure token counts and latency are captured at the call level. Aggregate them daily. A cost graph that spikes on a Tuesday tells you something happened on Tuesday. Without the graph, you find out when the invoice arrives.
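Call-level capture does not need much machinery. The sketch below is framework-agnostic and makes one assumption worth stating: call_model is a hypothetical callable that returns the response together with its input and output token counts. Real clients and orchestration layers expose usage in their own ways, so adapt that part to whatever yours reports.

```python
import time
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass(frozen=True)
class CallRecord:
    timestamp: datetime
    deployment: str
    input_tokens: int
    output_tokens: int
    latency_ms: float


def timed_call(
    deployment: str,
    call_model: Callable[[], tuple[Any, int, int]],
    log: list[CallRecord],
) -> Any:
    """Run one model call and append its token counts and latency to the log.

    call_model is an assumed wrapper returning
    (response, input_tokens, output_tokens).
    """
    start = time.perf_counter()
    response, input_tokens, output_tokens = call_model()
    latency_ms = (time.perf_counter() - start) * 1000
    log.append(
        CallRecord(
            timestamp=datetime.now(timezone.utc),
            deployment=deployment,
            input_tokens=input_tokens,
            output_tokens=output_tokens,
            latency_ms=latency_ms,
        )
    )
    return response


def aggregate_daily(log: list[CallRecord]) -> dict[tuple[str, str], dict[str, float]]:
    """Sum calls, tokens, and latency per (UTC date, deployment)."""
    totals: dict[tuple[str, str], dict[str, float]] = defaultdict(
        lambda: {"calls": 0, "input_tokens": 0, "output_tokens": 0, "latency_ms_total": 0.0}
    )
    for record in log:
        day = totals[(record.timestamp.date().isoformat(), record.deployment)]
        day["calls"] += 1
        day["input_tokens"] += record.input_tokens
        day["output_tokens"] += record.output_tokens
        day["latency_ms_total"] += record.latency_ms
    return dict(totals)
```

An in-memory list stands in for wherever the records really go: a database table, a log pipeline, a metrics store. The shape matters more than the storage: one record per call, keyed by deployment, rolled up by day, so the Tuesday spike has a name attached to it.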