Technology · 6 min read

When you cut a system prompt by 80 percent, what was that 80 percent doing?

July 4, 2026

Anthropic recently disclosed that they reduced Claude Code's system prompt by roughly 80 percent. The model's behaviour improved.

That single data point is worth sitting with. Not because it's counterintuitive, though it is. But because it gives us a rare primary-source window into how frontier model behaviour is actually shaped, and what most teams get wrong when they try to do the same thing.

At Studio Hyra, we spend a lot of time inside this problem. We build systems where language models take consequential actions: generating content, routing decisions, talking to customers. Getting the model to behave well is the craft. And what Anthropic's disclosure confirms is something we've learned the hard way: most of what people write in a system prompt is not instruction. It's anxiety.

The 80 percent that wasn't working

Here's what typically fills a long system prompt. Rules written in response to one bad output. Edge-case guards that contradict earlier guards. Tone instructions that fight the model's natural register. Prohibitions stacked on top of prohibitions, each one added after someone filed a complaint.

It accumulates the way technical debt does. Nobody designs a 4,000-token system prompt. They inherit one.

The problem is structural. When you add a constraint to stop one behaviour, you rarely test what else it shifts. A rule that says 'never suggest alternatives' might be there because the model once suggested a bad one. But that rule now suppresses genuinely useful suggestions in every other context. You've traded a narrow fix for a broad regression, and you won't notice until a user complains about something completely different.
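One way to make that trade visible is to run the same small set of scenarios with and without the candidate rule and see which checks flip. The sketch below is a minimal illustration, not a description of anyone's actual test setup: `toy_model` is a hypothetical stand-in for a real model call, and `rule_side_effects`, the prompts, and the checks are all assumptions made up for this example.

```python
from typing import Callable, Dict, Tuple

# Hypothetical signature for a model call: (system_prompt, user_message) -> reply.
ModelFn = Callable[[str, str], str]
# A case is a user message plus a check that says whether the reply is acceptable.
Case = Tuple[str, Callable[[str], bool]]


def rule_side_effects(
    model: ModelFn, base_prompt: str, rule: str, cases: Dict[str, Case]
) -> Dict[str, Tuple[bool, bool]]:
    """Return the cases whose pass/fail result changes once `rule` is appended."""
    changed = {}
    for name, (user_msg, check) in cases.items():
        before = check(model(base_prompt, user_msg))
        after = check(model(base_prompt + "\n" + rule, user_msg))
        if before != after:
            changed[name] = (before, after)
    return changed


def toy_model(system_prompt: str, user_msg: str) -> str:
    """Deterministic stub so the sketch runs offline; replace with a real client."""
    if "stock" not in user_msg.lower():
        return "Happy to help."
    answer = "Option A is out of stock."
    if "never suggest alternatives" not in system_prompt.lower():
        answer += " Option B is similar and available."
    return answer


cases = {
    # The rule was never meant to touch this scenario, but it does.
    "out_of_stock": ("Is option A in stock?", lambda out: "Option B" in out),
    "greeting": ("Hello", lambda out: len(out) > 0),
}

side_effects = rule_side_effects(
    toy_model,
    "You help customers pick a product.",
    "Never suggest alternatives.",
    cases,
)
print(side_effects)
```

With a real model the checks would be fuzzier and you would want several samples per case, but the shape is the same: a rule earns its place only after you have looked at what it changes outside the incident that prompted it.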

This is what that 80 percent was doing. Not guiding the model. Confining it. And in the confining, producing the kind of stilted, over-hedged, weirdly reluctant outputs that make people say the model 'doesn't feel right' without being able to name why.

Most of what people write in a system prompt is not instruction. It's anxiety.
Max Pinas, founder, Studio Hyra

Context does what rules can't

What Anthropic replaced those rules with, broadly, is context. Not 'do not do X' but 'here is what you are, here is who you're talking to, here is what success looks like in this situation.'

This distinction matters more than it sounds. Rules operate on surface patterns. Context operates on intent. A model that understands the intent of a situation will handle edge cases that no rule anticipated. A model that only has rules will fail the moment reality drifts slightly outside the scenarios someone foresaw.

Claude's model specification, which Anthropic has made public, is a good illustration of this. It's not a list of banned behaviours. It's a coherent account of values, priorities, and reasoning. The model is given something to think with, not just a set of fences to stay inside. When you read it, you notice how much of it is explanatory. It doesn't just say what the model should do. It says why, and what to do when two good things are in tension.

That approach scales. A rule list doesn't. The world generates new situations faster than anyone can add rules.

What this means if you're building on top of these models

Most teams using models via API treat the system prompt as a control panel. Flip a switch here, add a restriction there. It feels like engineering. It isn't.

A system prompt is more like a brief. The question it should answer is: what does this model need to understand about its situation to make good decisions on its own? Not: what am I afraid it will do?

In practice, that means three things.

Start with identity, not rules. Who is this model in this context? What is it for? What kind of person would find it useful? A clear answer to those questions does more work than a page of restrictions.

Write for the case you haven't thought of. Your edge cases will outnumber your foreseen cases within a month of launch. The only way to handle that is to give the model enough understanding of purpose that it can reason about novel situations. Rules break at the edges. Purpose doesn't.

Audit for anxiety. When you review your system prompt, for each line ask: am I writing this because it helps the model do its job, or because something once went wrong and I'm still nervous about it? Both are valid starting points. Only one of them belongs in the final prompt.

This last one is harder than it sounds. The anxiety lines often feel like the important lines. They're specific, they're concrete, they feel like they're doing something. They usually aren't.
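To make the "brief, not control panel" idea concrete, here is a minimal sketch of a system prompt assembled from the questions above: who the model is, who it is talking to, and what success looks like. The `Brief` class, its field names, and the example wording are all hypothetical, chosen for illustration; the only point is that every field is a statement of context and none is a prohibition.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    """Hypothetical structure for a system prompt written as a brief."""

    identity: str  # who the model is in this context and what it is for
    audience: str  # who it is talking to
    success: str   # what a good outcome looks like

    def render(self) -> str:
        # Plain paragraphs in a fixed order: identity first, then the rest.
        return "\n\n".join([self.identity, self.audience, self.success])


brief = Brief(
    identity="You are the support assistant for a small software product.",
    audience="You are talking to customers who are mid-task and want a direct answer.",
    success="A good reply resolves the question in one message or says plainly what is needed next.",
)

system_prompt = brief.render()
print(system_prompt)
```

The structure matters less than the discipline it forces. If a line doesn't fit any of the three fields, that is a prompt to ask what it is doing there.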

The contrarian point

Here's where I want to push back on one reading of this story.

The lesson is not 'system prompts should be short.' That's the wrong takeaway and it will lead you to a different kind of failure. A model with no context isn't liberated. It's just unmoored. It will fill the vacuum with defaults, and defaults are averages.

The lesson is that length is a symptom, not a cause. A short, well-crafted system prompt is the output of having thought clearly about what you actually need. A long, cluttered one is the output of not having thought clearly and compensating with volume.

Anthropic's 80 percent cut wasn't a minimalism exercise. It was a clarity exercise. They got clearer about what Claude Code is and what it's for, and discovered that most of the previous instructions were either redundant given that clarity, or actively working against it.

That's the process worth borrowing. Not the number.

Length is a symptom, not a cause. A short system prompt is the output of having thought clearly. A long one is the output of compensating with volume.
Max Pinas, founder, Studio Hyra

What we do with this at Studio Hyra

When we audit a system prompt for a client, the first thing we do is categorise every line by its function. Some lines give identity. Some give context about the user. Some describe success. Some describe failure modes. And some exist purely because someone was scared.

Then we ask: could this scary line be replaced by a positive statement of intent? Almost always, the answer is yes. 'Don't give financial advice' becomes 'This assistant helps with product decisions, not financial planning.' The model understands both. But only one of them gives it something to reason with when the conversation goes somewhere unexpected.
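A first pass at that audit can be mechanical. The sketch below flags lines that open with a prohibition so a person can decide whether each one should become a statement of intent. The regular expression is a deliberately crude, assumed heuristic, and `flag_prohibitions` and `prohibition_share` are hypothetical helper names. It surfaces candidates for review. It does not judge them.

```python
import re
from typing import List

# Assumed heuristic: a line that opens with one of these phrases is a candidate for review.
PROHIBITION = re.compile(
    r"^\s*(never|do not|don't|avoid|under no circumstances)\b", re.IGNORECASE
)


def flag_prohibitions(prompt: str) -> List[str]:
    """Return the non-empty prompt lines that start with a prohibition."""
    lines = [ln.strip() for ln in prompt.splitlines() if ln.strip()]
    return [ln for ln in lines if PROHIBITION.match(ln)]


def prohibition_share(prompt: str) -> float:
    """Fraction of non-empty lines that are prohibitions (0.0 for an empty prompt)."""
    lines = [ln for ln in prompt.splitlines() if ln.strip()]
    return len(flag_prohibitions(prompt)) / len(lines) if lines else 0.0


example_prompt = """\
This assistant helps customers choose between our product plans.
Don't give financial advice.
Never suggest alternatives.
A good answer names one plan and explains why it fits.
"""

flagged = flag_prohibitions(example_prompt)
share = prohibition_share(example_prompt)
print(flagged, share)
```

The share is a rough signal, not a score to optimise. Some prohibitions belong in a prompt; the useful part is being made to look at each one and ask what it is for.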

This is the craft of working with frontier models. Not prompting them harder. Understanding what they need to do good work, and giving them that, in as few words as it takes.

Anthropic's disclosure is a useful reminder that even the people who built these models are still learning this. The gap between writing instructions and shaping behaviour is real, and it doesn't close just because you have access to the weights.

If your AI system doesn't behave the way you want, the first question isn't 'what rule am I missing?' It's 'what does the model not understand about its situation?' That reframe, more than any individual technique, is what separates teams that ship working AI products from teams that keep patching prompts.

Start there.
