When AI writes most of the code, judgment becomes the job

Anthropic recently shared a number that stopped a lot of people mid-scroll: Claude now writes more than 80 percent of the production code that ships from their own engineering teams. Their engineers are shipping roughly eight times more per day than they were in 2024. Those are not projections. That is what is happening inside one of the most careful AI labs in the world, right now.

The instinct in most agency and product conversations is to celebrate that number. Faster output. Lower cost per feature. More runway from the same headcount. All of that is real. But the more interesting question is the one nobody is asking at their sprint planning: when the system writing the code is also improving the system that writes the code, who is actually making the decisions?

The loop nobody drew on the whiteboard

There is a specific dynamic here that is worth naming precisely, because the vague version of it produces vague thinking.

A developer using an AI coding assistant is still the author. They set the goal, review the output, push the commit, own the consequence. The model is a very fast, very capable instrument. This is the world most product teams think they are living in.

But when the model is generating the majority of its own infrastructure, its own test suites, and its own tooling, the relationship changes. The engineer is no longer authoring. They are supervising. And supervision at eight times the previous velocity means each review decision carries more weight, not less, because the blast radius of a missed judgment call is larger.

This is not a hypothetical. Anthropic is publicly wrestling with whether self-improving AI warrants a coordinated global development pause. That they are even raising the question tells you the loop is real and the people closest to it are not dismissing it.

Faster output is not the same as better judgment. Someone still has to own the decision. The question is whether your process is designed for that, or just for speed.
Max Pinas, Studio Hyra

What this means for agencies and product teams

Most agencies are not Anthropic. You are not building frontier models. But the same structural question applies at a smaller scale, and it is arriving faster than most teams have noticed.

If your developers are using AI to write most of the code, your actual scarce resource has shifted. It is no longer typing speed or syntax recall. It is taste, judgment, and the ability to catch a plausible-looking wrong answer before it lands in production. Those are not skills that come from prompt engineering courses. They come from years of debugging things that should have worked and did not.

The teams that will be in trouble are the ones that optimised for throughput without investing in the judgment layer. They will ship faster, accumulate invisible debt, and eventually hit a wall where no one on the team can explain why the system behaves the way it does. The teams that will be fine are the ones that treated the speed gain as an opportunity to do harder thinking, not less thinking.

There is also a client conversation that agencies need to get ahead of. When a client asks how long a feature will take, and the honest answer is "two days with AI assistance," the question that follows is: "Then why does it cost what it costs?" The answer is not the typing. It was never the typing. The answer is the scoping, the architecture decision, the choice of what not to build, and the review that catches the model's confident mistake on line 340. If you cannot articulate that, you will lose the argument.

The governance gap nobody is filling

Anthropic's internal debate about pausing self-improving AI development is a governance question at a civilisational scale. Your agency's version of the same question is more modest but structurally identical: who has the authority to say no, and at what point in the process?

Most teams have not answered this. They have adopted AI coding tools because the productivity gains are obvious, and they have assumed that existing code review processes will catch what needs catching. That assumption is under pressure.

Code review was designed for human-paced output. When a pull request contains 800 lines written in 40 minutes, it does not get the depth of review that 800 lines written over two days would get. The reviewer knows less about the reasoning behind each choice, because there was no visible reasoning process to observe. The model does not leave notes about what it considered and rejected.

A few things actually help here. First, shrink the review unit. Smaller PRs reviewed more often beat large PRs reviewed quickly. This requires slowing down the commit cadence, which feels counterintuitive when the whole point was to go faster. Do it anyway. Second, make the engineer who prompted the output explain the approach out loud before the review. Not the code, the approach. If they cannot, the code should not ship. Third, build explicit decision logs for anything architectural. Not a document nobody reads, a short record of what was considered, what was chosen, and why. AI can help write those too, but a human needs to confirm they are accurate.
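To make those three habits concrete, here is a minimal sketch of what a pre-review check could look like. It is an illustration, not a description of any particular team's tooling: the names `DecisionRecord` and `review_blockers` are hypothetical, and the 200-line threshold is an assumption, since the right number depends on your codebase.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date

# Assumed threshold for illustration only; pick a number that fits your team.
MAX_CHANGED_LINES = 200


@dataclass
class DecisionRecord:
    """A short record of one architectural decision (hypothetical shape)."""

    title: str
    considered: list[str]   # options that were on the table
    chosen: str             # the option that was picked
    reason: str             # why it was picked
    confirmed_by: str = ""  # the human who checked the record is accurate
    recorded_on: date = field(default_factory=date.today)

    def is_confirmed(self) -> bool:
        # An AI-drafted record only counts once a named human has confirmed it.
        return bool(self.confirmed_by.strip())


def review_blockers(
    changed_lines: int,
    approach_explained: bool,
    decisions: list[DecisionRecord],
) -> list[str]:
    """Return the reasons a pull request is not ready for review yet."""
    blockers: list[str] = []
    if changed_lines > MAX_CHANGED_LINES:
        blockers.append(
            f"PR changes {changed_lines} lines; split it below {MAX_CHANGED_LINES}."
        )
    if not approach_explained:
        blockers.append("Author has not explained the approach out loud.")
    for record in decisions:
        if not record.is_confirmed():
            blockers.append(f"Decision '{record.title}' lacks human confirmation.")
    return blockers


if __name__ == "__main__":
    draft = DecisionRecord(
        title="Queue vs. cron for report generation",
        considered=["job queue", "cron script"],
        chosen="job queue",
        reason="Retries and visibility matter more than setup cost.",
    )
    for blocker in review_blockers(800, False, [draft]):
        print(blocker)
```

The check is deliberately dumb. It does not judge the code; it only refuses to start a review until the human parts of the process have happened.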

The model does not leave notes about what it considered and rejected. That is the part your review process was never built for.
Max Pinas, Studio Hyra

The question worth asking at your next retrospective

Not "how much are we shipping" but "how well do we understand what we shipped?"

Those two things have always been related. Now they are decoupled. And the teams and studios that notice the gap first are the ones who get to set the terms of how this works, rather than inheriting someone else's answer.

At Studio Hyra, we have been working through this for clients in product and agency contexts over the past year. The pattern we keep seeing is that the bottleneck is never the model's capability. It is the human process around it. The model can write the code. The question is whether your team is structured to own it.

That ownership question does not get solved by a better prompt. It gets solved by being honest about what your team actually understands, building processes that surface confusion early, and treating judgment as the skill worth investing in now that raw output is cheap.

Eighty percent of code written by the model. Eight times the daily shipping rate. Those numbers will keep moving. The judgment layer is the one variable that does not scale automatically.
