Three things that matter in the second half of 2026
After working with production agent systems across several client engagements, I have seen three disciplines separate the systems that hold up from the ones that quietly get switched off.
1. A measurement framework per agent, not per product
The default metric for most shipped AI features is adoption. How many users opened it. How many sessions. How many seats activated. That is a product metric, not an agent metric. It tells you nothing about whether the agent is doing what you built it to do.
Useful metrics live one level down. Task success rate: did the agent complete the assigned task without a human having to intervene or redo it? Tool call accuracy: when the agent reached for a function or an API, did it call the right one with the right arguments? Cost per outcome: not cost per token, not cost per session, but cost per unit of actual value delivered.
These are harder to instrument, especially if your agent was scaffolded quickly for a demo. But they are the only numbers that tell you whether the system is earning its place in the stack. Pick two or three before you go live. You can always add more later. Starting with none means you are flying without instruments.
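To make that concrete, here is a minimal sketch of per-agent instrumentation. Everything in it is illustrative rather than a prescribed schema: the field names and the assumption that each run can be tagged with success, tool-call correctness, cost, and outcomes delivered. The point is that all three metrics fall out of data you can record at run time.

```python
from dataclasses import dataclass

@dataclass
class AgentMetrics:
    tasks_attempted: int = 0
    tasks_succeeded: int = 0      # completed without human rework
    tool_calls: int = 0
    tool_calls_correct: int = 0   # right function, right arguments
    total_cost: float = 0.0       # compute + API spend, in dollars
    outcomes_delivered: int = 0   # units of actual value produced

    def record_run(self, succeeded: bool, tool_results: list[bool],
                   cost: float, outcomes: int) -> None:
        # Called once per agent run, with per-tool-call correctness flags.
        self.tasks_attempted += 1
        self.tasks_succeeded += succeeded
        self.tool_calls += len(tool_results)
        self.tool_calls_correct += sum(tool_results)
        self.total_cost += cost
        self.outcomes_delivered += outcomes

    @property
    def task_success_rate(self) -> float:
        return self.tasks_succeeded / max(self.tasks_attempted, 1)

    @property
    def tool_call_accuracy(self) -> float:
        return self.tool_calls_correct / max(self.tool_calls, 1)

    @property
    def cost_per_outcome(self) -> float:
        return self.total_cost / max(self.outcomes_delivered, 1)
```

In production these counters would live in your telemetry pipeline rather than an in-process object, but the shape of the data is the same.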
2. Identity, audit logs, rollback, and human override are the floor, not the ceiling
Every production agent needs to know who it is acting on behalf of, leave a trace of what it did, be reversible when it acts on bad data, and have a clear path for a human to step in and take over.
Those four things are not a compliance checklist. They are the mechanical properties that make an agentic system safe to run at scale. Without them, a single bad run can corrupt state, charge a customer incorrectly, delete something irreversible, or trigger a downstream process that takes days to unwind.
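Here is one sketch of the four properties sharing a single execution path. The function signature, the approval hook, and the undo callable are assumptions about your system, not a standard API; what matters is that identity, audit, rollback, and override are all wired in before any side effect runs.

```python
import json
import time
import uuid
from typing import Callable, Optional

def audit(entry: dict) -> None:
    # Append-only trail; in practice this goes to durable storage.
    with open("agent_audit.log", "a") as f:
        f.write(json.dumps(entry) + "\n")

def execute_action(
    acting_for: str,                    # identity: whose authority is used
    description: str,
    run: Callable[[], object],          # the side-effecting work
    undo: Callable[[], None],           # rollback if the run goes bad
    approve: Optional[Callable[[str], bool]] = None,  # human override hook
):
    action_id = str(uuid.uuid4())
    if approve is not None and not approve(description):
        status, result = "blocked_by_human", None
    else:
        try:
            result = run()
            status = "ok"
        except Exception as exc:
            undo()                      # reverse the action on failure
            status, result = f"rolled_back: {exc}", None
    audit({"id": action_id, "actor": acting_for, "ts": time.time(),
           "action": description, "status": status})
    return status, result
```

The design choice worth copying is that no action can reach the outside world without passing through this function, so the audit trail is complete by construction rather than by discipline.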
The pushback I usually hear is that adding this infrastructure slows the team down. It does, slightly, the first time. By the third agent it is a two-hour setup because the patterns are already in place. The cost of retrofitting them after an incident is orders of magnitude higher.
This is the same logic that made version control non-negotiable for code. No one argues about it anymore. Audit logs and rollback for agents will get there too. The teams building now who treat these as optional are living on borrowed time.
3. ROI at the outcome level, not the tool level
This is where a lot of the business case collapses quietly. A team builds an agent, measures the cost of running it (compute, API calls, engineering time), compares it against the license cost of the SaaS tool it replaced, and calls it a win.
But the tool cost was never the real cost. The real cost was the time a person spent doing a task that produced a result. The right question is: does the agent produce that result faster, more accurately, and with fewer downstream corrections? That is an outcome. That is where the ROI lives.
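As a back-of-envelope illustration, here is the outcome-level comparison in a few lines. Every input is a made-up number you would replace with your own measurements; note that the SaaS license fee never appears on either side.

```python
# Hypothetical inputs: replace with measured values for your task.
human_minutes_per_result = 45       # time a person spent per task
human_cost_per_hour = 80.0          # loaded labor cost
correction_rate_human = 0.05        # fraction needing downstream fixes
correction_rate_agent = 0.12
correction_cost = 30.0              # cost of fixing one bad result
agent_cost_per_result = 4.0         # compute + API + amortized eng time

human_cost = (human_minutes_per_result / 60 * human_cost_per_hour
              + correction_rate_human * correction_cost)
agent_cost = agent_cost_per_result + correction_rate_agent * correction_cost

print(f"human: ${human_cost:.2f}/result, agent: ${agent_cost:.2f}/result")
# The per-result delta is the ROI signal, and it already prices in
# the downstream corrections that tool-level accounting ignores.
```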
Measuring at the tool level is comfortable because the numbers are easy to pull. Measuring at the outcome level requires you to define what a good result looks like, which forces a conversation about quality that many teams are not ready to have. Have that conversation before you ship. It makes everything else sharper.