Abhishek Gandotra is VP of Product at American Express. He is the author of The Sovereign Agent: A Manifesto for the Inference Economy.
To maintain your company’s edge, measure human judgment and decision-making regarding AI outputs.
Every enterprise dashboard I’ve seen in 2026 tracks the same AI metrics: adoption rate, time saved, tickets automated, cost per inference. These numbers are moving in the right direction at almost every company that has deployed agentic AI at scale.
They are also the wrong numbers.
I’ve spent 20 years building platforms in financial services—first at Green Dot, where we powered Apple Pay Cash and served millions of accounts, and now at American Express. In both environments, I’ve watched organizations measure the thing that’s easy to count and miss the thing that actually matters. AI adoption is no different. The metrics that feel like progress are masking a convergence problem that will show up in 18 months as a competitiveness crisis—and by then, the damage will be structural.
Here’s what I mean, and what to do about it.
The Convergence Trap
When an entire team adopts the same AI model with the same default settings, its output starts to look the same, converging not toward excellence but toward the model’s median. Marketing emails start sounding identical. Engineering design documents propose the same architectures. Strategic analyses reach the same conclusions. This is not a failure of the people. It is the predictable result of a system in which a dozen professionals are all polishing the same model’s default output.
From the outside, productivity is up. From the inside, differentiation is gone.
I’ve seen this pattern play out in content marketing, legal document preparation, financial reporting and software quality assurance—industries where AI-assisted output scaled the fastest. The firms that maintained competitive differentiation had one thing in common: they measured quality of intent, not just output volume.
Three Metrics That Actually Matter
1. Human-Originated Decision Rate
What percentage of your team’s output was directed by specific human intent before the model was engaged, and how much was generated by the model and then edited? There is a material difference between “I specified what I needed and the AI drafted it” and “the AI produced something and I cleaned it up.” The first uses the technology with intent. The second depends on it. Most organizations cannot currently distinguish between the two, and that inability is a strategic blind spot.
2. Output Divergence Across The Team
If five people on the same team produce work on the same topic using the same tools, how different are the results? In a healthy AI-augmented workflow, the outputs should reflect the distinct judgment and domain expertise of each individual. If the outputs are converging—if you could swap the bylines and nobody would notice—the team has collapsed into what I call model-median production. The humans are adding formatting, not value.
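No formula for divergence is given above, so what follows is only one illustrative way to put a number on it. It assumes each person’s work on the same brief can be collected as plain text. It uses mean pairwise Jaccard distance between word sets: 0.0 means everyone used exactly the same words, and 1.0 means no overlap at all. The function names and sample texts are hypothetical. Word overlap is also a crude proxy that ignores meaning, so embedding-based similarity would likely be a better choice in practice.

```python
import itertools
import re


def tokens(text: str) -> set[str]:
    """Lowercased set of words; a deliberately crude stand-in for real text features."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; 0.0 means the two word sets are identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def output_divergence(outputs: list[str]) -> float:
    """Mean pairwise Jaccard distance across team outputs (hypothetical metric)."""
    if len(outputs) < 2:
        raise ValueError("need at least two outputs to compare")
    word_sets = [tokens(o) for o in outputs]
    pairs = list(itertools.combinations(word_sets, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Three near-identical drafts of the same brief (illustrative data).
converged = [
    "Our platform streamlines onboarding and boosts engagement.",
    "Our platform streamlines onboarding and boosts retention.",
    "Our platform streamlines onboarding and improves engagement.",
]

# Three drafts that reflect different judgments (illustrative data).
distinct = [
    "Cut the loyalty tier; churn data says it costs more than it retains.",
    "Keep the tier but reprice it for small-business cardholders only.",
    "Ship nothing until legal signs off on the new disclosure rules.",
]

# A lower score means the team's outputs are converging.
print(f"converged: {output_divergence(converged):.2f}")
print(f"distinct:  {output_divergence(distinct):.2f}")
```

A single reading means little on its own. The signal to watch is a team’s score falling over time on comparable briefs, which is what drift toward model-median production would look like under this proxy.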
3. Refusal Rate
How often does someone on the team reject model output and substitute their own judgment? A refusal rate of zero is not a sign of good AI tools. It is a sign that nobody on the team is exercising judgment. The most valuable AI-augmented workers I’ve observed are the ones who use the tools aggressively and override them selectively. They know when the model’s answer is good enough and when “good enough” isn’t good enough for this particular decision.
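Once the underlying events are logged, the human-originated decision rate and the refusal rate are both simple ratios. The sketch below assumes a hypothetical per-task log. Each record notes whether the person wrote a specific brief before prompting the model, how many model outputs they reviewed and how many they rejected in favor of their own judgment. The WorkItem fields, function names and sample data are illustrative assumptions, not an established schema.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    """One unit of AI-assisted work. Every field is a hypothetical logging choice."""
    owner: str
    intent_first: bool     # a specific human brief existed before the model was prompted
    outputs_reviewed: int  # model outputs the person reviewed for this item
    outputs_rejected: int  # outputs rejected and replaced with the person's own judgment


def human_originated_decision_rate(items: list[WorkItem]) -> float:
    """Share of work items directed by human intent before the model was engaged."""
    if not items:
        return 0.0  # no work logged; treated as zero here by assumption
    return sum(item.intent_first for item in items) / len(items)


def refusal_rate(items: list[WorkItem]) -> float:
    """Rejected model outputs divided by all model outputs reviewed."""
    reviewed = sum(item.outputs_reviewed for item in items)
    if reviewed == 0:
        return 0.0
    return sum(item.outputs_rejected for item in items) / reviewed


# Illustrative log for a two-person team.
log = [
    WorkItem("ana", intent_first=True, outputs_reviewed=4, outputs_rejected=1),
    WorkItem("ben", intent_first=False, outputs_reviewed=5, outputs_rejected=0),
    WorkItem("ana", intent_first=True, outputs_reviewed=2, outputs_rejected=1),
    WorkItem("ben", intent_first=False, outputs_reviewed=3, outputs_rejected=0),
]

print(human_originated_decision_rate(log))  # 0.5
print(refusal_rate(log))                    # 2 of 14 outputs rejected

# Per-person breakdown: a refusal rate of exactly zero is the warning sign.
for owner in sorted({item.owner for item in log}):
    mine = [item for item in log if item.owner == owner]
    print(owner, human_originated_decision_rate(mine), refusal_rate(mine))
```

The team-level numbers look unremarkable. The per-person breakdown is what surfaces a contributor who never specifies intent first and never overrides the model. The hard part in practice is capturing intent_first honestly, and that is a process question rather than a code one.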
The Taxonomy Shift Happening Under Your Metrics
The traditional knowledge worker did one thing: produce output. Write the code, draft the memo, build the model, send the email. AI has compressed that production work so dramatically that paying a human for pure production is becoming economically irrational.
What’s emerging is a three-tier structure. At the base, production-level work—the doer role—is being absorbed by AI at an accelerating rate. In the middle, judgment-on-output work—what I call the editor role—is where most knowledge workers have landed, often without realizing it. They review AI drafts, they refine AI code, they approve AI analyses. They are more productive than ever and less differentiated than ever.
At the top, a third role is expanding: the person who decides what work should be done at all. Not the person who writes the design document, but the person who decides whether the design document should exist. Not the person who builds the feature, but the person who kills the feature. This is the architect role, and it is defined not by technical skill but by the willingness to make decisions with real consequences—and to own the outcome when those decisions are wrong.
The companies that will lead in the next two years are the ones whose leadership teams understand this taxonomy and are actively managing the migration from doer to editor to architect across their workforce.
What This Means For Leadership
If you are running a team or a business unit, here is the practical implication: stop celebrating adoption metrics. Your competitors have the same tools, the same models, the same APIs. Adoption is table stakes.
Start measuring judgment. Start tracking who on your team specifies intent before engaging the model versus who reacts to model output. Instead of asking “Did we use AI?” in every project review, ask: “Where did a human override the AI, and why?”
The organizations that will maintain competitive advantage in the inference economy are not the ones that adopted AI fastest. They are the ones where humans retained the capacity—and the organizational permission—to say, “The model is wrong, and here’s what we should do instead.”
That capacity is a muscle. If you don’t measure it, you can’t train it. And if you don’t train it, you’re building your company’s future on rented land.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.