Kuan Archer, Founder and Principal, Archer Innovative Solutions Group LLC.
The AI bill has arrived. It may be larger than anyone budgeted. Token consumption has increased due to AI adoption and the rise of agentic AI. This conversation is playing out in leadership meetings across nearly every industry, typically about six months after the costs have already gotten ahead of the organization. Executives are grappling with an AI cost problem.
Most organizations treat AI spending like a SaaS subscription. It is actually a variable cost driven by workflow design and architectural choices made months before anyone looked at the bill. Here is a more disciplined breakdown for evaluating cost:
1. Consumption pattern is the most important budget variable.
The most useful starting point is classifying every use case by its consumption pattern at design time, before the cost structure has set itself. For instance:
• Continuous workloads are always on, always ingesting. Real-time monitoring and live customer interaction systems are the most common examples. These are the largest token consumers and typically can have the highest budget-risk category.
• Discrete workloads are triggered by user actions. A developer requesting a code review or an analyst pulling a document summary are typical examples. Spend scales with adoption behavior, making it more predictable but susceptible to budget surprises when rollout exceeds expectations.
• Batch workloads run on schedules or volume thresholds. This might include nightly contract processing and periodic knowledge base refreshes. The most controllable category, and typically the last to receive serious budget scrutiny.
In my experience, most organizations skip this classification entirely, deploying first and building a budget framework only after the usage patterns are already locked in.
2. High sophistication and strong ROI are different things.
A persistent misunderstanding in enterprise AI planning is that use case sophistication and ROI are correlated. The relationship can be weaker than most planning assumptions acknowledge.
A simple fraud-detection classifier may score low on sophistication but deliver exceptional ROI, given that every dollar of fraud prevented goes directly to the bottom line. A sophisticated multi-agent research assistant might score much higher on complexity while delivering modest returns, because the work it replaces is lower value per task.
The better question is what failure costs. For operational decisions made in real time, that number is almost always the larger one.
3. Nobody should be running a single model for everything.
A common pattern I see in early enterprise AI deployments is defaulting to a single frontier model across the board. The reasoning is understandable: simpler integration, fewer architectural decisions upfront. The cost, however, usually ends up substantially higher than necessary.
A well-designed stack distributes work across model tiers based on what each task actually requires. Fine-tuned domain models handle structured pattern recognition at lower cost, while frontier models are reserved for open-ended reasoning where quality at the ceiling genuinely matters. A well-scoped retrieval layer handles knowledge lookup without burdening either. Each layer carries a meaningfully different cost-per-token profile, and the decisions made here are budget decisions as much as they are technical ones.
Using a frontier model for tasks a lightweight model could handle is a potential computational waste that compounds with scale. Building model-task matching into the architecture from the start is where we see most of the savings actually live.
4. Prompt engineering is an economics decision.
Prompt design affects cost in ways that compound quickly at scale. A prompt that forces a model to work around unnecessary context or navigate ambiguous instructions consumes tokens on overhead rather than output. Getting the context right upfront can reduce compute consumption on the same task.
A retrieval layer that gives the model what it needs improves output quality and reduces load on the most expensive component simultaneously. Both are budget decisions that belong in the cost conversation early.
The Bottom Line
Durable advantage in AI comes from cost discipline as much as raw capability. Organizations should treat token spend the way they treat any significant variable cost: knowing how they will hold each use case accountable for earning its share of the budget.
That rigor accelerates adoption by making the investment case defensible and the cost curve manageable. As usage scales, the gap between organizations actively managing token spend and those simply paying the bill tends to widen.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives. Do I qualify?











