The Most Expensive Part Of AI Might Not Be The Model

Deepak Mittal is the CEO of CloudKeeper, a company delivering outcome-driven AI and cloud cost optimization for businesses worldwide.

Companies spent the last two years trying to get AI into production. Now, a different conversation is starting to happen within engineering and finance teams: How much does it actually cost to run AI at scale?

That question gets complicated very quickly. Training large models still gets most of the attention. For many enterprises, however, the bigger operational challenge is ongoing inference, experimentation, GPU utilization and unpredictable consumption patterns. AI workloads behave very differently from traditional cloud workloads, and many FinOps practices were never designed for this kind of infrastructure demand.

This matters because AI usage is growing fast. Goldman Sachs estimated that global AI infrastructure spending could reach between $4 trillion and $8 trillion by 2031 as companies invest in data centers, chips, networking and power infrastructure. That level of investment changes how enterprises think about cloud economics.

Token costs add up faster than most teams expect.

For years, cloud optimization focused heavily on areas such as compute sizing, storage efficiency and reserved instance planning. AI introduces a different kind of operational pressure. Token usage can fluctuate heavily. GPU resources are expensive and often underused. AI teams experiment constantly. Newer AI systems increasingly rely on continuous inference and orchestration instead of occasional workloads.

The result is a cloud consumption model that becomes difficult to forecast once AI adoption starts spreading across teams.

One area where this becomes obvious is token pricing. Many enterprises still underestimate how dramatically token costs can vary across models. Small differences may look manageable during pilot projects. At production scale, however, those differences compound quickly. The FinOps Foundation published a detailed breakdown of how token pricing actually works across AI systems, including how costs vary based on input tokens, output tokens, context windows and usage patterns.
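To make the compounding effect concrete, here is a minimal sketch of the arithmetic, assuming the common pattern of separate per-million-token rates for input and output. The `ModelPrice` class, the `monthly_cost` function and every price and volume shown are hypothetical placeholders, not figures from any vendor or from the FinOps Foundation breakdown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical price card, in dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


def monthly_cost(price: ModelPrice, requests_per_day: int,
                 input_tokens: int, output_tokens: int, days: int = 30) -> float:
    """Estimate monthly spend for one workload with a fixed request shape."""
    per_request = (input_tokens * price.input_per_million
                   + output_tokens * price.output_per_million) / 1_000_000
    return per_request * requests_per_day * days


if __name__ == "__main__":
    # Illustrative numbers only: two made-up models, same workload.
    small = ModelPrice(input_per_million=0.50, output_per_million=1.50)
    large = ModelPrice(input_per_million=5.00, output_per_million=15.00)
    for label, volume in (("pilot", 500), ("production", 500_000)):
        gap = (monthly_cost(large, volume, 2_000, 500)
               - monthly_cost(small, volume, 2_000, 500))
        print(f"{label}: monthly gap between models = ${gap:,.2f}")
```

The per-request difference is the same in both runs. Only the request volume changes, which is why a gap that looks negligible in a pilot can become a budget line in production.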

This becomes even more important as organizations move beyond simple chatbot deployments.

More AI activity means more infrastructure pressure.

AI systems are becoming more operationally complex. Enterprises are now managing retrieval systems, orchestration layers, vector databases, autonomous workflows and multimodel environments. McKinsey noted that AI infrastructure is becoming a critical business capability that extends far beyond software alone, and the infrastructure demands keep growing.

Agentic AI is adding another layer of pressure. These systems perform tasks continuously instead of responding to isolated prompts. That means more inference activity, more API calls and more persistent compute consumption. McKinsey also highlighted how agentic AI systems are increasing orchestration complexity and making infrastructure management more dynamic. This creates a challenge for traditional FinOps models.

Many organizations still approach AI infrastructure with cloud optimization strategies built for predictable workloads, but AI workloads are rarely predictable. Usage spikes can happen suddenly, experimentation expands rapidly across teams and model selection decisions may be driven more by hype than by operational efficiency. In many environments, visibility remains limited.

Bigger models aren’t always the smartest choice.

GPU utilization is becoming a major concern. AI infrastructure is expensive enough that idle or poorly utilized resources create significant operational waste. Some enterprises are now reconsidering where AI workloads should run altogether. Interest in private AI infrastructure is growing because organizations want better control over governance, cost predictability and resource allocation.
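The cost of idle capacity can be approximated with simple arithmetic. The sketch below assumes a flat hourly rate per GPU and a single average utilization figure; the `idle_gpu_cost` function and the numbers in the example are hypothetical, and real billing is usually more granular.

```python
def idle_gpu_cost(gpu_count: int, hourly_rate: float,
                  hours: float, utilization: float) -> float:
    """Dollars paid for GPU time that did no useful work.

    utilization is the average fraction of paid time spent on real work,
    between 0.0 and 1.0.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    return gpu_count * hourly_rate * hours * (1.0 - utilization)


if __name__ == "__main__":
    # Illustrative only: 16 GPUs at a made-up $3.00/hour over a 720-hour month.
    for utilization in (0.30, 0.60, 0.90):
        wasted = idle_gpu_cost(16, 3.00, 720, utilization)
        print(f"utilization {utilization:.0%}: idle spend ${wasted:,.2f}")
```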

Another interesting trend is happening around model size. For a while, enterprise AI conversations focused heavily on using the largest available models. That thinking is starting to evolve. Smaller language models are becoming increasingly practical for targeted enterprise use cases. In many scenarios, companies are finding that lightweight models provide acceptable performance with significantly lower infrastructure costs and lower latency.

That changes the economics considerably. Instead of relying on a single large model for every workload, organizations are beginning to think more carefully about workload-aware model selection. Some tasks may justify premium reasoning models. Others may work perfectly well with smaller and cheaper alternatives.

This is where AI cost optimization becomes more strategic than tactical. Enterprises are starting to evaluate how AI architecture decisions affect long-term operational efficiency. Model routing, inference optimization, caching and workload allocation are becoming important business decisions because infrastructure costs scale very quickly once AI usage expands.
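As a rough illustration of how model routing and caching fit together, the sketch below sends only reasoning-heavy or very long requests to a larger model and reuses earlier answers for repeated prompts. Everything here is an assumption made for illustration: the model names, the task categories, the 4,000-token threshold, the word count used as a stand-in for a real tokenizer, and the `call_model` callable, which stands in for whatever client an organization actually uses.

```python
from typing import Callable, Dict, Tuple

SMALL_MODEL = "small-model"  # hypothetical name for a cheaper model
LARGE_MODEL = "large-model"  # hypothetical name for a premium reasoning model

# Hypothetical task categories judged to need the larger model.
REASONING_TASKS = {"planning", "code_review", "multi_step_analysis"}


def route(task_kind: str, prompt_tokens: int,
          long_prompt_threshold: int = 4000) -> str:
    """Pick a model from the task type and the prompt size."""
    if task_kind in REASONING_TASKS or prompt_tokens > long_prompt_threshold:
        return LARGE_MODEL
    return SMALL_MODEL


class CachedClient:
    """Wraps a model-calling function with routing and an in-memory cache."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model
        self._cache: Dict[Tuple[str, str], str] = {}
        self.calls = 0  # number of requests that actually reached a model

    def complete(self, task_kind: str, prompt: str) -> str:
        # Word count is a crude proxy for token count (an assumption).
        model = route(task_kind, len(prompt.split()))
        key = (model, prompt)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._call_model(model, prompt)
        return self._cache[key]


if __name__ == "__main__":
    def fake_model(model: str, prompt: str) -> str:
        return f"[{model}] answer"

    client = CachedClient(fake_model)
    print(client.complete("classification", "Is this ticket urgent?"))
    print(client.complete("classification", "Is this ticket urgent?"))
    print(client.complete("planning", "Draft a migration plan."))
    print("model calls made:", client.calls)
```

An exact-match cache like this one only helps with literally repeated prompts. It is shown because it is the simplest form of the idea, not because it is the most effective one.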

AI spending is finally getting boardroom attention.

Many organizations approved AI experimentation budgets over the last two years without fully understanding what operational scaling would look like. That’s beginning to change. Most leadership teams now want visibility into AI ROI, infrastructure efficiency and ongoing operating costs—and they should.

AI infrastructure demand is growing faster than many organizations expected. According to Goldman Sachs, AI-optimized data centers can now cost between $15 million and $20 million per megawatt because of GPU density, cooling requirements and infrastructure complexity. Those economics eventually affect enterprise decision-making.

This doesn’t mean organizations should slow down AI adoption, but it does mean AI deployment strategies need more operational discipline than many companies currently have. AI projects that look manageable during experimentation can become very expensive once usage scales across products, employees and customers.

FinOps teams are now being asked to solve problems that barely existed a few years ago. They need visibility into token consumption, inference efficiency, GPU allocation and workload behavior across increasingly distributed AI environments.

That requires a broader view of cloud and AI optimization. The organizations that handle this well will probably be the ones that understand how to balance performance, cost efficiency and operational scale before complexity becomes difficult to control.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
