ai-cost · cloud-waste · finops
By Shayan Ghasemnezhad · 3 min read
GPU instances and inference endpoints have reopened the cloud cost problem that FinOps was starting to solve. Governance needs to catch up.
Cloud cost management was making progress. Teams were tagging resources, right-sizing instances, and buying Savings Plans. Then AI workloads arrived—and the cost curve bent upward again. GPU instances cost 10–40x their CPU equivalents. Inference endpoints run 24/7 whether or not anyone is asking questions. Training jobs can burn through five figures in a weekend. The FinOps playbook that worked for web applications needs new chapters.
Traditional cloud cost management assumes relatively predictable, steady-state workloads. You provision a fleet of instances, they run services, and cost scales roughly with traffic. AI workloads violate every part of this assumption. Training jobs are bursty and unpredictable—a hyperparameter sweep might spin up 50 GPU instances for six hours, then nothing for two weeks. Inference demand is hard to forecast because product teams are still discovering what users do with AI features.
The unit economics are also different. A single p4d.24xlarge instance costs roughly €28 per hour. A team running fine-tuning experiments without cost guardrails can spend more in a day than their entire monthly EC2 budget for non-AI workloads. And unlike CPU instances, GPU instances have limited Savings Plan coverage and sparse spot availability.
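To make the burn rate concrete, a back-of-the-envelope sketch using the figures above (the €28/hour p4d-class rate from this article; instance counts and durations are illustrative):

```python
def training_burn(instances: int, rate_eur_per_hour: float, hours: float) -> float:
    """Estimated spend for a burst training job: instances x hourly rate x duration."""
    return instances * rate_eur_per_hour * hours

# The hyperparameter sweep described earlier: 50 GPU instances for six hours.
sweep = training_burn(instances=50, rate_eur_per_hour=28.0, hours=6)
# An unattended fine-tuning run on ten p4d-class instances over a 48-hour weekend.
weekend = training_burn(instances=10, rate_eur_per_hour=28.0, hours=48)
print(f"Sweep: €{sweep:,.0f}, weekend: €{weekend:,.0f}")  # Sweep: €8,400, weekend: €13,440
```

The weekend figure is how a team reaches five figures before anyone reviews a bill.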
Training gets the headlines, but inference is where the ongoing cost lives. A self-hosted model endpoint on a g5.2xlarge costs approximately €1.20/hour, roughly €870/month if it runs continuously. If your AI feature handles 50 requests per hour, that is €0.024 per request. At 5 requests per hour, it is €0.24—an order of magnitude difference in unit cost for the same infrastructure.
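The arithmetic behind those unit costs, as a small sketch (730 is the average number of hours in a month; the g5.2xlarge rate is the approximate figure used above):

```python
HOURS_PER_MONTH = 730  # average hours in a month

def cost_per_request(hourly_rate_eur: float, requests_per_hour: float) -> float:
    """Unit cost of an always-on endpoint at a given request rate."""
    return hourly_rate_eur / requests_per_hour

g5_rate = 1.20  # approximate g5.2xlarge on-demand rate
print(f"Monthly: €{g5_rate * HOURS_PER_MONTH:,.0f}")            # Monthly: €876
print(f"At 50 req/h: €{cost_per_request(g5_rate, 50):.3f}/req")  # €0.024/req
print(f"At  5 req/h: €{cost_per_request(g5_rate, 5):.2f}/req")   # €0.24/req
```

The fixed cost does not move; only the denominator does, which is why utilisation dominates the unit economics.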
The decision between self-hosted inference and managed API (OpenAI, Anthropic, Bedrock) is fundamentally a utilisation question. Managed APIs charge per token with no idle cost. Self-hosted endpoints have high fixed cost and low marginal cost. The crossover point depends on volume, latency requirements, and data residency constraints.
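The crossover can be estimated directly. A minimal sketch, assuming a flat per-token API price (the €0.002 per 1K tokens below is a hypothetical number for illustration, not a quote from any provider) and ignoring the ops overhead of running your own endpoint:

```python
def breakeven_tokens_per_month(endpoint_eur_per_month: float,
                               api_eur_per_1k_tokens: float) -> float:
    """Monthly token volume above which a self-hosted endpoint beats a
    per-token API, assuming the endpoint can absorb that volume."""
    return endpoint_eur_per_month / api_eur_per_1k_tokens * 1000

# Illustrative: the €876/month g5.2xlarge endpoint vs €0.002 per 1K tokens.
tokens = breakeven_tokens_per_month(876, 0.002)
print(f"Break-even: {tokens / 1e6:.0f}M tokens/month")  # Break-even: 438M tokens/month
```

Below that volume the managed API is cheaper; above it, and absent residency or latency constraints, self-hosting starts to pay for itself.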
Build governance around three controls: budgets, approval gates, and automated shutdown.
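The budget and automated-shutdown controls reduce to a policy check run against month-to-date spend. A minimal sketch — the class and threshold names are assumptions, not a reference to any particular FinOps tool; the approval-gate control would sit in front of provisioning rather than in this loop:

```python
from dataclasses import dataclass

@dataclass
class BudgetPolicy:
    monthly_budget_eur: float
    alert_threshold: float = 0.8   # warn at 80% of budget (assumed default)
    hard_stop: bool = True         # automated shutdown when budget is exhausted

def evaluate(policy: BudgetPolicy, month_to_date_spend_eur: float) -> str:
    """Return the governance action for the current spend level."""
    if month_to_date_spend_eur >= policy.monthly_budget_eur:
        return "shutdown" if policy.hard_stop else "escalate"
    if month_to_date_spend_eur >= policy.alert_threshold * policy.monthly_budget_eur:
        return "alert"
    return "ok"

policy = BudgetPolicy(monthly_budget_eur=5000)
print(evaluate(policy, 3200))  # ok
print(evaluate(policy, 4400))  # alert
print(evaluate(policy, 5100))  # shutdown
```

In practice the same check can be wired to a billing export and a scheduler, but the decision logic stays this simple.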
For each AI workload, answer four questions:

1. What is the expected utilisation? If below 30%, use a managed API instead of self-hosting.
2. What is the data residency requirement? If data cannot leave your VPC, self-hosting or Bedrock is mandatory.
3. What is the latency budget? If sub-100ms, you need a dedicated endpoint; if 2–5 seconds is acceptable, serverless inference works.
4. What is the experimentation cadence? If the team is running daily experiments, invest in a shared training cluster with scheduling; if monthly, on-demand is fine.
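The first three questions can be encoded as a deployment-strategy rule, useful as a starting point for an internal checklist. A sketch with the thresholds as stated above; the function name and strategy labels are illustrative:

```python
def inference_strategy(utilisation: float, data_must_stay_in_vpc: bool,
                       latency_budget_ms: float) -> str:
    """Apply the decision rules: residency overrides cost, then
    utilisation, then latency budget."""
    if data_must_stay_in_vpc:
        return "self-hosted or Bedrock"   # residency is non-negotiable
    if utilisation < 0.30:
        return "managed API"              # below 30% utilisation
    if latency_budget_ms < 100:
        return "dedicated endpoint"       # sub-100ms needs reserved capacity
    return "serverless inference"

print(inference_strategy(0.10, False, 2000))  # managed API
print(inference_strategy(0.60, False, 50))    # dedicated endpoint
print(inference_strategy(0.60, True, 2000))   # self-hosted or Bedrock
```

The ordering matters: a residency constraint makes the utilisation argument moot, which is why it is checked first.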
The worst failure is invisible spend. A data scientist spins up a notebook instance on a Friday afternoon, runs a training job, and forgets to shut down the instance. It runs for three weeks. This happens in every organisation that does not enforce auto-stop policies on development instances.
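An auto-stop policy is straightforward to express: flag any development instance idle beyond a threshold. A minimal sketch over plain records — the two-hour threshold and the record shape are assumptions; in production this would read from your cloud provider's instance and metrics APIs:

```python
from datetime import datetime, timedelta

MAX_IDLE = timedelta(hours=2)  # assumed auto-stop threshold for dev instances

def instances_to_stop(instances: list[dict], now: datetime) -> list[str]:
    """Return IDs of dev instances idle longer than MAX_IDLE.
    Each record: {"id": str, "env": str, "last_activity": datetime}."""
    return [i["id"] for i in instances
            if i["env"] == "dev" and now - i["last_activity"] > MAX_IDLE]

now = datetime(2024, 6, 3, 9, 0)  # Monday morning
fleet = [
    # The notebook forgotten on Friday afternoon.
    {"id": "nb-1", "env": "dev", "last_activity": datetime(2024, 5, 31, 17, 0)},
    # A production endpoint with recent traffic: never auto-stopped.
    {"id": "ep-1", "env": "prod", "last_activity": datetime(2024, 6, 3, 8, 55)},
]
print(instances_to_stop(fleet, now))  # ['nb-1']
```

Run on a schedule, this catches the forgotten Friday notebook on Friday evening instead of three weeks later.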
Model sprawl is the AI equivalent of server sprawl. Teams deploy multiple model endpoints for different features without a shared registry. Each endpoint has its own GPU allocation. Consolidate where possible—a single endpoint serving multiple use cases with routing logic is cheaper than three endpoints at 15% utilisation each.
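The consolidation argument is just additive utilisation. A sketch under the simplifying assumptions that the endpoints have identical capacity and their peak loads do not collide:

```python
def combined_utilisation(per_endpoint_util: list[float]) -> float:
    """Utilisation if the same traffic were routed to a single endpoint
    of the same capacity (assumes loads are additive, peaks don't overlap)."""
    return sum(per_endpoint_util)

endpoints = [0.15, 0.15, 0.15]
freed = len(endpoints) - 1
print(f"Three endpoints at 15% each -> one at "
      f"{combined_utilisation(endpoints):.0%}, freeing {freed} GPU allocations")
# Three endpoints at 15% each -> one at 45%, freeing 2 GPU allocations
```

One endpoint at 45% is still under-utilised, but it pays for two fewer GPUs while leaving headroom for growth.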
AI cost governance is not optional—it is the difference between AI features that improve margin and AI features that erode it. Build the visibility and controls before the spend forces you to.