FinOps for AI: 8 steps to managing AI costs and resources

It’s no secret that AI, particularly generative AI, is the tech wunderkind of the moment. The potential benefits are massive, though many have yet to be fully realized. Across industries, AI use cases seem limitless, often making AI feel like a hammer in search of nails. It is important, however, to determine the appropriate size of that hammer, and whether that nail really is a nail.

With AI’s broad promise, organizations often experiment across various use cases, leading to the risk of overprovisioning resources and consequently wasted spend. Due to the dynamic nature of how AI models are trained and used, the consumption of resources can be difficult to predict and control. Every new dataset could lead to a breakthrough—or send you down an expensive rabbit hole.

As organizations expand their cloud footprints, the need and desire for cost accountability and optimization become paramount. The experimental nature of AI can run counter to those goals, making visibility, accountability, and optimization more important than ever. Like other cloud services, AI offerings are typically easy to use—and even easier to overuse—racking up inflated bills.

Enter FinOps. Given its current maturity, FinOps practices and processes are likely already in place at most enterprise organizations; if not, they are on the near-term radar. Although AI differs from traditional cloud workloads, the core FinOps principles still apply. To truly optimize AI costs, you need visibility into resource usage, accountability that attributes usage (and costs) to the appropriate parties, and opportunities to optimize based on what you observe. Fortunately, FinOps provides the framework for all three.

Managing AI costs step by step

Managing AI costs parallels managing other cloud costs, which is beneficial since many organizations already have processes in place. To control and optimize your AI costs, ensure you have the following elements:

Step 1: Visibility
Visibility is key. Seeing all AI-related resource consumption across the entire organization is the basis for everything that follows. Note that some PaaS AI offerings provide limited visibility (charges may appear as a single line item), so your mileage may vary.
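As a minimal sketch of what this first step looks like in practice, the snippet below filters a cloud billing export down to AI-related line items and totals spend per service. The column names, service names, and CSV layout are illustrative assumptions; real billing exports differ by cloud provider.

```python
import csv
import io
from collections import defaultdict

# Hypothetical billing-export rows; real column names vary by provider.
BILLING_CSV = """service,resource_id,cost_usd
Azure OpenAI,oai-prod-1,1250.00
Azure OpenAI,oai-dev-1,310.50
Amazon SageMaker,sm-train-7,980.25
Amazon EC2,i-0abc,45.10
"""

# Which services count as "AI" for reporting is itself an assumption to adapt.
AI_SERVICES = {"Azure OpenAI", "Amazon SageMaker"}

def ai_spend_by_service(csv_text: str) -> dict:
    """Sum charges per service, keeping only AI-related line items."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["service"] in AI_SERVICES:
            totals[row["service"]] += float(row["cost_usd"])
    return dict(totals)

print(ai_spend_by_service(BILLING_CSV))
```

Even a rough report like this surfaces the single-line-item problem: a large "Azure OpenAI" total with no further breakdown is a signal that you need tagging or per-deployment reporting to see who is actually spending.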

Step 2: Accountability
Once you have visibility into what’s being used, the next step is knowing who is using it. Identifying the users or groups responsible for resource consumption can reveal potential overuse or inefficiencies.

Step 3: Governance
Accountability is reinforced through governance. Enthusiasm around AI’s potential can drive experimentation, which leads to overuse and overspend. Governance controls prevent rampant use of AI resources (or any cloud resources, for that matter). These controls should act as guardrails rather than roadblocks: you don’t want to suppress the efforts of well-intentioned users, but you do want to keep them on a responsible path from a financial exposure perspective.
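One way to make the guardrail-not-roadblock idea concrete is a pre-provisioning policy check that returns warnings instead of hard failures. Everything here (the model names, the budget cap, the policy shape) is a hypothetical example, not a real provider API.

```python
# A minimal guardrail sketch: validate an AI resource request against
# policy limits before provisioning. Names and limits are illustrative.
POLICY = {
    "allowed_models": {"gpt-4o-mini", "gpt-4o"},  # hypothetical approved list
    "max_monthly_budget_usd": 5000,               # hypothetical per-team cap
}

def check_request(model: str, monthly_budget_usd: float) -> list:
    """Return a list of policy warnings; an empty list means the request passes."""
    warnings = []
    if model not in POLICY["allowed_models"]:
        warnings.append(f"model '{model}' is not on the approved list")
    if monthly_budget_usd > POLICY["max_monthly_budget_usd"]:
        warnings.append("requested budget exceeds the per-team cap")
    return warnings

print(check_request("gpt-4o", 2000))        # passes cleanly
print(check_request("frontier-xl", 9000))   # trips both checks
```

Returning warnings (which can trigger an approval workflow) rather than outright rejections is what keeps this a guardrail: well-intentioned experiments proceed, but with visibility and a human in the loop.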

Step 4: Tagging
Tagging, tagging, and more tagging. When resources are tagged appropriately, they can be attributed to the correct user, team, project, application, business unit, and so on. Tagging enhances visibility and accountability, helping identify areas of potential overspend. However, the tagging options for PaaS AI resources are not as robust as those for IaaS resources, so granular tags may not always be possible.
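Two small routines capture most of what a tagging practice needs: flag resources missing required tags, and roll up cost by a tag key. The tag keys and resource records below are illustrative assumptions, not any provider’s schema.

```python
# Sketch: enforce a required-tag set and attribute cost by a chosen tag key.
REQUIRED_TAGS = {"team", "project", "cost-center"}  # illustrative tag keys

resources = [
    {"id": "oai-prod-1", "cost": 1250.0,
     "tags": {"team": "search", "project": "rag", "cost-center": "cc-101"}},
    {"id": "sm-train-7", "cost": 980.0,
     "tags": {"team": "ml-platform"}},  # under-tagged
]

def untagged(resources):
    """IDs of resources missing at least one required tag."""
    return [r["id"] for r in resources if REQUIRED_TAGS - r["tags"].keys()]

def cost_by_tag(resources, key):
    """Attribute cost to a tag value, bucketing missing tags as 'untagged'."""
    totals = {}
    for r in resources:
        owner = r["tags"].get(key, "untagged")
        totals[owner] = totals.get(owner, 0.0) + r["cost"]
    return totals

print(untagged(resources))
print(cost_by_tag(resources, "team"))
```

The "untagged" bucket matters: tracking its size over time is a simple health metric for your tagging hygiene, especially for PaaS AI services where tags are harder to apply.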

Step 5: Budgets and associated alerts
Establish budgets and alerts to avoid automation running amok, as well as “unconscious overprovisioning.” Ensure teams using AI services have budgets in place, with alerts that trigger when AI spend is trending toward exceeding them. Tagging, accountability, and governance all enable more granular budgets for individual teams, business units, and more. Usage patterns may show that one business unit consumes far more resources than others; if the data bears this out as acceptable, that group’s budget can be adjusted accordingly.
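“Trending toward exceeding” a budget can be checked with a simple run-rate projection. The linear projection and the 80% alert threshold below are assumptions to tune; real budget services offer similar actual-vs-forecast alerts.

```python
def projected_month_end_spend(spend_to_date: float, day: int,
                              days_in_month: int) -> float:
    """Naive linear run-rate projection of month-end spend."""
    return spend_to_date / day * days_in_month

def budget_alerts(spend_to_date: float, budget: float, day: int,
                  days_in_month: int = 30, threshold: float = 0.8) -> list:
    """Fire alerts on actual spend past a threshold, or a projected overrun."""
    alerts = []
    if spend_to_date >= budget * threshold:
        alerts.append("actual spend passed the alert threshold")
    if projected_month_end_spend(spend_to_date, day, days_in_month) > budget:
        alerts.append("spend is trending to exceed the monthly budget")
    return alerts

# $6,000 spent by day 15 of a $10,000 budget projects to $12,000: alert early.
print(budget_alerts(6000, 10000, 15))
```

The value of the projection is catching the overrun mid-month, while there is still time to scale back, rather than discovering it on the invoice.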

Step 6: Optimization
With visibility into AI resource usage and accountability in place, you can now optimize. When AI is consumed as a managed service (rather than via cloud infrastructure you run yourself), the traditional concept of “underutilization” does not apply: resources are used when called upon, not sitting idle waiting for a task. As such, it’s essential to evaluate whether the resources being consumed deliver adequate ROI. This is where organizations must strike a balance: just because you can add another document to be indexed doesn’t mean you should. Optimization means using data judiciously to reduce training time and scaling back when necessary. It is an inexact science, and every organization’s decision threshold will differ. There is no single golden metric that applies to all use cases; the trick is to determine what your golden metric is, and to leverage it in your optimization decisions.
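Whatever your golden metric turns out to be, it usually reduces to cost per unit of business value. The workloads, outcome definitions, and $1.00 unit-cost ceiling below are illustrative assumptions; the point is the comparison, not the numbers.

```python
def cost_per_outcome(total_cost: float, successful_outcomes: int) -> float:
    """'Golden metric' sketch: spend divided by units of value delivered."""
    if successful_outcomes == 0:
        return float("inf")  # spend with no outcomes is the worst case
    return total_cost / successful_outcomes

# Hypothetical workloads, each with its own definition of an "outcome."
workloads = {
    "support-chatbot": cost_per_outcome(4200.0, 21000),  # resolved tickets
    "doc-indexing":    cost_per_outcome(9800.0, 3500),   # docs actually queried
}

# Flag workloads whose unit cost exceeds an agreed ceiling (assumed: $1.00).
flagged = [name for name, unit_cost in workloads.items() if unit_cost > 1.00]
print(flagged)
```

Here the chatbot delivers outcomes at $0.20 each while indexing runs $2.80, which turns “should we index another document?” from a gut call into a measurable decision against your own threshold.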

Step 7: Operating model
The last phase of the FinOps framework is operate: defining strategies to optimize resources and refining workflows to implement those strategies. The same principles apply to AI resource management. This phase lets you refine existing processes, or create new ones, to implement what was learned in the earlier phases, so those lessons are retained rather than relearned.

Step 8: Ongoing optimization
The phases of the FinOps framework are circular for a reason—optimization is an ongoing process, and you are never done. Though it gets easier with practice, true cost optimization, AI or otherwise, involves a finish line that is often in sight but never crossed.

Final thoughts

As with other cloud resources, visibility and accountability are essential for optimizing AI usage. While gaps may exist in how AI services are offered and consumed, the FinOps framework provides a solid foundation for AI optimization. The key is adapting the framework to address those gaps and ensuring AI cost optimization is a part of your overall, comprehensive FinOps practice. Read more on the Flexera blog about adjusting your FinOps strategy.