You may have read our previous blog posts Introducing FinOps and FinOps at Flexera: Our Journey to Cloud Cost Optimization. In this next installment in the series on cloud financial management, we’ll share more about the FinOps lifecycle and how to use key performance indicators (KPIs) to ensure you remain on track to achieve your FinOps goals.
Using KPIs to measure and drive success isn’t a new topic, but in a fast-moving cloud environment, it can be very difficult to establish the success or otherwise of your well-intended processes and actions without having the appropriate cloud financial management measures in place.
The FinOps Lifecycle
The FinOps lifecycle has three main stages: inform, optimize and operate. Some customers who already have good visibility of their cloud spend may jump straight into the optimize or operate stages, but we recommend starting at the inform stage. You can’t manage what you can’t see, so this stage is vital to ensure you have full visibility of your spend and can categorize it by dimensions appropriate to your business.
During this stage of the cloud financial management lifecycle, you’re looking to ensure you have complete visibility of all costs (a robust tagging strategy is essential here), and that those costs are allocated to appropriate business units, cost centers or other relevant categories.
Some suggested KPIs to help you track progress at this stage are listed below. Use percentages carefully, because the actual costs are often more informative.
The percentage of resources, or cost of:
- All untagged resources or resources with non-compliant tags (e.g., wrong spelling or case)
- Any unallocated costs, i.e., those not associated with a cost center or relevant dimension
- Amount (percentage or dollars) of total cloud spend charged back (or not)
- Cloud spend rate (daily/monthly)
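The tagging KPIs above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the `Resource` record, the `REQUIRED_TAGS` policy and the sample figures are all assumptions made for the example, and real billing exports will look different.

```python
from dataclasses import dataclass, field

# Hypothetical tag policy: keys must match exactly, including case.
REQUIRED_TAGS = ("cost-center", "owner")


@dataclass
class Resource:
    name: str
    monthly_cost: float
    tags: dict = field(default_factory=dict)


def is_tag_compliant(resource, required=REQUIRED_TAGS):
    """True only if every required tag key is present with a non-empty value."""
    return all(resource.tags.get(key) for key in required)


def inform_kpis(resources):
    """Report non-compliant resources as a count, a cost and a percentage of cost."""
    total = sum(r.monthly_cost for r in resources)
    non_compliant = [r for r in resources if not is_tag_compliant(r)]
    non_compliant_cost = sum(r.monthly_cost for r in non_compliant)
    return {
        "total_cost": total,
        "non_compliant_count": len(non_compliant),
        "non_compliant_cost": non_compliant_cost,
        "non_compliant_cost_pct": 100 * non_compliant_cost / total if total else 0.0,
    }


# Illustrative data only.
resources = [
    Resource("web-1", 400.0, {"cost-center": "cc-100", "owner": "web"}),
    Resource("db-1", 500.0, {"Cost-Center": "cc-200", "owner": "data"}),  # wrong case
    Resource("tmp-1", 100.0),  # untagged
]
kpis = inform_kpis(resources)
print(kpis)
```

Reporting the dollar figure alongside the percentage follows the advice above: a small percentage can still hide a large amount of spend.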
At this stage, it’s important to consider the longer-term goal of establishing unit economic KPIs (what some may define as the nirvana stage of FinOps), whereby cloud spend can be reported using a business metric such as cost per customer, cost of goods sold (COGS), etc. While you may be a long way from reporting at this level, it’s worth considering which unit economic KPIs may be useful in the future so you can plan your tagging strategy and cost-allocation activities accordingly. Changing these at later stages can be time-consuming and harder to justify.
In the optimize stage of the lifecycle, it’s important to be clear on the business outcomes you’re looking to achieve, and these will likely change over time. If you’re in the early stages of a cloud migration, agility (getting your applications and services into the cloud quickly) may be the primary objective. For others, workloads that have already been migrated may be costing more than expected, so cost control may be the priority. It may also be a combination of both if different teams or lines of business are at different stages in their cloud maturity.
Setting spend goals here can help facilitate those outcomes. Balancing speed, cost and quality isn’t a new challenge for any IT organization, but with the visibility of cloud costs gained during the inform stage, you should be better able to find the right balance between these constraints. Accelerating your migration program and using more expensive cloud services are options when you need to go faster, provided you know how this impacts cost. These goals may be specific spend thresholds (dollar amounts) for those who are migrating to the cloud, or savings targets (percentage of spend) for those already in the cloud and looking to optimize spend.
Once you’ve established these goals, the optimize phase is also about identifying opportunities to reduce spend when you find yourself exceeding the cost thresholds or budgets you’ve set. There are two main ways to do this: usage optimization (i.e., consume fewer resources) or rate optimization (pay less for the resources consumed). The details behind these approaches will be covered in a later blog post, but some KPIs you might want to put in place to track the impact of these levers are as follows:
- Percentage of compute usage covered by discounts (RIs, CUDs, Spot, etc.)
- Savings realized by discounts—subtract discounted costs from the list price
- Discount achieved as a percentage of list price, i.e., (list price - discounted price) / list price x 100
- Compute costs per hour or month—useful for trending over time
- Budget variances and forecast accuracy
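As a rough worked example of the rate-optimization KPIs above, the sketch below computes coverage, savings realized, discount percentage and budget variance. The function names, parameters and figures are hypothetical. It also assumes that the discount percentage is measured as savings relative to list price, which is one common reading of that KPI.

```python
def rate_kpis(list_cost, discounted_cost, covered_hours, total_hours):
    """Rate-optimization KPIs for one period (all inputs are illustrative)."""
    savings = list_cost - discounted_cost  # savings realized by discounts
    return {
        "coverage_pct": 100 * covered_hours / total_hours if total_hours else 0.0,
        "savings_realized": savings,
        # Assumption: discount is expressed relative to the list price.
        "discount_pct": 100 * savings / list_cost if list_cost else 0.0,
    }


def budget_variance_pct(actual, budget):
    """Positive values mean spend came in over budget."""
    return 100 * (actual - budget) / budget


example = rate_kpis(
    list_cost=10_000.0,
    discounted_cost=7_500.0,
    covered_hours=600,
    total_hours=800,
)
print(example)
print(budget_variance_pct(actual=11_000.0, budget=10_000.0))
```

The same variance calculation can be used for forecast accuracy by passing the forecast in place of the budget.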
In the operate stage of the lifecycle, you implement the identified savings opportunities and, importantly, define and establish repeatable processes that will enable you to iterate through the lifecycle effectively. As these processes mature, you can also explore opportunities for automation to reduce the resource overhead.
To track the effectiveness of this stage of the cloud financial management lifecycle, you may want to measure the potential savings opportunities over time (by team, service or other dimension) versus the savings realized over time. Teams that act upon the recommendations made in the optimize phase should see their outstanding potential savings decrease and their realized savings increase. Unit costs (such as cost per instance hour) can also be used to articulate the effect of acting on the recommendations.
Some example KPIs to track over time:
- Savings recommendations (dollars)
- Savings realized (percentage)
- Savings realized as a percentage of recommendations
- Unit costs (cost per compute hour, cost of goods sold)
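To make these KPIs concrete, here is a small sketch that tracks savings realized as a percentage of recommendations over time, along with a simple unit cost. The helper names and the monthly figures are invented for illustration and are not taken from any real account.

```python
def realization_pct(recommended, realized):
    """Savings realized as a percentage of the savings recommended."""
    return 100 * realized / recommended if recommended else 0.0


def unit_cost(total_cost, units):
    """Cost per unit, e.g. cost per compute hour."""
    return total_cost / units if units else 0.0


# Illustrative history: (period, recommended savings, realized savings) in dollars.
history = [
    ("month 1", 8_000.0, 2_000.0),
    ("month 2", 6_000.0, 3_000.0),
    ("month 3", 4_000.0, 3_000.0),
]

trend = [(label, realization_pct(rec, real)) for label, rec, real in history]
for label, pct in trend:
    print(f"{label}: {pct:.0f}% of recommended savings realized")

print(f"cost per compute hour: {unit_cost(1_200.0, 10_000):.2f}")
```

In this made-up series, the outstanding recommendations shrink while the realization rate climbs, which is the pattern you would hope to see from a team acting on its recommendations.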
You may also wish to consider a balanced scorecard approach. In many cases, the savings opportunities of each team will be directly related to the amount they spend, so weighting their scores by their share of your overall cloud bill will provide a more balanced view of how effective teams are at implementing savings recommendations.
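One possible way to apply that weighting is sketched below. It is only one interpretation of a balanced scorecard: each team's realization rate is multiplied by its share of total spend, and the weighted scores sum to an organization-wide figure. The team names, figures and the `scorecard` function are hypothetical.

```python
def scorecard(teams):
    """teams maps a team name to (spend, recommended savings, realized savings)."""
    total_spend = sum(spend for spend, _, _ in teams.values())
    rows = {}
    for name, (spend, recommended, realized) in teams.items():
        share = spend / total_spend if total_spend else 0.0
        rate = realized / recommended if recommended else 0.0
        rows[name] = {
            "spend_share": share,
            "realization_rate": rate,
            # Weight each team's rate by its share of the overall bill.
            "weighted_score": share * rate,
        }
    return rows


# Illustrative data only.
teams = {
    "platform": (80_000.0, 10_000.0, 5_000.0),
    "analytics": (20_000.0, 2_000.0, 2_000.0),
}
rows = scorecard(teams)
overall = sum(row["weighted_score"] for row in rows.values())

for name, row in rows.items():
    print(name, row)
print(f"overall weighted score: {overall:.2f}")
```

Here the smaller team has acted on every recommendation, yet the larger team's partial progress contributes more to the overall score because it accounts for most of the spend.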
As you iterate through the lifecycle and your confidence in the recommendations being made and implemented grows, you may choose to automate certain areas of the operate phase, for example by using tooling to implement “lights on/off” policies, delete unattached volumes and release unused IP addresses. These are just some examples of commonly automated tasks. Continuing to develop your automation capabilities will enable you to operate at scale as your cloud consumption grows.