Allocating Kubernetes Costs: How to Manage and Optimize Kubernetes Spend

What is the problem with allocating Kubernetes costs, and why do you need to worry about it?

Kubernetes has quickly risen to become one of the most popular cloud technologies. It enables teams to deploy scalable, fault-tolerant modern applications in less time: you declare the state you want your environment to be in, and Kubernetes works continually to maintain it, freeing developers from manual infrastructure management.

With all the benefits of Kubernetes come challenges. Companies are spending heavily on adopting container technologies, and that spend feeds into the overall IT budget. As more and more teams move their applications to production environments on Kubernetes, one of the first challenges they encounter is how to accurately allocate Kubernetes costs to different teams, services, applications, and departments. Cost allocation and cost monitoring around Kubernetes become a real challenge.

The main issue with Kubernetes cost allocation is a lack of visibility into how Kubernetes cluster resource usage relates to the cost of the underlying cloud infrastructure that Kubernetes runs on top of.

The shared-resources model in Kubernetes pools the whole underlying compute infrastructure into a single entity called a cluster. Multiple teams and services use the same underlying infrastructure in this model, which makes it difficult for IT managers and finance departments to accurately allocate Kubernetes costs to different services, teams, departments, and applications.


To state the basics, the cost of a cluster equals the cost of the servers it runs on. However, that alone does not give you transparent cost allocation; for that, you need a complete Kubernetes cost analysis.

Consider the following scenario: we're developing a product line with four distinct backend services. With a conventional server-based architecture, we might run each service on its own group of resources behind a load balancer. Then, based on the number of resources each service runs on, we can roll the cost of operating those resources into the total cost of that service.

Suppose that, out of the four services, service A is running on 20 resources and still running out of memory and CPU, while services B, C, and D each run comfortably on far fewer resources. Clearly, it's service A that is driving our costs, and it's time to reassess it.

In Kubernetes, however, attributing costs is not as straightforward.

Suppose all four of these services were running on a single Kubernetes cluster of 30 servers. The total cost of running that cluster (the total cost of the servers) wouldn't by itself tell us which of our services is the source of the majority of our charges. It would be like staring at our entire AWS bill without it being broken down into individual line items.

How do you analyze and manage Kubernetes costs?

Multiple teams and services draw shared resources from the same underlying infrastructure in a Kubernetes cluster. Clusters, in turn, are made up of individual machines called nodes, each supplying units of CPU, memory, and storage. Keeping track of where nodes and clusters are actually running is essential for cost allocation and chargeback efforts.

Generally, scaling is driven primarily by two things: CPU and memory. So, if service A is using 60% of an instance's CPU and 40% of its memory while services B, C, and D are each using 5% of each, we can use these numbers to distribute the total cost of the machine into five buckets: A, B, C, D, and unused.
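
To make the arithmetic concrete, here is a minimal Python sketch of that split. The utilization figures come from the scenario above; the $0.20/hour instance price and the equal CPU/memory weighting are assumptions for illustration, not anyone's real pricing model.

```python
# Split one instance's hourly cost across services by utilization.
# The price and the 50/50 CPU-vs-memory weighting are illustrative.
INSTANCE_COST_PER_HOUR = 0.20  # hypothetical on-demand price in USD

# Fraction of the instance's CPU and memory each service consumes.
usage = {
    "service-a": {"cpu": 0.60, "mem": 0.40},
    "service-b": {"cpu": 0.05, "mem": 0.05},
    "service-c": {"cpu": 0.05, "mem": 0.05},
    "service-d": {"cpu": 0.05, "mem": 0.05},
}

costs = {
    name: INSTANCE_COST_PER_HOUR * (u["cpu"] + u["mem"]) / 2
    for name, u in usage.items()
}
costs["unused"] = INSTANCE_COST_PER_HOUR - sum(costs.values())

for name, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
    print(f"{name}: ${cost:.4f}/hour")
```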

Now we're back to a model that reveals which unit is driving costs, and we can carry this model directly over to Kubernetes.

Instead of simply spinning up an instance of each service on each server and letting it run, each of these services is packaged into a logical collection of pods, which the Kubernetes scheduler places onto the cluster as needed. Node costs are divided among the pods, and each pod hosts some subset of our services. The pod is the atomic unit of scheduling, and from pods we can reconstruct any higher-level abstraction we choose to use.
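
As a rough illustration of working at the pod level, the sketch below (using the official kubernetes Python client) groups running pods by a label and records the node each one landed on. The app label key is a common convention, not something Kubernetes enforces, so treat it as an assumption about how your workloads are labeled.

```python
# Group running pods by an owning "service" label and record the node
# each one landed on. Requires `pip install kubernetes` and cluster access.
from collections import defaultdict
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when run in-cluster
v1 = client.CoreV1Api()

pods_by_service = defaultdict(list)
for pod in v1.list_pod_for_all_namespaces().items:
    labels = pod.metadata.labels or {}
    service = labels.get("app", "unlabeled")  # "app" is a convention, not a rule
    pods_by_service[service].append(
        (pod.metadata.namespace, pod.metadata.name, pod.spec.node_name)
    )

for service, pods in pods_by_service.items():
    print(service, "->", pods)
```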

Cloud providers bill resources by the hour, but over the course of that hour a variety of pods belonging to various services and namespaces can spin up and down on a single instance. The only part of the process particular to Kubernetes is gathering the compute and memory usage metrics.


Kubernetes exposes a number of Prometheus-formatted metrics we can use, including per-pod CPU utilization and per-pod memory utilization, at minute-by-minute granularity.
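
For example, here is a hedged sketch of pulling those numbers over the Prometheus HTTP API. The endpoint URL is a placeholder, and while container_cpu_usage_seconds_total and container_memory_working_set_bytes are the standard cAdvisor series most clusters scrape from their kubelets, your metric names may differ.

```python
# Pull per-pod CPU and memory usage from Prometheus over its HTTP API.
import requests

PROM_URL = "http://prometheus.example.com"  # placeholder endpoint

def instant_query(promql: str):
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": promql})
    resp.raise_for_status()
    return resp.json()["data"]["result"]

# Average CPU cores used per pod over the last hour.
cpu = instant_query(
    "sum by (namespace, pod) (rate(container_cpu_usage_seconds_total[1h]))"
)
# Average working-set memory (bytes) per pod over the last hour.
mem = instant_query(
    "sum by (namespace, pod) "
    "(avg_over_time(container_memory_working_set_bytes[1h]))"
)
```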

We now have enough data to estimate how much our Kubernetes services cost to run. First, we examine our cloud provider (AWS/GCP) bill, which has per-resource costs for the cluster nodes.

Then, we collect Prometheus usage metrics from our Kubernetes clusters and correlate Kubernetes pods back to resource IDs (and thus to line items on our AWS/GCP bill).
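
One way to do that correlation is sketched below, under two assumptions: that each Node object's spec.providerID embeds the cloud instance ID (it does on the major providers, though the format varies), and that the billing export has per-resource, per-hour line items. The field names in bill_rows are illustrative; real AWS CUR or GCP BigQuery billing exports use different column names.

```python
# Join pod usage back to billing line items via the node's instance ID.
# In practice node_to_instance comes from each Node's spec.providerID
# (e.g. "aws:///us-east-1a/i-0abc123"); here it is hard-coded.
node_to_instance = {
    "node-1": "i-0abc123",
    "node-2": "i-0def456",
}

# Illustrative billing rows; real exports use provider-specific columns.
bill_rows = [
    {"resource_id": "i-0abc123", "hour": "2024-01-01T00", "cost": 0.192},
    {"resource_id": "i-0def456", "hour": "2024-01-01T00", "cost": 0.096},
]

cost_by_instance = {row["resource_id"]: row["cost"] for row in bill_rows}

def node_hourly_cost(node_name: str) -> float:
    """Hourly cost of the instance backing a node, or 0 if unknown."""
    return cost_by_instance.get(node_to_instance.get(node_name), 0.0)

print(node_hourly_cost("node-1"))  # 0.192
```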

Now we can calculate a pod's hourly utilization: for each collection period, take the maximum of its reserved and utilized resources, sum those values across the hour, and divide by the number of collection periods per hour.
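
Here is a worked example of that formula, with made-up numbers and one sample per minute (so 60 collection periods per hour):

```python
# max(requested, used) per collection period, averaged over the hour.
samples = [
    # (cpu_requested, cpu_used) in cores, one tuple per minute
    (0.5, 0.30),
    (0.5, 0.72),  # usage spiked above the request this minute
    (0.5, 0.45),
    # ... 57 more samples in a real hour
]

PERIODS_PER_HOUR = 60

hourly_cpu = sum(max(req, used) for req, used in samples) / PERIODS_PER_HOUR
print(f"billable CPU for the hour: {hourly_cpu:.3f} cores")
```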

As before, break the instance's costs down into memory and CPU, then apportion those costs based on utilization. And there you have it: per-pod costs. A service's cost is then simply the sum of the costs of its pods, and by summing over pods in the same way, we can calculate expenses for higher-level Kubernetes abstractions such as namespaces.
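
The rollup works the same way at every level; a minimal sketch (the pod records and prices are made up):

```python
# Roll per-pod costs up to services and namespaces.
from collections import defaultdict

pod_costs = [  # illustrative per-pod hourly costs
    {"namespace": "prod", "service": "service-a", "pod": "a-7f9", "cost": 0.031},
    {"namespace": "prod", "service": "service-b", "pod": "b-c42", "cost": 0.008},
    {"namespace": "staging", "service": "service-a", "pod": "a-x11", "cost": 0.012},
]

by_service, by_namespace = defaultdict(float), defaultdict(float)
for p in pod_costs:
    by_service[p["service"]] += p["cost"]
    by_namespace[p["namespace"]] += p["cost"]

print(dict(by_service))    # cost per service
print(dict(by_namespace))  # cost per namespace
```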

Put all of this in a spreadsheet, and you have Kubernetes cost allocation figured out.

Sounds like too much to handle, right? Don't worry, we've got you covered! OpsLyft saves software developers time and effort in analyzing complex Kubernetes environments.

Don't want to slog through this rigorous, mind-numbing data analysis just to get your Kubernetes costs right?

OpsLyft helps you perform a smooth and easy Kubernetes cost analysis and view full cost breakdowns, down to the hour, by namespace, deployment, service, label, pod, and container across any major cloud provider (GCP in current scope) or on-prem Kubernetes environment.

What is the mechanism behind it? In a nutshell, we use our own advanced algorithms to automatically allocate expenses inside your Kubernetes clusters by combining container utilization data, AWS/GCP cost data, and information about your business context. There are no manual rules or effort required, and costs are defined in terms of what matters to your company: by product and feature, or by team and business unit, for example.

Users can identify which product features those costs correspond to, which can help them answer fundamental questions about how much it costs to build and run their products.

And you can share that insight with the engineers in charge of each component of your product, enabling them to make better decisions that improve your company's COGS (cost of goods sold).

With OpsLyft, you can simply focus on how your cloud costs match your business strategy. Request a demo today to learn how we can analyze and manage your Kubernetes costs for you!
