Moving from Cost Anomaly Detection to Conditional Cost Anomaly Detection

What is cost anomaly detection and why do you need it?

With so many cloud services and cloud cost management tools available on the market, it is easier than ever to measure and monitor every aspect of day-to-day cloud usage, and to reduce cloud costs by identifying mismanaged resources and eliminating waste.

But what happens when your cloud bill suddenly spikes or changes without warning? There can be many reasons behind these cost surprises, such as changes in pricing plans, malicious attacks, or unplanned usage of resources. Nobody likes to wake up at the end of the month to a huge bill that is way over their cloud budget. An unexpected change like this in the cost data is considered an anomaly.

Engineering teams are now accountable for managing cloud costs, so DevOps engineers want to proactively monitor and identify unusual patterns in cost and usage. In most cases, however, this monitoring is reactive, and it becomes time-consuming drudgery. We have built Conditional anomaly detection to address the problem of irregular cloud spend outside regular usage patterns. It identifies the root cause of cost and usage spikes, so DevOps engineers can save the time and effort otherwise spent on continuously monitoring cloud costs and usage.

Building Blocks of Conditional anomaly detection and where are we going with it?

Conditional anomaly detection gives users insights that show where their money went when an unexpected spike occurs. It provides in-depth root cause analysis, which a DevOps engineer can use to take action proactively and minimize unintentional spending.

Conditional anomaly detection uses custom-built machine learning (ML) models to continuously monitor unusual spending patterns across different AWS and GCP services on a day-to-day basis, and whenever there is a significant increase in cost, an alert is triggered.

When you enable anomaly detection for a metric, Conditional anomaly detection uses a statistical model to forecast costs and identify cost anomalies. The model analyzes the last 14 days of cost data to predict the expected cost. If the actual cost deviates from the predicted cost beyond fixed parameters, it is marked as a cost anomaly. Alerts are then sent through simple notifications according to the anomaly threshold.
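The forecast-and-compare loop described above can be sketched in a few lines of Python. This is only an illustration, not the product's actual model: the trailing-mean forecast, the `Z_THRESHOLD` value, the 10% fallback tolerance, and the function name `detect_point_anomalies` are all assumptions made for the example. Only the 14-day window comes from the text.

```python
from statistics import mean, stdev

WINDOW_DAYS = 14    # history length mentioned in the text
Z_THRESHOLD = 3.0   # hypothetical sensitivity, not a value from the product


def detect_point_anomalies(daily_costs, window=WINDOW_DAYS, z_threshold=Z_THRESHOLD):
    """Return (day_index, actual, predicted) for days whose cost deviates
    from a trailing-window forecast by more than the allowed tolerance."""
    anomalies = []
    for day in range(window, len(daily_costs)):
        history = daily_costs[day - window:day]
        predicted = mean(history)   # naive forecast: average of the window
        spread = stdev(history)
        # A perfectly flat history has zero spread, so fall back to 10% of the forecast.
        tolerance = z_threshold * spread if spread > 0 else 0.1 * predicted
        actual = daily_costs[day]
        if abs(actual - predicted) > tolerance:
            anomalies.append((day, actual, predicted))
    return anomalies


# Made-up daily costs: 14 ordinary days, one spike, then back to normal.
costs = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 101, 99, 100, 100, 180, 101]
for day, actual, predicted in detect_point_anomalies(costs):
    print(f"day {day}: actual={actual}, predicted={predicted:.1f}")
```

A real forecast would likely account for trend and weekly seasonality. The trailing mean is used here only to keep the comparison between predicted and actual cost easy to follow.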

Most existing solutions on the market offer similar capabilities up to this point. However, we believe the solution is not complete yet, as the problem is not fully addressed.

Until now, we have been analyzing costs daily using data from the last 14 days. Now consider a scenario where your organization has a business event going on, for instance, a sale season where you intentionally deploy new systems and services. This increases spending and produces a spike in cloud costs, so the resulting cost anomaly is reasonable and not a mistake.

Outliers like these, which occur at specific times and can only be judged as normal or abnormal in light of the meta-information or business context associated with them, are called conditional anomalies.

So we are now looking at how Conditional anomaly detection, as a cost anomaly detection model, can account for these seasonal and trend patterns.

How are we planning on doing this?

Some existing cloud-native solutions work only at the infrastructure level and not at the application level, i.e., they focus only on point anomalies and not on conditional ones.

Other platforms approach conditional anomaly detection differently. For example, some have a manual feedback mechanism where the user can tell the system that a flagged event is not an anomaly. They then depend on their ML algorithms to learn seasonality patterns from this feedback.

But this approach does not solve the problem completely. Here, DevOps engineers have to coordinate constantly with business teams to get granular data about sales and business inflow, which can cost them a lot of time and effort.

This extra work has to be cut down, and this is what Conditional anomaly detection is working on now.

We are integrating product and business analytics data sources, i.e., performance and operational data in the form of logs and metrics, and correlating them with the cost data we have. This allows users to watch for anomalous costs in other business contexts, like products and features or business units and teams, and clarifies whether an anomaly is an actual mistake or just a false positive.
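One way to picture this correlation is to normalize cost by a business metric before judging a spike. The sketch below is a simplified illustration under stated assumptions: the metric (orders per day), the 25% `tolerance`, and the function name `classify_anomaly` are hypothetical and are not taken from the product.

```python
from statistics import mean


def classify_anomaly(cost_history, orders_history, cost_today, orders_today, tolerance=0.25):
    """Label a flagged cost spike using a business metric as context.

    If cost per order stays close to its historical baseline, the spike is
    explained by business volume. Otherwise it needs investigation.
    """
    # Skip days without orders to avoid dividing by zero.
    pairs = [(c, o) for c, o in zip(cost_history, orders_history) if o > 0]
    if not pairs or orders_today <= 0:
        return "unexpected"
    baseline_unit_cost = mean(c / o for c, o in pairs)
    unit_cost_today = cost_today / orders_today
    relative_change = abs(unit_cost_today - baseline_unit_cost) / baseline_unit_cost
    return "expected" if relative_change <= tolerance else "unexpected"


# Made-up history: cost scales with order volume at roughly 0.10 per order.
cost_history = [100, 110, 90]
orders_history = [1000, 1100, 900]

# Sale day: cost triples, but so does order volume.
print(classify_anomaly(cost_history, orders_history, cost_today=300, orders_today=2900))
# Ordinary day: cost triples with no change in orders.
print(classify_anomaly(cost_history, orders_history, cost_today=300, orders_today=1000))
```

The same cost of 300 is labeled differently depending on the business context, which is the distinction between a false positive and an actual mistake.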

Our vision is to build a solid solution that aims to detect conditional anomalies along with point anomalies.

Benefits of Conditional anomaly detection to DevOps Engineers and Engineering Teams at this moment:

  • This feature evaluates each of the services you use individually, so both small and large anomalies can be detected. Users can then analyze and determine the root cause of an anomaly, such as the project, service, or usage type that is driving the cost increase.
  • We analyze cloud costs on an hourly and day-to-day basis, so you do not have to wait for the cloud bill every 30 days and face a panic-inducing cost spike.
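As a rough illustration of per-service root cause analysis, the sketch below ranks services by how much today's cost exceeds a baseline. The function name `rank_cost_drivers` and all figures are made up for the example. A real breakdown would also cover projects and usage types.

```python
def rank_cost_drivers(baseline_by_service, today_by_service):
    """Sort services by how much today's cost exceeds their baseline cost."""
    services = set(baseline_by_service) | set(today_by_service)
    deltas = {
        name: today_by_service.get(name, 0.0) - baseline_by_service.get(name, 0.0)
        for name in services
    }
    # Largest increase first, so the likely root cause is at the top.
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Hypothetical daily costs per service.
baseline = {"EC2": 50, "S3": 20, "BigQuery": 30}
today = {"EC2": 52, "S3": 21, "BigQuery": 95}

ranked = rank_cost_drivers(baseline, today)
for name, delta in ranked:
    print(f"{name}: {delta:+.2f}")
```

Evaluating each service separately is what lets a small service with a large relative jump surface, even when the total bill barely moves.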

What does the future hold for Conditional anomaly detection?

  • After moving from point anomalies to conditional anomalies, the next step is collective anomalies. A collective anomaly occurs when no single data point deviates sharply from the usual patterns in the data set, but a subset of data points, taken together as one unit, deviates significantly from the rest of the data set.

To detect such anomalies, we will build a model with algorithms that understand the relationships between different time series when detecting and investigating anomalies.

  • Right now, Conditional anomaly detection operates at the service level, but we aim to extend it to the resource level in the future.
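To make the idea of a collective anomaly concrete, here is a minimal single-series sketch that flags windows of consecutive days whose average drifts from a baseline, even though no individual day crosses the point-anomaly threshold. The window size, the threshold, and the function name `detect_collective_anomalies` are assumptions for illustration. The planned model, which relates multiple time series to one another, is not shown here.

```python
from statistics import mean, stdev


def detect_collective_anomalies(values, baseline, window=5, z_threshold=3.0):
    """Return (start, end) index pairs of windows whose mean deviates from the
    baseline mean, although no single point inside has to be an outlier."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    # Under independent noise, a window mean varies by sigma / sqrt(window).
    limit = z_threshold * sigma / window ** 0.5
    flagged = []
    for start in range(len(values) - window + 1):
        window_mean = mean(values[start:start + window])
        if abs(window_mean - mu) > limit:
            flagged.append((start, start + window - 1))
    return flagged


# Made-up data: a stable baseline, then a modest but sustained rise in spend.
baseline = [100, 104, 96, 102, 98, 103, 97, 101, 99, 100]
spend = [100, 101, 99, 100, 106, 107, 106, 107, 106, 100, 99, 101, 100]

flagged = detect_collective_anomalies(spend, baseline)
print(flagged)
```

Each day in `spend` stays within three standard deviations of the baseline, so a point detector would stay silent. The run of slightly elevated days is only visible when the points are evaluated together.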

The need for this kind of technology is clear, because every decision an engineer makes in the cloud impacts cost. Conditional anomaly detection leverages advanced machine learning technologies to detect unexpected spending and provide insights with root causes, so engineers can quickly take action.

So rather than investing your time and effort in continuous manual monitoring of resources and still getting caught off guard by an unexpectedly high cloud bill, you can have your team focus on delivering features!

So, if you are new here and want to put innovation first and leave tech debt behind, now is the time to take the next step. Request a demo and take a look at the solid solution we are building to handle the variables discussed above in real time.
