OpsLyft for Cloud Visibility

Cloud Asset Management (CAM)

Anyone who has worked with cloud providers knows how much time is wasted trying to answer a simple question about the environment.

Engineers have to go back and forth between tools such as the console, the Cloud SDK, and the API, which makes it challenging to gain insight into our environments and answer questions about them.

A multitenant architecture makes aggregating cloud resource data even more challenging.

The typical pattern we saw our customers follow when they wanted insight into their environment or infrastructure was:

1. Find someone with the right access

2. Run a bunch of scripts that:

i. Extract cloud asset details (from SDKs/APIs).

ii. Transform the data into a workable data structure.

iii. Run queries on the transformed data to get the required insight.
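The script pattern above can be sketched roughly as follows. This is a minimal illustration, not how CAM itself is implemented: the raw payloads, field names, and the `Asset` record are all hypothetical, and the hard-coded dictionaries stand in for whatever per-account SDK or API calls would return.

```python
from dataclasses import dataclass

# Hypothetical raw payloads, one list per account/project. In a real script
# these would come from provider SDK or API calls; here they are hard-coded.
RAW_BY_ACCOUNT = {
    "aws-prod": [
        {"InstanceId": "i-001", "State": "running", "Tags": {"env": "prod", "owner": "alice"}},
        {"InstanceId": "i-002", "State": "stopped", "Tags": {"env": "dev"}},
    ],
    "gcp-prod": [
        {"name": "vm-a", "status": "RUNNING", "labels": {"env": "prod", "owner": "bob"}},
    ],
}


@dataclass(frozen=True)
class Asset:
    """Provider-neutral record (an assumed shape, for illustration only)."""
    account: str
    asset_id: str
    state: str
    env: str
    owner: str


def transform(account, record):
    """Step ii: normalize one provider-specific record into an Asset."""
    tags = record.get("Tags") or record.get("labels") or {}
    return Asset(
        account=account,
        asset_id=record.get("InstanceId") or record["name"],
        state=(record.get("State") or record["status"]).lower(),
        env=tags.get("env", "unknown"),
        owner=tags.get("owner", "unowned"),
    )


def load_assets(raw_by_account):
    """Steps i and ii: walk every account and normalize every record."""
    return [
        transform(account, record)
        for account, records in raw_by_account.items()
        for record in records
    ]


def running_in_env(assets, env):
    """Step iii: query the normalized data."""
    return [a for a in assets if a.env == env and a.state == "running"]


assets = load_assets(RAW_BY_ACCOUNT)
prod_running = running_in_env(assets, "prod")
```

Even in this toy form, every new provider or account adds another record shape to normalize, which is where the manual effort tends to accumulate.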

CAM aims to solve this problem by letting users gain visibility into, and manage, their cloud assets across multiple cloud providers and accounts/projects in one place:

  • The manual effort that engineers have to put in to get the required insight from the environment is completely eliminated
  • Writing and running custom scripts is no longer required
  • CAM extracts, transforms, and loads the data and makes it available for the user to gain insights into it

Let us look at the problems users face and how CAM resolves them:

Problem: It is difficult to manually look up past cloud asset details and manage the current ones.

Solution: Since CAM stores the asset details, it allows the user to see the associated cost and historical assets (assets that were once part of the infrastructure). Unlike other tools, CAM is near real-time: asset details are updated frequently, giving the user greater visibility into the operational infrastructure.

Problem: Custom queries and insights are tough to generate across different accounts/projects.

Solution: CAM enables users to query all the resources running in, for example, the prod environment across different cloud providers. Details such as utilization, current status, tags, and the owner of each resource are key data points that users can look at to gain insight into a specific resource.

Problem: Large EC2 (or RDS, Redshift, ECS, etc.) instances are often created and sized to handle peak utilization but never reviewed later to see how well their storage, compute, and memory are being utilized.

Solution: CAM provides the user with the data to analyze the assets and their utilization metrics, find resources that are overprovisioned, underprovisioned, or idle, and take appropriate action.
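As a rough sketch of what such an analysis might look like, the function below buckets a resource by its CPU utilization. The thresholds and the `classify` helper are assumptions made for illustration; they are not CAM's actual rules, and a real review would also consider memory, storage, and network metrics.

```python
def classify(cpu_avg, cpu_peak, idle_below=5.0, over_below=40.0, under_above=90.0):
    """Bucket a resource by CPU utilization percentages.

    All thresholds are hypothetical defaults chosen for this example.
    """
    if cpu_peak < idle_below:
        # Even the busiest moment is negligible: nothing is using it.
        return "idle"
    if cpu_peak < over_below:
        # Peak load never approaches capacity: sized larger than needed.
        return "overprovisioned"
    if cpu_avg > under_above:
        # Sustained load near capacity: sized smaller than needed.
        return "underprovisioned"
    return "ok"


# Hypothetical utilization samples: (average CPU %, peak CPU %).
samples = {
    "i-001": (1.0, 3.0),
    "i-002": (10.0, 30.0),
    "i-003": (95.0, 100.0),
    "i-004": (50.0, 80.0),
}
report = {rid: classify(avg, peak) for rid, (avg, peak) in samples.items()}
```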

Problem: Load balancers may not have associated resources or targets, RDS databases may have low or no connection counts, and a NAT gateway may not have any resources routing to it. EBS volumes may not be attached to running instances. An EIP may not be attached to any resource. 

Solution: CAM allows users to find resources whose utilization depends on being attached to other resources, such as unattached volumes or load balancers with no targets. It lets users slice and dice their cloud infrastructure data in whatever way gives them the insight they require.

Therefore, through software automation, Cloud Asset Management cuts the time and effort lost to manual work and centralizes cloud data in an orderly manner that is easy to access.
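To make the idea of attachment-based utilization concrete, here is a minimal sketch over already-normalized records. The record shape, the type names, and the `ATTACHMENT_FIELD` mapping are hypothetical and chosen only for this example; they do not reflect CAM's data model.

```python
# Hypothetical mapping from resource type to the field that lists what the
# resource is attached to. An empty or missing value means "unattached".
ATTACHMENT_FIELD = {
    "ebs_volume": "attachments",
    "elastic_ip": "association",
    "load_balancer": "targets",
    "nat_gateway": "routes",
}


def find_unattached(resources):
    """Return the ids of resources that nothing is attached to or routed through."""
    unattached = []
    for resource in resources:
        field = ATTACHMENT_FIELD.get(resource["type"])
        if field is None:
            # Unknown resource type: attachment does not define its usage.
            continue
        if not resource.get(field):
            unattached.append(resource["id"])
    return unattached


# Hypothetical inventory for illustration.
resources = [
    {"id": "vol-1", "type": "ebs_volume", "attachments": ["i-001"]},
    {"id": "vol-2", "type": "ebs_volume", "attachments": []},
    {"id": "eip-1", "type": "elastic_ip", "association": None},
    {"id": "lb-1", "type": "load_balancer", "targets": ["i-001"]},
    {"id": "nat-1", "type": "nat_gateway", "routes": []},
    {"id": "i-001", "type": "instance"},
]
unattached = find_unattached(resources)
```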

Cloud Cost-Saving Insights (CSI):

Ever-growing cloud cost has become one of the main concerns of companies lately. They know how challenging managing cloud cost is, not to mention the complicated and multifaceted pricing structure. Cloud bills are long, complex, and hard to unpack because every service has its own billing metric. Understanding usage well enough to make a confident decision is next to impossible.

Some of the main issues that we see our users face are:

  1. Overprovisioned resources: This happens when a team or individual chooses resources larger than what is actually needed to run the workload. This is frequently done as a safety step to ensure that the workload continues to run smoothly. While engineers can justify this strategy for performance, it increases cloud costs.
  2. Unused resources: Engineers often spin up an instance for development or for some project and then forget to shut it down. This results in unused or orphaned resources that are not associated with an owner and continue to generate costs.

So how does someone work towards reducing cloud costs? 

The typical answer is:

  • Go through the billing console to identify major cost centers.
  • Determine whether a service, team, or environment is the reason for the increased cost.
  • In the case of services, drill down into each resource under that service and hop back and forth between usage metrics and billing to understand the use case of the resource.
  • Reach out to the respective owners of the resources you identify, and discuss with them the rationale behind having an over-provisioned resource.
  • Decide to either resize or terminate the resource, and take action on it.

This is a tedious, manual task that takes up valuable engineering time, and this is where CSI comes in.

CSI aims to solve this problem by removing the manual effort that goes into hacking up a script or going back and forth between the console and dashboards. It identifies the idle and underutilized resources that offer potential cost savings.

With CSI, generating cost-saving insights becomes effortless; here’s how:

CSI provides the user with some default insights covering major services across cloud providers based on our thresholds.

CSI allows customers to create their own insights, such as identifying all compute resources with CPU usage below 60% or memory utilization below 40%, or S3 buckets idle for more than 6 months. It also offers a wide range of indicators that may be used to assess resource consumption and identify possible cost-cutting options.
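One way to picture a custom insight is as a named set of threshold conditions evaluated against each resource's metrics. The sketch below is an assumption about how such rules could be modeled, not CSI's actual interface: `Condition`, `matches`, and the metric names (`cpu_pct`, `mem_pct`, `idle_days`) are hypothetical, and conditions within one insight are combined with AND as a design choice for this example.

```python
import operator
from dataclasses import dataclass

# Supported comparison operators for a condition.
OPS = {"<": operator.lt, ">": operator.gt}


@dataclass(frozen=True)
class Condition:
    metric: str       # hypothetical metric name, e.g. "cpu_pct"
    op: str           # "<" or ">"
    threshold: float


def matches(resource, conditions):
    """True if every condition holds; a missing metric never matches."""
    return all(
        c.metric in resource and OPS[c.op](resource[c.metric], c.threshold)
        for c in conditions
    )


# Insights modeled on the examples in the text.
low_usage_compute = [Condition("cpu_pct", "<", 60), Condition("mem_pct", "<", 40)]
stale_bucket = [Condition("idle_days", ">", 180)]  # roughly 6 months

# Hypothetical resource metrics for illustration.
resources = [
    {"id": "i-1", "cpu_pct": 20, "mem_pct": 30},
    {"id": "i-2", "cpu_pct": 75, "mem_pct": 30},
    {"id": "b-1", "idle_days": 200},
]
low_usage_ids = [r["id"] for r in resources if matches(r, low_usage_compute)]
stale_ids = [r["id"] for r in resources if matches(r, stale_bucket)]
```

Swapping `all` for `any` inside `matches` would turn the same conditions into an OR-style insight.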

Regarding cost savings, CSI has made stakeholders’ tasks considerably easier. CSI monitors your cloud environment and discovers resources based on consumption data and user insights, as well as default thresholds specified by us, with the opportunity to filter these insights for particular users.

We understand that identifying idle, unused, or underutilized resources is only half the job. If the user decides to resize a resource, a new question arises: what should the new configuration be?

Resource capacity estimation is another challenge our users face, and to solve this problem, CSI offers a variety of recommendations to users for different scenarios. CSI currently provides recommendations for some of the major services across cloud providers and is constantly working towards supporting more services.

If the user is unsure how resizing the resource to the cheapest recommendation could affect application performance, the user can choose between three recommendations: Cost-optimized, Balanced, and Performant.
