What can you do to reduce your AWS S3 Spend?

Last week we covered how the storage class is one of the most important cost factors in AWS S3 and walked through the different classes available and their use cases.

Now let us tell you some easy-to-follow and highly effective strategies to reduce AWS Storage costs by more than 50%.

Cut down storage costs-

  • Set the Right S3 Storage Class for New Objects Before Creation

Start thinking about the intended usage for each new object to be created in S3. Each S3 object should have its access pattern. As a result, there is an S3 class that best suits it. All new objects in Amazon S3 should be given the appropriate class. In S3, you can’t specify a default class for each bucket. You can, however, assign it to each object separately.

Define the best class for each new object and set it in the operation that uploads the object to Amazon S3, using the AWS CLI, the AWS Console, or an AWS SDK. That way every new object starts out in the right class, which is the most time-saving and cost-effective technique in the long run.
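Here is a minimal sketch of what this could look like with the AWS SDK for Python (boto3); the bucket name, key, and local file are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Assign the storage class at creation time so the object never sits
# in S3 Standard unnecessarily.
with open("annual-report.pdf", "rb") as f:          # hypothetical local file
    s3.put_object(
        Bucket="example-archive-bucket",            # hypothetical bucket
        Key="reports/2023/annual-report.pdf",       # hypothetical key
        Body=f,
        StorageClass="STANDARD_IA",                 # pick the class that fits the access pattern
    )
```

The same setting is available as `--storage-class` on `aws s3 cp` and in the console upload dialog.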

  • Keep Amazon S3 Credentials Private and Monitor Credential Usage

Many developers embed IAM access keys and secret keys directly in their applications. This lets users perform operations on S3 directly and simplifies the architecture, but any user can then generate significant additional costs, whether maliciously or by simple accident. At a minimum:

Use temporary credentials that can be revoked, and grant only the access needed to do the work (least privilege). In addition, set up a CloudWatch alarm on “BucketSizeBytes” for every S3 bucket where a third party can upload objects, so you are alerted quickly if someone uploads terabytes of data into your bucket.
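As an illustration, here is a hedged sketch of such an alarm with boto3; the bucket name, threshold, and SNS topic ARN are made up and would need to match your own setup.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alert when the bucket that third parties can write to grows past ~100 GB.
# BucketSizeBytes is a daily storage metric, so a one-day period is used.
cloudwatch.put_metric_alarm(
    AlarmName="s3-upload-bucket-size",
    Namespace="AWS/S3",
    MetricName="BucketSizeBytes",
    Dimensions=[
        {"Name": "BucketName", "Value": "example-upload-bucket"},   # hypothetical bucket
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    Statistic="Maximum",
    Period=86400,                               # the metric is reported once per day
    EvaluationPeriods=1,
    Threshold=100 * 1024 ** 3,                  # 100 GB in bytes (example threshold)
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # hypothetical SNS topic
)
```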

  • Keep Amazon S3 and EC2 in the same AWS region.

Select the appropriate AWS region for your S3 bucket and make sure your EC2 instances and S3 buckets are in the same region. Data transfer between EC2 and S3 within one region is free, so the benefits are both better performance and lower transfer costs. Downloading a file from another AWS region costs $0.02 per GB.

Processing data in the same region, in other words, eliminates the inter-region transfer charge from S3 to EC2 entirely. If the bucket were in a different region and each file were downloaded three times a month, the transfer alone would add 3 * $0.02 = $0.06 per GB, roughly tripling your S3 charges compared with the storage cost of about $0.03 per GB per month.

  • Don’t start with Amazon Glacier right away.

If you are new to AWS and don’t fully understand how Glacier works, or if your application requirements change, you may end up paying a lot more later on.

  • Delete unwanted files after a certain date and remove unused S3 Objects.

You have probably noticed that you pay for the amount of data stored on S3. So, if you remove unused objects, you will also reduce S3 costs. You can check the contents of your S3 buckets in several ways:

  1. For example, you can list the objects in each bucket using the AWS Console, the AWS CLI, or an SDK. This shows the object names (keys) without downloading the objects’ contents.
  2. CloudWatch metrics are another way to examine the contents of S3 buckets. Use the BucketSizeBytes metric to find a bucket’s total size and the NumberOfObjects metric to get the number of objects stored in it (see the sketch after this list). Once you know which buckets are largest, start removing unused objects from them.
  3. Many deployments use S3 for log collection. You can automate deletion with S3 lifecycle rules, e.g. delete log objects 7 days after their creation time. If you use S3 for backups, on the other hand, it may make more sense to delete them after a year.
  4. You can also activate S3 Inventory on a bucket. This tool prepares a CSV (or Apache ORC) file listing all objects in the bucket and delivers it to another S3 bucket on a daily or weekly basis. This is a good approach when you have thousands of objects in a bucket and want to quickly look up properties such as size, encryption status, or last modified time. Note that S3 Inventory has a small cost while active.
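For item 2 above, here is a small boto3 sketch of reading those two CloudWatch metrics; the bucket name is a placeholder, and because the metrics are reported daily, the most recent datapoint is used.

```python
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch")

def latest_s3_metric(bucket, metric, storage_type):
    """Return the most recent daily datapoint for an S3 storage metric."""
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/S3",
        MetricName=metric,
        Dimensions=[
            {"Name": "BucketName", "Value": bucket},
            {"Name": "StorageType", "Value": storage_type},
        ],
        StartTime=datetime.utcnow() - timedelta(days=2),
        EndTime=datetime.utcnow(),
        Period=86400,
        Statistics=["Average"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return points[-1]["Average"] if points else 0

bucket = "example-logs-bucket"   # hypothetical bucket
print("Size (bytes):", latest_s3_metric(bucket, "BucketSizeBytes", "StandardStorage"))
print("Object count:", latest_s3_metric(bucket, "NumberOfObjects", "AllStorageTypes"))
```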

  • Enable the “Lifecycle” feature

When using an S3 versioned bucket, use the “lifecycle” feature to delete old versions. By default, deleting or overwriting data in an S3 versioned bucket keeps all data indefinitely, and you will be charged for it endlessly. In most cases, you only want to keep earlier versions for a limited period. You can set up a lifecycle rule for that.

With S3 Lifecycle Management you can set time-based rules that trigger a ‘Transition’ (moving objects to a different storage class) or an ‘Expiration’ (permanent deletion of objects). For example, a few days after an object is created you can move it from S3 Standard to S3 Glacier, so each object ends up in the most appropriate storage class and your S3 storage bill stays low. Note that you can always transition objects to a longer-term storage class, but you can’t transition back to a shorter-term one. You can also set a lifecycle rule for a whole bucket, or based on a prefix.

So, you don’t need to transition your objects one by one. S3 Lifecycle Management is one of the most effective tools for reducing S3 expenditure, and you should always consider using it. Set an expiration for log files (or any other transitory data) that you store as S3 objects; for example, you can set log objects to expire 30 days after creation, and they will be removed automatically. A sketch of such a rule follows.
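Here is what such a rule could look like with boto3; the bucket name, prefix, and day counts are examples only, not recommendations.

```python
import boto3

s3 = boto3.client("s3")

# One rule for a "logs/" prefix: move objects to Glacier after 30 days,
# delete them after 365 days, and expire old noncurrent versions after 90 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-logs-bucket",                       # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-and-expire-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
            }
        ]
    },
)
```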

  • Find and eliminate incomplete multipart upload bytes
  1. After you initiate a multipart upload, Amazon S3 retains all the uploaded parts of the file until you either complete or abort the upload. If the multipart upload never completes successfully, Amazon S3 continues to store the uploaded parts, and you are charged for that storage.
  2. The multipart upload feature is useful if you have very large objects (>100 MB), enabling you to upload a single object as a set of parts, which provides improved throughput and quicker recovery from network issues. Incomplete parts, on the other hand, remain in your bucket (in an unusable state) and incur storage costs until you complete the upload process or take specific action to remove them.
  3. Use S3 Storage Lens to determine the number and size of incomplete multipart uploads for a given bucket, or for all buckets in your account, and to see which buckets are accumulating them.
  4. You can also select incomplete multipart upload bytes as a metric in any other chart in the S3 Storage Lens dashboard. By doing so, you can further assess the impact of incomplete multipart upload bytes on your storage. For example, you can assess their contribution to overall growth trends, or identify specific buckets that are accumulating these incomplete multipart upload bytes.
  5. From there, you can take action by creating a lifecycle rule that aborts incomplete multipart uploads after a specified number of days, cleaning up the orphaned parts automatically.
  6. If you use the AWS Command Line Interface (AWS CLI) to abort a multipart upload (abort-multipart-upload), the operation deletes the already-uploaded parts. Other tools that use Amazon S3’s multipart upload API may leave incomplete uploads, and therefore the uploaded parts, behind; a sketch of finding and aborting them programmatically follows this list.
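Here is a minimal sketch of finding and aborting incomplete multipart uploads with boto3; the bucket name is hypothetical. A lifecycle rule with AbortIncompleteMultipartUpload achieves the same cleanup automatically.

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-data-bucket"                 # hypothetical bucket

# Each in-progress multipart upload still occupies (and bills for) its uploaded parts.
# Aborting it deletes those parts.
resp = s3.list_multipart_uploads(Bucket=bucket)
for upload in resp.get("Uploads", []):
    s3.abort_multipart_upload(
        Bucket=bucket,
        Key=upload["Key"],
        UploadId=upload["UploadId"],
    )
    print("Aborted incomplete upload for", upload["Key"])
```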
  • Compress Data Before you send it to S3.
  1. For most data, use a fast compression algorithm such as LZ4, which improves performance while lowering storage requirements and therefore costs. In many use cases it also makes sense to use more compute-intensive algorithms like GZIP or ZSTD. When you upload a compressed file to S3, the amount of data stored in S3 is smaller, which results in a lower Amazon S3 bill. Note that to get the original file back you must first download it and then decompress it, but the space savings in S3 can be large, for example with text files (a sketch follows this list).
  2. While there is no charge for transferring data into an S3 bucket, you do pay for storage and for requests such as PUT, GET, and LIST. To avoid paying extra, store your data in a compressed format.
  3. When using S3 for big data analytics or for staging Redshift data, prefer compressible formats such as Avro, Parquet, or ORC, which reduce the amount of S3 storage consumed. For analytics batch processing, use columnar storage formats for better compression and storage optimization.
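As a simple illustration of item 1, here is a sketch that gzips a local log file before uploading it; the file and bucket names are placeholders.

```python
import gzip
import boto3

s3 = boto3.client("s3")

# Compress in memory before the upload so S3 stores (and bills for) fewer bytes.
with open("events.log", "rb") as f:            # hypothetical local file
    compressed = gzip.compress(f.read())

s3.put_object(
    Bucket="example-logs-bucket",              # hypothetical bucket
    Key="logs/events.log.gz",
    Body=compressed,
    ContentEncoding="gzip",
)
```

To read the object back, download it and decompress it with `gzip.decompress`.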
  • Use the Infrequent Access storage class.

The Infrequent Access (IA) storage class provides the same API and performance as regular S3 storage. IA is around four times cheaper than S3 Standard storage ($0.007 per GB/month vs $0.03 per GB/month), but keep in mind that you still pay for retrieval ($0.01 per GB). On the S3 Standard storage class, retrieval is free.

If you download objects less than about twice a month, IA saves you money. Consider the following three scenarios in which it can pay off.

  • Case 1: Automatically moving infrequently used files to IA

Suppose you use S3 to distribute binaries. Normally they are only used for a month after being uploaded, but in rare cases you still need the ability to quickly roll back to an older version. After 30 days, an S3 lifecycle rule automatically moves the binaries to IA. This approach saves money without compromising data availability.

  • Case 2: Using IA for disaster recovery

Backup files are kept for disaster recovery and are rarely read. It makes sense to upload any object over 128 KB directly to IA and save around 60% on storage over a year without losing the availability or durability of the data.

  • Case 3: Use IA for infrequently accessed data

Suppose some class of S3 objects is downloaded on average 0.2 times (20%) per month; it makes sense to keep those objects in IA. For every GB you save: S3 Standard storage cost – IA storage cost – IA access cost = $0.03 – $0.007 – 0.20 × $0.01 = $0.021 per GB per month. Multiply that by a petabyte and that is a substantial monthly saving.
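Spelled out as a quick calculation, using the per-GB prices quoted in this article (check current AWS pricing for your region before relying on them):

```python
standard_storage = 0.03   # $/GB-month, S3 Standard storage (figure quoted above)
ia_storage = 0.007        # $/GB-month, IA storage (figure quoted above)
ia_retrieval = 0.01       # $/GB retrieved from IA (figure quoted above)
monthly_reads = 0.20      # each GB is read 0.2 times per month on average

saving_per_gb = standard_storage - ia_storage - monthly_reads * ia_retrieval
print(f"Saving: ${saving_per_gb:.3f} per GB per month")   # -> $0.021
```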

IA is great, but not always the right choice

IA comes with a minimum billable object size and a minimum storage duration: each object is billed for at least 128 KB and at least 30 days of storage. In addition, moving data to or from S3 Standard costs an API call per object. Still, IA is significantly easier to work with than Glacier: recovery from Glacier can take a very long time, and speeding it up increases your cost.

Cut down API usage costs-

Here are some tips on how you can reduce costs for your API access.

API requests are charged per request, regardless of the size of the object: uploading 1 byte costs the same as uploading 1 GB. A large number of small objects can therefore cause API charges to skyrocket.

Batch objects whenever it makes sense to do so, and design your system to avoid a huge number of small files. If you do have many small records, it usually makes sense to store them in a database such as MySQL or DynamoDB instead of S3, or to use a database (or buffer) to group records and upload them to Amazon S3 as larger objects.
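One possible shape for that batching, sketched with boto3; the record format, key layout, and bucket name are all hypothetical.

```python
import json
import boto3

s3 = boto3.client("s3")

# Instead of one PUT request (and one request charge) per tiny record,
# buffer the records and upload them as a single larger object.
records = [{"user_id": i, "event": "click"} for i in range(10_000)]   # hypothetical data

batch_body = "\n".join(json.dumps(r) for r in records).encode("utf-8")

s3.put_object(
    Bucket="example-events-bucket",                 # hypothetical bucket
    Key="events/2024-01-01/batch-0001.jsonl",       # one object instead of 10,000
    Body=batch_body,
)
```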

Cut down Amazon S3 data transfer costs-

If you conduct a lot of cross-region S3 transfers, it can be more cost-effective to replicate your S3 bucket to the other region rather than downloading each file across regions every time. Suppose 1 GB of data in us-west-2 needs to be sent 20 times to EC2 in us-east-1. With plain inter-region transfers you pay 20 × $0.02 = $0.40 in data transfer. If you first copy it to a mirror S3 bucket in us-east-1, you pay only $0.02 for the single transfer plus about $0.03 for a month of storage, roughly $0.05 in total, which is close to 90% cheaper. S3 has a built-in feature for this called cross-region replication, and you gain better performance as well as cost savings; a sketch of the setup follows.
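A rough sketch of enabling that replication with boto3; versioning must already be enabled on both buckets, and the bucket names, rule ID, and IAM role ARN below are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Replicate the source bucket into the region where the data is consumed,
# so EC2 reads it locally instead of paying inter-region transfer on every download.
s3.put_bucket_replication(
    Bucket="example-source-us-west-2",                                   # hypothetical source bucket
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",    # hypothetical IAM role
        "Rules": [
            {
                "ID": "replicate-to-us-east-1",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::example-mirror-us-east-1"},
            }
        ],
    },
)
```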

If assets stored in S3 are downloaded frequently (e.g., images on a consumer-facing site), you can put AWS’s content delivery network, CloudFront, in front of the bucket.

Some CDN services, such as Cloudflare, charge a flat fee. If you have a lot of static assets, a CDN can save you a lot of money compared with serving directly from S3, because only a small percentage of requests will hit your S3 bucket.

Let’s Conclude

See how easily you can optimize your storage costs. To keep your cloud costs under control in the long term, there is one last thing you can do: use a dedicated cost optimization tool instead of relying on the cloud provider’s native free options, which do not go far enough to help you reduce your costs.

If your organization depends on AWS and you need help navigating AWS S3 costs, schedule a demo, and OpsLyft will give you some suggestions on how you might mitigate them. We have a straightforward and user-friendly way to help you determine just how much you can save by taking the necessary actions.
