10/30/2024 | News release | Distributed by Public on 10/30/2024 11:37
We saved a LOT of money. We'd tell you the exact amount, but our lawyers said no.
Duolingo has come a long way since its humble beginnings. In just the last few years, we've doubled down on improving beloved features like Stories and creating immersive lessons like DuoRadio and Adventures, just to name a few. While we're all very excited about the future of our product, we've also had to face a cold, hard truth: AWS doesn't accept good vibes as payment.
Every shiny feature demands resources to maintain, and many come with hefty price tags. Over time, these costs added up to millions of dollars a year! (We'd like to tell you the exact amount, but our lawyers said no.) And so, a grand quest was issued at the start of 2024: reduce cloud spending without compromising our product. Dozens of engineers contributed to this effort, and we raked in 20% savings (annualized) in just a few months! Here are some takeaways from that experience.
The first step was to understand where every dollar was sneaking off to. We needed easy access to data that answered, "How much are we spending, and on what?" And crucially, "How is this changing over time?"
We started with a third-party tool called CloudZero, which broke down our cloud costs into queryable line items. With this, we were able to skim through our top spenders and identify anomalies among them. There were definitely some surprises! For instance, one service's staging resources cost more than its prod counterpart… Turns out, someone had scaled it up to test something and forgotten to scale it back down.
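A check like the staging-versus-prod surprise can be sketched in a few lines once costs are available as line items. This is a minimal illustration, not CloudZero's API: the `(service, environment, cost)` shape, the service names, and the dollar figures are all hypothetical.

```python
from collections import defaultdict

# Hypothetical monthly line items: (service, environment, cost in USD).
LINE_ITEMS = [
    ("stories-api", "prod", 12000.0),
    ("stories-api", "staging", 1500.0),
    ("radio-worker", "prod", 4000.0),
    ("radio-worker", "staging", 6500.0),
]


def staging_exceeds_prod(line_items):
    """Return the services whose staging spend is higher than their prod spend."""
    totals = defaultdict(lambda: defaultdict(float))
    for service, environment, cost in line_items:
        totals[service][environment] += cost
    return sorted(
        service
        for service, by_env in totals.items()
        if by_env.get("staging", 0.0) > by_env.get("prod", 0.0)
    )


print(staging_exceeds_prod(LINE_ITEMS))
```

With the made-up numbers above, only `radio-worker` is flagged, which is the kind of result that prompts someone to ask who scaled staging up and why.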
Increasing the number of engineers looking at costs is a great first step to saving money, so we worked on improving discoverability as well as accessibility. We improved our data coverage to include cloud services beyond AWS, such as OpenAI. We integrated cloud spending into our existing metrics ecosystem, and sent out weekly reports so teams could passively monitor their services.
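To give a feel for what a weekly report might surface, here is a rough sketch that compares two weeks of spend per service and flags large increases. The data layout, the 25% threshold, and every figure are assumptions for illustration; this is not a description of our actual reports.

```python
def week_over_week(previous, current):
    """Return {service: fractional change}, skipping services with no prior spend."""
    return {
        service: (current[service] - previous[service]) / previous[service]
        for service in current
        if previous.get(service)
    }


def flag_increases(previous, current, threshold=0.25):
    """Return services whose spend grew by more than `threshold` (0.25 = 25%)."""
    changes = week_over_week(previous, current)
    return sorted(service for service, change in changes.items() if change > threshold)


# Hypothetical weekly spend in USD, including a non-AWS provider.
LAST_WEEK = {"stories-api": 1000.0, "radio-worker": 400.0, "openai": 200.0}
THIS_WEEK = {"stories-api": 1050.0, "radio-worker": 640.0, "openai": 190.0, "new-service": 50.0}

print(flag_increases(LAST_WEEK, THIS_WEEK))
```

Services with no spend in the previous week (like `new-service` here) are skipped rather than reported as an infinite increase; a real report would probably list them separately.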
This may sound trivial, but you'd be amazed by how many expenses we had for things we didn't need (think about all your old subscriptions). Through our investigation, we unearthed a bunch of such resources: ancient ElastiCache clusters, entire databases, and even an entire microservice. Many of them belonged to legacy features whose code wasn't fully cleaned up, but if their owners had known the literal price of their tech debt, they might've had second thoughts!
It's not just about deleting unused resources, though. It can also be about deleting unnecessary data. Here are three examples:
The idea of "only paying for what you need" applies to compute resources as well! Most of our services were overprovisioned:
Upon closer inspection of the AWS documentation, we found some cost-saving policies for DynamoDB and RDS that are optimized for certain read/write usage patterns. AWS also offers autoscaling and task scheduling configurations, so you don't have to run at full capacity 24/7 to accommodate occasional traffic spikes.
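The arithmetic behind "don't run at full capacity 24/7" is easy to sketch. The example below compares task-hours for a service pinned at peak capacity against one that scales up only during a spike window. The task counts and the four-hour peak are made-up numbers, and the sketch deliberately ignores pricing details.

```python
def task_hours(schedule):
    """Sum task-hours for a daily schedule given as (hours, task_count) pairs."""
    return sum(hours * tasks for hours, tasks in schedule)


# Hypothetical daily schedules.
ALWAYS_PEAK = [(24, 40)]          # 40 tasks around the clock
SCHEDULED = [(4, 40), (20, 15)]   # 40 tasks for a 4-hour peak, 15 otherwise

savings = 1 - task_hours(SCHEDULED) / task_hours(ALWAYS_PEAK)
print(f"Task-hours saved: {savings:.0%}")
```

Under these assumptions the scheduled configuration uses roughly half the task-hours. The real figure depends entirely on how spiky your traffic is.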
We saved several hundred thousand dollars a year by switching a single database to Aurora I/O-Optimized!
Another cost-cutting solution was refining our Reserved Instances (RI) strategy. We tracked our EC2, RDS, and ElastiCache usage and allocation. This gave us visibility into our baseline of necessary compute resources, which informed our bulk purchase, through RIs, of the compute capacity we couldn't get through the Spot market.
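One way to reason about a reservation baseline is to brute-force the reservation size that minimizes total cost over an observed usage history. The sketch below is an illustration under loud assumptions, not our actual purchasing model: the hourly usage series and both rates are invented, and `overflow_rate` stands in for whatever non-reserved capacity costs (on-demand or Spot).

```python
def total_cost(hourly_usage, reserved, overflow_rate, reserved_rate):
    """Cost of paying for `reserved` instances every hour plus overflow beyond them."""
    reserved_cost = reserved * reserved_rate * len(hourly_usage)
    overflow_hours = sum(max(0, used - reserved) for used in hourly_usage)
    return reserved_cost + overflow_hours * overflow_rate


def best_reservation(hourly_usage, overflow_rate, reserved_rate):
    """Try every reservation size from 0 to peak usage and return the cheapest."""
    candidates = range(max(hourly_usage) + 1)
    return min(
        candidates,
        key=lambda r: total_cost(hourly_usage, r, overflow_rate, reserved_rate),
    )


# Hypothetical instance counts per hour and hypothetical hourly rates in USD.
HOURLY_USAGE = [10, 10, 12, 30, 28, 12, 10, 10]
OVERFLOW_RATE = 1.0
RESERVED_RATE = 0.6

best = best_reservation(HOURLY_USAGE, OVERFLOW_RATE, RESERVED_RATE)
print(best, total_cost(HOURLY_USAGE, best, OVERFLOW_RATE, RESERVED_RATE))
```

With these numbers the cheapest option is to reserve the steady baseline and leave the short peak to non-reserved capacity: an extra reserved instance only pays off if it would be busy for more than `RESERVED_RATE / OVERFLOW_RATE` of the hours.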
Every request in a microservice architecture can trigger a chain reaction: one service call often results in five more calls to other services. Not only does this increase load-balancing and inter-AZ bandwidth costs, it also means every service needs more tasks to handle the request volume!
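To see how quickly that chain reaction compounds, here is a back-of-the-envelope sketch. The fan-out of five comes from the observation above; the request volume, the call depth, and the idea that every call fans out uniformly are simplifying assumptions.

```python
def total_calls(external_requests, fan_out, depth):
    """Total service calls (including the originals) when every call triggers
    `fan_out` further calls, repeated for `depth` levels."""
    calls_per_request = sum(fan_out ** level for level in range(depth + 1))
    return external_requests * calls_per_request


print(total_calls(1000, 5, 1))  # each request triggers five more calls
print(total_calls(1000, 5, 2))  # those five each trigger five more
print(total_calls(1000, 3, 2))  # same depth with a smaller fan-out
```

In this simplified model, trimming the fan-out from five to three at a depth of two cuts total calls by more than half, which is why reducing unnecessary inter-service calls shows up in both bandwidth and compute costs.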
Here are two ways we dealt with this:
Here's a critical takeaway: many of our cost savings initiatives involved cleaning up tech debt and simplifying complex code. Not only did we save a lot of money, we also improved the health of our codebase! Investing in engineering excellence often results in monetary savings, and incorporating cloud costs into design decisions can incentivize you to Do The Right Thing™.
Spread this knowledge far and wide! Even if you don't have time to commit fully to the cause, what's arguably more important is to ingrain cloud costs into your engineering culture. Show your mentees how to find this data. Include cost estimates in your tech specs. Do quarterly reviews of your team's spending trends and brainstorm high-ROI strategies to bring that spending down.
If you're looking to work at a place that values engineering efficiency and excellence, we're hiring engineers!