The actual infrastructure costs of running a global Edge API (part 2)
This is a follow-up post to part 1, where we looked at how much it costs to run a feature flagging platform at scale.
Since we wrote part 1, a lot has been afoot at Flagsmith Infrastructure HQ! Most notably, we launched our Global Edge API into production and have been migrating existing customers and onboarding new customers directly onto the Edge platform.
We carried out all this Edge API work to solve some big problems around hosting, infrastructure and scaling:
- We wanted to provide global low latency for all our customers wanting to serve flags to their applications.
- We didn’t want to worry about scaling our SDK endpoints, ever.
- We wanted global failover in the event of an entire AWS region outage.
- We wanted to take control of our costs and tie them directly to our API traffic.
Our Edge API - a technical overview
The Flagsmith API is split into two logical groups of tasks:
- Serving flags to our customers' applications.
- These are the most critical and represent about 99.99% of our traffic.
- Serving requests from our dashboard.
- Actions like creating flags, managing segments, adding Flagsmith users, etc. Most of this traffic comes from our React frontend, but some comes from our REST API as well.
Both of these API tasks have quite different requirements when it comes to uptime, global responsiveness and complexity:
- Serving flags:
- This requires as many 9s of uptime as we can manage and global low latency, but it is quite simple in terms of data and transactions.
- Serving dashboard requests:
- Can realistically tolerate more downtime and doesn’t have hardcore latency requirements.
After a lot of R&D, we settled on the following platform to power our Edge API:
- AWS Global Accelerator for latency-based routing and regional failover
- Lambda for our Edge compute
- DynamoDB global tables for our Edge datastore
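To make that stack concrete, here's a minimal sketch of what the hot path looks like: a Lambda handler reading flags for an environment straight from the DynamoDB replica in its own region. This isn't our production code - the table name, key schema and response shape are illustrative.

```python
# Minimal sketch of the Edge hot path: a Lambda handler returning flags for an
# environment from the DynamoDB global table replica in the Lambda's own region.
# Table name, key schema and response shape are illustrative, not the real thing.
import json
import os

import boto3

# Each regional Lambda talks to the DynamoDB replica in the same region,
# which is what keeps reads low-latency.
dynamodb = boto3.resource("dynamodb", region_name=os.environ["AWS_REGION"])
table = dynamodb.Table(os.environ.get("ENVIRONMENTS_TABLE", "flag_environments"))


def handler(event, context):
    # Assume the SDK sends its environment key as a header.
    api_key = event.get("headers", {}).get("x-environment-key")
    if not api_key:
        return {"statusCode": 401, "body": json.dumps({"error": "missing key"})}

    item = table.get_item(Key={"api_key": api_key}).get("Item")
    if item is None:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown environment"})}

    return {
        "statusCode": 200,
        "headers": {"content-type": "application/json"},
        # default=str handles the Decimal types boto3 returns for numbers.
        "body": json.dumps(item.get("feature_states", []), default=str),
    }
```

The important property is that there's no cross-region hop on a flag request: compute and data live next to each other in every region, and Global Accelerator picks the closest one.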
This meant a fundamental change to how we pay for serving our requests.
Our prior infrastructure was based around a chonky RDS instance in the AWS London region, and elastic scaling of our ECS instances based on CPU load. This meant we had a big upfront fixed cost (the RDS reserved instance) and then smaller variable costs for our ECS cluster.
With the introduction of our Edge API, we were effectively going to a fully serverless architecture, both on the compute side and the data side.
Lambda Compute. What we pay!
We started off working with Lambda@Edge, but there were a number of caveats that eventually meant we gave up on using it. We decided instead to work with Lambda directly, and use GitHub Actions, the Serverless Framework and Pulumi to roll out updates to 8 regions (why 8? It’s DynamoDB-related, as discussed below!).
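As a rough sketch of what fanning the same function out to multiple regions looks like with Pulumi's Python SDK - the region list, role and package paths here are placeholders, not our actual deployment code:

```python
# Sketch: deploying the same Lambda to several regions with Pulumi's Python SDK.
# Region list, role policy and artifact path are placeholders for illustration.
import pulumi
import pulumi_aws as aws

# Placeholder region list - the real set is dictated by DynamoDB global tables.
REGIONS = [
    "us-east-1", "us-east-2", "us-west-2", "eu-west-1",
    "eu-west-2", "eu-central-1", "ap-southeast-1", "ap-southeast-2",
]

role = aws.iam.Role(
    "edge-api-role",
    assume_role_policy="""{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole"
      }]
    }""",
)

for region in REGIONS:
    # One explicit provider per region, so each copy of the function
    # lands in that region with the same code and configuration.
    provider = aws.Provider(f"aws-{region}", region=region)
    aws.lambda_.Function(
        f"edge-api-{region}",
        runtime="python3.9",
        handler="handler.handler",
        memory_size=1024,
        architectures=["arm64"],
        code=pulumi.FileArchive("./build/edge_api.zip"),
        role=role.arn,
        opts=pulumi.ResourceOptions(provider=provider),
    )
```

The nice thing about driving this from GitHub Actions is that a single merge rolls the same artifact out everywhere, so the regions never drift apart.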
You really have two dials you can twiddle to optimise your Lambda costs:
- Memory size
- CPU architecture
We started out using 2GB instances on the amd64 architecture. Sizing lambdas is interesting - the more memory you allocate, the more compute you get, but the more you pay. Because we cared a lot about the latency and performance of these lambdas, we went live with 2GB. Once we had a decent level of traffic and data, we dialed the memory size down bit by bit to see how much it affected performance. We settled on 1GB as the right balance between oversizing the lambdas and making them so small that they impacted the overall latency of the response.
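Dialling the memory down across every region is a small boto3 loop - the function name and region list below are placeholders:

```python
# Sketch: nudging the memory size of each regional function down in one go.
# Function name and region list are placeholders.
import boto3

REGIONS = ["eu-west-2", "us-east-1", "ap-southeast-2"]  # ...and the rest

for region in REGIONS:
    client = boto3.client("lambda", region_name=region)
    client.update_function_configuration(
        FunctionName="edge-api",
        MemorySize=1024,  # dialed down from 2048 after measuring latency impact
    )
```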
We started off running our lambdas on the amd64/Intel architecture, but tested arm64/Graviton, liked what we saw, and moved over once we were happy with the results. This reduced our costs by about 30% with no loss of performance!
We are currently paying ~$3 USD per million requests served for our Lambda compute. Because we care a lot about latency, we have provisioned concurrency enabled for these lambdas to reduce cold starts. We do expect this number to come down as we scale, but not by much.
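For a back-of-envelope sense of where a figure like that comes from, here's the basic Lambda cost model. The prices are the public on-demand list prices at the time of writing (they vary by region) and the average duration is an assumed number for illustration, not a figure from our bills; provisioned concurrency is billed separately and isn't included.

```python
# Back-of-envelope Lambda cost per million requests. Prices are public on-demand
# list prices (region-dependent) and the duration is an assumed illustrative value.
# Provisioned concurrency charges are not modelled here.
PRICE_PER_GB_SECOND_ARM = 0.0000133334   # USD per GB-second, arm64/Graviton
PRICE_PER_GB_SECOND_X86 = 0.0000166667   # USD per GB-second, x86_64
PRICE_PER_REQUEST = 0.0000002            # USD, i.e. $0.20 per million invocations

MEMORY_GB = 1.0        # the 1GB size we settled on
AVG_DURATION_S = 0.2   # assumed average billed duration per request


def cost_per_million(price_per_gb_second: float) -> float:
    compute = MEMORY_GB * AVG_DURATION_S * price_per_gb_second
    return (compute + PRICE_PER_REQUEST) * 1_000_000


print(f"arm64: ${cost_per_million(PRICE_PER_GB_SECOND_ARM):.2f} per million requests")
print(f"x86:   ${cost_per_million(PRICE_PER_GB_SECOND_X86):.2f} per million requests")
```

The per-GB-second price gap alone is about 20%; the rest of the saving we saw came from how the workload actually behaved on Graviton.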
DynamoDB Global Tables. What we pay!
Because we need an Edge solution to achieve low latency, we decided we needed to replicate our data around the world. This avoided a bunch of complexity around caching, cache invalidation and all those hard problems. We evaluated a bunch of options and settled on DynamoDB global tables.
One oddity of this solution is that global tables are only available in 11 AWS regions. When we launched, only 8 regions were available. We decided to deploy our data and compute in all 8 of those regions and see what the costs came out at under our production traffic workload.
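Mechanically, a global table is just a regular table with replicas added to it. Here's a hedged sketch using the 2019.11.21 global tables API - the table name and regions are placeholders:

```python
# Sketch: turning a table into a global table by adding regional replicas
# (2019.11.21 global tables version). Table name and regions are placeholders.
import boto3

# The table lives in a "home" region; replicas are added one at a time.
client = boto3.client("dynamodb", region_name="eu-west-2")

for replica_region in ["us-east-1", "ap-southeast-2"]:  # ...and the rest
    client.update_table(
        TableName="flag_environments",
        ReplicaUpdates=[{"Create": {"RegionName": replica_region}}],
    )
    # Wait for the table to return to ACTIVE before adding the next replica.
    client.get_waiter("table_exists").wait(TableName="flag_environments")
```

Every write to the home region is then replicated out to all the replicas, which is exactly where the replication line in our bill comes from.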
How much does it cost? Pretty much exactly double our Lambda compute costs, so ~$6 USD per million requests served. You can see how this splits out in the image below. Because our platform is very, very read-heavy, the bulk of the cost is in reads and replication.
One thing we plan on implementing in the near future is DAX - a transparent caching layer that should bring these read costs down as we scale.
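The appeal of DAX is that it is designed to be API-compatible with the DynamoDB client, so the read path barely changes. A hedged sketch of the swap, using the amazon-dax-client Python package - the cluster endpoint and table name are placeholders:

```python
# Sketch: swapping the plain DynamoDB resource for a DAX-backed one.
# DAX mirrors the DynamoDB read API, so get_item stays the same.
# Cluster endpoint and table name are placeholders.
import boto3
from amazondax import AmazonDaxClient

USE_DAX = True

if USE_DAX:
    # Reads are served from the DAX cluster's item cache where possible.
    dynamodb = AmazonDaxClient.resource(
        endpoint_url="daxs://edge-cache.example.dax-clusters.eu-west-2.amazonaws.com"
    )
else:
    dynamodb = boto3.resource("dynamodb", region_name="eu-west-2")

table = dynamodb.Table("flag_environments")
item = table.get_item(Key={"api_key": "example-key"}).get("Item")
```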
Global Accelerator. What we pay!
We get a lot out of Global Accelerator - it’s a great product and we really love it. We get latency-based routing and global failover without ever having to worry about it! What does that cost? About 20 cents per million requests!
What about serving all that data?
Data transfer. That’s where AWS always gets people, right? For our workload, it’s pretty reasonable. Generally our responses are fairly small, and we do all the good stuff like gzipping and whatnot. What does data transfer cost us? About $1.50 per million requests.
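For the curious, gzipping a flags payload in a Lambda proxy-style response looks roughly like this - the payload and event handling are illustrative:

```python
# Sketch: gzip-compressing a JSON flags payload in a Lambda proxy-style response.
# Binary bodies must be base64-encoded and flagged as such. Payload is illustrative.
import base64
import gzip
import json


def handler(event, context):
    flags = [{"feature": "beta_banner", "enabled": True}]  # placeholder payload
    body = gzip.compress(json.dumps(flags).encode("utf-8"))
    return {
        "statusCode": 200,
        "headers": {
            "content-type": "application/json",
            "content-encoding": "gzip",
        },
        "isBase64Encoded": True,
        "body": base64.b64encode(body).decode("ascii"),
    }
```

In practice you'd also check the client's Accept-Encoding header before compressing, but smaller bodies are what keeps the per-million data transfer number down.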
So, are we happy with that?
Generally, yes! For a small team like ours, never having to worry about scaling or failover again has enormous value. Yes, we could probably power this SDK traffic more cheaply using things like ECS and RDS, but we would invariably hit scaling limits with our database, meaning upgrades, downtime and a bunch of other hairy problems. Moving to an essentially serverless database does cost us more per month, but we’re happy with what that gets us!