AWS ECS Fargate: Compute Capacity Planning

Shankhabrata Chowdhury
Nov 2, 2021

Compute sizing for AWS ECS Fargate: a balancing act between cost and performance.

Context:

Capacity planning for an application running in the cloud? Doesn't that sound absurd? After all, the cloud is all about elasticity and auto scaling. And when ECS Fargate is a managed, serverless offering from AWS, does it really make sense to discuss capacity planning for ECS Fargate?

Well, the answer is both no and yes.

If your application is lightweight, lightly used, and you are not too sensitive about billing and performance, then the answer is no: you don't have to bother with capacity planning for an application running on ECS Fargate, at least until you face an issue in the production environment.

But if your application is heavily used with unpredictable/random load, and you are sensitive about performance (how quickly your containers spawn) and on a tight billing budget, then you should run a capacity planning exercise during the design phase of your project, and keep revisiting it during the rest of the phases, to strike a balance between cost and performance.

In this article we will discuss which factors and features of AWS Fargate to consider and utilize while doing capacity planning. We will also look at the steps and considerations for capacity planning when building a new application or re-platforming an existing application onto ECS Fargate.

In the first few sections we will recap the different terms around compute units in the AWS/Docker world, how auto scaling in Fargate can be implemented, and a few other basics associated with the topic. Then we will walk through the steps for properly sizing containers running in ECS Fargate, both for a freshly built application and for an application re-hosted/re-platformed onto ECS Fargate.

* This article only covers ECS on Fargate, not ECS on EC2 or EKS. Most of the concepts, however, are common and applicable to other container platforms as well.

* This article does not cover capacity planning for network bandwidth and storage; it considers only CPU and memory.

Why Should We Do Compute Capacity Planning for Fargate?

Though AWS ECS Fargate is a managed, serverless offering, keep the points below in mind; they should convince you to do some level of capacity planning for your containers' compute consumption and allocation.

AWS manages the hardware sizing of the container platform, not of the container: you are responsible for specifying the compute size of your task/container.

Auto scaling scales the number, not the size!! Auto scaling increases or decreases the number of containers/tasks/pods, but the size of each individual pod/container is what determines the right balance between performance and cost.

One size does not fit all!! A container's runtime compute size depends on the web server, the runtime platform, the libraries being loaded, your application's memory/CPU footprint, and so on.

It is a balancing act between performance and cost!! A container that is too small may become a performance bottleneck, trigger auto scaling too frequently, and may even crash the application; at the same time, a container that is too big may inflate your bill with underutilized compute resources.

Some compute units/terms in the AWS and Docker world:

The number of CPU units used by a task can be expressed as an integer in CPU units, for example 1024, or as a string in vCPUs, for example '1 vCPU' or '1 vcpu'.

Please note, in AWS
1 CPU Core (physical core) = 2 vCPU
1 vCPU = 1024 CPU units

For some AWS instance types, 1 core = 1 vCPU, meaning a single core carries one processing thread. So check the instance type underneath your ECS/EKS cluster, and plan the rest accordingly.

In Datadog, CPU utilization is shown in mcores (millicores): 1 core = 1000 mcores. If 1 core is 1 vCPU for your instance type, then 1 vCPU = 1000 mcores; but if 1 core = 2 vCPU, then 1 vCPU = 500 mcores.
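The conversions above can be sketched in a few lines. The 2-vCPUs-per-core default below is an assumption that holds for hyperthreaded instance types; adjust it if your instance type maps 1 core to 1 vCPU:

```python
def vcpu_to_cpu_units(vcpu):
    """1 vCPU = 1024 ECS CPU units."""
    return int(vcpu * 1024)

def vcpu_to_mcores(vcpu, vcpus_per_core=2):
    """1 core = 1000 mcores; a hyperthreaded core exposes 2 vCPUs."""
    return vcpu * 1000 / vcpus_per_core

one_vcpu_units = vcpu_to_cpu_units(1)               # 1024 CPU units
mcores_ht      = vcpu_to_mcores(1)                  # 500.0 (1 core = 2 vCPU)
mcores_single  = vcpu_to_mcores(1, vcpus_per_core=1)  # 1000.0 (1 core = 1 vCPU)
```

This is why the same Datadog reading can mean different vCPU consumption depending on the underlying instance type.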

Amazon ECS task definitions for Fargate require that you specify CPU and memory at the task level. Although you can also specify CPU and memory at the container level for Fargate tasks, this is optional. Most use cases are satisfied by only specifying these resources at the task level.

Capacity reservation at Task level from ECS console
Capacity reservation at Container level from ECS console

Many Small containers vs Few Large containers:

A question that always comes to mind is whether to go with many small containers and auto scaling, or with a few large containers. The points below may help you decide.

Cost is not the differentiator !!

When the total required compute size is fixed, cost is not a differentiating factor between a small number of big containers and a big number of small containers.

For a fixed compute need, the monthly bill is the same across different container size/count combinations.
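A quick way to convince yourself: Fargate bills per vCPU-hour and per GB-hour, so only the totals matter. The rates below are illustrative example us-east-1 rates, not current prices; check the Fargate pricing page for your region:

```python
VCPU_HOUR = 0.04048   # USD per vCPU per hour (example rate, not current pricing)
GB_HOUR   = 0.004445  # USD per GB per hour (example rate, not current pricing)

def monthly_cost(tasks, vcpu, mem_gb, hours=730):
    """Fargate-style billing: tasks x hours x (vCPU rate + memory rate)."""
    return tasks * hours * (vcpu * VCPU_HOUR + mem_gb * GB_HOUR)

few_large  = monthly_cost(tasks=2, vcpu=2.0, mem_gb=4.0)   # 4 vCPU / 8 GB total
many_small = monthly_cost(tasks=8, vcpu=0.5, mem_gb=1.0)   # 4 vCPU / 8 GB total
# Both come out identical (~$144/month at these example rates).
```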

Too small a container? Be ready for container cold-start latency!!

A container that is too small may trigger frequent auto scaling, which causes momentarily slow performance of the whole service due to the cold-start latency of each newly spawned container/pod/task.

A smaller container may also suffer performance issues due to less breathing room, frequent garbage collection, etc.

Too big a container? Hope you are not wasting compute resources!!

A container that is too big may waste compute resources. The application/service running in a container should utilize at least 85% of its compute resources on average over a period; otherwise you are paying for provisioned capacity you are not fully using.

So finding the appropriate number of containers and the size of each container is crucial to balancing performance, usage, and billing.

Suggested CPU vs Memory ratio

The table below shows the valid combinations of task-level CPU and memory. It is recommended to follow these CPU-to-memory ratios while planning capacity for ECS tasks.

Ref: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/AWS_Fargate.html#fargate-tasks-size
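The valid combinations from the linked documentation can be captured as a small lookup. Treat the table contents as a snapshot to re-verify against the current docs, since AWS has added larger task sizes over time:

```python
# Valid Fargate task sizes (CPU in units, memory in MB) -- snapshot of the
# table in the linked AWS docs; verify against the current documentation.
VALID_TASK_SIZES = {
    256:  [512, 1024, 2048],                  # 0.25 vCPU: 0.5, 1, 2 GB
    512:  [1024 * g for g in range(1, 5)],    # 0.5 vCPU: 1-4 GB
    1024: [1024 * g for g in range(2, 9)],    # 1 vCPU: 2-8 GB
    2048: [1024 * g for g in range(4, 17)],   # 2 vCPU: 4-16 GB
    4096: [1024 * g for g in range(8, 31)],   # 4 vCPU: 8-30 GB
}

def is_valid_task_size(cpu_units, memory_mb):
    """True if ECS Fargate accepts this task-level CPU/memory pair."""
    return memory_mb in VALID_TASK_SIZES.get(cpu_units, [])

ok  = is_valid_task_size(1024, 2048)   # True: 1 vCPU with 2 GB
bad = is_valid_task_size(256, 8192)    # False: 0.25 vCPU cannot take 8 GB
```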

Typical Fargate Task compute sizing as starting point

Below is a typical compute sizing as a starting-point reference, in case you don't want to, or have no way to, do upfront capacity sizing, and are OK to proceed by trial and error.

starting point reference capacity sizing

Again, the above table is just a reference starting point. For near-real sizing, please do proper sizing (described later in this article), performance testing, and monitoring.

Now you may ask: what is the definition of a small/mid/large application? The guideline below can be followed, but you may set your own based on the applications you have worked with.

Autoscaling Services

While planning compute capacity for tasks/containers, you should have strong auto scaling support in place, to use the capacity in an optimized manner and to balance cost and performance.

Auto scaling is very important for making sure that your services stay online when traffic increases unexpectedly. In AWS Fargate, one way to ensure your service auto-scales is to increase and decrease the number of copies of your application container running in the cluster.

Default existing mechanism of AWS ECS Fargate auto scaling implementation

For detailed know-how on ECS Fargate service auto scaling, please refer to the link below.

Configuring AutoScaling from AWS ECS console

For a sample CloudFormation template that scales a service up and down based on CPU or memory usage, please refer to the link below.

https://containersonaws.com/architecture/autoscaling-service-containers/
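As an alternative to a CloudFormation template, the same setup can be expressed through the Application Auto Scaling API. Below is a sketch that only builds the request payloads for CPU-based target tracking; the cluster/service names, target value, and cooldowns are illustrative assumptions. Pass the two dicts to boto3's "application-autoscaling" client via register_scalable_target and put_scaling_policy:

```python
def cpu_target_tracking(cluster, service, min_tasks=2, max_tasks=10,
                        target_cpu_pct=70.0):
    """Build payloads for register_scalable_target and put_scaling_policy."""
    resource = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    target = {**resource, "MinCapacity": min_tasks, "MaxCapacity": max_tasks}
    policy = {
        **resource,
        "PolicyName": f"{service}-cpu-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu_pct,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,   # seconds; tune to container start-up time
            "ScaleInCooldown": 300,
        },
    }
    return target, policy

target, policy = cpu_target_tracking("my-cluster", "my-service")
```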

Capacity Provider:

Amazon ECS capacity providers are used to manage the infrastructure the tasks in your clusters use.

It has great significance for ECS on EC2, but for Fargate the underlying infrastructure is already managed by AWS. FARGATE is the default capacity provider for ECS running in Fargate mode; in some scenarios you may choose FARGATE_SPOT as the capacity provider.

As this article focuses on ECS Fargate and not ECS on EC2, we will not go into the details of capacity providers.

How to Calculate the Capacity Requirement?

Unfortunately, there's no one-size-fits-all answer here; it depends on the characteristics of your application component, acceptable performance, cost, and so on. Sorry.

Here are some basic ways to do rough capacity planning for ECS Fargate, for two different scenarios:

1. Before the performance testing stage (while in DEV, ST, SIT, etc.), or when you are not planning any performance testing before production.

2. After the performance test, but before going to production.

Before going to the Performance Testing:

Below is a very basic way to do capacity planning; it may not be perfect for your use case/scenario, but it will give you a sense of the capacity.

The very basic approach is to check memory usage while you are coding the app, and derive the CPU from the proportionate ratio against that memory.

STEP 1: Check the memory utilization for a single user during development. You may not have deployed your application to the DEV Fargate cluster yet, but you are mostly done with the code/build. Check the memory consumption of your application for a single user. Below are some tools to check memory utilization for .NET or Java applications; similar tools are available for other programming languages.

— .NET Memory Profiler
— Visual Studio Diagnostic Tools ( https://docs.microsoft.com/en-us/visualstudio/profiling/memory-usage?view=vs-2019 )
— .NET Object Allocation tool (https://docs.microsoft.com/en-us/visualstudio/profiling/dotnet-alloc-tool?view=vs-2019)
— JConsole
— jvisualvm
— CloudWatch Application Insights

STEP 2: Multiply that memory utilization by your predicted number of concurrent users; that is your total memory requirement. You can add 10% and 30% buffers to calculate safe base and peak memory requirements.

STEP 3: Now find the CPU from the sample CPU-vs-memory ratio table below, against the nearest value to the memory capacity found in STEP 2.

Recommended CPU vs Memory ratio

STEP 4: Determine the number of containers (tasks) needed to fulfill the above CPU and memory requirement. The minimum number of tasks should be 2.

Example: While a developer is building an application using an IDE on their desktop, with the application's major functionality complete, exercise the major happy paths for about 10 minutes and monitor the delta increase in memory consumption using one of the desktop-based diagnostic tools.

Let's assume the delta value is 40 MB and your NFR calls for 50 concurrent users. So the memory requirement is 50 users * 40 MB = 2000 MB = ~2 GB. If you add a 50% buffer, that is ~3 GB. Now keep an additional 0.5 GB aside for the .NET platform or JRE, a side-car monitoring container (Datadog, New Relic, etc.), or other container-associated processes. So in total, 3 + 0.5 = ~4 GB is the memory requirement.

So, going by the CPU-vs-memory ratio table above, 1 vCPU of CPU capacity is a safe choice.

So we will go with 2 tasks, each with 2 GB memory and 0.5 vCPU (which adds up to our total requirement of ~4 GB RAM and 1 vCPU of compute).
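The arithmetic in this example can be sketched as a small helper. All inputs (40 MB per user, 50 users, 50% buffer, 0.5 GB runtime/side-car overhead) are the article's illustrative numbers, not universal constants:

```python
import math

def memory_requirement_gb(per_user_mb, users, buffer_pct=50.0, overhead_gb=0.5):
    """STEP 2: per-user memory delta x concurrent users, plus a safety
    buffer and runtime/side-car overhead, rounded up to a whole GB."""
    app_gb = per_user_mb * users / 1024 * (1 + buffer_pct / 100)
    return math.ceil(app_gb + overhead_gb)

total_gb = memory_requirement_gb(per_user_mb=40, users=50)
# total_gb == 4: ~2 GB x 1.5 buffer + 0.5 GB overhead, rounded up

# STEP 4: split across a minimum of 2 tasks -> 2 tasks of 2 GB / 0.5 vCPU each
tasks = max(2, total_gb // 2)
```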

After Performance testing, but before going to the production:

This one is very straightforward, as it is based entirely on actual behavior.

Before starting performance testing in the Perf/Stage/SIT/UAT environment, make sure an ECS platform monitoring tool is in place, e.g. Prometheus/Grafana, New Relic, Datadog, CloudWatch Container Insights, etc.

During and after performance testing, check the CPU and memory utilization at the task level and how the existing auto scaling is being utilized; accordingly, right-size the capacity and the auto scaling parameters.

The same holds for the post-production monitoring and right-sizing exercise.

High level capacity sizing steps for application migration:

When you are migrating an application to a container platform (as a re-platform), you already have something in place to analyze and compare against, so capacity planning for the ECS Fargate task becomes relatively easy. Below are some high-level steps you can follow for capacity sizing.

  • Measure current utilization: Measure the current utilization of compute resources. Don't take the full capacity available on the box/server/instance/host; rather, concentrate on the monthly average compute usage. Talk to your sysadmin to get that data.
  • If you want to feel safe, you can add a 10% buffer on top of the current compute usage, though it is not necessary. In fact, you could even subtract 10%, since compute consumption on your old platform also includes the OS and OS-related processes. But let's go with the current consumption value as-is.
  • Sizing of task/container: Based on the past compute consumption data, plan the required number of tasks and their size: at a minimum 2 tasks, with a CPU and memory combination as described in the section above.
  • Auto scaling: Using the past compute consumption trend, set the auto scaling pattern. You can follow the pattern below:

— Desired number of tasks: max(2, lowest monthly average consumption over the last year / size of each task)
— Minimum number of tasks: max(2, lowest daily consumption over the last 7 days / size of each task)
— Maximum number of tasks: max(2, highest monthly average consumption over the last year / size of each task) + 2
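The pattern above can be sketched as a helper; the inputs are in the same unit as the task size (e.g. GB of memory or vCPU), taken from your monitoring history, and the history values in the example are made up:

```python
import math

def autoscaling_task_counts(lowest_monthly_avg, lowest_daily_7d,
                            highest_monthly_avg, task_size):
    """Derive desired/min/max task counts from consumption history,
    with a floor of 2 tasks and +2 headroom on the maximum."""
    desired = max(2, math.ceil(lowest_monthly_avg / task_size))
    minimum = max(2, math.ceil(lowest_daily_7d / task_size))
    maximum = max(2, math.ceil(highest_monthly_avg / task_size)) + 2
    return {"desired": desired, "min": minimum, "max": maximum}

# e.g. memory history in GB, sizing against 2 GB tasks:
counts = autoscaling_task_counts(6.0, 3.5, 14.0, task_size=2.0)
# {'desired': 3, 'min': 2, 'max': 9}
```

The same helper works for the greenfield pattern later in the article, with NFR-based estimates in place of monitoring history.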

  • Performance test, usage monitoring and re-sizing: Set up platform monitoring tools (CloudWatch Container Insights, AppDynamics, Datadog, etc.) in the PERF and PROD environments. Perform a performance test and resize the tasks based on the compute consumption analysis during the performance test.

High level capacity sizing steps for greenfield application:

For a newly built application to be hosted on AWS Fargate, it is challenging to do upfront sizing calculations. The same challenge exists for an existing application that shares infrastructure with many other applications, where isolating the compute consumption of that single application is tough. Still, with some basic calculation you can start with near-accurate sizing. Below are the high-level steps.

  • NFR: Define your NFRs: how many concurrent users? How long will a single user stay in the application in one session? Which happy paths/functionality will a user exercise in one session?
  • Calculate compute usage while developing: Using various development tool add-ons, calculate the compute usage for a single user of the application, and from that calculate the whole compute requirement for your application based on the NFR. Please refer to the section "How to Calculate the Capacity Requirement?" in this article for details.
  • Typical sizing: If you don't want to go down the path of development-tool-based capacity planning, or don't have the time to do so, you can go with typical sizing based on your application's usage requirements. For details, please refer to the section "Typical Fargate Task compute sizing as starting point" in this article.
  • Auto scaling: Use the calculated compute requirement to set the auto scaling pattern. You can follow the pattern below:

— Desired number of tasks: max(2, average consumption based on NFR / size of each task)
— Minimum number of tasks: max(2, lowest consumption based on NFR / size of each task)
— Maximum number of tasks: max(2, highest consumption based on NFR / size of each task) + 2

  • Performance test, usage monitoring and re-sizing: Set up platform monitoring tools (CloudWatch Container Insights, AppDynamics, Datadog, etc.) in the PERF and PROD environments. Perform a performance test and resize the tasks based on the compute consumption analysis during the performance test.

Best Practices:

  • Always keep auto scaling of the service in place; capacity sizing of tasks alone is not enough unless you also have auto scaling.
  • Keep the task-level CPU and memory 10% higher than the combined CPU and memory of all containers within that task.
  • Even if you do initial capacity planning before going to production, you have to monitor your containers to see how they behave and resize the task/container capacity; it is a continuous process.
  • Ensure performance testing is planned and executed before going to production.
  • Make sure container platform monitoring tools are in place in the lower environments, not only in PROD.
  • Don't lean too far towards small tasks/containers with aggressive scale-out. It may look best from a cost perspective to start with small tasks and spin up many more small tasks on demand, but auto scale-out comes with a provisioning delay for each task/container, so when you need capacity to serve additional user requests, you will see some invocation delay. Keep a balance between the number of tasks and the compute size of each task.
  • Too many small tasks/containers (with auto scaling support) may look lucrative from a cost perspective, but may hurt application performance, since a low memory allocation can trigger frequent garbage collection. So, if you have an estimated requirement of 10 GB of total memory, 5 containers/tasks of 2 GB each is better than 10 tasks/containers of 1 GB each. Keep a balance between the number of tasks and the compute size of each task.
  • Based on your needs, choose the scaling policy option: Step Scaling vs. Target Tracking.
  • Please remember that in ECS, auto scaling is set at the service level, not the task level. So if you are planning a back-end scheduled job via a non-service task, auto scaling is not applicable there.
  • For some service auto scaling considerations, please refer to the link below:
    https://docs.aws.amazon.com/AmazonECS/latest/userguide/service-autoscaling-stepscaling.html
  • Auto scaling consideration: set appropriate scale-in and scale-out cooldown periods based on the initialization time of your image/container.
  • Auto scaling consideration: set an appropriate service health check grace period based on the time your task needs to come up and start responding to the ALB. This grace period can prevent the ECS service scheduler from marking tasks as unhealthy and stopping them before they have time to come up.
  • For Java and .NET applications, don't go below 2 GB memory and 0.5 vCPU per task, unless the application is very lightweight and simple.
  • It is better to have one container per task.
  • Based on your application's nature, set auto scaling on a CPU or memory utilization target: if your application is computation-intensive, choose CPU-based auto scaling; if it is memory-intensive, set auto scaling on a memory utilization target.

Note: Opinions and approaches expressed in this article are solely my own and do not express the views or opinions of my employer, AWS or any other organization.

Some of the product names, logos, brands, and diagrams are the property of their respective owners.

Please post your comments to express where you agree or disagree, and to provide suggestions.


Shankhabrata Chowdhury

Sr. Architect — Cloud, Security, Digital Systems & Technology