🌐
Amazon Web Services
docs.aws.amazon.com › aws lambda › developer guide › understanding lambda function scaling › configuring provisioned concurrency for a function
Configuring provisioned concurrency for a function - AWS Lambda
After this, the function can continue to scale on standard, unreserved concurrency if you haven't reached your account concurrency limit. When utilization drops and stays low, Application Auto Scaling decreases provisioned concurrency in smaller periodic steps. Both of the Application Auto Scaling alarms use the average statistic by default. Functions that experience quick bursts of traffic may not trigger these alarms. For example, suppose your Lambda function executes quickly (i.e.
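
If you want to see exactly which alarms Application Auto Scaling created for an alias, and which statistic they use, you can list them with the CLI. This is a rough sketch only: the alarm-name prefix pattern and the my-function:BLUE alias are assumptions, so verify the actual alarm names in your account:

# Target tracking alarms are typically named "TargetTracking-<resource-id>-AlarmHigh/-AlarmLow-..."
aws cloudwatch describe-alarms \
  --alarm-name-prefix "TargetTracking-function:my-function:BLUE" \
  --query "MetricAlarms[].{Name:AlarmName,Metric:MetricName,Stat:Statistic}"
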
🌐
AWS
aws.amazon.com › blogs › compute › scheduling-aws-lambda-provisioned-concurrency-for-recurring-peak-usage
Scheduling AWS Lambda Provisioned Concurrency for recurring peak usage | Amazon Web Services
August 13, 2020 - Application Auto Scaling allows you to configure automatic scaling for different resources, including Provisioned Concurrency for Lambda. You can scale resources based on a specific CloudWatch metric or at a specific date and time.
🌐
GitHub
github.com › aws-samples › aws-lambda-autoscale-provisioned-concurrency-example
GitHub - aws-samples/aws-lambda-autoscale-provisioned-concurrency-example: Sample to demonstrate how to observe and fine tune autoscale AWS Lambda provisioned concurrency
Provisioned concurrency is a way to prepare a certain number of AWS Lambda execution environments in advance to respond immediately to incoming requests. It’s a way to minimise the impact of a cold start on response latency.
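
For context, turning provisioned concurrency on in the first place is a single call against a published version or alias. A minimal CLI sketch, assuming a function named my-function with an alias BLUE (both placeholders):

# Allocate 10 pre-initialized execution environments for the BLUE alias
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier BLUE \
  --provisioned-concurrent-executions 10

The autoscaling approaches covered in the results below adjust this number up and down instead of leaving it fixed.
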
🌐
Ran The Builder
ranthebuilder.cloud › post › optimize-aws-lambda-with-dynamic-provisioned-concurrency
Optimize AWS Lambda with Dynamic Provisioned Concurrency
July 8, 2024 - Learn more: ... Target Tracking ... metric. According to this policy, the number of provisioned concurrency instances is scaled to maintain the instance level aligned with the target value....
🌐
Medium
georgemao.medium.com › understanding-lambda-provisioned-concurrency-autoscaling-735eb14040cf
Lambda Provisioned Concurrency AutoScaling is Awesome. Make sure you understand how it works! | by George Mao | Medium
September 3, 2020 - The shorter your functions run, the less concurrency you need. You should use Auto Scaling to optimize your PC costs. There’s a feature in Auto Scaling called TargetTracking. This flavor of auto scaling tries to maintain a utilization percentage you specify. When you enable this feature, two CloudWatch alarms are deployed, and both monitor the ProvisionedConcurrencyUtilization metric: a scale-up alarm that requires 3 data points over 1 minute each
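
To make the TargetTracking flavor concrete: after the alias is registered as a scalable target, you attach a target tracking policy on utilization. A sketch only; the function name, alias, policy name, and the 0.70 target are placeholders rather than values from this snippet (the target for LambdaProvisionedConcurrencyUtilization is expressed as a fraction):

# Creating this policy is what deploys the two CloudWatch alarms described above
aws application-autoscaling put-scaling-policy \
  --service-namespace lambda \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --resource-id function:my-function:BLUE \
  --policy-name pc-utilization-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration \
    '{"TargetValue": 0.70, "PredefinedMetricSpecification": {"PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"}}'
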
🌐
Amazon Web Services
docs.aws.amazon.com › aws lambda › developer guide › understanding lambda function scaling
Understanding Lambda function scaling - AWS Lambda
For each concurrent request, Lambda provisions a separate instance of your execution environment. As your functions receive more requests, Lambda automatically handles scaling the number of execution environments until you reach your account's concurrency limit.
🌐
Olio Apps
olioapps.com › blog › how-we-solved-aws-lambda-scaling-with-auto-scaling-and-provisioned-concurrency
How We Solved AWS Lambda Scaling with Auto Scaling and Provisioned Concurrency
April 30, 2025 - Provisioned concurrency is configured for high-traffic endpoints. CloudWatch alarms monitor usage and trigger scaling actions. We maintain a buffer by targeting 70% utilization—this keeps 30% of provisioned Lambdas in reserve for surges.
🌐
Reddit
reddit.com › r/aws › lambda provisioned concurrency
r/aws on Reddit: Lambda provisioned concurrency
July 3, 2023 -

Hey, I'm a huge serverless user, I've built several applications on top of Lambda, Dynamo, S3, EFS, SQS, etc.

But I have never understood why would someone use Provisioned Concurrency, do you know a real use case for this feature?

I mean, if your application is suffering due to cold starts, you can just use the old-school EventBridge ping option and it costs 0, or if you have a critical latency requirement you can just go to Fargate instead of paying for provisioned concurrency, am I wrong?

🌐
Luminis
luminis.eu › home › blog › aws lambda provisioned concurrency autoscaling with aws cdk
AWS Lambda Provisioned Concurrency AutoScaling with AWS CDK - Luminis
June 7, 2024 - If everything is set up correctly the scaling policy should trigger application autoscaling to scale up the number of provisioned concurrent lambda functions.
🌐
Serverless
serverless.com › plugins › serverless-provisioned-concurrency-autoscaling
Serverless Provisioned Concurrency Autoscaling - Serverless Framework: Plugins | Serverless Framework
Add concurrencyAutoscaling parameters under each function you wish to autoscale in your serverless.yml. Add customMetric: true if you want to use Maximum instead of Average statistic.

functions:
  hello:
    handler: handler.hello
    provisionedConcurrency: ...
🌐
How-To Geek
howtogeek.com › home › cloud › how to optimize aws lambda functions with provisioned concurrency & auto scaling
How to Optimize AWS Lambda Functions with Provisioned Concurrency & Auto Scaling
July 9, 2023 - --scalable-dimension lambda:function:ProvisionedConcurrency ... Then, you can enable an auto scaling policy, using the function name and alias as the resource ID, and configuring it with a JSON scaling policy.
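
The JSON scaling policy it refers to can also live in a local file and be passed to the CLI with file://. A sketch reusing the placeholder function name and alias from elsewhere on this page:

# policy.json (hypothetical) holding a target tracking configuration:
#   {
#     "TargetValue": 0.70,
#     "PredefinedMetricSpecification": {
#       "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
#     }
#   }
aws application-autoscaling put-scaling-policy \
  --service-namespace lambda \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --resource-id function:my-function:BLUE \
  --policy-name pc-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration file://policy.json
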
🌐
Roope
roope.sh › blog › scaling-down-lambda-provisioned-concurrency
Automatically Scaling Down Lambda Provisioned Concurrency | Roopesh
With the solution confirmed to be working, we updated our Terraform Lambda module to apply this change to all Lambdas. You can find a snippet of the Terraform resource aws_appautoscaling_policy here. Now, even when there are no incoming requests, the CloudWatch alarm for provisioned concurrency autoscaling triggers using the new metric, automatically scaling down and leading to lower cloud bills!
Top answer
1 of 2
9

Lambda automatically scales out for incoming requests, if all existing execution contexts (lambda instances) are busy. There is basically nothing you need to do here, except maybe set the maximum allowed concurrency if you want to throttle.
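
If you do want that throttle, capping a function's maximum concurrency is a single call. A sketch, with my-function as a placeholder name; note this sets reserved concurrency (an upper bound), which is a different knob from the provisioned concurrency discussed elsewhere on this page:

aws lambda put-function-concurrency \
  --function-name my-function \
  --reserved-concurrent-executions 100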

As a result of that, there is no integration with AutoScaling, but you can still use an Application Load Balancer to trigger your Lambda Function if that's what you're after.

If you're building a purely serverless application, you might want to look into the API Gateway instead of the ALB integration.


Update

Since you've clarified what you want to use auto scaling for, namely changing the provisioned concurrency of the function, there are ways to build something like that. Clément Duveau has mentioned a solution in the comments that I can get behind.

You can create a Lambda function with two CloudWatch Events triggers using cron expressions: one for when you want to scale out and another for when you want to scale in.

Inside the Lambda function, you can use the name of the rule that triggered it to determine whether you need to scale out or scale in. You can then use the PutProvisionedConcurrencyConfig API call through one of the SDKs mentioned at the bottom of the documentation to adjust the provisioned concurrency as you see fit.


Update 2

spmdc has mentioned an interesting blog post using Application Auto Scaling to achieve this, which I had missed. You might want to check it out; it looks promising.

2 of 2
0

The proper way to scale AWS Lambda provisioned concurrency is by using Application Auto Scaling.

At the time of writing, you cannot do so through the AWS console, but you can use the AWS CLI. First, register the function alias as a scalable target:

aws application-autoscaling register-scalable-target \
  --service-namespace lambda \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --resource-id function:my-function:BLUE \
  --min-capacity 0 \
  --max-capacity 100

If you want to scale up only during peak hours, you can then add a scheduled action by passing the --schedule flag:

aws application-autoscaling put-scheduled-action --service-namespace lambda \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --resource-id function:CreateOrder:prod \
  --scheduled-action-name scale-out \
  --schedule "cron(45 11 * * ? *)" \
  --scalable-target-action MinCapacity=250
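
The scheduled action above only handles the scale-out. To avoid paying for 250 warm environments around the clock, you would presumably pair it with a second scheduled action that lowers the minimum again once the peak is over; the cron expression and floor below are assumptions to adjust for your own schedule:

aws application-autoscaling put-scheduled-action --service-namespace lambda \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --resource-id function:CreateOrder:prod \
  --scheduled-action-name scale-in \
  --schedule "cron(15 20 * * ? *)" \
  --scalable-target-action MinCapacity=0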

More details from the official AWS blog post.

🌐
Amazon Web Services
docs.aws.amazon.com › aws lambda › developer guide › understanding lambda function scaling › configuring provisioned concurrency for a function
Provisioned Concurrency scaling example - AWS Lambda
Top answer
1 of 2
1
What is your rate of invocations? Is it more than 5000/sec? If so, you are hitting the Invocations per Second limit, which is set to 10 times the number of configured provisioned concurrency. In your case 10*500=5000 invocations/sec.
2 of 2
1
The presence of spillover invocations in your scenario indicates that your provisioned concurrency (PC) is not sufficient to handle the current load. While the PC utilization is only at 28.5%, it's important to note that this metric represents the ratio of the provisioned concurrent executions being used to the total provisioned concurrency. It doesn't necessarily reflect the actual demand or the number of concurrent invocations at any given moment.

In your case, the load test shows that you had a peak of 189 concurrent executions, but your provisioned concurrency was set to 500. This means that during the test, there were instances where the available provisioned concurrency was fully utilized, resulting in spillover invocations. These spillover invocations occur when the provisioned concurrency is exhausted and additional requests cannot be immediately served by existing instances.

Cold starts can still happen with provisioned concurrency, but their occurrence is minimized compared to using on-demand concurrency. When a cold start occurs, it means that the Lambda function needs to initialize a new execution environment to handle the incoming request. With provisioned concurrency, you can pre-warm a certain number of instances to minimize the impact of cold starts, but if the demand exceeds the provisioned concurrency, spillover invocations may experience cold starts.

To address the spillover invocations and potential cold starts, you have a few options:

Increase Provisioned Concurrency: If the load test consistently exceeds the provisioned concurrency, consider increasing the provisioned concurrency limit to better accommodate the peak demand and minimize spillover invocations.

Adjust Auto Scaling Parameters: Review your auto scaling configuration and ensure that the min and max capacity are set appropriately. If the current settings are not effectively scaling to meet the demand, you may need to fine-tune these parameters to better align with your application's requirements.

Monitor and Analyze Load Patterns: Understand the patterns and fluctuations in your application's load. Analyze the metrics over time to identify peak usage periods and adjust your provisioned concurrency and auto scaling settings accordingly.

By optimizing the provisioned concurrency and auto scaling parameters based on your application's load patterns, you can better utilize provisioned concurrency and minimize spillover invocations and potential cold starts.
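
If you want to verify the spillover behaviour described above, the ProvisionedConcurrencySpilloverInvocations metric can be pulled for the alias that has provisioned concurrency. A rough sketch; the function name, alias, and time window are placeholders:

aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name ProvisionedConcurrencySpilloverInvocations \
  --dimensions Name=FunctionName,Value=my-function Name=Resource,Value=my-function:BLUE \
  --start-time 2023-07-03T00:00:00Z \
  --end-time 2023-07-03T01:00:00Z \
  --period 60 \
  --statistics Sum

Non-zero datapoints during the load test window mean some requests ran on on-demand concurrency (with possible cold starts) rather than on the provisioned environments.
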
🌐
Dashbird
dashbird.io › home › knowledge base › aws lambda › provisioned concurrency
AWS Lambda Provisioned Concurrency | Dashbird
June 29, 2021 - The Provisioned Concurrency level ... to the account regional limits. It is possible to use Application Auto Scaling to automatically scale the provisioned concurrency threshold up and down....
🌐
Serverless
serverless.com › blog › aws-lambda-provisioned-concurrency
Provisioned Concurrency: What it is and how to use it with the Serverless Framework
This setting can be made very simply in the AWS Console. Go to the function in the Lambda service, scroll all the way to the bottom, and set it to whatever you want the minimum provisioned concurrency to always be.
🌐
Build with Jeroen
jeroenreijn.com › 2022 › 07 › aws-lambda-provisioned-concurrency-autoscaling-configuration-with-aws-cdk.html
AWS Lambda Provisioned Concurrency AutoScaling configuration with AWS CDK
July 11, 2022 - Through the AWS CDK, there are two ways of configuring autoscaling for the provisioned concurrency configuration of our function. ... So let’s explore both options. The Function Alias has a shorthand method for configuring provisioned concurrency scaling. You can do this by calling the .addAutoScaling method on the Alias. Adding a scaling strategy on the alias is pretty straightforward. You can use both scaling on utilization and scaling by schedule.
🌐
Quintagroup
quintagroup.com › blog › aws-lambda-provisioned-concurrency-auto-scaling
AWS Lambda Provisioned Concurrency: Auto Scaling — Quintagroup
June 14, 2023 - provisioned_concurrent_executions - (Required) Amount of capacity to allocate. Must be greater than or equal to 1.
qualifier - (Required) Lambda Function version or Lambda Alias name.
Auto Scaling will save huge amounts of your money while using AWS Lambda Provisioned Concurrency.