🌐
Amazon Web Services
docs.aws.amazon.com › aws lambda › developer guide › understanding lambda function scaling › configuring provisioned concurrency for a function
Configuring provisioned concurrency for a function - AWS Lambda
After this, the function can continue to scale on standard, unreserved concurrency if you haven't reached your account concurrency limit. When utilization drops and stays low, Application Auto Scaling decreases provisioned concurrency in smaller periodic steps. Both of the Application Auto Scaling alarms use the average statistic by default. Functions that experience quick bursts of traffic may not trigger these alarms. For example, suppose your Lambda function executes quickly (i.e.
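
If you want to see exactly which alarms Application Auto Scaling created for an alias, and which statistic they use, you can list them with the CLI. This is a rough sketch only: the alarm-name prefix pattern and the my-function:BLUE alias are assumptions, so verify the actual alarm names in your account:

# Target tracking alarms are typically named "TargetTracking-<resource-id>-AlarmHigh/-AlarmLow-..."
aws cloudwatch describe-alarms \
  --alarm-name-prefix "TargetTracking-function:my-function:BLUE" \
  --query "MetricAlarms[].{Name:AlarmName,Metric:MetricName,Stat:Statistic}"
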
🌐
AWS
aws.amazon.com › blogs › compute › scheduling-aws-lambda-provisioned-concurrency-for-recurring-peak-usage
Scheduling AWS Lambda Provisioned Concurrency for recurring peak usage | Amazon Web Services
August 13, 2020 - Application Auto Scaling allows you to configure automatic scaling for different resources, including Provisioned Concurrency for Lambda. You can scale resources based on a specific CloudWatch metric or at a specific date and time.
🌐
GitHub
github.com › aws-samples › aws-lambda-autoscale-provisioned-concurrency-example
GitHub - aws-samples/aws-lambda-autoscale-provisioned-concurrency-example: Sample to demonstrate how to observe and fine tune autoscale AWS Lambda provisioned concurrency
Provisioned concurrency is a way to prepare a certain number of AWS Lambda execution environments in advance to respond immediately to incoming requests. It’s a way to minimise the impact of a cold start on response latency.
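
For context, turning provisioned concurrency on in the first place is a single call against a published version or alias. A minimal CLI sketch, assuming a function named my-function with an alias BLUE (both placeholders):

# Allocate 10 pre-initialized execution environments for the BLUE alias
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier BLUE \
  --provisioned-concurrent-executions 10

The autoscaling approaches covered in the results below adjust this number up and down instead of leaving it fixed.
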
🌐
Ran The Builder
ranthebuilder.cloud › post › optimize-aws-lambda-with-dynamic-provisioned-concurrency
Optimize AWS Lambda with Dynamic Provisioned Concurrency
July 8, 2024 - Learn more: ... Target Tracking ... metric. According to this policy, the number of provisioned concurrency instances is scaled to maintain the instance level aligned with the target value....
🌐
Medium
georgemao.medium.com › understanding-lambda-provisioned-concurrency-autoscaling-735eb14040cf
Lambda Provisioned Concurrency AutoScaling is Awesome. Make sure you understand how it works! | by George Mao | Medium
September 3, 2020 - The shorter your functions run, the less concurrency you need. You should use Auto Scaling to optimize your PC costs. There’s a feature in Auto Scaling called TargetTracking. This flavor of auto scaling tries to maintain a utilization percentage you specify. When you enable this feature, two CloudWatch alarms are deployed, and both monitor the ProvisionedConcurrencyUtilization metric: a scale-up alarm that requires 3 data points over 1 minute each
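
To make the TargetTracking flavor concrete: after the alias is registered as a scalable target, you attach a target tracking policy on utilization. A sketch only; the function name, alias, policy name, and the 0.70 target are placeholders rather than values from this snippet (the target for LambdaProvisionedConcurrencyUtilization is expressed as a fraction):

# Creating this policy is what deploys the two CloudWatch alarms described above
aws application-autoscaling put-scaling-policy \
  --service-namespace lambda \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --resource-id function:my-function:BLUE \
  --policy-name pc-utilization-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration \
    '{"TargetValue": 0.70, "PredefinedMetricSpecification": {"PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"}}'
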
🌐
Amazon Web Services
docs.aws.amazon.com › aws lambda › developer guide › understanding lambda function scaling
Understanding Lambda function scaling - AWS Lambda
For each concurrent request, Lambda provisions a separate instance of your execution environment. As your functions receive more requests, Lambda automatically handles scaling the number of execution environments until you reach your account's concurrency limit.
🌐
Olio Apps
olioapps.com › blog › how-we-solved-aws-lambda-scaling-with-auto-scaling-and-provisioned-concurrency
How We Solved AWS Lambda Scaling with Auto Scaling and Provisioned Concurrency
April 30, 2025 - Provisioned concurrency is configured for high-traffic endpoints. CloudWatch alarms monitor usage and trigger scaling actions. We maintain a buffer by targeting 70% utilization—this keeps 30% of provisioned Lambdas in reserve for surges.
🌐
Reddit
reddit.com › r/aws › lambda provisioned concurrency
r/aws on Reddit: Lambda provisioned concurrency
July 3, 2023 -

Hey, I'm a huge serverless user, I've built several applications on top of Lambda, Dynamo, S3, EFS, SQS, etc.

But I have never understood why would someone use Provisioned Concurrency, do you know a real use case for this feature?

I mean, if your application is suffering due to cold starts, you can just use the old-school EventBridge ping option and it costs 0, or if you have a critical latency requirement you can just go to Fargate instead of paying for provisioned concurrency, am I wrong?

🌐
Luminis
luminis.eu › home › blog › aws lambda provisioned concurrency autoscaling with aws cdk
AWS Lambda Provisioned Concurrency AutoScaling with AWS CDK - Luminis
June 7, 2024 - If everything is set up correctly the scaling policy should trigger application autoscaling to scale up the number of provisioned concurrent lambda functions.
🌐
Serverless
serverless.com › plugins › serverless-provisioned-concurrency-autoscaling
Serverless Provisioned Concurrency Autoscaling - Serverless Framework: Plugins | Serverless Framework
Add concurrencyAutoscaling parameters under each function you wish to autoscale in your serverless.yml. Add customMetric: true if you want to use Maximum instead of Average statistic.

functions:
  hello:
    handler: handler.hello
    provisionedConcurrency: ...
🌐
How-To Geek
howtogeek.com › home › cloud › how to optimize aws lambda functions with provisioned concurrency & auto scaling
How to Optimize AWS Lambda Functions with Provisioned Concurrency & Auto Scaling
July 9, 2023 - --scalable-dimension lambda:function:ProvisionedConcurrency ... Then, you can enable an auto scaling policy, using the function name and alias as the resource ID, and configuring it with a JSON scaling policy.
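
The JSON scaling policy it refers to can also live in a local file and be passed to the CLI with file://. A sketch reusing the placeholder function name and alias from elsewhere on this page:

# policy.json (hypothetical) holding a target tracking configuration:
#   {
#     "TargetValue": 0.70,
#     "PredefinedMetricSpecification": {
#       "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
#     }
#   }
aws application-autoscaling put-scaling-policy \
  --service-namespace lambda \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --resource-id function:my-function:BLUE \
  --policy-name pc-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration file://policy.json
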
🌐
Roope
roope.sh › blog › scaling-down-lambda-provisioned-concurrency
Automatically Scaling Down Lambda Provisioned Concurrency | Roopesh
With the solution confirmed to be working, we updated our Terraform Lambda module to apply this change to all Lambdas. You can find a snippet of the Terraform resource aws_appautoscaling_policy here. Now, even when there are no incoming requests, the CloudWatch alarm for provisioned concurrency autoscaling triggers using the new metric, automatically scaling down and leading to lower cloud bills!
Top answer
1 of 2
9

Lambda automatically scales out for incoming requests, if all existing execution contexts (lambda instances) are busy. There is basically nothing you need to do here, except maybe set the maximum allowed concurrency if you want to throttle.
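
If you do want that throttle, capping a function's maximum concurrency is a single call. A sketch, with my-function as a placeholder name; note this sets reserved concurrency (an upper bound), which is a different knob from the provisioned concurrency discussed elsewhere on this page:

aws lambda put-function-concurrency \
  --function-name my-function \
  --reserved-concurrent-executions 100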

As a result of that, there is no integration with AutoScaling, but you can still use an Application Load Balancer to trigger your Lambda Function if that's what you're after.

If you're building a purely serverless application, you might want to look into the API Gateway instead of the ALB integration.


Update

Since you've clarified what you want to use auto scaling for, namely changing the provisioned concurrency of the function, there are ways to build something like that. Clément Duveau has mentioned a solution in the comments that I can get behind.

You can create a Lambda function with two CloudWatch Events triggers using cron expressions: one for when you want to scale out and another for when you want to scale in.

Inside the Lambda function, you can use the name of the rule that triggered it to determine whether you need to scale out or scale in. You can then use the PutProvisionedConcurrencyConfig API call through one of the SDKs mentioned at the bottom of the documentation to adjust the provisioned concurrency as you see fit.


Update 2

spmdc has mentioned an interesting blog post using Application Auto Scaling to achieve this, which I had missed. You might want to check it out; it looks promising.

2 of 2
0

The proper way to scale AWS Lambda provisioned concurrency is by using Application Auto Scaling.

At the time of writing, you cannot do so through the AWS console, but you can use the AWS CLI. First, register the function alias as a scalable target:

aws application-autoscaling register-scalable-target \
  --service-namespace lambda \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --resource-id function:my-function:BLUE \
  --min-capacity 0 \
  --max-capacity 100

If you want to scale up only during peak hours, you can then add a scheduled action by passing the --schedule flag:

aws application-autoscaling put-scheduled-action --service-namespace lambda \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --resource-id function:CreateOrder:prod \
  --scheduled-action-name scale-out \
  --schedule "cron(45 11 * * ? *)" \
  --scalable-target-action MinCapacity=250
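
The scheduled action above only handles the scale-out. To avoid paying for 250 warm environments around the clock, you would presumably pair it with a second scheduled action that lowers the minimum again once the peak is over; the cron expression and floor below are assumptions to adjust for your own schedule:

aws application-autoscaling put-scheduled-action --service-namespace lambda \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --resource-id function:CreateOrder:prod \
  --scheduled-action-name scale-in \
  --schedule "cron(15 20 * * ? *)" \
  --scalable-target-action MinCapacity=0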

More details from the official AWS blog post.

🌐
Amazon Web Services
docs.aws.amazon.com › aws lambda › developer guide › understanding lambda function scaling › configuring provisioned concurrency for a function
Provisioned Concurrency scaling example - AWS Lambda
Top answer
1 of 2
1
What is your rate of invocations? Is it more than 5000/sec? If so, you are hitting the Invocations per Second limit, which is set to 10 times the number of configured provisioned concurrency. In your case 10*500=5000 invocations/sec.
2 of 2
1
The presence of spillover invocations in your scenario indicates that your provisioned concurrency (PC) is not sufficient to handle the current load. While the PC utilization is only at 28.5%, it's important to note that this metric represents the ratio of the provisioned concurrent executions being used to the total provisioned concurrency. It doesn't necessarily reflect the actual demand or the number of concurrent invocations at any given moment.

In your case, the load test shows that you had a peak of 189 concurrent executions, but your provisioned concurrency was set to 500. This means that during the test, there were instances where the available provisioned concurrency was fully utilized, resulting in spillover invocations. These spillover invocations occur when the provisioned concurrency is exhausted and additional requests cannot be immediately served by existing instances.

Cold starts can still happen with provisioned concurrency, but their occurrence is minimized compared to using on-demand concurrency. When a cold start occurs, it means that the Lambda function needs to initialize a new execution environment to handle the incoming request. With provisioned concurrency, you can pre-warm a certain number of instances to minimize the impact of cold starts, but if the demand exceeds the provisioned concurrency, spillover invocations may experience cold starts.

To address the spillover invocations and potential cold starts, you have a few options:

Increase Provisioned Concurrency: If the load test consistently exceeds the provisioned concurrency, consider increasing the provisioned concurrency limit to better accommodate the peak demand and minimize spillover invocations.

Adjust Auto Scaling Parameters: Review your auto scaling configuration and ensure that the min and max capacity are set appropriately. If the current settings are not effectively scaling to meet the demand, you may need to fine-tune these parameters to better align with your application's requirements.

Monitor and Analyze Load Patterns: Understand the patterns and fluctuations in your application's load. Analyze the metrics over time to identify peak usage periods and adjust your provisioned concurrency and auto scaling settings accordingly.

By optimizing the provisioned concurrency and auto scaling parameters based on your application's load patterns, you can better utilize provisioned concurrency and minimize spillover invocations and potential cold starts.
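
If you want to verify the spillover behaviour described above, the ProvisionedConcurrencySpilloverInvocations metric can be pulled for the alias that has provisioned concurrency. A rough sketch; the function name, alias, and time window are placeholders:

aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name ProvisionedConcurrencySpilloverInvocations \
  --dimensions Name=FunctionName,Value=my-function Name=Resource,Value=my-function:BLUE \
  --start-time 2023-07-03T00:00:00Z \
  --end-time 2023-07-03T01:00:00Z \
  --period 60 \
  --statistics Sum

Non-zero datapoints during the load test window mean some requests ran on on-demand concurrency (with possible cold starts) rather than on the provisioned environments.
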
🌐
Dashbird
dashbird.io › home › knowledge base › aws lambda › provisioned concurrency
AWS Lambda Provisioned Concurrency | Dashbird
June 29, 2021 - The Provisioned Concurrency level ... to the account regional limits. It is possible to use Application Auto Scaling to automatically scale the provisioned concurrency threshold up and down....
🌐
Serverless
serverless.com › blog › aws-lambda-provisioned-concurrency
Provisioned Concurrency: What it is and how to use it with the Serverless Framework
This setting can be made very simply in the AWS Console. Go to the function in the Lambda service, scroll all the way to the bottom, and set it to whatever you want the minimum provisioned concurrency to always be.
🌐
Build with Jeroen
jeroenreijn.com › 2022 › 07 › aws-lambda-provisioned-concurrency-autoscaling-configuration-with-aws-cdk.html
AWS Lambda Provisioned Concurrency AutoScaling configuration with AWS CDK
July 11, 2022 - Through the AWS CDK, there are two ways of configuring autoscaling for the provisioned concurrency configuration of our function. ... So let’s explore both options. The Function Alias has a shorthand method for configuring provisioned concurrency scaling. You can do this by calling the .addAutoScaling method on the Alias. Adding a scaling strategy on the alias is pretty straightforward. You can use both scaling on utilization and scaling by schedule.
🌐
Quintagroup
quintagroup.com › blog › aws-lambda-provisioned-concurrency-auto-scaling
AWS Lambda Provisioned Concurrency: Auto Scaling — Quintagroup
June 14, 2023 - provisioned_concurrent_executions - (Required) Amount of capacity to allocate. Must be greater than or equal to 1.
qualifier - (Required) Lambda Function version or Lambda Alias name.
Auto Scaling will save huge amounts of your money while using AWS Lambda Provisioned Concurrency.