What is your rate of invocations? Is it more than 5,000/sec? If so, you are hitting the invocations-per-second limit, which is set to 10 times the configured provisioned concurrency: in your case, 10 * 500 = 5,000 invocations/sec. (Answer from Uri Segev on repost.aws)
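A minimal sketch of that limit check; the 10x multiplier and the PC value of 500 are the figures from the answer above, and the helper names are my own:

```python
def max_invocations_per_second(provisioned_concurrency: int) -> int:
    """Invocations-per-second cap described in the answer above:
    10x the configured provisioned concurrency."""
    return 10 * provisioned_concurrency

def exceeds_rate_limit(observed_rate: float, provisioned_concurrency: int) -> bool:
    """True when the observed invocation rate would be throttled."""
    return observed_rate > max_invocations_per_second(provisioned_concurrency)

# With PC = 500, the cap works out to 10 * 500 = 5000 invocations/sec.
cap = max_invocations_per_second(500)
```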
Amazon Web Services
Configuring provisioned concurrency for a function - AWS Lambda
For provisioned concurrency environments, your function's initialization code runs during allocation, and periodically as Lambda recycles instances of your environment. Lambda bills you for initialization even if the environment instance never processes a request. Provisioned concurrency runs continually and incurs separate billing from initialization and invocation costs. For more details, see AWS Lambda Pricing
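Because provisioned concurrency bills continually for allocated capacity plus separately for execution time, a rough estimator makes the trade-off concrete. A minimal sketch; the per-GB-second rates below are placeholder assumptions for illustration, not current AWS prices (check the AWS Lambda pricing page):

```python
# PLACEHOLDER rates for illustration only; real prices vary by Region
# and change over time -- look them up on the AWS Lambda pricing page.
PC_RATE_PER_GB_SECOND = 0.0000041667      # assumed PC allocation rate
INVOKE_RATE_PER_GB_SECOND = 0.0000097     # assumed duration rate with PC enabled

def monthly_pc_cost(memory_gb: float, pc_instances: int,
                    hours_enabled: float, invoke_gb_seconds: float) -> float:
    """Estimated cost = allocation (GB x instances x seconds the PC config
    is enabled) + execution duration actually consumed."""
    allocation = memory_gb * pc_instances * hours_enabled * 3600 * PC_RATE_PER_GB_SECOND
    execution = invoke_gb_seconds * INVOKE_RATE_PER_GB_SECOND
    return allocation + execution

# 1 GB function with 500 provisioned instances enabled for a 730-hour month,
# before counting any actual invocations:
cost = monthly_pc_cost(1.0, 500, 730, invoke_gb_seconds=0)
```

Note that the allocation term accrues even if no request is ever processed, which matches the billing behavior described in the excerpt above.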
SQLServerCentral
AWS Lambda provisioned concurrency vs EC2 – SQLServerCentral Forums
May 5, 2022 - I am looking to host a computation-intensive app on AWS, and can't afford to wait for Lambda cold starts. The app needs to handle up to 300 users without losing performance, but it won't have 300 users at all times, so it needs to be able to scale up and down. I've been benchmarking both Lambda with provisioned concurrency and EC2, and here are my first conclusions:
Lambda Provisioned Concurrency Metrics
I have an isolated account with a Lambda function that has provisioned concurrency (PC) set to 500 and PC autoscaling on. The autoscale min capacity is set to 500, and the autoscale max capacity... (repost.aws)
amazon web services - Cost efficiency for AWS Lambda Provisioned Concurrency - Stack Overflow
I'm now looking at the native solution to the cold start problem: AWS Lambda Provisioned Concurrency, which at first glance looks awesome, but when I start calculating, either I'm missing something or this will simply be a large cost increase for a system with only medium load. (stackoverflow.com)
AWS Lambda provisioned concurrency vs keeping a lambda warm with events?
For events, keep in mind that pinging your Lambda function with scheduled events will probably only keep one container warm, so if users make multiple requests to the function in parallel, some requests may still incur a cold start. If you want to keep more containers warm, you can invoke a separate Lambda function on a schedule which concurrently makes multiple requests to your service. (reddit.com)
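A minimal sketch of that warmer pattern. The target function name is hypothetical, and the `invoke` callable is injected so the fan-out logic can run without AWS credentials; `make_aws_ping` shows the real boto3 wiring but is defined, not called, here:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def warm(invoke: Callable[[int], None], containers: int) -> int:
    """Fire `containers` overlapping invocations so more than one execution
    environment stays warm. Returns the number of pings sent."""
    with ThreadPoolExecutor(max_workers=containers) as pool:
        futures = [pool.submit(invoke, i) for i in range(containers)]
        for f in futures:
            f.result()  # surface any invocation errors
    return len(futures)

def make_aws_ping(function_name: str) -> Callable[[int], None]:
    """Build a real invoker (requires AWS credentials; not called in this sketch)."""
    import json
    import boto3
    lam = boto3.client("lambda")
    def ping(i: int) -> None:
        # RequestResponse so each thread holds its environment until it replies,
        # forcing the pings to actually overlap.
        lam.invoke(FunctionName=function_name,
                   InvocationType="RequestResponse",
                   Payload=json.dumps({"warmer": True, "id": i}))
    return ping

# Dry run with a no-op invoker:
sent = warm(lambda i: None, containers=5)
```

The warmer itself still cannot guarantee which environments Lambda keeps, which is the limitation the Reddit answer above points out.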
Reddit
r/aws on Reddit: Lambda provisioned concurrency
July 3, 2023
Hey, I'm a huge serverless user, I've built several applications on top of Lambda, Dynamo, S3, EFS, SQS, etc.
But I have never understood why someone would use Provisioned Concurrency. Do you know a real use case for this feature?
I mean, if your application is suffering from cold starts, you can just use the old-school EventBridge ping option, which costs nothing; or if you have a critical latency requirement you can just go to Fargate instead of paying for provisioned concurrency. Am I wrong?
Top answer 1 of 5 · 14 points
Pings won't save you from cold starts. If the workload just crosses what the current capacity can handle, a new instance will be warmed up, and you have no control over whether it serves a ping or an actual user. Pinging works only as long as a single instance can serve all demand. Fargate requires 24/7 running tasks, because its startup times are even worse than Lambda's. If you want 24/7 running tasks together with scaling and all, sure, do that, but it requires a whole lot more setup.
2 of 5 · 10 points
"You can just use the old-school EventBridge ping option and it costs 0" - this isn't nearly as effective, as there's no real way to make EventBridge keep 100 or 1,000 or more environments warm. If you have a very low-traffic application this method may still make sense, but for anything else PC is going to be more reliable.
Top answer 1 of 2 · 1 point: the same invocations-per-second answer from Uri Segev quoted at the top of this page (the limit is 10 times the configured provisioned concurrency).
2 of 2 · 1 point
The presence of spillover invocations in your scenario indicates that your provisioned concurrency (PC) is not sufficient to handle the current load. While the PC utilization is only at 28.5%, it's important to note that this metric represents the ratio of the provisioned concurrent executions being used to the total provisioned concurrency. It doesn't necessarily reflect the actual demand or the number of concurrent invocations at any given moment.
In your case, the load test shows a peak of 189 concurrent executions against a provisioned concurrency of 500. Spillover despite that headroom suggests there were brief moments when the instantaneously available provisioned concurrency was fully utilized, for example while requests burst in faster than warm environments could be handed out, so additional requests could not be immediately served by existing pre-warmed instances.
Cold starts can still happen with provisioned concurrency, but they are far less frequent than with on-demand concurrency. A cold start means the Lambda function has to initialize a new execution environment to handle the incoming request. With provisioned concurrency you pre-warm a certain number of instances to minimize that impact, but if demand exceeds the provisioned concurrency, spillover invocations may experience cold starts.
To address the spillover invocations and potential cold starts, you have a few options:
1. Increase Provisioned Concurrency: if the load test consistently exceeds the provisioned concurrency, consider raising the provisioned concurrency limit to better accommodate peak demand and minimize spillover invocations.
2. Adjust Auto Scaling Parameters: review your auto scaling configuration and ensure that the min and max capacity are set appropriately. If the current settings are not scaling effectively to meet demand, fine-tune these parameters to better align with your application's requirements.
3. Monitor and Analyze Load Patterns: understand the patterns and fluctuations in your application's load. Analyze the metrics over time to identify peak usage periods and adjust your provisioned concurrency and auto scaling settings accordingly.
By optimizing the provisioned concurrency and auto scaling parameters based on your application's load patterns, you can better utilize provisioned concurrency and minimize spillover invocations and potential cold starts.
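A sketch of how those options might look in practice: Provisioned Concurrency can be target-tracked on the `LambdaProvisionedConcurrencyUtilization` metric via Application Auto Scaling. The function name and alias below are hypothetical, the sizing helper is a simple rule of thumb rather than an AWS formula, and the boto3 wiring is defined but not invoked here:

```python
import math

def pc_for_peak(peak_concurrency: int, target_utilization: float) -> int:
    """Provisioned concurrency needed so the observed peak sits at the
    target utilization: ceil(peak / target)."""
    return math.ceil(peak_concurrency / target_utilization)

def configure_pc_autoscaling(function_name: str, alias: str,
                             min_capacity: int, max_capacity: int,
                             target_utilization: float = 0.7) -> None:
    """Register the alias with Application Auto Scaling and attach a
    target-tracking policy (requires AWS credentials; not called here)."""
    import boto3
    aas = boto3.client("application-autoscaling")
    resource_id = f"function:{function_name}:{alias}"
    aas.register_scalable_target(
        ServiceNamespace="lambda",
        ResourceId=resource_id,
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )
    aas.put_scaling_policy(
        PolicyName=f"{function_name}-pc-target-tracking",
        ServiceNamespace="lambda",
        ResourceId=resource_id,
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
            },
        },
    )

# The load test above peaked at 189 concurrent executions; at a 70% target,
# ceil(189 / 0.7) = 270 environments would absorb that peak with headroom.
suggested_min = pc_for_peak(189, 0.7)
```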
DEV Community
Provisioned Concurrency - Reduce Cold Starts in AWS Lambda Functions Part 1 - DEV Community
March 13, 2024 - If we have predictable patterns or metrics, we can reduce the number of provisioned concurrent executions. We can also define rollout and deployment preferences as part of this strategy, for example using canary or linear deployments. ... Every time we make an HTTPS call for the very first time, we see the impact of the cold start, with a response time that seems quite high -> 2 seconds. { "message": "Hello community builders - from AWS Lambda!"
Medium
What is Provisioned Concurrency- and When Should You Use It? | by Joseph Schambach | AWS in Plain English
June 2, 2025 - What is Provisioned Concurrency- and When Should You Use It? Reduce cold start latency in Lambda functions. TL;DR: Provisioned Concurrency keeps your AWS Lambda functions “warm,” eliminating cold …
Terraform Registry
aws_lambda_provisioned_concurrency_config | Resources | hashicorp/aws | Terraform | Terraform Registry
provisioned_concurrent_executions - (Required) Amount of capacity to allocate. Must be greater than or equal to 1.
qualifier - (Required) Lambda Function version or Lambda Alias name.
...
region - (Optional) Region where this resource will be managed. Defaults to the Region set in the provider configuration.
skip_destroy - (Optional) Whether to retain the provisioned concurrency configuration upon destruction.
Amazon Web Services
Configuring reserved concurrency for a function - AWS Lambda
Reserved concurrency acts as both ... a function incurs no additional charges. Provisioned concurrency – This is the number of pre-initialized execution environments allocated to your function....
AWS
New for AWS Lambda – Predictable start-up times with Provisioned Concurrency | Amazon Web Services
July 10, 2020 - Builders can now choose the concurrency level for each Lambda function version or alias, including when and for how long these levels are in effect. This powerful feature is controlled via the AWS Management Console, AWS CLI, AWS Lambda API, or AWS CloudFormation, and it’s simple to implement. This blog post introduces how to use Provisioned ...
Medium
AWS Lambda Reserved Concurrency v/s Provisioned Concurrency Scaling | by Rajas Walavalkar | Medium
January 4, 2022 - When a function has been configured with reserved concurrency, no other Lambda function within the same AWS account and Region can use that concurrency. There is no charge for configuring reserved concurrency for a function. Provisioned concurrency - this initializes a requested number of execution environments so that they are prepared to respond immediately to your function's invocations.
Amazon Web Services
Understanding Lambda function scaling - AWS Lambda
For each concurrent request, Lambda provisions a separate instance of your execution environment. As your functions receive more requests, Lambda automatically handles scaling the number of execution environments until you reach your account's concurrency limit. By default, Lambda provides your account with a total concurrency limit of 1,000 concurrent executions across all functions in an AWS Region.
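Since each concurrent request occupies its own execution environment, required concurrency can be estimated as request rate times average duration. A minimal sketch of that arithmetic; the traffic numbers are made-up examples:

```python
def estimated_concurrency(requests_per_second: float,
                          avg_duration_seconds: float) -> float:
    """Concurrent executions needed to sustain a steady request rate:
    each in-flight request holds one execution environment for its duration."""
    return requests_per_second * avg_duration_seconds

# Example: 200 req/s at 0.5 s average duration needs about 100 concurrent
# environments, well under the default 1,000 per-Region account limit.
needed = estimated_concurrency(200, 0.5)
```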
Quintagroup
AWS Lambda Provisioned Concurrency: Auto Scaling — Quintagroup
June 14, 2023 - Saves money by automatically adjusting capacity to the workload when it is too unpredictable to set Provisioned Concurrency beforehand ...

resource "aws_lambda_provisioned_concurrency_config" "this" {
  function_name                     = aws_lambda_function.this.function_name
  provisioned_concurrent_executions = var.provisioned_concurrent_executions
  qualifier                         = aws_lambda_alias.this.name
}