How are you measuring the time it takes? Have you eliminated the network latency when measuring the endpoints on Cloud Run for a fair comparison?

A few things you can try:

  • Switch to second generation (if not already); it's faster when running but has slower cold starts.
  • Set min instances to 1 (if the container start time is what is adding to the time).
  • Enable startup boost (doubles the CPUs during container start).
  • Add additional cores by increasing the CPU allocated (if your container can take advantage of that).

There is no public documentation on the CPU model, and no guarantees other than:

All CPU platforms used by Cloud Run support the AVX2 instruction set. Note that the container contract does not contain any additional CPU platform details.

Answer from Dillonu on reddit.com
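One way to take network latency out of the comparison, as the answer asks, is to time the computation inside the process rather than from the client. The following is a minimal sketch, not the poster's code: `timed` and `heavy_computation` are hypothetical names, and `heavy_computation` is only a stand-in for the real endpoint logic.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the wrapped block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def heavy_computation(n):
    # Hypothetical stand-in for the real endpoint logic.
    return sum(i * i for i in range(n))


timings = {}
with timed("heavy_computation", timings):
    value = heavy_computation(100_000)

# Log this server-side on both machines and compare the numbers directly.
print(f"heavy_computation took {timings['heavy_computation']:.4f} s")
```

Running the same measurement locally and on Cloud Run compares compute time only, without the request round trip or cold start mixed in.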
Google
docs.cloud.google.com › cloud run › configure cpu limits for services
Configure CPU limits for services | Cloud Run | Google Cloud Documentation
1 week ago - If you are configuring a new service, ... configuration page. Click the Container tab. Select the desired CPU limit from the dropdown list, using Custom if you want to use less than 1 CPU....
Reddit
reddit.com › r/googlecloud › faster cpu on cloud run?
r/googlecloud on Reddit: Faster CPU on Cloud Run?
September 25, 2024 -

Hello,

I have a FastAPI application running on Cloud Run, which has some endpoints doing fairly complex computations. On Cloud Run those endpoints take 3x longer than when I run them locally (on my M1 MacBook). My guess is that the CPU provided by Cloud Run is just slower? Does anyone know which CPUs are attached by default, and whether there's a solution for that?

Cheers

Discussions

Cloud Run, ideal vCPU and memory amount per instance? - Stack Overflow
Sure, sorry for the delay in answering, but essentially, with throttling (the default behavior), your container sort of sleeps when it is not in use. Passing --no-cpu-throttling avoids that, but also costs more, as CPU is always allocated. This article highlights it very well: cloud.google... More on stackoverflow.com
🌐 stackoverflow.com
Does the cpu flag in gcloud run deploy control the number of cpus per VM or the total amount of cpu
Looking at documentation: --cpu=CPU: Set a CPU limit in Kubernetes cpu units. Cloud Run (fully managed) supports values 1, 2 and 4. For Cloud Run (fully managed), 4 cpus also requires a minimum 2Gi --memory value. Examples 2, 2.0, 2000m Cloud Run for Anthos and Knative-compatible Kubernetes ... More on discuss.google.dev
🌐 discuss.google.dev
September 22, 2022
Google Cloud Run: interpreting CPU utilization metrics for concurrency - Stack Overflow
In GCR docs about concurrency, it's recommended to allow concurrent connections unless you anticipate that each request will max out the CPU/RAM (https://cloud.google.com/run/docs/about-concurrency# More on stackoverflow.com
🌐 stackoverflow.com
Cloud run with CPU always allocated is cheaper than only allocated during request processing. How? - Stack Overflow
I use Cloud Run for my apps and trying to predict the costs using the GCP pricing calculator. I can't find out why it's cheaper with CPU always allocated instead of CPU allocated during request More on stackoverflow.com
🌐 stackoverflow.com
Google
docs.cloud.google.com › cloud run › configure cpu limits for jobs
Configure CPU limits for jobs | Cloud Run | Google Cloud Documentation
2 weeks ago - If your Cloud Run job interfaces ... about granting roles, see deployment permissions and manage access. You must set a minimum of 1 CPU for a Cloud Run job....
Top answer
1 of 2
3

There isn't a good answer to that question. You have to know the limits:

  • The maximum number of concurrent requests that one instance can handle with 4 CPUs and/or 32 GB of memory (up to 1000 concurrent requests)
  • The maximum number of instances on Cloud Run (1000)

Then it's a matter of tradeoffs, and it depends heavily on your use case.

  • Bigger instances reduce the number of cold starts (and so the high latency when your service scales up). But if you only have 1 request at a time, you will pay for a BIG instance to do a small amount of processing.
  • Smaller instances let you optimize cost and add only a small slice of resources at a time, but you will often have to spawn a new instance and you will have several cold starts to endure.

Optimize for what you prefer and find the right balance. No magic formula!!

2 of 2
1

You can simulate a load of requests against your current settings using k6.io, check the memory and CPU percentage of your container, and adjust them to a lower or higher setting to see if you can get more RPS out of a single container.

Once you are satisfied with the throughput of a single container instance, let's say 100 RPS, you can then set the gcloud flags --min-instances and --max-instances, depending of course on the --concurrency flag, which in my example would be set to 100.

Also note that concurrency defaults to 80 and can go up to 1000.
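The sizing step described above comes down to simple arithmetic once a per-instance throughput has been measured. The sketch below is illustrative only: `instances_needed` is a hypothetical helper, and the 450 RPS target is a made-up figure, not something from the answer.

```python
import math


def instances_needed(target_rps, rps_per_instance):
    """Estimate how many instances are needed to serve target_rps.

    rps_per_instance is the throughput measured for one container
    under load (for example with a load-testing tool).
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    return math.ceil(target_rps / rps_per_instance)


# Assumed figures: 100 RPS per instance (from the answer), 450 RPS peak (made up).
peak = instances_needed(target_rps=450, rps_per_instance=100)
print(peak)  # 5
```

A result like this can inform the value chosen for --max-instances, with some headroom added for traffic spikes.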

More info about this can be read on the links below: https://cloud.google.com/run/docs/about-concurrency https://cloud.google.com/sdk/gcloud/reference/run/deploy

I would also recommend investigating whether you need to pass the --cpu-throttling or the --no-cpu-throttling flag, depending on how much you need to adjust for cold starts.

Medium
medium.com › google-cloud › cloud-run-performances-with-multiple-cpus-a4c2fccb5192
Cloud Run performances with multiple CPUs | by guillaume blaquiere | Google Cloud - Community | Medium
October 9, 2020 - Indeed, Cloud Run is able to handle up to 80 concurrent requests. So, if you have a single-thread process (like the Fibonacci algorithm) and if you send 2 requests to Cloud Run at the same time · With 2 CPUs, each request should be processed on a separate CPU, in parallel, and take roughly the same time
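The point in this snippet can be illustrated with a deliberately idealized model: a single-threaded request can use at most one CPU, so concurrent requests only slow each other down once they outnumber the CPUs. The function below is a hypothetical sketch under that assumption. It ignores scheduling overhead and is not taken from the article.

```python
def estimated_request_seconds(single_request_seconds, concurrent_requests, cpus):
    """Idealized per-request duration for single-threaded, CPU-bound work.

    Each request can occupy at most one CPU. When there are more requests
    than CPUs, the requests are assumed to share the CPUs evenly.
    """
    if concurrent_requests < 1 or cpus <= 0:
        raise ValueError("need at least one request and a positive CPU count")
    slowdown = max(1.0, concurrent_requests / cpus)
    return single_request_seconds * slowdown


# Two simultaneous requests that each take 1 s when run alone:
print(estimated_request_seconds(1.0, 2, cpus=2))  # 1.0 (one CPU each)
print(estimated_request_seconds(1.0, 2, cpus=1))  # 2.0 (sharing one CPU)
```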
Google Cloud
cloud.google.com › blog › products › serverless › cloud-run-gets-always-on-cpu-allocation
Cloud Run gets always-on CPU allocation | Google Cloud Blog
September 13, 2021 - Cloud Run, Google Cloud's serverless ... and memory when your app processes requests or events. By default, Cloud Run does not allocate CPU outside of request processing. For a class of workloads that expect to do background processing, ...
Datadog
datadoghq.com › blog › key-metrics-for-cloud-run-monitoring
Key metrics for monitoring Google Cloud Run | Datadog
January 13, 2025 - For Cloud Run services, you have the option to allocate CPU to container instances while requests are being processed, or enable CPU to always be allocated even if no requests are coming in.
Google
docs.cloud.google.com › cloud run › configure memory limits for services
Configure memory limits for services | Cloud Run | Google Cloud Documentation
4 days ago - CPU_VALUE: the needed CPU limit—for example, 2. This value determines the required memory. ... Respond y to any prompts to install required components or to enable APIs. Optional: Make your service public if you want to allow unauthenticated access to the service. After deployment, the Cloud Run service URL is displayed.
Google
discuss.google.dev › google cloud › serverless applications
Does the cpu flag in gcloud run deploy control the number of cpus per VM or the total amount of cpu - Serverless Applications - Google Developer forums
September 22, 2022 - Cloud Run (fully managed) supports values 1, 2 and 4. For Cloud Run (fully managed), 4 cpus also requires a minimum 2Gi --memory value. Examples 2, 2.0, 2000m Cloud Run for Anthos and Knative-compatible Kubernetes clusters support fractional values.
LinkedIn
linkedin.com › pulse › demystifying-google-cloud-run-pricing-untangling-cpu-memory-zakaria
Demystifying Google Cloud Run Pricing: Untangling CPU and Memory Costs for Optimal Savings
August 7, 2023 - Cloud Run operates on a pay-as-you-go model, meaning you only pay for the resources consumed during the execution of your containerized applications. The two primary cost factors are CPU usage, measured in vCPU-seconds, and memory usage, measured in GiB-seconds.
OneUptime
oneuptime.com › home › blog › how to configure cloud run cpu allocation to always-on
How to Configure Cloud Run CPU Allocation to Always-On
February 17, 2026 - By default, Cloud Run only allocates CPU to your container while it is actively handling a request. The moment the response is sent, CPU gets throttled.
GitHub
github.com › ahmetb › cloud-run-faq
GitHub - ahmetb/cloud-run-faq: Unofficial FAQ and everything you've been wondering about Google Cloud Run. · GitHub
On Cloud Run, you only pay while a request is being handled. On AWS Fargate, you pay for CPU/memory while containers are running, and since Fargate doesn't support scale-to-zero, a service receiving no traffic will still incur costs.
Starred by 2.3K users
Forked by 123 users
Languages   Shell
InfoQ
infoq.com › news › 2022 › 09 › google-startup-cpu-boost
Google Cloud Introduces Startup CPU Boost for Cloud Run and Cloud Functions 2nd Gen - InfoQ
September 30, 2022 - Google Cloud recently introduced startup CPU boost for Cloud Run and Cloud Functions 2nd gen, a new feature that allows developers to significantly reduce the cold start time of Cloud Run and Cloud Functions.
Medium
medium.com › @mkdev › google-cloud-run-always-on-vs-on-demand-cpu-allocation-bd5f8054c66d
Google Cloud Run always-on vs on-demand CPU allocation | by mkdev | Medium
October 8, 2024 - On the screen, you can see the block showing, for example, Cloud Run costing €1.53 when it’s connected all the time, and €0.36 when the CPU is only connected when using the service. We mentioned that it’s cheaper to connect the CPU only when needed.
Google Cloud
cloud.google.com › blog › topics › developers-practitioners › use-cloud-run-always-cpu-allocation-background-work
Use Cloud Run "always-on" CPU allocation for background work | Google Cloud Blog
March 28, 2022 - Last fall, the Cloud Run team announced ... cycle. By default, Cloud Run instances are only allocated CPU during request processing as well as during container startup and shutdown as per the instance lifecycle....
Top answer
1 of 3
2

Cloud Run is serverless by default: you pay as you use. When a request comes in, an instance is created and started (this is called a cold start) and your request is processed. The timer starts. When your web server sends the response, the timer stops.

You pay for the memory and the CPU used during request processing, rounded up to the nearest 100 ms. The instance continues to live for about 15 minutes (by default; this can change at any moment) so that it is ready to process another request without having to start a new instance (and wait for the cold start again).

As you can see, the instance continues to live EVEN IF YOU NO LONGER PAY FOR IT, because you pay only while a request is being processed.

When you set the CPU to always on, you pay for the full time the instance runs, whether or not it is handling requests. Google doesn't have to pay for instances that are running but unused, waiting for a request, as it does under the pay-per-use model. You pay for that, and you pay less.

It's like a Compute Engine instance that is up full time. And as with Compute Engine, you can get something similar to a sustained use discount. That's why it's cheaper.
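The rounding rule mentioned in this answer (billed time rounded up to the next 100 ms) can be written as a small helper. This is a sketch of the arithmetic as the answer describes it, not an official billing formula, and `billable_ms` is a hypothetical name.

```python
def billable_ms(request_ms, granularity_ms=100):
    """Round a request duration up to the next multiple of granularity_ms."""
    if request_ms <= 0:
        return 0
    # Ceiling division using integers only, to avoid float rounding issues.
    return -(-request_ms // granularity_ms) * granularity_ms


print(billable_ms(230))  # 300
print(billable_ms(200))  # 200
print(billable_ms(1))    # 100
```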

2 of 3
2

In general it depends on how you use Cloud Run. Google gives some hints here: https://cloud.google.com/run/docs/configuring/cpu-allocation

To summarize the biggest pricing differences:

CPU is only allocated during request processing

you pay for:

  • every request, on a per-request basis
  • CPU and memory allocation time during requests

CPU always allocated

you pay for:

  • CPU and memory allocation, at a cheaper rate, for the time the instance is active

Compare the pricing here: https://cloud.google.com/run/pricing

So if you have a lot of requests that do not use a lot of resources, without much variance, then "always allocated" might be a lot cheaper.
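The comparison in this answer can be sketched as two cost functions. All rates below are made-up placeholders in arbitrary units, chosen only so that the always-allocated rate is lower, as the answer states. They are not Cloud Run prices, and the function names are hypothetical.

```python
# Placeholder rates in arbitrary cost units (NOT real Cloud Run pricing).
PER_REQUEST_FEE = 1          # charged per request, request-based billing only
REQUEST_BASED_RATE = 100     # per second of request processing
ALWAYS_ALLOCATED_RATE = 75   # per second of instance lifetime (cheaper rate)


def request_based_cost(requests, busy_seconds):
    """Cost when CPU is only allocated during request processing."""
    return requests * PER_REQUEST_FEE + busy_seconds * REQUEST_BASED_RATE


def always_allocated_cost(instance_seconds):
    """Cost when CPU is always allocated: pay for the whole instance lifetime."""
    return instance_seconds * ALWAYS_ALLOCATED_RATE


# One hour of instance lifetime under steady, heavy traffic...
busy = request_based_cost(requests=10_000, busy_seconds=3_000)   # 310000
always = always_allocated_cost(instance_seconds=3_600)           # 270000
# ...and the same hour with only occasional traffic.
quiet = request_based_cost(requests=100, busy_seconds=30)        # 3100

print(busy > always)   # True: always allocated wins under steady load
print(quiet < always)  # True: request-based wins when mostly idle
```

With real numbers from the pricing page substituted in, the same comparison shows where the break-even point sits for a given traffic pattern.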

Google
docs.cloud.google.com › cloud run › billing settings for services
Billing settings for services | Cloud Run | Google Cloud Documentation
For a service set to instance-based billing, Cloud Run autoscales the number of instances based on CPU utilization for the entire lifecycle of the container instance, except when scaling to and from zero, where it only uses requests.
Medium
medium.com › google-cloud › optimize-your-cloud-run-functions-7bf0b6c188f4
Optimize your Cloud Run functions | by George Mao | Google Cloud - Community | Medium
November 19, 2024 - The final optimization technique ... extra cpu power (about 2x) during cold starts. Cloud Run Functions can run cpu intensive tasks for up to 10 seconds with boosted cpu....