Hello,
I have a FastAPI application running on Cloud Run with some endpoints doing fairly complex computations. On Cloud Run those endpoints take about 3x longer than when I run them locally (on my M1 MacBook). My guess is that the CPU provided by Cloud Run is just slower? Does anyone know which CPUs are attached by default, and whether there's a solution for that?
Cheers
Cloud Run, ideal vCPU and memory amount per instance? - Stack Overflow
Does the cpu flag in gcloud run deploy control the number of CPUs per VM or the total amount of CPU? - Stack Overflow
Google Cloud Run: interpreting CPU utilization metrics for concurrency - Stack Overflow
Cloud run with CPU always allocated is cheaper than only allocated during request processing. How? - Stack Overflow
There isn't a good answer to that question. You have to know the limits:
- The max number of requests that a single instance can handle concurrently with 4 vCPUs and/or 32 GB of memory (up to 1,000 concurrent requests)
- The max number of instances on Cloud Run (1,000)
Then it's a matter of trade-offs, and it's highly dependent on your use case.
- Bigger instances reduce the number of cold starts (and thus the latency spikes when your service scales up). But if you only have 1 request at a time, you will be paying for a BIG instance to do a small amount of processing.
- Smaller instances let you optimize cost and add only a small slice of resources to your cluster, but you will have to spawn new instances more often and you will have more cold starts to endure.
Optimize for what you prefer and find the right balance. No magic formula!!
You can simulate a load of requests against your current settings using k6.io, check the memory and CPU percentage of your container, and adjust them up or down to see if you can get more RPS out of a single container.
Once you are satisfied with a single container instance's throughput, let's say 100 RPS per instance, you can then use gcloud to specify the --min-instances and --max-instances flags, depending of course on the --concurrency flag, which in this example would be set to 100.
Also note that --concurrency defaults to 80 and can go up to 1000.
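To make the sizing arithmetic concrete, here is a minimal sketch. The 100 RPS per instance and the 2,500 RPS target are made-up numbers standing in for whatever your own k6 load test measures:

```python
import math

def instances_needed(target_rps: float, rps_per_instance: float) -> int:
    """Estimate how many Cloud Run instances are needed to serve a target
    request rate, given the per-instance throughput measured under load."""
    return math.ceil(target_rps / rps_per_instance)

# Hypothetical numbers: one container sustains ~100 RPS in the k6 test,
# and we want to cover a 2,500 RPS peak.
print(instances_needed(2500, 100))  # -> 25
```

That result would then inform your --max-instances setting (with some headroom on top, since instances that are still cold-starting can't absorb traffic yet).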
More info about this can be read in the links below:
https://cloud.google.com/run/docs/about-concurrency
https://cloud.google.com/sdk/gcloud/reference/run/deploy
I would also recommend investigating whether you need to pass the --cpu-throttling or the --no-cpu-throttling flag, depending on how you want to adjust for cold starts.
The legend is statistics. The 50% line is the median, and the 95% and 99% lines are percentiles. Meaning 50% of the measurements taken are below 0.67% CPU, 95% of the measurements taken are below 17.8%, and 99% of the measurements taken are below 17.96%. Your CPU is not being used all that much.
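To illustrate what those percentile lines mean, here is a small sketch using Python's standard library. The CPU samples are invented numbers, not your actual monitoring data:

```python
import statistics

# Made-up per-minute CPU utilization samples (percent), standing in for
# the data points behind a Cloud Run utilization chart.
cpu_samples = [0.4, 0.5, 0.6, 0.7, 0.8, 1.0, 2.0, 5.0, 12.0, 18.0]

# quantiles(n=100) returns the 1st..99th percentiles of the data;
# index 49 is the median (p50), index 94 is p95, index 98 is p99.
q = statistics.quantiles(cpu_samples, n=100, method="inclusive")
p50, p95, p99 = q[49], q[94], q[98]
print(f"p50={p50:.2f}%  p95={p95:.2f}%  p99={p99:.2f}%")
# prints: p50=0.90%  p95=15.30%  p99=17.46%
```

Half the samples sit below the p50 value even though the p95/p99 values are much higher, which is exactly the shape the chart legend is describing.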
To answer your questions:
- Yes, the graph shows that your requests are using about 20% of your CPU. The legend below means that 95% of the time, your CPU usage is at or below roughly 20%.
- Yes, you can increase your concurrency up to a maximum of 1000. You can check the documentation on concurrency values and setting maximum concurrency (services). The default value for concurrency is 80.
- I haven't tried this, as it would depend on the load of the request. There are instances where there are single requests with lighter or heavier loads.
- Setting your minimum number of instances to 1 will reduce the number of cold starts, as an idle instance will already be running and ready to serve incoming requests. The downside is that this incurs charges while the service sits idle. Google recommends purchasing a committed use discount, as these charges are very predictable. Full documentation regarding minimum instances can be found through this link.
Cloud Run is serverless by default: you pay as you use. When a request comes in, an instance is created and started (that startup is the cold start) and your request is processed. The timer starts. When your web server sends the answer, the timer stops.
You pay for the memory and the CPU used during request processing, rounded up to the next 100ms. The instance continues to live for about 15 minutes (by default; this behavior can change at any moment) so it's ready to process another request without needing to start a new one (and wait for the cold start again).
As you can see, the instance continues to live EVEN IF YOU NO LONGER PAY FOR IT, because you only pay while a request is being processed.
When you set the CPU to always allocated, you pay for the full time the instance runs, whether it is handling a request or not. Google no longer has to absorb the cost of instances that are running but unused, waiting for a request, as in the pay-per-use model. You pay for that idle time instead, and in exchange you pay a lower rate.
It's like a Compute Engine VM running full time. And as with Compute Engine, you can get something similar to a sustained use discount. That's why it's cheaper.
In general it depends on how you use Cloud Run. Google gives some hints here: https://cloud.google.com/run/docs/configuring/cpu-allocation
To summarize the biggest pricing differences:
CPU only allocated during request processing, you pay for:
- every request, on a per-request basis
- CPU and memory allocation time during the request
CPU always allocated, you pay for:
- CPU and memory allocation at a cheaper rate, for the time the instance is active
Compare the pricing here: https://cloud.google.com/run/pricing
So if you have a lot of requests which do not use a lot of resources, and there is not much variance in the load, then "always allocated" might be a lot cheaper.
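The trade-off above can be sketched as a toy cost comparison. The rates below are placeholders, NOT Google's actual prices (those are on the pricing page linked above); the point is only the shape of the two billing models:

```python
import math

# Placeholder per-vCPU-second rates -- invented for illustration. The
# "always allocated" rate is lower per second but applies to the whole
# instance lifetime; the request-based rate is higher but applies only
# while requests are in flight, rounded up to the next 100 ms.
RATE_REQUEST_BASED = 0.000024   # $/vCPU-second, billed during requests
RATE_ALWAYS_ON     = 0.000018   # $/vCPU-second, billed while instance runs

def request_based_cost(request_durations_s, vcpu=1.0):
    # Each request's billed time is rounded up to the next 0.1 s.
    billed = sum(math.ceil(d * 10) / 10 for d in request_durations_s)
    return billed * vcpu * RATE_REQUEST_BASED

def always_on_cost(instance_uptime_s, vcpu=1.0):
    return instance_uptime_s * vcpu * RATE_ALWAYS_ON

# Steady traffic keeping one instance busy for an hour: 12,000 requests
# of 0.25 s each, billed as 0.3 s apiece -> 3,600 s of billed time.
busy = [0.25] * 12000
print(request_based_cost(busy) > always_on_cost(3600))  # -> True
```

With steady, resource-light traffic the rounded-up per-request billing exceeds the cheaper always-on rate, matching the "always allocated might be a lot cheaper" conclusion; with rare, bursty traffic the comparison flips.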