🌐
AWS
aws.amazon.com › amazon ec2 › instance types › g5 instances
Amazon EC2 G5 Instances | Amazon Web Services
1 week ago - G5 instances are the first in the cloud to feature NVIDIA A10G Tensor Core GPUs that deliver high performance for graphics-intensive and machine learning applications. Each instance features up to 8 A10G Tensor Core GPUs that come with 80 ray tracing cores and 24 GB of memory per GPU. They also ...
🌐
Vantage
instances.vantage.sh › aws › ec2 › g5.xlarge
g5.xlarge pricing and specs - Vantage
The g5.xlarge instance is in the GPU instance family with 4 vCPUs, 16 GiB of memory and up to 10 Gbps of bandwidth, starting at $1.006 per hour.
🌐
Cloudzero
advisor.cloudzero.com › aws › sagemaker › ml.g5.xlarge
ml.g5.xlarge SageMaker ML Instance Specs And Pricing
CloudZero's intelligent platform helps you optimize cloud costs and improve infrastructure efficiency.
🌐
VPSBenchmarks
vpsbenchmarks.com › home › gpu plans › amazon ec2 › g5.12xlarge gpu plan
g5.12xlarge GPU Plan | VPSBenchmarks
G5 instances feature up to 8 NVIDIA A10G Tensor Core GPUs and second generation AMD EPYC processors. They also support up to 192 vCPUs, up to 100 Gbps of network bandwidth, and up to 7.6 TB of local NVMe SSD storage.
🌐
Vantage
instances.vantage.sh › aws › ec2 › g5.2xlarge
g5.2xlarge pricing and specs - Vantage
The g5.2xlarge instance is in the GPU instance family with 8 vCPUs, 32 GiB of memory and up to 10 Gbps of bandwidth, starting at $1.212 per hour.
🌐
Amazon Web Services
aws.amazon.com › machine learning › amazon sagemaker ai › pricing
SageMaker Pricing
1 week ago - In the following example, a machine learning engineer in US East (N. Virginia) runs a human-based evaluation of Llama-2-7B for summarization task accuracy and uses their own private workforce to conduct the evaluation. The recommended instance type for Llama-2-7B is ml.g5.2xlarge.
🌐
Cloudzero
advisor.cloudzero.com › aws › sagemaker › ml.g5.12xlarge
ml.g5.12xlarge SageMaker ML Instance Specs And Pricing
CloudZero's intelligent platform helps you optimize cloud costs and improve infrastructure efficiency.
🌐
CloudPrice
cloudprice.net › amazon web services › ec2 › g5.12xlarge
g5.12xlarge specs and pricing | AWS | CloudPrice
Amazon EC2 instance g5.12xlarge with 48 vCPUs, 192 GiB RAM and 4 x NVIDIA A10G 22.35 GiB. Available in 16 regions starting from $4140.56 per month.
🌐
CloudOptimo
cloudoptimo.com › home › blog › aws g5 vs g6: which gpu is best for your ai & ml workloads?
AWS G5 vs G6: Which GPU is Best for Your AI & ML Workloads?
December 16, 2024 - Earlier generations of GPU instances, such as the G4 family, were effective for their time but have begun to show their limitations as workloads become more complex and data-heavy. The G5 instances addressed some of these challenges, offering enhanced performance and greater power to handle larger-scale AI models and high-resolution graphics rendering.
🌐
EC2 Pricing Calculator
costcalc.cloudoptimo.com › aws-pricing-calculator › ec2 › g5.12xlarge
g5.12xlarge Pricing and Specs: AWS EC2
The g5.12xlarge instance is part of the g5 series of GPU instances, featuring 48 vCPUs and 192 GiB of RAM.
🌐
CloudOptimo
cloudoptimo.com › home › blog › how aws g5 instances are changing ai and machine learning in the cloud?
How AWS G5 Instances are Changing AI and Machine Learning in the Cloud?
March 13, 2025 - Unlock AWS EC2 G5 instances for GPU-optimized workloads like ML, gaming, and HPC. Explore features, pricing, use cases, and cost management tips.
🌐
Reddit
reddit.com › r/machinelearning › [r] benchmarking g5.12xlarge (4xa10) vs 1xa100 inference performance running upstage_llama-2-70b-instruct-v2 (4-bit & 8-bit)
r/MachineLearning on Reddit: [R] Benchmarking g5.12xlarge (4xA10) vs 1xA100 inference performance running upstage_Llama-2-70b-instruct-v2 (4-bit & 8-bit)
August 10, 2023 -

Hi Reddit folks, I wanted to share some benchmarking data I recently compiled running upstage_Llama-2-70b-instruct-v2 on two different hardware setups. If you'd like to see the spreadsheet with the raw data you can check out this link.
Hardware Config #1: AWS g5.12xlarge - 4 x A10 w/ 96GB VRAM
Hardware Config #2: Vultr - 1 x A100 w/ 80GB VRAM
A few questions I wanted to answer:

  1. How does the inference speed (tokens/s) between these two configurations compare?

  2. How does the number of input tokens impact inference speed?

  3. How many input tokens can these machines handle before they start to hit OOM?

  4. How does 4-bit vs 8-bit quantization affect all of the above?

Why this model?
I chose upstage_Llama-2-70b-instruct-v2 because it's currently the #1-performing open-source model on HuggingFace's LLM Leaderboard. Also, according to the documentation, the model can support 10K+ tokens using RoPE, which allowed me to push memory on the machines to the point of OOM.
Why this hardware?
I have some projects I'm working on that will require high-performance LLMs, and these are the two most common configurations we're considering. We do most of our cloud work on AWS, so the g5.12xlarge is the "go-to" option for inference with a model of this size. However, I have been very interested in understanding whether there are compelling reasons to go with a 1xA100 setup, which AWS doesn't offer.

Text Generation Performance (t/s) vs Input Tokens (t)

This chart shows how Text Generation Performance (t/s) responds to the number of input tokens (t) sent to the model. As expected, more input tokens result in slower generation speed.
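A minimal sketch of how such a sweep could be scripted with HuggingFace transformers and bitsandbytes quantized loading; the prompt construction and input sizes below are illustrative assumptions, not the author's actual harness:

```python
import time

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "upstage/Llama-2-70b-instruct-v2"  # model named in the post
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # shards layers across the 4 A10Gs on a g5.12xlarge
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # or load_in_8bit=True
)

def tokens_per_second(prompt: str, max_new_tokens: int = 128) -> float:
    """Generate from one prompt and return output tokens per second."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    elapsed = time.perf_counter() - start
    new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
    return new_tokens / elapsed

# Sweep input sizes to reproduce the chart: longer prompts, slower generation.
for n_words in (100, 500, 1000, 2000):  # illustrative sweep, not the post's exact points
    prompt = "word " * n_words
    print(n_words, f"{tokens_per_second(prompt):.2f} t/s")
```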

GPU Load Performance (MM:SS)
This is a measure of how long it took to load the model into memory. I averaged this across 5 load attempts for each configuration.

Hardware              8-Bit Load Time   4-Bit Load Time
g5.12xlarge (4xA10)   0:59              1:00
1xA100                2:47              2:54
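The load-time measurement above could be approximated with a small timing loop; this is a hedged sketch assuming the same transformers setup as the previous snippet (disk and page-cache state strongly affect these numbers, so treat results as indicative only):

```python
import time

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

def avg_load_time(model_id: str, attempts: int = 5, **load_kwargs) -> float:
    """Average wall-clock seconds over `attempts` model loads."""
    times = []
    for _ in range(attempts):
        start = time.perf_counter()
        model = AutoModelForCausalLM.from_pretrained(
            model_id, device_map="auto", **load_kwargs
        )
        times.append(time.perf_counter() - start)
        del model                  # release weights between attempts
        torch.cuda.empty_cache()
    return sum(times) / len(times)

# e.g. avg_load_time("upstage/Llama-2-70b-instruct-v2",
#                    quantization_config=BitsAndBytesConfig(load_in_4bit=True))
```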

Average Text Generation Performance (tokens/second)
Note that these numbers are an average across all text generation attempts for each configuration.

Hardware              8-Bit      4-Bit
g5.12xlarge (4xA10)   2.07 t/s   4.08 t/s
1xA100                2.28 t/s   4.54 t/s

Maximum Context (tokens)
This was a measure of how many input tokens I could pass into the model before getting an OOM exception for each configuration.

Hardware              8-Bit Max Context   4-Bit Max Context
g5.12xlarge (4xA10)   2500 tokens         5500 tokens
1xA100                3000 tokens         8000 tokens
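The OOM probe can be sketched as a loop that grows the prompt until CUDA runs out of memory; torch.cuda.OutOfMemoryError is the real exception type in recent PyTorch, while the step size is an arbitrary assumption:

```python
import torch

def max_context(model, tokenizer, step: int = 500) -> int:
    """Grow the input until generation OOMs; return the last size that worked."""
    n, last_ok = step, 0
    while True:
        prompt = "word " * n
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        try:
            model.generate(**inputs, max_new_tokens=16)
            last_ok = inputs["input_ids"].shape[-1]  # actual token count
            n += step
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()
            return last_ok
```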

Summary
On text generation performance the A100 config outperforms the A10 config by ~11%. I was surprised to see that the A100 config, which has less VRAM (80GB vs 96GB), was able to handle a larger context size before hitting OOM errors. Additionally, it was interesting to see the A10 hardware was much faster at loading the model. I would presume this is because it can parallelize the load across the 4 separate GPUs. Unsurprisingly, 4-bit quantized models were much faster than 8-bit quantized models (almost 2x) and they were able to handle much larger context sizes before OOM.

🌐
CloudOptimo
cloudoptimo.com › home › blog › aws g4 vs g5 family: a detailed comparison of aws gpu instances
AWS G4 vs G5 Family: A Detailed Comparison of AWS GPU Instances
March 13, 2025 - Ideal for businesses with moderate GPU needs, or those running short-term AI inference jobs or video transcoding. ... Perfect for businesses focusing on AI inference, image recognition, and video transcoding.
🌐
Vantage
instances.vantage.sh › aws › ec2 › g5.12xlarge
g5.12xlarge pricing and specs - Vantage
The g5.12xlarge instance is in the GPU instance family with 48 vCPUs, 192 GiB of memory and 40 Gbps of bandwidth, starting at $5.672 per hour.
🌐
CloudPrice
cloudprice.net › amazon web services › ec2 › g5.xlarge
g5.xlarge specs and pricing | AWS | CloudPrice
Amazon EC2 instance g5.xlarge with 4 vCPUs, 16 GiB RAM and 1 x NVIDIA A10G 22.35 GiB. Available in 16 regions starting from $734.38 per month.
🌐
CloudPrice
cloudprice.net › amazon web services › ec2 › g5.2xlarge
g5.2xlarge specs and pricing | AWS | CloudPrice
Amazon EC2 instance g5.2xlarge with 8 vCPUs, 32 GiB RAM and 1 x NVIDIA A10G 22.35 GiB. Available in 16 regions starting from $884.76 per month.
🌐
Reddit
reddit.com › r/aws › nvidia driver for g5.xlarge
r/aws on Reddit: Nvidia driver for g5.xlarge
February 24, 2024 -

Hello,

I just took a g5.xlarge with an Amazon Linux 2023 AMI and I'm struggling to install an NVIDIA driver, so I have some questions:

How can I install an NVIDIA driver there? I know there are AMIs that already come with a driver, but I don't want that solution.

Which driver should I use? My purpose is to play with Llama 2 to do some GenAI (RAG).

Are NVIDIA drivers free?

Thanks a lot

🌐
Reddit
reddit.com › r/aws › is sagemaker gpu instance g5.16xlarge enough for inferencing mixtral-8x7b-instruct?
r/aws on Reddit: Is Sagemaker gpu instance g5.16xlarge enough for inferencing Mixtral-8x7b-Instruct?
March 16, 2024 -

I am using the GPU instance “g5.16xlarge” to run inference for Mixtral-8x7b-Instruct in AWS SageMaker, but I am getting an error that says “try changing instance type”.

Does anyone know if “g5.16xlarge” is sufficient for Mixtral-8x7b-Instruct? I noticed that in AWS SageMaker JumpStart, the only GPU instance I can select for Mixtral-8x7b-Instruct is “g5.48xlarge”. Does that mean I need at least “g5.48xlarge” to run it in AWS SageMaker?

Would really appreciate any input on this. Thanks heaps.
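One way to see why is a rough memory estimate: Mixtral-8x7B has about 46.7B parameters, a g5.16xlarge has a single 24 GB A10G, and a g5.48xlarge has eight (192 GB total). A quick sketch of the arithmetic (approximate figures, weights only, ignoring KV cache and activations):

```python
# Back-of-envelope check with illustrative numbers, not exact memory profiling.
params = 46.7e9  # total parameters across all 8 experts
for name, bytes_per_param in [("fp16/bf16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name}: ~{gb:.0f} GB of weights")

# g5.16xlarge: 1 x A10G -> 24 GB GPU memory
# g5.48xlarge: 8 x A10G -> 192 GB GPU memory
# Even 4-bit weights (~23 GB) barely fit on one A10G before the KV cache,
# which is consistent with JumpStart only offering g5.48xlarge.
```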

Top answer
The situation you're describing, where your object detection model training has suddenly slowed down without any changes to the model itself, is unusual and could be caused by several factors.

First, it's important to note that the g5.12xlarge instance you're using is indeed suitable for object detection model training. These instances are part of the GPU families recommended for such tasks, offering good performance for deep learning workloads. However, the significant increase in iteration time from 1 second to 2.5 seconds across multiple instances suggests a systemic issue rather than an isolated incident.

Here are a few possibilities to consider:

  1. Instance availability and performance: There might be underlying hardware or capacity issues affecting the G5 instances in your region. AWS occasionally faces high demand for specific instance types, which can impact performance.

  2. Network or storage bottlenecks: If your training data is being fetched from S3 or another storage service, there could be network congestion or storage performance issues slowing down data loading.

  3. Background processes: Check if there are any new background processes or updates running on the instances that could be consuming resources.

  4. Changes in the AWS environment: Recent updates or changes to the AWS infrastructure might have inadvertently affected performance.

  5. Dataset changes: Even if you haven't changed the model, any changes to the training data or how it's being loaded could impact training speed.

To troubleshoot this issue:

  1. Monitor the GPU utilization and memory usage during training to ensure the GPUs are being fully utilized.

  2. Try training on a different instance type, such as P3 or G4dn, to see if the problem persists.

  3. Check CloudWatch metrics for your instances to identify any unusual patterns in CPU, network, or disk usage.

  4. Verify that your training data is being accessed efficiently and there are no bottlenecks in data loading.

  5. Consider reaching out to AWS Support for a more in-depth investigation of the instance performance.

If the problem continues, it may be worth exploring distributed training across multiple instances to potentially mitigate the slowdown and improve overall training time.

**Sources**
Object Detection - MXNet - Amazon SageMaker AI
Scaling training - Amazon SageMaker AI
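As a concrete starting point for step 1 of the troubleshooting list above, here is a minimal polling sketch using nvidia-smi; the query flags are standard nvidia-smi options, and the 30-second interval is an arbitrary choice:

```python
import subprocess
import time

# Poll per-GPU utilization and memory during a training run.
while True:
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print(out.stdout.strip())
    time.sleep(30)  # sample every 30 seconds
```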