🌐
AWS
aws.amazon.com › amazon ec2 › instance types › g5 instances
Amazon EC2 G5 Instances | Amazon Web Services
1 week ago - G5 instances are the first in the cloud to feature NVIDIA A10G Tensor Core GPUs that deliver high performance for graphics-intensive and machine learning applications. Each instance features up to 8 A10G Tensor Core GPUs that come with 80 ray tracing cores and 24 GB of memory per GPU. They also ...
🌐
Vantage
instances.vantage.sh › aws › ec2 › g5.xlarge
g5.xlarge pricing and specs - Vantage
The g5.xlarge instance is in the GPU instance family with 4 vCPUs, 16 GiB of memory and up to 10 Gbps of bandwidth, starting at $1.006 per hour.
🌐
Cloudzero
advisor.cloudzero.com › aws › sagemaker › ml.g5.xlarge
ml.g5.xlarge SageMaker ML Instance Specs And Pricing
CloudZero's intelligent platform helps you optimize cloud costs and improve infrastructure efficiency.
🌐
VPSBenchmarks
vpsbenchmarks.com › home › gpu plans › amazon ec2 › g5.12xlarge gpu plan
g5.12xlarge GPU Plan | VPSBenchmarks
G5 instances feature up to 8 NVIDIA A10G Tensor Core GPUs and second generation AMD EPYC processors. They also support up to 192 vCPUs, up to 100 Gbps of network bandwidth, and up to 7.6 TB of local NVMe SSD storage.
🌐
Vantage
instances.vantage.sh › aws › ec2 › g5.2xlarge
g5.2xlarge pricing and specs - Vantage
The g5.2xlarge instance is in the GPU instance family with 8 vCPUs, 32 GiB of memory and up to 10 Gbps of bandwidth, starting at $1.212 per hour.
🌐
Amazon Web Services
aws.amazon.com › machine learning › amazon sagemaker ai › pricing
SageMaker Pricing
1 week ago - In the following example, a machine learning engineer in US East (N. Virginia) runs a human-based evaluation of Llama-2-7B for summarization task accuracy and uses their own private workforce to conduct the evaluation. The recommended instance type for Llama-2-7B is ml.g5.2xlarge.
🌐
Cloudzero
advisor.cloudzero.com › aws › sagemaker › ml.g5.12xlarge
ml.g5.12xlarge SageMaker ML Instance Specs And Pricing
CloudZero's intelligent platform helps you optimize cloud costs and improve infrastructure efficiency.
🌐
CloudPrice
cloudprice.net › amazon web services › ec2 › g5.12xlarge
g5.12xlarge specs and pricing | AWS | CloudPrice
Amazon EC2 instance g5.12xlarge with 48 vCPUs, 192 GiB RAM and 4 x NVIDIA A10G 22.35 GiB. Available in 16 regions starting from $4140.56 per month.
🌐
CloudOptimo
cloudoptimo.com › home › blog › aws g5 vs g6: which gpu is best for your ai & ml workloads?
AWS G5 vs G6: Which GPU is Best for Your AI & ML Workloads?
December 16, 2024 - Earlier generations of GPU instances, such as the G4 family, were effective for their time but have begun to show their limitations as workloads become more complex and data-heavy. The G5 instances addressed some of these challenges, offering enhanced performance and greater power to handle larger-scale AI models and high-resolution graphics rendering.
🌐
EC2 Pricing Calculator
costcalc.cloudoptimo.com › aws-pricing-calculator › ec2 › g5.12xlarge
g5.12xlarge Pricing and Specs: AWS EC2
The g5.12xlarge instance is part of the g5 series of GPU instances, featuring 48 vCPUs and 192 GiB of RAM.
🌐
CloudOptimo
cloudoptimo.com › home › blog › how aws g5 instances are changing ai and machine learning in the cloud?
How AWS G5 Instances are Changing AI and Machine Learning in the Cloud?
March 13, 2025 - Unlock AWS EC2 G5 instances for GPU-optimized workloads like ML, gaming, and HPC. Explore features, pricing, use cases, and cost management tips.
🌐
Reddit
reddit.com › r/machinelearning › [r] benchmarking g5.12xlarge (4xa10) vs 1xa100 inference performance running upstage_llama-2-70b-instruct-v2 (4-bit & 8-bit)
r/MachineLearning on Reddit: [R] Benchmarking g5.12xlarge (4xA10) vs 1xA100 inference performance running upstage_Llama-2-70b-instruct-v2 (4-bit & 8-bit)
August 10, 2023 -

Hi Reddit folks, I wanted to share some benchmarking data I recently compiled running upstage_Llama-2-70b-instruct-v2 on two different hardware setups. If you'd like to see the spreadsheet with the raw data you can check out this link.
Hardware Config #1: AWS g5.12xlarge - 4 x A10 w/ 96GB VRAM
Hardware Config #2: Vultr - 1 x A100 w/ 80GB VRAM
A few questions I wanted to answer:

  1. How does the inference speed (tokens/s) between these two configurations compare?

  2. How does the number of input tokens impact inference speed?

  3. How many input tokens can these machines handle before they start to hit OOM?

  4. How does 4-bit vs 8-bit quantization affect all of the above?

Why this model?
I chose upstage_Llama-2-70b-instruct-v2 because it's currently the #1-performing open-source model on HuggingFace's LLM Leaderboard. Also, according to the documentation, the model can support 10K+ tokens using RoPE, which allowed me to push memory on the machines to the point of OOM.
Why this hardware?
I have some projects I'm working on that will require high-performance LLMs, and these are the two most common configurations we're considering. We do most of our cloud work on AWS, so the g5.12xlarge is the "go-to" option for inference with a model of this size. However, I have been very interested in understanding whether there are compelling reasons to go with a 1xA100 setup, which AWS doesn't offer.

Text Generation Performance (t/s) vs Input Tokens (t)

This chart shows how Text Generation Performance (t/s) responds to the number of input tokens (t) sent to the model. As expected, more input tokens result in slower generation speed.
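A minimal sketch of how such a sweep could be scripted with HuggingFace transformers and bitsandbytes quantized loading; the prompt construction and input sizes below are illustrative assumptions, not the author's actual harness:

```python
import time

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "upstage/Llama-2-70b-instruct-v2"  # model named in the post
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # shards layers across the 4 A10Gs on a g5.12xlarge
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # or load_in_8bit=True
)

def tokens_per_second(prompt: str, max_new_tokens: int = 128) -> float:
    """Generate from one prompt and return output tokens per second."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    elapsed = time.perf_counter() - start
    new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
    return new_tokens / elapsed

# Sweep input sizes to reproduce the chart: longer prompts, slower generation.
for n_words in (100, 500, 1000, 2000):  # illustrative sweep, not the post's exact points
    prompt = "word " * n_words
    print(n_words, f"{tokens_per_second(prompt):.2f} t/s")
```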

GPU Load Performance (MM:SS)
This is a measure of how long it took to load the model into memory. I averaged this across 5 load attempts for each configuration.

Hardware              8-Bit Load Time   4-Bit Load Time
g5.12xlarge (4xA10)   0:59              1:00
1xA100                2:47              2:54
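The load-time measurement above could be approximated with a small timing loop; this is a hedged sketch assuming the same transformers setup as the previous snippet (disk and page-cache state strongly affect these numbers, so treat results as indicative only):

```python
import time

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

def avg_load_time(model_id: str, attempts: int = 5, **load_kwargs) -> float:
    """Average wall-clock seconds over `attempts` model loads."""
    times = []
    for _ in range(attempts):
        start = time.perf_counter()
        model = AutoModelForCausalLM.from_pretrained(
            model_id, device_map="auto", **load_kwargs
        )
        times.append(time.perf_counter() - start)
        del model                  # release weights between attempts
        torch.cuda.empty_cache()
    return sum(times) / len(times)

# e.g. avg_load_time("upstage/Llama-2-70b-instruct-v2",
#                    quantization_config=BitsAndBytesConfig(load_in_4bit=True))
```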

Average Text Generation Performance (tokens/second)
Note that these numbers are an average across all text generation attempts for each configuration.

Hardware              8-Bit      4-Bit
g5.12xlarge (4xA10)   2.07 t/s   4.08 t/s
1xA100                2.28 t/s   4.54 t/s

Maximum Context (tokens)
This was a measure of how many input tokens I could pass into the model before getting an OOM exception for each configuration.

Hardware              8-Bit Max Context   4-Bit Max Context
g5.12xlarge (4xA10)   2500 tokens         5500 tokens
1xA100                3000 tokens         8000 tokens
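The OOM probe can be sketched as a loop that grows the prompt until CUDA runs out of memory; torch.cuda.OutOfMemoryError is the real exception type in recent PyTorch, while the step size is an arbitrary assumption:

```python
import torch

def max_context(model, tokenizer, step: int = 500) -> int:
    """Grow the input until generation OOMs; return the last size that worked."""
    n, last_ok = step, 0
    while True:
        prompt = "word " * n
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        try:
            model.generate(**inputs, max_new_tokens=16)
            last_ok = inputs["input_ids"].shape[-1]  # actual token count
            n += step
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()
            return last_ok
```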

Summary
On text generation performance the A100 config outperforms the A10 config by ~11%. I was surprised to see that the A100 config, which has less VRAM (80GB vs 96GB), was able to handle a larger context size before hitting OOM errors. Additionally, it was interesting to see the A10 hardware was much faster at loading the model. I would presume this is because it can parallelize the load across the 4 separate GPUs. Unsurprisingly, 4-bit quantized models were much faster than 8-bit quantized models (almost 2x) and they were able to handle much larger context sizes before OOM.

🌐
CloudOptimo
cloudoptimo.com › home › blog › aws g4 vs g5 family: a detailed comparison of aws gpu instances
AWS G4 vs G5 Family: A Detailed Comparison of AWS GPU Instances
March 13, 2025 - Ideal for businesses with moderate GPU needs, or those running short-term AI inference jobs or video transcoding. ... Perfect for businesses focusing on AI inference, image recognition, and video transcoding.
🌐
Vantage
instances.vantage.sh › aws › ec2 › g5.12xlarge
g5.12xlarge pricing and specs - Vantage
The g5.12xlarge instance is in the GPU instance family with 48 vCPUs, 192 GiB of memory and 40 Gbps of bandwidth, starting at $5.672 per hour.
🌐
CloudPrice
cloudprice.net › amazon web services › ec2 › g5.xlarge
g5.xlarge specs and pricing | AWS | CloudPrice
Amazon EC2 instance g5.xlarge with 4 vCPUs, 16 GiB RAM and 1 x NVIDIA A10G 22.35 GiB. Available in 16 regions starting from $734.38 per month.
🌐
CloudPrice
cloudprice.net › amazon web services › ec2 › g5.2xlarge
g5.2xlarge specs and pricing | AWS | CloudPrice
Amazon EC2 instance g5.2xlarge with 8 vCPUs, 32 GiB RAM and 1 x NVIDIA A10G 22.35 GiB. Available in 16 regions starting from $884.76 per month.
🌐
Reddit
reddit.com › r/aws › nvidia driver for g5.xlarge
r/aws on Reddit: Nvidia driver for g5.xlarge
February 24, 2024 -

Hello,

I just took a g5.xlarge with an Amazon Linux 2023 AMI and I'm struggling to install an NVIDIA driver, so I have some questions:

How can I install an NVIDIA driver there? I know there are AMIs that already come with a driver, but I don't want that solution.

Which driver should I use? My purpose is to play with Llama 2 to do some GenAI (RAG).

Are NVIDIA drivers free?

Thanks a lot

🌐
Reddit
reddit.com › r/aws › is sagemaker gpu instance g5.16xlarge enough for inferencing mixtral-8x7b-instruct?
r/aws on Reddit: Is Sagemaker gpu instance g5.16xlarge enough for inferencing Mixtral-8x7b-Instruct?
March 16, 2024 -

I am using the GPU instance “g5.16xlarge” to run inference for Mixtral-8x7b-Instruct in AWS SageMaker, but I am getting an error that says “try changing instance type”.

Does anyone know if “g5.16xlarge” is sufficient for Mixtral-8x7b-Instruct? I noticed that in AWS SageMaker JumpStart, the only GPU instance I can select for Mixtral-8x7b-Instruct is “g5.48xlarge”. Does that mean I need at least “g5.48xlarge” to run it in AWS SageMaker?

Would really appreciate any input on this. Thanks heaps.
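One way to see why is a rough memory estimate: Mixtral-8x7B has about 46.7B parameters, a g5.16xlarge has a single 24 GB A10G, and a g5.48xlarge has eight (192 GB total). A quick sketch of the arithmetic (approximate figures, weights only, ignoring KV cache and activations):

```python
# Back-of-envelope check with illustrative numbers, not exact memory profiling.
params = 46.7e9  # total parameters across all 8 experts
for name, bytes_per_param in [("fp16/bf16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name}: ~{gb:.0f} GB of weights")

# g5.16xlarge: 1 x A10G -> 24 GB GPU memory
# g5.48xlarge: 8 x A10G -> 192 GB GPU memory
# Even 4-bit weights (~23 GB) barely fit on one A10G before the KV cache,
# which is consistent with JumpStart only offering g5.48xlarge.
```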

Top answer
The situation you're describing, where your object detection model training has suddenly slowed down without any changes to the model itself, is unusual and could be caused by several factors.

First, it's important to note that the g5.12xlarge instance you're using is indeed suitable for object detection model training. These instances are part of the GPU families recommended for such tasks, offering good performance for deep learning workloads. However, the significant increase in iteration time from 1 second to 2.5 seconds across multiple instances suggests a systemic issue rather than an isolated incident.

Here are a few possibilities to consider:

  1. Instance availability and performance: There might be underlying hardware or capacity issues affecting the G5 instances in your region. AWS occasionally faces high demand for specific instance types, which can impact performance.

  2. Network or storage bottlenecks: If your training data is being fetched from S3 or another storage service, there could be network congestion or storage performance issues slowing down data loading.

  3. Background processes: Check if there are any new background processes or updates running on the instances that could be consuming resources.

  4. Changes in the AWS environment: Recent updates or changes to the AWS infrastructure might have inadvertently affected performance.

  5. Dataset changes: Even if you haven't changed the model, any changes to the training data or how it's being loaded could impact training speed.

To troubleshoot this issue:

  1. Monitor the GPU utilization and memory usage during training to ensure the GPUs are being fully utilized.

  2. Try training on a different instance type, such as P3 or G4dn, to see if the problem persists.

  3. Check CloudWatch metrics for your instances to identify any unusual patterns in CPU, network, or disk usage.

  4. Verify that your training data is being accessed efficiently and there are no bottlenecks in data loading.

  5. Consider reaching out to AWS Support for a more in-depth investigation of the instance performance.

If the problem continues, it may be worth exploring distributed training across multiple instances to potentially mitigate the slowdown and improve overall training time.

**Sources**
Object Detection - MXNet - Amazon SageMaker AI
Scaling training - Amazon SageMaker AI
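As a concrete starting point for step 1 of the troubleshooting list above, here is a minimal polling sketch using nvidia-smi; the query flags are standard nvidia-smi options, and the 30-second interval is an arbitrary choice:

```python
import subprocess
import time

# Poll per-GPU utilization and memory during a training run.
while True:
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print(out.stdout.strip())
    time.sleep(30)  # sample every 30 seconds
```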