You can train models on a GPU in the SageMaker ecosystem in two different ways:

  1. You can instantiate a GPU-powered SageMaker Notebook Instance, for example ml.p2.xlarge (NVIDIA K80) or ml.p3.2xlarge (NVIDIA V100). This is convenient for interactive development: the GPU sits right under your notebook, you can run code on it interactively, and you can monitor it via nvidia-smi in a terminal tab - a great development experience. However, when you develop directly on a GPU-powered machine there are stretches when the GPU goes unused, for example while you write code or browse documentation, and all that time you pay for a GPU that sits idle. In that regard, it may not be the most cost-effective option for your use case.

  2. Another option is to use a SageMaker Training Job running on a GPU instance. This is the preferred option for training, because training metadata (data and model paths, hyperparameters, cluster specification, etc.) is persisted in the SageMaker metadata store, logs and metrics are stored in CloudWatch, and the instance shuts itself down automatically at the end of training. Developing on a small CPU instance and launching training tasks via the SageMaker Training API helps you make the most of your budget, while retaining the metadata and artifacts of all your experiments. A well-documented TensorFlow example is available in the SageMaker documentation; a minimal launch sketch also follows below.

Answer from Olivier Cruchant on Stack Overflow
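
As a rough illustration of the second option, here is a minimal sketch of launching a GPU-backed Training Job with the SageMaker Python SDK from a cheap CPU notebook. The role ARN, S3 URIs, entry-point script, hyperparameters, and framework/Python versions are placeholders you would adapt to your account and to the versions SageMaker currently supports.

    # Minimal sketch: launch a GPU-backed SageMaker Training Job.
    # Role ARN, S3 URIs, script name, and versions are placeholders.
    from sagemaker.tensorflow import TensorFlow

    role = "arn:aws:iam::111122223333:role/MySageMakerRole"  # hypothetical role

    estimator = TensorFlow(
        entry_point="train.py",            # your training script (assumed)
        role=role,
        instance_count=1,
        instance_type="ml.p3.2xlarge",     # single-GPU (V100) training instance
        framework_version="2.11",          # pick a version SageMaker supports
        py_version="py39",
        hyperparameters={"epochs": 10, "batch-size": 64},
    )

    # Blocks until the job finishes; logs stream to the notebook and CloudWatch,
    # job metadata is recorded by SageMaker, and the instance shuts down afterwards.
    estimator.fit({"training": "s3://my-bucket/training-data/"})

The development loop this enables is the one described above: iterate on train.py locally or on a small CPU instance, then pay for the GPU only for the duration of the job.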
A second answer on the same Stack Overflow question adds:
If you want to train your model in a SageMaker Studio notebook, make sure you choose both a GPU instance type and a GPU image type:

Available images: https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-available-images.html
Available instance types: https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-available-instance-types.html

For example, for TensorFlow on GPU, pair a TensorFlow GPU-optimized image with a GPU instance type such as ml.g4dn.xlarge.
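
If the image and instance type are chosen correctly, the notebook kernel should see the GPU. A quick sanity check, assuming a TensorFlow 2.x GPU kernel:

    # Confirm that TensorFlow in the notebook kernel can see the GPU.
    import tensorflow as tf

    print("Built with CUDA:", tf.test.is_built_with_cuda())
    print("Visible GPUs:", tf.config.list_physical_devices("GPU"))

    # From a notebook cell or a terminal tab, `nvidia-smi` also shows utilization.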
