Yes, deploying the model to an Endpoint will create a separate instance, which will stay active until the Endpoint is deleted. When you invoke your model from the notebook, you're calling an API, and the actual inference work happens on the endpoint.

In general when using SageMaker, I'd suggest keeping your notebook instances small (e.g. an ml.t3.medium) and using the on-demand infrastructure for heavier tasks: SageMaker Processing, Training, and Batch inference ("transform") jobs all run on separate compute which automatically shuts down as soon as the job completes. In cases where you do need more resources, you can temporarily spin up a bigger notebook and shut it down again.
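Since a real-time endpoint bills for every hour it exists, the usual pattern is to delete it (along with its endpoint config and model) the moment you are done. A minimal sketch using the standard boto3 SageMaker client, assuming AWS credentials are configured at call time; the endpoint name is a placeholder for whatever you passed at deploy time:

```python
def tear_down_endpoint(endpoint_name: str, region: str = "eu-north-1") -> None:
    """Delete an endpoint plus its config and model(s) so no billable instance remains.

    Names and region are illustrative. Billing for the endpoint instance stops
    once the endpoint itself is deleted; the config and model objects are free
    but left dangling otherwise.
    """
    import boto3  # imported lazily so the sketch loads without AWS dependencies

    sm = boto3.client("sagemaker", region_name=region)
    config_name = sm.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    config = sm.describe_endpoint_config(EndpointConfigName=config_name)

    sm.delete_endpoint(EndpointName=endpoint_name)            # this stops the hourly charge
    sm.delete_endpoint_config(EndpointConfigName=config_name)
    for variant in config["ProductionVariants"]:
        sm.delete_model(ModelName=variant["ModelName"])
```

If you deployed via the SageMaker Python SDK instead, `predictor.delete_endpoint()` covers the first call; the config and model cleanup is the part people tend to forget.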

Answer from dingus on Stack Overflow
Amazon Web Services
SageMaker pricing - AWS
6 days ago - Gain unified access to all your data whether it’s stored in data lakes, data warehouses, or federated data sources, with governance built-in to meet your enterprise security needs. When using Amazon SageMaker, AWS will charge you the pricing for each AWS service that you use.
TutorialsPoint
Amazon SageMaker - Pricing
Like training jobs, higher-performing instances such as GPUs will be costly. The billing is calculated on an hourly basis for each endpoint. Amazon SageMaker is dependent on Amazon S3 for storing datasets. You will be charged for data storage in S3, as well as any data transfers between S3 ...
Discussions

amazon web services - Endpoint Instance cost - Stack Overflow
I'm new to AWS SageMaker. I'm looking to deploy a JumpStart LLM model (Falcon-40B); to do that I have used an ml.g5.24xlarge notebook instance. My question is: when I deploy that model and create an endpoint, will this create a new ml.g5.24xlarge instance or use the existing notebook instance?
stackoverflow.com
amazon web services - Which is lower cost, Sagemaker or EC2? - Stack Overflow
For example, ml.p2.8xlarge for a training job at ap-northeast on SageMaker costs 16.408 USD/hour, but p2.8xlarge on-demand at ap-northeast on EC2 costs 12.336 USD/hour. Is it cheap to just trai...
stackoverflow.com
SageMaker rough costs when deploying object detection model API
SageMaker Inference is one of the few services that charges not only for data out but also for data in, priced at $0.016 per GB. Perhaps that's the associated cost? Regarding the cheapest-machine question: perhaps you can use SageMaker Inference Recommender to test it out? Or just make your own load tests with a tool such as Locust; it should be fairly straightforward.
r/aws, August 15, 2023
SageMaker costs for AI model
Thank you for providing details about your SageMaker deployment. I'll explain the pricing structure and address your questions. For your deployment using an ml.g5.xlarge instance, you will be charged based on the time your endpoint is running, regardless of whether it's processing requests or idle.
repost.aws, March 18, 2025
CloudForecast
AWS SageMaker Pricing Guide - Cost Breakdown & Optimization Tips | CloudForecast
October 29, 2025 - Confused by AWS SageMaker pricing? This guide breaks down costs, real-world examples, and expert tips to forecast and optimize your SageMaker bill.
Amazon Web Services
SageMaker Pricing
6 days ago - Amazon SageMaker Asynchronous Inference charges you for instances used by your endpoint. When not actively processing requests, you can configure auto-scaling to scale the instance count to zero to save on costs.
AWS
Inference cost optimization best practices - Amazon SageMaker AI
The following content provides techniques and considerations for optimizing the cost of endpoints. You can use these recommendations to optimize the cost for both new and existing endpoints.
Cloudchipr
Amazon SageMaker AI Pricing: Detailed Breakdown and Ultimate Guide
SageMaker Serverless Inference pricing is based on the compute capacity used (billed per millisecond) and the amount of data processed. Costs depend on the selected memory configuration, with an option to add Provisioned Concurrency for predictable ...
JFrog ML
Uncovering The Hidden Costs Behind Sagemaker's Pricing | JFrog ML
On top of breaking down the different products offered and the cost structure entailed, we'll take a look at the complexity that comes with this offering and how SageMaker's pricing can easily skyrocket accordingly. In the following use case, we'll construct a machine learning model capable of real-time operation. This model will undergo periodic training and subsequently be deployed as a real-time endpoint.
Concurrency Labs
How To Keep SageMaker AI Cost Under Control and Avoid Bad Billing Surprises when doing Machine Learning in AWS - Concurrency Labs
December 4, 2024 - SageMaker Auto Scaling is also a feature that can save significant money when hosting an inference endpoint. It can be configured to add or remove instances based on available CloudWatch metrics, such as the ones related to invocations per instance or CPU/Memory utilization.
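The target-tracking setup that snippet describes can be sketched with the Application Auto Scaling API via boto3. This is a hedged sketch, not a recommendation: the variant name `AllTraffic` (the SDK default) and the target of 70 invocations per instance per minute are illustrative values you would tune from your own CloudWatch data:

```python
def enable_invocation_autoscaling(
    endpoint_name: str,
    variant: str = "AllTraffic",       # default variant name; check your endpoint config
    min_capacity: int = 1,
    max_capacity: int = 4,
    target_invocations: float = 70.0,  # illustrative target, tune from load tests
) -> None:
    """Attach target-tracking autoscaling to a real-time endpoint variant."""
    import boto3  # imported lazily; requires AWS credentials at call time

    aas = boto3.client("application-autoscaling")
    resource_id = f"endpoint/{endpoint_name}/variant/{variant}"

    # Make the variant's instance count a scalable dimension.
    aas.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )

    # Scale on the built-in invocations-per-instance metric.
    aas.put_scaling_policy(
        PolicyName=f"{endpoint_name}-invocations-target",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": target_invocations,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    )
```

Note that autoscaling only reduces cost when traffic varies; with `min_capacity=1` you still pay for one instance around the clock.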
Top answer (1 of 5, score 41)

I've listed some pros and cons from experience, as opposed to marketing materials. If I were to guess, I'd say you have a much higher chance of experiencing all the drawbacks of SageMaker than any one of the benefits.

Drawbacks

  • Cloud vendor lock-in: free future improvements in the open source projects, and better prices from competing vendors, become difficult to get. Why doesn't AWS invest developers in JupyterLab instead? They have done limited work in open source. Find some great points here, where people have seen companies use as few AWS services as possible, to good effect.
  • SageMaker instances are currently 40% more expensive than their EC2 equivalent.
  • Slow startup: it will break your workflow if every start of the machine takes ~5 minutes. SageMaker Studio apparently speeds this up, but not without other issues. This is completely unacceptable when you are trying to code or run applications.
  • SageMaker Studio is the first thing they show you when you enter SageMaker console. It should really be the last thing you consider.
    • SageMaker Studio is more limited than SageMaker notebook instances. For example, you cannot mount an EFS drive. I spoke to an AWS solutions architect, and he confirmed this was impossible (after looking for the answer all over the internet). It is also very new, so there is almost no support for it, even from AWS developers.
  • Worsens the disorganised notebooks problem. Notebooks in a file system can be much easier to organise than in JupyterLab. With SageMaker Studio, a new volume gets created and your notebooks live in there. What happens when you have more than 1...
  • Awful, limited terminal experience, coupled with tedious configuration (via lifecycle configuration scripts, which require the notebook to be turned off just to edit them). Additionally, you cannot set any lifecycle configurations for Studio notebooks.
  • SageMaker endpoints are limited compared to running your own server in an EC2 instance.
  • It may seem like it allows you to skip certain challenges, but in fact it presents you with more obscure challenges that no one has solved. Good luck solving them. The rigidity of SageMaker and the lack of documentation mean lots of workarounds and pain. This is very expensive.

Benefits

These revolve around the SageMaker console and the SageMaker SDK (please comment or edit if you have found any more benefits):

  • Built-in algorithms (which you could just as easily import in your machine learning framework of choice): I would say these are worse than the open source alternatives.
  • Training many models easily during hyperparameter search (see the YouTube video by AWS): a fast way to spend money.
  • Easily create machine-learning-related Amazon Mechanical Turk tasks. However, MTurk is very limited within SageMaker, so you're better off going to MTurk yourself.

My suggestion

If you're thinking about ML on the cloud, don't use SageMaker. Spin up a VM with a prebuilt image that has PyTorch/TensorFlow and JupyterLab, and get the work done.

Answer 2 of 5 (score 17)

You are correct that EC2 is cheaper than SageMaker. However, you have to understand their differences.

  • EC2 provides you computing power.
  • SageMaker (tries to) provide a fully configured environment plus computing power, with a seamless deployment model, so you can start training your model on day one.

If you look at SageMaker's overview page, it comes with Jupyter notebooks, pre-installed machine learning algorithms, optimized performance, seamless rollout to production, etc.

Note that this is analogous to self-hosting a MySQL server on EC2 versus using AWS-managed RDS MySQL. Managed services always appear to be more expensive, but if you factor in the time you have to spend maintaining the server, updating packages, etc., the extra 30% cost may be worth it.

So, in conclusion: if you'd rather save some money and have the time to set up your own server or environment, go for EC2. If you do not want to be bothered with that work and want to start training as soon as possible, use SageMaker.
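Taking the ap-northeast rates quoted in the question at face value (they will have drifted since, so treat this as illustrative arithmetic, not current pricing), the premium and the absolute monthly numbers work out as follows:

```python
# Hourly rates quoted in the question for ap-northeast (illustrative, not current).
SAGEMAKER_RATE = 16.408  # ml.p2.8xlarge training instance, USD/hour
EC2_RATE = 12.336        # p2.8xlarge on-demand, USD/hour

premium = SAGEMAKER_RATE / EC2_RATE - 1
hours_per_month = 24 * 30  # a 30-day month, running 24/7

print(f"SageMaker premium over EC2: {premium:.0%}")
print(f"Monthly 24/7 cost on SageMaker: ${SAGEMAKER_RATE * hours_per_month:,.2f}")
print(f"Monthly 24/7 cost on EC2:       ${EC2_RATE * hours_per_month:,.2f}")
```

For these particular instances the premium comes to roughly a third, which sits between the "30%" and "40%" figures given in the two answers above; the exact gap varies by instance type and region.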

Reddit
r/aws on Reddit: SageMaker rough costs when deploying object detection model API
August 15, 2023

Hi all,

Use case: I have a mobile app where, for roughly 30-45 seconds in total, the user sends an image every 2-3 seconds to AWS API Gateway; the image is then passed to a pre-trained model, which returns results. The initial idea was to retrieve real-time results, but for phase 1 we decided to simply let the user focus on the object, send an image, and retrieve a result. In the UI the user is prompted with a simple dialog (is it object A or not), so no real-time processing is needed for now.

The bill surprised me (SageMaker Inference using Endpoint):

$0.00 for Host:ml.m5.xlarge per hour under the monthly free tier (125 hrs)

$0.245 per hosting ml.m5.xlarge hour in EU (Stockholm) (52.573 hrs) = USD 12.88

If a little bit more than 2 days is costing me around 13 USD, I have to be doing something awfully wrong. The app is currently being tested by only 2-3 people, who have made around 300-400 endpoint calls in total.

I mean, yes, I could switch to ml.m6g.large, which would cost me around $0.09 per hour, but how many users would that handle for my use case? Nonetheless, even switching to a smaller machine seems costly as hell. Any suggestions? Maybe for my use case I need to use another service?
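The bill above is exactly how real-time endpoints behave: you pay per instance-hour of uptime, not per call. A quick sanity check using the figures from the post (rates from the bill, not current pricing):

```python
rate_m5_xlarge = 0.245  # USD/hour, ml.m5.xlarge in eu-north-1, per the bill above
billed_hours = 52.573   # hours the endpoint existed beyond the free tier

cost = rate_m5_xlarge * billed_hours
print(f"${cost:.2f}")  # matches the USD 12.88 line item

# The ~300-400 endpoint calls are irrelevant to this line item: a real-time
# endpoint bills for every hour it exists, idle or not. For bursty, low-volume
# traffic like this, serverless or asynchronous inference (billed only while
# processing) is usually the cheaper fit.
```

Mentally dividing the bill by the call count ("$0.03 per call") is the trap here; deleting the endpoint between test sessions, or moving off the always-on hosting model, is what changes the number.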

Medium
The Cost of Inference: AWS SageMaker vs. EC2 | by Tushar Tiwari | Generative AI
January 20, 2025 - To compare the costs, I used the AWS Pricing Calculator and created a detailed estimate for both deployment options. The results are clear: deploying the same machine learning model on an EC2 instance is significantly cheaper than using a SageMaker Inference Endpoint.
AWS
Analyze Amazon SageMaker spend and determine cost optimization opportunities based on usage, Part 5: Hosting | Artificial Intelligence
May 30, 2023 - If you have several under-utilized endpoint instances, consider hosting options such as multi-model endpoints (MMEs), multi-container endpoints (MCEs), and serial inference pipelines to consolidate usage to fewer endpoint instances. For real-time and asynchronous inference model deployment, you can optimize cost and performance by deploying models on SageMaker using AWS Graviton. AWS Graviton is a family of processors designed by AWS that provide the best price performance and are more energy efficient than their x86 counterparts.
Amazon Web Services (China)
SageMaker Pricing
6 days ago - For built-in rules with ml.m5.xlarge instance in China (Ningxia) Region, you get up to 30 hours of monitoring aggregated across all endpoints each month, at no charge. Usage in China (Beijing) Region, usage beyond 30 hours in China (Ningxia) Region, or usage for other ML instance types will ...
Saturn Cloud
Amazon SageMaker Pricing | Saturn Cloud
The details of Amazon SageMaker’s free tier pricing are in the table below. The Saturn Cloud price is the price per hour for the Saturn Cloud component, while the hosting price is the charge for the underlying AWS EC2 instances that the resources run on.
Medium
Let’s Understand How Amazon SageMaker Pricing Works: SageMaker Best Practices for Right-Sizing Compute Resources for Different Stages of an ML Project | by Khadir Mahammad | Medium
January 25, 2024 - Amazon SageMaker Studio is a fully integrated ML development environment with a managed Jupyter Notebook app experience, now accessible for free, with payment only for used AWS services. Notebook Instances are fully managed compute instances running Jupyter Notebook, handling ML workflows. Prices for compute instances are the same for both Studio and on-demand instances.
LiteLLM
Providers | liteLLM
Selecting openai as the provider routes your request to an OpenAI-compatible endpoint using the upstream ... LiteLLM supports all anthropic models. LiteLLM supports All Sagemaker Huggingface Jumpstart Models
AWS
AWS Pricing Calculator
AWS Pricing Calculator lets you explore AWS services, and create an estimate for the cost of your use cases on AWS.