If you're bringing your own model, you don't need to use SageMaker. I've got mixtral-8x7b.Q5_K_M running on a g5.2xlarge (24 GB of VRAM, 32 GB of RAM), and it's $1.212 per hour in us-east-1. Just be sure to pick the Ubuntu deep learning AMI, as it has the required GPU drivers. Be aware that you're paying for the server whenever it's running, not just when you're interacting with the model. If you forget to shut it off and leave it for a month, that's $860...
Answer from kingtheseus on reddit.com
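The monthly figure quoted in the answer is a rough one. As a minimal sketch of the arithmetic, assuming the $1.212 hourly rate quoted there and an instance that runs 24 hours a day, the cost for a 30-day month works out to $872.64. The function name `always_on_cost` is hypothetical and only illustrates the calculation.

```python
# Hourly on-demand rate quoted in the answer (g5.2xlarge, us-east-1).
# Treat it as an assumption: check current pricing before relying on it.
HOURLY_RATE_USD = 1.212


def always_on_cost(hourly_rate: float, days: int) -> float:
    """Cost of leaving an instance running 24 hours a day for `days` days."""
    return round(hourly_rate * 24 * days, 2)


if __name__ == "__main__":
    for days in (1, 30, 31):
        cost = always_on_cost(HOURLY_RATE_USD, days)
        print(f"{days:>2} days: ${cost:,.2f}")
```

The point of the sketch is that the bill scales with hours running, not with how often the model is actually queried.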
Amazon Web Services
aws.amazon.com › machine learning › amazon sagemaker › pricing
SageMaker pricing - AWS
1 week ago - Pricing is based on the number of nodes and instance hours used, with costs varying by node type. The model includes options for on-demand and reserved instances, with the latter providing cost savings for long-term commitments. For the most accurate and detailed pricing information, consult Amazon Redshift pricing. SageMaker Data Processing, which brings together capabilities from Amazon Athena, Amazon EMR, AWS Glue, and Amazon Managed Workflows for Apache Airflow (Amazon MWAA), offers a flexible, pay-as-you-go pricing model without upfront commitments or a minimum fee.
AWS re:Post
repost.aws › questions › QUTgC3561MQKaWwyazLh_YXQ › sagemaker-costs-for-ai-model
SageMaker costs for AI model | AWS re:Post
February 14, 2025 - As of my last update, this was approximately $1.212 per hour, but please check the current pricing as it may have changed. Regarding monthly costs, if you keep the endpoint running 24/7 for a month, you could expect to pay around $876 (30 days ...
Reddit
reddit.com › r/aws › sagemaker rough costs when deploying object detection model api
r/aws on Reddit: SageMaker rough costs when deploying object detection model API
August 11, 2023 -

Hi all,

Use case - I have a mobile app where, for roughly 30-45 seconds in total, the user sends an image every 2-3 seconds to AWS Gateway API; the image is then passed to a pre-trained model, which returns results. The initial idea was to retrieve real-time results, but for phase 1 we decided to simply let the user focus on the object, then send an image and retrieve the result. In the UI the user is prompted with a simple dialog (is it object A or not), so no real-time processing is needed for now.

The bill surprised me (SageMaker Inference using Endpoint):

$0.00 for Host:ml.m5.xlarge per hour under monthly free tier (125hrs)

$0.245 per Hosting ml.m5.xlarge hour in EU (Stockholm) (52.573 Hrs) = USD 12.88. If a little bit more than 2 days is costing me around 13 USD I have to be doing something awfully wrong. The app currently is being tested only by 2-3 people that have made in total around 300-400 endpoint calls.

I mean, yes, I could switch to ml.m6g.large, which would cost me around $0.09 - how many people would that handle for my use case? Nonetheless, even switching to a smaller machine seems to be costly as hell. Any suggestions? Maybe for my use case I need to use another service?
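The bill in the post can be reproduced from the two numbers it gives. The sketch below assumes the quoted rates ($0.245 per hour for ml.m5.xlarge, and the poster's approximate $0.09 per hour for ml.m6g.large) and uses a hypothetical helper, `hosting_cost`. It also projects both rates over a 30-day month to show that a real-time endpoint is charged for the hours it is up, regardless of how many calls it receives.

```python
HOURS_IN_30_DAYS = 24 * 30  # an always-on endpoint for a 30-day month


def hosting_cost(hourly_rate: float, hours: float) -> float:
    """Instance-hour charge for a real-time endpoint: rate times hours running."""
    return hourly_rate * hours


# Reproduce the billed line item from the post: 52.573 hours at $0.245/hour.
billed = hosting_cost(0.245, 52.573)
print(f"Billed so far: ${billed:.2f}")

# Project a full month for both instance types mentioned in the post.
# The ml.m6g.large rate is the poster's approximate figure, not a verified price.
for name, rate in (("ml.m5.xlarge", 0.245), ("ml.m6g.large", 0.09)):
    monthly = hosting_cost(rate, HOURS_IN_30_DAYS)
    print(f"{name}: ${monthly:.2f} for {HOURS_IN_30_DAYS} hours")
```

Under these assumptions the 300-400 endpoint calls do not enter the calculation at all; only uptime does.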

Reddit
reddit.com › r/aws › how to do cost estimation for amazon sagemaker
r/aws on Reddit: How to do cost estimation for Amazon Sagemaker
March 15, 2024 -

Hey guys, I am trying to do some cost estimation in Sagemaker for an AI chatbot I am building. The AI chatbot will be using the Mixtral-8x7b-Instruct quantized model downloaded from Hugging Face. And I will be using Sagemaker endpoints for inference.

When I went into the AWS Pricing Calculator website (https://calculator.aws/#/) and selected Sagemaker, I was presented with different options to choose from like Sagemaker Studio Notebooks, RStudio on Sagemaker, SageMaker On-Demand Notebook Instances etc (see the link below).

https://imgur.com/a/KVSHINe

For my chatbot that I had described above, how would I know which of these options to select to do my pricing estimate?

Would really appreciate any help with this. Many thanks!

AWS
aws.amazon.com › blogs › machine-learning › part-5-analyze-amazon-sagemaker-spend-and-determine-cost-optimization-opportunities-based-on-usage-part-5-hosting
Analyze Amazon SageMaker spend and determine cost optimization opportunities based on usage, Part 5: Hosting | Artificial Intelligence
May 30, 2023 - The cost of SageMaker real-time endpoints is based on the per instance-hour consumed for each instance while the endpoint is running, the cost of GB-month of provisioned storage (EBS volume), as well as the GB data processed in and out of the ...
Holori
holori.com › accueil › blog › ultimate aws sagemaker pricing guide
Holori - Ultimate AWS Sagemaker pricing guide
October 23, 2024 - Duration: Billed per second, so longer training times increase costs. Hyperparameter tuning: If you use SageMaker’s Automatic Model Tuning, costs can escalate with the number of training jobs launched during the tuning process. For deploying and serving predictions, you have several options, including: Real-time inference endpoints: Instances dedicated to real-time predictions are billed by the hour...
CloudZero
cloudzero.com › home › blog › amazon sagemaker pricing guide: 2025 costs (and savings)
Amazon SageMaker Pricing Guide: 2025 Costs (And Savings)
August 15, 2025 - If you’re interested in specific ... They’re great for testing, notebooks, and light ML tasks. ml.t3.medium costs around $0.05/hour....
Cloudforecast
cloudforecast.io › home › aws pricing & cost optimization › aws sagemaker pricing guide – cost breakdown & optimization tips
AWS SageMaker Pricing Guide - Cost Breakdown & Optimization Tips | CloudForecast
July 28, 2025 - In the On-demand pricing section on the SageMaker AI pricing page, you’ll notice that costs vary greatly by service. You can reference this table for exact pricing, but here’s a summary of how you’re billed based on the type of service you’re using: We’ll walk through specific pricing examples soon. For now, understand that compute instances are billed by the hour, and storage and data transfers out are charged per ...
Cloudchipr
cloudchipr.com › blog › amazon-sagemaker-pricing
Amazon SageMaker AI Pricing: Detailed Breakdown and Ultimate Guide
SageMaker Serverless Inference pricing is based on the compute capacity used (billed per millisecond) and the amount of data processed. Costs depend on the selected memory configuration, with an option to add Provisioned Concurrency for predictable performance. ... Provisioned Concurrency ensures predictable performance by keeping endpoints warm for concurrent requests.
Amazon Web Services
amazonaws.cn › en › sagemaker › pricing
SageMaker Pricing
1 week ago - For building, training, and deploying ... Amazon SageMaker, on-demand ML instances let you pay for machine learning compute capacity by the second, with no long-term commitments. This frees you from the costs and complexities of planning, purchasing, and maintaining hardware, and transforms what are commonly large fixed costs into much smaller variable costs. Pricing is per instance-hour consumed for ...
Cloudexmachina
cloudexmachina.io › blog › sagemaker-pricing
AWS SageMaker Pricing: The Developer’s Guide to Smart ML Spending
September 16, 2025 - That means you're billed every hour they're running, even if they're sitting idle. Depending on the instance type (especially for inference-optimized options like ml.inf1 or GPU-backed classes), costs can escalate quickly.
AWS re:Post
repost.aws › questions › QUaxWgMnwrSaq9IYZ5L0G1SA › sagemaker-real-time-inference-pricing-clarification
Sagemaker Real Time Inference pricing clarification | AWS re:Post
December 16, 2022 - In your case, you would be charged based on the per hour pricing of ml.c7g.16xlarge instance, and (2mb+2mb)*2mil for data in/out a month. Link to pricing examples can be found here. If your usage will be consistent for a period of time, do check out savings plan to save cost.
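The data in/out term in that answer, (2mb+2mb)*2mil, can be worked through as follows. This is a sketch only: it assumes decimal units (1 GB = 1000 MB), and the per-GB rate is left as a required parameter because the snippet does not state one. Both function names are hypothetical.

```python
MB_PER_GB = 1000  # assumption: decimal units; the snippet does not say which


def monthly_data_gb(mb_in: float, mb_out: float, requests_per_month: int) -> float:
    """Total data processed per month, in GB, for a fixed payload size per request."""
    return (mb_in + mb_out) * requests_per_month / MB_PER_GB


def data_processing_cost(total_gb: float, usd_per_gb: float) -> float:
    """Data charge given a per-GB rate; look the rate up on the pricing page."""
    return total_gb * usd_per_gb


# 2 MB in and 2 MB out per request, 2 million requests a month.
total_gb = monthly_data_gb(2, 2, 2_000_000)
print(f"{total_gb:,.0f} GB processed per month")
```

This data charge comes on top of the per-hour instance charge for the ml.c7g.16xlarge endpoint mentioned in the answer.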
nOps
nops.io › blog › sagemaker-pricing-the-essential-guide
SageMaker Pricing: The Essential Guide | nOps
November 18, 2025 - In practice, most of your cost ... endpoints), how long those resources stay active, and any large datasets you store or process through SageMaker. The table below breaks down the key SageMaker services & components, typical instance types, rates, and key tips to optimize costs. While SageMaker On-Demand pricing offers flexibility, Savings Plans let you reduce costs if you have predictable workloads. With a Savings Plan, you commit to a certain level of usage (measured in $/hour) for one or ...
JFrog ML
qwak.com › post › the-hidden-cost-of-sagemaker
Uncovering The Hidden Costs Behind Sagemaker's Pricing | JFrog ML
On top of breaking down the different ... and the cost structure entailed, we'll take a look at the complexity that comes with this offering and how SageMaker's pricing can easily skyrocket accordingly. In the following use case, we'll construct a machine learning model capable of real-time operation. This model will undergo periodic training and subsequently be deployed as a real-time endpoint...
TutorialsPoint
tutorialspoint.com › sagemaker › sagemaker-pricing.htm
Amazon SageMaker - Pricing
Hosting starts once your model is trained and deployed to an Amazon SageMaker endpoint. Hosting fees depend on the instance type used for deployment and the number of active endpoints. As with training jobs, higher-performing instances such as GPUs cost more. Billing is calculated on an hourly basis for each endpoint.
CloudOptimo
cloudoptimo.com › home › blog › mastering amazon sagemaker pricing
Mastering Amazon SageMaker Pricing
March 13, 2025 - Explore Amazon SageMaker pricing with our guide on cost optimization. Learn about instance types, storage, and strategies to manage your ML budget efficiently
AWS
docs.aws.amazon.com › amazon sagemaker › developer guide › deploy models for inference › best practices › inference cost optimization best practices
Inference cost optimization best practices - Amazon SageMaker AI
Use batch inference for workloads ... need a persistent endpoint). You pay for the instance for the duration of the batch inference job. If you have a consistent usage level across all SageMaker AI services, you can opt in to a SageMaker AI Savings Plan to help reduce your costs by up to 64%. ... provide a flexible pricing model for Amazon SageMaker AI, in exchange for a commitment to a consistent amount of usage (measured in $/hour) for a one-year ...