sagemaker endpoint cost per month

How to do cost estimation for Amazon Sagemaker

reddit.com › r › aws › comments › 1bfff2l › how_to_do_cost_estimation_for_amazon_sagemaker

If you're bringing your own model, you don't need to use SageMaker. I've got mixtral-8x7b.Q5_K_M running on a g5.2xlarge (24 GB of VRAM, 32 GB RAM), and it's $1.212 per hour in us-east-1. Just be sure to pick the Ubuntu deep learning AMI, as it has the required GPU drivers. Be aware that you're paying for the server whenever it's running, not just when you're interacting with the model. If you forget to shut it off and leave it for a month, that's roughly $870 (720 hours at $1.212)... Answer from kingtheseus on reddit.com

AWS re:Post

repost.aws › questions › QUTgC3561MQKaWwyazLh_YXQ › sagemaker-costs-for-ai-model

SageMaker costs for AI model | AWS re:Post

March 18, 2025 - Regarding monthly costs, if you keep the endpoint running 24/7 for a month, you could expect to pay around $873 (30 days * 24 hours * $1.212/hour = $872.64), plus any additional charges for data transfer or storage.
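The always-on arithmetic in this snippet can be checked with a few lines of Python. This is a minimal sketch: the function name `monthly_cost` and the 8-hours-a-day, 22-working-days comparison are assumptions for illustration, and the $1.212 rate is simply the figure quoted above.

```python
def monthly_cost(hourly_rate_usd: float, hours_per_day: float = 24.0, days: int = 30) -> float:
    """Cost of an instance that is billed for every hour it runs."""
    return round(hourly_rate_usd * hours_per_day * days, 2)


# Rate quoted in the snippets above; check the pricing page for your region.
RATE_USD_PER_HOUR = 1.212

always_on = monthly_cost(RATE_USD_PER_HOUR)
# Hypothetical comparison: running only 8 hours a day on 22 working days.
business_hours = monthly_cost(RATE_USD_PER_HOUR, hours_per_day=8, days=22)

print(f"24/7 for 30 days:   ${always_on}")
print(f"8 h/day, 22 days:   ${business_hours}")
```

Data transfer and storage charges are not included, so the real bill would be somewhat higher.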

reddit.com › r/aws › how to do cost estimation for amazon sagemaker

r/aws on Reddit: How to do cost estimation for Amazon Sagemaker

March 15, 2024 -

Hey guys, I am trying to do some cost estimation in SageMaker for an AI chatbot I am building. The chatbot will use the quantized Mixtral-8x7b-Instruct model downloaded from Hugging Face, and I will be using SageMaker endpoints for inference.

When I went to the AWS Pricing Calculator website (https://calculator.aws/#/) and selected SageMaker, I was presented with different options to choose from, like SageMaker Studio Notebooks, RStudio on SageMaker, SageMaker On-Demand Notebook Instances, etc. (see the link below).

https://imgur.com/a/KVSHINe

For my chatbot that I had described above, how would I know which of these options to select to do my pricing estimate?

Would really appreciate any help with this. Many thanks!

Amazon Web Services

aws.amazon.com › machine learning › amazon sagemaker › pricing

SageMaker pricing - AWS

1 week ago - SageMaker Catalog follows a pay-as-you-go pricing model with no upfront commitments or minimum fees. Pricing is based on four key dimensions: requests, metadata storage, compute, and AI recommendations for generating business descriptions. Each account receives some free usage per billing month, including a set amount of metadata storage, API requests, and compute units.

CloudZero

cloudzero.com › home › blog › amazon sagemaker pricing guide: 2025 costs (and savings)

Amazon SageMaker Pricing Guide: 2025 Costs (And Savings)

August 15, 2025 - Although these options increase ... cost visibility and optimization efforts (complexity). Besides, SageMaker has some endpoints and service quotas you need to know about. Also, it’s challenging to choose the right SageMaker ML instance for your specific workload because instances vary in performance and ...

Amazon Web Services

aws.amazon.com › machine learning › amazon sagemaker ai › pricing

SageMaker Pricing

1 week ago - The total charges for this example would be $305.881 per month. Note, for built-in rules with ml.m5.xlarge instance, you get up to 30 hours of monitoring aggregated across all endpoints each month, at no charge. The model in example #5 is then deployed to production to two (2) ml.c5.xlarge ...

reddit.com › r/aws › sagemaker rough costs when deploying object detection model api

r/aws on Reddit: SageMaker rough costs when deploying object detection model API

August 14, 2023 -

Hi all,

Use case - I have a mobile app where, for roughly 30-45 seconds in total, the user sends an image every 2-3 seconds to Amazon API Gateway; the image is then passed to a pre-trained model, which returns results. The initial idea was to retrieve real-time results, but for phase 1 we decided to simply let the user focus on the object, send one image, and retrieve the result - in the UI the user is prompted with a simple dialog (is it object A or not), so no real-time processing is needed for now.

The bill surprised me (SageMaker Inference using Endpoint):

$0.00 for Host:ml.m5.xlarge per hour under monthly free tier (125hrs)

$0.245 per Hosting ml.m5.xlarge hour in EU (Stockholm) (52.573 Hrs) = USD 12.88. If a little bit more than 2 days is costing me around 13 USD I have to be doing something awfully wrong. The app currently is being tested only by 2-3 people that have made in total around 300-400 endpoint calls.

I mean, yes, I could switch to ml.m6g.large, which would cost me around $0.09 per hour - how many people would that handle for my use case? Nonetheless, even switching to a smaller machine seems costly as hell. Any suggestions? Maybe for my use case I need to use another service?
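The bill in this post can be reproduced as "hours beyond the free tier times the hourly rate". The sketch below uses only the figures quoted in the post; the function name `hosting_charge` is hypothetical, and the total of 177.573 hours is inferred from 125 free hours plus 52.573 billed hours.

```python
def hosting_charge(total_hours: float, hourly_rate_usd: float, free_tier_hours: float = 0.0) -> float:
    """Charge for an endpoint instance after subtracting free-tier hours."""
    billable_hours = max(total_hours - free_tier_hours, 0.0)
    return round(billable_hours * hourly_rate_usd, 2)


# Figures from the post: 125 free hours, then 52.573 hours at $0.245/hour.
total_hours = 125 + 52.573
m5_xlarge = hosting_charge(total_hours, 0.245, free_tier_hours=125)
# Same running time at the roughly $0.09/hour the poster mentions for ml.m6g.large.
m6g_large = hosting_charge(total_hours, 0.09, free_tier_hours=125)

print(f"ml.m5.xlarge: ${m5_xlarge}")
print(f"ml.m6g.large: ${m6g_large}")
```

The charge depends only on how long the endpoint exists, not on the 300-400 calls made, which is why a lightly used endpoint still costs the same as a busy one.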

Top answer

1 of 1

Yes, deploying the model to an Endpoint will create a separate instance - which will stay active until the Endpoint is deleted. When you invoke your model from the notebook, you're calling an API and the actual inference work will happen on the endpoint.

In general, when using SageMaker I'd suggest keeping your notebook instances small (e.g. an ml.t3.medium) and using the on-demand infrastructure for heavier tasks: SageMaker Processing, Training, and Batch inference ("transform") jobs all run on separate compute that automatically shuts down as soon as the job is completed. In cases where you do need more resources, you can temporarily spin up a bigger notebook and shut it down again.

Cloudchipr

cloudchipr.com › blog › amazon-sagemaker-pricing

Amazon SageMaker AI Pricing: Detailed Breakdown and Ultimate Guide

SageMaker Serverless Inference pricing is based on the compute capacity used (billed per millisecond) and the amount of data processed. Costs depend on the selected memory configuration, with an option to add Provisioned Concurrency for predictable performance. ... Provisioned Concurrency ensures predictable performance by keeping endpoints warm for concurrent requests.
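A rough way to reason about the serverless model described here is compute time multiplied by a per-second price for the chosen memory size, plus a per-GB data charge. The sketch below is illustrative only: the function name and both prices are placeholders, not real AWS rates, and Provisioned Concurrency is not modeled.

```python
def serverless_monthly_cost(
    requests: int,
    avg_duration_ms: float,
    usd_per_compute_second: float,
    data_gb: float = 0.0,
    usd_per_gb: float = 0.0,
) -> float:
    """Estimate a serverless inference bill from request count and duration.

    usd_per_compute_second depends on the memory configuration; look it up
    on the pricing page. The values used below are placeholders.
    """
    compute_seconds = requests * avg_duration_ms / 1000.0
    return compute_seconds * usd_per_compute_second + data_gb * usd_per_gb


# Placeholder price; 400 requests of 500 ms each is 200 compute seconds.
estimate = serverless_monthly_cost(
    requests=400, avg_duration_ms=500, usd_per_compute_second=0.0001
)
print(f"Estimated compute charge: ${estimate:.4f}")
```

Because nothing is billed while no requests arrive, this model tends to suit low or bursty traffic better than an always-on endpoint.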

TrustRadius

trustradius.com › home › ai development platforms › amazon sagemaker › pricing

Amazon SageMaker Pricing 2025

We recommend going to Amazon SageMaker’s pricing page to look at their associated costs for different instances, memory, and vCPU. The cost is determined by the type of instance, memory, and vCPU you use per hour, billed monthly. Below is an image example of the pricing list for Sagemaker ...

Concurrencylabs

concurrencylabs.com › blog › sagemaker-ai-cost-savings

How To Keep SageMaker AI Cost Under Control and Avoid Bad Billing Surprises when doing Machine Learning in AWS - Concurrency Labs

This can significantly increase performance and lower cost for inference tasks. Configure SageMaker Studio instance automatic shut down after a pre-determined idle period of time. This is particularly important for organizations with a significant number of team members using SageMaker Studio. For inference endpoints, configuring Auto Scaling based on a schedule or usage metrics can optimize compute infrastructure cost.
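Usage-based Auto Scaling for an endpoint is configured through the Application Auto Scaling API. The following is a sketch under stated assumptions, not the article's own configuration: the endpoint name, variant name `AllTraffic`, capacity limits, and target of 100 invocations per instance are all hypothetical, and `apply_scaling` needs `boto3` plus valid AWS credentials.

```python
def scaling_requests(endpoint_name, variant_name="AllTraffic",
                     min_instances=1, max_instances=2,
                     invocations_per_instance=100.0):
    """Build the two request payloads for target-tracking scaling."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }
    policy = {
        "PolicyName": f"{endpoint_name}-invocations-target",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }
    return target, policy


def apply_scaling(endpoint_name):
    """Send both requests to AWS (requires boto3 and credentials)."""
    import boto3  # imported lazily so the payload builder works without it

    client = boto3.client("application-autoscaling")
    target, policy = scaling_requests(endpoint_name)
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)
```

Calling `apply_scaling("my-endpoint")` with a real endpoint name would register the variant and attach the policy; the right target value depends on how many requests one instance can actually handle.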

Holori

holori.com › accueil › blog › ultimate aws sagemaker pricing guide

Holori - Ultimate AWS Sagemaker pricing guide

October 23, 2024 - Endpoints: Real-time inference can be costly if the endpoint is always running, even during low traffic periods. Network Costs: Moving data between AWS services (e.g., S3 to SageMaker) incurs egress fees. AWS SageMaker offers a variety of pricing models to accommodate different usage needs and budget constraints: SageMaker is available to try for free as part of the AWS Free Tier, offering new users two months ...

JFrog ML

qwak.com › post › the-hidden-cost-of-sagemaker

Uncovering The Hidden Costs Behind Sagemaker's Pricing | JFrog ML

On top of breaking down the different ... and the cost structure entailed, we'll take a look at the complexity that comes with this offering and how SageMaker's pricing can easily skyrocket accordingly. In the following use case, we'll construct a machine learning model capable of real-time operation. This model will undergo periodic training and subsequently be deployed as a real-time endpoint...

AWS re:Post

repost.aws › questions › QUaxWgMnwrSaq9IYZ5L0G1SA › sagemaker-real-time-inference-pricing-clarification

Sagemaker Real Time Inference pricing clarification | AWS re:Post

December 19, 2022 - In your case, you would be charged the per-hour price of the ml.c7g.16xlarge instance, plus data processing for (2 MB + 2 MB) * 2 million requests in and out per month.
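The data-processing part of that answer is easier to judge once the volume is converted to gigabytes. A minimal sketch, assuming 1 GB = 1024 MB; the function names are hypothetical and the per-GB price is left as a parameter because the snippet does not state one.

```python
def data_processed_gb(mb_in: float, mb_out: float, requests: int) -> float:
    """Total payload volume per month in GB (assumes 1 GB = 1024 MB)."""
    return (mb_in + mb_out) * requests / 1024.0


def data_processing_charge(volume_gb: float, usd_per_gb: float) -> float:
    """Charge for data in/out at a per-GB price taken from the pricing page."""
    return volume_gb * usd_per_gb


# Figures from the answer: 2 MB in, 2 MB out, 2 million requests per month.
volume = data_processed_gb(2, 2, 2_000_000)
print(f"Data processed per month: {volume} GB")
```

The instance-hour charge is added on top of this and is computed the same way as for any always-on endpoint.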

Economize

economize.cloud › blog › amazon-sagemaker-pricing

Amazon SageMaker Pricing Explained: What You Need to Know

nOps

nops.io › blog › sagemaker-pricing-the-essential-guide

SageMaker Pricing: The Essential Guide | nOps

November 18, 2025 - Amazon SageMaker is one of the most popular managed machine learning services, enabling teams to build, train, and deploy models at scale. But while SageMaker simplifies the ML workflow, understanding its pricing can be tricky. Costs vary depending on instance types, training jobs, endpoints, and storage, making it essential to know how SageMaker pricing works […]

TutorialsPoint

tutorialspoint.com › sagemaker › sagemaker-pricing.htm

Amazon SageMaker - Pricing

Billing is calculated on an hourly basis for each endpoint. Amazon SageMaker depends on Amazon S3 for storing datasets. You will be charged for data storage in S3, as well as any data transfers between S3 and Amazon SageMaker. These costs depend on the size of the data being used and the amount of data transferred in and out of the cloud. Here are some ways to manage and reduce costs when using Amazon SageMaker −

Amazon Web Services

pages.awscloud.com › rs › 112-TZM-766 › images › Amazon_SageMaker_TCO_uf.pdf pdf