Hey guys, I am trying to do some cost estimation in Sagemaker for an AI chatbot I am building. The AI chatbot will be using the Mixtral-8x7b-Instruct quantized model downloaded from Hugging Face. And I will be using Sagemaker endpoints for inference.
When I went to the AWS Pricing Calculator website (https://calculator.aws/#/) and selected SageMaker, I was presented with different options to choose from, like SageMaker Studio Notebooks, RStudio on SageMaker, SageMaker On-Demand Notebook Instances, etc. (see the link below).
https://imgur.com/a/KVSHINe
For the chatbot I described above, how would I know which of these options to select to do my pricing estimate?
Would really appreciate any help with this. Many thanks!
Hi all,
Use case - I've a mobile app where, for roughly 30-45 seconds in total, the user sends an image every 2-3 seconds to AWS API Gateway; the image is then passed to a pre-trained model, which returns results. The initial idea was to retrieve real-time results, but for phase 1 we decided to simply let the user focus on the object, send an image, and retrieve the result - in the UI the user is prompted with a simple dialog (is it object A or not), so no real-time processing is needed for now.
The bill surprised me (SageMaker Inference using Endpoint):
- $0.00 for Host:ml.m5.xlarge per hour under monthly free tier (125 hrs)
- $0.245 per Hosting ml.m5.xlarge hour in EU (Stockholm) (52.573 hrs) = USD 12.88

If little more than 2 days of hosting is costing me around 13 USD, I have to be doing something awfully wrong. The app is currently being tested by only 2-3 people, who have made around 300-400 endpoint calls in total.
I mean, yes, I could switch to ml.m6g.large, which would cost me around $0.09 per hour - but how many users would that handle for my use case? Nonetheless, even switching to a smaller machine seems costly as hell. Any suggestions? Maybe for my use case I need a different service?
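To sanity-check the numbers, here's the arithmetic (rates and hours taken from my bill above; the 24/7 figure is my own extrapolation assuming a 730-hour month). The key point seems to be that a real-time endpoint bills for every hour it's up, not per request:

```python
# A real-time SageMaker endpoint bills for every hour the instance is
# running, regardless of how many requests it actually serves.
HOURLY_RATE = 0.245    # ml.m5.xlarge hosting, EU (Stockholm), from my bill
billed_hours = 52.573  # hours above the 125-hour free tier, from my bill

cost = HOURLY_RATE * billed_hours
print(f"Billed: ${cost:.2f}")  # $12.88, matching the invoice

# Extrapolation: leaving the endpoint up 24/7 for a 730-hour month
print(f"24/7 month: ${HOURLY_RATE * 730:.2f}")  # $178.85
```

So the 300-400 calls are irrelevant to the bill; the 52+ hours the endpoint stayed deployed are what's charged.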
I've been looking into whether it would be sensible to use AWS Sagemaker for my GPU inference workload on a T4 GPU. The workload can tolerate multiple minutes of downtime and delays, and takes large binary multimedia files as model input. Autoscaling based on load is a requirement.
AWS Sagemaker Asynchronous endpoints look very attractive in that they manage container orchestration, autoscaling, and request queueing, and provide various convenience utilities for maintaining, monitoring and upgrading models in production.
In us-east-1, I calculate the following:
| Service | Instance Type | Hourly Price | Monthly Price |
|---|---|---|---|
| EC2 On Demand | g4dn.xlarge | $0.526 | $378 |
| EC2 Annual Reservation | g4dn.xlarge | $0.309 (upfront) | $229 |
| EC2 Spot | g4dn.xlarge | ~$0.22 | ~$156 |
| Sagemaker On Demand | ml.g4dn.xlarge | $0.7364 | $530 |
| Sagemaker On Demand with Saving Plan | ml.g4dn.xlarge | $0.4984 | $358 |
From what I can see, however, using Sagemaker would come at a significant (likely prohibitive) cost compared with using EKS/ECS to host the model and SQS to queue requests. I appreciate that's the price one pays for a managed service, but I wanted to confirm a few things with the community to make sure I'm not missing anything in my cost estimates:
1. Is it correct that Sagemaker does not support spot instances for inference at all? (I appreciate it supports them for training.)
2. Is it correct that one can apply a savings plan to inference endpoints, and that they would be classified under the "Hosting" service on this page: https://aws.amazon.com/savingsplans/ml-pricing/ ? It's confusing, as "Hosting Service" is not a term the developer docs use to describe inference endpoints per se.
3. Is it correct that one cannot reserve instances for a year with Sagemaker, as with EC2, to cut costs, and thus the above Savings Plan is the cheapest you can get a T4 GPU for?
I ask because it seems surprising that there's a ~40% markup for using this managed service versus EC2, and despite what the AWS report on TCO says, I can't quite see it saving us that much money versus setting up an EKS/ECS solution for this problem. I can see, however, that the TCO report largely considers training infrastructure, which likely does bring a lot of value, just not relevant here.
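For reference, the ~40% figure falls straight out of the hourly rates in my table above (a quick sketch; rates are us-east-1 as listed and may drift):

```python
# Markup of Sagemaker over EC2 for the same T4 GPU instance family,
# using the hourly rates from the table above (us-east-1, subject to change).
ec2_od = 0.526        # g4dn.xlarge, EC2 on demand
ec2_reserved = 0.309  # g4dn.xlarge, EC2 1-year reservation (effective hourly)
sm_od = 0.7364        # ml.g4dn.xlarge, Sagemaker on demand
sm_savings = 0.4984   # ml.g4dn.xlarge, Sagemaker savings plan

print(f"Sagemaker OD vs EC2 OD:           {sm_od / ec2_od - 1:.0%}")        # 40%
print(f"Sagemaker SP vs EC2 1yr reserved: {sm_savings / ec2_reserved - 1:.0%}")  # ~61%
```

So even the best Sagemaker rate I can find sits well above the equivalent reserved EC2 rate, which is what prompts the questions above.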