I've listed some pros and cons from experience, as opposed to marketing materials. If I were to guess, I'd say you have a much higher chance of experiencing all the drawbacks of SageMaker than any one of the benefits.
Drawbacks
- Cloud vendor lock-in: it becomes difficult to benefit from future free improvements in open-source projects or from better prices at competing vendors. Why doesn't AWS invest developer time in JupyterLab? They have done limited work in open source. You can find some great points here, from people who have seen companies use as few AWS services as possible, to good effect.
- SageMaker instances are currently 40% more expensive than their EC2 equivalents.
- Slow startup: if the machine takes ~5 minutes every time you start it, your workflow breaks. This is completely unacceptable when you are trying to code or run applications. SageMaker Studio apparently speeds this up, but not without other issues.
- SageMaker Studio is the first thing they show you when you open the SageMaker console. It should really be the last thing you consider.
- SageMaker Studio is more limited than SageMaker notebook instances. For example, you cannot mount an EFS drive. After looking for the answer all over the internet, I spoke to an AWS solutions architect, and he confirmed this was impossible. It is also very new, so there is almost no support for it, even from AWS developers.
- Worsens the disorganised-notebooks problem. Notebooks in a file system can be much easier to organise than in JupyterLab. With SageMaker Studio, a new volume gets created and your notebooks live in there. What happens when you have more than one...
- Awful, limited terminal experience, coupled with tedious configuration via lifecycle configuration scripts, which require the notebook instance to be turned off just to edit them. Additionally, you cannot set any lifecycle configurations for Studio Notebooks.
- SageMaker endpoints are limited compared to running your own server in an EC2 instance.
- It may seem to let you skip certain challenges, but in fact it gives you more obscure challenges that no one has solved. Good luck solving them. The rigidity of SageMaker and its lack of documentation mean lots of workarounds and pain, which is very expensive.
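The 40% figure in the second drawback can be checked with simple arithmetic. A minimal sketch, using as example inputs the us-east-1 on-demand hourly prices for g4dn.xlarge ($0.526) and ml.g4dn.xlarge ($0.7364) quoted in the cost question further down this page; prices change over time, so rerun it with current numbers.

```python
def markup(ec2_hourly: float, sagemaker_hourly: float) -> float:
    """Return the SageMaker premium over EC2 as a fraction (0.40 == 40%)."""
    if ec2_hourly <= 0:
        raise ValueError("EC2 hourly price must be positive")
    return sagemaker_hourly / ec2_hourly - 1.0


# Example on-demand prices (us-east-1) quoted later in this thread; may be outdated.
EC2_G4DN_XLARGE = 0.526
SM_G4DN_XLARGE = 0.7364

premium = markup(EC2_G4DN_XLARGE, SM_G4DN_XLARGE)
print(f"SageMaker premium: {premium:.0%}")
```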
Benefits
These revolve around the SageMaker SDK and console (please comment or edit if you find any more benefits).
- Built-in algorithms (which you could just as easily import from your machine learning framework of choice): I would say these are worse than using open-source alternatives.
- Easily training many models during a hyperparameter search (YouTube video by AWS), which is also a fast way to spend money.
- Easily create machine-learning-related AWS Mechanical Turk tasks. However, MTurk is very limited within SageMaker, so you're better off going to MTurk yourself.
My suggestion
If you're thinking about ML on the cloud, don't use SageMaker. Spin up a VM with a prebuilt image that has PyTorch/TensorFlow and JupyterLab, and get the work done.
You are correct about EC2 being cheaper than SageMaker. However, you have to understand the differences between them.
- EC2 provides you with computing power
- SageMaker (tries to) provide a fully configured environment and computing power, with a seamless deployment model, so you can start training your model on day one
According to SageMaker's overview page, it comes with Jupyter notebooks, pre-installed machine learning algorithms, optimized performance, seamless rollout to production, etc.
This is the same trade-off as self-hosting a MySQL server on EC2 versus using AWS-managed RDS for MySQL. Managed services always appear more expensive, but if you factor in the time you would spend maintaining servers, updating packages, etc., the extra 30% cost may be worth it.
In conclusion, if you would rather save some money and have the time to set up your own server or environment, go for EC2. If you don't want to be bothered with this work and want to start training as soon as possible, use SageMaker.
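One way to make "the extra 30% may be worth it" concrete is a break-even estimate: the managed service pays for itself if the extra monthly spend is less than the value of the maintenance time it saves. A minimal sketch; the 30% premium comes from this answer, while the EC2 bill and engineer hourly rate in the example are hypothetical placeholders.

```python
def break_even_hours(ec2_monthly_cost: float, premium: float,
                     engineer_hourly_rate: float) -> float:
    """Maintenance hours per month the managed service must save to pay for itself.

    premium: extra cost of the managed service as a fraction of the EC2 cost
    (0.30 == 30%).
    """
    if engineer_hourly_rate <= 0:
        raise ValueError("engineer_hourly_rate must be positive")
    extra_cost = ec2_monthly_cost * premium
    return extra_cost / engineer_hourly_rate


# HYPOTHETICAL inputs: a $1,000/month EC2 bill and a $100/hour engineer.
hours = break_even_hours(1000.0, 0.30, 100.0)
print(f"Managed service pays off if it saves more than {hours:.1f} engineer-hours/month")
```

If your team would otherwise spend more than that many hours a month on patching, packaging and server upkeep, the managed option comes out ahead under these assumptions.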
Hey guys, I am trying to do some cost estimation for SageMaker for an AI chatbot I am building. The chatbot will use the quantized Mixtral-8x7b-Instruct model downloaded from Hugging Face, and I will be using SageMaker endpoints for inference.
When I went to the AWS Pricing Calculator website (https://calculator.aws/#/) and selected SageMaker, I was presented with different options to choose from, like SageMaker Studio Notebooks, RStudio on SageMaker, SageMaker On-Demand Notebook Instances, etc. (see the link below).
https://imgur.com/a/KVSHINe
For the chatbot described above, how would I know which of these options to select for my pricing estimate?
Would really appreciate any help with this. Many thanks!
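Whichever calculator option you pick, much of the cost of an always-on inference endpoint comes from the instances that stay up 24/7, so a rough first estimate is instance count × hourly price × hours per month. A minimal sketch; the hourly price is a hypothetical placeholder (look up the actual rate for whichever GPU instance type the quantized Mixtral model needs), 730 hours is the average month length (8,760 / 12), and storage, data transfer and any notebook or Studio usage are ignored.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def endpoint_monthly_cost(hourly_price: float, instance_count: int = 1,
                          hours: float = HOURS_PER_MONTH) -> float:
    """Rough monthly cost of an always-on endpoint: instances x hourly price x hours.

    Excludes storage, data transfer and other SageMaker features.
    """
    if instance_count < 0:
        raise ValueError("instance_count must be non-negative")
    return instance_count * hourly_price * hours


# HYPOTHETICAL price: replace with the real rate for the chosen ml.* GPU instance.
PLACEHOLDER_HOURLY_PRICE = 1.50
print(f"${endpoint_monthly_cost(PLACEHOLDER_HOURLY_PRICE, instance_count=2):,.2f} per month")
```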
I've been looking into whether it would be sensible to use AWS SageMaker for my GPU inference workload on a T4 GPU. The workload can tolerate several minutes of downtime and delay, and its model inputs are large binary multimedia files. Autoscaling based on load is a requirement.
AWS SageMaker asynchronous endpoints look very attractive: they manage container orchestration, autoscaling and request queueing, and provide various convenience utilities for maintaining, monitoring and upgrading models in production.
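Queue-based autoscaling, whether handled by SageMaker or built by hand with EKS/ECS and SQS, can be reduced to keeping the number of queued requests per instance near a target. A simplified sketch of that rule, assuming a hypothetical per-instance target and fixed min/max bounds; real target-tracking policies also apply cooldowns and metric smoothing, which are omitted here.

```python
import math


def desired_instances(backlog: int, target_per_instance: int,
                      min_instances: int = 0, max_instances: int = 10) -> int:
    """Instances needed to keep queued requests per instance at or below the target.

    Simplified: no cooldowns, smoothing or in-flight request accounting.
    """
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    needed = math.ceil(backlog / target_per_instance)
    # Clamp to the configured bounds.
    return max(min_instances, min(max_instances, needed))


# HYPOTHETICAL numbers: 45 queued files, aiming for at most 10 per T4 instance.
print(desired_instances(backlog=45, target_per_instance=10))
```

Because the workload tolerates minutes of delay, a fairly high per-instance target (fewer, busier instances) is a reasonable starting point to keep costs down.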
In us-east-1, I calculate the following:
| Service | Instance Type | Hourly Price | Monthly Price |
|---|---|---|---|
| EC2 On Demand | g4dn.xlarge | $0.526 | $378 |
| EC2 Annual Reservation | g4dn.xlarge | $0.309 (upfront) | $229 |
| EC2 Spot | g4dn.xlarge | ~$0.22 | ~$156 |
| SageMaker On Demand | ml.g4dn.xlarge | $0.7364 | $530 |
| SageMaker On Demand with Savings Plan | ml.g4dn.xlarge | $0.4984 | $358 |
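The monthly column can be reproduced from the hourly column. A minimal sketch: the on-demand and Savings Plan rows match about 720 hours (a 30-day month) with cents dropped, while 730 hours (8,760 / 12) is the other common convention, so estimates can differ by a few dollars depending on which one you use. The annual reservation and spot rows don't follow the same formula exactly (spot prices fluctuate), so they are left out.

```python
import math

# Hourly prices (USD, us-east-1) from the table above, for rows that follow the formula.
HOURLY = {
    "EC2 On Demand (g4dn.xlarge)": 0.526,
    "SageMaker On Demand (ml.g4dn.xlarge)": 0.7364,
    "SageMaker On Demand with Savings Plan (ml.g4dn.xlarge)": 0.4984,
}


def monthly(hourly_price: float, hours: float) -> int:
    """Monthly cost in whole dollars (cents truncated)."""
    return math.floor(hourly_price * hours)


for name, price in HOURLY.items():
    print(f"{name}: 720 h -> ${monthly(price, 720)}, 730 h -> ${monthly(price, 730)}")
```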
From what I can see, however, using SageMaker would come at a significant (likely prohibitive) cost compared with using EKS/ECS to host the model and SQS to queue requests. I appreciate that's the price one pays for a managed service, but I wanted to confirm a few things with the community to make sure I'm not missing anything in my cost estimates:
- Is it correct that SageMaker does not support spot instances for inference at all? (I appreciate that it supports them for training.)
- Is it correct that a savings plan can be applied to inference endpoints, and that they fall under the service classified as "Hosting" on this page https://aws.amazon.com/savingsplans/ml-pricing/ ? It's confusing, as "Hosting Service" is not a term the developer docs use to describe inference endpoints per se.
- Is it correct that one cannot reserve SageMaker instances for a year, as with EC2, to cut costs, and thus the above Savings Plan is the cheapest way to get a T4 GPU?
I ask because it seems surprising that there's a ~40% cost markup for using this managed service versus EC2, and despite what the AWS report on TCO says, I can't quite see it saving us that much money compared with setting up an EKS/ECS solution ourselves. I can see, however, that the TCO report largely considers training infrastructure, where SageMaker likely does bring a lot of value that isn't relevant here.
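The relative costs behind these questions can be tabulated from the hourly prices in the table above. A minimal sketch using those us-east-1 figures as inputs (spot is approximate): it compares the cheapest SageMaker option, on-demand with a Savings Plan, against each alternative, and computes the Savings Plan discount. On these numbers the ~40% markup applies to on-demand versus on-demand; with the Savings Plan, SageMaker is slightly below EC2 on demand, but still well above reserved or spot EC2.

```python
# Hourly prices (USD, us-east-1) from the table above; spot is approximate.
PRICES = {
    "EC2 On Demand": 0.526,
    "EC2 Annual Reservation": 0.309,
    "EC2 Spot": 0.22,
    "SageMaker On Demand": 0.7364,
    "SageMaker Savings Plan": 0.4984,
}


def relative_cost(price: float, baseline: float) -> float:
    """Fractional difference from the baseline (0.40 == 40% more expensive)."""
    return price / baseline - 1.0


sm_best = PRICES["SageMaker Savings Plan"]
for name, price in PRICES.items():
    if name == "SageMaker Savings Plan":
        continue  # skip comparing the option with itself
    # Positive: SageMaker with a Savings Plan costs more than this option.
    print(f"SageMaker Savings Plan vs {name}: {relative_cost(sm_best, price):+.0%}")

sp_discount = 1 - PRICES["SageMaker Savings Plan"] / PRICES["SageMaker On Demand"]
print(f"Savings Plan discount on SageMaker: {sp_discount:.0%}")
```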