For choosing a SageMaker hosted notebook type:

Do you plan to preprocess all of your data in memory on the notebook, or do you plan to orchestrate ETL with external services?

If you plan to load the dataset into memory on the notebook instance for exploration or preprocessing, the primary bottleneck is memory: the instance must have enough RAM to hold your dataset. This requires at least the 16 GB types (.xlarge) (full list of ML instance types available here). Further, depending on how compute-intensive your preprocessing is and how quickly you need it to finish, you can opt for a compute-optimized instance (c4, c5) to speed it up.
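As a rough illustration of the memory-first sizing described above, here is a minimal Python sketch that picks the smallest notebook instance whose RAM covers a dataset plus some working headroom. The candidate list is a small hand-picked sample, the memory and vCPU figures are taken from the public EC2 specs for the matching families and should be checked against the current AWS instance list, and the function name and the 2x headroom factor are assumptions made for this example, not AWS guidance.

```python
# Hand-picked sample of notebook instance types: (name, memory in GiB, vCPUs).
# Figures follow the EC2 specs for the matching families; verify them against
# the current AWS instance list before relying on them.
CANDIDATES = [
    ("ml.t3.medium", 4, 2),
    ("ml.c5.xlarge", 8, 4),
    ("ml.t3.xlarge", 16, 4),
    ("ml.m5.xlarge", 16, 4),
    ("ml.c5.2xlarge", 16, 8),
    ("ml.m5.2xlarge", 32, 8),
]


def pick_notebook_instance(dataset_gib, headroom=2.0, min_vcpus=2):
    """Return the smallest candidate that fits dataset_gib * headroom in RAM.

    headroom is a hypothetical safety factor for copies made while
    preprocessing. Returns None when no candidate is large enough, which is
    a hint to move the work to an external service instead.
    """
    needed_gib = dataset_gib * headroom
    fits = [
        (name, mem, cpus)
        for name, mem, cpus in CANDIDATES
        if mem >= needed_gib and cpus >= min_vcpus
    ]
    if not fits:
        return None
    # Prefer the least memory, then the fewest vCPUs, among instances that fit.
    return min(fits, key=lambda c: (c[1], c[2]))[0]


print(pick_notebook_instance(5))               # 5 GiB dataset, default headroom
print(pick_notebook_instance(5, min_vcpus=8))  # same data, heavier preprocessing
```

Note that the compute-optimized c5 sizes carry less memory per vCPU than the general-purpose ones, so a faster instance still has to pass the memory check first.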


For the training job, specifically:

When you launch training through the Amazon SageMaker SDK, your training data is loaded and distributed to the training cluster, so the training job runs completely separately from the instance that hosts your notebook.

The ideal instance type for training depends on whether your algorithm or training job is memory-, CPU-, or I/O-bound. Because your dataset will likely be loaded onto the training cluster from S3, the instance you choose for your hosted notebook has no bearing on the speed of your training job.
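To make the separation concrete, the sketch below builds the request for a training job and shows that the training instance type is set in the job's own `ResourceConfig`, independent of the notebook. It uses the low-level boto3 `create_training_job` request shape rather than the SageMaker Python SDK mentioned above, purely so the example runs without AWS access. The helper name, job name, image URI, role ARN, and bucket paths are all placeholders invented for illustration.

```python
def build_training_job_request(job_name, image_uri, role_arn, train_s3_uri,
                               output_s3_uri, instance_type="ml.m5.xlarge",
                               instance_count=1, volume_gb=50,
                               max_runtime_s=3600):
    """Build keyword arguments for boto3's SageMaker create_training_job.

    Nothing here refers to the notebook instance: the training cluster is
    described entirely by ResourceConfig, and the data comes from S3.
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": volume_gb,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_s},
    }


# All values below are placeholders, not real resources.
request = build_training_job_request(
    job_name="example-training-job",
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<image>:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
    instance_type="ml.c5.2xlarge",  # chosen for the job, not for the notebook
)
print(request["ResourceConfig"])

# To actually submit the job (requires boto3 and AWS credentials):
# import boto3
# boto3.client("sagemaker").create_training_job(**request)
```

With this split, a small notebook instance can launch a CPU-, memory-, or I/O-heavy job simply by changing `instance_type` in the request.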


Broadly: for SageMaker notebooks, the best practice is to use the notebook as a "puppeteer" or orchestrator that calls out to external services (AWS Glue or Amazon EMR for preprocessing, SageMaker for training, S3 for storage, etc.). Treat notebook instances as ephemeral compute and storage for building and kicking off your experiment pipeline.

This lets you match compute, storage, and hosting resources more closely to the demands of your workload, giving you the best bang for your buck because you don't pay for idle or unused resources.


Answer from Nick Walsh on Stack Overflow
🌐
AWS
docs.aws.amazon.com › amazon sagemaker › developer guide › machine learning environments offered by amazon sagemaker ai › amazon sagemaker notebook instances
Amazon SageMaker notebook instances - Amazon SageMaker AI
The SageMaker notebook instances help create the environment by initiating Jupyter servers on Amazon Elastic Compute Cloud (Amazon EC2) and providing preconfigured kernels with the following packages: the Amazon SageMaker Python SDK, AWS SDK for Python (Boto3), AWS Command Line Interface (AWS ...
🌐
AWS
docs.aws.amazon.com › amazon sagemaker › developer guide › machine learning environments offered by amazon sagemaker ai › amazon sagemaker notebook instances › create an amazon sagemaker notebook instance
Create an Amazon SageMaker notebook instance - Amazon SageMaker AI
To load a dataset into memory on the notebook instance for exploration or preprocessing, choose an instance type with enough RAM memory for your dataset. This requires an instance with at least 16 GB of memory (.xlarge or larger). If you plan to use the notebook for compute intensive preprocessing, we recommend you choose a compute-optimized instance such as a c4 or c5. A best practice when using a SageMaker notebook is to use the notebook instance to orchestrate other AWS services.
🌐
Medium
arindam-dey.medium.com › a-gentle-introduction-to-aws-sagemaker-part-i-7e4ef93a6ba4
A Gentle Introduction to AWS SageMaker - Part I | by Arindam Dey | Medium
July 2, 2022 - Use the File option to rename the notebook to something suitable. ... Fig 12. JupyterLab running on SageMaker Instance · Phew !! We now have an instance running SageMaker, which we will use to build a classification model. ... Let us summarize what we have done so far. We created a Sagemaker instance of type ml.t2.medium. We associated a lifecycle configuration named imblearn , so that instance always starts up with this package. ... Load dataset into an AWS S3 ( Simple Storage Service ) bucket.
🌐
GitHub
github.com › awsdocs › amazon-sagemaker-developer-guide › blob › master › doc_source › notebooks-available-instance-types.md
amazon-sagemaker-developer-guide/doc_source/notebooks-available-instance-types.md at master · awsdocs/amazon-sagemaker-developer-guide
For information about available Amazon SageMaker Notebook Instance types, see CreateNotebookInstance. Note For most use cases, you should use a ml.t3.medium. This is the default instance type for CPU-based SageMaker images, and is available as part of the AWS Free Tier.
Author   awsdocs
🌐
AWS
aws.amazon.com › about-aws › whats-new › 2024 › 04 › amazon-sagemaker-notebooks-p5-c6i-c7i-m6i-m7i-r6i-r7i-instance-types
Amazon SageMaker notebooks now support P5, C6i, C7i, M6i, M7i, R6i, and R7i instance types
We are pleased to announce general availability of Amazon EC2 P5, C6i, C7i, M6i, M7i, R6i, and R7i instances on SageMaker notebooks. Amazon EC2 M7i, R7i, and C7i instances are powered by custom 4th generation Intel Xeon Scalable processors and ...
🌐
Amazon Web Services
aws.amazon.com › machine learning › amazon sagemaker › pricing
SageMaker pricing - AWS
6 days ago - This includes a monthly AWS Free Tier for SageMaker Catalog, SageMaker notebooks, JupyterLab IDE, Amazon Q, metadata storage, and API requests. For Amazon SageMaker notebooks, AWS offers Free Tier usage for the first 2 months of 250 hours of sc.t3.medium instance on notebook instances.
🌐
AWS
docs.aws.amazon.com › amazon sagemaker › amazon sagemaker api reference › actions › amazon sagemaker service › updatenotebookinstance
UpdateNotebookInstance - Amazon SageMaker
The Amazon Resource Name (ARN) of the IAM role that SageMaker AI can assume to access the notebook instance. For more information, see SageMaker AI Roles. To be able to pass this role to SageMaker AI, the caller of this API must have the iam:PassRole permission. ... Length Constraints: Minimum length of 20. Maximum length of 2048. Pattern: arn:aws[a-z\-]*:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
🌐
AWS
docs.aws.amazon.com › amazon sagemaker › developer guide › machine learning environments offered by amazon sagemaker ai › amazon sagemaker geospatial capabilities › types of compute instances
Types of compute instances - Amazon SageMaker AI
SageMaker geospatial capabilities offer three types of compute instances. SageMaker Studio Classic geospatial notebook instances – SageMaker geospatial supports both CPU and GPU-based notebook instances in Studio Classic. Notebook instances are used to build, train, and deploy ML models.
🌐
AWS
docs.aws.amazon.com › aws cloudformation › template reference › amazon sagemaker ai › aws::sagemaker::notebookinstance
AWS::SageMaker::NotebookInstance - AWS CloudFormation
The AWS::SageMaker::NotebookInstance resource creates an Amazon SageMaker notebook instance. A notebook instance is a machine learning (ML) compute instance running on a Jupyter notebook. For more information, see Use Notebook Instances. To declare this entity in your CloudFormation template, use the following syntax: { "Type" : "AWS::SageMaker::NotebookInstance", "Properties" : { "AcceleratorTypes" : [ String, ...
🌐
AWS
docs.aws.amazon.com › amazon sagemaker › amazon sagemaker api reference › actions › amazon sagemaker service › createnotebookinstance
CreateNotebookInstance - Amazon SageMaker
The IP address type for the notebook instance. Specify ipv4 for IPv4-only connectivity or dualstack for both IPv4 and IPv6 connectivity. When you specify dualstack, the subnet must support IPv6 CIDR blocks. If not specified, defaults to ipv4. ... The Amazon Resource Name (ARN) of a AWS Key Management Service key that SageMaker AI uses to encrypt data on the storage volume attached to your notebook instance.
🌐
ClassMethod
dev.classmethod.jp › articles › how-to-choose-the-right-amazon-sagemaker-instance-type
How to Choose the Right Amazon SageMaker Instance Type | DevelopersIO
Amazon SageMaker provides a broad choice of instance types tailored for various machine learning workloads, so it's critical to thoroughly consider your options to ensure you're picking the appropriate instance type for your use case. You can guarantee that your SageMaker tasks operate smoothly and effectively by doing so. AWSのGPU系EC2インスタンスをまとめてみた
🌐
Packtpub
subscription.packtpub.com › book › data › 9781800569003 › 12 › ch12lvl1sec88 › choosing-instance-types-in-amazon-sagemaker
Chapter 9: Amazon SageMaker Modeling | AWS Certified Machine Learning Specialty: MLS-C01 Certification Guide
The instance type starts with ml. SageMaker offers instances of the following families: The t family: This is a burstable CPU family. With this family, you get a normal ratio of CPU and memory. This means that if you have a long-running training job, then you lose performance over time as you spend the CPU credits. If you have very small jobs, then they are cost-effective. For example, if you want a notebook ...
🌐
Pulumi
pulumi.com › registry › packages › aws › api-docs › sagemaker › notebookinstance
aws.sagemaker.NotebookInstance | Pulumi Registry
The name of ML compute instance type. ... The ARN of the IAM role to be used by the notebook instance which allows SageMaker AI to call other services on your behalf. ... An array of up to three Git repositories to associate with the notebook instance. These can be either the names of Git repositories stored as resources in your account, or the URL of Git repositories in AWS ...
Top answer
1 of 1
1
I'll address your questions for both SageMaker Studio JupyterLab and SageMaker Notebook Instances:

A) SageMaker Studio JupyterLab:

1. SageMaker Studio is billed per second of usage, with a minimum of 1 minute. This means if you use Studio for just one minute, you'll be charged for that minute, not a full hour. The billing starts when you launch the SageMaker Studio application and ends when you log out or when an administrator ends your session.

2. In SageMaker Studio, each user gets their own dedicated workspace instance. If multiple data scientists are working in Studio, they would each have their own session and be billed separately. There isn't a concept of sharing the same Studio session among multiple users in the way you described.

B) SageMaker Notebook Instance:

1. SageMaker Notebook Instances are also billed per second of usage, with a minimum of 1 minute. Like Studio, if you use a Notebook Instance for just one minute, you'll be charged for that minute, not a full hour.

2. For Notebook Instances, multiple users can technically share the same instance, but it's not designed for simultaneous use by multiple users. If two users are using the same Notebook Instance, you would still be charged based on the instance type and the total time it's running, not per user. So in your example with an ml.t3.16xlarge instance used for one hour, the charge would be $1 (assuming that's the correct rate), regardless of whether one or two users were using it.

It's important to note that while sharing a Notebook Instance is possible, it's generally not recommended for collaborative work due to potential conflicts and security concerns. SageMaker Studio is better suited for multi-user environments, where each user gets their own workspace.

In both cases, you only pay for the compute resources you use, and there are no additional charges for using the SageMaker Studio or Notebook Instance interfaces themselves.
**Sources** Amazon SageMaker Studio pricing - Amazon SageMaker AI Machine Learning Service – Amazon SageMaker Pricing – AWS Community | Enhancing ML Efficiency with Amazon SageMaker
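
The billing rule described in the answer above (per-second charging with a one-minute minimum, independent of how many users share the instance) can be sketched as a small calculation. This is only an illustration of that rule as stated; the function name and the hourly rates are made up for the example, so check the SageMaker pricing page for real figures.

```python
import math


def notebook_cost(seconds_running, hourly_rate_usd, minimum_seconds=60):
    """Estimate the charge under per-second billing with a minimum duration.

    The cost depends only on how long the instance runs and on its hourly
    rate; the number of users sharing the instance does not enter into it.
    """
    billed_seconds = max(minimum_seconds, math.ceil(seconds_running))
    return hourly_rate_usd * billed_seconds / 3600


# Hypothetical rates, chosen only to make the arithmetic easy to follow.
one_hour = notebook_cost(3600, hourly_rate_usd=1.00)
ten_seconds = notebook_cost(10, hourly_rate_usd=3.60)  # billed as 60 seconds
print(f"one hour: ${one_hour:.2f}, ten seconds: ${ten_seconds:.2f}")
```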