aws sagemaker notebook instance types

Sagemaker Notebook Instance Type Recommendation

For choosing a SageMaker hosted notebook type:

Do you plan to preprocess all of your data in memory on the notebook, or do you plan to orchestrate ETL with external services?

If you're planning to load the dataset into memory on the notebook instance for exploration or preprocessing, the primary bottleneck is making sure the instance has enough memory for your dataset. This would require at least one of the 16 GB types (the general-purpose .xlarge sizes, such as ml.t3.xlarge or ml.m5.xlarge; see AWS's full list of ML instance types). Further, depending on how compute-intensive your preprocessing is and how quickly you need it to finish, you can opt for a compute-optimized instance (c4, c5) to speed it up. Keep in mind that compute-optimized types carry less memory per vCPU than general-purpose types, so you may need a larger size to reach the same memory.
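As a rough sketch of the memory check described above, the following Python helper picks the smallest notebook instance type from a short, hand-written table whose memory covers the dataset with some headroom. The function name, the candidate list, and the default overhead_factor of 3 are illustrative assumptions, not AWS guidance; the memory figures should be checked against the current instance list before you rely on them.

```python
# Memory (GiB) for a few general-purpose notebook instance types.
# A hand-picked subset for illustration, not the full ML instance list.
CANDIDATES = [
    ("ml.t3.medium", 4),
    ("ml.t3.large", 8),
    ("ml.t3.xlarge", 16),
    ("ml.m5.2xlarge", 32),
    ("ml.m5.4xlarge", 64),
]


def pick_notebook_instance(dataset_gib, overhead_factor=3.0):
    """Return the smallest candidate whose memory fits the dataset.

    overhead_factor is an assumed multiplier covering in-memory
    expansion (parsed columns, intermediate copies); tune it for
    your own data and libraries.
    """
    if dataset_gib <= 0:
        raise ValueError("dataset_gib must be positive")
    needed_gib = dataset_gib * overhead_factor
    for name, memory_gib in CANDIDATES:  # ordered smallest to largest
        if memory_gib >= needed_gib:
            return name
    # Nothing in the table fits: a hint to preprocess outside the notebook.
    return None


print(pick_notebook_instance(5))   # a 5 GiB dataset with 3x headroom
print(pick_notebook_instance(30))  # too large for any listed candidate
```

If the helper returns None, that is a sign the preprocessing belongs in an external service rather than in notebook memory.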

For the training job, specifically:

When you use the Amazon SageMaker SDK, your training data is loaded and distributed to the training cluster, so your training job runs completely separately from the instance that hosts your notebook.

The ideal instance type for training depends on whether your algorithm or training job is memory-, CPU-, or I/O-bound. Since your dataset will likely be loaded onto your training cluster from S3, the instance you choose for your hosted notebook has no bearing on the speed of your training job.
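To make the separation concrete, here is a minimal sketch that builds the request for SageMaker's CreateTrainingJob API, where the training instance type is set independently of the notebook's. It uses the low-level boto3 request shape rather than the SageMaker Python SDK the answer mentions, and only constructs the request so it runs without AWS credentials. The helper name, job name, image URI, role ARN, bucket paths, volume size, and runtime limit are all hypothetical placeholders.

```python
def build_training_request(job_name, image_uri, role_arn,
                           train_s3_uri, output_s3_uri,
                           instance_type="ml.m5.xlarge", instance_count=1):
    """Build keyword arguments for the CreateTrainingJob API call."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        # Training data is read from S3 by the training cluster,
        # not copied from the notebook instance.
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # The training cluster's size is chosen here, independently
        # of whatever instance type the notebook runs on.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 50,  # assumed value for illustration
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},  # assumed limit
    }


# All values below are hypothetical placeholders.
request = build_training_request(
    job_name="example-training-job",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
    instance_type="ml.c5.2xlarge",
)

# To submit the job (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_training_job(**request)
```

A small ml.t3.medium notebook can submit this request just as well as a large one, since the compute that matters for training is whatever ResourceConfig names.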

Broadly: when it comes to SageMaker notebooks, the best practice is to use your notebook as a "puppeteer" or orchestrator that calls out to external services (AWS Glue or Amazon EMR for preprocessing, SageMaker for training, S3 for storage, etc.). It is best to treat notebooks as ephemeral compute and storage for building and kicking off your experiment pipeline.

This lets you match compute, storage, and hosting resources more closely to the demands of your workload, giving you the best bang for your buck because you don't pay for idle or unused resources.

Answer from Nick Walsh on Stack Overflow

AWS

Instance Types Available for Use With Amazon SageMaker Studio Classic Notebooks - Amazon SageMaker AI

For information about available Amazon SageMaker Notebook Instance types, see CreateNotebookInstance. For most use cases, you should use an ml.t3.medium. This is the default instance type for CPU-based SageMaker images, and is available as part of the AWS Free Tier.

AWS

Amazon SageMaker notebook instances - Amazon SageMaker AI

The SageMaker notebook instances help create the environment by initiating Jupyter servers on Amazon Elastic Compute Cloud (Amazon EC2) and providing preconfigured kernels with the following packages: the Amazon SageMaker Python SDK, AWS SDK for Python (Boto3), AWS Command Line Interface (AWS ...
