With a sagemaker.session.Session instance, you can describe training jobs:

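import sagemaker

# Create a session that uses the default AWS credentials and region.
sagemaker_session = sagemaker.session.Session()

# Returns the DescribeTrainingJob response as a dict; keep it for inspection.
description = sagemaker_session.describe_training_job("Job...")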
Answer from Eric Johnson on Stack Overflow
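As a follow-up sketch (not part of the answer above): the describe call returns a plain dict, so a small helper can pull out the fields most people look at first. The helper name summarize_training_job and the sample response are hypothetical and used only for illustration. The keys follow the DescribeTrainingJob response shape, but check them against the API reference for your SDK version.

```python
from typing import Any, Dict


def summarize_training_job(description: Dict[str, Any]) -> Dict[str, Any]:
    """Pick a few commonly inspected fields out of a describe response."""
    return {
        "name": description.get("TrainingJobName"),
        "status": description.get("TrainingJobStatus"),
        "secondary_status": description.get("SecondaryStatus"),
        # ModelArtifacts is a nested dict; tolerate it being absent.
        "model_artifacts": description.get("ModelArtifacts", {}).get("S3ModelArtifacts"),
        # FailureReason is only expected when the job failed.
        "failure_reason": description.get("FailureReason"),
    }


# Hypothetical response, trimmed to the fields used above. With a real session,
# the dict would come from sagemaker_session.describe_training_job(...).
example_description = {
    "TrainingJobName": "example-job",
    "TrainingJobStatus": "Completed",
    "SecondaryStatus": "Completed",
    "ModelArtifacts": {"S3ModelArtifacts": "s3://example-bucket/output/model.tar.gz"},
}

summary = summarize_training_job(example_description)
print(summary["status"])
```

Using .get() throughout means the helper returns None for fields that are missing from the response rather than raising a KeyError.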
SageMaker
sagemaker.readthedocs.io › en › stable › overview.html
Using the SageMaker Python SDK — sagemaker 2.254.1 documentation
The VPC should be the same as that ... jobs. The usage is the same as above. ... The SageMaker Python SDK allows you to specify a name and a regular expression for metrics you want to track for training....
AWS
docs.aws.amazon.com › amazon sagemaker › amazon sagemaker api reference › actions › amazon sagemaker service › createtrainingjob
CreateTrainingJob - Amazon SageMaker
You must grant this role the necessary permissions so that SageMaker can successfully complete model training. StoppingCondition - To help cap training costs, use MaxRuntimeInSeconds to set a time limit for training. Use MaxWaitTimeInSeconds to specify how long a managed spot training job has to complete.
AWS
docs.aws.amazon.com › amazon sagemaker › developer guide › model training › run your local code as a sagemaker training job
Run your local code as a SageMaker training job - Amazon SageMaker AI
@remote(**settings) def divide(x, y): return x / y · The SageMaker Python SDK will automatically translate your existing workspace environment and any associated data processing code and datasets into a SageMaker training job that runs on the SageMaker training platform.
Hugging Face
huggingface.co › docs › sagemaker › train
Run training on Amazon SageMaker
The Hugging Face extension for the SageMaker Python SDK means we can benefit from fully-managed EC2 spot instances. This can help you save up to 90% of training costs! Note: Unless your training job completes quickly, we recommend you use checkpointing with managed spot training.
GitHub
github.com › aws › sagemaker-training-toolkit
GitHub - aws/sagemaker-training-toolkit: Train machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.
Use the Docker image to start a training job using the SageMaker Python SDK.
Starred by 523 users
Forked by 137 users
Languages: Python 96.0% | C 3.2%
SageMaker
sagemaker.readthedocs.io › en › stable › api › training › smd_data_parallel_use_sm_pysdk.html
Launch a Distributed Training Job Using the SageMaker Python SDK — sagemaker 2.251.0 documentation
Your input data must be in an S3 bucket or in FSx in the AWS region that you will use to launch your training job. If you use the Jupyter notebooks provided, create a SageMaker notebook instance in the same region as the bucket that contains your input data. For more information about storing your training data, refer to the SageMaker Python SDK data inputs documentation.
AWS
awscli.amazonaws.com › v2 › documentation › api › latest › reference › sagemaker › create-training-job.html
create-training-job - sagemaker
RetryStrategy - The number of times to retry the job when the job fails due to an InternalServerError . For more information about SageMaker, see How It Works . ... create-training-job --training-job-name <value> [--hyper-parameters <value>] --algorithm-specification <value> --role-arn <value> [--input-data-config <value>] --output-data-config <value> --resource-config <value> [--vpc-config <value>] --stopping-condition <value> [--tags <value>] [--enable-network-isolation | --no-enable-network-isolation] [--enable-inter-container-traffic-encryption | --no-enable-inter-container-traffic-encrypt
Mission Control
missioncloud.com › blog › remote-training-amazon-sagemaker
Taking the next step: A data scientists introduction to remote training with Amazon Sagemaker
As you author your notebook, you’ll utilize the SageMaker SDK to define and manage your training job. It’s worth noting that your training notebook typically doesn’t contain the model and training code itself. Instead, you specify various parameters for your training job, such as the location of your training data, the instance types to be used, and the training container.
Medium
ymwdalex.medium.com › trigger-pre-built-framework-training-job-via-amazon-sagemaker-api-b6e49984e707
Trigger Pre-built Framework Training Job via Amazon SageMaker API | by Zhe Sun | Medium
December 16, 2021 - HyperParameters.sagemaker_program: the name of the entry point file · AlgorithmSpecification.TrainingImage: the Amazon ECR registry path of the pre-built framework container images. You can find the images URL here. Below is an example to trigger the same training job by using Python boto3 SDK.
SageMaker
sagemaker.readthedocs.io › en › v2.23.0 › overview.html
Using the SageMaker Python SDK — sagemaker 2.23.0 documentation
The VPC should be the same as that ... jobs. The usage is the same as above. ... The SageMaker Python SDK allows you to specify a name and a regular expression for metrics you want to track for training....
Amazon Web Services
boto3.amazonaws.com › v1 › documentation › api › latest › reference › services › sagemaker › client › create_training_job.html
create_training_job - Boto3 1.42.10 documentation
You must grant this role the necessary permissions so that SageMaker can successfully complete model training. StoppingCondition - To help cap training costs, use MaxRuntimeInSeconds to set a time limit for training. Use MaxWaitTimeInSeconds to specify how long a managed spot training job has to complete.
Readthedocs
sagemaker-examples.readthedocs.io › en › latest › sagemaker-debugger › build_your_own_container_with_debugger › debugger_byoc.html
Build a Custom Training Container and Debug Training Jobs with Amazon SageMaker Debugger — Amazon SageMaker Examples 1.0.0 documentation
It has been orchestrated with SageMaker Debugger hooks to allow saving tensors during training. These hooks have been instrumented to read from a JSON configuration that SageMaker puts in the training container. Configuration provided to the SageMaker python SDK when creating a job will be passed on to the hook.
SageMaker
sagemaker.readthedocs.io › en › v2.31.1 › overview.html
Using the SageMaker Python SDK — sagemaker 2.31.1 documentation
The VPC should be the same as that ... jobs. The usage is the same as above. ... The SageMaker Python SDK allows you to specify a name and a regular expression for metrics you want to track for training....
AWS Neuron
awsdocs-neuron.readthedocs-hosted.com › en › latest › general › devflows › training › sm-devflow › sm-training-devflow.html
Train your model on SageMaker - AWS Neuron SDK
The Amazon SageMaker Python SDK lets you launch training jobs in just a few lines of code with ease. As shown in the below diagram Amazon SageMaker launches Trn1 instances, copies both data and code onto the instance. It then runs the training script to generate model artifacts.