🌐
PyTorch
docs.pytorch.org › recipes › pytorch profiler
PyTorch Profiler — PyTorch Tutorials 2.12.0+cu130 documentation
July 20, 2022 - Profiler allows one to check which operators were called during the execution of a code range wrapped with a profiler context manager. If multiple profiler ranges are active at the same time (e.g. in parallel PyTorch threads), each profiling ...
Discussions

Weird Pytorch profiler output
Hello, I am profiling my training code and I’m struggling to understand the output. The different steps seem to be taking different amount of time and I am not sure why. A simplified version of my code would be: wi… More on discuss.pytorch.org
🌐 discuss.pytorch.org
August 14, 2024
Overhead using torch.profiler
Hi everybody, I have been working for a few months with pytorch and this is the first time I try to use torch.profiler since I’m trying to get information about CUDA time. However, I’m running a small test only with the cpu and I see that there is a very large overhead when running the ... More on discuss.pytorch.org
🌐 discuss.pytorch.org
August 1, 2022
FIVE WAYS TO INCREASE MODEL PERF W/ PYTORCH PROFILER!
preview.redd.it/z1dqg8... PyTorch is an open source machine learning framework with a focus on neural networks. More on reddit.com
🌐 r/pytorch
gpu in pytorch good resource for general guidelines/advice? I feel very lost with the tutorial afterthought-like treatment
Make sure that your device is actually cuda something if you don't see any gpu usage. There can be plenty of install problems that could cause cuda to not be available. Actually if you expect to always run this code on gpu, an assert would be fitting, the fallback on CPU pretty much means failing silently. As for implicit global gpu usage, there are plenty of discussion on the subject, as far as I know it can't be done (yet). The general consensus is that it's better for the user to be fully aware of what is going on and where/when data is moved. More on reddit.com
🌐 r/pytorch
July 24, 2019
🌐
Hugging Face
huggingface.co › blog › torch-profiler
Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler
May 29, 2026 - No prerequisites apart from basic PyTorch. Treat this as a leisurely read with some "Aha!" moments. The structure of the post is intentionally question-led: we open a trace, ask "wait, why is that happening?", and chase the answer until something clicks. By the end you should know: how to set up torch.profiler and what it actually hands back,
🌐
PyTorch
pytorch.org › blog › introducing-pytorch-profiler-the-new-and-improved-performance-tool
Introducing PyTorch Profiler – the new and improved performance tool – PyTorch
March 25, 2021 - Developed as part of a collaboration between Microsoft and Facebook, the PyTorch Profiler is an open-source tool that enables accurate and efficient performance analysis and troubleshooting for large-scale deep learning models.
🌐
Medium
medium.com › @zachriane › how-pytorch-profiler-saved-me-from-insanity-45f35296e736
How PyTorch Profiler Saved Me from Insanity | by Zach Riane Machacon | Medium
September 3, 2024 - PyTorch Profiler is a powerful tool designed to help developers analyze and optimize the performance of their PyTorch models.
🌐
GitHub
github.com › Quentin-Anthony › torch-profiling-tutorial
GitHub - Quentin-Anthony/torch-profiling-tutorial · GitHub
This tutorial seeks to teach users about using profiling tools such as nsys, rocprof, and the torch profiler in a simple transformers training loop. We will cover how to use the PyTorch profiler to identify performance bottlenecks, understand GPU efficiency metrics, and perform initial optimizations.
Starred by 579 users
Forked by 32 users
Languages   Python
🌐
Habana
docs.habana.ai › en › latest › Profiling › Profiling_with_PyTorch.html
Profiling with PyTorch — Gaudi Documentation 1.23.0 documentation
import torch
import habana_frameworks.torch.core as htcore

activities = [torch.profiler.ProfilerActivity.CPU]

#CUDA:
#device = torch.device('cuda:0')
#activities.append(torch.profiler.ProfilerActivity.CUDA)

#HPU:
device = torch.device('hpu')
activities.append(torch.profiler.ProfilerActivity.HPU)

with torch.profiler.profile(
        schedule=torch.profiler.schedule(wait=0, warmup=20, active=5, repeat=1),
        activities=activities,
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./profile_logs')) as profiler:
    for i in range(100):
        input = torch.tensor([[i]*10]*10, dtype=torch.float32, device=device)
        result = torch.matmul(input, input)
        result.to('cpu')
        htcore.mark_step()
        profiler.step()
🌐
Lightning AI
lightning.ai › docs › pytorch › 1.6.3 › advanced › profiler.html
Profiling — PyTorch Lightning 1.6.3 documentation
PyTorch includes a profiler that lets you inspect the cost of different operators inside your model - both on the CPU and GPU.
🌐
Lightning AI
lightning.ai › docs › pytorch › stable › api › lightning.pytorch.profilers.PyTorchProfiler.html
PyTorchProfiler — PyTorch Lightning 2.6.1 documentation
This profiler uses PyTorch’s Autograd Profiler and lets you inspect the cost of different operators inside your model - both on the CPU and GPU.
🌐
Anl
alcf.anl.gov › sites › default › files › 2025-05 › PyTorchProfiler_HZheng.pdf
Pytorch profiler for AI Huihuo Zheng May 7, 2025
May 7, 2025 - Estimated FLOPs: For certain common operators like matrix multiplication and 2D convolution, the profiler can estimate the number of floating-point operations (FLOPs) performed if with_flops=True is set. This can help in assessing the computational intensity of different parts of the model. Execution Timeline (Trace View): Perhaps the most powerful feature for detailed analysis is the ability to export a chronological trace of events. Example of using PyTorch Profiler on Aurora
🌐
APXML
apxml.com › courses › advanced-pytorch › chapter-4-deployment-performance-optimization › pytorch-profiler
Using the PyTorch Profiler for Bottleneck Analysis
The torch.profiler API provides a comprehensive view of model execution by tracking several important metrics: Operator Execution Time: Measures the duration spent executing individual PyTorch operators on both the CPU and the GPU (CUDA kernels).
🌐
Harvard
handbook.eng.kempnerinstitute.harvard.edu › s5_ai_scaling_and_engineering › scalability › gpu_profiling.html
19.3. GPU Profiling — Kempner Institute Computing Handbook
Fig. 19.5 Profiling Loop to Optimize Code. PyTorch profiler is a tool that facilitates collecting different performance metrics at runtime to better understand what happens behind the scenes. For example, during training of an ML model, torch profiler can be used for understanding the most expensive ...
🌐
GitHub
github.com › pytorch › pytorch › blob › main › torch › profiler › profiler.py
pytorch/torch/profiler/profiler.py at main · pytorch/pytorch
profile_memory (bool): track tensor memory allocation/deallocation (see ``export_memory_timeline`` ... execution_trace_observer (ExecutionTraceObserver) : A PyTorch Execution Trace Observer object.
Author   pytorch
🌐
PyTorch Forums
discuss.pytorch.org › t › weird-pytorch-profiler-output › 208078
Weird Pytorch profiler output - PyTorch Forums
August 14, 2024 - Hello, I am profiling my training code and I’m struggling to understand the output. The different steps seem to be taking different amount of time and I am not sure why. A simplified version of my code would be:

with profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA], ...) as p:
    for x, y in training_dataloader:
        p.step()
        out = model(x)
        loss = loss_fn(out, y)
        loss.backward()
        optim.step()
        optim.zero_grad()

Scheduler is...
🌐
ROCm Blogs
rocm.blogs.amd.com › artificial-intelligence › torch_profiler › README.html
Unveiling performance insights with PyTorch Profiler on an AMD GPU — ROCm Blogs
May 29, 2024 - PyTorch Profiler is a performance analysis tool that enables developers to examine various aspects of model training and inference in PyTorch. It allows users to collect and analyze detailed profiling information, including GPU/CPU utilization, ...
🌐
PyTorch Forums
discuss.pytorch.org › t › overhead-using-torch-profiler › 158060
Overhead using torch.profiler - PyTorch Forums
August 1, 2022 - Hi everybody, I have been working for a few months with pytorch and this is the first time I try to use torch.profiler since I’m trying to get information about CUDA time. However, I’m running a small test only with the cpu and I see that there is a very large overhead when running the ...
🌐
Hugging Face
huggingface.co › docs › accelerate › usage_guides › profiler
Profiler · Hugging Face
PyTorch Profiler is a powerful tool for analyzing the performance of your models.