NVIDIA Developer
developer.nvidia.com › nsight-systems
Nsight Systems | NVIDIA Developer
Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and command-line tool. It also provides a customizable, data-driven user interface and metric collection that can be extended with analysis scripts for post-processing results. ... Nsight Aftermath SDK is a library that integrates into a D3D12 or Vulkan game’s crash reporter to generate GPU “mini-dumps” when an exception or TDR occurs, exposing pipeline information to resolve an unexpected crash.
NERSC
docs.nersc.gov › tools › performance › nvidiaproftools
NVIDIA Profiling Tools - NERSC Documentation
NVIDIA Data Center GPU Manager (DCGM) is a lightweight tool for measuring and monitoring GPU utilization and for running comprehensive diagnostics of GPU nodes on a cluster. NERSC will be using this tool to measure application utilization and monitor the status of the machine.
Videos
03:58
Python Profiling: NVIDIA Nsight Tools Feature Spotlight - YouTube
02:07:16
Lecture 44: NVIDIA Profiling - YouTube
01:01:19
Mastering Nvidia Nsight GPU Profiling - YouTube
11:31
Continuous Profiling for GPUs — Matthias Loibl, Polar Signals ...
HPC Assistant: GPU Profiling
12:58
GPU Profiling and Debugging with Sokatoa Built on Theia - YouTube
Harvard
handbook.eng.kempnerinstitute.harvard.edu › s5_ai_scaling_and_engineering › scalability › gpu_profiling.html
19.3. GPU Profiling — Kempner Institute Computing Handbook
GPU profiling provides insight into GPU behavior, which helps identify and fix performance bottlenecks. The following steps are performed iteratively until the desired performance is achieved: ... Fix the bottlenecks and optimize the code. Fig. 19.5: Profiling Loop to Optimize Code. PyTorch profiler is ...
GitHub
github.com › intel › pti-gpu
GitHub - intel/pti-gpu: Profiling Tools Interfaces for GPU (PTI for GPU)
Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysis on Intel(R) Processor Graphics easily.
Starred by 267 users
Forked by 69 users
Languages C++ 82.5% | Python 10.1% | CMake 3.6% | C 3.4% | Dockerfile 0.2% | Makefile 0.1%
Medium
aditya-sunjava.medium.com › optimizing-gpu-performance-a-comprehensive-guide-to-profiling-tools-and-techniques-90cac941a088
Optimizing GPU Performance: A Comprehensive Guide to Profiling Tools and Techniques | by Aditya Bhuyan | Medium
September 17, 2025 - nvprof: A command-line profiling tool that provides detailed statistics on CUDA kernel execution, memory transfers, and API calls. ... AMD Radeon Developer Tool Suite (GPU PerfAPI and GPU PerfStudio): A set of profiling tools for AMD GPUs, including GPU PerfAPI for low-level performance counter access and GPU PerfStudio for a graphical profiling interface.
Reddit
reddit.com › r/cuda › using nvidia tools for profiling
r/CUDA on Reddit: Using Nvidia tools for profiling
March 7, 2025 - I've written a guide on using NVIDIA tools (Nsight Systems, Nsight Compute, ...) from zero to hero. Here are the contents:
Fix-Bug
Chapter01: Introduction to Nsight Systems - Nsight Compute
Chapter02: Cuda toolkit - Cuda driver
Chapter03: NVIDIA Compute Sanitizer Part 1
Chapter04: NVIDIA Compute Sanitizer Part 2
Chapter05: Global Memory Coalescing
Chapter06: Warp Scheduler
Chapter07: Occupancy Part 1
Chapter08: Occupancy Part 2
Chapter09: Bandwidth - Throughput - Latency
Chapter10: Compute Bound - Memory Bound
GitHub
github.com › JeremyMain › GPUProfiler
GitHub - JeremyMain/GPUProfiler: GPUProfiler - Understand your application and workflow resource requirements · GitHub
GPUProfiler was created to accelerate analysis of resource utilization within physical environments to allow for better resource sizing for virtual GPU environments and troubleshoot performance issues. I needed a small tool to understand existing system configuration and performance metrics that impact the sizing decision making process.
Starred by 313 users
Forked by 20 users
eunomia
eunomia.dev › home › blog › 2025 › 04 › 21 › gpu profiling under the hood an implementation focused survey of modern accelerator tracing tools
GPU Profiling Under the Hood: An Implementation-Focused Survey of Modern Accelerator Tracing Tools | eunomia
July 11, 2025 - In practice, this means it relies on perf_event_open for minimal-overhead sampling, requiring an appropriate perf_event_paranoid setting on Linux. For GPU tracing, Nsight Systems interfaces with NVIDIA’s CUDA Profiling Tools Interface (CUPTI) to receive callback “activity” records whenever GPU operations occur (kernel launches, memory copies, etc.).
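The `perf_event_paranoid` requirement mentioned in this snippet can be checked before starting a profiling session. The following is a minimal sketch, assuming a Linux host that exposes the setting at the standard procfs path `/proc/sys/kernel/perf_event_paranoid`. The helper names `read_perf_event_paranoid` and `describe_level` are hypothetical, and the level descriptions paraphrase the Linux kernel sysctl documentation. The level a particular profiler needs is not stated in the snippet, so check that tool's own documentation.

```python
from pathlib import Path
from typing import Optional

# Standard procfs location of the setting (assumption: Linux host).
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def read_perf_event_paranoid(path: Path = PARANOID_PATH) -> Optional[int]:
    """Return the current level, or None if the file is missing or unparsable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def describe_level(level: Optional[int]) -> str:
    """Summarize what a level restricts for unprivileged users (hypothetical helper)."""
    if level is None:
        return "unavailable (not Linux, or the file could not be read)"
    if level <= -1:
        return "almost all events are allowed for all users"
    if level == 0:
        return "raw tracepoint access is restricted"
    if level == 1:
        return "CPU event access is also restricted"
    # Level 2 and above: kernel profiling is restricted as well.
    return "kernel profiling is also restricted"


if __name__ == "__main__":
    current = read_perf_event_paranoid()
    print(f"perf_event_paranoid = {current}: {describe_level(current)}")
```

Returning `None` instead of raising keeps the check usable on systems without procfs, where CPU sampling through `perf_event_open` is not available in the first place.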
NVIDIA Developer
developer.nvidia.com › cuda-profiling-tools-interface
NVIDIA CUDA Profiling Tools Interface (CUPTI) - CUDA Toolkit | NVIDIA Developer
Using these CUPTI APIs, independent software developers can create profiling tools that provide low and deterministic profiling overhead on the target system, while giving insight into the CPU and GPU behavior of CUDA applications.
ROCm Blogs
rocm.blogs.amd.com › software-tools-optimization › profilers › README.html
Introduction to profiling tools for AMD hardware - ROCm™ Blogs
April 10, 2026 - We begin by identifying the architecture and operating systems supported by each of the profiling tools provided by AMD. Almost all the tools in Table 1 support Linux® distros, and with the growing popularity of Instinct™ GPUs, every tool has some capability to profile codes running on CDNA™ architecture.
Eunomia
eunomia.dev › others › cuda-tutorial › 08-profiling-tracing
CUDA GPU Profiling and Tracing | eunomia
May 25, 2025 - Nsight Compute is an interactive kernel profiler for CUDA applications: Detailed kernel metrics: SM utilization, memory throughput, instruction mix · Guided analysis: Provides optimization recommendations · Roofline analysis: Shows performance relative to hardware limits · Kernel comparison: Compare kernels across runs or hardware platforms · Legacy tools (deprecated but still useful for older CUDA versions):
GitHub
github.com › NVIDIA › PyProf
GitHub - NVIDIA/PyProf: A GPU performance profiling tool for PyTorch models · GitHub
Starred by 511 users
Forked by 50 users
Languages Python 95.8% | Shell 3.6% | Dockerfile 0.6%