🌐
NVIDIA Developer
developer.nvidia.com › nsight-systems
Nsight Systems | NVIDIA Developer
Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and command-line tool. It also provides a customizable, data-driven user interface and metric collection that can be extended with analysis scripts for post-processing results. ... Nsight Aftermath SDK is a library that integrates into a D3D12 or Vulkan game’s crash reporter to generate GPU “mini-dumps” when an exception or TDR occurs, exposing pipeline information to resolve an unexpected crash.
🌐
NERSC
docs.nersc.gov › tools › performance › nvidiaproftools
NVIDIA Profiling Tools - NERSC Documentation
NVIDIA Data Center GPU Manager (DCGM) is a lightweight tool for measuring and monitoring GPU utilization and for running comprehensive diagnostics on the GPU nodes of a cluster. NERSC will be using this tool to measure application utilization and monitor the status of the machine.
🌐
AMD GPUOpen
gpuopen.com › rgp
AMD Radeon™ GPU Profiler - AMD GPUOpen
We've released an updated AMD Radeon™ Developer Tool Suite, including enhanced versions of tools such as the Radeon GPU Detective, Profiler, Raytracing Analyzer, Memory Visualizer, GPU Analyzer, and Developer Panel, now with expanded GPU support.
🌐
NVIDIA Developer
developer.nvidia.com › performance-analysis-tools
Nsight Developer Tools | NVIDIA Developer
Nsight VSCE lets you build and debug GPU kernels and native CPU code, as well as inspect the state of the GPU and memory. Learn More · Nsight Tools JupyterLab Extension allows profiling Python and other supported languages directly in JupyterLab using Nsight Systems and Nsight Compute.
🌐
NVIDIA Developer
developer.nvidia.com › nvidia-visual-profiler
NVIDIA Visual Profiler | NVIDIA Developer
The NVIDIA Visual Profiler is a cross-platform performance profiling tool that delivers developers vital feedback for optimizing CUDA C/C++ applications. First introduced in 2008, Visual Profiler supports all 350 million+ CUDA capable NVIDIA ...
🌐
Harvard
handbook.eng.kempnerinstitute.harvard.edu › s5_ai_scaling_and_engineering › scalability › gpu_profiling.html
19.3. GPU Profiling — Kempner Institute Computing Handbook
GPU profiling gives insight into GPU behavior, helping to identify and fix performance bottlenecks. The following steps are performed iteratively until the desired performance is achieved: ... Fix the bottlenecks and optimize the code. Fig. 19.5 Profiling Loop to Optimize Code. PyTorch profiler is ...
🌐
GitHub
github.com › intel › pti-gpu
GitHub - intel/pti-gpu: Profiling Tools Interfaces for GPU (PTI for GPU)
Profiling Tools Interfaces for GPU (PTI for GPU) is a set of getting-started documentation and a tools library that make it easy to start performance analysis on Intel(R) Processor Graphics.
Starred by 267 users
Forked by 69 users
Languages   C++ 82.5% | Python 10.1% | CMake 3.6% | C 3.4% | Dockerfile 0.2% | Makefile 0.1%
🌐
Medium
aditya-sunjava.medium.com › optimizing-gpu-performance-a-comprehensive-guide-to-profiling-tools-and-techniques-90cac941a088
Optimizing GPU Performance: A Comprehensive Guide to Profiling Tools and Techniques | by Aditya Bhuyan | Medium
September 17, 2025 - nvprof: A command-line profiling tool that provides detailed statistics on CUDA kernel execution, memory transfers, and API calls. ... AMD Radeon Developer Tool Suite (GPU PerfAPI and GPU PerfStudio): A set of profiling tools for AMD GPUs, including GPU PerfAPI for low-level performance counter access and GPU PerfStudio for a graphical profiling interface.
🌐
Reddit
reddit.com › r/cuda › using nvidia tools for profiling
r/CUDA on Reddit: Using Nvidia tools for profiling
March 7, 2025 - I've written a guide on using NVIDIA tools (Nsight Systems, Nsight Compute, ...) from zero to hero. Here are the contents:

Fix-Bug
Chapter01: Introduction to Nsight Systems - Nsight Compute
Chapter02: CUDA toolkit - CUDA driver
Chapter03: NVIDIA Compute Sanitizer Part 1
Chapter04: NVIDIA Compute Sanitizer Part 2
Chapter05: Global Memory Coalescing
Chapter06: Warp Scheduler
Chapter07: Occupancy Part 1
Chapter08: Occupancy Part 2
Chapter09: Bandwidth - Throughput - Latency
Chapter10: Compute Bound - Memory Bound

🌐
GitHub
github.com › JeremyMain › GPUProfiler
GitHub - JeremyMain/GPUProfiler: GPUProfiler - Understand your application and workflow resource requirements · GitHub
GPUProfiler was created to accelerate analysis of resource utilization within physical environments to allow for better resource sizing for virtual GPU environments and troubleshoot performance issues. I needed a small tool to understand existing system configuration and performance metrics that impact the sizing decision making process.
Starred by 313 users
Forked by 20 users
🌐
eunomia
eunomia.dev › home › blog › 2025 › 04 › 21 › gpu profiling under the hood an implementation focused survey of modern accelerator tracing tools
GPU Profiling Under the Hood: An Implementation-Focused Survey of Modern Accelerator Tracing Tools | eunomia
July 11, 2025 - In practice, this means it relies on perf_event_open for minimal-overhead sampling, requiring an appropriate perf_event_paranoid setting on Linux. For GPU tracing, Nsight Systems interfaces with NVIDIA’s CUDA Profiling Tools Interface (CUPTI) to receive callback “activity” records whenever GPU operations occur (kernel launches, memory copies, etc.).
🌐
Microsoft Learn
learn.microsoft.com › en-us › visualstudio › profiling › gpu-usage
Use the GPU Usage tool in the Performance Profiler - Visual Studio (Windows) | Microsoft Learn
October 30, 2025 - Use the GPU Usage tool in the Performance Profiler to better understand the high-level hardware usage of your Direct3D app. It helps you see whether the performance of your app is CPU-bound or GPU-bound, and gain insight into how you can use ...
🌐
Polar Signals
polarsignals.com › blog › posts › 2025 › 04 › 01 › introducing-continuous-gpu-profiling
Maximize GPU Efficiency: Visualize, Analyze, and Optimize with Precision
April 1, 2025 - Polar Signals provides always-on, continuous profiling of your GPU usage, capturing transient issues and long-term trends without needing manual intervention for every run. Complementary to Deep Dive Tools: Tools like NVIDIA Nsight are powerful ...
🌐
NVIDIA Developer
developer.nvidia.com › cuda-profiling-tools-interface
NVIDIA CUDA Profiling Tools Interface (CUPTI) - CUDA Toolkit | NVIDIA Developer
Using these CUPTI APIs, independent software developers can create profiling tools that provide low and deterministic profiling overhead on the target system, while giving insight into the CPU and GPU behavior of CUDA applications.
🌐
ROCm Blogs
rocm.blogs.amd.com › software-tools-optimization › profilers › README.html
Introduction to profiling tools for AMD hardware - ROCm™ Blogs
April 10, 2026 - We begin by identifying the architecture and operating systems supported by each of the profiling tools provided by AMD. Almost all the tools in Table 1 support Linux® distros and with the gaining popularity of Instinct™ GPUs, every tool has some capability to profile codes running on CDNA™ architecture.
🌐
Eunomia
eunomia.dev › others › cuda-tutorial › 08-profiling-tracing
CUDA GPU Profiling and Tracing | eunomia
May 25, 2025 - Nsight Compute is an interactive kernel profiler for CUDA applications: Detailed kernel metrics: SM utilization, memory throughput, instruction mix · Guided analysis: Provides optimization recommendations · Roofline analysis: Shows performance relative to hardware limits · Kernel comparison: Compare kernels across runs or hardware platforms · Legacy tools (deprecated but still useful for older CUDA versions):
🌐
Manning Publications
livebook.manning.com › book › parallel-and-high-performance-computing › chapter-13 › v-9
13 GPU profiling and tools · Parallel and High Performance Computing livebook
July 26, 2021 - Available profiling tools for the GPU · A sample workflow for using these tools on a shallow water simulation application · An understanding of the results that GPU profiling tools return
🌐
DEV Community
dev.to › adityabhuyan › optimizing-gpu-performance-a-comprehensive-guide-to-profiling-tools-and-techniques-1k20
Optimizing GPU Performance: A Comprehensive Guide to Profiling Tools and Techniques - DEV Community
September 17, 2025 - Intel GPA (Graphics Performance Analyzers): A suite of tools for analyzing and optimizing graphics performance on Intel GPUs. ... APEX (AMD Performance Experiments): An open-source, cross-platform profiling tool that supports multiple GPU vendors.
🌐
GitHub
github.com › NVIDIA › PyProf
GitHub - NVIDIA/PyProf: A GPU performance profiling tool for PyTorch models · GitHub
June 30, 2021 - PyProf is a tool that profiles and analyzes the GPU performance of PyTorch models.
Starred by 511 users
Forked by 50 users
Languages   Python 95.8% | Shell 3.6% | Dockerfile 0.6%
🌐
NVIDIA
resources.nvidia.com › en-us-nsight-developer-tools › cuda-tutorials
CUDA Tutorials | Profiling and Debugging Applications
This video will get you started with the ecosystem of tools that CUDA developers equip themselves with to build applications at any scale.