🌐
NVIDIA Developer
developer.nvidia.com › nvidia-visual-profiler
NVIDIA Visual Profiler | NVIDIA Developer
The NVIDIA Visual Profiler is a cross-platform performance profiling tool that gives developers vital feedback for optimizing CUDA C/C++ applications. First introduced in 2008, Visual Profiler supports all 350 million+ CUDA-capable NVIDIA ...
CUDA GPU Compute Capability
Compute capability defines the hardware features and supported instructions for each NVIDIA GPU architecture.
Free Tools and Training
Get access to SDKs, trainings, and connect with developers.
CUDA Toolkit 13.3 Update 1 Downloads
Get the latest feature updates to NVIDIA's compute stack, including compatibility support for NVIDIA Open GPU Kernel Modules and lazy loading support.
🌐
GitHub
github.com › JeremyMain › GPUProfiler
GitHub - JeremyMain/GPUProfiler: GPUProfiler - Understand your application and workflow resource requirements · GitHub
GPUProfiler is not a source code profiler but a resource and utilization profile that can provide a snapshot of a system and select resource utilization metrics over a period of time.
Starred by 313 users
Forked by 20 users
🌐
NERSC
docs.nersc.gov › tools › performance › nvidiaproftools
NVIDIA Profiling Tools - NERSC Documentation
NVIDIA Data Center GPU Manager (DCGM) is a lightweight tool for measuring and monitoring GPU utilization and for running comprehensive diagnostics of GPU nodes on a cluster. NERSC will be using this tool to measure application utilization and monitor the status of the machine.
🌐
NVIDIA
docs.nvidia.com › cuda › profiler-users-guide
1. Preparing An Application For Profiling — Profiler 12.9 documentation
May 31, 2025 - This document describes NVIDIA profiling tools that enable you to understand and optimize the performance of your CUDA, OpenACC or OpenMP applications. The Visual Profiler is a graphical profiling tool that displays a timeline of your application’s CPU and GPU activity, and that includes ...
🌐
NVIDIA Developer
developer.nvidia.com › nsight-systems
Nsight Systems | NVIDIA Developer
Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and command-line tool. It also provides a customizable, data-driven user interface and metric collection that can be extended with analysis scripts for post-processing results. ... Nsight Aftermath SDK is a library that integrates into a D3D12 or Vulkan game’s crash reporter to generate GPU “mini-dumps” when an exception or TDR occurs, exposing pipeline information to resolve an unexpected crash.
🌐
NVIDIA Developer
developer.nvidia.com › performance-analysis-tools
Nsight Developer Tools | NVIDIA Developer
Nsight VSCE lets you build and debug GPU kernels and native CPU code, as well as inspect the state of the GPU and memory. Learn More · Nsight Tools JupyterLab Extension allows profiling Python and other supported languages directly in JupyterLab using Nsight Systems and Nsight Compute.
🌐
NVIDIA Developer
developer.nvidia.com › cuda-profiling-tools-interface
NVIDIA CUDA Profiling Tools Interface (CUPTI) - CUDA Toolkit | NVIDIA Developer
GPU workload trace for the activities happening on the GPU, which includes kernel executions, memory operations (e.g., Host-to-Device memory copies) and memset operations.
🌐
NVIDIA
docs.nvidia.com › cuda › pdf › CUDA_Profiler_Users_Guide.pdf
Profiler Release 12.9 NVIDIA Corporation May 31, 2025
The user manual for NVIDIA profiling tools for optimizing performance of CUDA applications. ... tool that displays a timeline of your application’s CPU and GPU activity, and that includes an auto-
🌐
NVIDIA
docs.nvidia.com › nsight-compute › ProfilingGuide › index.html
2. Profiling Guide — NsightCompute 13.3 documentation
April 14, 2026 - The tool inserts its measurement ... the profiler to intercept communication with the CUDA user-mode driver. In addition, when a kernel launch is detected, the libraries can collect the requested performance metrics from the GPU. The results are then transferred back to the frontend. ... Collection of performance metrics is the key feature of NVIDIA Nsight ...
🌐
Harvard
handbook.eng.kempnerinstitute.harvard.edu › s5_ai_scaling_and_engineering › scalability › gpu_profiling.html
19.3. GPU Profiling — Kempner Institute Computing Handbook
Alternatively, NVIDIA Nsight Systems and Nsight Compute can be used together to analyze and visualize how an application's algorithms behave. Nsight Systems (high-level profiling) examines the code as a whole to find problems (e.g., with host-device communication or GPU kernels) and to identify the non-performant or most time-consuming kernels. Nsight Compute (kernel-specific profiling) then digs into the details of the identified kernels to help with optimizing, debugging, and fixing the issue.
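The two-stage workflow described in this snippet can be sketched as a pair of command lines. The following is a minimal illustration, not part of the Kempner handbook: it only builds the commands and does not run them. The application path `./my_app`, the kernel name `my_kernel`, and the report names are hypothetical placeholders; the helper functions are likewise invented for this sketch. It assumes the `nsys` and `ncu` command-line tools with their `-o`, `--stats`, and `--kernel-name` options.

```python
import shlex


def nsys_command(app, report="overview"):
    """Stage 1: system-wide timeline with Nsight Systems.

    `app` is the application command as a list of arguments.
    `--stats=true` asks nsys to print summary tables after the run,
    which helps pick out the most time-consuming kernels.
    """
    return ["nsys", "profile", "-o", report, "--stats=true", *app]


def ncu_command(app, kernel, report="kernel_detail"):
    """Stage 2: detailed metrics for one kernel with Nsight Compute.

    `kernel` is the name of a kernel identified in stage 1.
    """
    return ["ncu", "--kernel-name", kernel, "-o", report, *app]


if __name__ == "__main__":
    app = ["./my_app"]  # hypothetical application binary
    print(shlex.join(nsys_command(app)))
    print(shlex.join(ncu_command(app, "my_kernel")))  # hypothetical kernel name
```

Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` on a machine where both tools are installed.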
🌐
GitHub
github.com › Orbmu2k › nvidiaProfileInspector › releases
Releases · Orbmu2k/nvidiaProfileInspector
2 days ago - Added NVIDIA GPU startup check | @Orbmu2k · 'Profile selected' snackbar notification on drop | @Orbmu2k · Add profile to modified profile list when changes applied | @emoose · Fix favorite toggling invalidating setting item values #414 | @Orbmu2k · Prevent unintended switch to global profile in specific cases #415| @Orbmu2k ·
Author   Orbmu2k
🌐
NVIDIA Developer
developer.nvidia.com › nsight-graphics
NVIDIA Nsight Graphics | NVIDIA Developer
NVIDIA Nsight Systems is a system-wide ... and tune to scale efficiently across any quantity or size of CPUs and GPUs. ... NVIDIA Nsight Perf SDK is a graphics profiling toolbox that enables you to collect GPU performance metrics ...
🌐
NVIDIA
resources.nvidia.com › en-us-nsight-developer-tools › cuda-tutorials
CUDA Tutorials | Profiling and Debugging Applications
This video will get you started with the ecosystem of tools that CUDA developers equip themselves with to build applications at any scale.
🌐
NVIDIA Developer
developer.nvidia.com › nsight-compute
Nsight Compute | NVIDIA Developer
Nsight Compute’s report pages provide insight into all aspects of a profile. The details page offers metrics that address overall GPU utilization, how performance is connected to various hardware concepts, and concludes with recommended optimization actions.
🌐
Reddit
reddit.com › r/cuda › using nvidia tools for profiling
r/CUDA on Reddit: Using Nvidia tools for profiling
March 7, 2025 -

I've written a guide on using NVIDIA tools (Nsight Systems, Nsight Compute, ...) from zero to hero. Here is the content:

Fix-Bug

Chapter01: Introduction to Nsight Systems - Nsight Compute

Chapter02: Cuda toolkit - Cuda driver

Chapter03: NVIDIA Compute Sanitizer Part 1

Chapter04: NVIDIA Compute Sanitizer Part 2

Chapter05: Global Memory Coalescing

Chapter06: Warp Scheduler

Chapter07: Occupancy Part 1

Chapter08: Occupancy Part 2

Chapter09: Bandwidth - Throughput - Latency

Chapter10: Compute Bound - Memory Bound

🌐
Nvidia
run-ai-docs.nvidia.com › self-hosted › platform-management › monitor-performance › gpu-profiling-metrics
GPU Profiling Metrics | Self-hosted | Run:ai Documentation
April 16, 2026 - This guide describes how to enable advanced GPU profiling metrics from NVIDIA Data Center GPU Manager (DCGM).
🌐
Nvidia Profile Inspector
nvidiaprofileinspector.com › nvidia profile inspector download
Download Nvidia Profile Inspector - Optimize Graphics & FPS
1 week ago - Download the latest Nvidia Profile Inspector to manage GPU settings, clock speeds, and fan profiles. A free, lightweight utility guide.
🌐
NVIDIA Developer
developer.nvidia.com › blog › migrating-from-range-profiler-to-gpu-trace-in-nsight-graphics
Migrating from Range Profiler to GPU Trace in Nsight Graphics | NVIDIA Technical Blog
September 4, 2024 - The Nsight Graphics GPU Trace Profiler activity provides the same or better levels of information as the Range Profiler. In most cases, metrics are displayed over time, rather than as a single number, revealing the real-time performance characteristics of concurrent GPU workloads. NVIDIA continues to develop and improve GPU Trace, helping you to extract maximum performance on each new powerful architecture and programming model.
🌐
Medium
medium.com › @shaginhekvs › from-zero-to-hero-in-gpu-performance-profiling-optimization-e03da271ff18
From Zero to Hero in GPU performance profiling & optimization | by Keshav Singh | Medium
May 5, 2026 - The goal: profile vLLM serving the Qwen/Qwen2.5-1.5B-Instruct model on an NVIDIA L4 server, with load from a cline client on my personal laptop. I was trying out cline with my own server hosted via vLLM, and performance was not as responsive as I'm used to with the APIs of OpenAI, xGrok etc. I would have expected a small 1.5B model on an L4 GPU server to be at least similarly responsive to these APIs.
🌐
Readthedocs
gpuhackshef.readthedocs.io › en › latest › tools › nvidia-profiling-tools.html
NVIDIA Profiling Tools — GPUHackSheffield documentation
The NVIDIA Visual Profiler is the legacy profiling tool, with full support for GPUs up to Pascal (SM < 75), partial support for Turing (SM 75), and no support for Ampere (SM 80).
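The support levels quoted in this snippet can be written as a small lookup. This is a minimal sketch based only on the thresholds stated above; the function name is hypothetical. Treating every SM version at or above 80 as unsupported is an assumption that extends the snippet, which names only Ampere (SM 80).

```python
def visual_profiler_support(sm: int) -> str:
    """Return the Visual Profiler support level for a compute capability.

    `sm` is the compute capability as an integer, e.g. 61 for SM 6.1.
    Thresholds follow the GPUHackSheffield snippet; values it does not
    mention (76 to 79) are reported as "unspecified".
    """
    if sm < 75:
        return "full"
    if sm == 75:
        return "partial"
    if sm >= 80:
        # Assumption: architectures after Ampere are also unsupported.
        return "none"
    return "unspecified"


if __name__ == "__main__":
    for sm in (61, 75, 80):
        print(sm, visual_profiler_support(sm))
```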