NVIDIA Developer
developer.nvidia.com › nvidia-visual-profiler
NVIDIA Visual Profiler | NVIDIA Developer
The NVIDIA Visual Profiler is a cross-platform performance profiling tool that delivers developers vital feedback for optimizing CUDA C/C++ applications. First introduced in 2008, Visual Profiler supports all 350 million+ CUDA capable NVIDIA ...
CUDA GPU Compute Capability
Compute capability defines the hardware features and supported instructions for each NVIDIA GPU architecture.
Free Tools and Training
Get access to SDKs, trainings, and connect with developers.
CUDA Toolkit 13.3 Downloads
Get the latest feature updates to NVIDIA's compute stack, including compatibility support for NVIDIA Open GPU Kernel Modules and lazy loading support.
GitHub
github.com › JeremyMain › GPUProfiler
GitHub - JeremyMain/GPUProfiler: GPUProfiler - Understand your application and workflow resource requirements · GitHub
GPUProfiler is not a source code profiler but a resource and utilization profile that can provide a snapshot of a system and select resource utilization metrics over a period of time.
Starred by 313 users
Forked by 20 users
Basic GPU profiling tool?
Dear Experts, I’m looking for a tool to do some very rough profiling of an OpenGL application on a Jetson Nano or NX. Really I’d just like to see something like “vertex shader 5%, fragment shader 15%, idle 80%”, just to give me some idea of utilisation. More on forums.developer.nvidia.com
Using Nvidia tools for profiling
This is fantastic! Profiling can be such a pain, especially for newcomers, so your guide is going to be a real lifesaver for a lot of us. I love how you broke it down into manageable chapters – really makes it easier to digest. The sections on memory coalescing and understanding occupancy are super helpful. Have you had any feedback from users yet? I'm definitely diving into this; can't wait to see how much performance I can squeeze out of my kernels using the insights from your guide! Keep up the great work! More on reddit.com
profiling - Best CUDA profiler - Software Recommendations Stack Exchange
Nvidia Visual Profiler (nvvp) to view a profile saved by nvprof. As near as I can tell this only gives the timeline of when kernels launched/completed, and some other coarse information about what the GPU is up to (PCIe data transfers, etc.). More on softwarerecs.stackexchange.com
Videos
01:01:19
Mastering Nvidia Nsight GPU Profiling - YouTube
03:58
Python Profiling: NVIDIA Nsight Tools Feature Spotlight - YouTube
10:31
CUDA Tutorials | Profiling and Debugging Applications - YouTube
Intro to NVIDIA Nsight Compute | CUDA Developer Tools
21:23
Infrastructure-wide profiling of NVIDIA CUDA | Ubuntu Summit 25.10 ...
NERSC
docs.nersc.gov › tools › performance › nvidiaproftools
NVIDIA Profiling Tools - NERSC Documentation
NVIDIA Data Center GPU Manager (DCGM) is a lightweight tool for measuring and monitoring GPU utilization and for running comprehensive diagnostics on the GPU nodes of a cluster. NERSC will be using this tool to measure application utilization and monitor the status of the machine.
Harvard
handbook.eng.kempnerinstitute.harvard.edu › s5_ai_scaling_and_engineering › scalability › gpu_profiling.html
19.3. GPU Profiling — Kempner Institute Computing Handbook
Alternatively, Nvidia Nsight Systems and Nsight Compute can be used together to analyze and visualize an application's behavior. Nsight Systems (high-level profiling) examines the code as a whole to find problems (e.g., with host-device communication or GPU kernels) and identifies the slowest or most time-consuming kernel(s). Nsight Compute (kernel-specific profiling) then dives into the details of the identified kernel(s) to help with optimizing, debugging, and fixing the issue.
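The two-stage workflow described in this snippet can be sketched as a pair of command lines. This is a minimal sketch, not the handbook's own procedure: it only assembles the commands and assumes the `nsys` and `ncu` command-line tools are installed. The application path `./my_app`, the kernel name `my_kernel`, and the report names are hypothetical placeholders.

```python
import shlex


def nsys_command(app_argv, report="timeline"):
    """Stage 1: whole-application timeline with Nsight Systems.

    --stats=true asks nsys to print summary tables after the run, which is
    one way to spot the kernels that take the most time.
    """
    return ["nsys", "profile", "--stats=true", "-o", report, *app_argv]


def ncu_command(app_argv, kernel_name, report="kernel"):
    """Stage 2: detailed metrics for one kernel with Nsight Compute.

    --kernel-name restricts collection to the kernel picked in stage 1.
    """
    return ["ncu", "--kernel-name", kernel_name, "-o", report, *app_argv]


if __name__ == "__main__":
    app = ["./my_app"]  # hypothetical application
    print(shlex.join(nsys_command(app)))
    # "my_kernel" is a hypothetical name taken from the stage 1 summary.
    print(shlex.join(ncu_command(app, "my_kernel")))
```

Each list can be passed to `subprocess.run` unchanged. Keeping the commands as argument lists rather than strings avoids shell quoting problems when the application takes its own arguments.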
DeepTalk
deeptalk.lambda.ai › technical help
Nvidia Profiling support - Technical Help - DeepTalk - Deep Learning Community
September 29, 2025 - Hi, Does Lambda currently have any support or plan to offer support for GPU profiling via nsight compute? I’ve seen a past question about this ( Running Nvidia NSight - permissions issue ). My understanding is that this is currently unsupported because NVIDIA GPU performance counters aren’t ...
Medium
medium.com › @shaginhekvs › from-zero-to-hero-in-gpu-performance-profiling-optimization-e03da271ff18
From Zero to Hero in GPU performance profiling & optimization | by Keshav Singh | Medium
May 5, 2026 - The goal: profile vLLM serving the Qwen/Qwen2.5-1.5B-Instruct model on an NVIDIA L4 server with a load of cline client on my personal laptop. I was trying out cline with my own server hosted via vLLM, and performance was not as responsive as I'm used to with the APIs of OpenAI, xGrok etc. I would have expected a small 1.5B model on an L4 GPU server to be at least similarly responsive to these APIs.
NVIDIA Developer
developer.nvidia.com › nsight-systems
Nsight Systems | NVIDIA Developer
Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and command-line tool. It also provides a customizable, data-driven user interface and metric collection that can be extended with analysis scripts for post-processing results. ... Nsight Aftermath SDK is a library that integrates into a D3D12 or Vulkan game's crash reporter to generate GPU "mini-dumps" when an exception or TDR occurs, exposing pipeline information to resolve an unexpected crash.
Reddit
reddit.com › r/cuda › using nvidia tools for profiling
r/CUDA on Reddit: Using Nvidia tools for profiling
March 7, 2025 - I've written a guide on using Nvidia tools (Nsight Systems, Nsight Compute, ...) from zero to hero. Here are the contents:
Fix-Bug
Chapter01: Introduction to Nsight Systems - Nsight Compute
Chapter02: Cuda toolkit - Cuda driver
Chapter03: NVIDIA Compute Sanitizer Part 1
Chapter04: NVIDIA Compute Sanitizer Part 2
Chapter05: Global Memory Coalescing
Chapter06: Warp Scheduler
Chapter07: Occupancy Part 1
Chapter08: Occupancy Part 2
Chapter09: Bandwidth - Throughput - Latency
Chapter10: Compute Bound - Memory Bound
NVIDIA
docs.nvidia.com › cuda › profiler-users-guide
1. Preparing An Application For Profiling — Profiler 12.9 documentation
May 31, 2025 - This document describes NVIDIA profiling tools that enable you to understand and optimize the performance of your CUDA, OpenACC or OpenMP applications. The Visual Profiler is a graphical profiling tool that displays a timeline of your application’s CPU and GPU activity, and that includes an automated analysis engine to identify optimization opportunities.
Polar Signals
polarsignals.com › blog › posts › 2025 › 10 › 22 › gpu-profiling
Continuous NVIDIA CUDA Profiling In Production | Polar Signals
October 22, 2025 - Our solution combines the CUPTI profiling API with USDT probes and eBPF into a simple pipeline, making what we believe to be the world's first open-source, low-overhead, always-on GPU profiler. At the heart of our solution is parcagpu, a shim library that intercepts CUDA API calls. Using CUDA's CUDA_INJECTION64_PATH mechanism, we can inject this library into any CUDA application without modification:
export CUDA_INJECTION64_PATH=/path/to/libparcagpucupti.so
./your_cuda_application
Our goal is zero instrumentation, but this approach is probably as close as we can realistically get. The library uses NVIDIA's CUPTI (CUDA Profiling Tools Interface) to:
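The injection mechanism quoted in this snippet amounts to setting one environment variable before the application starts. The following is a minimal sketch of doing that from a launcher script instead of an interactive shell. It is not part of the Polar Signals tooling; the helper names `injection_env` and `run_with_injection` are hypothetical, and the library and application paths are the placeholders from the snippet.

```python
import os
import subprocess


def injection_env(library_path, base_env=None):
    """Return a copy of the environment with CUDA_INJECTION64_PATH set.

    The caller's mapping is copied, so the variable only affects the
    child process that receives the returned environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_INJECTION64_PATH"] = library_path
    return env


def run_with_injection(library_path, app_argv):
    """Launch an unmodified CUDA application with the library injected."""
    return subprocess.run(app_argv, env=injection_env(library_path), check=False)


if __name__ == "__main__":
    # Placeholder paths taken from the quoted snippet.
    env = injection_env("/path/to/libparcagpucupti.so")
    print(env["CUDA_INJECTION64_PATH"])
    # run_with_injection("/path/to/libparcagpucupti.so", ["./your_cuda_application"])
```

Scoping the variable to a single child process keeps other CUDA programs started from the same session from loading the library unintentionally.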
NVIDIA Developer
developer.nvidia.com › cuda-profiling-tools-interface
NVIDIA CUDA Profiling Tools Interface (CUPTI) - CUDA Toolkit | NVIDIA Developer
GPU workload trace for the activities happening on the GPU, which includes kernel executions, memory operations (e.g., Host-to-Device memory copies) and memset operations.
NVIDIA
docs.nvidia.com › nsight-compute › ProfilingGuide › index.html
2. Profiling Guide — NsightCompute 13.3 documentation
April 14, 2026 - The tool inserts its measurement ... the profiler to intercept communication with the CUDA user-mode driver. In addition, when a kernel launch is detected, the libraries can collect the requested performance metrics from the GPU. The results are then transferred back to the frontend. ... Collection of performance metrics is the key feature of NVIDIA Nsight ...
GitHub
github.com › Orbmu2k › nvidiaProfileInspector › releases
Releases · Orbmu2k/nvidiaProfileInspector
2 weeks ago - Added NVIDIA GPU startup check | @Orbmu2k · 'Profile selected' snackbar notification on drop | @Orbmu2k · Add profile to modified profile list when changes applied | @emoose · Fix favorite toggling invalidating setting item values #414 | @Orbmu2k · Prevent unintended switch to global profile in specific cases #415| @Orbmu2k ·
Author Orbmu2k
Wisc
hep.wisc.edu › cms › comp › gpuprofiling.html
Profiling CUDA Kernels with NVIDIA NSight Compute | University of Wisconsin–Madison
There is an ongoing effort to ... a complete overview of the MadGraph4GPU project and its codebase, see the highlighted links. We can use the NVIDIA NSight software suite to profile GPU kernels in the CUDA version....
YouTube
youtube.com › watch
Lecture 44: NVIDIA Profiling - YouTube
Published February 16, 2025
Juliagpu
cuda.juliagpu.org › v2.5 › development › profiling
Profiling · CUDA.jl
$ nvprof --profile-from-start off julia
julia> using CUDA
julia> a = CUDA.rand(1024,1024,1024);
julia> sin.(a);
julia> CUDA.@profile sin.(a);
julia> exit()
==156406== Profiling application: julia
==156406== Profiling result:
            Type  Time(%)      Time  Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  44.777ms      1  44.777ms  44.777ms  44.777ms  ptxcall_broadcast_1
      API calls:   56.46%  6.6544ms      1  6.6544ms  6.6544ms  6.6544ms  cuMemAlloc
                   43.52%  5.1286ms      1  5.1286ms  5.1286ms  5.1286ms  cuLaunchKernel
                    0.01%  1.3200us      1  1.3200us  1.3200us  1.3200us  cuDeviceGetCount
                    0.01%     725ns      3     241ns     196ns     301ns  cuCtxGetCurrent
For a visual overview of these results, you can use the NVIDIA Visual Profiler (nvvp):
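The nvprof summary in this snippet is plain text, so it can be post-processed with a small script. The following is a minimal sketch that parses the rows into records. It assumes the summary keeps the column order shown here (Time(%), Time, Calls, Avg, Min, Max, Name); the function names and the `SAMPLE` constant are illustrative and not part of nvprof or CUDA.jl.

```python
import re

# Sample rows copied from the nvprof output quoted above.
SAMPLE = """\
            Type  Time(%)      Time  Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  44.777ms      1  44.777ms  44.777ms  44.777ms  ptxcall_broadcast_1
      API calls:   56.46%  6.6544ms      1  6.6544ms  6.6544ms  6.6544ms  cuMemAlloc
                   43.52%  5.1286ms      1  5.1286ms  5.1286ms  5.1286ms  cuLaunchKernel
                    0.01%  1.3200us      1  1.3200us  1.3200us  1.3200us  cuDeviceGetCount
                    0.01%     725ns      3     241ns     196ns     301ns  cuCtxGetCurrent
"""

# One summary row; the "Category:" prefix appears only on the first row of a group.
ROW = re.compile(
    r"^\s*(?:(?P<category>[A-Za-z ]+):)?\s*"
    r"(?P<pct>\d+(?:\.\d+)?)%\s+(?P<time>\S+)\s+(?P<calls>\d+)\s+"
    r"(?P<avg>\S+)\s+(?P<min>\S+)\s+(?P<max>\S+)\s+(?P<name>.+?)\s*$"
)

# Conversion factors from each unit suffix to microseconds.
UNIT_TO_US = {"ns": 1e-3, "us": 1.0, "ms": 1e3, "s": 1e6}


def to_microseconds(text):
    """Convert a duration such as '44.777ms' or '725ns' to microseconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ns|us|ms|s)", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return float(match.group(1)) * UNIT_TO_US[match.group(2)]


def parse_nvprof_summary(text):
    """Return one dict per summary row, carrying the category forward."""
    rows, category = [], None
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, banner, or blank line
        if match.group("category"):
            category = match.group("category").strip()
        rows.append({
            "category": category,
            "percent": float(match.group("pct")),
            "time_us": to_microseconds(match.group("time")),
            "calls": int(match.group("calls")),
            "name": match.group("name"),
        })
    return rows


if __name__ == "__main__":
    for row in parse_nvprof_summary(SAMPLE):
        print(row["category"], row["name"], row["calls"])
```

Normalizing every duration to one unit makes rows comparable even though nvprof mixes ns, us, and ms in the same table.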