TL;DR:

The up-to-date Nvidia tool to optimize a single compute kernel is Nsight Compute.


Details:

nvprof and nvvp are legacy profilers, while the Nsight profilers are newer and are regularly updated with new features. So as long as the Nsight profilers support your GPU architecture, you should probably use them.

I do not know exactly how Nvidia categorizes its software products into "Nsight" or not, but Nsight is certainly not a single product or piece of software, and not everything called "Nsight" has something to do with profiling. As you noted, there are multiple IDE plugins under this moniker which give better syntax highlighting, a debugging GUI (wrapping cuda-gdb), etc.

The two profilers available for the compute context (as opposed to 3D/ray tracing, which is covered by "Nsight Graphics") are Nsight Systems (nsys) and Nsight Compute (ncu). Both can be called in CLI mode to collect data on a remote server, or through a GUI (nsys-ui and ncu-ui) to view previously collected data or to collect data interactively.
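To make the CLI workflow a bit more concrete, here is a minimal sketch in Python that only assembles the two command lines. The application path `./my_app`, its arguments, the kernel name `my_kernel` and the report names are placeholders I made up. The flags used (`nsys profile -o`, `ncu -o`, `--kernel-name`, `--launch-count`) exist in recent versions as far as I know, but check `nsys profile --help` and `ncu --help` for the version you have installed.

```python
import shlex
import shutil


def nsys_command(app, report="timeline"):
    """Command line for a whole-application timeline with Nsight Systems."""
    return ["nsys", "profile", "-o", report, *app]


def ncu_command(app, report="kernel", kernel_name=None, launch_count=None):
    """Command line for per-kernel metrics with Nsight Compute."""
    cmd = ["ncu", "-o", report]
    if kernel_name is not None:
        # Restrict profiling to kernels with this name.
        cmd += ["--kernel-name", kernel_name]
    if launch_count is not None:
        # Stop collecting after this many profiled kernel launches.
        cmd += ["--launch-count", str(launch_count)]
    return cmd + list(app)


if __name__ == "__main__":
    # Hypothetical application and kernel name, for illustration only.
    app = ["./my_app", "--size", "1024"]
    commands = (
        nsys_command(app),
        ncu_command(app, kernel_name="my_kernel", launch_count=1),
    )
    for cmd in commands:
        status = "found" if shutil.which(cmd[0]) else "not on PATH"
        print(f"[{cmd[0]}: {status}] {shlex.join(cmd)}")
```

The script prints the commands instead of running them, so you can paste them into a shell on the remote server. The report files they write can then be copied to your local machine and opened in nsys-ui or ncu-ui respectively.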

Nsight Systems gives you a timeline of the whole application, i.e. in OP's words it is the tool to "minimize bottlenecks between multiple kernel invocations/data transfers, etc.", and is therefore not what OP is searching for.

For more information on the relation between the legacy and Nsight profilers, see the Nvidia blog post "Migrating to NVIDIA Nsight Tools from NVVP and Nvprof".

Answer from paleonix on Stack Exchange