🌐
NVIDIA
docs.nvidia.com › cuda › profiler-users-guide › index.html
1. Preparing An Application For Profiling — Profiler 12.9 documentation
May 31, 2025 - If your CUDA application includes graphics that operate using a display or main loop, care must be taken to call cudaProfilerStop() or cuProfilerStop() before the thread executing that loop calls exit(). Failure to call one of these APIs may result in the loss of some or all of the collected profile data. For some graphics applications, like the ones that use OpenGL, the application exits when the escape key is pressed. In those cases, where calling the above functions before exit is not feasible, use the nvprof option --timeout or set the “Execution timeout” in the Visual Profiler.
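A minimal Python sketch of the same rule, assuming the application is driven from Python and links the CUDA runtime. The profiled work is wrapped so that `cudaProfilerStop()` runs even if the loop exits early or raises. `load_cudart_profiler_calls` and `profiled_region` are hypothetical helpers, and the shared-library name (`libcudart`) is an assumption that varies by platform and CUDA version.

```python
import contextlib
import ctypes
import ctypes.util
from typing import Callable, Iterator, Tuple


def load_cudart_profiler_calls(libname: str = "cudart") -> Tuple[Callable[[], int], Callable[[], int]]:
    """Hypothetical helper: return (start, stop) from the CUDA runtime library.

    Assumes a shared library such as libcudart.so is installed; the exact
    file name depends on the platform and CUDA version.
    """
    path = ctypes.util.find_library(libname) or f"lib{libname}.so"
    cudart = ctypes.CDLL(path)
    # Both functions take no arguments and return a cudaError_t (0 on success).
    cudart.cudaProfilerStart.restype = ctypes.c_int
    cudart.cudaProfilerStop.restype = ctypes.c_int
    return cudart.cudaProfilerStart, cudart.cudaProfilerStop


@contextlib.contextmanager
def profiled_region(start: Callable[[], int], stop: Callable[[], int]) -> Iterator[None]:
    """Start profiling on entry and always stop it on exit, even on error."""
    err = start()
    if err != 0:
        raise RuntimeError(f"cudaProfilerStart failed with error {err}")
    try:
        yield
    finally:
        # Stop (and flush) profiling before control can reach exit().
        stop()
```

When nvprof is launched with `--profile-from-start off` (as in the multiprocess workflow gist listed further down), only the code inside `with profiled_region(*load_cudart_profiler_calls()):` would be captured. Passing the two callables in, rather than loading the library inside the context manager, keeps the stop-on-exit logic testable without a GPU.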
🌐
Mit-satori
mit-satori.github.io › tutorial-examples › nvprof-profiling › index.html
Profiling code with nvprof — MIT Satori User Documentation documentation
git clone https://gist.github.com/de9e934a2315fd2551a794d40255d301.git
cp de9e934a2315fd2551a794d40255d301/vector-add.cu .
rm -fr de9e934a2315fd2551a794d40255d301
nvcc -o vector-add vector-add.cu
bsub -gpu "num=1" -Is bash
nvprof -s -o results.nvprof ./vector-add
🌐
GitHub
gist.github.com › sonots › 5abc0bccec2010ac69ff74788b265086
How to use NVIDIA profiler - Gist - GitHub
$ nvprof --print-gpu-trace python train_mnist.py --network mlp --num-epochs 1
INFO:root:start with arguments Namespace(add_stn=False, batch_size=64, disp_batches=100, dtype='float32', gc_threshold=0.5, gc_type='none', gpus=None, kv_store='device', load_epoch=None, lr=0.05, lr_factor=0.1, lr_step_epochs='10', model_prefix=None, mom=0.9, monitor=0, network='mlp', num_classes=10, num_epochs=1, num_examples=60000, num_layers=None, optimizer='sgd', test_io=0, top_k=0, wd=0.0001)
==27259== NVPROF is profiling process 27259, command: python train_mnist.py --network mlp --num-epochs 1
INFO:root:Epoch[
🌐
Colby College
cs.colby.edu › courses › S14 › cs336 › online_materials › CUDA_Profiler_Users_Guide.pdf pdf
PROFILER USER'S GUIDE DU-05982-001_v5.5 | May 2013
... An event is a countable activity, action, or occurrence on a device. It corresponds to a single hardware counter value which is collected during kernel execution. To see a list of all available events on a particular NVIDIA GPU, type nvprof --query-events.
🌐
Oak Ridge Leadership Computing Facility
olcf.ornl.gov › wp-content › uploads › 2019 › 08 › NVIDIA-Profilers.pdf pdf
Jeff Larkin, August 08, 2019; Some slides courtesy Tom Papatheodore (ORNL)
August 8, 2019 - $ scp USERNAME@login1.ascent.c... your local system. VECTOR ADDITION EXAMPLE – VISUAL PROFILER: File->Import, select “Nvprof”, then “Next >” ...
🌐
NVIDIA Developer
developer.nvidia.com › blog › cuda-pro-tip-nvprof-your-handy-universal-gpu-profiler
CUDA Pro Tip: nvprof is Your Handy Universal GPU Profiler
November 23, 2021
🌐
NVIDIA Developer Forums
forums.developer.nvidia.com › developer tools › other tools › visual profiler and nvprof
How to use NVPROF on code compiled with NVRTC? - Visual Profiler and nvprof - NVIDIA Developer Forums
October 21, 2018 - Hi, I have been using NVPROF to collect all 113 performance counters from my kernels that run on a TitanV. I was never able to get CUPTI to give me all the counters the way NVPROF does. Now I am using NVRTC (with JITI…
🌐
GitHub
gist.github.com › mrprajesh › 352cbe661ee27a6b4627ae72d89479e6
Learn nvprof - Profiling CUDA Programs - Gist - GitHub
How to query for a specific metric, say DRAM reads: nvprof --metrics dram_read_transactions ./executable
🌐
StudyRaid
app.studyraid.com › en › read › 11728 › 371473 › installing-nvidia-visual-profiler-and-nvprof
Understand installing NVIDIA Visual Profiler and nvprof
January 13, 2025 - For Windows, add these paths to the system environment variables through System Properties. The nvprof command-line profiler offers quick performance analysis.
🌐
Fz-juelich
indico-jsc.fz-juelich.de › event › 32 › material › 0 › 5.pdf pdf
🌐
Euro-fusion
indico.euro-fusion.org › event › 460 › attachments › 520 › 1091 › 03_-_NVIDIA_PROFILING_TOOLS.pdf pdf
NVIDIA PROFILING TOOLS OVERVIEW
September 22, 2020 - Run nvprof multiple times to collect metrics:
nvprof --output-profile profile.<metric>.%q{OMPI_COMM_WORLD_RANK} --aggregate-mode off --event-collection-mode continuous --metrics <metric> -f
Use `--query-metrics` and `--query-events` for the full list of metrics (-m) or events (-e). Combine with an MPI-annotated timeline file for the full picture.
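The one-metric-per-run recipe above can be scripted. The sketch below only builds the argument lists and does not launch anything. `metric_commands` is a hypothetical helper, the flags are the ones quoted in the slide, and the application command (`./app` in the usage) is a placeholder. `%q{OMPI_COMM_WORLD_RANK}` is passed through literally so that nvprof can expand it from the environment of each MPI rank.

```python
import shlex
from typing import List, Sequence


def metric_commands(metrics: Sequence[str], app_argv: Sequence[str]) -> List[List[str]]:
    """Return one nvprof argv per metric, each writing its own profile file."""
    commands = []
    for metric in metrics:
        commands.append([
            "nvprof",
            # nvprof replaces %q{VAR} with the value of environment variable VAR,
            # so every MPI rank writes a separate file.
            "--output-profile", f"profile.{metric}.%q{{OMPI_COMM_WORLD_RANK}}",
            "--aggregate-mode", "off",
            "--event-collection-mode", "continuous",
            "--metrics", metric,
            "-f",  # overwrite an existing output file
            *app_argv,
        ])
    return commands


if __name__ == "__main__":
    # Print shell-quoted commands; running them is left to the job script.
    for argv in metric_commands(["dram_read_transactions"], ["./app"]):
        print(shlex.join(argv))
```

Returning argument lists rather than strings keeps the `%q{...}` and `<metric>` parts free of shell-quoting problems if the lists are later handed to `subprocess.run`.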
🌐
Massed Compute
massedcompute.com › home › faq answers
How to use NVIDIA's nvprof tool to profile and optimize memory usage on A100 GPUs? - Massed Compute
July 31, 2025 - Optimize A100 GPU memory usage with NVIDIA's nvprof tool. Learn how to profile and optimize for improved performance.
🌐
GitHub
gist.github.com › mcarilli › 213a4e698e4a0ae2234ddee56f4f3f95
Single- and multiprocess profiling workflow with nvprof and NVVP (Nsight Systems coming soon...) · GitHub
This is not essential for profiling, but will cause the script to produce run-to-run deterministic results. ... nvprof --profile-from-start off -fo %p.nvprof python main_amp.py -a resnet50 --b 224 --prof 20 --deterministic --workers 4 --opt-level O1 ./bare_metal_train_val/
🌐
Medium
medium.com › @rc3729 › profiling-deep-learning-inference-with-nsight-systems-and-nvprof-a-practical-guide-c1e5b49f0627
Profiling Deep Learning Inference with Nsight Systems and nvprof: A Practical Guide | by Raksha Chandrashekar | Medium
July 7, 2025 - Locating performance bottlenecks by comparing CPU and GPU latencies, making it easier to pinpoint which component limits the pipeline efficiency. For a simpler, terminal-based overview, along with saving the profiling output into a log file: nvprof --unified-memory-profiling off --dependency-analysis --log-file profiling_log.txt python3 inference.py ./input.mp4
🌐
PyPI
pypi.org › project › nvprof
nvprof · PyPI
This tool is aimed at extracting the small bits of important information and making profiling in NVVP faster. You can remove a large number of unimportant events and take a small time slice, so that you can shrink the SQLite database to a few MBs. Author: Bohumír Zámečník bohumir.zamecnik@gmail.com, Rossum ... $ nvprof_tools --help usage: nvprof_tools [-h] {info,truncate,slice} ...
pip install nvprof
Published: Nov 19, 2017
Version: 0.2
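Because the exported profile is an SQLite database (the file nvprof_tools trims), it can also be inspected with Python's built-in `sqlite3` module. The sketch below sums kernel time per kernel name. The schema it relies on is an assumption based on nvprof output files: a `CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL` table with integer `start`/`end` timestamps (assumed nanoseconds) and a `name` column that indexes `StringTable(_id_, value)`. Check the real tables (for example with `.tables` in the `sqlite3` shell) before relying on it. `kernel_time_by_name` is a hypothetical helper.

```python
import sqlite3
from typing import List, Tuple


def kernel_time_by_name(
    conn: sqlite3.Connection,
    table: str = "CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL",
) -> List[Tuple[str, int, int]]:
    """Return (kernel name, launch count, total duration) sorted by total time.

    Assumed schema: `table` has integer columns start, "end" and name, where
    name is a key into StringTable(_id_, value).
    """
    # The table name is interpolated into SQL, so only allow plain identifiers.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    query = f"""
        SELECT s.value, COUNT(*), SUM(k."end" - k.start) AS total
        FROM {table} AS k
        JOIN StringTable AS s ON k.name = s._id_
        GROUP BY s.value
        ORDER BY total DESC
    """
    return [(name, count, total) for name, count, total in conn.execute(query)]


if __name__ == "__main__":
    import sys

    # Usage: python kernel_times.py results.nvprof
    conn = sqlite3.connect(sys.argv[1])
    try:
        for name, count, total in kernel_time_by_name(conn):
            # Durations are assumed to be in nanoseconds; print milliseconds.
            print(f"{total / 1e6:10.3f} ms  {count:6d}x  {name}")
    finally:
        conn.close()
```

Profiles captured without concurrent-kernel tracing may store kernels in a different table, which is why the table name is a parameter rather than hard-coded.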
🌐
Kaust
hpc-user-docs.kaust.edu.sa › soft_env › prof_debug › using-nvprof.html
Using nvprof — KAUST Supercomputing Lab Support Documentation 0.1 documentation
The following is an example job script to generate the profile. The training script trains resnet50 from scratch using Tiny ImageNet (200 classes) for the first epoch. ...
#!/bin/bash --login
#SBATCH --time=00:10:00
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --cpus-per-gpu=8
#SBATCH --constraint=v100
#SBATCH --partition=batch
#SBATCH --job-name=nvprof
#SBATCH --mail-type=ALL
#SBATCH --output=%x-%j-slurm.out
#SBATCH --error=%x-%j-slurm.err
module load dl torchvision pytorch/1.9.0
cmd="python ./train.py"
nvprof -o profile.${SLURM_JOBID}.nvvp ${cmd}
🌐
UCSD
cseweb.ucsd.edu › classes › wi15 › cse262-a › static › cuda-5.5-doc › pdf › CUDA_Profiler_Users_Guide.pdf pdf
PROFILER USER'S GUIDE DU-05982-001_v5.5 | July 2013
🌐
GitHub
github.com › rossumai › nvprof-tools
GitHub - rossumai/nvprof-tools: Python tools for NVIDIA Profiler · GitHub
Starred by 21 users
Forked by 6 users
Languages: Python 90.7% | Makefile 9.3%