As you have pointed out, you can profile Python code with the CUDA profilers simply by having the profiler launch the Python interpreter, which in turn runs your script:

nvprof python ./myscript.py

Regarding the GPUs being used, the CUDA environment variable CUDA_VISIBLE_DEVICES can be used to restrict the CUDA runtime API to use only certain GPUs. You can try it like this:

CUDA_VISIBLE_DEVICES="0" nvprof --profile-child-processes python ./myscript.py
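
If you would rather launch the profiler from Python than from the shell, the same idea can be sketched as below. This is only an illustration, assuming nvprof and python are on your PATH; the helper names build_nvprof_launch and run_profile are hypothetical and not part of any NVIDIA tool.

```python
import os
import subprocess


def build_nvprof_launch(script, gpu_ids=(0,), extra_args=()):
    """Return (command, environment) for profiling `script` on selected GPUs.

    Hypothetical helper: it only assembles the same command line shown above.
    """
    # Copy the current environment so the parent process is left untouched.
    env = dict(os.environ)
    # The variable must be in place before the CUDA runtime initializes
    # in the profiled process, so it is set in the child's environment.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    cmd = ["nvprof", "--profile-child-processes", "python", script, *extra_args]
    return cmd, env


def run_profile(script, gpu_ids=(0,)):
    """Run the profiler and return its exit code (requires nvprof on PATH)."""
    cmd, env = build_nvprof_launch(script, gpu_ids)
    return subprocess.run(cmd, env=env, check=False).returncode
```

Calling run_profile("./myscript.py", gpu_ids=[0]) should then behave like the shell command above, with only GPU 0 visible to the script.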

nvprof is also documented and has command-line help via nvprof --help. In that help output, I see a --devices switch, which appears to limit at least some functions to particular GPUs. You could try it with:

nvprof --devices 0 --profile-child-processes python ./myscript.py

For newer GPUs, nvprof may not be the best profiler choice. You should be able to use Nsight Systems in a similar fashion, for example via:

nsys profile --stats=true python ....
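
If the same script has to be profiled on machines with different toolkits installed, one possible approach is to pick whichever profiler is available. The sketch below assumes that finding nsys on the PATH is a good enough reason to prefer it; the function name choose_profiler_command and its which parameter are hypothetical and exist only for this illustration.

```python
import shutil


def choose_profiler_command(script, which=shutil.which):
    """Build a profiler command line, preferring nsys over nvprof.

    `which` is injectable (an assumption of this sketch) so the choice
    can be exercised without either profiler being installed.
    """
    if which("nsys"):
        # Same invocation as shown above for Nsight Systems.
        return ["nsys", "profile", "--stats=true", "python", script]
    if which("nvprof"):
        # Fall back to the older profiler.
        return ["nvprof", "python", script]
    raise FileNotFoundError("neither nsys nor nvprof was found on PATH")
```

The returned list can be passed directly to subprocess.run.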

Additional "newer" profiler resources are linked here.

Mit-satori
mit-satori.github.io › tutorial-examples › nvprof-profiling › index.html
Profiling code with nvprof — MIT Satori User Documentation documentation
The nvprof tool from NVidia can be used to create detailed profiles of where codes are spending time and what resources they are using. It can work for compiled CUDA code and for Python libraries.
GitHub
gist.github.com › sonots › 5abc0bccec2010ac69ff74788b265086
How to use NVIDIA profiler - Gist - GitHub
$ nvprof --print-gpu-trace python train_mnist.py --network mlp --num-epochs 1
INFO:root:start with arguments Namespace(add_stn=False, batch_size=64, disp_batches=100, dtype='float32', gc_threshold=0.5, gc_type='none', gpus=None, kv_store='device', load_epoch=None, lr=0.05, lr_factor=0.1, lr_step_epochs='10', model_prefix=None, mom=0.9, monitor=0, network='mlp', num_classes=10, num_epochs=1, num_examples=60000, num_layers=None, optimizer='sgd', test_io=0, top_k=0, wd=0.0001)
==27259== NVPROF is profiling process 27259, command: python train_mnist.py --network mlp --num-epochs 1
INFO:root:Epoch[
Discussions

cuda - nvprof is using all available GPU's when profiling python script - Stack Overflow
I am using a remote machine, which has 2 GPU's, in order to execute a Python script which has CUDA code. In order to find where I can improve the performance of my code, I am trying to use nvprof. ... More on stackoverflow.com
Nvprof python.exe pytorch code
nprof works to profile C++ CUDA executable, but not python with Pytorch code: python -c “import torch; torch.randperm(10, device=‘cuda’)” ======== Warning: No CUDA application was profiled, exiting According to How do I know randperm is performed on GPU - #2 by ptrblck - C++ - PyTorch ... More on forums.developer.nvidia.com
August 14, 2023
using nvprof with pycuda?

See the penultimate slide. https://github.com/mit-satori/getting-started/blob/master/tutorial-examples/nvprof-profiling/Satori_NVProf_Intro.pdf

More on reddit.com
🌐 r/CUDA
July 7, 2020
Calling nvprof from a pythoon code
I want to turn on the nvprof inside a python code. The code looks like for batch, i in enumerate(range(0, train_data.size(0) - 1, args.bptt)): data, targets = get_batch(train_data, i) hidden = repackage_hidden(hidden) model.zero_grad() output, hidden = model(data, hidden) loss = criterion(... More on forums.developer.nvidia.com
January 26, 2019
GitHub
github.com › rossumai › nvprof-tools
GitHub - rossumai/nvprof-tools: Python tools for NVIDIA Profiler · GitHub
Python tools for NVIDIA Profiler. Contribute to rossumai/nvprof-tools development by creating an account on GitHub.
Starred by 21 users
Forked by 6 users
Languages   Python 90.7% | Makefile 9.3%
PyPI
pypi.org › project › nvprof
nvprof · PyPI
Tools to help working with nvprof SQLite files, specifically for profiling scripts to train deep learning models. The files can be big and thus slow to scp and work with in NVVP.
pip install nvprof
Published   Nov 19, 2017
Version   0.2
NVIDIA Developer Forums
forums.developer.nvidia.com › developer tools › other tools › visual profiler and nvprof
Nvprof python.exe pytorch code - Visual Profiler and nvprof - NVIDIA Developer Forums
August 14, 2023 - nprof works to profile C++ CUDA executable, but not python with Pytorch code: python -c “import torch; torch.randperm(10, device=‘cuda’)” ======== Warning: No CUDA application was profiled, exiting According to How do I know randperm is performed on GPU - #2 by ptrblck - C++ - PyTorch Forums it should work.
Packtpub
subscription.packtpub.com › book › game-development › 9781788993913 › 6 › ch06lvl1sec42 › using-the-nvidia-nvprof-profiler-and-visual-profiler
Debugging and Profiling Your CUDA Code | Hands-On GPU Programming with Python and CUDA
We can do a basic profiling of a binary executable program with the nvprof program command; we can likewise profile a Python script by using the python command as the first argument, and the script as the second as follows: nvprof python program.py.
Vincent-lunot
vincent-lunot.com › post › an-introduction-to-cuda-in-python-part-4
An introduction to CUDA in Python (Part 4) - Vincent's Blog
December 4, 2017 - For python files, nvprof can be launched the following way: nvprof python filename.py This command executes the default mode of nvprof that is the summary mode.
GitHub
gist.github.com › mcarilli › 213a4e698e4a0ae2234ddee56f4f3f95
Single- and multiprocess profiling workflow with nvprof and NVVP (Nsight Systems coming soon...) · GitHub
nvprof --profile-from-start off -fo %p.nvprof python main_amp.py -a resnet50 --b 224 --prof 20 --deterministic --workers 4 --opt-level O1 ./bare_metal_train_val/
CodeGenes
codegenes.net › blog › nvprof-pytorch
In-Depth Guide to Using nvprof with PyTorch — codegenes.net
To profile a PyTorch script using nvprof, you can simply prefix your Python command with nvprof.
GitHub
github.com › NVIDIA › PyProf
GitHub - NVIDIA/PyProf: A GPU performance profiling tool for PyTorch models · GitHub
June 30, 2021 - A GPU performance profiling tool for PyTorch models - NVIDIA/PyProf
Starred by 511 users
Forked by 50 users
Languages   Python 95.8% | Shell 3.6% | Dockerfile 0.6%
Fbpic
fbpic.github.io › advanced › profiling.html
Profiling the code — FBPIC 0.27.0 documentation
nvprof --log-file gpu.log python -m cProfile -s time fbpic_script.py > cpu.log
GitHub
github.com › rossumai › nvprof-tools › blob › master › docs › info.md
nvprof-tools/docs/info.md at master · rossumai/nvprof-tools
September 18, 2024 - Python tools for NVIDIA Profiler. Contribute to rossumai/nvprof-tools development by creating an account on GitHub.
Author   rossumai
Libraries.io
libraries.io › pypi › nvprof
nvprof 0.2 on PyPI - Libraries.io - security & maintenance data for open source software
November 13, 2017 - License: MIT · Install: pip install nvprof==0.2 · Tools to help working with nvprof SQLite files, specifically for profiling scripts to train deep learning models.
GitHub
github.com › NVIDIA › PyProf › blob › main › docs › profile.rst
PyProf/docs/profile.rst at main · NVIDIA/PyProf
$ nvprof -f -o net.sql python net.py (-f overwrites an existing file; -o net.sql creates net.sql)
Author   NVIDIA
Numba
numba.pydata.org › numba-doc › dev › cuda › faq.html
CUDA Frequently Asked Questions — Numba 0.52.0.dev0+274.g626b40e-py3.7-linux-x86_64.egg documentation
When using the nvprof tool to profile Numba jitted code for the CUDA target, the output contains No kernels were profiled but there are clearly running kernels present, what is going on?
NVIDIA Developer Forums
forums.developer.nvidia.com › ai & data science › deep learning (training & inference) › cudnn
Calling nvprof from a pythoon code - cuDNN - NVIDIA Developer Forums
January 26, 2019 - I want to turn on the nvprof inside a python code. The code looks like for batch, i in enumerate(range(0, train_data.size(0) - 1, args.bptt)): data, targets = get_batch(train_data, i) hidden = repackage_…