As you have pointed out, you can use the CUDA profilers on Python code simply by having the profiler launch the Python interpreter, which in turn runs your script:
nvprof python ./myscript.py
Regarding the GPUs being used, the CUDA environment variable CUDA_VISIBLE_DEVICES can be used to restrict the CUDA runtime API to use only certain GPUs. You can try it like this:
CUDA_VISIBLE_DEVICES="0" nvprof --profile-child-processes python ./myscript.py
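If you would rather launch the profiler from a Python driver script than from the shell, the same restriction can be applied by setting the variable in the child process environment. This is only a minimal sketch: the helper names build_profile_invocation and run_profiled are made up for illustration, and it assumes nvprof and python are both on your PATH.

```python
import os
import subprocess


def build_profile_invocation(script, gpu_ids,
                             profiler=("nvprof", "--profile-child-processes")):
    """Return (cmd, env) for profiling `script` on the given GPUs only.

    Hypothetical helper, not part of nvprof or CUDA.
    """
    # Copy the current environment so the parent process is left unchanged.
    env = dict(os.environ)
    # Comma-separated device indices, e.g. "0" or "0,2".
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    # The profiler launches the interpreter, which runs the script.
    cmd = [*profiler, "python", script]
    return cmd, env


def run_profiled(script, gpu_ids):
    """Run the profiler; raises CalledProcessError on a non-zero exit code."""
    cmd, env = build_profile_invocation(script, gpu_ids)
    return subprocess.run(cmd, env=env, check=True)
```

Calling run_profiled("./myscript.py", [0]) should then be equivalent to the shell line above. Because the variable is set only for the child process, other work started from the same driver still sees all GPUs.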
nvprof is also documented and has command-line help via nvprof --help. Looking at that help, I see a --devices switch, which appears to limit at least some functions to particular GPUs. You could try it with:
nvprof --devices 0 --profile-child-processes python ./myscript.py
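For newer GPUs, nvprof may not be the best profiler choice (NVIDIA's profiler documentation states that nvprof does not support devices of compute capability 8.0 and higher). You should be able to use Nsight Systems in a similar fashion, for example via: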
nsys profile --stats=true python ....
Additional "newer" profiler resources are linked here.
cuda - nvprof is using all available GPU's when profiling python script - Stack Overflow
using nvprof with pycuda?
Does anyone know how to profile PyCUDA scripts using the command-line tool nvprof?
Thanks!
See the penultimate slide. https://github.com/mit-satori/getting-started/blob/master/tutorial-examples/nvprof-profiling/Satori_NVProf_Intro.pdf
It should work the same way as for normal programs: nvprof python [options] ./my_script.py.
There are also wrappers around the profiler library functions if you don't want to profile the whole program.
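I believe the wrappers meant here are pycuda.driver.start_profiler() and pycuda.driver.stop_profiler(), which wrap the CUDA profiler start/stop calls; that reading is my assumption, not something the answer states. A minimal sketch of limiting profiling to one region follows. The profiled_region name and its hooks parameter are made up for illustration, and the sketch assumes your script has already created a CUDA context (for example via pycuda.autoinit).

```python
from contextlib import contextmanager


def _pycuda_profiler_hooks():
    # Imported lazily so this module still loads where PyCUDA is not installed.
    import pycuda.driver as cuda
    return cuda.start_profiler, cuda.stop_profiler


@contextmanager
def profiled_region(hooks=None):
    """Turn profiling on for the enclosed block only.

    `hooks` is an optional (start, stop) pair of callables; by default the
    PyCUDA wrappers are used. Hypothetical helper for illustration.
    """
    start, stop = hooks if hooks is not None else _pycuda_profiler_hooks()
    start()
    try:
        yield
    finally:
        # Always switch profiling off again, even if the block raises.
        stop()
```

In the script you would wrap only the kernels of interest in "with profiled_region():" and then run nvprof --profile-from-start off python ./my_script.py, so that nothing is recorded until the region is entered.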