Reposting: Use "stat unitgraph" console command to validate if you are CPU/RHI or GPU bound (counter with highest ms). If your GPU bound initially use the GPU Visualiser by running the "ProfileGPU" console command or pressing the hotkey " Ctrl + Shift + ," to capture a frame's information. Sort the GPU Visualiser list by highest ms and open the top couple of parent headings then sort the list by highest ms again so the child headings are sorted properly, use this information to investigate further. (Also look at using Renderdoc to analyse a frame in more detail) If your CPU bound you would instead use Unreal Insights to take a capture and look through it to find where the largest chunks of time are being spent and then try to optimise the code related to them. Answer from CloudShannen on reddit.com
🌐
GitHub
github.com › JeremyMain › GPUProfiler
GitHub - JeremyMain/GPUProfiler: GPUProfiler - Understand your application and workflow resource requirements · GitHub
GPUProfiler is not a source code profiler but a resource and utilization profile that can provide a snapshot of a system and select resource utilization metrics over a period of time.
Starred by 313 users
Forked by 20 users
🌐
NVIDIA Developer
developer.nvidia.com › nvidia-visual-profiler
NVIDIA Visual Profiler | NVIDIA Developer
The NVIDIA Visual Profiler is a cross-platform performance profiling tool that delivers developers vital feedback for optimizing CUDA C/C++ applications. First introduced in 2008, Visual Profiler supports all 350 million+ CUDA capable NVIDIA ...
Discussions

How to read GPU profiler
Reposting: Use "stat unitgraph" console command to validate if you are CPU/RHI or GPU bound (counter with highest ms). If your GPU bound initially use the GPU Visualiser by running the "ProfileGPU" console command or pressing the hotkey " Ctrl + Shift + ," to capture a frame's information. Sort the GPU Visualiser list by highest ms and open the top couple of parent headings then sort the list by highest ms again so the child headings are sorted properly, use this information to investigate further. (Also look at using Renderdoc to analyse a frame in more detail) If your CPU bound you would instead use Unreal Insights to take a capture and look through it to find where the largest chunks of time are being spent and then try to optimise the code related to them. More on reddit.com
🌐 r/unrealengine
5
1
October 22, 2024
How to use GPU profiler stats?
Hello @MartinTilo (tagging you because of this very useful post , and I was wonder what might have changed since July), I’m using HDRP trying to get some GPU profiler stats at runtime but haven’t been having any luck as of yet. (I’m using 2020.2.4, HDRP 10.3.1 on Windows.) More on discussions.unity.com
🌐 discussions.unity.com
12
1
February 16, 2021
[P] seeking feedback on a gpu profiler I made as a Python package
Took a look at the PyPI page. Cool that you're tackling this because GPU profiling tooling is genuinely painful and most teams just stare at nvidia-smi and hope for the best. Few questions and thoughts. What's the overhead like when profiling is active? Our clients doing ML work are always paranoid about profilers that distort the thing they're measuring. If your instrumentation adds latency or memory pressure it can shift where bottlenecks appear. Would be worth documenting expected overhead as a percentage. The auto-calibration for any GPU is ambitious. How are you handling the differences between consumer cards versus datacenter stuff like A100s and H100s? Memory bandwidth characteristics and compute unit architectures vary a ton. Curious whether calibration actually captures those differences or if it's approximating. The compute/memory/overhead classification is the useful part imo. Most people don't realize their "slow kernel" is actually just waiting on memory transfers. If your tool makes that obvious you're solving a real problem. One thing that would make this way more useful is integration examples with common training loops. PyTorch Lightning, HuggingFace Trainer, that kind of thing. People are lazy and if they can't drop it into their existing workflow with minimal changes they won't bother. Also no README on PyPI is rough. I see you have a GitHub link but the project description is basically empty. You're asking people to pip install something with almost no context. Throw some example output and a quick usage snippet on there, it'll dramatically increase adoption. More on reddit.com
🌐 r/MachineLearning
2
3
January 3, 2026
GPU Profiling tools
Hi there I am looking for some tools to profile GPU workloads running on a physical desktop. I am aware of GPUprofiler (Releases · JeremyMain/GPUProfiler (github.com) but are there alternatives? More on community.omnissa.com
🌐 community.omnissa.com
2
October 24, 2024
🌐
MathWorks
mathworks.com › gpu coder › performance
gpuprofile - Profile execution time for generated CUDA code - MATLAB
This MATLAB function profiles the execution times for the generated CUDA code and displays the profiling results in the GPU Performance Analyzer report.
🌐
AMD GPUOpen
gpuopen.com › rgp
AMD Radeon™ GPU Profiler - AMD GPUOpen
AMD RGP gives you unprecedented, in-depth access to a GPU. Easily analyze graphics, async compute usage, event timing, pipeline stalls, barriers, bottlenecks, and other performance inefficiencies.
🌐
Reddit
reddit.com › r/unrealengine › how to read gpu profiler
r/unrealengine on Reddit: How to read GPU profiler
October 22, 2024 -

I’m running the editor at 40/50 fps, when I run the profileGPU command it say my scene took 40ms, but when I click on it, it say (on the image I linked) it took 20ms, and as you can see in the graph the missing 20ms are blank and there is nothing on it, but they are still there. What’s going on ?

https://imgur.com/a/30C1ubb

P.S.: My scene has like 1300 actors, and while using world partition and getting far enough to unload every actor, I gain like 5fps max

🌐
Unity
discussions.unity.com › unity engine
How to use GPU profiler stats? - Unity Engine - Unity Discussions
February 16, 2021 - Hello @MartinTilo (tagging you because of this very useful post , and I was wonder what might have changed since July), I’m using HDRP trying to get some GPU profiler stats at runtime but haven’t been having any luck as of yet. (I’m using 2020.2.4, HDRP 10.3.1 on Windows.)
Find elsewhere
🌐
Harvard
handbook.eng.kempnerinstitute.harvard.edu › s5_ai_scaling_and_engineering › scalability › gpu_profiling.html
19.3. GPU Profiling — Kempner Institute Computing Handbook
Nsight Systems (High-Level Profiling) checks our code overall to see if there are any problems (e.g., with host and device communication or GPU kernels) identifying non-performant/top kernel(s) and then Nsight Compute (Kernel-Specific Profiling) dives into the details of the identified kernel(s) to help with optimizing, debugging and fixing the issue.
🌐
Unity
docs.unity3d.com › 6000.4 › Documentation › Manual › ProfilerGPU.html
Unity - Manual: GPU Usage Profiler module
The GPU Usage Profiler module displays where your application spends time in the GPU. You can only use the GPU Profiler in Play mode, or for builds of your application.
🌐
Polar Signals
polarsignals.com › use-cases › gpu-profiling
GPU Profiling | Polar Signals
Polar Signals Cloud is an always-on, zero-instrumentation continuous profiling for CPU, GPU, and Memory that helps improve performance, understand incidents, and lower infrastructure costs.
🌐
Android Developers
developer.android.com › app quality › analyze with profile gpu rendering
Analyze with Profile GPU Rendering | App quality | Android Developers
May 19, 2026 - The Profile GPU Rendering tool indicates the relative time that each stage of the rendering pipeline takes to render the previous frame.
🌐
Reddit
reddit.com › r/machinelearning › [p] seeking feedback on a gpu profiler i made as a python package
r/MachineLearning on Reddit: [P] seeking feedback on a gpu profiler I made as a Python package
January 3, 2026 -

Recently released a project that profiles GPU. It classifies operations as compute/memory/overhead bound and suggests fixes. works on any gpu through auto-calibration

Let me know https://pypi.org/project/gpu-regime-profiler/

pip install gpu-regime-profiler

Top answer
1 of 1
1
Took a look at the PyPI page. Cool that you're tackling this because GPU profiling tooling is genuinely painful and most teams just stare at nvidia-smi and hope for the best. Few questions and thoughts. What's the overhead like when profiling is active? Our clients doing ML work are always paranoid about profilers that distort the thing they're measuring. If your instrumentation adds latency or memory pressure it can shift where bottlenecks appear. Would be worth documenting expected overhead as a percentage. The auto-calibration for any GPU is ambitious. How are you handling the differences between consumer cards versus datacenter stuff like A100s and H100s? Memory bandwidth characteristics and compute unit architectures vary a ton. Curious whether calibration actually captures those differences or if it's approximating. The compute/memory/overhead classification is the useful part imo. Most people don't realize their "slow kernel" is actually just waiting on memory transfers. If your tool makes that obvious you're solving a real problem. One thing that would make this way more useful is integration examples with common training loops. PyTorch Lightning, HuggingFace Trainer, that kind of thing. People are lazy and if they can't drop it into their existing workflow with minimal changes they won't bother. Also no README on PyPI is rough. I see you have a GitHub link but the project description is basically empty. You're asking people to pip install something with almost no context. Throw some example output and a quick usage snippet on there, it'll dramatically increase adoption.
🌐
Unity
docs.unity3d.com › 550 › Documentation › Manual › ProfilerGPU.html
Unity - Manual: GPU Profiler
The GPU Usage Profiler displays where GPU time is spent in your game. When you select this Profiler, the lower pane displays hierarchical time data for the selected frame. Select an item from the Hierarchy to see a breakdown of contributions in the right-hand panel.
🌐
Meta
developers.meta.com › horizon › documentation › unity › ts-ovrgpuprofiler
Use ovrgpuprofiler for GPU Profiling | Meta Horizon OS Developers
Access real-time GPU metrics and perform render stage traces on Meta Quest using ovrgpuprofiler CLI.
🌐
GitHub
github.com › jeremymain › gpuprofiler › releases
Releases · JeremyMain/GPUProfiler
GPUProfiler - Understand your application and workflow resource requirements - JeremyMain/GPUProfiler
Author   JeremyMain
🌐
Mintlify
mintlify.com › Silas-Asamoah › stormlog › api › cli › gpu-profiler
GPU Memory Profiler
March 3, 2026 - Self-updating documentation for startups, enterprises, and agents.
🌐
NVIDIA Developer
developer.nvidia.com › nsight-systems
Nsight Systems | NVIDIA Developer
Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and command-line tool. It also provides a customizable, data-driven user interface and metric collection that can be extended with analysis scripts for post-processing results.. ... Nsigh Aftermath SDK is a library that integrates into a D3D12 or Vulkan game’s crash reporter to generate GPU “mini-dumps” when an exception or TDR occurs, exposing pipeline information to resolve an unexpected crash.
🌐
Medium
medium.com › @shaginhekvs › from-zero-to-hero-in-gpu-performance-profiling-optimization-e03da271ff18
From Zero to Hero in GPU performance profiling & optimization | by Keshav Singh | Medium
May 5, 2026 - The goal: profile vLLM serving the Qwen/Qwen2.5–1.5B-Instruct model on an NVIDIA L4 server with a load of cline client on my personal laptop. I was trying out cline with my own server hosted via vLLM, and performance was not as responsive as I'm used to with the API's of openAI, xGrok etc. I would have expected a small 1.5B model on an L4 GPU server to be atleast similarly responsive to these APIs.