I also observed the same phenomenon with my customized model. The FLOPs counted by torch.profiler.profile are approximately twice those counted by mmengine.analysis.get_model_complexity_info() (mmengine doc). I suspect this is caused by the unclear definition of FLOPs. torch.profiler.profile cou… Answer from qsj287068067 on discuss.pytorch.org
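One common explanation for a factor of about two is a difference in counting convention: some tools count one multiply-accumulate (MAC) as a single operation, while others count the multiply and the add separately. The sketch below only illustrates that arithmetic for a matrix product. The function names and the layer shape are hypothetical, and it does not claim to reproduce what either tool does internally.

```python
def matmul_macs(m: int, k: int, n: int) -> int:
    """Multiply-accumulate count for an (m x k) @ (k x n) product."""
    return m * k * n


def matmul_flops(m: int, k: int, n: int) -> int:
    """Operation count when each multiply and each add is counted separately."""
    return 2 * matmul_macs(m, k, n)


# Hypothetical layer: batch of 32, 512 input features, 1024 output features.
macs = matmul_macs(32, 512, 1024)
flops = matmul_flops(32, 512, 1024)
print(f"MACs:  {macs}")
print(f"FLOPs: {flops}")
print(f"ratio: {flops / macs}")
```

Under this assumption, two tools that agree on the model would still disagree by exactly 2x on every matmul-dominated layer.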
🌐
GitHub
github.com › cli99 › flops-profiler
GitHub - cli99/flops-profiler: pytorch-profiler · GitHub
The Flops Profiler helps users easily measure both the model training/inference speed (latency, throughput) and efficiency (floating-point operations per second, i.e., FLOPS) of a model and its submodules, with an eye towards eliminating ...
Starred by 49 users
Forked by 8 users
Languages   Python
🌐
Readthedocs
deepspeed.readthedocs.io › en › latest › flops-profiler.html
Flops Profiler — DeepSpeed 0.18.6 documentation
The flops-profiler profiles the forward pass of a PyTorch model and prints the model graph with the measured profile attached to each module. It shows how latency, flops and parameters are spent in the model and which modules or layers could be the bottleneck.
🌐
PyPI
pypi.org › project › flops-profiler
flops-profiler · PyPI
Similar to existing flops calculation tools or methods, the Flops Profiler measures the flops of the forward pass of a module, and the flops of the backward pass are estimated as 2 times those of the forward pass.
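The 2x backward estimate described above implies that a full training step costs roughly three times the forward pass. A minimal sketch of that arithmetic follows. The helper name `estimate_training_flops` and the forward value are hypothetical, and the ratio is the rule of thumb from the snippet, not a measurement.

```python
def estimate_training_flops(forward_flops: float, backward_ratio: float = 2.0) -> float:
    """Forward plus backward flops, with backward estimated as a multiple of forward."""
    return forward_flops * (1.0 + backward_ratio)


# Hypothetical forward-pass measurement of 4.09 GFLOPs.
forward = 4.09e9
total = estimate_training_flops(forward)
print(f"forward: {forward / 1e9:.2f} GFLOPs")
print(f"forward + backward (estimated): {total / 1e9:.2f} GFLOPs")
```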
pip install flops-profiler
Published   Mar 07, 2023
Version   0.1.2
🌐
Readthedocs
flops-profiler.readthedocs.io › en › latest › index.html
Flops Profiler — flops-profiler documentation
Measures the parameters, latency, and floating-point operations of PyTorch model. class flops_profiler.profiler.FlopsProfiler(model, ds_engine=None)[source]
🌐
DeepSpeed
deepspeed.ai › home › tutorials
Flops Profiler - DeepSpeed
May 1, 2026 - Similar to existing flops calculation tools or methods, the DeepSpeed Flops Profiler measures the flops of the forward pass of a module, and the flops of the backward pass are estimated as 2 times those of the forward pass. Different from the PyTorch profiler which calculates the flops of PyTorch ...
🌐
PyTorch Developer Mailing List
dev-discuss.pytorch.org › performance
The "Ideal" PyTorch FLOP Counter (with __torch_dispatch__) - performance - PyTorch Developer Mailing List
February 18, 2022 - TL;DR: I wrote a flop counter in 130 lines of Python that 1. counts FLOPS at an operator level, 2. (optionally) aggregates them in a module hierarchy, 3. captures backwards FLOPS, and 4. works in eager-mode. Oh, and you can use it under arbitrary transformations (such as vmap) to compute FLOPS ...
🌐
GitHub
github.com › zhijian-liu › torchprofile
GitHub - zhijian-liu/torchprofile: Count the MACs / FLOPs of PyTorch models · GitHub
results = profile_macs(model, inputs, reduction=None)
for node, macs in results.items():
    if macs > 0:
        print(f"{node.scope:40s} {node.operator:30s} {macs / 1e6:>8.2f} MMACs")
Starred by 636 users
Forked by 44 users
Languages   Python
🌐
PyTorch Forums
discuss.pytorch.org › t › clarification-of-flops-macs-in-model-descriptions › 198948
Clarification of FLOps/MACs in model descriptions - PyTorch Forums
March 15, 2024 - Hi there! I noticed that the FLOps reported in the torchvision library are different from those computed with the torch profiler. So I computed the FLOps for several architectures from the torchvision library using Meta's fvcore library and the official torch profiler:

architecture    reported    fvcore    torch profiler
AlexNet          0.71        0.71      1.43
ResNet 50        4.09        4.11      8.18
DenseNet 121     2.83        2.87      5.67
Swin B          15.43       15.47     30.88
MaxViT T         5.56        5.61     11.13
ViT-B 16        17.56       16.87     33.70

From these...
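The pattern in these numbers is easier to see as a ratio. The sketch below copies the six rows from the post as listed (the snippet does not state units) and divides the torch profiler figure by the fvcore figure. The variable names are my own, and the reading that the gap comes from counting a fused multiply-add as one operation versus two is an assumption consistent with how fvcore documents its counting, not something the truncated snippet confirms.

```python
# (reported, fvcore, torch profiler) values copied from the post.
table = {
    "AlexNet": (0.71, 0.71, 1.43),
    "ResNet 50": (4.09, 4.11, 8.18),
    "DenseNet 121": (2.83, 2.87, 5.67),
    "Swin B": (15.43, 15.47, 30.88),
    "MaxViT T": (5.56, 5.61, 11.13),
    "ViT-B 16": (17.56, 16.87, 33.70),
}

# Ratio of the torch profiler count to the fvcore count for each architecture.
ratios = {
    name: profiler / fvcore
    for name, (_reported, fvcore, profiler) in table.items()
}

for name, ratio in ratios.items():
    print(f"{name:14s} profiler / fvcore = {ratio:.2f}")
```

Every ratio lands close to 2, which is what a MAC-versus-FLOP convention difference would produce.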
🌐
PyTorch
docs.pytorch.org › reference api › torch.profiler
torch.profiler — PyTorch main documentation
profile_memory (bool) – track ... line number) for the ops. with_flops (bool) – use formula to estimate the FLOPs (floating point operations) of specific operators (matrix multiplication and 2D ......
🌐
PyTorch Forums
discuss.pytorch.org › t › issues-on-pytorch-profiler-flops-counting-with-with-flops-true › 224734
Issues on pytorch profiler FLOPs counting with with_flops=True - PyTorch Forums
March 25, 2026 - According to the official ...r/profiler.py, with_flops is supposed to: “use formula to estimate the FLOPS of specific operators (matrix multiplication and 2D convolution).” This is, well pretty disappointing but at ...
🌐
Medium
medium.com › the-owl › understanding-and-calculating-flops-in-pytorch-models-c609cb83ac3a
Understanding and Calculating MACs and FLOPs in PyTorch Models | by Siladittya Manna | The Owl | Medium
May 22, 2025 - Multiple input is also supported through the args argument. `calflops` is another library that can be used to calculate the number of floating-point operations (FLOPs) for PyTorch models.
🌐
PyTorch Forums
discuss.pytorch.org › t › using-with-flops-argument-of-profiler › 124999
Using with_flops argument of Profiler - PyTorch Forums
June 25, 2021 - In the profiler documentation we have an argument called with_flops. When I set that value to True, the exported profile in JSON format was huge. However, I couldn't tell how useful this argument is. I also tried …
🌐
GitHub
github.com › pytorch › pytorch › issues › 69782
Torch Profiler does not count FLOPs for backward pass · Issue #69782 · pytorch/pytorch
December 10, 2021 -
from torch.profiler import profile
import torch
import torch.optim as optim
import torchvision.models as models

# setup model input and target
model = models.resnet18(num_classes=10).cuda()
inputs = torch.randn((1,3,224,224)).cuda()  # dummy input
y = torch.randint(10, size=(1,)).cuda()  # dummy target

# warm up cuda memory allocator, recommended here: https://github.com/pytorch/pytorch/blob/master/torch/autograd/profiler.py
outputs = model(inputs)

### forward only
with torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ],
    with_flops=Tru
Author   pytorch
🌐
GitHub
github.com › ruipeterpan › torch_profiler
GitHub - ruipeterpan/torch_profiler: Simple PyTorch profiler that combines DeepSpeed Flops Profiler and TorchInfo
The motivation behind writing this up is that DeepSpeed Flops Profiler profiles both the model training/inference speed (latency, throughput) and the efficiency (floating-point operations per second, i.e., FLOPS) of a model and its submodules but not the shape of the input/output of each module, and torchinfo is the other way around. Although this profiler only provides some basic functionalities, it achieves the best of both worlds in this aspect. This profiler is based on PyTorch hooks, so the profiling granularity is each torch.nn.Module.
Author   ruipeterpan
🌐
PyTorch Forums
discuss.pytorch.org › t › deepspeed-flops-profiler-for-llama-3-8b-model-on-compile-mode › 207630
Deepspeed flops profiler for llama-3-8B model on compile mode - PyTorch Forums
August 6, 2024 - Hi all, I want to find out the total number of flops of an inference flow of llama-3-8B model in compile mode using deepspeed flops profiler. I am using the following code for this purpose:
model.generate = torch.compile(model.generate, backend="aot_eager")
prof = FlopsProfiler(model)
prof.start_profile()
input_ids = tokenizer(batch_sentences, truncation=True, padding="max_length", max_length=256, return_tensors="pt").to(device="cpu")
input_ids = input_ids['input_ids']
start_time1 = time.time(...
🌐
GitHub
github.com › pytorch › pytorch › issues › 82951
torch.profiler's FLOPs measure only counts operations involving '+' and '*' . · Issue #82951 · pytorch/pytorch
August 8, 2022 -
import torch
from torch.profiler import profile

def flops(a, b, op):
    with profile(
        activities=[torch.profiler.ProfilerActivity.CPU, torch.profiler.ProfilerActivity.CUDA],
        with_flops=True) as subtraction:
        if op == '-':
            c = a - b
        elif op == '+':  # the flattened snippet repeats '-' here, which would make this branch unreachable
            c = a + (-b)
        else:
            raise NotImplementedError
    # sum the per-event FLOP estimates recorded by the profiler
    subtraction_events = subtraction.events()
    subtraction_flops = sum([int(evt.flops) for evt in subtraction_events])
    print(subtraction_flops)
Author   pytorch
🌐
Anindya Bhadra
stat.purdue.edu › ~wang4094 › code_pytorch.html
PyTorch
Basic implementation examples (Linear Regression, CNN, ResNet, RNN, GAN, VAE, TensorBoard) · Another Flops counter for CNN
🌐
Readthedocs
deepspeed.readthedocs.io › en › latest › _modules › deepspeed › profiling › flops_profiler › profiler.html
deepspeed.profiling.flops_profiler.profiler — DeepSpeed 0.19.3 documentation
[docs]class FlopsProfiler(object): """Measures the latency, number of estimated floating-point operations and parameters of each module in a PyTorch model. The flops-profiler profiles the forward pass of a PyTorch model and prints the model graph with the measured profile attached to each module.