Brave Search

PyTorch CUDA vs Numpy for arithmetic operations? Fastest?

stackoverflow.com › questions › 52526082 › pytorch-cuda-vs-numpy-for-arithmetic-operations-fastest

GPU operations have to additionally get memory to/from the GPU

The problem is that your GPU operation always has to copy the input into GPU memory and then copy the results back, which is quite costly.

NumPy, on the other hand, directly processes the data from the CPU/main memory, so there is almost no delay here. Additionally, your matrices are extremely small, so even in the best-case scenario, there should only be a minute difference.

This is also partly why you use mini-batches when training neural networks on a GPU: instead of many extremely small operations, you have one big batch of numbers that can be processed in parallel.

Also note that GPU clock speeds are generally much lower than CPU clock speeds, so the GPU only really shines because it has far more cores. If your matrix does not fully utilize them, you are likely to see a faster result on your CPU.

TL;DR: If your matrix is big enough, you will eventually see a speed-up with CUDA over NumPy, even with the additional cost of the GPU transfer.

Answer from dennlinger on Stack Overflow
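
The trade-off described in the answer above can be checked with a rough timing sketch. This is not code from the answer: the function names, matrix sizes, and repeat counts are illustrative assumptions, and the GPU path is skipped when PyTorch or a CUDA device is not available. The GPU timing deliberately includes the host-to-device and device-to-host copies.

```python
import time

import numpy as np

try:
    import torch
except ImportError:  # PyTorch is optional for this sketch
    torch = None


def time_numpy_matmul(n, repeats=5):
    """Best wall-clock time (seconds) for an n x n float32 matmul in NumPy."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - start)
    return best


def time_cuda_matmul(n, repeats=5):
    """Same matmul on the GPU, timing the transfers as well as the compute.

    Returns None when PyTorch or a CUDA device is unavailable.
    """
    if torch is None or not torch.cuda.is_available():
        return None
    a = torch.rand(n, n)  # created in CPU memory on purpose
    b = torch.rand(n, n)
    best = float("inf")
    for _ in range(repeats):
        torch.cuda.synchronize()  # make sure earlier GPU work has finished
        start = time.perf_counter()
        # Copy inputs to the GPU, multiply, copy the result back.
        # .cpu() waits for the GPU to finish before returning.
        (a.cuda() @ b.cuda()).cpu()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    for n in (8, 64, 512, 1024):  # illustrative sizes
        cpu_time = time_numpy_matmul(n)
        gpu_time = time_cuda_matmul(n)
        gpu_text = "n/a" if gpu_time is None else f"{gpu_time:.6f}s"
        print(f"n={n:5d}  numpy={cpu_time:.6f}s  cuda+transfer={gpu_text}")
```

Taking the best of several repeats keeps one-off costs such as CUDA initialization out of the comparison. The size at which the GPU starts to win depends on the hardware, so no crossover point is claimed here.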

Kaggle

kaggle.com › code › amirmotefaker › pytorch-vs-numpy

PyTorch vs NumPy

July 14, 2024 - Explore and run AI code with Kaggle Notebooks | Using data from No attached data sources

Medium

medium.com › @aktooall › numpy-vs-pytorch-tensors-same-dna-different-superpowers-f59c1211274d

“NumPy vs PyTorch Tensors: Same DNA, Different Superpowers” | by Arunkumar Ravichandran | Medium

September 8, 2025 - Some functions like where exist in NumPy but have slightly different implementations in PyTorch (torch.where). Random number handling in PyTorch often links with seeds for reproducibility, while NumPy’s default_rng() gives a modern, flexible random generator.
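
A small side-by-side sketch of the two points in this snippet, assuming PyTorch may or may not be installed. The variable names and the seed value are illustrative and do not come from the article.

```python
import numpy as np

try:
    import torch
except ImportError:  # the PyTorch half is skipped if it is not installed
    torch = None

# NumPy: a Generator object carries its own seed and state.
rng = np.random.default_rng(seed=0)
x = rng.normal(size=5)

# Elementwise select: keep positive values, replace the rest with 0.
clipped_np = np.where(x > 0, x, 0.0)
print("numpy :", clipped_np)

if torch is not None:
    # PyTorch: seeding is global via torch.manual_seed.
    torch.manual_seed(0)
    first = torch.rand(3)
    torch.manual_seed(0)
    second = torch.rand(3)
    print("same draws after reseeding:", torch.equal(first, second))

    # torch.where with tensor arguments mirrors the NumPy call above.
    t = torch.from_numpy(x)
    clipped_t = torch.where(t > 0, t, torch.zeros_like(t))
    print("torch :", clipped_t)
```

Passing tensors for both branches of `torch.where` is the conservative form, since it works across PyTorch versions.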

Discussions

Pytorch tensor constructor speed vs numpy

So I was comparing the performance of the tensor constructor to the numpy array constructor. For pytorch:

    torch.inference_mode()
    total_time = 0.0
    iterations = 10000
    for _ in range(iterations):
        data = np.random.normal(0, 1, (1000, 10)).tolist()  # Use numpy for RNG
        # but convert back to python ...

More on discuss.pytorch.org

discuss.pytorch.org

November 8, 2023

Tensors vs Numpy Arrays

[This post was mass deleted and anonymized with Redact.] More on reddit.com

r/learnmachinelearning

April 21, 2023

Is pytorch faster than numpy on a single CPU?

If you really enforce a single thread, then it's all up to the BLAS library which does the heavy lifting. Both numpy and pytorch are compatible with the most common libs, so it's a matter of which lib they are linked against respectively. Numpy with MKL will most likely beat pytorch with OpenBLAS for most workloads. More on reddit.com

r/Python

July 12, 2024
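
To see which BLAS each library is linked against and to pin both to one thread, something like the following sketch can be used. It assumes the common thread-count environment variables are honored by the BLAS in use, which depends on the build. The variables only take effect if they are set before NumPy is first imported in the process.

```python
import os

# Thread limits for the usual BLAS/OpenMP backends. These must be set
# before NumPy (and PyTorch) are imported for the first time.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np

try:
    import torch
except ImportError:  # PyTorch is optional for this sketch
    torch = None

# Prints the build configuration, including the BLAS/LAPACK libraries.
np.show_config()

if torch is not None:
    torch.set_num_threads(1)  # limit intra-op parallelism on the CPU
    print("torch threads:", torch.get_num_threads())
    print(torch.__config__.show())  # build info, including the BLAS backend
```

With both libraries restricted to one thread, a timing comparison mostly measures the linked BLAS implementations rather than NumPy or PyTorch themselves.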

Numerical differences between numpy and pytorch?

How come?

    rng = np.random.RandomState(1)
    dataset1 = rng.uniform(low=-0.01, high=0.01, size=(1000, 20))
    dataset2 = torch.from_numpy(data_numpy)
    print(dataset1.mean(), dataset1.std())
    print(dataset2.mean(), dataset2.std())

returns 2.5537095782416174e-05 0.005769608507668796 tensor(-3.141987...

More on discuss.pytorch.org

discuss.pytorch.org

July 17, 2020
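
One well-known source of small differences in `std()` is the default estimator: `numpy.std` divides by N (`ddof=0`), while `torch.std` applies Bessel's correction and divides by N - 1 by default. The sketch below illustrates only that general point. It is not a diagnosis of the thread above, whose snippet is truncated, and the variable names are illustrative.

```python
import numpy as np

try:
    import torch
except ImportError:  # the PyTorch comparison is skipped if not installed
    torch = None

rng = np.random.RandomState(1)
data = rng.uniform(low=-0.01, high=0.01, size=(1000, 20))

std_population = data.std()       # NumPy default: ddof=0, divide by N
std_sample = data.std(ddof=1)     # divide by N - 1 (Bessel's correction)
print("numpy ddof=0:", std_population)
print("numpy ddof=1:", std_sample)

if torch is not None:
    t = torch.from_numpy(data)    # shares memory, keeps float64
    print("torch default:", t.std().item())  # expected to match ddof=1
    print("means agree:", np.isclose(t.mean().item(), data.mean()))
```

To compare like with like, pass `ddof=1` on the NumPy side or request the uncorrected estimator on the PyTorch side.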

Videos

youtube.com

M1 | L9 | Sharing with NumPy | Tensors