Brave Search

Why do we call .detach() before calling .numpy() on a Pytorch Tensor?

stackoverflow.com › questions › 63582590 › why-do-we-call-detach-before-calling-numpy-on-a-pytorch-tensor

I think the most crucial point to understand here is the difference between a torch.tensor and np.ndarray:
While both objects are used to store n-dimensional matrices (aka "Tensors"), a torch.tensor has an additional "layer" that stores the computational graph leading to the associated n-dimensional matrix.

So, if you are only interested in an efficient and easy way to perform mathematical operations on matrices, np.ndarray and torch.tensor can be used interchangeably.

However, torch.tensors are designed to be used in the context of gradient descent optimization, and therefore they hold not only a tensor with numeric values, but (and more importantly) the computational graph leading to these values. This computational graph is then used (using the chain rule of derivatives) to compute the derivative of the loss function w.r.t each of the independent variables used to compute the loss.

As mentioned before, an np.ndarray object does not have this extra "computational graph" layer. Therefore, when converting a torch.tensor that requires grad to an np.ndarray, you must explicitly remove the computational graph of the tensor using the detach() command.

Computational Graph
From your comments it seems like this concept is a bit vague. I'll try and illustrate it with a simple example.
Consider a simple function of two (vector) variables, x and w:

x = torch.rand(4, requires_grad=True)
w = torch.rand(4, requires_grad=True)

y = x @ w  # inner-product of x and w
z = y ** 2  # square the inner product

If we are only interested in the value of z, we need not worry about any graphs: we simply move forward from the inputs, x and w, to compute y and then z.

However, what would happen if we do not care so much about the value of z, but rather want to ask the question "what is w that minimizes z for a given x"?
To answer that question, we need to compute the derivative of z w.r.t w.
How can we do that?
Using the chain rule we know that dz/dw = dz/dy * dy/dw. That is, to compute the gradient of z w.r.t w we need to move backward from z back to w computing the gradient of the operation at each step as we trace back our steps from z to w. This "path" we trace back is the computational graph of z and it tells us how to compute the derivative of z w.r.t the inputs leading to z:

z.backward()  # ask pytorch to trace back the computation of z

We can now inspect the gradient of z w.r.t w:

w.grad  # the resulting gradient of z w.r.t w
tensor([0.8010, 1.9746, 1.5904, 1.0408])

Note that this is exactly equal to

2*y*x
tensor([0.8010, 1.9746, 1.5904, 1.0408], grad_fn=<MulBackward0>)

since dz/dy = 2*y and dy/dw = x.
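
The equality above can be checked numerically. The following is a minimal sketch, not part of the original answer; the fixed seed and the names `expected` and `matches` are illustrative assumptions.

```python
import torch

torch.manual_seed(0)  # illustrative seed, only for reproducibility

x = torch.rand(4, requires_grad=True)
w = torch.rand(4, requires_grad=True)

y = x @ w      # inner product of x and w
z = y ** 2     # square the inner product
z.backward()   # walk the graph backward, filling x.grad and w.grad

# Chain rule by hand: dz/dw = dz/dy * dy/dw = (2 * y) * x
expected = (2 * y * x).detach()
matches = torch.allclose(w.grad, expected)
print(matches)
```

Detaching `expected` only drops its own graph; the comparison is on values.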

Each tensor along the path stores its "contribution" to the computation:

z
tensor(1.4061, grad_fn=<PowBackward0>)

And

y
tensor(1.1858, grad_fn=<DotBackward>)

As you can see, y and z store not only the "forward" value of <x, w> or y**2 but also the computational graph: the grad_fn that is needed to compute the derivatives (using the chain rule) when tracing back the gradients from z (output) to w (inputs).

These grad_fn are essential components of torch.tensors; without them one cannot compute derivatives of complicated functions. np.ndarrays, however, do not have this capability and do not store this information.

Please see this answer for more information on tracing back the derivative using the backward() function.

Since both np.ndarray and torch.tensor have a common "layer" storing an n-d array of numbers, pytorch uses the same storage to save memory:

numpy() → numpy.ndarray
Returns self tensor as a NumPy ndarray. This tensor and the returned ndarray share the same underlying storage. Changes to self tensor will be reflected in the ndarray and vice versa.

The other direction works in the same way as well:

torch.from_numpy(ndarray) → Tensor
Creates a Tensor from a numpy.ndarray.
The returned tensor and ndarray share the same memory. Modifications to the tensor will be reflected in the ndarray and vice versa.

Thus, when creating an np.ndarray from a torch.tensor or vice versa, both objects reference the same underlying storage in memory. Since np.ndarray does not store or represent the computational graph associated with the array, this graph must be explicitly removed using detach() when numpy and torch both reference the same tensor data.
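
The shared storage can be observed directly. This is a small sketch under the assumption of a CPU tensor; the variable names are illustrative and not taken from the answer.

```python
import numpy as np
import torch

t = torch.ones(3, requires_grad=True)

d = t.detach()   # new tensor object, same storage, no graph attached
a = d.numpy()    # ndarray backed by that same storage

a[0] = 5.0       # write through the NumPy array

# The write is visible through both tensors because no copy was made.
shares_memory = (d[0].item() == 5.0) and (t[0].item() == 5.0)
print(shares_memory, d.requires_grad, d.grad_fn)
```

Because the write goes through NumPy, autograd does not know about it; if an independent array is needed, copying the array first (for example with `a.copy()`) avoids this aliasing.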

Note that if you wish, for some reason, to use pytorch only for mathematical operations without back-propagation, you can use the torch.no_grad() context manager, in which case computational graphs are not created and torch.tensors and np.ndarrays can be used interchangeably.

import numpy as np
import torch

with torch.no_grad():
  x_t = torch.rand(3,4)
  y_np = np.ones((4, 2), dtype=np.float32)
  x_t @ torch.from_numpy(y_np)  # dot product in torch
  np.dot(x_t.numpy(), y_np)  # the same dot product in numpy

Answer from Shai on Stack Overflow

Stack Overflow

stackoverflow.com › questions › 63582590 › why-do-we-call-detach-before-calling-numpy-on-a-pytorch-tensor

Why do we call .detach() before calling .numpy() on a Pytorch Tensor? - Stack Overflow

Top answer

1 of 3

142

2 of 3

11

I asked, Why does it break the graph to move to numpy? Is it because any operations on the numpy array will not be tracked in the autodiff graph?

Yes, the new tensor will not be connected to the old tensor through a grad_fn, and so any operations on the new tensor will not carry gradients back to the old tensor.

Writing my_tensor.detach().numpy() is simply saying, "I'm going to do some non-tracked computations based on the value of this tensor in a numpy array."
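
As a hedged illustration of "non-tracked computations" (the names `tracked` and `untracked` are made up for this sketch), only the branch computed with torch operations contributes to the gradient:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

tracked = (w * 3).sum()          # torch ops: recorded in the autograd graph

w_np = w.detach().numpy()        # leave autograd, enter NumPy
untracked = (w_np * 3).sum()     # plain NumPy scalar, unknown to autograd

tracked.backward()
print(w.grad)                    # gradient from the tracked branch only
print(type(untracked))
```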

The Dive into Deep Learning (d2l) textbook has a nice section describing the detach() method, although it doesn't talk about why a detach makes sense before converting to a numpy array.

Thanks to jodag for helping to answer this question.

TutorialsPoint

tutorialspoint.com › article › how-to-convert-a-pytorch-tensor-with-gradient-to-a-numpy-array

How to convert a PyTorch tensor with gradient to a numpy array?

January 6, 2022 - To convert a Torch tensor with gradient to a Numpy array, first we have to detach the tensor from the current computing graph. To do it, we use the Tensor.detach() operation. This operation detaches the tensor from the current computational graph.

reddit.com › r/pytorch › can't call numpy() on tensor that requires grad. use tensor.detach().numpy() instead.

r/pytorch on Reddit: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

June 11, 2022 -

I've been stuck on this error for a long time now, and have gone through multiple solutions on stack overflow etc., but the problem is I believe I do have the .detach().numpy() and I still keep getting the error, so I'm not really sure what's going on. Could someone please help me out?

class plot_diagram():

    # Constructor
    def __init__(self, X, Y, w, stop, go = False):
        start = w.data
        self.error = []
        self.parameter = []
        self.X = X.numpy()
        self.Y = Y.numpy()
        self.parameter_values = torch.arange(start, stop)
        self.Loss_function = [criterion(forward(X), Y) for w.data in self.parameter_values]
        w.data = start

    # Executor
    def __call__(self, Yhat, w, error, n):
        self.error.append(error)
        self.parameter.append(w.data)
        plt.subplot(212)
        plt.plot(self.X, Yhat.detach().numpy())
        plt.plot(self.X, self.Y,'ro')
        plt.xlabel("A")
        plt.ylim(-20, 20)
        plt.subplot(211)
        plt.title("Data Space (top) Estimated Line (bottom) Iteration " + str(n))
        s = [p.detach().numpy() for p in self.parameter_values]
        plt.plot(s, self.Loss_function)
        plt.plot(self.parameter, self.error, 'ro')
        plt.xlabel("B")
        plt.figure()

    # Destructor
    def __del__(self):
        plt.close('all')

X = torch.arange(-3, 3, 0.1).view(-1, 1)
Y = f + 0.1 * torch.randn(X.size())
w = torch.tensor(-10.0, requires_grad = True)
gradient_plot = plot_diagram(X, Y, w, stop = 5)

def forward(x):
    return w * x
def criterion(yhat, y):
    return torch.mean((yhat - y) ** 2)
lr = 0.1
LOSS = []
def train_model(iter):
    for epoch in range (iter):

        # make the prediction as we learned in the last lab
        Yhat = forward(X)

        # calculate the iteration
        loss = criterion(Yhat,Y)

        # plot the diagram for us to have a better idea
        gradient_plot(Yhat, w, loss.item(), epoch)

        # store the loss into list
        LOSS.append(loss.item())

        # backward pass: compute gradient of the loss with respect to all the learnable parameters
        loss.backward()

        # update parameters
        w.data = w.data - lr * w.grad.data

        # zero the gradients before running the backward pass
        w.grad.data.zero_()

Top answer

1 of 3

5

The tensor is not on the CPU, so it can't be converted. You have to move it to CPU then convert to numpy: p.detach().to("cpu").numpy()

2 of 3

1

Are your tensors on GPU?

GitHub

github.com › pytorch › pytorch › issues › 91810

[Bug/functorch] Cannot use `tensor.detach().numpy()` for `GradTrackingTensor`: Cannot access data pointer of Tensor that doesn't have storage · Issue #91810 · pytorch/pytorch

January 6, 2023 - In functorch.grad / functorch.vjp, the input params tensors are wrapped as GradTrackingTensor, the all intermediate tensors will also be GradTrackingTensor (e.g., the action tensor). They can be .detach() but cannot convert to numpy arrays .numpy().

Author pytorch

Medium

medium.com › @heyamit10 › converting-pytorch-tensors-to-numpy-arrays-fa804b1fae1c

Converting PyTorch Tensors to NumPy Arrays | by Hey Amit | Medium

February 26, 2025 - By detaching the tensor, you effectively tell PyTorch, “I’m done with this tensor in terms of backpropagation; you don’t need to track it anymore.” This makes the tensor safe to convert to a NumPy array.

reddit.com › r/pytorch › can we convert a torch tensor to a numpy array using gpu for faster performance?

r/pytorch on Reddit: Can we convert a torch tensor to a numpy array using GPU for faster performance?

May 12, 2023 -

Is it possible to convert a torch tensor to a numpy array using the GPU for faster performance? Currently, I am using

input = input.cpu().detach().numpy()

to convert the tensor to a numpy array on the CPU. Is there a way to utilize the GPU to perform this conversion instead, potentially saving time?

Top answer

1 of 5

7

No. When calling .cpu() you are performing a memcopy from device memory to host memory.

2 of 5

6

You're doing 3 operations there: Move to CPU, detach from the computational graph, and convert to numpy. Detaching and converting are extremely cheap, they take views not copies of the tensor itself so they just need to fiddle with metadata (creating a new Tensor or ndarray object respectively, backed by the existing data). Almost all of the time taken in this op is the copy from GPU to CPU, which obviously can't be accelerated because it's just a copy, no processing is happening. It's bandwidth bound, not compute. As a rule, you can accelerate math with the GPU. If you're not doing math, the GPU won't help you.
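
The three operations can be wrapped in one place. This is only a sketch: `to_numpy` is a hypothetical helper name, and the code falls back to the CPU when no GPU is available, in which case `.cpu()` has nothing to copy.

```python
import torch

def to_numpy(t):
    """Hypothetical helper: detach from the graph, move to host memory, convert."""
    return t.detach().cpu().numpy()

device = "cuda" if torch.cuda.is_available() else "cpu"
t = torch.ones(4, device=device, requires_grad=True)

# detach() returns a view: same data pointer, only new metadata.
same_storage = t.detach().data_ptr() == t.data_ptr()

arr = to_numpy(t)
print(same_storage, arr.shape, arr.dtype)
```

On a GPU tensor the only expensive step here is the device-to-host copy performed by `.cpu()`.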

EDUCBA

educba.com › home › software development › software development tutorials › pytorch tutorial › pytorch tensor to numpy

PyTorch Tensor to NumPy | Complete Guide on PyTorch Tensor to NumPy

April 7, 2023 - PyTorch tensor can be converted to NumPy array using detach function in the code either with the help of CUDA or CPU. The data inside the tensor can be numerical or characters which represents an array structure inside the containers.

PyTorch

docs.pytorch.org › reference api › torch.tensor › torch.tensor.numpy

torch.Tensor.numpy — PyTorch 2.9 documentation

January 1, 2023 - If force is False (the default), the conversion is performed only if the tensor is on the CPU, does not require grad, does not have its conjugate bit set, and is a dtype and layout that NumPy supports. The returned ndarray and the tensor will share their storage, so changes to the tensor will be reflected in the ndarray and vice versa. If force is True this is equivalent to calling t.detach().cpu().resolve_conj().resolve_neg().numpy().
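
A short sketch of the `force` behaviour described in this snippet, assuming a PyTorch version recent enough to accept the `force` argument (it is not available in older releases); the variable names are illustrative.

```python
import torch

t = torch.tensor([1.0, 2.0], requires_grad=True)

# Default (force=False): refuses because the tensor requires grad.
try:
    t.numpy()
    raised = False
except RuntimeError:
    raised = True

# force=True performs the detach (and any needed device move) internally.
arr = t.numpy(force=True)
print(raised, arr)
```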

DataCamp

datacamp.com › doc › numpy › pytorch-tensors

NumPy to PyTorch Tensors

Here, we convert a PyTorch tensor that tracks gradients to a NumPy array, using `detach()` to ensure no gradient tracking.

YouTube

youtube.com › watch

49 - PyTorch Tensor vs Numpy array & detach-method of PyTorch | Deep Learning | Neural Network - YouTube

06:09

🔥🐍 Checkout the MASSIVELY UPGRADED 2nd Edition of my Book (with 1300+ pages of Dense Python Knowledge) Covering 350+ Python 🐍 Core concepts🟠 Book Link - ...

Published February 20, 2022

PyTorch Forums

discuss.pytorch.org › autograd

Should it really be necessary to do var.detach().cpu().numpy()? - autograd - PyTorch Forums

January 24, 2019 - I have a CUDA variable that is part of a differentiable computational graph. I want to read out its value into numpy (say for plotting). If I do var.numpy() I get RuntimeError: Can’t call numpy() on Variable that requires grad. Use var.detach().numpy() instead.

PyTorch Forums

discuss.pytorch.org › t › runtimeerror-cant-call-numpy-on-tensor-that-requires-grad-use-tensor-detach-numpy-instead › 158743

RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead - PyTorch Forums

August 9, 2022 - Hi All I have data from a dataloader which I pass to my model for prediction. I am trying to normalize my prediction within a certain range using T.tonumpy_denormalize, however I get this error RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

Knowledge Transfer

androidkt.com › home

Knowledge Transfer -

December 6, 2022 - Shuffle the list before splitting else you won’t get all the classes in the three splits since these indices would be used by the Subset class to sample from the original dataset. Shuffling the elements of a tensor amounts to finding a permutation of its indices.

Python Guides

pythonguides.com › pytorch-tensor-to-numpy

Convert PyTorch Tensor To Numpy - Python Guides

June 16, 2025 - This is crucial because NumPy arrays don’t support GPU memory, and attempting to directly convert would result in a runtime error. When dealing with tensors that have gradients attached (like model parameters during training), you’ll want to use .detach().numpy():

GitHub

github.com › lululxvi › deepxde › issues › 1046

"RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead" when calculating boundary_normal · Issue #1046 · lululxvi/deepxde

November 25, 2022 - "RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead" when calculating boundary_normal#1046

Author lululxvi

CodeGenes

codegenes.net › blog › pytorch-detach-cpu-numpy

PyTorch `detach`, `.cpu()`, and `.numpy()`: A Comprehensive Guide — codegenes.net

Tensors can be stored on different ... for backpropagation. The detach() method is used to create a new tensor that has the same data as the original tensor but is detached from the computational graph....

EDUCBA

educba.com › home › software development › software development tutorials › pytorch tutorial › pytorch detach

PyTorch Detach | A Compelete Guide on PyTorch Detach

April 6, 2023 - Now, if we use detach, the tensor view will be differentiated from the following methods, and all the tracking operations will be stopped. If we need to track furthermore, we have to start a new class or method. We can also use detach().numpy() where the computational graph is broken directly, and thus the gradients can be calculated using PyTorch in the same program.

Stack Abuse

stackabuse.com › numpy-array-to-tensor-and-tensor-to-numpy-array-with-pytorch

Convert Numpy Array to Tensor and Tensor to Numpy Array with PyTorch

May 22, 2023 - This works very well, and you've got yourself a clean Numpy array. However, if your tensor requires you to calculate gradients for it as well (i.e. the requires_grad argument is set to True), this approach won't work anymore. You'll have to detach the underlying array from the tensor, and through detaching, you'll be pruning away the gradients:

Stack Overflow

stackoverflow.com › questions › 55466298 › pytorch-cant-call-numpy-on-variable-that-requires-grad-use-var-detach-num › 72423448

python - PyTorch: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead - Stack Overflow

Top answer

1 of 5

57

Error reproduced

import torch

tensor1 = torch.tensor([1.0,2.0],requires_grad=True)

print(tensor1)
print(type(tensor1))

tensor1 = tensor1.numpy()

print(tensor1)
print(type(tensor1))

which leads to the exact same error for the line tensor1 = tensor1.numpy():

tensor([1., 2.], requires_grad=True)
<class 'torch.Tensor'>
Traceback (most recent call last):
  File "/home/badScript.py", line 8, in <module>
    tensor1 = tensor1.numpy()
RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.

Process finished with exit code 1

Generic solution

This was suggested to you in your error message; just replace var with your variable name.

import torch

tensor1 = torch.tensor([1.0,2.0],requires_grad=True)

print(tensor1)
print(type(tensor1))

tensor1 = tensor1.detach().numpy()

print(tensor1)
print(type(tensor1))

which returns as expected

tensor([1., 2.], requires_grad=True)
<class 'torch.Tensor'>
[1. 2.]
<class 'numpy.ndarray'>

Process finished with exit code 0

Some explanation

You need to convert your tensor to another tensor that holds only the actual values and does not require a gradient. This other tensor can then be converted to a numpy array. Cf. this discuss.pytorch post. (I think, more precisely, that one needs to do that in order to get the actual tensor out of its pytorch Variable wrapper, cf. this other discuss.pytorch post).

2 of 5

33

I had the same error message but it was for drawing a scatter plot on matplotlib.

There are 2 steps that got me out of this error message:

Import the fastai.basics library with: from fastai.basics import *
If you only use the torch library, remember to turn off requires_grad with:
```
with torch.no_grad():
    (your code)
```

Medium

medium.com › data-scientists-diary › converting-pytorch-tensors-to-numpy-arrays-793792ec43ea

Converting PyTorch Tensors to NumPy Arrays | by Amit Yadav | Data Scientist’s Diary | Medium

January 19, 2025 - Why This Matters: Detaching is essential when working with complex workflows that include both PyTorch and NumPy. If you try to convert directly, PyTorch throws an error because it’s still in “autograd mode.” By detaching, you preserve the tensor’s values but cut off its gradient chain, keeping both frameworks happy.