In short:

  • torch.mm - performs a matrix multiplication without broadcasting - (2D tensor) by (2D tensor)
  • torch.mul - performs an elementwise multiplication with broadcasting - (Tensor) by (Tensor or Number)
  • torch.matmul - matrix product with broadcasting - (Tensor) by (Tensor), with different behaviors depending on the tensor shapes (dot product, matrix product, batched matrix products).

Some details:

  1. torch.mm - performs a matrix multiplication without broadcasting

It expects two 2D tensors, so n×m * m×p = n×p.

From the documentation https://pytorch.org/docs/stable/generated/torch.mm.html:

This function does not broadcast. For broadcasting matrix products, see torch.matmul().
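For example (shapes chosen arbitrarily):

```python
import torch

a = torch.randn(2, 3)  # n×m
b = torch.randn(3, 4)  # m×p
c = torch.mm(a, b)     # n×p
print(c.shape)  # torch.Size([2, 4])
```

Passing anything other than two 2D tensors (e.g. a 1D vector or a 3D batch) raises an error; for those cases use torch.matmul.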
  2. torch.mul - performs an elementwise multiplication with broadcasting - (Tensor) by (Tensor or Number)

Docs: https://pytorch.org/docs/stable/generated/torch.mul.html

torch.mul does not perform a matrix multiplication. It broadcasts the two tensors and performs an elementwise multiplication. So when you use it with tensors shaped 3×1 and 1×3, it works like this:

import torch

a = torch.FloatTensor([[1], [2], [3]])
b = torch.FloatTensor([[1, 10, 100]])
a, b = torch.broadcast_tensors(a, b)
print(a)
print(b)
print(a * b)
tensor([[1., 1., 1.],
        [2., 2., 2.],
        [3., 3., 3.]])
tensor([[  1.,  10., 100.],
        [  1.,  10., 100.],
        [  1.,  10., 100.]])
tensor([[  1.,  10., 100.],
        [  2.,  20., 200.],
        [  3.,  30., 300.]])
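Since torch.mul also accepts a plain number, scalar multiplication works the same way; a quick sketch:

```python
import torch

t = torch.FloatTensor([[1, 2], [3, 4]])
print(torch.mul(t, 10))  # scales every element: [[10., 20.], [30., 40.]]
print(torch.mul(t, t))   # elementwise square:   [[1., 4.], [9., 16.]]
```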
  3. torch.matmul

It is best to check the official documentation https://pytorch.org/docs/stable/generated/torch.matmul.html, since its behavior depends on the input tensors: it may perform a dot product, a matrix-matrix product, or batched matrix products with broadcasting.

As for your question regarding product of:

tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(4)

it is a batched version of a matrix-vector product. This simple example shows what happens:

import torch

# 3x1x3
a = torch.FloatTensor([[[1, 2, 3]], [[3, 4, 5]], [[6, 7, 8]]])
# 3
b = torch.FloatTensor([1, 10, 100])
r1 = torch.matmul(a, b)

r2 = torch.stack((
    torch.matmul(a[0], b),
    torch.matmul(a[1], b),
    torch.matmul(a[2], b),
))
assert torch.allclose(r1, r2)

So it can be seen as multiple matmul operations stacked together along the batch dimension.
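Concretely, for the shapes in the question, matmul treats tensor2 as a vector and applies it to each matrix in the batch (the loop below is an illustrative equivalent, not how PyTorch implements it):

```python
import torch

tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(4)
out = torch.matmul(tensor1, tensor2)  # batched matrix-vector product
print(out.shape)  # torch.Size([10, 3])

# Illustrative equivalent: one matrix-vector product per batch element
ref = torch.stack([torch.mv(tensor1[i], tensor2) for i in range(10)])
assert torch.allclose(out, ref)
```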

Also it may be useful to read about broadcasting:

https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics

Answer from u1234x1234 on Stack Overflow
I want to add an introduction to torch.bmm, which performs a batch matrix-matrix product.

torch.bmm(input, mat2, *, out=None) → Tensor

shape: (b×n×m), (b×m×p) -> (b×n×p)

Performs a batch matrix-matrix product of matrices stored in input and mat2. input and mat2 must be 3-D tensors each containing the same number of matrices.

This function does not broadcast.

Example

input = torch.randn(10, 3, 4)
mat2 = torch.randn(10, 4, 5)
res = torch.bmm(input, mat2)
res.size()  # torch.Size([10, 3, 5])
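A sketch contrasting bmm with matmul (shapes arbitrary): for 3-D inputs of equal batch size they agree, but only matmul broadcasts.

```python
import torch

a = torch.randn(10, 3, 4)
b = torch.randn(10, 4, 5)
# For 3-D inputs with matching batch sizes, bmm and matmul agree:
assert torch.allclose(torch.bmm(a, b), torch.matmul(a, b))

# Unlike bmm, matmul broadcasts the batch dimension:
b2 = torch.randn(4, 5)            # no batch dim
print(torch.matmul(a, b2).shape)  # torch.Size([10, 3, 5])
# torch.bmm(a, b2) would raise a RuntimeError (bmm does not broadcast)
```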
python - How to multiply matrices in PyTorch? - Stack Overflow

Use torch.mm:

torch.mm(a, b)

torch.dot() behaves differently to np.dot(). There's been some discussion about what would be desirable here. Specifically, torch.dot() treats both a and b as 1D vectors (irrespective of their original shape) and computes their inner product. The error is thrown because this behaviour makes your a a vector of length 6 and your b a vector of length 2; hence their inner product can't be computed. (Note: recent PyTorch versions instead require both inputs to already be 1D and raise an error otherwise.) For matrix multiplication in PyTorch, use torch.mm(). Numpy's np.dot() is more flexible in contrast: it computes the inner product for 1D arrays and performs matrix multiplication for 2D arrays.

torch.matmul performs matrix multiplication if both arguments are 2D and computes their dot product if both arguments are 1D. For inputs of these dimensions, its behaviour is the same as np.dot. It also supports broadcasting, so you can do matrix x matrix, matrix x vector, and vector x vector operations in batches.

# 1D inputs, same as torch.dot
a = torch.rand(n)
b = torch.rand(n)
torch.matmul(a, b) # torch.Size([])

# 2D inputs, same as torch.mm
a = torch.rand(m, k)
b = torch.rand(k, j)
torch.matmul(a, b) # torch.Size([m, j])
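One batched case as a sketch (arbitrary shapes): a stack of matrices times a single shared matrix, which torch.mm cannot do:

```python
import torch

a = torch.rand(8, 2, 5)          # batch of 8 matrices
b = torch.rand(5, 3)             # one matrix, broadcast across the batch
print(torch.matmul(a, b).shape)  # torch.Size([8, 2, 3])
```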

To perform a matrix (rank 2 tensor) multiplication, use any of the following equivalent ways:

AB = A.mm(B)

AB = torch.mm(A, B)

AB = torch.matmul(A, B)

AB = A @ B  # Python 3.5+ only

There are a few subtleties. From the PyTorch documentation:

torch.mm does not broadcast. For broadcasting matrix products, see torch.matmul().

For instance, you cannot multiply two 1-dimensional vectors with torch.mm, nor multiply batched matrices (rank 3). To this end, you should use the more versatile torch.matmul. For an extensive list of the broadcasting behaviours of torch.matmul, see the documentation.
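The equivalent spellings listed above can be verified to agree; a quick sanity check (arbitrary shapes):

```python
import torch

A = torch.rand(2, 3)
B = torch.rand(3, 4)
AB = A @ B
assert torch.allclose(AB, torch.mm(A, B))
assert torch.allclose(AB, torch.matmul(A, B))
assert torch.allclose(AB, A.mm(B))
```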

For element-wise multiplication, you can simply do (if A and B have the same shape)

A * B  # element-wise matrix multiplication (Hadamard product)
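If the shapes differ but are broadcastable, element-wise multiplication still works; a small sketch:

```python
import torch

A = torch.rand(3, 4)
row = torch.rand(4)     # broadcast across the 3 rows of A
print((A * row).shape)  # torch.Size([3, 4])
assert torch.allclose(A * row, A * row.expand(3, 4))
```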