In short:
torch.mm - performs a matrix multiplication without broadcasting - (2D tensor) by (2D tensor)
torch.mul - performs an elementwise multiplication with broadcasting - (Tensor) by (Tensor or Number)
torch.matmul - matrix product with broadcasting - (Tensor) by (Tensor), with different behavior depending on the tensor shapes (dot product, matrix product, batched matrix products)
Some details:
torch.mm - performs a matrix multiplication without broadcasting
It expects two 2D tensors, so n×m * m×p = n×p.
From the documentation https://pytorch.org/docs/stable/generated/torch.mm.html:
This function does not broadcast. For broadcasting matrix products, see torch.matmul().
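To illustrate the no-broadcast rule, here is a minimal sketch (the shapes are arbitrary examples): torch.mm works only on a pair of 2D tensors and errors out on anything else, while torch.matmul would broadcast instead.

```python
import torch

A = torch.randn(2, 3)
B = torch.randn(3, 4)

C = torch.mm(A, B)  # 2x3 @ 3x4 -> 2x4
print(C.shape)      # torch.Size([2, 4])

# torch.mm rejects non-2D inputs, e.g. a 1D vector:
try:
    torch.mm(torch.randn(3), B)
except RuntimeError:
    print("torch.mm requires 2D tensors")
```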
torch.mul - performs an elementwise multiplication with broadcasting - (Tensor) by (Tensor or Number)
Docs: https://pytorch.org/docs/stable/generated/torch.mul.html
torch.mul does not perform a matrix multiplication. It broadcasts the two tensors and then multiplies them elementwise. So when you use it with tensors of shapes 3×1 and 1×3, it works like this:
import torch
a = torch.FloatTensor([[1], [2], [3]])
b = torch.FloatTensor([[1, 10, 100]])
a, b = torch.broadcast_tensors(a, b)
print(a)
print(b)
print(a * b)
tensor([[1., 1., 1.],
[2., 2., 2.],
[3., 3., 3.]])
tensor([[ 1., 10., 100.],
[ 1., 10., 100.],
[ 1., 10., 100.]])
tensor([[ 1., 10., 100.],
[ 2., 20., 200.],
[ 3., 30., 300.]])
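Since torch.mul also accepts (Tensor, Number), a small sketch of both call forms, reusing the tensors from above:

```python
import torch

a = torch.FloatTensor([[1], [2], [3]])   # shape 3x1
b = torch.FloatTensor([[1, 10, 100]])    # shape 1x3

# torch.mul(a, b) broadcasts to 3x3 and is the same as a * b
print(torch.mul(a, b))

# (Tensor, Number): every element is scaled by the scalar
print(torch.mul(a, 10))
```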
torch.matmul
It is better to check out the official documentation https://pytorch.org/docs/stable/generated/torch.matmul.html as its behavior depends on the shapes of the input tensors. It may perform a dot product, a matrix-matrix product, or batched matrix products with broadcasting.
As for your question regarding the product of:
tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(4)
it is a batched version of a matrix-vector product. Please check this simple example for understanding:
import torch
# 3x1x3
a = torch.FloatTensor([[[1, 2, 3]], [[3, 4, 5]], [[6, 7, 8]]])
# 3
b = torch.FloatTensor([1, 10, 100])
r1 = torch.matmul(a, b)
r2 = torch.stack((
    torch.matmul(a[0], b),
    torch.matmul(a[1], b),
    torch.matmul(a[2], b),
))
assert torch.allclose(r1, r2)
So it can be seen as multiple matrix-vector products stacked together across the batch dimension.
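Applied directly to the shapes from the question, a quick sketch (torch.mv is the single matrix-vector product):

```python
import torch

tensor1 = torch.randn(10, 3, 4)  # batch of ten 3x4 matrices
tensor2 = torch.randn(4)         # a single 4-vector

res = torch.matmul(tensor1, tensor2)
print(res.shape)  # torch.Size([10, 3]) - one 3-vector per batch element

# equivalent to a matrix-vector product applied per batch element:
ref = torch.stack([torch.mv(tensor1[i], tensor2) for i in range(10)])
assert torch.allclose(res, ref)
```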
Also it may be useful to read about broadcasting:
https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics
Answer from u1234x1234 on Stack Overflow
I want to add an introduction to torch.bmm, which performs a batch matrix-matrix product.
torch.bmm(input, mat2, *, out=None) → Tensor
shape: (b×n×m), (b×m×p) → (b×n×p)
Performs a batch matrix-matrix product of matrices stored in input and mat2.
input and mat2 must be 3-D tensors each containing the same number of matrices.
This function does not broadcast.
Example
input = torch.randn(10, 3, 4)
mat2 = torch.randn(10, 4, 5)
res = torch.bmm(input, mat2)
res.size() # torch.Size([10, 3, 5])
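To make the "batch of matrix products" idea concrete, a small sketch showing that torch.bmm matches applying torch.mm to each pair of matrices in the batch:

```python
import torch

inp = torch.randn(10, 3, 4)
mat2 = torch.randn(10, 4, 5)

res = torch.bmm(inp, mat2)
print(res.shape)  # torch.Size([10, 3, 5])

# equivalent to one torch.mm per batch element:
ref = torch.stack([torch.mm(inp[i], mat2[i]) for i in range(inp.size(0))])
assert torch.allclose(res, ref, atol=1e-6)
```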
Use torch.mm:
torch.mm(a, b)
torch.dot() behaves differently from np.dot(). There has been some discussion about what would be desirable here. Specifically, torch.dot() treats both a and b as 1D vectors (irrespective of their original shape) and computes their inner product. The error is thrown because this behaviour makes your a a vector of length 6 and your b a vector of length 2, so their inner product cannot be computed. For matrix multiplication in PyTorch, use torch.mm(). NumPy's np.dot() is more flexible by contrast: it computes the inner product for 1D arrays and performs matrix multiplication for 2D arrays.
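As a small illustration of the contrast (note that recent PyTorch versions require 1D inputs for torch.dot and raise an error rather than flattening, so this sketch sticks to 1D tensors):

```python
import numpy as np
import torch

# torch.dot: inner product of two 1D vectors
a = torch.tensor([1., 2., 3.])
b = torch.tensor([4., 5., 6.])
print(torch.dot(a, b))  # tensor(32.)

# np.dot on 2D arrays performs matrix multiplication instead
A = np.arange(6).reshape(2, 3)
B = np.arange(6).reshape(3, 2)
print(np.dot(A, B).shape)  # (2, 2)
```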
torch.matmul performs matrix multiplication if both arguments are 2D and computes their dot product if both arguments are 1D. For inputs of these dimensions, its behaviour is the same as np.dot. It also lets you perform matrix × matrix, matrix × vector and vector × vector operations in batches, with broadcasting.
# 1D inputs, same as torch.dot
a = torch.rand(n)
b = torch.rand(n)
torch.matmul(a, b) # torch.Size([])
# 2D inputs, same as torch.mm
a = torch.rand(m, k)
b = torch.rand(k, j)
torch.matmul(a, b) # torch.Size([m, j])
To perform a matrix (rank 2 tensor) multiplication, use any of the following equivalent ways:
AB = A.mm(B)
AB = torch.mm(A, B)
AB = torch.matmul(A, B)
AB = A @ B # Python 3.5+ only
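The four spellings above can be checked against each other with a quick sketch (shapes are arbitrary examples):

```python
import torch

A = torch.randn(3, 4)
B = torch.randn(4, 5)

# all four equivalent ways of writing the matrix product
results = [A.mm(B), torch.mm(A, B), torch.matmul(A, B), A @ B]
assert all(torch.allclose(results[0], r) for r in results[1:])
print(results[0].shape)  # torch.Size([3, 5])
```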
There are a few subtleties. From the PyTorch documentation:
torch.mm does not broadcast. For broadcasting matrix products, see torch.matmul().
For instance, you cannot multiply two 1-dimensional vectors with torch.mm, nor multiply batched matrices (rank 3). To this end, you should use the more versatile torch.matmul. For an extensive list of the broadcasting behaviours of torch.matmul, see the documentation.
For element-wise multiplication, you can simply do (if A and B have the same shape)
A * B # element-wise matrix multiplication (Hadamard product)
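A small sketch contrasting the Hadamard product with the matrix product on the same inputs, to make the distinction concrete:

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.]])
B = torch.tensor([[10., 20.], [30., 40.]])

print(A * B)          # element-wise: tensor([[ 10.,  40.], [ 90., 160.]])
print(torch.mm(A, B)) # matrix product: tensor([[ 70., 100.], [150., 220.]])
```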