In short:
torch.mm - performs a matrix multiplication without broadcasting - (2D tensor) by (2D tensor)
torch.mul - performs an elementwise multiplication with broadcasting - (Tensor) by (Tensor or Number)
torch.matmul - matrix product with broadcasting - (Tensor) by (Tensor), with different behavior depending on the tensor shapes (dot product, matrix product, batched matrix products)
Some details:
torch.mm - performs a matrix multiplication without broadcasting
It expects two 2D tensors, so n×m * m×p = n×p.
From the documentation https://pytorch.org/docs/stable/generated/torch.mm.html:
This function does not broadcast. For broadcasting matrix products, see torch.matmul().
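To illustrate the no-broadcast rule, here is a minimal sketch (the shapes are arbitrary examples): torch.mm works only on a pair of 2D tensors and errors out on anything else, while torch.matmul would broadcast instead.

```python
import torch

A = torch.randn(2, 3)
B = torch.randn(3, 4)

C = torch.mm(A, B)  # 2x3 @ 3x4 -> 2x4
print(C.shape)      # torch.Size([2, 4])

# torch.mm rejects non-2D inputs, e.g. a 1D vector:
try:
    torch.mm(torch.randn(3), B)
except RuntimeError:
    print("torch.mm requires 2D tensors")
```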
torch.mul - performs an elementwise multiplication with broadcasting - (Tensor) by (Tensor or Number)
Docs: https://pytorch.org/docs/stable/generated/torch.mul.html
torch.mul does not perform a matrix multiplication. It broadcasts the two tensors and then multiplies them elementwise. So when you use it with tensors of shapes 3×1 and 1×3, it works like this:
import torch
a = torch.FloatTensor([[1], [2], [3]])
b = torch.FloatTensor([[1, 10, 100]])
a, b = torch.broadcast_tensors(a, b)
print(a)
print(b)
print(a * b)
tensor([[1., 1., 1.],
[2., 2., 2.],
[3., 3., 3.]])
tensor([[ 1., 10., 100.],
[ 1., 10., 100.],
[ 1., 10., 100.]])
tensor([[ 1., 10., 100.],
[ 2., 20., 200.],
[ 3., 30., 300.]])
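Since torch.mul also accepts (Tensor, Number), a small sketch of both call forms, reusing the tensors from above:

```python
import torch

a = torch.FloatTensor([[1], [2], [3]])   # shape 3x1
b = torch.FloatTensor([[1, 10, 100]])    # shape 1x3

# torch.mul(a, b) broadcasts to 3x3 and is the same as a * b
print(torch.mul(a, b))

# (Tensor, Number): every element is scaled by the scalar
print(torch.mul(a, 10))
```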
torch.matmul
It is better to check out the official documentation https://pytorch.org/docs/stable/generated/torch.matmul.html as its behavior depends on the shapes of the input tensors. It may perform a dot product, a matrix-matrix product, or batched matrix products with broadcasting.
As for your question regarding the product of:
tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(4)
it is a batched version of a matrix-vector product. Please check this simple example for understanding:
import torch
# 3x1x3
a = torch.FloatTensor([[[1, 2, 3]], [[3, 4, 5]], [[6, 7, 8]]])
# 3
b = torch.FloatTensor([1, 10, 100])
r1 = torch.matmul(a, b)
r2 = torch.stack((
    torch.matmul(a[0], b),
    torch.matmul(a[1], b),
    torch.matmul(a[2], b),
))
assert torch.allclose(r1, r2)
So it can be seen as multiple matrix-vector products stacked together across the batch dimension.
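Applied directly to the shapes from the question, a quick sketch (torch.mv is the single matrix-vector product):

```python
import torch

tensor1 = torch.randn(10, 3, 4)  # batch of ten 3x4 matrices
tensor2 = torch.randn(4)         # a single 4-vector

res = torch.matmul(tensor1, tensor2)
print(res.shape)  # torch.Size([10, 3]) - one 3-vector per batch element

# equivalent to a matrix-vector product applied per batch element:
ref = torch.stack([torch.mv(tensor1[i], tensor2) for i in range(10)])
assert torch.allclose(res, ref)
```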
Also it may be useful to read about broadcasting:
https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics
Answer from u1234x1234 on Stack Overflow
I want to add an introduction to torch.bmm, which performs a batch matrix-matrix product.
torch.bmm(input, mat2, *, out=None) → Tensor
shape: (b×n×m), (b×m×p) → (b×n×p)
Performs a batch matrix-matrix product of matrices stored in input and mat2.
input and mat2 must be 3-D tensors each containing the same number of matrices.
This function does not broadcast.
Example
input = torch.randn(10, 3, 4)
mat2 = torch.randn(10, 4, 5)
res = torch.bmm(input, mat2)
res.size() # torch.Size([10, 3, 5])
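To make the "batch of matrix products" idea concrete, a small sketch showing that torch.bmm matches applying torch.mm to each pair of matrices in the batch:

```python
import torch

inp = torch.randn(10, 3, 4)
mat2 = torch.randn(10, 4, 5)

res = torch.bmm(inp, mat2)
print(res.shape)  # torch.Size([10, 3, 5])

# equivalent to one torch.mm per batch element:
ref = torch.stack([torch.mm(inp[i], mat2[i]) for i in range(inp.size(0))])
assert torch.allclose(res, ref, atol=1e-6)
```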
Use torch.mm:
torch.mm(a, b)
torch.dot() behaves differently from np.dot(). There has been some discussion about what would be desirable here. Specifically, torch.dot() treats both a and b as 1D vectors (irrespective of their original shape) and computes their inner product. The error is thrown because this behaviour makes your a a vector of length 6 and your b a vector of length 2, so their inner product cannot be computed. For matrix multiplication in PyTorch, use torch.mm(). NumPy's np.dot() is more flexible by contrast: it computes the inner product for 1D arrays and performs matrix multiplication for 2D arrays.
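As a small illustration of the contrast (note that recent PyTorch versions require 1D inputs for torch.dot and raise an error rather than flattening, so this sketch sticks to 1D tensors):

```python
import numpy as np
import torch

# torch.dot: inner product of two 1D vectors
a = torch.tensor([1., 2., 3.])
b = torch.tensor([4., 5., 6.])
print(torch.dot(a, b))  # tensor(32.)

# np.dot on 2D arrays performs matrix multiplication instead
A = np.arange(6).reshape(2, 3)
B = np.arange(6).reshape(3, 2)
print(np.dot(A, B).shape)  # (2, 2)
```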
torch.matmul performs matrix multiplication if both arguments are 2D and computes their dot product if both arguments are 1D. For inputs of these dimensions, its behaviour is the same as np.dot. It also lets you perform matrix × matrix, matrix × vector and vector × vector operations in batches, with broadcasting.
# 1D inputs, same as torch.dot
a = torch.rand(n)
b = torch.rand(n)
torch.matmul(a, b) # torch.Size([])
# 2D inputs, same as torch.mm
a = torch.rand(m, k)
b = torch.rand(k, j)
torch.matmul(a, b) # torch.Size([m, j])
To perform a matrix (rank 2 tensor) multiplication, use any of the following equivalent ways:
AB = A.mm(B)
AB = torch.mm(A, B)
AB = torch.matmul(A, B)
AB = A @ B # Python 3.5+ only
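The four spellings above can be checked against each other with a quick sketch (shapes are arbitrary examples):

```python
import torch

A = torch.randn(3, 4)
B = torch.randn(4, 5)

# all four equivalent ways of writing the matrix product
results = [A.mm(B), torch.mm(A, B), torch.matmul(A, B), A @ B]
assert all(torch.allclose(results[0], r) for r in results[1:])
print(results[0].shape)  # torch.Size([3, 5])
```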
There are a few subtleties. From the PyTorch documentation:
torch.mm does not broadcast. For broadcasting matrix products, see torch.matmul().
For instance, you cannot multiply two 1-dimensional vectors with torch.mm, nor multiply batched matrices (rank 3). To this end, you should use the more versatile torch.matmul. For an extensive list of the broadcasting behaviours of torch.matmul, see the documentation.
For element-wise multiplication, you can simply do (if A and B have the same shape)
A * B # element-wise matrix multiplication (Hadamard product)
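A small sketch contrasting the Hadamard product with the matrix product on the same inputs, to make the distinction concrete:

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.]])
B = torch.tensor([[10., 20.], [30., 40.]])

print(A * B)          # element-wise: tensor([[ 10.,  40.], [ 90., 160.]])
print(torch.mm(A, B)) # matrix product: tensor([[ 70., 100.], [150., 220.]])
```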