A norm is a function that takes a vector as an input and returns a scalar value that can be interpreted as the "size", "length" or "magnitude" of that vector. More formally, norms are defined as having the following mathematical properties:
- They scale multiplicatively, i.e. Norm(a·v) = |a|·Norm(v) for any scalar a
- They satisfy the triangle inequality, i.e. Norm(u + v) ≤ Norm(u) + Norm(v)
- The norm of a vector is zero if and only if it is the zero vector, i.e. Norm(v) = 0 ⟺ v = 0
The Euclidean norm (also known as the L² norm) is just one of many different norms - there is also the max norm, the Manhattan norm, etc. The L² norm of a single vector is equivalent to the Euclidean distance from that point to the origin, and the L² norm of the difference between two vectors is equivalent to the Euclidean distance between the two points.
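As a quick illustration of how these norms differ, `np.linalg.norm` accepts an `ord` argument selecting the norm (the vector here is made up):

```python
import numpy as np

v = np.array([3.0, -4.0])

l1 = np.linalg.norm(v, ord=1)         # Manhattan norm: |3| + |-4| = 7
l2 = np.linalg.norm(v, ord=2)         # Euclidean norm: sqrt(9 + 16) = 5
linf = np.linalg.norm(v, ord=np.inf)  # max norm: max(|3|, |-4|) = 4

print(l1, l2, linf)  # 7.0 5.0 4.0
```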
As @nobar's answer says, np.linalg.norm(x - y, ord=2) (or just np.linalg.norm(x - y)) will give you Euclidean distance between the vectors x and y.
Since you want to compute the Euclidean distance between a[1, :] and every other row in a, you could do this a lot faster by eliminating the for loop and broadcasting over the rows of a:
dist = np.linalg.norm(a[1:2] - a, axis=1)
It's also easy to compute the Euclidean distance yourself using broadcasting:
dist = np.sqrt(((a[1:2] - a) ** 2).sum(1))
The fastest method is probably scipy.spatial.distance.cdist:
from scipy.spatial.distance import cdist
dist = cdist(a[1:2], a)[0]
Some timings for a (1000, 1000) array:
a = np.random.randn(1000, 1000)
%timeit np.linalg.norm(a[1:2] - a, axis=1)
# 100 loops, best of 3: 5.43 ms per loop
%timeit np.sqrt(((a[1:2] - a) ** 2).sum(1))
# 100 loops, best of 3: 5.5 ms per loop
%timeit cdist(a[1:2], a)[0]
# 1000 loops, best of 3: 1.38 ms per loop
# check that all 3 methods return the same result
d1 = np.linalg.norm(a[1:2] - a, axis=1)
d2 = np.sqrt(((a[1:2] - a) ** 2).sum(1))
d3 = cdist(a[1:2], a)[0]
assert np.allclose(d1, d2) and np.allclose(d1, d3)
The concept of a "norm" is a generalized idea in mathematics which, when applied to vectors (or vector differences), broadly represents some measure of length. There are various approaches to computing a norm; the one corresponding to Euclidean distance is called the "2-norm" and is based on applying an exponent of 2 (the "square") to each component, then applying an exponent of 1/2 (the "square root") to the sum.
It's a bit cryptic in the docs, but you get Euclidean distance between two vectors by setting the parameter ord=2.
sum(abs(x)**ord)**(1./ord)
becomes sqrt(sum(x**2)).
Note: as pointed out by @Holt, the default value is ord=None, which is documented to compute the "2-norm" for vectors. This is, therefore, equivalent to ord=2 (Euclidean distance).
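A minimal check of this equivalence (the vector is made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
# The default ord=None computes the 2-norm for 1-D inputs,
# so it matches ord=2 and the manual sqrt-of-sum-of-squares.
default = np.linalg.norm(x)
assert np.isclose(default, np.linalg.norm(x, ord=2))
assert np.isclose(default, np.sqrt(np.sum(x ** 2)))
print(default)  # 3.0
```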
Since I don't have a great amount of experience in math or the concepts of deep learning, I was often confused whether the 2 simply meant, in conjunction with the double bars, "apply the L2 norm to the terms within, i.e. square each of them and then sum the result" or if the 2 was, itself, a squaring of whatever the double brackets meant on their own.
I have never seen an author disambiguate the norm delimiters $\lVert\quad\rVert$ through the use of a superscript. In analysis, such notation would be incredibly confusing, since we frequently need to establish inequalities among norms of vectors raised to some power.
Also, an $L^2$ norm of a vector is the square root of the sum of the absolute squares of its components: $$\lVert x\rVert_2=\sqrt{\sum_{i=1}^n\lvert x_i\rvert^2}\text{;}$$ consequently, $$\lVert x\rVert^2_2=\sum_{i=1}^n\lvert x_i\rvert^2\text{.}$$
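The relationship between the norm and its square can be sketched numerically (the vector is made up):

```python
import numpy as np

x = np.array([1.0, -2.0, 2.0])
norm = np.linalg.norm(x)          # ||x||_2 = sqrt(1 + 4 + 4) = 3
norm_sq = np.sum(np.abs(x) ** 2)  # ||x||_2^2 = 1 + 4 + 4 = 9
assert np.isclose(norm ** 2, norm_sq)
```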
Just to add to this: since OP mentioned "concepts of deep learning", I'm guessing that the expressions that contain these norms appear in loss functions. Usually, regressive loss functions have a square because they have nicer derivatives for gradient descent.
For example, with a regularisation term added, you would see something like $$ L = \frac{1}{2}\sum_{i=1}^N (y_i - f(\vec x_i))^2 + \frac{\lambda}{2}||\vec w||^2 $$ with $f(\vec x) = \vec w \cdot \vec x$. Lots of squares here, but differentiating with respect to any $w_j$, we have $$ \frac{\partial L}{\partial w_j} = \sum_{i=1}^N (f(\vec x_i) - y_i) x_{i,j} + \lambda w_j $$ which would have been a lot less nice without the squares. Other norms are sometimes also used (e.g. $||\vec w||_1$ for L1-regularisation), but the convention in machine learning is that $||\vec w||$ means $||\vec w||_2$ by default.
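A sketch of this regularised loss and its gradient, checked against a finite-difference approximation (the data, shapes, and value of λ are made up for illustration):

```python
import numpy as np

# Made-up data: 20 samples, 3 features, linear model f(x) = w . x
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
y = rng.standard_normal(20)
w = rng.standard_normal(3)
lam = 0.1

def loss(w):
    # L = 1/2 * sum((y - Xw)^2) + lambda/2 * ||w||_2^2
    r = X @ w - y
    return 0.5 * np.sum(r ** 2) + 0.5 * lam * np.sum(w ** 2)

# Analytic gradient from the formula above
grad = X.T @ (X @ w - y) + lam * w

# Central finite differences, one coordinate at a time
eps = 1e-6
num = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
                for e in np.eye(3)])
assert np.allclose(grad, num, atol=1e-4)
```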
To compute the 0-, 1-, and 2-norm you can either use torch.linalg.norm, providing the ord argument (0, 1, and 2 respectively). Or directly on the tensor: Tensor.norm, with the p argument. Here are the three variants: manually computed, with torch.linalg.norm, and with Tensor.norm.
0-norm
>>> x.norm(dim=1, p=0)
>>> torch.linalg.norm(x, dim=1, ord=0)
>>> x.ne(0).sum(dim=1)
1-norm
>>> x.norm(dim=1, p=1)
>>> torch.linalg.norm(x, dim=1, ord=1)
>>> x.abs().sum(dim=1)
2-norm
>>> x.norm(dim=1, p=2)
>>> torch.linalg.norm(x, dim=1, ord=2)
>>> x.pow(2).sum(dim=1).sqrt()
To calculate a norm of a different order, you just have to pass the desired order via the ord argument. For example:
torch.linalg.norm(t, dim=1, ord=0) should work for the 0-norm.
torch.linalg.norm(t, dim=1, ord=1) should work for the 1-norm.
torch.linalg.norm(t, dim=1, ord=2) should work for the 2-norm.
And so on.
Use numpy.linalg.norm:
dist = numpy.linalg.norm(a-b)
This works because the Euclidean distance is the L² norm, and the default value of the ord parameter in numpy.linalg.norm is None, which computes the 2-norm for 1-D arrays.
For more theory, see Introduction to Data Mining.
Use scipy.spatial.distance.euclidean:
from scipy.spatial import distance
a = (1, 2, 3)
b = (4, 5, 6)
dst = distance.euclidean(a, b)
The Euclidean distance formula finds the distance between any two points in Euclidean space.
A point in Euclidean space is also called a Euclidean vector.
You can use the Euclidean distance formula to calculate the distance between vectors of two different lengths.
For vectors of different dimension, the same principle applies.
Suppose a vector of lower dimension also exists in the higher dimensional space. You can then set all of the missing components in the lower dimensional vector to 0 so that both vectors have the same dimension. You would then use any of the mentioned distance formulas for computing the distance.
For example, consider a 2-dimensional vector A in R² with components (a1, a2), and a 3-dimensional vector B in R³ with components (b1, b2, b3).
To express A in R³, you would set its components to (a1, a2, 0). Then, the Euclidean distance d between A and B can be found using the formula:
d² = (b1 - a1)² + (b2 - a2)² + (b3 - 0)²
d = sqrt((b1 - a1)² + (b2 - a2)² + b3²)
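A sketch of the zero-padding approach in NumPy (the vectors are made up, chosen so the distance comes out to a round number):

```python
import numpy as np

a = np.array([1.0, 2.0])        # vector in R^2
b = np.array([4.0, 6.0, 12.0])  # vector in R^3

# Pad the lower-dimensional vector with trailing zeros -> [1, 2, 0]
a_padded = np.pad(a, (0, b.size - a.size))

d = np.linalg.norm(b - a_padded)
print(d)  # sqrt(9 + 16 + 144) = 13.0
```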
For your particular case, the components will be either 0 or 1, so all differences will be -1, 0, or 1. The squared differences will then only be 0 or 1.
If you're using integers or individual bits to represent the components, you can use simple bitwise operations instead of some arithmetic (^ means XOR, i.e. exclusive or; note that in most languages ^ binds more loosely than +, so the XOR terms should be parenthesized):
d = sqrt((b1 ^ a1) + (b2 ^ a2) + ... + (b(n-1) ^ a(n-1)) + (b(n) ^ a(n)))
And since we're assuming the trailing components of A are 0 (and b ^ 0 = b), the final formula becomes:
d = sqrt((b1 ^ a1) + (b2 ^ a2) + ... + b(n-1) + b(n))
There is no unique definition of distance if you mix vectors of differing number of elements ("length", "dimensionality"); usually, people just map vectors of one space to the other.
There are many ways to do these mappings:
- Fill up with zeroes. Say, if you have a car (2D coordinate) and need to compute its distance to an airplane (3D coordinate), this effectively places the car at sea level and does a 3D distance.
- Alternatively, you can drop the z value of the airplane and do a 2D distance.
- Look up the missing values somewhere. With the car-airplane example, to get a 3D coordinate for the car, you'd fire up your geo database and look up heights from longitude/latitude.
These are just the most common possibilities one could come up with; I'm sure there are more.
Choose the one that makes the most sense for your application; there is no single "right" way to do it.