Use numpy.linalg.norm:
dist = numpy.linalg.norm(a-b)
This works because the Euclidean distance is the l2 norm, and the default value of the ord parameter in numpy.linalg.norm is 2.
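For example, with two hypothetical 3-D points (any array-like of equal shape works):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# norm of the difference vector is the Euclidean distance
dist = np.linalg.norm(a - b)  # sqrt(3**2 + 3**2 + 3**2) = sqrt(27)
print(dist)
```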
For more theory, see Introduction to Data Mining.

Use scipy.spatial.distance.euclidean:
from scipy.spatial import distance
a = (1, 2, 3)
b = (4, 5, 6)
dst = distance.euclidean(a, b)
There are much, much faster alternatives to using nested for loops for this. I'll show you two different approaches - the first will be a more general method that will introduce you to broadcasting and vectorization, and the second uses a more convenient scipy library function.
- The general way, using broadcasting & vectorization
One of the first things I'd suggest doing is switching to using np.array rather than np.matrix. Arrays are preferred for a number of reasons, most importantly because they can have >2 dimensions, and they make element-wise multiplication much less awkward.
import numpy as np
ncoord = np.array(ncoord)
With an array, we can eliminate the nested for loops by inserting a new singleton dimension and broadcasting the subtraction over it:
# indexing with None (or np.newaxis) inserts a new dimension of size 1
print(ncoord[:, :, None].shape)
# (20, 2, 1)
# by making the 'inner' dimensions equal to 1, i.e. (20, 2, 1) - (1, 2, 20),
# the subtraction is 'broadcast' over every pair of rows in ncoord
xydiff = ncoord[:, :, None] - ncoord[:, :, None].T
print(xydiff.shape)
# (20, 2, 20)
This is equivalent to looping over every pair of rows using nested for loops, but much, much faster!
xydiff2 = np.zeros((20, 2, 20), dtype=xydiff.dtype)
for ii in range(20):
    for jj in range(20):
        for kk in range(2):
            xydiff2[ii, kk, jj] = ncoord[ii, kk] - ncoord[jj, kk]
# check that these give the same result
print(np.all(xydiff == xydiff2))
# True
The rest we can also do using vectorized operations:
# we square the differences and sum over the 'middle' axis, equivalent to
# computing (x_i - x_j) ** 2 + (y_i - y_j) ** 2
ssdiff = (xydiff * xydiff).sum(1)
# finally we take the square root
D = np.sqrt(ssdiff)
The whole thing could be done in one line like this:
D = np.sqrt(((ncoord[:, :, None] - ncoord[:, :, None].T) ** 2).sum(1))
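As a quick sanity check (using hypothetical random coordinates), the one-liner agrees with a naive double loop:

```python
import numpy as np

rng = np.random.default_rng(0)
ncoord = rng.standard_normal((20, 2))

# broadcast, square, sum over the coordinate axis, take the root
D = np.sqrt(((ncoord[:, :, None] - ncoord[:, :, None].T) ** 2).sum(1))

# naive reference implementation for comparison
D_loop = np.array([[np.sqrt(((ncoord[i] - ncoord[j]) ** 2).sum())
                    for j in range(20)] for i in range(20)])
print(np.allclose(D, D_loop))  # True
```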
- The lazy way, using pdist
It turns out that there's already a fast and convenient function for computing all pairwise distances: scipy.spatial.distance.pdist.
from scipy.spatial.distance import pdist, squareform
d = pdist(ncoord)
# pdist just returns the upper triangle of the pairwise distance matrix. to get
# the whole (20, 20) array we can use squareform:
print(d.shape)
# (190,)
D2 = squareform(d)
print(D2.shape)
# (20, 20)
# check that the two methods are equivalent
print(np.all(D == D2))
# True
for i in range(0, n):
    for j in range(i+1, n):
        c[i, j] = math.sqrt((ncoord[i, 0] - ncoord[j, 0])**2
                            + (ncoord[i, 1] - ncoord[j, 1])**2)
Note: ncoord[i, j] is not the same as ncoord[i][j] for a Numpy matrix. This appears to be the source of confusion. If ncoord is a Numpy array then they will give the same result.
For a Numpy matrix, ncoord[i] returns the ith row of ncoord, which itself is a Numpy matrix object with shape 1 x 2 in your case. Therefore, ncoord[i][j] actually means: take the ith row of ncoord and take the jth row of that 1 x 2 matrix. This is where your indexing problem comes about when j > 0.
Regarding your comments on assigning to c[i][j] "working", it shouldn't. At least on my build of Numpy 1.9.1 it shouldn't work if your indices i and j iterate up to n.
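A minimal demonstration of the difference, using a small hypothetical matrix:

```python
import numpy as np

m = np.matrix([[1, 2], [3, 4]])
print(m[0].shape)     # (1, 2) -- a row of a matrix is itself a 2-D matrix
print(m[0][0].shape)  # (1, 2) -- indexing that row just returns the same row
# m[0][1] raises IndexError: there is only one row to index into

a = np.asarray(m)
print(a[0][1] == a[0, 1])  # True -- for arrays the two spellings agree
```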
As an aside, remember to add the transpose of the matrix c to itself.
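Since the loop above only fills the upper triangle, mirroring it with the transpose completes the symmetric distance matrix. A sketch with hypothetical sample points:

```python
import math
import numpy as np

ncoord = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
n = len(ncoord)
c = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        c[i, j] = math.sqrt((ncoord[i, 0] - ncoord[j, 0])**2
                            + (ncoord[i, 1] - ncoord[j, 1])**2)

c = c + c.T  # mirror the upper triangle; the diagonal is already zero
print(c)
```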
It is recommended to use Numpy arrays instead of matrices; see this post.
If your coordinates are stored as a Numpy array, then pairwise distance can be computed as:
from scipy.spatial.distance import pdist
pairwise_distances = pdist(ncoord, metric="euclidean", p=2)
or simply
pairwise_distances = pdist(ncoord)
since the default metric is "euclidean", and default "p" is 2.
In a comment below I mistakenly mentioned that the result of pdist is an n x n matrix. To get an n x n matrix, you will need to do the following:
from scipy.spatial.distance import pdist, squareform
pairwise_distances = squareform(pdist(ncoord))
or
from scipy.spatial.distance import cdist
pairwise_distances = cdist(ncoord, ncoord)
I'm not seeing a built-in, but you could do it yourself pretty easily.
distances = (a-b)**2
distances = distances.sum(axis=-1)
distances = np.sqrt(distances)
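Put together as a runnable sketch (assuming a and b are arrays of points, one point per row):

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[1.0, 0.0], [0.0, 0.0]])

distances = (a - b) ** 2          # squared differences per coordinate
distances = distances.sum(axis=-1)  # sum over the coordinate axis
distances = np.sqrt(distances)      # row-wise Euclidean distances
print(distances)  # [2. 5.]
```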
hypot is another valid alternative
from numpy import hypot, sqrt
from numpy.random import randn
a, b = randn(10, 2), randn(10, 2)
ahat, bhat = (a - b).T
r = hypot(ahat, bhat)
Result of timeits between manual calculation and hypot:
Manual:
timeit sqrt(((a - b) ** 2).sum(-1))
100000 loops, best of 3: 10.3 µs per loop
Using hypot:
timeit hypot(ahat, bhat)
1000000 loops, best of 3: 1.3 µs per loop
Now how about some adult-sized arrays:
a, b = randn(10**7, 2), randn(10**7, 2)
ahat, bhat = (a - b).T
timeit -r10 -n3 hypot(ahat, bhat)
3 loops, best of 10: 208 ms per loop
timeit -r10 -n3 sqrt(((a - b) ** 2).sum(-1))
3 loops, best of 10: 224 ms per loop
Not much of a performance difference between the two methods. You can squeeze out a tiny bit more from the latter by avoiding pow:
d = a - b
timeit -r10 -n3 sqrt((d * d).sum(-1))
3 loops, best of 10: 184 ms per loop
In terms of something more "elegant" you could always use scikit-learn's pairwise Euclidean distance:
from sklearn.metrics.pairwise import euclidean_distances
euclidean_distances(a, a)
For the same array a used in the einsum example below, this gives:
array([[ 0. , 1.41421356, 0. , 1.41421356],
[ 1.41421356, 0. , 1.41421356, 2. ],
[ 0. , 1.41421356, 0. , 1.41421356],
[ 1.41421356, 2. , 1.41421356, 0. ]])
And for completeness, einsum is often referenced for distance calculations.
a = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1]])
b = a.reshape(a.shape[0], 1, a.shape[1])
np.sqrt(np.einsum('ijk, ijk->ij', a-b, a-b))
array([[ 0. , 1.41421356, 0. , 1.41421356],
[ 1.41421356, 0. , 1.41421356, 2. ],
[ 0. , 1.41421356, 0. , 1.41421356],
[ 1.41421356, 2. , 1.41421356, 0. ]])
There are functions for that in scipy.spatial.distance:
import numpy as np
from scipy.spatial.distance import pdist, squareform
a = np.random.randint(0,10,(3,4))
# pairwise dist, compressed
pdist(a.T)
# array([ 8.60232527, 8.77496439, 10.29563014, 6.70820393, 8.1240384 ,
# 3. ])
# same expanded to full table
squareform(pdist(a.T))
# array([[ 0. , 8.60232527, 8.77496439, 10.29563014],
# [ 8.60232527, 0. , 6.70820393, 8.1240384 ],
# [ 8.77496439, 6.70820393, 0. , 3. ],
# [10.29563014, 8.1240384 , 3. , 0. ]])
S.Vengat is right: you will have to use loops one way or another. However, there is a library function that can do this in one line:
import numpy as np
from scipy.spatial.distance import cdist
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
cdist(data.T, data.T)
which gives:
array([[0. , 1.73205081, 3.46410162],
[1.73205081, 0. , 1.73205081],
[3.46410162, 1.73205081, 0. ]])
Your code, for comparison:
import numpy as np
def all_column_euclidean(x):
    output = np.zeros((len(x[0]), len(x[0])))
    for i in range(len(x[0])):
        for j in range(len(x[0])):
            output[i][j] = np.sqrt(np.sum((x[:, i] - x[:, j])**2))
    return output
data = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(all_column_euclidean(data))
Output:
[[0. 1.73205081 3.46410162]
[1.73205081 0. 1.73205081]
[3.46410162 1.73205081 0. ]]