To answer this question, we have to look at how indexing a multidimensional array works in NumPy. Let's first say you have the array x from your question. The buffer assigned to x will contain 16 ascending integers from 0 to 15. If you access one element, say x[i,j], NumPy has to figure out the memory location of this element relative to the beginning of the buffer. This is done by calculating in effect i*x.shape[1]+j (and multiplying by the size of an int to get an actual memory offset).
If you extract a subarray by basic slicing like y = x[0:2,0:2], the resulting object will share the underlying buffer with x. But what happens if you access y[i,j]? NumPy can't use i*y.shape[1]+j to calculate the offset into the array, because the data belonging to y is not consecutive in memory.
NumPy solves this problem by introducing strides. When calculating the memory offset for accessing x[i,j], what is actually calculated is i*x.strides[0]+j*x.strides[1] (and this already includes the factor for the size of an int):
x.strides
(16, 4)
When y is extracted like above, NumPy does not create a new buffer, but it does create a new array object referencing the same buffer (otherwise y would just be equal to x). The new array object will have a different shape than x and maybe a different starting offset into the buffer, but will share the strides with x (in this case at least):
y.shape
(2,2)
y.strides
(16, 4)
This way, computing the memory offset for y[i,j] will yield the correct result.
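To make the stride arithmetic concrete, here is a minimal sketch. It assumes x is a 4x4 array of 32-bit ints (which is what gives the strides (16, 4) shown above), and byte_offset is a helper name made up for this illustration, not a NumPy function.

```python
import numpy as np

# Assumption: 32-bit ints, so that x.strides matches the (16, 4) shown above.
x = np.arange(16, dtype=np.int32).reshape(4, 4)
y = x[0:2, 0:2]  # basic slicing: a view on x's buffer


def byte_offset(arr, i, j):
    """Byte offset of arr[i, j] relative to arr's first element (hypothetical helper)."""
    return i * arr.strides[0] + j * arr.strides[1]


# x is contiguous, so flat[k] sits at byte offset k * itemsize in the buffer.
flat = x.ravel()
i, j = 1, 1
via_strides = flat[byte_offset(y, i, j) // x.itemsize]

print(x.strides, y.strides)    # (16, 4) (16, 4)
print(byte_offset(y, i, j))    # 20
print(via_strides, y[i, j])    # 5 5
print(np.shares_memory(x, y))  # True
```

Here y starts at the same address as x, so no extra starting offset is needed; for a slice such as x[1:3, 1:3] the offset of its first element would have to be added.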
But what should NumPy do for something like z=x[[1,3]]? The strides mechanism won't allow correct indexing if the original buffer is used for z. NumPy theoretically could add some more sophisticated mechanism than the strides, but this would make element access relatively expensive, somewhat defeating the whole idea of an array. In addition, a view wouldn't be a really lightweight object anymore.
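A quick way to see the difference is to check whether the result still shares memory with x. This is only a sketch; the variable w (the same rows expressed as a stepped slice) is added here for comparison and is not part of the question.

```python
import numpy as np

x = np.arange(16, dtype=np.int32).reshape(4, 4)

z = x[[1, 3]]  # list (advanced) indexing: rows 1 and 3
w = x[1::2]    # basic slicing with a step: the same rows

print(np.shares_memory(x, z))  # False: z got its own buffer
print(np.shares_memory(x, w))  # True: w is a view
print(w.strides)               # (32, 4): the row stride is doubled

z[0, 0] = -1  # only changes the copy
w[0, 0] = -2  # writes through to x[1, 0]
print(x[1, 0])  # -2
```

Indexing with a list always returns a copy, even when the requested rows happen to be regularly spaced, as they are in this example.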
This is covered in depth in the NumPy documentation on indexing.
Oh, and nearly forgot about your actual question: Here is how to make the indexing with multiple lists work as expected:
x[[[1],[3]],[1,3]]
This is because the index arrays are broadcast to a common shape. Of course, for this particular example, you can also make do with basic slicing:
x[1::2, 1::2]
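If the broadcasting step feels opaque, the following sketch spells it out with np.broadcast_arrays. The names rows, cols, r, c and sub are just for illustration.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

rows = np.array([[1], [3]])  # shape (2, 1)
cols = np.array([1, 3])      # shape (2,)

# Both index arrays are broadcast to a common (2, 2) shape.
r, c = np.broadcast_arrays(rows, cols)
print(r.tolist())  # [[1, 1], [3, 3]]
print(c.tolist())  # [[1, 3], [1, 3]]

# Element [a, b] of the result is x[r[a, b], c[a, b]].
sub = x[rows, cols]
print(sub.tolist())                        # [[5, 7], [13, 15]]
print(np.array_equal(sub, x[1::2, 1::2]))  # True
```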
Answer from Sven Marnach on Stack Overflow
As Sven mentioned, x[[[0],[2]],[1,3]] will give back the 0 and 2 rows that match with the 1 and 3 columns while x[[0,2],[1,3]] will return the values x[0,1] and x[2,3] in an array.
There is a helpful function for doing the first example I gave, numpy.ix_. You can do the same thing as my first example with x[numpy.ix_([0,2],[1,3])]. This can save you from having to enter in all of those extra brackets.
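For what it's worth, here is a small sketch of what numpy.ix_ actually returns and how the two indexing forms differ, assuming x is the 4x4 array of 0 to 15 from the question.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

# ix_ builds index arrays that broadcast against each other.
rows_idx, cols_idx = np.ix_([0, 2], [1, 3])
print(rows_idx.shape, cols_idx.shape)  # (2, 1) (1, 2)

block = x[np.ix_([0, 2], [1, 3])]  # rows 0 and 2 crossed with columns 1 and 3
pairs = x[[0, 2], [1, 3]]          # only x[0, 1] and x[2, 3]

print(block.tolist())  # [[1, 3], [9, 11]]
print(pairs.tolist())  # [1, 11]
```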
The ending indices (the 3's in 0:3 and 1:3) are exclusive, not inclusive, while the starting indices (0 and 1) are in fact inclusive. If the ending indices were inclusive, then the output would be as you expect. But because they're exclusive, you're actually only grabbing rows 0, 1, and 2, and columns 1 and 2. The output is the intersection of those, which is equivalent to the output you're seeing.
If you are trying to get the data you expect, you can do myNumpyArray[:, 1:]. The : simply grabs all the elements of the array (in your case, in the first dimension of the array), and the 1: grabs all the content of the array starting at index 1, ignoring the data in the 0th place.
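As a rough illustration, assume myNumpyArray has 3 rows and 4 columns. The real data from the question is not shown, so the array below is a made-up stand-in.

```python
import numpy as np

# Hypothetical stand-in for myNumpyArray; the original data is not shown.
myNumpyArray = np.arange(12).reshape(3, 4)

print(myNumpyArray[0:3, 1:3].shape)  # (3, 2): columns 1 and 2 only
print(myNumpyArray[:, 1:].shape)     # (3, 3): columns 1, 2 and 3
print(myNumpyArray[0:3, 1:3].tolist())  # [[1, 2], [5, 6], [9, 10]]
```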
This is a classic case of just needing to understand slice notation.
Inside the brackets, you have the slice for each dimension:
arr[dim1_start:dim1_end, dim2_start:dim2_end]
For the above notation, the slice will include the elements starting at dimX_start, up to, and not including, dimX_end.
So, for what you wrote: myNumpyArray[0:3, 1:3]
you selected rows 0, 1, and 2 (not including 3) and columns 1 and 2 (not including 3)
I hope that helps explain your results.
For the result you were expecting, you would need something more like:
print(myNumpyArray[0:4, 1:4])
For more info on slicing, you might go to the numpy docs or look at a similar question posted a while back.
The commas in slicing are to separate the various dimensions you may have. In your first example you are reshaping the data to have 4 dimensions each of length 2. This may be a little difficult to visualize so if you start with a 2D structure it might make more sense:
>>> a = np.arange(16).reshape((4, 4))
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
>>> a[0] # access the first "row" of data
array([0, 1, 2, 3])
>>> a[0, 2] # access the 3rd column (index 2) in the first row of the data
2
If you want to access multiple values using slicing you can use the colon to express a range:
>>> a[:, 1] # get the entire 2nd (index 1) column
array([ 1,  5,  9, 13])
>>> a[1:3, -1] # get the second and third elements from the last column
array([ 7, 11])
>>> a[1:3, 1:3] # get the data in the second and third rows and columns
array([[ 5,  6],
       [ 9, 10]])
You can do steps too:
>>> a[::2, ::2] # get every other element (column-wise and row-wise)
array([[ 0,  2],
       [ 8, 10]])
Hope that helps. Once that makes more sense you can look into stuff like adding dimensions by using None or np.newaxis, or using the ... ellipsis:
>>> a[:, None].shape
(4, 1, 4)
You can find more here: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
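A few more shape checks along the same lines, as a sketch. The 4-D array b is added here only to show the ellipsis standing in for several axes.

```python
import numpy as np

a = np.arange(16).reshape((4, 4))

print(np.newaxis is None)   # True: the two spellings are interchangeable
print(a[:, None].shape)     # (4, 1, 4)
print(a[:, :, None].shape)  # (4, 4, 1)
print(a[..., None].shape)   # (4, 4, 1): "..." stands for all remaining axes
print(a[None, ...].shape)   # (1, 4, 4)

b = np.arange(16).reshape(2, 2, 2, 2)
print(np.array_equal(b[..., 0], b[:, :, :, 0]))  # True
```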
It might pay to explore the shape and individual entries as we go along.
Let's start with
>>> a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16])
>>> a.shape
(16,)
This is a one-dimensional array of length 16.
Now let's try
>>> a = a.reshape(2,2,2,2)
>>> a.shape
(2, 2, 2, 2)
It's a multi-dimensional array with 4 dimensions.
Let's see the 0, 1 element:
>>> a[0, 1]
array([[5, 6],
       [7, 8]])
Since there are two dimensions left, it's a matrix of two dimensions.
Now a[:, 1] says: take a[i, 1] for all possible values of i:
>>> a[:, 1]
array([[[ 5,  6],
        [ 7,  8]],

       [[13, 14],
        [15, 16]]])
It gives you an array where the first item is a[0, 1], and the second item is a[1, 1].
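That claim is easy to verify directly. A short sketch, using the same 1 to 16 values reshaped to (2, 2, 2, 2):

```python
import numpy as np

a = np.arange(1, 17).reshape(2, 2, 2, 2)
sub = a[:, 1]

print(sub.shape)  # (2, 2, 2)

# Item i of a[:, 1] is exactly a[i, 1].
for i in range(a.shape[0]):
    print(i, np.array_equal(sub[i], a[i, 1]))  # True for both i

# The trailing axes are taken in full, so this is the same selection.
print(np.array_equal(sub, a[:, 1, :, :]))  # True
```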
If I have an NxN matrix or array, is there an elegant way to get a subset of the rows and columns? For example:
a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
Would there be a way to slice out the 2nd and 4th rows and columns to arrive at [[6,8],[14,16]]?
As it was not mentioned clearly enough (and I was looking for it too):
an equivalent to:
a = my_array[:, :, :, 8]
b = my_array[:, :, :, 2:7]
is:
a = my_array.take(indices=8, axis=3)
b = my_array.take(indices=range(2, 7), axis=3)
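Here is a small check of that equivalence. The shape of my_array is an assumption made for the example (the last axis just needs at least 9 entries). Note one difference: take gives you a copy, while the slice is a view.

```python
import numpy as np

# Hypothetical 4-D array; only the length of the last axis matters here.
my_array = np.arange(2 * 3 * 4 * 10).reshape(2, 3, 4, 10)

a_slice = my_array[:, :, :, 8]
a_take = my_array.take(indices=8, axis=3)
b_slice = my_array[:, :, :, 2:7]
b_take = my_array.take(indices=range(2, 7), axis=3)

print(a_slice.shape, b_slice.shape)         # (2, 3, 4) (2, 3, 4, 5)
print(np.array_equal(a_slice, a_take))      # True
print(np.array_equal(b_slice, b_take))      # True
print(np.shares_memory(my_array, b_slice))  # True: slicing returns a view
print(np.shares_memory(my_array, b_take))   # False: take returns a copy
```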
I think one way would be to use slice(None):
>>> m = np.arange(2*3*5).reshape((2,3,5))
>>> axis, start, end = 2, 1, 3
>>> target = m[:, :, 1:3]
>>> target
array([[[ 1,  2],
        [ 6,  7],
        [11, 12]],

       [[16, 17],
        [21, 22],
        [26, 27]]])
>>> slc = [slice(None)] * len(m.shape)
>>> slc[axis] = slice(start, end)
>>> np.allclose(m[tuple(slc)], target)  # index with a tuple; a list of slices is rejected by current NumPy
True
I have a vague feeling I've used a function for this before, but I can't seem to find it now..
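The slice(None) idea can be wrapped in a small function. This is only a sketch: slice_along_axis is a name made up here, not a NumPy function, and I can't say which function the answer above was thinking of. np.take with a range, used at the end, is one built-in that selects the same elements but returns a copy.

```python
import numpy as np


def slice_along_axis(arr, axis, start, end):
    """Apply start:end on one axis and take every other axis in full (hypothetical helper)."""
    index = [slice(None)] * arr.ndim
    index[axis] = slice(start, end)
    # The index must be a tuple; a list would be read as an index array.
    return arr[tuple(index)]


m = np.arange(2 * 3 * 5).reshape((2, 3, 5))
target = m[:, :, 1:3]
result = slice_along_axis(m, axis=2, start=1, end=3)

print(np.array_equal(result, target))  # True
print(np.shares_memory(m, result))     # True: still a view
print(np.array_equal(np.take(m, range(1, 3), axis=2), target))  # True
```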
So, here I come again 🤣
I don't get slicing in 2D.
In my lesson, I was taught that using this
d[1:2,1]
means the 2nd element from the last two rows, and the 2nd element from the 1st column, should be sliced, but when I use it I get only one element. Did I do something wrong? Can some of you awesome people hook me up with an explanation?
Here's some code for your palates:
import numpy as np
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
d = np.array(a)
d[1:2, 1]
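For reference, here is what those slices evaluate to, as a short sketch using the same d. The end of a slice is exclusive, so 1:2 covers only row 1; picking up the last two rows would need 1:3 or 1:.

```python
import numpy as np

d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(d[1:2, 1])  # [5]    row 1 only, column 1
print(d[1:3, 1])  # [5 8]  rows 1 and 2, column 1
print(d[1:, 1])   # [5 8]  same thing, open-ended
print(d[1, 1])    # 5      a plain integer index drops the dimension
```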
You can slice and insert a new axis in one single operation. For example, here's a 2D array:
>>> a = np.arange(1, 7).reshape(2, 3)
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
To slice out a single column (returning array of shape (2, 1)), slice with None as the third dimension:
>>> a[:, 1, None]
array([[2],
       [5]])
To slice out a single row (returning array of shape (1, 3)), slice with None as the second dimension:
>>> a[0, None, :]
array([[1, 2, 3]])
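A few equivalent spellings, as a sketch. The comparisons against a[:, 1:2] and a[0:1, :] are added here for illustration; all of these are views on a, not copies.

```python
import numpy as np

a = np.arange(1, 7).reshape(2, 3)

col = a[:, 1, None]  # integer index drops the axis, None adds one back
row = a[0, None, :]

print(col.shape, row.shape)                         # (2, 1) (1, 3)
print(np.array_equal(col, a[:, 1:2]))               # True
print(np.array_equal(col, a[:, 1][:, np.newaxis]))  # True
print(np.array_equal(row, a[0:1, :]))               # True
print(np.shares_memory(a, col), np.shares_memory(a, row))  # True True
```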
Make the index a slice, list or array
X[[0],:]
X[0:1,4]
But there's nothing wrong with reshape other than the fact that it requires typing. It isn't slow. [None,:] is a nice shorthand for it.
Use of a list index may be the shortest, but it does produce a copy (a plus or minus?) and is slower
For (100,100) integer array:
In [487]: timeit x[[50],:]
100000 loops, best of 3: 10.3 µs per loop # slowest
In [488]: timeit x[50:51,:]
100000 loops, best of 3: 2.24 µs per loop # slice indexing is fast
In [489]: timeit x[50,:].reshape(1,-1)
100000 loops, best of 3: 3.29 µs per loop # minimal time penalty
In [490]: timeit x[50,:][None,:]
100000 loops, best of 3: 3.55 µs per loop
In [543]: timeit x[None,50,:] # **best**
1000000 loops, best of 3: 1.76 µs per loop
One test for copy is to compare the data buffer pointer with the original.
In [492]: x.__array_interface__['data']
Out[492]: (175920456, False)
In [493]: x[50,:].__array_interface__['data']
Out[493]: (175940456, False)
In [494]: x[[50],:].__array_interface__['data']
Out[494]: (175871672, False) # different pointer
In [495]: x[50:51,:].__array_interface__['data']
Out[495]: (175940456, False)
In [496]: x[50,:][None,:].__array_interface__['data']
Out[496]: (175940456, False)
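The same copy test can be done without reading raw pointers, using np.shares_memory. A sketch, assuming x is a (100, 100) integer array as above; data_pointer and the candidates dict are names invented for this example.

```python
import numpy as np

x = np.arange(100 * 100).reshape(100, 100)


def data_pointer(arr):
    """Address of arr's first element (hypothetical helper around __array_interface__)."""
    return arr.__array_interface__['data'][0]


candidates = {
    "x[[50], :]": x[[50], :],
    "x[50:51, :]": x[50:51, :],
    "x[50, :].reshape(1, -1)": x[50, :].reshape(1, -1),
    "x[50, :][None, :]": x[50, :][None, :],
    "x[None, 50, :]": x[None, 50, :],
}

# Every variant has shape (1, 100); only the list index produces a copy.
for label, arr in candidates.items():
    print(label, arr.shape, np.shares_memory(x, arr))

# For the views, the pointer is x's pointer plus 50 row strides.
row_offset = data_pointer(x[50, :]) - data_pointer(x)
print(row_offset == 50 * x.strides[0])  # True
```

This matches the pointers printed above: 175940456 - 175920456 = 20000 bytes, which is 50 rows of 100 four-byte ints.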