That is the wrong mental model for using NumPy efficiently. NumPy arrays are stored in contiguous blocks of memory. To append rows or columns to an existing array, the entire array needs to be copied to a new block of memory, creating gaps for the new elements to be stored. This is very inefficient if done repeatedly.
Instead of appending rows, allocate a suitably sized array, and then assign to it row-by-row:
>>> import numpy as np
>>> a = np.zeros(shape=(3, 2))
>>> a
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])
>>> a[0] = [1, 2]
>>> a[1] = [3, 4]
>>> a[2] = [5, 6]
>>> a
array([[ 1., 2.],
[ 3., 4.],
[ 5., 6.]])
Answer from Stephen Simmons on Stack OverflowVideos
That is the wrong mental model for using NumPy efficiently. NumPy arrays are stored in contiguous blocks of memory. To append rows or columns to an existing array, the entire array needs to be copied to a new block of memory, creating gaps for the new elements to be stored. This is very inefficient if done repeatedly.
Instead of appending rows, allocate a suitably sized array, and then assign to it row-by-row:
>>> import numpy as np
>>> a = np.zeros(shape=(3, 2))
>>> a
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])
>>> a[0] = [1, 2]
>>> a[1] = [3, 4]
>>> a[2] = [5, 6]
>>> a
array([[ 1., 2.],
[ 3., 4.],
[ 5., 6.]])
A NumPy array is a very different data structure from a list and is designed to be used in different ways. Your use of hstack is potentially very inefficient... every time you call it, all the data in the existing array is copied into a new one. (The append function will have the same issue.) If you want to build up your matrix one column at a time, you might be best off to keep it in a list until it is finished, and only then convert it into an array.
e.g.
mylist = []
for item in data:
mylist.append(item)
mat = numpy.array(mylist)
item can be a list, an array or any iterable, as long
as each item has the same number of elements.
In this particular case (data is some iterable holding the matrix columns) you can simply use
mat = numpy.array(data)
(Also note that using list as a variable name is probably not good practice since it masks the built-in type by that name, which can lead to bugs.)
EDIT:
If for some reason you really do want to create an empty array, you can just use numpy.array([]), but this is rarely useful!
This works:
a = [[1, 2, 3], [4, 5, 6]]
nd_a = np.array(a)
So this should work too:
nd_a = np.array([[x for x in y] for y in a])
To create a new array, it seems numpy.zeros is the way to go
import numpy as np
a = np.zeros(shape=(x, y))
You can also set a datatype to allocate it sensibly
>>> np.zeros(shape=(5,2), dtype=np.uint8)
array([[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0]], dtype=uint8)
>>> np.zeros(shape=(5,2), dtype="datetime64[ns]")
array([['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000000'],
['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000000'],
['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000000'],
['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000000'],
['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000000']],
dtype='datetime64[ns]')
See also
- How do I create an empty array/matrix in NumPy?
- np.full(size, 0) vs. np.zeros(size) vs. np.empty()
Could be one line, but this is more explicit:
x = np.multiply.accumulate( np.ones( 10 )*2)
x[0] = 0
OR
x = 2**np.arange(1,10)
x[0] = 0
You can use numpy.logspace to get log-spaced ranges. Use the base=N keyword argument to set the base of the exponent:
In [27]: np.logspace(0, 10, 11, base=2).astype(int)
Out[27]: array([ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024])
I like this method because the "logspace" function name makes it clear that I'm going for a range with log (as opposed to linear) spacing.