Simply create the list using a list comprehension:
[[foo(m, f) for f in F] for m in M]
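For instance, with a toy foo (the name is just the placeholder from the snippet above) and small M and F, the nested comprehension builds the whole structure in one expression:
def foo(m, f):
    # stand-in for whatever per-element computation you actually need
    return m * f

M = [1, 2, 3]
F = [10, 20]
table = [[foo(m, f) for f in F] for m in M]
# table == [[10, 20], [20, 40], [30, 60]]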
Related to pre-allocation: Pre-allocating a list of None
The thought of preallocating memory brings back trauma from when I had to learn C, but in a recent non-computing class that heavily uses Python I was told that preallocating lists is "best practices".
As an example, add the number c to every element of list a:
c = 20
a = [1, 2, 3, 4, 5]
b = [None] * len(a)
for i in range(len(a)):  # this was also how they iterated a lot of things
    b[i] = a[i] + c
Now I'm a comp sci student, and I don't remember ever being told that this is best practice; I have never been penalised for not preallocating, and I have never seen any of my friends preallocate lists either. From my understanding it's not "Pythonic", but it can be more efficient when working with larger data sets, as you don't have to constantly grow the list.
I argued to just use append (and a different iterator):
c = 20
a = [1, 2, 3, 4, 5]
b = []
for n in a:
    b.append(n + c)
The python taught is relatively introductory so they don't even mention list comprehension.
Thoughts? Because apparently they spent 30 minutes discussing whether or not to preallocate lists before settling on teaching us to preallocate lists.
To preallocate or not to preallocate lists in Python - Stack Overflow
python - What is the preferred way to preallocate NumPy arrays? - Stack Overflow
python - Pre-allocating a list of None - Stack Overflow
python - Should I preallocate a numpy array? - Stack Overflow
Preallocation mallocs all the memory you need in one call, while resizing the array (through calls to append, insert, concatenate or resize) may require copying the array to a larger block of memory. So you are correct: preallocation is preferred over (and should be faster than) resizing.
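A rough sketch of the contrast (the array size and the scalar computation are made up for illustration; in real code you would vectorise the fill as well):
import numpy as np

n = 100_000

# Preallocate once, then fill in place: a single allocation.
pre = np.empty(n)
for i in range(n):
    pre[i] = i * 0.5

# Grow by repeated np.append: each call may copy the whole array.
grown = np.empty(0)
for i in range(n):
    grown = np.append(grown, i * 0.5)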
There are a number of "preferred" ways to preallocate numpy arrays, depending on what you want to create. There are np.zeros, np.ones, np.empty, np.zeros_like, np.ones_like, and np.empty_like, as well as many others that create useful arrays, such as np.linspace and np.arange.
So
ar0 = np.linspace(10, 20, 16).reshape(4, 4)
is just fine if this comes closest to the ar0 you desire.
However, to make the last column all 1's, I think the preferred way would be to just say
ar0[:,-1]=1
Since the shape of ar0[:,-1] is (4,), the 1 is broadcasted to match this shape.
In cases where performance is important, np.empty and np.zeros appear to be the fastest ways to initialize numpy arrays.
Below are test results for each method and a few others. Values are in seconds.
>>> timeit("np.empty(1000000)",number=1000, globals=globals())
0.033749611208094166
>>> timeit("np.zeros(1000000)",number=1000, globals=globals())
0.03421245135849915
>>> timeit("np.arange(0,1000000,1)",number=1000, globals=globals())
1.2212416112155324
>>> timeit("np.ones(1000000)",number=1000, globals=globals())
2.2877375495381145
>>> timeit("np.linspace(0,1000000,1000000)",number=1000, globals=globals())
3.0824269766860652
When you append an item to a list, Python 'over-allocates'; see the source code of the list object. This means that, for example, when adding 1 item to a list of 8 items, it actually makes room for 8 new items and uses only the first one of those. The next 7 appends are then 'for free'.
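You can watch this over-allocation happen with sys.getsizeof; a small sketch (the exact capacity jumps are CPython implementation details and vary between versions):
import sys

lst = []
last_size = sys.getsizeof(lst)
print(f"len={len(lst):3d}  bytes={last_size}")
for i in range(64):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last_size:
        # the size only jumps occasionally; the appends in between are 'for free'
        print(f"len={len(lst):3d}  bytes={size}")
        last_size = size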
In many languages (e.g. old versions of Matlab; the newer JIT might be better) you are always told that you need to pre-allocate your vectors, since appending during a loop is very expensive. In the worst case, appending a single item to a list of length n can cost O(n) time, since you might have to create a bigger list and copy all the existing items over. If you need to do this on every iteration, the overall cost of adding n items is O(n^2), ouch. Python's over-allocation scheme spreads the cost of growing the array over many single appends (see amortized cost), effectively making the cost of a single append O(1) and the overall cost of adding n items O(n).
Additionally, the overhead of the rest of your Python code is usually so large, that the tiny speedup that can be obtained by pre-allocating is insignificant. So in most cases, simply forget about pre-allocating, unless your profiler tells you that appending to a list is a bottleneck.
The other answers show some profiling of the list preallocation itself, but this is useless. The only thing that matters is profiling your complete code, with all your calculations inside your loop, with and without pre-allocation. If my prediction is right, the difference is so small that the computation time you win is dwarfed by the time spent thinking about, writing and maintaining the extra lines to pre-allocate your list.
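In that spirit, the comparison worth making is over the whole loop, with the real per-item work inside it; a sketch (the work function here is a stand-in for your actual calculation):
import timeit

def work(x):
    # stand-in for the real per-item computation
    return x * x + 3

def with_append(n=10_000):
    out = []
    for i in range(n):
        out.append(work(i))
    return out

def with_prealloc(n=10_000):
    out = [None] * n
    for i in range(n):
        out[i] = work(i)
    return out

print("append      ", timeit.timeit(with_append, number=100))
print("pre-allocate", timeit.timeit(with_prealloc, number=100))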
Between those two options, the first one is clearly better, as no Python-level for loop is involved.
>>> %timeit [None] * 100
1000000 loops, best of 3: 469 ns per loop
>>> %timeit [None for x in range(100)]
100000 loops, best of 3: 4.8 us per loop
Update:
And list.append has amortized O(1) complexity too; it might be a better choice than pre-creating the list, especially if you assign the list.append method to a local variable first.
>>> n = 10**3
>>> %%timeit
lis = [None]*n
for _ in range(n):
    lis[_] = _
...
10000 loops, best of 3: 73.2 us per loop
>>> %%timeit
lis = []
for _ in range(n):
    lis.append(_)
...
10000 loops, best of 3: 92.2 us per loop
>>> %%timeit
lis = []; app = lis.append
for _ in range(n):
    app(_)
...
10000 loops, best of 3: 59.4 us per loop
>>> n = 10**6
>>> %%timeit
lis = [None]*n
for _ in range(n):
    lis[_] = _
...
10 loops, best of 3: 106 ms per loop
>>> %%timeit
lis = []
for _ in range(n):
    lis.append(_)
...
10 loops, best of 3: 122 ms per loop
>>> %%timeit
lis = []; app = lis.append
for _ in range(n):
    app(_)
...
10 loops, best of 3: 91.8 ms per loop
Numpy arrays are fast, once created. However, creating an array is pretty expensive, much more so than, say, creating a Python list.
In a case such as yours, where you create a new array again and again (in a for loop?), I would ALWAYS pre-allocate the array structure and then reuse it.
I can't comment on whether Python is smart enough to optimize this, but I would guess it's not :)
How big is your array and how frequent are calls to this method?
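A minimal sketch of that reuse pattern (shapes and iteration count are invented; the point is that the output buffer is allocated once, outside the loop, and written into with out=):
import numpy as np

buf = np.empty((1000, 1000))        # allocated once, up front
for step in range(100):
    a = np.random.rand(1000, 1000)  # stand-ins for per-iteration inputs
    b = np.random.rand(1000, 1000)
    np.multiply(a, b, out=buf)      # result written into the reused buffer
    total = buf.sum()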
Yes, you need to preallocate large arrays. But whether this is efficient depends on how you then use these arrays.
This will cause several new allocations for intermediate results of computation:
self.temp = a * b + c
This will not (if self.x is preallocated):
numpy.multiply(a, b, out=self.x)
numpy.add(c, self.x, out=self.temp)
But for these cases (when you work with large arrays in non-trivial formulae) it is better to use numexpr or einsum for matrix calculations.
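As a sketch, assuming numexpr is installed (its evaluate compiles the whole expression and avoids materialising the a * b temporary), with np.einsum shown for an explicit matrix product:
import numpy as np
import numexpr as ne

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)
c = np.random.rand(1_000_000)

# one pass over the data, no intermediate array for a * b
temp = ne.evaluate("a * b + c")

# einsum spells the contraction out explicitly; here it is an ordinary matrix product
A = np.random.rand(200, 300)
B = np.random.rand(300, 400)
C = np.einsum("ij,jk->ik", A, B)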
Warning: This answer is contested. See comments.
def doAppend(size=10000):
    result = []
    for i in range(size):
        message = "some unique object %d" % (i,)
        result.append(message)
    return result

def doAllocate(size=10000):
    result = size * [None]
    for i in range(size):
        message = "some unique object %d" % (i,)
        result[i] = message
    return result
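A timeit harness along these lines could produce the figures quoted below (a reconstruction; the 144-repetition count comes from the description that follows):
import timeit

repetitions = 144
append_avg = timeit.timeit(doAppend, number=repetitions) / repetitions
prealloc_avg = timeit.timeit(doAllocate, number=repetitions) / repetitions
print("simple append %.4f" % append_avg)
print("pre-allocate  %.4f" % prealloc_avg)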
Results. (evaluate each function 144 times and average the duration)
simple append 0.0102
pre-allocate 0.0098
Conclusion. It barely matters.
Premature optimization is the root of all evil.
Python lists have no built-in pre-allocation. If you really need to make a list, and need to avoid the overhead of appending (and you should verify that you do), you can do this:
l = [None] * 1000  # Make a list of 1000 Nones
for i in range(1000):
    # baz
    l[i] = bar
    # qux
Perhaps you could avoid the list by using a generator instead:
def my_things():
    while foo:
        # baz
        yield bar
        # qux

for thing in my_things():
    # do something with thing
This way, the list isn't ever stored in memory at all; the values are merely generated as needed.
Maybe you should consider using NumPy. It seems like you're doing numerical work, which is what it's made for. This is the fastest so far, not including the import statement:
import numpy
Np = 80
zeroMatrix = numpy.zeros((Np, Np))
Times:
>python -m timeit -s "import numpy; Np = 80" "zeroMatrix = numpy.zeros((Np, Np))"
100000 loops, best of 3: 4.36 usec per loop
>python -m timeit -s "Np = 80" "zeroArray = [0]*Np" "zeroMatrix = [None] * Np" "for i in range(Np):" " zeroMatrix[i] = zeroArray[:]"
10000 loops, best of 3: 62.5 usec per loop
>python -m timeit -s "Np = 80" "zeroMatrix = [[0] * Np for i in range (Np)]"
10000 loops, best of 3: 77.5 usec per loop
>python -m timeit -s "Np = 80" "zeroMatrix = [[0 for _ in range(Np)] for _ in range(Np)]"
1000 loops, best of 3: 474 usec per loop
You could do this:
zeroMatrix = [[0] * Np for i in range(Np)]
Update: Well, if we're going to make it into a race, I've found something faster (on my computer) than Omnifarious' method. It doesn't beat numpy, of course, but this is all academic anyway, right? I mean, we're talking about microseconds here.
I think this works because it avoids append and avoids preallocating zeroMatrix.
zeroArray = [0] * Np
zeroMatrix = [zeroArray[:] for i in range(Np)]
My test results:
$ python -m timeit -s "Np = 80" "zeroMatrix = [[0] * Np for i in range(Np)]"
1000 loops, best of 3: 200 usec per loop
$ python -m timeit -s "Np = 80" "zeroArray = [0] * Np" "zeroMatrix = [None] * Np" "for i in range(Np):" " zeroMatrix[i] = zeroArray[:]"
10000 loops, best of 3: 171 usec per loop
$ python -m timeit -s "Np = 80" "zeroArray = [0] * Np" "zeroMatrix = [zeroArray[:] for i in range(Np)]"
10000 loops, best of 3: 165 usec per loop
I did a bit of googling and found this lovely article with some C code to do exactly what you're asking on Windows. Here's that C code translated to ctypes (written for readability):
import ctypes
import msvcrt
# https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-setfileinformationbyhandle
set_file_information = ctypes.windll.kernel32.SetFileInformationByHandle
class AllocationInfo(ctypes.Structure):
    _fields_ = [('AllocationSize', ctypes.c_longlong)]

def allocate(file, length):
    """Tell the filesystem to preallocate `length` bytes on disk for the specified `file` without increasing
    the file's length.

    In other words, advise the filesystem that you intend to write at least `length` bytes to the file.
    """
    allocation_info = AllocationInfo(length)
    retval = set_file_information(ctypes.c_long(msvcrt.get_osfhandle(file.fileno())),
                                  ctypes.c_long(5),  # constant for FileAllocationInfo in the FILE_INFO_BY_HANDLE_CLASS enum
                                  ctypes.pointer(allocation_info),
                                  ctypes.sizeof(allocation_info))
    if retval != 1:
        raise OSError('SetFileInformationByHandle failed')
This will change the file's "Size on disk:" (as shown in File Explorer) to the length you specify, plus a few kilobytes for metadata, but leave its "Size:" unchanged.
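A hypothetical call, reserving roughly 100 MB for a file that is about to be written (the file name and size are illustrative):
with open('big_output.bin', 'wb') as f:
    allocate(f, 100 * 1024 * 1024)  # advise the filesystem before writing
    # ... write the data ...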
However, in the half hour I've spent googling, I've not found a way to do that on POSIX. fallocate() actually does the exact opposite of what you're after: it sets the file's apparent length to the length you give it, but allocates it as a sparse extent on the disk, so writing to multiple files simultaneously will still result in fragmentation. Ironic, isn't it, that Windows has a file management feature that POSIX lacks?
I'd love nothing more than to be proven wrong, but I don't think it's possible.
FILENAME = "somefile.bin"
SIZE = 4200000
with open(FILENAME, "wb") as file:
    file.seek(SIZE - 1)
    file.write(b"\0")
Advantages:
- Portable across all platforms.
- Very efficient if you'd be mmaping (memory-mapping) the files to perform writes on them (via MADV_SEQUENTIAL if sequential access is needed).
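A rough sketch of that mmap usage, assuming Python 3.8+ (for mmap.madvise) on a POSIX system that defines MADV_SEQUENTIAL, and reusing FILENAME and SIZE from the snippet above:
import mmap

with open(FILENAME, "r+b") as f:
    with mmap.mmap(f.fileno(), SIZE) as mm:
        mm.madvise(mmap.MADV_SEQUENTIAL)  # hint: access will be sequential
        mm[0:11] = b"hello world"         # writes go through the mapping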