Simply create the list using a list comprehension:
[[foo(m, f) for f in F] for m in M]
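For instance, with a toy foo (the name is just the placeholder from the snippet above) and small M and F, the nested comprehension builds the whole structure in one expression:
def foo(m, f):
    # stand-in for whatever per-element computation you actually need
    return m * f

M = [1, 2, 3]
F = [10, 20]
table = [[foo(m, f) for f in F] for m in M]
# table == [[10, 20], [20, 40], [30, 60]]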
Related to pre-allocation: Pre-allocating a list of None
The thought of preallocating memory brings back trauma from when I had to learn C, but in a recent non-computing class that heavily uses Python I was told that preallocating lists is "best practices".
As an example, add the number c to every element of list a:
c = 20
a = [1, 2, 3, 4, 5]
b = [None] * len(a)
for i in range(len(a)):  # this was also how they iterated a lot of things
    b[i] = a[i] + c
Now I'm a comp sci student, and I don't remember ever being told that this is best practice; I have never been penalised for not preallocating, and I have never seen any of my friends preallocate lists either. From my understanding it's not "Pythonic", but it can be more efficient when working with larger data sets, as you don't have to constantly grow the list.
I argued to just use append (and a different iterator):
c = 20
a = [1, 2, 3, 4, 5]
b = []
for n in a:
    b.append(n + c)
The python taught is relatively introductory so they don't even mention list comprehension.
Thoughts? Because apparently they spent 30 minutes discussing whether or not to preallocate lists before settling on teaching us to preallocate lists.
To preallocate or not to preallocate lists in Python - Stack Overflow
python - What is the preferred way to preallocate NumPy arrays? - Stack Overflow
python - Pre-allocating a list of None - Stack Overflow
python - Should I preallocate a numpy array? - Stack Overflow
Preallocation mallocs all the memory you need in one call, while resizing the array (through calls to append, insert, concatenate or resize) may require copying the array to a larger block of memory. So you are correct: preallocation is preferred over (and should be faster than) resizing.
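A rough sketch of the contrast (the array size and the scalar computation are made up for illustration; in real code you would vectorise the fill as well):
import numpy as np

n = 100_000

# Preallocate once, then fill in place: a single allocation.
pre = np.empty(n)
for i in range(n):
    pre[i] = i * 0.5

# Grow by repeated np.append: each call may copy the whole array.
grown = np.empty(0)
for i in range(n):
    grown = np.append(grown, i * 0.5)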
There are a number of "preferred" ways to preallocate numpy arrays, depending on what you want to create. There are np.zeros, np.ones, np.empty, np.zeros_like, np.ones_like, and np.empty_like, as well as many others that create useful arrays, such as np.linspace and np.arange.
So
ar0 = np.linspace(10, 20, 16).reshape(4, 4)
is just fine if this comes closest to the ar0 you desire.
However, to make the last column all 1's, I think the preferred way would be to just say
ar0[:,-1]=1
Since the shape of ar0[:,-1] is (4,), the 1 is broadcasted to match this shape.
In cases where performance is important, np.empty and np.zeros appear to be the fastest ways to initialize numpy arrays.
Below are test results for each method and a few others. Values are in seconds.
>>> timeit("np.empty(1000000)",number=1000, globals=globals())
0.033749611208094166
>>> timeit("np.zeros(1000000)",number=1000, globals=globals())
0.03421245135849915
>>> timeit("np.arange(0,1000000,1)",number=1000, globals=globals())
1.2212416112155324
>>> timeit("np.ones(1000000)",number=1000, globals=globals())
2.2877375495381145
>>> timeit("np.linspace(0,1000000,1000000)",number=1000, globals=globals())
3.0824269766860652
When you append an item to a list, Python 'over-allocates'; see the source code of the list object. This means that, for example, when adding 1 item to a list of 8 items, it actually makes room for 8 new items and uses only the first one of those. The next 7 appends are then 'for free'.
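You can watch this over-allocation happen with sys.getsizeof; a small sketch (the exact capacity jumps are CPython implementation details and vary between versions):
import sys

lst = []
last_size = sys.getsizeof(lst)
print(f"len={len(lst):3d}  bytes={last_size}")
for i in range(64):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last_size:
        # the size only jumps occasionally; the appends in between are 'for free'
        print(f"len={len(lst):3d}  bytes={size}")
        last_size = size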
In many languages (e.g. old versions of Matlab; the newer JIT might be better) you are always told that you need to pre-allocate your vectors, since appending during a loop is very expensive. In the worst case, appending a single item to a list of length n can cost O(n) time, since you might have to create a bigger list and copy all the existing items over. If you need to do this on every iteration, the overall cost of adding n items is O(n^2), ouch. Python's over-allocation scheme spreads the cost of growing the array over many single appends (see amortized cost), effectively making the cost of a single append O(1) and the overall cost of adding n items O(n).
Additionally, the overhead of the rest of your Python code is usually so large, that the tiny speedup that can be obtained by pre-allocating is insignificant. So in most cases, simply forget about pre-allocating, unless your profiler tells you that appending to a list is a bottleneck.
The other answers show some profiling of the list preallocation itself, but this is useless. The only thing that matters is profiling your complete code, with all your calculations inside your loop, with and without pre-allocation. If my prediction is right, the difference is so small that the computation time you win is dwarfed by the time spent thinking about, writing and maintaining the extra lines to pre-allocate your list.
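In that spirit, the comparison worth making is over the whole loop, with the real per-item work inside it; a sketch (the work function here is a stand-in for your actual calculation):
import timeit

def work(x):
    # stand-in for the real per-item computation
    return x * x + 3

def with_append(n=10_000):
    out = []
    for i in range(n):
        out.append(work(i))
    return out

def with_prealloc(n=10_000):
    out = [None] * n
    for i in range(n):
        out[i] = work(i)
    return out

print("append      ", timeit.timeit(with_append, number=100))
print("pre-allocate", timeit.timeit(with_prealloc, number=100))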
Between those two options, the first one is clearly better, as no Python-level for loop is involved.
>>> %timeit [None] * 100
1000000 loops, best of 3: 469 ns per loop
>>> %timeit [None for x in range(100)]
100000 loops, best of 3: 4.8 us per loop
Update:
And list.append has amortized O(1) complexity too; it might be a better choice than pre-creating the list, especially if you assign the list.append method to a local variable first.
>>> n = 10**3
>>> %%timeit
lis = [None]*n
for _ in range(n):
    lis[_] = _
...
10000 loops, best of 3: 73.2 us per loop
>>> %%timeit
lis = []
for _ in range(n):
    lis.append(_)
...
10000 loops, best of 3: 92.2 us per loop
>>> %%timeit
lis = []; app = lis.append
for _ in range(n):
    app(_)
...
10000 loops, best of 3: 59.4 us per loop
>>> n = 10**6
>>> %%timeit
lis = [None]*n
for _ in range(n):
    lis[_] = _
...
10 loops, best of 3: 106 ms per loop
>>> %%timeit
lis = []
for _ in range(n):
    lis.append(_)
...
10 loops, best of 3: 122 ms per loop
>>> %%timeit
lis = []; app = lis.append
for _ in range(n):
    app(_)
...
10 loops, best of 3: 91.8 ms per loop
Numpy arrays are fast, once created. However, creating an array is pretty expensive, much more so than, say, creating a Python list.
In a case such as yours, where you create a new array again and again (in a for loop?), I would ALWAYS pre-allocate the array structure and then reuse it.
I can't comment on whether Python is smart enough to optimize this, but I would guess it's not :)
How big is your array and how frequent are calls to this method?
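A minimal sketch of that reuse pattern (shapes and iteration count are invented; the point is that the output buffer is allocated once, outside the loop, and written into with out=):
import numpy as np

buf = np.empty((1000, 1000))        # allocated once, up front
for step in range(100):
    a = np.random.rand(1000, 1000)  # stand-ins for per-iteration inputs
    b = np.random.rand(1000, 1000)
    np.multiply(a, b, out=buf)      # result written into the reused buffer
    total = buf.sum()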
Yes, you need to preallocate large arrays. But whether this is efficient depends on how you then use these arrays.
This will cause several new allocations for intermediate results of computation:
self.temp = a * b + c
This will not (if self.x is preallocated):
numpy.multiply(a, b, out=self.x)
numpy.add(c, self.x, out=self.temp)
But for these cases (when you work with large arrays in non-trivial formulae) it is better to use numexpr or einsum for matrix calculations.
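As a sketch, assuming numexpr is installed (its evaluate compiles the whole expression and avoids materialising the a * b temporary), with np.einsum shown for an explicit matrix product:
import numpy as np
import numexpr as ne

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)
c = np.random.rand(1_000_000)

# one pass over the data, no intermediate array for a * b
temp = ne.evaluate("a * b + c")

# einsum spells the contraction out explicitly; here it is an ordinary matrix product
A = np.random.rand(200, 300)
B = np.random.rand(300, 400)
C = np.einsum("ij,jk->ik", A, B)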
Warning: This answer is contested. See comments.
def doAppend(size=10000):
    result = []
    for i in range(size):
        message = "some unique object %d" % (i,)
        result.append(message)
    return result

def doAllocate(size=10000):
    result = size * [None]
    for i in range(size):
        message = "some unique object %d" % (i,)
        result[i] = message
    return result
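A timeit harness along these lines could produce the figures quoted below (a reconstruction; the 144-repetition count comes from the description that follows):
import timeit

repetitions = 144
append_avg = timeit.timeit(doAppend, number=repetitions) / repetitions
prealloc_avg = timeit.timeit(doAllocate, number=repetitions) / repetitions
print("simple append %.4f" % append_avg)
print("pre-allocate  %.4f" % prealloc_avg)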
Results. (evaluate each function 144 times and average the duration)
simple append 0.0102
pre-allocate 0.0098
Conclusion. It barely matters.
Premature optimization is the root of all evil.
Python lists have no built-in pre-allocation. If you really need to make a list, and need to avoid the overhead of appending (and you should verify that you do), you can do this:
l = [None] * 1000  # Make a list of 1000 Nones
for i in range(1000):
    # baz
    l[i] = bar
    # qux
Perhaps you could avoid the list by using a generator instead:
def my_things():
    while foo:
        # baz
        yield bar
        # qux

for thing in my_things():
    # do something with thing
This way, the list isn't ever stored in memory at all; the values are merely generated as needed.
Maybe you should consider using NumPy. It seems like you're doing numerical work, which is what it's made for. This is the fastest so far, not including the import statement:
import numpy
Np = 80
zeroMatrix = numpy.zeros((Np, Np))
Times:
>python -m timeit -s "import numpy; Np = 80" "zeroMatrix = numpy.zeros((Np, Np))"
100000 loops, best of 3: 4.36 usec per loop
>python -m timeit -s "Np = 80" "zeroArray = [0]*Np" "zeroMatrix = [None] * Np" "for i in range(Np):" " zeroMatrix[i] = zeroArray[:]"
10000 loops, best of 3: 62.5 usec per loop
>python -m timeit -s "Np = 80" "zeroMatrix = [[0] * Np for i in range (Np)]"
10000 loops, best of 3: 77.5 usec per loop
>python -m timeit -s "Np = 80" "zeroMatrix = [[0 for _ in range(Np)] for _ in range(Np)]"
1000 loops, best of 3: 474 usec per loop
You could do this:
zeroMatrix = [[0] * Np for i in range(Np)]
Update: Well, if we're going to make it into a race, I've found something faster (on my computer) than Omnifarious' method. It doesn't beat numpy, of course, but this is all academic anyway, right? I mean, we're talking about microseconds here.
I think this works because it avoids append and avoids preallocating zeroMatrix.
zeroArray = [0] * Np
zeroMatrix = [zeroArray[:] for i in range(Np)]
My test results:
$ python -m timeit -s "Np = 80" "zeroMatrix = [[0] * Np for i in range(Np)]"
1000 loops, best of 3: 200 usec per loop
$ python -m timeit -s "Np = 80" "zeroArray = [0] * Np" "zeroMatrix = [None] * Np" "for i in range(Np):" " zeroMatrix[i] = zeroArray[:]"
10000 loops, best of 3: 171 usec per loop
$ python -m timeit -s "Np = 80" "zeroArray = [0] * Np" "zeroMatrix = [zeroArray[:] for i in range(Np)]"
10000 loops, best of 3: 165 usec per loop
I did a bit of googling and found this lovely article with some C code to do exactly what you're asking on Windows. Here's that C code translated to ctypes (written for readability):
import ctypes
import msvcrt
# https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-setfileinformationbyhandle
set_file_information = ctypes.windll.kernel32.SetFileInformationByHandle
class AllocationInfo(ctypes.Structure):
    _fields_ = [('AllocationSize', ctypes.c_longlong)]

def allocate(file, length):
    """Tell the filesystem to preallocate `length` bytes on disk for the specified `file` without increasing
    the file's length.

    In other words, advise the filesystem that you intend to write at least `length` bytes to the file.
    """
    allocation_info = AllocationInfo(length)
    retval = set_file_information(ctypes.c_long(msvcrt.get_osfhandle(file.fileno())),
                                  ctypes.c_long(5),  # constant for FileAllocationInfo in the FILE_INFO_BY_HANDLE_CLASS enum
                                  ctypes.pointer(allocation_info),
                                  ctypes.sizeof(allocation_info))
    if retval != 1:
        raise OSError('SetFileInformationByHandle failed')
This will change the file's "Size on disk:" (as shown in File Explorer) to the length you specify, plus a few kilobytes for metadata, but leave its "Size:" unchanged.
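A hypothetical call, reserving roughly 100 MB for a file that is about to be written (the file name and size are illustrative):
with open('big_output.bin', 'wb') as f:
    allocate(f, 100 * 1024 * 1024)  # advise the filesystem before writing
    # ... write the data ...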
However, in the half hour I've spent googling, I've not found a way to do that on POSIX. fallocate() actually does the exact opposite of what you're after: it sets the file's apparent length to the length you give it, but allocates it as a sparse extent on the disk, so writing to multiple files simultaneously will still result in fragmentation. Ironic, isn't it, that Windows has a file management feature that POSIX lacks?
I'd love nothing more than to be proven wrong, but I don't think it's possible.
FILENAME = "somefile.bin"
SIZE = 4200000
with open(FILENAME, "wb") as file:
    file.seek(SIZE - 1)
    file.write(b"\0")
Advantages:
- Portable across all platforms.
- Very efficient if you'd be mmaping (memory-mapping) the files to perform writes on them (via MADV_SEQUENTIAL if sequential access is needed).
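A rough sketch of that mmap usage, assuming Python 3.8+ (for mmap.madvise) on a POSIX system that defines MADV_SEQUENTIAL, and reusing FILENAME and SIZE from the snippet above:
import mmap

with open(FILENAME, "r+b") as f:
    with mmap.mmap(f.fileno(), SIZE) as mm:
        mm.madvise(mmap.MADV_SEQUENTIAL)  # hint: access will be sequential
        mm[0:11] = b"hello world"         # writes go through the mapping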