From the docs:
list.remove(x) Remove the first item from the list whose value is x. It is an error if there is no such item.
Without going into the details of the implementation, the item to remove can be anywhere in the list. A linear scan is necessary to find the item before it can be removed. Once you have the index of the item to remove, you need to shift all the elements after it down by one index. In any case, there are index steps of traversal and size - index steps of shifting involved. Therefore the removal time is equivalent to traversing the entire list: O(n).
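The scan-then-shift behaviour described above can be modelled in pure Python. This is only an illustrative sketch: `remove_first` is a hypothetical helper written for this explanation, not a transcription of CPython's C code.

```python
def remove_first(items, x):
    """Model of list.remove(x): linear scan, then shift the tail down by one."""
    n = len(items)
    # Step 1: linear scan for the first element equal to x ("index" steps).
    for index in range(n):
        if items[index] == x:
            break
    else:
        raise ValueError("x not in list")
    # Step 2: shift every later element down by one slot ("size - index" steps).
    for j in range(index, n - 1):
        items[j] = items[j + 1]
    # Step 3: drop the last slot, which now holds a duplicate.
    items.pop()
    return index


data = [5, 3, 8, 3, 1]
remove_first(data, 3)
print(data)  # [5, 8, 3, 1]
```

Whatever the position of the item, the scan and the shift together touch every slot about once, which is where the O(n) comes from.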
You can find the source here: https://hg.python.org/cpython/file/tip/Objects/listobject.c#l2197 (also look for list_ass_slice(..)).
However, a set is different. It uses the hash value of the object being stored to locate it in its buckets. On average, locating an object by its hash value takes almost constant time. Note that it is not always constant time: when there is a hash collision, a further search is required. But assuming a good hash function, it usually is.
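One way to see the difference without relying on wall-clock timing is to count equality comparisons. The sketch below assumes a hypothetical `Key` wrapper class whose `__eq__` increments a counter; the class and variable names are made up for illustration.

```python
class Key:
    """Hypothetical wrapper that counts how often == is evaluated."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        Key.comparisons += 1
        return self.value == other.value


n = 1000
as_list = [Key(i) for i in range(n)]
as_set = {Key(i) for i in range(n)}

Key.comparisons = 0
as_list.remove(Key(n - 1))   # scans from the front until it finds a match
list_comparisons = Key.comparisons

Key.comparisons = 0
as_set.remove(Key(n - 1))    # goes to the bucket selected by the hash
set_comparisons = Key.comparisons

print(list_comparisons, set_comparisons)
```

With the target at the end of the list, the list needs one comparison per element, while the set only compares against entries whose hash matches, so its count stays small no matter how large n gets.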
UPDATE: I must thank Stefan Pochmann for pointing out the mistake.
Answer from UltraInstinct on Stack Overflow
algorithm - Why does Python take O(n) time to remove the first element from a list? - Stack Overflow
Why is removing elements from a list so slow, and is there a faster way?
I was surprised at how slow list.pop() is! And list.remove() is even many times slower
python - Speed of del vs remove on list - Stack Overflow
The pointer to where the list really starts must be retained for the purpose of freeing the memory appropriately.
Indeed, removing the first element (pop(0) in Python) could be made faster by keeping a second pointer which is incremented in this case. And if an insertion at the front (insert(0, x)) happens afterwards, it could be made faster by decrementing this "data start pointer" as long as it is bigger than the "memory start pointer".
But all other operations, i.e. insertions and deletions at other indexes, would still be O(n), so that wouldn't change much.
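The two-pointer idea can be sketched as a small wrapper around a regular list. `FrontOffsetList`, `pop_front` and `push_front` are hypothetical names chosen for this illustration; CPython's list does not work this way.

```python
class FrontOffsetList:
    """Hypothetical array-backed list with an O(1) pop from the front."""

    def __init__(self, items):
        self._buf = list(items)  # "memory start" is index 0 of this buffer
        self._start = 0          # "data start": index of the first live item

    def __len__(self):
        return len(self._buf) - self._start

    def pop_front(self):
        if len(self) == 0:
            raise IndexError("pop from empty list")
        value = self._buf[self._start]
        self._buf[self._start] = None  # drop the reference to the old item
        self._start += 1               # advance the pointer instead of shifting
        return value

    def push_front(self, value):
        if self._start > 0:
            self._start -= 1               # reuse the free slot in front
            self._buf[self._start] = value
        else:
            self._buf.insert(0, value)     # no free slot left: O(n) shift

    def to_list(self):
        return self._buf[self._start:]


q = FrontOffsetList([1, 2, 3])
q.pop_front()
q.push_front(9)
print(q.to_list())  # [9, 2, 3]
```

Insertions and deletions in the middle would still need to shift elements, so this only helps the front of the list.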
Just know what your operations will be and thus which data structure to pick.
Python's list is actually a dynamic array. collections.deque is the linked structure (in CPython, a doubly linked list of fixed-size blocks). It is Python's fault for using the wrong term (for which I do not have an explanation). O(n) for insertion and deletion is normal for arrays (as the following elements need to be shifted up or down), which is a tradeoff for the O(1) speed of get and set. Linked lists make a similar tradeoff in the opposite direction: O(1) for operations at the ends, but O(n) for any access in the middle.
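For work at both ends, the standard library's `collections.deque` covers the operations mentioned above. A minimal example:

```python
from collections import deque

d = deque([1, 2, 3])
d.appendleft(0)      # O(1) insert at the front
d.append(4)          # O(1) insert at the back
first = d.popleft()  # O(1) removal from the front
last = d.pop()       # O(1) removal from the back
print(first, last, list(d))  # 0 4 [1, 2, 3]
```

Indexing into the middle of a deque is the slow case, which mirrors the tradeoff described for linked lists.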
I was trying to write a simple application which is, among other things, supposed to filter a list of words down to a list of words of a certain length. For that I could either remove the words of the wrong length, or create a new list of words with the correct length.
I had a list of around 58000 words, and wanted to filter out all the 6 letter words, which are around 6900.
with open('words.txt') as f:
    words = f.readlines()
for i in range(len(words)):
    words[i] = words[i].strip()
length = int(input("Desired word length "))
for i in reversed(words):
    if len(i) != length:
        words.remove(i)
This took 22 seconds.
Another way is to just create a new list with words of the correct length. I did this as follows:
with open('words.txt') as f:
    words = f.readlines()
for i in range(len(words)):
    words[i] = words[i].strip()
length = int(input("Desired word length "))
clw = []
for i in words:
    if len(i) == length:
        clw.append(i)
This only took 0.03 seconds. How can it be that creating a list of 6900 words takes 0.03 seconds, but removing 51100 words takes 22? It's only 7 times as many words, but takes 700 times as long. And is there a better and faster way to quickly remove list elements?
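A likely explanation for the gap is that every `words.remove(i)` rescans the list from the front and then shifts the tail, so the total work grows roughly with the number of removals times the list length, while building a new list touches each word once. A common way to write the fast version is a list comprehension. The sketch below uses a small in-memory sample instead of `words.txt`, and `words_of_length` is a hypothetical helper name.

```python
def words_of_length(words, length):
    """Return a new list holding only the words with the given length."""
    return [w for w in words if len(w) == length]


# Stand-in for the lines read from words.txt.
raw_lines = ["planet\n", "sun\n", "rocket\n", "moon\n"]
words = [line.strip() for line in raw_lines]

six_letter_words = words_of_length(words, 6)
print(six_letter_words)  # ['planet', 'rocket']

# To update the original list object in place, assign to a full slice.
words[:] = words_of_length(words, 6)
```

The slice assignment on the last line keeps the same list object, which matters if other code holds a reference to it.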
I know there is a list.clear(), I'm just sharing that I didn't expect that using list.pop() and list.remove() specifically could slow down the program that much.
li = list(range(500000))
Creating a list is quick.
So we are going to test out pop/remove specific values. For the purpose of this "benchmark", we are going to remove all elements from the list:
while li:
    li.pop(0)
It took 74.735 seconds to pop all the elements! It's ridiculously long.
I KNOW it would have been much faster if I even had used li.pop() without the index or maybe used filter function, list comprehension with conditional or whatever
But that's what I'm trying to show, how slow it is to remove certain list items specifically using pop and remove methods.
And li.remove(), which always requires a specified value to remove, is even worse than pop!
for num in li:
    li.remove(num)
This one took me 303.268 seconds to complete. How crazy it is.
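Two things are worth checking on a small list before reading too much into these timings. First, removing from a list while iterating over it skips every other element, so the `for` loop above does not actually empty the list. Second, `collections.deque` is the usual choice when items have to be taken from the front one at a time. A small sketch with 10 elements instead of 500000:

```python
from collections import deque

li = list(range(10))
for num in li:
    li.remove(num)   # mutating while iterating skips every other item
print(li)  # [1, 3, 5, 7, 9]

queue = deque(range(10))
while queue:
    queue.popleft()  # O(1) per removal from the front
print(len(queue))  # 0
```

Each `li.pop(0)` or `li.remove(num)` on a plain list shifts the remaining elements, which is why emptying a large list that way takes quadratic time overall.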
I've been having fun with abstract data structures. I implemented linked lists and a queue running on a linked list.
And for the sake of interest, I decided to compare the performance of the queue based on the linked list with one based on the usual Python list. And I was surprised: my linked-list Queue dequeued 500,000 elements in 0.5 seconds, while the Python-list Queue took 75 seconds.
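The poster's implementation is not shown, so the following is only a minimal sketch of what a linked-list queue might look like. `Node` and `LinkedQueue` are hypothetical names.

```python
class Node:
    """One link in the chain: a value plus a reference to the next node."""
    __slots__ = ("value", "next")

    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedQueue:
    """FIFO queue on a singly linked list; enqueue and dequeue are both O(1)."""

    def __init__(self):
        self._head = None  # next node to dequeue
        self._tail = None  # most recently enqueued node
        self._size = 0

    def __len__(self):
        return self._size

    def enqueue(self, value):
        node = Node(value)
        if self._tail is None:
            self._head = node       # queue was empty
        else:
            self._tail.next = node  # link behind the current tail
        self._tail = node
        self._size += 1

    def dequeue(self):
        if self._head is None:
            raise IndexError("dequeue from empty queue")
        node = self._head
        self._head = node.next      # no shifting, just move the head reference
        if self._head is None:
            self._tail = None       # queue became empty
        self._size -= 1
        return node.value


q = LinkedQueue()
for n in (1, 2, 3):
    q.enqueue(n)
print(q.dequeue(), q.dequeue(), len(q))  # 1 2 1
```

Dequeuing only reassigns the head reference, whereas `list.pop(0)` has to shift every remaining element, which matches the 0.5 second versus 75 second observation.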
If you know the index already, you'd use del.
Otherwise, remove first needs to traverse the list, find the (first) index for the element, then del it. This would, therefore, make it slower.
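The relationship between the two can be shown on a small list. The variable names are arbitrary.

```python
# When the index is already known, del goes straight to it.
by_index = ["a", "b", "c", "b"]
del by_index[1]
print(by_index)  # ['a', 'c', 'b']

# remove(x) has to search for the first match before deleting it...
by_value = ["a", "b", "c", "b"]
by_value.remove("b")

# ...which is roughly the same as finding the index and then using del.
two_step = ["a", "b", "c", "b"]
del two_step[two_step.index("b")]

print(by_value == two_step)  # True
```

Both forms still shift the elements after the deleted position, so `del` only saves the search, not the shift.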
Without any knowledge of how del or remove() performs, we can write a test using the timeit module to determine which one is faster. In this example, I run both methods 10,000 times and print the total time required for each:
import timeit
num_runs = 10000
del_method = 'lst = [1, 2, 3]; del lst[i]'
del_setup = 'i = 0'
print(timeit.Timer(del_method, setup=del_setup).timeit(number=num_runs))
remove_method = 'lst = [1, 2, 3]; lst.remove(ele)'
remove_setup = 'ele = 1'
print(timeit.Timer(remove_method, setup=remove_setup).timeit(number=num_runs))
Output:
0.0005947000000000036
0.0007260000000000044
As we can see, del performs faster in this simple scenario. This makes sense knowing that remove() performs a search before removing the element. I can imagine with an even larger list the difference between the times would only grow.
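To check that guess, the same comparison can be repeated on larger lists with the target at the end, where `remove()` has to scan the most. This is a sketch under assumptions: `compare` is a hypothetical helper, and the list sizes and repeat count are arbitrary choices. Actual numbers depend on the machine, so none are shown.

```python
import timeit


def compare(size, repeats=50):
    """Return the best (del, remove) times for deleting the last element."""
    # timeit.repeat runs the setup again before each repeat,
    # so every measurement starts from a fresh list.
    setup = "lst = list(range(size)); last = size - 1"
    env = {"size": size}
    del_times = timeit.repeat(
        "del lst[last]", setup=setup, repeat=repeats, number=1, globals=env
    )
    remove_times = timeit.repeat(
        "lst.remove(last)", setup=setup, repeat=repeats, number=1, globals=env
    )
    return min(del_times), min(remove_times)


for size in (10, 1_000, 100_000):
    del_time, remove_time = compare(size)
    print(f"{size:>7}: del {del_time:.2e}s  remove {remove_time:.2e}s")
```

Deleting the last element by index needs no shifting, so the `del` time should stay roughly flat while the `remove` time grows with the length of the scan.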