Why is removing elements from a list so slow, and is there a faster way?
python - Speed of del vs remove on list - Stack Overflow
Explain time and space complexity of two python codes. Which one is the best? (Non-subjective) - Stack Overflow
how much time will it take to remove a key, value pair from a dictionary?
I was trying to write a simple application that filters a list of words down to words of a certain length. For that I could either remove the words of the wrong length, or create a new list containing only the words of the correct length.
I had a list of around 58,000 words and wanted to keep only the six-letter words, of which there are around 6,900.
with open('words.txt') as f:
    words = f.readlines()
for i in range(len(words)):
    words[i] = words[i].strip()
length = int(input("Desired word length "))
for i in reversed(words):
    if len(i) != length:
        words.remove(i)

This took 22 seconds.
Another way is to just create a new list with words of the correct length. I did this as follows:
with open('words.txt') as f:
    words = f.readlines()
for i in range(len(words)):
    words[i] = words[i].strip()
length = int(input("Desired word length "))
clw = []
for i in words:
    if len(i) == length:
        clw.append(i)

This only took 0.03 seconds. How can it be that creating a list of 6,900 words takes 0.03 seconds, but removing 51,100 words takes 22? It's only about 7 times as many words, but takes over 700 times as long. And is there a better, faster way to remove list elements?
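For the last part of the question, the second approach can be written more compactly as a list comprehension. A minimal sketch on an in-memory sample list (the file handling and input prompt are omitted):

```python
# Sample data standing in for the stripped contents of words.txt
words = ["apple", "banana", "cherry", "orange", "kiwi", "grape"]
length = 6

# One linear pass that builds the filtered list directly,
# instead of repeatedly removing from the original list
clw = [w for w in words if len(w) == length]
print(clw)  # ['banana', 'cherry', 'orange']
```

This does the same work as the append loop in a single expression, and avoids the repeated shifting that makes in-place removal slow.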
If you know the index already, you'd use del.
Otherwise, remove first has to traverse the list to find the (first) index of the element, and then delete it. That extra linear search is what makes it slower.
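Roughly what remove does under the hood can be sketched in pure Python (an illustration only, not CPython's actual C implementation):

```python
def remove_like(lst, value):
    """Mimic list.remove: linear scan for the first match, then delete by index."""
    for i, x in enumerate(lst):
        if x == value:
            del lst[i]  # deleting by index shifts the tail of the list left
            return
    raise ValueError(f"{value!r} not in list")

lst = [1, 2, 3, 2]
remove_like(lst, 2)   # removes only the first 2
print(lst)            # [1, 3, 2]
```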
Without any knowledge of how del or remove() performs, we can write a test using the timeit library to determine which one is faster. In this example, I run both methods 10,000 times and print the total time required:
import timeit
num_runs = 10000
del_method = 'lst = [1, 2, 3]; del lst[i]'
del_setup = 'i = 0'
print(timeit.Timer(del_method, setup=del_setup).timeit(number=num_runs))
remove_method = 'lst = [1, 2, 3]; lst.remove(ele)'
remove_setup = 'ele = 1'
print(timeit.Timer(remove_method, setup=remove_setup).timeit(number=num_runs))
Output:
0.0005947000000000036
0.0007260000000000044
As we can see, del performs faster in this simple scenario. This makes sense knowing that remove() performs a search before removing the element. I can imagine with an even larger list the difference between the times would only grow.
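To connect this back to the question: the asker's slow loop performs that linear scan once per removal, which makes the whole job quadratic, while building a new list is a single pass. A small sketch of both approaches (hypothetical helper names):

```python
def remove_loop(n):
    """Remove every odd number via list.remove: O(n) scan per removal -> O(n^2)."""
    lst = list(range(n))
    for x in list(lst):      # iterate over a copy so removal is safe
        if x % 2:
            lst.remove(x)
    return lst

def build_new(n):
    """Keep the even numbers with one linear pass."""
    return [x for x in range(n) if x % 2 == 0]

print(remove_loop(10))  # [0, 2, 4, 6, 8]
print(build_new(10))    # [0, 2, 4, 6, 8]
```

Both produce the same result, but timing them on n in the tens of thousands shows the gap the asker observed: the remove-based version grows quadratically while the rebuild grows linearly.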
According to the Python wiki, deleting an item from a list takes linear time proportional to the number of elements in the list. Since you delete every item in the list, and each deletion takes linear time, the overall runtime is proportional to the square of the number of items in the list.
In your second code snippet, both sum and map take linear time, so the overall complexity is linear in the number of elements in the list. Interestingly, sum_of_elements isn't used at all (but it doesn't sum all the even elements either).
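The quadratic cost comes from the shifting: each deletion near the front of a list moves every later element one slot left. A sketch contrasting the two ends of a list (illustration only):

```python
def drain_from_front(lst):
    """Empty the list by deleting index 0: each del shifts the whole tail,
    so emptying n elements this way does about n*(n-1)/2 moves (quadratic)."""
    while lst:
        del lst[0]

def drain_from_back(lst):
    """Empty the list by popping the last element: no shifting, O(1) per pop."""
    while lst:
        lst.pop()

a, b = list(range(5)), list(range(5))
drain_from_front(a)
drain_from_back(b)
print(a, b)  # [] []
```

Both end up empty, but on a large list the front-draining version is dramatically slower for exactly the reason the wiki describes.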
What about the following?
import numpy as np
a = np.arange(20)
print(np.sum(a[a % 2 == 0]))
It seems to be much more lightweight compared to your two code snippets.
Small timings with an np.arange(998):
Pure numpy:
248502
0.0
Class recursion:
248502
0.00399994850159
List/Numpy one:
248502
0.00200009346008
And with a 999-element array, your class fails because the maximum recursion depth is reached.
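The original class code isn't shown here, but the failure mode is generic: any function that recurses once per element hits CPython's default recursion limit (roughly 1000 frames). A hypothetical stand-in:

```python
def rec_sum(lst):
    """Naively recursive sum: one stack frame (and one list slice) per element."""
    if not lst:
        return 0
    return lst[0] + rec_sum(lst[1:])

print(rec_sum(list(range(10))))  # 45
# rec_sum(list(range(5000))) raises RecursionError, because the call depth
# exceeds the default sys.getrecursionlimit()
```

Rewriting the recursion as a loop (or as a sum over an iterator) removes both the depth limit and the per-call overhead.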