The complexity of in depends entirely on what L is. e in L will become L.__contains__(e).
See this time complexity document for the complexity of several built-in types.
Here is the summary for in:
- list - Average: O(n)
- set/dict - Average: O(1), Worst: O(n)
The O(n) worst case for sets and dicts is very uncommon, but it can happen if __hash__ is implemented poorly, e.g. when many (or all) of the items in your set share the same hash value and collide into the same bucket.
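To illustrate that worst case, here is a small sketch (BadHash is a made-up class for demonstration, not from the original answer) where every element hashes to the same value, so set membership degrades to a linear scan of the collision chain:

```python
class BadHash:
    """Deliberately broken hashing: every instance lands in the same bucket."""
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42  # constant hash -> all elements collide

    def __eq__(self, other):
        return isinstance(other, BadHash) and self.value == other.value


items = {BadHash(i) for i in range(1000)}
# This lookup now compares against up to 1000 colliding entries: O(n), not O(1).
print(BadHash(999) in items)   # True
print(BadHash(1000) in items)  # False
```

With a normal __hash__ (e.g. hashing self.value), the same lookups would be O(1) on average.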
It depends entirely on the type of the container. Hashing containers (dict, set) use the hash and are essentially O(1). Typical sequences (list, tuple) are implemented as you guess and are O(n). Trees would be average O(log n). And so on. Each of these types would have an appropriate __contains__ method with its big-O characteristics.
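As a quick sketch of that dispatch (EvenNumbers is a hypothetical container, not from the answer above): in simply calls whatever __contains__ the type defines, which need not iterate at all:

```python
class EvenNumbers:
    """A conceptual infinite container: 'holds' every even integer."""
    def __contains__(self, n):
        # `n in EvenNumbers()` dispatches here -- a constant-time check,
        # with no iteration and no stored elements.
        return isinstance(n, int) and n % 2 == 0


evens = EvenNumbers()
print(4 in evens)   # True
print(7 in evens)   # False
```

A list, by contrast, implements __contains__ as a linear scan, which is where its O(n) behaviour comes from.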
I’m not the biggest Python user, but I was looking at a friend's code yesterday and they had something like:

for x in (list of 40,000):
    for y in (list of 2.7 million):
        if x == y:
            append something

This was obviously super slow, so they changed it to something like:

for x in (list of 2.7 million):
    if x in (list of 40,000):
        append something

This ran much faster. I get the point of one for loop being faster than two, but what is that "in" membership check doing that makes it so much faster? I always thought that checking whether something exists is O(n), which shouldn't be faster. Also, this was for ML purposes, so they were likely using numpy stuff.
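One likely explanation, sketched below with illustrative stand-in sizes (the actual lists and data are from the poster's code, not shown here): even against a plain list, the inner membership check runs in C rather than in Python bytecode; and converting the small list to a set makes each check O(1) on average instead of O(m):

```python
big = list(range(270_000))          # stand-in for the 2.7 million list
small = list(range(0, 40_000, 10))  # stand-in for the 40,000 list

# Nested Python-level loops would do O(n*m) comparisons in bytecode.
# Building a set once is O(m); each later lookup is O(1) on average.
small_set = set(small)
matches = [x for x in big if x in small_set]

print(len(matches))  # 4000 -- the multiples of 10 below 40,000
```

The overall cost drops from O(n*m) to roughly O(n + m), which dwarfs any constant-factor win from having one loop instead of two.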
Python's substring search (what in does on str objects) is a combination of Boyer-Moore and Horspool.
You can view the C code here:
Fast search/count implementation, based on a mix between Boyer-Moore and Horspool, with a few more bells and whistles on the top. For some more background, see: https://web.archive.org/web/20201107074620/http://effbot.org/zone/stringlib.htm.
From the link above:
When designing the new algorithm, I used the following constraints:
- should be faster than the current brute-force algorithm for all test cases (based on real-life code), including Jim Hugunin’s worst-case test
- small setup overhead; no dynamic allocation in the fast path (O(m) for speed, O(1) for storage)
- sublinear search behaviour in good cases (O(n/m))
- no worse than the current algorithm in worst case (O(nm))
- should work well for both 8-bit strings and 16-bit or 32-bit Unicode strings (no O(σ) dependencies)
- many real-life searches should be good, very few should be worst case
- reasonably simple implementation
Since 2021, CPython uses Crochemore and Perrin's Two-Way algorithm when the needle is long enough.
From the source code:
If the strings are long enough, use Crochemore and Perrin's Two-Way algorithm, which has worst-case O(n) runtime and best-case O(n/k). Also compute a table of shifts to achieve O(n/k) in more cases, and often (data dependent) deduce larger shifts than pure C&P can deduce. See stringlib_find_two_way_notes.txt in this folder for a detailed explanation.
See https://github.com/python/cpython/pull/22904
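A quick way to see the practical effect of the optimized search (the sizes and strings below are arbitrary examples, not from the sources above): substring membership stays fast even when the haystack is very large and the match is at the very end:

```python
# A long haystack with the match at the very end.
haystack = "x" * 1_000_000 + "needle"

print("needle" in haystack)     # True
print(haystack.find("needle"))  # 1000000 -- index where the match starts
```

Both `in` and str.find go through the same underlying fastsearch implementation.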
in does not necessarily use loops behind the scenes. For example:
r = range(100000000000)
print(333 in r) # prints True immediately without looping
If you were to loop over r, it would take quite a long time, so clearly that's not what happens.
in basically calls (behind the scenes) the object's __contains__ method. For some iterators it will in fact "loop" through everything but that's not always the case.
This example is basically the same as calling:
r.__contains__(333)
As pointed out in the comments, the str type specifically has a smarter algorithm than a plain loop, as you can see here
Also see example answer here
And see the documentation here
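To contrast the two behaviours (the values below are arbitrary examples): range answers membership arithmetically via its __contains__, while a generator defines no __contains__, so in falls back to iterating it:

```python
r = range(0, 10**12, 7)
# range.__contains__ just checks bounds and (3500 - 0) % 7 == 0 -- no iteration.
print(3500 in r)   # True
print(3501 in r)   # False

# A generator has no __contains__, so `in` walks it item by item,
# consuming it up to (and including) the first match.
g = (i * i for i in range(100))
print(49 in g)     # True, found by iterating up to 7*7
```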
Because the real world scenario would probably mean that string1 can be arbitrarily long, but the characters to be removed would be a finite and small set, it will probably be much more efficient to just add up all the characters that aren't in string2. Something like this:
def removeChars(string1, string2):
    result = ''
    for char in string1:
        if char not in string2:
            result += char
    return result
This will involve looping over string1 just once, with a membership check against string2 for each character. It can be simplified further (and this also avoids building the result with repeated += concatenation):
def removeChars(string1, string2):
    return ''.join(char for char in string1 if char not in string2)
When in is equivalent to a loop, it has to 'walk' the iterator and compare each item in turn.
This means each loop is O(n), so for 2 levels this is O(n²)
https://wiki.python.org/moin/TimeComplexity
Note that you actually have three loops here, since your replace call will also walk through the string.
Since replace doesn't raise an error when char isn't found, it is simpler to just call it directly without first testing char in string1.