len(obj) simply calls obj.__len__():
>>> [1, 2, 3, 4].__len__()
4
It is therefore not correct to say that len() is always O(1) -- calling len() on most objects (e.g. lists) is O(1), but an arbitrary object might implement __len__ in an arbitrarily inefficient way.
max(obj) is a different story, because it doesn't call a single magic __max__ method on obj; it instead iterates over it, calling __iter__ and then calling __next__. It does this n times (and also does a comparison each time to track the max item it's seen so far), so it must always be at least O(n). (It can be slower if __next__ or the comparison methods are slow, although that would be very unusual.)
For either of these, we don't count the time it took to build the collection as part of the cost of calling the operation itself -- this is because you might build a list once and then call len() on it many times, and it's useful to know that the len() by itself is very cheap even if building the list was very expensive.
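To make this concrete, here is a contrived container (a hand-rolled linked list, purely illustrative) whose __len__ walks every node, so calling len() on it is O(n) rather than O(1):

```python
class LinkedList:
    """Minimal singly linked list; illustrative only, not production code."""

    class Node:
        def __init__(self, value, next=None):
            self.value = value
            self.next = next

    def __init__(self, values=()):
        self.head = None
        for v in reversed(list(values)):
            self.head = self.Node(v, self.head)

    def __len__(self):
        # Walk every node: O(n), unlike list.__len__, which reads a stored field.
        n, node = 0, self.head
        while node is not None:
            n += 1
            node = node.next
        return n


print(len(LinkedList([1, 2, 3, 4])))  # 4, but computed in O(n)
print(len([1, 2, 3, 4]))              # 4, computed in O(1)
```

Both calls go through the same len() builtin; only the cost of the underlying __len__ differs.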
Let's check it:

import time

from matplotlib import pyplot as plt
import numpy as np

def main():
    a = []
    data = []
    for i in range(10_000):
        a.append(i)

        ts_len = time.time()
        _ = len(a)
        te_len = time.time()

        ts_max = time.time()
        _ = max(a)
        te_max = time.time()

        ts_min = time.time()
        _ = min(a)
        te_min = time.time()

        data.append([i, te_len - ts_len, te_max - ts_max, te_min - ts_min])

    data = np.array(data)
    plt.plot(data[:, 0], data[:, 1], "-r", label="len")
    plt.plot(data[:, 0], data[:, 2], "--g", label="max")
    plt.plot(data[:, 0], data[:, 3], ".b", label="min")
    plt.title("Len/max/min")
    plt.xlabel("Size of the list")
    plt.ylabel("Time elapsed (s)")
    plt.legend()
    plt.show()

if __name__ == '__main__':
    main()

It's O(1) (constant time, independent of the actual length of the container, and therefore very fast) on every type you've mentioned, plus set and others such as array.array.
Calling len() on those data types is O(1) in CPython, the official and most common implementation of the Python language. Here's a link to a table that provides the algorithmic complexity of many different functions in CPython:
TimeComplexity Python Wiki Page
Inspecting the C source of dictobject.c shows that the dictionary maintains an explicit count of its entries (the dict object's ma_used field; the keys object shown below also tracks dk_nentries):
layout:
+---------------+
| dk_refcnt |
| dk_size |
| dk_lookup |
| dk_usable |
| dk_nentries |
+---------------+
...
Thus len() on a dict runs in O(1).
According to this page:
Time Complexity: O(1) – In Python, the container (here the dictionary) maintains a variable that holds its current size. Whenever an element is pushed into or popped from the container, that variable is incremented or decremented accordingly. Say a dictionary already holds 2 elements: when we insert another element, the size variable is incremented to 3. When we call len() on the dictionary, it invokes the magic method __len__, which simply returns that size variable. Hence it is an O(1) operation.
Space Complexity: O(1) – Since only a single variable holds the size of the dictionary, no auxiliary space is involved. Hence the space complexity of the method is O(1) too.
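A quick sketch to check this empirically (absolute timings will vary by machine; the point is that the per-call time stays roughly flat as the dict grows):

```python
import timeit

# len() on a dict should take about the same time regardless of size,
# because it only reads the stored element count (ma_used in CPython).
for n in (10, 1_000, 100_000):
    d = {i: i for i in range(n)}
    t = timeit.timeit(lambda: len(d), number=100_000)
    print(f"n={n:>7}: {t:.4f} s for 100k len() calls")
```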
Firstly, you have not measured the speed of len(), you have measured the speed of creating a list/set together with the speed of len().
Use the --setup argument of timeit:
$ python -m timeit --setup "a=[1,2,3,4,5,6,7,8,9,10]" "len(a)"
10000000 loops, best of 3: 0.0369 usec per loop
$ python -m timeit --setup "a={1,2,3,4,5,6,7,8,9,10}" "len(a)"
10000000 loops, best of 3: 0.0372 usec per loop
The statements you pass to --setup are run before measuring the speed of len().
Secondly, you should note that len(a) is a pretty quick statement. The process of measuring its speed may be subject to "noise". Consider that the code executed (and measured) by timeit is equivalent to the following:
for i in itertools.repeat(None, number):
    len(a)
Because both len(a) and itertools.repeat(...).__next__() are fast operations and their speeds may be similar, the speed of itertools.repeat(...).__next__() may influence the timings.
For this reason, you'd better measure len(a); len(a); ...; len(a) (repeated 1000 times or so), so that the body of the for loop takes considerably more time than the iterator:
$ python -m timeit --setup "a=[1,2,3,4,5,6,7,8,9,10]" "$(for i in {0..1000}; do echo "len(a)"; done)"
10000 loops, best of 3: 29.2 usec per loop
$ python -m timeit --setup "a={1,2,3,4,5,6,7,8,9,10}" "$(for i in {0..1000}; do echo "len(a)"; done)"
10000 loops, best of 3: 29.3 usec per loop
(The results still says that len() has the same performances on lists and sets, but now you are sure that the result is correct.)
Thirdly, it's true that "complexity" and "speed" are related, but I believe you are making some confusion. The fact that len() has O(1) complexity for lists and sets does not imply that it must run with the same speed on lists and sets.
It means that, on average, no matter how long the list a is, len(a) performs the same asymptotic number of steps. And no matter how long the set b is, len(b) performs the same asymptotic number of steps. But the algorithm for computing the size of lists and sets may be different, resulting in different performances (timeit shows that this is not the case, however this may be a possibility).
Lastly,
If the creation of a set object takes more time compared to creating a list, what would be the underlying reason?
A set, as you know, does not allow repeated elements. Sets in CPython are implemented as hash tables (to ensure average O(1) insertion and lookup): constructing and maintaining a hash table is much more complex than adding elements to a list.
Specifically, when constructing a set, you have to compute hashes, build the hash table, look it up to avoid inserting duplicated events and so on. By contrast, lists in CPython are implemented as a simple array of pointers that is malloc()ed and realloc()ed as required.
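A rough way to see that construction-cost difference with timeit (the exact ratio depends on the machine and the data; this just illustrates the trend):

```python
import timeit

# Building a set must hash every element and maintain a hash table,
# while building a list only appends pointers, so set construction
# is typically slower for the same input.
data = list(range(10_000))
t_list = timeit.timeit(lambda: list(data), number=1_000)
t_set = timeit.timeit(lambda: set(data), number=1_000)
print(f"list(): {t_list:.3f} s   set(): {t_set:.3f} s")
```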
The relevant lines are http://svn.python.org/view/python/trunk/Objects/setobject.c?view=markup#l640
640 static Py_ssize_t
641 set_len(PyObject *so)
642 {
643 return ((PySetObject *)so)->used;
644 }
and http://svn.python.org/view/python/trunk/Objects/listobject.c?view=markup#l431
431 static Py_ssize_t
432 list_length(PyListObject *a)
433 {
434 return Py_SIZE(a);
435 }
Both are just a field lookup, with no traversal involved.
So what is the difference, you may ask. You are measuring the creation of the objects too, and creating a set takes a little more time than creating a list.
I guess you are missing one concept that is how a data structure can return its size in constant time i.e. O(1).
Roughly, think of a program like this:
void init() {
    // code to initialize (or allocate memory for) the container
    size = 0;
}

void add(Something something) {
    container.add(something);
    size++;
}

void remove(Something something) {
    // code to search for 'something'
    if (found) {
        container.remove(something);
        size--;
    }
}

int len() {
    return size;
}
Now any time you call the method len(), it is ready to return the integral value without any need to traverse the container.
The reason strlen() and plain C data structures don't work that way is the space overhead of keeping a counter like size. But that doesn't mean you can't define one.
Hint:
Use struct and keep the size maintained there.
Any string/list in Python is an object. Like many objects, it has a __len__ method; the object stores its length, and when we call len(), __len__ is invoked internally and returns the stored value, which is an O(1) operation.
I was looking at the time complexity of different list operations and the "get length" operation has constant time O(1). Why is that? I would understand it if lists could only store a single type of object (e.g. only integers or characters), but since lists can hold objects of a different size I don't understand how the len() function has O(1) complexity.
I checked it under real conditions, and counting was two or more times faster than len().
My code, which reads two CSV-format text files and then counts the number of lines, is this:
import time
import csv

start_time = time.time()

psr = open('e_psr.txt')
cpr = open('e_cp.txt')

csv_psr = csv.reader(psr, delimiter=',')
csv_cp = csv.reader(cpr, delimiter=',')

csv_cp_copy = []
csv_psr_copy = []

r = 0
e = 0
for row in csv_psr:
    csv_psr_copy.append(row)
for row in csv_cp:
    csv_cp_copy.append(row)

e = len(csv_cp_copy)
r = len(csv_psr_copy)

psr.close()
cpr.close()

print(e, r)
print("\n--- %s seconds ---" % (time.time() - start_time))
and when I replaced len() with a simple counter inside the for loop (e += 1), the result changed significantly.
return with len():
10000 10000
--- 0.13390278816223145 seconds ---
return with counter:
10000 10000
--- 0.05642294883728027 seconds ---
from timeit import default_timer as timer
test = [x for x in range(1000)]
count = 0
start = timer()
len(test)
end = timer()
print(end - start)
start = timer()
for i in test:
    count += 1
end = timer()
print(end - start)
Returns:
2.643069343567298e-06
213.7110354941546e-06
If test = "This is a test string."
Returns:
2.2654880087719696e-06
1.0572277374269745e-05