They are not the same thing at all.
len() queries for the number of items contained in a container. For a string that's the number of characters:
Return the length (the number of items) of an object. The argument may be a sequence (string, tuple or list) or a mapping (dictionary).
sys.getsizeof() on the other hand returns the memory size of the object:
Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.
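A quick interactive comparison makes the distinction concrete (the exact byte count shown here is an assumption; it varies by Python version and platform):
>>> import sys
>>> s = 'hello'
>>> len(s)               # number of characters
5
>>> sys.getsizeof(s)     # object overhead plus character data
54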
Python string objects are not simple sequences of characters, 1 byte per character.
Specifically, the sys.getsizeof() function includes the garbage collector overhead if any:
getsizeof() calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.
String objects do not need to be tracked (they cannot create circular references), but string objects do need more memory than just the bytes per character. In Python 2, the __sizeof__ method returns (in C code):
Py_ssize_t res;
res = PyStringObject_SIZE + PyString_GET_SIZE(v) * Py_TYPE(v)->tp_itemsize;
return PyInt_FromSsize_t(res);
where PyStringObject_SIZE is the C struct header size for the type, PyString_GET_SIZE is basically the same as len(), and Py_TYPE(v)->tp_itemsize is the per-character size. In Python 2.7, for byte strings, the size per character is 1, but it's PyStringObject_SIZE that is confusing you; on my Mac that size is 37 bytes:
>>> sys.getsizeof('')
37
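Incidentally, the claim above that strings are not tracked by the garbage collector can be checked directly with gc.is_tracked(), which is part of the standard library:
>>> import gc
>>> gc.is_tracked('abc')   # strings cannot form reference cycles
False
>>> gc.is_tracked([])      # containers are tracked
True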
For Python 2 unicode strings the per-character size goes up to 2 or 4 (depending on compilation options). On Python 3.3 and newer, Unicode strings take up between 1 and 4 bytes per character, depending on the contents of the string.
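On Python 3.3+, the flexible string representation means getsizeof() reflects the widest code point in the string; a rough sketch (the exact overheads and totals vary by build):
>>> import sys
>>> sys.getsizeof('a' * 10)            # ASCII only: 1 byte per character
59
>>> sys.getsizeof('\u20ac' * 10)       # code points above U+00FF: 2 bytes each
94
>>> sys.getsizeof('\U0001f600' * 10)   # astral code points: 4 bytes each
116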
For containers such as dictionaries or lists that reference other objects, the memory size given covers only the memory used by the container and the pointer values used to reference those other objects. There is no straightforward method of including the memory size of the ‘contained’ objects because those same objects could have many more references elsewhere and are not necessarily owned by a single container.
The documentation states it like this:
Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.
If you need to calculate the memory footprint of a container and everything referenced by that container, you'll have to traverse to the contained objects and sum their sizes; the documentation points to a recursive recipe.
Answer from Martijn Pieters on Stack Overflow.
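A minimal sketch of the kind of recursive traversal mentioned above (an illustration, not the recipe from the documentation; it only descends into the common built-in containers and skips already-seen objects so nothing is counted twice):
import sys

def total_size(obj, seen=None):
    # Sum getsizeof over an object graph, avoiding double counting
    # of shared objects and infinite loops on cycles.
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size

print(total_size(['a' * 1000000]))  # far larger than sys.getsizeof of the list alone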
The key difference is that len() gives the actual number of elements in a container, whereas sys.getsizeof() gives the size in bytes of the memory the object occupies.
For more information, read the Python docs, available at https://docs.python.org/3/library/sys.html#module-sys
Hi Pyople!
Yesterday I learned about the sys.getsizeof() function and tried some code. More specifically:
lst = [i for i in range(1000000000)] # one billion numbers, takes about a minute to create
When I use sys.getsizeof(lst), it returns 8058558880, which is correct. But when I look at my system resources on Linux CentOS 7 (IPython, Python 3.4) I see: ipython Memory: 39592564 K, Shared Mem: 5176 K. That's a freaking 40 GB.
I don't understand why an object that is 8 GB in size takes 40 GB of system memory. I tried it with a list of around 400 MB, and the system took roughly 400 * 5 = 2 GB.
Why is it taking five times more memory than it should? Or is the problem only because I tried it in IPython/Konsole, and in a standalone program it wouldn't be a problem?
sys.getsizeof gives you the amount of memory allocated to the list itself, but you also have 10...00 int objects that the list only contains a pointer to.
The size of an object does not include the size of all the objects that that object refers to. For example:
>>> import sys
>>> foo = ['a' * 1000000]
>>> sys.getsizeof(foo)
40
>>> sys.getsizeof(foo[0])
1000025
foo is a list object that contains one item. Its size is 40 bytes, because that's how much memory it takes to store a list big enough to hold a reference to one object. That object happens to be about a megabyte in size, but it's a completely separate object from the list object and doesn't count towards the size of the list object.
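To see how much the referenced objects add, you can sum getsizeof() over the elements as a rough approximation (rough because shared objects, such as CPython's cached small ints, get counted repeatedly):
import sys

lst = list(range(1000))
shallow = sys.getsizeof(lst)   # the list object and its internal pointer array only
deep = shallow + sum(sys.getsizeof(i) for i in lst)   # plus the int objects it references
print(shallow, deep)           # deep is several times larger; exact values vary by build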
I will attempt to answer your question from a broader point of view. You're referring to two functions and comparing their outputs. Let's take a look at their documentation first:
- len():
Return the length (the number of items) of an object. The argument may be a sequence (such as a string, bytes, tuple, list, or range) or a collection (such as a dictionary, set, or frozen set).
So in the case of a string, you can expect len() to return the number of characters.
- sys.getsizeof():
Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.
So in the case of a string (as with many other objects) you can expect sys.getsizeof() to return the size of the object in bytes. There is no reason to think that it should be the same as the number of characters.
Let's have a look at some examples:
>>> first = "First"
>>> len(first)
5
>>> sys.getsizeof(first)
42
This example confirms that the size is not the same as the number of characters.
>>> second = "Second"
>>> len(second)
6
>>> sys.getsizeof(second)
43
We can notice that if we look at a string one character longer, its size is one byte bigger as well. We don't know if it's a coincidence or not though.
>>> together = first + second
>>> print(together)
FirstSecond
>>> len(together)
11
If we concatenate the two strings, their combined length is equal to the sum of their lengths, which makes sense.
>>> sys.getsizeof(together)
48
Contrary to what someone might expect though, the size of the combined string is not equal to the sum of their individual sizes. But it still seems to be the length plus something; in particular, a something worth 37 bytes. Now you need to realize that it's 37 bytes in this particular case, using this particular Python implementation, etc. You should not rely on that at all. Still, we can take a look at why it's 37 bytes and what they are (approximately) used for.
In CPython (probably the most widely used implementation of Python), string objects are implemented as PyStringObject. This is the C source code (I use the 2.7.9 version):
typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];

    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the string or -1 if not computed yet.
     *     ob_sstate != 0 iff the string object is in stringobject.c's
     *       'interned' dictionary; in this case the two references
     *       from 'interned' to this object are *not counted* in ob_refcnt.
     */
} PyStringObject;
You can see that there is something called PyObject_VAR_HEAD, one int, one long and a char array. The char array will always contain one more character, to store the '\0' at the end of the string. This, along with the int, the long and PyObject_VAR_HEAD, takes up the additional 37 bytes. PyObject_VAR_HEAD is defined in another C source file and refers to other implementation-specific stuff; you'll need to explore it if you want to find out exactly where the 37 bytes are. Plus, the documentation mentions that sys.getsizeof()
adds an additional garbage collector overhead if the object is managed by the garbage collector.
Overall, you don't need to know exactly what makes up the something (the 37 bytes here), but this answer should give you an idea of why the numbers differ and where to find more information should you really need it.
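As a rough sanity check, assuming a typical 64-bit CPython 2.7 build (an assumption; the exact layout is platform-dependent), the 37 bytes can be accounted for like this:
PyObject_VAR_HEAD (ob_refcnt + ob_type + ob_size, 8 bytes each) = 24 bytes
long ob_shash = 8 bytes
int ob_sstate = 4 bytes
trailing '\0' in ob_sval = 1 byte
total = 37 bytes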
To quote the documentation:
Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.
Built in strings are not simple character sequences - they are full fledged objects, with garbage collection overhead, which probably explains the size discrepancy you're noticing.
I was watching a numpy video on YouTube and the presenter made a point about numpy arrays versus python lists, and he did so in an odd manner. He was pointing out that one of the advantages of numpy array are how they take up less space, but when I tried to remake his demonstration in IDLE I didn’t get the same results.
import numpy as np
import sys
a = range(1000)
print('Information pertaining to a')
print('Get size', sys.getsizeof(a))
print('Type', type(a))
print('Print of actual a', a, '\n')
b = []
for i in range(1000):
    b.append(i)
print('Information pertaining to b')
print('Get size', sys.getsizeof(b))
print('Type', type(b))
print('Print of actual b', b, '\n')
c = np.arange(1000)
print('Information pertaining to c')
print('Get size', sys.getsizeof(c))
print('Type', type(c))
print('Print of actual c', c, '\n')
d = 5
print('Information pertaining to d')
print('Get size', sys.getsizeof(d))
print('Type', type(d))
print('Print of actual d', d, '\n')
e = 'e'
print('Information pertaining to e')
print('Get size', sys.getsizeof(e))
print('Type', type(e))
print('Print of actual e', e, '\n')
When I run this code, it shows me that the NumPy array is indeed a bit lighter than the Python list (roughly 8000 bytes vs 9000 bytes), but for some reason it doesn't show me the full size of the range. It looks like it's only showing me the space taken up by a itself (only 48 bytes). So I'm wondering: what exactly is getsizeof() meant to do, and why does it treat different kinds of list-like objects differently?
As others have stated, sys.getsizeof only returns the size of the object structure that represents your data. So if, for instance, you have a dynamic array that you keep adding elements to, sys.getsizeof(my_array) will only ever show the size of the base DynamicArray object, not the growing size of memory that its elements take up.
pympler.asizeof.asizeof() gives an approximate complete size of objects and may be more accurate for you.
from pympler import asizeof
asizeof.asizeof(my_object) # should give you the full object size
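For example (a sketch; exact numbers depend on your platform), comparing the two on a nested list shows the difference:
import sys
from pympler import asizeof

nested = [[0] * 1000]            # outer list holding one large inner list
print(sys.getsizeof(nested))     # size of the outer list object only (small)
print(asizeof.asizeof(nested))   # includes the inner list and its elements (much larger)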
sys.getsizeof returns a number which is more specialized and less useful than people think. In fact, if you increase the number of attributes to six, your test3_obj remains at 32 bytes, but test4_obj jumps to 48 bytes. This is because getsizeof returns the size of the PyObject structure implementing the type, which for test3_obj doesn't include the dict holding the attributes; for test4_obj the attributes aren't stored in a dict but in slots, so they are accounted for in the size.
But a class defined with __slots__ takes less memory than a class without, precisely because there is no dict to hold the attributes.
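A small sketch of that difference (the class names here are made up for illustration; exact sizes vary by Python version):
import sys

class WithDict:
    def __init__(self):
        self.a, self.b = 1, 2

class WithSlots:
    __slots__ = ('a', 'b')
    def __init__(self):
        self.a, self.b = 1, 2

d, s = WithDict(), WithSlots()
print(sys.getsizeof(d))            # base object; the attribute dict is a separate object
print(sys.getsizeof(d.__dict__))   # the dict actually holding the attributes
print(sys.getsizeof(s))            # slots live inside the object itself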
Why override __sizeof__? What are you really trying to accomplish?