I think that Pympler has already beaten you to the punch on this one.
From their documentation:
>>> from pympler.asizeof import asizeof
>>> obj = [1, 2, (3, 4), 'text']
>>> asizeof(obj)
176
The source code can be found here.
Answer from cwallenpoole on Stack Overflow
Object size
I am generating a fairly large object with the intention to store it in mongodb using GridFS. It turns out that the file size in MongoDB is huge (several hundred MBs) while the object size in python returns a fairly manageable size (roughly 5000 bytes) as measured by:
sum = 0
for d in dir(object):
    sum += sys.getsizeof(d)
What am I missing here? The sum variable does seemingly underestimate true size, since a object size of 5000 bytes wouldn't theoretically require GridFS for storing. Is there a way to loop through all layers attributes in an object?
Thanks!
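One likely reason the loop above underestimates: dir() returns attribute names as strings, so sys.getsizeof(d) measures each name rather than the value it refers to. A minimal sketch of the difference, assuming a hypothetical Payload class that stands in for the object in the question (it is not from the original post):

```python
import sys


class Payload:
    """Hypothetical stand-in for the large object in the question."""

    def __init__(self):
        self.data = "x" * 1_000_000  # one large attribute


obj = Payload()

# What the loop in the question measures: the attribute *names* (short strings).
size_of_names = sum(sys.getsizeof(name) for name in dir(obj))

# Shallow size of the attribute *values* instead, looked up with getattr().
size_of_values = sum(sys.getsizeof(getattr(obj, name)) for name in vars(obj))

print(size_of_names, size_of_values)
```

Even the second figure is shallow: it does not follow references held by the values themselves, which is what the recursive approaches further down address.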
Just use the sys.getsizeof function defined in the sys module.
sys.getsizeof(object[, default]): Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.
Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.
The default argument allows to define a value which will be returned if the object type does not provide means to retrieve the size and would cause a TypeError.
getsizeof calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.
See recursive sizeof recipe for an example of using getsizeof() recursively to find the size of containers and all their contents.
Usage example, in python 3.0:
>>> import sys
>>> x = 2
>>> sys.getsizeof(x)
24
>>> sys.getsizeof(sys.getsizeof)
32
>>> sys.getsizeof('this')
38
>>> sys.getsizeof('this also')
48
If you are in python < 2.6 and don't have sys.getsizeof you can use this extensive module instead. Never used it though.
How do I determine the size of an object in Python?
The answer, "Just use sys.getsizeof", is not a complete answer.
That answer does work for builtin objects directly, but it does not account for what those objects may contain, specifically, what types such as custom objects, tuples, lists, dicts, and sets contain. They can contain instances of each other, as well as numbers, strings and other objects.
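A small sketch of that limitation, assuming CPython (the variable names are illustrative only): the list reports a size smaller than the single string it holds, because only the list's own header and pointer are counted.

```python
import sys

big_text = "a" * 100_000   # a large string
container = [big_text]     # a list holding one reference to it

shallow = sys.getsizeof(container)    # list header plus one pointer
contained = sys.getsizeof(big_text)   # the string object itself

print(shallow, contained)
```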
A More Complete Answer
Using 64-bit Python 3.6 from the Anaconda distribution, with sys.getsizeof, I have determined the minimum size of the following objects, and note that sets and dicts preallocate space so empty ones don't grow again until after a set amount (which may vary by implementation of the language):
Python 3:

Empty
Bytes  type        scaling notes
28     int         +4 bytes about every 30 powers of 2
37     bytes       +1 byte per additional byte
49     str         +1-4 per additional character (depending on max width)
48     tuple       +8 per additional item
64     list        +8 for each additional
224    set         5th increases to 736; 21st, 2272; 85th, 8416; 341st, 32992
240    dict        6th increases to 368; 22nd, 1184; 43rd, 2280; 86th, 4704; 171st, 9320
136    func def    does not include default args and other attrs
1056   class def   no slots
56     class inst  has a __dict__ attr, same scaling as dict above
888    class def   with slots
16     __slots__   seems to store in mutable tuple-like structure
                   first slot grows to 48, and so on.
How do you interpret this? Well, say you have a set with 10 items in it. If each item is 100 bytes, how big is the whole data structure? The set itself is 736 bytes because it has sized up one time. Then you add the size of the items (10 x 100 = 1000 bytes), so that's 1736 bytes in total.
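The same arithmetic can be done with sys.getsizeof directly. This is a sketch only: the 67-byte payload is an arbitrary choice to get items of roughly 100 bytes, and the exact numbers depend on the Python version and build.

```python
import sys

# Ten distinct byte strings of roughly 100 bytes each (payload length is arbitrary).
items = {bytes([i]) * 67 for i in range(10)}

container_size = sys.getsizeof(items)               # the set itself
items_size = sum(sys.getsizeof(i) for i in items)   # its contents, counted shallowly
total = container_size + items_size

print(container_size, items_size, total)
```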
Some caveats for function and class definitions:
Note each class definition has a proxy __dict__ (48 bytes) structure for class attrs. Each slot has a descriptor (like a property) in the class definition.
Slotted instances start out with 48 bytes on their first element, and increase by 8 each additional. Only empty slotted objects have 16 bytes, and an instance with no data makes very little sense.
Also, each function definition has code objects, maybe docstrings, and other possible attributes, even a __dict__.
Also note that we use sys.getsizeof() because we care about the marginal space usage, which includes the garbage collection overhead for the object, from the docs:
getsizeof() calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.
Also note that resizing lists (e.g. repetitively appending to them) causes them to preallocate space, similarly to sets and dicts. From the listobj.c source code:
    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
     * Note: new_allocated won't overflow because the largest possible value
     *       is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
     */
    new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
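The over-allocation can be observed from Python by appending to a list and recording sys.getsizeof after each append. This is a sketch assuming CPython; the exact growth points differ between versions, so none are shown here.

```python
import sys

lst = []
sizes = []
for i in range(64):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# List lengths at which the reported size changed, i.e. where a reallocation happened.
growth_points = [i + 1 for i in range(1, len(sizes)) if sizes[i] != sizes[i - 1]]

print(growth_points)
```

The reported size stays flat across several appends and then jumps, which is the preallocation described in the comment above.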
Historical data
Python 2.7 analysis, confirmed with guppy.hpy and sys.getsizeof:
Bytes  type        empty + scaling notes
24     int         NA
28     long        NA
37     str         + 1 byte per additional character
52     unicode     + 4 bytes per additional character
56     tuple       + 8 bytes per additional item
72     list        + 32 for first, 8 for each additional
232    set         sixth item increases to 744; 22nd, 2280; 86th, 8424
280    dict        sixth item increases to 1048; 22nd, 3352; 86th, 12568 *
120    func def    does not include default args and other attrs
64     class inst  has a __dict__ attr, same scaling as dict above
16     __slots__   class with slots has no dict, seems to store in
                   mutable tuple-like structure.
904    class def   has a proxy __dict__ structure for class attrs
104    old class   makes sense, less stuff, has real dict though.
* Note that dictionaries (but not sets) got a more compact representation in Python 3.6.
I think 8 bytes per additional item to reference makes a lot of sense on a 64 bit machine. Those 8 bytes point to the place in memory the contained item is at. The 4 bytes are fixed width for unicode in Python 2, if I recall correctly, but in Python 3, str becomes a unicode of width equal to the max width of the characters.
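The variable character width of Python 3 strings can be checked by measuring the marginal cost of one more character. This is a sketch assuming CPython 3.3 or newer; bytes_per_char is a helper name made up for this example.

```python
import sys


def bytes_per_char(ch, n=100):
    """Marginal size of one more copy of `ch` in a string made only of `ch`."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


print(bytes_per_char("a"))           # ASCII character
print(bytes_per_char("\u0100"))      # beyond Latin-1, inside the Basic Multilingual Plane
print(bytes_per_char("\U00010000"))  # outside the Basic Multilingual Plane
```

Because the width is chosen per string from its widest character, a single wide character raises the per-character cost for the whole string.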
And for more on slots, see this answer.
A More Complete Function
We want a function that searches the elements in lists, tuples, sets, dicts, obj.__dict__'s, and obj.__slots__, as well as other things we may not have yet thought of.
We want to rely on gc.get_referents to do this search because it works at the C level (making it very fast). The downside is that get_referents can return redundant members, so we need to ensure we don't double count.
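A minimal illustration of the redundancy, assuming CPython: a list that references the same object twice reports that object twice, so deduplicating by id() is needed before summing sizes.

```python
from gc import get_referents

shared = object()
container = [shared, shared]   # the same object referenced twice

referents = get_referents(container)
unique_ids = {id(r) for r in referents}

print(len(referents), len(unique_ids))
```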
Classes, modules, and functions are singletons - they exist one time in memory. We're not so interested in their size, as there's not much we can do about them - they're a part of the program. So we'll avoid counting them if they happen to be referenced.
We're going to use a blacklist of types so we don't include the entire program in our size count.
import sys
from types import ModuleType, FunctionType
from gc import get_referents

# Custom objects know their class.
# Function objects seem to know way too much, including modules.
# Exclude modules as well.
BLACKLIST = type, ModuleType, FunctionType


def getsize(obj):
    """sum size of object & members."""
    if isinstance(obj, BLACKLIST):
        raise TypeError('getsize() does not take argument of type: '+ str(type(obj)))
    seen_ids = set()
    size = 0
    objects = [obj]
    while objects:
        need_referents = []
        for obj in objects:
            if not isinstance(obj, BLACKLIST) and id(obj) not in seen_ids:
                seen_ids.add(id(obj))
                size += sys.getsizeof(obj)
                need_referents.append(obj)
        objects = get_referents(*need_referents)
    return size
To contrast this with the following whitelisted function: most objects know how to traverse themselves for the purposes of garbage collection, which is approximately what we're looking for when we want to know how expensive in memory certain objects are. This functionality is used by gc.get_referents. However, this measure is going to be much more expansive in scope than we intended if we are not careful.
For example, functions know quite a lot about the modules they are created in.
Another point of contrast is that strings that are keys in dictionaries are usually interned so they are not duplicated. Checking for id(key) will also allow us to avoid counting duplicates, which we do in the next section. The blacklist solution skips counting keys that are strings altogether.
Whitelisted Types, Recursive visitor
To cover most of these types myself, instead of relying on the gc module, I wrote this recursive function to try to estimate the size of most Python objects, including most builtins, types in the collections module, and custom types (slotted and otherwise).
This sort of function gives much more fine-grained control over the types we're going to count for memory usage, but has the danger of leaving important types out:
import sys
from numbers import Number
from collections import deque
from collections.abc import Set, Mapping


ZERO_DEPTH_BASES = (str, bytes, Number, range, bytearray)


def getsize(obj_0):
    """Recursively iterate to sum size of object & members."""
    _seen_ids = set()
    def inner(obj):
        obj_id = id(obj)
        if obj_id in _seen_ids:
            return 0
        _seen_ids.add(obj_id)
        size = sys.getsizeof(obj)
        if isinstance(obj, ZERO_DEPTH_BASES):
            pass # bypass remaining control flow and return
        elif isinstance(obj, (tuple, list, Set, deque)):
            size += sum(inner(i) for i in obj)
        elif isinstance(obj, Mapping) or hasattr(obj, 'items'):
            size += sum(inner(k) + inner(v) for k, v in getattr(obj, 'items')())
        # Check for custom object instances - may subclass above too
        if hasattr(obj, '__dict__'):
            size += inner(vars(obj))
        if hasattr(obj, '__slots__'): # can have __slots__ with __dict__
            size += sum(inner(getattr(obj, s)) for s in obj.__slots__ if hasattr(obj, s))
        return size
    return inner(obj_0)
And I tested it rather casually (I should unittest it):
>>> getsize(['a', tuple('bcd'), Foo()])
344
>>> getsize(Foo())
16
>>> getsize(tuple('bcd'))
194
>>> getsize(['a', tuple('bcd'), Foo(), {'foo': 'bar', 'baz': 'bar'}])
752
>>> getsize({'foo': 'bar', 'baz': 'bar'})
400
>>> getsize({})
280
>>> getsize({'foo':'bar'})
360
>>> getsize('foo')
40
>>> class Bar():
...     def baz():
...         pass
>>> getsize(Bar())
352
>>> getsize(Bar().__dict__)
280
>>> sys.getsizeof(Bar())
72
>>> getsize(Bar.__dict__)
872
>>> sys.getsizeof(Bar.__dict__)
280
This implementation breaks down on class definitions and function definitions because we don't go after all of their attributes, but since they should only exist once in memory for the process, their size really doesn't matter too much.
They are not the same thing at all.
len() queries for the number of items contained in a container. For a string that's the number of characters:
Return the length (the number of items) of an object. The argument may be a sequence (string, tuple or list) or a mapping (dictionary).
sys.getsizeof() on the other hand returns the memory size of the object:
Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.
Python string objects are not simple sequences of characters, 1 byte per character.
Specifically, the sys.getsizeof() function includes the garbage collector overhead if any:
getsizeof() calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.
String objects do not need to be tracked (they cannot create circular references), but string objects do need more memory than just the bytes per character. In Python 2, the string __sizeof__ method returns (in C code):
Py_ssize_t res;
res = PyStringObject_SIZE + PyString_GET_SIZE(v) * Py_TYPE(v)->tp_itemsize;
return PyInt_FromSsize_t(res);
where PyStringObject_SIZE is the C struct header size for the type, PyString_GET_SIZE basically is the same as len() and Py_TYPE(v)->tp_itemsize is the per-character size. In Python 2.7, for byte strings, the size per character is 1, but it's PyStringObject_SIZE that is confusing you; on my Mac that size is 37 bytes:
>>> sys.getsizeof('')
37
For unicode strings the per-character size goes up to 2 or 4 (depending on compilation options). On Python 3.3 and newer, Unicode strings take up between 1 and 4 bytes per character, depending on the contents of the string.
For containers such as dictionaries or lists that reference other objects, the memory size given covers only the memory used by the container and the pointer values used to reference those other objects. There is no straightforward method of including the memory size of the ‘contained’ objects because those same objects could have many more references elsewhere and are not necessarily owned by a single container.
The documentation states it like this:
Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.
If you need to calculate the memory footprint of a container and anything referenced by that container you’ll have to use some method of traversing to those contained objects and get their size; the documentation points to a recursive recipe.
The key difference is that len() gives the number of elements in a container, whereas sys.getsizeof() gives the memory size that the object occupies.
For more information, read the Python docs at https://docs.python.org/3/library/sys.html#module-sys
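A short sketch of the distinction between the two calls, assuming CPython (the byte counts vary by version and platform, so none are shown):

```python
import sys

text = "hello"
numbers = [1, 2, 3]

# len() counts items; sys.getsizeof() reports the bytes used by the object itself.
print(len(text), sys.getsizeof(text))
print(len(numbers), sys.getsizeof(numbers))
```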
From the documentation (my bold) (a):
Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.
So the size of v does not include the sizes of the elements it refers to.
If you change kite into kites, you'll also see that its size increases but not the size of v (I've replaced your big number with 100...00 in the output to ease formatting):
1 size is: 12
2 size is: 12
kite size is: 25
100...00 size is: 102
Total size is: 48
1 size is: 12
2 size is: 12
kites size is: 26
100...00 size is: 102
Total size is: 48
Think of it like this:
      /   +-----+
      | v | ref | -> 1
Size  |   | ref | -> 2
of v  |   | ref | -> 'kite'
      |   | ref | -> 100**100
      \   +-----+
                    \___________________________/
                       Size of things referred
                               to by v
(a) That page also has a link to a recipe for doing recursive size calculations if you need that information. The link is duplicated here for citation, and the code is duplicated below to make this answer more self-contained.
Plugging your structure into that code gives:
48 <type 'list'> [1, 2, 'kites', 100...00L]
12 <type 'int'> 1
12 <type 'int'> 2
26 <type 'str'> 'kites'
102 <type 'long'> 100...00L
200
The code, with your structure, is shown below.
from __future__ import print_function
from sys import getsizeof, stderr
from itertools import chain
from collections import deque
try:
    from reprlib import repr
except ImportError:
    pass

def total_size(o, handlers={}, verbose=False):
    """ Returns the approximate memory footprint an object and all of its contents.
    Automatically finds the contents of the following builtin containers and
    their subclasses:  tuple, list, deque, dict, set and frozenset.
    To search other containers, add handlers to iterate over their contents:
        handlers = {SomeContainerClass: iter,
                    OtherContainerClass: OtherContainerClass.get_elements}
    """
    dict_handler = lambda d: chain.from_iterable(d.items())
    all_handlers = {tuple: iter,
                    list: iter,
                    deque: iter,
                    dict: dict_handler,
                    set: iter,
                    frozenset: iter,
                   }
    all_handlers.update(handlers)     # user handlers take precedence
    seen = set()                      # track which object id's have already been seen
    default_size = getsizeof(0)       # estimate sizeof object without __sizeof__

    def sizeof(o):
        if id(o) in seen:       # do not double count the same object
            return 0
        seen.add(id(o))
        s = getsizeof(o, default_size)

        if verbose:
            print(s, type(o), repr(o), file=stderr)

        for typ, handler in all_handlers.items():
            if isinstance(o, typ):
                s += sum(map(sizeof, handler(o)))
                break
        return s

    return sizeof(o)


##### Example call #####

if __name__ == '__main__':
    v = [1,2,'kites',100**100]
    print(total_size(v, verbose=True))
This happens because your "Total size" is actually the size of the list structure without the contents. So you can store an object of any size there and it won't change your "Total size." You need a "recursive" getsizeof(), and for that, see here: Python deep getsizeof list with contents? or here: Deep version of sys.getsizeof
» pip install objsize