df.memory_usage() will return how many bytes each column occupies:
>>> df.memory_usage()
Row_ID 20906600
Household_ID 20906600
Vehicle 20906600
Calendar_Year 20906600
Model_Year 20906600
...
The values are in units of bytes.
To include the index, pass index=True (this is the default in recent pandas versions).
So to get overall memory consumption:
>>> df.memory_usage(index=True).sum()
731731000
As before, the value is in units of bytes.
Also, passing deep=True gives a more accurate report that accounts for the full memory used by the contained objects.
With deep=False (the default), memory usage does not include memory consumed by elements that are not components of the array, so an object column is counted only by the size of its references, not by the Python objects (e.g. strings) they point to.
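To make the deep=True difference concrete, here is a minimal sketch on a made-up frame (the column names and values are hypothetical, not the dataset above). The int64 column should report the same size either way, while the object column should only show its real footprint with deep=True:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one int64 column and one object (string) column.
df = pd.DataFrame({
    "n": np.arange(1000, dtype="int64"),
    "s": pd.Series([f"some text {i}" for i in range(1000)], dtype="object"),
})

# Shallow: object columns are counted as references only.
shallow = df.memory_usage(index=False, deep=False)
# Deep: the Python string objects behind the references are measured too.
deep = df.memory_usage(index=False, deep=True)

print(shallow)
print(deep)
```

The deep report is slower on large object columns because every element has to be inspected, which is presumably why it is not the default.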
Here's a comparison of the different methods - sys.getsizeof(df) is simplest.
For this example, df is a dataframe with 814 rows and 11 columns (2 ints, 9 objects), read from a 427 KB shapefile.
sys.getsizeof(df)
>>> import sys
>>> sys.getsizeof(df)  # gives results in bytes
462456
df.memory_usage()
>>> df.memory_usage()
... (lists each column at 8 bytes/row)
>>> df.memory_usage().sum()
71712  # roughly rows * cols * 8 bytes
>>> df.memory_usage(deep=True)
(lists each column's full memory usage)
>>> df.memory_usage(deep=True).sum()  # gives results in bytes
462432
df.info()
Prints dataframe info to stdout. Technically the units are kibibytes (KiB), not kilobytes: as the docstring says, "Memory usage is shown in human-readable units (base-2 representation)." So to get bytes, multiply by 1024, e.g. 451.6 KiB ≈ 462,438 bytes.
>>> df.info()
...
memory usage: 70.0+ KB
>>> df.info(memory_usage='deep')
...
memory usage: 451.6 KB
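If you want to try the three methods side by side, here is a rough sketch. The frame is a hypothetical stand-in (I don't have the original shapefile), so the numbers will differ from those above. It assumes sys.getsizeof(df) reports the deep total plus a small fixed per-object overhead, which matches the 24-byte gap between 462456 and 462432 in the example:

```python
import io
import sys

import pandas as pd

# Hypothetical stand-in frame: 2 int columns and 1 object column, 814 rows.
df = pd.DataFrame({
    "a": range(814),
    "b": range(814),
    "name": pd.Series([f"feature {i}" for i in range(814)], dtype="object"),
})

deep_total = int(df.memory_usage(deep=True).sum())  # bytes, index included
getsizeof_total = sys.getsizeof(df)                 # bytes, plus a small overhead

# df.info() prints to stdout by default; pass buf= to capture the text instead.
buf = io.StringIO()
df.info(memory_usage="deep", buf=buf)
info_line = buf.getvalue().strip().splitlines()[-1]

print(deep_total, getsizeof_total)
print(info_line)                       # the "memory usage: ..." summary line
print(f"{deep_total / 1024:.1f} KiB")  # base-2 conversion: bytes / 1024
```

For anything programmatic, memory_usage(deep=True).sum() is probably the one to use, since it returns a plain number of bytes rather than formatted text.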