df.memory_usage() will return how many bytes each column occupies:

>>> df.memory_usage()

Row_ID            20906600
Household_ID      20906600
Vehicle           20906600
Calendar_Year     20906600
Model_Year        20906600
...

The values are in units of bytes.

To include indexes, pass index=True.

So to get overall memory consumption:

>>> df.memory_usage(index=True).sum()
731731000

As before, the value is in units of bytes.

Also, passing deep=True enables a more accurate memory usage report that accounts for the full usage of the contained objects.

This matters because with deep=False (the default), memory usage excludes memory consumed by elements that are not components of the underlying array: for an object column, only the 8-byte pointers are counted, not the Python objects they reference.
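A minimal sketch of the difference, using a hypothetical DataFrame with a string (object) column:

```python
import pandas as pd

# Hypothetical example data: an object column holds pointers to Python
# strings, so the shallow and deep measurements diverge.
df = pd.DataFrame({"name": ["alice", "bob", "carol"] * 1000})

shallow = df.memory_usage(index=True).sum()          # 8-byte pointers only
deep = df.memory_usage(index=True, deep=True).sum()  # plus the str objects

print(shallow, deep)  # deep is several times larger here
```

For numeric columns the two numbers agree, since the values live inside the NumPy array itself.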

Answer from Oleksiy Syvokon on Stack Overflow (top answer, 1 of 8, score 207)


2 of 8 (score 157)

Here's a comparison of the different methods - sys.getsizeof(df) is simplest.

For this example, df is a DataFrame with 814 rows and 11 columns (2 int, 9 object), read from a 427 kB shapefile.

sys.getsizeof(df)

>>> import sys
>>> sys.getsizeof(df)
462456

(gives results in bytes)
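As a side note (based on current pandas internals, so worth verifying for your version): sys.getsizeof delegates to DataFrame.__sizeof__, which pandas implements on top of the deep memory_usage; getsizeof then adds a small garbage-collector overhead, which is why the two figures end up so close. A sketch with made-up data:

```python
import sys
import pandas as pd

# Made-up DataFrame for illustration.
df = pd.DataFrame({"a": range(1000), "b": ["some text"] * 1000})

deep_total = int(df.memory_usage(index=True, deep=True).sum())

# getsizeof should report roughly deep_total plus a few bytes of GC overhead.
print(sys.getsizeof(df), deep_total)
```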

df.memory_usage()

>>> df.memory_usage()
...
(lists each column at 8 bytes/row)

>>> df.memory_usage().sum()
71712
(roughly rows * cols * 8 bytes)

>>> df.memory_usage(deep=True)
(lists each column's full memory usage)

>>> df.memory_usage(deep=True).sum()
462432

(gives results in bytes)

df.info()

Prints DataFrame info to stdout. Technically the units are kibibytes (KiB), not kilobytes: as the docstring says, "Memory usage is shown in human-readable units (base-2 representation)." So to get bytes, multiply by 1024, e.g. 451.6 KiB × 1024 ≈ 462,438 bytes.

>>> df.info()
...
memory usage: 70.0+ KB

>>> df.info(memory_usage='deep')
...
memory usage: 451.6 KB
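df.info() rounds its output, so if you need an exact figure in mebibytes (or want to sidestep the KiB/KB confusion entirely), it can be easier to do the conversion yourself. A small sketch with made-up data:

```python
import pandas as pd

# Made-up DataFrame for illustration.
df = pd.DataFrame({"a": range(100_000), "b": ["label"] * 100_000})

total_bytes = df.memory_usage(index=True, deep=True).sum()
total_mib = total_bytes / 1024**2  # base-2 units, matching df.info()

print(f"{total_mib:.2f} MiB")
```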