Reducing memory usage in Python is difficult, because Python does not actually release memory back to the operating system. If you delete objects, then the memory is available to new Python objects, but not free()'d back to the system (see this question).

If you stick to numeric numpy arrays, those are freed, but boxed objects are not.

>>> import os, psutil, numpy as np # psutil may need to be installed
>>> def usage():
...     process = psutil.Process(os.getpid())
...     return process.memory_info()[0] / float(2 ** 20)
... 
>>> usage() # initial memory usage
27.5 

>>> arr = np.arange(10 ** 8) # create a large array without boxing
>>> usage()
790.46875
>>> del arr
>>> usage()
27.52734375 # numpy just free()'d the array

>>> arr = np.arange(10 ** 8, dtype='O') # create lots of objects
>>> usage()
3135.109375
>>> del arr
>>> usage()
2372.16796875  # numpy frees the array, but python keeps the heap big

Reducing the Number of Dataframes

Python keeps our memory at its high watermark, but we can reduce the total number of dataframes we create. When modifying your dataframe, prefer inplace=True, so you don't create copies.
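
As a minimal sketch (the dataframe here is made up), chaining operations with inplace=True mutates the existing frame instead of binding a second full copy to a name:

```python
import pandas as pd

df = pd.DataFrame({"foo": [1.0, None, 3.0, 4.0]})

# df.dropna() would return a second DataFrame while df stays alive;
# inplace=True mutates df itself, so no extra copy is left behind.
df.dropna(inplace=True)
df.rename(columns={"foo": "bar"}, inplace=True)
```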

Another common gotcha is holding on to copies of previously created dataframes in ipython:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'foo': [1,2,3,4]})

In [3]: df + 1
Out[3]: 
   foo
0    2
1    3
2    4
3    5

In [4]: df + 2
Out[4]: 
   foo
0    3
1    4
2    5
3    6

In [5]: Out # Still has all our temporary DataFrame objects!
Out[5]: 
{3:    foo
 0    2
 1    3
 2    4
 3    5, 4:    foo
 0    3
 1    4
 2    5
 3    6}

You can fix this by typing %reset Out to clear your history. Alternatively, you can adjust how much history ipython keeps with ipython --cache-size=5 (default is 1000).

Reducing Dataframe Size

Wherever possible, avoid using object dtypes.

>>> df.dtypes
foo    float64 # 8 bytes per value
bar      int64 # 8 bytes per value
baz     object # at least 48 bytes per value, often more

Values with an object dtype are boxed, which means the numpy array just contains a pointer and you have a full Python object on the heap for every value in your dataframe. This includes strings.

Whilst numpy supports fixed-size strings in arrays, pandas does not (it's caused user confusion). This can make a significant difference:

>>> import numpy as np
>>> arr = np.array(['foo', 'bar', 'baz'])
>>> arr.dtype
dtype('S3')
>>> arr.nbytes
9

>>> import sys; import pandas as pd
>>> s = pd.Series(['foo', 'bar', 'baz'])
>>> s.dtype
dtype('O')
>>> sum(sys.getsizeof(x) for x in s)
120

You may want to avoid using string columns, or find a way of representing string data as numbers.
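
One way to do that (not part of the original answer) is pandas' categorical dtype, which stores each distinct string once plus a small integer code per row, so columns with many repeated values shrink considerably:

```python
import pandas as pd

# A column with many repeated strings, stored as objects vs. categories.
s = pd.Series(["low", "medium", "high"] * 10_000)
cat = s.astype("category")  # 3 label strings + one small int code per row

object_bytes = s.memory_usage(deep=True)
category_bytes = cat.memory_usage(deep=True)
```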

If you have a dataframe that contains many repeated values (NaN is very common), then you can use a sparse data structure to reduce memory usage:

>>> df1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 39681584 entries, 0 to 39681583
Data columns (total 1 columns):
foo    float64
dtypes: float64(1)
memory usage: 605.5 MB

>>> df1.shape
(39681584, 1)

>>> df1.foo.isnull().sum() * 100. / len(df1)
20.628483479893344 # so 20% of values are NaN

>>> df1.to_sparse().info()
<class 'pandas.sparse.frame.SparseDataFrame'>
Int64Index: 39681584 entries, 0 to 39681583
Data columns (total 1 columns):
foo    float64
dtypes: float64(1)
memory usage: 543.0 MB
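
Note that to_sparse was removed in newer pandas versions; the equivalent today (sketched here on a synthetic mostly-NaN column, under the assumption you only need the non-missing values) is SparseDtype:

```python
import numpy as np
import pandas as pd

# A float column where 75% of the values are NaN.
dense = pd.Series([1.0, np.nan, np.nan, np.nan] * 25_000)

# Sparse storage keeps only the non-fill values plus their positions.
sparse = dense.astype(pd.SparseDtype("float64", fill_value=np.nan))

dense_bytes = dense.memory_usage()
sparse_bytes = sparse.memory_usage()
```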

Viewing Memory Usage

You can view the memory usage (docs):

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 39681584 entries, 0 to 39681583
Data columns (total 14 columns):
...
dtypes: datetime64[ns](1), float64(8), int64(1), object(4)
memory usage: 4.4+ GB

As of pandas 0.17.1, you can also do df.info(memory_usage='deep') to see memory usage including objects.
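
For a per-column breakdown, DataFrame.memory_usage shows the same distinction (illustrated on a toy frame): without deep=True, object columns are counted as pointers only.

```python
import pandas as pd

df = pd.DataFrame({"num": [1.0, 2.0, 3.0], "txt": ["foo", "bar", "baz"]})

shallow = df.memory_usage()        # 'txt' counted as 8-byte pointers only
deep = df.memory_usage(deep=True)  # adds the heap size of each string object
```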

Answer from Wilfred Hughes on Stack Overflow (top answer, 201 votes)

Second answer (84 votes)

As noted in the comments, there are some things to try: gc.collect (@EdChum) may clear stuff, for example. At least from my experience, these things sometimes work and often don't.

There is one thing that always works, however, because it is done at the OS, not language, level.

Suppose you have a function that creates an intermediate huge DataFrame, and returns a smaller result (which might also be a DataFrame):

def huge_intermediate_calc(something):
    ...
    huge_df = pd.DataFrame(...)
    ...
    return some_aggregate

Then if you do something like

import multiprocessing

result = multiprocessing.Pool(1).map(huge_intermediate_calc, [something_])[0]

The function is executed in a different process. When that process completes, the OS reclaims all the resources it used. There's really nothing Python, pandas, or the garbage collector can do to stop that.
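
A self-contained version of that pattern (the workload here is invented; on platforms that spawn rather than fork worker processes, wrap the Pool call in an if __name__ == "__main__" guard):

```python
import multiprocessing

import numpy as np
import pandas as pd


def huge_intermediate_calc(n):
    # Build a large throwaway frame, return only a small aggregate.
    huge_df = pd.DataFrame({"x": np.arange(n, dtype="float64")})
    return huge_df["x"].sum()


# The worker process exits after map(), and the OS reclaims every
# byte its intermediate DataFrame used -- no gc.collect() needed.
with multiprocessing.Pool(1) as pool:
    result = pool.map(huge_intermediate_calc, [1_000_000])[0]
```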

Top answer (13 votes) to "Delete and release memory of a single pandas dataframe" on Stack Overflow

From the original link that you included: you have to put the variable in the list, delete the variable, and then delete the list. If you just add the dataframe to the list, deleting the list won't delete the original dataframe.

import pandas as pd
import psutil
import gc
psutil.virtual_memory().available * 100 / psutil.virtual_memory().total
>> 68.44267845153809

df = pd.read_csv('pythonSRC/bigFile.txt',sep='|')
len(df)
>> 20082056

psutil.virtual_memory().available * 100 / psutil.virtual_memory().total

>> 56.380510330200195

lst = [df]
del lst

psutil.virtual_memory().available * 100 / psutil.virtual_memory().total
>> 56.22601509094238

lst = [df]
del df
del lst

psutil.virtual_memory().available * 100 / psutil.virtual_memory().total
>> 76.77617073059082

gc.collect()

>> 0

I also tried just deleting the dataframe and calling gc.collect(), with the same result:

del df
gc.collect()
psutil.virtual_memory().available * 100 / psutil.virtual_memory().total
>> 76.59363746643066

However, adding the dataframe to the list and deleting the list and the variable is a bit faster than calling gc.collect(). I used time.time() to measure the difference, and gc.collect() was almost a full second slower!

EDIT:

According to the correct comment below, del df and del [df] indeed generate the same code. The problem with the original post, and with my original answer, is that as soon as you give the list a name, as in lst = [df], you are no longer deleting the original dataframe — del lst only removes the list's reference.

lst=[df] 
del lst

is not the same as:

del [df]
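
The point can be checked directly with sys.getrefcount (a sketch; the reported counts are one higher than the "real" count because the getrefcount call itself borrows a reference):

```python
import sys

import pandas as pd

df = pd.DataFrame({"a": range(5)})
before = sys.getrefcount(df)

lst = [df]                    # the list now holds an extra reference
during = sys.getrefcount(df)

del lst                       # only the list's reference goes away;
after = sys.getrefcount(df)   # the name df still keeps the frame alive
```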