Reducing memory usage in Python is difficult, because Python does not actually release memory back to the operating system. If you delete objects, then the memory is available to new Python objects, but not free()'d back to the system (see this question).
If you stick to numeric numpy arrays, those are freed, but boxed objects are not.
>>> import os, psutil, numpy as np # psutil may need to be installed
>>> def usage():
...     process = psutil.Process(os.getpid())
...     return process.memory_info()[0] / float(2 ** 20)
...
>>> usage() # initial memory usage
27.5
>>> arr = np.arange(10 ** 8) # create a large array without boxing
>>> usage()
790.46875
>>> del arr
>>> usage()
27.52734375 # numpy just free()'d the array
>>> arr = np.arange(10 ** 8, dtype='O') # create lots of objects
>>> usage()
3135.109375
>>> del arr
>>> usage()
2372.16796875 # numpy frees the array, but python keeps the heap big
Reducing the Number of Dataframes
Python keeps our memory at its high watermark, but we can reduce the total number of dataframes we create. When modifying your dataframe, prefer inplace=True, so you don't create copies.
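For example (a minimal sketch; the frame and the dropped labels are illustrative):
>>> import pandas as pd
>>> df = pd.DataFrame({'foo': [1, 2, 3, 4]})
>>> df.drop(0, inplace=True)  # modifies df in place, no second copy
>>> df = df.drop(1)  # by contrast, this briefly holds two frames at once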
Another common gotcha is holding on to copies of previously created dataframes in ipython:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'foo': [1,2,3,4]})
In [3]: df + 1
Out[3]:
foo
0 2
1 3
2 4
3 5
In [4]: df + 2
Out[4]:
foo
0 3
1 4
2 5
3 6
In [5]: Out # Still has all our temporary DataFrame objects!
Out[5]:
{3: foo
0 2
1 3
2 4
3 5, 4: foo
0 3
1 4
2 5
3 6}
You can fix this by typing %reset Out to clear your history. Alternatively, you can adjust how much history ipython keeps with ipython --cache-size=5 (default is 1000).
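A related IPython trick: ending a line with a semicolon suppresses both the display of the result and (to the best of my knowledge) its storage in Out:
In [6]: df + 3;  # result is neither printed nor cached in Out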
Reducing Dataframe Size
Wherever possible, avoid using object dtypes.
>>> df.dtypes
foo float64 # 8 bytes per value
bar int64 # 8 bytes per value
baz object # at least 48 bytes per value, often more
Values with an object dtype are boxed, which means the numpy array just contains a pointer and you have a full Python object on the heap for every value in your dataframe. This includes strings.
Whilst numpy supports fixed-size strings in arrays, pandas does not (it's caused user confusion). This can make a significant difference:
>>> import numpy as np
>>> arr = np.array(['foo', 'bar', 'baz'])
>>> arr.dtype
dtype('S3')
>>> arr.nbytes
9
>>> import sys; import pandas as pd
>>> s = pd.Series(['foo', 'bar', 'baz'])
>>> s.dtype
dtype('O')
>>> sum(sys.getsizeof(x) for x in s)
120
You may want to avoid using string columns, or find a way of representing string data as numbers.
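One way to do that (a sketch using pd.factorize, with illustrative data) is to encode each distinct string as an integer code:
>>> s = pd.Series(['foo', 'bar', 'foo', 'baz'])
>>> codes, uniques = pd.factorize(s)
>>> codes  # compact integer codes instead of one boxed string per row
array([0, 1, 0, 2])
>>> uniques
Index(['foo', 'bar', 'baz'], dtype='object')
In newer pandas versions, the category dtype (s.astype('category')) achieves a similar effect automatically.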
If you have a dataframe that contains many repeated values (NaN is very common), then you can use a sparse data structure to reduce memory usage:
>>> df1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 39681584 entries, 0 to 39681583
Data columns (total 1 columns):
foo float64
dtypes: float64(1)
memory usage: 605.5 MB
>>> df1.shape
(39681584, 1)
>>> df1.foo.isnull().sum() * 100. / len(df1)
20.628483479893344 # so 20% of values are NaN
>>> df1.to_sparse().info()
<class 'pandas.sparse.frame.SparseDataFrame'>
Int64Index: 39681584 entries, 0 to 39681583
Data columns (total 1 columns):
foo float64
dtypes: float64(1)
memory usage: 543.0 MB
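Note that to_sparse() was removed in later pandas releases (1.0 and up); if you are on a modern version, the equivalent (to the best of my knowledge) is the SparseDtype extension type:
>>> import numpy as np
>>> sparse_df = df1.astype(pd.SparseDtype('float64', np.nan))
>>> sparse_df.info()  # memory usage shrinks roughly in proportion to the NaN share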
Viewing Memory Usage
You can view the memory usage (docs):
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 39681584 entries, 0 to 39681583
Data columns (total 14 columns):
...
dtypes: datetime64[ns](1), float64(8), int64(1), object(4)
memory usage: 4.4+ GB
As of pandas 0.17.1, you can also do df.info(memory_usage='deep') to see memory usage including objects.
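Relatedly, df.memory_usage(deep=True) gives the same deep accounting as a per-column breakdown (exact byte counts vary by platform and pandas version):
>>> df.memory_usage(deep=True)  # bytes per column, counting boxed objects too
>>> df.memory_usage(deep=True).sum() / 2 ** 20  # total, in MiB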
As noted in the comments, there are some things to try: gc.collect() (@EdChum) may clear stuff, for example. In my experience at least, these things sometimes work and often don't.
There is one thing that always works, however, because it is done at the OS, not language, level.
Suppose you have a function that creates an intermediate huge DataFrame, and returns a smaller result (which might also be a DataFrame):
def huge_intermediate_calc(something):
    ...
    huge_df = pd.DataFrame(...)
    ...
    return some_aggregate
Then, if you do something like this:
import multiprocessing
result = multiprocessing.Pool(1).map(huge_intermediate_calc, [something_])[0]
The function is executed in a separate process. When that process completes, the OS reclaims all the resources it used. There's really nothing Python, pandas, or the garbage collector can do to stop that.
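Here is a self-contained sketch of that pattern (the frame contents and sizes are made up for illustration):
import multiprocessing

import numpy as np
import pandas as pd

def huge_intermediate_calc(n_rows):
    # the big intermediate frame exists only inside the worker process
    huge_df = pd.DataFrame({'x': np.random.rand(n_rows)})
    return huge_df['x'].sum()  # only the small aggregate crosses back

if __name__ == '__main__':  # guard required for multiprocessing on some platforms
    with multiprocessing.Pool(1) as pool:
        result = pool.map(huge_intermediate_calc, [10 ** 7])[0]
    # the worker has exited here, so the OS has reclaimed its memory
    print(result)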
Delete and release memory of a single pandas dataframe
I have several big csv files. I want to extract the column "item id" from each of them, then combine them all and keep only the unique values.
My code is as follows:
import pandas as pd

list_df = []
for csv_file in folder:  # folder: iterable of csv paths
    df = pd.read_csv(csv_file)
    list_df.append(df['item id'])
df_all_itemNo = pd.concat(list_df, ignore_index=True)
df_all_itemNo = df_all_itemNo.drop_duplicates()
It works when there are only a few csv files. The problem is that when several big csvs are read, all of my computer's memory is used up.
From the memory usage graph, I can see that memory keeps increasing. It is never released each time df = pd.read_csv(csv_file) is executed; the old df is stuck in memory.
Are there any solutions?
The del statement does not delete an instance, it merely deletes a name.
When you do del i, you are deleting just the name i - but the instance is still bound to some other name, so it won't be garbage-collected.
If you want to release memory, your dataframes have to be garbage-collected, i.e. you must delete all references to them.
If you created your dataframes dynamically in a list, then removing that list will trigger garbage collection.
>>> lst = [pd.DataFrame(), pd.DataFrame(), pd.DataFrame()]
>>> del lst # memory is released
If you created some variables, you have to delete them all.
>>> a, b, c = pd.DataFrame(), pd.DataFrame(), pd.DataFrame()
>>> lst = [a, b, c]
>>> del a, b, c # dfs still in list
>>> del lst # memory released now
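Applied to the question above, you can also avoid holding the full frames in the first place: read only the needed column, and let each loop iteration rebind the name so the previous frame becomes unreferenced (a sketch; usecols is a standard read_csv parameter, and folder is the asker's iterable of paths):
import pandas as pd

list_df = []
for csv_file in folder:  # folder: iterable of csv paths, as in the question
    # only the 'item id' column is ever materialised
    s = pd.read_csv(csv_file, usecols=['item id'])['item id']
    list_df.append(s)
    # the previous iteration's full frame is now unreferenced and collectable
df_all_itemNo = pd.concat(list_df, ignore_index=True).drop_duplicates()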
In Python, automatic garbage collection deallocates unreferenced objects (a pandas DataFrame is just another object as far as Python is concerned). There are different garbage collection strategies that can be tweaked (this requires significant learning).
You can manually trigger the garbage collection using
import gc
gc.collect()
But frequent calls to garbage collection are discouraged, as it is a costly operation that may affect performance.