The del statement does not delete an instance; it merely deletes a name.
When you do del i, you are deleting just the name i, but the instance may still be bound to some other name, so it won't be garbage-collected.
If you want to release memory, your DataFrames have to be garbage-collected, i.e. you must delete all references to them.
If you created your dataframes dynamically and appended them to a list, then deleting that list will trigger garbage collection.
>>> lst = [pd.DataFrame(), pd.DataFrame(), pd.DataFrame()]
>>> del lst # memory is released
If you created some variables, you have to delete them all.
>>> a, b, c = pd.DataFrame(), pd.DataFrame(), pd.DataFrame()
>>> lst = [a, b, c]
>>> del a, b, c # dfs still in list
>>> del lst # memory release now
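The names-versus-objects point above can be observed directly. Here is a minimal sketch using a plain stand-in class instead of a real DataFrame, with weakref used only to check when the object actually dies (the immediate deallocation on the last del is CPython's reference-counting behaviour):

```python
import weakref

class Big:
    """Stand-in for a large object such as a DataFrame."""

a = Big()
lst = [a]
ref = weakref.ref(a)   # weak reference: does not keep the object alive
del a                  # deletes only the name; lst[0] still holds the object
assert ref() is not None
del lst                # last strong reference gone
assert ref() is None   # CPython frees the object once its refcount hits 0
```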
(Answer from pacholik on Stack Overflow.)
In Python, automatic garbage collection deallocates objects that are no longer referenced (a pandas DataFrame is just another object in this respect). There are different garbage-collection strategies that can be tweaked, though doing so requires significant learning.
You can manually trigger the garbage collection using
import gc
gc.collect()
But frequent calls to garbage collection are discouraged, as collection is a costly operation and may affect performance.
I have a dataframe containing around 2M rows and 6 columns. Based on 3 of those columns I want to delete certain rows. ATM my code looks like this:
df = df.drop(df[(df.X == x) & (df.Y == y) & (df.Z == Z)].index)
Unsurprisingly this isn't really fast, however, I couldn't find a way to do it faster.
PS: It's not that it takes ages, just 1 or 2 seconds, but I have to do it 30-40 times each run, so it adds up.
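One common speed-up here, shown as a sketch on a toy three-row frame standing in for the 2M-row original, is to keep the non-matching rows with a single boolean mask instead of first materialising the matching index and then dropping it:

```python
import pandas as pd

# toy frame standing in for the 2M-row original
df = pd.DataFrame({
    "X": [1, 1, 2],
    "Y": [1, 2, 2],
    "Z": [1, 1, 2],
})
x, y, z = 1, 1, 1

# drop-by-index version from the question
dropped = df.drop(df[(df.X == x) & (df.Y == y) & (df.Z == z)].index)

# equivalent result via boolean indexing: keep everything that does NOT match
kept = df[~((df.X == x) & (df.Y == y) & (df.Z == z))]

assert dropped.equals(kept)
```

This avoids building the intermediate index and the index-alignment work inside drop, which tends to matter when the operation is repeated 30-40 times per run.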
I have several big CSV files. I want to extract the column "item id" from each of them, combine them all, and keep only the unique values.
My code is as follows:
list_df = []
for csv_file in folder:
    df = pd.read_csv(csv_file)
    list_df.append(df['item id'])
df_all_itemNo = pd.concat(list_df, ignore_index=True)
df_all_itemNo = df_all_itemNo.drop_duplicates()
It works when there are only a few CSV files. The problem is that when several big CSVs are read, all of my computer's memory is used up.
From the memory usage graph, I can see that memory keeps increasing. It is never released each time
df = pd.read_csv(csv_file) is executed; the old df stays stuck in memory.
Are there any solutions?
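One possible approach, shown as a sketch (the function name, the column name "item id", and the chunk size are illustrative assumptions): read only the one needed column via usecols, stream each file in chunks via chunksize, and accumulate values in a set, so no full-file DataFrame is ever kept alive between iterations:

```python
import pandas as pd

def unique_item_ids(csv_paths, column="item id", chunksize=100_000):
    """Collect the unique values of one column across many CSVs
    without holding any full file's DataFrame in memory."""
    seen = set()
    for path in csv_paths:
        # usecols limits parsing to the one needed column;
        # chunksize streams the file in pieces instead of loading it whole
        for chunk in pd.read_csv(path, usecols=[column], chunksize=chunksize):
            seen.update(chunk[column])
        # each chunk is dropped as the loop advances, so per-file memory
        # is reclaimed before the next file is opened
    return pd.Series(sorted(seen), name=column)
```

Because only a set of scalar values survives each file, peak memory is bounded by one chunk plus the set of unique ids, rather than by the sum of all files.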