Pandas operates more like data.frame in this regard. You can check this using the memory_profiler package; here's an example of its use in a Jupyter notebook:
First define a program that will test this:
%%file df_memprofile.py
import numpy as np
import pandas as pd
def foo():
    x = np.random.rand(1000000, 5)
    y = pd.DataFrame(x, columns=list('abcde'))
    y.rename(columns={'e': 'f'}, inplace=True)
    return y
Then load the memory profiler and run + profile the function
%load_ext memory_profiler
from df_memprofile import foo
%mprun -f foo foo()
I get the following output:
Filename: /Users/jakevdp/df_memprofile.py
Line # Mem usage Increment Line Contents
================================================
4 66.1 MiB 66.1 MiB def foo():
5 104.2 MiB 38.2 MiB x = np.random.rand(1000000, 5)
6 104.4 MiB 0.2 MiB y = pd.DataFrame(x, columns=list('abcde'))
7 142.6 MiB 38.2 MiB y.rename(columns = {'e': 'f'}, inplace=True)
8 142.6 MiB 0.0 MiB return y
You can see a couple of things:

- When y is created, it is just a light wrapper around the original array: no data is copied.
- When the column in y is renamed, the entire data array is duplicated in memory (it's the same 38 MiB increment as when x was created in the first place).
So, unless I'm missing something, it appears that Pandas operates more like R's dataframes than R's data tables.
Edit: Note that rename() has an argument copy that controls this behavior, and defaults to True. For example, using this:
y.rename(columns = {'e': 'f'}, inplace=True, copy=False)
... results in an inplace operation without copying data.
Alternatively, you can modify the columns attribute directly:
y.columns = ['a', 'b', 'c', 'd', 'f']
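As a quick sanity check, here is a minimal sketch comparing the two rename approaches; whether rename() actually copies the underlying block depends on your pandas version and copy-on-write settings, but in either case the values stay numerically identical to the source array:

```python
import numpy as np
import pandas as pd

x = np.random.rand(1000, 5)
y = pd.DataFrame(x, columns=list('abcde'))

# Approach 1: reassign the columns attribute.
# This replaces only the (small) column index object, not the data.
y.columns = ['a', 'b', 'c', 'd', 'f']

# Approach 2: rename() returns a new frame (which may copy the data,
# depending on pandas version and copy-on-write settings).
z = y.rename(columns={'f': 'g'})

# Either way, the values match the original array.
assert np.array_equal(y.to_numpy(), x)
assert list(z.columns) == ['a', 'b', 'c', 'd', 'g']
```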
Answer from jakevdp on Stack Overflow: "Are Pandas' dataframes (Python) closer to R's dataframes or datatables?"
What actually defines a DataFrame?
I fear this is more a philosophical question than a technical one, but I am a bit confused. I've been thinking a lot about what makes something a DataFrame, not just in terms of syntax or library, but from a conceptual standpoint.
My current definition is as such:
A DataFrame is a language-native, programmable interface for querying and transforming tabular data. It's designed to be embedded directly in general-purpose programming workflows.
I like this because it focuses on what a DataFrame is for, rather than what specific tools or libraries implement it.
I think, however, that this definition is too general and can lead to anything tabular with an API being described as a DataFrame.
Properties I previously thought defined DataFrames, but which are not universal across them:

- Mutability
  - pandas: mutable; you can add/remove/overwrite columns directly.
  - Spark DataFrames: immutable; transformations return new logical plans.
  - Polars (lazy mode): immutable; transformations build a new plan.
- Execution model
  - pandas: eager, executes immediately.
  - Spark / Polars (lazy): lazy, builds DAGs and executes on trigger.
- In-memory
  - pandas / Polars: usually in-memory.
  - Spark: can spill to disk or operate on distributed data.
  - Ibis: abstract; the backend might not be memory-bound at all.
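To make the mutability point concrete, here is a minimal pandas-only sketch of the eager, in-place style; the Spark/Polars lazy equivalents would instead return a new plan object from each transformation rather than touching the data immediately:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# pandas is eager and mutable: each statement executes immediately
# and modifies df in place.
df['b'] = df['a'] * 2        # add a derived column
df.loc[0, 'a'] = 10          # overwrite a single cell
df = df.drop(columns=['a'])  # drop returns a new frame; we rebind the name

print(df['b'].tolist())  # [2, 4, 6]
```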
Curious how others would describe and define DataFrames.