Pandas operates more like R's data.frame in this regard. You can check this using the memory_profiler package; here's an example of its use in a Jupyter notebook:

First define a program that will test this:

%%file df_memprofile.py
import numpy as np
import pandas as pd

def foo():
    x = np.random.rand(1000000, 5)
    y = pd.DataFrame(x, columns=list('abcde'))
    y.rename(columns = {'e': 'f'}, inplace=True)
    return y

Then load the memory profiler, and run and profile the function:

%load_ext memory_profiler
from df_memprofile import foo
%mprun -f foo foo()

I get the following output:

Filename: /Users/jakevdp/df_memprofile.py

Line #    Mem usage    Increment   Line Contents
================================================
     4     66.1 MiB     66.1 MiB   def foo():
     5    104.2 MiB     38.2 MiB       x = np.random.rand(1000000, 5)
     6    104.4 MiB      0.2 MiB       y = pd.DataFrame(x, columns=list('abcde'))
     7    142.6 MiB     38.2 MiB       y.rename(columns = {'e': 'f'}, inplace=True)
     8    142.6 MiB      0.0 MiB       return y

You can see a couple things:

  1. When y is created, it is just a light wrapper around the original array: no data is copied.

  2. When the column in y is renamed, the entire data array is duplicated in memory: it's the same 38.2 MiB increment as when x is created in the first place (1,000,000 rows × 5 columns × 8 bytes per float64 ≈ 38 MiB).

So, unless I'm missing something, it appears that Pandas operates more like R's dataframes than R's data tables.
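The same observation can be double-checked without a profiler using np.shares_memory. This is only a sketch: whether construction and rename copy is version-dependent (copy-on-write, the default in newer pandas, changes both), so copy=False is passed explicitly at construction:

```python
import numpy as np
import pandas as pd

x = np.random.rand(1000, 5)
# copy=False asks pandas to wrap the array rather than copy it
# (under copy-on-write the constructor may otherwise copy).
y = pd.DataFrame(x, columns=list('abcde'), copy=False)

# At this point the DataFrame shares x's buffer.
shared_before = np.shares_memory(x, y.to_numpy())

y.rename(columns={'e': 'f'}, inplace=True)
# Whether the rename duplicated the data depends on the pandas version.
shared_after = np.shares_memory(x, y.to_numpy())

print(shared_before, shared_after)
```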


Edit: Note that rename() has a copy argument that controls this behavior; it defaults to True. (In recent pandas versions this keyword is deprecated in favor of copy-on-write semantics, so behavior there may differ.) For example, using this:

y.rename(columns = {'e': 'f'}, inplace=True, copy=False)

... results in an in-place operation without copying the data.

Alternatively, you can modify the columns attribute directly:

y.columns = ['a', 'b', 'c', 'd', 'f']
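If you don't want to spell out every label by hand, you can rebuild the list from the existing columns instead; the mapping dict below is just for illustration. Assigning to .columns replaces the labels without touching the underlying data:

```python
import numpy as np
import pandas as pd

y = pd.DataFrame(np.random.rand(10, 5), columns=list('abcde'))

# Rebuild the label list, substituting only the names we want to change.
mapping = {'e': 'f'}
y.columns = [mapping.get(c, c) for c in y.columns]
print(list(y.columns))  # ['a', 'b', 'c', 'd', 'f']
```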
Answer from jakevdp on Stack Overflow
🌐
Reddit
reddit.com › r/dataengineering › what actually defines a dataframe?
r/dataengineering on Reddit: What actually defines a DataFrame?
March 24, 2025 -

I fear this is more a philosophical question then a technical one but I am a bit confused. I’ve been thinking a lot about what makes something a DataFrame, not just in terms of syntax or library, but from a conceptual standpoint.

My current definition is as such:

A DataFrame is a language native, programmable interface for querying and transforming tabular data. Its designed to be embedded directly in general purpose programming workflows.

I like this because it focuses on what a DataFrame is for, rather than what specific tools or libraries implement it.

I think however that this definition is too general and can lead to anything tabular with an API being described as a DF.

Properties that are not exclusive across DataFrames which I previously thought defined them:

  • mutability

    • pandas: mutable, you can add/remove/overwrite columns directly.

    • Spark DataFrames: immutable, transformations return new logical plans.

    • Polars (lazy mode): immutable, transformations build a new plan.

  • execution model

    • pandas: eager, executes immediately.

    • Spark / Polars (lazy): lazy, builds DAGs and executes on trigger.

  • in memory

    • pandas / polars: usually in-memory.

    • Spark: can spill to disk or operate on distributed data.

    • Ibist: abstract, backend might not be memory-bound at all.

Curious how others would describe and define DataFrames.

Find elsewhere
🌐
Apache Spark
spark.apache.org › docs › latest › sql-programming-guide.html
Spark SQL and DataFrames - Spark 4.1.1 Documentation
A DataFrame is a Dataset organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as: structured data files, tables ...
🌐
Kaggle
kaggle.com › datasets
Find Open Datasets and Machine Learning Projects
Checking your browser before accessing www.kaggle.com · Click here if you are not automatically redirected after 5 seconds
🌐
Wikipedia
en.wikipedia.org › wiki › Dataframe
Dataframe - Wikipedia
December 31, 2025 - Dataframe may refer to: · A tabular data structure common to many data processing libraries: · pandas (software) § DataFrame · The Dataframe API in Apache Spark · DFLib for Java · Data frames in the R programming language · Frame (networking) · Category: · Disambiguation pages · Search
🌐
Microsoft Learn
learn.microsoft.com › en-us › dotnet › machine-learning › how-to-guides › getting-started-dataframe
Get started with DataFrames - ML.NET | Microsoft Learn
December 19, 2024 - Learn how to get started with DataFrames, which are two-dimensional data structures for storing and manipulating data. DataFrames help with preparation of data for a machine learning model.
🌐
LeetCode
leetcode.com › studyplan › 30-days-of-pandas
30 Days of Pandas - Study Plan - LeetCode
Before starting the study plan, you should know basic Python and common data structures like syntax, data types, conditional statements, loops, functions, lists · After finishing the study plan, you'll learn from basic data operations like handling missing values to more intermediate concepts ...
🌐
Dask
docs.dask.org › en › stable › dataframe.html
Dask DataFrame — Dask documentation
Just pandas: Dask DataFrames are a collection of many pandas DataFrames.
🌐
Posit
posit.co › home › positron
Positron | A Next-Generation IDE for Data Science
August 11, 2025 - Open, view, sort, and filter in-memory dataframes or on-disk data for easy exploration alongside your code
🌐
AG Grid
ag-grid.com › react-data-grid › getting-started
React Grid: Quick Start | AG Grid
Build a React Table with AG Grid, the best free, fast and flexible React Data Grid. Features Sorting, Filtering, Pagination, Custom Components, and more. Download AG Grid v35.2.0 today: The best React Table & React Data Grid in the world.