pandas compare two dataframes row by row

Compare two DataFrames and output their differences side-by-side

stackoverflow.com › questions › 17095101 › compare-two-dataframes-and-output-their-differences-side-by-side

The first part is similar to Constantine, you can get the boolean of which rows are empty*:

In [21]: ne = (df1 != df2).any(1)

In [22]: ne
Out[22]:
0    False
1     True
2     True
dtype: bool

Then we can see which entries have changed:

In [23]: ne_stacked = (df1 != df2).stack()

In [24]: changed = ne_stacked[ne_stacked]

In [25]: changed.index.names = ['id', 'col']

In [26]: changed
Out[26]:
id  col
1   score         True
2   isEnrolled    True
    Comment       True
dtype: bool

Here the first entry is the index and the second the columns which has been changed.

In [27]: difference_locations = np.where(df1 != df2)

In [28]: changed_from = df1.values[difference_locations]

In [29]: changed_to = df2.values[difference_locations]

In [30]: pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index)
Out[30]:
               from           to
id col
1  score       1.11         1.21
2  isEnrolled  True        False
   Comment     None  On vacation

* Note: it's important that df1 and df2 share the same index here. To overcome this ambiguity, you can ensure you only look at the shared labels using df1.index & df2.index, but I think I'll leave that as an exercise.

Answer from Andy Hayden on Stack Overflow

Stack Overflow

stackoverflow.com › questions › 17095101 › compare-two-dataframes-and-output-their-differences-side-by-side

python - Compare two DataFrames and output their differences side-by-side - Stack Overflow

Highlighting the difference between two DataFrames

It is possible to use the DataFrame style property to highlight the background color of the cells where there is a difference.

Using the example data from the original question

The first step is to concatenate the DataFrames horizontally with the concat function and distinguish each frame with the keys parameter:

df_all = pd.concat([df.set_index('id'), df2.set_index('id')], 
                   axis='columns', keys=['First', 'Second'])
df_all

It's probably easier to swap the column levels and put the same column names next to each other:

df_final = df_all.swaplevel(axis='columns')[df.columns[1:]]
df_final

Now, its much easier to spot the differences in the frames. But, we can go further and use the style property to highlight the cells that are different. We define a custom function to do this which you can see in this part of the documentation.

def highlight_diff(data, color='yellow'):
    attr = 'background-color: {}'.format(color)
    other = data.xs('First', axis='columns', level=-1)
    return pd.DataFrame(np.where(data.ne(other, level=0), attr, ''),
                        index=data.index, columns=data.columns)

df_final.style.apply(highlight_diff, axis=None)

This will highlight cells that both have missing values. You can either fill them or provide extra logic so that they don't get highlighted.

Pandas

pandas.pydata.org › docs › reference › api › pandas.DataFrame.compare.html

pandas.DataFrame.compare — pandas 3.0.1 documentation

>>> df.compare(df2, keep_shape=True) col1 col2 col3 self other self other self other 0 a c NaN NaN NaN NaN 1 NaN NaN NaN NaN NaN NaN 2 NaN NaN NaN NaN 3.0 4.0 3 NaN NaN NaN NaN NaN NaN 4 NaN NaN NaN NaN NaN NaN · Keep all original rows and columns and also all original values

Discussions

Is this the most efficient way to compare 2 Pandas dataframes? Finding rows unique to one dataframe.

The pd.merge() function with the argument how='right' will perform a right join on the two dataframes, which means all the rows from the right_test_df and any common rows from the left_test_df will be in the result. The indicator=True parameter adds a column called _merge to the output DataFrame that can have three possible values: left_only, right_only, or both, indicating the source of each row. By filtering for rows where _merge is right_only, you get all rows that are unique to the right_test_df. However, if you want to find rows that are unique to either DataFrame, you would set how='outer' and then filter for rows where _merge is either right_only or left_only. Here is an example: merged_df = pd.merge(left_test_df, right_test_df, how='outer', indicator=True) final_df = merged_df[merged_df['_merge'] != 'both'] In this version of the code, final_df will contain all rows that are unique to either the left_test_df or the right_test_df. As with many things in programming (and Python/Pandas in particular), there are often multiple valid ways to accomplish a task. This is one efficient way, but depending on the specific requirements of your task, other methods may be more suitable. More on reddit.com

r/learnpython

1

2

July 20, 2023

Pandas: Compare two dataframes that are different lengths

You want to try a merge . So you can have 1 dataframe with columns from both years. In the how parameter if you only want players that have played both seasons you can use inner. And the on parameter for there unique key which will be name (if no player shares same name) More on reddit.com

r/learnpython

5

1

December 16, 2020

compare two DataFrames

Try printing them out and using a highlighter to mark the differences More on reddit.com

r/dataengineering

20

6

August 4, 2023

Pandas or Polars to work with dataframes?

I rewrote some old code that used Pandas to graph statistics recorded at high frequency over long periods. Think around 50 columns, and millions of rows. But the catch? Each row was a span of time, not an instantaneous measurement - and the time spans were variable and overlapping. (specifically, job logs on an HPC cluster). The pandas 1.x code took about half an hour to run - mostly because of the majority of the work being stuck in a single thread. I rewrote it in Polars (with the help of an expert on the Polars discord) and it brought that down to three minutes. In contrast to the original code, this was able to leverage the cores available properly. I did not touch anything directly to do threading, like reading files in a pool or such. 100% the doing of Polars, there. More on reddit.com

r/Python

68

79

April 10, 2023

Videos

02:03

YouTube

Pandas: Visually Compare Two DataFrames with a New DataFrame - YouTube

January 15, 2024

163

youtube.com

Compare Two pandas DataFrames in Python Explained With Example ...

May 20, 2023

03:49

YouTube

Compare Two pandas DataFrames in Python (Example) | Find Differences ...

July 20, 2022

13.4K

youtube.com

Python — Retrieve Matching Rows From Two Dataframes

02:29

YouTube

Concatenating Data | #37 of 53: The Complete Pandas Course - YouTube

June 14, 2022

08:50

YouTube

Python Pandas tutorial | Highlight differences between two dataframes ...

statology.org › home › pandas: how to compare two dataframes row by row

Pandas: How to Compare Two DataFrames Row by Row

January 23, 2023 - Note: The argument keep_equal=True tells pandas to keep values that are equal. Otherwise, equal values are shown as NaNs. The following code shows how to use the argument keep_shape=True to compare the two DataFrames row by row and keep all of the rows from the original DataFrames:

Educative

educative.io › answers › how-to-compare-two-dataframes-in-pandas

How to compare two DataFrames in pandas

The compare method in pandas shows the differences between two DataFrames. It compares two data frames, row-wise and column-wise, and presents the differences side by side.

Spark By {Examples}

sparkbyexamples.com › home › pandas › compare two dataframes row by row

Compare Two DataFrames Row by Row - Spark By {Examples}

June 10, 2025 - Pandas DataFrame.compare() function compares two equal sizes and dimensions of DataFrames row by row along with align_axis = 0 and returns The DataFrame with unequal values of given DataFrames. By default, it compares the DataFrames column by column.

Towards Data Science

towardsdatascience.com › home › latest › comparing pandas dataframes to one another

Comparing Pandas Dataframes To One Another | Towards Data Science

January 18, 2025 - It’s a bit more roundabout but by using merge, we can compare just the values of each entry, irrespective of the indexes. Merging using the Player and Rings columns allows us to match up rows with identical values between our existing and new dataframes (while ignoring differences in the dataframe’s indexes). We need to rename the Rings column in our new dataframe to get it to output as two separate columns (Rings for the old and Rings_new for new).

reddit.com › r/learnpython › is this the most efficient way to compare 2 pandas dataframes? finding rows unique to one dataframe.

r/learnpython on Reddit: Is this the most efficient way to compare 2 Pandas dataframes? Finding rows unique to one dataframe.

July 20, 2023 -

Still trying to wrap my head around Pandas (and continuing to be blown away by its capabilities every day...

Say I have 2 dataframes (lets call the left_test_df and right_test_df). And, I want to see which rows are only present in one of them. Is something like this the best way to go about doing it?

    merged_df=pd.merge(left_test_df, right_test_df, how='right', indicator=True)
    final_df=merged_df[merged_df['_merge']=='right_only']

Top answer

1 of 1

1

The pd.merge() function with the argument how='right' will perform a right join on the two dataframes, which means all the rows from the right_test_df and any common rows from the left_test_df will be in the result. The indicator=True parameter adds a column called _merge to the output DataFrame that can have three possible values: left_only, right_only, or both, indicating the source of each row. By filtering for rows where _merge is right_only, you get all rows that are unique to the right_test_df. However, if you want to find rows that are unique to either DataFrame, you would set how='outer' and then filter for rows where _merge is either right_only or left_only. Here is an example: merged_df = pd.merge(left_test_df, right_test_df, how='outer', indicator=True) final_df = merged_df[merged_df['_merge'] != 'both'] In this version of the code, final_df will contain all rows that are unique to either the left_test_df or the right_test_df. As with many things in programming (and Python/Pandas in particular), there are often multiple valid ways to accomplish a task. This is one efficient way, but depending on the specific requirements of your task, other methods may be more suitable.

Find elsewhere

Google Bing Mojeek

Hackers and Slackers

hackersandslackers.com › compare-rows-pandas-dataframes

Comparing Rows Between Two Pandas DataFrames

April 28, 2024 - Which rows were only present in the second DataFrame? ... For starters, our function dataframe_difference() will need to be passed two DataFrames to compare.

Pandas

pandas.pydata.org › docs › reference › api › pandas.DataFrame.equals.html

pandas.DataFrame.equals — pandas 3.0.1 documentation

This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal. The row/column index do not need to have the same type, as long as the values are considered equal.

Data Science Discovery

discovery.cs.illinois.edu › guides › DataFrame-Fundamentals › Comparing-Two-DataFrames

Comparing Two DataFrames - Data Science Discovery - Illinois

October 10, 2022 - import pandas as pd\ndf1 = pd.DataFrame([\n {'Name': 'Saul', 'Favorite Color': 'Maroon', 'Show': 'BCS'},\n {'Name': 'Walter', 'Favorite Color': 'Blue', 'Show': 'BB'},\n {'Name': 'Kim', 'Favorite Color': 'Red', 'Show': 'BCS'},\n {'Name': 'Howard', 'Favorite Color': 'Green', 'Show': 'BCS'}\n])\ndf2 = pd.DataFrame([\n {'Name': 'Saul', 'Favorite Color': 'Maroon', 'Show': 'BCS'},\n {'Name': 'Walter', 'Favorite Color': 'Blue', 'Show': 'BB'},\n {'Name': 'Kim', 'Favorite Color': 'Red', 'Show': 'BCS'},\n {'Name': 'Jesse', 'Favorite Color': 'Maroon', 'Show': 'BB'},\n])\ndf1.compare(df2) ... By default t

IncludeHelp

includehelp.com › python › how-to-compare-two-dataframes-and-output-their-differences-side-by-side.aspx

How to compare two DataFrames and output their differences side-by-side?

For this purpose, we will check both the DataFrames if they are equal or not. To check if the DataFrames are equal or not, we will use pandas.DataFrame.compare() method. This method is used to compare two DataFrames and to find the differences between the rows of two DataFrames.

Saturn Cloud

saturncloud.io › blog › how-to-compare-two-pandas-dataframes-for-differences

How to Compare Two Pandas Dataframes for Differences | Saturn Cloud Blog

October 27, 2023 - Similarly, three values in column C of three rows 0, 1, 2 are different between the two dataframes. Note: Before comparing two DataFrames make sure that the number of records in the first DataFrame matches the number of records in the second DataFrame. If not so, you will be getting a value error which is : ValueError: Can only compare identically-labeled Series objects · Comparing two pandas dataframes is an essential task in data analysis.

Medium

medium.com › data-science › 3-easy-ways-to-compare-two-pandas-dataframes-b2a18169c876

Some Of The Ways To Compare Two Pandas DataFrames | TDS Archive

August 9, 2023 - Compare two dataframes in pandas cell by cell, based on column & using for loop. The compare method in pandas shows the differences between two DataFrames.

Data to Fish

datatofish.com › compare-values-dataframes

How to Compare Values Between Two pandas DataFrames

January 12, 2023 - In this tutorial, you will learn how to compare values between two DataFrames. Let's say, you have data on caught fish by two fishing boats: import pandas as pd boat1 = { 'fish': ['salmon', 'pufferfish', 'shark'], 'count': [99, 33, 11] } boat2 = { 'fish': ['salmon', 'pufferfish'], 'count': [88, 22] } df1 = pd.DataFrame(boat1) df2 = pd.DataFrame(boat2) print(df1) print(df2) df1: fish count 0 salmon 99 1 pufferfish 33 2 shark 11 df2: fish count 0 salmon 88 1 pufferfish 22 ·

Saturn Cloud

saturncloud.io › blog › how-to-efficiently-compare-rows-in-a-pandas-dataframe

How to Efficiently Compare Rows in a Pandas DataFrame | Saturn Cloud Blog

November 14, 2023 - We covered three methods: using the equals() method, using the isin() method, and using the apply() method. Depending on the specific use case, one method may be more efficient or appropriate than another.

Statistics Globe

statisticsglobe.com › home › python programming language for statistics & data science › compare two pandas dataframes in python (example) | find differences row by row

Compare Two pandas DataFrames in Python | Find Differences by Rows

July 28, 2020 - In this tutorial, I’ll show how to find different rows between two pandas DataFrames in the Python programming language.

Statology

statology.org › home › how to compare two dataframes in pandas

How to Compare Two DataFrames in Pandas

July 28, 2020 - Player D had 5 more points and 5 more assists in DataFrame 2 compared to DataFrame 1. Example 3: Find all rows that only exist in one DataFrame. We can use the following code to obtain a complete list of rows that only appear in one DataFrame: #outer merge the two DataFrames, adding an indicator column called 'Exist' diff_df = pd.merge(df1, df2, how='outer', indicator='Exist') #find which rows don't exist in both DataFrames diff_df = diff_df.loc[diff_df['Exist'] != 'both'] diff_df player points assists Exist 0 A 12 4 left_only 1 B 15 6 left_only 2 C 17 7 left_only 3 D 24 8 left_only 4 A 12 7 right_only 5 B 24 8 right_only 6 C 26 10 right_only 7 D 29 13 right_only

GeeksforGeeks

geeksforgeeks.org › pandas › how-to-compare-two-dataframes-with-pandas-compare

How To Compare Two Dataframes with Pandas compare? - GeeksforGeeks

July 23, 2025 - It is of bool type and the default value for it is "false", i.e. it displays all the values in the table by default. keep_equal : This is mainly for displaying same or equal values in the output when set to True. If it is made false then it will display the equal values as NANs. Returns another DataFrame with the differences between the two dataFrames. Before Starting, an important note is the pandas version must be at least 1.1.0.

TutorialsPoint

tutorialspoint.com › how-to-compare-two-dataframe-with-pandas-compare

How to Compare two Dataframe with Pandas Compare?

July 20, 2023 - Whether you're an experienced data analyst or a fresher, this article will provide you with the knowledge you need for pandas to use compare effectively and confidently. The basic syntax for the compare() function is as follows: ... where df1 ...

Linux Hint

linuxhint.com › pandas-compare-two-dataframes-row-by-row

Pandas Compare Two DataFrames Row by Row

September 29, 2023 - Linux Hint LLC, [email protected] 1210 Kelly Park Circle, Morgan Hill, CA 95037 Privacy Policy and Terms of Use