compare two dataframes pandas based on column value

How do I compare columns in different data frames?

datascience.stackexchange.com › questions › 33053 › how-do-i-compare-columns-in-different-data-frames

If you want to check equal values on a certain column, let's say Name, you can merge both DataFrames to a new one:

mergedStuff = pd.merge(df1, df2, on=['Name'], how='inner')
mergedStuff.head()

I think this is more efficient and faster than using where if you have a big data set.

Answer from Tarek on Stack Exchange
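A minimal sketch of the merge-based check described in the answer above. The sample rows and the Age and City columns are hypothetical and only serve to show which rows survive an inner merge on Name.

```python
import pandas as pd

# Hypothetical sample data; only the 'Name' column comes from the answer above.
df1 = pd.DataFrame({"Name": ["Ann", "Bob", "Cy"], "Age": [31, 45, 27]})
df2 = pd.DataFrame({"Name": ["Bob", "Cy", "Dee"], "City": ["Oslo", "Lima", "Rome"]})

# An inner merge keeps only the rows whose 'Name' appears in both frames.
merged = pd.merge(df1, df2, on=["Name"], how="inner")
print(merged)
```

Names that occur in only one frame (Ann and Dee here) are dropped, so the result lists exactly the shared values along with the other columns from both sides.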

Pandas

pandas.pydata.org › docs › reference › api › pandas.DataFrame.compare.html

pandas.DataFrame.compare — pandas 3.0.1 documentation

Compare to another DataFrame and show the differences. ... Object to compare with.
align_axis: {0 or ‘index’, 1 or ‘columns’}, default 1
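As an illustration of the align_axis parameter mentioned in the documentation snippet, the sketch below compares two small, identically labeled frames. The data is made up for the example.

```python
import pandas as pd

# Hypothetical frames with identical row and column labels.
df1 = pd.DataFrame({"team": ["A", "B", "C"], "points": [10, 22, 15]})
df2 = pd.DataFrame({"team": ["A", "B", "C"], "points": [10, 30, 15]})

# Default (align_axis=1): 'self' and 'other' values sit side by side as columns.
diff_cols = df1.compare(df2)

# align_axis=0: 'self' and 'other' values are stacked as consecutive rows.
diff_rows = df1.compare(df2, align_axis=0)

print(diff_cols)
print(diff_rows)
```

In both layouts only the cells that differ are reported; rows and columns without any difference are dropped by default.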

Stack Exchange

datascience.stackexchange.com › questions › 33053 › how-do-i-compare-columns-in-different-data-frames

pandas - How do I compare columns in different data frames? - Data Science Stack Exchange

2 of 9

28

You can check exactly how many values are shared and how many differ between the two DataFrames by using isin and value_counts(), like this:

df['your_column_name'].isin(df2['your_column_name']).value_counts()

Result:

True = common
False = different
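A small sketch of the isin approach, assuming two hypothetical frames that share a column called your_column_name. Besides counting, the same boolean mask can be used to pull out the values that have no match.

```python
import pandas as pd

# Hypothetical data; the column name follows the placeholder used in the answer.
df = pd.DataFrame({"your_column_name": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"your_column_name": ["b", "d", "e"]})

# True where the value from df also occurs somewhere in df2's column.
mask = df["your_column_name"].isin(df2["your_column_name"])

# Number of common (True) and different (False) values.
counts = mask.value_counts()
print(counts)

# Values of df that do not occur in df2.
only_in_df = df.loc[~mask, "your_column_name"].tolist()
print(only_in_df)
```

Note that isin checks membership regardless of row position, so this counts shared values rather than values that match at the same index.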

Discussions

python 3.x - How to compare two dataframes based on certain column values and remove them in pandas - Stack Overflow

I have two data frames. df1: userID ID Sex Date Month Year Security John 45 Male 31 03 1975 Low Tom 22 Male 01 01 1990 High Ma... More on stackoverflow.com

stackoverflow.com

python - How to compare two columns of two dataframes and indicate the changes? - Data Science Stack Exchange

I have two dataframes, one is current week's information, one is of last week. I want to create a new dataset that lists all the changes during the week. Please see the following example: if there ... More on datascience.stackexchange.com

datascience.stackexchange.com

January 7, 2022

python - Compare two DataFrames and output their differences side-by-side - Stack Overflow

I am trying to highlight exactly what changed between two dataframes. Suppose I have two Python Pandas dataframes: "StudentRoster Jan-1": id Name score isEnrolled More on stackoverflow.com

stackoverflow.com

March 28, 2022

statology.org › home › pandas: how to compare columns in two different dataframes

Pandas: How to Compare Columns in Two Different DataFrames

July 17, 2022 -

import numpy as np
import pandas as pd

#create first DataFrame
df1 = pd.DataFrame({'team': ['Mavs', 'Rockets', 'Spurs', 'Heat', 'Nets'],
                    'points': [22, 30, 15, 17, 14]})

#view DataFrame
print(df1)

      team  points
0     Mavs      22
1  Rockets      30
2    Spurs      15
3     Heat      17
4     Nets      14

#create second DataFrame
df2 = pd.DataFrame({'team': ['Mavs', 'Thunder', 'Spurs', 'Nets', 'Cavs'],
                    'points': [25, 40, 31, 32, 22]})

#view DataFrame
print(df2)

      team  points
0     Mavs      25
1  Thunder      40
2    Spurs      31
3     Nets      32
4     Cavs      22

The following code shows how to count the number of matching values between the team columns in each DataFrame:

Pandas

pandas.pydata.org › docs › reference › api › pandas.DataFrame.equals.html

pandas.DataFrame.equals — pandas 3.0.1 documentation

DataFrames df and different_column_type have the same element types and values, but have different types for the column labels, which will still return True.

>>> different_column_type = pd.DataFrame({1.0: [10], 2.0: [20]})
>>> different_column_type
   1.0  2.0
0   10   20
>>> df.equals(different_column_type)
True
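To round out the equals() snippet, here is a short sketch of two behaviors worth knowing: equals() is sensitive to column dtypes, and it treats missing values in the same position as equal. The frames are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({1: [10], 2: [20]})
same_values_float = pd.DataFrame({1: [10.0], 2: [20.0]})

# Same values and same dtypes: equals() returns True.
exact_match = df.equals(df.copy())

# Same values but int64 vs float64 columns: equals() returns False.
dtype_mismatch = df.equals(same_values_float)

# An element-wise comparison ignores the dtype difference.
elementwise_equal = bool((df == same_values_float).all().all())

# NaN values in the same location are considered equal by equals().
with_nan = pd.DataFrame({"a": [1.0, np.nan]})
nan_match = with_nan.equals(with_nan.copy())

print(exact_match, dtype_mismatch, elementwise_equal, nan_match)
```

If dtype differences should not count as a mismatch, the element-wise form (or casting both frames to a common dtype first) may be the better fit.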

Stack Overflow

stackoverflow.com › questions › 51366509 › how-to-compare-two-dataframes-based-on-certain-column-values-and-remove-them-in

python 3.x - How to compare two dataframes based on certain column values and remove them in pandas - Stack Overflow

Top answer

1 of 2

5

Just do a simple merge followed by dropna:

df2.merge(df1,how='left').dropna().drop(columns='Security')
Out[318]: 
  userID  ID   Sex  Date  Month  Year
4   John  45  Male    31      3  1975
5    Tom  22  Male     1      1  1990
7   Hary  56  Male    15      9  1970

2 of 2

1

Define the key columns which you want to merge on, and then perform an inner merge between df2 and only the key columns of df1. The default for merge is inner, so you don't need to specify it explicitly. Subsetting df1 to only these key columns ensures that you don't bring any of its columns over to df2 with the merge.

key_cols = ['userID', 'ID', 'Date', 'Month', 'Year']
df2.merge(df1.loc[:, df1.columns.isin(key_cols)])

Outputs:

  userID  ID   Sex  Date  Month  Year
0   John  45  Male    31      3  1975
1    Tom  22  Male     1      1  1990
2   Hary  56  Male    15      9  1970
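The key-column merge from the second answer can be sketched as follows. The rows are hypothetical and cut down to two key columns, and the drop_duplicates() call is an addition not present in the answer: it guards against repeated keys in df1 multiplying rows of df2.

```python
import pandas as pd

# Hypothetical, reduced version of the frames from the question.
df1 = pd.DataFrame({
    "userID": ["John", "Tom"],
    "ID": [45, 22],
    "Security": ["Low", "High"],
})
df2 = pd.DataFrame({
    "userID": ["John", "Tom", "Mia"],
    "ID": [45, 22, 99],
    "Sex": ["Male", "Male", "Female"],
})

key_cols = ["userID", "ID"]

# Keep only the rows of df2 whose key combination also exists in df1.
# Subsetting df1 to the keys means none of its other columns are carried over.
kept = df2.merge(df1[key_cols].drop_duplicates(), on=key_cols, how="inner")
print(kept)
```

Because only the key columns of df1 take part in the merge, the Security column never appears in the result and there is nothing to drop afterwards.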

reddit.com › r/learnpython › is this the most efficient way to compare 2 pandas dataframes? finding rows unique to one dataframe.

r/learnpython on Reddit: Is this the most efficient way to compare 2 Pandas dataframes? Finding rows unique to one dataframe.

July 20, 2023 -

Still trying to wrap my head around Pandas (and continuing to be blown away by its capabilities every day...

Say I have 2 dataframes (let's call them left_test_df and right_test_df). And I want to see which rows are only present in one of them. Is something like this the best way to go about doing it?

    merged_df=pd.merge(left_test_df, right_test_df, how='right', indicator=True)
    final_df=merged_df[merged_df['_merge']=='right_only']

Top answer

1 of 1

1

The pd.merge() function with the argument how='right' will perform a right join on the two dataframes, which means all the rows from the right_test_df and any common rows from the left_test_df will be in the result. The indicator=True parameter adds a column called _merge to the output DataFrame that can have three possible values: left_only, right_only, or both, indicating the source of each row. By filtering for rows where _merge is right_only, you get all rows that are unique to the right_test_df.

However, if you want to find rows that are unique to either DataFrame, you would set how='outer' and then filter for rows where _merge is either right_only or left_only. Here is an example:

    merged_df = pd.merge(left_test_df, right_test_df, how='outer', indicator=True)
    final_df = merged_df[merged_df['_merge'] != 'both']

In this version of the code, final_df will contain all rows that are unique to either the left_test_df or the right_test_df.

As with many things in programming (and Python/Pandas in particular), there are often multiple valid ways to accomplish a task. This is one efficient way, but depending on the specific requirements of your task, other methods may be more suitable.
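A runnable sketch of the indicator-based approach, using made-up frames with columns id and val. With no on argument, merge joins on all shared columns, so a row counts as common only if every column matches.

```python
import pandas as pd

# Hypothetical frames; the variable names follow the question above.
left_test_df = pd.DataFrame({"id": [1, 2, 3], "val": ["a", "b", "c"]})
right_test_df = pd.DataFrame({"id": [2, 3, 4], "val": ["b", "x", "d"]})

# The outer merge keeps every row; '_merge' records where each row came from.
merged_df = pd.merge(left_test_df, right_test_df, how="outer", indicator=True)

# Split the rows that exist in only one of the two frames.
left_only = merged_df[merged_df["_merge"] == "left_only"].drop(columns="_merge")
right_only = merged_df[merged_df["_merge"] == "right_only"].drop(columns="_merge")

print(left_only)
print(right_only)
```

Here the row with id 3 shows up on both sides because its val differs between the frames, which is usually the desired behavior when looking for changed rows.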

Statology

statology.org › home › pandas: how to compare two dataframes row by row

Pandas: How to Compare Two DataFrames Row by Row

January 23, 2023 - The following code shows how to compare the two DataFrames row by row and only keep the rows that have differences in at least one column:

#compare DataFrames and only keep rows with differences
df_diff = df1.compare(df2, keep_equal=True, align_axis=0)

#view results
print(df_diff)

        team  points
1 self     B      22
  other    B      30
3 self     D      14
  other    E      20

We can see that the DataFrames have two rows that are different. In particular, we can see that the rows in index positions 1 and 3 of each DataFrame have different values in at least one column.

Data to Fish

datatofish.com › compare-values-dataframes

How to Compare Values Between Two pandas DataFrames

July 23, 2025 - In this tutorial, you will learn how to compare values between two DataFrames. Let's say, you have data on caught fish by two fishing boats:

import pandas as pd

boat1 = {
    'fish': ['salmon', 'pufferfish', 'shark'],
    'count': [99, 33, 11]
}
boat2 = {
    'fish': ['salmon', 'pufferfish'],
    'count': [88, 22]
}

df1 = pd.DataFrame(boat1)
df2 = pd.DataFrame(boat2)

print(df1)
print(df2)

Stack Exchange

datascience.stackexchange.com › questions › 106809 › how-to-compare-two-columns-of-two-dataframes-and-indicate-the-changes

python - How to compare two columns of two dataframes and indicate the changes? - Data Science Stack Exchange

Top answer

1 of 3

2

import pandas as pd
import numpy as np

old = pd.DataFrame({
    "ID": ["AA", "BB", "CC"],
    "Rating": ["High", "Low", "Medium"],
    "Status": ["On track", "Monitor", "On track"]
})

new = pd.DataFrame({
    "ID": ["AA", "BB", "CC", "DD"],
    "Rating": ["Medium", "High", "Medium", "Low"],
    "Status": ["On track", "On track", "On track", "Monitor"]
})

(
    old
    # join the two dataframes using the ID column as a key
    .merge(new, how="outer", on="ID", suffixes=("_old", "_new"))
    # compare columns between old and new dataframe and assign new values
    .assign(
        Rating = lambda x: np.select(
            [x["Rating_new"].notna() & x["Rating_old"].isna(), x["Rating_new"] != x["Rating_old"]],
            ["New", "From '" + x["Rating_old"] +  "' To '" + x["Rating_new"] + "'"],
            default=np.nan
        ),
        Status = lambda x: np.select(
            [x["Status_new"].notna() & x["Status_old"].isna(), x["Status_new"] != x["Status_old"]],
            ["New", "From '" + x["Status_old"] +  "' To '" + x["Status_new"] + "'"],
            default=np.nan
        )
    )
    # select final columns
    .loc[:, ["ID", "Rating", "Status"]]
)

ID	Rating	Status
AA	From 'High' To 'Medium'	nan
BB	From 'Low' To 'High'	From 'Monitor' To 'On track'
CC	nan	nan
DD	New	New

2 of 3

0

Merge (left join) the current table with the previous table; you will then have all 4 columns in one dataframe. You can then concatenate the columns to get the desired result.

Please share dataframe creation code if you need help with code creat

Medium

medium.com › @vfxbwrnnzb › 9-ways-to-compare-pandas-dataframes-5311d3b0653c

9 Ways to Compare Pandas DataFrames. Explained In Just ...

Saturn Cloud

saturncloud.io › blog › how-to-compare-two-pandas-dataframes-for-differences

How to Compare Two Pandas Dataframes for Differences | Saturn Cloud Blog

October 27, 2023 - When comparing two dataframes with the same shape and column names, we can use the equals() function provided by pandas. This function returns a boolean value indicating whether the two dataframes are equal or not.

Educative

educative.io › answers › how-to-compare-two-dataframes-in-pandas

How to compare two DataFrames in pandas

The compare method can only compare DataFrames of the same shape, with exact dimensions and identical row and column labels.

DataFrame.compare(other, align_axis=1, keep_shape=False, keep_equal=False)

The compare method accepts the following parameters: ... align_axis: This indicates the axis of comparison, with 0 for rows, and 1, the default value, for columns...
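The keep_shape and keep_equal parameters from the signature above are easiest to understand side by side. The following sketch uses two hypothetical frames that differ in a single cell.

```python
import pandas as pd

# Hypothetical frames with identical labels and one differing cell.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df2 = pd.DataFrame({"a": [1, 9, 3], "b": ["x", "y", "z"]})

# Default: only the rows and columns that contain a difference are returned.
default_diff = df1.compare(df2)

# keep_shape=True: every row and column is kept; equal cells become NaN.
full_shape = df1.compare(df2, keep_shape=True)

# keep_equal=True as well: equal cells show their actual values instead of NaN.
with_equal = df1.compare(df2, keep_shape=True, keep_equal=True)

print(default_diff)
print(full_shape)
print(with_equal)
```

The default output is the most compact, while keep_shape=True with keep_equal=True gives a full side-by-side view that can be easier to scan for small frames.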

Stack Overflow

stackoverflow.com › questions › 17095101 › compare-two-dataframes-and-output-their-differences-side-by-side

python - Compare two DataFrames and output their differences side-by-side - Stack Overflow

Top answer

1 of 16

177

The first part is similar to Constantine: you can get a boolean Series showing which rows have changed*:

In [21]: ne = (df1 != df2).any(axis=1)

In [22]: ne
Out[22]:
0    False
1     True
2     True
dtype: bool

Then we can see which entries have changed:

In [23]: ne_stacked = (df1 != df2).stack()

In [24]: changed = ne_stacked[ne_stacked]

In [25]: changed.index.names = ['id', 'col']

In [26]: changed
Out[26]:
id  col
1   score         True
2   isEnrolled    True
    Comment       True
dtype: bool

Here the first entry is the index and the second is the column that has changed.

In [27]: difference_locations = np.where(df1 != df2)

In [28]: changed_from = df1.values[difference_locations]

In [29]: changed_to = df2.values[difference_locations]

In [30]: pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index)
Out[30]:
               from           to
id col
1  score       1.11         1.21
2  isEnrolled  True        False
   Comment     None  On vacation

* Note: it's important that df1 and df2 share the same index here. To overcome this ambiguity, you can ensure you only look at the shared labels using df1.index.intersection(df2.index), but I think I'll leave that as an exercise.
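One possible way to handle the exercise mentioned in the note is sketched below: restrict both frames to their shared index labels before comparing. The frames and the single score column are hypothetical.

```python
import pandas as pd

# Hypothetical frames whose indexes only partly overlap (labels 1 and 2).
df1 = pd.DataFrame({"score": [1.0, 2.0, 3.0]}, index=[0, 1, 2])
df2 = pd.DataFrame({"score": [2.0, 3.5, 4.0]}, index=[1, 2, 3])

# Labels present in both frames.
shared = df1.index.intersection(df2.index)

# Compare only the shared rows so both sides are identically labeled.
a, b = df1.loc[shared], df2.loc[shared]
ne = (a != b).any(axis=1)

# Index labels of the rows that differ.
changed_labels = a.index[ne.to_numpy()].tolist()
print(changed_labels)
```

Rows that exist in only one frame (labels 0 and 3 here) are left out of the comparison entirely, so they would need separate handling if additions and removals also matter.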

2 of 16

143

Highlighting the difference between two DataFrames

It is possible to use the DataFrame style property to highlight the background color of the cells where there is a difference.

Using the example data from the original question

The first step is to concatenate the DataFrames horizontally with the concat function and distinguish each frame with the keys parameter:

df_all = pd.concat([df.set_index('id'), df2.set_index('id')], 
                   axis='columns', keys=['First', 'Second'])
df_all

It's probably easier to swap the column levels and put the same column names next to each other:

df_final = df_all.swaplevel(axis='columns')[df.columns[1:]]
df_final

Now, it's much easier to spot the differences in the frames. But we can go further and use the style property to highlight the cells that are different. We define a custom function to do this, as described in the styling section of the pandas documentation.

def highlight_diff(data, color='yellow'):
    attr = 'background-color: {}'.format(color)
    other = data.xs('First', axis='columns', level=-1)
    return pd.DataFrame(np.where(data.ne(other, level=0), attr, ''),
                        index=data.index, columns=data.columns)

df_final.style.apply(highlight_diff, axis=None)

This will highlight cells that both have missing values. You can either fill them or provide extra logic so that they don't get highlighted.
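The "extra logic" mentioned above could look like the following sketch, which builds a difference mask that ignores cells missing in both frames. The frames first and second are hypothetical, and the mask is shown on its own rather than wired into highlight_diff.

```python
import numpy as np
import pandas as pd

# Hypothetical frames; row 1 is missing in both, row 2 differs in 'score'.
first = pd.DataFrame({"score": [1.0, np.nan, 3.0], "comment": ["ok", None, "late"]})
second = pd.DataFrame({"score": [1.0, np.nan, 3.5], "comment": ["ok", None, "late"]})

# NaN != NaN evaluates to True, so a plain comparison flags doubly missing cells.
naive = first.ne(second)

# Extra logic: do not flag a cell when it is missing on both sides.
both_missing = first.isna() & second.isna()
diff_mask = naive & ~both_missing

print(diff_mask)
```

A mask like diff_mask could then replace the data.ne(...) condition inside a styling function, assuming the two halves of the combined frame are separated first.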

TutorialsPoint

tutorialspoint.com › how-to-compare-two-dataframe-with-pandas-compare

How to Compare two Dataframe with Pandas Compare?

July 20, 2023 - Correspondingly, the entries of 7 and 1 under column B indicate that the value in df2 is 1 greater than df1. Here's one more example of comparing two dataframes using the pandas compare() function:

GeeksforGeeks

geeksforgeeks.org › how-to-compare-two-columns-in-pandas

Compare Two Columns in Pandas - GeeksforGeeks

September 29, 2023 - We created a dictionary in which the values for each column are given, and then converted it into a pandas dataframe. Using the where() method in NumPy, we supply the condition used to compare the columns.

Kanoki

kanoki.org › 2022 › 08 › 01 › pandas-compare-columns-in-two-dataframes

Pandas compare columns in two data frames | kanoki

August 1, 2022 - We have added a new column called sales-diff to find the differences between the sales values in the two dataframes where the Item values are the same; otherwise the difference is set to 0. numpy.where() is used to return a choice depending on a condition:

df['sales-diff']=np.where(df['df1']['Items']==df['df2']['Items'], (df['df1']['Sale']-df['df2']['Sale']), 0)

We’ve got a new column that shows exactly the difference between the Sales column between df2 and df1.

Let’s find the rows not matching between two dataframes(df1 and df2) based on column Items i.e.
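The snippet does not show how the combined frame df is built, so the sketch below assumes it comes from pd.concat with keys 'df1' and 'df2'. The sample items and sales figures are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data for the two frames.
df1 = pd.DataFrame({"Items": ["pen", "ink", "cap"], "Sale": [100, 80, 40]})
df2 = pd.DataFrame({"Items": ["pen", "ink", "hat"], "Sale": [90, 80, 55]})

# Assumption: the frames are placed side by side under top-level keys.
df = pd.concat([df1, df2], axis="columns", keys=["df1", "df2"])

# Where the item names match row by row, take the sales difference; else 0.
sales_diff = np.where(
    df["df1"]["Items"] == df["df2"]["Items"],
    df["df1"]["Sale"] - df["df2"]["Sale"],
    0,
)
df["sales-diff"] = sales_diff
print(df)
```

This compares rows by position, so it only makes sense when both frames list their items in the same order. A merge on Items would be the safer choice otherwise.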

Spark By {Examples}

sparkbyexamples.com › home › pandas › compare two dataframes row by row

Compare Two DataFrames Row by Row - Spark By {Examples}

June 10, 2025 - The pandas DataFrame.compare() function compares two DataFrames of equal size and dimensions. With align_axis=0 it compares them row by row and returns a DataFrame containing the unequal values. By default, it compares the DataFrames column by column.

GeeksforGeeks

geeksforgeeks.org › how-to-compare-values-in-two-pandas-dataframes

How to compare values in two Pandas Dataframes? - GeeksforGeeks

January 12, 2022 - Dataframe in use: Method 1: Direct Method. This is the __getitem__ method syntax ([]), which lets you directly access the columns of the data frame using the column name. Example: Subtract two columns in Pand ... Pandas DataFrame is a Two-dimensional data structure of mutable size and heterogeneous tabular data.

Medium

medium.com › womenintechnology › different-ways-to-compare-two-columns-from-two-different-files-using-pandas-be7ff315ab17

Different Ways to Compare Two Columns From Two Different Files Using Pandas | by Indhumathy Chelliah | Women in Technology | Medium

July 24, 2023 - During file processing, we often need to compare two columns from two different files. In this article, let’s look at different methods to find common values between two columns in two different files using pandas DataFrames.

Towards Data Science

towardsdatascience.com › home › latest › comparing pandas dataframes to one another

Comparing Pandas Dataframes To One Another | Towards Data Science

January 18, 2025 -

# Data from friend
array_2 = np.array([['LeBron',3], ['Kobe',3], ['Michael',6,], ['Larry',5], ['Magic',5], ['Tim',4]])
df_2 = pd.DataFrame(array_2, columns=['Player','Rings'])

We can use the .eq method to quickly compare the dataframes.
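A sketch of the .eq comparison follows. Only array_2 comes from the snippet; array_1 and df_1 are hypothetical stand-ins for the first data set, which the snippet does not show.

```python
import numpy as np
import pandas as pd

# Hypothetical first data set; the snippet above only shows array_2.
array_1 = np.array([['LeBron', 3], ['Kobe', 5], ['Michael', 6],
                    ['Larry', 3], ['Magic', 5], ['Tim', 5]])
df_1 = pd.DataFrame(array_1, columns=['Player', 'Rings'])

# Data from friend (as in the snippet).
array_2 = np.array([['LeBron', 3], ['Kobe', 3], ['Michael', 6],
                    ['Larry', 5], ['Magic', 5], ['Tim', 4]])
df_2 = pd.DataFrame(array_2, columns=['Player', 'Rings'])

# Element-wise equality: a boolean frame with the same shape as the inputs.
matches = df_1.eq(df_2)

# True for rows where every column matches.
rows_equal = matches.all(axis=1)
print(matches)
print(rows_equal)
```

Because np.array stores mixed strings and numbers as strings, the Rings values are compared as text here. Building the frames from dictionaries or lists of records would keep them numeric.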