pandas merge on multiple columns with different names

Pandas join on columns with different names [duplicate]

stackoverflow.com › questions › 40570143 › pandas-join-on-columns-with-different-names

When the names are different, use the left_on and right_on parameters instead of on=:

pd.merge(df1, df2,
         left_on=['userid', 'column1'],
         right_on=['username', 'column1'],
         how='left')

Answer from Zeugma on Stack Overflow
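A minimal runnable sketch of the call above may help. The column names userid, username and column1 come from the answer; the sample values and the score and city columns are hypothetical and only there to make the result visible.

```python
import pandas as pd

# Hypothetical sample data; only the key column names come from the answer.
df1 = pd.DataFrame({
    "userid": [1, 2, 3],
    "column1": ["a", "b", "c"],
    "score": [10, 20, 30],
})
df2 = pd.DataFrame({
    "username": [1, 2, 4],
    "column1": ["a", "x", "c"],
    "city": ["Oslo", "Rome", "Lima"],
})

# Keys are paired by position: userid <-> username, column1 <-> column1.
merged = pd.merge(
    df1, df2,
    left_on=["userid", "column1"],
    right_on=["username", "column1"],
    how="left",
)
print(merged)
```

Only the first row matches on both keys, so the other two rows keep their left-hand values and get NaN in the columns that come from df2.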


2 of 2

An alternative approach is to use join setting the index of the right hand side DataFrame to the columns ['username', 'column1']:

df1.join(df2.set_index(['username', 'column1']), on=['userid', 'column1'], how='left')

The output of this join merges the matched keys from the two differently named key columns, userid and username, into a single column named after the key column of df1, userid; whereas the output of the merge maintains the two as separate columns. To illustrate, consider the following example:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'ID': [1,2,3,4,5,6], 'pID' : [21,22,23,24,25,26], 'Values' : [435,33,45,np.nan,np.nan,12]})
##    ID  Values  pID
## 0   1   435.0   21
## 1   2    33.0   22
## 2   3    45.0   23
## 3   4     NaN   24
## 4   5     NaN   25
## 5   6    12.0   26

df2 = pd.DataFrame({'ID' : [4,4,5], 'pid' : [24,25,25], 'Values' : [544, 545, 676]})
##    ID  Values  pid
## 0   4     544   24
## 1   4     545   25
## 2   5     676   25

pd.merge(df1, df2, how='left', left_on=['ID', 'pID'], right_on=['ID', 'pid'])
##    ID  Values_x  pID  Values_y   pid
## 0   1     435.0   21       NaN   NaN
## 1   2      33.0   22       NaN   NaN
## 2   3      45.0   23       NaN   NaN
## 3   4       NaN   24     544.0  24.0
## 4   5       NaN   25     676.0  25.0
## 5   6      12.0   26       NaN   NaN

df1.join(df2.set_index(['ID','pid']), how='left', on=['ID','pID'], lsuffix='_x', rsuffix='_y')
##    ID  Values_x  pID  Values_y
## 0   1     435.0   21       NaN
## 1   2      33.0   22       NaN
## 2   3      45.0   23       NaN
## 3   4       NaN   24     544.0
## 4   5       NaN   25     676.0
## 5   6      12.0   26       NaN

Here, we also need to specify lsuffix and rsuffix in join to distinguish the overlapping column Values in the output. As one can see, the output of merge contains the extra pid column from the right hand side DataFrame, which IMHO is unnecessary given the context of the merge. Note also that the dtype for the pid column has changed to float64, which results from upcasting due to the NaNs introduced from the unmatched rows.
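If you stay with merge, the two side effects mentioned above can be handled afterwards. The sketch below reuses the example frames; dropping pid and casting to the nullable Int64 dtype are suggestions added here, not part of the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3, 4, 5, 6],
                    "pID": [21, 22, 23, 24, 25, 26],
                    "Values": [435, 33, 45, np.nan, np.nan, 12]})
df2 = pd.DataFrame({"ID": [4, 4, 5],
                    "pid": [24, 25, 25],
                    "Values": [544, 545, 676]})

merged = pd.merge(df1, df2, how="left",
                  left_on=["ID", "pID"], right_on=["ID", "pid"])

# Unmatched rows put NaN into pid, so the integer column is upcast to float64.
print(merged["pid"].dtype)

# Option 1 (assumption: pid is not needed): drop the redundant right-hand key.
tidy = merged.drop(columns="pid")

# Option 2: keep pid but use the nullable integer dtype, which can hold missing values.
nullable = merged.assign(pid=merged["pid"].astype("Int64"))
print(nullable.dtypes)
```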

This tidier output comes at a cost in performance, as the call to set_index on the right hand side DataFrame incurs some overhead. However, a quick and dirty profile shows that this is not too horrible: join is roughly 30% slower than merge here, which may be worth it:

sz = 1000000 # one million rows
df1 = pd.DataFrame({'ID': np.arange(sz), 'pID' : np.arange(0,2*sz,2), 'Values' : np.random.random(sz)})
df2 = pd.DataFrame({'ID': np.concatenate([np.arange(sz/2),np.arange(sz/2)]), 'pid' : np.arange(0,2*sz,2), 'Values' : np.random.random(sz)})

%timeit pd.merge(df1, df2, how='left', left_on=['ID', 'pID'], right_on=['ID', 'pid'])
## 818 ms ± 33.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df1.join(df2.set_index(['ID','pid']), how='left', on=['ID','pID'], lsuffix='_x', rsuffix='_y')
## 1.04 s ± 18.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
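The %timeit magic only works inside IPython or Jupyter. As a rough sketch for a plain Python script, the same comparison can be run with the standard timeit module. The size is reduced to 10,000 rows and integer division is used for the IDs; both are assumptions made here so the example runs quickly, and the timings will differ from the figures above.

```python
import timeit

import numpy as np
import pandas as pd

sz = 10_000  # far smaller than the one million rows above, so it runs quickly
rng = np.random.default_rng(0)
df1 = pd.DataFrame({"ID": np.arange(sz),
                    "pID": np.arange(0, 2 * sz, 2),
                    "Values": rng.random(sz)})
df2 = pd.DataFrame({"ID": np.concatenate([np.arange(sz // 2), np.arange(sz // 2)]),
                    "pid": np.arange(0, 2 * sz, 2),
                    "Values": rng.random(sz)})


def with_merge():
    return pd.merge(df1, df2, how="left",
                    left_on=["ID", "pID"], right_on=["ID", "pid"])


def with_join():
    return df1.join(df2.set_index(["ID", "pid"]), how="left",
                    on=["ID", "pID"], lsuffix="_x", rsuffix="_y")


merged, joined = with_merge(), with_join()

# Apart from the extra pid column, both calls should carry the same matched values.
same_values = np.array_equal(merged["Values_y"].to_numpy(),
                             joined["Values_y"].to_numpy(), equal_nan=True)

t_merge = timeit.timeit(with_merge, number=5)
t_join = timeit.timeit(with_join, number=5)
print(f"same values: {same_values}, merge: {t_merge:.3f}s, join: {t_join:.3f}s")
```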

Pandas

pandas.pydata.org › docs › reference › api › pandas.merge.html

pandas.merge — pandas 3.0.1 documentation

Since pandas 3.0, this method always returns a new object using a lazy copy mechanism that defers copies until necessary (Copy-on-Write). See the user guide on Copy-on-Write for more details. ... If True, adds a column to the output DataFrame called “_merge” with information on the source of each row. The column can be given a different name ...
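To illustrate the indicator option mentioned in the documentation excerpt, here is a small sketch. The frames and their values are hypothetical; the _merge column and its three labels are what pd.merge produces with indicator=True.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names.
left = pd.DataFrame({"userid": [1, 2, 3], "score": [10, 20, 30]})
right = pd.DataFrame({"username": [2, 3, 4], "city": ["Rome", "Lima", "Oslo"]})

result = pd.merge(left, right, left_on="userid", right_on="username",
                  how="outer", indicator=True)

# _merge is "left_only", "right_only" or "both" for each row.
counts = result["_merge"].value_counts()
print(result)
print(counts)
```

This is a quick way to see which keys failed to match when the key columns are named differently on each side.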

Discussions

Pandas merge multiple dataframes with different columns

Do you mean merge or concatenate? They're two different things. Your concatenation code stacks the dataframes either side-by-side or one on top of the other, depending on your axis argument. Merging means you're taking a common column in the dataframes and putting rows together based on the column values. I cannot replicate your issue, as illustrated by my toy code below, so it would be helpful if you posted a minimal reproducible example.

df1 = pandas.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pandas.DataFrame({"a": [4, 5, 6], "c": [7, 8, 9]})
df3 = pandas.DataFrame({"b": [11, 12, 13], "d": [2, 2, 2]})
d = {1: df1, 2: df2, 3: df3}
df = pandas.concat(d.values())

Gives:

     a     b    c    d
0  1.0   4.0  NaN  NaN
1  2.0   5.0  NaN  NaN
2  3.0   6.0  NaN  NaN
0  4.0   NaN  7.0  NaN
1  5.0   NaN  8.0  NaN
2  6.0   NaN  9.0  NaN
0  NaN  11.0  NaN  2.0
1  NaN  12.0  NaN  2.0
2  NaN  13.0  NaN  2.0
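The difference between the two operations can be seen side by side in a short sketch. The values in df2 are changed from the answer's toy data (an assumption made here) so that the two frames share some keys in column a.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 2, 4], "c": [7, 8, 9]})  # hypothetical overlap on a

# Concatenation stacks rows; columns missing from a frame are filled with NaN.
stacked = pd.concat([df1, df2], ignore_index=True, sort=False)

# Merging pairs up rows whose key values match.
joined = pd.merge(df1, df2, on="a", how="inner")

print(stacked)
print(joined)
```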

r/learnpython

December 16, 2022

python - Joining pandas DataFrames by Column names - Stack Overflow

I have two DataFrames with the following column names: frame_1: event_id, date, time, county_ID frame_2: countyid, state I would like to get a DataFrame with the following columns by left-joining... More on stackoverflow.com

stackoverflow.com

python - pandas merge on columns with different names and avoid duplicates - Stack Overflow

The only ways I can think of are either renaming the columns to be the same before the merge, or dropping one of them after the merge. It would be nice if pandas automatically dropped one of them, or if I could do something like More on stackoverflow.com

stackoverflow.com

April 25, 2017



GeeksforGeeks

geeksforgeeks.org › pandas › merge-join-two-dataframes-on-multiple-columns-in-pandas

Merge/Join Two Dataframes on Multiple Columns in Pandas - GeeksforGeeks

July 23, 2025 - Explanation: The suffixes=('_df1', '_df2') argument ensures overlapping column names are differentiated, which prevents confusion when both DataFrames have columns with identical names.

Statology

statology.org › home › pandas: how to merge two dataframes with different column names

Pandas: How to Merge Two DataFrames with Different Column Names

May 10, 2022 - pd.merge(df1, df2, left_on='left_column_name', right_on='right_column_name') The following example shows how to use this syntax in practice. Suppose we have the following two pandas DataFrames:

w3resource

w3resource.com › python-exercises › pandas › pandas-merge-dataframes-on-multiple-columns-with-different-names.php

Pandas - Merge DataFrames on multiple columns with different names

May 6, 2025 - Learn how to merge DataFrames on multiple columns where the column names differ in each DataFrame using pd.merge() with left_on and right_on.

Saturn Cloud

saturncloud.io › blog › how-to-perform-a-pandas-join-on-columns-with-different-names

How to Perform a Pandas Join on Columns with Different Names | Saturn Cloud Blog

November 14, 2023 - In summary, to perform a Pandas join on columns with different names: Identify the columns to use as the join keys. Use the left_on and right_on parameters of the merge() function to specify the columns to use from the left and right ...

Saturn Cloud

saturncloud.io › blog › how-to-merge-pandas-dataframes-with-different-column-names-and-avoid-duplicates

How to Merge Pandas DataFrames with Different Column Names and Avoid Duplicates | Saturn Cloud Blog

December 19, 2023 - In this article, we have explored how to merge DataFrames with different column names and avoid duplicates. By using the left_on and right_on parameters, we can merge DataFrames with different column names.


reddit.com › r/learnpython › pandas merge multiple dataframes with different columns

r/learnpython on Reddit: Pandas merge multiple dataframes with different columns

December 16, 2022 -

Using pandas, I'm trying to merge approx 150 dataframes from a dictionary into one dataframe. Not all of the files have the same number of columns. 90% of the column names are the same. My intention is to have Null/NaN where the data is absent for a given dataframe.

I tried examples shown here: https://stackoverflow.com/questions/28097222/pandas-merge-two-dataframes-with-different-columns and here: https://www.geeksforgeeks.org/pandas-merge-two-dataframes-with-different-columns/

df = pd.concat(d.values(), axis=0, ignore_index=True)

df = pd.concat(d.values(), ignore_index=True, sort = False)

But I keep getting the error below. Any help would be appreciated.

pandas.errors.ParserError: Error tokenizing data. C error: Expected 50 fields in line 5, saw 51

Top answer

1 of 5

2 of 5

Can you show examples of your data and what you'd expect the output to be for the sample? I think you're wanting to add rows, but the rows you add don't all have the same columns. Is that so?
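One observation on the error quoted in the question: pandas.errors.ParserError with "Error tokenizing data" is raised by the CSV parser, not by pd.concat, so the failure most likely happens while the files are being read. The sketch below reproduces that kind of error with a hypothetical in-memory file and shows the on_bad_lines option of read_csv (available since pandas 1.3) as one possible way to skip malformed rows.

```python
import io

import pandas as pd

# Hypothetical CSV text: the third line has one field too many.
raw = "a,b\n1,2\n3,4,5\n6,7\n"

message = None
try:
    pd.read_csv(io.StringIO(raw))
except pd.errors.ParserError as exc:
    message = str(exc)
print(message)

# Skipping malformed lines silently drops data, so inspect the files first.
clean = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(clean)
```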

Dasboardai

dasboardai.com › blog › pandas-merge-when-column-names-are-different-a-beginners-guide

Pandas Merge When Column Names Are Different

right_df.rename(columns={'client_id': 'customer_id'}, inplace=True)
merged_df = pd.merge(left_df, right_df, on='customer_id', how='inner')

Merging DataFrames with different column names is easy in Pandas!

Pandas

pandas.pydata.org › docs › user_guide › merging.html

Merge, join, concatenate and compare — pandas 3.0.2 documentation

The merge suffixes argument takes a tuple or list of strings to append to overlapping column names in the input DataFrame to disambiguate the result columns:
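As a small sketch of the suffixes behaviour described above, using hypothetical frames that share a non-key column called Values and have differently named key columns:

```python
import pandas as pd

# Hypothetical frames: Values exists on both sides, the keys are named differently.
left = pd.DataFrame({"userid": [1, 2], "Values": [10, 20]})
right = pd.DataFrame({"username": [1, 2], "Values": [0.1, 0.2]})

merged = pd.merge(left, right, left_on="userid", right_on="username",
                  suffixes=("_left", "_right"))
print(merged.columns.tolist())
```

Without the suffixes argument the overlapping column would get the default _x and _y endings.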

GeeksforGeeks

geeksforgeeks.org › python › pandas-merge-two-dataframes-with-different-columns

Pandas - Merge two dataframes with different columns - GeeksforGeeks

July 23, 2025 - # Merging based on 'Name' with an inner join result = pd.merge(df, df1, on='Name', how='inner') print(result) ... Empty DataFrame Columns: [Name, College_x, Physics_x, Chemistry_x, Data Science, College_y, Physics_y, Chemistry_y, Maths] Index: [] The join() method aligns DataFrames based on their index and is useful for side-by-side alignment. ... # DataFrame 1 import pandas as pd details = { 'Name': ['Sravan', 'Sai', 'Mohan', 'Ishitha'], 'College': ['Vignan', 'Vignan', 'Vignan', 'Vignan'], 'Physics': [99, 76, 71, 93], 'Chemistry': [97, 67, 65, 89], 'Data Science': [93, 65, 65, 85] } df = pd.D

Spark By {Examples}

sparkbyexamples.com › home › pandas › pandas merge dataframes on multiple columns

Pandas Merge DataFrames on Multiple Columns - Spark By {Examples}

June 13, 2025 - To merge two pandas DataFrames on multiple columns, you can use the merge() function and specify the columns to join on using the on parameter. This

KDnuggets

kdnuggets.com › 2023 › 01 › merge-pandas-dataframes.html

How to Merge Pandas DataFrames - KDnuggets

In the example above, we rename the second DataFrame's ‘Country’ column to ‘Index’, then merge the datasets by specifying the column name for each DataFrame. The left_on parameter is for the first DataFrame and right_on is for the second DataFrame. There are five different merge types in the Pandas merge method.

Stack Overflow

stackoverflow.com › questions › 20375561 › joining-pandas-dataframes-by-column-names

python - Joining pandas DataFrames by Column names - Stack Overflow

Top answer

1 of 3

296

You can use the left_on and right_on options of pd.merge as follows:

pd.merge(frame_1, frame_2, left_on='county_ID', right_on='countyid')

Or equivalently with DataFrame.merge:

frame_1.merge(frame_2, left_on='county_ID', right_on='countyid')

I was not sure from the question whether you wanted to keep every row of the left hand DataFrame even when its key has no match on the right. If that is the case then the following will do that (the calls above default to how='inner', which keeps only the keys found in both frames):

pd.merge(frame_1, frame_2, how='left', left_on='county_ID', right_on='countyid')

frame_1.merge(frame_2, how='left', left_on='county_ID', right_on='countyid')
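To make the effect of how= concrete, here is a sketch with made-up rows. The column names follow the question; the values are hypothetical, and county 9 deliberately has no counterpart in frame_2.

```python
import pandas as pd

# Hypothetical data; column names follow the question.
frame_1 = pd.DataFrame({
    "event_id": [101, 102, 103],
    "date": ["2013-12-01", "2013-12-02", "2013-12-03"],
    "time": ["10:00", "11:30", "09:15"],
    "county_ID": [1, 2, 9],
})
frame_2 = pd.DataFrame({"countyid": [1, 2, 3], "state": ["NY", "CA", "TX"]})

# Default how='inner': only events whose county exists in frame_2.
inner = pd.merge(frame_1, frame_2, left_on="county_ID", right_on="countyid")

# how='left': every event is kept, with NaN where no county matched.
left = pd.merge(frame_1, frame_2, how="left",
                left_on="county_ID", right_on="countyid")

print(len(inner), len(left))
```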

2 of 3

You need to make countyid the index of the right frame:

frame_1.join(frame_2.set_index(['countyid'], verify_integrity=True),
             on=['county_ID'], how='left')

For your information, in pandas a left join returns more rows than the left frame when the right frame has non-unique values in the joining column (see this bug).

So you need to verify integrity before joining, with verify_integrity=True.
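The effect of duplicate keys on the right-hand side can be checked directly. In the sketch below the data is hypothetical; validate="many_to_one" is a standard pd.merge option offered here as an alternative check, not something the answer itself proposes.

```python
import pandas as pd

# Hypothetical data: countyid 1 appears twice on the right-hand side.
frame_1 = pd.DataFrame({"event_id": [101, 102], "county_ID": [1, 2]})
dup = pd.DataFrame({"countyid": [1, 1, 2], "state": ["NY", "NJ", "CA"]})

# A left join repeats event 101 once for each matching right-hand row.
fanned_out = frame_1.merge(dup, how="left",
                           left_on="county_ID", right_on="countyid")
print(len(fanned_out))

# validate makes merge raise instead of silently multiplying rows.
merge_error = None
try:
    frame_1.merge(dup, how="left", left_on="county_ID",
                  right_on="countyid", validate="many_to_one")
except pd.errors.MergeError as exc:
    merge_error = str(exc)

# set_index(..., verify_integrity=True) rejects the duplicates as well.
index_error = None
try:
    dup.set_index("countyid", verify_integrity=True)
except ValueError as exc:
    index_error = str(exc)

print(merge_error)
print(index_error)
```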

Statology

statology.org › home › how to merge pandas dataframes on multiple columns

How to Merge Pandas DataFrames on Multiple Columns

August 27, 2020 - In this case we can simply use on=['a', 'b'] since the column names are the same in both DataFrames:

pd.merge(df1, df2, how='left', on=['a', 'b'])

   a  b   c     d
0  0  0  11  22.0
1  0  0   8  22.0
2  1  1  10  33.0
3  1  1   6  33.0
4  2  1   6   NaN

Stack Overflow

stackoverflow.com › questions › 39985861 › pandas-merge-on-columns-with-different-names-and-avoid-duplicates

python - pandas merge on columns with different names and avoid duplicates - Stack Overflow

Top answer

1 of 2

How about setting UserID as the index of the second data frame and then joining on that index?

pd.merge(df1, df2.set_index('UserID'), left_on='UserName', right_index=True)

#   Col1    UserName    Col2
# 0    a           1       d
# 1    b           2       e
# 2    c           3       f

2 of 2

There is no really neat way around it: merge keeps both key columns by design, because in the more general cases (left, right or outer joins) the two columns carry different information. Don't try to over-engineer your merge line; be explicit, as you suggest.

Solution 1:

df2.columns = ['Col2', 'UserName']

pd.merge(df1, df2,on='UserName')
Out[67]: 
  Col1  UserName Col2
0    a         1    d
1    b         2    e
2    c         3    f

Solution 2:

pd.merge(df1, df2, left_on='UserName', right_on='UserID').drop('UserID', axis=1)
Out[71]: 
  Col1  UserName Col2
0    a         1    d
1    b         2    e
2    c         3    f
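The answers above do not show how df1 and df2 were built. The sketch below reconstructs them from the printed output (an assumption) and runs the three approaches next to each other; rename is used in place of assigning df2.columns so that df2 itself is left unchanged.

```python
import pandas as pd

# Frames reconstructed from the printed results above (assumed, not from the question).
df1 = pd.DataFrame({"Col1": ["a", "b", "c"], "UserName": [1, 2, 3]})
df2 = pd.DataFrame({"Col2": ["d", "e", "f"], "UserID": [1, 2, 3]})

# Rename the right-hand key so both sides share one name, then merge on it.
via_rename = pd.merge(df1, df2.rename(columns={"UserID": "UserName"}),
                      on="UserName")

# Merge on the differently named keys, then drop the redundant one.
via_drop = pd.merge(df1, df2, left_on="UserName",
                    right_on="UserID").drop(columns="UserID")

# Move the right-hand key into the index so it never becomes a column.
via_index = pd.merge(df1, df2.set_index("UserID"),
                     left_on="UserName", right_index=True)

print(via_rename)
```

All three produce the columns Col1, UserName and Col2; which one reads best is largely a matter of taste.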

GeeksforGeeks

geeksforgeeks.org › pandas › pandas-merge-dataframe

Pandas Merge Dataframe - GeeksforGeeks

July 23, 2025 - This allows you to perform more complex merges when the matching criteria are based on multiple columns. Call the merge() function with a list of column names.

w3resource

w3resource.com › python-exercises › pandas › pandas-merge-dataframes-with-different-join-column-names.php

Pandas - Merge DataFrames with different join column names

May 6, 2025 - Learn how to merge two DataFrames with different column names for the join using pd.merge() with left_on and right_on.

Codepointtech

codepointtech.com › home › mastering pandas: how to merge dataframes with different column names

Mastering Pandas: How to Merge DataFrames with Different Column Names - codepointtech.com

January 17, 2026 - The most elegant and direct solution to merging DataFrames with different column names lies in the left_on and right_on parameters of the pd.merge() function. These parameters explicitly tell Pandas which column from the left DataFrame and which column from the right DataFrame to use for the ...

Stack Overflow

stackoverflow.com › questions › 41815079 › pandas-merge-join-two-data-frames-on-multiple-columns

python - pandas: merge (join) two data frames on multiple columns - Stack Overflow

Top answer

1 of 6

637

Try this

new_df = pd.merge(
    left=A_df, 
    right=B_df,
    how='left',
    left_on=['A_c1', 'c2'],
    right_on=['B_c1', 'c2'],
)

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

left_on : label or list, or array-like
Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns.

right_on : label or list, or array-like
Field names to join on in right DataFrame or vector/list of vectors per left_on docs.

2 of 6

1. It merges according to the ordering of left_on and right_on, i.e., the i-th element of left_on will match with the i-th of right_on.

   In the example below, the code on the top matches A_col1 with B_col1 and A_col2 with B_col2, while the code on the bottom matches A_col1 with B_col2 and A_col2 with B_col1. Evidently, the results are different.

2. As can be seen from the above example, if the merge keys have different names, all keys will show up as their individual columns in the merged dataframe. In the example above, in the top dataframe, A_col1 and B_col1 are identical and A_col2 and B_col2 are identical. In the bottom dataframe, A_col1 and B_col2 are identical and A_col2 and B_col1 are identical. Since these are duplicate columns, they are most likely not needed. One way to not have this problem from the beginning is to make the merge keys identical from the beginning. See bullet point #3 below.

3. If left_on and right_on are the same col1 and col2, we can use on=['col1', 'col2']. In this case, no merge keys are duplicated.
```
df1.merge(df2, on=['col1', 'col2'])
```

4. You can also merge one side on column names and the other side on index too. For example, in the example below, df1's columns are matched with df2's indices. If the indices are named, as in the example below, you can reference them by name but if not, you can also use right_index=True (or left_index=True if the left dataframe is the one being merged on index).
```
df1.merge(df2, left_on=['A_col1', 'A_col2'], right_index=True)
# or
df1.merge(df2, left_on=['A_col1', 'A_col2'], right_on=['B_col1', 'B_col2'])
```

5. By using the how= parameter, you can perform LEFT JOIN (how='left'), FULL OUTER JOIN (how='outer') and RIGHT JOIN (how='right') as well. The default is INNER JOIN (how='inner') as in the examples above.

6. If you have more than 2 dataframes to merge and the merge keys are the same across all of them, then join method is more efficient than merge because you can pass a list of dataframes and join on indices. Note that the index names are the same across all dataframes in the example below (col1 and col2). Note that the indices don't have to have names; if the indices don't have names, then the number of the multi-indices must match (in the case below there are 2 multi-indices). Again, as in bullet point #1, the match occurs according to the ordering of the indices.
```
df1.join([df2, df3], how='inner').reset_index()
```
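The positional pairing of left_on and right_on described in this answer can be checked with a small sketch. The column names A_col1, A_col2, B_col1 and B_col2 follow the answer; the values and the x and y columns are hypothetical.

```python
import pandas as pd

# Hypothetical values chosen so that the two pairings give different matches.
A_df = pd.DataFrame({"A_col1": [1, 2], "A_col2": [2, 1], "x": ["p", "q"]})
B_df = pd.DataFrame({"B_col1": [1, 2], "B_col2": [2, 1], "y": ["r", "s"]})

# A_col1 <-> B_col1 and A_col2 <-> B_col2
straight = A_df.merge(B_df, left_on=["A_col1", "A_col2"],
                      right_on=["B_col1", "B_col2"])

# A_col1 <-> B_col2 and A_col2 <-> B_col1
crossed = A_df.merge(B_df, left_on=["A_col1", "A_col2"],
                     right_on=["B_col2", "B_col1"])

print(straight[["x", "y"]])
print(crossed[["x", "y"]])
```

Swapping the order inside right_on changes which rows are paired, so the two lists should be kept in the same order.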