pandas merge on multiple columns with or condition

How to merge dataframes based on an "OR" condition

stackoverflow.com › questions › 45869886 › how-to-merge-dataframes-based-on-an-or-condition

Use merge() and concat(). Then drop any duplicate cases where both A and B match (thanks @Scott Boston for that final step).

import pandas as pd

df1 = pd.DataFrame({'A':[3,2,1,4], 'B':[7,8,9,5]})
df2 = pd.DataFrame({'A':[1,5,6,4], 'B':[4,1,8,5]})

df1         df2
   A  B        A  B
0  3  7     0  1  4
1  2  8     1  5  1
2  1  9     2  6  8
3  4  5     3  4  5

With these data frames we should see:

df1.loc[2] matches A on df2.loc[0]
df1.loc[1] matches B on df2.loc[2]
df1.loc[3] matches both A and B on df2.loc[3]

We'll use suffixes to keep track of what matched where:

suff_A = ['_on_A_match_1', '_on_A_match_2']
suff_B = ['_on_B_match_1', '_on_B_match_2']

df = pd.concat([df1.merge(df2, on='A', suffixes=suff_A), 
                df1.merge(df2, on='B', suffixes=suff_B)])

     A  A_on_B_match_1  A_on_B_match_2    B  B_on_A_match_1  B_on_A_match_2
0  1.0             NaN             NaN  NaN             9.0             4.0
1  4.0             NaN             NaN  NaN             5.0             5.0
0  NaN             2.0             6.0  8.0             NaN             NaN
1  NaN             4.0             4.0  5.0             NaN             NaN

Note that the second and fourth rows are duplicate matches (for both data frames, A = 4 and B = 5). We need to remove one of those sets.

# True for rows from the merge on A whose B values also match; comparing the A_on_B_match columns would work equally well
duplicates = (df.B_on_A_match_1 == df.B_on_A_match_2)
df.loc[~duplicates]

     A  A_on_B_match_1  A_on_B_match_2    B  B_on_A_match_1  B_on_A_match_2
0  1.0             NaN             NaN  NaN             9.0             4.0
0  NaN             2.0             6.0  8.0             NaN             NaN
1  NaN             4.0             4.0  5.0             NaN             NaN

Answer from andrew_reece on Stack Overflow
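
A minimal sketch of the same OR merge without the suffix bookkeeping, offered as one possible variant rather than the answer's own method. It carries each frame's row label through both merges, so a pair that matches on both A and B collapses under drop_duplicates(). The names left_row, right_row, pairs, and result are illustrative assumptions and do not appear in the answer.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [3, 2, 1, 4], 'B': [7, 8, 9, 5]})
df2 = pd.DataFrame({'A': [1, 5, 6, 4], 'B': [4, 1, 8, 5]})

# Expose each frame's row label as a column (hypothetical names).
left = df1.rename_axis('left_row').reset_index()
right = df2.rename_axis('right_row').reset_index()

# One inner merge per key; keep only the (left_row, right_row) pairs.
on_a = left.merge(right, on='A', suffixes=('_l', '_r'))[['left_row', 'right_row']]
on_b = left.merge(right, on='B', suffixes=('_l', '_r'))[['left_row', 'right_row']]

# A pair found by both merges appears twice; drop_duplicates keeps one copy.
pairs = (pd.concat([on_a, on_b], ignore_index=True)
           .drop_duplicates()
           .sort_values(['left_row', 'right_row'])
           .reset_index(drop=True))

# Attach the original values from both frames to each matched pair.
result = (pairs.merge(left, on='left_row')
               .merge(right, on='right_row', suffixes=('_1', '_2')))
print(result)
```

Working with row-label pairs keeps the deduplication independent of the column values, which may help when a frame contains repeated values of its own.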

Pandas

pandas.pydata.org › docs › user_guide › merging.html

Merge, join, concatenate and compare — pandas 3.0.2 documentation

Users who are familiar with SQL but new to pandas can reference a comparison with SQL. merge() implements common SQL style joining operations. one-to-one: joining two DataFrame objects on their indexes which must contain unique values. many-to-one: joining a unique index to one or more columns in ...

Stack Overflow

stackoverflow.com › questions › 45869886 › how-to-merge-dataframes-based-on-an-or-condition

python - How to merge dataframes based on an "OR" condition - Stack Overflow

Top answer

1 of 3

22

2 of 3

4

I would suggest an alternative way of doing a merge like this, which seems easier to me.

table1["id_to_be_merged"] = table1.apply(
    lambda row: row["ShipNumber"] if pd.notnull(row["ShipNumber"]) else row["TrackNumber"], axis=1)

You can add the same column to table2 as well if needed, and then pass it to left_on or right_on depending on your requirement.
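
A small sketch of this fallback-key idea, assuming invented data: the contents of table1 and table2, the qty and status columns, and the sample IDs are all hypothetical. It uses Series.fillna() as a vectorized stand-in for the row-wise apply() above, which should give the same key column for this kind of data.

```python
import pandas as pd

# Hypothetical data: some rows have a ShipNumber, others only a TrackNumber.
table1 = pd.DataFrame({
    'ShipNumber': ['S1', None, 'S3'],
    'TrackNumber': ['T1', 'T2', None],
    'qty': [10, 20, 30],
})
table2 = pd.DataFrame({
    'id_to_be_merged': ['S1', 'T2', 'S9'],
    'status': ['sent', 'held', 'lost'],
})

# Use ShipNumber when present, otherwise fall back to TrackNumber.
table1['id_to_be_merged'] = table1['ShipNumber'].fillna(table1['TrackNumber'])

# An ordinary single-key merge now covers both cases.
merged = table1.merge(table2, on='id_to_be_merged', how='left')
print(merged)
```

Note that this is a "first available key" merge, not a full OR: a row with both numbers present is only ever matched on ShipNumber.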

Discussions

python - Joining two pandas dataframes based on multiple conditions - Stack Overflow

df_a and df_b are two dataframes that looks like following df_a A B C D E x1 Apple 0.3 0.9 0.6 x1 Orange 0.1 0.5 0.2 x2 Apple 0.2 0.2 0.1 x2 Orange 0.3 ... More on stackoverflow.com

stackoverflow.com

how to combine two columns with an if/else in python pandas? - Stack Overflow

I am very new to Pandas (i.e., less than 2 days). However, I can't seem to figure out the right syntax for combining two columns with an if/else condition. Actually, I did figure out one way to d... More on stackoverflow.com

stackoverflow.com



pandas.pydata.org › docs › reference › api › pandas.merge.html

pandas.merge — pandas 3.0.1 documentation

The column will have a Categorical type with the value of “left_only” for observations whose merge key only appears in the left DataFrame, “right_only” for observations whose merge key only appears in the right DataFrame, and “both” if the observation’s merge key is found in both DataFrames. ... If specified, checks if merge is of specified type. “one_to_one” or “1:1”: check if merge keys are unique in both left and right datasets.
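
The two behaviours described in this excerpt correspond to the indicator and validate parameters of pandas.merge. The following is a rough sketch on invented frames (left, right, and their column names are assumptions) showing the `_merge` column and a failed one-to-one check.

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'left_val': ['a', 'b', 'c']})
right = pd.DataFrame({'key': [2, 3, 4], 'right_val': ['x', 'y', 'z']})

# indicator=True adds a categorical '_merge' column recording where each key came from.
merged = pd.merge(left, right, on='key', how='outer',
                  indicator=True, validate='one_to_one')
origin = merged.set_index('key')['_merge'].astype(str).to_dict()
print(origin)

# validate raises MergeError when the keys are not unique as promised.
dup_right = pd.concat([right, right.iloc[[0]]], ignore_index=True)
try:
    pd.merge(left, dup_right, on='key', validate='one_to_one')
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)
```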

GeeksforGeeks

geeksforgeeks.org › pandas › merge-join-two-dataframes-on-multiple-columns-in-pandas

Merge/Join Two Dataframes on Multiple Columns in Pandas - GeeksforGeeks

July 23, 2025 - Explanation: The common columns ... where both columns match in both DataFrames. The merge() function in Pandas is used to combine two DataFrames based on one or more keys....

Stack Overflow

stackoverflow.com › questions › 53549492 › joining-two-pandas-dataframes-based-on-multiple-conditions

python - Joining two pandas dataframes based on multiple conditions - Stack Overflow

Top answer

1 of 2

24

You need an inner merge, specifying both merge columns in each case:

res = df_a.merge(df_b, how='inner', left_on=['A', 'B'], right_on=['A', 'B_new'])

print(res)

    A       B    C    D    E   B_new    F
0  x1   Apple  0.3  0.9  0.6   Apple  0.3
1  x1  Orange  0.1  0.5  0.2  Orange  0.1
2  x2   Apple  0.2  0.2  0.1   Apple  0.2
3  x2  Orange  0.3  0.4  0.9  Orange  0.3
4  x2   Mango  0.1  0.2  0.3   Mango  0.1
5  x3  Orange  0.3  0.1  0.2  Orange  0.3

2 of 2

2

You can also achieve this with a left join, which keeps every row of df_a:

final_df = pd.merge(df_a, df_b[['A', 'B_new','F']], how="left", left_on=['A', 'B'], right_on=['A', 'B_new'])
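
To compare the two answers, here is a minimal sketch on invented data. The frames below only borrow the column names A, B, C, B_new, and F from the question; the values are assumptions. It shows that the inner merge drops the unmatched row while the left merge keeps it with NaN.

```python
import pandas as pd

# Invented rows; only the column names follow the question.
df_a = pd.DataFrame({'A': ['x1', 'x1', 'x2'],
                     'B': ['Apple', 'Orange', 'Kiwi'],
                     'C': [0.3, 0.1, 0.2]})
df_b = pd.DataFrame({'A': ['x1', 'x1', 'x2'],
                     'B_new': ['Apple', 'Orange', 'Mango'],
                     'F': [0.5, 0.7, 0.9]})

# Both key pairs must match: A with A, and B with B_new.
inner = df_a.merge(df_b, how='inner', left_on=['A', 'B'], right_on=['A', 'B_new'])

# The left merge keeps ('x2', 'Kiwi') and fills F with NaN.
left = df_a.merge(df_b, how='left', left_on=['A', 'B'], right_on=['A', 'B_new'])
left = left.drop(columns='B_new')  # B_new repeats B wherever a match exists

print(inner)
print(left)
```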

Bioinformatics Answers

biostars.org › p › 168010

Conditionally merge columns using Pandas

Top answer

1 of 1

1

This code works:

df['Type'] = np.where(df['Type'] == 4, "partial"+df['Program']+"_"+df['Breadth'],df['Type'])

Other solutions are welcome!
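
A self-contained sketch of this np.where pattern on made-up rows (the values in Type, Program, and Breadth are assumptions). It also converts Type to strings first, which is a change from the answer's line, so that the resulting column holds a single type.

```python
import numpy as np
import pandas as pd

# Made-up rows; only the column names come from the answer.
df = pd.DataFrame({'Type': [1, 4, 2],
                   'Program': ['X', 'Y', 'Z'],
                   'Breadth': ['wide', 'narrow', 'wide']})

# Where Type is 4, build a label from the other columns; otherwise keep Type as text.
df['Type'] = np.where(df['Type'] == 4,
                      'partial' + df['Program'] + '_' + df['Breadth'],
                      df['Type'].astype(str))
print(df)
```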

GeeksforGeeks

geeksforgeeks.org › python › merge-two-pandas-dataframes-with-complex-conditions

Merge two Pandas DataFrames with complex conditions - GeeksforGeeks

July 23, 2025 -

# importing pandas as pd
import pandas as pd

# creating dataframes
df1 = pd.DataFrame({'Name': ['John', 'Tom', 'Simon', 'Jose'], 'Age': [5, 6, 4, 5]})
df2 = pd.DataFrame({'Name': ['John', 'Tom', 'Jose'], 'Class': ['Second', 'Third', 'Fifth']})

# merging df1 and df2 with merge function
# with the common column as Name
df = pd.merge(df1, df2, on='Name')
print(df)

...

Merging two data frames with all the values in the first data frame and NaN for the not matched values from the second data frame. The same can be done to merge with all values of the second data frame what we have to do is just giv

Spark By {Examples}

sparkbyexamples.com › home › pandas › pandas merge dataframes on multiple columns

Pandas Merge DataFrames on Multiple Columns - Spark By {Examples}

June 13, 2025 - To merge two pandas DataFrames on multiple columns, you can use the merge() function and specify the columns to join on using the on parameter. This

Real Python

realpython.com › pandas-merge-join-and-concat

Combining Data in pandas With merge(), .join(), and concat() – Real Python

February 7, 2023 - If you check on the original DataFrames, then you can verify whether the higher-level axis labels temp and precip were added to the appropriate rows. You’ve now learned the three most important techniques for combining data in pandas: merge() for combining data on common columns or indices

Nelsontang

nelsontang.com › blog › 2021-11-25-pandas-conditional-merging

Python Pandas - Merging With Wildcards and Conditions – banditkings

November 25, 2021 -

# Instead of one long string, we can break up the query into multiple parts
condition1 = "(brand_x == brand_y | brand_y=='ANY')"
condition2 = "(start_date <= date <= end_date)"
qry = condition1 + " & " + condition2
results = results.query(qry)

Which gives us our result (skipping the part where you drop some columns for clarity).

Since we had 4 rows in the left dataframe and 2 in the right, the result of the Cartesian Product is 4 × 2 or 8 rows long.
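
The cross-join-then-filter idea can also express a plain OR condition. The sketch below is an illustration on two small invented frames, df1 and df2 with columns A and B, and not the blog's own example. It assumes a pandas version that supports how='cross' (1.2 or later).

```python
import pandas as pd

df1 = pd.DataFrame({'A': [3, 2, 1, 4], 'B': [7, 8, 9, 5]})
df2 = pd.DataFrame({'A': [1, 5, 6, 4], 'B': [4, 1, 8, 5]})

# Cartesian product: every row of df1 paired with every row of df2 (4 x 4 = 16 rows).
crossed = df1.merge(df2, how='cross', suffixes=('_x', '_y'))

# Keep the pairs where either key matches.
matched = crossed.query('(A_x == A_y) | (B_x == B_y)').reset_index(drop=True)
print(matched)
```

The intermediate frame grows with the product of the two row counts, so this approach is probably best kept for small inputs.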

GeeksforGeeks

geeksforgeeks.org › data analysis › how-to-merge-dataframes-based-on-an-or-condition

How to merge dataframes based on an "OR" condition - GeeksforGeeks

July 23, 2025 -

Merge based on condition df1['A'] == df2['A']:

   A  B_x     C  B_y      D
0  1    a  10.0  NaN    NaN
1  2    b  20.0  NaN    NaN
2  3    c  30.0    c  300.0
3  4    d  40.0    d  400.0
4  5  NaN   NaN    e  500.0
5  6  NaN   NaN    f  600.0

Merge based on condition df1['B'] == df2['B']:

   A_x  B     C  A_y      D
0  1.0  a  10.0  NaN    NaN
1  2.0  b  20.0  NaN    NaN
2  3.0  c  30.0  3.0  300.0
3  4.0  d  40.0  4.0  400.0
4  NaN  e   NaN  5.0  500.0
5  NaN  f   NaN  6.0  600.0

Next, we concatenate the results of the individual merges:

Pandas

pandas.pydata.org › pandas-docs › stable › reference › api › pandas.merge.html

pandas.merge — pandas 3.0.2 documentation

Otherwise if joining indexes on ... a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed. ... If both key columns contain rows where the key is a null value, those rows will be matched against each other. This is different from usual SQL join behaviour and can lead to unexpected results. ... First pandas object to ...
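
The null-key behaviour mentioned in this excerpt can be seen in a short sketch. The frames and column names below are invented, and dropping null keys before merging is just one possible way to get SQL-like behaviour, not something the documentation prescribes here.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'key': [1.0, np.nan], 'left_val': ['a', 'b']})
right = pd.DataFrame({'key': [np.nan, 2.0], 'right_val': ['x', 'y']})

# The NaN key on the left is matched against the NaN key on the right.
with_nulls = left.merge(right, on='key')
print(with_nulls)

# Dropping null keys first avoids that match.
without_nulls = (left.dropna(subset=['key'])
                     .merge(right.dropna(subset=['key']), on='key'))
print(without_nulls)
```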

Stack Overflow

stackoverflow.com › questions › 13596419 › how-to-combine-two-columns-with-an-if-else-in-python-pandas

how to combine two columns with an if/else in python pandas? - Stack Overflow

Top answer

1 of 2

18

In pandas >= 0.10.0 try

df['year'] = source_years.where(source_years != 0, df['year'])

and see:

http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-where-method-and-masking

As noted in the comments, this DOES use np.where under the hood - the difference is that pandas aligns the series with the output (so for example you can only do a partial update)

2 of 2

10

Perhaps try np.where:

import numpy as np
df['year'] = np.where(source_years,source_years,df['year'])
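
A small sketch comparing the two suggestions on invented values (the contents of df and source_years are assumptions). Both take source_years where it is nonzero and fall back to the existing year otherwise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1999, 2005, 2010]})
source_years = pd.Series([0, 2006, 0])

# pandas version: keep source_years where the condition holds, else use df['year'].
via_where = source_years.where(source_years != 0, df['year'])

# NumPy version: same selection, but returns a plain array without index alignment.
via_np = np.where(source_years != 0, source_years, df['year'])

df['year'] = via_where
print(df)
```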

Stack Overflow

stackoverflow.com › questions › 41815079 › pandas-merge-join-two-data-frames-on-multiple-columns

python - pandas: merge (join) two data frames on multiple columns - Stack Overflow

Top answer

1 of 6

637

Try this

new_df = pd.merge(
    left=A_df, 
    right=B_df,
    how='left',
    left_on=['A_c1', 'c2'],
    right_on=['B_c1', 'c2'],
)

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

left_on : label or list, or array-like Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns

right_on : label or list, or array-like Field names to join on in right DataFrame or vector/list of vectors per left_on docs

2 of 6

28

1. It merges according to the ordering of left_on and right_on, i.e., the i-th element of left_on will match with the i-th of right_on.

   In the example below, the code on the top matches A_col1 with B_col1 and A_col2 with B_col2, while the code on the bottom matches A_col1 with B_col2 and A_col2 with B_col1. Evidently, the results are different.

2. As can be seen from the above example, if the merge keys have different names, all keys will show up as their individual columns in the merged dataframe. In the example above, in the top dataframe, A_col1 and B_col1 are identical and A_col2 and B_col2 are identical. In the bottom dataframe, A_col1 and B_col2 are identical and A_col2 and B_col1 are identical. Since these are duplicate columns, they are most likely not needed. One way to not have this problem from the beginning is to make the merge keys identical from the beginning. See bullet point #3 below.

3. If left_on and right_on are the same col1 and col2, we can use on=['col1', 'col2']. In this case, no merge keys are duplicated.
```
df1.merge(df2, on=['col1', 'col2'])
```

4. You can also merge one side on column names and the other side on index too. For example, in the example below, df1's columns are matched with df2's indices. If the indices are named, as in the example below, you can reference them by name but if not, you can also use right_index=True (or left_index=True if the left dataframe is the one being merged on index).
```
df1.merge(df2, left_on=['A_col1', 'A_col2'], right_index=True)
# or
df1.merge(df2, left_on=['A_col1', 'A_col2'], right_on=['B_col1', 'B_col2'])
```

5. By using the how= parameter, you can perform LEFT JOIN (how='left'), FULL OUTER JOIN (how='outer') and RIGHT JOIN (how='right') as well. The default is INNER JOIN (how='inner') as in the examples above.

6. If you have more than 2 dataframes to merge and the merge keys are the same across all of them, then join method is more efficient than merge because you can pass a list of dataframes and join on indices. Note that the index names are the same across all dataframes in the example below (col1 and col2). Note that the indices don't have to have names; if the indices don't have names, then the number of the multi-indices must match (in the case below there are 2 multi-indices). Again, as in bullet point #1, the match occurs according to the ordering of the indices.
```
df1.join([df2, df3], how='inner').reset_index()
```
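
The examples this answer refers to did not survive in the text, so here is a hedged stand-in on invented frames. It illustrates two of the points above: the pairing of left_on with right_on is positional, and join() can combine several frames that share an index. All values, and the names straight, swapped, t1, t2, t3, and joined, are assumptions.

```python
import pandas as pd

df1 = pd.DataFrame({'A_col1': [1, 2], 'A_col2': [2, 1], 'x': ['p', 'q']})
df2 = pd.DataFrame({'B_col1': [1, 2], 'B_col2': [2, 2], 'y': ['r', 's']})

# A_col1 pairs with B_col1 and A_col2 with B_col2.
straight = df1.merge(df2, left_on=['A_col1', 'A_col2'], right_on=['B_col1', 'B_col2'])

# Swapping the right_on order pairs A_col1 with B_col2 and A_col2 with B_col1.
swapped = df1.merge(df2, left_on=['A_col1', 'A_col2'], right_on=['B_col2', 'B_col1'])

print(straight)
print(swapped)

# Joining several frames at once on a shared two-level index.
keys = ['col1', 'col2']
t1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c'], 'x': [10, 20, 30]}).set_index(keys)
t2 = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b'], 'y': [1, 2]}).set_index(keys)
t3 = pd.DataFrame({'col1': [2, 3], 'col2': ['b', 'c'], 'z': [5, 6]}).set_index(keys)

joined = t1.join([t2, t3], how='inner').reset_index()
print(joined)
```

In this sketch the two merges each return a single but different row, and the inner join keeps only the index entry present in all three frames.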

reddit.com › r/learnpython › can i merge two pandas dataframes with complex conditions.

r/learnpython on Reddit: Can I merge two Pandas DataFrames with complex conditions.

July 14, 2021 -

I have two demo DataFrames as follows:

a =

Name	Value
A	1
B	2
C	3

b =

Name_ID	position_X	position_Y
A	x1	y1
A(a)	x2	y2
B	x3	y3
C_1	x4	y4
C; C_1	x5	y5

The elements in the Name and Name_ID columns actually represent the same thing. For better understanding, I mean: A = A, A(a); B = B; C = C_1, C; C_1

When I try to merge the dataframes with the code below, it fails.

pd.merge(a, b, left_on='Name', right_on='NameID', how='left')

The result I want is actually:

Name	position_X	position_Y	Value
A	x1	y1	1
A(a)	x2	y2	1
B	x3	y3	2
C_1	x4	y4	3
C; C_1	x5	y5	3

Is there a way to write conditions in merge to say something like:

for x in a['Name']:
    for y in b['Name_ID']:
        if x in y:
            y = x

and merge into the same dataframe?
Thank you in advance!

Top answer

1 of 3

2

For merge to work there must be matching key columns. You can extract the 'Name' values from 'Name_ID' to get a valid key column to merge on.

from io import StringIO
import pandas as pd

# Tab-separated input, written with \t escapes so the separators are explicit
a = """Name\tValue
A\t1
B\t2
C\t3
"""

b = """Name_ID\tposition_X\tposition_Y
A\tx1\ty1
A(a)\tx2\ty2
B\tx3\ty3
C_1\tx4\ty4
C; C_1\tx5\ty5
"""

df_a = pd.read_csv(StringIO(a), sep='\t')
df_b = pd.read_csv(StringIO(b), sep='\t')

# The first capital letter of Name_ID becomes the merge key
df_b['Name'] = df_b['Name_ID'].str.extract(r'([A-Z])', expand=False)

df = df_a.merge(df_b, how='left')
print(df)

  Name  Value Name_ID position_X position_Y
0    A      1       A         x1         y1
1    A      1    A(a)         x2         y2
2    B      2       B         x3         y3
3    C      3     C_1         x4         y4
4    C      3  C; C_1         x5         y5

2 of 3

1

firstly its 'Name_ID' not NameID :P. if u use "right_on='NameID' " it will explode. so what exactly are u trying to get ?

Stack Exchange

datascience.stackexchange.com › questions › 42131 › joining-two-dataframes-on-the-basis-of-specific-conditions

python - Joining two dataframes on the basis of specific conditions - Data Science Stack Exchange

Top answer

1 of 1

2

See the following link for your problem:

https://www.shanelynn.ie/merge-join-dataframes-python-pandas-index-1/

Pandas

pandas.pydata.org › docs › reference › api › pandas.DataFrame.combine.html

pandas.DataFrame.combine — pandas 3.0.2 documentation

Combines a DataFrame with other DataFrame using func to element-wise combine columns. The row and column indexes of the resulting DataFrame will be the union of the two. ... The DataFrame to merge column-wise.

GeeksforGeeks

geeksforgeeks.org › merge-two-pandas-dataframes-on-certain-columns

Merge two Pandas DataFrames on certain columns - GeeksforGeeks

November 28, 2024 - Solution #1: We can use a simple indexing operation to select all the values in the column that satisfy the given condition.

# importing pandas as pd
import pandas as pd
# Creat

Saturn Cloud

saturncloud.io › blog › how-to-merge-join-two-data-frames-on-multiple-columns-using-pandas

How to Merge Two Data Frames on Multiple Columns using Pandas | Saturn Cloud Blog

December 19, 2023 - The key1 and key2 columns are the common columns that we will use for merging the data frames. Once we have created the data frames, we can merge them using the merge() function in Pandas.