Use merge() and concat(). Then drop any duplicate cases where both A and B match (thanks @Scott Boston for that final step).
import pandas as pd

df1 = pd.DataFrame({'A':[3,2,1,4], 'B':[7,8,9,5]})
df2 = pd.DataFrame({'A':[1,5,6,4], 'B':[4,1,8,5]})
   df1         df2
   A  B        A  B
0  3  7     0  1  4
1  2  8     1  5  1
2  1  9     2  6  8
3  4  5     3  4  5
With these data frames we should see:
df1.loc[2] matches A on df2.loc[0]
df1.loc[1] matches B on df2.loc[2]
df1.loc[3] matches both A and B on df2.loc[3]
We'll use suffixes to keep track of what matched where:
suff_A = ['_on_A_match_1', '_on_A_match_2']
suff_B = ['_on_B_match_1', '_on_B_match_2']
df = pd.concat([df1.merge(df2, on='A', suffixes=suff_A),
                df1.merge(df2, on='B', suffixes=suff_B)])
     A  A_on_B_match_1  A_on_B_match_2    B  B_on_A_match_1  B_on_A_match_2
0  1.0             NaN             NaN  NaN             9.0             4.0
1  4.0             NaN             NaN  NaN             5.0             5.0
0  NaN             2.0             6.0  8.0             NaN             NaN
1  NaN             4.0             4.0  5.0             NaN             NaN
Note that the second and fourth rows are duplicate matches (for both data frames, A = 4 and B = 5). We need to remove one of those sets.
duplicates = (df.B_on_A_match_1 == df.B_on_A_match_2) # also could remove A_on_B_match
df.loc[~duplicates]
     A  A_on_B_match_1  A_on_B_match_2    B  B_on_A_match_1  B_on_A_match_2
0  1.0             NaN             NaN  NaN             9.0             4.0
0  NaN             2.0             6.0  8.0             NaN             NaN
1  NaN             4.0             4.0  5.0             NaN             NaN
Answer from andrew_reece on Stack Overflow
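For reference, the steps above can be collected into one runnable sketch. It uses the same df1 and df2 as the answer. The one change is ignore_index=True in concat(), an assumption added here so that the combined frame has unique row labels before the boolean filter is applied. Column order in the printed result may differ between pandas versions.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [3, 2, 1, 4], 'B': [7, 8, 9, 5]})
df2 = pd.DataFrame({'A': [1, 5, 6, 4], 'B': [4, 1, 8, 5]})

suff_A = ['_on_A_match_1', '_on_A_match_2']
suff_B = ['_on_B_match_1', '_on_B_match_2']

# Inner merge on A, then on B, and stack the two results.
# ignore_index=True is an addition for this sketch (unique row labels).
df = pd.concat([df1.merge(df2, on='A', suffixes=suff_A),
                df1.merge(df2, on='B', suffixes=suff_B)],
               ignore_index=True)

# A row from the A-merge whose two B values agree also appears in the B-merge,
# so it is dropped here and kept once via the B-merge.
duplicates = df['B_on_A_match_1'] == df['B_on_A_match_2']
result = df.loc[~duplicates]
print(result)
```

Rows that came from the B-merge have NaN in both B_on_A_match columns, so the comparison is False for them and they are kept.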
I would suggest an alternative way to do a merge like this; it seems easier to me.
table1["id_to_be_merged"] = table1.apply(
    lambda row: row["ShipNumber"] if pd.notnull(row["ShipNumber"]) else row["TrackNumber"], axis=1)
You can add the same column to table2 as well if needed, and then use it in left_on or right_on, depending on your requirement.
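The same fallback key can probably be built without a row-wise apply. The sketch below swaps in Series.fillna, which takes ShipNumber where it is present and TrackNumber otherwise. The sample rows, table2, and its status column are hypothetical; only ShipNumber, TrackNumber, and id_to_be_merged come from the answer above.

```python
import pandas as pd

# Hypothetical tables for illustration.
table1 = pd.DataFrame({
    "ShipNumber": ["S1", None, "S3"],
    "TrackNumber": ["T1", "T2", None],
})
table2 = pd.DataFrame({
    "id_to_be_merged": ["S1", "T2"],
    "status": ["shipped", "in transit"],
})

# Use ShipNumber where present, otherwise fall back to TrackNumber.
table1["id_to_be_merged"] = table1["ShipNumber"].fillna(table1["TrackNumber"])

# Left merge keeps every row of table1, matched or not.
merged = table1.merge(table2, on="id_to_be_merged", how="left")
print(merged)
```

On large tables the vectorized fillna is usually faster than apply with axis=1, since it avoids calling a Python function per row.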
You need an inner merge, specifying both merge columns in each case:
res = df_a.merge(df_b, how='inner', left_on=['A', 'B'], right_on=['A', 'B_new'])
print(res)
    A       B    C    D    E   B_new    F
0  x1   Apple  0.3  0.9  0.6   Apple  0.3
1  x1  Orange  0.1  0.5  0.2  Orange  0.1
2  x2   Apple  0.2  0.2  0.1   Apple  0.2
3  x2  Orange  0.3  0.4  0.9  Orange  0.3
4  x2   Mango  0.1  0.2  0.3   Mango  0.1
5  x3  Orange  0.3  0.1  0.2  Orange  0.3
You can also achieve this with a left join, which additionally keeps the rows of df_a that have no match in df_b.
See below:
final_df = pd.merge(df_a, df_b[['A', 'B_new','F']], how="left", left_on=['A', 'B'], right_on=['A', 'B_new'])
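To make the difference between the two merges above concrete, here is a small sketch with made-up data. The column names follow the answers; the values are hypothetical and chosen so that one row of df_a has no partner in df_b.

```python
import pandas as pd

# Hypothetical data; column names follow the answers above.
df_a = pd.DataFrame({"A": ["x1", "x1", "x2"],
                     "B": ["Apple", "Orange", "Mango"],
                     "C": [0.3, 0.1, 0.1]})
df_b = pd.DataFrame({"A": ["x1", "x2"],
                     "B_new": ["Apple", "Mango"],
                     "F": [0.3, 0.1]})

keys = dict(left_on=["A", "B"], right_on=["A", "B_new"])

# Inner: only rows whose (A, B) pair exists in both frames.
inner = df_a.merge(df_b, how="inner", **keys)

# Left: every row of df_a; unmatched rows get NaN in the df_b columns.
left = df_a.merge(df_b, how="left", **keys)

# B_new duplicates B wherever a match was found, so it can be dropped.
left = left.drop(columns="B_new")
print(inner)
print(left)
```

Here the inner merge returns two rows, while the left merge returns all three rows of df_a with F missing for the unmatched (x1, Orange) pair.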
In pandas >= 0.10.0, try:
df['year'] = source_years.where(source_years != 0, df['year'])
and see:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-where-method-and-masking
As noted in the comments, this does use np.where under the hood. The difference is that pandas aligns the series with the output (so, for example, you can do only a partial update).
Perhaps try np.where:
import numpy as np
df['year'] = np.where(source_years,source_years,df['year'])
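The intent of the two snippets above (take source_years where it is nonzero, otherwise keep df['year']) can be checked side by side. This is a minimal sketch: the sample values are hypothetical, and source_years is assumed to be a Series with the same index as df.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 0 in source_years means "no replacement available".
df = pd.DataFrame({"year": [2001, 2002, 2003]})
source_years = pd.Series([0, 1999, 0])

# Series.where keeps values where the condition holds and takes `other` elsewhere.
via_where = source_years.where(source_years != 0, df["year"])

# np.where(condition, value_if_true, value_if_false) returns a plain NumPy array.
via_np = np.where(source_years != 0, source_years, df["year"])

print(via_where.tolist())
print(via_np.tolist())
```

Both lines print [2001, 1999, 2003]. The pandas version aligns on the index and returns a Series, while np.where works positionally and returns an array.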
Try this:
new_df = pd.merge(
    left=A_df,
    right=B_df,
    how='left',
    left_on=['A_c1', 'c2'],
    right_on=['B_c1', 'c2'],
)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
left_on : label or list, or array-like. Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns.
right_on : label or list, or array-like. Field names to join on in right DataFrame or vector/list of vectors per left_on docs.
1. It merges according to the ordering of left_on and right_on, i.e., the i-th element of left_on will match with the i-th of right_on. In the example below, the code on the top matches A_col1 with B_col1 and A_col2 with B_col2, while the code on the bottom matches A_col1 with B_col2 and A_col2 with B_col1. Evidently, the results are different.
2. As can be seen from the above example, if the merge keys have different names, all keys will show up as their individual columns in the merged dataframe. In the example above, in the top dataframe, A_col1 and B_col1 are identical and A_col2 and B_col2 are identical. In the bottom dataframe, A_col1 and B_col2 are identical and A_col2 and B_col1 are identical. Since these are duplicate columns, they are most likely not needed. One way to not have this problem from the beginning is to make the merge keys identical from the beginning. See bullet point #3 below.
3. If left_on and right_on are the same col1 and col2, we can use on=['col1', 'col2']. In this case, no merge keys are duplicated.
   df1.merge(df2, on=['col1', 'col2'])
4. You can also merge one side on column names and the other side on index too. For example, in the example below, df1's columns are matched with df2's indices. If the indices are named, as in the example below, you can reference them by name but if not, you can also use right_index=True (or left_index=True if the left dataframe is the one being merged on index).
   df1.merge(df2, left_on=['A_col1', 'A_col2'], right_index=True)
   # or
   df1.merge(df2, left_on=['A_col1', 'A_col2'], right_on=['B_col1', 'B_col2'])
5. By using the how= parameter, you can perform LEFT JOIN (how='left'), FULL OUTER JOIN (how='outer') and RIGHT JOIN (how='right') as well. The default is INNER JOIN (how='inner') as in the examples above.
6. If you have more than 2 dataframes to merge and the merge keys are the same across all of them, then the join method is more efficient than merge because you can pass a list of dataframes and join on indices. Note that the index names are the same across all dataframes in the example below (col1 and col2). Note that the indices don't have to have names; if the indices don't have names, then the number of the multi-indices must match (in the case below there are 2 multi-indices). Again, as in bullet point #1, the match occurs according to the ordering of the indices.
   df1.join([df2, df3], how='inner').reset_index()
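The ordering rule from the first point can be sketched with two tiny frames. The column names A_col1, A_col2, B_col1, and B_col2 come from the text above; the values and the extra x and y columns are hypothetical, chosen so that swapping the order of right_on changes which rows pair up.

```python
import pandas as pd

# Hypothetical frames using the column names from the list above.
df1 = pd.DataFrame({"A_col1": [1, 2], "A_col2": [2, 1], "x": ["a", "b"]})
df2 = pd.DataFrame({"B_col1": [1, 2], "B_col2": [2, 1], "y": ["c", "d"]})

# A_col1 pairs with B_col1, A_col2 with B_col2.
straight = df1.merge(df2, left_on=["A_col1", "A_col2"],
                     right_on=["B_col1", "B_col2"])

# A_col1 pairs with B_col2, A_col2 with B_col1.
crossed = df1.merge(df2, left_on=["A_col1", "A_col2"],
                    right_on=["B_col2", "B_col1"])

print(straight[["x", "y"]])
print(crossed[["x", "y"]])
```

In the first merge, row a pairs with c and b with d. In the second, a pairs with d and b with c. Both results also carry all four key columns, which is the duplication the second point describes.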
I have 2 demo DataFrames as follows:
a =
| Name | Value |
|---|---|
| A | 1 |
| B | 2 |
| C | 3 |
b =
| Name_ID | position_X | position_Y |
|---|---|---|
| A | x1 | y1 |
| A(a) | x2 | y2 |
| B | x3 | y3 |
| C_1 | x4 | y4 |
| C; C_1 | x5 | y5 |
The elements in the Name and Name_ID columns actually represent the same things. To make this clearer, I mean:
A = A, A(a)
B = B
C = C_1, C; C_1
When I tried to merge the DataFrames, I used the code below and it failed.
pd.merge(a, b, left_on='Name', right_on='Name_ID', how='left')
The result I want is actually:
| Name | position_X | position_Y | Value |
|---|---|---|---|
| A | x1 | y1 | 1 |
| A(a) | x2 | y2 | 1 |
| B | x3 | y3 | 2 |
| C_1 | x4 | y4 | 3 |
| C; C_1 | x5 | y5 | 3 |
Is there a way to write conditions in merge to say something like:
for x in a['Name']:
    for y in b['Name_ID']:
        if x in y:
            y = x
and merge into the same dataframe?
Thank you in advance!
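One possible way to express the loop above is a cross join followed by a filter. This is only a sketch under stated assumptions: it relies on how="cross", which needs pandas 1.2 or later, and it treats "x in y" as a plain substring test, which is an interpretation of the question rather than a confirmed requirement. The intermediate names pairs and mask are introduced here for illustration.

```python
import pandas as pd

a = pd.DataFrame({"Name": ["A", "B", "C"], "Value": [1, 2, 3]})
b = pd.DataFrame({
    "Name_ID": ["A", "A(a)", "B", "C_1", "C; C_1"],
    "position_X": ["x1", "x2", "x3", "x4", "x5"],
    "position_Y": ["y1", "y2", "y3", "y4", "y5"],
})

# Every row of a paired with every row of b (requires pandas >= 1.2).
pairs = a.merge(b, how="cross")

# Keep only the pairs where Name occurs inside Name_ID (the "x in y" test).
mask = [name in name_id for name, name_id in zip(pairs["Name"], pairs["Name_ID"])]

result = (pairs.loc[mask, ["Name_ID", "position_X", "position_Y", "Value"]]
               .rename(columns={"Name_ID": "Name"})
               .reset_index(drop=True))
print(result)
```

A cross join grows as len(a) * len(b), so this is only practical for small frames, and a bare substring test can over-match (for example, a name "A" would also match an ID such as "BA").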