pandas.merge() is the underlying function used for all merge/join behavior.
DataFrames provide the pandas.DataFrame.merge() and pandas.DataFrame.join() methods as a convenient way to access the capabilities of pandas.merge(). For example, df1.merge(right=df2, ...) is equivalent to pandas.merge(left=df1, right=df2, ...).
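As a quick check of that equivalence, here is a minimal sketch. The frames df1 and df2 and their columns (key, x, y) are made up for illustration and are not from the original answer.

```python
import pandas as pd

# Hypothetical example frames; the column names are assumptions.
df1 = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["a", "b", "d"], "y": [10, 20, 40]})

# Method form: the calling DataFrame is implicitly the left table.
via_method = df1.merge(right=df2, on="key")

# Function form: both tables are passed explicitly.
via_function = pd.merge(left=df1, right=df2, on="key")

# Both calls should produce the same frame.
same = via_method.equals(via_function)
print(same)
```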
These are the main differences between df.join() and df.merge():
- lookup on right table: df1.join(df2) always joins via the index of df2, but df1.merge(df2) can join to one or more columns of df2 (default) or to the index of df2 (with right_index=True).
- lookup on left table: by default, df1.join(df2) uses the index of df1 and df1.merge(df2) uses column(s) of df1. That can be overridden by specifying df1.join(df2, on=key_or_keys) or df1.merge(df2, left_index=True).
- left vs inner join: df1.join(df2) does a left join by default (keeps all rows of df1), but df1.merge(df2) does an inner join by default (returns only matching rows of df1 and df2).
So, the generic approach is to use pandas.merge(df1, df2) or df1.merge(df2). But for a number of common situations (keeping all rows of df1 and joining to an index in df2), you can save some typing by using df1.join(df2) instead.
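The defaults described above can be seen side by side in a small sketch. The data and the names df1, df2, key, x and y are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical frames sharing a "key" column; "a" is only in df1, "d" only in df2.
df1 = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# merge default: inner join on the common column(s), here "key".
merged = df1.merge(df2)

# join default: left join, index of df1 against index of df2.
joined = df1.set_index("key").join(df2.set_index("key"))

# join override: look up a column of df1 in the index of df2.
joined_on = df1.join(df2.set_index("key"), on="key")

# merge override: use the indexes on both sides (still an inner join).
merged_idx = df1.set_index("key").merge(
    df2.set_index("key"), left_index=True, right_index=True
)

print(merged)      # only keys present in both frames
print(joined)      # every key of df1, with NaN in y where df2 has no match
print(joined_on)
print(merged_idx)
```

With join, the unmatched row of df1 is kept and filled with NaN. With merge, it is dropped unless you pass how='left'.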
Some notes on these issues from the documentation at http://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging:
merge is a function in the pandas namespace, and it is also available as a DataFrame instance method, with the calling DataFrame being implicitly considered the left object in the join.
The related DataFrame.join method, uses merge internally for the index-on-index and index-on-column(s) joins, but joins on indexes by default rather than trying to join on common columns (the default behavior for merge). If you are joining on index, you may wish to use DataFrame.join to save yourself some typing.
...
Answer from Matthias Fripp on Stack Overflow

These two function calls are completely equivalent:
left.join(right, on=key_or_keys)
pd.merge(left, right, left_on=key_or_keys, right_index=True, how='left', sort=False)
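To try that equivalence on concrete data, here is a minimal sketch. The frames and the column names key, lval and rval are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data: "baz" has no match in the index of right.
left = pd.DataFrame({"key": ["foo", "bar", "baz"], "lval": [1, 2, 3]})
right = pd.DataFrame({"rval": [4, 5]}, index=["foo", "bar"])

# join: look up the "key" column of left in the index of right (left join).
via_join = left.join(right, on="key")

# merge: the same lookup spelled out with explicit arguments.
via_merge = pd.merge(
    left, right, left_on="key", right_index=True, how="left", sort=False
)

# DataFrame.equals treats NaNs in the same position as equal.
print(via_join.equals(via_merge))
```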
Pandas merge or concat
I think what you want is easier with merge semantics:
df1.merge(df2,how='outer',on='column1')
  column1  column2_x  column2_y
0       A   0.973952        NaN
1       B   0.910973  -0.012804
2       C   0.122466        NaN
3       D   0.039503  -0.084434
4       E        NaN   1.320398
To do it with concat semantics you probably want to set column1 as the index and join that way on axis 1:
df1.set_index('column1',inplace=True)
df2.set_index('column1',inplace=True)
pd.concat([df1,df2],join='outer',axis=1)
    column2    column2
A  0.973952        NaN
B  0.910973  -0.012804
C  0.122466        NaN
D  0.039503  -0.084434
E       NaN   1.320398
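For a self-contained version of the two approaches, the sketch below rebuilds df1 and df2 from the values visible in the output above. The original frames were not shown, so this reconstruction is an assumption. The keys= argument is added here only to keep the two column2 columns apart and was not in the original answer.

```python
import pandas as pd

# Frames reconstructed from the printed output (an assumption).
df1 = pd.DataFrame({
    "column1": ["A", "B", "C", "D"],
    "column2": [0.973952, 0.910973, 0.122466, 0.039503],
})
df2 = pd.DataFrame({
    "column1": ["B", "D", "E"],
    "column2": [-0.012804, -0.084434, 1.320398],
})

# merge semantics: outer join on the shared column, suffixes _x and _y.
merged = df1.merge(df2, how="outer", on="column1")

# concat semantics: align on the index and place the frames side by side.
# set_index is used without inplace=True so df1 and df2 stay unchanged.
concatenated = pd.concat(
    [df1.set_index("column1"), df2.set_index("column1")],
    axis=1,
    join="outer",
    keys=["df1", "df2"],  # labels the columns by source frame
)

print(merged)
print(concatenated)
```

Both results hold the same values. merge keeps column1 as a regular column, while concat keeps it as the index.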
I always use join on indices:
import pandas as pd
left = pd.DataFrame({'key': ['foo', 'bar'], 'val': [1, 2]}).set_index('key')
right = pd.DataFrame({'key': ['foo', 'bar'], 'val': [4, 5]}).set_index('key')
left.join(right, lsuffix='_l', rsuffix='_r')
     val_l  val_r
key
foo      1      4
bar      2      5
The same functionality can be had by using merge on the columns, as follows:
left = pd.DataFrame({'key': ['foo', 'bar'], 'val': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'bar'], 'val': [4, 5]})
left.merge(right, on='key', suffixes=('_l', '_r'))
   key  val_l  val_r
0  foo      1      4
1  bar      2      5
I have two similar dataframes for two different years. They have the same columns. I want to combine them so that I have one big dataframe. How do I do that in Python? Is it merge, concat or join? What type? Thanks