I have three Pandas dataframes, df1
, df2,
and df3
, as below:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'id' : ['one', 'two', 'three'], 'score': [56, 45, 78]})
df2 = pd.DataFrame({'id' : ['one', 'five', 'four'], 'score': [35, 81, 90]})
df3 = pd.DataFrame({'id' : ['five', 'two', 'six'], 'score': [23, 66, 42]})
How can I join these dataframes based on id
and then concatenate their columns together? The desired output is as below:
#join_and_concatenate by id:
id score(df1) score(df2) score(df3)
one 56 35 NaN
two 45 NaN 66
three 78 NaN NaN
four NaN 90 NaN
five NaN 81 23
six NaN NaN 42
I found a relevant page that talks about merge()
, concatenate()
and join()
but I am not sure any of these give what I want.
There may be a better way with concat
, but this should work:
In [48]: pd.merge(df1, df2, how='outer', on='id').merge(df3, how='outer', on='id')
Out[48]:
id score_x score_y score
0 one 56 35 NaN
1 two 45 NaN 66
2 three 78 NaN NaN
3 five NaN 81 23
4 four NaN 90 NaN
5 six NaN NaN 42
[6 rows x 4 columns]
To get your desired answer:
In [54]: merged = pd.merge(df1, df2, how='outer', on='id').merge(df3, how='outer', on='id')
In [55]: merged.set_index('id').rename(columns={'score_x': 'score(df1)', 'score_y': 'score(df2)
', 'score': 'score(df3)'})
Out[55]:
score(df1) score(df2) score(df3)
id
one 56 35 NaN
two 45 NaN 66
three 78 NaN NaN
five NaN 81 23
four NaN 90 NaN
six NaN NaN 42
[6 rows x 3 columns]
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments