Python DataFrame 全连接(full join)的方法及示例代码

本文主要介绍Python中,将两个DataFrame数据全连接(full join)的方法,以及相关示例代码。

示例数据:

df_first = pd.DataFrame([[1, 'A',1000], [2, 'B',np.NaN],[3,np.NaN,3000],[4, 'D',8000],[5, 'E',6000]], columns=['EmpID', 'Name','Salary'])
df_second = pd.DataFrame([[1, 'A','HR','Delhi'], [8, 'B','Admin','Mumbai'],[3,'C','Finance',np.NaN],[9, 'D','Ops','Banglore'],[5, 'E','Programming',np.NaN],[10, 'K','Analytics','Mumbai']], columns=['EmpID', 'Name','Department','Location'])
print df_first
print df_second
   EmpID Name  Salary
0      1    A  1000.0
1      2    B     NaN
2      3  NaN  3000.0
3      4    D  8000.0
4      5    E  6000.0
   EmpID Name   Department  Location
0      1    A           HR     Delhi
1      8    B        Admin    Mumbai
2      3    C      Finance       NaN
3      9    D          Ops  Banglore
4      5    E  Programming       NaN
5     10    K    Analytics    Mumbai

需要combine_first和set_index来匹配由EmpID列创建:

df = df_first.set_index('EmpID').combine_first(df_second.set_index('EmpID')).reset_index()
print (df)
   EmpID   Department  Location Name  Salary
0      1           HR     Delhi    A  1000.0
1      2          NaN       NaN    B     NaN
2      3      Finance       NaN    C  3000.0
3      4          NaN       NaN    D  8000.0
4      5  Programming       NaN    E  6000.0
5      8        Admin    Mumbai    B     NaN
6      9          Ops  Banglore    D     NaN
7     10    Analytics    Mumbai    K     NaN

对于某些顺序的列需要reindex():

ColNames = pd.Index(np.concatenate([df_second.columns, df_first.columns])).drop_duplicates()
print (ColNames)
Index(['EmpID', 'Name', 'Department', 'Location', 'Salary'], dtype='object')
df = (df_first.set_index('EmpID')
      .combine_first(df_second.set_index('EmpID'))
      .reset_index()
      .reindex(columns=ColNames))
print (df)

   EmpID Name   Department  Location  Salary
0      1    A           HR     Delhi  1000.0
1      2    B          NaN       NaN     NaN
2      3    C      Finance       NaN  3000.0
3      4    D          NaN       NaN  8000.0
4      5    E  Programming       NaN  6000.0
5      8    B        Admin    Mumbai     NaN
6      9    D          Ops  Banglore     NaN
7     10    K    Analytics    Mumbai     NaN

推荐阅读
cjavapy编程之路首页