Pandas:按列合併資料集--merge函式
阿新 • • 發佈:2019-02-12
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
df1 = DataFrame({'key':['b','b','a','c','a','a','b'],
'data1':range(7)})
df2 = DataFrame({'key':['a','b','d'],
'data2':range(3)})
print(df1)
data1 key 0 0 b 1 1 b 2 2 a 3 3 c 4 4 a 5 5 a 6 6 b
print(df2)
data2 key
0 0 a
1 1 b
2 2 d
一、預設情況下按相同列名合併
print(pd.merge(df1,df2))
data1 key data2
0 0 b 1
1 1 b 1
2 6 b 1
3 2 a 0
4 4 a 0
5 5 a 0
二、顯示指定進行合併的列名
1.雙方有公共列名
print(pd.merge(df1,df2,on='key' ))
data1 key data2
0 0 b 1
1 1 b 1
2 6 b 1
3 2 a 0
4 4 a 0
5 5 a 0
2.雙方無公共列名
df3 = DataFrame({'lkey':['b','b','a','c','a','a','b'],
'data1':range(7)})
df4 = DataFrame({'rkey':['a','b','d'],
'data2' :range(3)})
print(pd.merge(df3,df4,left_on='lkey',right_on='rkey'))
data1 lkey data2 rkey
0 0 b 1 b
1 1 b 1 b
2 6 b 1 b
3 2 a 0 a
4 4 a 0 a
5 5 a 0 a
三、根據多個列名進行合併
只有多個列完全相同的行才進行合併
left = DataFrame({'key1':['foo','foo','bar'],
'key2':['one','two','one'],
'lval':[1,2,3]})
right = DataFrame({'key1':['foo','foo','bar','bar'],
'key2':['one','one','one','two'],
'rval':[4,5,6,7]})
print(pd.merge(left,right,on=['key1','key2']))
key1 key2 lval rval
0 foo one 1 4
1 foo one 1 5
2 bar one 3 6
四、指定合併的方式:inner、outer、left、right
print(pd.merge(df1,df2,how='outer'))
data1 key data2
0 0.0 b 1.0
1 1.0 b 1.0
2 6.0 b 1.0
3 2.0 a 0.0
4 4.0 a 0.0
5 5.0 a 0.0
6 3.0 c NaN
7 NaN d 2.0
五、處理重複列名
1.預設情況下
print(pd.merge(left,right,on='key1')) # key2是重複列名
key1 key2_x lval key2_y rval
0 foo one 1 one 4
1 foo one 1 one 5
2 foo two 2 one 4
3 foo two 2 one 5
4 bar one 3 one 6
5 bar one 3 two 7
2.使用shuffixes引數
print(pd.merge(left,right,on='key1',suffixes=['_left','_right']))
key1 key2_left lval key2_right rval
0 foo one 1 one 4
1 foo one 1 one 5
2 foo two 2 one 4
3 foo two 2 one 5
4 bar one 3 one 6
5 bar one 3 two 7