
DataFrame Data Processing in Python

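1. Overview

DataFrame is the pandas data structure for tabular data. It supports database-like operations (selecting, joining, transforming) in Python and is one of the most widely used tools for data mining in Python. The sections below cover commonly used DataFrame operations. The outputs shown were produced with Python 2 and an older pandas release, so columns appear in alphabetical order and print(a, b) shows a tuple such as ('idx1', 0). Under Python 3 with pandas 0.23 or later, columns keep the order given in the dictionary (key first), and print(a, b) prints its arguments separated by a space.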

2. Iterating over rows

1) Code

import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})
print(df)
# iterrows() yields (index label, row as a Series) pairs
for idx, item in df.iterrows():
    print(idx)
    print(item)

2) Result

   data1  data2 key
0      1      4   a
1      2      5   b
2      3      6   c
0
data1    1
data2    4
key      a
Name: 0, dtype: object
… (rest omitted)
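iterrows() returns each row as a Series, so values of different types are upcast to a common dtype (object here, as the Name/dtype line above shows). A commonly used and generally faster alternative is itertuples(), which yields one namedtuple per row with the index in the Index field. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})

# itertuples() yields one namedtuple per row; columns are attributes,
# and the row's index label is the `Index` field
rows = []
for row in df.itertuples():
    rows.append((row.Index, row.key, row.data1 + row.data2))

print(rows)
```

Attribute access only works for column names that are valid Python identifiers; for other names, the tuple can be indexed by position.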

3. Iterating over two DataFrames at once

1) Code

import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b'], 'data1': [1, 2]})
df2 = pd.DataFrame({'key': ['c', 'd'], 'data2': [4, 5]})
# zip() pairs the rows of both tables by position
for (idx1, item1), (idx2, item2) in zip(df1.iterrows(), df2.iterrows()):
    print("idx1", idx1)
    print(item1)
    print("idx2", idx2)
    print(item2)

2) Result

('idx1', 0)
data1    1
key      a
Name: 0, dtype: object
('idx2', 0)
data2    4
key      c
Name: 0, dtype: object
('idx1', 1)
data1    2
key      b
Name: 1, dtype: object
('idx2', 1)
data2    5
key      d
Name: 1, dtype: object
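zip() stops at the shorter of its inputs, so if the two tables have different numbers of rows the extra rows are skipped without any warning. A minimal sketch of a length check before pairing rows; the helper name paired_rows is hypothetical:

```python
import pandas as pd

def paired_rows(left, right):
    """Hypothetical helper: return an iterator of row pairs from two
    DataFrames, refusing to silently drop rows when their lengths differ."""
    if len(left) != len(right):
        raise ValueError(f"row counts differ: {len(left)} vs {len(right)}")
    # zip() pairs rows by position, not by index label
    return zip(left.iterrows(), right.iterrows())

df1 = pd.DataFrame({'key': ['a', 'b'], 'data1': [1, 2]})
df2 = pd.DataFrame({'key': ['c', 'd'], 'data2': [4, 5]})

pairs = [(item1['key'], item2['key']) for (_, item1), (_, item2) in paired_rows(df1, df2)]
print(pairs)

short = pd.DataFrame({'key': ['x'], 'data2': [9]})
length_mismatch_caught = False
try:
    paired_rows(df1, short)
except ValueError as err:
    length_mismatch_caught = True
    print("error:", err)
```

paired_rows is a plain function rather than a generator, so the length check runs as soon as it is called instead of on the first iteration.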

4. Selecting one or more rows

1) Code

import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
# A slice selects rows by position: [:1] is the first row, [:2] the first two
df2 = df1[:1]
print(df2)

2) Result

   data1 key
0      1   a
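Besides slicing, rows are commonly selected by position with iloc, by index label with loc, or by a boolean condition. A short sketch on the same table:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})

first_two = df1.iloc[0:2]          # by position: rows 0 and 1 (end is exclusive)
by_label = df1.loc[[0, 2]]         # by index label: rows labelled 0 and 2
filtered = df1[df1['data1'] > 1]   # by condition: rows where data1 > 1

print(first_two)
print(by_label)
print(filtered)
```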

5. Selecting one or more columns

1) Code

import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
df2 = pd.DataFrame()
# Copy column 'key' of df1 into df2 under the new name 'key2'
df2['key2'] = df1['key']
print(df2)

2) Result

  key2
0    a
1    b
2    c
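To select several columns at once, index with a list of names; the result is a DataFrame. A sketch that also builds the renamed copy from above in one expression:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})

# A list of names selects several columns and returns a DataFrame
subset = df1[['key', 'data1']]

# .copy() detaches the result from df1; rename() gives the column a new name
df2 = df1[['key']].copy().rename(columns={'key': 'key2'})

print(subset)
print(df2)
```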

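6. Joining by columns (horizontal, the table gets wider): merge

1) Code

import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data2': [4, 5, 6]})
# With no 'on' argument, merge joins on the columns both tables share (here 'key')
df3 = pd.merge(df1, df2)
print(df1)
print(df2)
print(df3)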

2) Result

   data1 key
0      1   a
1      2   b
2      3   c
   data2 key
0      4   a
1      5   b
2      6   c
   data1 key  data2
0      1   a      4
1      2   b      5
2      3   c      6
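By default pd.merge joins on all shared columns and keeps only keys present in both tables (an inner join). The on= and how= parameters make this explicit. A sketch with keys that only partly overlap:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': [4, 5, 6]})

inner = pd.merge(df1, df2, on='key', how='inner')  # only keys in both: a, b
left = pd.merge(df1, df2, on='key', how='left')    # all keys of df1; c gets NaN in data2
outer = pd.merge(df1, df2, on='key', how='outer')  # union of keys: a, b, c, d

print(inner)
print(left)
print(outer)
```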

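7. Concatenating rows (vertical, the table gets longer): concat

1) Code

import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['d', 'e', 'f'], 'data': [4, 5, 6]})
# concat stacks the tables vertically and keeps each table's original index
df3 = pd.concat([df1, df2])
print(df1)
print(df2)
print(df3)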

2) Result

   data key
0     1   a
1     2   b
2     3   c
   data key
0     4   d
1     5   e
2     6   f
   data key
0     1   a
1     2   b
2     3   c
0     4   d
1     5   e
2     6   f
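As the result shows, concat keeps each input's original index, so the labels 0 to 2 appear twice. Passing ignore_index=True renumbers the rows. A short sketch:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['d', 'e', 'f'], 'data': [4, 5, 6]})

# ignore_index=True discards the old labels and builds a new index 0..5
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)
```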

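8. Simple transformation of a column

1) Code

import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
print(df)
# Arithmetic on a column is applied element-wise to every row
df['data1'] = df['data1'] + 1
print(df)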

2) Result

   data1 key
0      1   a
1      2   b
2      3   c
   data1 key
0      2   a
1      3   b
2      4   c
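The assignment above overwrites the column in df. If the original should be kept, DataFrame.assign returns a new DataFrame instead. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})

# assign() returns a copy with the new or overwritten column; df is unchanged
df_plus = df.assign(data1=df['data1'] + 1)

print(df)
print(df_plus)
```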

9. More complex transformation of a column

1) Code

import pandas as pd
import math

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
print(df)
# apply() calls the function once for each value in the column
df['data1'] = df['data1'].apply(math.sin)
print(df)

2) Result

   data1 key
0      1   a
1      2   b
2      3   c
      data1 key
0  0.841471   a
1  0.909297   b
2  0.141120   c
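apply() runs the Python function once per element. For math functions that NumPy also provides, calling the NumPy version on the whole column gives the same values and is usually faster. This sketch assumes NumPy, which pandas depends on, is importable:

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})

via_apply = df['data1'].apply(math.sin)  # element by element in Python
via_numpy = np.sin(df['data1'])          # vectorized over the whole column, returns a Series

print(via_numpy)
```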

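10. Processing a column with a custom function

1) Code

import pandas as pd

def testme(x):
    # Called once per value; the print shows which value is being processed
    print("???", x)
    y = x + 3000
    return y

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
print(df)
df['data1'] = df['data1'].apply(testme)
print(df)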

2) Result

   data1 key
0      1   a
1      2   b
2      3   c
('???', 1)
('???', 2)
('???', 3)
   data1 key
0   3001   a
1   3002   b
2   3003   c
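Series.apply also forwards extra positional arguments through args=, which avoids hard-coding a constant such as 3000 inside the function. A sketch with a hypothetical helper add_offset:

```python
import pandas as pd

def add_offset(x, offset):
    """Hypothetical helper: add a configurable offset to one value."""
    return x + offset

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
# args=(3000,) is passed to add_offset after each value: add_offset(x, 3000)
df['data1'] = df['data1'].apply(add_offset, args=(3000,))
print(df)
```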

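11. Deriving a new column from several columns

1) Code

import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})
print(df)
# Column arithmetic is element-wise: each row's data1 plus the same row's data2
df['data3'] = df['data1'] + df['data2']
print(df)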

2) Result

   data1  data2 key
0      1      4   a
1      2      5   b
2      3      6   c
   data1  data2 key  data3
0      1      4   a      5
1      2      5   b      7
2      3      6   c      9
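When a new column is a sum over many columns, selecting them with a list and reducing along axis=1 avoids writing out every term. A sketch equivalent to data1 + data2 above:

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})

# Sum across the listed columns for each row (axis=1)
df['data3'] = df[['data1', 'data2']].sum(axis=1)
print(df)
```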

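12. Deriving a new column from several columns with a function

1) Code

import pandas as pd

def testme(x):
    # With axis=1, x is one row (a Series), so columns are accessed by name
    print(x['data1'], x['data2'])
    return x['data1'] + x['data2']

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})
print(df)
# axis=1 applies testme to each row; the return values form the new column
df['data3'] = df.apply(testme, axis=1)
print(df)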

2) Result

   data1  data2 key
0      1      4   a
1      2      5   b
2      3      6   c
(1, 4)
(2, 5)
(3, 6)
   data1  data2 key  data3
0      1      4   a      5
1      2      5   b      7
2      3      6   c      9
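With axis=1 the function receives each row as a Series. If it returns a Series, apply() expands the result into columns, so several new columns can be built in one pass. A sketch; the function row_stats and the column names total and product are hypothetical:

```python
import pandas as pd

def row_stats(row):
    """Hypothetical helper: compute two values from one row."""
    return pd.Series({'total': row['data1'] + row['data2'],
                      'product': row['data1'] * row['data2']})

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})
stats = df.apply(row_stats, axis=1)  # DataFrame with columns total and product
df = df.join(stats)                  # attach the new columns, aligned on the index
print(df)
```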

13. Dropping columns

1) Code

import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})
print(df)
# axis=1 means the labels refer to columns; drop() returns a new DataFrame
df = df.drop(['data2'], axis=1)
print(df)

2) Result

   data1  data2 key
0      1      4   a
1      2      5   b
2      3      6   c
   data1 key
0      1   a
1      2   b
2      3   c
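drop() also accepts the keyword forms columns= and index=, which read more clearly than axis=1 and also cover dropping rows by index label. A short sketch:

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})

no_data2 = df.drop(columns=['data2'])  # same as df.drop(['data2'], axis=1)
no_row_1 = df.drop(index=[1])          # drop rows by index label

print(no_data2)
print(no_row_1)
```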

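14. One-hot encoding

(turns one categorical column into several numeric indicator columns)

1) Code

import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
print(df1)
# Encode one column: one indicator column per distinct value
# (dtype=int gives 0/1; pandas 2.0+ otherwise returns True/False)
df2 = pd.get_dummies(df1['key'], dtype=int)
print(df2)
# Encode a DataFrame: only object, string or category columns such as 'key' are expanded
df3 = pd.get_dummies(df1, dtype=int)
print(df3)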

2) Result

   data1 key
0      1   a
1      2   b
2      3   c
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
   data1  key_a  key_b  key_c
0      1      1      0      0
1      2      0      1      0
2      3      0      0      1
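get_dummies can also set the column-name prefix and drop the first category, whose value is implied when all remaining indicators are 0 (useful for linear models). A sketch on the same data:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})

# prefix='k' names the columns k_<value>; drop_first=True omits the first category (a)
dummies = pd.get_dummies(df1['key'], prefix='k', drop_first=True, dtype=int)
print(dummies)
```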

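15. Other common methods

In the snippets below, df is a DataFrame and f is a column name.

1) Summary statistics (count, mean, standard deviation, min, quartiles, max; the 50% quartile is the median)

df[f].describe()

2) Mean

df[f].mean()

3) Standard deviation (use df[f].var() for the variance)

df[f].std()

4) Dropping rows with missing values (returns a new DataFrame)

df.dropna()

5) Filling missing values (pass the fill value, e.g. 0; calling fillna() with no arguments raises an error)

df.fillna(0)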
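A sketch applying these methods to a small column that contains one missing value; the column name score and the numbers are made up for illustration. describe(), mean() and std() skip NaN by default, and dropna() and fillna() return new DataFrames rather than modifying df.

```python
import numpy as np
import pandas as pd

# Illustrative data: 'score' has one missing value
df = pd.DataFrame({'key': ['a', 'b', 'c', 'd'], 'score': [1.0, 2.0, np.nan, 5.0]})
f = 'score'

summary = df[f].describe()   # count, mean, std, min, 25%, 50% (median), 75%, max
mean = df[f].mean()          # NaN is skipped: (1 + 2 + 5) / 3
std = df[f].std()            # sample standard deviation (ddof=1)
var = df[f].var()            # variance, equal to std ** 2

cleaned = df.dropna()        # drops the row whose score is NaN
filled = df.fillna(0)        # replaces NaN with 0

print(summary)
print(mean, std, var)
print(cleaned)
print(filled)
```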
