1. 程式人生 > >python pandas 聚合與分組函式

python pandas 聚合與分組函式

1 主要內容

  1. DataFrame.groupby().sum()
  2. DataFrame.groupby().agg()
  3. pandas.concat([DataFrame1,DataFrame2])
  4. pandas.merge(DataFrame1,DataFrame2,parameters….)
  5. DataFrame1.join(DataFrame2,lsuffix=’列名 on DataFrame1’,rsuffix=’列名 on DataFrame2’)
  6. 幫助文件的獲取

2 例項

  1. 構造dataframe如下所示:
  food  food_id  number     price  user_id weather
0
soup 4 6 1.818250 3 cold 1 soup 8 6 1.834045 4 hot 2 iceream 8 7 3.042422 2 cold 3 chocolate 3 6 5.247564 4 hot 4 iceream 6 3 4.319450 4 cold 5 iceream 5
4 2.912291 1 cold 6 iceream 2 7 6.118529 2 cold 7 soup 8 4 1.394939 2 hot 8 soup 6 8 2.921446 2 hot 9 chocolate 2 1 3.663618 4 hot

實現程式如下所示:

import pandas as pd
from
numpy import random from numpy.random import rand import numpy as np random.seed(42) df = pd.DataFrame({'user_id':random.randint(0,6,10),'food_id':random.randint(1,10,10), 'weather':['cold','hot','cold','hot','cold','cold','cold','hot','hot','hot'], 'food':['soup','soup','iceream','chocolate','iceream','iceream','iceream','soup','soup','chocolate'], 'price':10 * rand(10),'number':random.randint(1,9,10)}) print df

2 groupby函式應用
程式碼

groupby1 = df.groupby(['user_id'])
i = 0
for user_id,group in groupby1:
    i = i + 1
    print "group", i , user_id
    print group

結果

group 1 1
      food  food_id  number     price  user_id weather
5  iceream        5       4  2.912291        1    cold
group 2 2
      food  food_id  number     price  user_id weather
2  iceream        8       7  3.042422        2    cold
6  iceream        2       7  6.118529        2    cold
7     soup        8       4  1.394939        2     hot
8     soup        6       8  2.921446        2     hot
group 3 3
   food  food_id  number    price  user_id weather
0  soup        4       6  1.81825        3    cold
group 4 4
        food  food_id  number     price  user_id weather
1       soup        8       6  1.834045        4     hot
3  chocolate        3       6  5.247564        4     hot
4    iceream        6       3  4.319450        4    cold
9  chocolate        2       1  3.663618        4     hot

3 groupby和sum等函式結合使用
程式碼

print groupby1.sum()#對除了groupby索引以外的每個數值列進行求和 
print groupby1['food_id','number'].sum()#對除了groupby索引以外的特定數值列進行求和 
print df.groupby(['user_id'],as_index=False).sum()#預設as_index=True
#當然除了sum,還有mean,min,max,median,mode,std,mad等等,操作方法同理
#groupby()中的形參可用help(df.groupby)來檢視
#常用的引數axis=0,表示對行進行操作,即指定列中不同值進行分組;axis=1,表示對列進行分組

結果

output[1]:
  food_id  number      price
user_id                            
1              5       4   2.912291
2             24      26  13.477336
3              4       6   1.818250
4             19      16  15.064678
output[2]:
         food_id  number
user_id                 
1              5       4
2             24      26
3              4       6
4             19      16
output[3]:
   user_id  food_id  number      price
0        1        5       4   2.912291
1        2       24      26  13.477336
2        3        4       6   1.818250
3        4       19      16  15.064678

4 agg函式
程式碼

print df.groupby(['weather','food']).agg([np.mean,np.median])

結果

 output[4]: 
                   food_id        number            price            \
                       mean median   mean median      mean    median   
weather food                                                           
cold    iceream    5.250000    5.5   5.25    5.5  4.098173  3.680936   
        soup       4.000000    4.0   6.00    6.0  1.818250  1.818250   
hot     chocolate  2.500000    2.5   3.50    3.5  4.455591  4.455591   
        soup       7.333333    8.0   6.00    6.0  2.050143  1.834045   

                    user_id         
                       mean median  
weather food                        
cold    iceream    2.250000      2  
        soup       3.000000      3  
hot     chocolate  4.000000      4  
        soup       2.666667      2  

5 concat()
程式碼

print "df :3\n",df[:3]
print "df :4\n",df[6:]
print pd.concat([df[:3],df[6:]],axis=0)

結果

df :3
      food  food_id  number     price  user_id weather
0     soup        4       6  1.818250        3    cold
1     soup        8       6  1.834045        4     hot
2  iceream        8       7  3.042422        2    cold
df :4
        food  food_id  number     price  user_id weather
6    iceream        2       7  6.118529        2    cold
7       soup        8       4  1.394939        2     hot
8       soup        6       8  2.921446        2     hot
9  chocolate        2       1  3.663618        4     hot
df.concat
        food  food_id  number     price  user_id weather
0       soup        4       6  1.818250        3    cold
1       soup        8       6  1.834045        4     hot
2    iceream        8       7  3.042422        2    cold
6    iceream        2       7  6.118529        2    cold
7       soup        8       4  1.394939        2     hot
8       soup        6       8  2.921446        2     hot
9  chocolate        2       1  3.663618        4     hot

6 merge()和join()
程式碼

df1=pd.DataFrame({'EmpNr':[5,3,9],'Dest':['The Hague','Amsterdam','Rotterdam']})
df2=pd.DataFrame({'EmpNr':[5,9,7],'Amount':[10,5,2.5]})

print "df1\n",df1
print "df2\n",df2
print "Merge() on Key\n",pd.merge(df1,df2,on='EmpNr')
print "inner join with Merge()\n",pd.merge(df1,df2,how='inner')
print "Dests join tips\n",df1.join(df2,lsuffix='Dest',rsuffix='Tips')

結果

df1
        Dest  EmpNr
0  The Hague      5
1  Amsterdam      3
2  Rotterdam      9
df2
   Amount  EmpNr
0    10.0      5
1     5.0      9
2     2.5      7
Merge() on Key
        Dest  EmpNr  Amount
0  The Hague      5    10.0
1  Rotterdam      9     5.0
inner join with Merge()
        Dest  EmpNr  Amount
0  The Hague      5    10.0
1  Rotterdam      9     5.0
Dests join tips
        Dest  EmpNrDest  Amount  EmpNrTips
0  The Hague          5    10.0          5
1  Amsterdam          3     5.0          9
2  Rotterdam          9     2.5          7

6幫助文件獲取方式

1.help(pd.concat)
2.dir(pd.concat)
3.pd.concat?
...

7 參考文獻
利用python進行資料分析筆記
python資料分析,Ivan Idris著