pandas中的pd.pivot_table()透視表功能
阿新 • • 發佈:2018-12-14
和excel一樣,pandas也有一個透視表的功能,具體demo如下:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
#顯示所有列
pd.set_option('display.max_columns', None)
#顯示所有行
pd.set_option('display.max_rows', None)
#設定value的顯示長度為100,預設為50
pd.set_option('max_colwidth',100)
df = pd.read_excel('./sales-funnel.xlsx' )
print(df.head())
'''
Account Name Rep Manager \
0 714466 Trantow-Barrows Craig Booker Debra Henley
1 714466 Trantow-Barrows Craig Booker Debra Henley
2 714466 Trantow-Barrows Craig Booker Debra Henley
3 737550 Fritsch, Russel and Anderson Craig Booker Debra Henley
4 146832 Kiehn-Spinka Daniel Hilton Debra Henley
Product Quantity Price Status
0 CPU 1 30000 presented
1 Software 1 10000 presented
2 Maintenance 2 5000 pending
3 CPU 1 35000 declined
4 CPU 2 65000 won
'''
print(pd.pivot_table(df, index=['Name']))
'''
Account Price Quantity
Name
Barton LLC 740150 35000 1.000000
Fritsch, Russel and Anderson 737550 35000 1.000000
Herman LLC 141962 65000 2.000000
Jerde-Hilpert 412290 5000 2.000000
Kassulke, Ondricka and Metz 307599 7000 3.000000
Keeling LLC 688981 100000 5.000000
Kiehn-Spinka 146832 65000 2.000000
Koepp Ltd 729833 35000 2.000000
Kulas Inc 218895 25000 1.500000
Purdy-Kunde 163416 30000 1.000000
Stokes LLC 239344 7500 1.000000
Trantow-Barrows 714466 15000 1.333333
對名字進行了去重,將每個人的銷售記錄取進行統計,上例是求了均值。
這是由aggfunc引數來決定的。
'''
print(pd.pivot_table(df, index=['Name'], aggfunc='sum'))
'''
Account Price Quantity
Name
Barton LLC 740150 35000 1
Fritsch, Russel and Anderson 737550 35000 1
Herman LLC 141962 65000 2
Jerde-Hilpert 412290 5000 2
Kassulke, Ondricka and Metz 307599 7000 3
Keeling LLC 688981 100000 5
Kiehn-Spinka 146832 65000 2
Koepp Ltd 1459666 70000 4
Kulas Inc 437790 50000 3
Purdy-Kunde 163416 30000 1
Stokes LLC 478688 15000 2
Trantow-Barrows 2143398 45000 4
'''
print(pd.pivot_table(df, index=['Name', 'Rep', 'Manager']))
'''
Account ... Quantity
Name Rep Manager ...
Barton LLC John Smith Debra Henley 740150 ... 1.000000
Fritsch, Russel and Anderson Craig Booker Debra Henley 737550 ... 1.000000
Herman LLC Cedric Moss Fred Anderson 141962 ... 2.000000
Jerde-Hilpert John Smith Debra Henley 412290 ... 2.000000
Kassulke, Ondricka and Metz Wendy Yule Fred Anderson 307599 ... 3.000000
Keeling LLC Wendy Yule Fred Anderson 688981 ... 5.000000
Kiehn-Spinka Daniel Hilton Debra Henley 146832 ... 2.000000
Koepp Ltd Wendy Yule Fred Anderson 729833 ... 2.000000
Kulas Inc Daniel Hilton Debra Henley 218895 ... 1.500000
Purdy-Kunde Cedric Moss Fred Anderson 163416 ... 1.000000
Stokes LLC Cedric Moss Fred Anderson 239344 ... 1.000000
Trantow-Barrows Craig Booker Debra Henley 714466 ... 1.333333
'''
print(pd.pivot_table(df, index=['Manager', 'Rep']))
# manager 和 rep 之間 存在 一對多的 關係
'''
Account Price Quantity
Manager Rep
Debra Henley Craig Booker 720237.0 20000.000000 1.250000
Daniel Hilton 194874.0 38333.333333 1.666667
John Smith 576220.0 20000.000000 1.500000
Fred Anderson Cedric Moss 196016.5 27500.000000 1.250000
Wendy Yule 614061.5 44250.000000 3.000000
'''
print(pd.pivot_table(df, index=['Manager', 'Rep'], values=['Price', 'Quantity']))
'''
Price Quantity
Manager Rep
Debra Henley Craig Booker 20000.000000 1.250000
Daniel Hilton 38333.333333 1.666667
John Smith 20000.000000 1.500000
Fred Anderson Cedric Moss 27500.000000 1.250000
Wendy Yule 44250.000000 3.000000
'''
print(pd.pivot_table(df, index=['Manager', 'Rep'], values=['Price', 'Quantity'], columns=['Product']))
'''
Price ... Quantity
Product CPU Maintenance ... Monitor Software
Manager Rep ...
Debra Henley Craig Booker 32500.0 5000.0 ... NaN 1.0
Daniel Hilton 52500.0 NaN ... NaN 1.0
John Smith 35000.0 5000.0 ... NaN NaN
Fred Anderson Cedric Moss 47500.0 5000.0 ... NaN 1.0
Wendy Yule 82500.0 7000.0 ... 2.0 NaN
由以上輸出可以看出,當column指定為product之後,price和quantity進行了細分,將每個product的詳情列出。
另外還可以設定一個fill_value的引數,可以將nan填充為某個值。
'''
'''
總結:
使用透視表之前,需要對原始資料有一個大概的瞭解,這樣生成的透視表才能夠有意義。
'''