
The pd.pivot_table() pivot table feature in pandas

Like Excel, pandas has a pivot table feature. Here is a demo:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame


# Show all columns
pd.set_option('display.max_columns', None)
# Show all rows
pd.set_option('display.max_rows', None)
# Set the display width of each value to 100 characters (the default is 50)
pd.set_option('display.max_colwidth', 100)


df = pd.read_excel('./sales-funnel.xlsx')
print(df.head())
'''
   Account                          Name            Rep       Manager  \
0   714466               Trantow-Barrows   Craig Booker  Debra Henley
1   714466               Trantow-Barrows   Craig Booker  Debra Henley
2   714466               Trantow-Barrows   Craig Booker  Debra Henley
3   737550  Fritsch, Russel and Anderson   Craig Booker  Debra Henley
4   146832                  Kiehn-Spinka  Daniel Hilton  Debra Henley

       Product  Quantity  Price     Status
0          CPU         1  30000  presented
1     Software         1  10000  presented
2  Maintenance         2   5000    pending
3          CPU         1  35000   declined
4          CPU         2  65000        won
'''
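
# Note: the output below comes from a pandas version that silently drops
# non-numeric columns. On pandas 2.0 and later you may need to pass
# values=['Account', 'Price', 'Quantity'] explicitly.
print(pd.pivot_table(df, index=['Name']))
'''
                              Account   Price  Quantity
Name
Barton LLC                     740150   35000  1.000000
Fritsch, Russel and Anderson   737550   35000  1.000000
Herman LLC                     141962   65000  2.000000
Jerde-Hilpert                  412290    5000  2.000000
Kassulke, Ondricka and Metz    307599    7000  3.000000
Keeling LLC                    688981  100000  5.000000
Kiehn-Spinka                   146832   65000  2.000000
Koepp Ltd                      729833   35000  2.000000
Kulas Inc                      218895   25000  1.500000
Purdy-Kunde                    163416   30000  1.000000
Stokes LLC                     239344    7500  1.000000
Trantow-Barrows                714466   15000  1.333333

The names are deduplicated and the sales records for each Name are aggregated.
The example above takes the mean, which is the default.
The aggregation is controlled by the aggfunc parameter.
'''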
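
To make the effect of aggfunc concrete, here is a minimal sketch that does not need the Excel file. The `sample` frame is an assumption for illustration: it holds only the five rows printed by df.head() above, typed in by hand, so its results cover just those rows. The `values` argument is passed explicitly so the call does not depend on how a given pandas version treats text columns.

```python
import pandas as pd

# Stand-in for sales-funnel.xlsx: only the five rows shown by df.head().
# (Hypothetical sample; the real workbook contains more rows.)
sample = pd.DataFrame({
    "Name": ["Trantow-Barrows", "Trantow-Barrows", "Trantow-Barrows",
             "Fritsch, Russel and Anderson", "Kiehn-Spinka"],
    "Quantity": [1, 1, 2, 1, 2],
    "Price": [30000, 10000, 5000, 35000, 65000],
})

# No aggfunc given: each Name group is reduced with the mean.
mean_table = pd.pivot_table(sample, index=["Name"],
                            values=["Price", "Quantity"])

# aggfunc="sum": each Name group is reduced by adding its rows.
sum_table = pd.pivot_table(sample, index=["Name"],
                           values=["Price", "Quantity"], aggfunc="sum")

print(mean_table)
print(sum_table)
```

For Trantow-Barrows, the three sample rows average to a Price of 15000 and sum to 45000, which agrees with the two outputs in this post.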

print(pd.pivot_table(df, index=['Name'], aggfunc='sum'))
'''
                              Account   Price  Quantity
Name
Barton LLC                     740150   35000         1
Fritsch, Russel and Anderson   737550   35000         1
Herman LLC                     141962   65000         2
Jerde-Hilpert                  412290    5000         2
Kassulke, Ondricka and Metz    307599    7000         3
Keeling LLC                    688981  100000         5
Kiehn-Spinka                   146832   65000         2
Koepp Ltd                     1459666   70000         4
Kulas Inc                      437790   50000         3
Purdy-Kunde                    163416   30000         1
Stokes LLC                     478688   15000         2
Trantow-Barrows               2143398   45000         4
'''

print(pd.pivot_table(df, index=['Name', 'Rep', 'Manager']))
'''
                                                          Account  ...  Quantity
Name                         Rep           Manager                 ...
Barton LLC                   John Smith    Debra Henley    740150  ...  1.000000
Fritsch, Russel and Anderson Craig Booker  Debra Henley    737550  ...  1.000000
Herman LLC                   Cedric Moss   Fred Anderson   141962  ...  2.000000
Jerde-Hilpert                John Smith    Debra Henley    412290  ...  2.000000
Kassulke, Ondricka and Metz  Wendy Yule    Fred Anderson   307599  ...  3.000000
Keeling LLC                  Wendy Yule    Fred Anderson   688981  ...  5.000000
Kiehn-Spinka                 Daniel Hilton Debra Henley    146832  ...  2.000000
Koepp Ltd                    Wendy Yule    Fred Anderson   729833  ...  2.000000
Kulas Inc                    Daniel Hilton Debra Henley    218895  ...  1.500000
Purdy-Kunde                  Cedric Moss   Fred Anderson   163416  ...  1.000000
Stokes LLC                   Cedric Moss   Fred Anderson   239344  ...  1.000000
Trantow-Barrows              Craig Booker  Debra Henley    714466  ...  1.333333
'''

print(pd.pivot_table(df, index=['Manager', 'Rep']))  # Manager and Rep have a one-to-many relationship
'''
                              Account         Price  Quantity
Manager       Rep
Debra Henley  Craig Booker   720237.0  20000.000000  1.250000
              Daniel Hilton  194874.0  38333.333333  1.666667
              John Smith     576220.0  20000.000000  1.500000
Fred Anderson Cedric Moss    196016.5  27500.000000  1.250000
              Wendy Yule     614061.5  44250.000000  3.000000
'''

print(pd.pivot_table(df, index=['Manager', 'Rep'], values=['Price', 'Quantity']))
'''
                                    Price  Quantity
Manager       Rep
Debra Henley  Craig Booker   20000.000000  1.250000
              Daniel Hilton  38333.333333  1.666667
              John Smith     20000.000000  1.500000
Fred Anderson Cedric Moss    27500.000000  1.250000
              Wendy Yule     44250.000000  3.000000
'''

print(pd.pivot_table(df, index=['Manager', 'Rep'], values=['Price', 'Quantity'], columns=['Product']))
'''
                               Price              ... Quantity
Product                          CPU Maintenance  ...  Monitor Software
Manager       Rep                                 ...
Debra Henley  Craig Booker   32500.0      5000.0  ...      NaN      1.0
              Daniel Hilton  52500.0         NaN  ...      NaN      1.0
              John Smith     35000.0      5000.0  ...      NaN      NaN
Fred Anderson Cedric Moss    47500.0      5000.0  ...      NaN      1.0
              Wendy Yule     82500.0      7000.0  ...      2.0      NaN

As the output above shows, once columns is set to Product, Price and Quantity
are broken down further, with one column per product.
A fill_value parameter can also be set to replace NaN with a chosen value.
'''

'''
Summary:
Before building a pivot table, get a rough understanding of the raw data
first, so that the resulting table is meaningful.
'''
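
The fill_value parameter mentioned above is not shown in the demo, so here is a minimal sketch of it. The `sample` frame is again an assumption: it contains only the five rows printed by df.head(), typed in by hand, so Daniel Hilton has a single CPU row and no other products. Choosing 0 as the fill value is also just an illustration; whether 0 is a sensible stand-in for "no sales" depends on the data.

```python
import pandas as pd

# Stand-in for sales-funnel.xlsx: only the five rows shown by df.head().
# (Hypothetical sample; the real workbook contains more rows.)
sample = pd.DataFrame({
    "Rep": ["Craig Booker", "Craig Booker", "Craig Booker",
            "Craig Booker", "Daniel Hilton"],
    "Manager": ["Debra Henley"] * 5,
    "Product": ["CPU", "Software", "Maintenance", "CPU", "CPU"],
    "Quantity": [1, 1, 2, 1, 2],
    "Price": [30000, 10000, 5000, 35000, 65000],
})

# Without fill_value, (Rep, Product) pairs that have no rows become NaN.
with_nan = pd.pivot_table(sample, index=["Manager", "Rep"],
                          values=["Price"], columns=["Product"])

# With fill_value=0, those missing combinations are shown as 0 instead.
filled = pd.pivot_table(sample, index=["Manager", "Rep"],
                        values=["Price"], columns=["Product"],
                        fill_value=0)

print(with_nan)
print(filled)
```

In the sample, Craig Booker's two CPU rows average to 32500, which agrees with the output above, while Daniel Hilton's Software and Maintenance cells change from NaN to 0 once fill_value is set.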