[Python Data Processing] pandas Row and Column Operations and Aggregation

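1. Column operations: apply

The general pattern is df.column.function(), for example df.price.mean(). A column whose name collides with a DataFrame method, such as count, has to be written df['count'], because df.count refers to the method.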

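Example:

Convert every value in the Name column to upper case. (string.upper exists only in Python 2, so the built-in str.upper is used here.)

df['Name'] = df.Name.apply(str.upper)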

Operating on a column with a lambda

Example: create a column holding each email's provider

df['Email Provider'] = df.Email.apply(
    lambda x: x.split('@')[-1]
)
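The two column operations above can be tried without any input file. The following is a minimal sketch; the sample names and addresses are made up for illustration, and only the column names Name and Email come from the examples.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the examples above.
df = pd.DataFrame({
    'Name': ['alice chen', 'bob li'],
    'Email': ['alice@gmail.com', 'bob@outlook.com'],
})

# Column operation with an existing function: str.upper receives each value.
df['Name'] = df.Name.apply(str.upper)

# Column operation with a lambda: keep the part after the last '@'.
df['Email Provider'] = df.Email.apply(lambda x: x.split('@')[-1])

print(df)
```

Each call reads a single column and returns a new Series of the same length, which is then stored as a column.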

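2. Row operations: lambda

When a conditional lambda is split across several lines, end the line before the if, and the if line itself, with a backslash (\). Remember to pass axis=1.

Calling apply on the DataFrame itself, without selecting a column, is how rows are processed: df.Email.apply(...) is a column operation, while df.apply(..., axis=1) is a row operation.

For a row operation, axis=1 is required; without it, apply passes whole columns to the function instead of rows.

A simple rule of thumb: a column operation only reads its own column, whereas a row operation usually needs data from several columns.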
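To make the distinction concrete, here is a small sketch on a made-up frame. The column names first_name and last_name and the data are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical frame, used only to show what each lambda receives.
df = pd.DataFrame({
    'first_name': ['Ada', 'Alan'],
    'last_name': ['Lovelace', 'Turing'],
})

# Column operation: the lambda sees one value from first_name at a time.
df['first_len'] = df.first_name.apply(lambda value: len(value))

# Row operation: with axis=1 the lambda sees a whole row and can mix columns.
df['full_name'] = df.apply(
    lambda row: row.first_name + ' ' + row.last_name,
    axis=1,
)

print(df)
```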

Example 1: hours up to 40 and hours beyond 40 are paid at different rates; compute each person's total pay

import pandas as pd

df = pd.read_csv('employees.csv')

# Regular pay for the first 40 hours plus 1.5x pay for the overtime hours
total_earned = lambda row: (row.hourly_wage * 40) + ((row.hourly_wage * 1.5) * (row.hours_worked - 40)) \
    if row.hours_worked > 40 \
    else row.hourly_wage * row.hours_worked

df['total_earned'] = df.apply(total_earned, axis=1)

print(df)
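The overtime rule can be checked without employees.csv. The sketch below uses made-up rows (the employee column and all values are assumptions) and writes the rule as a named function, which some readers find easier to follow than a multi-line lambda.

```python
import pandas as pd

# Hypothetical rows standing in for employees.csv.
df = pd.DataFrame({
    'employee': ['Ann', 'Ben', 'Cy'],
    'hourly_wage': [10.0, 20.0, 15.0],
    'hours_worked': [30, 40, 50],
})

def total_earned(row):
    """Pay 1.5x the hourly wage for every hour beyond 40."""
    if row.hours_worked > 40:
        overtime_hours = row.hours_worked - 40
        return row.hourly_wage * 40 + row.hourly_wage * 1.5 * overtime_hours
    return row.hourly_wage * row.hours_worked

# axis=1 passes one row at a time, so the function can read several columns.
df['total_earned'] = df.apply(total_earned, axis=1)

print(df)
```

With these rows, only the 50-hour employee receives overtime: 40 hours at 15.0 plus 10 hours at 22.5.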

Example 2: a column operation and a row operation side by side

import pandas as pd

orders = pd.read_csv('shoefly.csv')

print(orders.head(5))

# Column operation: classify each shoe material
source = lambda x: 'animal' \
    if x == 'leather' \
    else 'vegan'

orders['shoe_source'] = orders.shoe_material.apply(source)
print(orders.head(5))

# Row operation: build a salutation from gender and last_name
get_lastname = lambda row: 'Dear Mr. ' + row.last_name \
    if row.gender == 'male' \
    else 'Dear Ms. ' + row.last_name

orders['salutation'] = orders.apply(get_lastname, axis=1)
print(orders.head(5))

Example 3

import pandas as pd

inventory = pd.read_csv('inventory.csv')
print(inventory.head(10))

# Select the first 10 rows
staten_island = inventory[0:10]

product_request = staten_island.product_description
# info() prints its summary itself, so no print() is needed
inventory.info()
# Filter rows with two conditions combined by &
seed_request = inventory[(inventory.product_type == 'seeds') & (inventory.location == 'Brooklyn')]
print(seed_request)

# Column operation: an item is in stock unless its quantity is 0
inventory['in_stock'] = inventory.quantity.apply(lambda x: False \
                                                 if x == 0 \
                                                 else True
                                                 )
# print(inventory.head(10))

# Row operation: total value = quantity * price
inventory['total_value'] = inventory.apply(lambda row: row.quantity * row.price, axis=1)
# print(inventory.head(10))

# Row operation: join two text columns into one description
combine_lambda = lambda row: \
    '{} - {}'.format(row.product_type,
                     row.product_description)
inventory['full_description'] = inventory.apply(combine_lambda, axis=1)
print(inventory.head(10))
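As a side note, the three derived columns in Example 3 can also be computed with whole-column (vectorized) expressions instead of apply. This is an alternative technique, not what the example itself uses, and the rows below are made up to stand in for inventory.csv.

```python
import pandas as pd

# Hypothetical rows standing in for inventory.csv.
inventory = pd.DataFrame({
    'product_type': ['seeds', 'tools'],
    'product_description': ['daisy', 'rake'],
    'quantity': [4, 0],
    'price': [6.99, 13.99],
})

# Same results as the apply-based versions, computed on whole columns at once.
inventory['in_stock'] = inventory.quantity != 0
inventory['total_value'] = inventory.quantity * inventory.price
inventory['full_description'] = (
    inventory.product_type + ' - ' + inventory.product_description
)

print(inventory)
```

Column arithmetic like this is usually faster than a row-wise apply on large tables, while apply with a lambda remains the more general tool when the logic does not map onto simple column expressions.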

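3. Aggregates in pandas

1. apply operates on each value individually; this section is about reducing an entire column to a single value. The usual pattern is df.column.command().

Example: count how many distinct cuisines there are

cuisine_options_count = restaurants['cuisine'].nunique()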

Command    Description
mean       Average of all values in column
std        Standard deviation
median     Median
max        Maximum value in column
min        Minimum value in column
count      Number of values in column
nunique    Number of unique values in column
unique     List of unique values in column
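A quick way to get a feel for these commands is to run them on a tiny table. In the sketch below, the rating column and all values are assumptions; only the cuisine column name comes from the example above.

```python
import pandas as pd

# Hypothetical data; only the 'cuisine' column name comes from the example above.
restaurants = pd.DataFrame({
    'cuisine': ['thai', 'thai', 'pizza', 'sushi'],
    'rating': [4.0, 5.0, 3.0, 4.0],
})

mean_rating = restaurants.rating.mean()      # average of the column
median_rating = restaurants.rating.median()  # middle value
highest = restaurants.rating.max()
lowest = restaurants.rating.min()
n_ratings = restaurants['rating'].count()    # number of non-null values
n_cuisines = restaurants.cuisine.nunique()   # number of distinct values
cuisines = restaurants.cuisine.unique()      # array of the distinct values

print(mean_rating, median_rating, highest, lowest, n_ratings, n_cuisines)
print(cuisines)
```

Every command except unique returns a single value; unique returns an array with one entry per distinct value, in order of first appearance.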

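2. df.groupby('column1').column2.measurement().reset_index()

column1 is the column whose equal values are grouped together, column2 is the column the function is computed on, and measurement() is the aggregate to apply. Note: without reset_index(), the result is a Series.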

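Example 1:

Get the highest price for each shoe type

import pandas as pd

orders = pd.read_csv('orders.csv')

pricey_shoes = orders.groupby('shoe_type').price.max()

This returns a Series indexed by shoe_type rather than by a default integer index. To get a DataFrame in which shoe_type is an ordinary column, call reset_index(); it is the usual follow-up to groupby().

Example 2: the result is now a DataFrame

pricey_shoes = orders.groupby('shoe_type').price.max().reset_index()
print(pricey_shoes)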
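The difference between the two results is easy to inspect on a small made-up table (the rows below are assumptions standing in for orders.csv):

```python
import pandas as pd

# Hypothetical orders standing in for orders.csv.
orders = pd.DataFrame({
    'shoe_type': ['sandals', 'sandals', 'clogs'],
    'price': [25, 30, 40],
})

# Without reset_index(): a Series whose index holds the shoe_type labels.
max_series = orders.groupby('shoe_type').price.max()

# With reset_index(): a DataFrame with shoe_type and price as ordinary columns.
max_frame = max_series.reset_index()

print(type(max_series).__name__)
print(max_series)
print(type(max_frame).__name__)
print(max_frame)
```

By default groupby sorts the group labels, so clogs appears before sandals in both results.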

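When the built-in aggregates are not enough, apply with a lambda can be used on the groups as well.

Example 3: return the 25th-percentile price for each shoe color

import numpy as np
import pandas as pd

orders = pd.read_csv('orders.csv')

print(orders)
# The lambda receives all prices of one color and returns their 25th percentile
cheap_shoes = orders.groupby('shoe_color').price.apply(lambda x: np.percentile(x, 25))
print(cheap_shoes)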
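Here is a self-contained sketch of the same idea on made-up prices (all data below is assumed). It also computes the result with the groupby quantile method, which is shown only as a possible alternative for comparison and is not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical orders standing in for orders.csv.
orders = pd.DataFrame({
    'shoe_color': ['red', 'red', 'red', 'red', 'red', 'navy', 'navy', 'navy'],
    'price': [10, 20, 30, 40, 50, 100, 200, 300],
})

# Custom aggregate: the lambda gets the prices of one color as a Series.
cheap_shoes = (
    orders.groupby('shoe_color')
    .price.apply(lambda x: np.percentile(x, 25))
    .reset_index()
)

# Alternative for comparison: the built-in quantile aggregate.
cheap_shoes_alt = orders.groupby('shoe_color').price.quantile(0.25).reset_index()

print(cheap_shoes)
print(cheap_shoes_alt)
```

Both np.percentile and quantile interpolate linearly by default, so the two tables should agree for this data.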

Sometimes you want to group by more than one column.

Example 4: count the orders for each combination of shoe type and shoe color

import pandas as pd

orders = pd.read_csv('orders.csv')

shoe_counts = orders.groupby(['shoe_type', 'shoe_color']).id.count().reset_index()
print(shoe_counts)

# Rename the id column, which now holds the counts
shoe_counts.rename(columns={'id': 'count'}, inplace=True)

# Alternative: assign all column names at once
# shoe_counts.columns = ['shoe_type', 'shoe_color', 'count']

print(shoe_counts)
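A runnable sketch of the multi-column grouping, using four made-up orders in place of orders.csv (the data is assumed; the column names follow the example):

```python
import pandas as pd

# Hypothetical orders standing in for orders.csv.
orders = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'shoe_type': ['sandals', 'sandals', 'sandals', 'clogs'],
    'shoe_color': ['red', 'red', 'navy', 'red'],
})

# Group by both columns, count the ids in each group, then flatten and rename.
shoe_counts = (
    orders.groupby(['shoe_type', 'shoe_color'])
    .id.count()
    .reset_index()
    .rename(columns={'id': 'count'})
)

print(shoe_counts)

# shoe_counts.count is a DataFrame method, so use brackets for this column.
print(shoe_counts['count'].sum())
```

Chaining rename instead of using inplace=True keeps the whole transformation in one expression; either form gives the same table.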

3. Reshaping a table: pivot. As with groupby, call reset_index() afterwards.

Example:

import pandas as pd

orders = pd.read_csv('orders.csv')

shoe_counts = orders.groupby(['shoe_type', 'shoe_color']).id.count().reset_index()

print(shoe_counts)
shoe_counts.rename(columns={'id': 'count'}, inplace=True)

# One row per shoe_type, one column per shoe_color, cells filled from count
shoe_counts_pivot = shoe_counts.pivot(columns='shoe_color', index='shoe_type', values='count').reset_index()

print(shoe_counts_pivot)

 

    shoe_type     shoe_color
0   ballet flats  black         2
1   ballet flats  brown        11
2   ballet flats  navy         17
3   ballet flats  red          13
4   ballet flats  white         7
5   sandals       black         3
6   sandals       brown        10
7   sandals       navy         13
8   sandals       red          14
9   sandals       white        10
10  stilettos     black         8
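To see what pivot does to a long table like the one above, here is a minimal sketch on a four-row frame. The counts are made up; only the column names follow the example.

```python
import pandas as pd

# Hypothetical long-format counts: one row per (shoe_type, shoe_color) pair.
shoe_counts = pd.DataFrame({
    'shoe_type': ['clogs', 'clogs', 'sandals', 'sandals'],
    'shoe_color': ['navy', 'red', 'navy', 'red'],
    'count': [1, 4, 3, 2],
})

# Wide format: one row per shoe_type, one column per shoe_color.
shoe_counts_pivot = shoe_counts.pivot(
    index='shoe_type',
    columns='shoe_color',
    values='count',
).reset_index()

print(shoe_counts_pivot)
```

After reset_index(), shoe_type becomes an ordinary column next to one column per color. Note that pivot requires each index and column pair to appear at most once in the input, which the preceding groupby guarantees.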