Pandas Basics (3): Filtering Data
阿新 • Published: 2018-12-21
After importing pandas (and NumPy, which is used to generate the sample values), create a DataFrame:
import numpy as np
import pandas as pd
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
Output:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
1. Viewing columns
data.three
Or:
data['three']
Viewing several columns:
data[['one','three']]
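As a quick check of the three forms above, the following sketch rebuilds the example frame and compares them. The names by_attr, by_key, and subset are made up for illustration. Attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method, so the bracket form is the safer default.

```python
import numpy as np
import pandas as pd

# Same example frame as in the article.
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

by_attr = data.three              # attribute access returns a Series
by_key = data['three']            # bracket access returns the same Series
subset = data[['one', 'three']]   # a list of names returns a DataFrame

print(type(by_attr).__name__)     # Series
print(by_attr.equals(by_key))     # True
print(type(subset).__name__)      # DataFrame
print(data[['three']].shape)      # (4, 1): a one-column DataFrame, not a Series
```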
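2. Selecting data with loc and iloc:
loc and iloc let us pick the data we want out of a DataFrame with NumPy-style syntax, using either axis labels (loc) or integer positions (iloc).
Selecting a single row and several columns by label: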
data.loc['Ohio',['two','three']]
Output:
two      1
three    2
Name: Ohio, dtype: int64
Selecting data by integer position with iloc:
data.iloc[2,[3,0,1]]
Output:
four    11
one      8
two      9
Name: Utah, dtype: int64
data.iloc[2]
Output:
one       8
two       9
three    10
four     11
Name: Utah, dtype: int64
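To make the relationship between the two indexers concrete, here is a small sketch showing that the label-based and position-based selections above address the same cells. The variable names are illustrative only. Index.get_loc and Index.get_indexer are used to translate labels into positions.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# 'Utah' sits at row position 2; 'four', 'one', 'two' sit at columns 3, 0, 1.
row_pos = data.index.get_loc('Utah')
col_pos = data.columns.get_indexer(['four', 'one', 'two']).tolist()
print(row_pos, col_pos)           # 2 [3, 0, 1]

by_label = data.loc['Utah', ['four', 'one', 'two']]
by_position = data.iloc[2, [3, 0, 1]]
print(by_label.equals(by_position))   # True
print(by_position.tolist())           # [11, 8, 9]
```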
Both indexers also support slicing:
data.loc[:'Utah','two']
Output:
Ohio        1
Colorado    5
Utah        9
Name: two, dtype: int64
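One detail worth noticing in the result above is that 'Utah' itself is included. Label slices with loc include the end label, whereas position slices with iloc follow the usual Python rule and exclude the end position. A minimal sketch of the difference (variable names are illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

by_label = data.loc[:'Utah', 'two']   # end label 'Utah' is included
by_position = data.iloc[:2, 1]        # end position 2 (Utah) is excluded

print(by_label.tolist())      # [1, 5, 9]
print(by_position.tolist())   # [1, 5]

# To get the same three rows by position, the slice has to stop at 3.
print(data.iloc[:3, 1].equals(by_label))   # True
```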
data.iloc[:,:3]
Output:
          one  two  three
Ohio        0    1      2
Colorado    4    5      6
Utah        8    9     10
New York   12   13     14
data.iloc[:,:3][data.three > 5]
Output:
          one  two  three
Colorado    4    5      6
Utah        8    9     10
New York   12   13     14
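The last expression chains two indexing steps: it first slices the columns and then applies a boolean mask to the rows. The same result can be obtained in one step, which is usually easier to read and is the form to prefer when assigning values. The sketch below is one possible way to write it. Note that iloc does not accept a boolean Series as a mask, so the mask is converted to a NumPy array first.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Two-step (chained) form from the article.
chained = data.iloc[:, :3][data.three > 5]

# One-step form with loc: boolean mask for rows, label slice for columns.
single = data.loc[data['three'] > 5, :'three']

# One-step form with iloc: the mask must be a plain boolean array.
positional = data.iloc[(data['three'] > 5).to_numpy(), :3]

print(chained.equals(single))       # True
print(chained.equals(positional))   # True
```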
[Figure: DataFrame indexing options]
3. Filtering on multiple conditions:
data[(data.one > 4) & (data.four == 11)]
Output:
      one  two  three  four
Utah    8    9     10    11
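The parentheses around each condition are required, because & binds more tightly than comparison operators such as > and ==. The same pattern extends to | (or) and ~ (not). The sketch below shows these variants, plus DataFrame.query as an alternative spelling. The variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

both = data[(data.one > 4) & (data.four == 11)]    # AND
either = data[(data.one > 4) | (data.four == 3)]   # OR
negated = data[~(data.one > 4)]                    # NOT

print(both.index.tolist())      # ['Utah']
print(either.index.tolist())    # ['Ohio', 'Utah', 'New York']
print(negated.index.tolist())   # ['Ohio', 'Colorado']

# query() expresses the same AND condition as a string.
queried = data.query('one > 4 and four == 11')
print(queried.equals(both))     # True
```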
4. Filtering on special conditions
We create a new table, df, with the following structure:
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
5  Nevada  2003  3.2
6   Oland  2004  3.2
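The article only shows the table, not how it was built. One way to construct it, assuming a plain dict of columns and the default integer index, is sketched below. Because pop is also the name of a DataFrame method, this column has to be read with df['pop'] rather than df.pop.

```python
import pandas as pd

# Values copied from the table above; the construction itself is an assumption.
df = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada', 'Oland'],
    'year': [2000, 2001, 2002, 2001, 2002, 2003, 2004],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2, 3.2],
})

print(df)
print(callable(df.pop))       # True: df.pop is the method, not the column
print(df['pop'].tolist())     # [1.5, 1.7, 3.6, 2.4, 2.9, 3.2, 3.2]
```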
Select the rows whose state starts with 'O':
df[df.state.str.startswith('O')]
Output:
   state  year  pop
0   Ohio  2000  1.5
1   Ohio  2001  1.7
2   Ohio  2002  3.6
6  Oland  2004  3.2
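The example table has no missing values, but real data often does. In that case the mask returned by str.startswith can contain a missing entry, which boolean indexing may reject depending on the pandas version. Passing na=False treats missing values as non-matches. The frame below, named people, is hypothetical and exists only to illustrate this.

```python
import pandas as pd

# Hypothetical data with one missing state, not taken from the article.
people = pd.DataFrame({'state': ['Ohio', None, 'Nevada', 'Oland']})

# na=False makes the missing entry count as "does not start with 'O'".
mask = people['state'].str.startswith('O', na=False)

print(mask.tolist())                      # [True, False, False, True]
print(people[mask]['state'].tolist())     # ['Ohio', 'Oland']
```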
Select the pop data for Ohio and Nevada:
df.loc[df.state.isin(['Nevada','Ohio']),['state','pop']]
Output:
    state  pop
0    Ohio  1.5
1    Ohio  1.7
2    Ohio  3.6
3  Nevada  2.4
4  Nevada  2.9
5  Nevada  3.2
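The isin mask can be inverted with ~ to select everything except the listed states. The sketch below rebuilds the table (the construction is an assumption, as the article shows only the values) and shows both directions.

```python
import pandas as pd

df = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada', 'Oland'],
    'year': [2000, 2001, 2002, 2001, 2002, 2003, 2004],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2, 3.2],
})

wanted = df['state'].isin(['Nevada', 'Ohio'])

selected = df.loc[wanted, ['state', 'pop']]    # rows for Ohio and Nevada
others = df.loc[~wanted, ['state', 'pop']]     # every other row

print(selected.index.tolist())   # [0, 1, 2, 3, 4, 5]
print(others)                    # only row 6 (Oland, 3.2)
```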
In summary: loc filters data using the row and column labels (names), whereas iloc filters data using the integer positions of the rows and columns.
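The difference matters most once the index labels are themselves integers that no longer match the positions, for instance after filtering. The following sketch, which again assumes the table is built from a dict, filters the Nevada rows and then looks up the first one both ways. The name nevada is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada', 'Oland'],
    'year': [2000, 2001, 2002, 2001, 2002, 2003, 2004],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2, 3.2],
})

nevada = df[df['state'] == 'Nevada']
print(nevada.index.tolist())        # [3, 4, 5]: labels kept from df

first_by_position = nevada.iloc[0]  # position 0, which carries label 3
first_by_label = nevada.loc[3]      # label 3, the same row
print(first_by_position.equals(first_by_label))   # True

# Label 0 does not exist in the filtered frame, so loc raises KeyError.
try:
    nevada.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False
print(label_zero_found)             # False
```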