Python Data Analysis for Beginners (4): Pandas (3) The DataFrame Data Structure

阿新 • Published: 2020-02-17

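Staying home to do your part for the country gets boring, so why not learn a little Python with me?

Life is short, I use Python.

Previous posts in this series:

Python Data Analysis for Beginners (1): Data Analysis Basics

Python Data Analysis for Beginners (2): Pandas (1) Overview

Python Data Analysis for Beginners (3): Pandas (2) The Series Data Structure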

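Introduction

A DataFrame is a two-dimensional labeled data structure whose columns can hold different types.

A simple way to think of it is as something like an Excel sheet or a SQL table.

DataFrame is the most commonly used Pandas object. Like Series, it accepts many kinds of input data:

A dict of one-dimensional ndarrays, lists, dicts, or Series
A two-dimensional numpy.ndarray
A structured or record ndarray
A Series
Another DataFrame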

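Building a DataFrame

Like an Excel sheet, a DataFrame has row labels (index) and column labels (columns), which correspond to Excel's rows and columns.

When building a DataFrame, you can optionally pass the index and columns arguments.

Doing so guarantees that the resulting DataFrame contains exactly that index and those columns.

Note: with Python >= 3.6 and Pandas >= 0.23, when the data is a dict and columns is not specified, the DataFrame's columns follow the dict's insertion order.

With Python < 3.6 or Pandas < 0.23, when columns is not specified, the DataFrame's columns are sorted lexically by dict key.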
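To make the ordering rule concrete, here is a minimal sketch that assumes a recent environment (Python >= 3.6 and Pandas >= 0.23). The dict keys zebra, apple, and mango are made up for illustration only.

```python
import pandas as pd

# Hypothetical data; the keys are deliberately NOT in alphabetical order.
data = {'zebra': [1, 2], 'apple': [3, 4], 'mango': [5, 6]}

# Without columns=, the columns follow the dict's insertion order.
df_default = pd.DataFrame(data)
print(list(df_default.columns))

# With columns=, the passed list decides both the selection and the order.
df_custom = pd.DataFrame(data, columns=['mango', 'zebra'])
print(list(df_custom.columns))
```

If a specific column order matters, passing columns explicitly avoids depending on the interpreter or library version.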

Building a DataFrame from a dict of Series or a dict of dicts

Let's start with a simple example:

import pandas as pd

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df)

The result:

   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0

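When a DataFrame is built from Series, the resulting index is the union of the indexes of the individual Series.

Any nested dicts are first converted to Series. If no columns are specified, the DataFrame's columns are the ordered list of the dict keys.

Here the dict uses two strings, one and two, as its keys, and the DataFrame constructor automatically uses those keys as its columns.

If we pass an index explicitly when constructing the DataFrame, that index is used and the data is aligned to it. For example:

df1 = pd.DataFrame(d, index=['d', 'b', 'a'])
print(df1)

The result: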

   one  two
d  NaN  4.0
b  2.0  2.0
a  1.0  1.0

If we specify both index and columns, the DataFrame uses both. Any index or column label that does not exist in the data is filled with NaN, as in this example:

df2 = pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])
print(df2)

The result:

   two three
d  4.0   NaN
b  2.0   NaN
a  1.0   NaN

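Note: when columns is passed together with a dict of data, the passed columns override the dict's keys.

When a DataFrame is built from a single Series, it inherits the Series' index; if no column name is specified, the single column is named after the input Series.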
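The single-Series case described above can be sketched as follows. The Series name ser and its values are assumptions chosen only for this illustration.

```python
import pandas as pd

# A named Series; 'ser' is a hypothetical name used for this sketch.
ser = pd.Series([1, 2, 3], index=['a', 'b', 'c'], name='ser')

# The DataFrame keeps the Series' index and gets one column named after it.
df_from_ser = pd.DataFrame(ser)
print(df_from_ser)
```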

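Building a DataFrame from a dict of ndarrays or lists

First, the arrays must all have the same length.

If an index argument is passed, its length must match the length of the arrays.

If no index is passed, one is generated automatically as a sequence starting from 0 (that is, range(n), where n is the array length). For example:

d1 = {'one': [1., 2., 3., 4.],
      'two': [4., 3., 2., 1.]}

df3 = pd.DataFrame(d1)
print(df3)

df4 = pd.DataFrame(d1, index=['a', 'b', 'c', 'd'])
print(df4)

The result: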

   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0

   one  two
a  1.0  4.0
b  2.0  3.0
c  3.0  2.0
d  4.0  1.0
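The two length rules for this constructor can be checked with a small sketch. The helper can_build is hypothetical (it is not part of Pandas); it only reports whether the constructor accepted the input or raised a ValueError.

```python
import pandas as pd

def can_build(data, index=None):
    """Hypothetical helper: True if pd.DataFrame accepts the input,
    False if it raises ValueError."""
    try:
        pd.DataFrame(data, index=index)
        return True
    except ValueError:
        return False

# Equal-length lists and a matching index: accepted.
ok = can_build({'one': [1., 2.], 'two': [3., 4.]}, index=['a', 'b'])

# Lists of different lengths: rejected.
ragged = can_build({'one': [1., 2., 3.], 'two': [4., 3.]})

# Index longer than the lists: rejected.
bad_index = can_build({'one': [1., 2.], 'two': [3., 4.]}, index=['a', 'b', 'c'])

print(ok, ragged, bad_index)
```

Note that this strictness applies to plain lists and ndarrays. A dict of Series, as shown earlier, is aligned by label instead, so Series of different lengths are allowed.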

Building a DataFrame from a list of dicts

d2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

df5 = pd.DataFrame(d2)
print(df5)

df6 = pd.DataFrame(d2, index=['first', 'second'], columns=['a', 'b'])
print(df6)

The result:

   a   b     c
0  1   2   NaN
1  5  10  20.0

        a   b
first   1   2
second  5  10

Building a DataFrame from a dict of tuples

Passing a dict of tuples automatically creates a DataFrame with a MultiIndex (hierarchical index).

d3 = ({('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
       ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
       ('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
       ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8},
       ('b', 'b'): {('A', 'D'): 9, ('A', 'B'): 10}})

df7 = pd.DataFrame(d3)
print(df7)

The result:

       a              b      
       b    a    c    a     b
A B  1.0  4.0  5.0  8.0  10.0
  C  2.0  3.0  6.0  7.0   NaN
  D  NaN  NaN  NaN  NaN   9.0
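As a rough illustration of how such a hierarchical DataFrame can be read back, the sketch below builds a smaller dict of tuples (a trimmed-down, hypothetical variant of the data above) and selects by top-level label and by full tuple.

```python
import pandas as pd

# A smaller, hypothetical dict of tuples in the same shape as the example above.
data = {('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
        ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
        ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8}}
df_multi = pd.DataFrame(data)

# The outer tuple keys become a two-level column index.
print(df_multi.columns.nlevels)

# Selecting a top-level label returns every column underneath it.
sub = df_multi['a']
print(sub)

# A full column tuple selects one column; a row tuple then picks one cell.
value = df_multi[('a', 'b')].loc[('A', 'B')]
print(value)
```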

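Selecting, adding, and deleting columns

Once a DataFrame has been created, we naturally want to manipulate it dynamically, so the standard CRUD operations are essential.

The examples below use df4 for demonstration.

Selecting

# Print the whole DataFrame
print(df4)
# Select a column by label
print(df4['one'])
# Select a row by label (loc) or by integer position (iloc)
print(df4.loc['a'])
print(df4.iloc[0])

# Derive new columns from existing ones
df4['three'] = df4['one'] * df4['two']
df4['flag'] = df4['one'] > 2
print(df4)

The result: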

   one  two
a  1.0  4.0
b  2.0  3.0
c  3.0  2.0
d  4.0  1.0

a    1.0
b    2.0
c    3.0
d    4.0
Name: one, dtype: float64

one    1.0
two    4.0
Name: a, dtype: float64

one    1.0
two    4.0
Name: a, dtype: float64

   one  two  three   flag
a  1.0  4.0    4.0  False
b  2.0  3.0    6.0  False
c  3.0  2.0    6.0   True
d  4.0  1.0    4.0   True

Deleting

# Delete columns: del removes a column in place; pop removes it and returns it
del df4['two']
df4.pop('three')
print(df4)

The result:

   one   flag
a  1.0  False
b  2.0  False
c  3.0   True
d  4.0   True
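The difference between the two deletion styles is easier to see when the return value of pop is kept. This is a small sketch on a throwaway DataFrame; the name df_demo and its values are assumptions for illustration.

```python
import pandas as pd

# A throwaway frame so the tutorial's df4 is left untouched.
df_demo = pd.DataFrame({'one': [1., 2.], 'two': [4., 3.], 'three': [4., 6.]},
                       index=['a', 'b'])

# pop removes the column in place and hands it back as a Series.
popped = df_demo.pop('three')
print(popped)

# del removes the column in place and returns nothing.
del df_demo['two']
print(df_demo)
```

pop is convenient when the removed column is still needed, for example to move it elsewhere or to use it as a separate target variable.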

Adding

When a scalar value is inserted, it is broadcast to fill the entire column:

# Insert a scalar as a new column
df4['foo'] = 'bar'
print(df4)

The result:

   one   flag  foo
a  1.0  False  bar
b  2.0  False  bar
c  3.0   True  bar
d  4.0   True  bar

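When you insert a Series whose index differs from the DataFrame's, the Series is aligned to the DataFrame's index, and labels missing from the Series become NaN:

df4['one_trunc'] = df4['one'][:2]
print(df4)

The result: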

   one   flag  foo  one_trunc
a  1.0  False  bar        1.0
b  2.0  False  bar        2.0
c  3.0   True  bar        NaN
d  4.0   True  bar        NaN

You can also insert a raw ndarray, but its length must match the length of the DataFrame's index.
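A minimal sketch of the ndarray case, using a throwaway DataFrame (df_arr and the column names arr and bad are made up for this example). Unlike a Series, a raw array carries no labels, so it is assigned purely by position.

```python
import numpy as np
import pandas as pd

df_arr = pd.DataFrame({'one': [1., 2., 3., 4.]}, index=['a', 'b', 'c', 'd'])

# Length 4 matches the index length, so the values are assigned by position.
df_arr['arr'] = np.array([10, 20, 30, 40])

# Length 2 does not match; Pandas raises ValueError instead of padding with NaN.
try:
    df_arr['bad'] = np.array([1, 2])
    error = None
except ValueError as exc:
    error = str(exc)

print(df_arr)
print(error)
```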

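By default, a new column assigned with df[...] = ... is appended at the end of the DataFrame. The insert method instead places the column at a position you specify, counted from 0. For example:

df4.insert(1, 'bar', df4['one'])
print(df4)

The result: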

   one  bar   flag  foo  one_trunc
a  1.0  1.0  False  bar        1.0
b  2.0  2.0  False  bar        2.0
c  3.0  3.0   True  bar        NaN
d  4.0  4.0   True  bar        NaN
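A few more insert positions, sketched on a throwaway DataFrame (df_ins and the column names first and last are assumptions for illustration). It also shows that, by default, insert refuses a column name that already exists.

```python
import pandas as pd

df_ins = pd.DataFrame({'one': [1., 2.], 'flag': [False, True]}, index=['a', 'b'])

# Position 0 puts the new column first.
df_ins.insert(0, 'first', df_ins['one'] * 10)

# Position len(columns) appends at the end; a scalar is broadcast to every row.
df_ins.insert(len(df_ins.columns), 'last', 'end')

# Re-inserting an existing name raises ValueError by default.
try:
    df_ins.insert(1, 'one', df_ins['one'])
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

print(list(df_ins.columns))
print(duplicate_error)
```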

Sample code

As usual, all of the sample code is uploaded to the code repositories on Github and Gitee for everyone to use.

Sample code - Github

Sample code - Gitee

References

https://www.pypandas.cn/docs/getting_started/dsintro.h
