
Python Programming for Beginners: Study Notes (Part 9)

## Python Lesson 4

### A new data format: CSV

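- Plain text, using a character set such as ASCII, Unicode, EBCDIC, or GB2312 (Simplified Chinese environments);
- Made up of records (typically one record per line);
- Each record is split into fields by a delimiter (typically a comma, semicolon, or tab; sometimes the delimiter may include optional spaces);
- Every record has the same sequence of fields.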
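
To make the record and field structure concrete, here is a minimal sketch that parses a small CSV string with Python's standard `csv` module. The sample text and its field names (`id`, `name`, `math`) are made up for illustration and are not taken from the grade table used later.

```python
import csv
import io

# Hypothetical sample data: one header record plus two data records
text = "id,name,math\n1,wang,64\n2,li,78\n"

# csv.reader splits each record (line) into fields on the delimiter
rows = list(csv.reader(io.StringIO(text), delimiter=","))
header, records = rows[0], rows[1:]

print(header)   # field names from the first record
print(records)  # every record has the same sequence of fields
```

Note that the `csv` module returns every field as a string; converting `"64"` to a number is left to the caller, which is one reason pandas is convenient for this kind of data.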

#### pandas


```python
import pandas as pd
import numpy as np
```


```python
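# Open the file with Python first, then hand the file object to pandas;
# the with block closes the file once the data has been read
with open('K:/Code/jupyter-notebook/Python Study/成績表.csv') as f:
    df = pd.read_csv(f)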
```


```python
# head() returns the first 5 rows by default
df.head()
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>學號</th>
      <th>姓名</th>
      <th>性別</th>
      <th>年齡</th>
      <th>班級</th>
      <th>計算機</th>
      <th>英語</th>
      <th>數學</th>
      <th>語文</th>
      <th>物理</th>
      <th>化學</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>張小文</td>
      <td>男</td>
      <td>20</td>
      <td>1002</td>
      <td>56</td>
      <td>62</td>
      <td>86</td>
      <td>85</td>
      <td>86</td>
      <td>75</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>李清</td>
      <td>女</td>
      <td>19</td>
      <td>1001</td>
      <td>94</td>
      <td>65</td>
      <td>85</td>
      <td>90</td>
      <td>84</td>
      <td>75</td>
    </tr>
    <tr>
      <th>2</th>
      <td>3</td>
      <td>孫明</td>
      <td>男</td>
      <td>19</td>
      <td>1003</td>
      <td>74</td>
      <td>85</td>
      <td>80</td>
      <td>84</td>
      <td>86</td>
      <td>91</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4</td>
      <td>陳平</td>
      <td>男</td>
      <td>8</td>
      <td>1003</td>
      <td>85</td>
      <td>75</td>
      <td>78</td>
      <td>73</td>
      <td>86</td>
      <td>81</td>
    </tr>
    <tr>
      <th>4</th>
      <td>5</td>
      <td>劉東</td>
      <td>男</td>
      <td>20</td>
      <td>1001</td>
      <td>88</td>
      <td>74</td>
      <td>77</td>
      <td>65</td>
      <td>85</td>
      <td>71</td>
    </tr>
  </tbody>
</table>
</div>
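
As a rough sketch of what `read_csv` does with a delimiter other than a comma, the example below reads semicolon-separated text from an in-memory buffer. The data and the name `df_demo` are hypothetical; only `pd.read_csv`, its `sep` parameter, and `io.StringIO` are real APIs.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data
text = "id;name;math\n1;wang;64\n2;li;78\n3;sun;48\n"

# read_csv accepts a path or a file-like object; sep sets the delimiter
df_demo = pd.read_csv(io.StringIO(text), sep=";")

print(df_demo.head(2))   # first two rows
print(df_demo.dtypes)    # column types inferred from the data
```

Reading from a `StringIO` buffer is handy for experiments because no file on disk is needed.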




```python
type(df)
```




    pandas.core.frame.DataFrame



### DataFrame


```python
# Column names
print(df.columns)
# Row index
print(df.index)
```

    Index(['學號', '姓名', '性別', '年齡', '班級', '計算機', '英語', '數學', '語文', '物理', '化學'], dtype='object')
    RangeIndex(start=0, stop=8, step=1)
    


```python
df.loc[0]
```




    學號        1
    姓名      張小文
    性別        男
    年齡       20
    班級     1002
    計算機      56
    英語       62
    數學       86
    語文       85
    物理       86
    化學       75
    Name: 0, dtype: object




```python
# Select rows where the math score (數學) is greater than 80
df[df.數學 > 80]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>學號</th>
      <th>姓名</th>
      <th>性別</th>
      <th>年齡</th>
      <th>班級</th>
      <th>計算機</th>
      <th>英語</th>
      <th>數學</th>
      <th>語文</th>
      <th>物理</th>
      <th>化學</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>張小文</td>
      <td>男</td>
      <td>20</td>
      <td>1002</td>
      <td>56</td>
      <td>62</td>
      <td>86</td>
      <td>85</td>
      <td>86</td>
      <td>75</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>李清</td>
      <td>女</td>
      <td>19</td>
      <td>1001</td>
      <td>94</td>
      <td>65</td>
      <td>85</td>
      <td>90</td>
      <td>84</td>
      <td>75</td>
    </tr>
  </tbody>
</table>
</div>




```python
df[df.數學 < 70]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>學號</th>
      <th>姓名</th>
      <th>性別</th>
      <th>年齡</th>
      <th>班級</th>
      <th>計算機</th>
      <th>英語</th>
      <th>數學</th>
      <th>語文</th>
      <th>物理</th>
      <th>化學</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>7</th>
      <td>8</td>
      <td>黃佳</td>
      <td>女</td>
      <td>20</td>
      <td>1002</td>
      <td>81</td>
      <td>78</td>
      <td>58</td>
      <td>84</td>
      <td>90</td>
      <td>82</td>
    </tr>
  </tbody>
</table>
</div>




```python
# Compound filter: each condition needs its own parentheses, combined with &
df[(df.語文 >= 80) & (df.數學 >= 80) & (df.英語 >= 80)]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>學號</th>
      <th>姓名</th>
      <th>性別</th>
      <th>年齡</th>
      <th>班級</th>
      <th>計算機</th>
      <th>英語</th>
      <th>數學</th>
      <th>語文</th>
      <th>物理</th>
      <th>化學</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2</th>
      <td>3</td>
      <td>孫明</td>
      <td>男</td>
      <td>19</td>
      <td>1003</td>
      <td>74</td>
      <td>85</td>
      <td>80</td>
      <td>84</td>
      <td>86</td>
      <td>91</td>
    </tr>
  </tbody>
</table>
</div>
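
The filters above work because each comparison produces a boolean Series that is then used as a row mask. The following sketch shows `&` (and) next to `|` (or) on a small made-up table; `df_demo` and its column names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical scores, not the grade table from the CSV file
df_demo = pd.DataFrame({
    "name": ["wang", "li", "sun"],
    "math": [64, 78, 88],
    "english": [90, 70, 89],
})

# Each comparison yields a boolean Series; the parentheses are required
# because & and | bind more tightly than >= in Python
both = df_demo[(df_demo["math"] >= 70) & (df_demo["english"] >= 80)]
either = df_demo[(df_demo["math"] >= 70) | (df_demo["english"] >= 80)]

print(both["name"].tolist())
print(either["name"].tolist())
```

Python's `and` and `or` keywords do not work here, since they try to reduce a whole Series to a single truth value.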



### Sorting


```python
df.sort_values(['數學','語文']).head()
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>學號</th>
      <th>姓名</th>
      <th>性別</th>
      <th>年齡</th>
      <th>班級</th>
      <th>計算機</th>
      <th>英語</th>
      <th>數學</th>
      <th>語文</th>
      <th>物理</th>
      <th>化學</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>7</th>
      <td>8</td>
      <td>黃佳</td>
      <td>女</td>
      <td>20</td>
      <td>1002</td>
      <td>81</td>
      <td>78</td>
      <td>58</td>
      <td>84</td>
      <td>90</td>
      <td>82</td>
    </tr>
    <tr>
      <th>6</th>
      <td>7</td>
      <td>王大力</td>
      <td>男</td>
      <td>18</td>
      <td>1003</td>
      <td>85</td>
      <td>85</td>
      <td>75</td>
      <td>78</td>
      <td>84</td>
      <td>69</td>
    </tr>
    <tr>
      <th>4</th>
      <td>5</td>
      <td>劉東</td>
      <td>男</td>
      <td>20</td>
      <td>1001</td>
      <td>88</td>
      <td>74</td>
      <td>77</td>
      <td>65</td>
      <td>85</td>
      <td>71</td>
    </tr>
    <tr>
      <th>5</th>
      <td>6</td>
      <td>嚴雲峰</td>
      <td>男</td>
      <td>19</td>
      <td>1001</td>
      <td>84</td>
      <td>87</td>
      <td>77</td>
      <td>80</td>
      <td>70</td>
      <td>81</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4</td>
      <td>陳平</td>
      <td>男</td>
      <td>8</td>
      <td>1003</td>
      <td>85</td>
      <td>75</td>
      <td>78</td>
      <td>73</td>
      <td>86</td>
      <td>81</td>
    </tr>
  </tbody>
</table>
</div>
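
When sorting by several columns, the later columns only break ties in the earlier ones. As a small sketch (with made-up data and column names), `ascending` can also take one flag per column:

```python
import pandas as pd

# Hypothetical data with a tie in the math column
df_demo = pd.DataFrame({
    "name": ["wang", "li", "sun", "chen"],
    "math": [77, 58, 77, 90],
    "chinese": [65, 84, 80, 70],
})

# Sort by math descending; break ties by chinese ascending
ranked = df_demo.sort_values(["math", "chinese"], ascending=[False, True])

print(ranked["name"].tolist())
```

`sort_values` returns a new DataFrame and leaves `df_demo` in its original order.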



### Accessing rows


```python
# Locate a row by its index label
df.loc[1]
```




    學號        2
    姓名       李清
    性別        女
    年齡       19
    班級     1001
    計算機      94
    英語       65
    數學       85
    語文       90
    物理       84
    化學       75
    Name: 1, dtype: object



### Index


```python
scores = {
    '英語': [90,70,89],
    '數學': [64,78,48],
    '姓名': ['wang','li','sun']
}
df = pd.DataFrame(scores, index = ['one','two','three'])
df
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>英語</th>
      <th>數學</th>
      <th>姓名</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>one</th>
      <td>90</td>
      <td>64</td>
      <td>wang</td>
    </tr>
    <tr>
      <th>two</th>
      <td>70</td>
      <td>78</td>
      <td>li</td>
    </tr>
    <tr>
      <th>three</th>
      <td>89</td>
      <td>48</td>
      <td>sun</td>
    </tr>
  </tbody>
</table>
</div>




```python
df.index
```




    Index(['one', 'two', 'three'], dtype='object')




```python
df.loc['one']
```




    英語      90
    數學      64
    姓名    wang
    Name: one, dtype: object




```python
# iloc selects by actual row position (0 is the first row); use it when the index is not numeric
df.iloc[0]
```




    英語      90
    數學      64
    姓名    wang
    Name: one, dtype: object




```python
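# .ix combined label-based (loc) and positional (iloc) lookup.
# It is deprecated (and removed in pandas 1.0), so prefer .loc or .iloc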
df.ix[0]
```

    c:\python\python36\lib\site-packages\ipykernel_launcher.py:1: DeprecationWarning: 
    .ix is deprecated. Please use
    .loc for label based indexing or
    .iloc for positional indexing
    
    See the documentation here:
    http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
      """Entry point for launching an IPython kernel.
    




    英語      90
    數學      64
    姓名    wang
    Name: one, dtype: object

Note that the remaining examples in this section, starting with `df.loc[:2]`, operate on the grade table read from the CSV file, not on the three-row `scores` DataFrame defined above. If you run the cells in order, reload the CSV into `df` first.




```python
df.loc[:2]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>學號</th>
      <th>姓名</th>
      <th>性別</th>
      <th>年齡</th>
      <th>班級</th>
      <th>計算機</th>
      <th>英語</th>
      <th>數學</th>
      <th>語文</th>
      <th>物理</th>
      <th>化學</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>張小文</td>
      <td>男</td>
      <td>20</td>
      <td>1002</td>
      <td>56</td>
      <td>62</td>
      <td>86</td>
      <td>85</td>
      <td>86</td>
      <td>75</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>李清</td>
      <td>女</td>
      <td>19</td>
      <td>1001</td>
      <td>94</td>
      <td>65</td>
      <td>85</td>
      <td>90</td>
      <td>84</td>
      <td>75</td>
    </tr>
    <tr>
      <th>2</th>
      <td>3</td>
      <td>孫明</td>
      <td>男</td>
      <td>19</td>
      <td>1003</td>
      <td>74</td>
      <td>85</td>
      <td>80</td>
      <td>84</td>
      <td>86</td>
      <td>91</td>
    </tr>
  </tbody>
</table>
</div>




```python
df.iloc[:3]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>學號</th>
      <th>姓名</th>
      <th>性別</th>
      <th>年齡</th>
      <th>班級</th>
      <th>計算機</th>
      <th>英語</th>
      <th>數學</th>
      <th>語文</th>
      <th>物理</th>
      <th>化學</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>張小文</td>
      <td>男</td>
      <td>20</td>
      <td>1002</td>
      <td>56</td>
      <td>62</td>
      <td>86</td>
      <td>85</td>
      <td>86</td>
      <td>75</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>李清</td>
      <td>女</td>
      <td>19</td>
      <td>1001</td>
      <td>94</td>
      <td>65</td>
      <td>85</td>
      <td>90</td>
      <td>84</td>
      <td>75</td>
    </tr>
    <tr>
      <th>2</th>
      <td>3</td>
      <td>孫明</td>
      <td>男</td>
      <td>19</td>
      <td>1003</td>
      <td>74</td>
      <td>85</td>
      <td>80</td>
      <td>84</td>
      <td>86</td>
      <td>91</td>
    </tr>
  </tbody>
</table>
</div>
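
`df.loc[:2]` and `df.iloc[:3]` return the same three rows because a label slice includes its end label while a position slice excludes its end position. Here is a minimal sketch of that difference on a made-up DataFrame (`df_demo` and `df_int` are hypothetical names):

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"math": [64, 78, 48, 90]},
    index=["one", "two", "three", "four"],
)

by_label = df_demo.loc[:"two"]   # label slice: includes the end label "two"
by_position = df_demo.iloc[:2]   # position slice: excludes position 2

print(by_label.index.tolist())
print(by_position.index.tolist())

# With the default integer index the two slices differ in length
df_int = df_demo.reset_index(drop=True)
print(len(df_int.loc[:2]), len(df_int.iloc[:2]))
```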




```python
# df[0] does not select the first row: it looks for a column named 0 and raises KeyError
# df[0]

# Slicing with [] does select rows
df[:2]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>學號</th>
      <th>姓名</th>
      <th>性別</th>
      <th>年齡</th>
      <th>班級</th>
      <th>計算機</th>
      <th>英語</th>
      <th>數學</th>
      <th>語文</th>
      <th>物理</th>
      <th>化學</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>張小文</td>
      <td>男</td>
      <td>20</td>
      <td>1002</td>
      <td>56</td>
      <td>62</td>
      <td>86</td>
      <td>85</td>
      <td>86</td>
      <td>75</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>李清</td>
      <td>女</td>
      <td>19</td>
      <td>1001</td>
      <td>94</td>
      <td>65</td>
      <td>85</td>
      <td>90</td>
      <td>84</td>
      <td>75</td>
    </tr>
  </tbody>
</table>
</div>




```python
# The DataFrame's data as a NumPy array
df.values
```




    array([[1, '張小文', '男', 20, 1002, 56, 62, 86, 85, 86, 75],
           [2, '李清', '女', 19, 1001, 94, 65, 85, 90, 84, 75],
           [3, '孫明', '男', 19, 1003, 74, 85, 80, 84, 86, 91],
           [4, '陳平', '男', 8, 1003, 85, 75, 78, 73, 86, 81],
           [5, '劉東', '男', 20, 1001, 88, 74, 77, 65, 85, 71],
           [6, '嚴雲峰', '男', 19, 1001, 84, 87, 77, 80, 70, 81],
           [7, '王大力', '男', 18, 1003, 85, 85, 75, 78, 84, 69],
           [8, '黃佳', '女', 20, 1002, 81, 78, 58, 84, 90, 82]], dtype=object)




```python
df.數學.values
```




    array([86, 85, 80, 78, 77, 77, 75, 58], dtype=int64)




```python
# Simple statistics: count how often each value occurs
df.數學.value_counts()
```




    77    2
    78    1
    75    1
    58    1
    86    1
    85    1
    80    1
    Name: 數學, dtype: int64




```python
new = df[['數學','語文']].head()
new
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>數學</th>
      <th>語文</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>86</td>
      <td>85</td>
    </tr>
    <tr>
      <th>1</th>
      <td>85</td>
      <td>90</td>
    </tr>
    <tr>
      <th>2</th>
      <td>80</td>
      <td>84</td>
    </tr>
    <tr>
      <th>3</th>
      <td>78</td>
      <td>73</td>
    </tr>
    <tr>
      <th>4</th>
      <td>77</td>
      <td>65</td>
    </tr>
  </tbody>
</table>
</div>




```python
new * 2
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>數學</th>
      <th>語文</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>172</td>
      <td>170</td>
    </tr>
    <tr>
      <th>1</th>
      <td>170</td>
      <td>180</td>
    </tr>
    <tr>
      <th>2</th>
      <td>160</td>
      <td>168</td>
    </tr>
    <tr>
      <th>3</th>
      <td>156</td>
      <td>146</td>
    </tr>
    <tr>
      <th>4</th>
      <td>154</td>
      <td>130</td>
    </tr>
  </tbody>
</table>
</div>



### Key points


```python
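# Map a numeric score to a grade label
def func(score):
    if score >= 80:
        return '優秀'    # excellent
    elif score >= 70:
        return '良'      # good
    elif score >= 60:
        return '及格'    # pass
    else:
        return '不及格'  # fail

# Series.map applies func to every value in the column
df['數學分類'] = df.數學.map(func)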
```


```python
df.head()
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>學號</th>
      <th>姓名</th>
      <th>性別</th>
      <th>年齡</th>
      <th>班級</th>
      <th>計算機</th>
      <th>英語</th>
      <th>數學</th>
      <th>語文</th>
      <th>物理</th>
      <th>化學</th>
      <th>數學分類</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>張小文</td>
      <td>男</td>
      <td>20</td>
      <td>1002</td>
      <td>56</td>
      <td>62</td>
      <td>86</td>
      <td>85</td>
      <td>86</td>
      <td>75</td>
      <td>優秀</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>李清</td>
      <td>女</td>
      <td>19</td>
      <td>1001</td>
      <td>94</td>
      <td>65</td>
      <td>85</td>
      <td>90</td>
      <td>84</td>
      <td>75</td>
      <td>優秀</td>
    </tr>
    <tr>
      <th>2</th>
      <td>3</td>
      <td>孫明</td>
      <td>男</td>
      <td>19</td>
      <td>1003</td>
      <td>74</td>
      <td>85</td>
      <td>80</td>
      <td>84</td>
      <td>86</td>
      <td>91</td>
      <td>優秀</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4</td>
      <td>陳平</td>
      <td>男</td>
      <td>8</td>
      <td>1003</td>
      <td>85</td>
      <td>75</td>
      <td>78</td>
      <td>73</td>
      <td>86</td>
      <td>81</td>
      <td>良</td>
    </tr>
    <tr>
      <th>4</th>
      <td>5</td>
      <td>劉東</td>
      <td>男</td>
      <td>20</td>
      <td>1001</td>
      <td>88</td>
      <td>74</td>
      <td>77</td>
      <td>65</td>
      <td>85</td>
      <td>71</td>
      <td>良</td>
    </tr>
  </tbody>
</table>
</div>
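
The same kind of binning can also be expressed without a hand-written function. The sketch below uses `pd.cut` with left-closed bins that match the thresholds in `func` (60, 70, 80). The English labels, the sample scores, and the upper bound of 101 are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical scores
scores = pd.Series([86, 78, 65, 58, 80])

# Left-closed bins: [0, 60), [60, 70), [70, 80), [80, 101)
labels = ["fail", "pass", "good", "excellent"]
grades = pd.cut(scores, bins=[0, 60, 70, 80, 101], right=False, labels=labels)

print(grades.tolist())
```

`right=False` matters here: with the default right-closed bins, a score of exactly 80 would fall into the "good" bin instead of "excellent".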




```python
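# applymap applies a function to every element of the DataFrame; a very useful tool
# (pandas 2.1 and later deprecate applymap in favor of DataFrame.map)
def func(number):
    return number + 10
# Equivalent definition as a lambda
func = lambda number: number + 10

df.applymap(lambda x: str(x) + ' -').head(2)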
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>學號</th>
      <th>姓名</th>
      <th>性別</th>
      <th>年齡</th>
      <th>班級</th>
      <th>計算機</th>
      <th>英語</th>
      <th>數學</th>
      <th>語文</th>
      <th>物理</th>
      <th>化學</th>
      <th>數學分類</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1 -</td>
      <td>張小文 -</td>
      <td>男 -</td>
      <td>20 -</td>
      <td>1002 -</td>
      <td>56 -</td>
      <td>62 -</td>
      <td>86 -</td>
      <td>85 -</td>
      <td>86 -</td>
      <td>75 -</td>
      <td>優秀 -</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2 -</td>
      <td>李清 -</td>
      <td>女 -</td>
      <td>19 -</td>
      <td>1001 -</td>
      <td>94 -</td>
      <td>65 -</td>
      <td>85 -</td>
      <td>90 -</td>
      <td>84 -</td>
      <td>75 -</td>
      <td>優秀 -</td>
    </tr>
  </tbody>
</table>
</div>



### Anonymous functions


```python
[i+ 100 for i in range(10)]
```




    [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]




```python
def func(x):
    return x + 100
```


```python
list(map(func,range(10)))
# When a function is trivial, rarely used, or not worth naming, use an anonymous function (lambda)
list(map(lambda x: x + 100,range(10)))
```




    [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
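
Lambdas are most useful as short arguments to other functions. The following sketch, using made-up `(name, score)` pairs, passes a lambda as a sort key and as a filter predicate:

```python
# Hypothetical (name, score) pairs
pairs = [("wang", 64), ("li", 78), ("sun", 48)]

# A lambda used once as a sort key: order by score, highest first
by_score = sorted(pairs, key=lambda p: p[1], reverse=True)

# filter() also takes a function; keep only scores of 60 or more
passed = list(filter(lambda p: p[1] >= 60, pairs))

print(by_score)
print(passed)
```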




```python
# To build a new column from several columns, use apply with axis=1 (one call per row)
df['new_score'] = df.apply(lambda x: x.數學 + x.語文, axis = 1)
```


```python
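# First few rows
df.head(2)
# Last few rows (a notebook cell displays only its last expression, so only this result is shown)
df.tail(2)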
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>學號</th>
      <th>姓名</th>
      <th>性別</th>
      <th>年齡</th>
      <th>班級</th>
      <th>計算機</th>
      <th>英語</th>
      <th>數學</th>
      <th>語文</th>
      <th>物理</th>
      <th>化學</th>
      <th>數學分類</th>
      <th>new_score</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>6</th>
      <td>7</td>
      <td>王大力</td>
      <td>男</td>
      <td>18</td>
      <td>1003</td>
      <td>85</td>
      <td>85</td>
      <td>75</td>
      <td>78</td>
      <td>84</td>
      <td>69</td>
      <td>良</td>
      <td>153</td>
    </tr>
    <tr>
      <th>7</th>
      <td>8</td>
      <td>黃佳</td>
      <td>女</td>
      <td>20</td>
      <td>1002</td>
      <td>81</td>
      <td>78</td>
      <td>58</td>
      <td>84</td>
      <td>90</td>
      <td>82</td>
      <td>不及格</td>
      <td>142</td>
    </tr>
  </tbody>
</table>
</div>
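
For a plain sum of two columns, row-wise `apply` is not strictly needed: adding the columns directly gives the same values. A small sketch on made-up data (`df_demo` and its column names are hypothetical):

```python
import pandas as pd

df_demo = pd.DataFrame({"math": [64, 78, 48], "chinese": [85, 90, 84]})

# Row-wise apply: the lambda receives one row (a Series) at a time
via_apply = df_demo.apply(lambda row: row["math"] + row["chinese"], axis=1)

# Column arithmetic: adds the two columns element by element
via_columns = df_demo["math"] + df_demo["chinese"]

print(via_apply.tolist())
print(via_columns.tolist())
```

`apply` remains the more general tool when the per-row logic involves branching or cannot be written as arithmetic on whole columns.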



### Many DataFrame operations in pandas closely mirror operations on two-dimensional NumPy arrays

<h1 style="text-align:center">Plotting with matplotlib</h1>


```python
df = df.drop(['new_score'],axis = 1)
```


```python
df.head(2)
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>學號</th>
      <th>姓名</th>
      <th>性別</th>
      <th>年齡</th>
      <th>班級</th>
      <th>計算機</th>
      <th>英語</th>
      <th>數學</th>
      <th>語文</th>
      <th>物理</th>
      <th>化學</th>
      <th>數學分類</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>張小文</td>
      <td>男</td>
      <td>20</td>
      <td>1002</td>
      <td>56</td>
      <td>62</td>
      <td>86</td>
      <td>85</td>
      <td>86</td>
      <td>75</td>
      <td>優秀</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>李清</td>
      <td>女</td>
      <td>19</td>
      <td>1001</td>
      <td>94</td>
      <td>65</td>
      <td>85</td>
      <td>90</td>
      <td>84</td>
      <td>75</td>
      <td>優秀</td>
    </tr>
  </tbody>
</table>
</div>



### Plotting


```python
import numpy as np
import matplotlib.pyplot as plt
# This line is needed in a Jupyter notebook so that figures are displayed inline
%matplotlib inline
```


```python
x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.plot(x, y)
plt.plot(x, np.cos(x))
```




    [<matplotlib.lines.Line2D at 0x1b3061cc7f0>]




![png](output_48_1.png)



```python
plt.plot(x, y, '--')
```




    [<matplotlib.lines.Line2D at 0x1b3082c71d0>]




![png](output_49_1.png)



```python
fig = plt.figure()
plt.plot(x, y, '--')
```




    [<matplotlib.lines.Line2D at 0x1b30832ca58>]




![png](output_50_1.png)



```python
fig.savefig('K:/Code/jupyter-notebook/Python Study/first_figure.png')
```


```python
# Dashed line style in the top subplot
plt.subplot(2,1,1)
plt.plot(x, np.sin(x),'--')

# Default solid line in the bottom subplot
plt.subplot(2,1,2)
plt.plot(x, np.cos(x))
```




    [<matplotlib.lines.Line2D at 0x1b308395198>]




![png](output_52_1.png)
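
An alternative to calling `plt.subplot` repeatedly is matplotlib's object-oriented interface, where `plt.subplots` returns the figure and all axes at once. The sketch below is one possible way to draw the same two panels. The `Agg` backend line is an assumption so the code also runs outside a notebook; in Jupyter you would keep `%matplotlib inline` instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running as a script
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Two rows, one column; sharex keeps the x axes aligned
fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(x, np.sin(x), "--")
axes[0].set_title("sin(x)")
axes[1].plot(x, np.cos(x))
axes[1].set_title("cos(x)")
fig.tight_layout()
```

Holding explicit `axes` objects makes it clear which panel each call modifies, which tends to help once a figure has more than a couple of subplots.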



```python
# Dot (marker-only) style
x = np.linspace(0,10,20)
plt.plot(x, np.sin(x),'o')
```




    [<matplotlib.lines.Line2D at 0x1b3084f4940>]




![png](output_53_1.png)



```python
# color sets the color
x = np.linspace(0,10,20)
plt.plot(x, np.sin(x),'o',color= 'red')
```




    [<matplotlib.lines.Line2D at 0x1b30855bef0>]




![png](output_54_1.png)



```python
# Add a label to each line
x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.plot(x, y,'--',label='sin(x)')
plt.plot(x, np.cos(x),'o',label='cos(x)')
# legend() displays the labels; loc sets the legend position (1 means upper right)
plt.legend(loc=1)
```




    <matplotlib.legend.Legend at 0x1b309907198>




![png](output_55_1.png)



```python
plt.legend?
# When you meet an unfamiliar function, append ? in IPython/Jupyter to view its documentation
```


```python
# plot() accepts many optional parameters
x = np.linspace(0, 10, 20)
y = np.sin(x)
plt.plot(x,y,'-p',color = 'green',
        markersize = 10,linewidth = 4,
        markeredgecolor = 'orange',
        markeredgewidth=2)
plt.ylim(-0.5,0.8)
```




    (-0.5, 0.8)




![png](output_57_1.png)



```python
# See the documentation for the full list of parameters
plt.plot?
```


```python
# ylim and xlim restrict the displayed y and x ranges
plt.plot(x,y,'-p',color = 'green',
        markersize = 10,linewidth = 4,
        markeredgecolor = 'orange',
        markeredgewidth=2)
plt.ylim(-0.5,1.2)
plt.xlim(2,8)
```




    (2, 8)




![png](output_59_1.png)



```python
# Scatter plot
plt.scatter(x,y,s=100,c='red')
```




    <matplotlib.collections.PathCollection at 0x1b309da0c88>




![png](output_60_1.png)



```python
plt.style.use('classic')

x = np.random.randn(100)
y = np.random.randn(100)
colors = np.random.randn(100)
# Marker sizes must be non-negative, so draw them from [0, 1) with rand instead of randn
sizes = 1000 * np.random.rand(100)
plt.scatter(x,y,c=colors,s=sizes,alpha=0.4)
plt.colorbar()
```




    <matplotlib.colorbar.Colorbar at 0x1b309fe4f98>




![png](output_61_2.png)


### Built-in plotting in pandas

### Line plots


```python
import pandas as pd
df = pd.DataFrame(np.random.randn(100,4).cumsum(0),columns=['A','B','C','D'])
df.plot()
```




    <matplotlib.axes._subplots.AxesSubplot at 0x1b30c0c88d0>




![png](output_64_1.png)


### Bar charts


```python
df = pd.DataFrame(np.random.randint(10,50,(3,4)),columns=['A','B','C','D'],index = ['one','two','three'])
df.plot.bar()
```




    <matplotlib.axes._subplots.AxesSubplot at 0x1b30c284898>




![png](output_66_1.png)



```python
df.B.plot.bar()
```




    <matplotlib.axes._subplots.AxesSubplot at 0x1b30c16c9b0>




![png](output_67_1.png)



```python
# Equivalent to df.plot.bar()
df.plot(kind = 'bar')
```




    <matplotlib.axes._subplots.AxesSubplot at 0x1b30c190898>




![png](output_68_1.png)



```python
# Stack the columns on top of each other in each bar
df.plot(kind = 'bar',stacked = True)
```




    <matplotlib.axes._subplots.AxesSubplot at 0x1b30c223978>




![png](output_69_1.png)


### Histogram


```python
df = pd.DataFrame(np.random.randn(100,4),columns=['A','B','C','D'])
df.hist(column='A',grid=True,figsize=(10,5))
```




    array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001B30DE24DD8>]],
          dtype=object)




![png](output_71_1.png)
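
A histogram is a count of values per bin. As a rough sketch of the numbers behind such a plot, `np.histogram` returns the counts and bin edges directly. The seed and the variable names are arbitrary choices for this example, and 10 bins matches the default of `DataFrame.hist`.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
values = rng.standard_normal(100)

# counts[i] is the number of values in [edges[i], edges[i+1]);
# the last bin also includes its right edge
counts, edges = np.histogram(values, bins=10)

print(counts)
print(edges.round(2))
```

There is always one more edge than there are bins, and the counts add up to the number of values.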


### Density plot


```python
# Equivalent to df.plot(kind='kde')
# Note: this requires SciPy; install it with `pip install scipy`, otherwise you get ModuleNotFoundError: No module named 'scipy'
df.plot.kde()
```




    <matplotlib.axes._subplots.AxesSubplot at 0x1b30e082d30>




![png](output_73_1.png)


### Drawing 3D plots with matplotlib


```python
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older matplotlib versions
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
# fig.gca(projection='3d') no longer works in recent matplotlib; add_subplot does
ax = fig.add_subplot(111, projection='3d')
# x-axis range; values must not repeat
X = np.arange(-5, 5, 0.25)
# y-axis range; values must not repeat
Y = np.arange(-5, 5, 0.25)
# Build the coordinate grid
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)

# Plot the surface; Z gives the height at each grid point
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet,
        linewidth=0, antialiased=False)

# Customize the z axis
ax.set_zlim(-1.01, 1.01)
ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))

# Add a color bar which maps values to colors
fig.colorbar(surf, shrink=0.5, aspect=5)

plt.show()
```


![png](output_75_0.png)