Cris's Python Data Analysis Notes 01: NumPy Basics

阿新 • Published: 2018-11-19

01. NumPy Basics

Contents

1. numpy's first function: genfromtxt
2. numpy's second function: array
3. numpy's third function: shape
4. The dtype attribute of numpy's ndarray
5. Accessing elements of a numpy ndarray
6. Slicing a numpy ndarray
7. Slicing a two-dimensional numpy array

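1. numpy's first function: genfromtxt

import numpy as np

# Read a comma-separated text file; dtype=str keeps every field as a string
world_alcohol = np.genfromtxt('world_alcohol.txt', delimiter=',', dtype=str)
# <class 'numpy.ndarray'>
print(type(world_alcohol))
print(world_alcohol)
# help() prints the documentation itself and returns None,
# which is why the output below ends with "None"
print(help(np.genfromtxt))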

<class 'numpy.ndarray'>
[['Year' 'WHO region' 'Country' 'Beverage Types' 'Display Value']
 ['1986' 'Western Pacific' 'Viet Nam' 'Wine' '0']
 ['1986' 'Americas' 'Uruguay' 'Other' '0.5']
 ...
 ['1987' 'Africa' 'Malawi' 'Other' '0.75']
 ['1989' 'Americas' 'Bahamas' 'Wine' '1.5']
 ['1985' 'Africa' 'Malawi' 'Spirits' '0.31']]
Help on function genfromtxt in module numpy.lib.npyio:

genfromtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=None, replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True, max_rows=None, encoding='bytes')
    Load data from a text file, with missing values handled as specified.
    
    Each line past the first `skip_header` lines is split at the `delimiter`
    character, and characters following the `comments` character are discarded.
    
    Parameters
    ----------
    fname : file, str, pathlib.Path, list of str, generator
        File, filename, list, or generator to read.  If the filename
        extension is `.gz` or `.bz2`, the file is first decompressed. Note
        that generators must return byte strings in Python 3k.  The strings
        in a list or produced by a generator are treated as lines.
    dtype : dtype, optional
        Data type of the resulting array.
        If None, the dtypes will be determined by the contents of each
        column, individually.
    comments : str, optional
        The character used to indicate the start of a comment.
        All the characters occurring on a line after a comment are discarded
    delimiter : str, int, or sequence, optional
        The string used to separate values.  By default, any consecutive
        whitespaces act as delimiter.  An integer or sequence of integers
        can also be provided as width(s) of each field.
    skiprows : int, optional
        `skiprows` was removed in numpy 1.10. Please use `skip_header` instead.
    skip_header : int, optional
        The number of lines to skip at the beginning of the file.
    skip_footer : int, optional
        The number of lines to skip at the end of the file.
    converters : variable, optional
        The set of functions that convert the data of a column to a value.
        The converters can also be used to provide a default value
        for missing data: ``converters = {3: lambda s: float(s or 0)}``.
    missing : variable, optional
        `missing` was removed in numpy 1.10. Please use `missing_values`
        instead.
    missing_values : variable, optional
        The set of strings corresponding to missing data.
    filling_values : variable, optional
        The set of values to be used as default when the data are missing.
    usecols : sequence, optional
        Which columns to read, with 0 being the first.  For example,
        ``usecols = (1, 4, 5)`` will extract the 2nd, 5th and 6th columns.
    names : {None, True, str, sequence}, optional
        If `names` is True, the field names are read from the first line after
        the first `skip_header` lines.  This line can optionally be proceeded
        by a comment delimeter. If `names` is a sequence or a single-string of
        comma-separated names, the names will be used to define the field names
        in a structured dtype. If `names` is None, the names of the dtype
        fields will be used, if any.
    excludelist : sequence, optional
        A list of names to exclude. This list is appended to the default list
        ['return','file','print']. Excluded names are appended an underscore:
        for example, `file` would become `file_`.
    deletechars : str, optional
        A string combining invalid characters that must be deleted from the
        names.
    defaultfmt : str, optional
        A format used to define default field names, such as "f%i" or "f_%02i".
    autostrip : bool, optional
        Whether to automatically strip white spaces from the variables.
    replace_space : char, optional
        Character(s) used in replacement of white spaces in the variables
        names. By default, use a '_'.
    case_sensitive : {True, False, 'upper', 'lower'}, optional
        If True, field names are case sensitive.
        If False or 'upper', field names are converted to upper case.
        If 'lower', field names are converted to lower case.
    unpack : bool, optional
        If True, the returned array is transposed, so that arguments may be
        unpacked using ``x, y, z = loadtxt(...)``
    usemask : bool, optional
        If True, return a masked array.
        If False, return a regular array.
    loose : bool, optional
        If True, do not raise errors for invalid values.
    invalid_raise : bool, optional
        If True, an exception is raised if an inconsistency is detected in the
        number of columns.
        If False, a warning is emitted and the offending lines are skipped.
    max_rows : int,  optional
        The maximum number of rows to read. Must not be used with skip_footer
        at the same time.  If given, the value must be at least 1. Default is
        to read the entire file.
    
        .. versionadded:: 1.10.0
    encoding : str, optional
        Encoding used to decode the inputfile. Does not apply when `fname` is
        a file object.  The special value 'bytes' enables backward compatibility
        workarounds that ensure that you receive byte arrays when possible
        and passes latin1 encoded strings to converters. Override this value to
        receive unicode arrays and pass strings as input to converters.  If set
        to None the system default is used. The default value is 'bytes'.
    
        .. versionadded:: 1.14.0
    
    Returns
    -------
    out : ndarray
        Data read from the text file. If `usemask` is True, this is a
        masked array.
    
    See Also
    --------
    numpy.loadtxt : equivalent function when no data is missing.
    
    Notes
    -----
    * When spaces are used as delimiters, or when no delimiter has been given
      as input, there should not be any missing data between two fields.
    * When the variables are named (either by a flexible dtype or with `names`,
      there must not be any header in the file (else a ValueError
      exception is raised).
    * Individual values are not stripped of spaces by default.
      When using a custom converter, make sure the function does remove spaces.
    
    References
    ----------
    .. [1] NumPy User Guide, section `I/O with NumPy
           <http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html>`_.
    
    Examples
    ---------
    >>> from io import StringIO
    >>> import numpy as np
    
    Comma delimited file with mixed dtype
    
    >>> s = StringIO("1,1.3,abcde")
    >>> data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
    ... ('mystring','S5')], delimiter=",")
    >>> data
    array((1, 1.3, 'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])
    
    Using dtype = None
    
    >>> s.seek(0) # needed for StringIO example only
    >>> data = np.genfromtxt(s, dtype=None,
    ... names = ['myint','myfloat','mystring'], delimiter=",")
    >>> data
    array((1, 1.3, 'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])
    
    Specifying dtype and names
    
    >>> s.seek(0)
    >>> data = np.genfromtxt(s, dtype="i8,f8,S5",
    ... names=['myint','myfloat','mystring'], delimiter=",")
    >>> data
    array((1, 1.3, 'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])
    
    An example with fixed-width columns
    
    >>> s = StringIO("11.3abcde")
    >>> data = np.genfromtxt(s, dtype=None, names=['intvar','fltvar','strvar'],
    ...     delimiter=[1,3,5])
    >>> data
    array((1, 1.3, 'abcde'),
          dtype=[('intvar', '<i8'), ('fltvar', '<f8'), ('strvar', '|S5')])

None
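Since `world_alcohol.txt` is not bundled with these notes, here is a minimal sketch that feeds `genfromtxt` an in-memory copy of a few rows taken from the output above. Using `io.StringIO` as a stand-in for the file, and the name `sample`, are assumptions made for illustration only. It also shows `skip_header=1`, which drops the header row.

```python
from io import StringIO

import numpy as np

# A few rows copied from the output above; the real file is assumed to look like this.
sample = StringIO(
    "Year,WHO region,Country,Beverage Types,Display Value\n"
    "1986,Western Pacific,Viet Nam,Wine,0\n"
    "1986,Americas,Uruguay,Other,0.5\n"
    "1987,Africa,Malawi,Other,0.75\n"
)

# dtype=str keeps every field as text; skip_header=1 drops the header line.
rows = np.genfromtxt(sample, delimiter=",", dtype=str, skip_header=1)

print(rows.shape)  # (3, 5)
print(rows[0])     # the first data row
```

Reading everything as strings is the simplest way to load a file that mixes text and numbers; numeric columns can be converted afterwards.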

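2. numpy's second function: array

import numpy as np

vector = np.array([1, 2, 3])
# [1 2 3]
print(vector)
# <class 'numpy.ndarray'>: NumPy's n-dimensional array type; a 2-D ndarray can be thought of as a matrix
print(type(vector))

matrix = np.array([[11, 22, 33], ['cris', 'james', '小哥哥'], [11.11, True, False]])
'''
    All elements of an ndarray must share one type. If the inputs differ, array()
    converts them to a more general type automatically (for example int --> float,
    and any type --> str when a string is present):
    [['11' '22' '33']
     ['cris' 'james' '小哥哥']
     ['11.11' 'True' 'False']]
'''

print(matrix)
# <class 'numpy.ndarray'>
print(type(matrix))
# help() prints the documentation and returns None, hence the trailing "None" in the output
print(help(np.array))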

[1 2 3]
<class 'numpy.ndarray'>
[['11' '22' '33']
 ['cris' 'james' '小哥哥']
 ['11.11' 'True' 'False']]
<class 'numpy.ndarray'>
Help on built-in function array in module numpy.core.multiarray:

array(...)
    array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
    
    Create an array.
    
    Parameters
    ----------
    object : array_like
        An array, any object exposing the array interface, an object whose
        __array__ method returns an array, or any (nested) sequence.
    dtype : data-type, optional
        The desired data-type for the array.  If not given, then the type will
        be determined as the minimum type required to hold the objects in the
        sequence.  This argument can only be used to 'upcast' the array.  For
        downcasting, use the .astype(t) method.
    copy : bool, optional
        If true (default), then the object is copied.  Otherwise, a copy will
        only be made if __array__ returns a copy, if obj is a nested sequence,
        or if a copy is needed to satisfy any of the other requirements
        (`dtype`, `order`, etc.).
    order : {'K', 'A', 'C', 'F'}, optional
        Specify the memory layout of the array. If object is not an array, the
        newly created array will be in C order (row major) unless 'F' is
        specified, in which case it will be in Fortran order (column major).
        If object is an array the following holds.
    
        ===== ========= ===================================================
        order  no copy                     copy=True
        ===== ========= ===================================================
        'K'   unchanged F & C order preserved, otherwise most similar order
        'A'   unchanged F order if input is F and not C, otherwise C order
        'C'   C order   C order
        'F'   F order   F order
        ===== ========= ===================================================
    
        When ``copy=False`` and a copy is made for other reasons, the result is
        the same as if ``copy=True``, with some exceptions for `A`, see the
        Notes section. The default order is 'K'.
    subok : bool, optional
        If True, then sub-classes will be passed-through, otherwise
        the returned array will be forced to be a base-class array (default).
    ndmin : int, optional
        Specifies the minimum number of dimensions that the resulting
        array should have.  Ones will be pre-pended to the shape as
        needed to meet this requirement.
    
    Returns
    -------
    out : ndarray
        An array object satisfying the specified requirements.
    
    See Also
    --------
    empty, empty_like, zeros, zeros_like, ones, ones_like, full, full_like
    
    Notes
    -----
    When order is 'A' and `object` is an array in neither 'C' nor 'F' order,
    and a copy is forced by a change in dtype, then the order of the result is
    not necessarily 'C' as expected. This is likely a bug.
    
    Examples
    --------
    >>> np.array([1, 2, 3])
    array([1, 2, 3])
    
    Upcasting:
    
    >>> np.array([1, 2, 3.0])
    array([ 1.,  2.,  3.])
    
    More than one dimension:
    
    >>> np.array([[1, 2], [3, 4]])
    array([[1, 2],
           [3, 4]])
    
    Minimum dimensions 2:
    
    >>> np.array([1, 2, 3], ndmin=2)
    array([[1, 2, 3]])
    
    Type provided:
    
    >>> np.array([1, 2, 3], dtype=complex)
    array([ 1.+0.j,  2.+0.j,  3.+0.j])
    
    Data-type consisting of more than one element:
    
    >>> x = np.array([(1,2),(3,4)],dtype=[('a','<i4'),('b','<i4')])
    >>> x['a']
    array([1, 3])
    
    Creating an array from sub-classes:
    
    >>> np.array(np.mat('1 2; 3 4'))
    array([[1, 2],
           [3, 4]])
    
    >>> np.array(np.mat('1 2; 3 4'), subok=True)
    matrix([[1, 2],
            [3, 4]])

None
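The automatic type conversion described above can be checked directly through the `dtype` attribute. The following is a small sketch; the variable names are made up for illustration.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5, 3])
with_text = np.array([1, 2.5, "cris"])

print(ints.dtype.kind)       # 'i' (an integer dtype; the exact width depends on the platform)
print(mixed_numbers.dtype)   # float64
print(with_text.dtype.kind)  # 'U' (Unicode string)

# astype converts explicitly; float -> int truncates the fractional part
truncated = mixed_numbers.astype(int)
print(truncated)             # [1 2 3]
```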

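3. numpy's third function: shape

import numpy as np

'''
    The shape function returns the dimensions of its argument (not its data type).
    Below, (3,) means a one-dimensional sequence of 3 elements, and (2, 3) means
    a matrix with two rows and three columns.
'''
vector = [1, 2, 3]
result = np.shape(vector)
print(result)
# (3,)
matrix = np.shape([[1, 2, 3], ['cris', False, True]])
print(matrix)
# (2, 3)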

(3,)
(2, 3)
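An ndarray also exposes the same information as an attribute, alongside a few related ones. A brief sketch, with an example array chosen only for illustration:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(matrix.shape)  # (2, 3): 2 rows, 3 columns
print(matrix.ndim)   # 2: number of dimensions
print(matrix.size)   # 6: total number of elements

# reshape rearranges the same 6 elements into a different shape
reshaped = matrix.reshape(3, 2)
print(reshaped.shape)  # (3, 2)
```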

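4. The dtype attribute of numpy's ndarray

import numpy as np

'''
    np.array returns an ndarray (as type() shows). Its dtype attribute reports the
    data type shared by every element of that ndarray
    (note the automatic type conversion of the elements).
'''

vector = np.array([1, 2, 3, 'jj'])
# ['1' '2' '3' 'jj']
print(vector)
# <class 'numpy.ndarray'>
print(type(vector))
# <U11, a Unicode string dtype (the width may differ by platform and NumPy version)
print(vector.dtype)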

['1' '2' '3' 'jj']
<class 'numpy.ndarray'>
<U11
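When numbers end up stored as strings like this, `astype` can convert them back. A minimal sketch, assuming every element really is a valid number (otherwise `astype(float)` raises a `ValueError`):

```python
import numpy as np

text_values = np.array(["0", "0.5", "1.62"])
print(text_values.dtype.kind)  # 'U'

# Convert the strings to floating-point numbers
numbers = text_values.astype(float)
print(numbers.dtype)  # float64
print(numbers.max())  # 1.62
```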

5. Accessing elements of a numpy ndarray

import numpy as np

data = np.genfromtxt('world_alcohol.txt', delimiter=',', dtype=str, skip_header=1)
print(data)
# As with Python sequences, an element of a 2-D array can be picked by position:
# the first index is the row, the second is the column
# Indices start at 0
data_01 = data[1, 4]
data_02 = data[2, 3]
print(data_01)
print(data_02)

[['1986' 'Western Pacific' 'Viet Nam' 'Wine' '0']
 ['1986' 'Americas' 'Uruguay' 'Other' '0.5']
 ['1985' 'Africa' "Cte d'Ivoire" 'Wine' '1.62']
 ...
 ['1987' 'Africa' 'Malawi' 'Other' '0.75']
 ['1989' 'Americas' 'Bahamas' 'Wine' '1.5']
 ['1985' 'Africa' 'Malawi' 'Spirits' '0.31']]
0.5
Wine
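The same lookups can be reproduced without the data file. This sketch hard-codes the first three rows from the output above (an assumption standing in for `world_alcohol.txt`) and adds two more indexing forms.

```python
import numpy as np

# First three data rows, copied from the printed output above
data = np.array([
    ["1986", "Western Pacific", "Viet Nam", "Wine", "0"],
    ["1986", "Americas", "Uruguay", "Other", "0.5"],
    ["1985", "Africa", "Cte d'Ivoire", "Wine", "1.62"],
])

print(data[1, 4])   # 0.5  (row 1, column 4)
print(data[2, 3])   # Wine (row 2, column 3)
print(data[-1, 0])  # 1985 (negative indices count from the end)
print(data[0])      # a single index returns the whole row
```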

6. Slicing a numpy ndarray

import numpy as np

# The slice syntax is the same as for Python sequences: start is included, stop is excluded
data = np.array([1, 2, 3, 4, 5])
# [1 2 3]
print(data[0:3])

[1 2 3]
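A few more slice forms are sketched below. One difference from Python lists is worth noting: a basic NumPy slice is a view onto the original array, not a copy, so writing through the slice changes the original.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

print(data[:2])   # [1 2]
print(data[2:])   # [3 4 5]
print(data[::2])  # [1 3 5]
print(data[-2:])  # [4 5]

# The slice shares memory with data, so this assignment modifies data too
view = data[0:3]
view[0] = 100
print(data[0])    # 100
```

Call `.copy()` on the slice when an independent array is needed.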

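7. Slicing a two-dimensional numpy array

import numpy as np

matrix = np.array([['james', 'USA', 45], ['cris', 'CHINA', 33], ['大帥', 'UK', 11]])
# ['USA' 'CHINA' 'UK']: the given column for every row; ':' selects all rows
print(matrix[:, 1])

'''
    A slice in the column position selects several columns for every row:
    [['james' 'USA']
     ['cris' 'CHINA']
     ['大帥' 'UK']]
'''
print(matrix[:, 0:2])

'''
    In the same way, chosen columns can be taken from chosen rows. Any region of a
    two-dimensional array can be extracted by slicing: the first argument addresses
    rows, the second addresses columns, and both may be slices.
    [['james' 'USA']
     ['cris' 'CHINA']]
'''
print(matrix[0:2, 0:2])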

['USA' 'CHINA' 'UK']
[['james' 'USA']
 ['cris' 'CHINA']
 ['大帥' 'UK']]
[['james' 'USA']
 ['cris' 'CHINA']]
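Because the array was built from mixed strings and integers, the third column is stored as text. The sketch below combines a row slice with a column index and then converts that column back to integers; the name `ages` is a guess at what the column means and is used only for illustration.

```python
import numpy as np

matrix = np.array([["james", "USA", 45], ["cris", "CHINA", 33], ["大帥", "UK", 11]])

# Row slice combined with a single column: column 0 of rows 1 onward
print(matrix[1:, 0])  # ['cris' '大帥']

# The numbers became strings when the array was created, so convert them back
ages = matrix[:, 2].astype(int)
print(ages)           # [45 33 11]
print(ages.max())     # 45
```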
