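Data Analysis and Presentation: NumPy Data Access and Functions
Introduction to the NumPy Library
NumPy Data Access and Functions
Reading and Writing Data as CSV Files
CSV files
CSV (Comma-Separated Values) is a common file format for storing batches of data.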
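np.savetxt(fname, X, fmt='%.18e', delimiter=' ')
- fname: a filename or file handle; a filename ending in .gz is written gzip-compressed.
- X: the array to write to the file.
- fmt: the format for written values, for example %d, %.2f, %.18e.
- delimiter: the string that separates columns; the default is a single space.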
Example: saving a file with savetxt()
In [1]: import numpy as np
In [2]: a = np.arange(100).reshape(5,20)
In [3]: np.savetxt('a.csv', a, fmt='%d', delimiter=',')
The contents of "a.csv" are:
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59
60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79
80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99
In [4]: np.savetxt('a1.csv', a, fmt='%.1f', delimiter=',')
The contents of "a1.csv" are:
0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0,16.0,17.0,18.0,19.0
20.0,21.0,22.0,23.0,24.0,25.0,26.0,27.0,28.0,29.0,30.0,31.0,32.0,33.0,34.0,35.0,36.0,37.0,38.0,39.0
40.0,41.0,42.0,43.0,44.0,45.0,46.0,47.0,48.0,49.0,50.0,51.0,52.0,53.0,54.0,55.0,56.0,57.0,58.0,59.0
60.0,61.0,62.0,63.0,64.0,65.0,66.0,67.0,68.0,69.0,70.0,71.0,72.0,73.0,74.0,75.0,76.0,77.0,78.0,79.0
80.0,81.0,82.0,83.0,84.0,85.0,86.0,87.0,88.0,89.0,90.0,91.0,92.0,93.0,94.0,95.0,96.0,97.0,98.0,99.0
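np.loadtxt(fname, dtype=float, delimiter=None, unpack=False)
- fname: a file, filename string, or generator; may be a .gz or .bz2 compressed file.
- dtype: the data type of the result, optional.
- delimiter: the string that separates values; the default is any whitespace.
- unpack: if True, the result is transposed so that each column can be assigned to its own variable.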
Example: reading a file with loadtxt()
In [5]: b = np.loadtxt('a1.csv', delimiter=',')
In [6]: b
Out[6]:
array([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.],
       [ 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39.],
       [ 40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50., 51., 52., 53., 54., 55., 56., 57., 58., 59.],
       [ 60., 61., 62., 63., 64., 65., 66., 67., 68., 69., 70., 71., 72., 73., 74., 75., 76., 77., 78., 79.],
       [ 80., 81., 82., 83., 84., 85., 86., 87., 88., 89., 90., 91., 92., 93., 94., 95., 96., 97., 98., 99.]])
In [7]: b = np.loadtxt('a1.csv', dtype=int, delimiter=',')
In [8]: b
Out[8]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
Note: the np.int and np.float aliases used in older tutorials have been removed from NumPy, so the built-in int and float are used here. Recent NumPy releases may also warn or fail when float-formatted text such as 0.0 is parsed directly into an integer dtype; in that case, load as float and convert with b.astype(int).
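The round trip above can be checked without leaving files behind. The following is a minimal sketch, assuming a temporary directory and a path name chosen only for illustration; it also shows what unpack=True does, using the usecols parameter of np.loadtxt to keep the unpacking to three columns.

```python
import os
import tempfile

import numpy as np

a = np.arange(100).reshape(5, 20)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.csv")  # hypothetical temporary path

    # Write the integers as comma-separated text.
    np.savetxt(path, a, fmt="%d", delimiter=",")

    # loadtxt returns floats unless a dtype is requested.
    b = np.loadtxt(path, delimiter=",")
    c = np.loadtxt(path, dtype=int, delimiter=",")

    # unpack=True transposes the result, so each selected column
    # can be bound to its own variable.
    col0, col1, col2 = np.loadtxt(
        path, delimiter=",", usecols=(0, 1, 2), unpack=True
    )

print(b.dtype, c.dtype)
print(col0)  # first column of the file
```

Reading back with dtype=int works cleanly here because the file was written with fmt='%d'; the text on disk contains no decimal points.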
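Limitations of CSV files
CSV can effectively store only one- and two-dimensional arrays, so np.savetxt() and np.loadtxt() can effectively read and write only one- and two-dimensional arrays.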
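A short sketch of this limitation, using an in-memory text buffer instead of a real file: np.savetxt rejects a three-dimensional array. One possible workaround (an illustration, not something the text prescribes) is to flatten the trailing axes before writing and reshape after reading, which only works if the original shape is known.

```python
import io

import numpy as np

a3 = np.arange(24).reshape(2, 3, 4)

# Attempting to write a 3-D array as delimited text fails.
buf = io.StringIO()
try:
    np.savetxt(buf, a3, fmt="%d", delimiter=",")
    failed = False
except ValueError as exc:
    failed = True
    print("savetxt rejected the 3-D array:", exc)

# Workaround sketch: collapse to 2-D for writing, then restore the
# shape by hand after reading. The shape (2, 3, 4) is not stored in
# the text and must be known to the reader.
buf = io.StringIO()
np.savetxt(buf, a3.reshape(a3.shape[0], -1), fmt="%d", delimiter=",")
buf.seek(0)
restored = np.loadtxt(buf, dtype=int, delimiter=",").reshape(2, 3, 4)

print(np.array_equal(restored, a3))
```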
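Reading and Writing Multidimensional Data
a.tofile(fid, sep='', format='%s')
- fid: a file object or filename string.
- sep: the separator between items; if it is an empty string, the file is written as binary.
- format: the format for written values.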
Example: storing multidimensional data with tofile()
In [9]: a = np.arange(100).reshape(5,10,2)
In [10]: a.tofile('b.dat', sep=',', format='%d')
The contents of "b.dat" are:
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99
In [11]: a.tofile('b1.dat', format='%d')
The contents of "b1.dat" (a binary file, shown as a hex dump) are:
0000 0000 0100 0000 0200 0000 0300 0000
0400 0000 0500 0000 0600 0000 0700 0000
0800 0000 0900 0000 0a00 0000 0b00 0000
0c00 0000 0d00 0000 0e00 0000 0f00 0000
1000 0000 1100 0000 1200 0000 1300 0000
1400 0000 1500 0000 1600 0000 1700 0000
1800 0000 1900 0000 1a00 0000 1b00 0000
1c00 0000 1d00 0000 1e00 0000 1f00 0000
2000 0000 2100 0000 2200 0000 2300 0000
2400 0000 2500 0000 2600 0000 2700 0000
2800 0000 2900 0000 2a00 0000 2b00 0000
2c00 0000 2d00 0000 2e00 0000 2f00 0000
3000 0000 3100 0000 3200 0000 3300 0000
3400 0000 3500 0000 3600 0000 3700 0000
3800 0000 3900 0000 3a00 0000 3b00 0000
3c00 0000 3d00 0000 3e00 0000 3f00 0000
4000 0000 4100 0000 4200 0000 4300 0000
4400 0000 4500 0000 4600 0000 4700 0000
4800 0000 4900 0000 4a00 0000 4b00 0000
4c00 0000 4d00 0000 4e00 0000 4f00 0000
5000 0000 5100 0000 5200 0000 5300 0000
5400 0000 5500 0000 5600 0000 5700 0000
5800 0000 5900 0000 5a00 0000 5b00 0000
5c00 0000 5d00 0000 5e00 0000 5f00 0000
6000 0000 6100 0000 6200 0000 6300 0000
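np.fromfile(file, dtype=float, count=-1, sep='')
- file: a file object or filename string.
- dtype: the data type to read.
- count: the number of items to read; -1 reads the whole file.
- sep: the separator between items; if it is an empty string, the file is read as binary.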
Example: reading multidimensional data with fromfile()
In [9]: c = np.fromfile('b.dat', dtype=int, sep=',')
In [10]: c
Out[10]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
       20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
       40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
       60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
       80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
In [11]: c = np.fromfile('b.dat', dtype=int, sep=',').reshape(5,10,2)
In [12]: c
Out[12]:
array([[[ 0, 1], [ 2, 3], [ 4, 5], [ 6, 7], [ 8, 9], [10, 11], [12, 13], [14, 15], [16, 17], [18, 19]],
       [[20, 21], [22, 23], [24, 25], [26, 27], [28, 29], [30, 31], [32, 33], [34, 35], [36, 37], [38, 39]],
       [[40, 41], [42, 43], [44, 45], [46, 47], [48, 49], [50, 51], [52, 53], [54, 55], [56, 57], [58, 59]],
       [[60, 61], [62, 63], [64, 65], [66, 67], [68, 69], [70, 71], [72, 73], [74, 75], [76, 77], [78, 79]],
       [[80, 81], [82, 83], [84, 85], [86, 87], [88, 89], [90, 91], [92, 93], [94, 95], [96, 97], [98, 99]]])
In [13]: c = np.fromfile('b1.dat', dtype=int).reshape(5,10,2)
In [14]: c
Out[14]:
array([[[ 0, 1], [ 2, 3], [ 4, 5], [ 6, 7], [ 8, 9], [10, 11], [12, 13], [14, 15], [16, 17], [18, 19]],
       [[20, 21], [22, 23], [24, 25], [26, 27], [28, 29], [30, 31], [32, 33], [34, 35], [36, 37], [38, 39]],
       [[40, 41], [42, 43], [44, 45], [46, 47], [48, 49], [50, 51], [52, 53], [54, 55], [56, 57], [58, 59]],
       [[60, 61], [62, 63], [64, 65], [66, 67], [68, 69], [70, 71], [72, 73], [74, 75], [76, 77], [78, 79]],
       [[80, 81], [82, 83], [84, 85], [86, 87], [88, 89], [90, 91], [92, 93], [94, 95], [96, 97], [98, 99]]])
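Points to note:
Reading with this method requires knowing the dimensions and element type of the array that was written to the file, so a.tofile() and np.fromfile() have to be used together. For a binary file the dtype passed to np.fromfile() must match the element size that was written (the dump of b1.dat above shows 4-byte integers).
The extra information can be stored in a separate metadata file. The array dimensions and element type can also be recorded in the filename (for example, b1_int_5_10_2.dat).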
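The metadata-file idea can be sketched as follows. This is only one possible layout: the JSON sidecar file, its name, and its two keys ("dtype" and "shape") are assumptions made for illustration, not a NumPy convention.

```python
import json
import os
import tempfile

import numpy as np

# Fix the element type explicitly so the example does not depend on
# the platform's default integer size.
a = np.arange(100, dtype=np.int32).reshape(5, 10, 2)

with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "b1.dat")
    meta_path = os.path.join(tmp, "b1.json")  # hypothetical sidecar file

    # Raw binary dump: neither the shape nor the dtype is stored.
    a.tofile(data_path)

    # Record what the reader will need to know.
    with open(meta_path, "w") as fh:
        json.dump({"dtype": a.dtype.str, "shape": list(a.shape)}, fh)

    # Reading side: recover dtype and shape from the sidecar first.
    with open(meta_path) as fh:
        meta = json.load(fh)
    flat = np.fromfile(data_path, dtype=np.dtype(meta["dtype"]))
    c = flat.reshape(meta["shape"])

print(flat.shape, c.shape, c.dtype)
```

Without the sidecar, np.fromfile() would return a flat array whose values depend on guessing the right dtype.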
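NumPy's Convenient File Access
np.save(file, arr) or np.savez(file, arr)
- file: the filename, with the .npy extension; archives written by np.savez use the .npz extension
- arr: the array variable
np.load(file)
- file: the filename, with the .npy extension, or .npz for an archive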
Example: using save() and load()
In [15]: np.save('a.npy',a)
The contents of "a.npy" (shown as a hex dump) are:
934e 554d 5059 0100 4600 7b27 6465 7363
7227 3a20 273c 6934 272c 2027 666f 7274
7261 6e5f 6f72 6465 7227 3a20 4661 6c73
652c 2027 7368 6170 6527 3a20 2835 2c20
3130 2c20 3229 2c20 7d20 2020 2020 200a
0000 0000 0100 0000 0200 0000 0300 0000
0400 0000 0500 0000 0600 0000 0700 0000
0800 0000 0900 0000 0a00 0000 0b00 0000
0c00 0000 0d00 0000 0e00 0000 0f00 0000
1000 0000 1100 0000 1200 0000 1300 0000
1400 0000 1500 0000 1600 0000 1700 0000
1800 0000 1900 0000 1a00 0000 1b00 0000
1c00 0000 1d00 0000 1e00 0000 1f00 0000
2000 0000 2100 0000 2200 0000 2300 0000
2400 0000 2500 0000 2600 0000 2700 0000
2800 0000 2900 0000 2a00 0000 2b00 0000
2c00 0000 2d00 0000 2e00 0000 2f00 0000
3000 0000 3100 0000 3200 0000 3300 0000
3400 0000 3500 0000 3600 0000 3700 0000
3800 0000 3900 0000 3a00 0000 3b00 0000
3c00 0000 3d00 0000 3e00 0000 3f00 0000
4000 0000 4100 0000 4200 0000 4300 0000
4400 0000 4500 0000 4600 0000 4700 0000
4800 0000 4900 0000 4a00 0000 4b00 0000
4c00 0000 4d00 0000 4e00 0000 4f00 0000
5000 0000 5100 0000 5200 0000 5300 0000
5400 0000 5500 0000 5600 0000 5700 0000
5800 0000 5900 0000 5a00 0000 5b00 0000
5c00 0000 5d00 0000 5e00 0000 5f00 0000
6000 0000 6100 0000 6200 0000 6300 0000
Inspecting the binary file shows that np.save() stores extra information in the .npy file in addition to the data: a header recording the element type ('descr': '<i4'), the memory order ('fortran_order': False), and the shape ((5, 10, 2)). This is why np.load() can restore the array without being told its dimensions or type.
In [16]: b = np.load('a.npy')
In [17]: b
Out[17]:
array([[[ 0, 1], [ 2, 3], [ 4, 5], [ 6, 7], [ 8, 9], [10, 11], [12, 13], [14, 15], [16, 17], [18, 19]],
       [[20, 21], [22, 23], [24, 25], [26, 27], [28, 29], [30, 31], [32, 33], [34, 35], [36, 37], [38, 39]],
       [[40, 41], [42, 43], [44, 45], [46, 47], [48, 49], [50, 51], [52, 53], [54, 55], [56, 57], [58, 59]],
       [[60, 61], [62, 63], [64, 65], [66, 67], [68, 69], [70, 71], [72, 73], [74, 75], [76, 77], [78, 79]],
       [[80, 81], [82, 83], [84, 85], [86, 87], [88, 89], [90, 91], [92, 93], [94, 95], [96, 97], [98, 99]]])
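As a compact check of the .npy and .npz behavior, the sketch below saves one array with np.save and two arrays with np.savez, then reads both back. The file names and the keyword names a and x are chosen only for this illustration.

```python
import os
import tempfile

import numpy as np

a = np.arange(100).reshape(5, 10, 2)
x = np.linspace(0.0, 1.0, 5)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "a.npy")
    npz_path = os.path.join(tmp, "arrays.npz")  # hypothetical archive name

    # A .npy file stores one array together with its dtype and shape.
    np.save(npy_path, a)
    b = np.load(npy_path)

    # A .npz archive stores several named arrays.
    np.savez(npz_path, a=a, x=x)
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        a2 = archive["a"]
        x2 = archive["x"]

print(b.shape, b.dtype)
print(names)
```

No reshape call is needed on the reading side, in contrast to the tofile()/fromfile() pair.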
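NumPy Random Number Functions
NumPy's random sublibrary
Basic form: np.random.*
np.random.rand(), np.random.randn(), np.random.randint()
Random number functions in np.random
Function | Description |
---|---|
rand(d0,d1, ... ,dn) | Creates a random array of shape d0 - dn; floats uniformly distributed over [0,1) |
randn(d0,d1, ... ,dn) | Creates a random array of shape d0 - dn; standard normal distribution |
randint(low[,high,size]) | Creates a random integer or an integer array of shape size; the range is [low,high), so high itself is excluded |
seed(s) | Seeds the random number generator; s is the given seed value |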
Example: trying the functions
In [18]: a = np.random.rand(3,4,5)
In [19]: a
Out[19]:
array([[[ 0.97845512, 0.90466706, 0.92576248, 0.77775142, 0.84334893],
        [ 0.39599821, 0.31917683, 0.7961439 , 0.01324569, 0.97660396],
        [ 0.5049603 , 0.80952265, 0.67359257, 0.89334316, 0.94496225],
        [ 0.04840473, 0.04665257, 0.20956817, 0.62255095, 0.36600489]],
       [[ 0.58059326, 0.28464266, 0.23596248, 0.16677631, 0.86467069],
        [ 0.14691968, 0.60863245, 0.71725038, 0.69206766, 0.18301705],
        [ 0.73197901, 0.99051723, 0.10489076, 0.33979432, 0.0354286 ],
        [ 0.73696453, 0.48268632, 0.99294233, 0.06285961, 0.93090147]],
       [[ 0.07853777, 0.827061 , 0.66325364, 0.52289669, 0.96894828],
        [ 0.41912388, 0.01883408, 0.80978245, 0.93082898, 0.98095581],
        [ 0.58614214, 0.55996867, 0.37734444, 0.79280598, 0.03626233],
        [ 0.233132 , 0.22514788, 0.32245147, 0.13739658, 0.18866422]]])
In [20]: sn = np.random.randn(3,4,5)
In [21]: sn
Out[21]:
array([[[-0.54821321, 0.35733947, 0.74102173, -1.26679716, -0.75072289],
        [ 0.13182283, 2.32578442, -0.52208189, 2.5041796 , -0.96995644],
        [ 1.00171095, 0.97037733, 1.55386206, -0.94515087, 0.75707273],
        [-1.2481768 , 0.53095038, 0.92527818, -0.17261088, -0.13667463]],
       [[ 2.18760173, -0.93813162, 0.19032109, -1.59605908, -0.96802666],
        [ 0.30649913, 1.32375007, 0.72547761, -1.59253182, -0.72385311],
        [-2.22923637, -1.05462649, 1.82672301, 0.47343961, -0.9786459 ],
        [-0.36857965, 0.59003624, 1.80140997, 1.00965744, 1.9037593 ]],
       [[ 0.36273071, -0.0447364 , 1.27120325, 0.21076423, -0.40820945],
        [-1.22315321, -1.94670543, 0.17959233, -1.1020581 , 0.17423733],
        [-1.16368644, 0.00589158, 1.19701291, -0.4255035 , -0.7508364 ],
        [-1.61788168, 0.50386607, 0.15993032, 0.36881486, -0.41457221]]])
In [22]: b = np.random.randint(100,200,(3,4))
In [23]: b
Out[23]:
array([[163, 171, 163, 168],
       [166, 127, 160, 109],
       [135, 111, 196, 190]])
In [24]: np.random.seed(10)
In [25]: np.random.randint(100,200,(3,4))
Out[25]:
array([[109, 115, 164, 128],
       [189, 193, 129, 108],
       [173, 100, 140, 136]])
In [26]: np.random.seed(10)
In [27]: np.random.randint(100,200,(3,4))
Out[27]:
array([[109, 115, 164, 128],
       [189, 193, 129, 108],
       [173, 100, 140, 136]])
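The two properties shown in the session (the same seed reproduces the same numbers, and randint excludes its upper bound) can be checked directly. This is a small sketch using the same legacy np.random functions as the text; the sample size of 10,000 is an arbitrary choice.

```python
import numpy as np

# Re-seeding with the same value replays the same sequence.
np.random.seed(10)
first = np.random.randint(100, 200, (3, 4))
np.random.seed(10)
second = np.random.randint(100, 200, (3, 4))
print(np.array_equal(first, second))

# randint draws from the half-open interval [low, high):
# with low=0 and high=3 the value 3 never appears.
samples = np.random.randint(0, 3, 10_000)
print(np.unique(samples))
```

Newer NumPy code often creates a generator with np.random.default_rng(seed) instead; the legacy functions are used here only to match the examples above.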
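Random number functions in np.random (continued)
Function | Description |
---|---|
shuffle(a) | Randomly reorders array a along its first axis, modifying a in place |
permutation(a) | Returns a new array shuffled along the first axis of a, leaving a unchanged |
choice(a[,size,replace,p]) | Draws elements from the one-dimensional array a with probabilities p to form a new array of shape size; replace says whether an element may be drawn more than once, default True |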
Example: trying the functions
In [28]: a = np.random.randint(100,200,(3,4))
In [29]: a
Out[29]:
array([[116, 111, 154, 188],
       [162, 133, 172, 178],
       [149, 151, 154, 177]])
In [30]: np.random.shuffle(a)
In [31]: a
Out[31]:
array([[116, 111, 154, 188],
       [149, 151, 154, 177],
       [162, 133, 172, 178]])
In [32]: np.random.shuffle(a)
In [33]: a
Out[33]:
array([[162, 133, 172, 178],
       [116, 111, 154, 188],
       [149, 151, 154, 177]])
In [34]: a = np.random.randint(100,200,(3,4))
In [35]: a
Out[35]:
array([[113, 192, 186, 130],
       [130, 189, 112, 165],
       [131, 157, 136, 127]])
In [36]: np.random.permutation(a)
Out[36]:
array([[113, 192, 186, 130],
       [130, 189, 112, 165],
       [131, 157, 136, 127]])
In [37]: a
Out[37]:
array([[113, 192, 186, 130],
       [130, 189, 112, 165],
       [131, 157, 136, 127]])
In [38]: b = np.random.randint(100,200,(8,))
In [39]: b
Out[39]: array([177, 122, 123, 194, 111, 128, 174, 188])
In [40]: np.random.choice(b,(3,2))
Out[40]:
array([[122, 188],
       [123, 177],
       [174, 188]])
In [41]: np.random.choice(b,(3,2),replace=False)
Out[41]:
array([[123, 111],
       [128, 188],
       [174, 122]])
In [42]: np.random.choice(b,(3,2),p= b/np.sum(b))
Out[42]:
array([[174, 122],
       [188, 194],
       [174, 123]])
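The difference between shuffle (in place), permutation (new array), and the replace and p arguments of choice can be made explicit with a few checks. This is a minimal sketch; the arrays and the seed are arbitrary illustration values.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for repeatability

a = np.arange(12).reshape(3, 4)
original = a.copy()

# permutation returns a reordered copy; a itself is untouched.
p = np.random.permutation(a)
unchanged = np.array_equal(a, original)

# shuffle reorders the rows of a in place and returns None.
result = np.random.shuffle(a)

pool = np.arange(8)

# replace=False: no element can be drawn twice.
no_repeat = np.random.choice(pool, (3, 2), replace=False)

# p gives one probability per element; here element 0 has
# probability 0 and therefore can never be drawn.
weights = pool / pool.sum()
weighted = np.random.choice(pool, (3, 2), p=weights)

print(unchanged, result)
print(no_repeat)
print(weighted)
```

Both shuffle and permutation move whole rows: the contents of each row stay together because only the first axis is reordered.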
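Function | Description |
---|---|
uniform(low,high,size) | Creates an array with a uniform distribution; low is the start value, high the end value, size the shape |
normal(loc,scale,size) | Creates an array with a normal distribution; loc is the mean, scale the standard deviation, size the shape |
poisson(lam,size) | Creates an array with a Poisson distribution; lam is the rate at which random events occur, size the shape |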
In [43]: u = np.random.uniform(0,10,(3,4))
In [44]: u
Out[44]:
array([[ 8.8393648 , 3.25511638, 1.65015898, 3.92529244],
       [ 0.93460375, 8.21105658, 1.5115202 , 3.84114449],
       [ 9.44260712, 9.87625475, 4.56304547, 8.26122844]])
In [45]: n = np.random.normal(10,5,(3,4))
In [46]: n
Out[46]:
array([[ 12.8882903 , 2.6251256 , 10.39394227, 14.59206826],
       [ 7.5365132 , 10.48231186, 6.73620032, 8.89118781],
       [ 4.65856717, 3.86153973, 1.00713488, 6.5739633 ]])
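With a larger sample, the parameters of each distribution show up in the sample statistics. The sketch below draws 100,000 values from each function; the sample size, the seed, and lam=3.0 are illustration values, not taken from the text.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for repeatability
n = 100_000

u = np.random.uniform(0, 10, n)   # uniform on [0, 10)
g = np.random.normal(10, 5, n)    # mean 10, standard deviation 5
k = np.random.poisson(3.0, n)     # event rate lam = 3.0

# Sample statistics should land close to the requested parameters.
print(u.min(), u.max(), u.mean())
print(g.mean(), g.std())
print(k.mean(), k.dtype)
```

The uniform samples stay inside [0, 10), the normal samples cluster around loc with spread scale, and the Poisson samples are non-negative integers whose mean is close to lam.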
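NumPy Statistical Functions
Statistical functions provided directly by NumPy
Basic form: np.*
np.std(), np.var(), np.average()
NumPy statistical functions
Function | Description |
---|---|
sum(a,axis=None) | Computes the sum of the elements of array a along the given axis; axis is an integer or a tuple |
mean(a,axis=None) | Computes the mean (expected value) of the elements of array a along the given axis; axis is an integer or a tuple |
average(a,axis=None,weights=None) | Computes the weighted average of the elements of array a along the given axis |
std(a,axis=None) | Computes the standard deviation of the elements of array a along the given axis |
var(a,axis=None) | Computes the variance of the elements of array a along the given axis |
axis=None is the default for the statistical functions and means the computation runs over all elements of the array.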
In [47]: a = np.arange(15).reshape(3,5)
In [48]: a
Out[48]:
array([[ 0, 1, 2, 3, 4],
       [ 5, 6, 7, 8, 9],
       [10, 11, 12, 13, 14]])
In [49]: np.sum(a)
Out[49]: 105
In [50]: np.mean(a,axis=1) # 2. = (0+1+2+3+4)/5
Out[50]: array([ 2., 7., 12.])
In [51]: np.mean(a,axis=0) # 7. = (2+7+12)/3
Out[51]: array([ 5., 6., 7., 8., 9.])
In [52]: np.average(a, axis=0, weights=[10,5,1]) # weighted average: 4.1875 = (2*10+7*5+12*1)/(10+5+1)
Out[52]: array([ 2.1875, 3.1875, 4.1875, 5.1875, 6.1875])
In [53]: np.std(a)
Out[53]: 4.3204937989385739
In [54]: np.var(a)
Out[54]: 18.666666666666668
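The weighted average in the session can be reproduced by hand, which makes the role of axis and weights clearer. This is a minimal sketch using the same array and weights; the variable names are illustrative.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
w = np.array([10, 5, 1])  # one weight per row, since axis=0

weighted = np.average(a, axis=0, weights=w)

# Same computation written out: multiply each row by its weight,
# sum down the rows, and divide by the total weight.
by_hand = (a * w[:, None]).sum(axis=0) / w.sum()

row_means = np.mean(a, axis=1)  # one value per row
col_means = np.mean(a, axis=0)  # one value per column

print(weighted)
print(by_hand)
print(row_means, col_means)

# The variance is the square of the standard deviation.
print(np.isclose(np.std(a) ** 2, np.var(a)))
```

axis=0 collapses the rows (leaving one result per column), while axis=1 collapses the columns (leaving one result per row).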
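Function | Description |
---|---|
min(a) max(a) | Compute the minimum and maximum of the elements of array a |
argmin(a) argmax(a) | Compute the index of the minimum and maximum element of array a in the flattened (one-dimensional) array |
unravel_index(index,shape) | Converts the one-dimensional index into a multidimensional index for the given shape |
ptp(a) | Computes the difference between the maximum and minimum elements of array a |
median(a) | Computes the median of the elements of array a |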
In [55]: b = np.arange(15,0,-1).reshape(3,5)
In [56]: b
Out[56]:
array([[15, 14, 13, 12, 11],
       [10, 9, 8, 7, 6],
       [ 5, 4, 3, 2, 1]])
In [57]: np.max(b)
Out[57]: 15
In [58]: np.argmax(b) # index in the flattened array
Out[58]: 0
In [59]: np.unravel_index(np.argmax(b), b.shape) # converted back to a multidimensional index
Out[59]: (0, 0)
In [60]: np.ptp(b)
Out[60]: 14
In [61]: np.median(b)
Out[61]: 8.0
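Because the maximum in the session sits at position (0, 0), the flattened and multidimensional indices look alike. The minimum of the same array makes the conversion more visible. A minimal sketch with illustrative variable names:

```python
import numpy as np

b = np.arange(15, 0, -1).reshape(3, 5)

# argmin works on the flattened array: the value 1 is the last
# of 15 elements, so its flat index is 14.
flat_min = np.argmin(b)

# unravel_index turns that flat index into (row, column).
pos_min = np.unravel_index(flat_min, b.shape)

print(flat_min, pos_min, b[pos_min])

# With an axis argument, argmax returns one index per column instead.
col_argmax = np.argmax(b, axis=0)
print(col_argmax)

# ptp is simply max minus min.
print(np.ptp(b), b.max() - b.min())
```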
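NumPy Gradient Function
NumPy's gradient function
Function | Description |
---|---|
np.gradient(f) | Computes the gradient of the elements of array f; when f is multidimensional, returns the gradient along each dimension |
Gradient: the rate of change between consecutive values, that is, the slope. If the Y values at three consecutive X coordinates are a, b, c, the gradient at b is (c-a)/2. At either end of the array only one neighbor exists, so the gradient there is the difference between the two adjacent values, for example (b-a)/1 at a.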
In [62]: a = np.random.randint(0,20,(5))
In [63]: a
Out[63]: array([14, 16, 10, 17, 0])
In [64]: np.gradient(a) # with neighbors on both sides: -2. = (10-14)/2
Out[64]: array([ 2. , -2. , 0.5, -5. , -17. ])
In [65]: b = np.random.randint(0,20,(5))
In [66]: b
Out[66]: array([17, 9, 16, 9, 12])
In [67]: np.gradient(b) # with a neighbor on one side only: -8. = (9-17)/1
Out[67]: array([-8. , -0.5, 0. , -2. , 3. ])
In [68]: c = np.random.randint(0, 50, (3,5))
In [69]: c
Out[69]:
array([[30, 17, 17, 16, 0],
       [31, 37, 9, 0, 38],
       [22, 32, 2, 3, 31]])
In [70]: np.gradient(c)
Out[70]:
[array([[ 1. , 20. , -8. , -16. , 38. ],
        [ -4. , 7.5, -7.5, -6.5, 15.5],
        [ -9. , -5. , -7. , 3. , -7. ]]),
 array([[-13. , -6.5, -0.5, -8.5, -16. ],
        [ 6. , -11. , -18.5, 14.5, 38. ],
        [ 10. , -10. , -14.5, 14.5, 28. ]])]
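The gradient rule can be written out by hand and compared with np.gradient. The sketch below reuses the values from Out[63] and Out[69], and assumes unit spacing between samples (the default of np.gradient); the name manual is illustrative.

```python
import numpy as np

y = np.array([14, 16, 10, 17, 0], dtype=float)

g = np.gradient(y)

manual = np.empty_like(y)
# Interior points: half the difference between the two neighbors.
manual[1:-1] = (y[2:] - y[:-2]) / 2
# End points: only one neighbor, so a plain one-sided difference.
manual[0] = y[1] - y[0]
manual[-1] = y[-1] - y[-2]

print(g)
print(manual)

# For a 2-D array, np.gradient returns one array per axis:
# first the gradient down the rows, then across the columns.
c = np.array([[30, 17, 17, 16, 0],
              [31, 37, 9, 0, 38],
              [22, 32, 2, 3, 31]], dtype=float)
d_rows, d_cols = np.gradient(c)
print(d_rows[1])  # middle row: (row 2 - row 0) / 2
```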