NumPy tutorial: sorting, searching and counting

阿新 • Published: 2019-01-14

NumPy functions and methods for sorting, searching and counting (reorganized).

Sorting

`sort`(a[, axis, kind, order])	Return a sorted copy of an array.
`lexsort`(keys[, axis])	Perform an indirect sort using a sequence of keys.
`argsort`(a[, axis, kind, order])	Returns the indices that would sort an array.
`ndarray.sort`([axis, kind, order])	Sort an array, in-place.
`msort`(a)	Return a copy of an array sorted along the first axis. (Deprecated in NumPy 1.24 and removed in NumPy 2.0; use np.sort(a, axis=0).)
`sort_complex`(a)	Sort a complex array using the real part first, then the imaginary part.
`partition`(a, kth[, axis, kind, order])	Return a partitioned copy of an array.
`argpartition`(a, kth[, axis, kind, order])	Perform an indirect partition along the given axis using the algorithm specified by the kind keyword.

Sorting multidimensional NumPy arrays

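Sorting Python lists

Typical usage of list.sort(): list.sort(key=lambda x: x[1], reverse=True). (list.sort() has no axis parameter.)

Or use the built-in function sorted(), where split is the index of the column to sort by:

sorted(data.tolist(), key=lambda x: x[split])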
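A minimal sketch of the list-based approach, assuming `data` is a 2-D NumPy array and `split` is the index of the column to sort by; the sample values are invented for illustration. The key function receives one row (a Python list) at a time:

```python
import numpy as np

# illustrative data: three rows, sorted by the column whose index is `split`
data = np.array([[3, 9], [1, 7], [2, 8]])
split = 0

rows = sorted(data.tolist(), key=lambda row: row[split])
print(rows)        # [[1, 7], [2, 8], [3, 9]]

rows_desc = sorted(data.tolist(), key=lambda row: row[split], reverse=True)
print(rows_desc)   # [[3, 9], [2, 8], [1, 7]]

# sorted() returns a list of lists; convert back if an array is needed
rows_array = np.array(rows)
```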

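Sorting with the ndarray.sort method

The array's sort() method sorts the array in place, i.e. it changes the array's contents.

ndarray.sort() has no key parameter, so how do you supply a comparator?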

Example

import numpy
list1 = [[1, 3, 2], [3, 5, 4]]
array = numpy.array(list1)
array.sort(axis=1)   # sort each row in place
print(array)

[[1 2 3]
 [3 4 5]]

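The sort method sorts in place and modifies the original array, unlike Python's built-in sorted() and the numpy.sort function, which return new objects; the parameters also differ.

The sort method returns None, so you cannot write array.sort(axis=1)[:5]: that would slice None and raise a TypeError.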
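The difference is easy to check. In this sketch (the sample values are illustrative), the method's return value is None, so slicing has to happen after the in-place sort, or np.sort can be used instead:

```python
import numpy as np

array = np.array([[5, 1, 4], [9, 7, 8], [3, 2, 6]])

result = array.sort(axis=1)   # sorts each row in place
print(result)                 # None: the method returns nothing
print(array[:2])              # sort first, then slice the (now sorted) array

# np.sort returns a new array, so it can be sliced in the same expression
top_rows = np.sort(np.array([[5, 1, 4], [9, 7, 8], [3, 2, 6]]), axis=1)[:2]

try:
    np.array([[5, 1, 4]]).sort(axis=1)[:5]   # slicing the None returned by sort()
except TypeError:
    print("TypeError: cannot slice the None returned by ndarray.sort()")
```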

Sort the rows of a matrix by the values in its first column:

mat1 = mat1[mat1[:, 0].argsort()]
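Since ndarray.sort() accepts no key, one common workaround, sketched below, is to compute the key for every row as an array and index the rows with its argsort. The sample matrix and the choice of row sums as the key are illustrative assumptions; kind="stable" keeps rows with equal keys in their original order:

```python
import numpy as np

mat1 = np.array([[3, 0, 1],
                 [1, 5, 5],
                 [2, 2, 0]])

# rows ordered by the first column (ascending)
by_first = mat1[mat1[:, 0].argsort()]

# a vectorized "key function": the sum of each row, computed for all rows at once
keys = mat1.sum(axis=1)                       # [ 4 11  4]
by_sum = mat1[np.argsort(keys, kind="stable")]
# rows 0 and 2 both sum to 4; the stable sort keeps them in their original order

# descending by the first column: reverse the ascending order
by_first_desc = mat1[mat1[:, 0].argsort()[::-1]]
```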

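Sorting with the numpy.sort function

np.sort() returns a new array and leaves the original unchanged (similar to Python's built-in sorted(); NumPy has no sorted function, and the parameters differ).

For both ndarray.sort() and np.sort(), axis defaults to -1, i.e. sorting runs along the last axis. np.sort() also accepts axis=None, in which case it returns a sorted copy of the flattened array.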

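>>> import numpy as np
>>> a = np.array([[7, 1, 9, 6, 3],   # 4x5 example array
...               [5, 1, 3, 8, 2],
...               [9, 8, 9, 4, 0],
...               [9, 5, 1, 7, 0]])
>>> np.sort(a)  # sort each row
array([[1, 3, 6, 7, 9],
       [1, 2, 3, 5, 8],
       [0, 4, 8, 9, 9],
       [0, 1, 5, 7, 9]])
>>> np.sort(a, axis=0)  # sort each column
array([[5, 1, 1, 4, 0],
       [7, 1, 3, 6, 0],
       [9, 5, 9, 7, 2],
       [9, 8, 9, 8, 3]])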
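A short sketch of the axis=None case mentioned above, with a made-up array b; it also confirms that np.sort leaves its input unchanged:

```python
import numpy as np

b = np.array([[3, 1, 2],
              [9, 0, 5]])

flat_sorted = np.sort(b, axis=None)   # flatten first, then sort
print(flat_sorted)                    # [0 1 2 3 5 9]

rows_sorted = np.sort(b)              # default axis=-1: each row separately
print(rows_sorted)                    # [[1 2 3]
                                      #  [0 5 9]]

print(b)                              # b itself is unchanged
```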

Ascending sort:

import numpy
list1 = [[1, 3, 2], [3, 5, 4]]
array = numpy.array(list1)

array = numpy.sort(array, axis=1)    # sort along axis 1 (ascending)
# array = numpy.sort(array, axis=0)  # sort along axis 0
print(array)

[[1 2 3]
 [3 4 5]]

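Descending sort (this negation trick assumes a signed numeric dtype):

array = -numpy.sort(-array, axis=1)   # negate, sort ascending, negate back

[[3 2 1]
 [5 4 3]]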
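The negation trick depends on the dtype. An alternative, sketched here with illustrative arrays, is to sort ascending and reverse the last axis with [:, ::-1], which also works for strings and unsigned integers:

```python
import numpy as np

array = np.array([[1, 3, 2], [3, 5, 4]])

# sort ascending, then reverse the last axis
desc = np.sort(array, axis=1)[:, ::-1]
print(desc)          # [[3 2 1]
                     #  [5 4 3]]

# strings cannot be negated (-words raises a TypeError), but reversing works
words = np.array([["pear", "apple", "fig"]])
words_desc = np.sort(words, axis=1)[:, ::-1]
print(words_desc)    # [['pear' 'fig' 'apple']]

# with unsigned ints, -0 stays 0 and wraps the others, so 0 ends up first
u = np.array([[0, 1, 2]], dtype=np.uint8)
wrong = -np.sort(-u, axis=1)          # [[0 2 1]]
right = np.sort(u, axis=1)[:, ::-1]   # [[2 1 0]]
```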

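lexsort: indirect sort using a sequence of keys

This makes it possible to sort two lists together.

Example: sort two columns of numbers:

>>> a = [1,5,1,4,3,4,4] # First column
>>> b = [9,4,0,4,0,2,1] # Second column
>>> ind = np.lexsort((b,a)) # sort by a, then by b (the last key is the primary key)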
>>> print(ind)
[2 0 4 6 5 3 1]

>>> [(a[i],b[i]) for i in ind]
[(1, 0), (1, 9), (3, 0), (4, 1), (4, 2), (4, 4), (5, 4)]
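The same call works on the columns of a 2-D array. A sketch that stores the a/b pairs above as rows of an illustrative array pairs and sorts the rows by column 0, breaking ties by column 1:

```python
import numpy as np

pairs = np.array([[1, 9], [5, 4], [1, 0], [4, 4], [3, 0], [4, 2], [4, 1]])

# the last key is the primary key: column 0 first, ties broken by column 1
order = np.lexsort((pairs[:, 1], pairs[:, 0]))
print(order)          # [2 0 4 6 5 3 1]
print(pairs[order])   # rows sorted by (column 0, column 1)

# equivalent: pass the columns in reverse order as rows of one array
order2 = np.lexsort(pairs.T[::-1])
```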

Sorting with the numpy.argsort function

argsort usage (NumPy Reference 1.8.1, p. 1240)

argsort() returns the indices that would sort the array; the axis parameter defaults to -1.

argsort(a, axis=-1, kind='quicksort', order=None)
Returns the indices that would sort an array.

That is, argsort returns the indices of the array's values ordered from smallest to largest.

Examples
--------
One-dimensional array:
>>> x = np.array([3, 1, 2])
>>> np.argsort(x)
array([1, 2, 0])

Two-dimensional array:
>>> x = np.array([[0, 3], [2, 2]])
>>> x
array([[0, 3],
       [2, 2]])
>>> np.argsort(x, axis=0)  # sort along axis 0 (within each column)
array([[0, 1],
       [1, 0]])
>>> np.argsort(x, axis=1)  # sort along axis 1 (within each row)
array([[0, 1],
       [0, 1]])

>>> x = np.array([3, 1, 2])
>>> np.argsort(x)  # ascending order
array([1, 2, 0])

>>> np.argsort(-x)  # descending order
array([0, 2, 1])

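Note: you can of course sort in ascending order and reverse the result afterwards, e.g. np.argsort(index[c])[:-MAX_K:-1]. Because the slice stop is exclusive, this yields the indices of the MAX_K - 1 largest values, largest first.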
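Because a slice's stop index is exclusive, [:-MAX_K:-1] yields MAX_K - 1 indices, not MAX_K. A sketch for the indices of the k largest values, where scores and k are illustrative; np.argpartition (listed in the table at the top) avoids a full sort when only the top k are needed:

```python
import numpy as np

scores = np.array([0.2, 0.9, 0.4, 0.7, 0.1])
k = 3

# full sort, reversed, then the first k indices (largest first)
top_k_sorted = np.argsort(scores)[::-1][:k]          # [1 3 2]

# the same with the slicing style of the note: the stop must be -k - 1
top_k_slice = np.argsort(scores)[:-k - 1:-1]         # [1 3 2]

# argpartition moves the k largest into the last k slots (in no particular
# order); only those k are then sorted, largest first
part = np.argpartition(scores, -k)[-k:]
top_k_part = part[np.argsort(scores[part])[::-1]]    # [1 3 2]
```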

Another way to get descending order (does not work for multidimensional arrays):
>>> a
array([1, 2, 3])
>>> a[::-1]
array([3, 2, 1])

>>> x[np.argsort(x)]  # the array sorted via the indices
array([1, 2, 3])
>>> x[np.argsort(-x)]  # this indexing does not carry over to 2-D arrays!
array([3, 2, 1])

Descending sort of multidimensional arrays

import numpy
list1 = [[1, 3, 2], [3, 1, 4]]
a = numpy.array(list1)
a = numpy.array([a[line_id, i] for line_id, i in enumerate(numpy.argsort(-a, axis=1))])
print(a)

[[3 2 1]
 [4 3 1]]

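list1 = [[1, 3, 2], [3, 1, 4]]
a = numpy.array(list1)
sindx = numpy.argsort(-a, axis=1)
# open (sparse) grid of row and column indices; list() so that entry 1 can be replaced
indx = list(numpy.meshgrid(*[numpy.arange(x) for x in a.shape], sparse=True,
                           indexing='ij'))
indx[1] = sindx
a = a[tuple(indx)]   # index with a tuple: recent NumPy versions reject a list of index arrays
print(a)

[[3 2 1]
 [4 3 1]]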

list1 = [[1, 3, 2], [3, 1, 4]]
a = numpy.array(list1)
a = -numpy.sort(-a, axis=1)
print(a)

[[3 2 1]
 [4 3 1]]
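NumPy 1.15 and later also provide np.take_along_axis, which applies per-row indices such as those from argsort directly. A minimal sketch of the descending row sort with it, plus the equivalent explicit broadcasting of row numbers:

```python
import numpy as np

a = np.array([[1, 3, 2], [3, 1, 4]])

# per-row indices that order each row descending
sindx = np.argsort(-a, axis=1)

# take_along_axis pairs each row of a with the matching row of sindx
desc = np.take_along_axis(a, sindx, axis=1)
print(desc)   # [[3 2 1]
              #  [4 3 1]]

# the same result with a column of row numbers broadcast against sindx
rows = np.arange(a.shape[0])[:, None]
desc2 = a[rows, sindx]
```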

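Searching

Usually, after finding certain values in a NumPy array, you want to do something with them (assign, replace, and so on).

For example, to replace every element equal to 0 with 1: a[a == 0] = 1

Bounding values can be done with np.minimum(arr, 255) (upper bound only) or result = np.clip(arr, 0, 255) (lower and upper bound).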
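A small sketch of these operations on made-up arrays; note that a[a == 0] = 1 modifies a in place, while np.minimum and np.clip return new arrays:

```python
import numpy as np

a = np.array([0, 3, 0, 7])
a[a == 0] = 1                    # boolean mask: replace zeros in place
print(a)                         # [1 3 1 7]

arr = np.array([-20, 100, 300])
capped = np.minimum(arr, 255)    # upper bound only
result = np.clip(arr, 0, 255)    # lower and upper bound
print(capped, result)            # [-20 100 255] [  0 100 255]
```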

`argmax`(a[, axis, out])	Returns the indices of the maximum values along an axis.
`nanargmax`(a[, axis])	Return the indices of the maximum values in the specified axis ignoring NaNs.
`argmin`(a[, axis, out])	Returns the indices of the minimum values along an axis.
`nanargmin`(a[, axis])	Return the indices of the minimum values in the specified axis ignoring NaNs.
`argwhere`(a)	Find the indices of array elements that are non-zero, grouped by element.
`nonzero`(a)	Return the indices of the elements that are non-zero.
`flatnonzero`(a)	Return indices that are non-zero in the flattened version of a.
`where`(condition, [x, y])	Return elements, either from x or y, depending on condition.
`searchsorted`(a, v[, side, sorter])	Find indices where elements should be inserted to maintain order.
`extract`(condition, arr)	Return the elements of an array that satisfy some condition.
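searchsorted and extract are not discussed further below, so here is a brief sketch with illustrative arrays. searchsorted assumes its first argument is already sorted in ascending order:

```python
import numpy as np

s = np.array([1, 3, 5, 7])
pos = np.searchsorted(s, [0, 5, 6, 8])            # insertion points: [0 2 3 4]
pos_right = np.searchsorted(s, 5, side="right")   # after the existing 5: 3

x = np.arange(10)
odd = np.extract(x % 2 == 1, x)                   # [1 3 5 7 9]
same = x[x % 2 == 1]                              # boolean indexing gives the same
```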

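Extrema

min() and max() compute the minimum and maximum of an array, and ptp() computes the difference between them ("peak to peak"). NumPy 2.0 removed the ndarray.ptp() method; use the function np.ptp(a) instead.

All of them take axis and out parameters.

argmax() and argmin() return the index of the maximum and minimum. Without an axis argument, they return an index into the flattened array.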

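>>> np.argmax(a)  # index of the maximum of a; if the maximum occurs more than once, the first occurrence is returned
2
>>> a.ravel()[2]  # element at index 2 of the flattened array
9
unravel_index() converts a flat index into an index into the multidimensional array: its first argument is the flat index and its second is the array's shape.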
>>> idx = np.unravel_index(2, a.shape)
>>> idx
(0, 2)
>>> a[idx]
9

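With the axis argument, argmax() finds the index of the maximum along that axis.
For example, the result below means that in array a the maximum of row 0 is at index 2, the maximum of row 1 at index 3, and so on:
>>> idx = np.argmax(a, axis=1)
>>> idx
array([2, 3, 0, 0])
Use idx to select the maximum of each row:
>>> a[range(a.shape[0]), idx]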
array([9, 8, 9, 9])
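A self-contained sketch of the same steps on a small made-up array m, including a tie to show that argmax returns the first occurrence. np.take_along_axis (NumPy 1.15+) is shown as an alternative to the range-based indexing:

```python
import numpy as np

m = np.array([[4, 8, 1],
              [9, 2, 9]])

flat_idx = np.argmax(m)                          # 3: the first 9 in the flattened array
row, col = np.unravel_index(flat_idx, m.shape)   # (1, 0)

idx = np.argmax(m, axis=1)                       # [1 0]: ties resolve to the first index
row_max = m[np.arange(m.shape[0]), idx]          # [8 9]
row_max2 = np.take_along_axis(m, idx[:, None], axis=1).ravel()
```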

nonzero(a)

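Returns the indices of the non-zero elements.

This is closely related to the boolean mask a != 0, but not the same thing: a != 0 is a boolean array with the shape of a, whereas nonzero(a) returns a tuple of index arrays, one per dimension (np.nonzero(a) gives the same result as np.nonzero(a != 0)).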
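A sketch on an illustrative 2-D array showing how the mask, the nonzero index tuple and argwhere relate:

```python
import numpy as np

a = np.array([[0, 2, 0],
              [3, 0, 4]])

mask = a != 0             # boolean array with the same shape as a
idx = np.nonzero(a)       # tuple: row indices [0 1 1], column indices [1 0 2]
values = a[idx]           # [2 3 4], the same values as a[mask]
pairs = np.argwhere(a)    # one (row, col) pair per non-zero element
```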

Finding elements: where

Finding the position of a given element

Given a NumPy array, array, and a value, item, to search for:

itemindex = numpy.where(array==item)

The result is a tuple with first all the row indices, then all the column indices.

To find only the first position in a 1-D array:

array.tolist().index(item)

itemindex = np.argwhere(array == item)[0]; array[tuple(itemindex)]

Note: np.argwhere(a) is the same as np.transpose(np.nonzero(a)). The output of argwhere is not suitable for indexing arrays; for this purpose use where(a) instead.

index = numpy.nonzero(first_array == item)[0][0]
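list.index raises ValueError when the item is absent, and np.argwhere(...)[0] raises IndexError. A sketch of a hypothetical helper, first_index, that returns None instead; the helper name and sample data are assumptions, not part of NumPy:

```python
import numpy as np

def first_index(array, item):
    """Hypothetical helper: index tuple of the first element equal to item, or None."""
    hits = np.argwhere(array == item)   # one row per match, in row-major order
    if hits.size == 0:
        return None
    return tuple(int(i) for i in hits[0])

grid = np.array([[5, 1, 7],
                 [1, 9, 1]])
print(first_index(grid, 1))   # (0, 1)
print(first_index(grid, 4))   # None

rows, cols = np.where(grid == 1)   # all matches: rows [0 1 1], cols [1 0 2]
```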

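Piecewise functions

(Like Python's x = y if condition else z, or C's condition ? a : b: check whether the condition holds; if so, take a, otherwise b.)

The where function

where(condition, [x, y])

Example 1: compute the difference between two matrices and sum the squared residuals, counting only entries where data is non-zero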

import numpy as np

def f_norm_1(data, estimate):
    residual = 0
    for row_index in range(data.shape[0]):
        for column_index in range(data.shape[1]):
            if data[row_index][column_index] != 0:
                residual += (data[row_index][column_index] - estimate[row_index][column_index]) ** 2
    return residual

def f_norm_2(data, estimate):
    # squared residual where data != 0, otherwise 0, summed over the whole matrix
    return np.sum(np.where(data != 0, (data - estimate) ** 2, 0))

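Because I need to take the sparsity of the matrix into account, I cannot use the built-in norm. Function 1 is written in plain Python and is straightforward; for a 10×10 matrix, 200 calls took 0.15 s. Function 2 uses where() and sum(), both optimized for vectorized computation: it is more concise and took only 0.03 s, about five times faster. Moreover, someone has compared NumPy with MATLAB and found NumPy slightly faster, which is an encouraging result.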
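For reference, a sketch of a second vectorized variant (not from the article) that selects the observed entries with a boolean mask before squaring, so residuals at zero entries are never computed. The data is a tiny made-up example:

```python
import numpy as np

def f_norm_where(data, estimate):
    # squares every residual, then replaces those where data == 0 with 0
    return np.sum(np.where(data != 0, (data - estimate) ** 2, 0))

def f_norm_mask(data, estimate):
    # selects the observed (non-zero) entries first, so only those are squared
    observed = data != 0
    return np.sum((data[observed] - estimate[observed]) ** 2)

data = np.array([[1.0, 0.0], [2.0, 3.0]])
estimate = np.array([[0.5, 9.0], [2.0, 1.0]])
print(f_norm_where(data, estimate))   # 4.25 (the entry with data == 0 is ignored)
print(f_norm_mask(data, estimate))    # 4.25
```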

Example 2:

>>> x = np.arange(10)
>>> np.where(x < 5, 9 - x, x)
array([9, 8, 7, 6, 5, 5, 6, 7, 8, 9])

This creates the array 0 to 9 and builds another array whose value is 9-x where x < 5 and x otherwise.

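The select function

out = select(condlist, choicelist, default=0)
Here condlist is a list of N boolean arrays and choicelist is a list of N arrays of candidate values; all arrays have length M. If a list element is a single number rather than an array, it acts like an array of length M whose elements all equal that number. For each index i from 0 to M-1, find the smallest j for which condlist[j][i] is True; then out[i] = choicelist[j][i], where out is the array returned by select(). If the last element of condlist is True, the last array in choicelist supplies the values wherever none of the preceding conditions holds. Alternatively, the default parameter specifies the values to use when no condition is satisfied.

>>> np.select([x<2, x>6, True], [7-x, x, 2*x])
array([ 7,  6,  4,  6,  8, 10, 12,  7,  8,  9])

Where x satisfies the first condition the result is 7-x, where it satisfies the second the result is x, and where neither holds the result is 2*x.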
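As a sketch of the alternatives described above, the catch-all True condition, the default parameter, and nested where() all produce the same array here:

```python
import numpy as np

x = np.arange(10)

with_true = np.select([x < 2, x > 6, True], [7 - x, x, 2 * x])
with_default = np.select([x < 2, x > 6], [7 - x, x], default=2 * x)
nested_where = np.where(x < 2, 7 - x, np.where(x > 6, x, 2 * x))

print(with_true)   # [ 7  6  4  6  8 10 12  7  8  9]
```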

piecewise()

piecewise(x, condlist, funclist)

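Both functions above can consume a lot of memory, which is why piecewise() exists: it evaluates each piece only where its condition is met. With where() and select(), all arguments must be computed before the call, so in the triangle-wave example below NumPy computes four arrays: x>=c, x<c0, x/c0*hc and (c-x)/(c-c0)*hc. The computation also creates many arrays holding intermediate results, so if the input array x is large, there is a lot of memory allocation and deallocation. piecewise() is designed specifically for evaluating piecewise functions and avoids this.

The parameter x is an array of values of the independent variable. condlist is a list of M boolean arrays, each the same length as x. funclist is a list of M or M+1 functions whose inputs and outputs are arrays; each computes one piece of the function. An entry that is a number rather than a function acts like a function returning that number. Each function corresponds to the boolean array with the same index in condlist; if funclist has M+1 entries, the last function applies wherever all conditions are False.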

np.piecewise(x, [x < 0, x >= 0], [-1, 1])

>>> x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.piecewise(x, [x<2, x>6], [lambda x: 7-x, lambda x: x, lambda x: 2*x])
array([ 7,  6,  4,  6,  8, 10, 12,  7,  8,  9])

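Note: entries of funclist that are not constants must be functions (for example lambda expressions), not precomputed expressions such as 7-x. A precomputed array is treated as a constant that does not fit the masked positions, which raises an error like ValueError: NumPy boolean array indexing assignment cannot assign 10 input values to the 2 output values where the mask is true.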
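To observe the lazy evaluation described above, this sketch uses hypothetical helper functions low, high and other that record how many elements they receive; piecewise() passes each one only the elements satisfying its condition:

```python
import numpy as np

x = np.arange(10)
seen = []   # (function name, number of elements received)

def low(v):
    seen.append(("low", v.size))
    return 7 - v

def high(v):
    seen.append(("high", v.size))
    return v

def other(v):
    seen.append(("other", v.size))
    return 2 * v

y = np.piecewise(x, [x < 2, x > 6], [low, high, other])
print(y)      # [ 7  6  4  6  8 10 12  7  8  9]
print(seen)   # [('low', 2), ('high', 3), ('other', 5)]
```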

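Example

A triangle wave can be described by a piecewise function. With period 1, it rises linearly from 0 to hc on [0, c0), falls linearly back to 0 on [c0, c), and stays at 0 on [c, 1):

def triangle_wave(x, c, c0, hc):
    x = x - x.astype(int)  # the triangle wave has period 1, so only the fractional part of x is used
    return np.where(x >= c, 0, np.where(x < c0, x / c0 * hc, (c - x) / (c - c0) * hc))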

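Because the triangle wave has three segments, two nested where() calls are needed. Since all operations and loops run at the C level, this is more efficient than frompyfunc().
As the number of segments grows, more levels of where() must be nested, which makes the code hard to write and read. select() solves this problem:

def triangle_wave2(x, c, c0, hc):
    x = x - x.astype(int)
    return np.select([x >= c, x < c0, True], [0, x / c0 * hc, (c - x) / (c - c0) * hc])

The default parameter works too: return np.select([x >= c, x < c0], [0, x / c0 * hc], default=(c - x) / (c - c0) * hc)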

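Computing the triangle wave with piecewise()

def triangle_wave3(x, c, c0, hc):
    x = x - x.astype(int)
    return np.piecewise(x,
                        [x >= c, x < c0],
                        [0,                                   # x >= c
                         lambda x: x / c0 * hc,               # x < c0
                         lambda x: (c - x) / (c - c0) * hc])  # otherwise

The advantage of piecewise() is that it computes only the values that are needed: in the example above, the expressions "x/c0*hc" and "(c-x)/(c-c0)*hc" are evaluated only on the elements of x that satisfy the corresponding conditions.

Calling it:

x = np.linspace(0, 2, 1000)
y4 = triangle_wave3(x, 0.6, 0.4, 1.0)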
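A different technique, not used in the article, is linear interpolation over one period with np.interp, using the breakpoints 0, c0, c and 1. The sketch below checks it against values computed by hand from the formulas above; the function name triangle_wave_interp is hypothetical. Note that np.mod keeps the fractional part in [0, 1) for negative x, whereas x - x.astype(int) yields a negative fraction there:

```python
import numpy as np

def triangle_wave_interp(x, c, c0, hc):
    # one period: rise 0 -> hc on [0, c0], fall hc -> 0 on [c0, c], flat 0 on [c, 1)
    frac = np.mod(x, 1.0)
    return np.interp(frac, [0.0, c0, c, 1.0], [0.0, hc, 0.0, 0.0])

x = np.array([0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.2])
y = triangle_wave_interp(x, 0.6, 0.4, 1.0)
print(np.round(y, 6))   # [0.  0.5 1.  0.5 0.  0.  0.5]
```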

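Counting

`count_nonzero`(a[, axis])	Counts the number of non-zero values in the array a.

Counts the number of non-zero elements in a NumPy array.

Counting the 1s in a 0-1 array

There are two ways to count how many 1s a 0-1 array contains: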

np.count_nonzero(fs_predict_array)
fs_predict_array.sum()

count_nonzero is faster, roughly 1.6 times as fast as sum().
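A sketch with an illustrative 0-1 array; count_nonzero also accepts an axis argument and boolean masks, which makes it handy for counting matches:

```python
import numpy as np

fs_predict_array = np.array([1, 0, 1, 1, 0, 1])   # illustrative 0-1 predictions

ones_a = np.count_nonzero(fs_predict_array)   # 4
ones_b = fs_predict_array.sum()               # 4

m = np.array([[1, 0, 1],
              [0, 0, 1]])
per_row = np.count_nonzero(m, axis=1)         # [2 1]
zeros = np.count_nonzero(m == 0)              # 3: counts True values of the mask
```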

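Counting occurrences of every element in a multidimensional array

Use the top-level pandas function pd.value_counts, which works on any array or sequence. It expects 1-D input, so flatten a multidimensional array first (e.g. arr.ravel()); since pandas 2.1 the top-level function is deprecated in favor of pd.Series(obj).value_counts():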
>>> pd.value_counts(obj.values, sort=False)
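Without pandas, np.unique with return_counts=True does the same job and flattens a multidimensional array automatically. A sketch with an illustrative array; np.bincount is an alternative for small non-negative integers:

```python
import numpy as np

arr = np.array([[3, 1, 3],
                [2, 3, 1]])

values, counts = np.unique(arr, return_counts=True)   # flattens arr first
print(values)   # [1 2 3]
print(counts)   # [2 1 3]
freq = dict(zip(values.tolist(), counts.tolist()))    # {1: 2, 2: 1, 3: 3}

# for small non-negative integers, bincount indexes the counts by value
by_value = np.bincount(arr.ravel())                   # [0 2 1 3]
```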