Convolution and Pooling Operations on Images

阿新 • Published: 2018-12-27

The Discrete Domain

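Convolution comes in two forms: one for continuous functions and one for discrete functions. Because most practical work deals with digitized data, we need the discrete form. For two discrete functions f and g, convolution replaces the integral used for continuous functions with the equivalent sum:

$$(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m]\, g[n - m]$$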
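As a minimal sketch of this sum for finite sequences, the function below treats every value outside the two lists as zero. The name `convolve_1d` and the sample inputs are illustrative and do not come from the original article.

```python
def convolve_1d(f, g):
    """Full discrete convolution of two finite sequences.

    Values outside each list are treated as zero, so the result
    has len(f) + len(g) - 1 elements.
    """
    out = [0] * (len(f) + len(g) - 1)
    for n in range(len(out)):
        for m in range(len(f)):
            k = n - m  # index into g for this term of the sum
            if 0 <= k < len(g):
                out[n] += f[m] * g[k]
    return out


print(convolve_1d([1, 2, 3], [0, 1, 0.5]))  # [0, 1, 2.5, 4.0, 1.5]
```

Swapping the two arguments gives the same result, since convolution is commutative.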

Convolution Kernels

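An image is two-dimensional data, so it corresponds to a 2D function. We can filter an image with a filter, and that filter is the convolution kernel. Typically each dimension of a filter contains 2 to 5 elements, and different filters produce different effects.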
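The one-dimensional definition of discrete convolution extends to two dimensions by summing over both indices. As a sketch, with $f$ standing for the image and $k$ for the kernel (these symbol names are chosen here and are not from the original article):

```latex
(f * k)[m, n] = \sum_{i} \sum_{j} f[m - i,\, n - j]\; k[i, j]
```

Image-processing code often applies the kernel without flipping it, i.e. it sums $f[m + i,\, n + j]\, k[i, j]$ instead. That operation is strictly a cross-correlation. The two agree for kernels that are symmetric under a 180-degree rotation, and differ only in sign for antisymmetric kernels such as Sobel.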

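Processing an image with a particular kernel yields an image that corresponds to the original but emphasizes certain elements, such as lines and edges. It can also suppress other elements of the image.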

Image Convolution

We define five filters: Identity, Laplacian, Left Sobel, Upper Sobel, and Blur. Each is a 3 x 3 convolution kernel, and each highlights a different feature of the original image.

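The Blur filter computes a weighted average of each pixel's 3x3 neighbourhood (weight 1/4 for the centre, 1/8 for the edge neighbours, and 1/16 for the corners). The Identity filter simply returns the pixel values unchanged. The Laplacian filter is a differential filter used to highlight edges. The Left Sobel filter responds to intensity changes in the horizontal direction, so it picks out vertical edges, while the Upper Sobel filter responds to changes in the vertical direction and picks out horizontal edges.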

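from collections import OrderedDict

import imageio
import matplotlib.pyplot as plt
import numpy as np

# Five 3x3 kernels, keyed by name and kept in display order.
kernels = OrderedDict({
    "Identity": [[0., 0., 0.], [0., 1., 0.], [0., 0., 0.]],
    # Standard 4-neighbour Laplacian (second-derivative) kernel.
    "Laplacian": [[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
    "Left Sobel": [[1., 0., -1.], [2., 0., -2.], [1., 0., -1.]],
    "Upper Sobel": [[1., 2., 1.], [0., 0., 0.], [-1., -2., -1.]],
    "Blur": [[1. / 16., 1. / 8., 1. / 16.], [1. / 8., 1. / 4., 1. / 8.],
             [1. / 16., 1. / 8., 1. / 16.]],
})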


def apply3x3kernel(image, kernel):
    # Copy the input so border pixels, which lack a full 3x3 neighbourhood, keep their values.
    newimage = np.array(image)
    for m in range(1, image.shape[0] - 1):
        for n in range(1, image.shape[1] - 1):
            newelement = 0
            # Weighted sum of the 3x3 neighbourhood centred on (m, n).
            for i in range(0, 3):
                for j in range(0, 3):
                    newelement = newelement + image[m - 1 + i][n - 1 + j] * kernel[i][j]
            newimage[m][n] = newelement
    return newimage


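# Load the image and keep only its first colour channel as floats
# (np.float was removed from recent NumPy releases, so np.float64 is used).
arr = imageio.imread("data/dog.jpg")[:, :, 0].astype(np.float64)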

plt.figure(1)
# Subplot codes for a 3-row, 2-column grid; only the first five are used.
positions = [321, 322, 323, 324, 325, 326]
for position, (key, value) in zip(positions, kernels.items()):
    plt.subplot(position)
    out = apply3x3kernel(arr, value)
    plt.imshow(out, cmap=plt.get_cmap('binary_r'))
plt.show()

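In the resulting figure, the first subplot shows the unchanged image, because it uses the Identity filter. It is followed by Laplacian edge detection, the Left Sobel result (vertical edges), the Upper Sobel result (horizontal edges), and finally the blurred image.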
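To see which edge orientation each Sobel kernel responds to, the following sketch evaluates both kernels at the centre of two tiny synthetic images. The helper `response_at` and the 5x5 test images are illustrative assumptions, not part of the original article. It uses the same unflipped weighted sum as `apply3x3kernel`.

```python
LEFT_SOBEL = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
UPPER_SOBEL = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def response_at(image, kernel, m, n):
    """Weighted sum of the 3x3 neighbourhood centred on (m, n)."""
    return sum(image[m - 1 + i][n - 1 + j] * kernel[i][j]
               for i in range(3) for j in range(3))


# 5x5 test image: dark on the left, bright on the right (a vertical edge).
vertical_edge = [[0, 0, 0, 9, 9] for _ in range(5)]
# Its transpose is dark on top and bright below (a horizontal edge).
horizontal_edge = [list(row) for row in zip(*vertical_edge)]

print(response_at(vertical_edge, LEFT_SOBEL, 2, 2))     # -36
print(response_at(vertical_edge, UPPER_SOBEL, 2, 2))    # 0
print(response_at(horizontal_edge, LEFT_SOBEL, 2, 2))   # 0
print(response_at(horizontal_edge, UPPER_SOBEL, 2, 2))  # -36
```

Left Sobel produces a non-zero response only across the vertical edge, and Upper Sobel only across the horizontal one.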

Image Pooling

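Pooling uses a kernel mainly to reduce the number of parameters. The best-known pooling operations are max pooling, average pooling, and min pooling. Pooling reduces the amount of information coming in from the preceding network layers, which lowers complexity while retaining the most important elements. In other words, it builds a compact representation of the information.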
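The three variants differ only in the function applied to each window. As a rough sketch, assuming non-overlapping 2x2 windows on a small nested list (the helper names `pool2x2` and `mean` and the sample values are made up for illustration):

```python
def pool2x2(image, reduce_fn):
    """Apply reduce_fn to each non-overlapping 2x2 window of a 2D list."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [[reduce_fn([image[2 * r][2 * c], image[2 * r][2 * c + 1],
                        image[2 * r + 1][2 * c], image[2 * r + 1][2 * c + 1]])
             for c in range(cols)]
            for r in range(rows)]


def mean(values):
    return sum(values) / len(values)


image = [[1, 3, 2, 0],
         [5, 7, 4, 6],
         [9, 8, 1, 1],
         [0, 2, 3, 5]]

print(pool2x2(image, max))   # [[7, 6], [9, 5]]
print(pool2x2(image, mean))  # [[4.0, 3.0], [4.75, 2.5]]
print(pool2x2(image, min))   # [[1, 0], [0, 1]]
```

Each 4x4 input becomes a 2x2 output, so every output value summarizes four input pixels.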

def apply2x2pooling(image, stride=2):
    # Max pooling with a 2x2 window; stride 2 makes the windows non-overlapping.
    rows = (image.shape[0] - 2) // stride + 1
    cols = (image.shape[1] - 2) // stride + 1
    newimage = np.zeros((rows, cols), np.float32)
    for m in range(rows):
        for n in range(cols):
            r, c = m * stride, n * stride  # top-left corner of the window
            newimage[m, n] = np.max(image[r:r + 2, c:c + 2])
    return newimage


arr = imageio.imread("data/dog.jpg")[:, :, 0].astype(np.float64)
plt.figure(1)
plt.subplot(121)
plt.imshow(arr, cmap=plt.get_cmap('binary_r'))
out = apply2x2pooling(arr, 2)
plt.subplot(122)
plt.imshow(out, cmap=plt.get_cmap('binary_r'))
plt.show()

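Comparing the image before and after pooling shows the difference: the resulting image has a lower resolution, with roughly one quarter of the original number of pixels.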
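The "roughly one quarter" figure follows from the output-size arithmetic for pooling without padding. A small sketch, in which the function name `pooled_shape` and the 375 x 500 image size are hypothetical and not taken from the article:

```python
def pooled_shape(height, width, window=2, stride=2):
    """Output size of unpadded pooling: floor((size - window) / stride) + 1."""
    return ((height - window) // stride + 1,
            (width - window) // stride + 1)


h, w = 375, 500  # hypothetical image size, chosen only for illustration
ph, pw = pooled_shape(h, w)
print((ph, pw))             # (187, 250)
print(ph * pw / (h * w))    # fraction of pixels kept, just under 0.25
```

The ratio is exactly one quarter when both dimensions are even, and slightly less when a dimension is odd, because the last row or column does not fill a complete window.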