Graph-Based Image Segmentation (Efficient Graph-Based Image Segmentation): A Python Implementation

阿新 • Published: 2018-11-19

Preface
Introduction
Code walkthrough

Building the graph
Image segmentation

Open question

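Preface

While studying region-based convolutional neural networks (R-CNN), I learned that the region proposals come from selective search, and that the first step of selective search is itself a graph-based image segmentation (Efficient Graph-Based Image Segmentation). So I decided to start from the bottom and study that algorithm first.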

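Introduction

A detailed description of the segmentation algorithm can be found in the following blog post:
https://blog.csdn.net/surgewong/article/details/39008861

Here I mainly give a short summary based on my own understanding: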

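First, the image is represented as a graph in the graph-theoretic sense. Concretely, each pixel becomes a vertex vi ∈ V (a node), and each pixel is joined to its 8 neighboring pixels (its 8-neighborhood) by edges ei ∈ E, which gives a graph G = (V,E). The weight of each edge measures the relationship between the two pixels it joins (for a greyscale image, the absolute difference of the grey values; for an RGB image, the square root of the sum of squared differences over the three channels). It expresses how dissimilar two neighboring pixels are: the smaller the weight, the more similar the pixels.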
$\left| grey[x_1,y_1] - grey[x_2,y_2] \right|$
$\sqrt{(rgb[0][x_1,y_1] - rgb[0][x_2,y_2])^2 + (rgb[1][x_1,y_1] - rgb[1][x_2,y_2])^2 + (rgb[2][x_1,y_1] - rgb[2][x_2,y_2])^2}$
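
The two weight formulas above can be sketched in Python as follows. The post later passes functions named diff_grey and diff_rgb but never shows them, so these bodies are an assumption: they take the image as img[x][y] (row, then column), and an RGB image as a tuple of three such channels.

```python
import math

def diff_grey(img, x1, y1, x2, y2):
    """Absolute intensity difference between two pixels of a single-channel image."""
    return abs(img[x1][y1] - img[x2][y2])

def diff_rgb(img, x1, y1, x2, y2):
    """Euclidean distance between two pixels; img is an (r, g, b) tuple of channels."""
    return math.sqrt(sum((ch[x1][y1] - ch[x2][y2]) ** 2 for ch in img))

# Tiny hand-made images (hypothetical data, only for illustration)
grey = [[10, 13],
        [10, 50]]
rgb = ([[0, 3]], [[0, 4]], [[0, 0]])  # one row, two columns, three channels

print(diff_grey(grey, 0, 0, 0, 1))  # 3
print(diff_rgb(rgb, 0, 0, 0, 1))    # 5.0
```

Both functions share the signature diff(img, x, y, x1, y1) that create_edge expects, so either can be passed in unchanged.
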
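Each node (pixel) starts out as its own region, and regions are then merged:
(1) Sort all edges by weight in ascending order; the smaller the weight, the more similar the two pixels.
(2) S[0] is the initial segmentation, in which every vertex is its own region.
(3) Walk through the edges from smallest to largest. If the two endpoints of an edge (vi,vj) belong to different regions, and the weight is no greater than the merge threshold of either region (the region's internal difference, i.e. the largest edge weight inside the region, plus a size-dependent term k/|C| covered in the code walkthrough), merge the two regions and update the merged region's size and internal difference.
Finally, any two adjacent regions of which at least one has fewer than min_size pixels are merged, which gives the final segmentation.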

Code walkthrough

Building the graph

class Node:
    def __init__(self, parent, rank=0, size=1):
        self.parent = parent
        self.rank = rank
        self.size = size

    def __repr__(self):
        return '(parent=%s, rank=%s, size=%s)' % (self.parent, self.rank, self.size)

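This defines the Node class; every pixel becomes one node (vertex). A node has three attributes:
(1) parent: the parent of this node in its region's tree. Following parent links leads to the root, which serves as the ID or index of the region. At initialization every pixel is its own region, so each pixel is its own parent.
(2) rank: the union-by-rank value of a root (initialized to 0 for every node), used when two regions are merged to decide which root becomes the single root of the merged region.
(3) size (initialized to 1 for every node): when the node is a root, the number of vertices in its region. Once its region has been merged into another and it is no longer a root, its size is no longer updated.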

class Forest:
    def __init__(self, num_nodes):
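        # List of nodes; each node starts as its own parent (root)
        self.nodes = [Node(i) for i in range(num_nodes)]
        # Number of disjoint sets (regions); starts at one per node
        self.num_sets = num_nodes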

    def size_of(self, i):
        return self.nodes[i].size

    def find(self, n):
        temp = n
        while temp != self.nodes[temp].parent:
            temp = self.nodes[temp].parent
        self.nodes[n].parent = temp
        return temp

    # Merge the sets rooted at a and b (both must already be roots)
    def merge(self, a, b):
        if self.nodes[a].rank > self.nodes[b].rank:
            self.nodes[b].parent = a
            self.nodes[a].size = self.nodes[a].size + self.nodes[b].size
        else:
            self.nodes[a].parent = b
            self.nodes[b].size = self.nodes[b].size + self.nodes[a].size
            if self.nodes[a].rank == self.nodes[b].rank:
                self.nodes[b].rank = self.nodes[b].rank + 1
        self.num_sets = self.num_sets - 1

    def print_nodes(self):
        for node in self.nodes:
            print(node)

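(1) self.nodes holds all the vertices of the Forest. At initialization every pixel is a vertex and a region of its own, so each pixel is its own root. forest.num_sets is the current number of regions in the image.
(2) size_of() returns the size of a vertex. It is normally called on a root, where it gives the number of vertices in that root's region.
(3) find() returns the index of the root of the region containing the given vertex. As a side effect it points the queried vertex directly at the root, which shortens later lookups.
(4) merge(self, a, b) merges the region rooted at a with the region rooted at b. The callers pass in the roots returned by find(); merge() itself does not look them up. It picks the new root by rank, then updates the merged region's size, the new root's rank, and the region count num_sets.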
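
For comparison, here is a compact disjoint-set sketch that uses flat lists instead of Node objects and compresses the whole lookup path in find(), not just the queried vertex. It is a variation for illustration, not the post's Forest class; the name DisjointSet is made up here, and on a rank tie it keeps a as the root, whereas Forest.merge keeps b.

```python
class DisjointSet:
    """Union-find with union by rank and full path compression."""

    def __init__(self, n):
        self.parent = list(range(n))  # every element starts as its own root
        self.rank = [0] * n
        self.size = [1] * n
        self.num_sets = n

    def find(self, i):
        # First pass: walk up to the root.
        root = i
        while root != self.parent[root]:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while i != root:
            self.parent[i], i = root, self.parent[i]
        return root

    def merge(self, a, b):
        """Merge the sets rooted at a and b (both must already be roots)."""
        if self.rank[a] < self.rank[b]:
            a, b = b, a               # make a the higher-ranked root
        self.parent[b] = a
        self.size[a] += self.size[b]
        if self.rank[a] == self.rank[b]:
            self.rank[a] += 1
        self.num_sets -= 1

ds = DisjointSet(5)
ds.merge(ds.find(0), ds.find(1))
ds.merge(ds.find(2), ds.find(3))
ds.merge(ds.find(1), ds.find(3))
print(ds.find(0) == ds.find(2), ds.size[ds.find(0)], ds.num_sets)  # True 4 2
```

As in the post's code, callers pass find() results to merge(), which keeps merge() itself simple.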

# Create an edge from (x, y) to (x1, y1); its weight is the pixel difference returned by diff
def create_edge(img, width, x, y, x1, y1, diff):
    # Map (row x, column y) to a single vertex index: x * width + y
    vertex_id = lambda x, y: x * width + y
    w = diff(img, x, y, x1, y1)
    return (vertex_id(x, y), vertex_id(x1, y1), w)

Creates an edge as a tuple (vertex 1, vertex 2, weight).

def build_graph(img, width, height, diff, neighborhood_8=False):
    graph = []
    for x in range(height):
        for y in range(width):
            if x > 0:
                graph.append(create_edge(img, width, x, y, x-1, y, diff))
            if y > 0:
                graph.append(create_edge(img, width, x, y, x, y-1, diff))
            if neighborhood_8:
                if x > 0 and y > 0:
                    graph.append(create_edge(img, width, x, y, x-1, y-1, diff))

                if x > 0 and y < width-1:
                    graph.append(create_edge(img, width, x, y, x-1, y+1, diff))
    return graph

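This builds the graph. For each vertex, edges are created towards its left, upper, upper-left and upper-right neighbors (←↑↖↗). With neighborhood_8=True this gives 8-neighborhood connectivity, because the other four directions are covered when the neighboring pixels are visited; with the default neighborhood_8=False only the left and upper edges are created (4-neighborhood). This completes the graph construction.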
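
As a sanity check on the enumeration, the sketch below counts edges with the same conditions as build_graph, without needing an image, and compares the result with the closed-form counts. The helper name count_edges is hypothetical.

```python
def count_edges(width, height, neighborhood_8=False):
    """Count edges using the same conditions as build_graph."""
    count = 0
    for x in range(height):          # x is the row index
        for y in range(width):       # y is the column index
            if x > 0:
                count += 1           # edge to the pixel above
            if y > 0:
                count += 1           # edge to the pixel on the left
            if neighborhood_8:
                if x > 0 and y > 0:
                    count += 1       # edge to the upper-left pixel
                if x > 0 and y < width - 1:
                    count += 1       # edge to the upper-right pixel
    return count

w, h = 4, 3
four = h * (w - 1) + w * (h - 1)        # horizontal + vertical edges
eight = four + 2 * (w - 1) * (h - 1)    # plus the two diagonal directions

print(count_edges(w, h), four)                        # 17 17
print(count_edges(w, h, neighborhood_8=True), eight)  # 29 29
```

Each undirected edge is counted once, which is why visiting only four of the eight directions per pixel is enough.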

Image segmentation

def segment_graph(graph, num_nodes, const, min_size, threshold_func):
    weight = lambda edge: edge[2]
    forest = Forest(num_nodes)
    # Process edges from the most similar pixel pair to the least similar
    sorted_graph = sorted(graph, key=weight)
    # threshold[i] = Int(C_i) + k / |C_i|; a single pixel has Int = 0 and size 1
    threshold = [threshold_func(1, const)] * num_nodes

    for edge in sorted_graph:
        parent_a = forest.find(edge[0])
        parent_b = forest.find(edge[1])
        a_condition = weight(edge) <= threshold[parent_a]
        b_condition = weight(edge) <= threshold[parent_b]

        if parent_a != parent_b and a_condition and b_condition:
            forest.merge(parent_a, parent_b)
            # Root of the merged region
            a = forest.find(parent_a)
            # The current edge is the heaviest one merged so far, so it is the new Int
            threshold[a] = weight(edge) + threshold_func(forest.nodes[a].size, const)

    return remove_small_components(forest, sorted_graph, min_size)

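(1) Initialize the forest.
(2) Sort all edges by weight in ascending order.
(3) Initialize the list of per-region thresholds.
(4) Walk through the edges from smallest to largest. If the two endpoints lie in different regions and the weight is no greater than the threshold (threshold[]) of both regions, merge the two regions, find the root of the merged region, and update that root's threshold (threshold[]): $\mathrm{threshold}[i] = Int(C_i) + \frac{k}{\left| C_i \right|}$
$Int(C_i)$ is the internal difference of the region containing vertex i; $\left| C_i \right|$ is the number of vertices in that region; k is a tunable parameter. If k is too large, the threshold after each update is too large, so too many regions get merged and the segmentation ends up coarse; conversely, if k is too small, the segmentation tends to be too fine.
Because the edges are visited in ascending order, the edge that triggers a merge is the heaviest edge used so far to build the new region, so its weight is the new region's internal difference: $Int(C_i) = weight(edge)$.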
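
To see the effect of k on a case small enough to trace by hand, the sketch below applies the same merge rule to a 1D signal in which neighboring samples are joined by an edge. The threshold function shown (k divided by region size) is an assumption based on the formula above, since the post does not show its own threshold; segment_chain is a hypothetical helper, and its union step skips rank for brevity.

```python
def threshold(size, const):
    """tau(C) = k / |C|: small regions get a generous merge allowance."""
    return const / size

def segment_chain(values, k):
    """Segment a 1D signal; returns the root label of every sample."""
    n = len(values)
    parent = list(range(n))
    size = [1] * n
    thresh = [threshold(1, k)] * n   # Int = 0 for single samples

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    # Edges between neighboring samples, sorted by weight
    edges = sorted((abs(values[i] - values[i + 1]), i, i + 1) for i in range(n - 1))
    for w, i, j in edges:
        a, b = find(i), find(j)
        if a != b and w <= thresh[a] and w <= thresh[b]:
            parent[b] = a
            size[a] += size[b]
            thresh[a] = w + threshold(size[a], k)  # Int(C) + k / |C|
    return [find(i) for i in range(n)]

signal = [10, 11, 12, 50, 51, 52]
print(segment_chain(signal, k=3))    # [0, 0, 0, 3, 3, 3]
print(segment_chain(signal, k=300))  # [0, 0, 0, 0, 0, 0]
```

With k=3 each three-sample region ends with threshold 1 + 3/3 = 2, so the jump of 38 between 12 and 50 is kept as a boundary. With k=300 the threshold is 1 + 300/3 = 101, and the two regions merge.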

def remove_small_components(forest, graph, min_size):
    for edge in graph:
        a = forest.find(edge[0])
        b = forest.find(edge[1])
        if a != b and (forest.size_of(a) < min_size or forest.size_of(b) < min_size):
            forest.merge(a, b)
    return forest

After the first pass, any two adjacent regions of which at least one has fewer than min_size vertices are merged.

from PIL import Image  # Pillow; provides Image.open used below

def graphbased_segmentation(img_path, neighbor, sigma, K, min_size):
    # neighbor = int(8)
    image_file = Image.open(img_path)
    # sigma = float(0.5)
    # K = float(800)
    # min_size = int(20)

    size = image_file.size
    print('Image info: ', image_file.format, size, image_file.mode)

    # Build the Gaussian filter kernel
    grid = gaussian_grid(sigma)

    if image_file.mode == 'RGB':
        image_file.load()
        r, g, b = image_file.split()
        # Filter the r, g and b channels separately; each result is (height, width), indexed as row x, column y
        r = filter_image(r, grid)
        g = filter_image(g, grid)
        b = filter_image(b, grid)
        # print(r.shape)

        smooth = (r, g, b)
        diff = diff_rgb
    else:
        smooth = filter_image(image_file, grid)
        diff = diff_grey
    # print(smooth[0].shape)
    # Build the graph with one vertex per pixel
    graph = build_graph(smooth, size[0], size[1], diff, neighbor == 8)
    forest = segment_graph(graph, size[0]*size[1], K, min_size, threshold)
    image = generate_image(forest, size[0], size[1])
    # image.save("output2.jpg")
    print('Number of components: %d' % forest.num_sets)

    return image

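The image is first smoothed with a Gaussian filter, then a graph is built with every pixel as a vertex, and finally the graph is segmented. Note that PIL's size is (width, height), so size[0] is passed as the width and size[1] as the height.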
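
The post does not show gaussian_grid or filter_image, so the following is only a guess at what such a smoothing step might look like: a normalized 1D Gaussian kernel applied to one row, with border indices clamped. The names gaussian_kernel and smooth_row, and the truncate parameter, are made up for this sketch.

```python
import math

def gaussian_kernel(sigma, truncate=4.0):
    """Normalized 1D Gaussian kernel covering +/- truncate * sigma samples."""
    radius = max(1, int(math.ceil(truncate * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_row(row, kernel):
    """Convolve one row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    last = len(row) - 1
    return [
        sum(k * row[min(max(i + j - radius, 0), last)] for j, k in enumerate(kernel))
        for i in range(len(row))
    ]

kernel = gaussian_kernel(0.5)
print(len(kernel), round(sum(kernel), 6))
print([round(v, 2) for v in smooth_row([0, 0, 100, 0, 0], kernel)])
```

A 2D blur can be built from this by smoothing every row and then every column, since the Gaussian is separable. Smoothing before building the graph reduces the influence of pixel noise on the edge weights.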

Open question

I still do not fully understand the update formula for the internal difference: $\mathrm{threshold}[i] = Int(C_i) + \frac{k}{\left| C_i \right|}$.
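
One possible reading, following the definitions in Felzenszwalb and Huttenlocher's paper and written as a sketch, not a full derivation: threshold[i] stores the quantity $Int(C_i) + \tau(C_i)$, and the check against both regions in segment_graph amounts to comparing the edge weight with the smaller of the two stored values. Here $e$ is the edge currently being processed and $MST(C, E)$ is the minimum spanning tree of the region.

```latex
\begin{aligned}
Int(C) &= \max_{e \in MST(C, E)} w(e) \\
\tau(C) &= \frac{k}{\left| C \right|} \\
MInt(C_1, C_2) &= \min\bigl( Int(C_1) + \tau(C_1),\; Int(C_2) + \tau(C_2) \bigr) \\
\text{merge } C_1, C_2 &\iff w(e) \le MInt(C_1, C_2)
\end{aligned}
```

A single pixel has $Int(C) = 0$, so without the $\tau$ term only zero-weight edges could ever start a merge. The term $k / \left| C \right|$ gives small regions extra tolerance, and that tolerance shrinks as the region grows, so large regions need stronger evidence of similarity before they absorb a neighbor.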
