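This post describes how to detect a subset of traffic signs with HOG features and an SVM classifier. My abilities are limited, so the detection pipeline is simple; I wrote it mainly as programming practice and am publishing it for anyone who finds it useful. The complete code for the project is available on my GitHub: traffic-sign-detection. If you run into any problem with the post or the code, please point it out so we can learn from each other. Without further ado, here is my detection process step by step.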

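Dataset

All of the data was collected by a junior schoolmate of mine, to whom I am grateful. Six kinds of traffic signs are used in this post: straight, left (turn left), right (turn right), no-honk (no honking), crosswalk, and stop (no entry).

Data preprocessing

A total of 1465 photos were taken. Because they were shot with a phone on the road, the images are very large and vary in size (some were taken in landscape orientation, some in portrait), which hurts detection efficiency. I therefore preprocessed all the images as follows:
(1) take the smaller of the image width and height as the crop side length S, and crop the central S×S square from the original image;
(2) resize the cropped region to 640×640.
The main functions for this step are: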

import os
import cv2

def center_crop(img_array, crop_size=-1, resize=-1, write_path=None):
    """ crop and resize a square image from the central area.
    Args:
        img_array: image array
        crop_size: crop_size (default: -1, min(height, width)).
        resize: resized size (default: -1, keep cropped size)
        write_path: write path of the image (default: None, do not write to the disk).
    Return:
        img_crop: cropped and resized image.
    """ 

    rows = img_array.shape[0]
    cols = img_array.shape[1]

    # fall back to the largest centered square when crop_size is unset or too large
    if crop_size==-1 or crop_size>min(rows,cols):
        crop_size = min(rows, cols)
    row_s = max((rows-crop_size)//2, 0)
    row_e = min(row_s+crop_size, rows) 
    col_s = max((cols-crop_size)//2, 0)
    col_e = min(col_s+crop_size, cols)

    img_crop = img_array[row_s:row_e,col_s:col_e,]

    if resize>0:
        img_crop = cv2.resize(img_crop, (resize, resize))

    if write_path is not None:
        cv2.imwrite(write_path, img_crop)
    return img_crop

def crop_img_dir(img_dir,  save_dir, crop_method = "center", rename_pre=-1):
    """ crop and save square images from original images saved in img_dir.
    Args:
        img_dir: image directory.
        save_dir: save directory.
        crop_method: crop method (default: "center").
        rename_pre: prename of all images (default: -1, use primary image name).
    Return: none
    """
    img_names = os.listdir(img_dir)
    img_names = [img_name for img_name in img_names if img_name.split(".")[-1]=="jpg"]
    index = 0
    for img_name in img_names:
        img = cv2.imread(os.path.join(img_dir, img_name))

        rename = img_name if rename_pre==-1 else rename_pre+str(index)+".jpg"
        img_out_path = os.path.join(save_dir, rename)

        if crop_method == "center":
            img_crop = center_crop(img, resize=640, write_path=img_out_path)

        if index%100 == 0:
            print("total images number = %d, current image number = %d" % (len(img_names), index))
        index += 1
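
The crop arithmetic in center_crop can be checked without loading any image. The sketch below is an illustration only: center_crop_box is a hypothetical helper (not part of the project) that returns just the slice bounds, and the photo sizes are made-up examples.

```python
def center_crop_box(rows, cols, crop_size=-1):
    """Return (row_start, row_end, col_start, col_end) of a centred square crop."""
    if crop_size == -1 or crop_size > min(rows, cols):
        crop_size = min(rows, cols)
    row_s = (rows - crop_size) // 2
    col_s = (cols - crop_size) // 2
    return row_s, row_s + crop_size, col_s, col_s + crop_size

# Hypothetical landscape photo: 3000 rows by 4000 columns.
print(center_crop_box(3000, 4000))  # (0, 3000, 500, 3500)
# The same photo taken in portrait orientation.
print(center_crop_box(4000, 3000))  # (500, 3500, 0, 3000)
```

In both orientations the result is a 3000×3000 square, which is then resized to 640×640.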

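Data annotation

The annotations follow the same format as the PASCAL VOC dataset. Positive samples are annotated directly with the labelImg tool; the version I used is at https://pan.baidu.com/s/1Q0cqJI9Dnvxkj7159Be4Sw. For negative samples, the XML annotation files can be written with Python's xml module; the main functions are: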

from xml.dom.minidom import Document
import os
import cv2

def write_img_to_xml(imgfile, xmlfile):
    """
    write xml file.
    Args:
        imgfile: image file.
        xmlfile: output xml file.
    """
    img = cv2.imread(imgfile)
    img_folder, img_name = os.path.split(imgfile)
    img_height, img_width, img_depth = img.shape
    doc = Document()

    annotation = doc.createElement("annotation")
    doc.appendChild(annotation)

    folder = doc.createElement("folder")
    folder.appendChild(doc.createTextNode(img_folder))
    annotation.appendChild(folder)

    filename = doc.createElement("filename")
    filename.appendChild(doc.createTextNode(img_name))
    annotation.appendChild(filename)

    size = doc.createElement("size")
    annotation.appendChild(size)

    width = doc.createElement("width")
    width.appendChild(doc.createTextNode(str(img_width)))
    size.appendChild(width)

    height = doc.createElement("height")
    height.appendChild(doc.createTextNode(str(img_height)))
    size.appendChild(height)

    depth = doc.createElement("depth")
    depth.appendChild(doc.createTextNode(str(img_depth)))
    size.appendChild(depth)

    with open(xmlfile, "w") as f:
        doc.writexml(f, indent="\t", addindent="\t", newl="\n", encoding="utf-8")

def write_imgs_to_xmls(imgdir, xmldir):
    img_names = os.listdir(imgdir)
    for img_name in img_names:
        img_file = os.path.join(imgdir,img_name)
        xml_file = os.path.join(xmldir, img_name.split(".")[0]+".xml")
        print("%s has been written to xml file in %s" % (img_name, xml_file))
        write_img_to_xml(img_file, xml_file)
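
To see what such a negative-sample annotation looks like, here is a minimal sketch that builds the same structure in memory and reads it back. It is an assumption-laden illustration: build_annotation is a hypothetical helper, the file name neg_0.jpg is invented, and the image size is passed in directly instead of being read with cv2.

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import Document

def build_annotation(folder, filename, width, height, depth):
    """Build a VOC-style annotation with no <object> entries (a negative sample)."""
    doc = Document()
    annotation = doc.createElement("annotation")
    doc.appendChild(annotation)
    for tag, text in (("folder", folder), ("filename", filename)):
        node = doc.createElement(tag)
        node.appendChild(doc.createTextNode(text))
        annotation.appendChild(node)
    size = doc.createElement("size")
    annotation.appendChild(size)
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        node = doc.createElement(tag)
        node.appendChild(doc.createTextNode(str(value)))
        size.appendChild(node)
    return doc.toxml()

# Hypothetical negative image of size 640x640x3.
xml_text = build_annotation("JPEGImages", "neg_0.jpg", 640, 640, 3)
root = ET.fromstring(xml_text)
print(root.find("size/width").text, len(root.findall("object")))  # 640 0
```

A file without any object element is what later marks an image as a pure background image.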

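Dataset split

The 1465 images are randomly split into training, test and validation sets at a ratio of 7:2:1. For convenience, first create a folder named images containing two subfolders, JPEGImages and Annotations, which hold all the images and their annotation files respectively. The main functions for splitting the dataset are: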

import os
import shutil
import random

def _copy_file(src_file, dst_file):
    """copy file.
    """
    if not os.path.isfile(src_file):
        print("%s not exist!" % src_file)
    else:
        fpath, fname = os.path.split(dst_file)
        if not os.path.exists(fpath):
            os.makedirs(fpath)
        shutil.copyfile(src_file, dst_file)

def split_data(data_dir, train_dir, test_dir, valid_dir, ratio=[0.7, 0.2, 0.1], shuffle=True):
    """ split data to train data, test data, valid data.
    Args:
        data_dir -- data dir to be split.
        train_dir, test_dir, valid_dir -- splitted dir.
        ratio -- [train_ratio, test_ratio, valid_ratio].
        shuffle -- shuffle or not.
    """
    all_img_dir = os.path.join(data_dir, "JPEGImages/")
    all_xml_dir = os.path.join(data_dir, "Annotations/")
    train_img_dir = os.path.join(train_dir, "JPEGImages/")
    train_xml_dir = os.path.join(train_dir, "Annotations/")
    test_img_dir = os.path.join(test_dir, "JPEGImages/")
    test_xml_dir = os.path.join(test_dir, "Annotations/")
    valid_img_dir = os.path.join(valid_dir, "JPEGImages/")
    valid_xml_dir = os.path.join(valid_dir, "Annotations/")

    all_imgs_name = os.listdir(all_img_dir)
    img_num = len(all_imgs_name)
    train_num = int(1.0*img_num*ratio[0]/sum(ratio))
    test_num = int(1.0*img_num*ratio[1]/sum(ratio))
    valid_num = img_num-train_num-test_num

    if shuffle:
        random.shuffle(all_imgs_name)
    train_imgs_name = all_imgs_name[:train_num]
    test_imgs_name = all_imgs_name[train_num:train_num+test_num]
    # take the remainder explicitly; a [-0:] slice would return the whole list
    valid_imgs_name = all_imgs_name[train_num+test_num:]

    for img_name in train_imgs_name:
        img_srcfile = os.path.join(all_img_dir, img_name)
        xml_srcfile = os.path.join(all_xml_dir, img_name.split(".")[0]+".xml")
        xml_name = img_name.split(".")[0] + ".xml"

        img_dstfile = os.path.join(train_img_dir, img_name)
        xml_dstfile = os.path.join(train_xml_dir, xml_name)
        _copy_file(img_srcfile, img_dstfile)
        _copy_file(xml_srcfile, xml_dstfile)

    for img_name in test_imgs_name:
        img_srcfile = os.path.join(all_img_dir, img_name)
        xml_srcfile = os.path.join(all_xml_dir, img_name.split(".")[0]+".xml")
        xml_name = img_name.split(".")[0] + ".xml"

        img_dstfile = os.path.join(test_img_dir, img_name)
        xml_dstfile = os.path.join(test_xml_dir, xml_name)
        _copy_file(img_srcfile, img_dstfile)
        _copy_file(xml_srcfile, xml_dstfile)

    for img_name in valid_imgs_name:
        img_srcfile = os.path.join(all_img_dir, img_name)
        xml_srcfile = os.path.join(all_xml_dir, img_name.split(".")[0]+".xml")
        xml_name = img_name.split(".")[0] + ".xml"

        img_dstfile = os.path.join(valid_img_dir, img_name)
        xml_dstfile = os.path.join(valid_xml_dir, xml_name)
        _copy_file(img_srcfile, img_dstfile)
        _copy_file(xml_srcfile, xml_dstfile)

Running the code creates the training, test and validation folders at the specified locations, each with JPEGImages and Annotations subfolders holding the results.
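
As a quick check of the resulting set sizes, the sketch below repeats only the counting and slicing logic on a list of made-up file names. It is a simplified stand-in, not the project code: split_names is a hypothetical helper, and it uses an integer ratio (7, 2, 1) and a fixed seed so that the result is reproducible.

```python
import random

def split_names(names, ratio=(7, 2, 1), seed=0):
    """Shuffle names and split them into train/test/valid lists by integer ratio."""
    names = list(names)
    random.Random(seed).shuffle(names)
    total = sum(ratio)
    train_num = len(names) * ratio[0] // total
    test_num = len(names) * ratio[1] // total
    train = names[:train_num]
    test = names[train_num:train_num + test_num]
    valid = names[train_num + test_num:]  # remainder, safe even when empty
    return train, test, valid

# 1465 placeholder file names, matching the number of photos in the post.
names = ["img_%d.jpg" % i for i in range(1465)]
train, test, valid = split_names(names)
print(len(train), len(test), len(valid))  # 1025 293 147
```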

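The dataset for object detection is now ready. Next comes the overall framework of the detection model.

Detection framework

The detection idea used here is very direct. Overall it consists of candidate region extraction, HOG feature extraction and SVM classification.

Candidate region extraction

In theory the whole image could be scanned with sliding windows of different sizes, but that is computationally expensive and the window size is hard to choose. Since the traffic signs to detect have fairly regular geometric shapes and colours, shape (parallelogram, ellipse) and colour (red, blue and so on) can be used for a first filtering step that reduces computation and improves efficiency. Only colour information is used here as an example.

The six sign classes to detect are mainly red and blue (or a combination of both), and different lighting conditions can change the colours considerably. So, given an image, the blue and red regions are first segmented by colour thresholds in HSV space to obtain a binary image, and contours (convex hulls) are then detected on the binary image, which OpenCV supports.

[Figure: example of a binarized image]

As the example shows, after binarization the outlines of all 3 signs in the image (2 of which are signs we need to detect and recognize) are preserved. Some problems remain, however: (1) there is a lot of background noise, which produces more contours and so hurts detection speed and precision; (2) the three signs are very close together, which may cause only one contour to be detected. I considered using erosion and dilation to filter out part of the noise, but experiments showed that this leads to more missed detections, because the outline of some signs (the no-honking sign in particular) is likely to be destroyed during erosion and dilation, and the sign is then missed at the contour detection stage. The final tests therefore did not use erosion and dilation. The thresholding and contour detection functions are: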

import cv2
import numpy as np

def preprocess_img(imgBGR, erode_dilate=True):
    """preprocess the image for contour detection.
    Args:
        imgBGR: source image.
        erode_dilate: erode and dilate or not.
    Return:
        img_bin: a binary image (blue and red).

    """
    rows, cols, _ = imgBGR.shape
    imgHSV = cv2.cvtColor(imgBGR, cv2.COLOR_BGR2HSV)

    Bmin = np.array([100, 43, 46])
    Bmax = np.array([124, 255, 255])
    img_Bbin = cv2.inRange(imgHSV,Bmin, Bmax)

    Rmin1 = np.array([0, 43, 46])
    Rmax1 = np.array([10, 255, 255])
    img_Rbin1 = cv2.inRange(imgHSV,Rmin1, Rmax1)

    Rmin2 = np.array([156, 43, 46])
    Rmax2 = np.array([180, 255, 255])
    img_Rbin2 = cv2.inRange(imgHSV,Rmin2, Rmax2)
    img_Rbin = np.maximum(img_Rbin1, img_Rbin2)
    img_bin = np.maximum(img_Bbin, img_Rbin)

    if erode_dilate is True:
        kernelErosion = np.ones((3,3), np.uint8)
        kernelDilation = np.ones((3,3), np.uint8) 
        img_bin = cv2.erode(img_bin, kernelErosion, iterations=2)
        img_bin = cv2.dilate(img_bin, kernelDilation, iterations=2)

    return img_bin

def contour_detect(img_bin, min_area=0, max_area=-1, wh_ratio=2.0):
    """detect contours in a binary image.
    Args:
        img_bin: a binary image.
        min_area: the minimum area of the contours detected.
            (default: 0)
        max_area: the maximum area of the contours detected.
            (default: -1, no maximum area limitation)
        wh_ratio: the ratio between the long edge and the short edge.
            (default: 2.0)
    Return:
        rects: a list of rects enclosing the contours. if no contour is detected, rects=[]
    """
    rects = []
    # findContours returns 3 values in OpenCV 3 and 2 in OpenCV 4; contours is always second to last
    contours = cv2.findContours(img_bin.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]
    if len(contours) == 0:
        return rects

    max_area = img_bin.shape[0]*img_bin.shape[1] if max_area<0 else max_area
    for contour in contours:
        area = cv2.contourArea(contour)
        if area >= min_area and area <= max_area:
            x, y, w, h = cv2.boundingRect(contour)
            if 1.0*w/h < wh_ratio and 1.0*h/w < wh_ratio:
                rects.append([x,y,w,h])
    return rects
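
The colour ranges in preprocess_img can be tried on single pixels without OpenCV. The sketch below is an approximation under stated assumptions: to_opencv_hsv mimics OpenCV's 8-bit HSV convention (H in 0..179, S and V in 0..255) using the standard-library colorsys module, so the rounding may differ slightly from cv2.cvtColor, and all helper names are hypothetical. It also shows why red needs two ranges: the hue axis wraps around, so red sits at both ends.

```python
import colorsys

# Same ranges as preprocess_img: (H, S, V) lower and upper bounds.
BLUE = ((100, 43, 46), (124, 255, 255))
RED_LOW = ((0, 43, 46), (10, 255, 255))
RED_HIGH = ((156, 43, 46), (180, 255, 255))

def to_opencv_hsv(r, g, b):
    """Approximate OpenCV's 8-bit HSV: H in [0, 180), S and V in [0, 255]."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return int(round(h * 180)) % 180, int(round(s * 255)), int(round(v * 255))

def in_range(hsv, bounds):
    """True when every channel lies inside its [lower, upper] bound."""
    lower, upper = bounds
    return all(lo <= x <= hi for x, lo, hi in zip(hsv, lower, upper))

def is_sign_colour(r, g, b):
    """True when the RGB pixel falls in the blue range or either red range."""
    hsv = to_opencv_hsv(r, g, b)
    return any(in_range(hsv, rng) for rng in (BLUE, RED_LOW, RED_HIGH))

print(is_sign_colour(255, 0, 0))      # pure red -> True
print(is_sign_colour(0, 0, 255))      # pure blue -> True
print(is_sign_colour(128, 128, 128))  # grey, saturation 0 -> False
print(is_sign_colour(0, 255, 0))      # green, hue 60 -> False
```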

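As contour_detect shows, limits on the contour area and on the aspect ratio of the bounding rectangle are applied to improve the quality of the candidate boxes. Note that the minimum area must not be set too large, or small traffic signs in the image will be missed. The aspect-ratio limit should not be too strict either, because the viewing angle in real images can make the bounding rectangle of a sign fairly elongated. In my code the maximum aspect ratio is 2.5 (the wh_ratio argument, whose default in the listing above is 2.0).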
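
To get a feeling for how small a sign can be before the area filter drops it, here is a small worked calculation. It assumes the 640×640 images from the preprocessing step and the min_area formula used in the detection script later in this post (image area divided by 25×25); the square and circle are idealized shapes, so the numbers are only rough guides.

```python
import math

IMG_SIZE = 640  # images were resized to 640x640 during preprocessing
min_area = float(IMG_SIZE * IMG_SIZE) / (25 * 25)

# Side of a square and diameter of a circle whose area just reaches min_area.
min_square_side = math.sqrt(min_area)
min_circle_diameter = math.sqrt(4 * min_area / math.pi)

print(round(min_area, 2), round(min_square_side, 1), round(min_circle_diameter, 1))
# 655.36 25.6 28.9
```

So with this setting a round sign needs a diameter of roughly 29 pixels in the 640×640 image to survive the filter.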

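The candidate regions are now selected, but one more thing needs attention: the candidate boxes differ in size, while the SVM that follows needs feature vectors of fixed length. So before HOG feature extraction, all candidate regions should be brought to a fixed size (64×64 in my code). There are two options: (1) simply resize the candidate region to the target size. This is easy but distorts the object inside the region, which makes recognition harder for the SVM (with a convolutional neural network this matters less, since such networks learn to cope with distortion well); (2) extract a square candidate region and then resize it to the target size. That is, for an (H×W) candidate box, crop a square whose side is the longer of H and W around the centre of the box, clipped at the image border, which is what the detection code later in this post does.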
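
The second option can be sketched with plain integer arithmetic. The function below follows the same steps as the detection script later in this post (centre of the box, side equal to the longer edge, clipping at the image border); square_box is a hypothetical name and the boxes are made-up examples.

```python
def square_box(x, y, w, h, rows, cols):
    """Return (x1, y1, x2, y2) of a square around a w x h box, clipped to the image."""
    xc = x + w // 2
    yc = y + h // 2
    size = max(w, h)
    x1 = max(0, xc - size // 2)
    y1 = max(0, yc - size // 2)
    x2 = min(cols, xc + size // 2)
    y2 = min(rows, yc + size // 2)
    return x1, y1, x2, y2

# A 40x80 (w x h) box well inside a 640x640 image becomes an 80x80 square.
print(square_box(300, 200, 40, 80, 640, 640))  # (280, 200, 360, 280)
# Near the left border the square is clipped and is no longer exactly square.
print(square_box(0, 200, 40, 80, 640, 640))    # (0, 200, 60, 280)
```

The clipped case is the "border" situation in which the later resize to 64×64 still introduces some distortion.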

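HOG feature extraction

HOG stands for Histogram of Oriented Gradients. I will not go into it here; the principle is explained in detail in another post of mine, "Histogram of Oriented Gradients (HOG)". The implementation uses the feature module of the skimage library: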

import os
import cv2
from skimage import feature as ft

def hog_feature(img_array, resize=(64,64)):
    """extract hog feature from an image.
    Args:
        img_array: an image array.
        resize: size the image is resized to before extraction.
    Return:
    features:  a ndarray vector.      
    """
    img = cv2.cvtColor(img_array, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(img, resize)
    bins = 9
    cell_size = (8, 8)
    cpb = (2, 2)
    norm = "L2"
    features = ft.hog(img, orientations=bins, pixels_per_cell=cell_size, 
                        cells_per_block=cpb, block_norm=norm, transform_sqrt=True)
    return features

def extra_hog_features_dir(img_dir, write_txt, resize=(64,64)):
    """extract hog features from images in a directory.
    Args:
        img_dir: image directory.
        write_txt: the path of a txt file used for saving the hog features of all images.
        resize: size the image is resized to before extraction.
    Return:
        none.
    """
    img_names = os.listdir(img_dir)
    img_names = [os.path.join(img_dir, img_name) for img_name in img_names]
    if os.path.exists(write_txt):
        os.remove(write_txt)

    with open(write_txt, "a") as f:
        index = 0
        for img_name in img_names:
            img_array = cv2.imread(img_name)
            features = hog_feature(img_array, resize)
            label_name = os.path.basename(img_name).split("_")[0]
            label_num = img_label[label_name]

            row_data = img_name + "\t" + str(label_num) + "\t"

            for element in features:
                row_data = row_data + str(round(element,3)) + " "
            row_data = row_data + "\n"
            f.write(row_data)

            if index%100 == 0:
                print("total image number = %d, current image number = %d" % (len(img_names), index))
            index += 1

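The HOG parameters can be read off the function: the image size is 64×64, gradients are binned into 9 orientations (bins=9), each cell is 8×8 pixels, each block contains 4 cells (cpb=(2, 2)), and blocks are normalized with the L2 norm (norm="L2").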
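
These parameters fix the length of the feature vector that the SVM receives. The short calculation below assumes the usual dense HOG layout, in which blocks slide over the cell grid one cell at a time; hog_length is a hypothetical helper used only for this arithmetic.

```python
def hog_length(img_size=(64, 64), cell=(8, 8), cpb=(2, 2), bins=9):
    """Length of a HOG descriptor with dense blocks that move one cell at a time."""
    cells_y = img_size[0] // cell[0]
    cells_x = img_size[1] // cell[1]
    blocks_y = cells_y - cpb[0] + 1
    blocks_x = cells_x - cpb[1] + 1
    return blocks_y * blocks_x * cpb[0] * cpb[1] * bins

print(hog_length())  # 7 * 7 blocks * 4 cells * 9 bins = 1764
```

Every candidate region, whatever its original size, is therefore described by 1764 numbers after resizing to 64×64.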

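SVM classifier

A very good tutorial on support vector machines is available online, "A Plain Introduction to Support Vector Machines (Understanding the Three Levels of SVM)", and I recommend reading it. Here the SVM is used to classify the HOG features extracted from the candidate regions. The following parts cover building and augmenting the dataset for the SVM classifier, and training and testing the model.

Building the dataset

This dataset differs from the detection dataset described at the beginning: here a classification dataset is needed. Since the detection data already exists, the classification data can be generated from it directly. The method is similar to the candidate region extraction described above. The overall idea is to crop the object regions from the detection dataset as positive samples for the SVM, and to crop other regions (containing no object) as negative samples. In detail:

(1) For images that contain objects, crop a square region directly from the label information (with the longer side as the side length; a few cases at the image border end up distorted), and discard poor samples (very small regions). The positives cropped this way include some background, which helps make the model robust to noise and also makes data augmentation (such as affine transformation) possible when samples are scarce.

(2) For images that contain no object, extract regions by colour thresholding (red and blue) and contour detection, crop square regions (with the longer side as the side length), and discard small regions. Compared with random cropping this is more targeted: candidate box extraction at detection time finds many regions whose colour resembles traffic signs, so using such regions as negatives helps training considerably.

The functions I used to create the positive and negative samples are:

Parsing the image annotations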

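import xml.etree.ElementTree as ET

# DATA_PATH (dataset root) and classes_num (class name -> class id) are
# module-level globals that this function expects to be defined elsewhere.
def parse_xml(xml_file):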
    """parse xml_file
    Args:
        xml_file: the input xml file path
    Returns:
        image_path: string
        labels: list of [xmin, ymin, xmax, ymax, class]
    """
    tree = ET.parse(xml_file)
    root = tree.getroot()
    image_path = ''
    labels = []

    for item in root:
        if item.tag == 'filename':
            image_path = os.path.join(DATA_PATH, "JPEGImages/", item.text)
        elif item.tag == 'object':
            obj_name = item[0].text
            obj_num = classes_num[obj_name]
            xmin = int(item[4][0].text)
            ymin = int(item[4][1].text)
            xmax = int(item[4][2].text)
            ymax = int(item[4][3].text)
            labels.append([xmin, ymin, xmax, ymax, obj_num])
    return image_path, labels

Extracting positive and negative samples

def produce_pos_proposals(img_path, write_dir, labels, min_size, square=False, proposal_num=0, ):
    """produce positive proposals based on labels.
    Args:
        img_path: image path.
        write_dir: write directory.
        min_size: the minimum size of the proposals.
        labels: a list of bounding boxes.
            [[x1, y1, x2, y2, cls_num], [x1, y1, x2, y2, cls_num], ...]
        square:  crop a square or not.
    Return:
        proposal_num: proposal numbers.
    """
    img = cv2.imread(img_path)
    rows = img.shape[0]
    cols = img.shape[1]
    for label in labels:
        xmin, ymin, xmax, ymax, cls_num = np.int32(label)
        # remove the proposal with small area
        if xmax-xmin<min_size or ymax-ymin<min_size:
            continue
        # crop a square area
        if square is True:
            xcenter = (xmin + xmax)//2
            ycenter = (ymin + ymax)//2
            size = max(xmax-xmin, ymax-ymin)
            # integer division keeps the slice indices integral under Python 3
            xmin = max(xcenter-size//2, 0)
            xmax = min(xcenter+size//2,cols)
            ymin = max(ycenter-size//2, 0)
            ymax = min(ycenter+size//2,rows)
            proposal = img[ymin:ymax, xmin:xmax]
            proposal = cv2.resize(proposal, (int(size),int(size)))
        else:
            proposal = img[ymin:ymax, xmin:xmax]

        cls_name = classes_name[cls_num]
        proposal_num[cls_name] +=1
        write_name = cls_name + "_" + str(proposal_num[cls_name]) + ".jpg"
        cv2.imwrite(os.path.join(write_dir,write_name), proposal)
    return proposal_num

def produce_neg_proposals(img_path, write_dir, min_size, square=False, proposal_num=0):
    """produce negative proposals from a negative image.
    Args:
        img_path: image path.
        write_dir: write directory.
        min_size: the minimum size of the proposals.
        square:  crop a square or not.
        proposal_num: current negative proposal numbers.
    Return:
        proposal_num: negative proposal numbers.
    """
    img = cv2.imread(img_path)
    rows = img.shape[0]
    cols = img.shape[1]
    imgHSV = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    imgBinBlue = cv2.inRange(imgHSV,np.array([100,43,46]), np.array([124,255,255]))
    imgBinRed1 = cv2.inRange(imgHSV,np.array([0,43,46]), np.array([10,255,255]))
    imgBinRed2 = cv2.inRange(imgHSV,np.array([156,43,46]), np.array([180,255,255]))
    imgBinRed = np.maximum(imgBinRed1, imgBinRed2)
    imgBin = np.maximum(imgBinRed, imgBinBlue)

    # findContours returns 3 values in OpenCV 3 and 2 in OpenCV 4; contours is always second to last
    contours = cv2.findContours(imgBin, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]
    for contour in contours:
        x,y,w,h = cv2.boundingRect(contour)
        if w<min_size or h<min_size:
            continue

        if square is True:
            xcenter = x + w//2
            ycenter = y + h//2
            size = max(w,h)
            # integer division keeps the slice indices integral under Python 3
            xmin = max(xcenter-size//2, 0)
            xmax = min(xcenter+size//2,cols)
            ymin = max(ycenter-size//2, 0)
            ymax = min(ycenter+size//2,rows)
            proposal = img[ymin:ymax, xmin:xmax]
            proposal = cv2.resize(proposal, (size,size))

        else:
            proposal = img[y:y+h, x:x+w]
        write_name = "background" + "_" + str(proposal_num) + ".jpg"
        proposal_num += 1
        cv2.imwrite(os.path.join(write_dir,write_name), proposal)
    return proposal_num

def produce_proposals(xml_dir, write_dir, square=False, min_size=30):
    """produce proposals (positive examples for classification) to disk.
    Args:
        xml_dir: image xml file directory.
        write_dir: write directory of all proposals.
        square: crop a square or not.
        min_size: the minimum size of the proposals.
    Returns:
        proposal_num: a dict of proposal numbers.
    """

    proposal_num = {}
    for cls_name in classes_name:
        proposal_num[cls_name] = 0

    index = 0
    for xml_file in os.listdir(xml_dir):
        img_path, labels = parse_xml(os.path.join(xml_dir,xml_file))
        img = cv2.imread(img_path)
        rows = img.shape[0]
        cols = img.shape[1]

        if len(labels) == 0:
            neg_proposal_num = produce_neg_proposals(img_path, write_dir, min_size, square, proposal_num["background"])
            proposal_num["background"] = neg_proposal_num
        else:
            proposal_num = produce_pos_proposals(img_path, write_dir, labels, min_size, square=True, proposal_num=proposal_num)

        if index%100 == 0:
            print("total xml file number = %d, current xml file number = %d" % (len(os.listdir(xml_dir)), index))
            print("proposal num = %s" % (proposal_num,))
        index += 1

    return proposal_num

The return value proposal_num counts the extracted samples. The numbers of samples I obtained from the training set are:

proposal_num = {'right': 117, 'straight': 334, 'stop': 224, 'no hook': 168, 'crosswalk': 128, 'left': 208, 'background': 1116}

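[Figure: some of the cropped positive and negative samples]

In the figure, the first rows are the 6 positive classes and the last row is background. The background crops found by the code are mainly regions whose colour (blue and red) resembles the traffic signs. The same method is used to extract positive and negative samples from the validation set for tuning and evaluating the SVM, so it is not repeated here.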

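Training data augmentation

The per-class counts above show that each sign class has far fewer samples than the background (negative) class. To balance the data, the positive samples were augmented. Because the data contains direction-dependent signs such as turn left and turn right, rotation or mirroring would cause problems (small rotations are fine, of course). I also considered brightness changes, but the normalization built into HOG makes the features insensitive to illumination. In the end I chose affine transformation, which is easy to do with OpenCV; the theory and sample code are covered in the Affine Transformations tutorial of the official OpenCV documentation. My functions for applying affine transformations to the dataset are: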

def affine(img, delta_pix):
    """affine transformation
    Args:
        img: a numpy image array.
        delta_pix: the offset for affine.
    Return:
        res: affined image. 
    """
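    rows, cols, _ = img.shape
    # points are (x, y) pairs, so the reference corners are (0,0), (cols,0) and (0,rows)
    pts1 = np.float32([[0,0], [cols,0], [0, rows]])
    pts2 = pts1 + delta_pix
    M = cv2.getAffineTransform(pts1, pts2)
    # warpAffine expects the output size as (width, height)
    res = cv2.warpAffine(img, M, (cols, rows))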
    return res


def affine_dir(img_dir, write_dir, max_delta_pix):
    """ affine transformation on the images in a directory.
    Args:
        img_dir: image directory.
        write_dir: save directory of affined images.
        max_delta_pix: the maximum offset for affine.
    """
    img_names = os.listdir(img_dir)
    img_names = [img_name for img_name in img_names if img_name.split(".")[-1]=="jpg"]
    for index, img_name in enumerate(img_names):
        img = cv2.imread(os.path.join(img_dir,img_name))
        save_name = os.path.join(write_dir, img_name.split(".")[0]+"f.jpg")
        delta_pix = np.float32(np.random.randint(-max_delta_pix, max_delta_pix+1, [3,2]))
        img_a = affine(img, delta_pix)
        cv2.imwrite(save_name, img_a)

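The max_delta_pix argument (a positive integer) controls the maximum strength of the random affine transformation: the larger it is, the more visible the transformation (too large a value can destroy the object entirely). I used 10 for augmentation. Note that 10 is only the maximum strength. Before each image is transformed, a 3×2 array of random integers delta_pix is drawn from [-max_delta_pix, max_delta_pix] (you can of course draw several times to generate more transformed images), and these offsets set the strength of the transformation for that image.

[Figure: examples of transformed images]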
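
What the three corner offsets do can be made explicit by writing out the 2×3 matrix by hand. The sketch below is a pure-Python illustration of the same kind of matrix that cv2.getAffineTransform computes from three point pairs; affine_from_offsets and apply_affine are hypothetical helpers, and the offsets are arbitrary example values within the ±10 range mentioned above.

```python
def affine_from_offsets(width, height, delta):
    """2x3 affine matrix that moves three image corners by the given offsets.

    delta holds (dx, dy) for the corners (0, 0), (width, 0) and (0, height).
    """
    (d0x, d0y), (d1x, d1y), (d2x, d2y) = delta
    a = (width + d1x - d0x) / float(width)    # change of x' per unit x
    c = (d1y - d0y) / float(width)            # change of y' per unit x
    b = (d2x - d0x) / float(height)           # change of x' per unit y
    d = (height + d2y - d0y) / float(height)  # change of y' per unit y
    return [[a, b, d0x], [c, d, d0y]]

def apply_affine(M, x, y):
    """Map the point (x, y) through the 2x3 matrix M."""
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])

# Example offsets for a 64x64 patch.
M = affine_from_offsets(64, 64, [(3, -2), (-5, 4), (10, 0)])
print(apply_affine(M, 0, 0), apply_affine(M, 64, 0), apply_affine(M, 0, 64))
# (3.0, -2.0) (59.0, 4.0) (10.0, 64.0)
```

Each corner lands exactly at its original position plus its offset, and every other pixel is moved consistently with those three corners. With all offsets equal the transformation is a pure translation, and with all offsets zero it is the identity.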

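Model training and testing

For training I call the svm module of sklearn directly and leave most parameters at their defaults. During training I found that the penalty factor C has a large effect, so I took a shortcut and set it to a rough value. (Hyperparameters can be tuned on the validation set built earlier; I will not go into that here.) The functions used are: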

import numpy as np
import joblib
from sklearn.svm import SVC

def load_hog_data(hog_txt):
    """ load hog features.
    Args:
        hog_txt: a txt file used to save hog features.
            one line data is formated as "img_path \t cls_num \t hog_feature_vector"
    Return:
        img_names: a list of image names.
        labels: numpy array labels (1-dim).
        hog_feature: numpy array hog features.
            formated as [[hog1], [hog2], ...]
    """
    img_names = []
    labels = []
    hog_features = []
    with open(hog_txt, "r") as f:
        data = f.readlines()
        for row_data in data:
            row_data = row_data.rstrip()
            img_path, label, hog_str = row_data.split("\t")
            img_name = img_path.split("/")[-1]
            hog_feature = hog_str.split(" ")
            hog_feature = [float(hog) for hog in hog_feature]
            #print "hog feature length = ", len(hog_feature)
            img_names.append(img_name)
            labels.append(int(label))
            hog_features.append(hog_feature)
    return img_names, np.array(labels), np.array(hog_features)



def svm_train(hog_features, labels, save_path="./svm_model.pkl"):
    """ SVM train
    Args:
        hog_feature: numpy array hog features.
            formated as [[hog1], [hog2], ...]
        labels: numpy array labels (1-dim).
        save_path: model save path.
    Return:
        none.
    """
    clf = SVC(C=10, tol=1e-3, probability = True)
    clf.fit(hog_features, labels)
    joblib.dump(clf, save_path)
    print("finished.")

def svm_test(svm_model, hog_feature, labels):
    """SVM test
    Args:
        hog_feature: numpy array hog features.
            formated as [[hog1], [hog2], ...]
        labels: numpy array labels (1-dim).
    Return:
        accuracy: test accuracy.
    """
    clf = joblib.load(svm_model)
    accuracy = clf.score(hog_feature, labels)
    return accuracy

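Finally, I trained on 3474 training samples (the positives were doubled by augmentation, the negatives were not augmented). With C=10 and the other parameters at their defaults, the accuracy on the validation set (322 samples) is 97.2%, which means 9 images are misclassified; this is acceptable.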
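
These numbers can be reproduced from the sample counts reported earlier. The snippet below only redoes the arithmetic; it reuses the proposal_num dictionary printed in the dataset section and the 97.2% accuracy and 322 validation samples quoted above.

```python
proposal_num = {'right': 117, 'straight': 334, 'stop': 224, 'no hook': 168,
                'crosswalk': 128, 'left': 208, 'background': 1116}

# Positives are doubled by augmentation, negatives are kept as they are.
positives = sum(n for name, n in proposal_num.items() if name != 'background')
train_total = 2 * positives + proposal_num['background']

valid_total = 322
valid_correct = int(round(0.972 * valid_total))

print(positives, train_total, valid_total - valid_correct)  # 1179 3474 9
```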

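Detection results

To recap, we can now extract candidate regions and classify them, which means a complete image can be run through detection. Here are my detection code and example results.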

import os
import numpy as np 
import cv2
from skimage import feature as ft 
import joblib  # sklearn.externals.joblib has been removed from recent scikit-learn releases

cls_names = ["straight", "left", "right", "stop", "nohonk", "crosswalk", "background"]
img_label = {"straight": 0, "left": 1, "right": 2, "stop": 3, "nohonk": 4, "crosswalk": 5, "background": 6}

def preprocess_img(imgBGR, erode_dilate=True):
    """preprocess the image for contour detection.
    Args:
        imgBGR: source image.
        erode_dilate: erode and dilate or not.
    Return:
        img_bin: a binary image (blue and red).

    """
    rows, cols, _ = imgBGR.shape
    imgHSV = cv2.cvtColor(imgBGR, cv2.COLOR_BGR2HSV)

    Bmin = np.array([100, 43, 46])
    Bmax = np.array([124, 255, 255])
    img_Bbin = cv2.inRange(imgHSV,Bmin, Bmax)

    Rmin1 = np.array([0, 43, 46])
    Rmax1 = np.array([10, 255, 255])
    img_Rbin1 = cv2.inRange(imgHSV,Rmin1, Rmax1)

    Rmin2 = np.array([156, 43, 46])
    Rmax2 = np.array([180, 255, 255])
    img_Rbin2 = cv2.inRange(imgHSV,Rmin2, Rmax2)
    img_Rbin = np.maximum(img_Rbin1, img_Rbin2)
    img_bin = np.maximum(img_Bbin, img_Rbin)

    if erode_dilate is True:
        kernelErosion = np.ones((9,9), np.uint8)
        kernelDilation = np.ones((9,9), np.uint8) 
        img_bin = cv2.erode(img_bin, kernelErosion, iterations=2)
        img_bin = cv2.dilate(img_bin, kernelDilation, iterations=2)

    return img_bin


def contour_detect(img_bin, min_area=0, max_area=-1, wh_ratio=2.0):
    """detect contours in a binary image.
    Args:
        img_bin: a binary image.
        min_area: the minimum area of the contours detected.
            (default: 0)
        max_area: the maximum area of the contours detected.
            (default: -1, no maximum area limitation)
        wh_ratio: the ratio between the long edge and the short edge.
            (default: 2.0)
    Return:
        rects: a list of rects enclosing the contours. if no contour is detected, rects=[]
    """
    rects = []
    # findContours returns 3 values in OpenCV 3 and 2 in OpenCV 4; contours is always second to last
    contours = cv2.findContours(img_bin.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]
    if len(contours) == 0:
        return rects

    max_area = img_bin.shape[0]*img_bin.shape[1] if max_area<0 else max_area
    for contour in contours:
        area = cv2.contourArea(contour)
        if area >= min_area and area <= max_area:
            x, y, w, h = cv2.boundingRect(contour)
            if 1.0*w/h < wh_ratio and 1.0*h/w < wh_ratio:
                rects.append([x,y,w,h])
    return rects


def draw_rects_on_img(img, rects):
    """ draw rects on an image.
    Args:
        img: an image where the rects are drawn on.
        rects: a list of rects.
    Return:
        img_rects: an image with rects.
    """
    img_copy = img.copy()
    for rect in rects:
        x, y, w, h = rect
        cv2.rectangle(img_copy, (x,y), (x+w,y+h), (0,255,0), 2)
    return img_copy


def hog_extra_and_svm_class(proposal, clf, resize = (64, 64)):
    """classify the region proposal.
    Args:
        proposal: region proposal (numpy array).
        clf: a SVM model.
        resize: resize the region proposal
            (default: (64, 64))
    Return:
        cls_prop: propabality of all classes.
    """
    img = cv2.cvtColor(proposal, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(img, resize)
    bins = 9
    cell_size = (8, 8)
    cpb = (2, 2)
    norm = "L2"
    features = ft.hog(img, orientations=bins, pixels_per_cell=cell_size, 
                        cells_per_block=cpb, block_norm=norm, transform_sqrt=True)
    print("feature shape = %s" % (features.shape,))
    features = np.reshape(features, (1,-1))
    cls_prop = clf.predict_proba(features)
    print("cls prop = %s" % (cls_prop,))
    return cls_prop


if __name__ == "__main__":
    img = cv2.imread("/home/meringue/Documents/traffic_sign_detection/svm_hog_classification/sign_89.jpg")
    rows, cols, _ = img.shape
    img_bin = preprocess_img(img,False)
    cv2.imshow("bin image", img_bin)
    cv2.imwrite("bin_image.jpg", img_bin)
    min_area = img_bin.shape[0]*img.shape[1]/(25*25)
    rects = contour_detect(img_bin, min_area=min_area)
    img_rects = draw_rects_on_img(img, rects)
    cv2.imshow("image with rects", img_rects)
    cv2.imwrite("image_rects.jpg", img_rects)

    clf = joblib.load("./svm_model.pkl")

    img_bbx = img.copy()

    for rect in rects:
        xc = int(rect[0] + rect[2]/2)
        yc = int(rect[1] + rect[3]/2)

        size = max(rect[2], rect[3])
        x1 = max(0, int(xc-size/2))
        y1 = max(0, int(yc-size/2))
        x2 = min(cols, int(xc+size/2))
        y2 = min(rows, int(yc+size/2))
        proposal = img[y1:y2, x1:x2]
        cls_prop = hog_extra_and_svm_class(proposal, clf)
        cls_prop = np.round(cls_prop, 2)[0]
        cls_num = np.argmax(cls_prop)
        cls_name = cls_names[cls_num]
        prop = cls_prop[cls_num]
        # compare strings by value; "is not" tests object identity
        if cls_name != "background":
            cv2.rectangle(img_bbx,(rect[0],rect[1]), (rect[0]+rect[2],rect[1]+rect[3]), (0,0,255), 2)
            cv2.putText(img_bbx, cls_name+str(prop), (rect[0], rect[1]), 1, 1.5, (0,0,255),2)

    cv2.imshow("detect result", img_bbx)
    cv2.imwrite("detect_result.jpg", img_bbx)
    cv2.waitKey(0)

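[Figure, from left to right: the thresholded image, the extracted candidate boxes, and the final detection result (class name + confidence)]

The final precision and recall for each sign class (with an IoU threshold of 0.5) are listed below (the evaluation code is in my GitHub repository, so it is not included in this post):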

Sign	straight	left	right	no-honk	crosswalk	stop
Precision	41.6%	45.8%	43.5%	45.3%	75.6%	45.7%
Recall	37.1%	39.8%	43.5%	48.3%	50.8%	57.1%
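
For readers unfamiliar with the metric, here is a minimal sketch of how precision and recall at an IoU threshold of 0.5 can be computed for one class. It is not the evaluation code from the repository: the greedy one-to-one matching is an assumption, iou and precision_recall are hypothetical helpers, and the boxes are toy values, not data from the test set.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / float(union) if union > 0 else 0.0

def precision_recall(detections, ground_truths, iou_thresh=0.5):
    """Greedy one-to-one matching of same-class boxes; returns (precision, recall)."""
    matched = set()
    true_pos = 0
    for det in detections:
        for i, gt in enumerate(ground_truths):
            if i not in matched and iou(det, gt) >= iou_thresh:
                matched.add(i)
                true_pos += 1
                break
    precision = true_pos / float(len(detections)) if detections else 0.0
    recall = true_pos / float(len(ground_truths)) if ground_truths else 0.0
    return precision, recall

# Toy boxes, not taken from the dataset.
gts = [(100, 100, 200, 200), (300, 300, 360, 360)]
dets = [(110, 110, 210, 210), (400, 400, 450, 450)]
print(round(iou(dets[0], gts[0]), 3), precision_recall(dets, gts))
# 0.681 (0.5, 0.5)
```

In this toy case one detection overlaps a labelled sign well enough to count, one detection is a false alarm, and one labelled sign is missed, so both precision and recall are 50%.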

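[Video: example of real-time detection in a video]

Setting thresholds of 0.1, 0.2, ..., 0.9 in turn on the probability output by the SVM gives the following trend in average precision and recall:

[Figure pre_rec: average precision and recall versus the confidence threshold]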

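The numbers show that the overall detection results are far from satisfactory. The precision and recall curves show that as the confidence threshold increases, average precision rises steadily while recall stays fairly flat (dropping slightly once the threshold exceeds 0.7). A closer look at the detected images shows that candidate region extraction is the bottleneck of the detection model, in two ways:

(1) The candidate regions of many signs are missed (see Bad Cases Analysis), which directly leads to the low final recall.
(2) Some candidate regions that contain a sign are found but include a lot of noise, for example when similar-coloured background makes the sign only a small part of the region, or when several adjacent signs are boxed together. This directly affects classification and lowers precision.

Raising the confidence threshold removes many false detections while leaving the misses almost unchanged (candidate region extraction does not depend on the confidence threshold), so precision improves noticeably.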
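
This behaviour can be reproduced with a toy example. Everything in the sketch below is invented for illustration: the ten (confidence, is_correct) pairs are made-up detections, the count of 10 labelled signs is an assumption, and sweep is a hypothetical helper. Four of the ten signs never appear among the detections, standing in for signs whose candidate region was never proposed.

```python
def sweep(detections, num_ground_truth, thresholds):
    """detections: list of (confidence, is_correct). Returns [(t, precision, recall)]."""
    rows = []
    for t in thresholds:
        kept = [ok for conf, ok in detections if conf >= t]
        true_pos = sum(kept)
        precision = true_pos / float(len(kept)) if kept else 1.0
        recall = true_pos / float(num_ground_truth)
        rows.append((t, precision, recall))
    return rows

# Toy data: 6 correct detections and 4 false alarms for 10 labelled signs.
toy = [(0.95, True), (0.9, True), (0.85, True), (0.8, True), (0.75, True),
       (0.7, True), (0.6, False), (0.4, False), (0.3, False), (0.2, False)]
for t, p, r in sweep(toy, num_ground_truth=10, thresholds=[0.1, 0.5, 0.7]):
    print(t, round(p, 2), round(r, 2))
# 0.1 0.6 0.6
# 0.5 0.86 0.6
# 0.7 1.0 0.6
```

Precision climbs as low-confidence false alarms are discarded, while recall stays capped at 0.6 because no threshold can recover a sign that was never proposed.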

Bad Cases Analysis

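Based on the detection results above, I drew all the detection boxes on the images and inspected them one by one. The false and missed detections fall mainly into the following categories:

Uneven lighting. The images were collected outdoors at different times, so some traffic signs in the test set are under strong or weak light, which makes candidate region extraction difficult. Although the HSV colour space was chosen for its better robustness to lighting, very poor lighting still cannot be handled. I did find, however, that lighting has little effect on classification, because the normalization in HOG keeps the features of a candidate box largely unchanged under different lighting. I also tried widening the blue and red threshold ranges, but this produces more background boxes and slows detection.

Complex or similar backgrounds. Thresholding relies on colour, so when a sign is surrounded by similarly coloured background (buildings, blue sky and so on), candidate box extraction is heavily affected. In one test image, the two signs on the left are next to a reddish residential building, so the thresholded region around them contains a lot of noise, which strongly affects SVM classification. Light erosion and dilation could weaken the noise, but for small or not fully closed signs it would destroy the original structure and cause misses.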
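
Why erosion is risky for thin structures can be shown on a tiny binary grid. The sketch below is a pure-Python stand-in for a single 3×3 erosion pass, not a call to OpenCV: erode3x3, filled_square, ring and count are hypothetical helpers, and border pixels are simply cleared, which does not matter here because the shapes do not touch the border.

```python
def erode3x3(img):
    """Binary erosion with a 3x3 square kernel; border pixels are cleared."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if all(img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)):
                out[r][c] = 1
    return out

def filled_square(n, lo, hi):
    """n x n grid with a solid square covering rows and columns lo..hi."""
    return [[1 if lo <= r <= hi and lo <= c <= hi else 0 for c in range(n)]
            for r in range(n)]

def ring(n, lo, hi):
    """One-pixel-wide square outline, a stand-in for a thin sign border."""
    return [[1 if (lo <= r <= hi and lo <= c <= hi and (r in (lo, hi) or c in (lo, hi)))
             else 0 for c in range(n)] for r in range(n)]

def count(img):
    """Number of foreground pixels."""
    return sum(map(sum, img))

solid = filled_square(9, 2, 6)
outline = ring(9, 2, 6)
print(count(solid), count(erode3x3(solid)))      # 25 9
print(count(outline), count(erode3x3(outline)))  # 16 0
```

The solid blob shrinks but survives, whereas the thin outline disappears completely, and a later dilation cannot bring it back. This is the mechanism behind the missed detections described above.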

A not entirely successful project: traffic sign detection with HOG features and an SVM
