Bag-of-Words (BoW) Image Retrieval in Practice with Python

阿新 • Published: 2018-12-30

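A few days ago, after updating the HABI hashing image retrieval toolkit to V2.0, I (小白菜) went back to the BoW bag-of-words model in Python, partly to practise Python and partly to prepare material for the second session of the CBIR group's talks on image retrieval. There is already plenty of good, thorough material on BoW online, and I once wrote a post on how the BoW vocabulary is built, "Bag of Words模型", so this post is mainly about practice. Before getting to that, though, I would like to summarise BoW again in plainer terms, drawing on more than two years of thinking about it.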

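Two examples illustrate the BoW model. The first is the one most introductions use: treat an image as a document, and treat the local features extracted from it, such as SIFT keypoints, as the words of that document. Pool the SIFT features extracted from every image in the collection and cluster them to obtain representative cluster centres (the words). Then, for each image, assign every SIFT feature to its nearest cluster centre (word) and count term frequencies (TF). Illustrated:

[Figure: clustering]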
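The assignment-and-count step can be sketched on toy data as follows. This is a minimal illustration in Python 3, not the script used later in this post: the 2-D vectors stand in for 128-D SIFT descriptors, and `tf_histogram` is a hypothetical helper name.

```python
import numpy as np
from scipy.cluster.vq import vq

def tf_histogram(descriptors, vocabulary):
    """Count how many descriptors are assigned to each visual word."""
    # vq returns, for every descriptor, the index of its nearest cluster centre
    words, _ = vq(descriptors, vocabulary)
    return np.bincount(words, minlength=len(vocabulary)).astype("float32")

# Toy vocabulary of three 2-D "words" (real SIFT descriptors are 128-D)
vocabulary = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
# Five toy descriptors extracted from one "image"
descriptors = np.array([[1.0, 0.0], [9.0, 11.0], [0.0, 1.0],
                        [19.0, 1.0], [11.0, 9.0]])

tf = tf_histogram(descriptors, vocabulary)
print(tf)
```

Two descriptors land on the first word, two on the second and one on the third, so the histogram is the image's TF vector over a three-word vocabulary.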

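Once the term frequencies (TF) are counted, the interference from stop words can be reduced by also computing the inverse document frequency (IDF), that is, by multiplying TF by a weight. Illustrated:

[Figure: clustering]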

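The word weight above is the inverse document frequency (IDF). It is obtained by counting how many documents contain each word and then applying a chosen logarithmic weighting formula:

[Figure: clustering]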
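The formula image did not survive on this page. Judging from the findFeatures.py script in this post, which computes the weight as `np.log((N + 1) / (n + 1))`, it presumably has the form below. Here $N$ is the number of images in the collection, $n_i$ is the number of images containing word $i$, and $\mathrm{tf}_{d,i}$ is the count of word $i$ in image $d$; the symbol names are my own, and the $+1$ smoothing is taken from the script rather than from the lost figure.

```latex
\mathrm{idf}_i = \log\frac{N + 1}{n_i + 1},
\qquad
w_{d,i} = \mathrm{tf}_{d,i} \cdot \mathrm{idf}_i
```

A word that occurs in every image gets $\mathrm{idf}_i = \log 1 = 0$, so it contributes nothing to the similarity, which is exactly the stop-word suppression described above.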

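For an uploaded query image, extract its SIFT features, count tf, and multiply by the idf above to get the tf-idf vector. Then L2-normalise it and use the inner product as the similarity measure.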
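As a rough sketch of this scoring step on made-up counts (Python 3; the array names and the helper `l2_normalize` are illustrative, and the smoothed idf is an assumption):

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit L2 length (rows of zeros are left as zeros)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)

# Toy database: TF counts of 3 images over a 4-word vocabulary
db_tf = np.array([[2, 0, 1, 0],
                  [0, 3, 0, 1],
                  [1, 1, 1, 1]], dtype="float32")

n_images = db_tf.shape[0]
doc_freq = (db_tf > 0).sum(axis=0)            # images containing each word
idf = np.log((n_images + 1.0) / (doc_freq + 1.0))
db_vecs = l2_normalize(db_tf * idf)           # tf-idf, then L2 normalisation

# The query has the same word counts as database image 0
query_tf = np.array([[2, 0, 1, 0]], dtype="float32")
query_vec = l2_normalize(query_tf * idf)

scores = np.dot(query_vec, db_vecs.T)         # inner product = cosine similarity
ranking = np.argsort(-scores[0])              # best match first
print(ranking, scores[0])
```

Because both sides are L2-normalised, the inner product equals the cosine similarity, so an image with identical word statistics scores 1 and an image sharing no words with the query scores 0.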

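When counting TF, the vocabulary is usually made fairly large to get better results, often tens of thousands or even hundreds of thousands of words. After clustering, a K-D tree can therefore be built over the cluster centres so that the word histograms are computed faster.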
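One way to sketch the K-D tree lookup is with SciPy's `cKDTree` (Python 3). This is an assumption about tooling, not what the scripts in this post do (they call `vq`), and the random 16-D vectors only stand in for real descriptors and cluster centres.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
vocabulary = rng.random((500, 16))    # stand-in for the k-means cluster centres
descriptors = rng.random((200, 16))   # stand-in for one image's descriptors

# Build the tree once over the vocabulary, then reuse it for every image
tree = cKDTree(vocabulary)
_, words = tree.query(descriptors, k=1)   # index of the nearest centre
tf = np.bincount(words, minlength=len(vocabulary))

# Cross-check against a brute-force nearest-centre search
diff = descriptors[:, None, :] - vocabulary[None, :, :]
brute = np.argmin((diff ** 2).sum(axis=2), axis=1)
print(np.array_equal(words, brute))
```

The tree is built once and queried per image. Exact K-D trees lose much of their advantage in high dimensions such as 128-D SIFT, so large systems often switch to approximate nearest-neighbour search instead.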

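The example above may not be very intuitive for someone meeting BoW for the first time, so here is a more direct one (it does not fit perfectly everywhere, but it still gets at the essence of BoW). Take a US presidential election and suppose 10000 influential people are running. These 10000 candidates are the cluster centres; they are the most representative ones (what K-means does is find the requested number of most representative feature points). Each state is like an image, and each vote held by a person in that state is like a SIFT feature. We can then tally a 10000-dimensional vote count for every state, which is the term-frequency vector from the first example. We can also count, for each candidate, how many states cast votes for them, which gives a 10000-dimensional statistic over states; passing that through a logarithm yields the so-called inverse document frequency.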

The two examples above should make the BoW model clear. Now let's see how it is implemented in Python.

#!/usr/local/bin/python2.7
#python findFeatures.py -t dataset/train/

import argparse as ap
import cv2
import numpy as np
import os
from sklearn.externals import joblib
from scipy.cluster.vq import *

from sklearn import preprocessing
from rootsift import RootSIFT
import math

# Get the path of the training set
parser = ap.ArgumentParser()
parser.add_argument("-t", "--trainingSet", help="Path to Training Set", required=True)
args = vars(parser.parse_args())

# Get the training classes names and store them in a list
train_path = args["trainingSet"]
#train_path = "dataset/train/"

training_names = os.listdir(train_path)

numWords = 1000

# Get all the path to the images and save them in a list
# image_paths and the corresponding label in image_paths
image_paths = []
for training_name in training_names:
    image_path = os.path.join(train_path, training_name)
    image_paths += [image_path]

# Create feature extraction and keypoint detector objects
fea_det = cv2.FeatureDetector_create("SIFT")
des_ext = cv2.DescriptorExtractor_create("SIFT")

# List where all the descriptors are stored
des_list = []

for i, image_path in enumerate(image_paths):
    im = cv2.imread(image_path)
    print "Extract SIFT of %s image, %d of %d images" %(training_names[i], i, len(image_paths))
    kpts = fea_det.detect(im)
    kpts, des = des_ext.compute(im, kpts)
    # rootsift
    #rs = RootSIFT()
    #des = rs.compute(kpts, des)
    des_list.append((image_path, des))

# Stack all the descriptors vertically in a numpy array
#downsampling = 1
#descriptors = des_list[0][1][::downsampling,:]
#for image_path, descriptor in des_list[1:]:
#    descriptors = np.vstack((descriptors, descriptor[::downsampling,:]))

# Stack all the descriptors vertically in a numpy array
descriptors = des_list[0][1]
for image_path, descriptor in des_list[1:]:
    descriptors = np.vstack((descriptors, descriptor))

# Perform k-means clustering
print "Start k-means: %d words, %d key points" %(numWords, descriptors.shape[0])
voc, variance = kmeans(descriptors, numWords, 1)

# Calculate the histogram of features
im_features = np.zeros((len(image_paths), numWords), "float32")
for i in xrange(len(image_paths)):
    words, distance = vq(des_list[i][1],voc)
    for w in words:
        im_features[i][w] += 1

# Perform Tf-Idf vectorization
nbr_occurences = np.sum( (im_features > 0) * 1, axis = 0)
idf = np.array(np.log((1.0*len(image_paths)+1) / (1.0*nbr_occurences + 1)), 'float32')

# Perform L2 normalization
im_features = im_features*idf
im_features = preprocessing.normalize(im_features, norm='l2')

joblib.dump((im_features, image_paths, idf, numWords, voc), "bof.pkl", compress=3)

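Save the code above as findFeatures.py. The first part uses argparse so that arguments can be passed on the command line; the rest extracts SIFT features, clusters them, computes TF and IDF, and L2-normalises the resulting word histograms. A single image usually yields a great many SIFT keypoints, and with a large image collection the total becomes enormous, so clustering them directly is very hard (not enough memory, very slow computation). To work around this, the SIFT descriptors can be downsampled before clustering, at the cost of some retrieval accuracy. Finally, the variables needed at query time are saved. For a given image collection, the BoF can be generated from the command line with: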

python findFeatures.py -t dataset/train/
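The downsampling mentioned above appears in findFeatures.py only as a commented-out fixed stride (`descriptor[::downsampling,:]`). A hedged alternative is to draw a random subset of all descriptors and train the vocabulary on that subset alone. In the Python 3 sketch below, `subsample_descriptors` and `max_total` are hypothetical names, and random 8-D vectors stand in for SIFT descriptors.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

def subsample_descriptors(des_list, max_total, seed=0):
    """Stack all descriptors and keep at most max_total randomly chosen rows."""
    # Skip images for which no descriptors were found (des is None)
    stacked = np.vstack([des for _, des in des_list if des is not None])
    if len(stacked) <= max_total:
        return stacked
    rng = np.random.default_rng(seed)
    keep = rng.choice(len(stacked), size=max_total, replace=False)
    return stacked[keep]

# Toy stand-in for des_list: 5 "images" with 300 8-D descriptors each
rng = np.random.default_rng(1)
des_list = [("img%d.jpg" % i, rng.random((300, 8))) for i in range(5)]

sample = subsample_descriptors(des_list, max_total=400)
voc, _ = kmeans(sample, 20, 1)        # cluster the subset only

# Every image is still quantised with ALL of its descriptors
words, _ = vq(des_list[0][1], voc)
print(sample.shape, voc.shape, words.shape)
```

Only the clustering sees the reduced set, so the histograms still use every descriptor. Note that `kmeans` may return fewer centres than requested if a cluster ends up empty, so the vocabulary size is best read from `voc.shape[0]`.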

The online query stage is simpler than the above because there is no clustering step. The code is as follows:

#!/usr/local/bin/python2.7
#python search.py -i dataset/train/ukbench00000.jpg

import argparse as ap
import cv2
import imutils
import numpy as np
import os
from sklearn.externals import joblib
from scipy.cluster.vq import *

from sklearn import preprocessing

from pylab import *
from PIL import Image
from rootsift import RootSIFT

# Get the path of the query image
parser = ap.ArgumentParser()
parser.add_argument("-i", "--image", help="Path to query image", required=True)
args = vars(parser.parse_args())

# Get query image path
image_path = args["image"]

# Load the tf-idf features, image paths, idf weights, number of words and vocabulary
im_features, image_paths, idf, numWords, voc = joblib.load("bof.pkl")

# Create feature extraction and keypoint detector objects
fea_det = cv2.FeatureDetector_create("SIFT")
des_ext = cv2.DescriptorExtractor_create("SIFT")

# List where all the descriptors are stored
des_list = []

im = cv2.imread(image_path)
kpts = fea_det.detect(im)
kpts, des = des_ext.compute(im, kpts)

# rootsift
#rs = RootSIFT()
#des = rs.compute(kpts, des)

des_list.append((image_path, des))

# Stack all the descriptors vertically in a numpy array
descriptors = des_list[0][1]

# Build the word histogram (TF) of the query image
test_features = np.zeros((1, numWords), "float32")
words, distance = vq(descriptors,voc)
for w in words:
    test_features[0][w] += 1

# Perform Tf-Idf vectorization and L2 normalization
test_features = test_features*idf
test_features = preprocessing.normalize(test_features, norm='l2')

score = np.dot(test_features, im_features.T)
rank_ID = np.argsort(-score)

# Visualize the results
figure()
gray()
subplot(5,4,1)
imshow(im[:,:,::-1])
axis('off')
for i, ID in enumerate(rank_ID[0][0:16]):
    img = Image.open(image_paths[ID])
    gray()
    subplot(5,4,i+5)
    imshow(img)
    axis('off')

show()

Save the code above as search.py. To query with an image, just enter the following on the command line, passing the path of the query image after -i:

python search.py -i dataset/train/ukbench00000.jpg

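In the code above you can see that rootSIFT is commented out. You can uncomment it to use rootSIFT, but in my experiments here rootSIFT did not work as well as plain SIFT. Finally, here are the retrieval results; the top image is the query and the ones after it are the retrieved images: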

[Figure: ukbench00000]

[Figure: ukbench00055]
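On the rootSIFT option mentioned above: the `rootsift` module imported by the scripts is not shown in this post, so the sketch below is only the commonly described RootSIFT transform (L1-normalise each descriptor, then take the element-wise square root), assuming that module does the same. The function name `root_sift` and the toy 3-D rows are illustrative (Python 3).

```python
import numpy as np

def root_sift(descriptors, eps=1e-7):
    """L1-normalise each descriptor row, then take the element-wise square root."""
    descriptors = np.asarray(descriptors, dtype="float32")
    # SIFT entries are non-negative, so the square root is well defined
    l1 = np.abs(descriptors).sum(axis=1, keepdims=True)
    return np.sqrt(descriptors / (l1 + eps))

# Two toy 3-D "descriptors" (real SIFT descriptors are 128-D)
des = np.array([[4.0, 0.0, 12.0],
                [1.0, 1.0, 2.0]])
rs = root_sift(des)
print(rs)
print(np.linalg.norm(rs, axis=1))
```

After the transform each row has unit L2 norm, and Euclidean distance between transformed descriptors corresponds to the Hellinger kernel on the originals. Whether that helps depends on the data, as the comparison reported above shows.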

The complete code for this walkthrough is available for download from the original post.

from: http://yongyuan.name/blog/practical-BoW-for-image-retrieval-with-python.html
