Bag Of Visual Words (BOW) model for image classification

I wanted to play around with Bag Of Words for visual classification, so I coded a Matlab implementation that uses VLFeat for the features and clustering. I tested it on classifying desktop screenshots as Mac or Windows.
For a small data set (about 50 images for each category), the best vocabulary size was about 80.
It scored 97% accuracy on the training set and 85% accuracy on the cross validation set, so the classifier still over-fits somewhat and there is room to improve.

Overview:

1. Collect a data set of examples. I used a python script to download images from Google.
2. Partition the data set into a training set and a cross validation set (80% - 20%).
3. Find key points in each image, using SIFT.
4. Take a patch around each key point, and calculate its Histogram of Oriented Gradients (HoG). Gather all these features.
5. Build a visual vocabulary by finding representatives of the gathered features (quantization). This is done by k-means clustering.
6. Find the distribution of the vocabulary in each image in the training set. This is done by a histogram with a bin for each vocabulary word. The histogram values can be either hard values or soft values. Hard values mean that for each descriptor of a key point patch in an image, we add 1 to the bin of the vocabulary word closest to it in squared distance. Soft values mean that each patch votes for all histogram bins, but gives a higher weight to bins representing words that are similar to that patch. Take a look here.
7. Train an SVM on the resulting histograms (each histogram is a feature vector, with a label).
8. Test the classifier on the cross validation set.
9. If the results are not satisfactory, repeat from step 5 with a different vocabulary size and different SVM parameters.

from: http://jacobcv.blogspot.com/2014/05/is-screenshot-from-mac-or-windows-bag.html
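The difference between the hard and soft histogram values in step 6 can be sketched in a few lines. This is not the Matlab/VLFeat code from the post; it is a minimal plain-Python illustration. The function names, the Gaussian weighting, and the `sigma` parameter are assumptions made for the example, since the post only says that similar words should get a higher weight.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_histogram(descriptors, vocabulary):
    """Each descriptor adds 1 to the bin of its nearest vocabulary word."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        distances = [squared_distance(d, word) for word in vocabulary]
        nearest = distances.index(min(distances))
        hist[nearest] += 1.0
    return hist


def soft_histogram(descriptors, vocabulary, sigma=1.0):
    """Each descriptor spreads one vote over all bins.

    Assumption: weights follow a Gaussian of the squared distance,
    so closer (more similar) words receive a larger share of the vote.
    """
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        distances = [squared_distance(d, word) for word in vocabulary]
        # Subtract the smallest distance before exponentiating so the
        # nearest word always has weight 1 and the sum cannot underflow to 0.
        d_min = min(distances)
        weights = [math.exp(-(dist - d_min) / (2.0 * sigma ** 2))
                   for dist in distances]
        total = sum(weights)
        for i, w in enumerate(weights):
            hist[i] += w / total
    return hist
```

With a toy vocabulary `[[0, 0], [10, 10]]` and descriptors `[[1, 0], [9, 10], [0, 1]]`, the hard histogram is `[2.0, 1.0]`, while the soft histogram puts most of its weight in the same bins but leaves a small share in the other one. In both cases the bins sum to the number of descriptors, so either form can be fed to the SVM in step 7 as a feature vector.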