
Regularization Techniques (Classification/Recognition): PatchShuffle Regularization, Paper Reading Notes

PatchShuffle Regularization paper download: https://arxiv.org/abs/1707.07103
Paper details
[Figure: detailed information of the paper]
The essence of overfitting is that the model learns noise instead of capturing the key variations that underlie the data: because the data lacks diversity, or because the model is overly complex, learning is misled by irrelevant local information and the model ends up fitting useless noise. Consider how humans perceive images: as long as the global structure of an image is preserved, a moderate amount of local blurring helps reduce the learning of local noise while producing diverse local variations, and forcing the model to attend to these variations benefits training and learning.

This paper therefore introduces a new stochastic regularization method, PatchShuffle, to make trained models more robust to noise and occlusion. It is complementary to other regularization methods and can be combined with them for better results. The operation is simple and effective, can be applied either to the images themselves or to feature maps, and reduces overfitting during training; to some extent it acts like data augmentation (it generates new images or feature maps in which the elements inside each patch are randomly shuffled), improving the generalization ability of the model.
Because PatchShuffle is applied to only a small percentage of all images or feature maps, and the generated images or feature maps share the global structure of the originals (only the rows and columns of pixels inside local regions are permuted, which amounts to weight sharing), the authors argue that the method is best viewed as regularization. Applied to images, it resembles data augmentation; applied to feature maps, it resembles model ensembling. In fact, locally shuffling the pixels within a patch is equivalent to shuffling the convolutional kernels given unshuffled patches. PatchShuffle can also be considered to enable weight sharing within each patch: through shuffling, the pixel instantiated at a specific position of an image can be viewed as being sampled with equal probability from its neighboring pixels within the patch.
PatchShuffle improves the robustness of CNNs to data that is noisy or partially missing (e.g., salt-and-pepper noise and occlusion). It relates to two kinds of regularization: model ensemble and weight sharing.

Work most closely related to this paper:
Xu Shen, Xinmei Tian, Shaoyan Sun, and Dacheng Tao. Patch reordering: A novel way to achieve rotation and translation invariance in convolutional neural networks. In AAAI, pages 2534–2540, 2017. (That paper reorders whole patches, which disturbs the global structure of the image, uses a heuristic ranking-based search, and focuses on rotation and translation transformations, so it differs from the work here.)
Other regularization methods:
A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
L. Wan, M. Zeiler, S. Zhang, Y. L. Cun, and R. Fergus. Regularization of neural networks using dropconnect. In ICML, 2013
J. Ba and B. Frey. Adaptive dropout for training deep neural networks. In NIPS, 2013
M. D. Zeiler and R. Fergus. Stochastic pooling for regularization of deep convolutional neural networks. In ICLR, 2013
L. Xie, J. Wang, Z. Wei, M. Wang, and Q. Tian. Disturblabel: Regularizing cnn on the loss layer. In CVPR, 2016

Saurabh Singh, Derek Hoiem, and David A. Forsyth. Swapout: Learning an ensemble of deep architectures. In NIPS, pages 28–36, 2016. (Swapout)
Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. Deep networks with stochastic depth. In ECCV, pages 646–661, 2016. (stochastic depth)

Steven J Nowlan and Geoffrey E Hinton. Simplifying neural networks by soft weight-sharing. Neural Computation, 4(4):473–493, 1992. (weight sharing)
Anders Krogh and John A Hertz. A simple weight decay can improve generalization. In NIPS, 1991. (weight decay)
Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012. (Dropout)
Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. JMLR, 15:1929–1958, 2014. (Dropout)
Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015. (Batch Normalization)

Data augmentation references (flipping, translation, cropping, etc.):
Zhe Gan, Ricardo Henao, David Carlson, and Lawrence Carin. Learning deep sigmoid belief networks with data augmentation. In Artificial Intelligence and Statistics, pages 268–276, 2015.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. In ICLR, 2014

Transformation equivariant and invariant networks:
Robert Gens and Pedro M Domingos. Deep symmetry networks. In NIPS, 2014
Sander Dieleman, Jeffrey De Fauw, and Koray Kavukcuoglu. Exploiting cyclic symmetry in convolutional neural networks. 2016
Siamak Ravanbakhsh, Jeff Schneider, and Barnabas Poczos. Deep learning with sets and point clouds. 2016

Paper overview: randomly shuffling the elements within each patch (PatchShuffle)

Data augmentation and regularization differ in that the former aims to enlarge the size and diversity of the dataset, whereas regularization focuses on transforming the data without enlarging the dataset.
Stochastic operations have been shown to be useful for regularizing CNN training through model averaging.
The authors model the random permutation of pixels within a patch by exploiting the geometric meaning of pre- and post-multiplication by permutation matrices (row and column transformations), formulated as follows:
For each selected patch $x$ (an $H_p \times W_p$ matrix), the shuffled patch is obtained as
$$\tilde{x} = P_r \, x \, P_c ,$$
where $P_r$ and $P_c$ are random permutation matrices that permute the rows and the columns of the patch, respectively.
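As a quick sanity check of this formulation (a minimal NumPy sketch with my own variable names, not code from the paper), pre-multiplying a patch by a row-permutation matrix and post-multiplying it by a column-permutation matrix reorders its rows and columns:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 4x4 patch with distinct values so the permutation is easy to see.
x = np.arange(16).reshape(4, 4)

# Random permutation matrices: P_r permutes rows, P_c permutes columns.
P_r = np.eye(4)[rng.permutation(4)]
P_c = np.eye(4)[:, rng.permutation(4)]

x_shuffled = P_r @ x @ P_c   # rows and columns of the patch are reordered
print(x_shuffled)
```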

Both images and feature maps are matrices, so random shuffling of the elements within patches can be applied to either.
PatchShuffle on images: the shuffle operation is applied to the patches of the image.
[Figure: example images after PatchShuffle is applied]
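Here is a minimal NumPy sketch of the image-level operation (the function `patch_shuffle` and its parameters are my own naming; it assumes the image height and width are divisible by the patch size): the image is split into non-overlapping patches, and the rows and columns inside each patch are randomly permuted.

```python
import numpy as np

def patch_shuffle(img, patch=2, rng=None):
    """Shuffle rows and columns independently inside each non-overlapping patch."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    out = img.copy()
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            block = out[i:i + patch, j:j + patch]
            block = block[rng.permutation(patch), :]   # permute rows of the patch
            block = block[:, rng.permutation(patch)]   # permute columns of the patch
            out[i:i + patch, j:j + patch] = block
    return out

# Example: a 6x6 grayscale "image"
img = np.arange(36).reshape(6, 6)
print(patch_shuffle(img, patch=2, rng=np.random.default_rng(0)))
```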

PatchShuffle on feature maps
Because the lower layers preserve more of the spatial structure, PatchShuffle is preferably applied to randomly selected feature maps of the lower layers. On higher-level feature maps, PatchShuffle makes neighboring units share weight parameters, which benefits neighboring units whose receptive fields on the original image overlap heavily.

PatchShuffle creates new images and feature maps, which increases the variety of the training data. However, it also introduces additional bias into the CNN, so only a small percentage of the images or feature maps should have PatchShuffle applied.

In mathematical form, the regularization can be written as follows: each training image or feature map $x$ is replaced by
$$\tilde{x} = (1 - r)\,x + r\,T(x), \qquad r \sim \mathrm{Bernoulli}(\epsilon),$$
where $T(\cdot)$ denotes the patch-wise shuffling transform described above and $\epsilon$ is the shuffle probability. The Bernoulli variable $r$ takes the value 1 with probability $\epsilon$ and 0 with probability $1 - \epsilon$.

In practice, the shuffle probability $\epsilon$ should not be too large; that is, the probability that an image (or feature map) is shuffled should be kept small.

Training procedure, taking the operation on feature maps as an example:
[Figure: training pipeline with PatchShuffle applied to feature maps]
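A rough sketch of how this could be wired into training (a hypothetical PyTorch module of my own, not the authors' code): with probability $\epsilon$ each sample in the batch is replaced by its patch-shuffled version, and the operation is active only in training mode.

```python
import torch
import torch.nn as nn

class PatchShuffle(nn.Module):
    """Patch-wise row/column shuffling of feature maps, active only during training."""
    def __init__(self, patch=2, epsilon=0.05):
        super().__init__()
        self.patch, self.epsilon = patch, epsilon

    def forward(self, x):                    # x: (N, C, H, W), H and W divisible by patch
        if not self.training or self.epsilon <= 0:
            return x
        n, c, h, w = x.shape
        p = self.patch
        keep = torch.rand(n, device=x.device) >= self.epsilon   # r ~ Bernoulli(epsilon) per sample
        # Split into non-overlapping p x p patches and permute rows/columns inside them.
        # For brevity the same permutation is reused for all patches and samples here;
        # the paper shuffles the pixels of each patch independently.
        blocks = x.reshape(n, c, h // p, p, w // p, p)
        shuffled = blocks[:, :, :, torch.randperm(p, device=x.device), :, :]
        shuffled = shuffled[:, :, :, :, :, torch.randperm(p, device=x.device)]
        shuffled = shuffled.reshape(n, c, h, w)
        # Keep the original sample with probability 1 - epsilon, otherwise use the shuffled one.
        return torch.where(keep.view(n, 1, 1, 1), x, shuffled)
```

Such a layer could be placed after an early convolutional stage, e.g. `nn.Sequential(conv1, nn.ReLU(), PatchShuffle(patch=2, epsilon=0.05), ...)`; because of the `self.training` check it is a no-op at evaluation time.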

Experiments:
Four image classification datasets: CIFAR-10, SVHN, STL-10, and MNIST.

Influence and choice of the hyperparameters, the patch size $H_p \times W_p$ and the shuffle probability $\epsilon$:

In fact, up to a point, increasing either parameter improves the variety of the training samples without introducing too much bias. Under larger values, however, the benefit brought by diversity is gradually outweighed by the classifier bias, so the error rate increases.
As the figures show, the experiments below use the patch size and shuffle probability selected from this analysis.

[Figures: test error as a function of patch size and of shuffle probability]

Classification performance with/without PatchShuffle
On the CIFAR-10 dataset
[Table: classification error on CIFAR-10 with and without PatchShuffle]

On the SVHN dataset
[Table: classification error on SVHN with and without PatchShuffle]

On the STL-10 dataset
The five-bit binary code denotes the stages at which PatchShuffle is applied: the first bit denotes the input layer, and the other four bits correspond to the four residual stages.
[Table: classification error on STL-10 with and without PatchShuffle]

On the MNIST dataset & robustness to noise
Salt-and-pepper noise is added to the image by changing each pixel to white or black with probability τ1. For occlusion, each pixel is randomly chosen, with probability τ2, to be covered by a black block of a certain size centered on it; the block size adopted in the experiments is 3 × 3.
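To make the corruption protocol concrete, here is a small NumPy sketch (the functions are my own; only the parameter names τ1/τ2 and the 3 × 3 block size come from the text):

```python
import numpy as np

def salt_and_pepper(img, tau1, rng):
    """Set each pixel to white or black with probability tau1."""
    out = img.copy()
    mask = rng.random(img.shape) < tau1
    out[mask] = rng.choice([0.0, 1.0], size=mask.sum())
    return out

def random_occlusion(img, tau2, block=3, rng=None):
    """Center a black block x block square on each pixel chosen with probability tau2."""
    rng = rng or np.random.default_rng()
    out = img.copy()
    r = block // 2
    for i, j in zip(*np.where(rng.random(img.shape) < tau2)):
        out[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1] = 0.0
    return out

rng = np.random.default_rng(0)
img = rng.random((28, 28))                     # a dummy MNIST-sized image in [0, 1]
noisy = salt_and_pepper(img, tau1=0.1, rng=rng)
occluded = random_occlusion(img, tau2=0.05, block=3, rng=rng)
```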
[Table: error rates on MNIST under salt-and-pepper noise and occlusion]
These results indicate that PatchShuffle improves the robustness of CNNs against common image corruptions such as noise and occlusion.