
Regularization Techniques (Classification/Recognition): PatchShuffle Regularization, Paper Reading Notes

PatchShuffle Regularization paper download: https://arxiv.org/abs/1707.07103
Paper details
[Figure: detailed information of the paper]
The essence of overfitting is that the model learns noise instead of capturing the key variations that underlie the data: because the data lacks diversity, or because the model is overly complex, learning is misled by irrelevant local information and the model ends up fitting useless noise. Consider how humans perceive images: as long as the global structure of an image is preserved, a moderate amount of local blurring helps reduce the learning of local noise while producing diverse local variations, and forcing the model to attend to these variations benefits training and learning.

This paper therefore introduces a new stochastic regularization method, PatchShuffle, to make trained models more robust to noise and occlusion. It is complementary to other regularization methods and can be combined with them for better results. The operation is simple and effective, can be applied either to the images themselves or to feature maps, and reduces overfitting during training; to some extent it acts like data augmentation (it generates new images or feature maps in which the elements inside each patch are randomly shuffled), improving the generalization ability of the model.
Because PatchShuffle is applied to only a small percentage of all images or feature maps, and the generated images or feature maps share the global structure of the originals (only the rows and columns of pixels inside local regions are permuted, which amounts to weight sharing), the authors argue that the method is best viewed as regularization. Applied to images, it resembles data augmentation; applied to feature maps, it resembles model ensembling. In fact, locally shuffling the pixels within a patch is equivalent to shuffling the convolutional kernels given unshuffled patches. PatchShuffle can also be considered to enable weight sharing within each patch: through shuffling, the pixel instantiated at a specific position of an image can be viewed as being sampled with equal probability from its neighboring pixels within the patch.
PatchShuffle improves the robustness of CNNs to data that is noisy or partially missing (e.g., salt-and-pepper noise and occlusion). It relates to two kinds of regularization: model ensemble and weight sharing.

Work most closely related to this paper:
Xu Shen, Xinmei Tian, Shaoyan Sun, and Dacheng Tao. Patch reordering: A novel way to achieve rotation and translation invariance in convolutional neural networks. In AAAI, pages 2534–2540, 2017. (That paper reorders whole patches, which disturbs the global structure of the image, uses a heuristic ranking-based search, and focuses on rotation and translation transformations, so it differs from the work here.)
Other regularization methods:
A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
L. Wan, M. Zeiler, S. Zhang, Y. L. Cun, and R. Fergus. Regularization of neural networks using dropconnect. In ICML, 2013
J. Ba and B. Frey. Adaptive dropout for training deep neural networks. In NIPS, 2013
M. D. Zeiler and R. Fergus. Stochastic pooling for regularization of deep convolutional neural networks. In ICLR, 2013
L. Xie, J. Wang, Z. Wei, M. Wang, and Q. Tian. Disturblabel: Regularizing cnn on the loss layer. In CVPR, 2016

Saurabh Singh, Derek Hoiem, and David A. Forsyth. Swapout: Learning an ensemble of deep architectures. In NIPS, pages 28–36, 2016. (Swapout)
Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. Deep networks with stochastic depth. In ECCV, pages 646–661, 2016. (stochastic depth)

Steven J Nowlan and Geoffrey E Hinton. Simplifying neural networks by soft weight-sharing. Neural Computation, 4(4):473–493, 1992. (weight sharing)
Anders Krogh and John A Hertz. A simple weight decay can improve generalization. In NIPS, 1991. (weight decay)
Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012. (Dropout)
Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. JMLR, 15:1929–1958, 2014. (Dropout)
Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015. (Batch Normalization)

Data augmentation references (flipping, translation, cropping, etc.):
Zhe Gan, Ricardo Henao, David Carlson, and Lawrence Carin. Learning deep sigmoid belief networks with data augmentation. In Artificial Intelligence and Statistics, pages 268–276, 2015.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. In ICLR, 2014

Transformation equivariant and invariant networks:
Robert Gens and Pedro M Domingos. Deep symmetry networks. In NIPS, 2014
Sander Dieleman, Jeffrey De Fauw, and Koray Kavukcuoglu. Exploiting cyclic symmetry in convolutional neural networks. 2016
Siamak Ravanbakhsh, Jeff Schneider, and Barnabas Poczos. Deep learning with sets and point clouds. 2016

Paper overview: randomly shuffling the elements within each patch (PatchShuffle)

Data augmentation and regularization differ in that the former aims to enlarge the size and diversity of the dataset, whereas regularization focuses on transforming the data without enlarging the dataset.
Stochastic operations have been shown to be useful for regularizing CNN training through model averaging.
The authors model the random permutation of pixels within a patch by exploiting the geometric meaning of pre- and post-multiplication by permutation matrices (row and column transformations), formulated as follows:
For each selected patch $x$ (an $H_p \times W_p$ matrix), the shuffled patch is obtained as
$$\tilde{x} = P_r \, x \, P_c ,$$
where $P_r$ and $P_c$ are random permutation matrices that permute the rows and the columns of the patch, respectively.
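As a quick sanity check of this formulation (a minimal NumPy sketch with my own variable names, not code from the paper), pre-multiplying a patch by a row-permutation matrix and post-multiplying it by a column-permutation matrix reorders its rows and columns:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 4x4 patch with distinct values so the permutation is easy to see.
x = np.arange(16).reshape(4, 4)

# Random permutation matrices: P_r permutes rows, P_c permutes columns.
P_r = np.eye(4)[rng.permutation(4)]
P_c = np.eye(4)[:, rng.permutation(4)]

x_shuffled = P_r @ x @ P_c   # rows and columns of the patch are reordered
print(x_shuffled)
```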

Both images and feature maps are matrices, so random shuffling of the elements within patches can be applied to either.
PatchShuffle on images: the shuffle operation is applied to the patches of the image.
[Figure: example images after PatchShuffle is applied]
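Here is a minimal NumPy sketch of the image-level operation (the function `patch_shuffle` and its parameters are my own naming; it assumes the image height and width are divisible by the patch size): the image is split into non-overlapping patches, and the rows and columns inside each patch are randomly permuted.

```python
import numpy as np

def patch_shuffle(img, patch=2, rng=None):
    """Shuffle rows and columns independently inside each non-overlapping patch."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    out = img.copy()
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            block = out[i:i + patch, j:j + patch]
            block = block[rng.permutation(patch), :]   # permute rows of the patch
            block = block[:, rng.permutation(patch)]   # permute columns of the patch
            out[i:i + patch, j:j + patch] = block
    return out

# Example: a 6x6 grayscale "image"
img = np.arange(36).reshape(6, 6)
print(patch_shuffle(img, patch=2, rng=np.random.default_rng(0)))
```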

PatchShuffle on feature maps
Because the lower layers preserve more of the spatial structure, PatchShuffle is preferably applied to randomly selected feature maps of the lower layers. On higher-level feature maps, PatchShuffle makes neighboring units share weight parameters, which benefits neighboring units whose receptive fields on the original image overlap heavily.

PatchShuffle creates new images and feature maps, which increases the variety of the training data. However, it also introduces additional bias into the CNN, so only a small percentage of the images or feature maps should have PatchShuffle applied.

In mathematical form, the regularization can be written as follows: each training image or feature map $x$ is replaced by
$$\tilde{x} = (1 - r)\,x + r\,T(x), \qquad r \sim \mathrm{Bernoulli}(\epsilon),$$
where $T(\cdot)$ denotes the patch-wise shuffling transform described above and $\epsilon$ is the shuffle probability. The Bernoulli variable $r$ takes the value 1 with probability $\epsilon$ and 0 with probability $1 - \epsilon$.

In practice, the shuffle probability $\epsilon$ should not be too large; that is, the probability that an image (or feature map) is shuffled should be kept small.

Training procedure, taking the operation on feature maps as an example:
[Figure: training pipeline with PatchShuffle applied to feature maps]
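A rough sketch of how this could be wired into training (a hypothetical PyTorch module of my own, not the authors' code): with probability $\epsilon$ each sample in the batch is replaced by its patch-shuffled version, and the operation is active only in training mode.

```python
import torch
import torch.nn as nn

class PatchShuffle(nn.Module):
    """Patch-wise row/column shuffling of feature maps, active only during training."""
    def __init__(self, patch=2, epsilon=0.05):
        super().__init__()
        self.patch, self.epsilon = patch, epsilon

    def forward(self, x):                    # x: (N, C, H, W), H and W divisible by patch
        if not self.training or self.epsilon <= 0:
            return x
        n, c, h, w = x.shape
        p = self.patch
        keep = torch.rand(n, device=x.device) >= self.epsilon   # r ~ Bernoulli(epsilon) per sample
        # Split into non-overlapping p x p patches and permute rows/columns inside them.
        # For brevity the same permutation is reused for all patches and samples here;
        # the paper shuffles the pixels of each patch independently.
        blocks = x.reshape(n, c, h // p, p, w // p, p)
        shuffled = blocks[:, :, :, torch.randperm(p, device=x.device), :, :]
        shuffled = shuffled[:, :, :, :, :, torch.randperm(p, device=x.device)]
        shuffled = shuffled.reshape(n, c, h, w)
        # Keep the original sample with probability 1 - epsilon, otherwise use the shuffled one.
        return torch.where(keep.view(n, 1, 1, 1), x, shuffled)
```

Such a layer could be placed after an early convolutional stage, e.g. `nn.Sequential(conv1, nn.ReLU(), PatchShuffle(patch=2, epsilon=0.05), ...)`; because of the `self.training` check it is a no-op at evaluation time.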

Experiments:
Four image classification datasets: CIFAR-10, SVHN, STL-10, and MNIST.

Influence and choice of the hyperparameters, the patch size $H_p \times W_p$ and the shuffle probability $\epsilon$:

In fact, up to a point, increasing either parameter improves the variety of the training samples without introducing too much bias. Under larger values, however, the benefit brought by diversity is gradually outweighed by the classifier bias, so the error rate increases.
As the figures show, the experiments below use the patch size and shuffle probability selected from this analysis.

[Figures: test error as a function of patch size and of shuffle probability]

Classification performance with/without PatchShuffle
On the CIFAR-10 dataset
[Table: classification error on CIFAR-10 with and without PatchShuffle]

On the SVHN dataset
[Table: classification error on SVHN with and without PatchShuffle]

On the STL-10 dataset
The five-bit binary code denotes the stages at which PatchShuffle is applied: the first bit denotes the input layer, and the other four bits correspond to the four residual stages.
[Table: classification error on STL-10 with and without PatchShuffle]

On the MNIST dataset & robustness to noise
Salt-and-pepper noise is added to the image by changing each pixel to white or black with probability τ1. For occlusion, each pixel is randomly chosen, with probability τ2, to be covered by a black block of a certain size centered on it; the block size adopted in the experiments is 3 × 3.
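To make the corruption protocol concrete, here is a small NumPy sketch (the functions are my own; only the parameter names τ1/τ2 and the 3 × 3 block size come from the text):

```python
import numpy as np

def salt_and_pepper(img, tau1, rng):
    """Set each pixel to white or black with probability tau1."""
    out = img.copy()
    mask = rng.random(img.shape) < tau1
    out[mask] = rng.choice([0.0, 1.0], size=mask.sum())
    return out

def random_occlusion(img, tau2, block=3, rng=None):
    """Center a black block x block square on each pixel chosen with probability tau2."""
    rng = rng or np.random.default_rng()
    out = img.copy()
    r = block // 2
    for i, j in zip(*np.where(rng.random(img.shape) < tau2)):
        out[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1] = 0.0
    return out

rng = np.random.default_rng(0)
img = rng.random((28, 28))                     # a dummy MNIST-sized image in [0, 1]
noisy = salt_and_pepper(img, tau1=0.1, rng=rng)
occluded = random_occlusion(img, tau2=0.05, block=3, rng=rng)
```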
[Table: error rates on MNIST under salt-and-pepper noise and occlusion]
These results indicate that PatchShuffle improves the robustness of CNNs against common image corruptions such as noise and occlusion.