Repo: Deep Learning with Differential Privacy
Translation reference: https://blog.csdn.net/qq_42803125/article/details/81232037
>>>Introduction:
Problem with current neural networks: training datasets are often crowdsourced and may contain sensitive information
(crowdsourced: collected from a broad, undefined population rather than from a specific group)
In this paper: machine learning is combined with privacy-protection mechanisms to train neural networks under a modest privacy budget, i.e. a single-digit ε
(Compared with prior work) the objectives here are non-convex, with several layers and 10^4–10^6 parameters (the main differences are the objective and the number of parameters)
1. Track detailed information of the privacy loss → a tighter estimate of the overall privacy loss
2. An efficient algorithm for computing gradients of individual training examples; subdividing the work into smaller batches to reduce memory; applying the differential privacy principle at the input layer
3. Train models with differential privacy on TensorFlow and evaluate on MNIST and CIFAR-10: shows that privacy protection for deep neural networks can be achieved at a modest cost in software complexity, training efficiency, and model quality
ML systems often hold data that needs protecting; understanding deep neural networks is difficult; an adversary can extract training data and recover images
>>>Background
>>>Differential privacy
A standard notion for privacy guarantees over aggregate datasets
training dataset: <image, label> pairs
Definition of adjacent: we say that two of these sets are adjacent if they differ in a single entry, that is, if one image-label pair is present in one set and absent in the other. (So adjacency means exactly one entry differs, not merely at least one.)
Definition of (ε, δ)-differential privacy:
A randomized mechanism M: D → R with domain D and range R satisfies (ε, δ)-differential privacy if for any two adjacent inputs d, d′ ∈ D and for any subset of outputs S ⊆ R it holds that
Pr[M(d) ∈ S] ≤ e^ε Pr[M(d′) ∈ S] + δ.
Properties of differential privacy: composability, group privacy, and robustness to auxiliary information.
A way to make a real-valued function f differentially private is to add noise calibrated to the sensitivity Sf of f; for f: D → R, the sensitivity is Sf = max |f(d) − f(d′)| over adjacent d, d′.
Steps for designing a differentially private additive-noise mechanism: 1. approximating the functionality by a sequential composition of bounded-sensitivity functions; 2. choosing parameters of additive noise; 3. performing privacy analysis of the resulting mechanism
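Step 2 (additive noise calibrated to sensitivity) can be sketched with a Gaussian mechanism; the function and parameter names below are hypothetical, not from the paper:

```python
import numpy as np

def gaussian_mechanism(f_value, sensitivity, sigma, rng):
    """Release f(d) with additive Gaussian noise scaled to the sensitivity Sf."""
    noise = rng.normal(loc=0.0, scale=sigma * sensitivity, size=np.shape(f_value))
    return f_value + noise

# Example: a counting query has sensitivity 1
# (changing one entry changes the count by at most 1).
rng = np.random.default_rng(0)
true_count = 42.0
noisy_count = gaussian_mechanism(true_count, sensitivity=1.0, sigma=4.0, rng=rng)
```

Larger σ gives stronger privacy but a noisier answer; the privacy analysis in step 3 turns the chosen σ into concrete (ε, δ) values.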
>>>Deep learning
Deep neural network: inputs + parameters θ → outputs via a function f; f is composed of many layers (affine transformations, nonlinear activations, and so on)
Definition of the loss function: the penalty for mismatching the training data
The loss L(θ) on parameters θ is the average of the loss over the training examples {x1, ..., xN}, so L(θ) = (1/N) Σᵢ L(θ, xᵢ)
Training consists of finding θ that makes the loss sufficiently small (ideally minimal)
In a complex network the loss is hard to minimize; typically a mini-batch stochastic gradient descent (SGD) algorithm is used.
In this algorithm, at each step a few random examples form a batch B; the gradient g(B) = (1/|B|) Σ_{x∈B} ∇θ L(θ, x) is computed as an estimate of ∇θ L(θ); θ is then moved in the −g(B) direction, descending toward a local minimum. (Clever!)
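A minimal sketch of mini-batch SGD on a toy least-squares problem (all names and data are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))              # toy training inputs
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta                         # noiseless labels for simplicity

theta = np.zeros(3)
lr = 0.1
for step in range(200):
    idx = rng.choice(len(X), size=32, replace=False)  # random batch B
    xb, yb = X[idx], y[idx]
    # g(B): batch-averaged gradient of squared error,
    # an unbiased estimate of the full gradient
    g = 2 * xb.T @ (xb @ theta - yb) / len(xb)
    theta -= lr * g                                   # step along -g(B)
```

After a few hundred steps `theta` lands close to `true_theta`; the batch average keeps each step cheap while remaining an unbiased gradient estimate.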
TensorFlow: allows programmers to define large computation graphs from basic operators and distribute their execution across heterogeneous distributed systems. TensorFlow automatically builds the computation graph for gradients; it also makes batched computation easy.
>>>Approach & implementation
Differentially private training of neural networks
Main components: a differentially private stochastic gradient descent (SGD) algorithm; the moments accountant; hyperparameter tuning
>>>Differentially Private SGD Algorithm
Control the influence of the training data throughout the training process, specifically in the SGD computation.
1. Train a model with parameters θ by minimizing L(θ): at each SGD step, compute the gradient, clip the gradient (bound its ℓ2 norm, see Norm Clipping below), add noise to protect privacy, and take a step in the direction opposite to the noisy gradient
(In the pseudocode, T is the total number of training steps/iterations)
>>>Norm Clipping
C is a clipping bound, needed because the gradient has no a priori bound
This clipping ensures that if ∥g∥2 ≤ C, then g is preserved, whereas if ∥g∥2 > C, it gets scaled down to be of norm C.
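The clipping rule quoted above, g ← g / max(1, ∥g∥₂ / C), as a small sketch (names hypothetical):

```python
import numpy as np

def clip_gradient(g, C):
    """Scale g down to l2 norm C if it exceeds C; otherwise leave it unchanged.
    Equivalent to g / max(1, ||g||_2 / C)."""
    norm = np.linalg.norm(g)
    return g / max(1.0, norm / C)

small = np.array([0.3, 0.4])   # norm 0.5 <= C=1: preserved as-is
big = np.array([3.0, 4.0])     # norm 5.0 >  C=1: rescaled to norm C
```

After clipping, one example's contribution to the summed gradient is bounded by C, which is exactly the sensitivity bound the noise is calibrated to.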
>>>Per-layer and time-dependent parameters
For a multi-layer network, each layer is considered separately, so there can be different clipping thresholds C and noise scales σ per layer
>>>Lots
This average provides an unbiased estimator, the variance of which decreases quickly with the size of the group. We call such a group a lot, to distinguish it from the computational grouping that is commonly called a batch.
Set the batch size much smaller than the lot size L to limit memory usage
Perform the computation in batches, then group several batches into a lot for adding noise (so each xᵢ is a single training example, not a batch: batches are a computational grouping, while the lot is the unit over which noise is added)
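Putting clipping, lot-level noising, and averaging together, one DP-SGD step might be sketched as follows (hypothetical names; a simplification of the paper's pseudocode that processes the whole lot in one loop rather than in smaller batches):

```python
import numpy as np

def dp_sgd_step(theta, lot, grad_fn, C, sigma, lr, rng):
    """One noisy SGD step over a lot: clip each per-example gradient to norm C,
    sum, add Gaussian noise N(0, sigma^2 C^2 I) once per lot, average, descend."""
    total = np.zeros_like(theta)
    for x in lot:                                        # each x: one example
        g = grad_fn(theta, x)
        total += g / max(1.0, np.linalg.norm(g) / C)     # per-example clipping
    total += rng.normal(scale=sigma * C, size=theta.shape)
    return theta - lr * total / len(lot)

# Toy usage: minimize the average of (theta - x)^2 over scalar examples.
grad = lambda theta, x: 2 * (theta - x)
rng = np.random.default_rng(1)
theta = np.array([5.0])
data = [np.array([1.0])] * 64
for _ in range(300):
    theta = dp_sgd_step(theta, data, grad, C=1.0, sigma=1.0, lr=0.1, rng=rng)
```

Because the noise is added once per lot and then divided by the lot size, a larger lot dilutes the noise per step, which is why the lot/batch distinction matters.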
>>>privacy accounting
computes the privacy cost at each access to the training data, and accumulates this cost as the training progresses
(Privacy cost: the (ε, δ) privacy loss incurred by each access to the training data; these per-access costs compose into the overall guarantee)
>>>Moments accountant (I found this part a bit confusing)
privacy amplification theorem → each step is (O(qε), qδ)-differentially private with respect to the full database, where q = L/N is the sampling ratio per lot and ε ≤ 1
moments accountant → the overall training is (O(qε√T), δ)-differentially private for appropriately chosen settings of the noise scale and the clipping threshold
Definition of privacy loss: for neighboring databases d, d′ ∈ Dⁿ, a mechanism M, auxiliary input aux, and an outcome o ∈ R, define the privacy loss at o as c(o; M, aux, d, d′) = log( Pr[M(aux, d) = o] / Pr[M(aux, d′) = o] )
The auxiliary input of the k-th mechanism M_k is the output of all previous mechanisms M_1, ..., M_{k−1}
Privacy guarantees: bound all the log moments of the privacy loss random variable; the moment bounds compose additively across steps and convert to an (ε, δ) guarantee via a tail bound
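A simplified sketch of the moments-accountant idea for the plain (non-subsampled) sensitivity-1 Gaussian mechanism, whose log moment has the closed form λ(λ+1)/(2σ²). This omits the amplification from lot subsampling (the q factor), which is the technically hard part of the paper's analysis; all names are hypothetical:

```python
import math

def log_moment_gaussian(lam, sigma):
    """log E[exp(lam * c)] for the sensitivity-1 Gaussian mechanism N(f(d), sigma^2):
    the privacy loss c is itself Gaussian, giving alpha(lam) = lam*(lam+1)/(2*sigma^2)."""
    return lam * (lam + 1) / (2 * sigma ** 2)

def eps_from_moments(T, sigma, delta, max_lam=64):
    """Compose T steps by summing log moments, then convert to (eps, delta)
    via the tail bound eps = min over lam of (alpha_total + log(1/delta)) / lam."""
    best = float("inf")
    for lam in range(1, max_lam + 1):
        alpha_total = T * log_moment_gaussian(lam, sigma)  # moments add under composition
        best = min(best, (alpha_total + math.log(1 / delta)) / lam)
    return best

eps_one_step = eps_from_moments(T=1, sigma=4.0, delta=1e-5)
eps_many = eps_from_moments(T=100, sigma=4.0, delta=1e-5)
```

Tracking the whole moment curve rather than a single (ε, δ) pair per step is what makes the composed bound tighter than the generic composition theorems.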
>>>Hyperparameter tuning
hyperparameters that we can tune in order to balance privacy, accuracy, and performance
Tuning the parameters: for convex objectives, theory suggests a batch size of 1; for non-convex objectives it is tied to the number of epochs; the learning rate need not be decayed to a very small value, and a schedule that works well starts relatively large, decays gradually, and then holds a constant value
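The learning-rate schedule described above (start large, decay gradually, then hold constant) could be sketched as follows; the specific values are illustrative, not the paper's:

```python
def learning_rate(step, lr_start=0.1, lr_end=0.052, decay_steps=10):
    """Relatively large at first, linear decay for decay_steps steps, then constant."""
    if step >= decay_steps:
        return lr_end
    return lr_start + (step / decay_steps) * (lr_end - lr_start)
```

With differentially private SGD the number of steps is itself privacy-limited, so keeping the rate from shrinking too far avoids wasting the privacy budget on tiny updates.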
>>>Implementation
sanitizer, which preprocesses the gradient to protect privacy, and privacy_accountant, which keeps track of the privacy spending over the course of training.
>>>Results
On MNIST we achieve 97% training accuracy, and for CIFAR-10 we achieve 73% accuracy, both with (8, 10^−5)-differential privacy
>>>Related work
>>>Conclusion
a mechanism for tracking privacy loss, the moments accountant. It permits tight automated analysis of the privacy loss of complex composite mechanisms that are currently beyond the reach of advanced composition theorems.