
Repo: Deep Learning with Differential Privacy

Translation reference: https://blog.csdn.net/qq_42803125/article/details/81232037

>>>Introduction

Problem with current neural networks: the training datasets are crowdsourced and may contain sensitive information.

(Crowdsourced: collected from a broad, loosely defined group of people rather than a specific group.)

In this paper: machine learning is combined with privacy-protection mechanisms to train neural networks under a modest privacy budget, i.e. a single-digit ε (as in the (8, 10^{-5}) guarantee reported in the results).

(Compared with prior work) the objective is non-convex, the networks have several layers and tens of thousands to millions of parameters (the main differences lie in the objective and the number of parameters).

1. Tracking detailed information of the privacy loss gives a tighter estimate of the overall privacy loss.

2. Efficiency improvements: an efficient algorithm for computing per-example gradients, subdividing the work into smaller batches to reduce memory, and applying a differentially private projection (PCA) at the input layer.

3. Models with differential privacy are trained on TensorFlow and evaluated on MNIST and CIFAR-10, showing that deep neural networks can be given privacy protection at a modest cost in software complexity, training efficiency, and model quality.

ML systems often need mechanisms for protecting training data; deep neural networks are hard to interpret; an adversary may be able to extract training data, e.g. recover images.

>>>Background

>>>Differential Privacy

Differential privacy is a standard notion of privacy guarantee for algorithms on aggregate datasets.

The training dataset is a set of <image, label> pairs.

Definition of adjacent: we say that two of these sets are adjacent if they differ in a single entry, that is, if one image-label pair is present in one set and absent in the other. (So adjacency means exactly one entry differs, by the addition or removal of a single pair.)

Definition of (ε, δ)-differential privacy:

A randomized mechanism M: D → R with domain D and range R satisfies (ε,δ)-differential privacy if for any two adjacent inputs d, d′ ∈ D and for any subset of outputs S ⊆ R it holds that

Pr[M(d) \in S] \le e^{\varepsilon} Pr[M(d') \in S] + \delta
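
For intuition: the definition says that adding or removing any single training example can change the probability of any set of outcomes by at most a multiplicative factor e^{\varepsilon}, plus an additive slack \delta. With \varepsilon = 1 that factor is e \approx 2.72; \delta is typically chosen smaller than 1/N for a dataset of size N.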

Properties of differential privacy: composability, group privacy, and robustness to auxiliary information.

A real-valued function f: D -> R can be made differentially private by adding noise calibrated to its sensitivity S_f, defined as S_f = \max_{adjacent\ d, d'} |f(d) - f(d')|.

Steps to design a differentially private additive-noise mechanism: 1. approximate the functionality by a sequential composition of bounded-sensitivity functions; 2. choose the parameters of the additive noise; 3. perform a privacy analysis of the resulting mechanism.
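
A minimal sketch of step 2 for the Gaussian mechanism, assuming a function with known sensitivity and the standard calibration σ = √(2 ln(1.25/δ))·S_f/ε for ε < 1 (the calibration rule is the textbook one, not spelled out in these notes; all names below are this sketch's own):

```python
import numpy as np

def gaussian_mechanism(f, d, sensitivity, eps, delta, rng=np.random.default_rng()):
    """Release f(d) with (eps, delta)-DP by adding Gaussian noise.

    Assumes |f(d) - f(d')| <= sensitivity for all adjacent d, d', and uses the
    standard calibration sigma = sqrt(2 ln(1.25/delta)) * sensitivity / eps
    (valid for eps < 1).
    """
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * sensitivity / eps
    return f(d) + rng.normal(0.0, sigma)

# Illustrative use: privately release the mean of values assumed to lie in [0, 1];
# changing one entry then moves the mean by at most 1/n.
data = np.array([0.2, 0.9, 0.4, 0.7])
private_mean = gaussian_mechanism(lambda d: d.mean(), data,
                                  sensitivity=1.0 / len(data), eps=0.5, delta=1e-5)
```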

>>>Deep Learning

A deep neural network maps inputs plus parameters through a function f to outputs; f is composed of many layers: affine transformations, nonlinear activations, and so on.

Definition of the loss function: a penalty for mismatching the training data.

The loss L(θ) on parameters θ is the average of the loss over the training examples {x_1, ..., x_N}, so L(\theta) = {1 \over N}\sum_{i} L(\theta, x_i).

Training consists of finding a \theta that makes the loss sufficiently small (ideally, minimal).

In complex networks the loss is hard to minimize exactly; in practice one uses the mini-batch stochastic gradient descent (SGD) algorithm.

In this algorithm, at each step a batch B of random examples is formed and g(B) = {1 \over |B|}\sum_{x \in B} \nabla_{\theta} L(\theta, x) is computed as an estimate of \nabla_{\theta} L(\theta); \theta is then moved in the direction of -g(B), descending toward a local minimum. (Quite clever!)
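
As a concrete rendering, a minimal NumPy sketch of one such (non-private) SGD step; `loss_grad`, standing in for \nabla_\theta L(\theta, x), and the other names are this sketch's assumptions:

```python
import numpy as np

def sgd_step(theta, examples, loss_grad, lr, batch_size, rng):
    """One mini-batch SGD step: pick a random batch B, average the
    per-example gradients to estimate grad L(theta), and step against it."""
    idx = rng.choice(len(examples), size=batch_size, replace=False)
    g_B = np.mean([loss_grad(theta, x) for x in examples[idx]], axis=0)
    return theta - lr * g_B
```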

TensorFlow: TensorFlow allows the programmer to define large computation graphs from basic operators and to distribute their execution across heterogeneous distributed systems. TensorFlow automatically builds the computation graph for gradients; it also makes batch computation easy.

>>>Approach & Implementation

Differentially private training of neural networks.

Main components: a differentially private stochastic gradient descent (SGD) algorithm; the moments accountant; hyperparameter tuning.

>>>Differentially Private SGD Algorithm

The idea is to control the influence of the training data throughout training, specifically in the SGD computation.

1. Train a \theta that minimizes L(\theta): at each SGD step, compute the gradient, clip the gradient (see Norm Clipping below), add noise to protect privacy, and take a step in the direction opposite to the noisy gradient.

(In the pseudocode of Algorithm 1, T is the total number of training steps.)
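
Putting the pieces of Algorithm 1 together, a minimal NumPy sketch of one DP-SGD step over a lot (per-example clipping, Gaussian noise, average, descend); `lr`, `C`, and `sigma` follow the paper's notation, while `loss_grad` and the rest are this sketch's scaffolding:

```python
import numpy as np

def dp_sgd_step(theta, lot, loss_grad, lr, C, sigma, rng):
    """One step of differentially private SGD, in the spirit of Algorithm 1:
    clip each per-example gradient to L2 norm C, add Gaussian noise
    N(0, sigma^2 C^2 I) to the sum, average over the lot, and descend."""
    clipped = []
    for x in lot:
        g = loss_grad(theta, x)                  # per-example gradient
        g = g / max(1.0, np.linalg.norm(g) / C)  # norm clipping
        clipped.append(g)
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(0.0, sigma * C, size=theta.shape)
    g_tilde = noisy_sum / len(lot)               # noisy average gradient
    return theta - lr * g_tilde                  # step along -g_tilde
```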

>>>Norm Clipping

g \rightarrow {g \over \max(1, {\|g\|_2 \over C})}

C is a clipping bound, needed because the gradient has no a priori bound.

This clipping ensures that if ∥g∥2 ≤ C, then g is preserved, whereas if ∥g∥2 > C, it gets scaled down to be of norm C.
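
A direct transcription of the clipping map, with a quick check of both cases (the bound C and the test vectors are arbitrary illustrative values):

```python
import numpy as np

def clip_by_l2_norm(g, C):
    """Return g unchanged if ||g||_2 <= C, otherwise rescale it to norm C."""
    return g / max(1.0, np.linalg.norm(g) / C)

C = 1.0
small = np.array([0.3, 0.4])   # norm 0.5 <= C: preserved exactly
large = np.array([3.0, 4.0])   # norm 5.0 >  C: scaled down to norm C
assert np.allclose(clip_by_l2_norm(small, C), small)
assert np.isclose(np.linalg.norm(clip_by_l2_norm(large, C)), C)
```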

>>>Per-layer and time-dependent parameters

In a multi-layer network each layer can be treated separately, which allows different clipping thresholds C and noise scales σ per layer (and these may also vary over time).

>>>Lots

This average provides an unbiased estimator, the variance of which decreases quickly with the size of the group. We call such a group a lot, to distinguish it from the computational grouping that is commonly called a batch. 

Set the batch size much smaller than the lot size L to limit memory usage.

Perform the computation in batches, then group several batches into a lot for adding noise. (To be clear: each x_i is a single training example, not a batch; batches are the unit of computation, lots are the unit at which noise is added. See the sketch below.)
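
One way to read the batch/lot distinction in code, a hypothetical sketch reusing the names from the DP-SGD step above:

```python
import numpy as np

def lot_gradient(theta, lot, loss_grad, C, sigma, batch_size, rng):
    """Accumulate clipped per-example gradients batch by batch (to bound
    memory), then add noise once for the whole lot."""
    total = np.zeros_like(theta)
    for start in range(0, len(lot), batch_size):
        batch = lot[start:start + batch_size]  # a batch is just a slice of the lot
        for x in batch:
            g = loss_grad(theta, x)
            total += g / max(1.0, np.linalg.norm(g) / C)
    noisy = total + rng.normal(0.0, sigma * C, size=theta.shape)
    return noisy / len(lot)
```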

>>>Privacy Accounting

The accountant computes the privacy cost at each access to the training data, and accumulates this cost as the training progresses.

(Here the privacy cost is the (ε, δ) privacy expenditure of each access; these per-access costs compose across steps into an overall guarantee.)

>>>Moments Accountant

The privacy amplification theorem implies that each step is (O(qε), qδ)-differentially private with respect to the full database, where q = L/N is the sampling ratio per lot and ε ≤ 1.

The moments accountant shows that the whole training is (O(qε√T), δ)-differentially private for appropriately chosen noise scale and clipping threshold.

Definition of privacy loss: for neighboring databases d, d' ∈ D^n, a mechanism M, auxiliary input aux, and an outcome o ∈ R, define the privacy loss at o as

c(o; M, aux, d, d') = \log {Pr[M(aux, d) = o] \over Pr[M(aux, d') = o]}

In a composite mechanism, the auxiliary input of the k-th mechanism M_k is the output of all previous mechanisms.

The λ-th log moment is \alpha_M(\lambda; aux, d, d') = \log E_{o \sim M(aux, d)}[\exp(\lambda c(o; M, aux, d, d'))]

Privacy guarantees: bound all moments, \alpha_M(\lambda) = \max_{aux, d, d'} \alpha_M(\lambda; aux, d, d')
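
The accountant rests on two properties of these moments (the paper's Theorem 2): they compose additively across steps, and a tail bound converts the total back to an (ε, δ) guarantee. A sketch, where the per-step moments are assumed to follow the leading term q²λ(λ+1)/σ² of the paper's bound for the subsampled Gaussian mechanism, and q, σ, T are illustrative values:

```python
import numpy as np

def delta_from_moments(total_alpha, eps):
    """Tail bound: if total_alpha[lam-1] bounds the total log moment
    alpha_M(lam) for lam = 1..L, the mechanism is (eps, delta)-DP with
    delta = min_lam exp(alpha_M(lam) - lam * eps)."""
    lams = np.arange(1, len(total_alpha) + 1)
    return np.exp(total_alpha - lams * eps).min()

# Composability: log moments add up over the T steps of training.
q, sigma, T = 0.01, 4.0, 10000          # assumed sampling ratio, noise scale, steps
lams = np.arange(1, 33)
alpha_step = q**2 * lams * (lams + 1) / sigma**2
print(delta_from_moments(T * alpha_step, eps=2.0))  # ~3e-7
```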

>>>Hyperparameter Tuning

hyperparameters that we can tune in order to balance privacy, accuracy, and performance

Tuning the parameters: for a convex objective, the batch size should be small (down to 1); for a non-convex objective it should be comparable to the number of epochs. The learning rate need not be decayed to a very small value; what works well is to start relatively large, decrease it gradually, and then keep it constant (see the sketch below).
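
A sketch of the learning-rate schedule described above (decay from a relatively large value, then hold constant); the break points and rates here are placeholders, not the paper's values:

```python
def learning_rate(epoch, lr_init=0.1, lr_final=0.01, decay_epochs=10):
    """Decay linearly from lr_init to lr_final over decay_epochs, then stay constant."""
    if epoch >= decay_epochs:
        return lr_final
    return lr_init + (lr_final - lr_init) * epoch / decay_epochs
```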

>>>Implementation

Two components: sanitizer, which preprocesses the gradient to protect privacy, and privacy_accountant, which keeps track of the privacy spending over the course of training.
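
A rough sketch of how these two components might fit together; the class names follow the paper's description, but the interfaces below are this sketch's assumptions, not the actual TensorFlow code:

```python
import numpy as np

class Sanitizer:
    """Preprocesses gradients for privacy: per-example norm clipping,
    plus one noise addition per lot."""
    def __init__(self, C, sigma, rng):
        self.C, self.sigma, self.rng = C, sigma, rng

    def clip(self, g):
        return g / max(1.0, np.linalg.norm(g) / self.C)

    def noisy_average(self, grad_sum, lot_size):
        noise = self.rng.normal(0.0, self.sigma * self.C, size=grad_sum.shape)
        return (grad_sum + noise) / lot_size

class PrivacyAccountant:
    """Accumulates the privacy spending (log moments), once per lot processed."""
    def __init__(self, per_lot_moments):
        self.per_lot_moments = per_lot_moments
        self.total_moments = np.zeros_like(per_lot_moments)

    def accumulate_privacy_spending(self):
        self.total_moments += self.per_lot_moments
```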

>>>Results

On MNIST we achieve 97% training accuracy, and on CIFAR-10 we achieve 73% accuracy, both with (8, 10^{-5})-differential privacy.

>>>Related Work

>>>Conclusions

The key contribution is a mechanism for tracking privacy loss, the moments accountant. It permits tight automated analysis of the privacy loss of complex composite mechanisms that are currently beyond the reach of advanced composition theorems.