
softmax + cross-entropy: the cross-entropy loss explained, with the gradient derivation for backpropagation


In most tutorials, softmax and cross-entropy always appear together, and their gradients are also derived together. Let's see why.

The gradient derivations for softmax and for cross-entropy were each given separately in the two articles above; here we put them together.

1. Problem

Consider an input vector X, normalized by the softmax function into a vector S that serves as the predicted probability distribution. Given that the vector Y is the true probability distribution, the cross-entropy function yields a scalar loss l. Find the gradient of l with respect to X.

$$
X = (x_1, x_2, x_3, \cdots, x_k)\\
\quad\\
Y = (y_1, y_2, y_3, \cdots, y_k)\\
\quad\\
S = (s_1, s_2, s_3, \cdots, s_k) = softmax(X)\\
\quad\\
s_{i} = \frac{e^{x_{i}}}{\sum_{t=1}^{k} e^{x_{t}}}\\
\quad\\
l = crossEntropy(S, Y) = -\sum_{i=1}^{k} y_{i}\log(s_{i})
$$
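A minimal NumPy sketch of these definitions (the function and variable names are illustrative, not from the article):

```python
import numpy as np

def softmax(x):
    # s_i = e^{x_i} / sum_t e^{x_t}; shifting by max(x) avoids overflow
    # without changing the result
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cross_entropy(s, y):
    # l = -sum_i y_i * log(s_i)
    return -np.sum(y * np.log(s))

x = np.array([1.0, 2.0, 3.0])   # input vector X
y = np.array([0.0, 0.0, 1.0])   # true distribution Y (here one-hot)
s = softmax(x)                  # predicted distribution S
l = cross_entropy(s, y)         # scalar loss
```

Since y is one-hot on index 2, the loss reduces to -log(s[2]).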
Given:

$$
\frac{\partial l}{\partial S} = \left(\frac{\partial l}{\partial s_{1}}, \frac{\partial l}{\partial s_{2}}, \cdots, \frac{\partial l}{\partial s_{k}}\right) = \left(-\frac{y_1}{s_1}, -\frac{y_2}{s_2}, \cdots, -\frac{y_k}{s_k}\right)
$$

$$
\triangledown S = \frac{\partial S}{\partial X} = \begin{pmatrix}
\partial s_{1}/\partial x_{1} & \partial s_{1}/\partial x_{2} & \cdots & \partial s_{1}/\partial x_{k}\\
\partial s_{2}/\partial x_{1} & \partial s_{2}/\partial x_{2} & \cdots & \partial s_{2}/\partial x_{k}\\
\vdots & \vdots & \ddots & \vdots \\
\partial s_{k}/\partial x_{1} & \partial s_{k}/\partial x_{2} & \cdots & \partial s_{k}/\partial x_{k}
\end{pmatrix} = \begin{pmatrix}
-s_{1}s_{1} + s_{1} & -s_{1}s_{2} & \cdots & -s_{1}s_{k} \\
-s_{2}s_{1} & -s_{2}s_{2} + s_{2} & \cdots & -s_{2}s_{k} \\
\vdots & \vdots & \ddots & \vdots \\
-s_{k}s_{1} & -s_{k}s_{2} & \cdots & -s_{k}s_{k} + s_{k}
\end{pmatrix}
$$

Since every entry satisfies $\partial s_i/\partial x_j = -s_i s_j + s_i\,\delta_{ij}$, the Jacobian is symmetric:

$$
\triangledown S = (\triangledown S)^T
$$
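This Jacobian can be checked numerically. A sketch, assuming NumPy (names are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softmax_jacobian(s):
    # entry (i, j) is ds_i/dx_j = -s_i*s_j + s_i*delta_ij,
    # i.e. the matrix diag(s) - s s^T shown above
    return np.diag(s) - np.outer(s, s)

x = np.array([1.0, 2.0, 3.0])
s = softmax(x)
J = softmax_jacobian(s)

# central finite differences as an independent check of each column
eps = 1e-6
J_num = np.zeros((len(x), len(x)))
for j in range(len(x)):
    dx = np.zeros_like(x)
    dx[j] = eps
    J_num[:, j] = (softmax(x + dx) - softmax(x - dx)) / (2 * eps)
```

Each column of the Jacobian sums to zero, since softmax outputs always sum to 1.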

2. Solution

$$
\triangledown l = \left(\frac{\partial l}{\partial x_1}, \frac{\partial l}{\partial x_2}, \frac{\partial l}{\partial x_3}, \cdots, \frac{\partial l}{\partial x_k}\right)
$$

$$
\frac{\partial l}{\partial x_i} = \frac{\partial l}{\partial s_1}\frac{\partial s_1}{\partial x_i} + \frac{\partial l}{\partial s_2}\frac{\partial s_2}{\partial x_i} + \frac{\partial l}{\partial s_3}\frac{\partial s_3}{\partial x_i} + \cdots + \frac{\partial l}{\partial s_k}\frac{\partial s_k}{\partial x_i}
$$

$$
\triangledown l = (\triangledown S)^T \cdot \frac{\partial l}{\partial S} = \triangledown S \cdot \frac{\partial l}{\partial S}
$$
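The chain-rule product above can be verified numerically. In this sketch (illustrative names, assuming NumPy), the result also matches the well-known closed form S - Y for the combined softmax + cross-entropy gradient:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cross_entropy(s, y):
    return -np.sum(y * np.log(s))

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0])
s = softmax(x)

dl_ds = -y / s                        # dl/dS = (-y_i / s_i)
grad_S = np.diag(s) - np.outer(s, s)  # Jacobian of softmax (symmetric)
grad_l = grad_S.T @ dl_ds             # chain rule: (grad S)^T . dl/dS

# finite-difference check of each dl/dx_i
eps = 1e-6
grad_num = np.zeros_like(x)
for i in range(len(x)):
    dx = np.zeros_like(x)
    dx[i] = eps
    grad_num[i] = (cross_entropy(softmax(x + dx), y)
                   - cross_entropy(softmax(x - dx), y)) / (2 * eps)
```

Because Y sums to 1, the matrix-vector product collapses term by term to s_i - y_i, which is why frameworks fuse softmax and cross-entropy into a single op.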