
Second Derivatives of Multivariate Composite Functions and Vector Calculus

### Introduction

For composite functions of the form $z=f(u_1,u_2,\dots,u_n)$, where $u_i=g_i(x_1,x_2,\dots,x_m)$, computing second derivatives usually requires tedious, repetitive calculation, and it is easy to slip up when applying the chain rule several times in a row. This article derives a general formula for this class of problems, together with its theoretical justification.

**Example 1:** Let $z=f(x^2-y^2,e^{xy})$, where $f$ has continuous second-order partial derivatives; find $\frac{\partial ^2z}{ \partial x \partial y}$.

By the chain rule, we obtain
$$
\frac{\partial ^2z}{ \partial x \partial y}=-4xyf''_{11}+2(x^2-y^2)e^{xy}f''_{12}+xye^{2xy}f''_{22}+e^{xy}(1+xy)f'_2
$$
The subscripts appearing in $f''_{11}$, $f''_{12}$ are reminiscent of matrix indices, which suggests looking for a compact matrix form of this expression, and indeed a general formula for the whole class of problems.

---

### The gradient matrix

Following the definition in [[1]](#refer-anchor-1), for a function $f: ℝ^n\rightarrow ℝ$, $\pmb{x} \mapsto f(\pmb{x})$, $\pmb{x}\in ℝ^n$, i.e. $\pmb{x}=[x_1,x_2,x_3,\dots,x_n]^T$, the partial derivatives are:
$$
\frac{\partial f}{\partial x_1}= \lim_{h \rightarrow 0} \frac{f(x_1+h,x_2,\dots,x_n)-f(\pmb{x})}{h}\\
\vdots\\
\frac{\partial f}{\partial x_n}= \lim_{h \rightarrow 0}\frac{f(x_1,x_2,\dots,x_n+h)-f(\pmb{x})}{h} \tag{2.1}
$$
Collecting them into a row vector, we write:
$$
∇_{\pmb{x}}f=\mathrm{grad}\ f=\left[\begin{matrix}\frac{\partial f(\pmb{x})}{\partial x_1} & \frac{\partial f(\pmb{x})}{\partial x_2} & \dots & \frac{\partial f(\pmb{x})}{\partial x_n}\end{matrix} \right] \in ℝ^{1×n} \tag{2.2}
$$
For example, for the function $f(x,y)=(x+2y^3)^2$ we have:
$$
∇f=\left[\begin{matrix}2(x+2y^3) & 12(x+2y^3)y^2\end{matrix} \right] \in ℝ^{1×2} \tag{2.3}
$$
To reach the general formula for the problem posed at the start, one unavoidable step is differentiating the gradient matrix $∇f$ itself; we analyze this step separately in the course of the derivation.

---

### Second derivatives of composite functions and the Hessian matrix

Let $z=f(u_1,u_2,\dots,u_n)$, where $u_i=g_i(x_1,x_2,\dots,x_m)$; find $\frac{\partial ^2z}{ \partial x_i \partial x_j}$.
$$
\frac{\partial z}{ \partial x_i}=\frac{\partial z}{ \partial \pmb{u} }·\frac{\partial \pmb{u}}{ \partial x_i} =\left[\begin{matrix}\frac{\partial f}{\partial u_1} & \frac{\partial f}{\partial u_2} & \dots & \frac{\partial f}{\partial u_n}\end{matrix} \right] \left[\begin{matrix}\frac{\partial u_1}{\partial x_i} \\ \frac{\partial u_2}{\partial x_i} \\ \vdots \\ \frac{\partial u_n}{\partial x_i}\end{matrix} \right] \tag{3.1}
$$
To simplify the notation, define:
$$
\pmb{X_i}=\left[\begin{matrix}\frac{\partial u_1}{\partial x_i} & \frac{\partial u_2}{\partial x_i} & \dots & \frac{\partial u_n}{\partial x_i}\end{matrix} \right]^T \tag{3.2}
$$
so that:
$$
\frac{\partial z}{ \partial x_i}=∇_{\pmb{u}}f·\pmb{X_i} \tag{3.3}
$$
Next, we need to compute
$$
\frac{\partial {}}{ \partial x_j}\left(∇_{\pmb{u}}f·\pmb{X_i}\right) \tag{3.4}
$$
By the product rule:
$$
\frac{\partial {}}{ \partial x_j}\left(∇_{\pmb{u}}f·\pmb{X_i} \right)=\frac{\partial {}}{ \partial x_j}\left(∇_{\pmb{u}}f\right)·\pmb{X_i} + ∇_{\pmb{u}}f·\frac{\partial {}}{ \partial x_j}\pmb{X_i} \tag{3.5}
$$
$\frac{\partial {}}{ \partial x_j}\pmb{X_i}$ is easy to obtain, so we focus on the first term, and in particular on $\frac{\partial {}}{ \partial x_j}∇_{\pmb{u}}f$.

By the chain rule:
$$
\frac{\partial {}}{ \partial x_j}∇_{\pmb{u}}f=\frac{\partial {\pmb{u}^T}}{ \partial x_j}·\frac{\partial {}}{ \partial \pmb{u}^T}∇_{\pmb{u}}f \tag{3.6}
$$
The problem thus reduces to differentiating the vector $∇_{\pmb{u}}f$ with respect to the vector $\pmb{u}^T$. Looking at this operation more closely, it differentiates each entry of the gradient matrix with respect to each $u_i$ in turn, so the result is clearly an $n×n$ square matrix. This matrix is known as the **Hessian matrix**, written $H(f)$; its explicit form is:
$$
H(f)= \left[\begin{matrix} \frac{\partial^2 f}{\partial x_1\partial x_1} & \frac{\partial^2 f}{\partial x_1\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1\partial x_n}\\ \frac{\partial^2 f}{\partial x_2\partial x_1} & \frac{\partial^2 f}{\partial x_2\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_2\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n\partial x_1} & \frac{\partial^2 f}{\partial x_n\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n\partial x_n} \end{matrix} \right]\tag{3.7}
$$
The pattern is evident. Introducing $H(f)$, we can continue simplifying:
$$
\frac{\partial {\pmb{u}^T}}{ \partial x_j}·\frac{\partial {}}{ \partial \pmb{u}^T}∇_{\pmb{u}}f=\frac{\partial {\pmb{u}^T}}{ \partial x_j}·\left[\begin{matrix} \frac{\partial^2 f}{\partial u_1\partial u_1} & \frac{\partial^2 f}{\partial u_1\partial u_2} & \cdots & \frac{\partial^2 f}{\partial u_1\partial u_n}\\ \frac{\partial^2 f}{\partial u_2\partial u_1} & \frac{\partial^2 f}{\partial u_2\partial u_2} & \cdots & \frac{\partial^2 f}{\partial u_2\partial u_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial u_n\partial u_1} & \frac{\partial^2 f}{\partial u_n\partial u_2} & \cdots & \frac{\partial^2 f}{\partial u_n\partial u_n} \end{matrix} \right]=\pmb{X_j}^T·H_{\pmb{u}}(f)\tag{3.8}
$$
Therefore:
$$
\frac{\partial ^2z}{ \partial x_i \partial x_j}=\pmb{X_j}^T·H_{\pmb{u}}(f)·\pmb{X_i}+∇_{\pmb{u}}f·\frac{\partial {}}{ \partial x_j}\pmb{X_i}=\pmb{X_j}^T·H_{\pmb{u}}(f)·\pmb{X_i}+∇_{\pmb{u}}f·\pmb{X_{ij}}\tag{3.9}
$$
where
$$
\pmb{X_{ij}}=\left[\begin{matrix}\frac{\partial^2 u_1}{\partial x_i\partial x_j} & \frac{\partial^2 u_2}{\partial x_i\partial x_j} & \dots & \frac{\partial^2 u_n}{\partial x_i\partial x_j}\end{matrix} \right]^T\tag{3.10}
$$
Of course, in actual computation $\pmb{X_i}$ has already been computed, so differentiating it directly to get $\frac{\partial {}}{ \partial x_j}\pmb{X_i}$ may be more convenient.

---

### Summary

Let $z=f(u_1,u_2,\dots,u_n)$, where $u_i=g_i(x_1,x_2,\dots,x_m)$. Then:
$$
\frac{\partial z}{ \partial x_i}=∇_{\pmb{u}}f·\pmb{X_i} \\\frac{\partial ^2z}{ \partial x_i \partial x_j}=\pmb{X_j}^T·H_{\pmb{u}}(f)·\pmb{X_i}+∇_{\pmb{u}}f·\pmb{X_{ij}}\tag{end}
$$

---

### References

- [1] *Mathematics for Machine Learning* (Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong)
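As a sanity check, the closing formula can be verified symbolically on Example 1. The sketch below uses SymPy; the concrete test function $f(u_1,u_2)=\sin u_1 + u_1 u_2^2$ is an arbitrary $C^2$ choice introduced here for illustration and is not part of the derivation above.

```python
# Symbolic check of  ∂²z/∂x∂y = X_j^T · H_u(f) · X_i + ∇_u f · X_ij
# on Example 1: z = f(x² − y², e^{xy}).
# f(u1, u2) = sin(u1) + u1*u2**2 is a test function chosen for illustration.
import sympy as sp

x, y, u1, u2 = sp.symbols('x y u1 u2')

f = sp.sin(u1) + u1 * u2**2                    # any C² test function
u = sp.Matrix([x**2 - y**2, sp.exp(x * y)])    # inner functions u1(x,y), u2(x,y)
at_u = {u1: u[0], u2: u[1]}

# Gradient (row vector) and Hessian of f w.r.t. u, evaluated at u(x, y)
grad = sp.Matrix([[sp.diff(f, u1), sp.diff(f, u2)]]).subs(at_u)
H = sp.hessian(f, (u1, u2)).subs(at_u)

# X_i = ∂u/∂x, X_j = ∂u/∂y, X_ij = ∂²u/∂x∂y  (column vectors)
Xi, Xj, Xij = u.diff(x), u.diff(y), u.diff(x, y)

# Right-hand side of the closing formula (a 1×1 matrix)
formula = (Xj.T * H * Xi + grad * Xij)[0, 0]

# Direct chain-rule computation of ∂²z/∂x∂y for comparison
direct = sp.diff(f.subs(at_u), x, y)

print(sp.simplify(direct - formula))  # expected: 0
```

Replacing `f` and `u` lets the same script check the formula for any other composite function with continuous second-order partials.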