非線性最小二乘問題的求解方法

阿新 • • 發佈：2019-07-25

1. 非線性最小二乘問題的定義
2. 最速下降法
3. 牛頓法
4. 高斯牛頓法（Gauss Newton）
5. 列文伯格-馬爾夸特法 (Levenberg-Marquardt)

1. 非線性最小二乘問題的定義

對於形如（1）的函式，希望尋找一個區域性最優的$x^*$，使得$F(x)$達到區域性極小值$F(x^*)$ 。
\[ \begin{equation} F(\mathbf{x})=\frac{1}{2} \sum_{i=1}^{m}\left(f_{i}(\mathbf{x})\right)^{2} \end{equation} \]

其中，$f_{i} : \mathbf{R}^{n} \mapsto \mathbf{R}, i=1, \ldots, m$，即 $x \in R^n,f_i(x)\in R$。

區域性極小值：存在$\delta>0$，對任意滿足$\left\|\mathbf{x}-\mathbf{x}^{*}\right\|<\delta$ 的$x$，都有$F\left(\mathbf{x}^{*}\right) \leq F(\mathbf{x})$。

這裡舉三個簡單的例子：

$x\in R,m=1,f_1(x)=x+1$，則$F(x)=\frac{1} {2}（x+1）^2$，區域性極小值在$x^*=-1$處取得。

$x\in R , m=2,f_1(x)=x+1,f_2(x)=exp(3x^2+2x+1)$，則$F(x)=\frac{1} {2}（x+1）^2+exp(3x^2+2x+1)$，此時就不容易計算出區域性最優$x^*$的位置了。
$x\in R^3, m=1,f_1(x)=x^Tx$，則$F(x)=\frac {1} {2}(x^Tx)^2$

事實上，$f_i$也可以將$x$對映到$R^m$空間中，因為$f_i(x)^2=f_i(x)^Tf_i(x) \in R$，最終計算出來的值總是一個實數。對於簡單的最小二乘問題，如1，可以用求導取極值的方法得到區域性極小值的位置，然而複雜的、高維的，如2和3，就只能採取一些迭代的策略來求解區域性極小值了。

注意，是區域性極小值而非全域性最小值！

2. 最速下降法

假設函式（1）是可導並且光滑的，則可以對函式（1）在$x$處進行二階泰勒展開為（2）式
\[ \begin{equation} F(\mathbf{x}+\Delta \mathbf{x})=F(\mathbf{x})+\Delta \mathbf{x}^T \mathbf{J} +\frac{1}{2} \Delta \mathbf{x}^{\top} \mathbf{H} \Delta \mathbf{x}+O\left(\|\Delta \mathbf{x}\|^{3}\right) \end{equation} \]
其中 $J=\left[\begin{array}{c}{\frac{\partial J(\mathbf{x})}{\partial x_{1}}} \\ {\vdots} \\ {\frac{\partial J(\mathbf{x})}{\partial x_{n}}}\end{array}\right]$，$H=\left[\begin{array}{ccc}{\frac{\partial^{2} H(\mathbf{x})}{\partial x_{1} \partial x_{1}}} & {\cdots} & {\frac{\partial^{2} H(\mathbf{x})}{\partial x_{1} \partial x_{n}}} \\ {\vdots} & {\ddots} & {\vdots} \\ {\frac{\partial^{2} H(\mathbf{x})}{\partial x_{n} \partial x_{1}}} & {\cdots} & {\frac{\partial^{2} H(\mathbf{x})}{\partial x_{n} \partial x_{n}}}\end{array}\right]$，$J$和$H$分別$F$對變數$x$的一階導和二階導。

忽略公式（2）二階以上的導數，可以將其寫成二次多項式的形式
\[ \begin{equation} F(\mathbf{x}+\Delta \mathbf{x}) \approx F (\mathbf{x})+\Delta \mathbf{x}^T \mathbf{J} +\frac{1}{2} \Delta \mathbf{x}^{\top} \mathbf{H} \Delta \mathbf{x} \end{equation} \]
當$x=x^* $時，我們可以得出一些結論

$J= \mathbf{0}$，我們稱此點為駐點。（原因，假設其不為0，則必定存在使$F(x)$下降的$\Delta x$）
$H$為對稱矩陣，且$H$為正定矩陣。（原因同樣是保證不會存在使$F(x)$下降的$\Delta x$）

那麼，如何尋找$\Delta x$使得$F(x+\Delta x)$保持下降呢？最速下降法給出了一個解決方法，首先是忽略掉一階以上的導數，則公式（3）可以重寫為
\[ \begin{equation} F(\mathbf{x}+\Delta \mathbf{x}) \approx F (\mathbf{x})+\Delta \mathbf{x}^T \mathbf{J} \end{equation} \]
而
\[ \begin{equation} \frac{\partial F(\mathbf{x}+\Delta \mathbf{x})}{\partial \Delta \mathbf{x} } \approx \mathbf{J} \end{equation} \]
最速下降法的迭代更新方式為
\[ \begin{equation} \Delta x = \lambda J \end{equation} \]
則
\[ \begin{equation} F(\mathbf{x}+\Delta \mathbf{x}) \approx F (\mathbf{x})-\lambda \mathbf{J}^T \mathbf{J} \end{equation} \]
此方法最速體現在負梯度方向是區域性下降最快的梯度方向。

3. 牛頓法

牛頓法利用了（3）式的二階導數，收斂速度為二階收斂，比最速下降法的一階收斂要快。

對（3）式的$\Delta x$求導可得
\[ \begin{equation} \frac{\partial F(\mathbf{x}+\Delta \mathbf{x})}{\partial \Delta \mathbf{x} } \approx \mathbf{J} + \mathbf{H}\Delta \mathbf{x} \end{equation} \]
令上式等於0，即可計算出每次迭代的$\Delta \mathbf{x}$步長
\[ \begin{equation} \Delta \mathbf{x} = -\mathbf{H}^{-1}\mathbf{J} \end{equation} \]
然而，$\mathbf{H}$可能不存在逆矩陣，因此可以加上阻尼因子$\mu>0$
\[ \begin{equation} \Delta \mathbf{x} = -(\mathbf{H}+\mu \mathbf{I})^{-1}\mathbf{J} \end{equation} \]
此法保證$\mathbf{H}+\mu \mathbf{I}$一定可逆，同時阻尼因子是對$\Delta x$的大小做了限制。因為最優化的式子變成了
\[ \begin{equation} \mathop {\min }\limits_{\Delta x} F(\mathbf{x}+\Delta \mathbf{x})+\frac{1}{2}\mu \Delta \mathbf{x}^{\top} \Delta \mathbf{x} \approx F (\mathbf{x})+\Delta \mathbf{x}^T \mathbf{J} +\frac{1}{2} \Delta \mathbf{x}^{\top} \mathbf{H} \Delta \mathbf{x} + \frac{1}{2}\mu \Delta \mathbf{x}^{\top} \Delta \mathbf{x} \end{equation} \]
對$\Delta x$求導很容易得到（10）式的結論。

當然，阻尼高斯方法也存在一定的缺陷:$\mathbf{H}$矩陣不好計算，可能會花費更多的時間。

4. 高斯牛頓法（Gauss Newton）

前面提到的方法沒有利用到最小二乘問題的形式，我想此方法之所以前面有高斯二字就是因為最小二乘形式的應用吧。

為了使公式看起來儘量簡潔，我們將公式（1）寫成矩陣形式
\[ \begin{equation} F({x})=\frac{1}{2}f(x)^2 \end{equation} \]
其中 $\mathbf{f}(\mathbf{x})=\left[\begin{array}{c}{f_{1}(\mathbf{x})} \\ {\vdots} \\ {f_{m}(\mathbf{x})}\end{array}\right]$，這也是我此前說$f_i$也可以將$x$對映到$R^m$空間中的原因，這種堆疊方式其實與前述的對映在原理上是一致的。

記
\[ \begin{equation} J_{i}(\mathbf{x})=\frac{\partial f_{i}(\mathbf{x})}{\partial \mathbf{x}}=\left[\frac{\partial f_{i}(\mathbf{x})}{\partial x_{1}} \cdots \frac{\partial f_{i}(\mathbf{x})}{\partial x_{n}}\right] \end{equation} \]

\[ \begin{equation} J(\mathbf{x})=\left[\begin{array}{cccc}{\frac{\partial f_{1}(\mathbf{x})}{\partial \mathbf{x}}} & {\cdots} & {\frac{\partial f_{m}(\mathbf{x})}{\partial \mathbf{x}}}\end{array}\right]=\left[\begin{array}{ccc}{\frac{\partial f_{1}(\mathbf{x})}{\partial x_{1}}} & {\cdots} & {\frac{\partial f_{m}(\mathbf{x})}{\partial x_{n}}} \\ {\vdots} & {\ddots} & {\vdots} \\ {\frac{\partial f_{1}(\mathbf{x})}{\partial x_{1}}} & {\cdots} & {\frac{\partial f_{m}(\mathbf{x})}{\partial x_{n}}}\end{array}\right] \end{equation} \]

則$J_i(\mathbf{x})$是一個 $ 1 \times n $ 向量，而$J(x)$是一個 $ n \times n $ 矩陣。

將$f(x)$進行一階泰勒展開
\[ \begin{equation} l(\Delta \mathbf{x}) =f(\mathbf{x})+\mathbf{J}\Delta \mathbf{x}\approx f(x+\Delta \mathbf{x}) \end{equation} \]
則
\[ \begin{equation} F(x) \approx L(\Delta \mathbf{x}) = \frac{1}{2} l(\Delta \mathbf{x})^Tl(\Delta \mathbf{x}) =\frac{1}{2} (f(\mathbf{x})+\mathbf{J}\Delta \mathbf{x})^T(f(\mathbf{x})+\mathbf{J}\Delta \mathbf{x}) \\ =\frac{1}{2}f(x)^2 + f(x)^TJ \Delta x + \Delta x^T J^TJ\Delta \mathbf{x} \end{equation} \]
對$L(\Delta x)$求導並令其為0，則
\[ \begin{equation} \frac{\partial L(\Delta \mathbf{x})}{\partial \Delta \mathbf{x}} = f(x)^TJ + \Delta x^T J^TJ \\ = J^Tf(x) + J^TJ\Delta x = 0 \end{equation} \]
可得
\[ \begin{equation} \Delta x = -（J^TJ)^{-1}J^Tf(x) \end{equation} \]

高斯牛頓法有效的避免了Hessian矩陣的計算，可以加快運算速度。

5. 列文伯格-馬爾夸特法 (Levenberg-Marquardt)

相當於新增阻尼因子，目的是使步長不要太大，起到限制步長的作用。
\[ \begin{equation} \mathop {\min }\limits_{\Delta x} {F(\mathbf{x}+\Delta \mathbf{x})+\frac{1}{2}\mu \Delta \mathbf{x}^{\top} \Delta \mathbf{x}} \approx \frac{1}{2}f(x)^2 + f(x)^TJ \Delta x + \Delta x^T J^TJ\Delta \mathbf{x} + \frac{1}{2}\mu \Delta \mathbf{x}^{\top} \Delta \mathbf{x} \end{equation} \]
對$\Delta x$求導之後得到
\[ \begin{equation} f(x)^TJ +J^TJ\Delta x+\mu \Delta x = 0 \end{equation} \]
\[ \begin{equation} \\ \Delta x = -(J^TJ+\mu I)^{-1}f(x)^TJ \end{equation} \]

當然，這裡的步長都有一定的更新策略，而基本的方法就是上面這些內容。

LM方法在高斯牛頓方法的基礎上添加了正則化約束，能夠權衡步長與非線性程度之間的利弊。當迭代點附近的非線性程度比較高時，傾向於增大阻尼因子而減小步長，此時接近最速下降方法；而當系統近乎線性時，減小阻尼因子而增大步長，此時接近高斯牛頓方法。同時阻尼因子的引入也保證了$(J^TJ+\mu I)^{-1}$有意義。

參考文獻

[1] methods for non-linear least squares problems.

轉載請獲得作者同

非線性最小二乘問題的求解方法

1. 非線性最小二乘問題的定義

2. 最速下降法

3. 牛頓法

4. 高斯牛頓法（Gauss Newton）

5. 列文伯格-馬爾夸特法 (Levenberg-Marquardt)

非線性最小二乘問題的求解方法

演算法解析：L-M法求解非線性最小二乘問題

Levmar:Levenberg-Marquardt非線性最小二乘算法

非線性最小二乘之Guass-Newton和Levenberg-Marquardt

關於Matlab中的線性與非線性最小二乘擬合

雅克比矩陣、海森矩陣與非線性最小二乘間的關係與在SFM和Pose Estimation中的應用

線性最小二乘兩種方法

視覺SLAM常見的QR分解SVD分解等矩陣分解方式求解滿秩和虧秩最小二乘問題

求解線性最小二乘系統

牛頓法求解最小二乘問題（線性迴歸）

迭代求解最優化問題——最小二乘問題、高斯牛頓法

最小二乘迴歸樹Python實現——統計學習方法第五章課後題

求解非線性最小二乘法 Eigen

SVD(奇異值分解)及求解最小二乘問題

OpenCV 最小二乘擬合方法求取直線傾角

Matlab中Savitzky-Golay filtering（最小二乘平滑濾波）函式sgolayfilt的使用方法

機器學習---用python實現最小二乘線性回歸並用隨機梯度下降法求解（Machine Learning Least Squares Linear Regression Application SGD）

非線性函式的最小二乘擬合及在Jupyter notebook中輸入公式 [原創]

python 最小二乘 leastsq 函數實現

Spark機器學習(10)：ALS交替最小二乘算法

非線性最小二乘問題的求解方法

1. 非線性最小二乘問題的定義

2. 最速下降法

3. 牛頓法

4. 高斯牛頓法（Gauss Newton）

5. 列文伯格-馬爾夸特法 (Levenberg-Marquardt)

相關推薦