Machine Learning -- Week 1: Supervised Learning, Hypothesis Function, Cost Function, and the Gradient Descent Algorithm

  • Supervised Learning
    • given labelled data for training, then used to make predictions
    • used for regression problems and classification problems
  • Unsupervised Learning
    • derive structure from data where we don't necessarily know the effect of the variables
    • no feedback based on the prediction results
    • Clustering algorithms are just one type of Unsupervised Learning
    • the Cocktail Party Algorithm is a non-clustering example
  • The difference is whether there is supervision: if the input data are labelled, it is supervised learning; if they are not, it is unsupervised learning.

The difference between classification and regression lies in the type of the output variable.

A quantitative output is called regression, i.e. continuous-variable prediction;
a qualitative output is called classification, i.e. discrete-variable prediction.

For example:

Predicting tomorrow's temperature is a regression task;
predicting whether tomorrow will be cloudy, sunny or rainy is a classification task.

The trained prediction function is conventionally named h (short for hypothesis).

How do we represent h? For example: \(h_{\theta}(x) = \theta_{0} + \theta_{1}x\), where \(\theta_{i}\) are the parameters of the model.

Adjust \(\theta_{i}\) so that \(\sum_{i=1}^{m} (h_{\theta}(x^{(i)})-y^{(i)})^{2}\) is as small as possible, where m is the size of the training set. To minimize the average error rather than the total error, the summation is usually written as \(\frac{1}{2m}\sum_{i=1}^{m} (h_{\theta}(x^{(i)})-y^{(i)})^{2}\) (the extra factor of \(\frac{1}{2}\) simplifies the derivative later).

\(J(\theta_{0}, \theta_{1}) = \frac{1}{2m}\sum_{i=1}^{m} (h_{\theta}(x^{(i)})-y^{(i)})^{2}\) is the so-called cost function, also known as the squared error function.

To summarize:

Hypothesis:
\[ h_{\theta}(x) = \theta _{0} + \theta _{1}x \]
Parameters:
\[ \theta_{0},\theta_{1} \]
Cost Function:
\[ J(\theta_{0}, \theta_{1}) = \frac{1}{2m} \sum_{i=1}^{m} (h_{\theta}(x^{(i)})-y^{(i)})^{2} \]
Goal:
\[ \underset{\theta_{0},\theta_{1}}{\text{minimize}}\; J(\theta_{0},\theta_{1}) \]
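As a quick check of these definitions, here is a minimal Python sketch (not part of the original notes); the toy data, the function names `hypothesis` and `cost`, and the sample parameter values are assumptions for illustration.

```python
# Minimal sketch of the hypothesis and the squared-error cost function.
# The toy data and parameter values below are illustrative assumptions.

def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x"""
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = (1 / 2m) * sum_i (h(x_i) - y_i)^2"""
    m = len(xs)
    return sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]       # toy inputs
    ys = [2.0, 4.1, 5.9, 8.2]       # toy targets, roughly y = 2x
    print(cost(0.0, 2.0, xs, ys))   # small cost: theta1 = 2 fits the data well
    print(cost(0.0, 0.0, xs, ys))   # larger cost: h(x) = 0 fits poorly
```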

Use the Gradient Descent Algorithm to minimize the cost function.

Gradient Descent Algorithm (pseudocode):
\[ \begin{aligned} &\text{repeat until convergence}\;\{\\ &\qquad \theta_{j} := \theta_{j} - \alpha\frac{\partial}{\partial \theta_{j}}J(\theta_{0},\theta_{1}) \qquad (j = 0, 1)\\ &\} \end{aligned} \]
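To make the update rule concrete, here is a hedged Python sketch of the generic loop above. It estimates the partial derivatives numerically with central differences, so it works for any differentiable cost; the function name `gradient_descent`, the tolerance, and the example cost are illustrative assumptions, not part of the course material.

```python
# Sketch of the generic update rule above, using central-difference
# approximations of the partial derivatives so it works for any cost J.

def gradient_descent(J, theta, alpha=0.1, tol=1e-9, max_iters=10000, eps=1e-6):
    theta = list(theta)
    for _ in range(max_iters):
        # numerically estimate dJ/dtheta_j for every j
        grads = []
        for j in range(len(theta)):
            plus, minus = list(theta), list(theta)
            plus[j] += eps
            minus[j] -= eps
            grads.append((J(plus) - J(minus)) / (2 * eps))
        # compute all gradients first, then update every theta_j together
        new_theta = [t - alpha * g for t, g in zip(theta, grads)]
        if max(abs(n - t) for n, t in zip(new_theta, theta)) < tol:
            return new_theta
        theta = new_theta
    return theta

# Example: minimize J(theta) = (theta0 - 1)^2 + (theta1 + 2)^2, minimum at (1, -2)
print(gradient_descent(lambda th: (th[0] - 1) ** 2 + (th[1] + 2) ** 2, [0.0, 0.0]))
```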

where \(\alpha\) is a number called the learning rate, which controls the step size of gradient descent:

  • if \(\alpha\) is too small: the gradient descent algorithm can be too slow
  • if \(\alpha\) is too large: gradient descent can overshoot the minimum; it may fail to converge, or even diverge
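A tiny numeric experiment shows both failure modes. The sketch below (my own illustration, not from the notes) runs plain gradient descent on \(J(\theta)=\theta^{2}\), whose derivative is \(2\theta\) and whose minimum is at \(\theta=0\), with three assumed values of \(\alpha\).

```python
# Effect of the learning rate on J(theta) = theta^2 (gradient: 2 * theta).
# The specific alpha values are illustrative assumptions.

def run(alpha, theta=1.0, steps=20):
    for _ in range(steps):
        theta = theta - alpha * 2 * theta   # one gradient descent step
    return theta

print(run(alpha=0.01))   # too small: still far from 0 after 20 steps
print(run(alpha=0.4))    # reasonable: essentially at the minimum
print(run(alpha=1.1))    # too large: overshoots and diverges (|theta| keeps growing)
```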

An explanation of the algorithm:

When increasing \(\theta_{j}\) causes \(J(\theta_0,\theta_1)\) to increase, the partial derivative is \(>0\), so the effect of the update is to decrease \(\theta_j\); when increasing \(\theta_{j}\) causes \(J(\theta_0,\theta_1)\) to decrease, the partial derivative is \(<0\), so the update increases \(\theta_j\).

In this way \(\theta_j\) gradually slides down toward a point where the gradient is 0.

Moreover, even with a fixed \(\alpha\), the size of each change to \(\theta_{j}\) keeps shrinking during gradient descent, because the partial derivative tends to 0; so there is no need to further reduce \(\alpha\) as a local optimum is approached.
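This shrinking step size is easy to see numerically. The short sketch below (again an illustration with assumed values, using \(J(\theta)=\theta^{2}\)) prints the magnitude of \(\alpha\,\frac{dJ}{d\theta}\) at each iteration while \(\alpha\) stays constant.

```python
# With a fixed alpha, the step size |alpha * dJ/dtheta| still shrinks as theta
# approaches the minimum, because the derivative itself goes to 0.
# J(theta) = theta^2 and alpha = 0.1 are illustrative assumptions.

theta, alpha = 1.0, 0.1
for i in range(5):
    step = alpha * 2 * theta          # alpha * dJ/dtheta, with dJ/dtheta = 2 * theta
    theta -= step
    print(f"iteration {i}: step = {step:.4f}, theta = {theta:.4f}")
# The printed step sizes decrease monotonically even though alpha never changes.
```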

One thing to note: all \(\theta_{i}\) must be updated simultaneously, so we cannot simply apply expressions such as \(\theta_{0} := \theta_{0} - \alpha\frac{\partial}{\partial \theta_{0}}J(\theta_{0},\theta_{1})\) one after another; instead we should write:

$$
\begin{aligned}
temp_0 &:= \theta_0 - \alpha\frac{\partial}{\partial \theta_{0}}J(\theta_{0},\theta_{1})\\
temp_1 &:= \theta_1 - \alpha\frac{\partial}{\partial \theta_{1}}J(\theta_{0},\theta_{1})\\
\theta_{0} &:= temp_0\\
\theta_{1} &:= temp_1
\end{aligned}
$$

Only in this way can we avoid the \(J(\theta_0,\theta_1)\) used in the first and second expressions being inconsistent.

The gradient descent algorithm requires all \(\theta_{i}\) to be updated synchronously, as the sketch below illustrates.
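The sketch below (toy data and \(\alpha\) chosen only for illustration) contrasts one correct simultaneous update with the incorrect sequential version for the linear-regression cost; the two produce different values of \(\theta_1\).

```python
# Contrast of simultaneous vs. sequential updates for a single gradient-descent
# step on linear regression. The toy data and alpha are illustrative assumptions.

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
m = len(xs)
alpha = 0.1

def grad0(t0, t1):
    # dJ/dtheta0 = (1/m) * sum(h(x_i) - y_i)
    return sum((t0 + t1 * x - y) for x, y in zip(xs, ys)) / m

def grad1(t0, t1):
    # dJ/dtheta1 = (1/m) * sum((h(x_i) - y_i) * x_i)
    return sum((t0 + t1 * x - y) * x for x, y in zip(xs, ys)) / m

theta0 = theta1 = 0.0

# Correct: both gradients are evaluated at the OLD (theta0, theta1)
temp0 = theta0 - alpha * grad0(theta0, theta1)
temp1 = theta1 - alpha * grad1(theta0, theta1)
print("simultaneous:", temp0, temp1)

# Incorrect: theta0 is overwritten first, so grad1 sees an inconsistent J
wrong0 = theta0 - alpha * grad0(theta0, theta1)
wrong1 = theta1 - alpha * grad1(wrong0, theta1)   # uses the already-updated theta0
print("sequential:  ", wrong0, wrong1)             # theta1 differs from temp1
```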

Note: different starting points may lead to different local minima.

Substituting the expression for \(J(\theta_0,\theta_1)\) into the Gradient Descent Algorithm gives (pseudocode):
\[ \begin{aligned} &\text{repeat until convergence}\;\{\\ &\qquad \theta_{0} := \theta_{0} - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_{\theta}(x^{(i)})-y^{(i)}\right)\\ &\qquad \theta_{1} := \theta_{1} - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_{\theta}(x^{(i)})-y^{(i)}\right)\cdot x^{(i)}\\ &\} \end{aligned} \]
This form is called the “Batch” Gradient Descent Algorithm:

“Batch”: each step of gradient descent uses all the training examples.
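Putting everything together, here is a minimal batch-gradient-descent sketch for one-variable linear regression that follows the update rules above; the data set, learning rate, and iteration count are assumptions chosen only for illustration.

```python
# Minimal batch gradient descent for one-variable linear regression.
# Every step sums over all m training examples, which is what "batch" refers to.
# Data, alpha, and the iteration count are illustrative assumptions.

def batch_gradient_descent(xs, ys, alpha=0.05, iterations=2000):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                             # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta1
        # simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]        # roughly y = 2x + 1
print(batch_gradient_descent(xs, ys))  # expect theta0 near 1 and theta1 near 2
```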