Udacity Deep Learning Notes (1)
Tags: Machine Learning
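I. Introduction to Neural Networks
1. Maximum likelihood
Multiply together the probabilities that each point is classified correctly to get the probability that all points are classified correctly. Compare this probability across models: the model with the largest value is the best one.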
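2. Cross entropy
A product of many probabilities becomes extremely small, so instead we take $-\ln$ of each probability and add them up: $-\ln(p_1 p_2 \cdots p_n) = -\sum_i \ln p_i$. This sum is the cross entropy. Because $-\ln$ is decreasing, a larger likelihood gives a smaller cross entropy, so a smaller cross entropy means a better model.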
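A small numeric sketch of the two points above. The probabilities are made up for two hypothetical models (`model_a`, `model_b`), each array holding the probability the model assigns to the correct class of four points. It shows that the model with the larger product has the smaller cross entropy, and that with many points the product underflows while the sum of $-\ln$ stays finite.

```python
import numpy as np

# Hypothetical probabilities each model assigns to the correct class of 4 points
model_a = np.array([0.6, 0.2, 0.1, 0.7])
model_b = np.array([0.7, 0.9, 0.8, 0.6])

def likelihood(p):
    # Maximum likelihood: multiply the probabilities of correct classification
    return np.prod(p)

def cross_entropy_from_probs(p):
    # Cross entropy: sum of -ln of the same probabilities
    return -np.sum(np.log(p))

for name, p in [("A", model_a), ("B", model_b)]:
    print(name, likelihood(p), cross_entropy_from_probs(p))

# With many points the product underflows, while the sum of -ln stays finite
many = np.full(2000, 0.5)
print(likelihood(many))                # 0.5**2000 underflows to 0.0 in float64
print(cross_entropy_from_probs(many))  # 2000 * ln 2, about 1386.29
```

Model B has the larger likelihood and therefore the smaller cross entropy.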
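2.1 Cross-entropy implementation
import numpy as np
def cross_entropy(Y, P):
    # Y: true labels (0 or 1); P: predicted probability that each point is class 1
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    # Sum of -[y ln(p) + (1 - y) ln(1 - p)] over all points
    return -np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P))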
2.2 Multi-class cross entropy
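With $n$ points and $m$ classes, the standard multi-class cross entropy is $CE = -\sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij}\ln p_{ij}$, where $y_{ij} = 1$ if point $i$ belongs to class $j$ (else 0) and $p_{ij}$ is the predicted probability of class $j$ for point $i$. A minimal sketch, assuming one-hot labels and strictly positive probabilities (a zero in `P` would give `log(0)`); the function name `multiclass_cross_entropy` is illustrative:

```python
import numpy as np

def multiclass_cross_entropy(Y, P):
    # Y: one-hot labels, shape (n_points, n_classes)
    # P: predicted class probabilities, same shape, each row sums to 1, all entries > 0
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    # -sum_i sum_j y_ij * ln(p_ij): only the true-class probability of each row contributes
    return -np.sum(Y * np.log(P))

Y = [[1, 0, 0],
     [0, 1, 0]]
P = [[0.7, 0.2, 0.1],
     [0.3, 0.5, 0.2]]
print(multiclass_cross_entropy(Y, P))  # -(ln 0.7 + ln 0.5), about 1.0498
```

With two classes this reduces to the binary cross entropy of section 2.1.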
3. Error function (cost function) of logistic regression
Goal: minimize the error function.
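For reference, the standard logistic-regression log-loss over $m$ points and its per-point gradient can be written as follows, with $\sigma$ the sigmoid and $\alpha$ the learning rate. The gradient is what produces the weight and bias updates in the gradient descent section.

```latex
\hat{y}_i = \sigma(w \cdot x_i + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

E(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y_i \ln \hat{y}_i + (1 - y_i) \ln (1 - \hat{y}_i) \Big]

% For a single point, using \sigma'(z) = \sigma(z)(1 - \sigma(z)):
\frac{\partial}{\partial w_j}\Big[ -y \ln \hat{y} - (1 - y) \ln (1 - \hat{y}) \Big]
  = -y (1 - \hat{y})\, x_j + (1 - y)\, \hat{y}\, x_j
  = -(y - \hat{y})\, x_j,
\qquad
\frac{\partial}{\partial b}\Big[ \cdots \Big] = -(y - \hat{y})

% Stepping against the gradient:
w_j \leftarrow w_j + \alpha (y - \hat{y})\, x_j, \qquad
b \leftarrow b + \alpha (y - \hat{y})
```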
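4. Gradient descent code
1. Start with random weights $w_1, \dots, w_n$ and bias $b$.
2. For every point $(x_1, \dots, x_n)$:
2.1 For $i = 1, \dots, n$: update $w_i \leftarrow w_i + \alpha (y - \hat{y}) x_i$
2.2 Update $b \leftarrow b + \alpha (y - \hat{y})$
3. Repeat step 2 until the error is small enough.
Here $\alpha$ is the learning rate and $\hat{y}$ is the model's prediction for the point.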
import numpy as np
# Activation (sigmoid) function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# Output (prediction) formula: y_hat = sigmoid(w . x + b)
def output_formula(features, weights, bias):
    return sigmoid(np.dot(features, weights) + bias)
# Error (log-loss) formula
def error_formula(y, output):
    return -y * np.log(output) - (1 - y) * np.log(1 - output)
# Gradient descent step: w <- w + learnrate * (y - y_hat) * x, b <- b + learnrate * (y - y_hat)
def update_weights(x, y, weights, bias, learnrate):
    output = output_formula(x, weights, bias)
    # Derivative of the log-loss with respect to the linear output w . x + b
    d_error = -(y - output)
    # Build new arrays instead of modifying the caller's weights in place
    weights = weights - learnrate * d_error * x
    bias = bias - learnrate * d_error
    return weights, bias
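A minimal sketch of how these functions could be driven by the three-step algorithm above, assuming a small made-up, linearly separable dataset. The `train` helper, the initialization scale, `epochs=200`, and `learnrate=0.5` are illustrative choices, not part of the course code; the functions are repeated so the block runs on its own.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def output_formula(features, weights, bias):
    return sigmoid(np.dot(features, weights) + bias)

def error_formula(y, output):
    return -y * np.log(output) - (1 - y) * np.log(1 - output)

def update_weights(x, y, weights, bias, learnrate):
    output = output_formula(x, weights, bias)
    d_error = -(y - output)
    return weights - learnrate * d_error * x, bias - learnrate * d_error

def train(features, targets, epochs, learnrate, seed=0):
    # Hypothetical helper. Step 1: random initial weights, zero bias
    rng = np.random.default_rng(seed)
    n_features = features.shape[1]
    weights = rng.normal(scale=1 / n_features ** 0.5, size=n_features)
    bias = 0.0
    losses = []
    # Step 3: repeat step 2 for a fixed number of epochs
    for _ in range(epochs):
        # Step 2: update the weights and bias once per point
        for x, y in zip(features, targets):
            weights, bias = update_weights(x, y, weights, bias, learnrate)
        # Mean log-loss over the whole dataset after each epoch
        losses.append(np.mean(error_formula(targets, output_formula(features, weights, bias))))
    return weights, bias, losses

# Made-up data: label 1 when x1 + x2 > 1
X = np.array([[0.1, 0.2], [0.3, 0.1], [0.2, 0.4],
              [0.9, 0.8], [0.7, 0.9], [0.8, 0.6]])
y = np.array([0, 0, 0, 1, 1, 1])
weights, bias, losses = train(X, y, epochs=200, learnrate=0.5)
predictions = (output_formula(X, weights, bias) > 0.5).astype(int)
print(losses[0], losses[-1], predictions)
```

On this toy data the loss falls over the epochs and all six points end up on the correct side of the 0.5 threshold.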
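5. Neural networks
When the data is not linearly separable, for example when the classes can only be separated by a curve, we use a neural network.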
5.1 Feedforward
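XOR is a standard example of data that no single straight line can separate. This sketch runs a feedforward pass through a 2-2-1 sigmoid network whose weights are hand-picked for illustration (they are not from the course and are not learned): hidden unit 1 behaves like OR, hidden unit 2 like NAND, and the output unit like AND of the two, which together give XOR.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def feedforward(X, W1, b1, W2, b2):
    # Input layer -> hidden layer: linear combination, then sigmoid
    hidden = sigmoid(X @ W1 + b1)
    # Hidden layer -> output layer: another linear combination, then sigmoid
    return sigmoid(hidden @ W2 + b2)

# Hand-picked weights: column 0 ~ OR, column 1 ~ NAND
W1 = np.array([[20.0, -20.0],
               [20.0, -20.0]])
b1 = np.array([-10.0, 30.0])
# Output ~ AND of the two hidden units
W2 = np.array([20.0, 20.0])
b2 = -30.0

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
out = feedforward(X, W1, b1, W2, b2)
print((out > 0.5).astype(int))  # [0 1 1 0]
```

Each hidden unit is still a linear boundary; combining them in the output layer is what produces the non-linear decision region.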
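5.2 Backpropagation
Backpropagation consists of:
1. Doing a feedforward pass.
2. Comparing the output of the model with the desired output.
3. Calculating the error.
4. Running the feedforward operation backwards (backpropagation) to spread the error to each of the weights.
5. Updating the weights to get a better model.
6. Repeating this until the model is good enough.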
II. Gradient Descent for Neural Networks
1. Gradient descent implementation
import numpy as np
# Defining the sigmoid function for activations
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# Derivative of the sigmoid function
def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))
# Input data
x = np.array([0.1, 0.3])
# Target
y = 0.2
# Input to output weights
weights = np.array([-0.8, 0.5])
# The learning rate, eta in the weight step equation
learnrate = 0.5
# The linear combination performed by the node (h in f(h) and f'(h))
h = x[0] * weights[0] + x[1] * weights[1]
# or h = np.dot(x, weights)
# The neural network output (y-hat)
nn_output = sigmoid(h)
# Output error (y - y-hat)
error = y - nn_output
# Output gradient (f'(h))
output_grad = sigmoid_prime(h)
# Error term (lowercase delta)
error_term = error * output_grad
# Gradient descent step: the change to apply to each weight
del_w = [learnrate * error_term * x[0],
         learnrate * error_term * x[1]]
# or del_w = learnrate * error_term * x
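The code above computes one weight step `del_w` but does not apply it. A minimal sketch that applies the same step repeatedly, using the same `x`, `y`, `weights`, and `learnrate`; the iteration count of 5000 is an arbitrary choice. The output starts near 0.517 and moves toward the target 0.2.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

x = np.array([0.1, 0.3])
y = 0.2
weights = np.array([-0.8, 0.5])
learnrate = 0.5

def output(weights):
    # Network output for the fixed input x
    return sigmoid(np.dot(x, weights))

first = output(weights)
for _ in range(5000):
    h = np.dot(x, weights)
    error_term = (y - sigmoid(h)) * sigmoid_prime(h)
    # Apply del_w = learnrate * error_term * x
    weights = weights + learnrate * error_term * x
print(first, output(weights))
```

Convergence is slow here because the inputs are small and the sigmoid derivative is at most 0.25, so each step moves `h` only a little.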
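2. Backpropagation example
The network has two inputs $x = (0.1, 0.3)$, one hidden unit, and one output unit, all with sigmoid activations. The input-to-hidden weights are $w = (0.4, -0.2)$, the hidden-to-output weight is $W = 0.1$, the target is $y = 1$, and the learning rate is $\alpha = 0.5$.
First run the forward pass from the input layer to the hidden node:
$h = \sum_i w_i x_i = 0.1 \times 0.4 - 0.2 \times 0.3 = -0.02$
The output of the hidden node is
$a = f(h) = \mathrm{sigmoid}(-0.02) \approx 0.495$
Using this as the input of the output node, the network output is
$\hat{y} = f(W a) = \mathrm{sigmoid}(0.1 \times 0.495) \approx 0.512$
With the network output in hand, backpropagation updates the weights of each layer. The derivative of the sigmoid function is
$f'(W a) = f(W a)\,(1 - f(W a))$
so the error term of the output node is
$\delta^o = (y - \hat{y})\, f'(W a) = (1 - 0.512) \times 0.512 \times (1 - 0.512) \approx 0.122$
The error term of hidden node $j$ is
$\delta^h_j = \sum_k W_{jk}\, \delta^o_k\, f'(h_j)$
Since there is only one hidden node and one output node,
$\delta^h = W \delta^o f'(h) = 0.1 \times 0.122 \times 0.495 \times (1 - 0.495) \approx 0.003$
Now compute the gradient descent steps. The step for the hidden-to-output weight is the learning rate times the output error term times the hidden activation:
$\Delta W = \alpha\, \delta^o a = 0.5 \times 0.122 \times 0.495 \approx 0.0302$
The step for each input-to-hidden weight $w_i$ is the learning rate times the hidden error term times the input value:
$\Delta w_i = \alpha\, \delta^h x_i = (0.5 \times 0.003 \times 0.1,\ 0.5 \times 0.003 \times 0.3) = (0.00015,\ 0.00045)$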
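A short sketch that reproduces the numbers of the worked example, so each step can be checked. The variable names are illustrative; the values are the ones used in the example.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([0.1, 0.3])    # inputs
w = np.array([0.4, -0.2])   # input -> hidden weights
W = 0.1                     # hidden -> output weight
y = 1.0                     # target
learnrate = 0.5

# Forward pass
h = np.dot(w, x)            # -0.02
a = sigmoid(h)              # about 0.495
y_hat = sigmoid(W * a)      # about 0.512

# Backward pass
delta_o = (y - y_hat) * y_hat * (1 - y_hat)  # about 0.122
delta_h = W * delta_o * a * (1 - a)          # about 0.00305

# Weight steps
dW = learnrate * delta_o * a   # about 0.0302
dw = learnrate * delta_h * x   # about [0.000152, 0.000457]
print(h, a, y_hat, delta_o, delta_h, dW, dw)
```

The unrounded $\delta^h \approx 0.00305$ gives slightly larger input-to-hidden steps than the rounded 0.003 used in the hand calculation.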
3. Backpropagation implementation
import numpy as np
from data_prep import features, targets, features_test, targets_test
np.random.seed(21)
def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))
# Hyperparameters
n_hidden = 2  # number of hidden units
epochs = 900
learnrate = 0.005
n_records, n_features = features.shape
last_loss = None
# Initialize weights
weights_input_hidden = np.random.normal(scale=1 / n_features ** .5,
                                        size=(n_features, n_hidden))
weights_hidden_output = np.random.normal(scale=1 / n_features ** .5,
                                         size=n_hidden)
for e in range(epochs):
    del_w_input_hidden = np.zeros(weights_input_hidden.shape)
    del_w_hidden_output = np.zeros(weights_hidden_output.shape)
    for x, y in zip(features.values, targets):
        ## Forward pass ##
        # Calculate the output
        hidden_input = np.dot(x, weights_input_hidden)
        hidden_output = sigmoid(hidden_input)
        output = sigmoid(np.dot(hidden_output,
                                weights_hidden_output))
        ## Backward pass ##
        # Calculate the network's prediction error
        error = y - output
        # Calculate the error term for the output unit
        output_error_term = error * output * (1 - output)
        ## Propagate errors to the hidden layer
        # Calculate the hidden layer's contribution to the error
        hidden_error = np.dot(output_error_term, weights_hidden_output)
        # Calculate the error term for the hidden layer
        hidden_error_term = hidden_error * hidden_output * (1 - hidden_output)
        # Accumulate the change in weights
        del_w_hidden_output += output_error_term * hidden_output
        del_w_input_hidden += hidden_error_term * x[:, None]
    # Update weights once per epoch with the averaged steps
    weights_input_hidden += learnrate * del_w_input_hidden / n_records
    weights_hidden_output += learnrate * del_w_hidden_output / n_records
    # Print out the mean squared error on the training set
    if e % (epochs // 10) == 0:
        # Use all training records, not only the last x from the loop
        hidden_output = sigmoid(np.dot(features.values, weights_input_hidden))
        out = sigmoid(np.dot(hidden_output,
                             weights_hidden_output))
        loss = np.mean((out - targets) ** 2)
        if last_loss and last_loss < loss:
            print("Train loss: ", loss, "  WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss
# Calculate accuracy on test data
hidden = sigmoid(np.dot(features_test, weights_input_hidden))
out = sigmoid(np.dot(hidden, weights_hidden_output))
predictions = out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))
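The code above depends on the course's `data_prep` module, which is not included in these notes. As a self-contained sketch, the same network (two hidden units, sigmoid activations, no bias terms, weights updated once per epoch) can be trained on a made-up dataset. Because the original loop only updates the weights at the end of each epoch, the per-record accumulation can be written as matrix products. The synthetic data, the `train`/`predict` helpers, and `learnrate=0.5` are assumptions for illustration, not course values.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train(features, targets, n_hidden=2, epochs=900, learnrate=0.5, seed=21):
    """Full-batch form of the loop above (no bias terms, sigmoid units)."""
    rng = np.random.default_rng(seed)
    n_records, n_features = features.shape
    w_ih = rng.normal(scale=1 / n_features ** 0.5, size=(n_features, n_hidden))
    w_ho = rng.normal(scale=1 / n_features ** 0.5, size=n_hidden)
    losses = []
    for _ in range(epochs):
        # Forward pass for every record at once
        hidden = sigmoid(features @ w_ih)   # shape (n_records, n_hidden)
        output = sigmoid(hidden @ w_ho)     # shape (n_records,)
        losses.append(np.mean((targets - output) ** 2))
        # Backward pass: same error terms as the per-record loop
        output_error_term = (targets - output) * output * (1 - output)
        hidden_error_term = np.outer(output_error_term, w_ho) * hidden * (1 - hidden)
        # Matrix products sum the per-record steps; divide to average them
        w_ho += learnrate * hidden.T @ output_error_term / n_records
        w_ih += learnrate * features.T @ hidden_error_term / n_records
    return w_ih, w_ho, losses

def predict(features, w_ih, w_ho):
    return sigmoid(sigmoid(features @ w_ih) @ w_ho) > 0.5

# Made-up data: label 1 when x0 + x1 > 0 (purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
t = (X[:, 0] + X[:, 1] > 0).astype(float)
X_train, t_train, X_test, t_test = X[:160], t[:160], X[160:], t[160:]

w_ih, w_ho, losses = train(X_train, t_train)
accuracy = np.mean(predict(X_test, w_ih, w_ho) == t_test)
print("first loss {:.4f}, last loss {:.4f}, test accuracy {:.3f}".format(
    losses[0], losses[-1], accuracy))
```

The learning rate here is much larger than the course's 0.005 only because this toy problem is small; on real data it would need tuning.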
III. Training Neural Networks
1. Regularization
1. L1 regularization tends to produce sparse weight vectors.