
Coursera Stanford Andrew Ng Machine Learning: Programming Assignments

I. Week 2 Programming Assignment: Linear Regression

1.computeCost.m

  • Formula:

J\left ( \theta \right ) = \frac{1}{2m}\sum_{i=1}^m\left ( h_\theta\left ( x^{\left ( i \right )} \right ) - y^{\left ( i \right )}\right )^2

h_\theta\left ( x \right ) = \theta^Tx

  • Code:
% Compute the cost J
tmp = (X * theta - y) .^ 2;   % squared error for each training example
J = 1 / (2 * m) * sum(tmp);
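
A quick sanity check on a tiny hand-made data set (the values are illustrative, not from the assignment): with a theta that fits the data exactly, the returned cost should be 0.

% Illustrative sanity check for computeCost (not the assignment data)
X = [ones(3, 1), [1; 2; 3]];   % design matrix with the intercept column x0 = 1
y = [2; 4; 6];                 % targets
theta = [0; 2];                % these parameters reproduce y exactly
J = computeCost(X, y, theta);  % expected: J = 0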

2.gradientDescent.m

  • Formula (the update is applied to each parameter \theta_j, not to the cost J):

\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^m\left ( h_\theta\left ( x^{\left ( i \right )} \right ) - y^{\left ( i \right )}\right )x_j^{\left ( i \right )}

  • Code:
% One gradient-descent update, vectorized over all parameters at once
tmp = X' * (X * theta - y);
theta = theta - alpha * 1/m * tmp;
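
In the provided gradientDescent.m skeleton this update sits inside the iteration loop, roughly as follows (a sketch; num_iters and J_history come from the assignment template):

% Sketch of the surrounding loop from the gradientDescent.m template
for iter = 1:num_iters
    tmp = X' * (X * theta - y);
    theta = theta - alpha * 1/m * tmp;
    J_history(iter) = computeCost(X, y, theta);  % record the cost to verify it decreases
end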

3.featureNormalize.m

  • Code:
% Feature normalization: (X - mean) / standard deviation
mu = mean(X); % per-feature mean
sigma = std(X); % per-feature standard deviation
tmp = bsxfun(@minus, X, mu);
X_norm = bsxfun(@rdivide, tmp, sigma);
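
When predicting for a new example later, the same mu and sigma from the training set must be applied before the learned theta is used (an illustrative sketch; the feature values are made up):

% Illustrative prediction with normalized features (values are made up)
x_new = [1650, 3];               % e.g. house size and number of bedrooms
x_new = (x_new - mu) ./ sigma;   % normalize with the training-set statistics
price = [1, x_new] * theta;      % prepend x0 = 1, then predict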

4.computeCostMulti.m

Note: the same vectorized computation as in computeCost.m applies.

5.gradientDescentMulti.m

Note: the same vectorized computation as in gradientDescent.m applies.

6.normalEqn.m

  • Formula:

\theta = \left ( X^TX \right )^{-1}X^T\vec{y}

  • Code:
% Compute theta with the normal equation; pinv still works when X'X is singular
theta = pinv(X' * X) * X' * y;
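
Unlike gradient descent, the normal equation needs no feature normalization and no learning rate; the design matrix only needs the intercept column. A minimal usage sketch (the feature values are illustrative):

% Illustrative use of normalEqn on raw (un-normalized) features
theta = normalEqn(X, y);        % X must still include the x0 = 1 column
price = [1, 1650, 3] * theta;   % predict directly on the raw feature values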

II. Week 3 Programming Assignment: Logistic Regression

1.sigmoid.m

  • Formula:

g\left ( z \right ) = \frac{1}{1 + e ^ {-z}}

  • Code:
% Sigmoid function, applied element-wise so z may be a scalar, vector, or matrix
g = 1 ./ (1 + exp(-z));

2.costFunction.m

  • Formula:

J\left ( \theta \right ) = \frac{1}{m}\sum_{i=1}^m\left [-y^{\left ( i \right )}\log\left (h_\theta\left ( x^{\left ( i \right )} \right ) \right ) - \left ( 1- y^{\left ( i \right )} \right ) \log\left (1-h_\theta\left ( x^{\left ( i \right )} \right ) \right ) \right ]

\frac{\partial J\left ( \theta \right )}{\partial \theta_j } = \frac{1}{m}\sum_{i=1}^m\left ( h_\theta\left ( x^{\left ( i \right )} \right ) - y^{\left ( i \right )}\right )x_j^{\left ( i \right )}

  • Code:
% Cost J
tmp = sigmoid(X * theta);
J = 1 / m * sum(-y' * log(tmp) - (1 - y)' * log(1 - tmp));

% Gradient grad
tmp = X' * (sigmoid(X * theta) - y);
grad = (1 / m) * tmp;
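
In the exercise script the cost function is not minimized by a hand-written loop; it is handed to an optimizer. A sketch of the fminunc call, following the pattern used in the ex2 assignment script:

% Sketch: let fminunc find theta, using the cost and gradient from costFunction
initial_theta = zeros(size(X, 2), 1);
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);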

3.predict.m

  • Code:
% Set p to 1 wherever the predicted probability is at least 0.5
p(find(sigmoid(X * theta) >= 0.5)) = 1;
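
The find call can be dropped entirely, since a logical comparison already yields a 0/1 vector; an equivalent alternative:

% Equivalent vectorized form: the comparison itself produces 0/1 predictions
p = sigmoid(X * theta) >= 0.5;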

4.costFunctionReg.m

  • Formula:

J\left ( \theta \right ) = \frac{1}{m}\sum_{i=1}^m\left [-y^{\left ( i \right )}\log\left (h_\theta\left ( x^{\left ( i \right )} \right ) \right ) - \left ( 1- y^{\left ( i \right )} \right ) \log\left (1-h_\theta\left ( x^{\left ( i \right )} \right ) \right ) \right ] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2

for j = 0:      \frac{\partial J\left ( \theta \right )}{\partial \theta_j } = \frac{1}{m}\sum_{i=1}^m\left ( h_\theta\left ( x^{\left ( i \right )} \right ) - y^{\left ( i \right )}\right )x_j^{\left ( i \right )}

for j >= 1:     \frac{\partial J\left ( \theta \right )}{\partial \theta_j } = \frac{1}{m}\sum_{i=1}^m\left ( h_\theta\left ( x^{\left ( i \right )} \right ) - y^{\left ( i \right )}\right )x_j^{\left ( i \right )} + \frac{\lambda}{m} \theta_j

  • Code:
tmp = sigmoid(X * theta);
% Cost J; theta(1) is excluded from the regularization term
J = 1 / m * sum(-y' * log(tmp) - (1 - y)' * log(1 - tmp)) + ...
    lambda/(2 * m) * sum(theta(2:end) .^ 2);

% Gradient grad; the first parameter is not penalized
grad = (1 / m) * X' * (tmp - y) + lambda * theta / m;
grad(1) = (1 / m) * X(:, 1)' * (tmp - y);
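
An equivalent way to leave theta(1) unpenalized is to zero out that entry before adding the regularization term, which avoids the separate correction line (an alternative formulation, not the only accepted one):

% Alternative: zero out the bias entry so it never contributes to the penalty
theta_reg = [0; theta(2:end)];
grad = (1 / m) * X' * (tmp - y) + (lambda / m) * theta_reg;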

III. Week 4 Programming Assignment: Multi-class Classification and Neural Networks

1.lrCostFunction.m

  • Formula:

J\left ( \theta \right ) = \frac{1}{m}\sum_{i=1}^m\left [-y^{\left ( i \right )}\log\left (h_\theta\left ( x^{\left ( i \right )} \right ) \right ) - \left ( 1- y^{\left ( i \right )} \right ) \log\left (1-h_\theta\left ( x^{\left ( i \right )} \right ) \right ) \right ] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2

for j = 0:

\frac{\partial J\left ( \theta \right )}{\partial \theta_j } = \frac{1}{m}\sum_{i=1}^m\left ( h_\theta\left ( x^{\left ( i \right )} \right ) - y^{\left ( i \right )}\right )x_j^{\left ( i \right )}

for j >= 1:

\frac{\partial J\left ( \theta \right )}{\partial \theta_j } = \frac{1}{m}\sum_{i=1}^m\left ( h_\theta\left ( x^{\left ( i \right )} \right ) - y^{\left ( i \right )}\right )x_j^{\left ( i \right )} + \frac{\lambda}{m} \theta_j

  • Code (the same regularized cost and gradient as in costFunctionReg.m):
tmp = sigmoid(X * theta);
% Cost J; theta(1) is excluded from the regularization term
J = 1 / m * sum(-y' * log(tmp) - (1 - y)' * log(1 - tmp)) + ...
    lambda/(2 * m) * sum(theta(2:end) .^ 2);

% Gradient grad; the first parameter is not penalized
grad = (1 / m) * X' * (tmp - y) + lambda * theta / m;
grad(1) = (1 / m) * X(:, 1)' * (tmp - y);

2.oneVsAll.m

  • Code:
% Train one regularized logistic-regression classifier per class with fmincg
initial_theta = zeros(n + 1, 1);
options = optimset('GradObj', 'on', 'MaxIter', 50);

for c = 1:num_labels
  % (y == c) marks class c as positive and every other class as negative
  [theta] = ...
	fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
	initial_theta, options);
  all_theta(c,:) = theta';   % store the learned parameters as row c
end

3.predictOneVsAll.m

  • Code:
% For each example, predict the class whose classifier gives the highest score
for r = 1:m
  [maxnum p(r)] = max(sigmoid(X(r,:) * all_theta'), [], 2);
end
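
The loop can also be collapsed into a single call, since max along dimension 2 already works row-wise over the whole matrix (an equivalent alternative):

% Equivalent vectorized form: argmax over classes for all rows at once
[max_vals, p] = max(sigmoid(X * all_theta'), [], 2);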

4.predict.m

  • Code:
X = [ones(m, 1) X];   % add the bias unit to the input layer

for r = 1:m
  tmp1 = sigmoid(X(r,:) * Theta1');   % hidden-layer activations for example r
  tmp1 = [ones(1,1) tmp1];            % add the hidden-layer bias unit
  tmp2 = sigmoid(tmp1 * Theta2');     % output-layer activations
  [maxnum p(r)] = max(tmp2, [], 2);   % predicted class = index of the largest output
end
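
As with predictOneVsAll.m, the per-row loop is not required; the forward pass can be done for all m examples at once (an equivalent alternative):

% Equivalent vectorized forward pass; X already has the bias column added above
a2 = [ones(m, 1) sigmoid(X * Theta1')];
[max_vals, p] = max(sigmoid(a2 * Theta2'), [], 2);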

IV. Week 5 Programming Assignment: Neural Network Learning

1.nnCostFunction.m

  • Formula:

Cost J:

J\left ( \Theta \right ) = \frac{1}{m}\sum_{i=1}^m\sum_{k=1}^K\left [-y_k^{\left ( i \right )}\log\left (\left (h_\Theta\left ( x^{\left ( i \right )} \right ) \right )_k \right ) - \left ( 1- y_k^{\left ( i \right )} \right ) \log\left (1-\left (h_\Theta\left ( x^{\left ( i \right )} \right ) \right )_k \right ) \right ] + \frac{\lambda}{2m}\bigg [ \sum_{j=1}^{n_1}\sum_{k=1}^{K_1} \left ( \Theta_{j,k}^{\left ( 1 \right )} \right )^2 + \sum_{j=1}^{n_2}\sum_{k=1}^{K_2} \left ( \Theta_{j,k}^{\left ( 2 \right )} \right )^2 \bigg ]

Backpropagation:

\delta_k^{\left ( 3 \right )} = a_k^{\left ( 3 \right )} - y_k

\delta^{\left ( 2 \right )} = \left ( \Theta^{\left ( 2 \right )} \right )^T\delta^{\left ( 3 \right )} .* g'\left ( z^{\left ( 2 \right )} \right )

Gradient:

\Delta^{\left ( l \right )} = \Delta^{\left ( l \right )} + \delta^{\left ( l + 1 \right )}\left ( a^{\left ( l \right )} \right )^T

for j = 0:

\frac{\partial}{\partial \Theta_{i,j}^{\left ( l \right )} }J\left ( \Theta \right ) = D_{i,j}^{\left ( l \right )} = \frac{1}{m}\Delta_{i,j}^{\left ( l \right )}

for j >= 1:

\frac{\partial}{\partial \Theta_{i,j}^{\left ( l \right )} }J\left ( \Theta \right ) = D_{i,j}^{\left ( l \right )} = \frac{1}{m}\Delta_{i,j}^{\left ( l \right )} + \frac{\lambda}{m} \Theta_{i,j}^{\left ( l \right )}

  • Code:
% Prepare the input matrix and the label matrix
X = [ones(m, 1) X];	% add x0 = 1
y = eye(num_labels, num_labels)(y, :);	% expand y into one-hot rows: an m x num_labels matrix

% Compute a(2)
z2 = X * Theta1';
a2 = [ones(m, 1) sigmoid(z2)];

% Compute a(3)
a3 = sigmoid(a2 * Theta2');

% Cost J
tmp1 = sum(sum(-y .* log(a3) - (1 - y) .* log(1 - a3)));
tmp2 = sum(sum(Theta1(:,2 : end) .^ 2)) + ...
	sum(sum(Theta2(:,2 : end) .^ 2));  	% regularization; the bias columns are not penalized
J = 1 / m * tmp1 + lambda / (2 * m) * tmp2;

% Backpropagation: gradients
delta3 = (a3 - y);
delta2 = delta3 * Theta2(:, 2 : end) .* sigmoidGradient(z2);

Theta1_grad = 1 / m * (delta2' * X);
Theta1_grad(:,2 : end) = Theta1_grad(:,2 : end) + (lambda / m * Theta1(:,2 : end));
Theta2_grad = 1 / m * (delta3' * a2);
Theta2_grad(:,2 : end) = Theta2_grad(:,2 : end) + (lambda / m * Theta2(:,2 : end));
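
After both gradient matrices are filled in, the provided skeleton unrolls them into the single vector that fmincg expects; a short reminder sketch of that step:

% Unroll the gradients into one vector, as the rest of the ex4 pipeline expects
grad = [Theta1_grad(:); Theta2_grad(:)];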

2.sigmoidGradient.m

  • Formula:

g^\prime\left ( z \right ) = \frac{\partial}{\partial z}g\left ( z \right )=g\left ( z \right )\left ( 1- g\left ( z \right ) \right )

  • Code:
% Element-wise derivative of the sigmoid
g = sigmoid(z) .* (1 - sigmoid(z));
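
A quick sanity check: the derivative peaks at z = 0 with value 0.25 and falls off symmetrically (the evaluation points below are illustrative):

% Illustrative check of sigmoidGradient
sigmoidGradient(0)                       % expected: 0.2500
sigmoidGradient([-1, -0.5, 0, 0.5, 1])   % symmetric values, all <= 0.25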