Andrew Ng’s Coursera Machine Learning Coding HW 1

Author: Yu-Shih Chen
December 21, 2018 4:17AM


Week 2 Coding Assignment


  1. Warm-up Exercise
  2. Plot Data
  3. Cost Function
  4. Gradient Descent

Warm-up Exercise

function A = warmUpExercise()
%WARMUPEXERCISE Example function in octave
%   A = WARMUPEXERCISE() is an example function that returns the 5x5 identity matrix

A = [];
% ============= YOUR CODE HERE ==============
% Instructions: Return the 5x5 identity matrix 
%               In octave, we return values by defining which variables
%               represent the return values (at the top of the file)
%               and then set them accordingly. 

A = eye(5);  % 5x5 identity matrix (semicolon suppresses console output)

% ===========================================


Not much to explain here: we just build a 5x5 identity matrix, and one line does it.

Plot Data

function plotData(x, y)
%PLOTDATA Plots the data points x and y into a new figure 
%   PLOTDATA(x,y) plots the data points and gives the figure axes labels of
%   population and profit.

figure; % open a new figure window

% ====================== YOUR CODE HERE ======================
% Instructions: Plot the training data into a figure using the 
%               "figure" and "plot" commands. Set the axes labels using
%               the "xlabel" and "ylabel" commands. Assume the 
%               population and revenue data have been passed in
%               as the x and y arguments of this function.
% Hint: You can use the 'rx' option with plot to have the markers
%       appear as red crosses. Furthermore, you can make the
%       markers larger by using plot(..., 'rx', 'MarkerSize', 10);

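% The data is loaded by the caller and passed in as x and y, e.g.
%   data = load('ex1data1.txt');
%   plotData(data(:,1), data(:,2));
plot(x, y, 'rx', 'MarkerSize', 10);       % training data as red crosses
ylabel('Profit in $10,000s');             % y-axis label
xlabel('Population of City in 10,000s');  % x-axis label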

% ============================================================


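The dataset gives us the two things we need: X (the feature) and y (the result). Specifically, we want to use a city's population (X) to predict the profit of a food truck in that city (y). This section simply draws the provided dataset as an x-y scatter plot.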

Compute Cost

In this section we write J (the cost function), i.e. the error formula:
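For reference, this is the squared-error cost from the lectures, with the hypothesis h_theta(x) = theta^T x, where x includes the leading 1 (Octave's theta(1) and theta(2) correspond to theta_0 and theta_1):

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,
\qquad h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1
```

In matrix form this is J(theta) = (X theta - y)^T (X theta - y) / (2m), which is exactly what `sum((h_x - y).^2) / (2*m)` computes.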

function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

h_x = X * theta;                  % predictions, m x 1
J = sum((h_x - y).^2) / (2*m);    % squared-error cost

% =========================================================================

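The only thing to watch here is the matrix dimensions. After a column of 1s is added to X (the intercept term x0 = 1; see the Coursera lecture videos for details), X is m x 2 (m = total number of samples), and theta is initialized as theta = zeros(2, 1);, i.e. a 2 x 1 matrix of zeros. So h_x (the predictions) is X * theta, which yields an m x 1 vector, the same shape as y (see matrix multiplication in linear algebra). Then we plug h_x into the formula and we're done. Simple and direct.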
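A quick way to check the dimensions described above is to build a tiny design matrix by hand. This is a minimal sketch with hypothetical toy data (three examples lying on y = 2x, not the assignment's ex1data1.txt), computing the cost the same way computeCost does:

```octave
% Hypothetical toy data (not from ex1data1.txt): 3 examples on y = 2x
x = [1; 2; 3];           % feature (e.g. population)
y = [2; 4; 6];           % target (e.g. profit)
m = length(y);

% Design matrix: prepend a column of ones for the intercept term theta(1)
X = [ones(m, 1), x];     % m x 2
theta = zeros(2, 1);     % 2 x 1

h_x = X * theta;         % (m x 2) * (2 x 1) = m x 1, same shape as y
fprintf('size(X) = %dx%d, size(h_x) = %dx%d\n', size(X), size(h_x));

% Same formula as computeCost, written as an anonymous function
computeCost = @(X, y, theta) sum((X * theta - y) .^ 2) / (2 * length(y));
J0 = computeCost(X, y, theta)     % (4 + 16 + 36) / 6 = 9.3333
J1 = computeCost(X, y, [0; 2])    % perfect fit y = 2x, so J = 0
```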

Gradient Descent



function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by 
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta. 
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    h_x = X * theta; % m x 1 vector
    % No need to loop over theta here: it has only 2 elements
    temp1 = theta(1) - (alpha * sum(h_x - y) / m);
    temp2 = theta(2) - (alpha * sum((h_x - y) .* X(:,2)) / m);
    % Store the results in temps first so theta is not changed before both updates are computed
    theta(1) = temp1;
    theta(2) = temp2;
    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);

end

end
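  • The new values go into temp variables first so that both updates are computed from the same, old theta. If theta(1) were overwritten directly and theta(2)'s update then read theta again (for example by recomputing X * theta), it would use a different theta from the one used for theta(1).
  • If theta has more elements, you need a for loop to run the gradient descent update on every element of theta (the second version below).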
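To see why the simultaneous update in the first bullet matters, here is a small sketch comparing one correct (simultaneous) step with a buggy sequential step that recomputes the predictions after theta(1) has already been overwritten. The toy data and learning rate are assumptions chosen for illustration:

```octave
% Hypothetical toy data, same layout as the assignment: X = [ones, x]
x = [1; 2; 3];
y = [2; 4; 6];
m = length(y);
X = [ones(m, 1), x];
theta = [0; 0];
alpha = 0.1;

% Correct (simultaneous) update: both gradient terms use the OLD theta
h_x = X * theta;
temp1 = theta(1) - alpha * sum(h_x - y) / m;
temp2 = theta(2) - alpha * sum((h_x - y) .* X(:, 2)) / m;
theta_simul = [temp1; temp2];

% Buggy (sequential) update: theta(1) is overwritten before theta(2)'s
% gradient is computed, so theta(2) sees a partially updated theta
theta_seq = theta;
theta_seq(1) = theta_seq(1) - alpha * sum(X * theta_seq - y) / m;
theta_seq(2) = theta_seq(2) - alpha * sum((X * theta_seq - y) .* X(:, 2)) / m;

% Columns: simultaneous vs. sequential; the second entries differ
disp([theta_simul, theta_seq])
```

Note that in the post's code h_x is computed once, before either update, which already keeps both gradient terms on the old theta; the temp variables make the simultaneous update explicit and keep it correct if the predictions are ever recomputed mid-update.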


function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by 
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta. 
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    h_x = X * theta; % m x 1 vector
    temp = theta;
    % Loop over every element of theta, using h_x computed from the old theta
    for i = 1:size(theta,1)
        temp(i) = theta(i) - (alpha * sum((h_x - y) .* X(:,i)) / m);
    end
    % Copy back only after all elements are computed, so theta doesn't change mid-update
    theta = temp;
    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);

end

end
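Here theta is copied into a temp vector (for the same reason as before: so theta isn't changed while the update is still being computed), a for loop computes the gradient descent update for every element of temp, and finally theta is overwritten with the finished temp. That is one iteration of gradient descent; repeating it many times (with a suitably small learning rate alpha) moves theta toward the values that minimize J.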
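As an alternative to the element-wise loop (not the approach used in this post), the whole update can be written as one vectorized line, theta = theta - (alpha/m) * X' * (X*theta - y). A minimal sketch on hypothetical toy data lying exactly on y = 1 + 2x; the learning rate and iteration count are arbitrary choices that happen to converge for this data:

```octave
% Hypothetical toy data lying exactly on y = 1 + 2x
x = [1; 2; 3; 4];
y = 1 + 2 * x;
m = length(y);
X = [ones(m, 1), x];

theta = zeros(2, 1);
alpha = 0.05;
num_iters = 5000;
J_history = zeros(num_iters, 1);
computeCost = @(X, y, theta) sum((X * theta - y) .^ 2) / (2 * length(y));

for iter = 1:num_iters
  % X' * (X*theta - y) stacks every partial derivative at once, so all
  % entries of theta are updated simultaneously without a temp vector
  theta = theta - (alpha / m) * (X' * (X * theta - y));
  J_history(iter) = computeCost(X, y, theta);
end

disp(theta')                       % should approach [1 2]

% Predict for a new input x = 3.5 (prepend the 1 for the intercept)
prediction = [1, 3.5] * theta      % close to 1 + 2*3.5 = 8
```

Because the gradient for every element is computed from the same old theta before the single assignment, this form gets the simultaneous update for free and works unchanged for any number of features.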
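We visualize our results with a contour plot and an x-y plot.
Comparing the x-y plot with the very first graph, we can see that the learned line is a reasonably good fit. If you've made it this far, congratulations: you've built your first prediction formula with machine learning! (Cue applause.)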
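A contour plot like the one described above can be produced by evaluating J on a grid of (theta0, theta1) values. This is a minimal sketch, assuming hypothetical toy data on y = 1 + 2x and arbitrary grid ranges; the plotting call is left commented out so the snippet also runs without a display:

```octave
% Hypothetical toy data lying exactly on y = 1 + 2x
x = [1; 2; 3; 4];
y = 1 + 2 * x;
m = length(y);
X = [ones(m, 1), x];
computeCost = @(X, y, theta) sum((X * theta - y) .^ 2) / (2 * length(y));

% Grid of candidate parameter values (ranges chosen for illustration)
theta0_vals = linspace(-10, 10, 101);
theta1_vals = linspace(-1, 4, 51);
J_vals = zeros(length(theta0_vals), length(theta1_vals));

for i = 1:length(theta0_vals)
  for j = 1:length(theta1_vals)
    J_vals(i, j) = computeCost(X, y, [theta0_vals(i); theta1_vals(j)]);
  end
end

% Locate the grid point with the smallest cost
[J_min, k] = min(J_vals(:));
[i_best, j_best] = ind2sub(size(J_vals), k);
fprintf('best grid point: theta0 = %.2f, theta1 = %.2f\n', ...
        theta0_vals(i_best), theta1_vals(j_best));

% In an interactive session, draw the contours. contour expects rows to
% follow the y-axis (theta1), hence the transpose:
% contour(theta0_vals, theta1_vals, J_vals', logspace(-2, 3, 20));
% xlabel('\theta_0'); ylabel('\theta_1');
```

The contours form concentric ellipses around the minimum, which is where gradient descent ends up when it converges.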

That's it for the Week 2 coding assignment (required section).
Thanks for reading!