Machine Learning Notes: Simplified Cost Function and Gradient Descent
Note: [6:53 - the gradient descent equation should have a 1/m factor]
We can compress our cost function's two conditional cases into one case:
$$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$$
Notice that when y is equal to 1, the second term $(1 - y)\log(1 - h_\theta(x))$ is zero and does not affect the result. When y is equal to 0, the first term $-y \log(h_\theta(x))$ is zero and does not affect the result.
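Plugging in each label value shows that the compressed form recovers the two original conditional cases:

$$y = 1: \quad \mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x))$$

$$y = 0: \quad \mathrm{Cost}(h_\theta(x), y) = -\log(1 - h_\theta(x))$$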
We can fully write out our entire cost function as follows:
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$$
A vectorized implementation is:
$$h = g(X\theta)$$

$$J(\theta) = \frac{1}{m} \left( -y^{T} \log(h) - (1 - y)^{T} \log(1 - h) \right)$$
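As a concrete illustration, here is a minimal NumPy sketch of this vectorized cost computation; the names `sigmoid`, `cost`, `X`, `y`, and `theta` are illustrative choices, not part of the course material.

```python
import numpy as np

def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # Vectorized J(theta) for an (m, n) design matrix X,
    # an (m,) vector of 0/1 labels y, and an (n,) parameter vector theta.
    m = len(y)
    h = sigmoid(X @ theta)  # h = g(X theta)
    return (1.0 / m) * (-y @ np.log(h) - (1 - y) @ np.log(1 - h))
```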
Gradient Descent
Remember that the general form of gradient descent is:
$$\text{Repeat} \; \{ \; \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \; \}$$
We can work out the derivative part using calculus to get:
$$\text{Repeat} \; \{ \; \theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \; \}$$
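The derivation itself is not shown in the lecture, but a brief sketch (using the sigmoid identity $g'(z) = g(z)(1 - g(z))$, which gives $\frac{\partial}{\partial \theta_j} h_\theta(x) = h_\theta(x)(1 - h_\theta(x))\, x_j$) goes as follows:

$$\frac{\partial}{\partial \theta_j} J(\theta)
= -\frac{1}{m} \sum_{i=1}^{m} \left[ \frac{y^{(i)}}{h_\theta(x^{(i)})} - \frac{1 - y^{(i)}}{1 - h_\theta(x^{(i)})} \right] h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right) x_j^{(i)}
= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$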
Notice that this algorithm is identical to the one we used in linear regression. We still have to simultaneously update all values in theta.
A vectorized implementation is:
$$\theta := \theta - \frac{\alpha}{m} X^{T} \left( g(X\theta) - \vec{y} \right)$$
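A minimal NumPy sketch of this vectorized update loop, reusing the `sigmoid` helper above; the function name `gradient_descent` and the hyperparameters `alpha` and `num_iters` are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    # Repeatedly apply theta := theta - (alpha / m) * X^T (g(X theta) - y).
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        grad = (1.0 / m) * (X.T @ (sigmoid(X @ theta) - y))
        theta -= alpha * grad  # simultaneous update of every theta_j
    return theta
```

Because the whole parameter vector is computed from a single matrix expression, every $\theta_j$ is updated simultaneously, exactly as the non-vectorized rule requires.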