
Machine Learning Series: Coursera Week 10 Large Scale Machine Learning

Contents

1. Gradient Descent with Large Datasets

1.1 Learning with large datasets

Learning with large datasets:

Suppose m = 100,000,000.

Before training on all of it, plot learning curves (e.g. on a subset of m = 1,000) like this:

fig. 1

(from Coursera Week 10, Learning with large datasets)

===> If the curves show high variance (a persistent gap between training and cross-validation error), then training with more data can reduce the generalization error; if they show high bias, more data will not help.

1.2 Stochastic gradient descent

Linear regression with gradient descent:

J(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i))^2

Repeat {
  θ_j := θ_j - α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x_j^(i)   (for every j)
}

===> This is also called batch gradient descent (every single update considers all m examples, which is expensive when m is huge).
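A minimal NumPy sketch of the vectorized batch update (the function name and defaults are my own, not from the course):

```python
import numpy as np

def batch_gd(X, y, alpha=0.01, iters=100):
    """Batch gradient descent: every update sums over ALL m examples."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        errors = X @ theta - y              # h_theta(x^(i)) - y^(i) for all i
        theta -= alpha * (X.T @ errors) / m # average gradient over m examples
    return theta
```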

stochastic gradient descent:

cost(θ, (x^(i), y^(i))) = (1/2) (h_θ(x^(i)) - y^(i))^2

(1) Randomly shuffle (reorder) the training examples

(2) Repeat {
      for i := 1, ..., m {
        θ_j := θ_j - α (h_θ(x^(i)) - y^(i)) x_j^(i)   (for every j)
      }
    }
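A minimal NumPy sketch of steps (1) and (2) above (the function name and defaults are my own):

```python
import numpy as np

def sgd_linear_regression(X, y, alpha=0.01, epochs=1):
    """Stochastic gradient descent for linear regression.
    X: (m, n) feature matrix (prepend a bias column yourself if needed).
    y: (m,) vector of targets."""
    m, n = X.shape
    theta = np.zeros(n)
    rng = np.random.default_rng(0)
    for _ in range(epochs):               # typically 1-10 passes over the data
        for i in rng.permutation(m):      # step (1): randomly shuffle
            error = X[i] @ theta - y[i]   # h_theta(x^(i)) - y^(i)
            theta -= alpha * error * X[i] # step (2): update all theta_j at once
    return theta
```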

1.3 Mini-Batch Gradient Descent

Batch Gradient Descent: use all m examples in each iteration

Stochastic Gradient Descent: use 1 example in each iteration

Mini-Batch Gradient Descent: use b examples in each iteration

b = mini-batch size; a typical choice is b = 10, with typical values ranging from 2 to 100

Say b = 10, m = 1000:

Repeat {
  for i := 1, 11, 21, ..., 991 {
    θ_j := θ_j - α (1/10) Σ_{k=i}^{i+9} (h_θ(x^(k)) - y^(k)) x_j^(k)   (for every j)
  }
}
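A minimal NumPy sketch of the above (the function name and defaults are my own):

```python
import numpy as np

def minibatch_gd(X, y, alpha=0.01, b=10, epochs=10):
    """Mini-batch gradient descent: average the gradient over b examples per update."""
    m, n = X.shape
    theta = np.zeros(n)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        order = rng.permutation(m)
        for start in range(0, m, b):          # i = 1, 11, 21, ..., 991 when b=10, m=1000
            idx = order[start:start + b]
            errors = X[idx] @ theta - y[idx]  # residuals for this mini-batch
            theta -= alpha * (X[idx].T @ errors) / len(idx)
    return theta
```

With a vectorized implementation like this, the b-example sum is a single matrix product, which is why mini-batch can outperform plain stochastic gradient descent.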

1.4 Stochastic gradient descent convergence

Checking for convergence:

During learning, compute cost(θ, (x^(i), y^(i))) before updating θ using (x^(i), y^(i)).

Every 1000 iterations (say), plot cost(θ, (x^(i), y^(i))) averaged over the last 1000 examples processed by the algorithm.

fig. 2

(from Coursera Week 10, Stochastic gradient descent convergence)

Learning rate α is typically held constant. We can slowly decrease α over time if we want θ to converge, e.g.

α = const1 / (#iterations + const2). However, this turns the problem into searching for const1 and const2, which makes things more complicated, so in practice α is often simply kept fixed.
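A minimal sketch of this monitoring scheme, assuming NumPy and matplotlib (the names are my own; the decaying-α option appears as a comment):

```python
import numpy as np
import matplotlib.pyplot as plt

def sgd_with_cost_monitoring(X, y, alpha=0.01, window=1000):
    """Run one SGD pass, recording cost(theta, (x^(i), y^(i))) BEFORE each
    update, then plot the cost averaged over each block of `window` examples."""
    m, n = X.shape
    theta = np.zeros(n)
    costs, averaged = [], []
    for i in np.random.default_rng(0).permutation(m):
        error = X[i] @ theta - y[i]
        costs.append(0.5 * error ** 2)      # cost before updating theta
        theta -= alpha * error * X[i]
        # optional decay: alpha = const1 / (len(costs) + const2)
        if len(costs) % window == 0:
            averaged.append(np.mean(costs[-window:]))
    plt.plot(averaged)
    plt.xlabel(f"blocks of {window} examples")
    plt.ylabel(f"cost averaged over last {window} examples")
    plt.show()
    return theta
```

If the averaged curve trends downward and flattens, the algorithm is converging; if it oscillates or rises, decrease α.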

2. Advanced Topics

2.1 Online learning

Example: a shipping service website where a user comes and specifies an origin and a destination, you offer to ship their package for some asking price, and the user sometimes chooses to use your shipping service (y = 1) and sometimes does not (y = 0).

Features x capture properties of the user, of the origin/destination, and the asking price. We want to learn p(y = 1 | x; θ) to optimize the price.

Repeat forever {
  Get (x, y) corresponding to the current user.
  Update θ using (x, y):
    θ_j := θ_j - α (h_θ(x) - y) x_j   (for every j)
}

Online learning can adapt to changing user tastes, and it allows us to learn from a continuous stream of data, since we use each example once and then never need to process it again.
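A minimal sketch using a logistic-regression hypothesis for p(y = 1 | x; θ) (the feature count and function names are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_features = 4               # hypothetical: origin, destination, price, bias, ...
theta = np.zeros(n_features)

def on_user_event(x, y, alpha=0.1):
    """Called once per visiting user; (x, y) is used once and then discarded."""
    global theta
    h = sigmoid(x @ theta)          # p(y = 1 | x; theta)
    theta -= alpha * (h - y) * x    # the same update as in the loop above
```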

2.2 Map-Reduce and Data Parallelism

fig. 3

(from Coursera Week 10, Map-Reduce and Data Parallelism)

fig. 4

(from Coursera Week 10, Map-Reduce and Data Parallelism)

fig. 5

(from Coursera Week 10, Map-Reduce and Data Parallelism)
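The idea in the figures: the batch gradient is a sum over the m training examples, so that sum can be split across several machines (or the cores of one machine), each computing a partial sum over its slice of the data, which a central node then combines. A minimal single-machine sketch using Python's multiprocessing (all names are my own):

```python
import numpy as np
from multiprocessing import Pool

def partial_gradient(args):
    """'Map': one machine/core sums the gradient over its slice of the data."""
    X_part, y_part, theta = args
    errors = X_part @ theta - y_part
    return X_part.T @ errors              # partial sum, not yet divided by m

def batch_gradient_mapreduce(X, y, theta, n_workers=4):
    """'Reduce': a central node adds up the partial sums and divides by m."""
    chunks = [(Xc, yc, theta)
              for Xc, yc in zip(np.array_split(X, n_workers),
                                np.array_split(y, n_workers))]
    with Pool(n_workers) as pool:
        partials = pool.map(partial_gradient, chunks)
    return sum(partials) / len(y)         # the full batch gradient of J(θ)
```

On platforms that spawn worker processes (e.g. Windows), the caller should live under an `if __name__ == "__main__":` guard.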