
Coursera | Andrew Ng (02-week-2-2.3): Exponentially Weighted Averages

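This series adds personal study notes and supplementary derivations to selected topics from the original course. If you find any errors, corrections are welcome. After studying Andrew Ng's course, I organized it into text so that it is easier to look up and review. Because I have been studying English, the series is written mainly in English, and I recommend that readers also treat the English as primary and the Chinese as a supplement, to lay the groundwork for reading academic papers in the field later on. - ZJ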

When reposting, please credit the author and source: ZJ, WeChat public account 「SelfImprovementLab」

2.3 Exponentially weighted averages

(Subtitle source: NetEase Cloud Classroom)


I want to show you a few optimization algorithms. They are faster than gradient descent. In order to understand those algorithms, you need to be able to use something called exponentially weighted averages, also called exponentially weighted moving averages in statistics. Let's first talk about that, and then we'll use this to build up to more sophisticated optimization algorithms. So, even though I now live in the United States, I was born in London. So, for this example I got the daily temperature from London from last year. So, on January 1, the temperature was 40 degrees Fahrenheit. Now, I know most of the world uses the Celsius system, but I guess I live in the United States, which uses Fahrenheit. So that's four degrees Celsius. And on January 2, it was nine degrees Celsius, and so on. And then about halfway through the year (a year has 365 days, so that would be around day number 180, sometime in late May, I guess) it was 60 degrees Fahrenheit, which is 15 degrees Celsius, and so on. So, it starts to get warmer towards summer, and it was colder in January.


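I want to show you several optimization algorithms that are faster than gradient descent. To understand them, you need exponentially weighted averages, known in statistics as exponentially weighted moving averages. We cover those first and then move on to more sophisticated optimization algorithms. Although I now live in the United States, I was born in London, England, so here I have last year's daily temperatures for London. On January 1 the temperature was 40 degrees Fahrenheit. I know most of the world uses Celsius, but the United States uses Fahrenheit; that is about 4 degrees Celsius. On January 2 it was 9 degrees Celsius, and so on. Around the middle of the 365-day year, roughly day 180, which is the end of May, the temperature was 60 degrees Fahrenheit, or 15 degrees Celsius, and so on. Temperatures warm up toward summer and cool down in winter.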

So, if you plot the data, you end up with this, where day one is sometime in January, that is the beginning of summer, and that's the end of the year, kind of late December. So, this would be January 1, this is the middle of the year approaching summer, and this would be the data from the end of the year. So, this data looks a little bit noisy, and if you want to compute the trends, the local average or a moving average of the temperature, here's what you can do. Let's initialize V zero equals zero. And then, on every day, we're going to average it with a weight of 0.9 times whatever the previous value was, plus 0.1 times that day's temperature. So, theta one here would be the temperature from the first day. And on the second day, we're again going to take a weighted average: 0.9 times the previous value plus 0.1 times today's temperature, and so on. Then 0.9 times the day-two value plus 0.1 times theta three, and so on. And the more general formula is: V on a given day is 0.9 times V from the previous day, plus 0.1 times the temperature of that day. So, if you compute this and plot it in red, this is what you get. You get a moving average, what's called an exponentially weighted average of the daily temperature.


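Plotting the data gives the following result. Day one is in January, this is the start of summer, and this is the end of the year, around late December. Here is January 1, here is the middle of the year approaching summer, and after that comes the data for the end of the year. The data look somewhat noisy. To compute the trend, that is, a local average or moving average of the temperature, first set V0 equal to 0. Then, on each day, take 0.9 times the previous value plus 0.1 times that day's temperature, where this is the temperature on the first day. On the second day you again get a weighted average: 0.9 times the previous value plus 0.1 times that day's temperature, and so on; then 0.9 times the day-two value plus 0.1 times the day-three data, and so forth. The general formula is that V on a given day equals 0.9 times V from the previous day plus 0.1 times that day's temperature. Computing this and plotting it in red gives this result: a moving average, the exponentially weighted average of the daily temperature.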
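The update rule described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function name `exponentially_weighted_average` and the three sample readings are assumptions made up for the example, and the readings are not the lecture's London data.

```python
def exponentially_weighted_average(temperatures, beta=0.9):
    """Return [v_1, ..., v_n], where v_0 = 0 and
    v_t = beta * v_{t-1} + (1 - beta) * theta_t."""
    v = 0.0  # v_0 = 0, as in the lecture
    averages = []
    for theta in temperatures:
        # Weight beta on the previous average, (1 - beta) on today's reading.
        v = beta * v + (1 - beta) * theta
        averages.append(v)
    return averages


# Hypothetical readings in degrees Fahrenheit, chosen only for illustration.
temperatures = [40.0, 49.0, 45.0]
v = exponentially_weighted_average(temperatures, beta=0.9)
print(v)  # approximately [4.0, 8.5, 12.15]
```

Because the recursion starts from v_0 = 0, the first few averages sit well below the actual readings; with only three days of data the curve has not yet caught up to the 40s.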

So, let's look at the equation we had from the previous slide. It was V t equals, previously we had 0.9, we'll now turn that into a parameter beta, so beta times V t minus one, plus, and previously it was 0.1, I'm going to turn that into one minus beta, times theta t. So, previously you had beta equals 0.9. It turns out that, for reasons we are going to go into later, when you compute this you can think of V t as approximately averaging over something like one over one minus beta days' temperature. So, for example, when beta equals 0.9 you could think of this as averaging over the last 10 days' temperature. And that was the red line.


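Look at the formula from the previous slide. Previously we used 0.9; we now turn that constant into β, so V_t equals β times V_{t-1} plus (1 - β) times the data for day t, where the second weight was previously 0.1. So before, β was 0.9. For reasons we will go into later, when computing this you can regard V_t as roughly the average of the last 1/(1 - β) days' temperatures. If β is 0.9, you can think of this as a ten-day average, which is the red line.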
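One way to see why 1/(1 - β) is a reasonable rule of thumb is to unroll the recursion. The following is a sketch under the lecture's assumption that v_0 = 0, writing θ_t for the temperature on day t:

```latex
\begin{aligned}
v_t &= \beta v_{t-1} + (1-\beta)\,\theta_t \\
    &= (1-\beta)\,\theta_t + \beta(1-\beta)\,\theta_{t-1} + \beta^{2}(1-\beta)\,\theta_{t-2} + \cdots + \beta^{t-1}(1-\beta)\,\theta_1 \\
    &= (1-\beta)\sum_{k=0}^{t-1} \beta^{k}\,\theta_{t-k}
\end{aligned}
```

The reading from k days ago carries weight (1 - β)β^k, which decays exponentially in k. After about 1/(1 - β) days the factor β^k has dropped to roughly 1/e: 0.9^10 ≈ 0.35 and 0.98^50 ≈ 0.36, compared with 1/e ≈ 0.37. Readings older than that contribute comparatively little, which is the sense in which β = 0.9 averages over about 10 days.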

Now, let's try something else. Let's set beta to be very close to one, let's say it's 0.98. Then, if you look at 1/(1 - 0.98), this is equal to 50. So, you know, think of this as averaging over roughly the last 50 days' temperature. And if you plot that, you get this green line. So, notice a couple of things with this very high value of beta. The plot you get is much smoother because you're now averaging over more days of temperature. So, the curve is just, you know, less wavy, it is now smoother, but on the flip side the curve has now shifted further to the right, because you're now averaging over a much larger window of temperatures. And by averaging over a larger window, this exponentially weighted average formula adapts more slowly when the temperature changes. So, there's just a bit more latency. And the reason for that is that when beta is 0.98, it's giving a lot of weight to the previous value and a much smaller weight, just 0.02, to whatever you're seeing right now. So, when the temperature changes, when the temperature goes up or down, this exponentially weighted average just adapts more slowly when beta is so large.


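Let's try something else and set β to a value close to 1, say 0.98. Computing 1/(1 - 0.98) gives 50, so this roughly averages the temperatures of the past 50 days, and plotting it gives the green line. Note a few things about this high value of β. The curve is smoother because you are averaging over more days of temperature, so it fluctuates less. The drawback is that the curve shifts further to the right, because it now averages over more temperature values. With more values to average, the exponentially weighted average formula adapts more slowly when the temperature changes, so there is some delay. The reason is that β = 0.98 puts a large weight on the previous value and only a weight of 0.02 on the current day's value. So when the temperature rises or falls, the exponentially weighted average adapts more slowly when β is large.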
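The extra latency at large β can be made concrete with a small experiment. This is an illustrative sketch, not something from the lecture: it feeds the recursion a hypothetical step input (the reading jumps from 0 to 1 and stays there) and counts the days until the average covers half of the jump. The helper name `days_to_reach` is an assumption introduced for this example.

```python
def days_to_reach(beta, fraction):
    """Days until the average of a 0 -> 1 step input first reaches `fraction`.

    Assumes 0 <= fraction < 1; the average approaches 1 but never reaches it.
    """
    v = 0.0  # the average before the jump
    days = 0
    while v < fraction:
        # Every reading after the jump equals the new level, 1.0.
        v = beta * v + (1 - beta) * 1.0
        days += 1
    return days


for beta in (0.9, 0.98):
    print(beta, days_to_reach(beta, 0.5))
# 0.9 -> 7 days, 0.98 -> 35 days
```

With β = 0.98 the average needs about five times as many days as with β = 0.9 to cover half of the change, which matches the green curve lagging to the right of the red one.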

Now, let's try another value. If you set beta to another extreme, let's say it is 0.5, then by the formula we have on the right, this is something like averaging over just two days' temperature, and if you plot that you get this yellow line. And by averaging only over two days' temperature, it's as if you're averaging over a much shorter window. So, it's much more noisy, much more susceptible to outliers. But this adapts much more quickly to changes in the temperature. So, this formula is how you implement an exponentially weighted average. Again, it's called an exponentially weighted moving average in the statistics literature. We're going to call it an exponentially weighted average for short, and by varying this parameter, or, as we'll see later, this hyperparameter of your learning algorithm, you can get slightly different effects, and there will usually be some value in between that works best. That gives you the red curve, which, you know, maybe looks like a better average of the temperature than either the green or the yellow curve. You now know the basics of how to compute exponentially weighted averages. In the next video, let's get a bit more intuition about what it's doing.


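We can try yet another value. If β is at the other extreme, say 0.5, then by the formula on the right this averages about two days of temperature, and plotting it gives the yellow line. Because only two days are averaged, there is too little data in each average, so the curve is noisier and more susceptible to outliers, but it adapts to temperature changes much faster. This is the formula for the exponentially weighted average, which is widely used. Again, in statistics it is called the exponentially weighted moving average; we will simply call it the exponentially weighted average. By adjusting this parameter, which later turns out to be an important hyperparameter of the learning algorithm, you can get slightly different effects, and usually some value in between works best. The red curve obtained with an intermediate β averages the temperature better than the green and yellow lines. You now know the basics of computing exponentially weighted averages. In the next video we will talk about what it is really doing.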
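The smoothness trade-off across the three values of β can be checked numerically. The sketch below makes several assumptions not found in the lecture: the temperatures are synthetic (a seasonal curve between about 40 and 60 degrees Fahrenheit plus Gaussian noise with a fixed seed), "roughness" is defined here as the mean absolute day-to-day change of the curve, and the first 100 days are skipped so that the start-up from v_0 = 0 does not dominate the measurement.

```python
import math
import random


def ewa(values, beta):
    """Exponentially weighted average with v_0 = 0."""
    v, out = 0.0, []
    for theta in values:
        v = beta * v + (1 - beta) * theta
        out.append(v)
    return out


def roughness(curve, skip=100):
    """Mean absolute day-to-day change, ignoring the first `skip` days."""
    tail = curve[skip:]
    return sum(abs(b - a) for a, b in zip(tail, tail[1:])) / (len(tail) - 1)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Synthetic data: coldest (about 40 F) in January, warmest (about 60 F) mid-year.
temps = [
    50 - 10 * math.cos(2 * math.pi * day / 365) + rng.gauss(0, 6)
    for day in range(365)
]

rough = {beta: roughness(ewa(temps, beta)) for beta in (0.5, 0.9, 0.98)}
for beta, value in rough.items():
    print(f"beta={beta}: roughness={value:.3f}")
```

On this synthetic data the roughness shrinks as β grows, so β = 0.5 gives the most jagged curve and β = 0.98 the smoothest, consistent with the yellow, red, and green lines described above.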

Key points:

Exponentially weighted averages

The key formula for the exponentially weighted average:

$$v_t = \beta v_{t-1} + (1 - \beta)\,\theta_t$$

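The figure below is a scatter plot of temperature against day number:

[Figure: scatter plot of daily temperature against day of the year]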

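  • With β = 0.9, the exponentially weighted average is the red line in the figure;
  • With β = 0.98, the exponentially weighted average is the green line in the figure;
  • With β = 0.5, the exponentially weighted average is the yellow line in the figure below.

[Figure: exponentially weighted averages of the temperature data for β = 0.9 (red), β = 0.98 (green), and β = 0.5 (yellow)]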

