
# Stanford cs231n Deep Learning Course Notes (1): Understanding and Applying the Softmax Function

<div id="article_content" class="article_content clearfix csdn-tracking-statistics" data-pid="blog" data-mod="popu_307" data-dsm="post" style="height: 1820px; overflow: hidden;">
                    <link rel="stylesheet" href="https://csdnimg.cn/release/phoenix/template/css/htmledit_views-0a60691e80.css">
            <div class="htmledit_views">
                
<p><span style="font-size:14px;">我學習使用的是帶中文翻譯字幕的網易課程,公開課地址:<a href="http://study.163.com/course/courseLearn.htm?courseId=1003223001#/learn/video?lessonId=1003734105&courseId=1003223001" target="_blank">http://study.163.com/course/courseLearn.htm?courseId=1003223001#/learn/video?lessonId=1003734105&courseId=1003223001</a></span></p>
<p><span style="font-size:14px;">該節課中提到了一種叫作softmax的函式,因為之前對這個概念不瞭解,所以本篇就這個函式進行整理,如下:</span></p>
<p><span style="font-size:14px;">維基給出的解釋:softmax函式,也稱指數歸一化函式,它是一種<span style="color:#cc0000;">logistic函式</span>的歸一化形式,可以將K維實數向量壓縮成範圍[0-1]的新的K維實數向量。函式形式為:</span></p>
<p><span style="font-size:14px;"><img src="https://img-blog.csdn.net/20171127213109950?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcGlhb3h1ZXpob25n/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt="">  (1)<br></span></p>
<p><span style="font-size:14px;">其中,分母部分起到歸一化的作用。至於取指數的原因,第一是要模擬max的行為,即使得大的數值更大;第二是方便求導運算。</span></p>
<p><span style="font-size:14px;"><img src="https://img-blog.csdn.net/20171127214016170?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcGlhb3h1ZXpob25n/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt=""><br></span></p>
<p><span style="font-size:14px;"><img src="https://pic4.zhimg.com/50/v2-11758fbc2fc5bbbc60106926625b3a4f_hd.jpg" alt=""><br></span></p>
<p><span style="font-size:14px;">在概率論中,softmax函式輸出可以代表一個類別分佈--有k個可能結果的概率分佈。<br></span></p>
<p><span style="font-size:14px;">從定義中也可以看出,softmax函式與logistic函式有著緊密的的聯絡,對於logistic函式,定義如下:</span></p>
<p><span style="font-size:14px;"><img src="https://img-blog.csdn.net/20171127214513836?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcGlhb3h1ZXpob25n/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt=""><br></span></p>
<p><span style="font-size:14px;"><img src="https://img-blog.csdn.net/20171127214523348?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcGlhb3h1ZXpob25n/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt=""><br></span></p>
<p><span style="font-size:14px;">最顯著的區別:<span style="color:#ff0000;">logistic 迴歸是針對二分類問題,softmax則是針對多分類問題,logistic可看成softmax的特例。</span><br></span></p>
<p><span style="font-size:14px;">二分類器(two-class classifier)要最大化資料集的似然值等價於將每個資料點的線性迴歸輸出推向正無窮(類1)和負無窮(類2)。邏輯迴歸的損失方程(Loss Function):<br></span></p>
<p><span style="font-size:14px;"><img src="https://img-blog.csdn.net/20171127215037856?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcGlhb3h1ZXpob25n/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt=""><br></span></p>
<p><span style="font-size:14px;"><span style="font-family:sans-serif;">對於給定的測試輸入 </span><img class="tex" alt="\textstyle x" src="http://ufldl.stanford.edu/wiki/images/math/f/6/c/f6c0f8758a1eb9c99c0bbe309ff2c5a5.png" style="border:none;vertical-align:middle;margin:0px;font-family:sans-serif;"><span style="font-family:sans-serif;">,假如想用假設函式針對每一個類別j估算出概率值 </span><img class="tex" alt="\textstyle p(y=j | x)" src="http://ufldl.stanford.edu/wiki/images/math/c/1/d/c1d5aaee0724f2183116cb8860f1b9e4.png" style="border:none;vertical-align:middle;margin:0px;font-family:sans-serif;"><span style="font-family:sans-serif;">。即估計 </span><img class="tex" alt="\textstyle x" src="http://ufldl.stanford.edu/wiki/images/math/f/6/c/f6c0f8758a1eb9c99c0bbe309ff2c5a5.png" style="border:none;vertical-align:middle;margin:0px;font-family:sans-serif;"><span style="font-family:sans-serif;"> 的每一種分類結果出現的概率。因此,假設函式將要輸出一個 </span><img class="tex" alt="\textstyle k" src="http://ufldl.stanford.edu/wiki/images/math/b/0/0/b0066e761791cae480158b649e5f5a69.png" style="border:none;vertical-align:middle;margin:0px;font-family:sans-serif;"><span style="font-family:sans-serif;"> 維的向量(向量元素的和為1)來表示這 </span><img class="tex" alt="\textstyle k" src="http://ufldl.stanford.edu/wiki/images/math/b/0/0/b0066e761791cae480158b649e5f5a69.png" style="border:none;vertical-align:middle;margin:0px;font-family:sans-serif;"><span style="font-family:sans-serif;"> 個估計的概率值。
 假設函式 </span><img class="tex" alt="\textstyle h_{\theta}(x)" src="http://ufldl.stanford.edu/wiki/images/math/8/8/7/887e72d0a7b7eb5083120e23a909a554.png" style="border:none;vertical-align:middle;margin:0px;font-family:sans-serif;"><span style="font-family:sans-serif;"> 形式如下:</span><br></span></p>
<p><span style="font-size:14px;"><img src="https://img-blog.csdn.net/20171127215854632?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcGlhb3h1ZXpob25n/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt=""><br></span></p>
<p><span style="font-size:14px;"><span style="font-family:sans-serif;">其中 </span><img class="tex" alt="\theta_1, \theta_2, \ldots, \theta_k \in \Re^{n+1}" src="http://ufldl.stanford.edu/wiki/images/math/f/d/9/fd93be6ab8e2b869691579202d7b4417.png" style="border:none;vertical-align:middle;margin:0px;font-family:sans-serif;"><span style="font-family:sans-serif;"> 是模型的引數。請注意 </span><img class="tex" alt="\frac{1}{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }" src="http://ufldl.stanford.edu/wiki/images/math/a/a/b/aab84964dbe1a2f77c9c91327ea0d6d6.png" style="border:none;vertical-align:middle;margin:0px;font-family:sans-serif;"><span style="font-family:sans-serif;">這一項對概率分佈進行歸一化,使得所有概率之和為
 1 。</span><br></span></p>
<p><span style="font-family:sans-serif;"><span style="font-size:14px;">其<strong>代價函式</strong>可以寫為:</span></span></p>
<p><span style="font-family:sans-serif;"><span style="font-size:14px;"><img src="https://img-blog.csdn.net/20171127220138270?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcGlhb3h1ZXpob25n/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt=""><br></span></span></p>
<p><span style="font-family:sans-serif;"><span style="font-size:14px;">其中,1{真}=1,1{假}=0.</span></span></p>
<p><span style="font-family:sans-serif;"><span style="font-size:14px;color:#ff0000;"><em><strong>12.23補充:</strong></em></span></span></p>
<p><span style="font-family:sans-serif;"><span style="font-size:14px;">關於代價函式,softmax用的是cross-entropy loss,</span></span><span style="font-size:16px;font-family:'microsoft yahei';">資訊理論中有個重要的概念叫做交叉熵cross-entropy,
</span><span style="font-size:16px;font-family:'microsoft yahei';">公式是: </span></p>
<p><span style="font-family:sans-serif;"><span style="font-size:14px;"><img src="https://img-blog.csdn.net/20171223112802040?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcGlhb3h1ZXpob25n/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt=""></span></span></p>
<p><span style="font-family:sans-serif;"><span style="font-size:14px;"><span style="font-family:'microsoft yahei';font-size:16px;">夏農熵的公式:</span></span></span></p>
<p><span style="font-family:sans-serif;"><span style="font-size:14px;"><img src="http://images.cnitblog.com/blog/571227/201412/112112589313898.png" alt="這裡寫圖片描述" title="" style="border:0px;vertical-align:middle;margin-top:15px;margin-bottom:15px;font-family:'microsoft yahei';font-size:16px;"><br style="font-family:'microsoft yahei';font-size:16px;"><span style="font-family:'microsoft yahei';font-size:16px;">交叉熵與 loss的聯絡,</span><span style="font-family:'microsoft yahei';font-size:16px;">設p(x)代表的是真實的概率分佈</span><span class="MathJax_Preview" style="margin:0px;padding:0px;font-family:'microsoft yahei';font-size:16px;"></span><span style="font-family:'microsoft yahei';font-size:16px;">,那麼可以看出上式是概率分佈為<img src="https://img-blog.csdn.net/20171223113004779?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcGlhb3h1ZXpob25n/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt=""></span><span class="MathJax_Preview" style="margin:0px;padding:0px;font-family:'microsoft yahei';font-size:16px;"></span><span style="font-family:'microsoft yahei';font-size:16px;">的相對熵公式,<img src="https://img-blog.csdn.net/20171223113004779?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcGlhb3h1ZXpob25n/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt="" style="font-family:'microsoft yahei';font-size:16px;"></span><span style="font-family:'microsoft yahei';font-size:16px;">是對第i個類別概率的估計。使用損失函式可以描述真實分佈於估計分佈的交叉熵。交叉熵可以看做熵與相對熵之和:<img src="https://img-blog.csdn.net/20171223113113934?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcGlhb3h1ZXpob25n/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt="">,</span><span style="font-family:'microsoft yahei';font-size:16px;">這裡的相對熵也叫作kl距離,在資訊理論中,D(P||Q)表示當用概率分佈Q來擬合真實分佈P時,產生的資訊損耗,其中P表示真實分佈,Q表示P的擬合分佈。又因為真實值的熵是不變的,交叉熵也描述預測結果與真實結果的相似性,用來做損失函式可保證預測值符合真實值。 </span><br></span></span></p>
<h1><a name="t0"></a><span style="font-family:sans-serif;"><span style="font-size:14px;">softmax的應用:</span></span></h1>
<p><span><span><span style="font-family:sans-serif;"><span style="font-size:14px;">在人工神經網路(ANN)中,Softmax常被用作輸出層的啟用函式。<span style="line-height:33px;">其</span><span style="margin:0px;padding:0px;font-family:'Times New Roman';line-height:33px;"><span style="margin:0px;padding:0px;font-family:Arial, 'Microsoft YaHei';">中,<span><img src="https://img-blog.csdn.net/20160402203524206?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQv/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt="" style="border:none;vertical-align:middle;font-family:'Times New Roman';line-height:33px;"></span>表示第L層(通常是最後一層)第j個神經元的輸入,<span><img src="https://img-blog.csdn.net/20160402203430831?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQv/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt="" style="border:none;vertical-align:middle;"></span>表示第L層第j個神經元的輸出,<span><img src="https://img-blog.csdn.net/20160402203643644?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQv/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt="" style="border:none;vertical-align:middle;"></span>表示自然常數。<span style="line-height:33px;"><span style="font-family:Arial;">注意</span></span><span style="line-height:33px;"><span style="font-family:Arial;">看,<span><img src="https://img-blog.csdn.net/20160402203914176?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQv/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt="" style="border:none;vertical-align:middle;"></span>表示了第L層所有神經元的輸入之和。</span></span></span></span><br></span></span></span></span></p>
<p><span style="font-size:14px;"><span style="font-family:sans-serif;">不僅是因為它的效果好,而且它使得ANN的輸出值更易於理解,即</span><span style="font-family:sans-serif;">神經元的輸出值越大,則該神經元對應的類別是真實類別的可能性更高。</span><br></span></p>
<h1><a name="t1"></a><span style="font-family:sans-serif;"><span style="font-size:14px;">12.17補充:softmax求導</span></span></h1>
<p><span style="font-size:14px;">由公式(1)可知,softmax函式僅與分類有關:</span></p>
<p><span style="font-size:14px;"><img src="https://img-blog.csdn.net/20171217113132547?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcGlhb3h1ZXpob25n/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt=""><br></span></p>
<p><span style="font-size:14px;">其負對數似然函式為:</span></p>
<p><span style="font-size:14px;"><img src="https://img-blog.csdn.net/20171217113325834?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcGlhb3h1ZXpob25n/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt=""></span></p>
<p><span style="font-size:14px;">對該似然函式求導,得:<br></span></p>
<p><span style="font-size:14px;"><img src="https://img-blog.csdn.net/20171217132601530?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcGlhb3h1ZXpob25n/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt=""></span></p>
<p><span style="font-size:14px;"><span style="color:#CC0000;"><em>注:參考部落格裡上面求導公式有誤,已更正。</em></span><br></span></p>
<p><span style="font-size:14px;">對於①條件:先Copy一下Softmax的結果(即prob_data)到bottom_diff,再對k位置的unit減去1<br>
對於②條件:直接Copy一下Softmax的結果(即prob_data)到bottom_diff<br>
對於③條件:找到ignore位置的unit,強行置為0。<br><img src="https://img-blog.csdn.net/20171217113544387?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvcGlhb3h1ZXpob25n/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt=""><br></span></p>
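A NumPy sketch of this backward pass, mirroring Caffe's prob_data/bottom_diff naming. Treating condition ③ as zeroing the whole row for samples whose label equals ignore_label is our reading, and normalization by the batch size is omitted:

```python
import numpy as np

def softmax_loss_backward(probs, labels, ignore_label=-1):
    """Gradient dL/dz for a batch, following conditions 1-3 above."""
    bottom_diff = probs.copy()                      # conditions 1, 2: copy probs
    rows = np.arange(probs.shape[0])
    valid = labels != ignore_label
    bottom_diff[rows[valid], labels[valid]] -= 1.0  # condition 1: subtract 1 at k
    bottom_diff[~valid] = 0.0                       # condition 3: zero out ignored
    return bottom_diff
```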
<p><span style="font-size:14px;">參考:</span></p>
<p><span style="font-size:14px;">https://en.wikipedia.org/wiki/Softmax_function<br></span></p>
<p><span style="font-size:14px;">https://zhuanlan.zhihu.com/p/25723112<br></span></p>
<p><span style="font-size:14px;">http://ufldl.stanford.edu/wiki/index.php/Softmax%E5%9B%9E%E5%BD%92<br></span></p>
<p><span style="font-size:14px;">https://www.cnblogs.com/maybe2030/p/5678387.html?utm_source=tuicool&utm_medium=referral<br></span></p>
<p><span style="font-size:14px;">http://blog.csdn.net/bea_tree/article/details/51489969#t10</span></p>
<p><span style="font-size:14px;">https://github.com/YuDamon/Softmax</span></p>
<p><span style="font-size:14px;">https://www.cnblogs.com/neopenx/p/5590756.html</span><br></p>