
Generative Learning Algorithms

These are notes I took while watching the Stanford open course on machine learning (taught by Andrew Ng), written up in case they are useful later. If you find any errors in the notes, please let me know.
Other notes in this series:
Linear Regression
Classification and Logistic Regression
Generalized Linear Models
Generative Learning Algorithms

The algorithms we studied previously model $p(y|x;\theta)$, the conditional distribution of y given x; these are called discriminative learning algorithms. We now study algorithms that take the opposite approach and model $p(x|y)$ (together with $p(y)$); these are called generative learning algorithms.

Using Bayes' rule, we can obtain the distribution of y given x:

$$p(y|x) = \frac{p(x|y)\,p(y)}{p(x)}, \qquad p(x) = p(x|y=1)\,p(y=1) + p(x|y=0)\,p(y=0)$$
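As a concrete numeric illustration of this rule (a minimal sketch; the probability values below are made up, not from the notes):

```python
# Posterior from class-conditional likelihoods and a class prior (Bayes' rule).
# All numbers here are arbitrary illustrative values.
p_x_given_y1 = 0.30   # p(x | y = 1)
p_x_given_y0 = 0.05   # p(x | y = 0)
p_y1 = 0.40           # prior p(y = 1)
p_y0 = 1.0 - p_y1     # prior p(y = 0)

# Marginal p(x) by the law of total probability.
p_x = p_x_given_y1 * p_y1 + p_x_given_y0 * p_y0

p_y1_given_x = p_x_given_y1 * p_y1 / p_x
print(p_y1_given_x)   # 0.8
```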

1 Gaussian Discriminant Analysis

1.1 The Multivariate Gaussian Distribution (Multivariate Normal Distribution)

Assume the input features $x \in \mathbb{R}^n$ are continuous, and that $p(x|y)$ follows a Gaussian distribution.

Suppose z follows a multivariate Gaussian distribution, $z \sim \mathcal{N}(\mu, \Sigma)$; its density is

$$p(z) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}\exp\!\left(-\frac{1}{2}(z-\mu)^{T}\Sigma^{-1}(z-\mu)\right)$$

where $\mu \in \mathbb{R}^n$ is the mean vector and $\Sigma \in \mathbb{R}^{n \times n}$ is the covariance matrix.
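A quick sanity check of this density formula in Python (a sketch; the mean, covariance, and evaluation point are arbitrary, and the SciPy call just cross-checks the hand-written formula):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(z, mu, Sigma):
    """Density of N(mu, Sigma) at z, computed directly from the formula above."""
    n = mu.shape[0]
    diff = z - mu
    norm_const = 1.0 / ((2 * np.pi) ** (n / 2) * np.linalg.det(Sigma) ** 0.5)
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
z = np.array([0.5, 0.5])

print(gaussian_pdf(z, mu, Sigma))                      # manual formula
print(multivariate_normal(mean=mu, cov=Sigma).pdf(z))  # SciPy cross-check
```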

1.2 The Gaussian Discriminant Analysis Model

The GDA model assumes:

$$
\begin{aligned}
y &\sim \mathrm{Bernoulli}(\phi) \\
x \mid y = 0 &\sim \mathcal{N}(\mu_0, \Sigma) \\
x \mid y = 1 &\sim \mathcal{N}(\mu_1, \Sigma) \\
p(y) &= \phi^{y}(1-\phi)^{1-y} \\
p(x \mid y = 0) &= \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}\exp\!\left(-\frac{1}{2}(x-\mu_0)^{T}\Sigma^{-1}(x-\mu_0)\right) \\
p(x \mid y = 1) &= \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}\exp\!\left(-\frac{1}{2}(x-\mu_1)^{T}\Sigma^{-1}(x-\mu_1)\right)
\end{aligned}
$$

Note that the two classes share a single covariance matrix $\Sigma$ but have separate means $\mu_0$ and $\mu_1$.
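This generative story can be simulated directly. A minimal sketch, assuming arbitrary illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters for the GDA generative story.
phi = 0.4
mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 2.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])   # shared covariance, as the model assumes

m = 500
y = rng.binomial(1, phi, size=m)              # y ~ Bernoulli(phi)
means = np.where(y[:, None] == 1, mu1, mu0)   # pick mu_y for each example
x = np.array([rng.multivariate_normal(mean, Sigma) for mean in means])
```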

To fit the parameters, we maximize the joint log-likelihood of the data:

$$
\begin{aligned}
\ell(\phi,\mu_0,\mu_1,\Sigma) &= \log\prod_{i=1}^{m} p\big(x^{(i)}, y^{(i)}; \phi, \mu_0, \mu_1, \Sigma\big) \\
&= \log\prod_{i=1}^{m} p\big(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma\big)\, p\big(y^{(i)}; \phi\big) \quad \text{(joint likelihood)} \\
&= \sum_{i=1}^{m}\Big(\log p\big(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma\big) + \log p\big(y^{(i)}; \phi\big)\Big)
\end{aligned}
$$
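Maximizing $\ell$ gives closed-form estimates: $\phi$ is the fraction of positive examples, $\mu_0$ and $\mu_1$ are the per-class sample means, and $\Sigma$ is the pooled covariance (the standard maximum-likelihood result for GDA; the derivation is not shown here). A minimal NumPy sketch of fitting and prediction; the function names fit_gda and predict are just illustrative:

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form maximum-likelihood estimates of the GDA parameters."""
    m = X.shape[0]
    phi = np.mean(y == 1)                # fraction of positive examples
    mu0 = X[y == 0].mean(axis=0)         # mean of class-0 examples
    mu1 = X[y == 1].mean(axis=0)         # mean of class-1 examples
    # Pooled covariance: deviation of each example from its own class mean.
    diffs = X - np.where(y[:, None] == 1, mu1, mu0)
    Sigma = diffs.T @ diffs / m
    return phi, mu0, mu1, Sigma

def predict(X, phi, mu0, mu1, Sigma):
    """Predict y = arg max p(x|y)p(y); the shared Gaussian normalization
    constant cancels in the comparison, so only the quadratic form is needed."""
    inv = np.linalg.inv(Sigma)
    def log_joint(mu, prior):
        d = X - mu
        return -0.5 * np.einsum('ij,jk,ik->i', d, inv, d) + np.log(prior)
    return (log_joint(mu1, phi) > log_joint(mu0, 1 - phi)).astype(int)
```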