Gaussian Generative Models and Naive Bayes
CS229: Generative Models & Discriminative Models
Definition
A generative model learns \(P(x|y)\): \(x\) is the feature vector and \(y\) is the class label. (A discriminative model instead learns \(P(y|x)\) directly.)
Bayes' rule
\[P(y|x)=\frac{P(x|y)P(y)}{P(x)}, \quad \text{where } P(x)=\sum_{y'}P(x|y')P(y')
\]
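A quick numeric sanity check of Bayes' rule in Python; the likelihoods and priors below are made-up illustrative numbers, not from the notes.

```python
# Binary-class Bayes' rule with assumed (illustrative) numbers.
p_x_given_y = {0: 0.2, 1: 0.7}   # P(x|y), assumed likelihoods
p_y = {0: 0.6, 1: 0.4}           # P(y), assumed class priors

# Evidence: P(x) = sum_y P(x|y) P(y) = 0.12 + 0.28 = 0.40
p_x = sum(p_x_given_y[y] * p_y[y] for y in (0, 1))

# Posterior: P(y=1|x) = 0.28 / 0.40 = 0.7
print(p_x_given_y[1] * p_y[1] / p_x)
```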
Gaussian Discriminant Analysis (GDA)
GDA is a generative algorithm.
\(x \in \mathbb{R}^{n}\); assume \(p(x|y)\) is a Gaussian distribution \(\mathcal{N}(\mu,\Sigma)\).
We typically assume the positive and negative classes have different means \(\mu_1,\mu_0\) but share the same covariance \(\Sigma\).
The class prior on \(y\) is Bernoulli:
\[P(y)=p_+^{y} (1-p_+)^{1-y}
\]
Parameters
\(\theta = \{\mu_0,\mu_1,\Sigma,p_+\}\), with labels \(y_i \in \{0,1\}\)
\[p_+ = \frac{\sum_{i=1}^{m}y_i}{m}
\]
\[\mu_j=\frac{\sum_{i=1}^{m}I(y_i=j)x_i}{\sum_{i=1}^{m}I(y_i=j)}
\]
where \(j\in\{0,1\}\) and \(I(\cdot)\) is the indicator function
\[\Sigma = \frac{1}{m}\sum_{i=1}^{m}(x_i-\mu_{y_i})(x_i-\mu_{y_i})^T
\]
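These maximum-likelihood estimates have closed forms, so fitting GDA needs no iterative optimization. A minimal NumPy sketch, assuming \(X\) is an \((m,n)\) array and \(y\) an \((m,)\) array of 0/1 labels (the name fit_gda is illustrative):

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form MLE for GDA with a shared covariance (a sketch)."""
    m = X.shape[0]
    p_pos = y.mean()              # p_+ = (1/m) * sum_i y_i
    mu0 = X[y == 0].mean(axis=0)  # mean of class-0 examples
    mu1 = X[y == 1].mean(axis=0)  # mean of class-1 examples
    # Shared covariance: each example is centered at its own class mean mu_{y_i}.
    centered = X - np.where(y[:, None] == 1, mu1, mu0)
    Sigma = centered.T @ centered / m
    return mu0, mu1, Sigma, p_pos
```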
Goal
In a generative algorithm we maximize the joint likelihood; in practice we work with its logarithm.
\[L(\theta)=\prod_{i=1}^{m} P(x_i,y_i;\theta)
=\prod_{i=1}^{m}P(x_i|y_i;\mu_0,\mu_1,\Sigma)\,P(y_i;p_+)
\]
whereas a discriminative model maximizes the conditional likelihood:
\[L(\theta)=\prod_{i=1}^{m} P(y_i|x_i;\theta)
\]
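Once trained, a generative model predicts with Bayes' rule: pick the class maximizing \(P(x|y)P(y)\), which is equivalent to maximizing the posterior since \(P(x)\) is shared across classes. A sketch reusing the parameters from fit_gda above, assuming SciPy is available:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gda_predict(X, mu0, mu1, Sigma, p_pos):
    """Classify each row of X by argmax_y [log P(x|y) + log P(y)]."""
    log0 = multivariate_normal.logpdf(X, mean=mu0, cov=Sigma) + np.log(1 - p_pos)
    log1 = multivariate_normal.logpdf(X, mean=mu1, cov=Sigma) + np.log(p_pos)
    return (log1 > log0).astype(int)
```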
Relation between GDA and Logistic Regression
GDA makes stronger assumptions: when they hold, the GDA posterior reduces exactly to the logistic-regression form, so GDA implies LR (weaker assumptions) but not the converse.
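To see why, substitute the shared-covariance Gaussians into Bayes' rule; the quadratic terms in \(x\) cancel and the posterior is logistic:
\[P(y=1|x)=\frac{P(x|y=1)\,p_+}{P(x|y=1)\,p_+ + P(x|y=0)\,(1-p_+)}=\frac{1}{1+e^{-(\theta^T x+\theta_0)}}
\]
with \(\theta=\Sigma^{-1}(\mu_1-\mu_0)\) and \(\theta_0=-\frac{1}{2}\left(\mu_1^T\Sigma^{-1}\mu_1-\mu_0^T\Sigma^{-1}\mu_0\right)+\log\frac{p_+}{1-p_+}\).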
Naive Bayes
Naive Bayes assumes the features are conditionally independent given the class, i.e. \(p(x|y)=\prod_{j} p(x_j|y)\), and fits \(\theta\) by maximum likelihood estimation.
Plain MLE fails on a feature value never seen in training: its estimated probability is 0, which zeroes out the posterior for every class.
Solution: Laplace smoothing, adding 1 to each count (and the number of possible values to each denominator), as in the sketch below.
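A minimal sketch of Bernoulli Naive Bayes with Laplace smoothing, assuming binary features; the function names are illustrative, not a reference implementation.

```python
import numpy as np

def fit_bernoulli_nb(X, y, k=2):
    """Fit P(x_j=1|y) and the class prior with add-1 (Laplace) smoothing.

    X: (m, n) binary feature matrix; y: (m,) labels in {0, 1};
    k: number of values each feature can take (2 for binary).
    """
    m = X.shape[0]
    p_pos = (y.sum() + 1) / (m + 2)  # smoothed prior (2 classes)
    phi1 = (X[y == 1].sum(axis=0) + 1) / ((y == 1).sum() + k)
    phi0 = (X[y == 0].sum(axis=0) + 1) / ((y == 0).sum() + k)
    return p_pos, phi0, phi1

def predict_nb(X, p_pos, phi0, phi1):
    """argmax_y [log P(y) + sum_j log P(x_j|y)], using independence."""
    log1 = np.log(p_pos) + (X * np.log(phi1) + (1 - X) * np.log(1 - phi1)).sum(axis=1)
    log0 = np.log(1 - p_pos) + (X * np.log(phi0) + (1 - X) * np.log(1 - phi0)).sum(axis=1)
    return (log1 > log0).astype(int)
```

Smoothing keeps every estimated \(\phi\) strictly inside \((0,1)\), so an unseen feature value can no longer zero out a class's posterior.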