Gaussian Generative Models and Naive Bayes

CS229: Generative Models & Discriminative Models

Definition

A generative model learns \(P(x|y)\), where \(x\) is the feature vector and \(y\) is the class label; a discriminative model learns \(P(y|x)\) directly.

Bayes rule

\[P(y|x)=\frac{P(x|y)P(y)}{P(x)} \]
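As a quick numeric sanity check, here is the rule applied to a made-up binary problem (all numbers are illustrative, not from the lecture):

```python
# Bayes rule on a toy binary problem: P(y|x) = P(x|y) P(y) / P(x).
# All numbers below are made up for illustration.
p_y = {0: 0.7, 1: 0.3}            # class priors P(y)
p_x_given_y = {0: 0.1, 1: 0.8}    # likelihood P(x|y) for one observed x

# evidence P(x) by total probability: sum over y of P(x|y) P(y)
p_x = sum(p_x_given_y[c] * p_y[c] for c in p_y)

# posterior P(y=1|x)
posterior_1 = p_x_given_y[1] * p_y[1] / p_x
print(posterior_1)  # 0.24 / 0.31 ≈ 0.774
```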

Gaussian Discriminant Analysis (GDA)

GDA is a generative algorithm.

Let \(x \in \mathbb{R}^{n}\), and assume \(p(x|y)\) follows a Gaussian distribution \(\mathcal{N}(\mu,\Sigma)\).

We typically assume the positive and negative classes have different means (\(\mu_1\) and \(\mu_0\)) but share the same covariance \(\Sigma\). The class prior is Bernoulli with parameter \(p_+\):

\[P(y)=p_+^y (1-p_+)^{1-y} \]

Parameters

\(\theta = \{\mu_0,\mu_1,\Sigma,p_+\}\), with labels \(y_i \in \{0,1\}\). The maximum-likelihood estimates are:

\[p_+ = \frac{\sum_{i=1}^{m}y_i}{m} \]

\[\mu_j=\frac{\sum_{i=1}^{m}I(y_i=j)x_i}{\sum_{i=1}^{m}I(y_i=j)} \]

where \(j \in \{0,1\}\) and \(I(\cdot)\) is the indicator function,

\[\Sigma = \frac{1}{m}\sum_{i=1}^{m}(x_i-\mu_{y_i})(x_i-\mu_{y_i})^T \]
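A minimal NumPy sketch of these closed-form estimates (the function name fit_gda is my own, not from the course):

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form MLE for GDA with a shared covariance matrix.

    X: (m, n) feature matrix; y: (m,) array of 0/1 labels.
    Returns (p_plus, mu0, mu1, Sigma), matching the formulas above.
    """
    m = X.shape[0]
    p_plus = y.mean()                 # p_+ = (1/m) * sum_i y_i
    mu0 = X[y == 0].mean(axis=0)      # mean of the negative class
    mu1 = X[y == 1].mean(axis=0)      # mean of the positive class
    # shared Sigma: center each x_i by the mean of its own class
    centered = X - np.where(y[:, None] == 1, mu1, mu0)
    Sigma = centered.T @ centered / m
    return p_plus, mu0, mu1, Sigma
```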

Goal

In a generative algorithm we maximize the joint likelihood, commonly working with its log form:

\[L(\theta)=\prod_{i=1}^{m} P(x_i,y_i;\theta) =\prod_{i=1}^{m}P(x_i|y_i;\mu_0,\mu_1,\Sigma)\,P(y_i;p_+) \]
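Taking the log turns the product into a sum:

\[\ell(\theta)=\sum_{i=1}^{m}\log P(x_i|y_i;\mu_0,\mu_1,\Sigma)+\sum_{i=1}^{m}\log P(y_i;p_+) \]

The two sums share no parameters, so each can be maximized separately; this is what yields the closed-form estimates above.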

In a discriminative model, by contrast, the goal is to maximize the conditional likelihood:

\[L(\theta)=\prod_{i=1}^{m} P(y_i|x_i;\theta) \]

Relation between GDA and Logistic Regression (LR)

GDA makes stronger assumptions than LR: under GDA's assumptions the posterior \(P(y=1|x)\) takes the logistic form, so GDA implies an LR model, but not vice versa (LR's weaker assumptions do not imply Gaussian class conditionals).
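Concretely, plugging the Gaussian class conditionals into Bayes rule gives a logistic posterior (a standard derivation, sketched here):

\[P(y=1|x)=\frac{P(x|y=1)\,p_+}{P(x|y=1)\,p_+ + P(x|y=0)\,(1-p_+)}=\frac{1}{1+e^{-(\theta^{T}x+\theta_0)}} \]

where \(\theta=\Sigma^{-1}(\mu_1-\mu_0)\) and \(\theta_0=\frac{1}{2}\mu_0^{T}\Sigma^{-1}\mu_0-\frac{1}{2}\mu_1^{T}\Sigma^{-1}\mu_1+\log\frac{p_+}{1-p_+}\); the quadratic terms in \(x\) cancel because both classes share \(\Sigma\).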

Naive Bayes

Naive Bayes assumes the features are conditionally independent given the class; the parameters \(\theta\) are obtained by maximum likelihood estimation.

Plain MLE fails on a feature value that never appears in the training set: its estimated probability is 0, which zeroes out the entire posterior.

Solution: Laplace smoothing — add 1 to every count (and the number of possible values to the denominator) so no estimate is exactly 0.
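A minimal Bernoulli Naive Bayes sketch with add-one (Laplace) smoothing; the helper names are mine:

```python
import numpy as np

def fit_naive_bayes(X, y, alpha=1.0):
    """Bernoulli Naive Bayes with Laplace smoothing.

    X: (m, n) binary feature matrix; y: (m,) 0/1 labels.
    alpha=1 is the add-one smoothing described above; the +2*alpha
    in the denominator covers the two possible values per feature.
    """
    phi = {c: (X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
           for c in (0, 1)}          # phi[c][j] = P(x_j = 1 | y = c)
    return phi, y.mean()             # per-class feature probs, prior P(y=1)

def log_posterior(x, phi, prior1):
    """Unnormalized log P(y=c|x) under conditional independence."""
    return {c: np.log(p_c) + np.sum(x * np.log(phi[c]) +
                                    (1 - x) * np.log(1 - phi[c]))
            for c, p_c in ((0, 1 - prior1), (1, prior1))}
```

Prediction picks the class with the larger score; the smoothing guarantees `np.log` never sees a zero, even for feature values unseen in training.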
