Gaussian Generative Models and Naive Bayes
CS229: Generative Models & Discriminative Models
Definition
A generative model learns \(P(x|y)\): \(x\) is the feature vector and \(y\) is the class label. (A discriminative model instead learns \(P(y|x)\) directly.)
Bayes' rule
\[P(y|x)=\frac{P(x|y)P(y)}{P(x)}, \quad \text{where } P(x)=\sum_{y'}P(x|y')P(y')
\]
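A quick numeric sanity check of Bayes' rule in Python; the likelihoods and priors below are made-up illustrative numbers, not from the notes.

```python
# Binary-class Bayes' rule with assumed (illustrative) numbers.
p_x_given_y = {0: 0.2, 1: 0.7}   # P(x|y), assumed likelihoods
p_y = {0: 0.6, 1: 0.4}           # P(y), assumed class priors

# Evidence: P(x) = sum_y P(x|y) P(y) = 0.12 + 0.28 = 0.40
p_x = sum(p_x_given_y[y] * p_y[y] for y in (0, 1))

# Posterior: P(y=1|x) = 0.28 / 0.40 = 0.7
print(p_x_given_y[1] * p_y[1] / p_x)
```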
Gaussian Discriminant Analysis (GDA)
GDA is a generative algorithm.
\(x \in \mathbb{R}^{n}\); assume \(p(x|y)\) is a Gaussian distribution \(\mathcal{N}(\mu,\Sigma)\).
We typically assume the positive and negative classes have different means \(\mu_1,\mu_0\) but share the same covariance \(\Sigma\).
The class prior on \(y\) is Bernoulli:
\[P(y)=p_+^{y} (1-p_+)^{1-y}
\]
Parameters
\(\theta = \{\mu_0,\mu_1,\Sigma,p_+\}\), with labels \(y_i \in \{0,1\}\)
\[p_+ = \frac{\sum_{i=1}^{m}y_i}{m}
\]
\[\mu_j=\frac{\sum_{i=1}^{m}I(y_i=j)x_i}{\sum_{i=1}^{m}I(y_i=j)}
\]
where \(j\in\{0,1\}\) and \(I(\cdot)\) is the indicator function
\[\Sigma = \frac{1}{m}\sum_{i=1}^{m}(x_i-\mu_{y_i})(x_i-\mu_{y_i})^T
\]
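These maximum-likelihood estimates have closed forms, so fitting GDA needs no iterative optimization. A minimal NumPy sketch, assuming \(X\) is an \((m,n)\) array and \(y\) an \((m,)\) array of 0/1 labels (the name fit_gda is illustrative):

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form MLE for GDA with a shared covariance (a sketch)."""
    m = X.shape[0]
    p_pos = y.mean()              # p_+ = (1/m) * sum_i y_i
    mu0 = X[y == 0].mean(axis=0)  # mean of class-0 examples
    mu1 = X[y == 1].mean(axis=0)  # mean of class-1 examples
    # Shared covariance: each example is centered at its own class mean mu_{y_i}.
    centered = X - np.where(y[:, None] == 1, mu1, mu0)
    Sigma = centered.T @ centered / m
    return mu0, mu1, Sigma, p_pos
```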
Goal
In a generative algorithm we maximize the joint likelihood; in practice we work with its logarithm.
\[L(\theta)=\prod_{i=1}^{m} P(x_i,y_i;\theta)
=\prod_{i=1}^{m}P(x_i|y_i;\mu_0,\mu_1,\Sigma)\,P(y_i;p_+)
\]
whereas a discriminative model maximizes the conditional likelihood:
\[L(\theta)=\prod_{i=1}^{m} P(y_i|x_i;\theta)
\]
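Once trained, a generative model predicts with Bayes' rule: pick the class maximizing \(P(x|y)P(y)\), which is equivalent to maximizing the posterior since \(P(x)\) is shared across classes. A sketch reusing the parameters from fit_gda above, assuming SciPy is available:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gda_predict(X, mu0, mu1, Sigma, p_pos):
    """Classify each row of X by argmax_y [log P(x|y) + log P(y)]."""
    log0 = multivariate_normal.logpdf(X, mean=mu0, cov=Sigma) + np.log(1 - p_pos)
    log1 = multivariate_normal.logpdf(X, mean=mu1, cov=Sigma) + np.log(p_pos)
    return (log1 > log0).astype(int)
```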
Relation between GDA and Logistic Regression
GDA makes stronger assumptions: when they hold, the GDA posterior reduces exactly to the logistic-regression form, so GDA implies LR (weaker assumptions) but not the converse.
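To see why, substitute the shared-covariance Gaussians into Bayes' rule; the quadratic terms in \(x\) cancel and the posterior is logistic:
\[P(y=1|x)=\frac{P(x|y=1)\,p_+}{P(x|y=1)\,p_+ + P(x|y=0)\,(1-p_+)}=\frac{1}{1+e^{-(\theta^T x+\theta_0)}}
\]
with \(\theta=\Sigma^{-1}(\mu_1-\mu_0)\) and \(\theta_0=-\frac{1}{2}\left(\mu_1^T\Sigma^{-1}\mu_1-\mu_0^T\Sigma^{-1}\mu_0\right)+\log\frac{p_+}{1-p_+}\).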
Naive Bayes
Naive Bayes assumes the features are conditionally independent given the class, i.e. \(p(x|y)=\prod_{j} p(x_j|y)\), and fits \(\theta\) by maximum likelihood estimation.
Plain MLE fails on a feature value never seen in training: its estimated probability is 0, which zeroes out the posterior for every class.
Solution: Laplace smoothing, adding 1 to each count (and the number of possible values to each denominator), as in the sketch below.
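A minimal sketch of Bernoulli Naive Bayes with Laplace smoothing, assuming binary features; the function names are illustrative, not a reference implementation.

```python
import numpy as np

def fit_bernoulli_nb(X, y, k=2):
    """Fit P(x_j=1|y) and the class prior with add-1 (Laplace) smoothing.

    X: (m, n) binary feature matrix; y: (m,) labels in {0, 1};
    k: number of values each feature can take (2 for binary).
    """
    m = X.shape[0]
    p_pos = (y.sum() + 1) / (m + 2)  # smoothed prior (2 classes)
    phi1 = (X[y == 1].sum(axis=0) + 1) / ((y == 1).sum() + k)
    phi0 = (X[y == 0].sum(axis=0) + 1) / ((y == 0).sum() + k)
    return p_pos, phi0, phi1

def predict_nb(X, p_pos, phi0, phi1):
    """argmax_y [log P(y) + sum_j log P(x_j|y)], using independence."""
    log1 = np.log(p_pos) + (X * np.log(phi1) + (1 - X) * np.log(1 - phi1)).sum(axis=1)
    log0 = np.log(1 - p_pos) + (X * np.log(phi0) + (1 - X) * np.log(1 - phi0)).sum(axis=1)
    return (log1 > log0).astype(int)
```

Smoothing keeps every estimated \(\phi\) strictly inside \((0,1)\), so an unseen feature value can no longer zero out a class's posterior.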