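Generalized Linear Models

Introduction

Every linear model can be viewed, in probabilistic terms, as estimating the dependent variable \(y\) given the data \(x\) and the associated linear parameters \(\theta\). The whole procedure amounts to modeling the conditional probability \(p(y|x,\theta)\).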

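The Exponential Family

The exponential family is the broad class of distributions of the form \(p(y|\eta) = b(y)\exp(\eta^TT(y)-a(\eta))\),
where \(b(y)\) is the underlying measure, \(T(y)\) is the sufficient statistic, and \(a(\eta)\) is the log normalizer.
Because the density must integrate to one, we have: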

\[\begin{aligned} \int p(y|\eta)dy&=\exp(-a(\eta))\int b(y)\exp(\eta^TT(y)) dy = 1\\ \exp(a(\eta))&=\int b(y)\exp(\eta^TT(y)) dy \end{aligned} \]
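The normalization identity can be checked numerically. The sketch below assumes the unit-variance Gaussian case, where \(b(y)=\frac{1}{\sqrt{2\pi}}e^{-y^2/2}\), \(T(y)=y\) and the closed form is \(a(\eta)=\eta^2/2\); the function name, grid bounds and grid size are illustrative choices, not part of the original text.

```python
import numpy as np

def log_normalizer_numeric(eta, lo=-30.0, hi=30.0, n=200_001):
    """Approximate a(eta) = log of the integral of b(y) * exp(eta * y) by a Riemann sum."""
    y = np.linspace(lo, hi, n)
    dy = y[1] - y[0]
    # b(y) * exp(eta * y), with the exponents combined to avoid overflow
    integrand = np.exp(-0.5 * y**2 + eta * y) / np.sqrt(2.0 * np.pi)
    return np.log(np.sum(integrand) * dy)

# Compare the numerical value with the closed form eta^2 / 2
for eta in (-1.5, 0.0, 0.7, 2.0):
    print(eta, log_normalizer_numeric(eta), 0.5 * eta**2)
```

The two printed columns should agree closely, which is just the statement that \(\exp(a(\eta))\) equals the integral of \(b(y)\exp(\eta y)\).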

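Assumptions

Assumption 1: \(y|x,\theta \sim \mathrm{ExponentialFamily}(\eta)\)
Assumption 2: the estimate of \(y\) is the expectation under \(p(y|x,\theta)\), i.e. \(h(x,\theta)=E(y|x,\theta)=E(y|\eta)\)
Assumption 3: the natural parameter \(\eta\) is linear in \(x\), i.e. \(\eta = \theta^T x\)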
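The three assumptions translate into a very small prediction routine. This is a minimal sketch: `glm_predict` and `mean_fn` are hypothetical names introduced here, and the choice of distribution (Assumption 1) enters only through the mean function that is passed in.

```python
import numpy as np

def glm_predict(theta, x, mean_fn):
    """Return h(x, theta) = E(y | eta) with eta = theta^T x."""
    eta = np.dot(theta, x)   # Assumption 3: the natural parameter is linear in x
    return mean_fn(eta)      # Assumption 2: predict the conditional mean

def identity(eta):
    """Mean of a unit-variance Gaussian with natural parameter eta."""
    return eta

def sigmoid(eta):
    """Mean of a Bernoulli with natural parameter eta."""
    return 1.0 / (1.0 + np.exp(-eta))

theta = np.array([1.0, 2.0])
x = np.array([3.0, 4.0])
print(glm_predict(theta, x, identity))  # Gaussian case
print(glm_predict(theta, x, sigmoid))   # Bernoulli case
```

Swapping the mean function is all that changes between models; the linear part \(\theta^T x\) stays the same.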

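Properties of Generalized Linear Models

Claim: \(E(y|\eta) = \frac{d}{d\eta}a(\eta)\) and \(Var(y|\eta) = \frac{d^2}{d\eta^2}a(\eta)\) (stated here for scalar \(\eta\) with \(T(y)=y\)).

Proof: define the log-likelihood \(L(y,\eta)= \log p(y|\eta)=\log b(y)+\eta^TT(y)-a(\eta)\) and, taking \(T(y)=y\), its derivative with respect to \(\eta\) (the score):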

\[U(y,\eta)= \frac{d}{d\eta}L(y,\eta)=y-\frac{d}{d\eta}a(\eta) \]

The expectation of \(U(y,\eta)\) is:

\[E(U(y,\eta))=\int U(y,\eta)p(y|\eta)dy \]

Writing \(U\) as the derivative of the log-density gives

\[E(U(y,\eta))=\int \frac{d}{d\eta}log(p(y|\eta))p(y|\eta)dy \]

\[dlog(p(y|\eta))=\frac{1}{p(y|\eta)}dp(y|\eta) \]

Substituting this identity, and exchanging differentiation and integration, we obtain

\[\begin{aligned} E(U(y,\eta))&=\int \frac{1}{p(y|\eta)}\frac{d}{d\eta}p(y|\eta)p(y|\eta)dy \\&=\int \frac{d}{d\eta}p(y|\eta)dy \\&=\frac{d}{d\eta}\int p(y|\eta)dy=0 \end{aligned} \]

Therefore \(E(U(y,\eta)) = E(y-\frac{d}{d\eta}a(\eta))=0\), i.e. \(E(y|\eta)=\frac{d}{d\eta}a(\eta)\), which proves the claim for the mean.
Variance: TODO
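
One way to fill in the variance step, sketched here under the same assumptions as for the mean (scalar \(\eta\), \(T(y)=y\), and differentiation allowed under the integral sign): differentiate \(\int U(y,\eta)p(y|\eta)dy=0\) once more with respect to \(\eta\), using \(\frac{d}{d\eta}U(y,\eta)=-\frac{d^2}{d\eta^2}a(\eta)\) and \(\frac{d}{d\eta}p(y|\eta)=U(y,\eta)p(y|\eta)\):

\[\begin{aligned} 0&=\frac{d}{d\eta}\int U(y,\eta)p(y|\eta)dy \\&=\int \frac{d}{d\eta}U(y,\eta)\,p(y|\eta)dy+\int U(y,\eta)\frac{d}{d\eta}p(y|\eta)dy \\&=-\frac{d^2}{d\eta^2}a(\eta)+E(U(y,\eta)^2) \end{aligned} \]

Because \(U(y,\eta)=y-\frac{d}{d\eta}a(\eta)=y-E(y|\eta)\), the term \(E(U(y,\eta)^2)\) is the variance of \(y\) given \(\eta\), so that variance equals \(\frac{d^2}{d\eta^2}a(\eta)\).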

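Relation to Other Linear Models

Deriving linear regression

Basic assumption: \(y\) follows \(N(\mu,1)\). The variance can be fixed to 1 because it has no effect on the optimization of \(\theta\) (see the probabilistic interpretation of linear regression); the assumption can be dropped, but the derivation becomes more involved. Hence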

\[\begin{aligned} p(y|x,\theta) &= \frac{1}{\sqrt{2\pi}}\exp(-\frac{(y-\mu)^2}{2})\\ &=\frac{1}{\sqrt{2\pi}}\exp(-\frac{y^2}{2})\exp(\mu y-\frac{\mu^2}{2}) \end{aligned} \]

This gives \(b(y) =\frac{1}{\sqrt{2\pi}}\exp(-\frac{y^2}{2}), T(y)=y, \eta=\mu, a(\eta)=\frac{\mu^2}{2}=\frac{\eta^2}{2}\), so the estimating function is \(h(x,\theta)=E(y|\eta)=\mu=\eta=\theta^T x\).
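In practice, maximizing the \(N(\theta^Tx,1)\) likelihood is the same as minimizing squared error, so ordinary least squares gives the fit. The sketch below assumes NumPy and synthetic data; `fit_linear_glm`, `predict` and the chosen true parameters are illustrative only.

```python
import numpy as np

def fit_linear_glm(X, y):
    """Maximize the N(theta^T x, 1) likelihood, i.e. minimize the squared error."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def predict(X, theta):
    """Identity mean function: h(x, theta) = theta^T x for each row of X."""
    return X @ theta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])  # bias column + one feature
true_theta = np.array([0.5, -2.0])                         # assumed ground truth
y = X @ true_theta + rng.normal(size=200)                  # unit-variance Gaussian noise

theta_hat = fit_linear_glm(X, y)
print(theta_hat)                  # estimate of true_theta, up to sampling noise
print(predict(X[:3], theta_hat))  # fitted means for the first three rows
```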

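Deriving logistic regression

Basic assumption: \(y\in\{0,1\}\) follows a Bernoulli distribution with mean \(\phi\) (written \(\phi\) to avoid a clash with the linear parameters \(\theta\)), so:

\[\begin{aligned} p(y|x,\theta) &= \phi^y(1-\phi)^{1-y}\\ &=\exp(y\log(\frac{\phi}{1-\phi})+\log(1-\phi)) \end{aligned} \]

This gives \(b(y) =1, T(y)=y, \eta=\log(\frac{\phi}{1-\phi}), a(\eta)=-\log(1-\phi)=\log(1+e^\eta)\), so the estimating function is \(h(x,\theta)=E(y|\eta)=\phi=\frac{1}{1+e^{-\eta}}\) with \(\eta=\theta^Tx\).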
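As a quick sanity check, the derivative of the log normalizer \(\log(1+e^\eta)\) should coincide with the sigmoid \(\frac{1}{1+e^{-\eta}}\), the Bernoulli mean. The sketch below compares the two with a central finite difference; the helper names and the step size are assumptions made for illustration.

```python
import numpy as np

def log_normalizer(eta):
    """a(eta) = log(1 + e^eta), computed stably via logaddexp."""
    return np.logaddexp(0.0, eta)

def sigmoid(eta):
    """Bernoulli mean as a function of the natural parameter eta."""
    return 1.0 / (1.0 + np.exp(-eta))

def numeric_derivative(f, eta, eps=1e-5):
    """Central finite-difference approximation of f'(eta)."""
    return (f(eta + eps) - f(eta - eps)) / (2.0 * eps)

# The last two columns should match: sigmoid(eta) and d/d(eta) of a(eta)
for eta in (-2.0, 0.0, 1.3):
    print(eta, sigmoid(eta), numeric_derivative(log_normalizer, eta))
```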

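Deriving multiclass classification

Basic assumption: \(y=(y_1,\cdots,y_K)\) follows a multinomial distribution with \(M\) trials and class probabilities \(\pi_1,\cdots,\pi_K\), where \(\sum_{i=1}^K y_i=M\) and \(\sum_{i=1}^K \pi_i=1\), so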

\[\begin{aligned} p(y|x,\theta) &= \frac{M!}{y_1!y_2!\cdots y_K!}\pi_1^{y_1}\cdots \pi_K^{y_K} \\&= \frac{M!}{y_1!y_2!\cdots y_K!}exp(\sum_{i=1}^K y_ilog\pi_i) \end{aligned} \]

Here \(\exp(\sum_{i=1}^K y_i\log\pi_i)\) can be rewritten using \(y_K=M-\sum_{i=1}^{K-1}y_i\) and \(\pi_K=1-\sum_{i=1}^{K-1}\pi_i\):

\[\begin{aligned} \exp(\sum_{i=1}^K y_i\log\pi_i) &= \exp(\sum_{i=1}^{K-1} y_i\log\pi_i + (M-\sum_{i=1}^{K-1}y_i)\log(1-\sum_{i=1}^{K-1}\pi_i)) \\&= \exp(\sum_{i=1}^{K-1} y_i\log(\frac{\pi_i}{1-\sum_{j=1}^{K-1}\pi_j}) + M\log(1-\sum_{i=1}^{K-1}\pi_i)) \\&=\exp(\sum_{i=1}^{K-1} y_i\log(\frac{\pi_i}{\pi_K}) + M\log(\pi_K)) \end{aligned} \]

This gives \(T(y)_i=y_i, \eta_i=\log(\frac{\pi_i}{\pi_K}), a(\eta)=-M\log(\pi_K)\) for \(i=1,\cdots,K-1\). Setting \(\eta_K=\log(\frac{\pi_K}{\pi_K})=0\), note that

\[e^{\eta_i} \pi_K = \pi_i \implies \pi_K \sum_{i=1}^K e^{\eta_i}=\sum_{i=1}^K \pi_i = 1 \]

so \(\pi_K=\frac{1}{\sum_{j=1}^K e^{\eta_j}},\pi_i=\frac{e^{\eta_i}}{\sum_{j=1}^K e^{\eta_j}}\). Since \(E(y_i|\eta)=M\pi_i\), a single trial (\(M=1\), the usual classification setting) gives \(h(x,\theta)_i = \frac{e^{\eta_i}}{\sum_{j=1}^K e^{\eta_j}}\), the softmax function.
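
The mapping from natural parameters to class probabilities can be sketched as follows. It assumes the last class \(K\) is the reference class, so that \(\eta_K=\log(\frac{\pi_K}{\pi_K})=0\); the function name and the max-shift for numerical stability are additions for illustration, not part of the derivation.

```python
import numpy as np

def softmax_from_natural_params(eta_free):
    """Map eta_1..eta_{K-1} to pi_1..pi_K, fixing eta_K = 0 for the reference class."""
    eta = np.append(np.asarray(eta_free, dtype=float), 0.0)  # append eta_K = 0
    eta = eta - eta.max()   # shift for numerical stability; the ratios are unchanged
    weights = np.exp(eta)
    return weights / weights.sum()

pi = softmax_from_natural_params([1.0, -0.5])
print(pi, pi.sum())
# Recover the natural parameters from the probabilities: eta_i = log(pi_i / pi_K)
print(np.log(pi[:-1] / pi[-1]))
```

The second print inverts the map, confirming that \(\eta_i=\log(\frac{\pi_i}{\pi_K})\) and the softmax are two views of the same parameterization.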

posted @ 2021-03-29 17:13  __Blog