
Bayesian linear regression

Let $S = \{(x^{(i)}, y^{(i)})\}_{i=1}^m$ be a training set of i.i.d. examples drawn from some unknown distribution. The standard probabilistic interpretation of linear regression states that

$$y^{(i)} = \theta^T x^{(i)} + \varepsilon^{(i)}, \quad i = 1, \dots, m$$

where the $\varepsilon^{(i)}$ are i.i.d. "noise" variables with independent $\mathcal{N}(0, \sigma^2)$ distributions. It follows that $y^{(i)} - \theta^T x^{(i)} \sim \mathcal{N}(0, \sigma^2)$, or equivalently,

$$p(y^{(i)} \mid x^{(i)}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right)$$
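As a concrete illustration, here is a minimal NumPy sketch that draws synthetic data from this noise model and evaluates the log-likelihood of a parameter vector under it. The particular values of `m`, `sigma`, and `theta_true` are hypothetical choices for the example, not anything prescribed by the model.

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 50, 0.5                   # hypothetical sample size and noise level
theta_true = np.array([2.0])         # hypothetical "true" parameter (1-D feature)

X = rng.uniform(-1.0, 1.0, size=(m, 1))            # training inputs x^(i)
y = X @ theta_true + sigma * rng.normal(size=m)    # y^(i) = theta^T x^(i) + eps^(i)

def log_likelihood(theta, X, y, sigma):
    """Sum over i of log p(y^(i) | x^(i)) under the Gaussian noise model."""
    resid = y - X @ theta
    return np.sum(-0.5 * np.log(2.0 * np.pi * sigma**2) - resid**2 / (2.0 * sigma**2))

print(log_likelihood(theta_true, X, y, sigma))
```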

In Bayesian linear regression, we assume that a prior distribution over the parameters is also given; a typical choice, for instance, is $\theta \sim \mathcal{N}(0, \tau^2 I)$. Using Bayes's rule, we obtain the parameter posterior,

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{marginal likelihood}}$$

$$p(\theta \mid S) = \frac{p(\theta)\, p(S \mid \theta)}{\int_{\theta'} p(\theta')\, p(S \mid \theta')\, d\theta'} = \frac{p(\theta) \prod_{i=1}^m p(y^{(i)} \mid x^{(i)}, \theta)}{\int_{\theta'} p(\theta') \prod_{i=1}^m p(y^{(i)} \mid x^{(i)}, \theta')\, d\theta'} \tag{1}$$
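To see Bayes's rule in action before turning to the closed form derived later, here is a sketch that approximates this posterior numerically for the scalar-$\theta$ example above, discretizing $\theta$ on a grid so the integral in the denominator becomes a simple sum. It reuses `X`, `y`, `sigma`, and `log_likelihood` from the previous snippet; `tau` is a hypothetical prior scale.

```python
tau = 1.0                                    # hypothetical prior scale, theta ~ N(0, tau^2)
theta_grid = np.linspace(-5.0, 5.0, 2001)
dx = theta_grid[1] - theta_grid[0]

log_prior = -0.5 * theta_grid**2 / tau**2    # log N(0, tau^2), up to an additive constant
log_lik = np.array([log_likelihood(np.array([t]), X, y, sigma) for t in theta_grid])

log_post = log_prior + log_lik
log_post -= log_post.max()                   # subtract the max for numerical stability
post = np.exp(log_post)
post /= post.sum() * dx                      # normalize: the marginal likelihood term

print("posterior mean:", (theta_grid * post).sum() * dx)
```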

Assuming the same noise model on test points as on training points, the "output" of Bayesian linear regression on a new test point $x_*$ is not just a single guess $y_*$, but rather an entire probability distribution over possible outputs, known as the posterior predictive distribution:

$$p(y_* \mid x_*, S) = \int_\theta p(y_* \mid x_*, \theta)\, p(\theta \mid S)\, d\theta \tag{2}$$
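Continuing the grid example, the predictive integral in (2) can be approximated by Monte Carlo: draw samples of $\theta$ from the grid posterior and average the Gaussian likelihood of a candidate test output. The test point `(x_star, y_star)` below is a hypothetical value chosen for illustration.

```python
x_star, y_star = 0.3, 0.7                    # hypothetical test input and candidate output

# Sample theta values from the discretized posterior computed above.
thetas = rng.choice(theta_grid, size=5000, p=post / post.sum())

# Average p(y_* | x_*, theta) over the posterior samples.
p_y_star = np.mean(
    np.exp(-0.5 * (y_star - thetas * x_star) ** 2 / sigma**2)
    / np.sqrt(2.0 * np.pi * sigma**2)
)
print("p(y_* | x_*, S) ~", p_y_star)
```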

For many types of models, the integrals in (1) and (2) are difficult to compute, and hence we often resort to approximations, such as maximum a posteriori (MAP) estimation (see also the notes on MAP estimation and on regularization and model selection).

$$\hat{\theta}_{\text{MAP}} = \arg\max_\theta\, p(\theta \mid S) = \arg\max_\theta\, p(\theta) \prod_{i=1}^m p(y^{(i)} \mid x^{(i)}, \theta)$$

(The second equality holds because the denominator of (1) does not depend on $\theta$ and so can be dropped from the maximization.)
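In the Gaussian setting of this section, the MAP estimate itself has a closed form: taking logs, maximizing the expression above is equivalent to minimizing the ridge-regression objective $\frac{1}{2\sigma^2}\|\vec{y} - X\theta\|^2 + \frac{1}{2\tau^2}\|\theta\|^2$, whose minimizer solves a linear system. A sketch, reusing `X`, `y`, `sigma`, and `tau` from the snippets above:

```python
d = X.shape[1]

# theta_MAP solves (X^T X + (sigma^2 / tau^2) I) theta = X^T y,
# i.e. ridge regression with regularization strength sigma^2 / tau^2.
theta_map = np.linalg.solve(X.T @ X + (sigma**2 / tau**2) * np.eye(d), X.T @ y)
print("theta_MAP:", theta_map)
```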

In the case of Bayesian linear regression, however, the integrals actually are tractable! In particular, for Bayesian linear regression, one can show (see Section 2.1.1, "The standard linear model," of Rasmussen and Williams, Gaussian Processes for Machine Learning, http://www.gaussianprocess.org/gpml/) that

$$\theta \mid S \sim \mathcal{N}\left( \frac{1}{\sigma^2} A^{-1} X^T \vec{y},\; A^{-1} \right)$$

$$y_* \mid x_*, S \sim \mathcal{N}\left( \frac{1}{\sigma^2} x_*^T A^{-1} X^T \vec{y},\; x_*^T A^{-1} x_* + \sigma^2 \right)$$

where $A = \frac{1}{\sigma^2} X^T X + \frac{1}{\tau^2} I$, $X$ is the design matrix whose rows are the training inputs, and $\vec{y}$ is the vector of training outputs. The derivation of these formulas is somewhat involved. Nonetheless, from these equations we get at least a flavor of what Bayesian models are all about: the posterior distribution over the test output $y_*$ for a test input $x_*$ is a Gaussian distribution; this distribution reflects the uncertainty in our predictions $y_* = \theta^T x_* + \varepsilon_*$ arising from both the randomness in $\varepsilon_*$ and the uncertainty in our choice of parameters $\theta$. In contrast, classical probabilistic linear regression models estimate parameters $\theta$ directly from the training data but provide no estimate of how reliable these learned parameters may be.
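These closed-form expressions translate directly into a few lines of linear algebra. The sketch below computes the posterior and the predictive mean and variance for the running example; `x_star` is a hypothetical test input, and `np.linalg.inv` is used for clarity even though solving linear systems would be numerically preferable.

```python
d = X.shape[1]
A = (X.T @ X) / sigma**2 + np.eye(d) / tau**2
A_inv = np.linalg.inv(A)

post_mean = A_inv @ X.T @ y / sigma**2              # mean of theta | S
post_cov = A_inv                                    # covariance of theta | S

x_star = np.array([0.3])                            # hypothetical test input
pred_mean = x_star @ post_mean                      # (1/sigma^2) x_*^T A^{-1} X^T y
pred_var = x_star @ A_inv @ x_star + sigma**2       # x_*^T A^{-1} x_* + sigma^2
print(f"y_* | x_*, S  ~  N({pred_mean:.3f}, {pred_var:.3f})")
```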
