
Bayesian linear regression

Let $S = \{(x^{(i)}, y^{(i)})\}_{i=1}^m$ be a training set of i.i.d. examples drawn from some unknown distribution. The standard probabilistic interpretation of linear regression states that

$$y^{(i)} = \theta^T x^{(i)} + \varepsilon^{(i)}, \quad i = 1, \dots, m$$

where the $\varepsilon^{(i)}$ are i.i.d. "noise" variables with independent $\mathcal{N}(0, \sigma^2)$ distributions. It follows that $y^{(i)} - \theta^T x^{(i)} \sim \mathcal{N}(0, \sigma^2)$, or equivalently,

$$p(y^{(i)} \mid x^{(i)}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right)$$
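As a concrete illustration, here is a minimal NumPy sketch that draws synthetic data from this noise model and evaluates the log-likelihood of a parameter vector under it. The particular values of `m`, `sigma`, and `theta_true` are hypothetical choices for the example, not anything prescribed by the model.

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 50, 0.5                   # hypothetical sample size and noise level
theta_true = np.array([2.0])         # hypothetical "true" parameter (1-D feature)

X = rng.uniform(-1.0, 1.0, size=(m, 1))            # training inputs x^(i)
y = X @ theta_true + sigma * rng.normal(size=m)    # y^(i) = theta^T x^(i) + eps^(i)

def log_likelihood(theta, X, y, sigma):
    """Sum over i of log p(y^(i) | x^(i)) under the Gaussian noise model."""
    resid = y - X @ theta
    return np.sum(-0.5 * np.log(2.0 * np.pi * sigma**2) - resid**2 / (2.0 * sigma**2))

print(log_likelihood(theta_true, X, y, sigma))
```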

In Bayesian linear regression, we assume that a prior distribution over the parameters is also given; a typical choice, for instance, is $\theta \sim \mathcal{N}(0, \tau^2 I)$. Using Bayes's rule, we obtain the parameter posterior,

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{marginal likelihood}}$$

$$p(\theta \mid S) = \frac{p(\theta)\, p(S \mid \theta)}{\int_{\theta'} p(\theta')\, p(S \mid \theta')\, d\theta'} = \frac{p(\theta) \prod_{i=1}^m p(y^{(i)} \mid x^{(i)}, \theta)}{\int_{\theta'} p(\theta') \prod_{i=1}^m p(y^{(i)} \mid x^{(i)}, \theta')\, d\theta'} \tag{1}$$
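To see Bayes's rule in action before turning to the closed form derived later, here is a sketch that approximates this posterior numerically for the scalar-$\theta$ example above, discretizing $\theta$ on a grid so the integral in the denominator becomes a simple sum. It reuses `X`, `y`, `sigma`, and `log_likelihood` from the previous snippet; `tau` is a hypothetical prior scale.

```python
tau = 1.0                                    # hypothetical prior scale, theta ~ N(0, tau^2)
theta_grid = np.linspace(-5.0, 5.0, 2001)
dx = theta_grid[1] - theta_grid[0]

log_prior = -0.5 * theta_grid**2 / tau**2    # log N(0, tau^2), up to an additive constant
log_lik = np.array([log_likelihood(np.array([t]), X, y, sigma) for t in theta_grid])

log_post = log_prior + log_lik
log_post -= log_post.max()                   # subtract the max for numerical stability
post = np.exp(log_post)
post /= post.sum() * dx                      # normalize: the marginal likelihood term

print("posterior mean:", (theta_grid * post).sum() * dx)
```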

Assuming the same noise model on test points as on training points, the "output" of Bayesian linear regression on a new test point $x_*$ is not just a single guess $y_*$, but rather an entire probability distribution over possible outputs, known as the posterior predictive distribution:

$$p(y_* \mid x_*, S) = \int_\theta p(y_* \mid x_*, \theta)\, p(\theta \mid S)\, d\theta \tag{2}$$
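Continuing the grid example, the predictive integral in (2) can be approximated by Monte Carlo: draw samples of $\theta$ from the grid posterior and average the Gaussian likelihood of a candidate test output. The test point `(x_star, y_star)` below is a hypothetical value chosen for illustration.

```python
x_star, y_star = 0.3, 0.7                    # hypothetical test input and candidate output

# Sample theta values from the discretized posterior computed above.
thetas = rng.choice(theta_grid, size=5000, p=post / post.sum())

# Average p(y_* | x_*, theta) over the posterior samples.
p_y_star = np.mean(
    np.exp(-0.5 * (y_star - thetas * x_star) ** 2 / sigma**2)
    / np.sqrt(2.0 * np.pi * sigma**2)
)
print("p(y_* | x_*, S) ~", p_y_star)
```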

For many types of models, the integrals in (1) and (2) are difficult to compute, and hence we often resort to approximations, such as maximum a posteriori (MAP) estimation (see also the notes on MAP estimation and on regularization and model selection).

$$\hat{\theta}_{\text{MAP}} = \arg\max_\theta\, p(\theta \mid S) = \arg\max_\theta\, p(\theta) \prod_{i=1}^m p(y^{(i)} \mid x^{(i)}, \theta)$$

(The second equality holds because the denominator of (1) does not depend on $\theta$ and so can be dropped from the maximization.)
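In the Gaussian setting of this section, the MAP estimate itself has a closed form: taking logs, maximizing the expression above is equivalent to minimizing the ridge-regression objective $\frac{1}{2\sigma^2}\|\vec{y} - X\theta\|^2 + \frac{1}{2\tau^2}\|\theta\|^2$, whose minimizer solves a linear system. A sketch, reusing `X`, `y`, `sigma`, and `tau` from the snippets above:

```python
d = X.shape[1]

# theta_MAP solves (X^T X + (sigma^2 / tau^2) I) theta = X^T y,
# i.e. ridge regression with regularization strength sigma^2 / tau^2.
theta_map = np.linalg.solve(X.T @ X + (sigma**2 / tau**2) * np.eye(d), X.T @ y)
print("theta_MAP:", theta_map)
```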

In the case of Bayesian linear regression, however, the integrals actually are tractable! In particular, for Bayesian linear regression, one can show (see Section 2.1.1, "The standard linear model," of Rasmussen and Williams, Gaussian Processes for Machine Learning, http://www.gaussianprocess.org/gpml/) that

$$\theta \mid S \sim \mathcal{N}\left( \frac{1}{\sigma^2} A^{-1} X^T \vec{y},\; A^{-1} \right)$$

$$y_* \mid x_*, S \sim \mathcal{N}\left( \frac{1}{\sigma^2} x_*^T A^{-1} X^T \vec{y},\; x_*^T A^{-1} x_* + \sigma^2 \right)$$

where $A = \frac{1}{\sigma^2} X^T X + \frac{1}{\tau^2} I$, $X$ is the design matrix whose rows are the training inputs, and $\vec{y}$ is the vector of training outputs. The derivation of these formulas is somewhat involved. Nonetheless, from these equations we get at least a flavor of what Bayesian models are all about: the posterior distribution over the test output $y_*$ for a test input $x_*$ is a Gaussian distribution; this distribution reflects the uncertainty in our predictions $y_* = \theta^T x_* + \varepsilon_*$ arising from both the randomness in $\varepsilon_*$ and the uncertainty in our choice of parameters $\theta$. In contrast, classical probabilistic linear regression models estimate parameters $\theta$ directly from the training data but provide no estimate of how reliable these learned parameters may be.
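These closed-form expressions translate directly into a few lines of linear algebra. The sketch below computes the posterior and the predictive mean and variance for the running example; `x_star` is a hypothetical test input, and `np.linalg.inv` is used for clarity even though solving linear systems would be numerically preferable.

```python
d = X.shape[1]
A = (X.T @ X) / sigma**2 + np.eye(d) / tau**2
A_inv = np.linalg.inv(A)

post_mean = A_inv @ X.T @ y / sigma**2              # mean of theta | S
post_cov = A_inv                                    # covariance of theta | S

x_star = np.array([0.3])                            # hypothetical test input
pred_mean = x_star @ post_mean                      # (1/sigma^2) x_*^T A^{-1} X^T y
pred_var = x_star @ A_inv @ x_star + sigma**2       # x_*^T A^{-1} x_* + sigma^2
print(f"y_* | x_*, S  ~  N({pred_mean:.3f}, {pred_var:.3f})")
```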
