[Bayes] OpenBUGS: not the annoying kind of bug in programming
Bayesian inference Using Gibbs Sampling
It lets users specify complex multilevel models and uses MCMC algorithms to estimate the unknown parameters in the model.
We use DAGs to specify models.
Only simple Bayesian networks are covered here; for a thorough treatment, see:
Carnegie Mellon University course 10-708, Spring 2017, Probabilistic Graphical Models
Ref: http://www.cnblogs.com/Dzhouqi/p/3204481.html
Example 1
X and Y are not independent, but
X and Y are conditionally independent given Z.
Notation: X ⊥ Y | Z
f(X, Y | Z) = f(X | Z) f(Y | Z).
Why does knowing C make them conditionally independent? Think of it this way:
Child a having blood type AB in effect "acts back" on c: neither parent can be blood type O. This inference in turn shifts the possible blood types of child b, for example making type O less likely.
Once c is known, say the parents carry only the A and B blood-type alleles, then a and b become independent conditional on c.
Example 2
Same reasoning as the example above.
Example 3
Given c, can we conclude that a and b are conditionally independent? No.
A use in Gibbs sampling
If we want to sample from p(A, B, C, D, F) with a Gibbs sampler, we derive each full conditional distribution using the conditional-independence structure of the DAG.
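As a general sketch (not specific to the particular DAG shown above), the full conditional of any node involves only its Markov blanket: its own term given its parents, times the likelihood terms of its children. In LaTeX:

    p(x_i \mid x_{-i}) \;\propto\; p\big(x_i \mid \mathrm{pa}(x_i)\big) \prod_{j \,:\, x_i \in \mathrm{pa}(x_j)} p\big(x_j \mid \mathrm{pa}(x_j)\big)

Every other factor of the joint distribution cancels when we condition on the remaining nodes, which is why the DAG structure makes the Gibbs updates cheap.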
Suppose we have data like the following at hand:
Let's build a model:
Simplified model:
Specify the likelihood and the prior distributions:
Deterministic functions can be removed/restructured, so that:
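The data and figures referenced above are not reproduced here, so the following is only a minimal hypothetical sketch of what a BUGS model file with a likelihood, priors, and a deterministic node looks like; y, mu, tau, sigma and N are placeholder names, not the variables of the example:

    model {
      for (i in 1:N) {
        y[i] ~ dnorm(mu, tau)      # likelihood: normal with mean mu and precision tau
      }
      mu ~ dnorm(0, 1.0E-6)        # vague prior on the mean
      tau ~ dgamma(0.001, 0.001)   # vague prior on the precision
      sigma <- 1 / sqrt(tau)       # deterministic node: standard deviation
    }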
Bayesian inference Using Gibbs Sampling
Download:
http://www.openbugs.net/w/Downloads
Install:
To install this, unpack by typing
tar zxvf OpenBUGS-3.2.3.tar.gz
cd OpenBUGS-3.2.3
then compile and install by typing
./configure
make
sudo make install
Run:
lolo@lolo-UX303UB$ OpenBUGS
OpenBUGS version 3.2.3 rev 1012
type 'modelQuit()' to quit
OpenBUGS>
Use:
https://www.youtube.com/watch?v=UhYAz6d5_qg
Open the three kinds of files: model, data, and init (example contents are sketched after these steps).
Model --> Specification: load each of the files in turn.
Inference --> Samples: this essentially sets monitors on the four parameters given in the init file.
Model --> Update: run one batch of iterations (here, 1000 updates each time).
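For reference, OpenBUGS reads data and initial values as S-PLUS style lists; hypothetical companion files for the sketch model above (with made-up values) could look like:

    data (hypothetical values):
        list(N = 5, y = c(1.2, 0.7, 1.9, 1.4, 0.8))
    inits (starting values for the stochastic nodes mu and tau):
        list(mu = 0, tau = 1)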
Prior Sensitivity Analysis
The choice of prior distribution(s) must be made with care, particularly when the likelihood does not dominate the posterior.
If the likelihood dominates the posterior, the posterior distribution will essentially be invariant over a wide range of priors.
When the number of studies is large, the prior distribution matters less: the more data, the less important the choice of prior.
A non-informative prior distribution is very useful when prior information, expectations, and beliefs are minimal or unavailable.
Figure: the relationship between the prior, the posterior, and the likelihood.
The most basic property of the Bayesian posterior-mean estimate is shrinkage (a formula sketch follows below):
- when the precision h0 of the likelihood is large, the posterior mean is dominated by the sample mean; conversely,
- when the prior precision h1 is large, the posterior mean is dominated by the prior mean.
This is why Bayesian estimation usually uses a low prior precision (i.e., a large prior variance). It also shows that, by tuning the prior precision, the Bayesian estimate can reproduce the classical estimate; in this sense the classical estimator is a special case of the Bayesian estimator.
Estimating the posterior mean by weighting these two precisions is what the shrinkage property refers to, and every Bayesian posterior mean has it.
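As a sketch of the bullets above: for n observations from a normal likelihood with per-observation precision h0 and a normal prior with mean mu_1 and precision h1 (mu_1 and n are symbols added here for completeness; only h0 and h1 appear in the text above), the posterior mean is a precision-weighted average. In LaTeX:

    \mu_{\text{post}} = \frac{h_1\,\mu_1 + n\,h_0\,\bar{x}}{h_1 + n\,h_0},
    \qquad h_{\text{post}} = h_1 + n\,h_0 .

Making h0 (or n) large pulls the posterior mean toward the sample mean \bar{x}; making h1 large pulls it toward the prior mean \mu_1.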
Finally, a recommended paper whose title, at least, looks impressive:
Official: http://www.openbugs.net/Manuals/InferenceMenu.html
Unofficial 1: http://www.biostat.jhsph.edu/~fdominic/teaching/bio656/labs/labs08/Lab8.IntroWinBUGS.pdf
Unofficial 2: http://www.stats.ox.ac.uk/~cholmes/Courses/BDA/Winbugs/winbugs-help.pdf
The unofficial ones are better: detailed, conscientious work.
Convergence diagnostics:
• For models with many parameters, it is impractical to check convergence for every parameter, so just choose a random selection of relevant parameters to monitor
and check those one by one.
– For example, rather than checking convergence for every element of a vector of random effects, just choose a random subset (say, the first 5 or 10).
• Examine trace plots of the sample values versus iteration to look for evidence of when the simulation appears to have stabilised (i.e., signs that the chain has settled down):
– To obtain ’live’ trace plots for a parameter:
∗ Select Samples from the Inference menu.
∗ Type the name of the parameter in the white box marked node.
∗ Click once with the LMB on the box marked trace: an empty graphics window will appear on screen.
∗ Repeat for each parameter required.
∗ Once you start running the simulations (using the Update Tool), trace plots for these parameters will appear 'live' in the graphics windows.
– To obtain a trace plot showing the full history of the samples for any parameter for which you have previously set a sample monitor and carried out some updates:
∗ Select Samples from the Inference menu.
∗ Type the name of the parameter in the white box marked node (or select name from pull down list).
∗ Click once with the LMB on the box marked history: a graphics window showing the sample trace will appear.
∗ Repeat for each parameter required.
With a uniform prior, set the initial values high so that the speed of convergence can be seen clearly. After all, in practice we do not know, and often cannot even roughly estimate, the range of the parameters.
Below, the assumed prior for the variable in the model is changed to a Gaussian, and convergence becomes much faster.
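The actual model file is not shown here, so the change described above is sketched with a hypothetical parameter theta; note that in BUGS, dnorm takes a mean and a precision, not a variance:

    # before: flat prior over a wide interval, slow to converge from a poor initial value
    theta ~ dunif(-1000, 1000)
    # after: vague but proper Gaussian prior
    theta ~ dnorm(0, 1.0E-4)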