What Are Bayesian Neural Network Posteriors Really Like?

Summary

This paper investigates foundational questions about Bayesian neural networks (BNNs) by running full-batch Hamiltonian Monte Carlo (HMC) on modern architectures. The primary goal is to draw accurate samples from the posterior in order to understand the properties of BNNs, setting aside computational cost and practicality. After describing how to run full-batch HMC effectively on modern neural architectures, the authors find that (1) BNNs can achieve significant performance gains over standard training and deep ensembles, but are less robust to domain shift; (2) a single long HMC chain can provide performance comparable to multiple shorter chains; (3) the cold posterior effect is largely an artifact of data augmentation; (4) Bayesian model average (BMA) performance is robust to the choice of prior scale; (5) while cheaper alternatives such as deep ensembles and SGMCMC can provide good generalization, their predictive distributions are distinct from that of HMC.

Motivation

To understand the behaviour of true BNN posteriors, using HMC as a precise tool (and not to argue for HMC as a practical method for Bayesian deep learning).

To explore fundamental questions about posterior geometry, the generalization performance of BNNs, the fidelity of approximate inference, and the effects of priors and posterior temperature.

Background

Bayesian deep learning methods are typically evaluated on their ability to generate useful, well-calibrated predictions on held-out or out-of-distribution data. However, strong performance on benchmark problems does not imply that an algorithm accurately approximates the true Bayesian model average (BMA). Moreover, none of these approximate inference methods has been directly evaluated on how well it matches the true posterior distribution on practical architectures and datasets.
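
For reference, the Bayesian model average is the posterior-averaged predictive distribution, which sampling methods such as HMC approximate with a Monte Carlo sum over posterior samples $w_m$:

$$ p(y \mid x, \mathcal{D}) = \int p(y \mid x, w)\, p(w \mid \mathcal{D})\, dw \;\approx\; \frac{1}{M} \sum_{m=1}^{M} p(y \mid x, w_m), \qquad w_m \sim p(w \mid \mathcal{D}). $$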

Conclusion

The paper establishes several properties of Bayesian neural networks, including

  • good generalization performance
  • the lack of a cold posterior effect once data augmentation is accounted for
  • a lack of robustness to covariate shift.

Notes

Propose a procedure for effective full-batch HMC sampling on modern neural network architectures.
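
As a reminder of what HMC itself does, here is a minimal single-chain sketch (leapfrog integration plus a Metropolis correction), not the paper's full-batch JAX/TPU implementation; the names `hmc_step`, `log_prob`, and `grad_log_prob` are illustrative and assumed to operate on the flattened parameter vector:

```python
import numpy as np

def hmc_step(w, log_prob, grad_log_prob, step_size=1e-4, n_leapfrog=50, rng=np.random):
    """One HMC transition targeting exp(log_prob(w)).

    A minimal sketch: full-batch gradients, identity mass matrix,
    leapfrog integration, Metropolis-Hastings accept/reject.
    """
    p = rng.normal(size=w.shape)          # sample auxiliary momentum
    w_new, p_new = w.copy(), p.copy()

    # Leapfrog integration of the Hamiltonian dynamics
    p_new += 0.5 * step_size * grad_log_prob(w_new)
    for _ in range(n_leapfrog - 1):
        w_new += step_size * p_new
        p_new += step_size * grad_log_prob(w_new)
    w_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(w_new)

    # Metropolis correction keeps the chain exact despite discretization error
    current_h  = -log_prob(w)     + 0.5 * np.sum(p ** 2)
    proposed_h = -log_prob(w_new) + 0.5 * np.sum(p_new ** 2)
    if np.log(rng.uniform()) < current_h - proposed_h:
        return w_new, True   # accept proposal
    return w, False          # reject, keep current state
```

Sampling the BNN posterior then amounts to applying such a step repeatedly, with log_prob(w) = log p(D | w) + log p(w) evaluated on the full training set, which is what makes the method so expensive for modern networks.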

Explore exciting questions about the fundamental behaviour of Bayesian neural networks:

1. the role of tempering

Cold posteriors are not needed to obtain near-optimal performance with Bayesian neural networks and may even hurt performance; the cold posterior effect is largely an artifact of data augmentation.
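
For context, tempering replaces the Bayes posterior with a posterior at temperature $T$ (a standard formulation, stated here for completeness):

$$ p_T(w \mid \mathcal{D}) \propto \exp\!\left(-\frac{U(w)}{T}\right), \qquad U(w) = -\log p(\mathcal{D} \mid w) - \log p(w), $$

so $T = 1$ recovers the ordinary Bayes posterior and $T < 1$ gives a "cold", more sharply concentrated posterior.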

2. the prior over parameters

The prior over functions is more important than the prior over parameters; in particular, BMA performance is robust to the choice of the parameter prior scale.
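
Concretely, the parameter prior considered here is an isotropic Gaussian, so the prior scale $\alpha$ enters the log-posterior as a weight-decay-like term; the finding is that the BMA is largely insensitive to $\alpha$:

$$ p(w) = \mathcal{N}(w \mid 0, \alpha^2 I), \qquad \log p(w \mid \mathcal{D}) = \log p(\mathcal{D} \mid w) - \frac{\|w\|^2}{2\alpha^2} + \text{const}. $$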

3. generalization performance

BNNs achieve strong results in regression and classification tasks.

4. robustness to covariate shift.

Higher-fidelity representations of the predictive distribution can lead to decreased robustness under covariate shift.
