What Are Bayesian Neural Network Posteriors Really Like?
Summary
This paper investigates foundational questions about Bayesian neural networks (BNNs) by running full-batch Hamiltonian Monte Carlo (HMC) on modern architectures. The primary goal is to draw accurate samples from the posterior in order to understand the properties of BNNs, setting aside computational cost and practicality. After showing how to effectively employ full-batch HMC on modern neural architectures, the authors find that (1) BNNs can achieve significant performance gains over standard training and deep ensembles, but are less robust to domain shift; (2) a single long HMC chain can provide performance comparable to multiple shorter chains; (3) the cold posterior effect is largely an artifact of data augmentation; (4) BMA performance is robust to the choice of prior scale; and (5) while cheaper alternatives such as deep ensembles and SGMCMC can provide good generalization, their predictive distributions are distinct from HMC's.
Motivation
To understand the behaviour of true BNNs, using HMC as a precise tool (and not to argue for HMC as a practical method for Bayesian deep learning).
To explore fundamental questions about posterior geometry, the performance of BNNs, approximate inference, the effect of priors, and posterior temperature.
Background
Bayesian deep learning methods are typically evaluated on their ability to generate useful, well-calibrated predictions on held-out or out-of-distribution data. However, strong performance on benchmark problems does not imply that an algorithm accurately approximates the true Bayesian model average (BMA), and none of these approximate inference methods has been directly evaluated on its ability to match the true posterior distribution on practical architectures and datasets.
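As a concrete illustration, the BMA predictive distribution is the posterior-weighted average of per-model predictions, which HMC approximates with a Monte Carlo average over posterior samples. A minimal sketch (the `predict_probs` and `posterior_samples` names are placeholders, not the paper's code):

```python
import numpy as np

def bma_predict(x, posterior_samples, predict_probs):
    """Monte Carlo estimate of the Bayesian model average (BMA):
    p(y | x, D) = E_{p(w | D)}[p(y | x, w)] ~ (1/M) sum_m p(y | x, w_m),
    with w_1, ..., w_M drawn from the posterior (e.g. by HMC).

    `predict_probs(x, w)` should return class probabilities for input x
    under parameters w; both arguments are hypothetical placeholders.
    """
    probs = [predict_probs(x, w) for w in posterior_samples]
    return np.mean(probs, axis=0)
```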
Conclusion
We establish several properties of Bayesian neural networks, including
- good generalization performance
- lack of a cold posterior effect
- lack of robustness to covariate shift.
Notes
Propose a procedure for effective full-batch HMC sampling.
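For reference, a generic HMC transition (leapfrog integration plus a Metropolis correction) looks roughly like the sketch below. This is a simplified NumPy illustration, not the paper's implementation; `log_prob` and `grad_log_prob` are assumed to be the full-batch log posterior and its gradient, supplied by the caller.

```python
import numpy as np

def hmc_step(w, log_prob, grad_log_prob, step_size=1e-3, n_leapfrog=50, rng=np.random):
    """One generic HMC transition: sample a momentum, simulate Hamiltonian
    dynamics with the leapfrog integrator, then accept or reject the proposal
    with a Metropolis correction for discretization error."""
    momentum = rng.standard_normal(w.shape)
    w_new, p = w.copy(), momentum.copy()

    # Leapfrog integration: half momentum step, alternating full steps, half momentum step.
    p += 0.5 * step_size * grad_log_prob(w_new)
    for _ in range(n_leapfrog - 1):
        w_new += step_size * p
        p += step_size * grad_log_prob(w_new)
    w_new += step_size * p
    p += 0.5 * step_size * grad_log_prob(w_new)

    # Hamiltonian = negative log posterior + kinetic energy of the momentum.
    current_h = -log_prob(w) + 0.5 * np.sum(momentum ** 2)
    proposed_h = -log_prob(w_new) + 0.5 * np.sum(p ** 2)
    if np.log(rng.uniform()) < current_h - proposed_h:
        return w_new  # accept proposal
    return w          # reject, keep current state
```

Running such transitions with full-batch gradients on modern architectures is what makes the procedure expensive, which is why the authors position HMC as a reference tool rather than a practical method.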
Explore exciting questions about the fundamental behaviour of Bayesian neural networks:
1. the role of tempering
Cold posteriors are not needed to obtain near-optimal performance with Bayesian neural networks and may even hurt performance. The cold posterior effect is largely an artifact of data augmentation (see the sketch after this list).
2. the prior over parameters
The prior over functions is more important than the prior over parameters.
3. generalization performance
BNNs achieve strong results in regression and classification tasks.
4. robustness to covariate shift.
Higher-fidelity representations of the predictive distribution can lead to decreased robustness to covariate shift.
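On tempering (item 1): a tempered posterior raises the posterior to the power 1/T, so T < 1 ("cold") sharpens it around its modes and T = 1 recovers the standard Bayesian posterior. A minimal sketch of a tempered log posterior with a zero-mean Gaussian prior, assuming user-supplied `log_likelihood` and `log_prior` callables (illustration only, not the paper's code):

```python
import numpy as np

def tempered_log_posterior(w, log_likelihood, log_prior, temperature=1.0):
    """Unnormalized tempered log posterior:
    log p_T(w | D) = [log p(D | w) + log p(w)] / T + const.

    T < 1 yields a "cold" posterior concentrated around the modes;
    T = 1 recovers the standard Bayesian posterior.
    `log_likelihood` and `log_prior` are assumed full-batch callables.
    """
    return (log_likelihood(w) + log_prior(w)) / temperature

# Example prior: zero-mean Gaussian N(0, alpha^2 I) over the weights.
alpha = 0.1
log_prior = lambda w: -0.5 * np.sum(w ** 2) / alpha ** 2
```

The Gaussian scale `alpha` here is the prior scale that, per the summary and item 2, the BMA performance is largely robust to.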