1 Introduction
- 2006: Deep Belief Networks introduced; layer-wise RBM pre-training followed by gradient-descent fine-tuning
- energy-based model, a special case of a Markov random field (MRF); the graph encodes the dependency structure between random variables
- used for both classification and representation learning
2 Classical Restricted Boltzmann Machines
- a generative stochastic network: a layer of visible units, a layer of hidden units, and parameters
- notation: visible vector \(v\) with units \(v_i\), \(i=1,\dots,D\); hidden vector \(h\) with units \(h_j\), \(j=1,\dots,J\); data distribution \(P_{data}(v)\), model distribution \(P_{model}(v;\theta)\)
- \(\theta = \{W,b,c\}\): weights \(W_{ij}\), hidden biases \(b_j\), visible biases \(c_i\)
- \(E(v,h;\theta)\): the energy of a joint configuration, where \(W_{ij}\) couples the \(i\)-th visible unit to the \(j\)-th hidden unit: \(E(v,h;\theta) = -\sum_i\sum_j v_i W_{ij} h_j - \sum_j b_j h_j - \sum_i c_i v_i\)
- \(P(v,h;\theta) = \frac{\exp(-E(v,h;\theta))}{Z(\theta)}\) with partition function \(Z(\theta) = \sum_v\sum_h \exp(-E(v,h;\theta))\); marginal \(P(v;\theta) = \sum_h P(v,h;\theta)\)
- e.g. what is the probability that \(v_i\) is activated (\(v_i=1\)), given \(h\)?
- \(P(v_i = 1\mid h;\theta) = \frac{\sum_{v:\,v_i=1} \exp(\sum_i\sum_j v_i W_{ij} h_j+\sum_j b_j h_j+\sum_i c_i v_i)}{\sum_v \exp(\sum_i\sum_j v_i W_{ij} h_j+\sum_j b_j h_j+\sum_i c_i v_i)}\)
- PoE (Product of Experts): the distribution over \(v\) factorizes into a product of per-unit terms
in fact, let \(D=2\) (two visible units); the term \(\sum_j b_j h_j\) is constant in \(v\) and cancels between numerator and denominator, so it can be dropped. Summing over the four visible configurations \(v \in \{(0,0),(1,0),(0,1),(1,1)\}\):
\(\sum_v \exp(\sum_i\sum_j v_i W_{ij} h_j+\sum_i c_i v_i)\)
\(= 1 + \exp(\sum_j W_{1j}h_j + c_1) + \exp(\sum_j W_{2j}h_j + c_2) + \exp(\sum_j W_{1j}h_j + c_1+\sum_j W_{2j}h_j + c_2)\)
\(= \bigl(1+\exp(\sum_j W_{1j} h_j + c_1)\bigr)\bigl(1+\exp(\sum_j W_{2j}h_j+c_2)\bigr)\)
- then \(P(v_i=1\mid h)=\sigma(\sum_j W_{ij}h_j+c_i)\), where \(\sigma(x)=1/(1+e^{-x})\) is the logistic sigmoid
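The sigmoid conditional above can be checked numerically: for a tiny RBM, compare \(\sigma(\sum_j W_{ij}h_j+c_i)\) against the brute-force ratio of sums in the defining formula. A minimal sketch in NumPy, with hypothetical random parameters chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny RBM: D = 2 visible units, J = 3 hidden units (sizes are arbitrary).
D, J = 2, 3
W = rng.normal(size=(D, J))   # weights W_{ij}
b = rng.normal(size=J)        # hidden biases b_j
c = rng.normal(size=D)        # visible biases c_i

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

h = rng.integers(0, 2, size=J)  # an arbitrary fixed hidden configuration

# Closed form from the factorization: P(v_i = 1 | h) = sigma(sum_j W_ij h_j + c_i)
p_closed = sigmoid(W @ h + c)

# Brute force: sum exp(v^T W h + b^T h + c^T v) over all 2^D visible vectors,
# restricted to v_i = 1 in the numerator and unrestricted in the denominator.
def unnormalized(v):
    return np.exp(v @ W @ h + b @ h + c @ v)

all_v = [np.array(bits) for bits in np.ndindex(*(2,) * D)]
denom = sum(unnormalized(v) for v in all_v)
p_brute = np.array([
    sum(unnormalized(v) for v in all_v if v[i] == 1) / denom
    for i in range(D)
])

print(np.allclose(p_closed, p_brute))  # → True: both routes agree
```

Note that the `b @ h` term appears in both numerator and denominator and cancels, exactly as in the \(D=2\) derivation above.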