Deep Learning Tutorial (Translation): RBM (Part 2)

For the original English text, see http://www.deeplearning.net/tutorial/rbm.html

RBM Implementation

We construct an RBM class whose parameters (chiefly W, hbias, vbias and theano_rng) can either be initialized in the constructor or passed in as arguments. This makes it possible to use an RBM as a building block of a deep network, in which case the parameters W and b can be shared with the corresponding sigmoidal layer of an MLP. The code is as follows:

import numpy

import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams


class RBM(object):
    """Restricted Boltzmann Machine (RBM)"""

    def __init__(self,
                 input=None,
                 n_visible=784,
                 n_hidden=500,
                 W=None,
                 hbias=None,
                 vbias=None,
                 numpy_rng=None,
                 theano_rng=None
                 ):
        self.n_visible = n_visible
        self.n_hidden = n_hidden
        if numpy_rng is None:
            numpy_rng = numpy.random.RandomState(1234)
        if theano_rng is None:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
        if W is None:
            # W is initialized uniformly in +/- 4*sqrt(6./(n_visible+n_hidden))
            initial_W = numpy.asarray(
                numpy_rng.uniform(
                    low=-4 * numpy.sqrt(6. / (n_visible + n_hidden)),
                    high=4 * numpy.sqrt(6. / (n_visible + n_hidden)),
                    size=(n_visible, n_hidden)
                ),
                dtype=theano.config.floatX
            )
            W = theano.shared(value=initial_W, name='W', borrow=True)
        if hbias is None:
            hbias = theano.shared(
                value=numpy.zeros(
                    n_hidden,
                    dtype=theano.config.floatX
                ),
                name='hbias',
                borrow=True
            )
        if vbias is None:
            vbias = theano.shared(
                value=numpy.zeros(
                    n_visible,
                    dtype=theano.config.floatX
                ),
                name='vbias',
                borrow=True
            )
        # initialize input layer for standalone RBM or layer0 of DBN
        self.input = input
        if not input:
            self.input = T.matrix('input')

        self.W = W
        self.hbias = hbias
        self.vbias = vbias
        self.theano_rng = theano_rng

        self.params = [self.W, self.hbias, self.vbias]
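For example, sharing parameters with an MLP layer might look like the following minimal sketch (sigmoid_layer is a hypothetical object exposing Theano shared variables W and b):

    x = T.matrix('x')
    rbm = RBM(input=x,
              n_visible=28 * 28,
              n_hidden=500,
              W=sigmoid_layer.W,      # reuse the MLP layer's weight matrix
              hbias=sigmoid_layer.b)  # reuse the MLP layer's bias as hbias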

Next, we define functions that construct the symbolic graph associated with Eqs. (7) and (8).
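For reference, these equations (reconstructed from Part 1 of the tutorial, with c = hbias and b = vbias) are:

    P(h_i = 1 | v) = sigm(c_i + W_i v)          (7)
    P(v_j = 1 | h) = sigm(b_j + W'_j h)         (8)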

The code is as follows:

    def propup(self, vis):
        '''Propagates the visible units' activations up to the hidden
        units.

        Note that we also return the pre-sigmoid activation; this
        symbolic variable is needed when a more numerically stable
        computational graph is required (see the discussion below).
        '''
        pre_sigmoid_activation = T.dot(vis, self.W) + self.hbias
        return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

    def propdown(self, hid):
        '''Propagates the hidden units' activations down to the visible
        units.'''
        # note the transpose: W has shape (n_visible, n_hidden)
        pre_sigmoid_activation = T.dot(hid, self.W.T) + self.vbias
        return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

    def sample_h_given_v(self, v0_sample):
        '''Infers the state of the hidden units given the visible units.'''
        # first compute the activation of the hidden units given a sample
        # of the visible units
        pre_sigmoid_h1, h1_mean = self.propup(v0_sample)
        # then get a sample of the hidden units given their activation
        h1_sample = self.theano_rng.binomial(size=h1_mean.shape,
                                             n=1, p=h1_mean,
                                             dtype=theano.config.floatX)
        return [pre_sigmoid_h1, h1_mean, h1_sample]

    def sample_v_given_h(self, h0_sample):
        '''Infers the state of the visible units given the hidden units.'''
        pre_sigmoid_v1, v1_mean = self.propdown(h0_sample)
        v1_sample = self.theano_rng.binomial(size=v1_mean.shape,
                                             n=1, p=v1_mean,
                                             dtype=theano.config.floatX)
        return [pre_sigmoid_v1, v1_mean, v1_sample]
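As a quick illustration (a sketch, not part of the original tutorial), propup can be compiled into a function that returns the mean hidden activations for a batch of inputs:

    rbm = RBM(n_visible=784, n_hidden=500)
    _, h_mean = rbm.propup(rbm.input)
    get_h_mean = theano.function([rbm.input], h_mean)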

We can use these functions to define the symbolic graph for a Gibbs sampling step. We define two functions:

  • gibbs_vhv performs a step of Gibbs sampling starting from the visible units; as we shall see, this will be useful for sampling from the RBM
  • gibbs_hvh performs a step of Gibbs sampling starting from the hidden units; this will be useful for performing CD and PCD updates

The code is as follows:

    def gibbs_hvh(self, h0_sample):
        '''One step of Gibbs sampling, starting from the hidden state.'''
        pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h0_sample)
        pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v1_sample)
        return [pre_sigmoid_v1, v1_mean, v1_sample,
                pre_sigmoid_h1, h1_mean, h1_sample]

    def gibbs_vhv(self, v0_sample):
        '''One step of Gibbs sampling, starting from the visible state.'''
        pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v0_sample)
        pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h1_sample)
        return [pre_sigmoid_h1, h1_mean, h1_sample,
                pre_sigmoid_v1, v1_mean, v1_sample]

Note that we also return the pre-sigmoid activations here. To understand why, we need to know a little about how Theano works. Whenever you compile a Theano function, the computational graph given as input is optimized for speed and stability; this is done by rewriting several parts of its subgraphs. One such rewrite turns expressions of the form log(sigmoid(x)) into a numerically stable form, but Theano cannot apply it when the sigmoid is computed inside scan, because scan hides the inner expression from the optimizer. Therefore the easiest and most efficient way is to also return the pre-sigmoid activation as an output of scan, and to apply both the log and the sigmoid outside of scan, so that Theano can catch and optimize the expression.
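A small illustration of this point (a sketch; the rewrite described above is an internal Theano optimization, so the details depend on the Theano version):

    import theano
    import theano.tensor as T

    x = T.matrix('x')
    # Written this way, the full expression log(sigmoid(x)) is visible to
    # the optimizer, which can replace it with a numerically stable
    # softplus-based form; computed inside scan, the sigmoid would be
    # hidden and the rewrite could not apply.
    stable_log_sigmoid = theano.function([x], T.log(T.nnet.sigmoid(x)))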

The class also has a function that computes the free energy of the model, which is needed when computing the gradient of the parameters. We then add a get_cost_updates method that generates the symbolic gradients for a CD-k or PCD-k update. The code is as follows:

    def free_energy(self, v_sample):
        '''Computes the free energy of a visible sample.'''
        wx_b = T.dot(v_sample, self.W) + self.hbias
        vbias_term = T.dot(v_sample, self.vbias)
        hidden_term = T.sum(T.log(1 + T.exp(wx_b)), axis=1)
        return -hidden_term - vbias_term
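For reference, the quantity being computed is the free energy from Part 1 of the tutorial (with b = vbias and c = hbias):

    F(v) = - b'v - Σ_i log(1 + e^(c_i + W_i v))

where vbias_term corresponds to b'v and hidden_term to the sum over the hidden units.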

    def get_cost_updates(self, lr=0.1, persistent=None, k=1):
        '''This function implements one step of CD-k or PCD-k.

        :param lr: learning rate used to train the RBM
        :param persistent: None for CD. For PCD, a shared variable
            containing the old state of the Gibbs chain, of shape
            (batch size, number of hidden units)
        :param k: number of Gibbs steps to perform
        :return: a proxy for the cost and the updates dictionary. The
            dictionary contains the updates for the weights and biases,
            and also an update for the shared variable holding the
            persistent chain, if PCD is used.
        '''
        # compute the positive phase
        pre_sigmoid_ph, ph_mean, ph_sample = self.sample_h_given_v(self.input)
        # decide how to initialize the persistent chain:
        # for CD, we use the newly generated hidden sample
        # for PCD, we initialize from the old state of the chain
        if persistent is None:
            chain_start = ph_sample
        else:
            chain_start = persistent

        # perform the negative phase
        # to implement CD-k/PCD-k we need to scan over the function that
        # implements one Gibbs step k times
        # scan returns the whole Gibbs chain
        (
            [
                pre_sigmoid_nvs,
                nv_means,
                nv_samples,
                pre_sigmoid_nhs,
                nh_means,
                nh_samples
            ],
            updates
        ) = theano.scan(
            self.gibbs_hvh,
            # the None entries are placeholders; chain_start is the
            # initial state of the chain
            outputs_info=[None, None, None, None, None, chain_start],
            n_steps=k,
            name="gibbs_hvh"
        )
        # we only need the sample at the end of the chain
        chain_end = nv_samples[-1]
        cost = T.mean(self.free_energy(self.input)) - T.mean(
            self.free_energy(chain_end))
        # if we used T.grad naively, it would backpropagate through the
        # Gibbs chain, which is not what we want; we therefore mark
        # chain_end as a constant via consider_constant
        gparams = T.grad(cost, self.params, consider_constant=[chain_end])

        # construct the updates dictionary with gradient descent steps
        for gparam, param in zip(gparams, self.params):
            # make sure the learning rate has the right dtype
            updates[param] = param - gparam * T.cast(lr, dtype=theano.config.floatX)
        if persistent:
            # this works only if persistent is a shared variable
            updates[persistent] = nh_samples[-1]
            # pseudo-likelihood is a better proxy for PCD
            monitoring_cost = self.get_pseudo_likelihood_cost(updates)
        else:
            # reconstruction cross-entropy is a better proxy for CD
            monitoring_cost = self.get_reconstruction_cost(updates,
                                                           pre_sigmoid_nvs[-1])

        return monitoring_cost, updates
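To make the training loop concrete, here is a minimal sketch (not part of the original translation) of how get_cost_updates can be wired into a compiled training function. It assumes a shared variable train_set_x holding binarized training data, a batch_size constant, and that the two monitoring-cost helpers referenced above (get_pseudo_likelihood_cost, get_reconstruction_cost) are defined as in the original tutorial:

    index = T.lscalar('index')  # index of a minibatch
    x = T.matrix('x')           # the data
    rbm = RBM(input=x, n_visible=784, n_hidden=500)

    # for PCD: a shared variable holding the state of the persistent chain
    persistent_chain = theano.shared(
        numpy.zeros((batch_size, rbm.n_hidden), dtype=theano.config.floatX),
        borrow=True)

    cost, updates = rbm.get_cost_updates(lr=0.1,
                                         persistent=persistent_chain, k=15)

    train_rbm = theano.function(
        [index],
        cost,
        updates=updates,
        givens={x: train_set_x[index * batch_size:(index + 1) * batch_size]},
        name='train_rbm')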

Tracking Progress

RBMs are tricky to train. Because of the partition function Z, we cannot estimate the log-likelihood during training, so we have no direct, useful metric for choosing the optimal hyperparameters.

Several options are described below.

Inspection of Negative Samples

Negative samples obtained during training can be visualized. As training progresses, we know that the model defined by the RBM becomes closer to the true underlying distribution, so the negative samples should increasingly look like samples from the training set. Obviously bad hyperparameter settings can be discarded on this basis.
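As a sketch of how such negative samples could be drawn from a trained rbm (hypothetical names; persistent_vis_chain is assumed to be a shared variable initialized with a few test examples):

    (
        [presig_hids, hid_mfs, hid_samples,
         presig_vis, vis_mfs, vis_samples],
        updates
    ) = theano.scan(
        rbm.gibbs_vhv,
        outputs_info=[None, None, None, None, None, persistent_vis_chain],
        n_steps=1000,
        name="gibbs_vhv"
    )
    # keep the chain state so that successive calls continue the chain
    updates[persistent_vis_chain] = vis_samples[-1]
    sample_fn = theano.function([], [vis_mfs[-1], vis_samples[-1]],
                                updates=updates, name='sample_fn')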

Visual Inspection of Filters

The filters learned by the model can be visualized. This amounts to plotting the weights connecting each hidden unit to the input, reshaped as a gray-scale image of the same shape as the input images.
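A minimal sketch of such a visualization (using matplotlib rather than the tile_raster_images helper of the original tutorial; MNIST-sized inputs and a trained rbm are assumed):

    import matplotlib.pyplot as plt

    W = rbm.W.get_value(borrow=True)  # shape (n_visible, n_hidden)
    fig, axes = plt.subplots(10, 10, figsize=(10, 10))
    for i, ax in enumerate(axes.flat):
        # each column of W is one hidden unit's filter over the input
        ax.imshow(W[:, i].reshape(28, 28), cmap='gray')
        ax.axis('off')
    plt.show()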


Because the source website was down at the time, the translation could not be continued beyond this point; sorry.

