Deep Learning Tutorial (Translation): Classifying MNIST Handwritten Digits with Logistic Regression

For the original English text, see http://www.deeplearning.net/tutorial/logreg.html

The Model

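In this section we use Theano to implement the most basic classifier, logistic regression, and learn how mathematical expressions are mapped onto Theano graphs.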

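Logistic regression is a probabilistic, linear classifier, parameterized by a weight matrix W and a bias vector b. Classification is done by projecting an input vector onto a set of hyperplanes, each of which corresponds to a class. The distance from the input to a hyperplane reflects the probability that the input belongs to the corresponding class.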

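Mathematically, the probability that an input vector x belongs to class i, a value of the random variable Y, is:

$$P(Y=i \mid x, W, b) = \mathrm{softmax}_i(Wx + b) = \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}}$$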

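The model's prediction y_pred is the class whose probability is maximal:

$$y_{pred} = \operatorname{argmax}_i P(Y=i \mid x, W, b)$$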
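To make the two formulas concrete, here is a small NumPy sketch (plain NumPy, not Theano) that computes P(Y=i | x, W, b) for a minibatch and the predicted class. The function names softmax and predict and the toy values are hypothetical, chosen only for illustration. As in the tutorial's code, each row of X is one example and W has shape (n_in, n_out). Subtracting the row maximum before exponentiating is a standard numerical-stability trick that does not change the result.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax: each row of z becomes a probability distribution."""
    # Subtracting the row maximum avoids overflow in exp() without changing the result.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def predict(X, W, b):
    """Return (P(Y|x, W, b), y_pred) for a minibatch X of shape (n, n_in)."""
    # X @ W + b has shape (n, n_out): one score per (example, class) pair.
    p_y_given_x = softmax(X @ W + b)
    # The predicted class is the index of the largest probability in each row.
    y_pred = np.argmax(p_y_given_x, axis=1)
    return p_y_given_x, y_pred


# Toy example: 2 examples, 3 input features, 2 classes (hypothetical values).
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])
W = np.array([[1.0, -1.0],
              [-1.0, 1.0],
              [0.5, 0.0]])
b = np.array([0.0, 0.0])
p, y_pred = predict(X, W, b)
print(p.sum(axis=1))  # each row sums to 1
print(y_pred)         # [0 1]
```

The scores are [2, -1] for the first example and [-3, 3] for the second, so the first is assigned to class 0 and the second to class 1.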

In Theano, this is implemented as follows:

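import numpy
import theano
import theano.tensor as T


class LogisticRegression(object):
    """Multi-class logistic regression, fully described by W and b."""

    def __init__(self, input, n_in, n_out):
        # input: symbolic matrix, one example per row; n_in: number of input
        # units (e.g. 28*28 pixels); n_out: number of classes.
        # initialize the weights W with zeros, as a matrix of shape (n_in, n_out)
        self.W = theano.shared(
            value=numpy.zeros((n_in, n_out), dtype=theano.config.floatX),
            name='W',
            borrow=True
        )
        # initialize the biases b with zeros, as a vector of n_out elements
        self.b = theano.shared(
            value=numpy.zeros((n_out,), dtype=theano.config.floatX),
            name='b',
            borrow=True
        )
        # symbolic P(Y|x, W, b): one row of class probabilities per example
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
        # symbolic prediction: the most probable class in each row
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)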

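Since the model parameters must maintain a persistent state throughout training, we allocate W and b as shared variables. This declares them as symbolic Theano variables and also initializes their contents. The dot product and softmax operators then compute P(Y|x, W, b), and argmax returns the index of its largest entry.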

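The model defined so far does not do anything useful yet, since its parameters are still at their initial values. Next we describe how to learn the optimal parameters.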

Defining a Loss Function

For multi-class logistic regression, it is very common to use the negative log-likelihood as the loss.

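Minimizing this loss is equivalent to maximizing the likelihood of the data set $\mathcal{D}$ under the model parameterized by $\theta$. Let us first define the likelihood $\mathcal{L}$ and the loss $\ell$:

$$\mathcal{L}(\theta=\{W,b\}, \mathcal{D}) = \sum_{i=1}^{|\mathcal{D}|} \log P(Y=y^{(i)} \mid x^{(i)}, W, b)$$

$$\ell(\theta=\{W,b\}, \mathcal{D}) = -\mathcal{L}(\theta=\{W,b\}, \mathcal{D})$$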
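The loss can be computed directly from the matrix of predicted probabilities. The NumPy sketch below (the function name and the probability values are hypothetical) uses the same indexing trick as the Theano code for this tutorial: log_p[np.arange(n), y] picks, for each example, the log-probability of its correct class. It returns the mean over the minibatch rather than the sum, which differs from $\ell$ only by the constant factor $1/|\mathcal{D}|$ and so has the same minimizer.

```python
import numpy as np


def negative_log_likelihood(p_y_given_x, y):
    """Mean negative log-likelihood of integer labels y under p_y_given_x.

    p_y_given_x: array of shape (n, n_classes) whose rows sum to 1.
    y: integer array of shape (n,), the correct class of each example.
    """
    n = y.shape[0]
    log_p = np.log(p_y_given_x)
    # log_p[np.arange(n), y] == [log_p[0, y[0]], ..., log_p[n-1, y[n-1]]]
    return -np.mean(log_p[np.arange(n), y])


p = np.array([[0.5, 0.25, 0.25],
              [0.1, 0.8, 0.1]])
y = np.array([0, 1])
loss = negative_log_likelihood(p, y)
print(loss)  # (-log 0.5 - log 0.8) / 2, about 0.4581
```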

We minimize this loss with stochastic gradient descent, specifically its minibatch variant (MSGD).

Creating a LogisticRegression Class

The full class is available on the original page: http://www.deeplearning.net/tutorial/logreg.html. Its two loss-related methods are:

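    # Methods of the LogisticRegression class (indented to sit inside the class body).
    def negative_log_likelihood(self, y):
        """Return the mean negative log-likelihood of the correct labels.

        :type y: theano.tensor.TensorType
        :param y: vector of correct labels, one per example

        Note: we use the mean instead of the sum so that the learning rate
        depends less on the minibatch size.
        self.p_y_given_x is a matrix: one row per example, one column per class.
        """
        # y.shape[0] is the number of rows of y, i.e. the number of examples n,
        # since there is one label per example.
        # T.arange(n) produces the symbolic vector [0, 1, ..., n-1].
        # Let LP = T.log(self.p_y_given_x), the matrix of log-probabilities.
        # LP[T.arange(y.shape[0]), y] is the vector
        # [LP[0, y[0]], LP[1, y[1]], ..., LP[n-1, y[n-1]]],
        # i.e. the log-probability of the correct class of each example.
        # T.mean averages that vector; the minus sign turns it into a loss.
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

    def errors(self, y):
        """Return the fraction of misclassified examples in the minibatch."""
        # y must have the same number of dimensions as y_pred
        if y.ndim != self.y_pred.ndim:
            raise TypeError('y should have the same shape as self.y_pred',
                            ('y', y.type, 'y_pred', self.y_pred.type))
        # the labels must be integers
        if y.dtype.startswith('int'):
            # T.neq(y1, y2) compares element-wise: 0 where equal, 1 where different.
            # Example: y1 = [1,2,3,4,5,6,7,8,9,0], y2 = [1,1,3,3,5,6,7,8,9,0]
            # gives T.neq(y1, y2) = [0,1,0,1,0,0,0,0,0,0]: 2 elements differ.
            # T.mean averages it: (0+1+0+1+0+0+0+0+0+0)/10 = 0.2, a 20% error rate.
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()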
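The zero-one loss returned by errors is easy to check outside Theano. This NumPy sketch (the function name error_rate is hypothetical) mirrors T.mean(T.neq(y_pred, y)), including the same two input checks, and applies it to the y1/y2 example from the comments above: two of the ten labels differ, so the error rate is 0.2.

```python
import numpy as np


def error_rate(y_pred, y):
    """Fraction of positions where y_pred and y differ (the zero-one loss)."""
    y_pred = np.asarray(y_pred)
    y = np.asarray(y)
    if y.ndim != y_pred.ndim:
        raise TypeError("y should have the same shape as y_pred")
    if not np.issubdtype(y.dtype, np.integer):
        raise NotImplementedError("labels must be integers")
    # np.not_equal gives True (1) where labels differ and False (0) where they match.
    return np.mean(np.not_equal(y_pred, y))


y1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
y2 = [1, 1, 3, 3, 5, 6, 7, 8, 9, 0]
print(np.not_equal(y1, y2).astype(int))  # [0 1 0 1 0 0 0 0 0 0]
print(error_rate(y1, y2))                # 0.2
```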

Training the Model

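To implement gradient descent in most programming languages, you must derive the expressions for the gradient of the loss with respect to the parameters by hand. This derivation is tedious, and the result can be quite complicated, especially once numerical stability has to be taken into account.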

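With Theano, however, this becomes very simple: Theano performs automatic differentiation, so there is no need to derive any formulas by hand. You only pass T.grad the cost and the variable to differentiate with respect to (wrt).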
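For comparison, this is what the manual route involves for this model. For the mean negative log-likelihood of softmax regression, the derivation gives $\partial\ell/\partial W = X^\top (P - Y)/n$ and $\partial\ell/\partial b$ = the column means of $P - Y$, where P is the matrix of predicted probabilities, Y the one-hot encoding of the labels and n the minibatch size. The NumPy sketch below (all names and random data hypothetical) implements that hand-derived gradient and checks it against central finite differences; T.grad spares you exactly this work.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cost(W, b, X, y):
    """Mean negative log-likelihood of softmax regression."""
    p = softmax(X @ W + b)
    return -np.mean(np.log(p[np.arange(len(y)), y]))


def grad(W, b, X, y):
    """Hand-derived gradient of cost with respect to W and b."""
    n = len(y)
    p = softmax(X @ W + b)
    # Subtracting 1 at each correct class turns P into P - onehot(y).
    p[np.arange(n), y] -= 1.0
    g_W = X.T @ p / n
    g_b = p.mean(axis=0)
    return g_W, g_b


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
y = np.array([0, 2, 1, 2, 0])
W = rng.normal(size=(4, 3))
b = rng.normal(size=3)
g_W, g_b = grad(W, b, X, y)

# Central finite differences as an independent check of the derivation.
eps = 1e-6
num_g_W = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        d = np.zeros_like(W)
        d[i, j] = eps
        num_g_W[i, j] = (cost(W + d, b, X, y) - cost(W - d, b, X, y)) / (2 * eps)
num_g_b = np.array([
    (cost(W, b + eps * e, X, y) - cost(W, b - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
print(np.max(np.abs(g_W - num_g_W)), np.max(np.abs(g_b - num_g_b)))  # both tiny
```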

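import theano
import theano.tensor as T

# Assumes LogisticRegression above, shared train_set_x/train_set_y, batch_size, learning_rate.
index = T.lscalar()  # index to a minibatch
x = T.matrix('x')    # each row is one rasterized 28*28 image
y = T.ivector('y')   # labels, a vector of ints
classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)
cost = classifier.negative_log_likelihood(y)

# Theano derives the gradients of the cost with respect to W and b.
g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)
# (shared variable, new value) pairs: one gradient-descent step per parameter.
updates = [(classifier.W, classifier.W - learning_rate * g_W),
           (classifier.b, classifier.b - learning_rate * g_b)]

# Returns the minibatch cost and applies updates; givens swaps x, y for data slices.
train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)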

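Each call to train_model(index) computes and returns the cost of the corresponding minibatch of examples, then performs one MSGD step, updating the parameters W and b.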
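To see what repeated calls to train_model do numerically, here is a NumPy re-implementation of an MSGD training loop. It is a sketch, not the tutorial's code: the function train_minibatch, the two-blob toy data and the hyperparameters are all hypothetical. It slices minibatches exactly as the givens dictionary does and applies the same update rule W - learning_rate * g_W, but with the gradient written out by hand instead of obtained from T.grad.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_minibatch(W, b, X, y, index, batch_size, learning_rate):
    """One MSGD step on minibatch `index`: updates W and b in place, returns the cost."""
    # Same slicing as the givens dictionary of train_model.
    xb = X[index * batch_size: (index + 1) * batch_size]
    yb = y[index * batch_size: (index + 1) * batch_size]
    n = len(yb)
    p = softmax(xb @ W + b)
    cost = -np.mean(np.log(p[np.arange(n), yb]))
    # Hand-derived gradient of the mean NLL: P minus the one-hot labels.
    p[np.arange(n), yb] -= 1.0
    W -= learning_rate * (xb.T @ p) / n   # same rule as W - learning_rate * g_W
    b -= learning_rate * p.mean(axis=0)   # same rule as b - learning_rate * g_b
    return cost


# Hypothetical toy data: two Gaussian blobs in 2D with labels 0 and 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
               rng.normal(2.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]

W = np.zeros((2, 2))  # zero initialization, as in the tutorial
b = np.zeros(2)
batch_size = 20
learning_rate = 0.1
n_train_batches = len(y) // batch_size

epoch_costs = []
for epoch in range(10):
    costs = [train_minibatch(W, b, X, y, i, batch_size, learning_rate)
             for i in range(n_train_batches)]
    epoch_costs.append(float(np.mean(costs)))

accuracy = np.mean(np.argmax(X @ W + b, axis=1) == y)
print(epoch_costs[0], epoch_costs[-1], accuracy)
```

With zero-initialized parameters every class starts at probability 0.5, so the very first minibatch cost is log 2; the average cost per epoch then falls as W and b move toward a separating boundary.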

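To evaluate the model, we compile two more Theano functions, test_model and validate_model, which return the error rate on one minibatch of the test set and of the validation set, respectively. As you will see shortly, validate_model is key to our early-stopping implementation.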

# Error rate (zero-one loss) on minibatch `index` of the test set.
test_model = theano.function(
    inputs=[index],
    outputs=classifier.errors(y),
    givens={
        x: test_set_x[index * batch_size: (index + 1) * batch_size],
        y: test_set_y[index * batch_size: (index + 1) * batch_size]
    }
)
# Error rate on minibatch `index` of the validation set.
validate_model = theano.function(
    inputs=[index],
    outputs=classifier.errors(y),
    givens={
        x: valid_set_x[index * batch_size: (index + 1) * batch_size],
        y: valid_set_y[index * batch_size: (index + 1) * batch_size]
    }
)
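Because validate_model and test_model each return the error rate of a single minibatch, the error on a whole set is obtained by evaluating every minibatch index and averaging the results, which is how the official tutorial's training loop uses them. The sketch below shows that step with a hypothetical stand-in function (fake_validate_model) in place of the compiled Theano function. Averaging per-minibatch rates equals the overall error rate only when all minibatches have the same size; here the integer division len(y_valid) // batch_size keeps only full minibatches, so that holds (any leftover examples would simply be skipped).

```python
import numpy as np


def mean_minibatch_error(errors_fn, n_batches):
    """Average the per-minibatch error rates returned by errors_fn(index)."""
    return float(np.mean([errors_fn(i) for i in range(n_batches)]))


# Hypothetical labels and predictions for 6 validation examples, minibatches of 3.
y_valid = np.array([0, 1, 2, 3, 4, 5])
y_pred = np.array([0, 1, 0, 3, 4, 4])
batch_size = 3


def fake_validate_model(index):
    """Stand-in for validate_model: error rate on minibatch `index`."""
    sl = slice(index * batch_size, (index + 1) * batch_size)
    return np.mean(y_pred[sl] != y_valid[sl])


n_valid_batches = len(y_valid) // batch_size
print(mean_minibatch_error(fake_validate_model, n_valid_batches))  # 2 errors / 6 examples
```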

Complete Code

Omitted here; please refer to the official tutorial.

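References

1. Study notes on deep learning (DL) and convolutional neural networks (CNN), part 03: LR in the Python-based LeNet

2. Official tutorial: http://www.deeplearning.net/tutorial/logreg.html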

posted @ 2016-06-12 16:40 Vivian_liwei
