Some notes on understanding deeplearningToolBox (SAE part)
function test_example_SAE
load mnist_uint8;

% Convert the uint8 data to double and scale the pixel values to [0, 1]
train_x = double(train_x) / 255;
test_x  = double(test_x)  / 255;
train_y = double(train_y);
test_y  = double(test_y);

%% ex1 train a 100 hidden unit SDAE and use it to initialize a FFNN
% Setup and train a stacked denoising autoencoder (SDAE)
rand('state',0)
sae = saesetup([784 100]);
Execution now enters saesetup. As the function shows, it returns the struct sae:
function sae = saesetup(size)
    for u = 2 : numel(size)   % numel(size) = 2 here, so the loop body runs once
        % size(1) = 784 and size(2) = 100, so sae.ae{1} is a 784-100-784 network
        sae.ae{u-1} = nnsetup([size(u-1) size(u) size(u-1)]);
    end
end
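saesetup calls nnsetup, which returns an nn struct. Each sae.ae{k} is therefore an ordinary nn struct whose output layer has the same size as its input layer, so it can be trained with the same functions as any other network, and the trained nn is what ends up stored in sae.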
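To make the mapping from layer sizes to autoencoders concrete, here is a minimal sketch of the loop in saesetup applied to a deeper, hypothetical size vector. It does not call the toolbox; `layer_sizes` and `ae_arch` are names made up for this illustration.

```matlab
% Sketch: which architectures saesetup would pass to nnsetup.
% layer_sizes is a hypothetical example, not the one used in this post.
layer_sizes = [784 100 50];
ae_arch = cell(1, numel(layer_sizes) - 1);
for u = 2 : numel(layer_sizes)
    % each autoencoder reconstructs its own input: in -> hidden -> in
    ae_arch{u-1} = [layer_sizes(u-1) layer_sizes(u) layer_sizes(u-1)];
end
disp(ae_arch{1});   % architecture of the first autoencoder
disp(ae_arch{2});   % architecture of the second autoencoder
```

With [784 100], as in test_example_SAE, the loop runs once and produces a single 784-100-784 autoencoder.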
function nn = nnsetup(architecture)
%NNSETUP creates a Feedforward Backpropagate Neural Network
% nn = nnsetup(architecture) returns an neural network structure with n=numel(architecture)
% layers, architecture being a n x 1 vector of layer sizes e.g. [784 100 10]

    nn.size   = architecture;     % number of units in each layer; its length is the number of layers
    nn.n      = numel(nn.size);   % number of layers (3 here)

    nn.activation_function              = 'tanh_opt';   % Activation functions of hidden layers: 'sigm' (sigmoid) or 'tanh_opt' (optimal tanh).
    nn.learningRate                     = 2;            % learning rate Note: typically needs to be lower when using 'sigm' activation function and non-normalized inputs.
    nn.momentum                         = 0.5;          % Momentum
    nn.scaling_learningRate             = 1;            % Scaling factor for the learning rate (each epoch)
    nn.weightPenaltyL2                  = 0;            % L2 regularization
    nn.nonSparsityPenalty               = 0;            % Non sparsity penalty
    nn.sparsityTarget                   = 0.05;         % Sparsity target
    nn.inputZeroMaskedFraction          = 0;            % Used for Denoising AutoEncoders
    nn.dropoutFraction                  = 0;            % Dropout level (http://www.cs.toronto.edu/~hinton/absps/dropout.pdf)
    nn.testing                          = 0;            % Internal variable. nntest sets this to one.
    nn.output                           = 'sigm';       % output unit 'sigm' (=logistic), 'softmax' and 'linear'

    % Initialize the parameters of each layer: W, vW and p.
    % W holds the weights and is the main parameter, vW is the momentum
    % term used when W is updated, and p is the average activation used
    % by the sparsity penalty.
    for i = 2 : nn.n   % creates two weight matrices plus p{2} and p{3}
        % weights and weight momentum
        % W is drawn uniformly from [-r, r], r = 4 * sqrt(6 / (nn.size(i) + nn.size(i - 1)))
        nn.W{i - 1} = (rand(nn.size(i), nn.size(i - 1)+1) - 0.5) * 2 * 4 * sqrt(6 / (nn.size(i) + nn.size(i - 1)));
        % vW has the same size as W but is all zeros
        nn.vW{i - 1} = zeros(size(nn.W{i - 1}));

        % average activations (for use with sparsity)
        % p{i} is a zero row vector; it tracks the average activation of
        % each unit j in layer i (see the UFLDL tutorial)
        nn.p{i}     = zeros(1, nn.size(i));
    end
end
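The weight initialization in nnsetup is easy to misread, so here is a small sketch that reproduces the expression for the first layer and checks its range. `n_in`, `n_out` and `r` are names introduced only for this illustration.

```matlab
% Sketch: the initial weights are uniform in [-r, r].
n_in  = 784;                                  % units in the previous layer
n_out = 100;                                  % units in the current layer
r = 4 * sqrt(6 / (n_in + n_out));             % half-width of the interval
W = (rand(n_out, n_in + 1) - 0.5) * 2 * r;    % same expression as in nnsetup
fprintf('r = %.4f, min(W) = %.4f, max(W) = %.4f\n', r, min(W(:)), max(W(:)));
```

rand returns values in (0, 1), so subtracting 0.5 and multiplying by 2 maps them to (-1, 1) before the scaling by r. The extra column (n_in + 1) holds the bias weights.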
Execution then returns to this part of test_example_SAE:
% Override some of the defaults that nnsetup stored in sae.ae{1}
sae.ae{1}.activation_function       = 'sigm';
sae.ae{1}.learningRate              = 1;
sae.ae{1}.inputZeroMaskedFraction   = 0.5;
opts.numepochs =   1;
opts.batchsize = 100;
sae = saetrain(sae, train_x, opts);
Some of the nn defaults are changed here on sae.ae{1}, and execution then enters saetrain:
function sae = saetrain(sae, x, opts)
    for i = 1 : numel(sae.ae);
        disp(['Training AE ' num2str(i) '/' num2str(numel(sae.ae))]);   % which autoencoder is being trained
        sae.ae{i} = nntrain(sae.ae{i}, x, x, opts);   % the target is the input itself
        t = nnff(sae.ae{i}, x, x);
        x = t.a{2};
        %remove bias term
        x = x(:,2:end);   % drop the first column
    end
end
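The last three lines of the loop are what make the autoencoders stack: the hidden activations of one autoencoder become the input of the next. A minimal sketch of that step, using random stand-in weights rather than trained ones (`W1`, `a1`, `a2` and `x_next` are illustrative names):

```matlab
% Sketch: hidden activations, minus the bias column, feed the next autoencoder.
sigm = @(z) 1 ./ (1 + exp(-z));        % logistic function
m  = 5;                                % hypothetical number of samples
x  = rand(m, 784);                     % stand-in for the training data
W1 = (rand(100, 785) - 0.5) * 0.1;     % stand-in for trained weights (bias in column 1)
a1 = [ones(m, 1) x];                   % input with bias term, like nn.a{1}
a2 = [ones(m, 1) sigm(a1 * W1')];      % hidden layer with bias term, like t.a{2}
x_next = a2(:, 2:end);                 % remove the bias column
disp(size(x_next));
```

Here x_next has one row per sample and one column per hidden unit, which is the input size the second autoencoder would expect.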
Execution now enters nntrain. The assert checks at the top of the function are skipped here:
    loss.train.e               = [];
    loss.train.e_frac          = [];
    loss.val.e                 = [];
    loss.val.e_frac            = [];
    opts.validation = 0;
    if nargin == 6
        opts.validation = 1;
    end

    fhandle = [];
    if isfield(opts,'plot') && opts.plot == 1   % isfield returns true if opts has a field named 'plot'
        fhandle = figure();
    end

    m = size(train_x, 1);   % m is the number of training samples

    % opts was set by the caller; batchsize is the size of each mini-batch
    batchsize = opts.batchsize;
    numepochs = opts.numepochs;   % number of passes over the training set

    numbatches = m / batchsize;

    assert(rem(numbatches, 1) == 0, 'numbatches must be a integer');

    L = zeros(numepochs*numbatches,1);
    n = 1;
    for i = 1 : numepochs
        tic;

        kk = randperm(m);   % a random permutation of the numbers 1..m
        for l = 1 : numbatches
            % Train one mini-batch at a time. Each batch has batchsize samples
            % (100 here, which gives 600 batches per epoch on the 60000 MNIST training samples)
            batch_x = train_x(kk((l - 1) * batchsize + 1 : l * batchsize), :);

            %Add noise to input (for use in denoising autoencoder)
            % See the paper "Extracting and Composing Robust Features with Denoising Autoencoders"
            if(nn.inputZeroMaskedFraction ~= 0)
                % Set a random subset of the inputs to 0; inputZeroMaskedFraction is the fraction that is zeroed
                batch_x = batch_x.*(rand(size(batch_x))>nn.inputZeroMaskedFraction);
            end

            % Take the matching rows of y, so batch_y corresponds to batch_x
            batch_y = train_y(kk((l - 1) * batchsize + 1 : l * batchsize), :);

            nn = nnff(nn, batch_x, batch_y);
            nn = nnbp(nn);
            nn = nnapplygrads(nn);

            L(n) = nn.L;   % loss of this mini-batch

            n = n + 1;
        end

        t = toc;   % seconds taken by this epoch

        if opts.validation == 1
            loss = nneval(nn, loss, train_x, train_y, val_x, val_y);
            str_perf = sprintf('; Full-batch train mse = %f, val mse = %f', loss.train.e(end), loss.val.e(end));
        else
            loss = nneval(nn, loss, train_x, train_y);
            str_perf = sprintf('; Full-batch train err = %f', loss.train.e(end));
        end
        if ishandle(fhandle)
            nnupdatefigures(nn, fhandle, loss, opts, i);
        end

        disp(['epoch ' num2str(i) '/' num2str(opts.numepochs) '. Took ' num2str(t) ' seconds' '. Mini-batch mean squared error on training set is ' num2str(mean(L((n-numbatches):(n-1)))) str_perf]);
        nn.learningRate = nn.learningRate * nn.scaling_learningRate;   % rescale the learning rate after each epoch (unchanged when the factor is 1)
    end
end
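The two steps that matter most for the denoising autoencoder are the mini-batch selection and the input masking. The following sketch isolates them on random stand-in data, with a smaller hypothetical sample count than MNIST. `frac`, `mask`, `noisy_x` and `kept` are names introduced for this illustration.

```matlab
% Sketch: pick one shuffled mini-batch and zero out about half of its inputs.
m = 600;                                  % hypothetical number of samples
batchsize = 100;
train_x = rand(m, 784);                   % stand-in for the training data
numbatches = m / batchsize;               % 6 batches in this sketch
kk = randperm(m);                         % shuffled sample indices
l = 1;                                    % first mini-batch
idx = kk((l - 1) * batchsize + 1 : l * batchsize);
batch_x = train_x(idx, :);
frac = 0.5;                               % plays the role of inputZeroMaskedFraction
mask = rand(size(batch_x)) > frac;        % true where the input is kept
noisy_x = batch_x .* mask;                % masked entries become 0
kept = mean(mask(:));                     % fraction of inputs that survived
fprintf('%d batches, kept fraction = %.3f\n', numbatches, kept);
```

Note that in saetrain the target passed to nntrain is the clean x, so the network learns to reconstruct the original input from the corrupted one.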
Next comes nnff, which performs the forward pass:
function nn = nnff(nn, x, y)
%NNFF performs a feedforward pass
% nn = nnff(nn, x, y) returns an neural network structure with updated
% layer activations, error and loss (nn.a, nn.e and nn.L)

    n = nn.n;
    m = size(x, 1);

    x = [ones(m,1) x];
    nn.a{1} = x;

    %feedforward pass
    for i = 2 : n-1
        % The forward computation depends on the activation function chosen
        % in nnsetup (activation_function); 'sigm' is the sigmoid
        switch nn.activation_function
            case 'sigm'
                % Calculate the unit's outputs (including the bias term)
                nn.a{i} = sigm(nn.a{i - 1} * nn.W{i - 1}');
            case 'tanh_opt'
                nn.a{i} = tanh_opt(nn.a{i - 1} * nn.W{i - 1}');
        end

        % dropout; dropoutFraction is one of the parameters set in nnsetup
        if(nn.dropoutFraction > 0)   % runs only when dropout is enabled
            if(nn.testing)
                % testing: scale the activations instead of masking them
                nn.a{i} = nn.a{i}.*(1 - nn.dropoutFraction);
            else
                % training: randomly zero a fraction of the hidden activations
                nn.dropOutMask{i} = (rand(size(nn.a{i}))>nn.dropoutFraction);
                nn.a{i} = nn.a{i}.*nn.dropOutMask{i};
            end
        end

        % sparsity: nonSparsityPenalty weights the penalty on units whose
        % average activation deviates from sparsityTarget
        %calculate running exponential activations for use with sparsity
        if(nn.nonSparsityPenalty>0)   % runs only when the penalty is enabled
            nn.p{i} = 0.99 * nn.p{i} + 0.01 * mean(nn.a{i}, 1);
        end

        %Add the bias term
        nn.a{i} = [ones(m,1) nn.a{i}];
    end
    switch nn.output   % compute the output layer
        case 'sigm'
            nn.a{n} = sigm(nn.a{n - 1} * nn.W{n - 1}');
        case 'linear'
            nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
        case 'softmax'
            nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
            nn.a{n} = exp(bsxfun(@minus, nn.a{n}, max(nn.a{n},[],2)));
            nn.a{n} = bsxfun(@rdivide, nn.a{n}, sum(nn.a{n}, 2));
    end

    %error and loss
    % error of the output layer
    nn.e = y - nn.a{n};   % y - h_{W,b}(x)

    switch nn.output
        case {'sigm', 'linear'}
            nn.L = 1/2 * sum(sum(nn.e .^ 2)) / m;   % see the formula in UFLDL (P9)
        case 'softmax'
            nn.L = -sum(sum(y .* log(nn.a{n}))) / m;
    end
end
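The dropout branch in nnff behaves differently in training and in testing, which is worth a small illustration. The sketch below uses constant stand-in activations so the effect is easy to see; it is not taken from the toolbox, and `a`, `a_train` and `a_test` are illustrative names.

```matlab
% Sketch: dropout masks units in training and scales them in testing.
dropoutFraction = 0.5;
a = ones(1000, 50);                          % hypothetical hidden activations
mask = rand(size(a)) > dropoutFraction;      % keep each unit with probability 1 - dropoutFraction
a_train = a .* mask;                         % training: dropped units output 0
a_test  = a .* (1 - dropoutFraction);        % testing: every unit is scaled down
fprintf('mean train = %.3f, mean test = %.3f\n', mean(a_train(:)), mean(a_test(:)));
```

The scaling at test time keeps the average input to the next layer roughly the same as it was during training. In this walkthrough dropoutFraction stays at 0, so the branch is never taken.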
Execution then moves on to nnbp:
function nn = nnbp(nn)
%NNBP performs backpropagation
% nn = nnbp(nn) returns an neural network structure with updated weights

    n = nn.n;
    sparsityError = 0;
    switch nn.output
        case 'sigm'
            d{n} = - nn.e .* (nn.a{n} .* (1 - nn.a{n}));   % see UFLDL, backpropagation algorithm, formula 2
        case {'softmax','linear'}
            d{n} = - nn.e;
    end
    for i = (n - 1) : -1 : 2
        % Derivative of the activation function
        switch nn.activation_function
            case 'sigm'
                d_act = nn.a{i} .* (1 - nn.a{i});   % f'(z_i), see UFLDL (P15)
            case 'tanh_opt'
                d_act = 1.7159 * 2/3 * (1 - 1/(1.7159)^2 * nn.a{i}.^2);
        end

        if(nn.nonSparsityPenalty>0)   % like the other conditions, this acts as an on/off switch
            pi = repmat(nn.p{i}, size(nn.a{i}, 1), 1);
            sparsityError = [zeros(size(nn.a{i},1),1) nn.nonSparsityPenalty * (-nn.sparsityTarget ./ pi + (1 - nn.sparsityTarget) ./ (1 - pi))];
        end

        % Backpropagate first derivatives
        if i+1==n % in this case in d{n} there is not the bias term to be removed
            d{i} = (d{i + 1} * nn.W{i} + sparsityError) .* d_act; % Bishop (5.56)
        else % in this case in d{i} the bias term has to be removed
            d{i} = (d{i + 1}(:,2:end) * nn.W{i} + sparsityError) .* d_act;   % see UFLDL (P13)
        end

        if(nn.dropoutFraction>0)
            d{i} = d{i} .* [ones(size(d{i},1),1) nn.dropOutMask{i}];
        end

    end

    for i = 1 : (n - 1)
        if i+1==n
            nn.dW{i} = (d{i + 1}' * nn.a{i}) / size(d{i + 1}, 1);   % see the UFLDL tutorial (P14)
        else
            nn.dW{i} = (d{i + 1}(:,2:end)' * nn.a{i}) / size(d{i + 1}, 1);
        end
    end
end
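The tanh_opt derivative in nnbp is written in terms of the activation a rather than the input z, which can look opaque. The sketch below compares it with a numerical derivative. It assumes tanh_opt(z) = 1.7159 * tanh(2/3 * z), which is the form the derivative above implies; the function itself is not shown in this post.

```matlab
% Sketch: check the tanh_opt derivative used in nnbp against finite differences.
% Assumption: tanh_opt(z) = 1.7159 * tanh(2/3 * z).
f = @(z) 1.7159 * tanh(2/3 * z);
z = linspace(-2, 2, 9);                    % a few test inputs
a = f(z);                                  % activations, as stored in nn.a{i}
d_analytic = 1.7159 * 2/3 * (1 - 1/(1.7159)^2 * a.^2);   % expression from nnbp
h = 1e-6;
d_numeric = (f(z + h) - f(z - h)) / (2 * h);             % central difference
max_diff = max(abs(d_analytic - d_numeric));
fprintf('largest difference = %g\n', max_diff);
```

The identity behind it is that tanh'(u) = 1 - tanh(u)^2, and tanh(2/3 * z) equals a / 1.7159, so the derivative can be computed from the stored activation alone.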
Next is nnapplygrads, which computes the change in the weights W and applies it:
function nn = nnapplygrads(nn)
%NNAPPLYGRADS updates weights and biases with calculated gradients
% nn = nnapplygrads(nn) returns an neural network structure with updated
% weights and biases

    for i = 1 : (nn.n - 1)
        if(nn.weightPenaltyL2>0)
            % L2 weight decay: add weightPenaltyL2 * W to the gradient,
            % with a zero column so the bias weights are not penalized
            dW = nn.dW{i} + nn.weightPenaltyL2 * [zeros(size(nn.W{i},1),1) nn.W{i}(:,2:end)];
        else
            dW = nn.dW{i};
        end

        dW = nn.learningRate * dW;

        if(nn.momentum>0)
            nn.vW{i} = nn.momentum*nn.vW{i} + dW;   % momentum factor applied to the previous update
            dW = nn.vW{i};
        end

        nn.W{i} = nn.W{i} - dW;
    end
end
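A worked example with tiny numbers may help with the update rule. The sketch below applies one step of the same arithmetic to a single row of weights; all values are made up for illustration, and `dW_raw` stands in for nn.dW{i}.

```matlab
% Sketch: one update step with L2 weight decay and momentum.
W      = [0.5 1.0 -2.0];       % first column is the bias weight
dW_raw = [0.1 0.2  0.3];       % hypothetical gradient from nnbp
vW     = zeros(size(W));       % momentum term starts at zero
learningRate    = 1;
momentum        = 0.5;
weightPenaltyL2 = 0.1;

% weight decay skips the bias column
dW = dW_raw + weightPenaltyL2 * [zeros(size(W, 1), 1) W(:, 2:end)];
dW = learningRate * dW;
vW = momentum * vW + dW;       % previous update (zero here) plus the new step
W  = W - vW;
disp(W);
```

On the first step vW is zero, so the update is just the decayed gradient. On later steps half of the previous update is carried over, which smooths the trajectory of W.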
Back in nntrain, L(n) is recorded:
            L(n) = nn.L;   % loss of this mini-batch

            n = n + 1;
        end

        t = toc;   % seconds taken by this epoch

        if opts.validation == 1   % on/off switch for validation
            loss = nneval(nn, loss, train_x, train_y, val_x, val_y);
            str_perf = sprintf('; Full-batch train mse = %f, val mse = %f', loss.train.e(end), loss.val.e(end));
        else
            loss = nneval(nn, loss, train_x, train_y);
            str_perf = sprintf('; Full-batch train err = %f', loss.train.e(end));
        end
        if ishandle(fhandle)
            nnupdatefigures(nn, fhandle, loss, opts, i);
        end

        disp(['epoch ' num2str(i) '/' num2str(opts.numepochs) '. Took ' num2str(t) ' seconds' '. Mini-batch mean squared error on training set is ' num2str(mean(L((n-numbatches):(n-1)))) str_perf]);
        nn.learningRate = nn.learningRate * nn.scaling_learningRate;   % rescale the learning rate after each epoch
    end
end
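Because opts.validation is 0, nneval is called with four arguments, so it evaluates the network on the training data only. nneval measures how well the network performs: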
function [loss] = nneval(nn, loss, train_x, train_y, val_x, val_y)
%NNEVAL evaluates performance of neural network
% Returns a updated loss struct
assert(nargin == 4 || nargin == 6, 'Wrong number of arguments');

nn.testing = 1;
% training performance
nn                    = nnff(nn, train_x, train_y);
loss.train.e(end + 1) = nn.L;

% validation performance
if nargin == 6
    nn                    = nnff(nn, val_x, val_y);
    loss.val.e(end + 1)   = nn.L;
end
nn.testing = 0;
%calc misclassification rate if softmax
if strcmp(nn.output,'softmax')
    [er_train, dummy]               = nntest(nn, train_x, train_y);
    loss.train.e_frac(end+1)    = er_train;

    if nargin == 6
        [er_val, dummy]             = nntest(nn, val_x, val_y);
        loss.val.e_frac(end+1)  = er_val;
    end
end

end
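The value that nneval stores in loss.train.e is nn.L from nnff. For the 'sigm' output used here that is half the summed squared error divided by the number of samples. A small worked sketch with made-up numbers (`out` stands in for nn.a{n}):

```matlab
% Sketch: the squared-error loss that nnff computes for 'sigm' and 'linear' outputs.
y   = [1 0; 0 1];              % hypothetical targets, one row per sample
out = [0.8 0.1; 0.3 0.6];      % hypothetical network outputs
m   = size(y, 1);              % number of samples
e   = y - out;                 % same sign convention as nn.e
L   = 1/2 * sum(sum(e .^ 2)) / m;
fprintf('L = %.4f\n', L);
```

The squared errors here are 0.04, 0.01, 0.09 and 0.16, which sum to 0.30, so L is 0.5 * 0.30 / 2 = 0.075.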
Execution returns to nntrain, and once the rest of it has run, back to saetrain:
function sae = saetrain(sae, x, opts)
    for i = 1 : numel(sae.ae);
        disp(['Training AE ' num2str(i) '/' num2str(numel(sae.ae))]);
        sae.ae{i} = nntrain(sae.ae{i}, x, x, opts);
        t = nnff(sae.ae{i}, x, x);   % forward pass of the trained autoencoder, stored in the struct t
        x = t.a{2};
        %remove bias term
        x = x(:,2:end);   % drop the first column
    end
end
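The struct t holds the forward-pass result, and x is updated to the hidden-layer activations. Execution then returns to test_example_SAE: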
% Use the SDAE to initialize a FFNN
nn = nnsetup([784 100 10]);
nn.activation_function              = 'sigm';
nn.learningRate                     = 1;
nn.W{1} = sae.ae{1}.W{1};   % copy the trained autoencoder's first-layer weights into nn

% Train the FFNN
opts.numepochs =   1;
opts.batchsize = 100;
nn = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
assert(er < 0.16, 'Too big error');
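The code above trains the network on train_x and train_y and then measures its error on test_x and test_y. In other words, the weights learned by the SAE are ultimately copied into an nn struct, and it is that nn struct that gets trained and tested.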
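nntest itself is not listed in this post. As a rough illustration of what an error rate like er means, the sketch below compares the index of the largest output with the index of the 1 in each one-hot label. This is an assumption about how such a rate is typically computed, not a copy of the toolbox code, and the data is made up.

```matlab
% Sketch: misclassification rate from network outputs and one-hot labels.
% Assumption: the predicted class is the index of the largest output.
scores = [0.1 0.9; 0.8 0.2; 0.3 0.7; 0.6 0.4];   % hypothetical outputs, one row per sample
y      = [0 1; 1 0; 1 0; 1 0];                   % one-hot labels
[dummy, predicted] = max(scores, [], 2);         % predicted class per sample
[dummy, expected]  = max(y, [], 2);              % true class per sample
bad = find(predicted ~= expected);               % indices of misclassified samples
er  = numel(bad) / size(y, 1);                   % fraction misclassified
fprintf('er = %.2f\n', er);
```

In this made-up data only the third sample is misclassified, so er is 0.25. The assert in test_example_SAE requires er to be below 0.16 on the MNIST test set.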
Reference: 【面向代码】学习 Deep Learning(一)Neural Network