2、Vectorization
This exercise is mainly about vectorizing sparseAutoencoderCost.m, but since that code was already vectorized in the previous exercise, there is not much left to do here. (Vectorization did feel hard at first, and being asked to vectorize already in the first exercise is genuinely difficult, but the material leading up to this exercise gives plenty of vectorization hints.) All that remains is to change the autoencoder's structural parameters as the instructions describe and to download the provided helper code.
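For reference, below is a minimal sketch of what a fully vectorized cost/gradient routine can look like, assuming the UFLDL parameter packing theta = [W1(:); W2(:); b1(:); b2(:)] produced by initializeParameters. The name sparseAutoencoderCostVec and the exact arrangement of the computation are illustrative; this is not necessarily identical to the code submitted for the previous exercise.

```matlab
function [cost, grad] = sparseAutoencoderCostVec(theta, visibleSize, hiddenSize, ...
                                                 lambda, sparsityParam, beta, data)
% Sketch of a vectorized sparse autoencoder cost/gradient (UFLDL conventions).
% Each column of data is one training example.

% Unpack parameters (theta = [W1(:); W2(:); b1(:); b2(:)])
W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

m = size(data, 2);                       % number of training examples

% Forward pass over all examples at once
z2 = W1*data + repmat(b1, 1, m);
a2 = sigmoid(z2);                        % hidden activations, hiddenSize x m
z3 = W2*a2 + repmat(b2, 1, m);
a3 = sigmoid(z3);                        % reconstructions, visibleSize x m

% Average activation of each hidden unit and the KL sparsity term
rho    = sparsityParam;
rhoHat = mean(a2, 2);                    % hiddenSize x 1
sparsityDelta = -rho./rhoHat + (1-rho)./(1-rhoHat);

% Cost: squared error + weight decay + sparsity penalty
cost = 0.5/m * sum(sum((a3 - data).^2)) ...
     + lambda/2 * (sum(W1(:).^2) + sum(W2(:).^2)) ...
     + beta * sum(rho*log(rho./rhoHat) + (1-rho)*log((1-rho)./(1-rhoHat)));

% Backpropagation, vectorized over all examples
delta3 = (a3 - data) .* a3 .* (1 - a3);                          % visibleSize x m
delta2 = (W2'*delta3 + repmat(beta*sparsityDelta, 1, m)) ...
         .* a2 .* (1 - a2);                                      % hiddenSize x m

W1grad = delta2*data'/m + lambda*W1;
W2grad = delta3*a2'/m   + lambda*W2;
b1grad = sum(delta2, 2)/m;
b2grad = sum(delta3, 2)/m;

grad = [W1grad(:); W2grad(:); b1grad(:); b2grad(:)];
end

function s = sigmoid(x)
    s = 1 ./ (1 + exp(-x));
end
```

The key point is that every loop over training examples is replaced by a single matrix product or element-wise operation over the whole data matrix.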
- Problems:
1)
2)
3)
4)
5)
```matlab
% Change the filenames if you've saved the files under different names
% On some platforms, the files might be saved as
% train-images.idx3-ubyte / train-labels.idx1-ubyte
images = loadMNISTImages('train-images-idx3-ubyte');
labels = loadMNISTLabels('train-labels-idx1-ubyte');

% We are using display_network from the autoencoder code
display_network(images(:,1:100)); % Show the first 100 images
disp(labels(1:10));
```
The snippet above is from the Using the MNIST Dataset wiki page. Download the MNIST dataset, and the helper functions for loading it (loadMNISTImages / loadMNISTLabels), from http://ufldl.stanford.edu/wiki/resources/mnistHelper.zip.
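As a quick sanity check that the helper functions and data files are in place, something like the following can be run first (the mnistHelper/ folder name is only an assumption about where the zip was unpacked; adjust paths as needed):

```matlab
% Assumed layout: mnistHelper.zip unpacked into ./mnistHelper, and the MNIST
% data files sitting in the current directory.
addpath mnistHelper/
images = loadMNISTImages('train-images-idx3-ubyte');
labels = loadMNISTLabels('train-labels-idx1-ubyte');
disp(size(images));   % expected: 784 60000 -- one 28x28 image per column, scaled to [0,1]
disp(size(labels));   % expected: 60000 1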
Beyond that there is not much left to do: just rewrite train.m yourself, adapting the patch-loading step from the previous section's sampleIMAGES.m.
train.m
```matlab
clear;clc;close all;
%% CS294A/CS294W Programming Assignment Starter Code
%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  programming assignment. You will need to complete the code in sampleIMAGES.m,
%  sparseAutoencoderCost.m and computeNumericalGradient.m.
%  For the purpose of completing the assignment, you do not need to
%  change the code in this file.
%
%%======================================================================
%% STEP 0: Here we provide the relevant parameters values that will
%  allow your sparse autoencoder to get good filters; you do not need to
%  change the parameters below.

visibleSize = 28*28;   % number of input units
hiddenSize = 196;      % number of hidden units
sparsityParam = 0.1;   % desired average activation of the hidden units.
                       % (This was denoted by the Greek alphabet rho, which looks like a lower-case "p",
                       %  in the lecture notes).
% lambda = 0;
lambda = 3e-3;         % weight decay parameter
% beta = 0;
beta = 3;              % weight of sparsity penalty term

% Code obtained from http://deeplearning.stanford.edu/wiki/index.php/Using_the_MNIST_Dataset.
% The two loader functions below must be downloaded separately (mnistHelper).
% Change the filenames if you've saved the files under different names
% On some platforms, the files might be saved as
% train-images.idx3-ubyte / train-labels.idx1-ubyte
images = loadMNISTImages('train-images-idx3-ubyte');
labels = loadMNISTLabels('train-labels-idx1-ubyte');

% We are using display_network from the autoencoder code
display_network(images(:,1:100)); % Show the first 100 images
disp(labels(1:10));
set(gcf,'NumberTitle','off');
set(gcf,'Name','First 100 MNIST images');

%%======================================================================
%% STEP 1:
% Patch loading modified for this exercise:
% patches = first 10000 images from the MNIST dataset
%
numpatches = 10000;
patches = zeros(visibleSize, numpatches);
for imageNum = 1:numpatches
    patches(:,imageNum) = reshape(images(:,imageNum), visibleSize, 1);
end

%%======================================================================
%% STEP 2: Implement sparseAutoencoderCost
%
%  You can implement all of the components (squared error cost, weight decay term,
%  sparsity penalty) in the cost function at once, but it may be easier to do
%  it step-by-step and run gradient checking (see STEP 3) after each step.  We
%  suggest implementing the sparseAutoencoderCost function using the following steps:
%
%  (a) Implement forward propagation in your neural network, and implement the
%      squared error term of the cost function.  Implement backpropagation to
%      compute the derivatives.  Then (using lambda=beta=0), run Gradient Checking
%      to verify that the calculations corresponding to the squared error cost
%      term are correct.
%
%  (b) Add in the weight decay term (in both the cost function and the derivative
%      calculations), then re-run Gradient Checking to verify correctness.
%
%  (c) Add in the sparsity penalty term, then re-run Gradient Checking to
%      verify correctness.
%
%  Feel free to change the training settings when debugging your
%  code.  (For example, reducing the training set size or
%  number of hidden units may make your code run faster; and setting beta
%  and/or lambda to zero may be helpful for debugging.)  However, in your
%  final submission of the visualized weights, please use parameters we
%  gave in Step 0 above.

theta = initializeParameters(hiddenSize, visibleSize);

[costBegin, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, lambda, ...
                                          sparsityParam, beta, patches);

%%======================================================================
% The block below checks computeNumericalGradient and sparseAutoencoderCost.
% Once the check passes, it can stay commented out and need not be run again.
%{
%% STEP 3: Gradient Checking
%
% Hint: If you are debugging your code, performing gradient checking on smaller models
% and smaller training sets (e.g., using only 10 training examples and 1-2 hidden
% units) may speed things up.

% First, lets make sure your numerical gradient computation is correct for a
% simple function.  After you have implemented computeNumericalGradient.m,
% run the following:
% checkNumericalGradient() verifies the gradient-checking function
% computeNumericalGradient.m; run it while first writing computeNumericalGradient,
% after that it need not be checked again.
%checkNumericalGradient();

% Now we can use it to check your cost function and derivative calculations
% for the sparse autoencoder.
% The syntax of the call below was confusing at first; after asking senior
% classmate Wang Xin, it became completely clear.
% The function below takes two arguments separated by a comma: the first is a
% function, the second a constant.
% Looking at the definition of computeNumericalGradient(J, theta), the first
% argument J is itself a function.
% Here an anonymous function is passed as that argument, with x standing in for
% one of the parameters of the wrapped call; whenever this function argument is
% later called, the supplied variable replaces x.
% Figuring this out was not easy, but very satisfying once understood.
% (MATLAB was still unfamiliar back then, so reading this code took real effort.)
numgrad = computeNumericalGradient( @(x) sparseAutoencoderCost(x, visibleSize, ...
                                                  hiddenSize, lambda, ...
                                                  sparsityParam, beta, ...
                                                  patches), theta);

% The next line directly prints two vectors with the same dimensions as theta
% (3289x1 in the previous exercise's 64-25-64 model).
% Use this to visually compare the gradients side by side
disp([numgrad grad]);

% Compare numerically computed gradients with the ones obtained from backpropagation
diff = norm(numgrad-grad)/norm(numgrad+grad);
fprintf('Norm of the difference between numerical and analytical gradient (should be < 1e-9)\n\n');
disp(diff); % Should be small. In our implementation, these values are
            % usually less than 1e-9.

            % When you got this working, Congratulations!!!

%%======================================================================
%}
%% STEP 4: After verifying that your implementation of
%  sparseAutoencoderCost is correct, You can start training your sparse
%  autoencoder with minFunc (L-BFGS).

%  Randomly initialize the parameters
theta = initializeParameters(hiddenSize, visibleSize);

%  Use minFunc to minimize the function
addpath minFunc/
options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost
                          % function. Generally, for minFunc to work, you
                          % need a function pointer with two outputs: the
                          % function value and the gradient. In our problem,
                          % sparseAutoencoderCost.m satisfies this.
options.maxIter = 400;    % Maximum number of iterations of L-BFGS to run
options.display = 'on';

[opttheta, costEnd] = minFunc( @(p) sparseAutoencoderCost(p, ...
                                    visibleSize, hiddenSize, ...
                                    lambda, sparsityParam, ...
                                    beta, patches), ...
                               theta, options);

%%======================================================================
%% STEP 5: Visualization

W1 = reshape(opttheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
figure;
display_network(W1', 12);
set(gcf,'NumberTitle','off');
set(gcf,'Name','First-layer weights after sparse autoencoding');

print -djpeg weights.jpg   % save the visualization to a file
```
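Since the whole point of this exercise is the vectorization itself, it can be worth timing a single cost/gradient evaluation, for example right after the sparseAutoencoderCost call in STEP 2 above. This is just a sketch reusing the variables already defined in train.m; the timings in the comment are rough expectations, not measurements from this post.

```matlab
% Time one cost/gradient evaluation over all 10000 MNIST patches.
% A fully vectorized sparseAutoencoderCost typically finishes this in about
% a second or less; a per-example loop is usually far slower.
tic;
[c, g] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, lambda, ...
                               sparsityParam, beta, patches);
toc;
```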
Looking at the printed cost values, you can also see that the larger the model, the larger the cost.
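For reference, the value being printed is the standard UFLDL sparse autoencoder objective; each of its three terms sums over more units and weights in the 784-196-784 MNIST network than in the 64-25-64 network of the previous exercise, so the raw number naturally comes out larger:

$$
J_{\text{sparse}}(W,b) \;=\; \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\lVert h_{W,b}\big(x^{(i)}\big)-x^{(i)}\right\rVert^{2}
\;+\; \frac{\lambda}{2}\sum_{l}\sum_{i}\sum_{j}\big(W^{(l)}_{ji}\big)^{2}
\;+\; \beta\sum_{j=1}^{s_2}\mathrm{KL}\big(\rho\,\big\|\,\hat{\rho}_{j}\big)
$$

where $s_2$ is the number of hidden units (hiddenSize).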
The experimental result is shown below: the learned features look like pen strokes.