Naive MLP using BP Algorithm

  This post implements a simple multi-layer perceptron (MLP) neural network based on the error back-propagation (BP) algorithm in Matlab, without relying on any existing neural network toolbox. The chosen activation functions are the logarithmic sigmoid function (Sigmoid) and the simplest linear function (Linear). Unlike other naive MLP classifiers, this classifier replaces for loops with vectorized computation to improve execution efficiency: since Matlab is a scripting language, for loops with complex bodies run very slowly, whereas vector and matrix operations exploit Matlab's optimized parallel computation and run much faster, which is the whole idea behind Matlab.
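
  As a small, hypothetical illustration of this point (all of the names below are invented for the example), the following snippet computes the same sigmoid activations once with a per-sample for loop and once with a single matrix expression; the vectorized version is typically far faster:

X = rand(10000, 2);   % 10000 samples, 2 features
W = rand(2, 3);       % input-to-hidden weights
% loop version: one sample at a time
tic
H1 = zeros(10000, 3);
for i = 1 : 10000
    H1(i, :) = 1 ./ (1 + exp(-X(i, :) * W));
end
toc
% vectorized version: one matrix product for all samples
tic
H2 = 1 ./ (1 + exp(-X * W));
toc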

  Note that the while loop uses a batch learning scheme rather than online learning, because the latter turned out to be very hard to converge in actual tests. To explain the difference, here is a quote from a blog post [http://hi.baidu.com/lixinxin555518/item/28b832e09f89e90f8d3ea89e]:

Both the batch algorithm and the online learning algorithm are based on the principle of gradient descent. The difference is that the former is the more conservative method: it processes all of the observations and then takes one step in the steepest-descent direction per iteration, while the online algorithm looks only at the current single observation and is a stochastic gradient method. The advantage of the former is fast convergence; its drawbacks are heavier computation and the inability to handle real-time data. The advantage of the latter is its small computational cost; its drawbacks are slow convergence, and it sometimes oscillates and fails to converge at all.
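
  To make the distinction concrete, here is a minimal sketch under assumed names (X is an N-by-d feature matrix, t an N-by-1 target vector, w a d-by-1 weight vector, with a squared-error cost); it is only an illustration and does not appear verbatim in the classifier below:

N = 100; d = 2;
X = rand(N, d); t = rand(N, 1); w = zeros(d, 1);
learningRate = 0.01;
% batch: average the gradient over all samples, then take one step
grad = X' * (X * w - t) / N;
w = w - learningRate * grad;
% online (stochastic): update the weights immediately after each sample
for i = 1 : N
    w = w - learningRate * X(i, :)' * (X(i, :) * w - t(i));
end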

  The training and testing samples that the code depends on are described in this post: http://www.cnblogs.com/awarrior/p/3282959.html
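
  Judging from how the script parses these files (two feature columns followed by an integer class label that indexes one of outputNum = 8 classes), each line should hold three whitespace-separated numbers; the rows below are purely hypothetical examples of the layout:

1.25  0.73  1
0.08  2.41  5
3.60  1.02  8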

% Naive-MLP Classifier of Three Layers
%   This classifier is implemented in a simple way, using the logarithmic
%   sigmoid function as the hidden-layer activation and a linear function
%   as the output-layer activation, and it employs the BP algorithm to
%   refine the weights. Each sample contains two features, which are read
%   from external text files. Unlike other naive MLP classifiers, this one
%   uses matrix operations rather than for-end loops, which improves the
%   calculation performance.
%
%   Please use a search engine or leave a comment if any details of this
%   code are unclear.
%
%   Author: Justin Green
%   Date: 2013.8.25
%
close all
clear
clc

trainFile = 'train.txt';
testFile = 'test.txt';

% set the number of units in each layer
inputNum = 2;
hiddenNum = 3;
outputNum = 8;

% initialize weights with small random values scaled by the layer sizes
wInputToHidden = 1 / sqrt(inputNum * hiddenNum) * rand(inputNum, hiddenNum);
wHiddenToOutput = 1 / sqrt(hiddenNum * outputNum) * rand(hiddenNum, outputNum);

% read training dataset
trainDS = textread(trainFile);
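% (textread is fine in older Matlab releases; newer ones would use dlmread or readmatrix)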
trainFeature = trainDS(:, 1:2);
trainClass = trainDS(:, 3);
trainSize = size(trainDS, 1);

% training data normalization (min-max scaling of each feature to [0, 1])
trainMAX = max(trainFeature);
trainMIN = min(trainFeature);
tmp = [size(trainFeature, 1), 1];
trainFeature = (trainFeature - repmat(trainMIN, tmp)) ./ repmat((trainMAX - trainMIN), tmp);

% construct target matrix (one-hot encoding of the class labels)
targetMatrix = zeros(trainSize, outputNum);
for i = 1 : trainSize
    targetMatrix(i, trainClass(i)) = 1;
end

% set parameters of classifier
errorThreshold = 0.05;
learningRate = 0.01;
momentum = 0.95;
iterateTime = 1000;

% -------------------------------------------------------------------------
% train classifier
deltaInputToHidden = zeros(inputNum, hiddenNum);
deltaHiddenToOutput = zeros(hiddenNum, outputNum);
errorPerSample = zeros(1, trainSize);   % preallocate the per-sample error buffer
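% the updates in the training loop below follow the delta rule with momentum:
%   delta(t) = learningRate * gradient + momentum * delta(t-1)
%   w        = w + delta(t)
% which is why the two delta matrices above must persist across iterations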
iterator = 0;
tic
while iterator < iterateTime
    for i = 1 : trainSize
        % forward pass: sigmoid hidden layer
        net = trainFeature(i, :) * wInputToHidden;
        hiddenFunction = 1 ./ (1 + exp(-net));
        outputFunction = hiddenFunction * wHiddenToOutput;
        % a sigmoid is also applied to the output during training; at test
        % time the raw linear output is used instead, which leaves the argmax
        % prediction unchanged because the sigmoid is monotonic
        outputFunction = 1 ./ (1 + exp(-outputFunction));
        % squared error for this sample (err avoids shadowing the built-in error function)
        err = targetMatrix(i, :) - outputFunction;
        errorPerSample(i) = 0.5 * (err * err');
        % update hidden-to-output weights (delta rule with momentum)
        gradientOutput = hiddenFunction' * err;
        deltaHiddenToOutput = learningRate * gradientOutput + momentum * deltaHiddenToOutput;
        wHiddenToOutput = wHiddenToOutput + deltaHiddenToOutput;
        % back-propagate the error and update input-to-hidden weights
        hiddenFunctionD = hiddenFunction .* (1 - hiddenFunction);   % sigmoid derivative
        gradientHidden = trainFeature(i, :)' * (hiddenFunctionD .* (wHiddenToOutput * err')');
        deltaInputToHidden = learningRate * gradientHidden + momentum * deltaInputToHidden;
        wInputToHidden = wInputToHidden + deltaInputToHidden;
    end

    % mean squared error over the epoch, compared against the threshold
    errorSum = sum(errorPerSample) / trainSize;
    if errorSum < errorThreshold
        break
    end

    % plot the epoch error to visualize convergence
    figure(1)
    plot(iterator, errorSum, 'b.')
    hold on
    grid on

    iterator = iterator + 1;
end
toc

% -------------------------------------------------------------------------
% read testing dataset
testDS = textread(testFile);
testFeature = testDS(:,1:2);
testClass = testDS(:,3);
testSize = size(testDS, 1);

% testing data normalization
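% (scaled with the training-set min/max so both datasets share the same range)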
tmp = [size(testFeature, 1), 1];
testFeature = (testFeature - repmat(trainMIN, tmp)) ./ repmat((trainMAX - trainMIN), tmp);

% -------------------------------------------------------------------------
% classify the testing data with the trained network (linear output layer,
% using the same forward pass as in training)
hiddenFunction = 1 ./ (1 + exp(-(testFeature * wInputToHidden)));
outputFunction = hiddenFunction * wHiddenToOutput;

% calculate testing accuracy
[mv, mi] = max(outputFunction, [], 2);
err = testClass - mi;
yes = length(find(err == 0));
acc = yes / testSize * 100;
fprintf('Testing Accuracy: %3.3f%%\n', acc)

% -------------------------------------------------------------------------
% evaluate the trained classifier on the training data (linear output layer)
hiddenFunction = 1 ./ (1 + exp(-(trainFeature * wInputToHidden)));
outputFunction = hiddenFunction * wHiddenToOutput;

% calculate training accuracy
[mv, mi] = max(outputFunction, [], 2);
err = trainClass - mi;
match = length(find(err == 0));
accuracy = match / trainSize * 100;
fprintf('Training Accuracy: %3.3f%%\n', accuracy)

>>>>>>>>>>>>>> Output >>>>>>>>>>>>>>

Elapsed time is 2.807000 seconds.
Testing Accuracy: 69.800%
Training Accuracy: 87.143%

The figure below plots the training error against the iteration count (dataset DS1):

[training-error plot omitted]
