Naive MLP using BP Algorithm
This post implements a simple multi-layer perceptron (MLP) neural network based on the error back-propagation (BP) algorithm in Matlab, without relying on any existing neural network toolbox. The chosen activation functions are the logarithmic sigmoid (Sigmoid) and the simplest linear function (Linear). Unlike other naive MLP classifiers, this one replaces for loops with vectorized computation to improve execution efficiency: since Matlab is an interpreted scripting language, for loops containing complex statements run very slowly, while vector and matrix operations exploit parallel computation and greatly speed up execution, which is also the essence of the Matlab way of thinking.
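As a minimal illustration of this point (a toy snippet, not part of the classifier itself), the two forms below compute the same sigmoid values; the vectorized one avoids the interpreted loop entirely:

% Toy comparison: apply the logistic sigmoid to every element of X.
X = rand(10000, 3);

% Loop version: each iteration is interpreted, so it is slow.
Y1 = zeros(size(X));
for r = 1 : size(X, 1)
    for c = 1 : size(X, 2)
        Y1(r, c) = 1 / (1 + exp(-X(r, c)));
    end
end

% Vectorized version: one element-wise expression over the whole matrix.
Y2 = 1 ./ (1 + exp(-X));

Wrapping each version in tic/toc makes the speed difference easy to measure.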
Note that the body of the while loop uses a batch learning scheme rather than online learning, because in actual tests the latter had great difficulty converging. To explain the difference, here is a quote from a blog post [http://hi.baidu.com/lixinxin555518/item/28b832e09f89e90f8d3ea89e]:
Both the batch algorithm and the online learning algorithm are based on the principle of gradient descent. The difference is that the former is the more conservative approach: it processes all observations and then takes one steepest-descent step per iteration, whereas the online algorithm looks only at the current observation and is a kind of stochastic gradient method. The advantage of the former is fast convergence; its drawbacks are heavier computation and the inability to handle real-time data. The latter has a small per-step cost, but it converges slowly and can even oscillate without converging at all.
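The difference can be sketched with a toy example: a single linear unit trained on made-up data with squared error, just to contrast the two update schemes (this is not the MLP code below):

% Toy sketch contrasting online and batch gradient descent.
X = rand(100, 2); t = X * [1; -2];        % made-up inputs and targets
w = zeros(2, 1); learningRate = 0.01;

% Online (stochastic): update the weights after every single observation.
for i = 1 : size(X, 1)
    e = t(i) - X(i, :) * w;               % error of the current sample only
    w = w + learningRate * X(i, :)' * e;  % immediate weight update
end

% Batch: accumulate the gradient over all observations, then update once.
g = zeros(2, 1);
for i = 1 : size(X, 1)
    e = t(i) - X(i, :) * w;
    g = g + X(i, :)' * e;                 % sum the per-sample gradients
end
w = w + learningRate * g;                 % one steepest-descent step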
The training and test samples that the code depends on are described in this post: http://www.cnblogs.com/awarrior/p/3282959.html
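The script reads both files with textread and expects one sample per row: two feature columns followed by an integer class label between 1 and outputNum. The rows below are hypothetical values, shown only to illustrate the layout:

% Hypothetical excerpt of train.txt / test.txt (whitespace-separated):
%   feature1   feature2   class
%   0.31       1.72       3
%   2.05       0.44       7
%   1.18       1.01       1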
% Naive-MLP Classifier of Three Layers
% This classifier is realized in a simple way, using the logarithmic sigmoid
% function as the hidden-layer activation and a linear function as the
% output-layer activation, and employs the BP algorithm to refine the
% weights. Each sample contains two features, which are loaded from
% external text files. Unlike other naive MLP classifiers, this one uses
% matrix operations instead of for-end loops, which improves the
% calculation performance.
%
% Please use a search engine or leave a comment if you can't understand
% the details of this code.
%
% Author: Justin Green
% Date: 2013.8.25
%
close all
clear
clc

trainFile = 'train.txt';
testFile = 'test.txt';

% initialize unit numbers in the different layers
inputNum = 2;
hiddenNum = 3;
outputNum = 8;

% initialize weights with small random values scaled by layer size
wInputToHidden = 1 / power(inputNum * hiddenNum, 1/2) * rand(inputNum, hiddenNum);
wHiddenToOutput = 1 / power(hiddenNum * outputNum, 1/2) * rand(hiddenNum, outputNum);

% read training dataset
trainDS = textread(trainFile);
trainFeature = trainDS(:, 1:2);
trainClass = trainDS(:, 3);
trainSize = size(trainDS, 1);

% training data normalization (min-max scaling to [0, 1])
trainMAX = max(trainFeature);
trainMIN = min(trainFeature);
tmp = [size(trainFeature, 1), 1];
trainFeature = (trainFeature - repmat(trainMIN, tmp)) ./ repmat((trainMAX - trainMIN), tmp);

% construct target matrix (one-hot encoding of the class labels)
targetMatrix = zeros(trainSize, outputNum);
for i = 1 : trainSize
    targetMatrix(i, trainClass(i)) = 1;
end

% set parameters of classifier
errorThreshold = 0.05;
learningRate = 0.01;
momentum = 0.95;
iterateTime = 1000;

% -------------------------------------------------------------------------
% train classifier
deltaInputToHidden = zeros(inputNum, hiddenNum);
deltaHiddenToOutput = zeros(hiddenNum, outputNum);
errorPerSample = zeros(1, trainSize);
iterator = 0;
tic
while iterator < iterateTime
    for i = 1 : trainSize
        % forward pass: sigmoid hidden layer, linear output layer
        % (the linear output matches the header description, the gradient
        % below, and the forward pass used in the test phase)
        net = trainFeature(i, :) * wInputToHidden;
        hiddenFunction = 1 ./ (1 + exp(-net));
        outputFunction = hiddenFunction * wHiddenToOutput;

        % squared error of the current sample
        error = targetMatrix(i, :) - outputFunction;
        errorPerSample(i) = 0.5 * (error * error');

        % update hidden-to-output weights (delta rule with momentum)
        gradientOutput = hiddenFunction' * error;
        deltaHiddenToOutput = learningRate * gradientOutput + momentum * deltaHiddenToOutput;
        wHiddenToOutput = wHiddenToOutput + deltaHiddenToOutput;

        % update input-to-hidden weights (error backpropagated through the
        % sigmoid derivative, with momentum)
        hiddenFunctionD = hiddenFunction .* (1 - hiddenFunction);
        gradientHidden = trainFeature(i, :)' * (hiddenFunctionD .* (wHiddenToOutput * error')');
        deltaInputToHidden = learningRate * gradientHidden + momentum * deltaInputToHidden;
        wInputToHidden = wInputToHidden + deltaInputToHidden;
    end

    % compare the mean error over this pass against errorThreshold
    errorSum = sum(errorPerSample) / trainSize;
    if errorSum < errorThreshold
        break
    end

    % plot the error curve as training progresses
    figure(1)
    plot(iterator, errorSum, 'b.')
    hold on
    grid on

    iterator = iterator + 1;
end
toc

% -------------------------------------------------------------------------
% read testing dataset
testDS = textread(testFile);
testFeature = testDS(:, 1:2);
testClass = testDS(:, 3);
testSize = size(testDS, 1);

% testing data normalization (reuse the training min/max)
tmp = [size(testFeature, 1), 1];
testFeature = (testFeature - repmat(trainMIN, tmp)) ./ repmat((trainMAX - trainMIN), tmp);

% -------------------------------------------------------------------------
% test testing data using the trained classifier
hiddenFunction = 1 ./ (1 + exp(-(testFeature * wInputToHidden)));
outputFunction = hiddenFunction * wHiddenToOutput;

% calculate testing accuracy
[mv, mi] = max(outputFunction, [], 2);
err = testClass - mi;
yes = length(find(err == 0));
acc = yes / testSize * 100;
fprintf('Testing Accuracy: %3.3f%%\n', acc)

% -------------------------------------------------------------------------
% test training data using the trained classifier
hiddenFunction = 1 ./ (1 + exp(-(trainFeature * wInputToHidden)));
outputFunction = hiddenFunction * wHiddenToOutput;

% calculate training accuracy
[mv, mi] = max(outputFunction, [], 2);
error = trainClass - mi;
match = length(find(error == 0));
accuracy = match / trainSize * 100;
fprintf('Training Accuracy: %3.3f%%\n', accuracy)
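After training, the learned weight matrices can classify a new raw sample by repeating the same normalization and forward pass. A minimal usage sketch, assuming the script above has already run and that newSample is a hypothetical 1x2 raw feature vector:

% Minimal usage sketch (relies on variables left by the training script).
newSample = [1.5, 0.8];                               % hypothetical raw sample
x = (newSample - trainMIN) ./ (trainMAX - trainMIN);  % same normalization as training
h = 1 ./ (1 + exp(-(x * wInputToHidden)));            % sigmoid hidden layer
o = h * wHiddenToOutput;                              % linear output layer
[mv, predictedClass] = max(o, [], 2);                 % pick the strongest output unit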
>>>>>>>>>>>>>> Output >>>>>>>>>>>>>>
Elapsed time is 2.807000 seconds.
Testing Accuracy: 69.800%
Training Accuracy: 87.143%
The figure below shows how the training error evolves (using dataset DS1):