The gradient descent method
Gradient descent uses the negative gradient direction as the search direction of each iteration, so that the objective function being optimized decreases step by step. Gradient descent is the method of steepest descent under the 2-norm.
A simple form of steepest descent is x(k+1) = x(k) - a*g(k), where a is the learning rate (which may be a small constant) and g(k) is the gradient at x(k).
Intuitively, on a set of contour lines with a single center, the iteration starts from the initial point, moves a short distance perpendicular to the local contour line at each step, and finally converges to the center.
For a given performance index, gradient descent can be used to drive that index to a minimum. If the index is the mean squared error, we obtain the least-mean-square (LMS) algorithm.
The BP (backpropagation) algorithm also minimizes the mean-squared-error performance index; it differs from LMS in how the derivatives of the error with respect to the weights (i.e. the gradient) are computed.
LMS is used in single-layer linear networks, where the error is an explicit function of the network weights, so the weight derivatives can be obtained directly; such networks can only solve linearly separable problems.
BP is used in multilayer networks with hidden layers, where the relationship between the weights and the error is more involved and the chain rule of calculus is required; these networks can solve problems that are not linearly separable. Applying the chain rule is itself a fairly involved topic; see the book Neural Network Design.
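As a concrete illustration of the update rule x(k+1) = x(k) - a*g(k), the following minimal MATLAB sketch runs fixed-step gradient descent on a simple quadratic; the objective, starting point and learning rate are chosen here only for illustration.

x = [3; -2];                   % initial point x(0)
a = 0.1;                       % learning rate (a small constant)
for k = 1:100
    gk = [2*x(1); 4*x(2)];     % gradient of f(x) = x1^2 + 2*x2^2 at x
    x  = x - a*gk;             % x(k+1) = x(k) - a*g(k)
end
disp(x)                        % converges toward the minimizer [0; 0]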
MATLAB program for the steepest descent method
% Steepest Descent Method
% By Kshitij Deshpande
clc
clear all
warning off
prompt = {'Coefficients of X1=','Coefficients of X2=','Coefficient of X1X2=','Initial Point='};
def = {'[2 1 0]','[1 -1 0]','2','[0 0]'};
a=inputdlg(prompt,'Data',1,def);
a=char(a);
[m,n]=size(a);
x1 = eval(a(1,1:n));x2=eval(a(2,1:n));x1x2=eval(a(3,1:n));X1=eval(a(4,1:n));
delf1(1) = polyval(polyder(x1),X1(1));   % df/dX1 from the X1 polynomial
delf1(1) = (delf1(1))+(x1x2*X1(2));      % plus the cross-term contribution
delf1(2) = polyval(polyder(x2),X1(2));   % df/dX2 from the X2 polynomial (evaluated at X1(2))
delf1(2) = (delf1(2))+(x1x2*X1(1));      % plus the cross-term contribution
s=-delf1;                                % steepest descent direction
%%%%%%%%%%
%report
srep(1,1:2)=s;
%%%%%%%%%%
% Expand f(X1 + lambda*s) as a quadratic polynomial in lambda
x1new(1)=s(1)^2;x1new(2)=2*X1(1)*s(1);x1new(3) = X1(1)^2;
x1new=x1new*x1(1);
x1new_(2)=x1(2)*s(1);x1new_(3)=x1(2)*X1(1);
x1new = x1new+x1new_;
x2new(1)=s(2)^2;x2new(2)=2*X1(2)*s(2);x2new(3) = X1(2)^2;
x2new=x2new*x2(1);
x2new_(2)=x2(2)*s(2);x2new_(3)=x2(2)*X1(2);
x2new = x2new+x2new_;
x1x2new(1)=s(1)*s(2);x1x2new(2)=X1(1)*s(2)+X1(2)*s(1);x1x2new(3)=X1(1)*X1(2);
x1x2new=x1x2*x1x2new;
df = polyder(x1new+x2new+x1x2new);       % derivative of the quadratic in lambda
lambda(1) = roots(df);                   % exact line search: zero of the derivative
X1=X1+lambda(1)*s;                       % take the steepest descent step
Xrep(1,1:2)=X1;
delf1(1) = polyval(polyder(x1),X1(1));   % recompute the gradient at the new point
delf1(1) = (delf1(1))+(x1x2*X1(2));
delf1(2) = polyval(polyder(x2),X1(2));
delf1(2) = (delf1(2))+(x1x2*X1(1));
if all(delf1 == 0)                       % gradient vanished: optimum reached
fprintf('%d %d is the optimum point',X1(1),X1(2));
end
itrep(1)=1;
it=2;
while norm(delf1) > 1e-6                 % iterate until the gradient is (numerically) zero
s=-delf1;                                % new steepest descent direction; repeat the exact line search
x1new(1)=s(1)^2;x1new(2)=2*X1(1)*s(1);x1new(3) = X1(1)^2;
x1new=x1new*x1(1);
x1new_(2)=x1(2)*s(1);x1new_(3)=x1(2)*X1(1);
x1new = x1new+x1new_;
x2new(1)=s(2)^2;x2new(2)=2*X1(2)*s(2);x2new(3) = X1(2)^2;
x2new=x2new*x2(1);
x2new_(2)=x2(2)*s(2);x2new_(3)=x2(2)*X1(2);
x2new = x2new+x2new_;
x1x2new(1)=s(1)*s(2);x1x2new(2)=X1(1)*s(2)+X1(2)*s(1);x1x2new(3)=X1(1)*X1(2);
x1x2new=x1x2*x1x2new;
df = polyder(x1new+x2new+x1x2new);
lambda(it) = roots(df);
X1=X1+lambda(it)*s;
delf1(1) = polyval(polyder(x1),X1(1));
delf1(1) = (delf1(1))+(x1x2*X1(2));
delf1(2) = polyval(polyder(x2),X1(2));
delf1(2) = (delf1(2))+(x1x2*X1(1));
itrep(it)=it;
srep(it,1:2)=s;
Xrep(it,1:2)=X1;
it=it+1;
end
[m,n]=size(itrep);
matrix=[itrep' srep(1:n,1) srep(1:n,2) Xrep(1:n,1) Xrep(1:n,2)];
answer = char(num2str(X1));
answer = ['The optimal point is [' answer ']'];
msgbox(answer,'Solution');
disp(' Press Any key to View Detailed Report............');
pause
echo off
report steep;   % 'report' is assumed to be a helper script accompanying the original code
clc
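The script above performs an exact line search: it expands f(X1 + lambda*s) as a quadratic in lambda and takes the root of its derivative as the step size. For a quadratic written in matrix form as f(x) = 0.5*x'*A*x + b'*x, the same step has a closed form; the sketch below (an equivalent reformulation, not part of the original script) uses the script's default data 2*X1^2 + X1 + X2^2 - X2 + 2*X1*X2.

A = [4 2; 2 2];  b = [1; -1];   % f(x) = 0.5*x'*A*x + b'*x for the default coefficients
x = [0; 0];                     % default initial point
for k = 1:50
    grad = A*x + b;                      % gradient at x
    if norm(grad) < 1e-8, break; end
    s      = -grad;                      % steepest descent direction
    lambda = (grad'*grad)/(s'*A*s);      % exact line search step for a quadratic
    x      = x + lambda*s;               % same update as in the script
end
disp(x')                        % approaches the minimizer [-1 1.5]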
Finding the minimum point with a quasi-Newton method
[ux,sfval,uexit,uoutput,grid,hess]=fminunc(ff,x0)   % the last two outputs are the gradient and Hessian at the solution
Warning: Gradient must be provided for trust-region method;
using line-search method instead.
> In D:\MAT53\toolbox\optim\fminunc.m at line 202
Optimization terminated successfully:
Current search direction is a descent direction, and magnitude of
directional derivative in search direction less than 2*options.TolFun
ux =
1.0000 1.0000
sfval =
1.9118e-011
uexit =
1
uoutput =
iterations: 26
funcCount: 162
stepsize: 1.2992
firstorderopt: 5.0023e-004
algorithm: 'medium-scale: Quasi-Newton line search'
grid =
1.0e-003 *
-0.5002
-0.1888
hess =
820.4031 -409.5497
-409.5497 204.7720
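The objective ff and the start point x0 are not shown above. The minimizer [1 1] and the reported Hessian are consistent with the Rosenbrock function, so a hedged reconstruction of the call (an assumption, not the original session) might look like:

ff = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;   % assumed objective (Rosenbrock)
x0 = [-1.2 1];                                     % assumed starting point
[ux,sfval,uexit,uoutput,grid,hess] = fminunc(ff,x0);
% Without a user-supplied gradient, fminunc falls back to the medium-scale
% quasi-Newton line-search method, which is what the warning above reports.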
When solving optimization problems, gradient descent is the most commonly used method. For a differentiable scalar function f(x), the gradient vector ∇f gives the rate at which the function value increases in the x direction: ∇f = i*df/dx. If g is a function of two variables x and y, g(x,y), then ∇g = i*∂g/∂x + j*∂g/∂y can point in any direction of the x-y plane, and it is the direction in which g increases fastest for small changes Δx, Δy. The component of ∇g along a unit direction is the directional derivative of g in that direction, and it is largest when that direction is the direction of ∇g itself.
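A small numeric check of the directional-derivative statement above, with g(x,y) = x^2 + 3*x*y chosen purely for illustration:

% g(x,y) = x^2 + 3*x*y, so dg/dx = 2*x + 3*y and dg/dy = 3*x
gradg = [2*2 + 3*1, 3*2];        % gradient of g at the point (2,1) -> [7 6]
u     = [1 1]/norm([1 1]);       % a unit direction in the x-y plane
Dg    = gradg*u';                % directional derivative of g along u at (2,1)
% Dg = 13/sqrt(2), about 9.19; it can never exceed norm(gradg), about 9.22,
% which is attained when u points along the gradient itself.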
For algorithms based on gradient descent, progress naturally stops once the gradient becomes zero. When the error surface has more than one valley, the iteration may come to rest at the bottom of any of them (where the gradient is zero) and will not necessarily find the true lowest point of the error surface.
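The caveat about multiple valleys can be seen even in one dimension; in the sketch below (function and step size chosen only for illustration) the same fixed-step descent ends in a different valley depending on the starting point.

% f(x) = x^4 - 3*x^2 + x has two valleys
for x0 = [-2 2]
    x = x0;
    for k = 1:200
        df = 4*x^3 - 6*x + 1;          % gradient of f at x
        x  = x - 0.05*df;              % fixed-step gradient descent
    end
    fprintf('start %+d  ->  x = %+.4f\n', x0, x);
end
% start -2 converges near x = -1.30 (the global minimum),
% start +2 converges near x = +1.13 (a higher local minimum).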
% Read the training data and the test data
Input = [];
Output = [];
str = {'Test','Check'};
Data = textread([str{1},'.txt']);
% read the training data
Input = Data(:,1:end-1);
% all columns except the last (the input variables)
Output = Data(:,end);
% the last column of the table (the target output)
Data = textread([str{2},'.txt']);
% read the test data
CheckIn = Data(:,1:end-1);
% all columns except the last (the input variables)
CheckOut = Data(:,end);
% the last column of the table (the target output)
Input = Input';
Output = Output';
CheckIn = CheckIn';
CheckOut = CheckOut';
% transpose so that each column is one sample
[Input,minp,maxp,Output,mint,maxt] = premnmx(Input,Output);
% normalize the data to the range [-1, 1]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Neural network parameter settings
%==== parameters that may be adjusted
Para.Goal = 0.0001;
% target training error
Para.Epochs = 800;
% number of training epochs
Para.LearnRate = 0.1;
% learning rate
%====
Para.Show = 5;
% display interval during training
Para.InRange = repmat([-1 1],size(Input,1),1);
% range of each (normalized) input variable
Para.Neurons = [size(Input,1)*2+1 1];
% neurons in the hidden layer (2n+1) and in the output layer
Para.TransferFcn= {'logsig' 'purelin'};
% transfer (activation) functions of the two layers
Para.TrainFcn = 'trainlm';
% training function (here Levenberg-Marquardt); gradient-descent alternatives:
% traingd : gradient descent backpropagation
% traingda : gradient descent with adaptive learning rate
% traingdm : gradient descent with momentum
% traingdx : gradient descent with momentum and adaptive learning rate
Para.LearnFcn = 'learngdm';
% learning function
Para.PerformFcn = 'sse';
% performance (error) function
Para.InNum = size(Input,1);
% dimension of the input
Para.IWNum = Para.InNum*Para.Neurons(1);
% number of input weights
Para.LWNum = prod(Para.Neurons);
% number of layer weights
Para.BiasNum = sum(Para.Neurons);
% number of biases
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Net = newff(Para.InRange,Para.Neurons,Para.TransferFcn,...
Para.TrainFcn,Para.LearnFcn,Para.PerformFcn);
% create the feed-forward network
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Net.trainParam.show = Para.Show;
% display interval during training
Net.trainParam.goal = Para.Goal;
% target training error
Net.trainParam.lr = Para.LearnRate;
% learning rate
Net.trainParam.epochs = Para.Epochs;
% number of training epochs
Net.performFcn = Para.PerformFcn;
% performance (error) function
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% quick check before training
Out1 =sim(Net,Input);
% simulate the newly created (untrained) network
Sse1 =sse(Output-Out1);
% error of the untrained network
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[Net TR] = train(Net,Input,Output);
% train the network and return it together with the training record
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Out3 =sim(Net,Input);
% simulate the trained network on the training inputs
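The listing stops after re-simulating on the training set. A likely continuation (a sketch only, assuming the same old premnmx/tramnmx/postmnmx interface used above) would push the held-out Check data through the same normalization before simulating:

CheckInN  = tramnmx(CheckIn,minp,maxp);      % normalize test inputs with the training ranges
CheckOutN = sim(Net,CheckInN);               % simulate the trained network on the test set
CheckSim  = postmnmx(CheckOutN,mint,maxt);   % map outputs back to the original units
TestErr   = sse(CheckOut - CheckSim);        % sum-squared error on the test set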