飞夺泸定桥

我心飞扬


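  This post uses LDA as a classifier in experiments run in MATLAB.

  The projection matrix W is built following classical LDA theory by the LDA function below, which also returns the (k-1)-dimensional mean of each class after projection.

The code of LDA.m is as follows: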

function [W,centers]=LDA(Input,Target)
% Input: n*d matrix, each row is a sample
% Target: n*1 matrix, each entry is the class label of a sample
% W: d*(k-1) matrix that projects samples to (k-1) dimensions
% centers: k*(k-1) matrix, the mean of each class after projection


% Initialization
[n,dim] = size(Input);
ClassLabel = unique(Target);
k = length(ClassLabel);

nGroup = NaN(k,1);          % number of samples in each class
GroupMean = NaN(k,dim);     % mean of each class
W = NaN(dim,k-1);           % the final projection matrix
centers = zeros(k,k-1);     % the class means after projection
SB = zeros(dim,dim);        % between-class scatter matrix
SW = zeros(dim,dim);        % within-class scatter matrix

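% Compute the within-class and between-class scatter matrices
for i=1:k
    group = (Target==ClassLabel(i));
    nGroup(i) = sum(double(group));
    % average over rows explicitly so a single-sample class still yields a 1*dim mean
    GroupMean(i,:) = mean(Input(group,:),1);
    tmp = zeros(dim,dim);
    for j=1:n
        if group(j)==1
            t = Input(j,:)-GroupMean(i,:);
            tmp = tmp+t'*t;
        end
    end
    SW = SW+tmp;
end
% m is the mean of the class means; it equals the overall mean only for balanced classes
m = mean(GroupMean,1);
for i=1:k
    tmp = GroupMean(i,:)-m;
    SB = SB+nGroup(i)*tmp'*tmp;
end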

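% % W is formed by the eigenvectors of v that belong to its k-1 largest eigenvalues
% v=inv(SW)*SB;
% [evec,evals]=eig(v);
% [~,order]=sort(real(diag(evals)),'descend');   % eig does not sort the eigenvalues
% W=real(evec(:,order(1:k-1)));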

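% The projection can also be obtained through SVD.
% The SVD of K=(Hb,Hw)' can be reduced to the SVD of Ht; P is then recovered from K, U and sigmak.
% [P,sigmak,U]=svd(K,'econ'); => [U,sigmak,V]=svd(Ht,0);
[U,sigmak,V] = svd(SW,0);
t = rank(SW);
R = sigmak(1:t,1:t);
P = SB'*U(:,1:t)/R;
% only the first k rows of P are decomposed, so this step needs dim >= k
[Q,sigmaa,W] = svd(P(1:k,1:t));
Y = U(:,1:t)/R*W;
W = Y(:,1:k-1);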

% Compute the class centers after projection
for i=1:k
    group = (Target==ClassLabel(i));
    centers(i,:) = mean(Input(group,:)*W,1);
end
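
For the two-class case that the rest of this post relies on, the classical Fisher solution has a closed form: the projection direction is proportional to SW^{-1}(m1-m2). The script below is only a minimal sketch of that textbook route, not the SVD procedure used in LDA.m above. The toy data and the names X1, X2, c1, c2, x and label are made up for illustration.

```matlab
% Toy two-class data (made-up values, for illustration only)
X1 = [1 2; 2 3; 3 3; 2 1];      % class 1 samples, one per row
X2 = [6 5; 7 7; 8 6; 7 5];      % class 2 samples, one per row
m1 = mean(X1,1);
m2 = mean(X2,1);

% Within-class scatter: sum of outer products of the centered samples
C1 = bsxfun(@minus, X1, m1);
C2 = bsxfun(@minus, X2, m2);
SW = C1'*C1 + C2'*C2;

% Fisher direction for two classes: w is proportional to SW^{-1}(m1-m2)'
w = SW \ (m1 - m2)';
w = w / norm(w);

% Project the class means, then assign a new sample to the nearer one
c1 = m1*w;
c2 = m2*w;
x = [2 2];                      % hypothetical test sample
if abs(x*w - c1) < abs(x*w - c2)
    label = 1;
else
    label = 2;
end
```

Solving with the backslash operator avoids forming inv(SW) explicitly. If SW is singular, as can happen when some features are constant, this sketch would need a regularization term or a pseudo-inverse.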

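  Because LDA is used here as a binary classifier, it has to be extended to handle multi-class problems. Two common approaches are one-vs-all, which trains k classifiers (I am not sure how their outputs should be combined), and one-vs-one, which trains a classifier for every pair of classes and yields k(k-1)/2 binary classifiers. This post uses the latter to train on the samples and obtain the model. In the code, model is a struct array.

The training function, LDATraining.m: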
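
As a quick check of the k(k-1)/2 count, the class pairs can be enumerated directly. This is only an illustrative sketch; k = 4 and the names pairs and nModels are assumptions, not part of the original code.

```matlab
k = 4;                        % hypothetical number of classes
pairs = nchoosek(1:k, 2);     % each row is one (a,b) pair with a < b
nModels = size(pairs, 1);     % number of pairwise classifiers, k*(k-1)/2
```

The rows of pairs come out in the same order as a nested loop over i=1:k-1 and j=i+1:k. For a 10-class problem such as digit recognition, the same formula gives 45 binary classifiers.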

function [model,k,ClassLabel]=LDATraining(input,target)
% input: n*d matrix, representing samples
% target: n*1 matrix, class label
% model: struct array (see the code below)
% k: the total number of classes
% ClassLabel: the label of each class
model = struct;
[n,dim] = size(input);
ClassLabel = unique(target);
k = length(ClassLabel);

t = 1;
for i=1:k-1
    for j=i+1:k
        model(t).a = i;
        model(t).b = j;
        g1 = (target==ClassLabel(i));
        g2 = (target==ClassLabel(j));
        tmp1 = input(g1,:);
        tmp2 = input(g2,:);
        in = [tmp1;tmp2];
        % binary labels: 0 for class i, 1 for class j
        out = ones(size(in,1),1);
        out(1:size(tmp1,1)) = 0;
        % tmp3=target(g1);
        % tmp4=target(g2);
        % tmp3=repmat(tmp3,length(tmp3),1);
        % tmp4=repmat(tmp4,length(tmp4),1);
        % out=[tmp3;tmp4];
        [w,m] = LDA(in,out);
        model(t).W = w;
        model(t).means = m;   % row 1: class i, row 2: class j
        t = t+1;
    end
end

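  At prediction time, the models produced by training are used to make k(k-1)/2 predictions, and the class that receives the most votes is taken as the result. For each binary classifier, the test sample is projected with W and compared with the projected means of the two classes; the class with the nearer mean gets the vote (I do not know whether there is a better way).

The prediction function, LDATesting.m: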
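
The vote counting can be illustrated on its own. The sketch below assumes three classes and made-up pairwise decisions; the names winners, votes and predicted are hypothetical and only stand in for the outputs of the binary classifiers.

```matlab
k = 3;
pairs = [1 2; 1 3; 2 3];      % the class pair handled by each binary classifier
% winners(i,j) is the class chosen for sample i by classifier j (made-up values)
winners = [1 1 2;
           2 3 3;
           2 1 3];
n = size(winners, 1);
votes = zeros(n, k);
for j = 1:size(pairs, 1)
    for c = 1:k
        votes(:,c) = votes(:,c) + (winners(:,j) == c);
    end
end
% pick the class with the most votes; ties go to the lowest class index
[~, predicted] = max(votes, [], 2);
```

The third sample receives one vote for each class, so the tie is broken in favor of the lowest class index. A different tie-breaking rule, for example comparing distances to the projected means, would be a reasonable alternative.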

function target=LDATesting(input,k,model,ClassLabel)
% input: n*d matrix, representing samples
% target: n*1 matrix, predicted class label
% model: struct array returned by LDATraining
% k: the total number of classes
% ClassLabel: the label of each class
[n,dim] = size(input);
s = zeros(n,k);         % s(i,c): number of votes sample i received for class c
target = zeros(n,1);

for j=1:k*(k-1)/2
    a = model(j).a;
    b = model(j).b;
    w = model(j).W;
    m = model(j).means;
    for i=1:n
        sample = input(i,:);
        tmp = sample*w;
        % vote for the class whose projected mean is nearer
        if norm(tmp-m(1,:))<norm(tmp-m(2,:))
            s(i,a) = s(i,a)+1;
        else
            s(i,b) = s(i,b)+1;
        end
    end
end
for i=1:n
    pos = 1;
    maxV = 0;
    for j=1:k
        if s(i,j)>maxV
            maxV = s(i,j);
            pos = j;
        end
    end
    target(i) = ClassLabel(pos);
end

Example code:

function target=test(in,out,t)
[model,k,ClassLabel] = LDATraining(in,out);
target = LDATesting(t,k,model,ClassLabel);

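  In the experiment I tested this on the USPS dataset and the results were poor: the accuracy was only about 39%, whereas the KNN algorithm can reach 90% accuracy on the same dataset. Embarrassing!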

posted on 2011-03-25 20:17  飞夺泸定桥