On implementing the neighborhood-preserving algorithms LLE, LPP, NPE, etc. --- manifold learning

Points to note:

1. In manifold-learning-based methods, the I in M = (I-W)*(I-W)' is the identity matrix, not the scalar 1.

2. For the graph W, the reconstruction coefficients are stored column by column (column i holds the weights of sample i's neighbors), and then Sw = Train_Ma*M*Train_Ma'; Sb = Train_Ma*Train_Ma'; (see the sketch after the tables below).

3. Comparison of NPE recognition error rates (%) under various settings.

Setting 1: the ORL_56x46.mat database, without PCA pre-reduction; the first few samples of each class (the counts in the table header) are taken as training samples; the projected samples are also normalized; SRC uses DALM_fast ([solution, status] = SolveDALM_fast(Train_data, test_sample, 0.01);).

Conclusions:

(1) Projecting the samples does not necessarily raise the recognition rate; it may actually lower it.

(2) Although NPE is built on a reconstruction constraint, the projected samples do not represent each other well: when the projected samples are classified with SRC, the recognition rate likewise drops sharply.

(3) I did not tune the reduced dimensionality in Deng Cai's code, so its results may be somewhat worse than they could be.

Training samples per class               2      3      4      5      6
KNN on raw data                          77.81  79.29  78.33  77     75
NPE (supervised, k=train_num-1) + KNN    77.81  78.21  78.75  79     76.25
NPE (unsupervised, k=train_num-1) + KNN  78.44  80.71  85     86     86.25
SRC on raw data                          76.88  79.29  77.08  78     73.75
NPE (supervised, k=train_num-1) + SRC    77.19  78.21  78.75  79     76.25
NPE (unsupervised, k=train_num-1) + SRC  78.44  82.86  85.42  86.5   90.63

our NPE (dim=80, k=train_num-1)          3      6
unsupervised + KNN                       72.5   73.13
supervised + KNN                         74.29  73.13
unsupervised + SRC                       74.29  74.38
supervised + SRC                         74.29  71.88

Setting 2: the ORL_56x46.mat database, without PCA pre-reduction; the training samples of each class are chosen at random, the experiment is repeated 20 times, and the mean and standard deviation are recorded; the projected samples are also normalized; SRC uses DALM_fast ([solution, status] = SolveDALM_fast(Train_data, test_sample, 0.01);). The recognition error rates are as follows:

Training samples per class               2           3
KNN on raw data                          51.02±2.25  38.25±2.38
NPE (supervised, k=train_num-1) + KNN    52.45±2.16
NPE (unsupervised, k=train_num-1) + KNN  67.36±2.93  58.42±2.88
SRC on raw data                          49.36±2.20  35.31±2.23
NPE (supervised, k=train_num-1) + SRC    52.55±2.73
NPE (unsupervised, k=train_num-1) + SRC  63.81±2.60  65.06±3.65

our NPE (dim=80, k=train_num-1)          2           3           6
unsupervised + KNN                       50.89±2.82  47.21±2.73  39.81±2.77
unsupervised + SRC                       49.63±2.40  44.23±2.40  36.25±4.08
supervised + KNN                         50.47±2.30  39.23±2.50  20.25±3.16
supervised + SRC                         50.22±2.45  40.14±2.85  22.41±3.06
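To make notes 1 and 2 concrete, here is a minimal MATLAB sketch (assuming, as in the code below, a d x n data matrix Train_Ma with one sample per column and an n x n weight matrix W whose i-th column stores the reconstruction coefficients of sample i's neighbors):

n  = size(Train_Ma, 2);
I  = eye(n);                      % the identity matrix, NOT the scalar 1
M  = (I - W) * (I - W)';          % M = (I-W)(I-W)'
Sw = Train_Ma * M * Train_Ma';    % the matrix NPE minimizes over
Sb = Train_Ma * Train_Ma';        % the constraint matrix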

1. The locally linear embedding (LLE) algorithm

LLE directly eigendecomposes M = (I-W)(I-W)' and takes the set of eigenvectors associated with the smallest eigenvalues (the bottom one, corresponding to the constant vector with eigenvalue 0, is discarded).
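For reference, the two steps of LLE from the paper cited below, written with the column-wise weight convention used throughout this post (W_{ji} is the weight of neighbor j in the reconstruction of sample i):

\min_{W} \sum_i \Big\| x_i - \sum_{j \in N(i)} W_{ji}\, x_j \Big\|^2
\quad \text{s.t.} \quad \sum_j W_{ji} = 1

\min_{Y} \sum_i \Big\| y_i - \sum_j W_{ji}\, y_j \Big\|^2
= \min_{Y} \operatorname{tr}\!\Big( Y\,(I - W)(I - W)^{\top} Y^{\top} \Big)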

Reference: S. T. Roweis and L. K. Saul, 'Nonlinear dimensionality reduction by locally linear embedding', Science, vol. 290, no. 5500, pp. 2323-2326, 2000.

Chinese reference material: http://wenku.baidu.com/view/19afb8d3b9f3f90f76c61bb0.html?from=search


clear all
clc
addpath ('\data set\');
load ORL_56x46.mat;                  % 40 classes, 10 samples per class
fea = double(fea);
Train_Ma = fea';                     % transformed so that each column is a sample
% construct the neighborhood (weight) matrix
K_sample = zeros(size(Train_Ma,2),size(Train_Ma,2));
k = 10;                              % number of nearest neighbors
for i = 1:size(Train_Ma,2)
    NK = zeros(size(Train_Ma,2),1);
    for j = 1:size(Train_Ma,2)
        distance(i,j) = norm(Train_Ma(:,i)-Train_Ma(:,j));
    end
    [value,state] = sort(distance(i,:),'ascend');
    dd1(:,i)   = value(2:k+1);       % distances to the k nearest neighbors of sample i
    neigh(:,i) = state(2:k+1);       % indices of the k nearest neighbors of sample i
    Sub_sample = Train_Ma(:,state(2:k+1));
    Sub_sample = Sub_sample - repmat(Train_Ma(:,i),1,k);
    % the weight computation below may look unconventional, but it is the
    % standard LLE solve: C*w = 1, then normalize so the weights sum to 1
    coeff = inv(Sub_sample'*Sub_sample)*ones(k,1);
    coeff = coeff/sum(coeff);
    W1(:,i) = coeff;
    NK(state(2:k+1)) = coeff;
    K_sample(:,i) = NK;              % column i holds the weights of sample i's k nearest neighbors
end
M = (eye(size(Train_Ma,2))-K_sample)*(eye(size(Train_Ma,2))-K_sample)';
options.disp   = 0;
options.isreal = 1;
options.issym  = 1;
[eigvector1, eigvalue] = eigs(M, 101, 0, options);   % the 101 eigenpairs with eigenvalues closest to 0
eigvalue = diag(eigvalue);
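One caveat: when k exceeds the feature (or intrinsic) dimension, the local Gram matrix Sub_sample'*Sub_sample is singular, and the usual fix from Saul and Roweis's LLE notes is to add a small ridge before solving. A sketch of that fix, as a drop-in replacement for the coeff lines above (the constant 1e-3 is an assumed value, not from the original code):

C = Sub_sample' * Sub_sample;        % k x k local Gram matrix
C = C + 1e-3 * trace(C) * eye(k);    % small ridge keeps C invertible when k > d (1e-3 assumed)
coeff = C \ ones(k,1);               % solve C*w = 1 (backslash is preferable to inv())
coeff = coeff / sum(coeff);          % enforce sum(w) = 1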

2. Studying the NPE algorithm in MATLAB

The main function is NPE_caideng_demo.m:

% Study of the NPE (neighborhood preserving embedding) algorithm
% based on Deng Cai's framework code
% Reference: He, X., Cai, D., Yan, S. and Zhang, H.-J. (2005)
% Neighborhood preserving embedding. In: Proceedings of Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on. IEEE, 1208-1213.
clear all
clc
addpath ('data set\');
load ORL_56x46.mat;           % 40 classes, 10 samples per class
fea = double(fea)';
sele_num = 3;                 % number of training samples per class
nnClass = length(unique(gnd));  % The number of classes;
num_Class=[];
for i=1:nnClass
  num_Class=[num_Class length(find(gnd==i))]; %The number of samples of each class
end
%%------------------select training samples and test samples--------------%% 
Train_Ma=[];
Train_Lab=[];
Test_Ma=[];
Test_Lab=[];
for j=1:nnClass    
    idx=find(gnd==j);
%     randIdx=randperm(num_Class(j));       % randomly select training samples
    randIdx  = [1:num_Class(j)];            % take the first few samples of each class as training samples
    Train_Ma = [Train_Ma; fea(idx(randIdx(1:sele_num)),:)];            % select sele_num samples per class for training
    Train_Lab= [Train_Lab;gnd(idx(randIdx(1:sele_num)))];
    Test_Ma  = [Test_Ma;fea(idx(randIdx(sele_num+1:num_Class(j))),:)];  % select remaining samples per class for test
    Test_Lab = [Test_Lab;gnd(idx(randIdx(sele_num+1:num_Class(j))))];
end
Train_Ma = Train_Ma';                       % transform to a sample per column
Train_Ma = Train_Ma./repmat(sqrt(sum(Train_Ma.^2)),[size(Train_Ma,1) 1]);
Test_Ma = Test_Ma';
Test_Ma = Test_Ma./repmat(sqrt(sum(Test_Ma.^2)),[size(Test_Ma,1) 1]); 

% call Deng Cai's NPE code
options = [];
options.k = sele_num-1;     % number of nearest neighbors
% options.NeighborMode = 'KNN';         % use 'KNN' for the unsupervised version
options.NeighborMode = 'Supervised';    % supervised version
options.gnd = Train_Lab;
[P_NPE, eigvalue] = NPE_caideng(options,Train_Ma');
Train_Maa = P_NPE'*Train_Ma;
Test_Maa = P_NPE'*Test_Ma;
Train_Maa = Train_Maa./repmat(sqrt(sum(Train_Maa.^2)),[size(Train_Maa,1) 1]);
Test_Maa = Test_Maa./repmat(sqrt(sum(Test_Maa.^2)),[size(Test_Maa,1) 1]);    
rate2 = KNN(Train_Maa',Train_Lab,Test_Maa',Test_Lab,1)*100;
error2 = 100-rate2
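Both this demo and the one below rely on a helper KNN(...) whose source is not included in this post; a minimal 1-NN sketch consistent with how it is called (one sample per row, last argument the neighbor count, return value the recognition rate in [0,1]) might look like this:

function rate = KNN(Train_Ma, Train_Lab, Test_Ma, Test_Lab, k)
% Minimal k-NN classifier sketch (hypothetical reconstruction of the helper).
% Train_Ma/Test_Ma hold one sample per ROW; rate is the recognition rate in [0,1].
correct = 0;
for i = 1:size(Test_Ma,1)
    d = sum((Train_Ma - repmat(Test_Ma(i,:), size(Train_Ma,1), 1)).^2, 2);
    [~, idx] = sort(d, 'ascend');
    pred = mode(Train_Lab(idx(1:k)));          % majority vote over the k nearest
    correct = correct + (pred == Test_Lab(i));
end
rate = correct / size(Test_Ma,1);
end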

Experimental result: taking the first 3 samples of each ORL class gives a recognition error rate of 78.21%, whereas randomly choosing the 3 training samples noticeably improves the recognition rate.
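Setting 2 above averages over 20 random splits; a sketch of that protocol (run_once is a hypothetical wrapper around the split + projection + classification code in the demos, with randperm enabled instead of the fixed 1:num_Class(j) indices):

% Setting-2 evaluation sketch: 20 random train/test splits, report mean and std.
num_runs = 20;
err = zeros(num_runs, 1);
for r = 1:num_runs
    err(r) = run_once(fea, gnd, sele_num);   % hypothetical helper: one random split -> one error rate (%)
end
fprintf('error = %.2f +/- %.2f\n', mean(err), std(err));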

The second implementation is our own NPE code (main function NPE_jerry_demo.m). Note that the code marks two places that differ from other people's implementations; we follow the standard formulas, and experiments show that neither difference affects the recognition results.


% Study of the NPE (neighborhood preserving embedding) algorithm -- unsupervised version
% jerry 2016 3 22
clear all
clc
addpath ('G:\2015629房师兄代码\data set\');
load ORL_56x46.mat;           % 40 classes, 10 samples per class
fea = double(fea)';
sele_num = 3;
nnClass = length(unique(gnd));  % The number of classes;
num_Class=[];
for i=1:nnClass
  num_Class=[num_Class length(find(gnd==i))]; %The number of samples of each class
end
%%------------------select training samples and test samples--------------%% 
Train_Ma=[];
Train_Lab=[];
Test_Ma=[];
Test_Lab=[];
for j=1:nnClass    
    idx=find(gnd==j);
%     randIdx=randperm(num_Class(j));
    randIdx  = [1:num_Class(j)];
    Train_Ma = [Train_Ma; fea(idx(randIdx(1:sele_num)),:)];            % select sele_num samples per class for training
    Train_Lab= [Train_Lab;gnd(idx(randIdx(1:sele_num)))];
    Test_Ma  = [Test_Ma;fea(idx(randIdx(sele_num+1:num_Class(j))),:)];  % select remaining samples per class for test
    Test_Lab = [Test_Lab;gnd(idx(randIdx(sele_num+1:num_Class(j))))];
end
Train_Ma = Train_Ma';                       % transform to a sample per column
Train_Ma = Train_Ma./repmat(sqrt(sum(Train_Ma.^2)),[size(Train_Ma,1) 1]);
Test_Ma = Test_Ma';
Test_Ma = Test_Ma./repmat(sqrt(sum(Test_Ma.^2)),[size(Test_Ma,1) 1]); 
% construct neighborhood matrix

K_sample = zeros(size(Train_Ma,2),size(Train_Ma,2));
k = sele_num-1;                         % number of nearest neighbors
for i = 1:size(Train_Ma,2)
    NK = zeros(size(Train_Ma,2),1);
    for j = 1:size(Train_Ma,2)
        distance(i,j) = norm(Train_Ma(:,i)-Train_Ma(:,j));
    end
    [value,state]  = sort(distance(i,:),'ascend');
    dd1(:,i) = value(2:k+1);        % distances to the k nearest neighbors of sample i
    neigh(:,i) = state(2:k+1);      % indices of the k nearest neighbors of sample i
    Sub_sample = Train_Ma(:,state(2:k+1));
%     Sub_sample = Sub_sample - repmat(Train_Ma(:,i),1,k);
%     coeff = inv(Sub_sample'*Sub_sample)*ones(k,1);
%     The two commented lines above are difference no. 1: the weight solve used in other people's NPE code.
    coeff = inv(Sub_sample'*Sub_sample)*Sub_sample'*Train_Ma(:,i); % solve the representation coefficients by least squares (a representation-based method)
    coeff = coeff/sum(coeff);
    W1(:,i) = coeff;
    NK(state(2:k+1)) = coeff;
    K_sample(:,i) = NK;                % column i holds the weights of sample i's k nearest neighbors
end

M = (eye(size(Train_Ma,2))-K_sample)*(eye(size(Train_Ma,2))-K_sample)';         % I is the identity matrix, not the scalar 1
Sw = Train_Ma*M*Train_Ma';
Sb = Train_Ma*Train_Ma';
% Difference no. 2: other people's NPE code additionally symmetrizes the two matrices:
% Sw = (Sw + Sw') / 2;
% Sb = (Sb + Sb') / 2;
% and solves the regularized problem the other way round, which is problematic:
% [eigvector1, eigvalue1] = eig((Sw+0.001*eye(size(Sw,1))),Sb);   % eigenvectors of inv(Sb)*Sw for the smallest eigenvalues
% Pt = eigvector1(:,tt(1:dim));
[eigvector1, eigvalue1] = eig(Sw,Sb+0.001*eye(size(Sb,1))); % generalized problem inv(Sb)*Sw: NPE calls for the smallest eigenvalues,
                                                            % yet the eigenvectors of the LARGEST eigenvalues end up being used below
[eigvalue1,tt] = sort(diag(eigvalue1),'ascend');
dim = 80;
Pt = eigvector1(:,tt(end-dim+1:end));
Train_Maa = Pt'*Train_Ma;
Test_Maa = Pt'*Test_Ma;
Train_Maa = Train_Maa./repmat(sqrt(sum(Train_Maa.^2)),[size(Train_Maa,1) 1]);
Test_Maa = Test_Maa./repmat(sqrt(sum(Test_Maa.^2)),[size(Test_Maa,1) 1]);    
rate2 = KNN(Train_Maa',Train_Lab,Test_Maa',Test_Lab,1)*100;
error2 = 100-rate2

% what result do we get with a representation-based classifier (SRC)?
SRC_DP_accuracy = SRC_rec(Train_Maa,Train_Lab,Test_Maa,Test_Lab);
error_DP = 100-SRC_DP_accuracy
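SRC_rec is likewise not listed here; given the SolveDALM_fast call quoted in the notes at the top, a minimal sketch of the standard SRC decision rule (class-wise reconstruction residuals, as in Wright et al.'s SRC; the helper's internals and its percent return convention are assumptions inferred from how it is called) could be:

function accuracy = SRC_rec(Train_Ma, Train_Lab, Test_Ma, Test_Lab)
% Sketch of sparse representation-based classification (SRC).
% Train_Ma/Test_Ma hold one sample per COLUMN; accuracy is returned in percent.
classes = unique(Train_Lab);
correct = 0;
for i = 1:size(Test_Ma,2)
    y = Test_Ma(:,i);
    [x, ~] = SolveDALM_fast(Train_Ma, y, 0.01);      % sparse code over the training dictionary
    res = zeros(length(classes), 1);
    for c = 1:length(classes)
        idx = (Train_Lab == classes(c));
        res(c) = norm(y - Train_Ma(:,idx) * x(idx)); % residual using class c's atoms only
    end
    [~, cbest] = min(res);
    correct = correct + (classes(cbest) == Test_Lab(i));
end
accuracy = correct / size(Test_Ma,2) * 100;
end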

  

posted @ 2016-01-02 23:30 邪恶的亡灵