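4. PCA and Whitening
- Summary:
1) This exercise extends the previous experiment to natural images.
2) Its main point is to show that the covariance matrix of the PCA-rotated image data is essentially diagonal, which gives a convenient check for later implementations. It also shows that a regularisation term should be added when whitening.
3) The provided code includes sampleIMAGESRAW.m, an example of how to sample small patches from images.
4) Data reduced by PCA can still be viewed as images, but the whitened data can no longer be viewed in a meaningful way.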
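The diagonal covariance noted in item 2 follows directly from the eigendecomposition. A short derivation, using the names from the script below (x is the zero-meaned data with m columns, sigma its covariance, u the eigenvector matrix, xRot the rotated data) and writing \Lambda for the diagonal matrix s returned by svd:

```latex
\Sigma = \frac{1}{m}\, x x^{\top} = U \Lambda U^{\top},
\qquad
x_{\mathrm{rot}} = U^{\top} x
\;\Longrightarrow\;
\frac{1}{m}\, x_{\mathrm{rot}} x_{\mathrm{rot}}^{\top}
= U^{\top} \Sigma\, U
= \Lambda
= \operatorname{diag}(\lambda_1, \dots, \lambda_n).
```

Because sigma is symmetric and positive semi-definite, its SVD coincides with its eigendecomposition, so the diagonal of the covariance of xRot is just the eigenvalues in decreasing order and every off-diagonal entry is zero up to round-off.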
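- Questions:
1) The diagonal of the covariance matrix of xPCAWhite shows the effect of whitening with and without regularisation. However, I do not understand the precise mathematical explanation of how different values of the regularisation coefficient epsilon act, or what changing its magnitude means. This is largely because the whitened data cannot be viewed directly, and I do not know a technique for inspecting it.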
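One way to read the role of epsilon, offered as a sketch rather than a full answer to the question above: regularised PCA whitening scales the i-th rotated component by 1/sqrt(lambda_i + epsilon), so the covariance of xPCAWhite stays diagonal, with entries that depend only on the ratio between each eigenvalue and epsilon. The symbols lambda_i (the i-th diagonal entry of s) and \varepsilon (epsilon) are notation introduced here.

```latex
x_{\mathrm{PCAwhite},i} = \frac{x_{\mathrm{rot},i}}{\sqrt{\lambda_i + \varepsilon}}
\;\Longrightarrow\;
\operatorname{Cov}\!\left(x_{\mathrm{PCAwhite}}\right)_{ii}
= \frac{\lambda_i}{\lambda_i + \varepsilon}
\approx
\begin{cases}
1, & \lambda_i \gg \varepsilon,\\[2pt]
\lambda_i / \varepsilon \approx 0, & \lambda_i \ll \varepsilon,
\end{cases}
\qquad
x_{\mathrm{ZCAwhite}} = U\, x_{\mathrm{PCAwhite}}.
```

Components whose variance is well above epsilon are whitened to unit variance, while components whose variance is well below epsilon are suppressed instead of being amplified by the large factor 1/sqrt(lambda_i). In this script one eigenvalue is numerically zero, because Step 0b subtracts each patch's own mean and every column of x therefore sums to zero, so epsilon = 0 would divide by (almost) zero along that direction. ZCA whitening rotates the whitened components back into pixel coordinates with U, which is why xZCAWhite can be shown with display_network while xPCAWhite cannot be read as an image.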
The exercise requires downloading the starter code pca_exercise.zip.
Exercise code, pca_gen.m:
clear; close all; clc;
disp('Currently running script:');
disp([mfilename('fullpath'), '.m']);

%%================================================================
%% Step 0a: Load data
%  Here we provide the code to load natural image data into x.
%  x will be a 144 * 10000 matrix, where the kth column x(:, k) corresponds to
%  the raw image data from the kth 12x12 image patch sampled.
%  You do not need to change the code below.

x = sampleIMAGESRAW();
figure('name', 'Raw images');
randsel = randi(size(x, 2), 200, 1); % A random selection of samples for visualization
display_network(x(:, randsel));

%%================================================================
%% Step 0b: Zero-mean the data (by row)
%  You can make use of the mean and repmat/bsxfun functions.

% -------------------- YOUR CODE HERE --------------------
% x is a 144 x 10000 matrix: 10000 samples, each a 12x12 image patch.
% mean(x, 1) is the mean of each column, i.e. the mean intensity of each patch,
% so this subtracts every patch's own mean from that patch.
avg = mean(x, 1);
x = x - repmat(avg, size(x, 1), 1);

%%================================================================
%% Step 1a: Implement PCA to obtain xRot
%  Implement PCA to obtain xRot, the matrix in which the data is expressed
%  with respect to the eigenbasis of sigma, which is the matrix U.

% -------------------- YOUR CODE HERE --------------------
xRot = zeros(size(x)); % You need to compute this
sigma = x * x' / size(x, 2);   % covariance matrix of the zero-meaned data
[u, s, v] = svd(sigma);        % columns of u: eigenvectors; diag(s): eigenvalues
diags = diag(s);               % eigenvalues as a column vector, in decreasing order
xRot = u' * x;

%%================================================================
%% Step 1b: Check your implementation of PCA
%  The covariance matrix for the data expressed with respect to the basis U
%  should be a diagonal matrix with non-zero entries only along the main
%  diagonal. We will verify this here.
%  Write code to compute the covariance matrix, covar.
%  When visualised as an image, you should see a straight line across the
%  diagonal (non-zero entries) against a blue background (zero entries).

% -------------------- YOUR CODE HERE --------------------
covar = zeros(size(x, 1)); % You need to compute this
covar = xRot * xRot' / size(xRot, 2);

% Visualise the covariance matrix.
% You should see a line across the
% diagonal against a blue background.
figure('name', 'Visualisation of the covariance matrix of xRot');
imagesc(covar);

%%================================================================
%% Step 2: Find k, the number of components to retain
%  Write code to determine k, the number of components to retain in order
%  to retain at least 99% of the variance.

% -------------------- YOUR CODE HERE --------------------
k = 0; % Set k accordingly
% Note the use of sum(diag(s)): diag extracts the diagonal of s as a column
% vector, and sum then reduces it to a single number.
% Without diag, sum works column by column and returns a row vector.
total = sum(diag(s));
for k = 1:size(s, 1)
    if sum(diag(s(1:k, 1:k))) / total > 0.99
        break;
    end
end
% Below is a faster version that uses MATLAB built-ins.
% diags is the column vector of diagonal entries of s.
% cumsum(diags) is the running sum of those entries.
% sum(diags) is their total.
% cumsum(diags)/sum(diags) is the cumulative fraction of variance retained.
% (cumsum(diags)/sum(diags)) <= 0.99 is a logical mask (0/1) that keeps the
% entries whose cumulative fraction is at most 0.99.
% diags(mask) extracts those entries, and length counts how many there are.
% The two results differ by one because the loop tests > while the mask
% tests <=: the loop gives k = 116 and the one-liner gives 115, as expected.
% diags = diag(s);
% k = length(diags((cumsum(diags)/sum(diags)) <= 0.99));

%%================================================================
%% Step 3: Implement PCA with dimension reduction
%  Now that you have found k, you can reduce the dimension of the data by
%  discarding the remaining dimensions. In this way, you can represent the
%  data in k dimensions instead of the original 144, which will save you
%  computational time when running learning algorithms on the reduced
%  representation.
%
%  Following the dimension reduction, invert the PCA transformation to produce
%  the matrix xHat, the dimension-reduced data with respect to the original basis.
%  Visualise the data and compare it to the raw data. You will observe that
%  there is little loss due to throwing away the principal components that
%  correspond to dimensions with low variation.

% -------------------- YOUR CODE HERE --------------------
xHat = zeros(size(x)); % You need to compute this
% Keep the first k components, zero the rest, then rotate back with u.
xHat = u * [u(:, 1:k), zeros(size(u, 1), size(u, 2) - k)]' * x;

% Visualise the data, and compare it to the raw data
% You should observe that the raw and processed data are of comparable quality.
% For comparison, you may wish to generate a PCA reduced image which
% retains only 90% of the variance.

figure('name', ['PCA processed images ', sprintf('(%d / %d dimensions)', k, size(x, 1)), '']);
display_network(xHat(:, randsel));
% The data was zero-meaned in Step 0b, so the PCA reconstruction should be
% compared with the zero-meaned data, which is the figure below.
figure('name', 'Raw images');
display_network(x(:, randsel));

%%================================================================
%% Step 4a: Implement PCA with whitening and regularisation
%  Implement PCA with whitening and regularisation to produce the matrix
%  xPCAWhite.

epsilon = 0.1;
xPCAWhite = zeros(size(x));

% -------------------- YOUR CODE HERE --------------------
% If dimension reduction is also wanted, it is simpler to apply it after
% PCA whitening.
% xPCAWhite is not displayed because, as in the pca_2d exercise, it differs
% too much from the original data, and this exercise focuses on how well
% PCA dimension reduction preserves the information in the original data.
xPCAWhite = diag(1 ./ sqrt(diag(s) + epsilon)) * u' * x;

%%================================================================
%% Step 4b: Check your implementation of PCA whitening
%  Check your implementation of PCA whitening with and without regularisation.
%  PCA whitening without regularisation results in a covariance matrix
%  that is equal to the identity matrix. PCA whitening with regularisation
%  results in a covariance matrix with diagonal entries starting close to
%  1 and gradually becoming smaller. We will verify these properties here.
%  Write code to compute the covariance matrix, covar.
%
%  Without regularisation (set epsilon to 0 or close to 0),
%  when visualised as an image, you should see a red line across the
%  diagonal (one entries) against a blue background (zero entries).
%  With regularisation, you should see a red line that slowly turns
%  blue across the diagonal, corresponding to the one entries slowly
%  becoming smaller.

% -------------------- YOUR CODE HERE --------------------
% The exercise asks for a comparison of xPCAWhite with and without
% regularisation, so both figures are shown.
covar = xPCAWhite * xPCAWhite' / size(xPCAWhite, 2);
% Visualise the covariance matrix. You should see a red line across the
% diagonal against a blue background.
covarwithregu = diag(covar);
figure('name', 'Visualisation of the covariance matrix of xPCAWhite with regularization');
imagesc(covar);
% Above: xPCAWhite with regularisation. Below: xPCAWhite with (almost) no regularisation.
epsilon = 1e-10;
xPCAWhite = diag(1 ./ sqrt(diag(s) + epsilon)) * u' * x;
covar = xPCAWhite * xPCAWhite' / size(xPCAWhite, 2);
covarwithnoregu = diag(covar);
figure('name', 'Visualisation of the covariance matrix of xPCAWhite with no regularization');
imagesc(covar);
% The figures show that adding the regularisation term avoids the numerical
% blow-up that occurs as the eigenvalues become very small.
% The diags vector extracted in Step 1a shows the actual values: the largest
% eigenvalue is about 900 times the second-smallest one.

%%================================================================
%% Step 5: Implement ZCA whitening
%  Now implement ZCA whitening to produce the matrix xZCAWhite.
%  Visualise the data and compare it to the raw data. You should observe
%  that whitening results in, among other things, enhanced edges.

xZCAWhite = zeros(size(x));

% -------------------- YOUR CODE HERE --------------------
% UFLDL says that ZCA whitening enhances edges, and that is somewhat visible here.
% However, I could not see any effect from changing the value of epsilon.
% Since no effect was visible for full-dimensional ZCA whitening,
% I also tried dimension-reduced ZCA whitening to look for the effect of epsilon,
% but still saw no difference. To be revisited once my maths is stronger.
epsilon = 1;
xZCAWhite = u * diag(1 ./ sqrt(diag(s) + epsilon)) * u' * x;
% Visualise the data, and compare it to the raw data.
% You should observe that the whitened images have enhanced edges.
figure('name', 'ZCA whitened images with epsilon 1');
display_network(xZCAWhite(:, randsel));
xPCAWhite = diag(1 ./ sqrt(diag(s) + epsilon)) * u' * x;
xDZCAWhite = u * [xPCAWhite(1:k, :); zeros(size(xPCAWhite, 1) - k, size(xPCAWhite, 2))];
figure('name', ['ZCA whitened images with epsilon 1 ', sprintf('(%d / %d dimensions)', k, size(x, 1)), '']);
display_network(xDZCAWhite(:, randsel));

epsilon = 0.1;
xZCAWhite = u * diag(1 ./ sqrt(diag(s) + epsilon)) * u' * x;
% Visualise the data, and compare it to the raw data.
% You should observe that the whitened images have enhanced edges.
figure('name', 'ZCA whitened images with epsilon 0.1');
display_network(xZCAWhite(:, randsel));
xPCAWhite = diag(1 ./ sqrt(diag(s) + epsilon)) * u' * x;
xDZCAWhite = u * [xPCAWhite(1:k, :); zeros(size(xPCAWhite, 1) - k, size(xPCAWhite, 2))];
figure('name', ['ZCA whitened images with epsilon 0.1 ', sprintf('(%d / %d dimensions)', k, size(x, 1)), '']);
display_network(xDZCAWhite(:, randsel));

epsilon = 0.01;
xZCAWhite = u * diag(1 ./ sqrt(diag(s) + epsilon)) * u' * x;
% Visualise the data, and compare it to the raw data.
% You should observe that the whitened images have enhanced edges.
figure('name', 'ZCA whitened images with epsilon 0.01');
display_network(xZCAWhite(:, randsel));
xPCAWhite = diag(1 ./ sqrt(diag(s) + epsilon)) * u' * x;
xDZCAWhite = u * [xPCAWhite(1:k, :); zeros(size(xPCAWhite, 1) - k, size(xPCAWhite, 2))];
figure('name', ['ZCA whitened images with epsilon 0.01 ', sprintf('(%d / %d dimensions)', k, size(x, 1)), '']);
display_network(xDZCAWhite(:, randsel));

figure('name', 'Raw images');
display_network(x(:, randsel));
The script produces too many figures to be worth including here.
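Since the figures are omitted, the two numerical facts the script relies on can also be checked without any plots. The following is a minimal sketch on synthetic data, not on the natural-image patches: the sizes n and m, the synthetic x, and the variables lambda, predicted, maxDiagErr and maxOffDiag are all assumptions made for illustration. Unlike Step 0b, it zero-means each feature (row), which is the usual convention for generic data.

```matlab
% Sketch on synthetic data (NOT the patches returned by sampleIMAGESRAW).
rng(0);                                        % fixed seed so the run is repeatable
n = 8; m = 5000;                               % hypothetical sizes, for illustration only
x = diag(linspace(3, 0.05, n)) * randn(n, m);  % rows with decreasing variance
x = x - repmat(mean(x, 2), 1, m);              % zero-mean each feature (row)

sigma = x * x' / m;                            % covariance matrix
[u, s, ~] = svd(sigma);                        % eigenvectors and eigenvalues
lambda = diag(s);                              % eigenvalues, in decreasing order

% Smallest k whose leading components retain at least 99% of the variance.
k = find(cumsum(lambda) / sum(lambda) >= 0.99, 1);

% Regularised PCA whitening and the covariance of the result.
epsilon = 0.1;
xPCAWhite = diag(1 ./ sqrt(lambda + epsilon)) * u' * x;
covar = xPCAWhite * xPCAWhite' / m;

% The diagonal should equal lambda ./ (lambda + epsilon) up to round-off,
% and the off-diagonal entries should be (numerically) zero.
predicted = lambda ./ (lambda + epsilon);
maxDiagErr = max(abs(diag(covar) - predicted));
maxOffDiag = max(max(abs(covar - diag(diag(covar)))));
fprintf('k = %d, max diagonal error = %.2e, max off-diagonal = %.2e\n', ...
        k, maxDiagErr, maxOffDiag);
```

The find(..., 1) form returns the first index at which the cumulative fraction reaches 0.99, so it agrees with the loop in Step 2 except when the fraction equals 0.99 exactly, whereas the commented-out length(... <= 0.99) one-liner counts the components below the threshold and comes out one smaller.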