Machine Learning (Andrew Ng) Assignment Code (Exercise 3~4)

Programming Exercise 3: Multi-class Classification and Neural Networks

Regularized multi-class logistic regression

lrCostFunction

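For K-class logistic regression (K>2), we can build K classifiers. The hypothesis of the k-th classifier is \(h_\theta^{(k)}(X)=P(y=k|X;\theta)\); that is, it outputs the probability that the sample belongs to class k.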

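The loss function is the same as in Ex2, with the regularization sum running over the parameters \(\theta_1,\dots,\theta_n\) (the bias \(\theta_0\) is not penalized):

\[J(\theta)=\frac 1 m \sum_{i=1}^m[-y^{(i)}log(h_\theta(X^{(i)}))-(1-y^{(i)})log(1-h_\theta(X^{(i)}))]+\frac \lambda {2m} \sum_{j=1}^n \theta_j^2 \]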

In matrix form, with

\[X=\begin{pmatrix} X^{(1)}\\ \vdots\\ X^{(m)} \end{pmatrix}\]

\[y=\begin{pmatrix} y^{(1)}\\ \vdots\\ y^{(m)} \end{pmatrix}\]

\[\theta=\begin{pmatrix} \theta_{0}\\ \vdots\\ \theta_n \end{pmatrix}\]

\[h_\theta(X)=\begin{pmatrix} g(X^{(1)}\theta)\\ \vdots\\ g(X^{(m)}\theta) \end{pmatrix}=g(X\theta)\]

\[J(\theta)=\frac 1 m [-log(h_\theta(X))^Ty-log(1-h_\theta(X))^T(1-y)]+\frac \lambda {2m}\theta'^T\theta' \]

where

\[\theta'=\begin{pmatrix} 0\\ \theta_{1}\\ \vdots\\ \theta_n \end{pmatrix}\]

Deriving the gradient:

As in Ex2,

\[\frac{\partial J(\theta)}{\partial \theta_0}=\frac 1 m \sum_{i=1}^m\ (g(X^{(i)}\theta)-y^{(i)})x_0^{(i)}=\frac 1 m (g(X\theta)-y)^T \begin{pmatrix} x_0^{(1)}\\ \vdots\\ x_0^{(m)} \end{pmatrix}\]

\[\frac{\partial J(\theta)}{\partial \theta_t}=\frac 1 m \sum_{i=1}^m\ (g(X^{(i)}\theta)-y^{(i)})x_t^{(i)}+\frac \lambda m \theta_t \]

\[=\frac 1 m (g(X\theta)-y)^T \begin{pmatrix} x_t^{(1)}\\ \vdots\\ x_t^{(m)} \end{pmatrix}+\frac \lambda m \theta_t ,\ \ \ \ t\geq 1\]

\[\begin{pmatrix} \frac{\partial J(\theta)}{\partial \theta_0} & \cdots & \frac{\partial J(\theta)}{\partial \theta_n} \end{pmatrix}= \frac 1 m (g(X\theta)-y)^TX+ \frac \lambda m \theta'^T \]

\[\begin{pmatrix} \frac{\partial J(\theta)}{\partial \theta_0} \\ \vdots \\ \frac{\partial J(\theta)}{\partial \theta_n} \end{pmatrix}= \frac 1 m X^T(g(X\theta)-y) +\frac \lambda m \theta' \]
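The vectorized gradient above can be sanity-checked against central finite differences. The following is only a sketch on a tiny made-up dataset: the values of `X`, `y`, `theta`, and `lambda`, and the helper names `cost`, `numgrad`, and `epsilon`, are assumptions for illustration and are not part of the assignment files.

```octave
% Numerical check of the vectorized gradient on a tiny made-up dataset.
sigmoid = @(z) 1 ./ (1 + exp(-z));

X = [1 0.5 1.2; 1 -1.0 0.3; 1 2.0 -0.7; 1 0.1 0.9];  % first column is x_0 = 1
y = [1; 0; 1; 0];
theta = [0.1; -0.2; 0.3];
lambda = 1;
m = length(y);

% Cost J(theta) in matrix form; theta(1) is excluded from the penalty
cost = @(t) (-log(sigmoid(X*t))'*y - log(1 - sigmoid(X*t))'*(1 - y)) / m ...
            + (lambda/(2*m)) * sum(t(2:end).^2);

% Analytic gradient: (1/m) X'(g(X theta) - y) + (lambda/m) theta'
theta_reg = theta;
theta_reg(1) = 0;
grad = X' * (sigmoid(X*theta) - y) / m + (lambda/m) * theta_reg;

% Central finite differences, one coordinate at a time
epsilon = 1e-4;
numgrad = zeros(size(theta));
for j = 1:numel(theta)
    e = zeros(size(theta));
    e(j) = epsilon;
    numgrad(j) = (cost(theta + e) - cost(theta - e)) / (2*epsilon);
end

max_diff = max(abs(numgrad - grad));
disp(max_diff)
```

If the derivation and the code agree, `max_diff` should be very small compared with the gradient entries themselves.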

function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with 
%regularization
%   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters. 

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
%       efficiently vectorized. For example, consider the computation
%
%           sigmoid(X * theta)
%
%       Each row of the resulting matrix will contain the value of the
%       prediction for that example. You can make use of this to vectorize
%       the cost function and gradient computations. 
%
% Hint: When computing the gradient of the regularized cost function, 
%       there're many possible vectorized solutions, but one solution
%       looks like:
%           grad = (unregularized gradient for logistic regression)
%           temp = theta; 
%           temp(1) = 0;   % because we don't add anything for j = 0  
%           grad = grad + YOUR_CODE_HERE (using the temp variable)
%

    tmp=-log(sigmoid(X*theta))'*y-log(1-sigmoid(X*theta))'*(1-y);
    regterm=(lambda/(2*m))*(theta(2:length(theta))'*theta(2:length(theta)));
    
    J=tmp/m+regterm;
    
    theta2=theta;
    theta2(1)=0;
    grad=(X'*(sigmoid(X*theta)-y))/m+(lambda/m)*theta2;
% =============================================================

grad = grad(:);

end

oneVsAll

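For a K-class problem, train K regularized binary logistic regression classifiers, where the i-th classifier outputs the probability that the input sample belongs to class i.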
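As a small illustration of how each binary sub-problem gets its labels, the sketch below builds the 0/1 target vector for every class with `y == c`. The label vector and the matrix name `Y` are made up for this example.

```octave
y = [1; 3; 2; 3; 1];      % made-up labels in 1..num_labels
num_labels = 3;

Y = zeros(numel(y), num_labels);
for c = 1:num_labels
    Y(:, c) = (y == c);   % 1 where the example belongs to class c, else 0
end
disp(Y)
```

Column c of `Y` is the target vector that the c-th classifier is trained on; each row contains exactly one 1.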

function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta 
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
%   logisitc regression classifiers and returns each of these classifiers
%   in a matrix all_theta, where the i-th row of all_theta corresponds 
%   to the classifier for label i

% Some useful variables
m = size(X, 1);
n = size(X, 2);

% You need to return the following variables correctly 
all_theta = zeros(num_labels, n + 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
%               logistic regression classifiers with regularization
%               parameter lambda. 
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell use 
%       whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
%       function. It is okay to use a for-loop (for c = 1:num_labels) to
%       loop over the different classes.
%
%       fmincg works similarly to fminunc, but is more efficient when we
%       are dealing with large number of parameters.
%
% Example Code for fmincg:
%
%     % Set Initial theta
%     initial_theta = zeros(n + 1, 1);
%     
%     % Set options for fminunc
%     options = optimset('GradObj', 'on', 'MaxIter', 50);
% 
%     % Run fmincg to obtain the optimal theta
%     % This function will return theta and the cost 
%     [theta] = ...
%         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
%                 initial_theta, options);
%
    for poslabel=1:num_labels
        newy=(y==poslabel);
        initial_theta = zeros(n + 1, 1);
        options = optimset('GradObj', 'on', 'MaxIter', 50);
        %  Run fmincg to obtain the optimal theta
        %  This function will return theta and the cost 
        [theta, cost] = ...
            fmincg(@(t)(lrCostFunction(t, X, newy,lambda)), initial_theta, options);
        all_theta(poslabel,:)=theta';
    end

% =========================================================================


end

predictOneVsAll

Classify input samples with the K-class logistic regression model.

We only need to output the class with the highest predicted probability.
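A minimal sketch of picking the most probable class per example with `max(A, [], 2)`. The probability matrix `prob` is made up for illustration.

```octave
% Made-up probabilities: rows are examples, columns are classes 1..3
prob = [0.10 0.70 0.20;
        0.80 0.15 0.05;
        0.30 0.30 0.90];

% The second output of max is the column index of each row's maximum
[best, p] = max(prob, [], 2);
disp(p')
```

Here `p` holds the predicted class for each row, and `best` holds the corresponding probability.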

function p = predictOneVsAll(all_theta, X)
%PREDICT Predict the label for a trained one-vs-all classifier. The labels 
%are in the range 1..K, where K = size(all_theta, 1). 
%  p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
%  for each example in the matrix X. Note that X contains the examples in
%  rows. all_theta is a matrix where the i-th row is a trained logistic
%  regression theta vector for the i-th class. You should set p to a vector
%  of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
%  for 4 examples) 

m = size(X, 1);
num_labels = size(all_theta, 1);

% You need to return the following variables correctly 
p = zeros(size(X, 1), 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters (one-vs-all).
%               You should set p to a vector of predictions (from 1 to
%               num_labels).
%
% Hint: This code can be done all vectorized using the max function.
%       In particular, the max function can also return the index of the 
%       max element, for more information see 'help max'. If your examples 
%       are in rows, then, you can use max(A, [], 2) to obtain the max 
%       for each row.
%       

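    % sigmoid is monotonically increasing, so the largest entry of X*all_theta'
    % in each row is also the class with the largest predicted probability
    [~,p]=max(X*all_theta',[],2);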
    
% =========================================================================


end

Neural network forward propagation

predict

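\(\theta_{i,j}^{(L)}\) is the parameter from node j in layer L to node i in layer L+1, \(s_L\) is the number of nodes in layer L (not counting the bias unit), and \(a^{(L)}\) is the output of layer L: the data it receives from layer L-1, passed through the Sigmoid activation function. With the bias unit indexed as j=0,

\[\Theta^{(L)}=(\theta_{i,j}^{(L)})_{s_{L+1}\times (s_{L}+1)} \]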

Let the input be an n-dimensional column vector \(x\), and set

\[a^{(1)}=\begin{pmatrix}1\\x\end{pmatrix} \]

\[a^{(2)}=\begin{pmatrix}1\\Sigmoid(\Theta^{(1)}a^{(1)})\end{pmatrix} \]

\[a^{(3)}=Sigmoid(\Theta^{(2)}a^{(2)}) \]
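To make the three formulas concrete, here is a toy forward pass with hand-picked weights (they are not from the assignment data). Under these assumed weights the network computes XNOR of two binary inputs; the layout follows the column-vector convention above, with one example per column.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

Theta1 = [-30  20  20;     % hidden unit 1 acts like x1 AND x2
           10 -20 -20];    % hidden unit 2 acts like (NOT x1) AND (NOT x2)
Theta2 = [-10 20 20];      % output unit acts like OR of the hidden units

X = [0 0; 0 1; 1 0; 1 1];  % one example per row
m = size(X, 1);

a1 = [ones(1, m); X'];                    % each column is [1; x]
a2 = [ones(1, m); sigmoid(Theta1 * a1)];  % prepend the bias unit
a3 = sigmoid(Theta2 * a2);                % 1 x m row of outputs

disp(round(a3))
```

The rounded outputs are 1, 0, 0, 1 for the four inputs, i.e. 1 exactly when the two inputs are equal.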

function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)

% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);

% You need to return the following variables correctly 
p = zeros(size(X, 1), 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned neural network. You should set p to a 
%               vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
%       function can also return the index of the max element, for more
%       information see 'help max'. If your examples are in rows, then, you
%       can use max(A, [], 2) to obtain the max for each row.
%

    a1=[ones(m,1),X];
    a2=[ones(1,m);sigmoid(Theta1*a1')];
    a3=sigmoid(Theta2*a2);
    [~,p]=max(a3,[],1);
    p=p';

% =========================================================================


end

Programming Exercise 4: Neural Networks Learning

A regularized two-layer MLP with a cross-entropy loss function

sigmoidGradient

Simply write out the derivative of the Sigmoid function:

\[Sigmoid'(x)=Sigmoid(x)(1-Sigmoid(x)) \]
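For completeness, the identity can be obtained by differentiating the definition directly; a short derivation sketch:

\[Sigmoid(x)=\frac 1 {1+e^{-x}},\ \ \ \ 1-Sigmoid(x)=\frac {e^{-x}} {1+e^{-x}} \]

\[Sigmoid'(x)=\frac {e^{-x}} {(1+e^{-x})^2}=\frac 1 {1+e^{-x}}\cdot\frac {e^{-x}} {1+e^{-x}}=Sigmoid(x)(1-Sigmoid(x)) \]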

function g = sigmoidGradient(z)
%SIGMOIDGRADIENT returns the gradient of the sigmoid function
%evaluated at z
%   g = SIGMOIDGRADIENT(z) computes the gradient of the sigmoid function
%   evaluated at z. This should work regardless if z is a matrix or a
%   vector. In particular, if z is a vector or matrix, you should return
%   the gradient for each element.

g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the gradient of the sigmoid function evaluated at
%               each value of z (z can be a matrix, vector or scalar).
    g=sigmoid(z).*(1-sigmoid(z));
% =============================================================
end

nnCostFunction

The forward propagation is the same as in Ex3, so it is not repeated here.

Cross-entropy loss function:

\[J(\theta)=\frac 1 m \sum_{i=1}^m\sum_{k=1}^K[-y_k^{(i)}log((h_\theta(x^{(i)}))_k)-(1-y_k^{(i)})log(1-(h_\theta(x^{(i)}))_k)] \]

\[\delta_k^{(3)}=a_k^{(3)}-y_k \]

If the true class of the input sample is k, then \(y_k=1\); otherwise \(y_k=0\).

\[\delta^{(2)}=(\Theta^{(2)})^T\delta^{(3)}.*g'(z^{(2)}) \]

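\[\Delta^{(l)}:=\Delta^{(l)}+\delta^{(l+1)}(a^{(l)})^T \]

\(\Delta^{(l)}_{i,j}\) is the sum, over the m training samples, of the gradients with respect to the parameter from node j in layer l to node i in layer l+1.

The average gradient over the m samples can then be written as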

\[\frac \partial {\partial \Theta_{ij}^{(l)}}J(\Theta)=\frac 1 m \Delta_{ij}^{(l)} \]

Next, add regularization to the loss function:

\[J(\theta)=\frac 1 m \sum_{i=1}^m\sum_{k=1}^K[-y_k^{(i)}log((h_\theta(x^{(i)}))_k)-(1-y_k^{(i)})log(1-(h_\theta(x^{(i)}))_k)]+ \frac \lambda {2m}[\sum_{j=1}^{s_2}\sum_{k=1}^{s_1}(\Theta_{j,k}^{(1)})^2+\sum_{j=1}^{s_3}\sum_{k=1}^{s_2}(\Theta_{j,k}^{(2)})^2]\]

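\[\frac \partial {\partial \Theta_{ij}^{(l)}}J(\Theta)=\frac 1 m \Delta_{ij}^{(l)}+\frac \lambda m \Theta_{ij}^{(l)},\ \ \ \ j\geq 1 \]

For the bias column (j=0) there is no regularization term, so the gradient remains \(\frac 1 m \Delta_{i0}^{(l)}\).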
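The per-example updates above can also be applied to all m examples at once. The following is a vectorized sketch on a tiny made-up problem, not the loop-based solution used in this post: the data, the deterministic weight values, and the helper names `Y`, `hyp`, `cost`, and `numgrad` are all assumptions for illustration. One entry of the result is compared against a central finite difference of the regularized cost.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Tiny made-up problem: 2 inputs, 3 hidden units, 2 classes, 4 examples
X = [0.1 0.7; 0.9 0.2; 0.4 0.4; 0.8 0.6];
y = [1; 2; 1; 2];
Theta1 = reshape(sin(1:9), 3, 3) / 2;    % 3 x (2+1), deterministic values
Theta2 = reshape(cos(1:8), 2, 4) / 2;    % 2 x (3+1)
lambda = 1;
m = size(X, 1);
K = size(Theta2, 1);

% One-hot labels: column i of Y has a 1 in row y(i)
I = eye(K);
Y = I(:, y);

% Forward pass, one example per column
a1 = [ones(1, m); X'];
z2 = Theta1 * a1;
a2 = [ones(1, m); sigmoid(z2)];
a3 = sigmoid(Theta2 * a2);

% Backward pass for all examples at once (bias column of Theta2 dropped)
delta3 = a3 - Y;
g2 = sigmoid(z2);
delta2 = (Theta2(:, 2:end)' * delta3) .* g2 .* (1 - g2);

Theta1_grad = delta2 * a1' / m;
Theta2_grad = delta3 * a2' / m;

% Regularize every column except the bias column
Theta1_grad(:, 2:end) = Theta1_grad(:, 2:end) + (lambda/m) * Theta1(:, 2:end);
Theta2_grad(:, 2:end) = Theta2_grad(:, 2:end) + (lambda/m) * Theta2(:, 2:end);

% Regularized cost as a function of Theta1, used only for the check below
hyp = @(T1) sigmoid(Theta2 * [ones(1, m); sigmoid(T1 * a1)]);
cost = @(T1) sum(sum(-Y .* log(hyp(T1)) - (1 - Y) .* log(1 - hyp(T1)))) / m ...
    + (lambda/(2*m)) * (sum(sum(T1(:, 2:end).^2)) + sum(sum(Theta2(:, 2:end).^2)));

% Central finite difference for the single entry Theta1(2, 3)
E = zeros(size(Theta1));
E(2, 3) = 1e-4;
numgrad = (cost(Theta1 + E) - cost(Theta1 - E)) / 2e-4;
disp(abs(numgrad - Theta1_grad(2, 3)))
```

Multiplying `delta3` by `a2'` sums the outer products \(\delta^{(3)}(a^{(2)})^T\) over all examples in one step, which is what the accumulation into \(\Delta^{(2)}\) does one example at a time.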

function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTON(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices. 
% 
%   The returned parameter grad should be a "unrolled" vector of the
%   partial derivatives of the neural network.
%

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);
         
% You need to return the following variables correctly 
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the code by working through the
%               following parts.
%
% Part 1: Feedforward the neural network and return the cost in the
%         variable J. After implementing Part 1, you can verify that your
%         cost function computation is correct by verifying the cost
%         computed in ex4.m
%
% Part 2: Implement the backpropagation algorithm to compute the gradients
%         Theta1_grad and Theta2_grad. You should return the partial derivatives of
%         the cost function with respect to Theta1 and Theta2 in Theta1_grad and
%         Theta2_grad, respectively. After implementing Part 2, you can check
%         that your implementation is correct by running checkNNGradients
%
%         Note: The vector y passed into the function is a vector of labels
%               containing values from 1..K. You need to map this vector into a 
%               binary vector of 1's and 0's to be used with the neural network
%               cost function.
%
%         Hint: We recommend implementing backpropagation using a for-loop
%               over the training examples if you are implementing it for the 
%               first time.
%
% Part 3: Implement regularization with the cost function and gradients.
%
%         Hint: You can implement this around the code for
%               backpropagation. That is, you can compute the gradients for
%               the regularization separately and then add them to Theta1_grad
%               and Theta2_grad from Part 2.
%


    a1=[ones(1,m);X'];
    z2=Theta1*a1;
    a2=[ones(1,m);sigmoid(z2)];
    z3=Theta2*a2;
    a3=sigmoid(z3);
    
    for i=1:m
        for k=1:size(a3,1)
            if(y(i)==k)
                J=J-log(a3(k,i));
            else
                J=J-log(1-a3(k,i));
            end
        end
    end
    J=J/m;
    
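    % Regularization: skip the bias column (first column) of each Theta,
    % consistent with the regularized cost formula and the gradient below
    J=J+lambda*(sum(sum(Theta1(:,2:end).^2))+sum(sum(Theta2(:,2:end).^2)))/(2*m);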
    
    ay=a3;
    for i=1:m
        for num=1:size(a3,1)
            if(y(i)==num)
                ay(num,i)=1;
            else
                ay(num,i)=0;
            end
        end
    end
    
    for i=1:m
        delta3=(a3(:,i)-ay(:,i));
        delta2=(Theta2'*delta3).*sigmoidGradient([1;z2(:,i)]);
        Theta2_grad=Theta2_grad+delta3*a2(:,i)';
        Theta1_grad=Theta1_grad+delta2(2:end)*a1(:,i)';
    end
    
    Theta1_grad=Theta1_grad/m;
    Theta2_grad=Theta2_grad/m;
    
    %Regularization terms
    Theta1_grad(:,2:end)=Theta1_grad(:,2:end)+(lambda/m)*Theta1(:,2:end);
    Theta2_grad(:,2:end)=Theta2_grad(:,2:end)+(lambda/m)*Theta2(:,2:end);
% -------------------------------------------------------------

% =========================================================================

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];
end
posted @ 2018-07-08 15:10  YongkangZhang