Debugging a failing regularization term in logistic regression (working code at the end)

Regularized logistic regression in MATLAB

Theory:

(the formula screenshots are not preserved here)
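The missing images presumably showed the standard regularized logistic-regression cost and gradient from the course; restated here for reference (note that the intercept term θ₀ is not penalized):

```latex
h_\theta(x) = \frac{1}{1 + e^{-\theta^{T}x}}

J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\Bigl[-y^{(i)}\log h_\theta(x^{(i)}) - (1-y^{(i)})\log\bigl(1-h_\theta(x^{(i)})\bigr)\Bigr] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2}

\frac{\partial J(\theta)}{\partial\theta_0} = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_0^{(i)}

\frac{\partial J(\theta)}{\partial\theta_j} = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_j^{(i)} + \frac{\lambda}{m}\theta_j \qquad (j \ge 1)
```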

My code:

function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters. 

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta

    h = sigmoid(X*theta);
    theta2=[0;theta(2:end)];

    J_partial = sum((-y).*log(h)+(y-1).*log(1-h))./m;
    J_regularization= (lambda/(2*m)).*sum(theta2.^2);
    J = J_partial+J_regularization;

    grad_partial = sum((h-y).*X)/m;
    grad_regularization = lambda.*theta2./m;
    grad = grad_partial+grad_regularization;

% =============================================================

end

Results:

(the output screenshots are not preserved here)

The highlighted values differ from the expected output shown below.

 
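A guess at what went wrong, sketched in NumPy with made-up sizes: in Octave, `sum((h-y).*X)` produces a 1×n row vector, while `theta2` is an n×1 column; adding the two broadcasts into an n×n matrix rather than a gradient vector, which would explain the strange values:

```python
import numpy as np

# Made-up sizes and values, purely for illustration.
n = 3
grad_partial = np.ones((1, n)) * 0.1   # shape of sum((h-y).*X)/m: a 1 x n ROW
grad_reg = np.ones((n, 1)) * 0.2       # shape of lambda.*theta2./m: an n x 1 COLUMN

# NumPy broadcasts the same way modern Octave does:
grad = grad_partial + grad_reg
print(grad.shape)  # (3, 3): a full matrix, not an n-element gradient
```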

Attempt: delete `+grad_regularization`.

(the output screenshots are not preserved here)

Some of the results now match the expected values; some do not.

Trying a pro's code:

%Hypotheses
hx = sigmoid(X * theta);
%%The cost without regularization
J_partial = (-y' * log(hx) - (1 - y)' * log(1 - hx)) ./ m;
%%Regularization Cost Added
J_regularization = (lambda/(2*m)) * sum(theta(2:end).^2);
%%Cost when we add regularization
J = J_partial + J_regularization;
%Grad without regularization
grad_partial = (1/m) * (X' * (hx -y));
%%Grad Cost Added
grad_regularization = (lambda/m) .* theta(2:end);
grad_regularization = [0; grad_regularization];
grad = grad_partial + grad_regularization;
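For comparison, a NumPy port of this working version (the data below is made up; `sigmoid` is defined inline since the MATLAB version lives in a separate file):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function_reg(theta, X, y, lam):
    """Regularized logistic-regression cost and gradient (theta[0] unpenalized)."""
    m = len(y)
    hx = sigmoid(X @ theta)
    J = (-y @ np.log(hx) - (1 - y) @ np.log(1 - hx)) / m \
        + (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    grad = (X.T @ (hx - y)) / m       # unregularized part, shape of theta
    grad[1:] += (lam / m) * theta[1:] # penalize every parameter except the intercept
    return J, grad

# Tiny made-up example
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])
theta = np.array([0.1, 0.2])
J, grad = cost_function_reg(theta, X, y, lam=1.0)
print(J, grad.shape)  # scalar cost; gradient has the same shape as theta
```

Note that using `theta[1:]` keeps the intercept out of the penalty in both the cost and the gradient.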


(the success screenshots are not preserved here)

Complete success!? I don't get it…

Comparing my code with the pro's, the differences are:

the initial theta vector, and the number of `sum` calls used when computing J(theta) and grad.

So the next attempt 👉 use the same number of `sum` calls as the pro:

    h = sigmoid(X*theta);
    theta2=[0;theta(2:end)];

    J_partial = (-y).*log(h)+(y-1).*log(1-h)./m;
    J_regularization= (lambda/(2*m)).*sum(theta2.^2);
    J = J_partial+J_regularization;

    grad_partial = (h-y).*X/m;
    grad_regularization = lambda.*theta2./m;
    grad = grad_partial+grad_regularization;

Result: "nonconformant arguments", i.e. incompatible dimensions.

(the error screenshots are not preserved here)

The documentation explains the error as a dimension mismatch between the two operands.
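What actually trips it here, sketched in NumPy with made-up sizes: `(h-y).*X/m` keeps one row per training example (m×n), while `lambda.*theta2./m` is n×1, and m×n + n×1 has no legal broadcast when m ≠ n:

```python
import numpy as np

m, n = 100, 3                    # made-up sizes
grad_partial = np.zeros((m, n))  # shape of (h-y).*X/m: one row PER EXAMPLE
grad_reg = np.zeros((n, 1))      # shape of lambda.*theta2./m

try:
    grad = grad_partial + grad_reg
    broadcast_failed = False
except ValueError as e:          # NumPy's analogue of "nonconformant arguments"
    broadcast_failed = True
    print("broadcast error:", e)

print("nonconformant:", broadcast_failed)
```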

Having come this far, the only choice is to move another step closer to the pro's code!

    h = sigmoid(X*theta);
   
    J_partial = (-y).*log(h)+(y-1).*log(1-h)./m;
    J_regularization= (lambda/(2*m)).*sum(theta(2:end).^2);
    J = J_partial+J_regularization;

    grad_partial = (h-y).*X/m;
    grad_regularization = lambda.*theta(2:end)./m;
    grad_regularization2=[0;grad_regularization];

    grad = grad_partial+grad_regularization2;


Why is it still nonconformant?

Where exactly is the problem?

 

Finally, one more step toward the pro: replace `(h-y).*X/m` in grad_partial with `(1/m) * (X' * (h - y))`.

    h = sigmoid(X*theta);
   
    J_partial = (1/m).*((-y).*log(h)+(y-1).*log(1-h));
    J_regularization= (lambda/(2*m)).*sum(theta(2:end).^2);
    J = J_partial+J_regularization;

    grad_partial = (1/m) * (X' * (h -y));
    grad_regularization = (lambda/m).*theta(2:end);
    grad_regularization = [0; grad_regularization];
    grad = grad_partial+ grad_regularization;
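The key difference between those two forms, checked in NumPy with random made-up data: `(h-y).*X` only scales each row of X and stays m×n, whereas `X'*(h-y)` additionally sums over the m training examples, collapsing to the n×1 gradient. Summing the element-wise version over its rows recovers the matrix product:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3                        # made-up sizes
X = rng.normal(size=(m, n))
r = rng.normal(size=m)             # stands in for (h - y)

elementwise = r[:, None] * X       # (h-y).*X  -> shape (m, n)
matrix_prod = X.T @ r              # X'*(h-y)  -> shape (n,), summed over examples

print(elementwise.shape, matrix_prod.shape)
print(np.allclose(elementwise.sum(axis=0), matrix_prod))  # True
```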

(the output screenshots are not preserved here)

Much better!

But wait: why does the output above still have so many rows, and why are the values still wrong? Apparently I can't lean on the pro for everything; some of it I have to fix myself!

    h = sigmoid(X*theta);
   
    J_partial = (1/m).*sum((-y).*log(h)+(y-1).*log(1-h));
    J_regularization= (lambda/(2*m)).*sum(theta(2:end).^2);
    J = J_partial+J_regularization;

    grad_partial = (1/m) * (X' * (h -y));
    grad_regularization = (lambda/m).*theta(2:end);
    grad_regularization = [0; grad_regularization];
    grad = grad_partial+ grad_regularization;
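The "so many rows" symptom comes from the missing `sum`: without it, J is an m×1 vector of per-example losses instead of a scalar. A quick NumPy check with made-up values:

```python
import numpy as np

m = 4
y = np.array([1.0, 0.0, 1.0, 0.0])
h = np.array([0.9, 0.2, 0.8, 0.4])   # made-up hypothesis outputs

per_example = (-y) * np.log(h) + (y - 1) * np.log(1 - h)  # one loss per example
J = per_example.sum() / m                                 # sum collapses it to a scalar

print(per_example.shape)  # (4,)
print(float(J) > 0)       # True
```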

Finally, a satisfactory answer.

(the matching-output screenshots are not preserved here)

To summarize the problems that came up:

01 Nonconformant: as explained above, the operands' row/column dimensions don't match.

(Fixes: check whether a `sum` is present and whether each quantity is a scalar or an array; put scalar coefficients in front; and watch the order and orientation of the two factors in a product.)

02 After adding grad_regularization, the last four of the first five entries of grad were wrong (and, oddly, all equal); with it removed, the values were instead off by a small margin (the gap caused by dropping grad_regularization). So grad_regularization itself was the problem: grad_partial came out as a 1×n row while grad_regularization was an n×1 column, and adding them broadcasts into a matrix rather than a gradient vector.

Also, turning theta into a vector whose first element is 0 right at the start makes nonconformant errors easy to run into.

The pro's code suggests splitting the special case out and handling it separately, that is:

when computing J(theta), don't build a padded vector; use theta(2:end), everything after the excluded first element, to produce the scalar directly;

when computing grad, since the output is itself a vector, build a vector containing a 0 followed by the remaining theta values.

This avoids the nonconformant errors and produces the correct result.
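In fact the two strategies agree, and the zero-padding idea from my original code also works once every term is a column vector of the same orientation; a NumPy sketch with made-up numbers:

```python
import numpy as np

m, lam = 3.0, 1.0
theta = np.array([0.5, -1.0, 2.0])
grad_partial = np.array([0.1, 0.2, 0.3])  # stands in for X'*(h-y)/m, made up

# Strategy A: zero-pad theta (the theta2 = [0; theta(2:end)] idea), one formula
theta_pad = np.r_[0.0, theta[1:]]
grad_a = grad_partial + (lam / m) * theta_pad

# Strategy B: regularize only j >= 1 separately (the pro's approach)
grad_b = grad_partial.copy()
grad_b[1:] += (lam / m) * theta[1:]

print(np.allclose(grad_a, grad_b))  # True: identical gradients
```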

The final fragment of code:

    h = sigmoid(X*theta);
   
    J_partial = (1/m).*sum((-y).*log(h)+(y-1).*log(1-h));
    J_regularization= (lambda/(2*m)).*sum(theta(2:end).^2);
    J = J_partial+J_regularization;

    grad_partial = (1/m) * (X' * (h -y));
    grad_regularization = (lambda/m).*theta(2:end);
    grad_regularization = [0; grad_regularization];
    grad = grad_partial+ grad_regularization;

 

posted @ 2022-03-25 14:44  asandstar