Proof that minimizing a loss function plus a penalty is equivalent to minimizing the loss function under a constraint

Equivalence of the constrained and unconstrained forms for the lasso

Problem 1 The unconstrained form of the lasso

\[\operatorname{min}_{\beta}\|Y-X \beta\|_{2}^{2}+\lambda\|\beta\|_{1} \tag{1} \]

Suppose we solve Problem 1 for a given \(\lambda\) and obtain its solution \(\beta_{\text{problem1}}^*(\lambda)\).
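As a concrete illustration (not part of the original derivation), here is a minimal Python sketch that solves eq. (1) by proximal gradient descent (ISTA); the names `soft_threshold` and `lasso_ista` are our own, chosen for this sketch.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1, applied elementwise
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=5000):
    """Solve min_beta ||y - X beta||_2^2 + lam * ||beta||_1
    by proximal gradient descent (ISTA)."""
    L = 2 * np.linalg.norm(X, 2) ** 2      # Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2 * X.T @ (X @ beta - y)    # gradient of the smooth part
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta
```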

Problem 2 The constrained form of the lasso

\[\operatorname{min}_{\beta}\|Y-X \beta\|_{2}^{2} \]

\[\text{s.t.}\ \|\beta\|_{1} \leq s \]

We can rewrite the constrained form as an unconstrained form using the Lagrange multiplier method.

The Lagrangian of the problem is:

\[L(\beta, v)=\|Y-X \beta\|_{2}^{2}+v\left(\|\beta\|_{1}-s\right), \quad v \geq 0 \tag{2} \]

Since the objective and the constraint are convex, and strong duality holds (Slater's condition is satisfied for \(s>0\), e.g. by \(\beta=0\)), the pair \((\beta^*,v^*)\) is primal-dual optimal if and only if it is a saddle point of the Lagrangian:

\[\operatorname{min}_{\beta}\operatorname{max}_{v \geq 0}\|Y-X \beta\|_{2}^{2}+v\left(\|\beta\|_{1}-s\right) =\operatorname{max}_{v \geq 0}\operatorname{min}_{\beta}\|Y-X \beta\|_{2}^{2}+v\left(\|\beta\|_{1}-s\right) \]

First we solve the inner minimization:

\[\operatorname{min}_{\beta}\|Y-X \beta\|_{2}^{2}+v\left(\|\beta\|_{1}-s\right) \]

Up to the additive constant \(-vs\), which does not affect the minimizer, this objective has the same form as eq. (1) with \(\lambda=v\), so it has the same solution as Problem 1, i.e. \(\beta_{\text{problem2}}^*(v)=\beta_{\text{problem1}}^*(v)\).

Then we solve the outer maximization:

\[\operatorname{max}_{v \geq 0}\|Y-X \beta^*(v)\|_{2}^{2}+v\left(\|\beta^*(v)\|_{1}-s\right) \]

The solution is \(v^*\).

Finally, we obtain \(\beta^*_{\text{problem2}}=\beta_{\text{problem1}}^*(v^*)\).

Therefore, if we set \(\lambda\) in Problem 1 to \(v^*\), the solution of Problem 1 is \(\beta_{\text{problem1}}^*=\beta_{\text{problem1}}^*(\lambda)=\beta_{\text{problem1}}^*(v^*)\), which is the same as the solution of Problem 2.

So the two forms are equivalent.
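To make the equivalence tangible, here is a hedged numerical check (our own sketch, continuing from the `lasso_ista` code above): solve eq. (1) for a given \(\lambda\), set \(s=\|\beta^*(\lambda)\|_1\), and confirm that projected gradient descent onto the \(\ell_1\) ball recovers the same solution. The projection follows the standard sorting-based algorithm of Duchi et al.; `project_l1_ball` and `lasso_constrained` are illustrative names.

```python
import numpy as np

def project_l1_ball(v, s):
    """Euclidean projection onto {x : ||x||_1 <= s} (Duchi et al., 2008)."""
    if np.abs(v).sum() <= s:
        return v
    u = np.sort(np.abs(v))[::-1]               # sorted magnitudes, descending
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - s) / k > 0)[0][-1]
    theta = (css[rho] - s) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def lasso_constrained(X, y, s, n_iter=20000):
    """Solve min ||y - X beta||_2^2 s.t. ||beta||_1 <= s
    by projected gradient descent."""
    L = 2 * np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = project_l1_ball(beta - 2 * X.T @ (X @ beta - y) / L, s)
    return beta

rng = np.random.default_rng(0)
X, true_beta = rng.standard_normal((50, 10)), np.r_[np.ones(3), np.zeros(7)]
y = X @ true_beta + 0.1 * rng.standard_normal(50)
b_pen = lasso_ista(X, y, lam=1.0)              # unconstrained solution, lambda = 1
b_con = lasso_constrained(X, y, s=np.abs(b_pen).sum())
print(np.max(np.abs(b_pen - b_con)))           # should be close to 0
```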

Equivalence of the constrained and unconstrained forms for ridge regression

Problem 1 The unconstrained form of ridge regression

\[\operatorname{min}_{\beta}\|Y-X \beta\|_{2}^{2}+\lambda\|\beta\|_{2}^{2} \tag{3} \]

Suppose we solve Problem 1, i.e. eq. (3), via its first-order condition (F.O.C.) for a given \(\lambda\) and obtain its solution \(\beta^*(\lambda)\).
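For completeness, the F.O.C. yields the well-known closed form (a standard result, not spelled out in the original):

\[-2 X^{\top}(Y-X \beta)+2 \lambda \beta=0 \quad \Longrightarrow \quad \beta^*(\lambda)=\left(X^{\top} X+\lambda I\right)^{-1} X^{\top} Y \]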

Problem 2 The constrained form of ridge regression

\[\operatorname{min}_{\beta}\|Y-X \beta\|_{2}^{2} \]

\[\text{s.t.}\ \|\beta\|_{2}^{2} \leq s \]

We can rewrite the constrained form as an unconstrained form using the Lagrange multiplier method.

The Lagrangian of the problem is:

\[L(\beta, v)=\|Y-X \beta\|_{2}^{2}+v\left(\|\beta\|_{2}^{2}-s\right), \quad v \geq 0 \tag{4} \]

The first KKT condition (stationarity) says that the gradient of the Lagrangian with respect to \(\beta\) equals 0. Since \(s\) does not depend on \(\beta\), the stationarity condition of eq. (4) is identical to the F.O.C. of eq. (3) when \(\lambda=v\), so the minimizer of the Lagrangian for a given \(v\) is \(\beta^*(v)\) as above.

The second KKT condition (complementary slackness) says that

\[v\left(\|\beta\|_{2}^{2}-s\right)=0 \]

Let \(s=\|\beta^*(\lambda)\|_2^2\). Then \(v^*=\lambda\) and \(\beta^*=\beta^*(\lambda)\) satisfy the KKT conditions for Problem 2: stationarity holds by the F.O.C. above, and complementary slackness holds since \(\|\beta^*\|_2^2=s\). Hence they solve Problem 2, which coincides with the solution of Problem 1.

So the two forms are equivalent.
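Again, a small numerical sanity check (our own sketch, not part of the original argument): compute the penalized ridge solution in closed form, set \(s=\|\beta^*(\lambda)\|_2^2\), and verify that projected gradient descent onto the \(\ell_2\) ball \(\{\beta : \|\beta\|_2^2 \leq s\}\) returns the same vector. `ridge_penalized` and `ridge_constrained` are illustrative names.

```python
import numpy as np

def ridge_penalized(X, y, lam):
    # Closed-form solution of eq. (3): (X^T X + lam I)^{-1} X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_constrained(X, y, s, n_iter=20000):
    """Solve min ||y - X beta||_2^2 s.t. ||beta||_2^2 <= s
    by projected gradient descent; the projection is a simple rescaling."""
    L = 2 * np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = beta - 2 * X.T @ (X @ beta - y) / L
        norm_sq = beta @ beta
        if norm_sq > s:                        # project onto the l2 ball
            beta *= np.sqrt(s / norm_sq)
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
y = X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(50)
b_pen = ridge_penalized(X, y, lam=2.0)
b_con = ridge_constrained(X, y, s=b_pen @ b_pen)
print(np.max(np.abs(b_pen - b_con)))           # should be close to 0
```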
