[SI152 Notes] Part 5: Fundamentals of Constrained Nonlinear Programming

SI152: Numerical Optimization

Lec 12: Gradient Projection and Frank-Wolfe algorithms

Goal: minimize \(f\) over a feasible set \(Ω\).

Convex set constraint

Definition 1 (Normal cone)
Given a nonempty convex \(Ω ⊂ \mathbb{R}^n\) and \(x\inΩ\), the normal cone of \(Ω\) at \(x\) is

\[\mathcal{N}_\Omega(x) := \{g | g^T(\bar{x} − x) ≤ 0 ,\forall \bar{x} \in Ω\} \]

Theorem 2
If \(x^∗\) is a minimizer of \(f\) in \(Ω\), then \(−\nabla f(x^∗) \in \mathcal{N}_\Omega(x^∗)\).
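As a quick numerical sanity check of Theorem 2, the sketch below uses an assumed toy instance that is not part of the lecture: \(f(x) = \frac{1}{2}\|x - a\|^2\) over the unit Euclidean ball with \(a\) outside the ball, so that \(x^∗ = a/\|a\|\). It samples points \(\bar{x}\) of the ball and tests \(−\nabla f(x^∗)^T(\bar{x} − x^∗) ≤ 0\). The variable names are illustrative.

```python
import numpy as np

# Assumed toy instance (not from the lecture): f(x) = 0.5*||x - a||^2 on the unit ball.
a = np.array([3.0, 4.0])
x_star = a / np.linalg.norm(a)       # minimizer: projection of a onto the ball
g = a - x_star                       # g = -grad f(x*)

# Sample points of the ball: points outside are rescaled onto the boundary.
rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 2))
samples /= np.maximum(1.0, np.linalg.norm(samples, axis=1, keepdims=True))

# Normal cone test: g^T (x_bar - x*) <= 0 for every sampled x_bar.
max_inner = np.max((samples - x_star) @ g)
print(max_inner <= 1e-12)
```

Sampling can only fail to find a violation, so this is a check and not a proof.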

General set constraint

Definition 3 (Tangent direction)
A direction \(d \in \mathbb{R}^n\) is tangent to \(Ω ⊂ \mathbb{R}^n\) at a point \(x \in Ω\) if there exist a sequence of points \(\{x_k\} \subset Ω\) and positive scalars \(\{τ_k\}\) such that \(\displaystyle\lim_{k\to\infty} \tau_k = 0\) and \(d = \displaystyle\lim_{k\to\infty} \frac{1}{\tau_k} (x_k - x)\).

Definition 4 (Tangent cone)
The tangent cone corresponding to a set \(Ω ⊂ \mathbb{R}^n\) at \(x \in Ω\) is

\[\mathcal{T}_\Omega(x) := \{d | \text{$d$ is tangent to $Ω$ at $x$}\} \]

Theorem 5
If \(x^∗\) is a minimizer of \(f\) in \(\Omega\), then \(\nabla f(x^∗)^T d ≥ 0, \forall d \in\mathcal{T}_\Omega(x^∗)\).

For a set \(C \subset\mathbb{R}^n\), the polar cone of \(C\) is the set \(C^o = \{y \in\mathbb{R}^n | y^T x ≤ 0, \forall x\in C\}\).

For a convex set \(Ω\), the normal cone \(\mathcal{N}_\Omega(x)\) is precisely the polar of \(\mathcal{T}_\Omega(x)\), meaning:
\(\mathcal{N}_\Omega(x) = \mathcal{T}_\Omega(x)^o = \{v\in\mathbb{R}^n | v^T d ≤ 0 , \forall d \in \mathcal{T}_\Omega(x)\}.\)
In this case, \(\nabla f(x^∗)^T d ≥ 0, \forall d \in\mathcal{T}_\Omega(x^∗) \iff −\nabla f(x^∗) \in \mathcal{N}_\Omega(x^∗)\).

Gradient Projection Method
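A minimal sketch of the usual projected-gradient iteration \(x_{k+1} = P_Ω(x_k − α\nabla f(x_k))\), where \(P_Ω\) is the Euclidean projection onto \(Ω\). It assumes a fixed step size and a box constraint, whose projection is a coordinate-wise clip. The problem data and the names `project_box` and `gradient_projection` are illustrative, not from the lecture.

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box {x : lo <= x <= hi}."""
    return np.clip(x, lo, hi)

def gradient_projection(grad, x0, lo, hi, step=0.1, iters=500):
    """Fixed-step projected gradient: x <- P(x - step * grad(x))."""
    x = project_box(np.asarray(x0, dtype=float), lo, hi)
    for _ in range(iters):
        x = project_box(x - step * grad(x), lo, hi)
    return x

# Assumed example: minimize 0.5*||x - a||^2 over [0, 1]^2 with a = (2, -1).
a = np.array([2.0, -1.0])
x_gp = gradient_projection(lambda x: x - a, x0=[0.5, 0.5], lo=0.0, hi=1.0)
print(x_gp)
```

For this instance the minimizer is the projection of \(a\) onto the box, namely \((1, 0)\). A line search along the projection arc is a common alternative to the fixed step.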

Frank-Wolfe Algorithm
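A minimal sketch of the usual Frank-Wolfe iteration: solve the linear subproblem \(s_k = \arg\min_{s\inΩ} \nabla f(x_k)^T s\), then set \(x_{k+1} = x_k + γ_k (s_k − x_k)\) with \(γ_k = 2/(k+2)\). It assumes \(Ω\) is the probability simplex, where the linear subproblem is solved by a vertex. The objective and the name `frank_wolfe_simplex` are illustrative, not from the lecture.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, iters=2000):
    """Frank-Wolfe on the probability simplex with step size 2/(k+2)."""
    x = np.asarray(x0, dtype=float)
    for k in range(iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0          # linear subproblem: best vertex of the simplex
        gamma = 2.0 / (k + 2.0)
        x = x + gamma * (s - x)        # convex combination, so x stays feasible
    return x

# Assumed example: minimize 0.5*||x - b||^2 over the simplex, with b inside it.
b = np.array([0.2, 0.3, 0.5])
x_fw = frank_wolfe_simplex(lambda x: x - b, x0=[1.0, 0.0, 0.0])
print(np.linalg.norm(x_fw - b))
```

No projection is needed: every iterate is a convex combination of feasible points, which is the main appeal of the method when projections are expensive.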

Lec 13: KKT and CQ

\[\begin{aligned} \min_{x}~ & f(x) \\ \text{s.t. }~ & c_i(x) = 0 , i\in\mathcal{E} \\ & c_i(x) \leq 0 , i\in\mathcal{I} \end{aligned} \]

It is difficult to describe \(\mathcal{T}_\Omega(x)\) directly when the feasible set is given by constraint functions:

\[Ω := \{ x\in\mathbb{R}^n | c_i(x) = 0 , i\in\mathcal{E}; c_i(x) \leq 0 , i\in\mathcal{I} \} \]

KKT conditions

Under some conditions, an optimal solution satisfies the KKT conditions:

\[\begin{aligned} \nabla_x L(x, λ) = \nabla f(x) + \sum_{i\in \mathcal{E}\cup\mathcal{I}} λ_i \nabla c_i(x) = 0 ,& &\text{(Stationarity)}\\ c_i(x) = 0 ,& i\in\mathcal{E} &\text{(Primal feasibility)} \\ c_i(x) \leq 0 ,& i\in\mathcal{I} &\text{(Primal feasibility)} \\ λ_i \geq 0 ,& i\in\mathcal{I} &\text{(Dual feasibility)} \\ λ_i c_i(x) = 0 ,& i\in\mathcal{I} &\text{(Complementary slackness)} \end{aligned} \]
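The four conditions can be verified numerically at a candidate pair \((x, λ)\). The sketch below assumes a small toy problem that is not from the lecture, \(\min (x_1−2)^2 + (x_2−1)^2\) subject to \(x_1 + x_2 − 2 ≤ 0\), whose solution is \(x = (1.5, 0.5)\) with multiplier \(λ = 1\). The helper name `kkt_holds` and its argument layout are hypothetical.

```python
import numpy as np

def kkt_holds(grad_f, cons, con_grads, x, lam, eq_idx, ineq_idx, tol=1e-8):
    """Check the four KKT conditions at (x, lam) up to a tolerance."""
    eq = np.array(eq_idx, dtype=int)
    ineq = np.array(ineq_idx, dtype=int)
    c = np.array([ci(x) for ci in cons])            # constraint values c_i(x)
    J = np.array([gi(x) for gi in con_grads])       # row i is grad c_i(x)
    stationarity = np.linalg.norm(grad_f(x) + J.T @ lam) <= tol
    primal = np.all(np.abs(c[eq]) <= tol) and np.all(c[ineq] <= tol)
    dual = np.all(lam[ineq] >= -tol)
    slackness = np.all(np.abs(lam[ineq] * c[ineq]) <= tol)
    return bool(stationarity and primal and dual and slackness)

# Assumed toy problem: min (x1-2)^2 + (x2-1)^2  s.t.  x1 + x2 - 2 <= 0.
grad_f = lambda x: 2.0 * (x - np.array([2.0, 1.0]))
cons = [lambda x: x[0] + x[1] - 2.0]
con_grads = [lambda x: np.array([1.0, 1.0])]

x_star = np.array([1.5, 0.5])
kkt_ok = kkt_holds(grad_f, cons, con_grads, x_star, np.array([1.0]), [], [0])
kkt_bad = kkt_holds(grad_f, cons, con_grads, x_star, np.array([0.0]), [], [0])
print(kkt_ok, kkt_bad)
```

With \(λ = 0\) the check fails on stationarity, since \(\nabla f(x) = (−1, −1) \neq 0\) at this point.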

Let \(\mathcal{A}(x) := \{i\in\mathcal{I} | c_i(x) = 0\}\) be the set of active inequality constraints at \(x\). Since complementary slackness forces \(λ_i = 0\) for inactive constraints, under some conditions an optimal solution is feasible and satisfies:

\[\begin{aligned} \nabla_x L(x, λ) = \nabla f(x) + \sum_{i\in \mathcal{E}\cup\mathcal{A}(x)} λ_i \nabla c_i(x) &= 0 \\ λ_i &\geq 0 ,& i\in\mathcal{A}(x) \end{aligned} \]

By Farkas’s lemma, such multipliers exist if and only if there is no solution \(d\) to

\[(\nabla f(x)^T d < 0) ~\land~ (\nabla c_i(x)^T d \leq 0, i\in\mathcal{A}(x)) ~\land~ (\nabla c_i(x)^T d = 0, i\in\mathcal{E}) \]

Equivalently, under some conditions, an optimal solution is feasible and satisfies

\[\begin{gathered} \nabla f(x)^T d \geq 0, \forall d \in\mathcal{F}_\Omega (x) \\ \mathcal{F}_\Omega (x) := \{ d\in\mathbb{R}^n | \nabla c_i(x)^T d \leq 0, i\in\mathcal{A}(x);\nabla c_i(x)^T d = 0, i\in\mathcal{E} \} \\ \end{gathered} \]

In general, \(\mathcal{T}_\Omega (x) ⊂ \mathcal{F}_\Omega (x)\). Constraint qualifications are conditions under which the two cones coincide.
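A one-dimensional example (an illustration chosen here, not taken from the lecture) shows that the inclusion can be strict. Take a single inequality constraint \(c(x) = x^2 \leq 0\) on \(\mathbb{R}\), so that \(Ω = \{0\}\) and the constraint is active at \(x = 0\) with \(\nabla c(0) = 0\). Then

\[\mathcal{T}_\Omega(0) = \{0\}, \qquad \mathcal{F}_\Omega(0) = \{d\in\mathbb{R} | 0\cdot d \leq 0\} = \mathbb{R} \]

For \(f(x) = x\), the point \(x = 0\) is the minimizer, yet stationarity would require \(1 + λ\cdot 0 = 0\), so no KKT multiplier exists.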

Constraint Qualification

LICQ

The linear independence constraint qualification (LICQ) is said to hold at \(x\) if the set
of active constraint gradients \(\{\nabla c_i(x) | i \in\mathcal{E}\cup\mathcal{A}(x)\}\) is linearly independent.
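Numerically, LICQ amounts to a rank test: stack the active constraint gradients as rows of a matrix and check that it has full row rank. The sketch below is one way to do this with NumPy; the helper name `licq_holds` and the two test cases are assumptions made for illustration.

```python
import numpy as np

def licq_holds(active_grads, tol=1e-10):
    """LICQ holds iff the stacked active gradients have full row rank."""
    if len(active_grads) == 0:
        return True                                  # no active constraints
    J = np.atleast_2d(np.array(active_grads, dtype=float))
    return bool(np.linalg.matrix_rank(J, tol=tol) == J.shape[0])

# Assumed case A: active gradients (1, 0) and (0, 1) are independent.
licq_a = licq_holds([[1.0, 0.0], [0.0, 1.0]])

# Assumed case B: c1(x) = x2 - x1^3 <= 0 and c2(x) = -x2 <= 0 at x = (0, 0),
# with gradients (0, 1) and (0, -1), which are linearly dependent.
licq_b = licq_holds([[0.0, 1.0], [0.0, -1.0]])
print(licq_a, licq_b)
```

The tolerance matters in practice: nearly dependent gradients pass an exact test but still lead to very large multipliers.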

Affine CQ

  • If all active constraints are affine, then a local minimizer \(x^*\) is necessarily a KKT point.
  • Quadratic programs, whose constraints are affine, are therefore also covered.

Weak Slater condition

The weak Slater condition is satisfied if there exists a feasible point strictly satisfying all non-affine inequalities, i.e., \(\exists x\) such that

\[c_i(x) = 0, i \in \mathcal{E};~ c_i(x) ≤ 0, i \in\mathcal{I}(\text{affine});~ c_i(x) < 0, i \in\mathcal{I}(\text{non-affine}) \]

  • The weak condition implies the existence of a nonempty, closed, convex set \(Λ^∗\) such that for all \(λ^∗ \in Λ^∗\), the point \((x^∗, λ^∗)\) satisfies the KKT conditions, where \(x^∗\) is an optimal solution.

Strong Slater condition

The strong Slater condition is satisfied if

\[\nabla c_i(x), i\in\mathcal{E}\text{ are linearly independent} \]

and there exists a feasible point strictly satisfying all inequalities, i.e., \(\exists x\) such that

\[c_i(x) = 0, i \in \mathcal{E}; c_i(x) < 0, i \in \mathcal{I} \]

  • The strong condition implies the existence of such a \(Λ^∗\) that is bounded.

MFCQ

Duality

Let \(Λ = \{λ : λ_i \geq 0, i\in\mathcal{I}\}\).

The primal problem is to find

\[\min_x L_P (x) = \min_x \sup_{\lambda\in\Lambda} L(x,\lambda) \]

The dual problem is to find

\[\max_{\lambda\in\Lambda} L_D (\lambda) = \max_{\lambda\in\Lambda} \inf_x L(x,\lambda) \]

The dual function \(L_D(λ)\) is always concave, so that the dual problem is a convex problem!

Theorem 6 (Weak duality)
For every \(x\) and \(λ\in Λ\), we have \(L_D(λ) ≤ L_P(x)\).
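Weak duality can be checked on a grid for a small problem. The sketch below assumes the toy problem \(\min x^2\) subject to \(1 − x ≤ 0\), which is not from the lecture. Its Lagrangian is \(L(x, λ) = x^2 + λ(1 − x)\), so \(L_D(λ) = λ − λ^2/4\) (the infimum is attained at \(x = λ/2\)) and \(L_P(x) = x^2\) for \(x ≥ 1\), \(+\infty\) otherwise. The small feasibility tolerance guards against grid rounding.

```python
import numpy as np

def L_D(lam):
    """Dual function: inf_x x^2 + lam*(1 - x), attained at x = lam/2."""
    return lam - lam**2 / 4.0

def L_P(x, tol=1e-9):
    """Primal function: sup over lam >= 0 of x^2 + lam*(1 - x)."""
    return x**2 if x >= 1.0 - tol else np.inf

lams = np.linspace(0.0, 5.0, 501)    # grid of dual feasible multipliers
xs = np.linspace(-2.0, 3.0, 501)     # grid of primal points, feasible or not

weak_duality = all(L_D(l) <= L_P(x) + 1e-9 for l in lams for x in xs)
dual_best = max(L_D(l) for l in lams)
primal_best = min(L_P(x) for x in xs)
print(weak_duality, dual_best, primal_best)
```

Here both optimal values are 1, attained at \(λ = 2\) and \(x = 1\), so this convex instance has no duality gap.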

Definition 7 (Saddle points)
A point \((x^∗, λ^∗)\) with \(λ^∗ \in Λ\) is called a saddle point of the Lagrangian if for all \(x\in\mathbb{R}^n\) and \(λ\in Λ\) we have \(L(x^∗, λ) ≤ L(x^∗, λ^∗) ≤ L(x, λ^∗)\).

Theorem 8
If the Lagrangian has a saddle point \((x^∗, λ^∗)\), then \(x^∗\) is a solution of the primal problem, \(λ^∗ \in Λ\) is a solution of the dual problem, and strong duality holds:

\[\max_{\lambda\in\Lambda} L_D(\lambda) = \min_x L_P(x) \]

For nonconvex cases,

  • A saddle point of the Lagrangian does not by itself guarantee the existence of KKT points (a CQ is needed).
  • A KKT point need not be a saddle point of the Lagrangian (it may be only a local optimum, a saddle point of the primal problem, or even a maximizer).

Theorem 9
If \(f\) is convex, \(c_i(x) , i \in \mathcal{E}\) are affine, and \(c_i(x) , i \in \mathcal{I}\) are convex, then a point \(x^∗\) satisfies the KKT conditions with \(λ^∗\in Λ\) if and only if \((x^∗, λ^∗)\) is a saddle point of the Lagrangian.

Second-order conditions

Theorem 10
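Suppose \(x^∗\) is a local solution at which the LICQ holds, and let \(λ^∗\) be the corresponding multipliers such that \((x^∗, λ^∗)\) is a KKT point. Then,

\[d^T \nabla^2_{xx} L(x^∗, λ^∗)d \geq 0 , \forall d \in C(x^∗, λ^∗). \]

Here, \(\nabla^2_{xx} L(x, λ)\) is the Hessian of the Lagrangian with respect to \(x\):

\[\nabla^2_{xx} L(x,\lambda) = \nabla^2 f(x) + \sum_{i\in\mathcal{E}\cup\mathcal{I}} \lambda_i\nabla^2 c_i(x) \]

and \(C(x^∗, λ^∗)\) is the critical cone: the directions \(d \in \mathcal{F}_\Omega(x^∗)\) with \(\nabla c_i(x^∗)^T d = 0\) for every \(i\in\mathcal{A}(x^∗)\) with \(λ_i^∗ > 0\).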

Theorem 11
Suppose \(x^∗\) is a feasible point for which there is a Lagrange multiplier vector \(λ^∗\) such that the KKT conditions hold. Suppose also that

\[d^T \nabla^2_{xx} L(x^∗, λ^∗)d > 0 , \forall d \in C(x^∗, λ^∗)\setminus\{0\}. \]

Then, \(x^∗\) is a strict local solution.
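When only equality constraints are present (or strict complementarity holds), the critical cone is the null space of the active constraint gradients, and the sufficient condition reduces to positive definiteness of the reduced Hessian \(Z^T \nabla^2_{xx} L\, Z\), where the columns of \(Z\) span that null space. The sketch below makes this assumption and uses a toy problem that is not from the lecture: \(\min x_1^2 − x_2^2\) subject to \(x_2 = 0\), with KKT point \(x^∗ = 0\), \(λ^∗ = 0\). The helper name `reduced_hessian_is_pd` is hypothetical.

```python
import numpy as np

def reduced_hessian_is_pd(H, A, tol=1e-10):
    """Check d^T H d > 0 for all nonzero d in null(A); rows of A are active gradients."""
    _, s, Vt = np.linalg.svd(A)                  # full SVD: Vt is n-by-n
    rank = int(np.sum(s > tol))
    Z = Vt[rank:].T                              # columns form a basis of null(A)
    if Z.shape[1] == 0:
        return True                              # null space is {0}: nothing to check
    return bool(np.min(np.linalg.eigvalsh(Z.T @ H @ Z)) > tol)

# Hessian of the Lagrangian: the constraint is linear, so it equals the Hessian of f.
H = np.diag([2.0, -2.0])

# Constraint x2 = 0: critical directions are d = (d1, 0), curvature 2*d1^2 > 0.
sosc_ok = reduced_hessian_is_pd(H, np.array([[0.0, 1.0]]))

# Assumed contrast: constraint x1 = 0 leaves d = (0, d2), curvature -2*d2^2 < 0.
sosc_bad = reduced_hessian_is_pd(H, np.array([[1.0, 0.0]]))
print(sosc_ok, sosc_bad)
```

The first case shows that the Hessian of the Lagrangian may be indefinite on \(\mathbb{R}^n\) as long as it is positive definite on the critical cone.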

posted @ 2025-01-02 00:38  Coinred