[SI152 Notes] Part 5: Fundamentals of Constrained Nonlinear Programming

SI152: Numerical Optimization

Lec 12: Gradient Projection and Frank-Wolfe algorithms

The goal is to minimize $f$ over a feasible set $Ω$.

Convex set constraint

Definition 1 (Normal cone)
Given a nonempty convex $Ω ⊂ \mathbb{R}^n$ and $x\inΩ$, the normal cone of $Ω$ at $x$ is

\[\mathcal{N}_\Omega(x) := \{g | g^T(\bar{x} − x) ≤ 0 ,\forall \bar{x} \in Ω\} \]

Theorem 2
If $x^∗$ is a minimizer of $f$ in $Ω$, then $−\nabla f(x^∗) \in \mathcal{N}_\Omega(x^∗)$.

General set constraint

Definition 3 (Tangent direction)
A direction $d \in \mathbb{R}^n$ is tangent to $Ω ⊂ \mathbb{R}^n$ at a point $x \in Ω$ if there exist a sequence of points $\{x_k\} ⊂ Ω$ and positive scalars $\{τ_k\}$ such that $\displaystyle\lim_{k\to\infty} τ_k = 0$ and $d = \displaystyle\lim_{k\to\infty} \frac{1}{\tau_k} (x_k - x)$.

Definition 4 (Tangent cone)
The tangent cone corresponding to a set $Ω ⊂ \mathbb{R}^n$ at $x \in Ω$ is

\[\mathcal{T}_\Omega(x) := \{d | \text{$d$ is tangent to $Ω$ at $x$}\} \]

Theorem 5
If $x^∗$ is a minimizer of $f$ in $\Omega$, then $\nabla f(x^∗)^T d ≥ 0, \forall d \in\mathcal{T}_\Omega(x^∗)$.

For a set $C \subset\mathbb{R}^n$, the polar cone of $C$ is the set $C^o = \{y \in\mathbb{R}^n | y^T x ≤ 0, \forall x\in C\}$.

For a convex set $Ω$, the normal cone $\mathcal{N}_\Omega(x)$ is precisely the polar of the tangent cone $\mathcal{T}_\Omega(x)$:
$\mathcal{N}_\Omega(x) = \mathcal{T}_\Omega(x)^o = \{v\in\mathbb{R}^n | v^T d ≤ 0 , \forall d \in \mathcal{T}_\Omega(x)\}.$
In this case, $\nabla f(x^∗)^T d ≥ 0, \forall d \in\mathcal{T}_\Omega(x^∗) \iff −\nabla f(x^∗) \in \mathcal{N}_\Omega(x^∗)$.

Gradient Projection Method
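
The notes leave this section empty. As a hedged illustration only, the sketch below assumes the standard iteration $x_{k+1} = P_Ω(x_k − α\nabla f(x_k))$ with a fixed step size $α$ (the lecture may use a different step rule). The box $Ω = [0,1]^2$, the quadratic objective, and the names `project_box` and `projected_gradient` are hypothetical choices for this example. At the computed point it also checks the condition $−\nabla f(x^∗) \in \mathcal{N}_\Omega(x^∗)$ from Theorem 2; because $g^T(\bar{x} − x^∗)$ is linear in $\bar{x}$, testing the four corners of the box is enough.

```python
import numpy as np

def project_box(x, lower, upper):
    """Euclidean projection onto the box {x : lower <= x <= upper}."""
    return np.clip(x, lower, upper)

def projected_gradient(grad, x0, lower, upper, step=0.1, max_iter=1000, tol=1e-10):
    """Fixed-step gradient projection (assumed variant, not taken from the notes)."""
    x = project_box(np.asarray(x0, dtype=float), lower, upper)
    for _ in range(max_iter):
        x_new = project_box(x - step * grad(x), lower, upper)
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

# Hypothetical test problem: f(x) = ||x - target||^2 over the box [0, 1]^2.
target = np.array([2.0, -0.5])
grad = lambda x: 2.0 * (x - target)
x_star = projected_gradient(grad, x0=[0.5, 0.5], lower=0.0, upper=1.0)

# Normal cone check: g^T (x_bar - x_star) <= 0 at every corner of the box.
corners = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
g = -grad(x_star)
in_normal_cone = bool(np.all((corners - x_star) @ g <= 1e-9))
```

Here the unconstrained minimizer `target` lies outside the box, so the iterates are pushed onto the boundary and the negative gradient ends up pointing out of the box.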

Frank-Wolfe Algorithm
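
The notes leave this section empty as well. The sketch below is one common form of the Frank-Wolfe method, assuming the open-loop step size $γ_k = 2/(k+2)$ and the probability simplex as feasible set, where the linear subproblem $\min_{s\inΩ} \nabla f(x_k)^T s$ is solved by picking a vertex. The objective, the stopping rule based on the Frank-Wolfe gap, and the name `frank_wolfe_simplex` are assumptions made for this example, not the lecture's.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, max_iter=5000, tol=1e-6):
    """Frank-Wolfe over the probability simplex with step 2/(k+2) (assumed variant)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        # Linear minimization oracle: the best vertex e_i has i = argmin_i g_i.
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0
        # Frank-Wolfe gap g^T (x - s); for convex f it upper-bounds f(x) - f*.
        gap = g @ (x - s)
        if gap <= tol:
            break
        gamma = 2.0 / (k + 2.0)
        x = (1.0 - gamma) * x + gamma * s  # convex combination stays feasible
    return x

# Hypothetical test problem: f(x) = ||x - target||^2 with target inside the simplex.
target = np.array([0.2, 0.3, 0.5])
grad_fw = lambda x: 2.0 * (x - target)
x_fw = frank_wolfe_simplex(grad_fw, x0=np.ones(3) / 3.0)
```

Unlike gradient projection, no projection is needed: every iterate is a convex combination of feasible points. The price is a sublinear rate, so the result is only approximately equal to `target`.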

Lec 13: KKT and CQ

\[\begin{aligned} \min_{x}~ & f(x) \\ \text{s.t. }~ & c_i(x) = 0 , i\in\mathcal{E} \\ & c_i(x) \leq 0 , i\in\mathcal{I} \end{aligned} \]

It is difficult to describe $\mathcal{T}_\Omega(x)$ directly when the feasible set is given by constraint functions:

\[Ω := \{ x\in\mathbb{R}^n | c_i(x) = 0 , i\in\mathcal{E}; c_i(x) \leq 0 , i\in\mathcal{I} \} \]

KKT conditions

Under some conditions, an optimal solution satisfies the KKT conditions:

\[\begin{aligned} \nabla_x L(x, λ) = \nabla f(x) + \sum_{i\in \mathcal{E}\cup\mathcal{I}} λ_i \nabla c_i(x) = 0 ,& &\text{(Stationarity)}\\ c_i(x) = 0 ,& i\in\mathcal{E} &\text{(Primal feasibility)} \\ c_i(x) \leq 0 ,& i\in\mathcal{I} &\text{(Primal feasibility)} \\ λ_i \geq 0 ,& i\in\mathcal{I} &\text{(Dual feasibility)} \\ λ_i c_i(x) = 0 ,& i\in\mathcal{I} &\text{(Complementary slackness)} \end{aligned} \]
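
To make the five conditions concrete, here is a minimal numerical check on a hypothetical two-variable problem (not from the lecture): minimize $(x_1-2)^2 + (x_2-1)^2$ subject to $x_1 - x_2 = 0$ and $x_1 + x_2 - 2 \leq 0$. The sketch recovers the multipliers from the stationarity equation by least squares and then tests each condition. This shortcut assumes the constraint gradients are linearly independent, so the multipliers are unique if they exist; the function name `is_kkt_point` is made up for this example.

```python
import numpy as np

def grad_f(x):
    # f(x) = (x1 - 2)^2 + (x2 - 1)^2
    return np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)])

def constraints(x):
    # c_eq(x) = x1 - x2 = 0,  c_ineq(x) = x1 + x2 - 2 <= 0
    return x[0] - x[1], x[0] + x[1] - 2.0

# Constant constraint gradients stored as columns: [grad c_eq, grad c_ineq].
CONSTRAINT_GRADS = np.array([[1.0, 1.0],
                             [-1.0, 1.0]])

def is_kkt_point(x, tol=1e-8):
    """Return (flag, multipliers); assumes linearly independent constraint gradients."""
    x = np.asarray(x, dtype=float)
    c_eq, c_ineq = constraints(x)
    # Solve grad f + sum_i lambda_i grad c_i = 0 for lambda in the least-squares sense.
    lam, *_ = np.linalg.lstsq(CONSTRAINT_GRADS, -grad_f(x), rcond=None)
    lam_ineq = lam[1]
    stationarity = np.linalg.norm(grad_f(x) + CONSTRAINT_GRADS @ lam) <= tol
    primal = abs(c_eq) <= tol and c_ineq <= tol
    dual = lam_ineq >= -tol
    complementarity = abs(lam_ineq * c_ineq) <= tol
    return bool(stationarity and primal and dual and complementarity), lam

ok_at_solution, multipliers = is_kkt_point([1.0, 1.0])
ok_at_origin, _ = is_kkt_point([0.0, 0.0])
```

At $(1,1)$ both constraints are active and both multipliers equal 1. At the feasible point $(0,0)$ the stationarity equation can still be solved, but the inequality is inactive with a nonzero multiplier, so complementary slackness fails.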

Let $\mathcal{A}(x) := \{i\in\mathcal{I} | c_i(x) = 0\}$ be the set of active inequality constraints at $x$. Under some conditions, an optimal solution is feasible and satisfies:

\[\begin{aligned} \nabla_x L(x, λ) = \nabla f(x) + \sum_{i\in \mathcal{E}\cup\mathcal{A}(x)} λ_i \nabla c_i(x) &= 0 \\ λ_i &\geq 0 , \quad i\in\mathcal{A}(x) \end{aligned} \]

By Farkas's Theorem, this is equivalent to the nonexistence of a solution $d$ to

\[(\nabla f(x)^T d < 0) ~\land~ (\nabla c_i(x)^T d \leq 0, i\in\mathcal{A}(x)) ~\land~ (\nabla c_i(x)^T d = 0, i\in\mathcal{E}) \]

Equivalently, under some conditions, an optimal solution is feasible and satisfies

\[\begin{gathered} \nabla f(x)^T d \geq 0, \forall d \in\mathcal{F}_\Omega (x) \\ \mathcal{F}_\Omega (x) := \{ d\in\mathbb{R}^n | \nabla c_i(x)^T d \leq 0, i\in\mathcal{A}(x);\nabla c_i(x)^T d = 0, i\in\mathcal{E} \} \\ \end{gathered} \]

In general, $\mathcal{T}_\Omega (x) ⊂ \mathcal{F}_\Omega (x)$. Constraint qualifications are conditions under which the two cones coincide.
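
A standard one-dimensional example, added here as an illustration and not taken from the lecture, shows that the inclusion can be strict. Take the single inequality constraint $c_1(x) = x^2 \leq 0$ on $\mathbb{R}$, so that $Ω = \{0\}$. The only feasible sequence is constant, while the linearization at $x = 0$ carries no information because the gradient vanishes:

\[\mathcal{T}_\Omega(0) = \{0\}, \qquad \nabla c_1(0) = 0, \qquad \mathcal{F}_\Omega(0) = \{ d\in\mathbb{R} | 0 \cdot d \leq 0 \} = \mathbb{R} \]

Here the set of active constraint gradients is $\{0\}$, which is not linearly independent, so no qualification based on those gradients can hold at this point.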

Constraint Qualification

LICQ

The linear independence constraint qualification (LICQ) is said to hold at $x$ if the set of active constraint gradients $\{\nabla c_i(x) | i \in\mathcal{E}\cup\mathcal{A}(x)\}$ is linearly independent.
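
Numerically, LICQ can be tested by stacking the active constraint gradients as rows of a matrix and comparing its rank with the number of rows. The sketch below does this with `numpy.linalg.matrix_rank`; the helper name `licq_holds`, the tolerance, and the two test points are hypothetical. The example constraints are $c_1(x) = x_1^2 + x_2^2 - 1 \leq 0$ and $c_2(x) = x_1 - 1 \leq 0$.

```python
import numpy as np

def licq_holds(active_gradients, tol=1e-10):
    """True if the rows (active constraint gradients) are linearly independent."""
    G = np.atleast_2d(np.asarray(active_gradients, dtype=float))
    if G.size == 0:
        return True  # no active constraints: LICQ holds trivially
    return bool(np.linalg.matrix_rank(G, tol=tol) == G.shape[0])

# At x = (0, 1) only c_1 is active, with gradient (2*x1, 2*x2) = (0, 2).
licq_at_top = licq_holds([[0.0, 2.0]])

# At x = (1, 0) both are active: grad c_1 = (2, 0) and grad c_2 = (1, 0) are parallel.
licq_at_right = licq_holds([[2.0, 0.0], [1.0, 0.0]])
```

The second point is where the line $x_1 = 1$ touches the unit circle, so the two active gradients are parallel and LICQ fails there.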

Affine CQ

If all active constraints are affine, then a local minimizer $x^*$ is necessarily a KKT point.
In particular, quadratic programs with affine constraints are covered.

Weak Slater condition

The weak Slater condition is satisfied if there exists a feasible point strictly satisfying all non-affine inequalities, i.e., $\exists x$ such that

\[c_i(x) = 0, i \in \mathcal{E};~ c_i(x) ≤ 0, i \in\mathcal{I}(\text{affine});~ c_i(x) < 0, i \in\mathcal{I}(\text{non-affine}) \]

The weak Slater condition implies the existence of a nonempty, closed, convex set $Λ^∗$ such that for all $λ^∗ \in Λ^∗$, the point $(x^∗, λ^∗)$ satisfies the KKT conditions.

Strong Slater condition

The strong Slater condition is satisfied if

\[\nabla c_i(x), i\in\mathcal{E}\text{ are linearly independent} \]

and there exists a feasible point strictly satisfying all inequalities, i.e., $\exists x$ such that

\[c_i(x) = 0, i \in \mathcal{E}; c_i(x) < 0, i \in \mathcal{I} \]

The strong condition implies the existence of such a $Λ^∗$ that is bounded.

MFCQ

Duality

Let $Λ = \{λ : λ_i \geq 0, i\in\mathcal{I}\}$.

The primal problem is to find

\[\min_x L_P (x) = \min_x \sup_{\lambda\in\Lambda} L(x,\lambda) \]

The dual problem is to find

\[\max_{\lambda\in\Lambda} L_D (\lambda) = \max_{\lambda\in\Lambda} \inf_x L(x,\lambda) \]

The dual function $L_D(λ)$ is always concave, so that the dual problem is a convex problem!

Theorem 6 (Weak duality)
For every $x$ and $λ\in Λ$, we have $L_D(λ) ≤ L_P(x)$.

Definition 7 (Saddle points)
A point $(x^∗, λ^∗)$ with $λ^∗ \in Λ$ is called a saddle point of the Lagrangian if for all $x\in\mathbb{R}^n$ and $λ\in Λ$ we have $L(x^∗, λ) ≤ L(x^∗, λ^∗) ≤ L(x, λ^∗)$.

Theorem 8
If the Lagrangian has a saddle point $(x^∗, λ^∗)$, then $x^∗$ is a solution of the primal problem, $λ^∗ \in Λ$ is a solution of the dual problem, and the following duality holds

\[\max_{\lambda\in\Lambda} L_D(\lambda) = \min_x L_P(x) \]
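
As a small sanity check of Theorem 6 and Theorem 8, consider the hypothetical problem $\min x^2$ subject to $1 - x \leq 0$ (chosen for this example, not from the lecture). Its Lagrangian is $L(x, λ) = x^2 + λ(1 - x)$. Setting $\partial L/\partial x = 2x - λ = 0$ gives the inner minimizer $x = λ/2$, hence $L_D(λ) = λ - λ^2/4$, while $L_P(x) = x^2$ for $x \geq 1$ and $+\infty$ otherwise. The sketch evaluates both on grids; the grid ranges and variable names are arbitrary.

```python
import numpy as np

def lagrangian(x, lam):
    # L(x, lam) = f(x) + lam * c(x) with f(x) = x^2 and c(x) = 1 - x <= 0
    return x**2 + lam * (1.0 - x)

def dual_function(lam):
    # inf over x is attained at x = lam / 2 (from dL/dx = 2x - lam = 0)
    return lagrangian(lam / 2.0, lam)

def primal_function(x):
    # sup over lam >= 0 is x^2 when x is feasible and +inf otherwise
    return x**2 if x >= 1.0 else np.inf

lams = np.linspace(0.0, 5.0, 501)
xs = np.linspace(1.0, 3.0, 201)  # feasible points; infeasible ones give +inf
dual_vals = dual_function(lams)
primal_vals = np.array([primal_function(x) for x in xs])

best_dual = dual_vals.max()
best_primal = primal_vals.min()
weak_duality_holds = bool(best_dual <= best_primal + 1e-12)
```

Both optimal values are close to 1, attained near $λ = 2$ and at $x = 1$, and $(x^∗, λ^∗) = (1, 2)$ is a saddle point of this Lagrangian, so the duality gap is zero as Theorem 8 predicts.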

For nonconvex cases,

A saddle point of the Lagrangian does not imply the existence of KKT points (a CQ is needed).
A KKT point need not be a saddle point of the Lagrangian (it may be only a local minimizer, a saddle point of the primal problem, or even a maximizer).

Theorem 9
If $f$ is convex, $c_i(x) , i \in \mathcal{E}$ are affine, and $c_i(x) , i \in \mathcal{I}$ are convex, then a point $x^∗$ satisfies the KKT conditions with $λ^∗\in Λ$ if and only if $(x^∗, λ^∗)$ is a saddle point of the Lagrangian.

Second-order conditions

Theorem 10
Suppose $x^∗$ is a local solution at which the LICQ holds, and let $λ^∗$ be the corresponding multipliers such that $(x^∗, λ^∗)$ is a KKT point. Then,

\[d^T \nabla^2_{xx} L(x^∗, λ^∗)d \geq 0 , \forall d \in C(x^∗, λ^∗). \]

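Here, $C(x^∗, λ^∗)$ is the critical cone, i.e., the directions $d\in\mathcal{F}_\Omega(x^∗)$ with $\nabla c_i(x^∗)^T d = 0$ for every $i\in\mathcal{A}(x^∗)$ with $λ_i^∗ > 0$, and $\nabla^2_{xx} L(x, λ)$ is the Hessian of the Lagrangian with respect to $x$:

\[\nabla^2_{xx} L(x,\lambda) = \nabla^2 f(x) + \sum_{i\in \mathcal{E}\cup\mathcal{I}} \lambda_i\nabla^2 c_i(x) \]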

Theorem 11
Suppose $x^∗$ is a feasible point for which there is a Lagrange multiplier vector $λ^∗$ such that the KKT conditions hold. Suppose also that

\[d^T \nabla^2_{xx} L(x^∗, λ^∗)d > 0 , \forall d \in C(x^∗, λ^∗)\setminus\{0\}. \]

Then, $x^∗$ is a strict local solution.
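
When the critical cone is a subspace, which is the case with equality constraints only or under strict complementarity (an assumption of this sketch), the condition of Theorem 11 reduces to positive definiteness of the reduced Hessian $Z^T \nabla^2_{xx} L\, Z$, where the columns of $Z$ span the null space of the active constraint gradients. The helper names below are hypothetical. The example is $f(x) = x_1^2 - x_2^2$ with the equality constraint $x_2 = 0$ at $x^∗ = (0,0)$, $λ^∗ = 0$.

```python
import numpy as np

def null_space_basis(A, tol=1e-10):
    """Orthonormal basis (as columns) of the null space of A, computed via the SVD."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    _, s, vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return vt[rank:].T

def sufficient_second_order(hess_lagrangian, active_jacobian, tol=1e-10):
    """Check d^T H d > 0 on the null space of the active gradients (rows of the Jacobian)."""
    Z = null_space_basis(active_jacobian)
    if Z.shape[1] == 0:
        return True  # the critical cone is {0}, nothing to check
    reduced = Z.T @ hess_lagrangian @ Z
    return bool(np.all(np.linalg.eigvalsh(reduced) > tol))

active_jacobian = np.array([[0.0, 1.0]])  # gradient of c(x) = x2
hess_good = np.diag([2.0, -2.0])          # Hessian of L for f = x1^2 - x2^2
hess_bad = np.diag([-2.0, 2.0])           # Hessian of L for f = -x1^2 + x2^2

strict_local_min = sufficient_second_order(hess_good, active_jacobian)
fails_condition = sufficient_second_order(hess_bad, active_jacobian)
```

The first Hessian is indefinite on $\mathbb{R}^2$ but positive definite along the feasible direction $(1, 0)$, which is all the theorem requires. In the second case the curvature along that direction is negative, so the test fails.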

posted @ 2025-01-02 00:38 Coinred
