[SI152 Notes] Part 5: Fundamentals of Constrained Nonlinear Programming
SI152: Numerical Optimization
Lec 12: Gradient Projection and Frank-Wolfe algorithms
Goal: minimize \(f\) over a feasible set \(Ω\).
Convex set constraint
Definition 1 (Normal cone)
Given a nonempty convex \(Ω ⊂ \mathbb{R}^n\) and \(x\inΩ\), the normal cone of \(Ω\) at \(x\) is \(\mathcal{N}_\Omega(x) = \{v \in\mathbb{R}^n | v^T (y - x) ≤ 0, \forall y \in Ω\}\).
Theorem 2
If \(x^∗\) is a minimizer of \(f\) in \(Ω\), then \(−\nabla f(x^∗) \in \mathcal{N}_\Omega(x^∗)\).
General set constraint
Definition 3 (Tangent direction)
A direction \(d \in \mathbb{R}^n\) is tangent to \(Ω ⊂ \mathbb{R}^n\) at a point \(x \in Ω\) if there exist a sequence of points \(\{x_k\} ⊂ Ω\) with \(x_k \to x\) and positive scalars \(\{τ_k\}\) such that \(\displaystyle\lim_{k\to\infty} τ_k = 0\) and \(d = \displaystyle\lim_{k\to\infty} \frac{1}{\tau_k} (x_k - x)\).
Definition 4 (Tangent cone)
The tangent cone corresponding to a set \(Ω ⊂ \mathbb{R}^n\) at \(x \in Ω\) is \(\mathcal{T}_\Omega(x) = \{d \in\mathbb{R}^n | d \text{ is tangent to } Ω \text{ at } x\}\).
Theorem 5
If \(x^∗\) is a minimizer of \(f\) in \(\Omega\), then \(\nabla f(x^∗)^T d ≥ 0, \forall d \in\mathcal{T}_\Omega(x^∗)\).
For a set \(C \subset\mathbb{R}^n\), the polar cone of \(C\) is the set \(C^o = \{y \in\mathbb{R}^n | y^T x ≤ 0, \forall x\in C\}\).
For a convex set Ω, the normal cone \(\mathcal{N}_\Omega(x)\) is precisely the polar of \(\mathcal{T}_\Omega(x)\), meaning:
\(\mathcal{N}_\Omega(x) = \mathcal{T}_\Omega(x)^o = \{v\in\mathbb{R}^n | v^T d ≤ 0 , \forall d \in \mathcal{T}_\Omega(x)\}.\)
In this case, \(\nabla f(x^∗)^T d ≥ 0, \forall d \in\mathcal{T}_\Omega(x^∗) \iff −\nabla f(x^∗) \in \mathcal{N}_\Omega(x^∗)\).
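As a quick numerical sanity check of this equivalence, here is a minimal sketch on a hypothetical toy problem. The objective, the box constraint, and all variable names are assumptions chosen for illustration, not taken from the lecture. It samples feasible points \(y\) and checks the inequality \(v^T (y - x^∗) ≤ 0\) for \(v = −\nabla f(x^∗)\), assuming the usual definition of the normal cone of a convex set.

```python
import numpy as np

# Hypothetical toy problem: minimize f(x) = 0.5 * ||x - a||^2 over the box [0, 1]^2.
a = np.array([1.5, -0.3])
lo, hi = np.zeros(2), np.ones(2)

def grad_f(x):
    """Gradient of f(x) = 0.5 * ||x - a||^2."""
    return x - a

# For this particular f, the constrained minimizer is the projection of a onto the box.
x_star = np.clip(a, lo, hi)

# v = -grad f(x*) should satisfy v^T (y - x*) <= 0 for every feasible y.
v = -grad_f(x_star)
rng = np.random.default_rng(0)
ys = rng.uniform(lo, hi, size=(1000, 2))   # random feasible points
max_inner = np.max((ys - x_star) @ v)

print("x* =", x_star)
print("largest sampled v^T (y - x*):", max_inner)
```

Sampling cannot prove the inclusion, but a positive value of `max_inner` would show that the candidate point is not a minimizer.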
Gradient Projection Method
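A common form of the gradient projection iteration is \(x_{k+1} = P_\Omega(x_k - \alpha \nabla f(x_k))\), where \(P_\Omega\) denotes the Euclidean projection onto \(Ω\). The following is a minimal sketch under that assumption, with a constant step size and a box constraint. The function names and the toy problem are hypothetical, and the lecture may well use a line search or a different step rule.

```python
import numpy as np

def projected_gradient(grad_f, project, x0, step, iters=200):
    """Iterate x <- P(x - step * grad_f(x)) for a fixed number of steps."""
    x = project(np.asarray(x0, dtype=float))
    for _ in range(iters):
        x = project(x - step * grad_f(x))
    return x

# Hypothetical toy problem: minimize 0.5 * ||x - a||^2 over the box [0, 1]^2.
a = np.array([1.5, -0.3])
grad = lambda x: x - a                      # gradient of the objective
box = lambda z: np.clip(z, 0.0, 1.0)        # projection onto the box

x_pg = projected_gradient(grad, box, x0=[0.5, 0.5], step=0.5)
print("gradient projection result:", x_pg)
```

The method is practical only when the projection is cheap, as it is for boxes, balls, and similar simple sets.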
Frank-Wolfe Algorithm
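The Frank-Wolfe (conditional gradient) method replaces the projection with a linear minimization over \(Ω\): compute \(s_k \in \arg\min_{s\inΩ} \nabla f(x_k)^T s\) and set \(x_{k+1} = x_k + \gamma_k (s_k - x_k)\). The sketch below assumes the commonly used step size \(\gamma_k = 2/(k+2)\) and takes \(Ω\) to be the unit simplex. The function names and the toy problem are hypothetical choices for illustration.

```python
import numpy as np

def frank_wolfe(grad_f, lmo, x0, iters=2000):
    """Frank-Wolfe with the open-loop step size 2 / (k + 2)."""
    x = np.asarray(x0, dtype=float)
    for k in range(iters):
        s = lmo(grad_f(x))            # vertex minimizing the linearized objective
        gamma = 2.0 / (k + 2.0)
        x = x + gamma * (s - x)       # convex combination, so x stays feasible
    return x

def simplex_lmo(g):
    """Minimize g^T s over the unit simplex: pick the vertex with the smallest g_i."""
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

# Hypothetical toy problem: minimize 0.5 * ||x - a||^2 over the unit simplex in R^3.
a = np.array([0.2, 0.9, -0.4])
x_fw = frank_wolfe(lambda x: x - a, simplex_lmo, x0=np.ones(3) / 3)
print("Frank-Wolfe result:", x_fw)
```

Every iterate is a convex combination of feasible points, so no projection is needed. The price is slow, sublinear convergence in general.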
Lec 13: KKT and CQ
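It is difficult to describe \(\mathcal{T}_\Omega(x)\) directly when \(Ω\) is defined by constraint functions \(c_i\), with equality constraints indexed by \(i \in\mathcal{E}\) and inequality constraints indexed by \(i \in\mathcal{I}\).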
KKT conditions
Under some conditions, an optimal solution satisfies the KKT conditions:
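One common way to write these conditions is given below. It assumes the Lagrangian \(L(x, \lambda) = f(x) + \sum_i \lambda_i c_i(x)\) and inequality constraints written as \(c_i(x) \le 0\). This sign convention is inferred from the set \(Λ\) and the saddle point inequalities later in these notes, so treat it as an assumption rather than the lecture's exact statement.

\[
\begin{aligned}
\nabla f(x) + \sum_{i \in \mathcal{E} \cup \mathcal{I}} \lambda_i \nabla c_i(x) &= 0, \\
c_i(x) &= 0, && i \in \mathcal{E}, \\
c_i(x) &\le 0, && i \in \mathcal{I}, \\
\lambda_i &\ge 0, && i \in \mathcal{I}, \\
\lambda_i c_i(x) &= 0, && i \in \mathcal{I}.
\end{aligned}
\]

The last line (complementarity) forces \(\lambda_i = 0\) for every inactive inequality, which is why only active constraints matter in what follows.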
Let \(\mathcal{A}(x)\) be the set of active constraints at \(x\). Under some conditions, an optimal solution is feasible and satisfies:
Farkas’s Theorem implies that there is no solution \(d\) to
Equivalently, under some conditions, an optimal solution is feasible and satisfies
In general, \(\mathcal{T}_\Omega (x) ⊂ \mathcal{F}_\Omega (x)\). Constraint qualifications are conditions under which the two sets are equal.
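For reference, \(\mathcal{F}_\Omega(x)\) is usually the set of linearized feasible directions. Assuming inequalities are written as \(c_i(x) \le 0\) and \(\mathcal{A}(x)\) indexes the active inequalities (both are assumptions about the lecture's notation), it reads

\[
\mathcal{F}_\Omega(x) = \{ d \in \mathbb{R}^n \mid \nabla c_i(x)^T d = 0,\ i \in \mathcal{E};\ \ \nabla c_i(x)^T d \le 0,\ i \in \mathcal{A}(x) \}.
\]

In this notation, Farkas's theorem says that exactly one of two things holds: either there are multipliers with \(\nabla f(x) + \sum_{i \in \mathcal{E} \cup \mathcal{A}(x)} \lambda_i \nabla c_i(x) = 0\) and \(\lambda_i \ge 0\) for \(i \in \mathcal{A}(x)\), or there is a direction \(d \in \mathcal{F}_\Omega(x)\) with \(\nabla f(x)^T d < 0\).

A standard one-dimensional example shows that the inclusion can be strict. Take \(Ω = \{x \in \mathbb{R} \mid x^2 \le 0\} = \{0\}\). Then \(\mathcal{T}_\Omega(0) = \{0\}\), but \(\nabla c(0) = 0\) gives \(\mathcal{F}_\Omega(0) = \mathbb{R}\). For \(f(x) = x\), the point \(x^* = 0\) is the minimizer, yet \(f'(0) + \lambda c'(0) = 1 \ne 0\) for every \(\lambda\), so no KKT multiplier exists.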
Constraint Qualification
LICQ
The linear independence constraint qualification (LICQ) is said to hold at \(x\) if the set of active constraint gradients \(\{\nabla c_i(x) | i \in\mathcal{E}\cup\mathcal{A}(x)\}\) is linearly independent.
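Numerically, LICQ can be checked by stacking the active constraint gradients as rows of a matrix and testing whether it has full row rank. The helper below is a hypothetical sketch (its name, the tolerance, and the two example constraint sets are assumptions for illustration).

```python
import numpy as np

def licq_holds(active_gradients, tol=1e-10):
    """Return True if the rows (active constraint gradients) are linearly independent."""
    G = np.atleast_2d(np.asarray(active_gradients, dtype=float))
    if G.size == 0:
        return True  # no active constraints: LICQ holds trivially
    return bool(np.linalg.matrix_rank(G, tol=tol) == G.shape[0])

# Example 1: c1(x) = x1^2 + x2^2 - 1 and c2(x) = -x2, both active at x = (1, 0).
# Gradients there: (2, 0) and (0, -1).
ok_case = licq_holds([[2.0, 0.0], [0.0, -1.0]])

# Example 2: c1(x) = x2 - x1^3 and c2(x) = -x2, both active at x = (0, 0).
# Gradients there: (0, 1) and (0, -1), which are linearly dependent.
bad_case = licq_holds([[0.0, 1.0], [0.0, -1.0]])

print(ok_case, bad_case)
```

A rank test with a tolerance is only a heuristic near degeneracy, since nearly dependent gradients may pass or fail depending on `tol`.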
Affine CQ
- If all active constraints are affine, then a local minimizer \(x^*\) is necessarily a KKT point.
- Quadratic problems are also safe.
Weak Slater condition
The weak Slater condition is satisfied if there exists a feasible point strictly satisfying all non-affine inequalities, i.e., \(\exists x\) such that
- The weak condition implies the existence of a nonempty, closed, convex set \(Λ^∗\) such that for all \(λ^∗ \in Λ^∗\), the point \((x^∗, λ^∗)\) satisfies the KKT conditions
Strong Slater condition
The strong Slater condition is satisfied if
and there exists a feasible point strictly satisfying all inequalities, i.e., \(\exists x\) such that
- The strong condition implies the existence of such a \(Λ^∗\) that is bounded.
MFCQ
Duality
Let \(Λ = \{λ : λ_i \geq 0, i\in\mathcal{I}\}\).
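The primal problem is to find \(\displaystyle\min_{x} L_P(x)\), where \(L_P(x) = \displaystyle\sup_{λ\in Λ} L(x, λ)\) and \(L(x, λ)\) is the Lagrangian.
The dual problem is to find \(\displaystyle\max_{λ\in Λ} L_D(λ)\), where \(L_D(λ) = \displaystyle\inf_{x} L(x, λ)\).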
The dual function \(L_D(λ)\) is always concave, so that the dual problem is a convex problem!
Theorem 6 (Weak duality)
For every \(x\) and \(λ\in Λ\), we have \(L_D(λ) ≤ L_P(x)\).
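Weak duality can be seen on a hypothetical one-dimensional problem: minimize \(x^2\) subject to \(1 - x \le 0\). Assuming the Lagrangian \(L(x, λ) = x^2 + λ(1 - x)\) (the sign convention is an assumption), minimizing over \(x\) by hand gives \(x = λ/2\) and \(L_D(λ) = λ - λ^2/4\), while \(L_P(x)\) equals \(x^2\) on the feasible set and \(+\infty\) elsewhere. The sketch below compares the two on small grids.

```python
import numpy as np

def primal(x):
    """L_P(x) = sup over lam >= 0 of L(x, lam): x^2 if feasible, +inf otherwise."""
    return x**2 if 1.0 - x <= 0.0 else np.inf

def dual(lam):
    """L_D(lam) = inf over x of L(x, lam), derived by hand for this toy problem."""
    return lam - lam**2 / 4.0

xs = np.linspace(-3.0, 3.0, 13)     # grid with step 0.5
lams = np.linspace(0.0, 6.0, 13)    # grid with step 0.5, all lam >= 0

# Weak duality: every dual value is a lower bound on every primal value.
weak_duality_ok = all(dual(l) <= primal(x) + 1e-12 for l in lams for x in xs)

best_primal = min(primal(x) for x in xs)
best_dual = max(dual(l) for l in lams)
print(weak_duality_ok, best_primal, best_dual)
```

On this convex problem the best primal and dual values on the grid coincide, which is the zero duality gap one expects when a saddle point exists.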
Definition 7 (Saddle points)
A point \((x^∗, λ^∗)\) with \(λ^∗ \in Λ\) is called a saddle point of the Lagrangian if for all \(x\in\mathbb{R}^n\) and \(λ\in Λ\) we have \(L(x^∗, λ) ≤ L(x^∗, λ^∗) ≤ L(x, λ^∗)\).
Theorem 8
If the Lagrangian has a saddle point \((x^∗, λ^∗)\), then \(x^∗\) is a solution of the primal problem, \(λ^∗ \in Λ\) is a solution of the dual problem, and the following duality holds: \(L_D(λ^∗) = L(x^∗, λ^∗) = L_P(x^∗)\).
For nonconvex cases,
- A saddle point of the Lagrangian does not by itself imply the existence of KKT points (a CQ is needed).
- A KKT point may not be a saddle point of the Lagrangian (it may be only a local minimizer, a saddle point of the primal problem, or even a maximizer).
Theorem 9
If \(f\) is convex, \(c_i(x) , i \in \mathcal{E}\) are affine, and \(c_i(x) , i \in \mathcal{I}\) are convex, then a point \(x^∗\) satisfies the KKT conditions with \(λ^∗\in Λ\) if and only if \((x^∗, λ^∗)\) is a saddle point of the Lagrangian.
Second-order conditions
Theorem 10
Suppose \(x^∗\) is a local solution at which the LICQ holds, and let \(λ^∗\) be the corresponding multipliers such that \((x^∗, λ^∗)\) is a KKT point. Then,
Here, \(\nabla^2_{xx} L(x, λ)\) is the Hessian of the Lagrangian:
Theorem 11
Suppose \(x^∗\) is a feasible point for which there is a Lagrange multiplier vector such that the KKT conditions hold. Suppose also that
Then, \(x^∗\) is a strict local solution.
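For reference, these second-order conditions are commonly stated as follows. The statement assumes the Lagrangian \(L(x, \lambda) = f(x) + \sum_i \lambda_i c_i(x)\) with inequalities written as \(c_i(x) \le 0\). This sign convention and the symbol \(\mathcal{C}\) for the critical cone are assumptions, not necessarily the lecture's notation.

\[
\nabla^2_{xx} L(x, \lambda) = \nabla^2 f(x) + \sum_{i \in \mathcal{E} \cup \mathcal{I}} \lambda_i \nabla^2 c_i(x),
\]

\[
\mathcal{C}(x^*, \lambda^*) = \left\{ d \ \middle|\
\begin{aligned}
&\nabla c_i(x^*)^T d = 0, && i \in \mathcal{E}, \\
&\nabla c_i(x^*)^T d = 0, && i \in \mathcal{A}(x^*) \text{ with } \lambda_i^* > 0, \\
&\nabla c_i(x^*)^T d \le 0, && i \in \mathcal{A}(x^*) \text{ with } \lambda_i^* = 0
\end{aligned}
\right\}.
\]

The necessary condition is \(d^T \nabla^2_{xx} L(x^*, \lambda^*) d \ge 0\) for all \(d \in \mathcal{C}(x^*, \lambda^*)\). The sufficient condition is \(d^T \nabla^2_{xx} L(x^*, \lambda^*) d > 0\) for all nonzero \(d \in \mathcal{C}(x^*, \lambda^*)\).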