SI152: Numerical Optimization
Lec 12: Gradient Projection and Frank-Wolfe algorithms
Goal: optimize \(f\) within a feasible set \(Ω\).
Convex set constraint
Theorem 1 (Normal Cone)
Given a nonempty convex \(Ω ⊂ \mathbb{R}^n\) and \(x\inΩ\), the normal cone of \(Ω\) at \(x\) is \(\mathcal{N}_\Omega(x) = \{v \in\mathbb{R}^n | v^T (y - x) ≤ 0, \forall y \in Ω\}\).
Theorem 2
If \(x^∗\) is a minimizer of \(f\) in \(Ω\), then \(−\nabla f(x^∗) \in \mathcal{N}_\Omega(x^∗)\).
General set constraint
Definition 3 (Tangent direction)
A direction \(d \in \mathbb{R}^n\) is tangent to \(Ω ⊂ \mathbb{R}^n\) at a point \(x \in Ω\) if there exist a sequence of points \(\{x_k\} ⊂ Ω\) and positive scalars \(\{τ_k\}\) such that \(\displaystyle\lim_{k\to\infty} x_k = x\), \(\displaystyle\lim_{k\to\infty} \tau_k = 0\), and \(d = \displaystyle\lim_{k\to\infty} \frac{1}{\tau_k} (x_k - x)\).
Definition 4 (Tangent cone)
The tangent cone corresponding to a set \(Ω ⊂ \mathbb{R}^n\) at \(x \in Ω\) is \(\mathcal{T}_\Omega(x) = \{d \in\mathbb{R}^n | d \text{ is tangent to } Ω \text{ at } x\}\).
Theorem 5
If \(x^∗\) is a minimizer of \(f\) in \(\Omega\), then \(\nabla f(x^∗)^T d ≥ 0, \forall d \in\mathcal{T}_\Omega(x^∗)\).
For a set \(C \subset\mathbb{R}^n\), the polar cone of \(C\) is the set \(C^o = \{y \in\mathbb{R}^n | y^T x ≤ 0, \forall x\in C\}\).
For a convex set Ω, the normal cone \(\mathcal{N}_\Omega(x)\) is precisely the polar of \(\mathcal{T}_\Omega(x)\), meaning:
\(\mathcal{N}_\Omega(x) = \mathcal{T}_\Omega(x)^o = \{v\in\mathbb{R}^n | v^T d ≤ 0 , \forall d \in \mathcal{T}_\Omega(x)\}.\)
In this case, \(\nabla f(x^∗)^T d ≥ 0, \forall d \in\mathcal{T}_\Omega(x^∗) \iff −\nabla f(x^∗) \in \mathcal{N}_\Omega(x^∗)\).
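The normal-cone condition can be checked numerically on a small instance. The sketch below is illustrative only: the box \(Ω = [0,1]^2\), the objective \(f(x) = \frac{1}{2}\|x - c\|^2\), the point \(c\), and the helper names are all assumptions, not part of the lecture. It uses the standard characterization \(v \in \mathcal{N}_\Omega(x)\) iff \(v^T(y - x) ≤ 0\) for all \(y \in Ω\), tested on random samples of \(Ω\).

```python
import random

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n (componentwise clipping)."""
    return [min(max(xi, lo), hi) for xi in x]

def grad_f(x, c):
    """Gradient of the assumed objective f(x) = 0.5 * ||x - c||^2."""
    return [xi - ci for xi, ci in zip(x, c)]

c = [2.0, -0.5]                       # hypothetical target point outside the box
x_star = project_box(c, 0.0, 1.0)     # minimizer of f over the box
v = [-g for g in grad_f(x_star, c)]   # candidate normal vector -grad f(x*)

random.seed(0)
samples = [[random.random(), random.random()] for _ in range(1000)]

# Normal cone test: v^T (y - x*) <= 0 for every sampled y in the box.
in_normal_cone = all(
    sum(vi * (yi - xi) for vi, yi, xi in zip(v, y, x_star)) <= 1e-12
    for y in samples
)
print(x_star, v, in_normal_cone)
```

Sampling can only refute membership in the normal cone, not prove it; here the inequality \(y_1 - 1 - 0.5\,y_2 ≤ 0\) can also be verified by hand for every \(y\) in the box.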
Gradient Projection Method
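A minimal sketch of a gradient projection iteration, assuming the common form \(x_{k+1} = P_\Omega(x_k - α \nabla f(x_k))\) with a fixed step size \(α\). The box constraint, the quadratic objective, the step size, and all function names are illustrative assumptions rather than the lecture's own formulation.

```python
def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n."""
    return [min(max(xi, lo), hi) for xi in x]

def gradient_projection(grad, project, x0, step, max_iter=500, tol=1e-10):
    """Iterate x_{k+1} = P(x_k - step * grad(x_k)) until the iterates stall."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        x_new = project([xi - step * gi for xi, gi in zip(x, g)])
        # Stop when the projected step no longer moves the iterate.
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x

# Assumed test problem: minimize 0.5 * ||x - c||^2 over the box [0, 1]^3.
c = [2.0, -0.5, 0.3]
grad = lambda x: [xi - ci for xi, ci in zip(x, c)]
x = gradient_projection(grad, lambda z: project_box(z, 0.0, 1.0),
                        [0.5, 0.5, 0.5], step=0.5)
print(x)
```

For this objective the constrained minimizer is the projection of \(c\) onto the box, so the iterates should approach \((1, 0, 0.3)\). A fixed step is used only for simplicity; a line search along the projection arc is a common alternative.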
Frank-Wolfe Algorithm
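A minimal sketch of a Frank-Wolfe iteration, assuming the classical form: solve the linear subproblem \(s_k \in \arg\min_{s\in Ω} \nabla f(x_k)^T s\), then set \(x_{k+1} = (1-γ_k) x_k + γ_k s_k\) with the open-loop step \(γ_k = 2/(k+2)\). The probability simplex as feasible set, the quadratic objective, and the function names are illustrative assumptions.

```python
def frank_wolfe_simplex(grad, x0, num_iter=2000):
    """Frank-Wolfe on the probability simplex with step size 2 / (k + 2)."""
    x = list(x0)
    for k in range(num_iter):
        g = grad(x)
        # Linear minimization over the simplex is attained at a vertex e_i,
        # where i is the index of the smallest gradient component.
        i = min(range(len(x)), key=lambda j: g[j])
        gamma = 2.0 / (k + 2.0)
        # Convex combination of the current iterate and the vertex e_i.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x

# Assumed test problem: minimize 0.5 * ||x - c||^2 over the simplex.
c = [0.6, 0.3, -0.2]
grad = lambda x: [xi - ci for xi, ci in zip(x, c)]
x = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0])
print(x)
```

No projection is needed: every iterate is a convex combination of feasible points, so it stays in the simplex. For this instance the minimizer is the projection of \(c\) onto the simplex, \((0.65, 0.35, 0)\), which the iterates approach at the usual sublinear rate.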
Lec 13: KKT and CQ
It is difficult to describe \(\mathcal{T}_\Omega(x)\) with
KKT conditions
Under some conditions, an optimal solution satisfies the KKT conditions:
Let \(\mathcal{A}(x)\) be the set of active constraints at \(x\). Under some conditions, an optimal solution is feasible and satisfies:
Farkas’s Theorem implies that there is no solution \(d\) to
Equivalently, under some conditions, an optimal solution is feasible and satisfies
Generally, \(\mathcal{T}_\Omega (x) ⊂ \mathcal{F}_\Omega (x)\). A constraint qualification is a condition under which the two sets coincide.
Constraint Qualification
The linear independence constraint qualification (LICQ) is said to hold at \(x\) if the set of active constraint gradients \(\{\nabla c_i(x) | i \in\mathcal{E}\cup\mathcal{A}(x)\}\) is linearly independent.
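LICQ amounts to a rank test: the active constraint gradients are linearly independent exactly when the matrix having them as rows has rank equal to the number of rows. The sketch below illustrates this with a small Gaussian-elimination rank routine; the helper names and the two example constraint sets are hypothetical.

```python
def matrix_rank(rows, tol=1e-10):
    """Rank of a small dense matrix via Gaussian elimination with pivoting."""
    a = [list(r) for r in rows]
    n_cols = len(a[0]) if a else 0
    rank = 0
    for col in range(n_cols):
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, len(a)), key=lambda r: abs(a[r][col]),
                    default=None)
        if pivot is None or abs(a[pivot][col]) < tol:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        # Eliminate this column from the rows below the pivot row.
        for r in range(rank + 1, len(a)):
            factor = a[r][col] / a[rank][col]
            a[r] = [arc - factor * apc for arc, apc in zip(a[r], a[rank])]
        rank += 1
        if rank == len(a):
            break
    return rank

def licq_holds(active_gradients):
    """True if the given active constraint gradients are linearly independent."""
    return matrix_rank(active_gradients) == len(active_gradients)

# Assumed example 1: c1 = x1^2 + x2^2 - 1 and c2 = x2, both active at (1, 0).
print(licq_holds([[2.0, 0.0], [0.0, 1.0]]))
# Assumed example 2: two unit circles centered at (1, 0) and (-1, 0),
# both active at the origin, where their gradients are parallel.
print(licq_holds([[-2.0, 0.0], [2.0, 0.0]]))
```

In the second example the feasible set is the single point where the circles touch, a typical situation in which LICQ fails.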
Affine CQ
- If all active constraints are affine, then a local minimizer \(x^*\) is necessarily a KKT point.
- Quadratic programs, whose constraints are all affine, are therefore also covered.
Weak Slater condition
The weak Slater condition is satisfied if there exists a feasible point strictly satisfying all non-affine inequalities, i.e., \(\exists x\) such that
- The weak condition implies the existence of a nonempty, closed, convex set \(Λ^∗\) such that for all \(λ^∗ \in Λ^∗\), the point \((x^∗, λ^∗)\) satisfies the KKT conditions
Strong Slater condition
The strong Slater condition is satisfied if
and there exists a feasible point strictly satisfying all inequalities, i.e., \(\exists x\) such that
- The strong condition implies the existence of such a \(Λ^∗\) that is bounded.
Let \(Λ = \{λ : λ_i \geq 0, i\in\mathcal{I}\}\).
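The primal problem is to find \(\displaystyle\min_{x} L_P(x)\), where \(L_P(x) = \displaystyle\sup_{λ\in Λ} L(x, λ)\).
The dual problem is to find \(\displaystyle\max_{λ\in Λ} L_D(λ)\), where \(L_D(λ) = \displaystyle\inf_{x} L(x, λ)\).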
The dual function \(L_D(λ)\) is always concave, so that the dual problem is a convex problem!
Theorem 6 (Weak duality)
For every \(x\) and \(λ\in Λ\), we have \(L_D(λ) ≤ L_P(x)\).
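Weak duality can be illustrated on a one-dimensional problem. The sketch below makes several assumptions that the notes do not fix: the inequality is written as \(c(x) ≤ 0\), the Lagrangian is \(L(x, λ) = f(x) + λ c(x)\), and the primal and dual functions are \(L_P(x) = \sup_{λ ≥ 0} L(x, λ)\) and \(L_D(λ) = \inf_x L(x, λ)\). For the hypothetical problem \(\min x^2\) subject to \(1 - x ≤ 0\), these evaluate in closed form to \(L_P(x) = x^2\) for \(x ≥ 1\) (and \(+\infty\) otherwise) and \(L_D(λ) = λ - λ^2/4\).

```python
import math

def dual_value(lam):
    """L_D(lam) = inf_x [x^2 + lam * (1 - x)], attained at x = lam / 2."""
    return lam - lam * lam / 4.0

def primal_value(x):
    """L_P(x) = sup_{lam >= 0} [x^2 + lam * (1 - x)]; infinite if x < 1."""
    return x * x if x >= 1.0 else math.inf

lams = [i / 10 for i in range(0, 61)]    # multipliers in [0, 6]
xs = [i / 10 for i in range(-20, 31)]    # points in [-2, 3]

# Weak duality: every dual value lies below every primal value.
weak_duality = all(dual_value(l) <= primal_value(x) for l in lams for x in xs)
best_dual = max(dual_value(l) for l in lams)
best_primal = min(primal_value(x) for x in xs)
print(weak_duality, best_dual, best_primal)
```

On this convex instance the best dual value and the best primal value coincide (at \(λ = 2\) and \(x = 1\)), so there is no duality gap; weak duality alone guarantees only the inequality.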
Definition 7 (Saddle points)
A point \((x^∗, λ^∗)\) with \(λ^∗ \in Λ\) is called a saddle point of the Lagrangian if for all \(x\in\mathbb{R}^n\) and \(λ\in Λ\) we have \(L(x^∗, λ) ≤ L(x^∗, λ^∗) ≤ L(x, λ^∗)\).
Theorem 8
If the Lagrangian has a saddle point \((x^∗, λ^∗)\), then \(x^∗\) is a solution of the primal problem, \(λ^∗ \in Λ\) is a solution of the dual problem, and the following duality holds
For nonconvex cases,
- A saddle point of the Lagrangian does not guarantee the existence of KKT points (a CQ is needed).
- A KKT point may not be a saddle point of the Lagrangian (it may be only a local optimum, a saddle point of the primal problem, or even a maximum).
Theorem 9
If \(f\) is convex, \(c_i(x) , i \in \mathcal{E}\) are affine, and \(c_i(x) , i \in \mathcal{I}\) are convex, then a point \(x^∗\) satisfies the KKT conditions with \(λ^∗\in Λ\) if and only if \((x^∗, λ^∗)\) is a saddle point of the Lagrangian.
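The equivalence can be checked numerically on a small convex problem. The sketch below assumes the sign convention \(c(x) ≤ 0\) with \(L(x, λ) = f(x) + λ c(x)\), which the notes do not fix, and uses the hypothetical problem \(\min x_1^2 + x_2^2\) subject to \(1 - x_1 - x_2 ≤ 0\), whose KKT pair under this convention is \(x^∗ = (0.5, 0.5)\), \(λ^∗ = 1\).

```python
def lagrangian(x, lam):
    """L(x, lam) = x1^2 + x2^2 + lam * (1 - x1 - x2) (assumed convention)."""
    return x[0] ** 2 + x[1] ** 2 + lam * (1.0 - x[0] - x[1])

x_star, lam_star = (0.5, 0.5), 1.0

# KKT conditions: stationarity, feasibility, nonnegativity, complementarity.
grad_L = (2 * x_star[0] - lam_star, 2 * x_star[1] - lam_star)
c_val = 1.0 - x_star[0] - x_star[1]
kkt_ok = (all(abs(g) < 1e-12 for g in grad_L)
          and c_val <= 1e-12
          and lam_star >= 0.0
          and abs(lam_star * c_val) < 1e-12)

# Saddle inequalities L(x*, lam) <= L(x*, lam*) <= L(x, lam*) on a grid.
grid = [i / 4 for i in range(-8, 9)]
L_star = lagrangian(x_star, lam_star)
left_ok = all(lagrangian(x_star, lam) <= L_star + 1e-12
              for lam in grid if lam >= 0.0)
right_ok = all(L_star <= lagrangian((a, b), lam_star) + 1e-12
               for a in grid for b in grid)
print(kkt_ok, left_ok, right_ok)
```

A grid check is only a sanity test. Here both inequalities also follow by hand, since \(L(x^∗, λ) = 0.5\) for every \(λ\) and \(L(x, λ^∗) = (x_1 - 0.5)^2 + (x_2 - 0.5)^2 + 0.5\).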
Second-order conditions
Theorem 10
Suppose \(x^∗\) is a local solution at which the LICQ holds, and let \(λ^∗\) be the corresponding multiplier vector such that \((x^∗, λ^∗)\) is a KKT point. Then,
Here, \(\nabla^2_{xx} L(x, λ)\) is the Hessian of the Lagrangian:
Theorem 11
Suppose \(x^∗\) is a feasible point for which there is a Lagrange multiplier vector such that the KKT conditions hold. Suppose also that
Then, \(x^∗\) is a strict local solution.