Math Notes: Convex Optimization A

1. Introduction

  • mathematical optimization
  • least-squares and linear programming
  • convex optimization
  • example
  • course goals and topics
  • nonlinear optimization
  • brief history of convex optimization

mathematical optimization

optimization problem

minimize  f0(x)

subject to  fi(x) ≤ bi,  i = 1, ..., m

  • x = (x1, ..., xn): optimization variables
  • f0 : R^n → R: objective function
  • fi : R^n → R, i = 1, ..., m: constraint functions

optimal solution x* has smallest value of f0 among all vectors that satisfy the constraints

examples

portfolio optimization

  • variables: amounts invested in different assets
  • constraints: budget, max./min. investment per asset, minimum return
  • objective: overall risk or return variance

device sizing in electronic circuits

  • variables: device widths and lengths
  • constraints: manufacturing limits, timing requirements, maximum area
  • objective: power consumption

data fitting

  • variables: model parameters
  • constraints: prior information, parameter limits
  • objective: measure of misfit or prediction error

solving optimization problems

general optimization problem

  • very difficult to solve
  • methods involve some compromise, e.g., very long computation time or not always finding the solution

exceptions: certain problem classes can be solved efficiently and reliably

  • least-squares problems
  • linear programming problems
  • convex optimization problems

least-squares

minimize  ‖Ax − b‖₂²

solving least-squares problems

  • analytical solution: x* = (AᵀA)⁻¹Aᵀb
  • reliable and efficient algorithms and software
  • computation time proportional to n²k (A ∈ R^{k×n}); less if structured
  • a mature technology
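The analytical solution above can be checked numerically. A minimal sketch with synthetic data (the sizes are arbitrary); note that in practice `numpy.linalg.lstsq` is preferred over forming (AᵀA)⁻¹ explicitly, since it uses an orthogonal factorization instead of the normal equations:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))   # A in R^{k x n}, k = 100, n = 5
b = rng.standard_normal(100)

# analytical solution x* = (A^T A)^{-1} A^T b (assumes A has full column rank)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# library solver (numerically safer than the normal equations)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))  # True: both solve the same problem
```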

using least-squares

  • least-squares problems are easy to recognize
  • a few standard techniques increase flexibility (e.g., including weights, adding regularization terms)
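Both techniques keep the problem a least-squares problem. A hedged sketch (weights wi and a ridge/Tikhonov term with hypothetical parameter lam; data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 4))
b = rng.standard_normal(50)

# weighted least-squares: minimize sum_i w_i (a_i^T x - b_i)^2,
# i.e., ordinary least-squares on rows scaled by sqrt(w_i)
w = rng.uniform(0.5, 2.0, size=50)
sw = np.sqrt(w)
x_wls, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)

# regularized least-squares: minimize ||Ax - b||_2^2 + lam ||x||_2^2,
# solved by stacking [A; sqrt(lam) I] against [b; 0]
lam = 0.1
A_aug = np.vstack([A, np.sqrt(lam) * np.eye(4)])
b_aug = np.concatenate([b, np.zeros(4)])
x_reg, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)

# regularization shrinks the solution toward zero
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_reg) <= np.linalg.norm(x_ls))  # True
```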

linear programming problem

minimize  cᵀx

subject to  aiᵀx ≤ bi,  i = 1, ..., m

solving linear programs

  • no analytical formula for solution
  • reliable and efficient algorithms and software
  • computation time proportional to n²m if m ≥ n; less with structure
  • a mature technology

using linear programming

  • not as easy to recognize as least-squares problems
  • a few standard tricks used to convert problems into linear programs (e.g., problems involving ℓ₁ or ℓ∞ norms, piecewise-linear functions)
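One such standard trick written out: minimizing the ℓ∞ norm of an affine expression reduces to an LP by introducing an epigraph variable t (this is the generic reformulation, not taken verbatim from the notes):

```latex
\min_x \; \|Ax - b\|_\infty
\quad\Longleftrightarrow\quad
\begin{array}{ll}
\min_{x,\,t} & t \\
\text{s.t.} & a_i^T x - b_i \le t, \quad i = 1,\dots,m \\
            & -(a_i^T x - b_i) \le t, \quad i = 1,\dots,m
\end{array}
```

At the optimum t equals the largest absolute residual, so the two problems have the same minimizers x.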

convex optimization problem

minimize  f0(x)

subject to  fi(x) ≤ bi,  i = 1, ..., m

  • objective and constraint functions are convex:

fi(αx + βy) ≤ αfi(x) + βfi(y)

if α + β = 1, α ≥ 0, β ≥ 0

  • includes least-squares problems and linear programs as special cases
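The defining inequality can be spot-checked numerically. A small sketch (a necessary, not sufficient, check) testing scalar functions against the definition over random pairs and a grid of θ:

```python
import numpy as np

def is_convex_on_samples(f, n_pairs=200, seed=0):
    """Check f(th*x + (1-th)*y) <= th*f(x) + (1-th)*f(y) on random samples.

    Passing is necessary but not sufficient for convexity.
    """
    rng = np.random.default_rng(seed)
    thetas = np.linspace(0.0, 1.0, 11)
    for _ in range(n_pairs):
        x, y = rng.uniform(-10, 10, size=2)
        for th in thetas:
            lhs = f(th * x + (1 - th) * y)
            rhs = th * f(x) + (1 - th) * f(y)
            if lhs > rhs + 1e-9:
                return False
    return True

print(is_convex_on_samples(lambda x: x ** 2))   # True: x^2 is convex
print(is_convex_on_samples(np.sin))             # False: sin is not convex
```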

solving convex optimization problem

  • no analytic solution
  • reliable and efficient algorithms
  • computation time (roughly) proportional to max{n³, n²m, nm², F}, where F is the cost of evaluating the fi's and their first and second derivatives
  • almost a technology

using convex optimization

  • often difficult to recognize
  • many tricks for transforming problems into convex form
  • surprisingly many problems can be solved via convex optimization

example

m lamps illuminating n (small, flat) patches
intensity Ik at patch k depends linearly on lamp powers pj:

Ik = Σ_{j=1}^m akj pj,       akj = rkj⁻² max{cos θkj, 0}

problem: achieve desired illumination Ides with bounded lamp powers

minimize  max_{k=1,...,n} |log Ik − log Ides|

subject to  0 ≤ pj ≤ pmax,  j = 1, ..., m

solution

  1. use uniform power: pj = p, vary p
  2. use least-squares:

minimize  Σ_{k=1}^n (Ik − Ides)²

round pj if pj > pmax or pj < 0

  3. use weighted least-squares:

minimize  Σ_{k=1}^n (Ik − Ides)² + Σ_{j=1}^m wj (pj − pmax/2)²

iteratively adjust weights wj until 0 ≤ pj ≤ pmax

  4. use linear programming:

minimize  max_{k=1,...,n} |Ik − Ides|

subject to  0 ≤ pj ≤ pmax,  j = 1, ..., m

which can be solved via linear programming

  5. use convex optimization: problem is equivalent to

minimize  f0(p) = max_{k=1,...,n} h(Ik/Ides)

subject to  0 ≤ pj ≤ pmax,  j = 1, ..., m

with h(u) = max{u, 1/u}
f0 is convex because the maximum of convex functions is convex
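Method 2 (least-squares, then clipping) takes only a few lines. In this sketch the geometry, and hence the coefficient matrix akj, is made up for illustration; only the nonnegativity of the coefficients reflects the model above:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 10, 20                # m lamps, n patches (hypothetical sizes)
I_des, p_max = 1.0, 1.0

# hypothetical illumination matrix: a_kj >= 0, I = A @ p
A = rng.uniform(0.0, 0.3, size=(n, m))

# least-squares: minimize sum_k (I_k - I_des)^2 = ||A p - I_des 1||_2^2
p, *_ = np.linalg.lstsq(A, np.full(n, I_des), rcond=None)

# round (clip) p_j to the feasible interval [0, p_max]
p_clipped = np.clip(p, 0.0, p_max)

print(np.all((p_clipped >= 0) & (p_clipped <= p_max)))  # True: feasible
```

Clipping restores feasibility but can badly degrade the objective, which is why the notes move on to the weighted-least-squares, LP, and convex formulations.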

additional constraints:
does adding (1) or (2) below complicate the problem?

  1. no more than 50% of total power is in any 10 lamps
  2. no more than half of the lamps are on (pj > 0)

  • answer: with (1), still easy to solve; with (2), extremely difficult
  • moral: (untrained) intuition doesn't always work; without the proper background, very easy problems can appear quite similar to very difficult ones

course goals and topics

goals

  1. recognize/formulate problems (such as illumination problem) as convex optimization problems
  2. develop code for problems of moderate size (1000 lamps, 5000 patches)
  3. characterize optimal solution (optimal power distribution), give limits of performance, etc.

topics

  1. convex sets, functions, optimization problems
  2. examples and applications
  3. algorithms

nonlinear optimization

...

2. Convex sets

  • affine and convex sets
  • some important examples
  • operations that preserve convexity
  • generalized inequalities
  • separating and supporting hyperplanes
  • dual cones and generalized inequalities

affine set

line through x1, x2: all points

x = θx1 + (1 − θ)x2    (θ ∈ R)

affine set: contains the line through any two distinct points in the set
example: solution set of linear equations {x | Ax = b}
(conversely, every affine set can be expressed as the solution set of a system of linear equations)

convex set

line segment between x1 and x2: all points

x = θx1 + (1 − θ)x2

with 0 ≤ θ ≤ 1

convex set: contains the line segment between any two points in the set

x1, x2 ∈ C,  0 ≤ θ ≤ 1  ⇒  θx1 + (1 − θ)x2 ∈ C

examples:

convex combination and convex hull

convex combination of x1, x2, ..., xk: any point x of the form

x = θ1x1 + θ2x2 + ... + θkxk

with θ1 + ... + θk = 1, θi ≥ 0

convex hull conv S: set of all convex combinations of points in S

convex cone

conic (nonnegative) combination of x1 and x2: any point of the form

x = θ1x1 + θ2x2

with θ1 ≥ 0, θ2 ≥ 0

convex cone: set that contains all conic combinations of points in the set

hyperplane and halfspaces

hyperplane: set of the form {x | aᵀx = b} (a ≠ 0)

halfspace: set of the form {x | aᵀx ≤ b} (a ≠ 0)

  • a is the normal vector
  • hyperplanes are affine and convex; halfspaces are convex

euclidean balls and ellipsoids

(euclidean) ball with center xc and radius r:

B(xc, r) = {x | ‖x − xc‖₂ ≤ r} = {xc + ru | ‖u‖₂ ≤ 1}

ellipsoid: set of the form

{x | (x − xc)ᵀP⁻¹(x − xc) ≤ 1}

with P ∈ S++^n (i.e., P symmetric positive definite)
other representation: {xc + Au | ‖u‖₂ ≤ 1} with A square and nonsingular

norm balls and cones

norm: a function ‖·‖ that satisfies

  • ‖x‖ ≥ 0; ‖x‖ = 0 if and only if x = 0
  • ‖tx‖ = |t| ‖x‖ for t ∈ R
  • ‖x + y‖ ≤ ‖x‖ + ‖y‖

notation: ‖·‖ is a general (unspecified) norm; ‖·‖symb is a particular norm

norm ball with center xc and radius r: {x | ‖x − xc‖ ≤ r}

norm cone: {(x, t) | ‖x‖ ≤ t}
the euclidean norm cone is called the second-order cone;
norm balls and cones are convex

polyhedra

solution set of finitely many linear inequalities and equalities

Ax ⪯ b,    Cx = d

(A ∈ R^{m×n}, C ∈ R^{p×n}, ⪯ is componentwise inequality)
a polyhedron is the intersection of a finite number of halfspaces and hyperplanes

positive semidefinite cone

notation:

  • S^n is the set of symmetric n×n matrices
  • S+^n = {X ∈ S^n | X ⪰ 0}: positive semidefinite n×n matrices

X ∈ S+^n  ⇔  zᵀXz ≥ 0 for all z

S+^n is a convex cone

  • S++^n = {X ∈ S^n | X ≻ 0}: positive definite n×n matrices

example: [x y; y z] ∈ S+^2
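Membership in S+^n is easy to test numerically via eigenvalues; a sketch (for the 2×2 example this reduces to x ≥ 0, z ≥ 0, xz ≥ y²):

```python
import numpy as np

def is_psd(X, tol=1e-10):
    """X in S+^n iff X is symmetric and all eigenvalues are >= 0."""
    return np.allclose(X, X.T) and np.all(np.linalg.eigvalsh(X) >= -tol)

# the 2x2 example [x y; y z]
print(is_psd(np.array([[2.0, 1.0], [1.0, 2.0]])))   # True:  2*2 >= 1^2
print(is_psd(np.array([[1.0, 3.0], [3.0, 1.0]])))   # False: 1*1 <  3^2
```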

operations that preserve convexity

practical methods for establishing convexity of a set C

  1. apply definition

x1, x2 ∈ C, 0 ≤ θ ≤ 1  ⇒  θx1 + (1 − θ)x2 ∈ C

  2. show that C is obtained from simple convex sets (hyperplanes, halfspaces, norm balls, ...) by operations that preserve convexity
  • intersection
  • affine function
  • perspective function
  • linear-fractional function

intersection

the intersection of (any number of) convex sets is convex
example:

S = {x ∈ R^m | |p(t)| ≤ 1 for |t| ≤ π/3}

where p(t) = x1 cos t + x2 cos 2t + ... + xm cos mt

affine function

suppose f : R^n → R^m is affine (f(x) = Ax + b with A ∈ R^{m×n}, b ∈ R^m)

  • the image of a convex set under f is convex

S ⊆ R^n convex  ⇒  f(S) = {f(x) | x ∈ S} convex

  • the inverse image f⁻¹(C) of a convex set under f is convex

C ⊆ R^m convex  ⇒  f⁻¹(C) = {x ∈ R^n | f(x) ∈ C} convex

examples:

  • scaling, translation, projection
  • solution set of linear matrix inequality {x | x1A1 + ... + xmAm ⪯ B} (with Ai, B ∈ S^p)
  • hyperbolic cone {x | xᵀPx ≤ (cᵀx)², cᵀx ≥ 0} (with P ∈ S+^n)

perspective and linear-fractional function

perspective function P : R^{n+1} → R^n:

P(x, t) = x/t,    dom P = {(x, t) | t > 0}

images and inverse images of convex sets under perspective are convex

linear-fractional function f : R^n → R^m:

f(x) = (Ax + b)/(cᵀx + d),    dom f = {x | cᵀx + d > 0}

images and inverse images of convex sets under linear-fractional functions are convex
example of a linear-fractional function

f(x) = x / (x1 + x2 + 1)

generalized inequalities

a convex cone K ⊆ R^n is a proper cone if

  • K is closed (contains its boundary)
  • K is solid (has nonempty interior)
  • K is pointed (contains no line)

examples

  • nonnegative orthant K = R+^n = {x ∈ R^n | xi ≥ 0, i = 1, ..., n}
  • positive semidefinite cone K = S+^n
  • nonnegative polynomials on [0, 1]:

K = {x ∈ R^n | x1 + x2t + x3t² + ... + xnt^{n−1} ≥ 0 for t ∈ [0, 1]}

generalized inequality defined by a proper cone K:

x ⪯K y  ⇔  y − x ∈ K,     x ≺K y  ⇔  y − x ∈ int K

examples

  • componentwise inequality (K = R+^n)

x ⪯_{R+^n} y  ⇔  xi ≤ yi, i = 1, ..., n

  • matrix inequality (K = S+^n)

X ⪯_{S+^n} Y  ⇔  Y − X positive semidefinite

these two types are so common that we drop the subscript in ⪯K

properties: many properties of ⪯K are similar to ≤ on R, e.g.,

x ⪯K y, u ⪯K v  ⇒  x + u ⪯K y + v

minimum and minimal elements

⪯K is not in general a linear ordering: we can have x ⋠K y and y ⋠K x

x ∈ S is the minimum element of S with respect to ⪯K if

y ∈ S  ⇒  x ⪯K y

x ∈ S is a minimal element of S with respect to ⪯K if

y ∈ S, y ⪯K x  ⇒  y = x

example (K = R+²)
x1 is the minimum element of S1
x2 is a minimal element of S2

separating hyperplane theorem

if C and D are disjoint convex sets, then there exist a ≠ 0, b such that

aᵀx ≤ b for x ∈ C,   aᵀx ≥ b for x ∈ D

the hyperplane {x | aᵀx = b} separates C and D

strict separation requires additional assumptions (e.g., C is closed, D is a singleton)

supporting hyperplane theorem

supporting hyperplane to set C at boundary point x0:

{x | aᵀx = aᵀx0}

where a ≠ 0 and aᵀx ≤ aᵀx0 for all x ∈ C

supporting hyperplane theorem: if C is convex, then there exists a supporting hyperplane at every boundary point of C

dual cones and generalized inequalities

dual cone of a cone K:

K* = {y | yᵀx ≥ 0 for all x ∈ K}

examples

  • K = R+^n: K* = R+^n
  • K = S+^n: K* = S+^n
  • K = {(x, t) | ‖x‖₂ ≤ t}: K* = {(x, t) | ‖x‖₂ ≤ t}
  • K = {(x, t) | ‖x‖₁ ≤ t}: K* = {(x, t) | ‖x‖∞ ≤ t}

first three examples are self-dual cones
dual cones of proper cones are proper, hence define generalized inequalities:

y ⪰_{K*} 0  ⇔  yᵀx ≥ 0 for all x ⪰K 0

minimum and minimal elements via dual inequality

minimum element w.r.t. ⪯K
x is the minimum element of S if and only if for all λ ≻_{K*} 0, x is the unique minimizer of λᵀz over z ∈ S

minimal element w.r.t. ⪯K

  • if x minimizes λᵀz over S for some λ ≻_{K*} 0, then x is minimal
  • if x is a minimal element of a convex set S, then there exists a nonzero λ ⪰_{K*} 0 such that x minimizes λᵀz over S

optimal production frontier

  • different production methods use different amounts of resources x ∈ R^n
  • production set P: resource vectors x for all possible production methods
  • efficient (Pareto optimal) methods correspond to resource vectors x that are minimal w.r.t. R+^n

example (n=2)
x1,x2,x3 are efficient; x4,x5 are not

3. Convex functions

  • basic properties and examples
  • operations that preserve convexity
  • the conjugate function
  • quasiconvex functions
  • log-concave and log-convex functions
  • convexity with respect to generalized inequalities

definition

f : R^n → R is convex if dom f is a convex set and

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

for all x, y ∈ dom f, 0 ≤ θ ≤ 1

  • f is concave if −f is convex
  • f is strictly convex if dom f is convex and

f(θx + (1 − θ)y) < θf(x) + (1 − θ)f(y)

for x, y ∈ dom f, x ≠ y, 0 < θ < 1

examples on R

convex:

  • affine: ax + b on R, for any a, b ∈ R
  • exponential: e^{ax}, for any a ∈ R
  • powers: x^α on R++, for α ≥ 1 or α ≤ 0
  • powers of absolute value: |x|^p on R, for p ≥ 1
  • negative entropy: x log x on R++

concave:

  • affine: ax + b on R, for any a, b ∈ R
  • powers: x^α on R++, for 0 ≤ α ≤ 1
  • logarithm: log x on R++

examples on R^n and R^{m×n}

affine functions are convex; all norms are convex

examples on R^n

  • affine function f(x) = aᵀx + b
  • norms: ‖x‖p = (Σ_{i=1}^n |xi|^p)^{1/p} for p ≥ 1; ‖x‖∞ = max_k |xk|

examples on R^{m×n} (m×n matrices)

  • affine function

f(X) = tr(AᵀX) + b = Σ_{i=1}^m Σ_{j=1}^n Aij Xij + b

  • spectral (maximum singular value) norm

f(X) = ‖X‖₂ = σmax(X) = (λmax(XᵀX))^{1/2}

restriction of a convex function to a line

f : R^n → R is convex if and only if the function g : R → R,

g(t) = f(x + tv),    dom g = {t | x + tv ∈ dom f}

is convex (in t) for any x ∈ dom f, v ∈ R^n
can check convexity of f by checking convexity of functions of one variable

example: f : S^n → R with f(X) = log det X, dom f = S++^n

g(t) = log det(X + tV)
     = log det X + log det(I + tX^{−1/2}VX^{−1/2})
     = log det X + Σ_{i=1}^n log(1 + tλi)

where λi are the eigenvalues of X^{−1/2}VX^{−1/2}
g is concave in t (for any choice of X ≻ 0, V); hence f is concave

extended-value extension

the extended-value extension f~ of f is

f~(x) = f(x),  x ∈ dom f,      f~(x) = ∞,  x ∉ dom f

often simplifies notation; for example, the condition

0 ≤ θ ≤ 1  ⇒  f~(θx + (1 − θ)y) ≤ θf~(x) + (1 − θ)f~(y)

(as an inequality in R ∪ {∞}) means the same as the two conditions

  • dom f is convex
  • for x, y ∈ dom f,

0 ≤ θ ≤ 1  ⇒  f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

first-order condition

f is differentiable if dom f is open and the gradient

∇f(x) = (∂f(x)/∂x1, ∂f(x)/∂x2, ..., ∂f(x)/∂xn)

exists at each x ∈ dom f

1st-order condition: differentiable f with convex domain is convex if and only if

f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) for all x, y ∈ dom f

first-order approximation of f is a global underestimator
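The global-underestimator property can be verified numerically. A sketch using the least-squares objective f(x) = ‖Ax − b‖₂², whose gradient is ∇f(x) = 2Aᵀ(Ax − b), on random pairs of points:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

f = lambda x: np.sum((A @ x - b) ** 2)      # f(x) = ||Ax - b||_2^2 (convex)
grad = lambda x: 2 * A.T @ (A @ x - b)      # grad f(x) = 2 A^T (Ax - b)

# f(y) >= f(x) + grad(x)^T (y - x): the tangent plane lies below f everywhere
ok = True
for _ in range(100):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    ok &= f(y) >= f(x) + grad(x) @ (y - x) - 1e-9
print(ok)  # True
```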

second-order condition

f is twice differentiable if dom f is open and the Hessian ∇²f(x) ∈ S^n,

∇²f(x)ij = ∂²f(x)/∂xi∂xj,  i, j = 1, ..., n,

exists at each x ∈ dom f

2nd-order conditions: for twice differentiable f with convex domain

  • f is convex if and only if

∇²f(x) ⪰ 0 for all x ∈ dom f

  • if ∇²f(x) ≻ 0 for all x ∈ dom f, then f is strictly convex

examples

quadratic function: f(x) = (1/2)xᵀPx + qᵀx + r (with P ∈ S^n)

∇f(x) = Px + q,    ∇²f(x) = P

convex if P ⪰ 0

least-squares objective: f(x) = ‖Ax − b‖₂²

∇f(x) = 2Aᵀ(Ax − b),    ∇²f(x) = 2AᵀA

convex (for any A)

quadratic-over-linear: f(x, y) = x²/y

∇²f(x, y) = (2/y³) [y; −x][y; −x]ᵀ ⪰ 0

convex for y > 0

log-sum-exp: f(x) = log Σ_{k=1}^n exp xk is convex

∇²f(x) = (1/(1ᵀz)) diag(z) − (1/(1ᵀz)²) zzᵀ      (zk = exp xk)

to show ∇²f(x) ⪰ 0, we must verify that vᵀ∇²f(x)v ≥ 0 for all v:

vᵀ∇²f(x)v = [ (Σk zk vk²)(Σk zk) − (Σk vk zk)² ] / (Σk zk)² ≥ 0

since (Σk vk zk)² ≤ (Σk zk vk²)(Σk zk) (from the Cauchy-Schwarz inequality)

geometric mean: f(x) = (Π_{k=1}^n xk)^{1/n} on R++^n is concave (similar proof as for log-sum-exp)
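The log-sum-exp Hessian formula above can be checked directly against the PSD claim; a sketch that builds the Hessian from the formula at random points and inspects its smallest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(4)

for _ in range(50):
    x = rng.standard_normal(6)
    z = np.exp(x)
    s = z.sum()
    # Hessian of log-sum-exp: (1/1'z) diag(z) - (1/(1'z)^2) z z'
    H = np.diag(z) / s - np.outer(z, z) / s ** 2
    # PSD check via smallest eigenvalue (zero up to numerical tolerance,
    # since H annihilates the all-ones direction)
    assert np.linalg.eigvalsh(H).min() >= -1e-12

print("Hessian PSD on all samples")
```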

epigraph and sublevel set

α-sublevel set of f : R^n → R:

Cα = {x ∈ dom f | f(x) ≤ α}

sublevel sets of convex functions are convex (the converse is false)

epigraph of f : R^n → R:

epi f = {(x, t) ∈ R^{n+1} | x ∈ dom f, f(x) ≤ t}

f is convex if and only if epi f is a convex set

Jensen's inequality

basic inequality: if f is convex, then for 0 ≤ θ ≤ 1,

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

extension: if f is convex, then

f(E z) ≤ E f(z)

for any random variable z

basic inequality is special case with discrete distribution

prob(z = x) = θ,      prob(z = y) = 1 − θ
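The expectation form can be illustrated with the empirical distribution of a sample: for convex f, f of the sample mean never exceeds the mean of f over the sample (discrete Jensen with weights 1/N). A sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
z = rng.standard_normal(10_000)              # samples of a random variable z

for f in (np.exp, np.abs, lambda t: t ** 2):  # three convex functions
    # Jensen: f(E z) <= E f(z), exact for the empirical distribution
    assert f(z.mean()) <= f(z).mean() + 1e-12

print("f(E z) <= E f(z) for all tested convex f")
```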

operations that preserve convexity

practical methods for establishing convexity of a function

  1. verify definition (often simplified by restricting to a line<1>)
  2. for twice differentiable functions, show ∇²f(x) ⪰ 0
  3. show that f is obtained from simple convex functions by operations that preserve convexity
  • nonnegative weighted sum
  • composition with affine function
  • pointwise maximum or supremum
  • composition
  • minimization
  • perspective

<1>: a function is convex if and only if it is convex when restricted to any line. "Restricting a function to a line" simply means that you draw a line in the domain of that function and evaluate the function along that line.

positive weighted sum & composition with affine function

nonnegative multiple: αf is convex if f is convex, α ≥ 0
sum: f1 + f2 is convex if f1, f2 are convex (extends to infinite sums and integrals)
composition with affine function: f(Ax + b) is convex if f is convex

examples:

  • log barrier for linear inequalities

f(x) = −Σ_{i=1}^m log(bi − aiᵀx),      dom f = {x | aiᵀx < bi, i = 1, ..., m}

  • (any) norm of affine function: f(x) = ‖Ax + b‖

pointwise maximum

if f1, ..., fm are convex, then f(x) = max{f1(x), ..., fm(x)} is convex

examples

  • piecewise-linear function: f(x) = max_{i=1,...,m} (aiᵀx + bi)
  • sum of r largest components of x ∈ R^n:

f(x) = x[1] + x[2] + ... + x[r]

is convex (x[i] is the ith largest component of x)
proof:

f(x) = max{xi1 + xi2 + ... + xir | 1 ≤ i1 < i2 < ... < ir ≤ n}

pointwise supremum

if f(x, y) is convex in x for each y ∈ A, then

g(x) = sup_{y∈A} f(x, y)

is convex

examples

  • support function of a set C: SC(x) = sup_{y∈C} yᵀx is convex

  • distance to farthest point in a set C:

f(x) = sup_{y∈C} ‖x − y‖

  • maximum eigenvalue of symmetric matrix: for X ∈ S^n,

λmax(X) = sup_{‖y‖₂=1} yᵀXy

composition with scalar functions

composition of g : R^n → R and h : R → R:

f(x) = h(g(x))

f is convex if:
g convex, h convex, h~ nondecreasing;
g concave, h convex, h~ nonincreasing

  • proof (for n = 1, differentiable g, h)

f″(x) = h″(g(x))g′(x)² + h′(g(x))g″(x)

  • note: monotonicity must hold for the extended-value extension h~

examples

  • exp g(x) is convex if g is convex
  • 1/g(x) is convex if g is concave and positive

vector composition

composition of g : R^n → R^k and h : R^k → R:

f(x) = h(g(x)) = h(g1(x), ..., gk(x))

f is convex if
gi convex, h convex, h~ nondecreasing in each argument
gi concave, h convex, h~ nonincreasing in each argument

proof (for n = 1, differentiable g, h)

f″(x) = g′(x)ᵀ∇²h(g(x))g′(x) + ∇h(g(x))ᵀg″(x)

examples

  • Σ_{i=1}^m log gi(x) is concave if gi are concave and positive
  • log Σ_{i=1}^m exp gi(x) is convex if gi are convex

minimization

(on the Schur complement, see https://blog.csdn.net/sheagu/article/details/115771184)
if f(x, y) is convex in (x, y) and C is a convex set, then

g(x) = inf_{y∈C} f(x, y)

is convex

examples

  • f(x, y) = xᵀAx + 2xᵀBy + yᵀCy with

[A B; Bᵀ C] ⪰ 0,    C ≻ 0

minimizing over y gives g(x) = inf_y f(x, y) = xᵀ(A − BC⁻¹Bᵀ)x
g is convex, hence the Schur complement A − BC⁻¹Bᵀ ⪰ 0

  • distance to a set: dist(x, S) = inf_{y∈S} ‖x − y‖ is convex if S is convex

perspective

the perspective of a function f : R^n → R is the function g : R^n × R → R,

g(x, t) = t f(x/t),      dom g = {(x, t) | x/t ∈ dom f, t > 0}

g is convex if f is convex

examples

  • f(x) = xᵀx is convex; hence g(x, t) = xᵀx/t is convex for t > 0
  • negative logarithm f(x) = −log x is convex; hence relative entropy g(x, t) = t log t − t log x is convex on R++²
  • if f is convex, then

g(x) = (cᵀx + d) f((Ax + b)/(cᵀx + d))

is convex on {x | cᵀx + d > 0, (Ax + b)/(cᵀx + d) ∈ dom f}

the conjugate function

the conjugate of a function f is

f*(y) = sup_{x∈dom f} (yᵀx − f(x))

  • f* is convex (even if f is not)
  • will be useful in chapter 5

examples

  • negative logarithm f(x) = −log x

f*(y) = sup_{x>0} (xy + log x) = −1 − log(−y) if y < 0, ∞ otherwise

  • strictly convex quadratic f(x) = (1/2)xᵀQx with Q ∈ S++^n

f*(y) = sup_x (yᵀx − (1/2)xᵀQx) = (1/2)yᵀQ⁻¹y
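The quadratic conjugate can be verified numerically: setting the gradient of yᵀx − (1/2)xᵀQx to zero gives the maximizer x* = Q⁻¹y, and the value there equals (1/2)yᵀQ⁻¹y. A sketch with a random positive definite Q:

```python
import numpy as np

rng = np.random.default_rng(6)

# a random symmetric positive definite Q in S++^n
M = rng.standard_normal((4, 4))
Q = M @ M.T + 4 * np.eye(4)
y = rng.standard_normal(4)

# stationarity of y^T x - (1/2) x^T Q x gives x* = Q^{-1} y
x_star = np.linalg.solve(Q, y)
value = y @ x_star - 0.5 * x_star @ Q @ x_star

print(np.allclose(value, 0.5 * y @ np.linalg.solve(Q, y)))  # True
```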

quasiconvex functions

f : R^n → R is quasiconvex if dom f is convex and the sublevel sets

Sα = {x ∈ dom f | f(x) ≤ α}

are convex for all α

  • f is quasiconcave if −f is quasiconvex
  • f is quasilinear if it is quasiconvex and quasiconcave

examples

  • √|x| is quasiconvex on R
  • ceil(x) = inf{z ∈ Z | z ≥ x} is quasilinear
  • log x is quasilinear on R++
  • f(x1, x2) = x1x2 is quasiconcave on R++²
  • linear-fractional function

f(x) = (aᵀx + b)/(cᵀx + d),    dom f = {x | cᵀx + d > 0}

is quasilinear

  • distance ratio

f(x) = ‖x − a‖₂ / ‖x − b‖₂,     dom f = {x | ‖x − a‖₂ ≤ ‖x − b‖₂}

is quasiconvex

internal rate of return

properties

modified Jensen inequality: for quasiconvex f

0 ≤ θ ≤ 1  ⇒  f(θx + (1 − θ)y) ≤ max{f(x), f(y)}

first-order condition: differentiable f with convex domain is quasiconvex if and only if

f(y) ≤ f(x)  ⇒  ∇f(x)ᵀ(y − x) ≤ 0

sums of quasiconvex functions are not necessarily quasiconvex

log-concave and log-convex functions

a positive function f is log-concave if log f is concave:

f(θx + (1 − θ)y) ≥ f(x)^θ f(y)^{1−θ}  for 0 ≤ θ ≤ 1

f is log-convex if log f is convex

  • powers: x^a on R++ is log-convex for a ≤ 0, log-concave for a ≥ 0
  • many common probability densities are log-concave, e.g., normal:

f(x) = (1/√((2π)^n det Σ)) e^{−(1/2)(x − x̄)ᵀΣ⁻¹(x − x̄)}

  • cumulative Gaussian distribution function Φ is log-concave

Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du
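The defining inequality is easy to spot-check for the one-dimensional normal density (log f(x) = −x²/2 − const is concave, so the inequality must hold); a sketch over random points and random θ:

```python
import numpy as np

def normal_pdf(x):
    """Standard normal density (n = 1)."""
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(7)
ok = True
for _ in range(200):
    x, y = rng.uniform(-5, 5, size=2)
    th = rng.uniform()
    # log-concavity: f(th*x + (1-th)*y) >= f(x)^th * f(y)^(1-th)
    lhs = normal_pdf(th * x + (1 - th) * y)
    rhs = normal_pdf(x) ** th * normal_pdf(y) ** (1 - th)
    ok &= lhs >= rhs - 1e-12
print(ok)  # True
```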

properties of log-concave functions

  • twice differentiable f with convex domain is log-concave if and only if

f(x)∇²f(x) ⪯ ∇f(x)∇f(x)ᵀ

for all x ∈ dom f

  • product of log-concave functions is log-concave
  • sum of log-concave functions is not always log-concave
  • integration: if f : R^n × R^m → R is log-concave, then

g(x) = ∫ f(x, y) dy

is log-concave (not easy to show)

consequences of integration property

  • convolution f∗g of log-concave functions f, g is log-concave

(f∗g)(x) = ∫ f(y)g(x − y) dy

  • if C ⊆ R^n is convex and y is a random variable with a log-concave pdf, then

f(x) = prob(x + y ∈ C)

is log-concave
proof: write f(x) as an integral of a product of log-concave functions

f(x) = ∫ g(x + y)p(y) dy,    g(u) = 1 if u ∈ C, 0 if u ∉ C,

p is the pdf of y

note: pdf = probability density function; prob(·) denotes the probability of an event

example: yield function

Y(x) = prob(x + w ∈ S)

  • x ∈ R^n: nominal parameter values for the product
  • w ∈ R^n: random variations of parameters in the manufactured product
  • S: set of acceptable values

if S is convex and w has a log-concave pdf, then

  • Y is log-concave
  • yield regions {x | Y(x) ≥ α} are convex

convexity with respect to generalized inequalities

f : R^n → R^m is K-convex if dom f is convex and

f(θx + (1 − θ)y) ⪯K θf(x) + (1 − θ)f(y)

for x, y ∈ dom f, 0 ≤ θ ≤ 1

example: f : S^m → S^m, f(X) = X² is S+^m-convex

proof: for fixed z ∈ R^m, zᵀX²z = ‖Xz‖₂² is convex in X, i.e.,

zᵀ(θX + (1 − θ)Y)²z ≤ θzᵀX²z + (1 − θ)zᵀY²z

for X, Y ∈ S^m, 0 ≤ θ ≤ 1
therefore (θX + (1 − θ)Y)² ⪯ θX² + (1 − θ)Y²

posted @ 工大鸣猪