Exploring Recursion in Convex Optimization

Recursion in optimization

In this blog post, I aim to provide an overview of the various recursive methods I have seen in convex optimization. Optimization methods typically produce a sequence $\{x_t\}_{t \ge 0}$, which calls for a careful analysis of the convergence properties of such sequences.

Convergent sequence

The concept of convergent and divergent sequences is fundamental in mathematical analysis, e.g., the well-known notion of a Cauchy sequence. In the following discussion, I will narrow our focus to specific recursions and shed light on their convergence properties. To delve deeper into this subject, let's explore a particular recursion:

$$x_{t+1} = \gamma_t x_t + \varepsilon_t. \tag{1}$$

The recursion above is of universal significance in the field of optimization (see Polyak, Ch. 2.2.3).
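As a quick illustration, recursion (1) can be simulated directly. The schedules below (a constant contraction factor and a quadratically decaying perturbation) are my own illustrative choices, not taken from the references:

```python
# Simulate x_{t+1} = gamma(t) * x_t + eps(t); the schedules are illustrative.

def run_recursion(x0, gamma, eps, T):
    """Iterate x_{t+1} = gamma(t) * x_t + eps(t) for T steps and return x_T."""
    x = x0
    for t in range(T):
        x = gamma(t) * x + eps(t)
    return x

# A contraction (gamma = 0.9) with vanishing perturbations drives x_t toward 0.
x_final = run_recursion(x0=10.0, gamma=lambda t: 0.9,
                        eps=lambda t: 1.0 / (t + 1) ** 2, T=200)
print(x_final)
```

Different choices of `gamma` and `eps` reproduce all the regimes discussed below.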

Now, let's delve into two fundamental lemmas that play a crucial role in determining the convergence of a sequence governed by Eq.(1).

Lemma 1 Let $x_t \ge 0$ and

$$x_{t+1} = \gamma_t x_t + \varepsilon_t,$$

where $0 \le \gamma_t < 1$, $\varepsilon_t \ge 0$, $\sum_{t=0}^{\infty}(1-\gamma_t) = \infty$, and $\varepsilon_t/(1-\gamma_t) \to 0$. Then

$$x_{t+1} \to 0.$$

The condition $\varepsilon_t/(1-\gamma_t) \to 0$ says precisely that $\varepsilon_t = o(1-\gamma_t)$.
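Here is a small numerical check of Lemma 1. The schedules $\gamma_t = 1 - 1/\sqrt{t+2}$ and $\varepsilon_t = 1/(t+2)$ are my own illustrative choices: they satisfy $\sum_t (1-\gamma_t) = \infty$ and $\varepsilon_t/(1-\gamma_t) = 1/\sqrt{t+2} \to 0$, so the lemma predicts $x_t \to 0$:

```python
import math

# Lemma 1 check with illustrative schedules satisfying its conditions.
x = 5.0
for t in range(200_000):
    gamma_t = 1.0 - 1.0 / math.sqrt(t + 2)   # 0 <= gamma_t < 1, sum(1 - gamma_t) = inf
    eps_t = 1.0 / (t + 2)                    # eps_t / (1 - gamma_t) -> 0
    x = gamma_t * x + eps_t

# After many iterations the iterate is close to 0
# (it tracks the quasi-stationary level eps_t / (1 - gamma_t)).
print(x)
```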

Lemma 2 Let $x_t \ge 0$ and

$$x_{t+1} = (1+\alpha_t)x_t + \varepsilon_t,$$

where $\sum_{t=0}^{\infty}\alpha_t < \infty$ and $\sum_{t=0}^{\infty}\varepsilon_t < \infty$. Then

$$x_{t+1} \to \bar{x}$$

for some finite limit $\bar{x} \ge 0$.
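Lemma 2 can likewise be checked numerically. The summable schedules $\alpha_t = \varepsilon_t = 1/t^2$ are my own illustrative choice:

```python
# Lemma 2 check: summable alpha_t and eps_t (alpha_t = eps_t = 1/t^2 is an
# illustrative choice) make the sequence converge to a finite limit.
xs = [1.0]
for t in range(1, 100_000):
    alpha_t = 1.0 / t ** 2
    eps_t = 1.0 / t ** 2
    xs.append((1.0 + alpha_t) * xs[-1] + eps_t)

# The tail of the sequence stabilizes: late iterates barely move.
print(xs[-1], abs(xs[-1] - xs[-1001]))
```

Note that here the sequence need not go to $0$: the lemma only guarantees that it settles at some finite value.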

Drawing on these two lemmas, a variety of convergent recursions can be derived, as illustrated in Franci, B.

Convergence rate

When dealing with a convergent sequence, a critical question arises: How many iterations are needed to obtain an ϵ-approximate solution? In other words, what is the convergence rate for this sequence? Understanding the convergence rate allows us to select the most efficient method based on the least number of iterations required.

To begin our analysis, let's consider the case where both $\gamma_t \equiv \gamma$ and $\varepsilon_t \equiv \varepsilon$ are held constant. While this violates the assumptions of Lemma 1, it provides a starting point for our exploration. Unrolling recursion (1), we obtain:

$$x_{T+1} \le \gamma^{T+1} x_0 + \frac{\varepsilon}{1-\gamma}.$$

This indicates that $x_{T+1}$ eventually lies within the $\frac{\varepsilon}{1-\gamma}$-neighborhood of $0$. Achieving an exact convergence guarantee requires driving $\frac{\varepsilon}{1-\gamma}$ to $0$, as the conditions of Lemma 1 do.
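A quick simulation confirms that the iterate settles at the floor $\varepsilon/(1-\gamma)$ rather than at $0$ (the values $\gamma = 0.9$, $\varepsilon = 0.01$ are my own illustrative choices):

```python
# Constant gamma and eps: the iterate converges linearly to the floor
# eps / (1 - gamma), not to 0.
gamma, eps = 0.9, 0.01
x = 10.0
for _ in range(500):
    x = gamma * x + eps

neighborhood = eps / (1.0 - gamma)   # radius of the residual neighborhood, = 0.1
print(x, neighborhood)
```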

As highlighted in Bottou, Thm. 4.6, a constant learning rate yields linear convergence of the expected objective value to a neighborhood of the optimal value. A smaller stepsize worsens the contraction constant in the convergence rate, but it shrinks the neighborhood, so the iterates approach the optimal value more closely.

To ensure exact convergence, we opt for a diminishing stepsize strategy instead, which leads to a sublinear convergence rate of $O(1/T)$.

In general, we reformulate (1) as follows:

$$x_{t+1} \le (1-\eta_t)x_t + \eta_t^2\sigma^2.$$

Theorem 1 Let $x_t \ge 0$ and

$$x_{t+1} \le (1-\eta_t)x_t + \eta_t^2\sigma^2,$$

where $\eta_t = \frac{b}{a+t}$ with $b > 1$, $\sigma^2 > 0$, and $a \ge 0$. Then

$$x_T \le \frac{v}{a+T},$$

where $v = \max\left\{(a+1)x_0, \frac{b^2\sigma^2}{b-1}\right\}$.

We prove it by induction. The base case holds by the choice of $v$, which dominates $(a+1)x_0$. Assume the bound $x_t \le \frac{v}{a+t}$ holds; then

$$x_{t+1} \le \left(1-\frac{b}{a+t}\right)\frac{v}{a+t} + \frac{b^2\sigma^2}{(a+t)^2} = \frac{t+a-1}{(t+a)^2}v - \frac{b-1}{(t+a)^2}v + \frac{b^2\sigma^2}{(t+a)^2} \le \frac{t+a-1}{(t+a)^2}v,$$

where the last inequality uses $(b-1)v \ge b^2\sigma^2$. Since $(t+a-1)(t+a+1) \le (t+a)^2$, this leads to $x_{t+1} \le \frac{v}{t+1+a}$.
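The bound of Theorem 1 can be checked numerically on the equality version of the recursion. The parameters $a = 2$, $b = 2$, $\sigma^2 = 1$, $x_0 = 5$ are my own illustrative choices; taking $a \ge b$ keeps $\eta_t \le 1$, so the iterates stay nonnegative:

```python
# Numerical check of the O(1/T) bound x_T <= v / (a + T) from Theorem 1.
a, b, sigma2 = 2.0, 2.0, 1.0       # illustrative parameters, b > 1
x0 = 5.0
v = max((a + 1) * x0, b ** 2 * sigma2 / (b - 1))   # here v = max(15, 4) = 15

x = x0
T = 10_000
for t in range(T):
    eta_t = b / (a + t)            # eta_t = 2 / (2 + t) <= 1
    x = (1.0 - eta_t) * x + eta_t ** 2 * sigma2

bound = v / (a + T)                # the theorem's bound at iteration T
print(x, bound)
```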

Polyak, Lemma 2.2.4 gives the convergence rate for the recursion

$$x_{t+1} = \left(1-\frac{c}{t}\right)x_t + \frac{d}{t^{p+1}},$$

$$\begin{cases} x_t \le \dfrac{d}{c-p}\,t^{-p} + o(t^{-p}), & c > p, \\ x_t = O(t^{-c}\log t), & p = c, \\ x_t = O(t^{-c}), & p > c. \end{cases}$$
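The first case ($c > p$) is easy to verify numerically. The parameters $c = 2$, $d = 1$, $p = 1$ are my own illustrative choices, for which the lemma predicts $x_t \approx \frac{d}{c-p}\,t^{-1} = 1/t$:

```python
# Check the c > p case of Polyak's lemma: x_t should behave like
# d / (c - p) * t^{-p} for large t (parameters are illustrative).
c, d, p = 2.0, 1.0, 1.0
x = 1.0
T = 100_000
for t in range(3, T):              # start at t = 3 so 1 - c/t stays in (0, 1)
    x = (1.0 - c / t) * x + d / t ** (p + 1)

predicted = d / (c - p) * T ** (-p)   # = 1/T here
print(x, predicted)
```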

As shown in Bottou thm5.1, when εt converges geometrically, the recursion exhibits a linear convergence rate with a constant stepsize, a technique often referred to as dynamic sampling.

Reference

- B. T. Polyak, Introduction to Optimization, Optimization Software, 1987.
- L. Bottou, F. E. Curtis, and J. Nocedal, Optimization Methods for Large-Scale Machine Learning, SIAM Review, 2018.
- B. Franci and S. Grammatico, Convergence of Sequences: A Survey, Annual Reviews in Control, 2022.

posted @ Neo_DH