Paper Notes (2): "Adaptive Federated Optimization"

Intuition

The authors argue that the gap between centralized and federated performance has two causes: 1) client drift, and 2) a lack of adaptivity.

Unlike variance reduction methods, they extend federated learning with adaptive optimizers such as Adam.

They rewrite the update rule of FedAvg as

$$x_{t+1} = x_t - \frac{1}{|S|} \sum_{i \in S} \left(x_t - x_i^t\right)$$

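Let $\Delta_i^t = x_i^t - x_t$, where $x_i^t$ denotes the model of client $i$ after local training, and let $\Delta_t = \frac{1}{|S|} \sum_{i \in S} \Delta_i^t$ be the average over the sampled clients, so that $x_{t+1} = x_t + \Delta_t$.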

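With SGD as the server optimizer, FedAvg corresponds to a server learning rate $\eta = 1$, and $\Delta_i^t$ acts as a kind of pseudo-gradient (more precisely, $-\Delta_i^t = x_t - x_i^t$ plays the role of the gradient). They propose that, instead of SGD, the server optimizer can use adaptive methods to update the server model $x$. Their framework is as follows: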

sotnsA.png
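To make the "server optimizer" view concrete, here is a minimal sketch in plain Python. It assumes the server only sees the client models after local training; the function names (`server_delta`, `fedavg_step`, `fedadam_step`) and the hyperparameter values are my own illustrative choices, and the Adam-style step follows the generic shape of Adam applied to the pseudo-gradient rather than being copied from the paper's pseudocode.

```python
import math

def server_delta(x, client_models):
    """Average client update: Delta_t = (1/|S|) * sum_i (x_i^t - x_t)."""
    n = len(client_models)
    return [sum(xi[j] - x[j] for xi in client_models) / n for j in range(len(x))]

def fedavg_step(x, delta, eta=1.0):
    """SGD on the pseudo-gradient -Delta_t; eta = 1 recovers plain FedAvg."""
    return [xj + eta * dj for xj, dj in zip(x, delta)]

def fedadam_step(x, delta, m, v, eta=0.1, beta1=0.9, beta2=0.99, tau=1e-3):
    """Adam-style server step on -Delta_t (hyperparameter values are assumed)."""
    m = [beta1 * mj + (1 - beta1) * dj for mj, dj in zip(m, delta)]
    v = [beta2 * vj + (1 - beta2) * dj ** 2 for vj, dj in zip(v, delta)]
    x = [xj + eta * mj / (math.sqrt(vj) + tau) for xj, mj, vj in zip(x, m, v)]
    return x, m, v

# Toy round: one server model and three client models after local training.
x_t = [1.0, -2.0]
clients = [[0.8, -1.5], [1.0, -1.9], [0.6, -1.4]]

delta_t = server_delta(x_t, clients)
x_fedavg = fedavg_step(x_t, delta_t)  # equals the plain average of the clients
x_fedadam, m_t, v_t = fedadam_step(x_t, delta_t, m=[0.0, 0.0], v=[0.0, 0.0])
```

With `eta=1.0`, `fedavg_step` returns exactly the average of the client models, which is the FedAvg rule above. Swapping in `fedadam_step` changes only the server side; the clients are untouched.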

Convergence

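Multiple local update steps make the convergence analysis harder: concretely, $\mathbb{E}[\Delta_i^t] \neq K \nabla F(x_t)$, so the pseudo-gradient is not simply a multiple of the gradient at $x_t$. In my opinion, they only give a rough bound on the resulting error.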
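A small numerical sketch of this point, under assumptions of my own: a one-dimensional quadratic $F(x) = \frac{1}{2} a x^2$, full (noise-free) gradients, and a local learning rate `lr`. The helper `local_sgd_delta` is hypothetical. It compares the actual local update after $K$ steps with what $K$ gradient steps all evaluated at $x_t$ would give, namely $-\eta_l K \nabla F(x_t)$.

```python
def local_sgd_delta(x0, a, lr, K):
    """Run K full-gradient steps on F(x) = 0.5 * a * x**2 and return x_K - x0."""
    x = x0
    for _ in range(K):
        x -= lr * a * x  # the gradient of F at x is a * x
    return x - x0

x_t, a, lr = 1.0, 2.0, 0.1
grad = a * x_t  # gradient at the server model x_t

one_step = local_sgd_delta(x_t, a, lr, K=1)    # matches -lr * 1 * grad
five_steps = local_sgd_delta(x_t, a, lr, K=5)  # does not match -lr * 5 * grad
gap = five_steps - (-lr * 5 * grad)
```

For `K=1` the two quantities coincide; for `K=5` the gap is clearly non-zero, because later local steps use gradients at points that have already moved away from $x_t$.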

I'll only give my own reading of their proof of Theorem 1; the reasoning behind Theorem 2 is similar.

First, we need to relate $x_{t+1}$ to $x_t$. From the update rule of the adaptive method and the $L$-smoothness assumption, we have

soUwCT.png
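For reference, the generic $L$-smoothness inequality that this kind of step relies on is written out below. This is the standard textbook form, not a transcription of the paper's exact expression, and the second line assumes an Adagrad-style server step with server learning rate $\eta$, accumulator $v_t$ and a small constant $\tau$.

```latex
f(x_{t+1}) \le f(x_t) + \langle \nabla f(x_t),\, x_{t+1} - x_t \rangle
             + \frac{L}{2} \left\| x_{t+1} - x_t \right\|^2,
\qquad
x_{t+1} - x_t = \eta \, \frac{\Delta_t}{\sqrt{v_t} + \tau}
```

Substituting the second expression into the first gives one inner-product term and one squared-norm term, both expressed through $\Delta_t$.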

Furthermore, as in the Adagrad analysis, we have

soU5xe.png

Now we need to bound the two terms $T_1$ and $T_2$.

$T_2$ does not involve local training, so we can follow the same steps as in the Adagrad analysis:

sowiCV.png

sowZDJ.png

sodXjg.png

To bound $T_1$, they try to link $\Delta_t$, which contains the local updates, with $\nabla f(x_t)$.

As mentioned above, $\Delta_t$ is a kind of pseudo-gradient of $f$ at $x_t$.

sowDxS.png

Again, note that $x_{i,k}^t$ is the $k$-th iterate during local training on client $i$, and $x_t$ is the server model at round $t$.

sosrHe.png

In my opinion, how they bound $x_{i,k}^t - x_t$ is the most impressive part of the whole paper.

sosvuT.png

Honestly, the local gradient $g_{i,k}^t$ is what builds the bridge between $x_{i,k}^t$ and $x_t$, while in general $\mathbb{E}[\nabla F_i(x_{i,k-1}^t)] \neq \nabla F_i(x_t)$.
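The bridge can be written out explicitly. The following is a sketch assuming the clients run plain local SGD with local learning rate $\eta_l$, starting from the server model; the indexing convention is my own.

```latex
x_{i,0}^t = x_t, \qquad
x_{i,k}^t = x_{i,k-1}^t - \eta_l \, g_{i,k-1}^t
\quad \Longrightarrow \quad
x_{i,k}^t - x_t = -\eta_l \sum_{j=0}^{k-1} g_{i,j}^t
```

So bounding the distance between the local iterate and the server model reduces to bounding a sum of local stochastic gradients, each evaluated at an earlier local iterate rather than at $x_t$.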

soy12t.png

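The second inequality is very loose and unclear. Known $\mathbb{E}[\eta_l(\ldots)]$ …, they used $\mathbb{E}\left[\|z_1 + z_2 + \cdots + z_r\|^2\right] \le r \, \mathbb{E}\left[\|z_1\|^2 + \|z_2\|^2 + \cdots + \|z_r\|^2\right]$.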
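The inequality itself is easy to check numerically. Below is a small sketch that tests $\|z_1 + \cdots + z_r\|^2 \le r \sum_j \|z_j\|^2$ on random Gaussian vectors; the helper names `sq_norm` and `check_inequality` and the choice of test distribution are my own. It holds pointwise, not only in expectation, so taking expectations on both sides gives the form used in the proof.

```python
import random

def sq_norm(v):
    """Squared Euclidean norm of a vector given as a list of floats."""
    return sum(c * c for c in v)

def check_inequality(r, dim, trials=1000, seed=0):
    """Return True if ||z_1 + ... + z_r||^2 <= r * sum_j ||z_j||^2 in every trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        zs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(r)]
        total = [sum(z[j] for z in zs) for j in range(dim)]
        lhs = sq_norm(total)
        rhs = r * sum(sq_norm(z) for z in zs)
        if lhs > rhs + 1e-9:  # small tolerance for floating-point error
            return False
    return True

holds = check_inequality(r=4, dim=3)
```

Equality is reached when all $z_j$ are identical, which suggests why the bound can be loose when the $z_j$ are nearly uncorrelated.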

posted @ Neo_DH