Operators and Optimization

Relation

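A relation is similar to a mapping. A relation $R$ is a set of pairs $(x, y)$, with domain $\operatorname{dom} R = \{x \mid (x, y) \in R \text{ for some } y\}$ and image $R(x) = \{y \mid (x, y) \in R\}$. Unlike a function, $R(x)$ may be empty or contain more than one point. The zero set of $R$ is $\{x \mid (x, y) \in R,\ y = 0\}$.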

Operations on Relations

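  • inverse. $R^{-1} = \{(y, x) \mid (x, y) \in R\}$
  • composition. $RS = \{(x, y) \mid \exists z:\ (x, z) \in S,\ (z, y) \in R\}$ ($S$ is applied first, as with function composition)
  • scalar multiplication. $\alpha R = \{(x, \alpha y) \mid (x, y) \in R\}$
  • addition. $R + S = \{(x, y + z) \mid (x, y) \in R,\ (x, z) \in S\}$
  • resolvent operator. $S = (I + \lambda R)^{-1}$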

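These operations show that a relation is defined as a set (its graph), somewhat like the way a convex function is described through its epigraph.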
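The operations listed above can be tried out on small finite relations. The following is a minimal sketch, assuming a relation is stored as a Python set of `(x, y)` pairs of numbers; all helper names are illustrative, and `compose(R, S)` assumes the convention that `S` is applied first (the same order in which $C_A C_B(z)$ is used later in this post).

```python
# Finite relations as sets of (x, y) pairs; helper names are illustrative only.

def image(R, x):
    """R(x): every y that is paired with x."""
    return {y for (u, y) in R if u == x}

def zero_set(R):
    """All x for which (x, 0) belongs to R."""
    return {x for (x, y) in R if y == 0}

def inverse(R):
    return {(y, x) for (x, y) in R}

def compose(R, S):
    """RS: pairs (x, y) such that (x, z) is in S and (z, y) is in R."""
    return {(x, y) for (x, z) in S for (w, y) in R if w == z}

def scale(alpha, R):
    return {(x, alpha * y) for (x, y) in R}

def add(R, S):
    return {(x, y + z) for (x, y) in R for (w, z) in S if w == x}

def resolvent(R, lam):
    """(I + lam*R)^{-1}: invert the relation {(x, x + lam*y)}."""
    return inverse({(x, x + lam * y) for (x, y) in R})

R = {(0, 0), (1, 2), (1, 3)}
S = {(5, 1), (7, 0)}
print(image(R, 1), zero_set(R))
print(compose(R, S))
print(resolvent(R, 1))
```

Note that `image(R, 1)` contains two points, which is exactly what separates a relation from a function.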

Monotone Operators

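A relation $F$ is monotone if

$$(u - v)^T (x - y) \ge 0$$

for all $(x, u), (y, v) \in F$. $F$ is maximal monotone if no other monotone relation properly contains it.

For a relation on $\mathbf{R}$, $F$ is maximal monotone if and only if its graph is a connected curve with no endpoints whose slope is never negative (it may be infinite).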

Case: Subgradient, $F(x) = \partial f(x)$
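A short derivation of why this case is monotone, assuming $f$ is convex and writing pairs as $(x, u), (y, v)$ with $u \in \partial f(x)$ and $v \in \partial f(y)$. It only uses the subgradient inequality:

```latex
\[
\begin{aligned}
f(y) &\ge f(x) + u^T (y - x), \\
f(x) &\ge f(y) + v^T (x - y).
\end{aligned}
\]
% Adding the two inequalities and cancelling f(x) + f(y):
\[
0 \ge u^T (y - x) + v^T (x - y) = -(u - v)^T (x - y)
\quad\Longrightarrow\quad
(u - v)^T (x - y) \ge 0 .
\]
```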

Nonexpansive and contractive operator

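An operator $F$ is $L$-Lipschitz if $\|F(x) - F(y)\|_2 \le L \|x - y\|_2$ for all $x, y \in \operatorname{dom} F$. It is nonexpansive if this holds with $L = 1$, and a contraction if it holds with some $L < 1$.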

Properties:

Resolvent operation and Cayley operator

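For a relation $F$ and $\lambda > 0$, let $R = (I + \lambda F)^{-1}$ denote the resolvent. When $F$ is monotone, $R$ is nonexpansive; it is a contraction when $F$ is strongly monotone. The Cayley operator of $F$ is defined as

$$C = 2R - I = 2(I + \lambda F)^{-1} - I$$

Likewise, when $F$ is monotone, its Cayley operator $C$ is nonexpansive.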

Proof:
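A sketch of the standard argument, assuming $\lambda > 0$ and $F$ monotone. It shows that both the resolvent $R$ and the Cayley operator $C$ are nonexpansive; here $x = R(u)$ and $y = R(v)$ are names introduced only for this sketch.

```latex
% x = R(u), y = R(v) means u - x \in \lambda F(x) and v - y \in \lambda F(y).
% Monotonicity of F gives
\[
\big((u - x) - (v - y)\big)^T (x - y) \ge 0
\quad\Longrightarrow\quad
(u - v)^T (x - y) \ge \|x - y\|_2^2 .
\]
% Cauchy-Schwarz on the left-hand side yields nonexpansiveness of R:
\[
\|x - y\|_2^2 \le \|u - v\|_2 \, \|x - y\|_2
\quad\Longrightarrow\quad
\|R(u) - R(v)\|_2 \le \|u - v\|_2 .
\]
% For C = 2R - I, expand the square and reuse the first inequality:
\[
\|C(u) - C(v)\|_2^2
= 4\|x - y\|_2^2 - 4 (u - v)^T (x - y) + \|u - v\|_2^2
\le \|u - v\|_2^2 .
\]
```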

Case:

  1. Proximal
  2. Indicator
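The two cases can be illustrated in one dimension. The sketch below relies on two standard facts rather than anything stated in this post: the resolvent of $\partial f$ is the proximal operator of $f$, and the resolvent of the normal cone of a set (the subdifferential of its indicator) is the Euclidean projection. The choices $f(x) = |x|$ and the interval $[-1, 1]$ are arbitrary examples, and the function names are hypothetical.

```python
import random

def prox_abs(z, lam):
    """Resolvent of the subdifferential of |x|: soft thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def project_interval(z, lo, hi):
    """Resolvent of the normal cone of [lo, hi]: Euclidean projection."""
    return min(max(z, lo), hi)

def cayley(resolvent):
    """Build C = 2R - I from a resolvent R."""
    return lambda z: 2.0 * resolvent(z) - z

C_abs = cayley(lambda z: prox_abs(z, 0.5))
C_box = cayley(lambda z: project_interval(z, -1.0, 1.0))

# Empirical check of nonexpansiveness: |C(u) - C(v)| - |u - v| should never be positive.
random.seed(0)
pairs = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
worst = max(
    abs(C(u) - C(v)) - abs(u - v)
    for C in (C_abs, C_box)
    for (u, v) in pairs
)
print(worst <= 1e-12)
```

A random check like this is not a proof, but it is a cheap way to catch a sign error when implementing a Cayley operator by hand.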

Fixed point of operators & zero set of F

An important result: the fixed points of the Cayley operator and of the resolvent coincide with the zero set of the relation $F$. That is,

$$F(x) \ni 0 \iff C(x) = x \iff R(x) = x$$
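The chain of equivalences can be checked directly from the definitions. A brief sketch, assuming $\lambda > 0$ and $F$ monotone so that the resolvent $R = (I + \lambda F)^{-1}$ is single-valued:

```latex
\[
\begin{aligned}
0 \in F(x)
&\iff x \in x + \lambda F(x) = (I + \lambda F)(x) \\
&\iff x = (I + \lambda F)^{-1}(x) = R(x) \\
&\iff 2R(x) - x = x \\
&\iff C(x) = x .
\end{aligned}
\]
```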

Theorem: Banach fixed point theorem

If $F$ is a contraction with $\operatorname{dom} F = \mathbf{R}^n$, then the iteration $x_{k+1} = F(x_k)$ converges to the unique fixed point of $F$.
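A small numerical illustration of the theorem. The map $F(x) = 0.5\cos(x)$ is an example chosen for this sketch (it is not from the original notes); its derivative is bounded by 0.5 in absolute value, so it is a contraction on all of $\mathbf{R}$.

```python
import math

def F(x):
    # |F'(x)| = 0.5 * |sin(x)| <= 0.5, so F is a contraction on the real line
    return 0.5 * math.cos(x)

def fixed_point_iteration(F, x0, iters):
    """Repeat x <- F(x) for a fixed number of iterations."""
    x = x0
    for _ in range(iters):
        x = F(x)
    return x

# Two very different starting points end up at the same fixed point.
x_a = fixed_point_iteration(F, 10.0, 100)
x_b = fixed_point_iteration(F, -3.0, 100)
print(abs(x_a - F(x_a)), abs(x_a - x_b))
```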

Damped iteration of a nonexpansive operator

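In contrast to the plain iteration

$$x_{k+1} = F(x_k)$$

the damped iteration takes a convex combination of $x_k$ and $F(x_k)$, with $\theta_k \in (0, 1)$:

$$x_{k+1} = \theta_k x_k + (1 - \theta_k) F(x_k)$$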
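Why damping matters can be seen with a rotation by 90 degrees in the plane, an example chosen for this sketch: it is nonexpansive with the single fixed point $(0, 0)$, yet the plain iteration just cycles. The constant $\theta = 0.5$ below is an assumption, not a value from the original notes.

```python
def rotate90(p):
    """Rotation by 90 degrees: an isometry, hence nonexpansive; fixed point (0, 0)."""
    x, y = p
    return (-y, x)

def plain_iteration(F, p, iters):
    for _ in range(iters):
        p = F(p)
    return p

def damped_iteration(F, p, iters, theta=0.5):
    """x <- theta * x + (1 - theta) * F(x), with a constant theta."""
    for _ in range(iters):
        q = F(p)
        p = (theta * p[0] + (1 - theta) * q[0],
             theta * p[1] + (1 - theta) * q[1])
    return p

start = (1.0, 0.0)
p_plain = plain_iteration(rotate90, start, 100)    # stays on the unit circle
p_damped = damped_iteration(rotate90, start, 100)  # shrinks toward the origin
print(p_plain, p_damped)
```

With $\theta = 0.5$ each damped step scales the norm by $\sqrt{0.5}$, which is why the second sequence converges while the first never does.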

Proof:

Case:

Operator Splitting

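The problem here is to find a zero of a relation $F = A + B$. Working with $F$ directly (for example, evaluating its resolvent) may be hard, while handling $A$ and $B$ separately is easier.

Theorem: if $A$ and $B$ are maximal monotone, then

$$0 \in A(x) + B(x) \iff C_A C_B(z) = z$$

where $x = R_B(z)$.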

Proof:

The proof is straightforward and follows directly from the definitions.
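One way to spell the argument out, as a sketch that assumes both resolvents use the same $\lambda > 0$. The intermediate names $\tilde z$ and $\tilde x$ are introduced here for convenience.

```latex
% Set x = R_B(z), \tilde z = C_B(z) = 2x - z, \tilde x = R_A(\tilde z). Then
\[
C_A C_B(z) = 2\tilde x - \tilde z = 2\tilde x - 2x + z ,
\qquad\text{so}\qquad
C_A C_B(z) = z \iff \tilde x = x .
\]
% By definition of the resolvents:
\[
x = R_B(z) \iff z - x \in \lambda B(x),
\qquad
\tilde x = R_A(\tilde z) \iff \tilde z - \tilde x \in \lambda A(\tilde x).
\]
% If \tilde x = x, then \tilde z - x = x - z \in \lambda A(x); adding the two inclusions:
\[
0 = (z - x) + (x - z) \in \lambda \big( B(x) + A(x) \big).
\]
% Conversely, if 0 \in A(x) + B(x), pick b \in B(x) with -b \in A(x) and set z = x + \lambda b.
% Then R_B(z) = x, \tilde z = x - \lambda b \in x + \lambda A(x), so R_A(\tilde z) = x and C_A C_B(z) = z.
```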

Peaceman-Rachford & Douglas-Rachford Splitting

$$\text{Peaceman-Rachford:}\quad z_{k+1} = C_A C_B(z_k) \tag{1}$$

$$\text{Douglas-Rachford:}\quad z_{k+1} = \tfrac{1}{2}(I + C_A C_B)(z_k) \tag{2}$$
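Both iterations can be run on a toy problem. The sketch below assumes the one-dimensional problem of minimizing $|x| + \tfrac{1}{2}(x - 3)^2$, with $A = \partial |x|$ and $B$ the gradient of the quadratic, whose minimizer is $x = 2$. The problem, the step size `alpha = 0.5` and the function names are all choices made for this example.

```python
def prox_abs(z, alpha):
    """Resolvent of A, the subdifferential of |x| (soft thresholding)."""
    return max(abs(z) - alpha, 0.0) * (1.0 if z >= 0 else -1.0)

def prox_quad(z, alpha, c=3.0):
    """Resolvent of B, the gradient of 0.5 * (x - c)^2."""
    return (z + alpha * c) / (1.0 + alpha)

def cayley(resolvent):
    """C = 2R - I."""
    return lambda z: 2.0 * resolvent(z) - z

alpha = 0.5
R_A = lambda z: prox_abs(z, alpha)
R_B = lambda z: prox_quad(z, alpha)
C_A, C_B = cayley(R_A), cayley(R_B)

def peaceman_rachford(z, iters):
    for _ in range(iters):
        z = C_A(C_B(z))               # update (1)
    return z

def douglas_rachford(z, iters):
    for _ in range(iters):
        z = 0.5 * (z + C_A(C_B(z)))   # update (2)
    return z

# The solution is recovered from the fixed point z through x = R_B(z).
x_pr = R_B(peaceman_rachford(0.0, 200))
x_dr = R_B(douglas_rachford(0.0, 200))
print(x_pr, x_dr)
```

Peaceman-Rachford converges here because the quadratic term makes $C_B$ a contraction. In general $C_A C_B$ is only nonexpansive, which is where the averaging in Douglas-Rachford becomes necessary.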

  1. Douglas-Rachford updating

The last equation:

Case: Alternating direction method of multipliers

Case: Constrained optimization

  2. Peaceman-Rachford updating

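$$x_{k+\frac{1}{2}} = \operatorname{prox}_{\alpha f}(z_k) \tag{3}$$

$$z_{k+\frac{1}{2}} = 2x_{k+\frac{1}{2}} - z_k \tag{4}$$

$$x_{k+1} = \operatorname{prox}_{\alpha g}(z_{k+\frac{1}{2}}) \tag{5}$$

$$z_{k+1} = 2x_{k+1} - z_{k+\frac{1}{2}} \tag{6}$$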
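The four updates translate almost line by line into code. A minimal sketch on a hypothetical constrained problem: minimize $f(x) = \tfrac{1}{2}(x - 3)^2$ subject to $x \in [0, 1]$, with $g$ the indicator of the interval, so the solution is $x = 1$. The problem data, `alpha` and the starting point are assumptions made for this example.

```python
def prox_f(z, alpha, a=3.0):
    """prox of alpha * 0.5 * (x - a)^2."""
    return (z + alpha * a) / (1.0 + alpha)

def prox_g(z, lo=0.0, hi=1.0):
    """prox of the indicator of [lo, hi]: projection (independent of alpha)."""
    return min(max(z, lo), hi)

def peaceman_rachford_step(z, alpha):
    x_half = prox_f(z, alpha)        # (3)
    z_half = 2.0 * x_half - z        # (4)
    x_next = prox_g(z_half)          # (5)
    z_next = 2.0 * x_next - z_half   # (6)
    return x_next, z_next

z, alpha = 5.0, 0.5
for _ in range(100):
    x, z = peaceman_rachford_step(z, alpha)
print(x, prox_f(z, alpha))
```

At convergence the two primal iterates agree: both `x` and `prox_f(z, alpha)` approach the constrained minimizer.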

Case: FedSplit, a consensus problem

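For a loss function $F$ with a consensus constraint, finding the minimizer through the first-order optimality condition is equivalent to solving

$$0 \in \nabla F(x) + \mathcal{N}(x)$$

where $\mathcal{N}$ is the normal cone of the consensus set.

The figure above shows the algorithm as given in the paper. Here the operator $A$ is $\mathcal{N}$ and the operator $B$ is $\nabla F$. Because the averaging step $x = \bar{z}$ is executed last, the whole sequence of steps is shifted one position earlier, and step (a) of the algorithm merges the two middle steps of the PR update.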
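To make the roles of the two operators concrete, here is a minimal sketch of Douglas-Rachford splitting on a toy consensus problem. It is not the FedSplit algorithm as published: the local losses $f_i(x) = \tfrac{1}{2}(x - a_i)^2$, the step size and all function names are hypothetical. The resolvent of the consensus normal cone is plain averaging, and the resolvent of the loss operator acts on each agent separately.

```python
def prox_local(z_i, a_i, alpha):
    """prox of alpha * 0.5 * (x - a_i)^2 for one agent (hypothetical local loss)."""
    return (z_i + alpha * a_i) / (1.0 + alpha)

def dr_consensus(a, alpha=0.5, iters=100):
    n = len(a)
    z = [0.0] * n
    xbar = 0.0
    for _ in range(iters):
        # resolvent of the loss operator, computed independently per agent
        x = [prox_local(z[i], a[i], alpha) for i in range(n)]
        # reflection 2x - z, then projection onto the consensus set (averaging)
        reflected = [2.0 * x[i] - z[i] for i in range(n)]
        xbar = sum(reflected) / n
        # Douglas-Rachford update of each agent's variable
        z = [z[i] + xbar - x[i] for i in range(n)]
    return xbar

a = [1.0, 2.0, 6.0]
x_consensus = dr_consensus(a)
print(x_consensus)
```

For these quadratic losses the consensus minimizer is simply the mean of the $a_i$, which gives an easy check on the result.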

Consensus Optimization

Here is the code from Boyd's course (the commented-out part is my modification; with it, the update matches the formulas above).
% Solves the QP
%       minimize    (1/2)||Ax - b||_2^2
%       subject to  Fx <= g
% using D-R consensus. Note that the code has not been optimized for
% runtime and is only presented to give an idea of D-R consensus. For better
% performance, the inner loop should be run in parallel and should use a
% fast QP solver for small problems (e.g., CVXGEN).
%
% EE364b Convex Optimization II, S. Boyd
% Written by Eric Chu, 04/25/11
% 

close all; clear all
randn('state', 0); rand('state', 0);

%%% Generate problem instance
m = 1000;
n = 100;
k = 50;

xtrue = randn(n,1);
A = randn(m,n);
b = A*xtrue + randn(m,1);

F = randn(k,n);
g = F*xtrue;

%%% Use CVX to find solution
cvx_begin
    variable x(n)
    minimize ((1/2)*sum_square(A*x - b))
    subject to
        F*x <= g
cvx_end
xcvx = x;
fstar = cvx_optval; 
  
%%% Douglas-Rachford consensus splitting
N           = 10;      % number of subproblems
MAX_ITERS   = 50;
rho         = 200;

z           = zeros(n,N); 
xbar        = zeros(n,1);

for j = 1:MAX_ITERS,
    
    % x = prox_f(z), could be done in parallel
    for i = 1:N,
        Ai = A(m/N*(i-1) + 1:i*m/N,:);
        bi = b(m/N*(i-1) + 1:i*m/N);
        
        Fi = F(k/N*(i-1) + 1:i*k/N,:);
        gi = g(k/N*(i-1) + 1:i*k/N);
        
        % use CVX to solve prox operator
        zi = z(:,i);
        cvx_solver sdpt3
        cvx_begin quiet
            variable xi(n)
            minimize ( (1/2)*sum_square(Ai*xi - bi) + (rho/2)*sum_square(xi - zi) )
            subject to
                Fi*xi <= gi
        cvx_end
        x(:,i) = xi;
    end
    
    %% standard 
    %z_midterm = 2*x-z;
    %xbar_prev = xbar;
    %xbar = mean(z_midterm,2);
    
    %infeas(j) = sum(pos(F*xbar - g));
    %f(j) = (1/2)*sum_square(A*xbar - b);
    %z = z + (xbar*ones(1,N) - x);
    
    %% Boyd
    
    xbar_prev = xbar;
    xbar = mean(x,2);
    
    % record infeasibilities
    infeas(j) = sum(pos(F*xbar - g));
    
    % record objective value
    f(j) = (1/2)*sum_square(A*xbar - b);
    
    % update
    z = z + (xbar*ones(1,N) - x) + (xbar - xbar_prev)*ones(1,N);
end

%%% Make plots
subplot(2,1,1)
semilogy(1:MAX_ITERS, infeas);
ylabel('infeas'); set(gca, 'FontSize', 18); axis([1 MAX_ITERS 10^-2 10^2])
subplot(2,1,2)
plot(1:MAX_ITERS, f, [1 MAX_ITERS], [fstar fstar], 'k--');
xlabel('k'); ylabel('f'); axis([1 MAX_ITERS 300 2000]); set(gca, 'FontSize', 18);
print -depsc dr_consensus_qp.eps

Left: my modified version; right: Boyd's code. The results look about the same, but I have not yet worked out why his code is written this way.
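One possible explanation, derived from the two versions of the update in the listing rather than from the course notes: they appear to produce the same $z$ iterates. In the sketch below $N$ is the number of subproblems, $x_i$ and $z_i$ are the columns of `x` and `z`, a bar denotes the mean over $i$, and superscripts count iterations.

```latex
% Commented-out ("standard") version: average the reflected points, then update z.
\[
y = \frac{1}{N} \sum_{i} (2x_i - z_i) = 2\bar{x} - \bar{z},
\qquad
z_i^{+} = z_i + y - x_i .
\]
% Averaging the z-update over i shows that the new mean of z is the current mean of x:
\[
\bar{z}^{+} = \bar{z} + y - \bar{x} = \bar{x} .
\]
% Hence \bar{z}^{k} = \bar{x}^{k-1} at every iteration (also at the start, since z and xbar are both initialised to 0), and
\[
y^{k} = 2\bar{x}^{k} - \bar{x}^{k-1} = \bar{x}^{k} + (\bar{x}^{k} - \bar{x}^{k-1}),
\qquad
z_i^{k+1} = z_i^{k} + (\bar{x}^{k} - x_i^{k}) + (\bar{x}^{k} - \bar{x}^{k-1}),
\]
% which is the update line in Boyd's version.
```

Under this reading, the only difference is which average gets recorded for the plots: `mean(z_midterm,2)` in the modified version and `mean(x,2)` in Boyd's.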


Posted by Neo_DH