Part4: Appendix

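This post collects the detailed derivations of formulas used in the previous posts of this series, mainly filling in steps that the paper omits. If you are interested in diffusion models, see the earlier posts.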

Gaussian Distribution

Probability Density Function

If $x \sim N (μ, σ^{2})$, then:

f (x; μ, σ) = \frac{1}{σ \sqrt{2 π}} \exp (- \frac{(x - μ)^{2}}{2 σ^{2}})

KL Divergence Between Two Gaussians

D_{KL} (N (μ_{1}, σ_{1}^{2}) ∣∣ N (μ_{2}, σ_{2}^{2})) = \ln \frac{σ_{2}}{σ_{1}} + \frac{σ_{1}^{2} + (μ_{1} - μ_{2})^{2}}{2 σ_{2}^{2}} - \frac{1}{2}
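The closed form above can be sanity-checked numerically. The sketch below is only an illustration: the function names `kl_gaussians` and `kl_numeric` are made up for this example, and the numerical side is a plain midpoint-rule integral of $p \ln (p / q)$ over a wide interval around $μ_{1}$.

```python
import math

def kl_gaussians(mu1, sigma1, mu2, sigma2):
    """Closed-form KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) for scalar Gaussians."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2)
            - 0.5)

def gaussian_log_pdf(x, mu, sigma):
    """Log of the Gaussian density f(x; mu, sigma)."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sigma ** 2)

def kl_numeric(mu1, sigma1, mu2, sigma2, half_width=12.0, steps=20000):
    """Midpoint-rule estimate of the integral of p(x) * (ln p(x) - ln q(x))."""
    lo = mu1 - half_width * sigma1
    dx = 2 * half_width * sigma1 / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        log_p = gaussian_log_pdf(x, mu1, sigma1)
        log_q = gaussian_log_pdf(x, mu2, sigma2)
        total += math.exp(log_p) * (log_p - log_q) * dx
    return total

closed = kl_gaussians(0.0, 1.0, 1.0, 2.0)
numeric = kl_numeric(0.0, 1.0, 1.0, 2.0)
print(closed, numeric)
```

The two printed values should agree to several decimal places, and `kl_gaussians` returns zero when both distributions are identical.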

Property 1

If a random variable follows a Gaussian distribution, $x \sim N (μ, σ^{2})$, then for any real numbers $a, b$ with $a \neq 0$:

a x + b \sim N (a μ + b, (a σ)^{2})

Consequently, any Gaussian variable $x \sim N (μ, σ^{2})$ can be written as a transformation of a standard normal variable $ϵ$:

x = ϵ \cdot σ + μ, \quad ϵ \sim N (0, 1)
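As a quick empirical check of this reparameterization, the sketch below draws $ϵ$ from a standard normal, applies $x = ϵ σ + μ$, and compares the sample mean and standard deviation with $μ$ and $σ$. The helper name `sample_gaussian`, the values $μ = 3$, $σ = 2$, and the sample size are arbitrary choices for illustration.

```python
import math
import random

def sample_gaussian(mu, sigma, rng):
    """Draw x ~ N(mu, sigma^2) as x = eps * sigma + mu with eps ~ N(0, 1)."""
    eps = rng.gauss(0.0, 1.0)
    return eps * sigma + mu

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
samples = [sample_gaussian(3.0, 2.0, rng) for _ in range(n)]

sample_mean = sum(samples) / n
sample_std = math.sqrt(sum((s - sample_mean) ** 2 for s in samples) / n)
print(sample_mean, sample_std)  # should be close to 3.0 and 2.0
```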

Property 2

Suppose two random variables are Gaussian and mutually independent, $x \sim N (μ_{x}, σ_{x}^{2}), y \sim N (μ_{y}, σ_{y}^{2})$. Then their sum and their difference are also Gaussian:

\begin{aligned} U = x + y & \sim N (μ_{x} + μ_{y}, σ_{x}^{2} + σ_{y}^{2}) \\ V = x - y & \sim N (μ_{x} - μ_{y}, σ_{x}^{2} + σ_{y}^{2}) \end{aligned}
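The point that the variances add for both the sum and the difference is easy to miss, so here is a small Monte Carlo sketch. The parameters ($x \sim N (1, 2^{2})$, $y \sim N (-3, 1.5^{2})$), the seed, and the helper `mean_and_variance` are all illustrative assumptions.

```python
import random

def mean_and_variance(values):
    """Sample mean and (population) variance of a list of floats."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var

rng = random.Random(42)
n = 100_000
xs = [rng.gauss(1.0, 2.0) for _ in range(n)]   # x ~ N(1, 2^2)
ys = [rng.gauss(-3.0, 1.5) for _ in range(n)]  # y ~ N(-3, 1.5^2), drawn independently of x

sum_mean, sum_var = mean_and_variance([x + y for x, y in zip(xs, ys)])
diff_mean, diff_var = mean_and_variance([x - y for x, y in zip(xs, ys)])

# Expected: means near -2 and 4; both variances near 2^2 + 1.5^2 = 6.25.
print(sum_mean, sum_var)
print(diff_mean, diff_var)
```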

Derivation 1

In the diffusion forward process, how can the state $x_{t}$ at an arbitrary step $t$ be expressed in terms of $x_{0}$?

Solution:
In the forward process, each transition between consecutive states is Gaussian:

\begin{matrix} (1) & q (x_{t} ∣ x_{t - 1}) = N (\sqrt{1 - β_{t}} x_{t - 1}, β_{t} I) \end{matrix}

Reparameterize $β_{t}$ by defining:

\begin{aligned} α_{t} & = 1 - β_{t} \\ {\bar{α}}_{t} & = \prod_{i = 1}^{t} α_{i} \end{aligned}

Using Property 1, $(1)$ can be expanded as follows:

\begin{matrix} (2) & \begin{aligned} q (x_{t} ∣ x_{t - 1}) & = N (\sqrt{1 - β_{t}} x_{t - 1}, β_{t} I) \\ x_{t} & = \sqrt{1 - β_{t}} x_{t - 1} + \sqrt{β_{t}} ϵ, \quad ϵ \sim N (0, I) \\ & = \sqrt{α_{t}} x_{t - 1} + \sqrt{1 - α_{t}} ϵ \end{aligned} \end{matrix}
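The one-step rule $x_{t} = \sqrt{1 - β_{t}} x_{t - 1} + \sqrt{β_{t}} ϵ$ translates directly into a sampling routine. The following is a minimal sketch using plain Python lists as vectors; `forward_step` and the three $β$ values are hypothetical and chosen only for illustration.

```python
import math
import random

def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    scale = math.sqrt(1.0 - beta_t)
    noise_std = math.sqrt(beta_t)
    # Each coordinate receives its own independent standard normal draw.
    return [scale * v + noise_std * rng.gauss(0.0, 1.0) for v in x_prev]

rng = random.Random(0)
x = [1.0, -2.0, 0.5]          # plays the role of x_0
for beta_t in [0.1, 0.2, 0.3]:
    x = forward_step(x, beta_t, rng)  # x_1, x_2, x_3 in turn
print(x)
```

With $β_{t} = 0$ the step returns its input unchanged, and as $β_{t}$ approaches 1 the output is dominated by noise.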

Since $x_{t} = \sqrt{α_{t}} x_{t - 1} + \sqrt{1 - α_{t}} ϵ$, applying the same step one index earlier gives $x_{t - 1} = \sqrt{α_{t - 1}} x_{t - 2} + \sqrt{1 - α_{t - 1}} \bar{ϵ}$. Substituting this into $(2)$:

\begin{matrix} (3) & \begin{aligned} x_{t} & = \sqrt{α_{t}} x_{t - 1} + \sqrt{1 - α_{t}} ϵ \\ & = \sqrt{α_{t}} (\sqrt{α_{t - 1}} x_{t - 2} + \sqrt{1 - α_{t - 1}} \bar{ϵ}) + \sqrt{1 - α_{t}} ϵ \\ & = \sqrt{α_{t} α_{t - 1}} x_{t - 2} + \sqrt{α_{t} (1 - α_{t - 1})} \bar{ϵ} + \sqrt{1 - α_{t}} ϵ \end{aligned} \end{matrix}

To distinguish it from $ϵ$, $\bar{ϵ}$ denotes another variable drawn from the standard Gaussian $N (0, I)$, independent of $ϵ$.

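By Property 1 of the Gaussian distribution, scaling a standard Gaussian variable yields a Gaussian with the correspondingly scaled variance, so: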

\begin{matrix} (a) & \begin{aligned} ϵ \sim N (0, I) & \Rightarrow \sqrt{1 - α_{t}} ϵ \sim N (0, (1 - α_{t}) I) \\ \bar{ϵ} \sim N (0, I) & \Rightarrow \sqrt{α_{t} (1 - α_{t - 1})} \bar{ϵ} \sim N (0, α_{t} (1 - α_{t - 1}) I) \end{aligned} \end{matrix}

Since $\sqrt{1 - α_{t}} ϵ$ and $\sqrt{α_{t} (1 - α_{t - 1})} \bar{ϵ}$ are independent and both Gaussian, let $U = \sqrt{1 - α_{t}} ϵ + \sqrt{α_{t} (1 - α_{t - 1})} \bar{ϵ}$. By Property 2, $U$ is also Gaussian:

\begin{matrix} (b) & \begin{aligned} \sqrt{1 - α_{t}} ϵ + \sqrt{α_{t} (1 - α_{t - 1})} \bar{ϵ} & \sim N (0, (1 - α_{t}) I + α_{t} (1 - α_{t - 1}) I) \\ \Rightarrow U & \sim N (0, (1 - α_{t} α_{t - 1}) I) \end{aligned} \end{matrix}

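By Property 1, $U$ can be written in terms of a standard Gaussian variable (the symbol $ϵ$ is reused here for the merged noise):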

\begin{matrix} (c) & \begin{aligned} U & \sim N (0, (1 - α_{t} α_{t - 1}) I) \Rightarrow U = \sqrt{1 - α_{t} α_{t - 1}} ϵ \end{aligned} \end{matrix}

Substituting $(c)$ into $(3)$ gives:

\begin{aligned} x_{t} & = \sqrt{α_{t}} x_{t - 1} + \sqrt{1 - α_{t}} ϵ \\ & = \sqrt{α_{t}} (\sqrt{α_{t - 1}} x_{t - 2} + \sqrt{1 - α_{t - 1}} \bar{ϵ}) + \sqrt{1 - α_{t}} ϵ \\ & = \sqrt{α_{t} α_{t - 1}} x_{t - 2} + \sqrt{α_{t} (1 - α_{t - 1})} \bar{ϵ} + \sqrt{1 - α_{t}} ϵ \\ & = \sqrt{α_{t} α_{t - 1}} x_{t - 2} + \sqrt{1 - α_{t} α_{t - 1}} ϵ \end{aligned}

Repeating this merging step all the way down to $x_{0}$ (by induction on the number of steps) gives:

\begin{aligned} q (x_{t} ∣ x_{t - 1}) & = N (\sqrt{1 - β_{t}} x_{t - 1}, β_{t} I) \\ x_{t} & = \sqrt{1 - β_{t}} x_{t - 1} + \sqrt{β_{t}} ϵ, \quad ϵ \sim N (0, I) \\ & = \sqrt{α_{t}} x_{t - 1} + \sqrt{1 - α_{t}} ϵ \\ & = \sqrt{α_{t} α_{t - 1}} x_{t - 2} + \sqrt{1 - α_{t} α_{t - 1}} ϵ \\ & = \dots \\ & = \sqrt{{\bar{α}}_{t}} x_{0} + \sqrt{1 - {\bar{α}}_{t}} ϵ \end{aligned}

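Therefore, $q (x_{t} ∣ x_{0}) = N (\sqrt{{\bar{α}}_{t}} x_{0}, (1 - {\bar{α}}_{t}) I)$. Note that the second argument is the covariance $(1 - {\bar{α}}_{t}) I$; the factor $\sqrt{1 - {\bar{α}}_{t}}$ multiplying $ϵ$ is the standard deviation.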
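The induction can be checked numerically without any sampling. Writing $x_{t} = c_{t} x_{0} + \text{noise}$ with noise variance $v_{t}$, one forward step maps $(c, v)$ to $(\sqrt{α_{t}} c, α_{t} v + β_{t})$, and the closed form claims $c_{t} = \sqrt{{\bar{α}}_{t}}$ and $v_{t} = 1 - {\bar{α}}_{t}$. The sketch below compares the two. The linear schedule from $10^{-4}$ to $0.02$ over 1000 steps is an assumption made for this example and is not taken from the text above; all function names are hypothetical.

```python
import math

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced beta_1..beta_T (an assumed schedule, for illustration only)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{i <= t} (1 - beta_i)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

def iterate_forward_moments(betas):
    """Apply the one-step rule repeatedly, tracking the x_0 coefficient and noise variance."""
    coef, var, history = 1.0, 0.0, []
    for beta in betas:
        alpha = 1.0 - beta
        coef = math.sqrt(alpha) * coef   # mean coefficient picks up sqrt(alpha_t)
        var = alpha * var + beta         # old noise is scaled, new noise is added
        history.append((coef, var))
    return history

betas = linear_beta_schedule(1000)
abar = alpha_bars(betas)
moments = iterate_forward_moments(betas)

max_coef_err = max(abs(c - math.sqrt(a)) for (c, _), a in zip(moments, abar))
max_var_err = max(abs(v - (1.0 - a)) for (_, v), a in zip(moments, abar))
print(max_coef_err, max_var_err)  # both should be at floating-point rounding level
```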

Derivation 2

In diffusion models, $q$ is defined to be Gaussian, so $q (x_{t - 1} ∣ x_{t}, x_{0})$ is written as:

\begin{aligned} q (x_{t - 1} ∣ x_{t}, x_{0}) & = N (x_{t - 1}; {\tilde{μ}}_{t} (x_{t}, x_{0}), {\tilde{β}}_{t} I) \end{aligned}

How are ${\tilde{μ}}_{t} (x_{t}, x_{0})$ and ${\tilde{β}}_{t}$ obtained?

The result is stated first; the detailed derivation follows.

\begin{aligned} {\tilde{μ}}_{t} (x_{t}, x_{0}) & := \frac{\sqrt{{\bar{α}}_{t - 1}} β_{t}}{1 - {\bar{α}}_{t}} x_{0} + \frac{\sqrt{α_{t}} (1 - {\bar{α}}_{t - 1})}{1 - {\bar{α}}_{t}} x_{t}, \\ {\tilde{β}}_{t} & := \frac{1 - {\bar{α}}_{t - 1}}{1 - {\bar{α}}_{t}} β_{t} \end{aligned}

Solution:
By Bayes' rule, $q (x_{t - 1} ∣ x_{t}, x_{0})$ can be rewritten as:

\begin{matrix} (1) & q (x_{t - 1} ∣ x_{t}, x_{0}) = q (x_{t} ∣ x_{t - 1}, x_{0}) \frac{q (x_{t - 1} ∣ x_{0})}{q (x_{t} ∣ x_{0})} \end{matrix}

Because the diffusion process is modeled as a Markov chain, each state depends only on the previous state, so

q (x_{t} ∣ x_{t - 1}, x_{0}) = q (x_{t} ∣ x_{t - 1})

Equation $(1)$ therefore becomes $(2)$:

\begin{matrix} (2) & \begin{aligned} q (x_{t - 1} ∣ x_{t}, x_{0}) & = q (x_{t} ∣ x_{t - 1}, x_{0}) \frac{q (x_{t - 1} ∣ x_{0})}{q (x_{t} ∣ x_{0})} \\ & = q (x_{t} ∣ x_{t - 1}) \frac{q (x_{t - 1} ∣ x_{0})}{q (x_{t} ∣ x_{0})} \end{aligned} \end{matrix}

From the result of Derivation 1:

\begin{aligned} q (x_{t} ∣ x_{0}) & = N (\sqrt{{\bar{α}}_{t}} x_{0}, (1 - {\bar{α}}_{t}) I) \\ q (x_{t - 1} ∣ x_{0}) & = N (\sqrt{{\bar{α}}_{t - 1}} x_{0}, (1 - {\bar{α}}_{t - 1}) I) \end{aligned}

Expanding $(2)$ with the Gaussian probability density function gives:

\begin{matrix} (3) & \begin{aligned} q (x_{t - 1} ∣ x_{t}, x_{0}) & = q (x_{t} ∣ x_{t - 1}) \frac{q (x_{t - 1} ∣ x_{0})}{q (x_{t} ∣ x_{0})} \\ & \propto \exp (- \frac{1}{2} (\frac{{(x_{t} - \sqrt{α_{t}} x_{t - 1})}^{2}}{β_{t}} + \frac{{(x_{t - 1} - \sqrt{{\bar{α}}_{t - 1}} x_{0})}^{2}}{1 - {\bar{α}}_{t - 1}} - \frac{{(x_{t} - \sqrt{{\bar{α}}_{t}} x_{0})}^{2}}{1 - {\bar{α}}_{t}})) \end{aligned} \end{matrix}

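Neither $β_{t}$ nor ${\bar{α}}_{t}$ is a random variable, so the normalization constants, which depend only on them, are absorbed into the proportionality sign. The goal is to express the distribution of $x_{t - 1}$ in terms of $x_{0}$ and $x_{t}$. Expanding $(3)$ further gives $(4)$: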

\begin{matrix} (4) & \begin{aligned} q (x_{t - 1} ∣ x_{t}, x_{0}) & = q (x_{t} ∣ x_{t - 1}) \frac{q (x_{t - 1} ∣ x_{0})}{q (x_{t} ∣ x_{0})} \\ & \propto \exp (- \frac{1}{2} (\frac{{(x_{t} - \sqrt{α_{t}} x_{t - 1})}^{2}}{β_{t}} + \frac{{(x_{t - 1} - \sqrt{{\bar{α}}_{t - 1}} x_{0})}^{2}}{1 - {\bar{α}}_{t - 1}} - \frac{{(x_{t} - \sqrt{{\bar{α}}_{t}} x_{0})}^{2}}{1 - {\bar{α}}_{t}})) \\ & = \exp (- \frac{1}{2} (\frac{x_{t}^{2} - 2 \sqrt{α_{t}} x_{t} x_{t - 1} + α_{t} x_{t - 1}^{2}}{β_{t}} + \frac{x_{t - 1}^{2} - 2 \sqrt{{\bar{α}}_{t - 1}} x_{0} x_{t - 1} + {\bar{α}}_{t - 1} x_{0}^{2}}{1 - {\bar{α}}_{t - 1}} - \frac{{(x_{t} - \sqrt{{\bar{α}}_{t}} x_{0})}^{2}}{1 - {\bar{α}}_{t}})) \\ & = \exp (- \frac{1}{2} ((\frac{α_{t}}{β_{t}} + \frac{1}{1 - {\bar{α}}_{t - 1}}) x_{t - 1}^{2} - (\frac{2 \sqrt{α_{t}}}{β_{t}} x_{t} + \frac{2 \sqrt{{\bar{α}}_{t - 1}}}{1 - {\bar{α}}_{t - 1}} x_{0}) x_{t - 1} + C (x_{t}, x_{0}))) \end{aligned} \end{matrix}

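Here, the second-to-last equality expands the squares from the previous line. The last equality collects terms in $x_{t - 1}$, treating $x_{0}$ and $x_{t}$ as parameters, with $C (x_{t}, x_{0})$ gathering everything that does not depend on $x_{t - 1}$. Completing the square then yields the exponent of a Gaussian density, of the form $- \frac{(x_{t - 1} - {\tilde{μ}}_{t})^{2}}{2 {\tilde{β}}_{t}}$. Concretely, if the exponent is written as $- \frac{1}{2} (a x_{t - 1}^{2} - 2 b x_{t - 1} + C)$, then $a x_{t - 1}^{2} - 2 b x_{t - 1} = a (x_{t - 1} - b / a)^{2} - b^{2} / a$, so ${\tilde{β}}_{t} = 1 / a$ and ${\tilde{μ}}_{t} = b / a$. Therefore: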

\begin{matrix} (5) & \begin{aligned} {\tilde{μ}}_{t} & = \frac{1}{\frac{α_{t}}{β_{t}} + \frac{1}{1 - {\bar{α}}_{t - 1}}} \cdot (\frac{\sqrt{α_{t}}}{β_{t}} x_{t} + \frac{\sqrt{{\bar{α}}_{t - 1}}}{1 - {\bar{α}}_{t - 1}} x_{0}) \\ & = \frac{(1 - {\bar{α}}_{t - 1}) β_{t}}{α_{t} (1 - {\bar{α}}_{t - 1}) + β_{t}} \cdot (\frac{\sqrt{α_{t}}}{β_{t}} x_{t} + \frac{\sqrt{{\bar{α}}_{t - 1}}}{1 - {\bar{α}}_{t - 1}} x_{0}) \\ & = \frac{(1 - {\bar{α}}_{t - 1}) \sqrt{α_{t}}}{α_{t} (1 - {\bar{α}}_{t - 1}) + β_{t}} x_{t} + \frac{\sqrt{{\bar{α}}_{t - 1}} β_{t}}{α_{t} (1 - {\bar{α}}_{t - 1}) + β_{t}} x_{0} \end{aligned} \end{matrix}

Since $α_{t} = 1 - β_{t}$, the denominator simplifies:

\begin{matrix} (6) & \begin{aligned} α_{t} (1 - {\bar{α}}_{t - 1}) + β_{t} & = α_{t} - α_{t} {\bar{α}}_{t - 1} + β_{t} \\ & = 1 - β_{t} - α_{t} {\bar{α}}_{t - 1} + β_{t} \\ & = 1 - α_{t} {\bar{α}}_{t - 1} \\ & = 1 - {\bar{α}}_{t} \end{aligned} \end{matrix}

Substituting $(6)$ into $(5)$ gives:

{\tilde{μ}}_{t} (x_{t}, x_{0}) := \frac{\sqrt{{\bar{α}}_{t - 1}} β_{t}}{1 - {\bar{α}}_{t}} x_{0} + \frac{\sqrt{α_{t}} (1 - {\bar{α}}_{t - 1})}{1 - {\bar{α}}_{t}} x_{t}

For ${\tilde{β}}_{t}$:

\begin{aligned} {\tilde{β}}_{t} & = \frac{1}{\frac{α_{t}}{β_{t}} + \frac{1}{1 - {\bar{α}}_{t - 1}}} \\ & = \frac{(1 - {\bar{α}}_{t - 1}) β_{t}}{α_{t} (1 - {\bar{α}}_{t - 1}) + β_{t}} \\ & = \frac{1 - {\bar{α}}_{t - 1}}{1 - {\bar{α}}_{t}} β_{t} \end{aligned}
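As a check on the algebra, the posterior mean and variance can be computed in two ways: directly from the coefficients of the quadratic in $x_{t - 1}$ (variance as the reciprocal of the $x_{t - 1}^{2}$ coefficient, mean as the ratio of the linear and quadratic coefficients), and from the simplified closed forms. The sketch below does both for scalar inputs. The function names and the test values are hypothetical, and it assumes ${\bar{α}}_{t - 1} < 1$ (that is, $t \geq 2$) so that no denominator vanishes.

```python
import math

def posterior_from_quadratic(x_t, x_0, alpha_t, abar_prev):
    """Mean and variance read off the quadratic in x_{t-1}, before simplification."""
    beta_t = 1.0 - alpha_t
    a = alpha_t / beta_t + 1.0 / (1.0 - abar_prev)            # coefficient of x_{t-1}^2
    b = (math.sqrt(alpha_t) / beta_t * x_t
         + math.sqrt(abar_prev) / (1.0 - abar_prev) * x_0)    # half the linear coefficient
    return b / a, 1.0 / a

def posterior_closed_form(x_t, x_0, alpha_t, abar_prev):
    """Simplified mu_tilde_t and beta_tilde_t, using alpha_bar_t = alpha_t * alpha_bar_{t-1}."""
    beta_t = 1.0 - alpha_t
    abar_t = alpha_t * abar_prev
    mean = (math.sqrt(abar_prev) * beta_t / (1.0 - abar_t) * x_0
            + math.sqrt(alpha_t) * (1.0 - abar_prev) / (1.0 - abar_t) * x_t)
    var = (1.0 - abar_prev) / (1.0 - abar_t) * beta_t
    return mean, var

mean_q, var_q = posterior_from_quadratic(0.7, -1.2, 0.98, 0.5)
mean_c, var_c = posterior_closed_form(0.7, -1.2, 0.98, 0.5)
print(mean_q, mean_c)
print(var_q, var_c)
```

Both routes should agree up to floating-point rounding for any valid $α_{t}$ and ${\bar{α}}_{t - 1}$ in $(0, 1)$.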

These are some of the mathematical derivations omitted in DDPM.

Posted 2024-02-25 21:39 by 小王点点