High-Resolution Image Synthesis with Latent Diffusion Models

Overview
General pipeline
Code

Rombach R., Blattmann A., Lorenz D., Esser P. and Ommer B. High-resolution image synthesis with latent diffusion models. In IEEE Computer Vision and Pattern Recognition Conference (CVPR), 2022.

Overview

The diffusion model is moved into a lower-dimensional latent space in order to reduce computational cost.

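General pipeline

In a standard diffusion model, both the forward and the reverse process operate in the original image (pixel) space, so generating very high-resolution images is computationally expensive.
The authors therefore first train an encoder and a decoder, and then:
1. Map the original image \(x \in \mathbb{R}^{C \times H \times W}\) into a lower-dimensional latent space.
2. Run the entire forward diffusion process and the reverse (denoising) process in this latent space.
3. At inference time, once a latent sample \(\hat{z}\) has been obtained, map it back to image space with the decoder.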
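The three steps above can be sketched in a few lines of NumPy. This is only a toy illustration under loud assumptions: `encode` and `decode` are hypothetical stand-ins (average pooling and nearest-neighbour upsampling) for the learned encoder and decoder, the downsampling factor `f = 8` and the noise level `alpha_bar_t = 0.5` are arbitrary, and the true noise is reused in place of a trained denoiser. The forward step uses the usual DDPM closed form, which is an assumption about the noise schedule rather than something stated in this note.

```python
import numpy as np


def encode(x, f):
    """Hypothetical stand-in for the learned encoder: average-pool f x f patches."""
    c, h, w = x.shape
    return x.reshape(c, h // f, f, w // f, f).mean(axis=(2, 4))


def decode(z, f):
    """Hypothetical stand-in for the learned decoder: nearest-neighbour upsampling."""
    return z.repeat(f, axis=1).repeat(f, axis=2)


def q_sample(z0, alpha_bar_t, eps):
    """Forward diffusion in latent space (DDPM-style closed form, assumed here)."""
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps


def predict_z0(z_t, alpha_bar_t, eps_hat):
    """Invert q_sample given a noise estimate eps_hat."""
    return (z_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)


rng = np.random.default_rng(0)
f = 8                                   # assumed downsampling factor
x = rng.standard_normal((3, 64, 64))    # image in pixel space, C x H x W

z0 = encode(x, f)                       # step 1: map to latent space, shape (3, 8, 8)
eps = rng.standard_normal(z0.shape)
z_t = q_sample(z0, 0.5, eps)            # step 2: forward diffusion on the latent
z_hat = predict_z0(z_t, 0.5, eps)       # step 2: reverse step (oracle noise, no trained model)
x_hat = decode(z_hat, f)                # step 3: decode back to pixel space, shape (3, 64, 64)
```

In this toy setting the latent has `f * f = 64` times fewer entries than the image, which is where the compute saving comes from: every diffusion step touches the small latent rather than the full-resolution image.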
Note that the paper also proposes a cross-attention mechanism for modeling conditional distributions:

\[\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d}}\right) \cdot V, \\ Q = W_Q^{(i)} \cdot \varphi_i (z_t), \quad K = W_K^{(i)} \cdot \tau_{\theta}(y), \quad V = W_V^{(i)} \cdot \tau_{\theta}(y). \]
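A minimal NumPy sketch of this cross-attention step is given here, for a single head and without batching. The names `phi_z` (the flattened intermediate features \(\varphi_i(z_t)\)) and `tau_y` (the encoded condition \(\tau_\theta(y)\)) are illustrative, all sizes are arbitrary, and tokens are stored as rows, so the projections are applied as right-multiplications by the transposed weight matrices. These conventions are assumptions made for the example, not taken from the official code.

```python
import numpy as np


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(phi_z, tau_y, W_Q, W_K, W_V):
    """Queries come from the latent features, keys and values from the condition."""
    Q = phi_z @ W_Q.T                      # (N, d)
    K = tau_y @ W_K.T                      # (M, d)
    V = tau_y @ W_V.T                      # (M, d)
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d), axis=-1)   # (N, M), each row sums to 1
    return weights @ V, weights


rng = np.random.default_rng(0)
N, M = 16, 5                 # number of latent positions / condition tokens (assumed)
d_feat, d_cond, d = 32, 24, 8

phi_z = rng.standard_normal((N, d_feat))   # stand-in for phi_i(z_t)
tau_y = rng.standard_normal((M, d_cond))   # stand-in for tau_theta(y)
W_Q = rng.standard_normal((d, d_feat))
W_K = rng.standard_normal((d, d_cond))
W_V = rng.standard_normal((d, d_cond))

out, weights = cross_attention(phi_z, tau_y, W_Q, W_K, W_V)
```

Each of the `N` latent positions ends up with a weighted mixture of the `M` condition tokens, which is how the condition \(y\) enters the denoising network.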

Code

official

posted @ 2023-03-16 20:12 馒头and花卷
