fariver

March 27, 2024

[Paper Reading] KOSMOS: Language Is Not All You Need: Aligning Perception with Language Models

Summary: Title: KOSMOS: Language Is Not All You Need: Aligning Perception with Language Models. Date: 23.05. Institution: Microsoft. TL;DR: A large language model that takes multimodal input, which the authors call a Multimodal Large Language Model (MLLM), and which can ...

posted @ 2024-03-27 00:12 fariver

March 26, 2024

[Paper Reading] VQ-VAE: Neural Discrete Representation Learning

Summary: Title: VQ-VAE: Neural Discrete Representation Learning. Date: 17.11. Institution: Google. TL;DR: VQ stands for Vector Quantised. As the name suggests, the paper's main change relative to the VAE is to model the VAE's latent representation as discrete rather than continuous ...

posted @ 2024-03-26 00:12 fariver

March 22, 2024

[Paper Reading] Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Summary: Title: Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. Date: 22/05. Institution: Google. TL;DR: Finds that an LLM (T5) can serve as the text en...

posted @ 2024-03-22 20:31 fariver

March 21, 2024

[Basics] DiT: Scalable Diffusion Models with Transformers

Summary: Title: DiT: Scalable Diffusion Models with Transformers. Date: 23/03. Institution: UC Berkeley and NYU. TL;DR: Proposes the first Transformer-based diffusion model, which outperforms SD; on image generation tasks, as FLOPs ...

posted @ 2024-03-21 23:35 fariver

March 20, 2024

[Paper Reading] DALLE3: Improving Image Generation with Better Captions

Summary: DALLE3: Improving Image Generation with Better Captions. Date: 23/10. Institution: OpenAI. TL;DR: The paper argues that text-imag...

posted @ 2024-03-20 23:34 fariver

March 19, 2024

[Paper Reading] DALLE2: Hierarchical Text-Conditional Image Generation with CLIP Latents

Summary: Title: DALLE2: Hierarchical Text-Conditional Image Generation with CLIP Latents, also known as UnCLIP. Date: 22.04. Institution: OpenAI. TL;DR: OpenAI's first method for generating images from CLIP image embeddings, ...

posted @ 2024-03-19 23:42 fariver

March 18, 2024

[Paper Reading] GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Summary: GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. GLIDE (Guided Language to Image Diffusion for Generation a...

posted @ 2024-03-18 23:46 fariver

March 16, 2024

[Paper Reading] DALLE: Zero-Shot Text-to-Image Generation

Summary: DALLE: Zero-Shot Text-to-Image Generation. Date: 21.02 (published around the same time as CLIP). Institution: OpenAI. TL;DR: Proposes treating text and images as tokens and using a Transforme...

posted @ 2024-03-16 23:45 fariver

March 14, 2024

[Basics] Latent Diffusion Model: High-Resolution Image Synthesis with Latent Diffusion Models

Summary: Title: Latent Diffusion Model, High-Resolution Image Synthesis with Latent Diffusion Models. Date: 21.12. Institution: runway. TL;DR: This paper introduces a model called the latent diffusion model (Latent Diffusion Mo...

posted @ 2024-03-14 21:35 fariver

March 12, 2024

[Paper Reading] DDIM: DENOISING DIFFUSION IMPLICIT MODELS

Summary: Title: DDIM: DENOISING DIFFUSION IMPLICIT MODELS. TL;DR: This paper introduces a new class of generative models called Denoising Diffusion Implicit Models (DDIMs), an improved version built on Denoising Diffusion Probabilistic Models (DDPMs). DDIM ...

posted @ 2024-03-12 00:12 fariver
