[Avatar] Avatar by ImgGen
Avatarify is a remarkable tool; here we study how it works and how to customize and optimize it.
From StyleGAN to DeepFake
How to Train
-
Hardware Requirements
The whole project cost 51 GPU-years and 131.61 MWh of electricity in total; that is, 51 years on a single Tesla V100 GPU, roughly 130,000 kWh, or about 100,000 RMB in electricity alone.
If you just want to run the pretrained model for fun, an RTX 3080 is plenty; but since you apparently plan to train it yourself, note that the official code requires at least 16 GB of VRAM, so you will probably need an RTX 3090. And training a GAN from scratch is extremely expensive, so good luck.
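The quoted figures can be sanity-checked with quick arithmetic. The average per-GPU power draw below is implied by the numbers above, and the electricity price (~0.76 RMB/kWh) is an assumption chosen to match the quoted 100,000 RMB:

```python
# Sanity-check the StyleGAN training-cost figures quoted above.
gpu_years = 51
hours = gpu_years * 365 * 24              # total GPU-hours
energy_kwh = 131.61e3                     # 131.61 MWh expressed in kWh
avg_watts = energy_kwh * 1000 / hours     # implied average draw per V100
price_rmb_per_kwh = 0.76                  # assumed electricity price
cost_rmb = energy_kwh * price_rmb_per_kwh

print(f"{hours} GPU-hours, ~{avg_watts:.0f} W average, ~{cost_rmb:,.0f} RMB")
# → 446760 GPU-hours, ~295 W average, ~100,024 RMB
```

The implied ~295 W average is plausible for a V100 under sustained load, which is a useful consistency check on the quoted numbers.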
-
Training Experiments
DeepFaceLab [automatic video-to-video training and compositing]
Everybody Can Make Deepfakes Now! [real-time face deepfakes are the real point]
How to Develop a Conditional GAN (cGAN) From Scratch【*****】
In this tutorial, you will discover how to develop a conditional generative adversarial network for the targeted generation of items of clothing.
A quick look at an experiment that generates clothing icons.
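The core trick of that cGAN tutorial is conditioning both the generator and the discriminator on a class label, which in practice reduces to concatenating a label encoding with the other input. A minimal numpy sketch of the generator-input construction (latent size, class count, and function names here are illustrative, not the tutorial's exact code):

```python
import numpy as np

def one_hot(labels, n_classes):
    """Encode integer class labels (e.g. 10 clothing categories) as one-hot rows."""
    out = np.zeros((len(labels), n_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def generator_input(batch, latent_dim=100, n_classes=10, rng=None):
    """cGAN generator input: random latent vector concatenated with the label code,
    so the generator learns to produce an item of the requested class."""
    rng = rng or np.random.default_rng(0)
    z = rng.standard_normal((batch, latent_dim))
    labels = rng.integers(0, n_classes, size=batch)
    return np.concatenate([z, one_hot(labels, n_classes)], axis=1), labels

x, labels = generator_input(4)
print(x.shape)  # → (4, 110): 100 latent dims + 10 label dims
```

The discriminator gets the same treatment: the label code is concatenated (or embedded and merged) with the image input, which is what makes generation "targeted".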
How to Use
-
Realistic Face Reconstruction
AI + Forensic Facial Reconstruction
-
蚂蚁嘿呀 (the viral face-animation meme)
This is a NeurIPS 2019 paper; the task is to make the object in a source image move following the motion in a driving video.
Open-source code is available: https://github.com/AliaksandrSiarohin/first-order-model
For detail: [NN] First Order Motion Model for Image Animation
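Inference with the open-source code goes through the repo's demo script. The flags below are as I recall them from the project README (the vox checkpoint must be downloaded separately, and the file paths are placeholders):

```shell
# Animate a source image with the motion of a driving video
# (first-order-model repo; vox-256 config is the face model).
python demo.py \
  --config config/vox-256.yaml \
  --driving_video driving.mp4 \
  --source_image source.png \
  --checkpoint vox-cpk.pth.tar \
  --relative --adapt_scale \
  --result_video result.mp4
```

`--relative` uses relative rather than absolute keypoint coordinates, which generally transfers motion better when the source and driving faces have different poses.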
-
Face swapping is a dead end; controlling faces is where the future lies. [***]
MegaPortraits: One-shot Megapixel Neural Head Avatars, 2022
Face Restoration (make faces high-definition)
Paper collection: https://github.com/TaoWangzj/Awesome-Face-Restoration
HiFaceGAN: Face Renovation via Collaborative Suppression and Replenishment
Talking Face
Start From Here
MegaPortraits: One-shot Megapixel Neural Head Avatars
The quality here already meets the bar; unfortunately it is not open source.
SadTalker
13 Mar 2023
SadTalker: https://github.com/Winfredy/SadTalker
[project] https://sadtalker.github.io/
[paper] https://arxiv.org/pdf/2211.12194.pdf
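Basic SadTalker inference, with flags as given in the repo README (the `--enhancer gfpgan` face-restoration step is optional, and the paths are placeholders):

```shell
# Drive a single portrait image with an audio clip (SadTalker repo).
python inference.py \
  --driven_audio speech.wav \
  --source_image portrait.png \
  --enhancer gfpgan \
  --result_dir ./results
```

The output is a talking-head video in `./results`; GFPGAN enhancement sharpens the face at the cost of extra inference time.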
Index: Awesome Talking Face Generation
| title | venue | paper | code | dataset | keywords | notes |
|---|---|---|---|---|---|---|
| Emotionally Enhanced Talking Face Generation | - | paper | code | CREMA-D | emotion | unstable frames |
| DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video | AAAI (23) | paper | code | - | - | good lip shapes and good sharpness |
| CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior | - | paper | code | - | 3D | 3D face model |
| GENEFACE: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | ICLR (23) | paper | code | - | NeRF | improved lip sync, but slow |
| OPT: One-Shot Pose-Controllable Talking Head Generation | - | paper | - | - | - | - |
| LipNeRF: What is the right feature space to lip-sync a NeRF? | - | paper | - | - | NeRF | - |
| Audio-Visual Face Reenactment | WACV (23) | paper | code | - | - | results look good, but still no code after 7 months |
| Towards Generating Ultra-High Resolution Talking-Face Videos With Lip Synchronization | WACV (23) | paper | - | - | - | - |
| StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles | AAAI (23) | paper | code | - | - | - |
| DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis | - | paper | proj | - | Diffusion | - |
| Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation | - | paper | proj | - | Diffusion, mesh | promising, but no code yet |
| Speech Driven Video Editing via an Audio-Conditioned Diffusion Model | - | paper | code | - | Diffusion | - |
-
Morphing / Frame-Interpolation Effects
Ref: WIP: new Extension for frame-interpolations (F.I.L.M, InfiniteZoom...)
Ref: Loopback-Inpaint+GoogleFILM in SD-WebUI (<- Workflow)
Ref: https://github.com/google-research/frame-interpolation#see-windows_installation-for-windows-support
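FILM learns motion-aware in-between frames; the naive baseline it improves on is a plain linear cross-fade, which ghosts overlapping content instead of moving pixels. A minimal sketch of that baseline, for contrast (array shapes are illustrative):

```python
import numpy as np

def crossfade(frame_a, frame_b, n_mid):
    """Naive in-betweening: linear blend of two frames.
    Unlike FILM, this ghosts content instead of modelling motion."""
    ts = np.linspace(0.0, 1.0, n_mid + 2)[1:-1]  # interior timestamps only
    return [(1 - t) * frame_a + t * frame_b for t in ts]

a = np.zeros((2, 2, 3))   # black frame
b = np.ones((2, 2, 3))    # white frame
mids = crossfade(a, b, 3)
print([m[0, 0, 0] for m in mids])  # → [0.25, 0.5, 0.75]
```

For a static gradient this is fine; for the talking-face and InfiniteZoom workflows above, a learned interpolator like FILM is needed precisely because moving features would otherwise double-expose.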
Paper: Real-time Talking Face
GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation
For more details: [GenerativeAI] Avatar solutions
FaceFormer: Speech-Driven 3D Facial Animation with Transformers [11 Mar 2022]
CodeTalker (3D) and SadTalker (2D): the two leading options!
Talking-Face-Generation series: CodeTalker [CVPR 2023]
Promising Work
Generating Holistic 3D Human Motion from Speech
Read the paper, extract the ideas
Paper notes: "Generating Holistic 3D Human Motion from Speech"
Addresses data scarcity: the dataset is built from public speech videos of four speakers, 26.9 hours in total; the videos are cut into clips of under 10 seconds each and split into several groups.
For the body and hands, a new VQ-VAE-based framework is designed to learn a compositional discrete motion space, which also adds stochastic flexibility.
3D mesh output. Inference speed is questionable?
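The VQ-VAE step at the heart of that framework, snapping each continuous motion feature to its nearest codebook entry, can be sketched as follows (codebook size and feature dimension are illustrative, not the paper's values):

```python
import numpy as np

def vq_quantize(z, codebook):
    """Map each continuous latent vector in z (N, D) to the nearest of K
    codebook entries (K, D). Returns discrete indices and quantized vectors;
    the indices form the 'compositional discrete motion space'."""
    # squared L2 distance between every latent and every codebook entry
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]

rng = np.random.default_rng(0)
codebook = rng.standard_normal((8, 4))     # K=8 discrete motion codes, D=4
z = codebook[[2, 5]] + 0.01 * rng.standard_normal((2, 4))  # near codes 2 and 5
idx, zq = vq_quantize(z, codebook)
print(idx)  # → [2 5]
```

Sampling sequences of these discrete indices (rather than continuous vectors) is what gives the model its stochastic flexibility: different index sequences decode to different plausible motions for the same speech.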
[MIT License]
Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation
S2L-S2D: Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation
First, check how efficient its inference is.