[Avatar] Avatar by ImgGen

Avatarify is a killer app; notes on how to learn it and how to optimize and customize it.


From StyleGAN to DeepFake


How to Train

  • Hardware requirements

How much GPU memory does training StyleGAN need?

The whole project cost 51 GPU-years and 131.61 MWh of electricity in total. In other words, a single Tesla V100 GPU would need 51 years, consuming roughly 130,000 kWh, or about 100,000 RMB in electricity alone.
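
A quick sanity check of those figures (a back-of-the-envelope conversion; the ~0.76 RMB/kWh electricity price is my assumption, inferred from the quoted totals):

```python
# Back-of-the-envelope check of the StyleGAN training-cost figures quoted above.
total_mwh = 131.61
total_kwh = total_mwh * 1000          # 131,610 kWh, i.e. roughly 130,000 kWh
price_rmb_per_kwh = 0.76              # assumed electricity price, not from the source
cost_rmb = total_kwh * price_rmb_per_kwh
print(f"{total_kwh:,.0f} kWh -> about {cost_rmb:,.0f} RMB")  # ~100,000 RMB
```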

 

What hardware and prerequisites does StyleGAN2 need?

If you just want to run it for fun, a 3080 is definitely enough. Since you, a beginner, apparently want to train it yourself: the official code requires at least 16 GB of VRAM, so you will probably need a 3090. And note that training a GAN from scratch is extremely expensive; take care, OP.
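
A quick way to check whether your card clears that bar (a minimal PyTorch sketch; the 16 GB threshold is the official repo's stated minimum quoted above):

```python
# Check local GPU VRAM against the 16 GB minimum for StyleGAN2 training.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / 1024**3
    print(f"{props.name}: {total_gb:.1f} GB VRAM")
    print("meets the 16 GB minimum" if total_gb >= 16 else "below the 16 GB minimum")
else:
    print("No CUDA device found")
```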

 

  • Training experiments

DeepFaceLab [automatic video-to-video training and synthesis]

Everybody Can Make Deepfakes Now! [real-time face deepfakes are the real point]

How to Develop a Conditional GAN (cGAN) From Scratch [*****]

In this tutorial, you will discover how to develop a conditional generative adversarial network for the targeted generation of items of clothing.

A rough look at an experiment that generates clothing icons; a minimal sketch of the idea follows below.
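
To make the tutorial's core trick concrete, here is a minimal Keras sketch (my own illustration, not the tutorial's exact code; it assumes 28x28 grayscale Fashion-MNIST images and 10 classes). The class label is embedded and fed to both networks, which is what makes generation targeted:

```python
# Minimal cGAN skeleton: both networks are conditioned on a class label.
from tensorflow import keras
from tensorflow.keras import layers

latent_dim, n_classes = 100, 10

def build_generator():
    noise = keras.Input(shape=(latent_dim,))
    label = keras.Input(shape=(1,), dtype="int32")
    l = layers.Flatten()(layers.Embedding(n_classes, 50)(label))  # label -> dense vector
    x = layers.Concatenate()([noise, l])                          # condition the noise
    x = layers.Dense(7 * 7 * 128, activation="relu")(x)
    x = layers.Reshape((7, 7, 128))(x)
    x = layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation="relu")(x)  # 14x14
    x = layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation="relu")(x)  # 28x28
    img = layers.Conv2D(1, 7, padding="same", activation="tanh")(x)
    return keras.Model([noise, label], img)

def build_discriminator():
    img = keras.Input(shape=(28, 28, 1))
    label = keras.Input(shape=(1,), dtype="int32")
    l = layers.Flatten()(layers.Embedding(n_classes, 50)(label))
    l = layers.Reshape((28, 28, 1))(layers.Dense(28 * 28)(l))     # label as an extra channel
    x = layers.Concatenate()([img, l])
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid")(layers.Flatten()(x))
    return keras.Model([img, label], out)
```

Training then alternates the usual real/fake discriminator updates with generator updates through the frozen discriminator, exactly as in an unconditional GAN.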


How to Use

  • Realistic face reconstruction

AI + Forensic Facial Reconstruction

Colorized statues

 

  • "Mayi Yahei" (the viral Avatarify-style face-animation meme)

This is a NeurIPS 2019 paper; the task is to make the object in a source image move according to the motion in a driving video.

Open-source code is available at: https://github.com/AliaksandrSiarohin/first-order-model

For details: [NN] First Order Motion Model for Image Animation
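
Inference roughly follows the repo's demo notebook (the file paths and checkpoint name below are placeholders; check the README for the exact pretrained weights):

```python
# Animate a source face with a driving video, per first-order-model's demo notebook.
import imageio
from skimage import img_as_ubyte
from skimage.transform import resize
from demo import load_checkpoints, make_animation   # run from the repo root

source_image = resize(imageio.imread('source.png'), (256, 256))[..., :3]
driving_video = [resize(f, (256, 256))[..., :3]
                 for f in imageio.mimread('driving.mp4', memtest=False)]

generator, kp_detector = load_checkpoints(config_path='config/vox-256.yaml',
                                          checkpoint_path='vox-cpk.pth.tar')
predictions = make_animation(source_image, driving_video, generator, kp_detector,
                             relative=True)               # relative keypoint motion
imageio.mimsave('result.mp4', [img_as_ubyte(f) for f in predictions])
```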

 

  • Face swapping has no future; controlling the face does [***]

MegaPortraits: One-shot Megapixel Neural Head Avatars, 2022


Face Restoration: Making Faces High-Definition

Paper collection: https://github.com/TaoWangzj/Awesome-Face-Restoration

 

HiFaceGAN: Face Renovation via Collaborative Suppression and Replenishment


Talking Face


Start From Here 

MegaPortraits: One-shot Megapixel Neural Head Avatars

The results in this paper already meet the bar; unfortunately, it is not open source.


SadTalker

13 Mar 2023

SadTalker: https://github.com/Winfredy/SadTalker

[project] https://sadtalker.github.io/

[paper] https://arxiv.org/pdf/2211.12194.pdf


Index: Awesome Talking Face Generation

| Title | Venue | Paper | Code | Dataset | Keywords / Notes |
| --- | --- | --- | --- | --- | --- |
| Emotionally Enhanced Talking Face Generation | | paper | code | CREMA-D | emotion; output video is unstable |
| DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video | AAAI (23) | paper | code | | good lip shapes, good clarity |
| CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior | | paper | code | | 3D; drives a 3D face model |
| GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | ICLR (23) | paper | code | | NeRF; improves the mouth, but time-consuming |
| OPT: One-Shot Pose-Controllable Talking Head Generation | | paper | | | |
| LipNeRF: What is the right feature space to lip-sync a NeRF? | | paper | | | NeRF |
| Audio-Visual Face Reenactment | WACV (23) | paper | code | | decent results, but still no code after 7 months |
| Towards Generating Ultra-High Resolution Talking-Face Videos With Lip Synchronization | WACV (23) | paper | - | | |
| StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles | AAAI (23) | paper | code | | |
| DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis | | paper | proj | | Diffusion |
| Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation | | paper | proj | | Diffusion; mesh; promising, but no code yet |
| Speech Driven Video Editing via an Audio-Conditioned Diffusion Model | | paper | code | | Diffusion |

 

  • Gradual-transition effects (frame interpolation)

Ref: WIP: new Extension for frame-interpolations (F.I.L.M, InfiniteZoom...)

Ref: Loopback-Inpaint+GoogleFILM in SD-WebUI (<- Workflow)

Ref: https://github.com/google-research/frame-interpolation#see-windows_installation-for-windows-support
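
FILM is also published as a TF Hub model, so a midpoint frame can be generated in a few lines (a minimal sketch; loading frames and converting them to float32 in [0, 1] is assumed to be handled by the caller):

```python
# Interpolate a midpoint frame between two images with Google's FILM model from TF Hub.
import numpy as np
import tensorflow_hub as hub

model = hub.load("https://tfhub.dev/google/film/1")

def midpoint(frame0: np.ndarray, frame1: np.ndarray) -> np.ndarray:
    """frame0/frame1: float32 HxWx3 arrays in [0, 1]; returns the t=0.5 frame."""
    inputs = {
        "time": np.array([[0.5]], dtype=np.float32),   # interpolation point
        "x0": frame0[np.newaxis, ...],                 # add batch dimension
        "x1": frame1[np.newaxis, ...],
    }
    return model(inputs)["image"][0].numpy()
```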


Paper: Real-time Talking Face

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

For more details: [GenerativeAI] Avatar solutions

 

FaceFormer: Speech-Driven 3D Facial Animation with Transformers [11 Mar 2022]

CodeTalker (3D) and SadTalker (2D): the two champions!

Talking-Face-Generation series: CodeTalker [CVPR 2023]

https://colab.research.google.com/github/Doubiiu/CodeTalker/blob/main/demo.ipynb#scrollTo=Rgkt7SA7vdSU

 

Promising Candidates

Generating Holistic 3D Human Motion from Speech

Read the paper, take away the ideas:

Paper notes: "Generating Holistic 3D Human Motion from Speech"

It tackles data scarcity. The dataset comes from public speaking videos of four speakers, 26.9 hours in total; the videos are cut into clips of under 10 seconds each and organized into several groups.

For the body and hands, the authors design a new VQ-VAE-based framework to learn a compositional discrete motion space, which also brings stochastic flexibility (see the sketch below).

Outputs a 3D mesh; inference speed is questionable.
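
The core of that "discrete motion space" is vector quantization. A minimal PyTorch sketch of the idea (illustrative only, not the paper's code; codebook size and feature dimension are made up):

```python
# Illustrative VQ layer: snap continuous motion features to a learned discrete codebook.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=256, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1 / num_codes, 1 / num_codes)
        self.beta = beta  # commitment-loss weight

    def forward(self, z):                       # z: (batch, frames, dim) encoder output
        flat = z.reshape(-1, z.shape[-1])
        dists = torch.cdist(flat, self.codebook.weight)   # distance to every code
        idx = dists.argmin(dim=1)                         # nearest-code index per vector
        q = self.codebook(idx).view_as(z)
        # codebook loss pulls codes toward encodings; commitment loss does the reverse
        loss = ((q - z.detach()) ** 2).mean() + self.beta * ((q.detach() - z) ** 2).mean()
        q = z + (q - z).detach()                # straight-through gradient estimator
        return q, idx.view(z.shape[:-1]), loss
```

Sampling new code indices instead of always taking the nearest one is what gives such models their stochastic flexibility.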

 

[MIT License]

S2L-S2D: Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation

First, check how efficient its inference is.

 
