[Avatar] Talking Face Dataset and Solutions
Datasets
Ref: FaceFormer: Speech-Driven 3D Facial Animation with Transformers
Given the raw audio input and a neutral 3D face mesh, our proposed end-to-end Transformer-based architecture, dubbed FaceFormer, can autoregressively synthesize a sequence of realistic 3D facial motions with accurate lip movements.
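To make the autoregressive formulation concrete, here is a minimal, illustrative PyTorch sketch of such a decoding loop: the audio features serve as cross-attention memory, and each step predicts the next frame of per-vertex displacements, which are added to the neutral template mesh. All module names and sizes (TinyFaceDecoder, the hidden widths, etc.) are hypothetical and not FaceFormer's actual implementation.

```python
import torch
import torch.nn as nn

class TinyFaceDecoder(nn.Module):
    """Toy autoregressive decoder in the spirit of FaceFormer: each step
    attends to the full audio feature sequence and predicts the next frame
    of per-vertex displacements. Dimensions are illustrative only."""

    def __init__(self, audio_dim=128, motion_dim=15069, hidden=256):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.motion_proj = nn.Linear(motion_dim, hidden)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=hidden, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(hidden, motion_dim)  # per-vertex displacements

    @torch.no_grad()
    def generate(self, audio_feats, template, num_frames):
        # audio_feats: (1, T_audio, audio_dim); template: (1, motion_dim)
        memory = self.audio_proj(audio_feats)
        motions = [torch.zeros_like(template)]       # start token: zero displacement
        for _ in range(num_frames):
            tgt = self.motion_proj(torch.stack(motions, dim=1))
            out = self.decoder(tgt, memory)
            motions.append(self.head(out[:, -1]))    # next-frame displacement
        disp = torch.stack(motions[1:], dim=1)       # (1, num_frames, motion_dim)
        return disp + template.unsqueeze(1)          # absolute vertex positions

model = TinyFaceDecoder()
audio = torch.randn(1, 50, 128)    # stand-in for wav2vec 2.0 features
template = torch.zeros(1, 15069)   # flattened 5023 x 3 FLAME-topology mesh
frames = model.generate(audio, template, num_frames=30)
print(frames.shape)  # torch.Size([1, 30, 15069])
```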
Dependencies
- Check the required Python packages in requirements.txt.
- ffmpeg
- MPI-IS/mesh # This package contains core functions for manipulating meshes and visualizing them. It requires Python 3.5+ and is supported on Linux and macOS operating systems.
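As a quick sanity check that the mesh package is installed, the sketch below loads and displays a mesh. psbody.mesh is the import path used by MPI-IS/mesh; the file path is illustrative, and the viewer calls should be verified against the package's own docs.

```python
# Minimal sketch: load and view a mesh with MPI-IS/mesh (psbody.mesh).
from psbody.mesh import Mesh, MeshViewer

mesh = Mesh(filename='VOCASET/templates/FLAME_sample.ply')
print(mesh.v.shape, mesh.f.shape)   # (num_vertices, 3), (num_faces, 3)

viewer = MeshViewer()               # opens an interactive window
viewer.set_static_meshes([mesh])
```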
Data
VOCASET
Request the VOCASET data from https://voca.is.tue.mpg.de/. Place the downloaded files data_verts.npy, raw_audio_fixed.pkl, templates.pkl and subj_seq_to_idx.pkl in the folder VOCASET. Download "FLAME_sample.ply" from VOCA and put it in VOCASET/templates.
VOCASET is a 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio. The dataset has 12 subjects and 480 sequences of about 3-4 seconds each, with sentences chosen from an array of standard protocols that maximize phonetic diversity.
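A hedged sketch of inspecting the downloaded files, assuming the usual VOCA layout (a frames x 5023 x 3 vertex array plus subject-keyed pickles); the latin1 encoding handles the Python-2-era pickles, but verify the shapes against your download.

```python
import pickle
import numpy as np

# data_verts.npy is large, so memory-map it instead of loading fully.
verts = np.load('VOCASET/data_verts.npy', mmap_mode='r')
print(verts.shape)  # expected: (num_frames, 5023, 3)

with open('VOCASET/templates.pkl', 'rb') as f:
    templates = pickle.load(f, encoding='latin1')   # subject id -> neutral template verts

with open('VOCASET/subj_seq_to_idx.pkl', 'rb') as f:
    seq_index = pickle.load(f, encoding='latin1')   # subject -> sentence -> frame indices

subject = next(iter(templates))
print(subject, templates[subject].shape)
```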
BIWI
Request the BIWI dataset from Biwi 3D Audiovisual Corpus of Affective Communication. The dataset contains the following subfolders:
- 'faces' contains the binary (.vl) files for the tracked facial geometries.
- 'rigid_scans' contains the templates stored as .obj files.
- 'audio' contains audio signals stored as .wav files.
Place the folders 'faces' and 'rigid_scans' in BIWI, and place the wav files in BIWI/wav.
The corpus contains a total of 1109 sentences uttered by 14 native English speakers (6 males and 8 females).
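A small sketch for sanity-checking the folder layout described above. The stem-matching between wav files and .vl geometry files is an assumption for illustration and should be adapted to the actual directory structure of the download.

```python
import os

BIWI_ROOT = 'BIWI'  # directory names follow the placement instructions above

wavs = sorted(os.listdir(os.path.join(BIWI_ROOT, 'wav')))
for wav_name in wavs[:5]:
    stem = os.path.splitext(wav_name)[0]
    face_path = os.path.join(BIWI_ROOT, 'faces', stem + '.vl')
    status = 'ok' if os.path.exists(face_path) else 'missing geometry'
    print(f'{wav_name}: {status}')
```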
We gratefully acknowledge ETHZ-CVL for providing the B3D(AC)2 database and MPI-IS for releasing the VOCASET dataset. The implementation of wav2vec2 is built upon huggingface-transformers, and the temporal bias is modified from ALiBi. We use MPI-IS/mesh for mesh processing and VOCA/rendering for rendering. We thank the authors for their excellent works. Any third-party packages are owned by their respective authors and must be used under their respective licenses.
CodeTalker uses the same datasets.
Real-Time Talking Face Solutions
https://github.com/lipku/metahuman-stream/tree/main
Interesting Talking Face Solutions
Wav2Lip needs no introduction, but its results are honestly not great.
OpenTalker
Tencent's digital human work:
video-retalking: the mouths it produces look rather large. The training code is deliberately withheld; only inference code is released.
MuseV
MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising
MuseTalk
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
Link: https://www.youtube.com/watch?v=L5g5tuH_4a8. It can apparently also swap the mouth region?
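To illustrate the "latent space inpainting" idea in the title: the lower half of the face is masked, its latent is combined with a reference-frame latent and a per-frame audio embedding, and a generator fills the mouth region back in. The toy module below is only a schematic of that conditioning pattern, not MuseTalk's actual VAE + UNet; all sizes and names are hypothetical.

```python
import torch
import torch.nn as nn

class ToyLipInpainter(nn.Module):
    """Toy illustration of latent-space lip inpainting: concatenate the
    masked-face latent, a reference latent, and a broadcast audio embedding,
    then predict the inpainted latent. Sizes are illustrative only."""

    def __init__(self, latent_ch=4, audio_dim=384):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, latent_ch)
        self.generator = nn.Sequential(
            nn.Conv2d(3 * latent_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent_ch, 3, padding=1),
        )

    def forward(self, masked_latent, ref_latent, audio_emb):
        # Broadcast the audio embedding over the spatial latent grid.
        b, c, h, w = masked_latent.shape
        audio_map = self.audio_proj(audio_emb).view(b, -1, 1, 1).expand(b, -1, h, w)
        x = torch.cat([masked_latent, ref_latent, audio_map], dim=1)
        return self.generator(x)  # inpainted latent, to be decoded by a VAE

inpainter = ToyLipInpainter()
masked = torch.randn(1, 4, 32, 32)   # latent of the face with mouth masked
ref = torch.randn(1, 4, 32, 32)      # latent of an unmasked reference frame
audio = torch.randn(1, 384)          # per-frame audio feature
print(inpainter(masked, ref, audio).shape)  # torch.Size([1, 4, 32, 32])
```

Operating on VAE latents rather than pixels is what makes this kind of inpainting cheap enough to run in real time.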
AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations
Author: Huawei Wei, Zejun Yang, Zhisheng Wang
Organization: Tencent Games Zhiji, Tencent
Link: https://github.com/Zejun-Yang/AniPortrait
EMOPortraits
This repo is reserved for the official implementation of EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars. The code is planned to be released in July 2024.