[Avatar] Talking Face Dataset and Solutions

 

Datasets

Ref: FaceFormer: Speech-Driven 3D Facial Animation with Transformers

Given the raw audio input and a neutral 3D face mesh, our proposed end-to-end Transformer-based architecture, dubbed FaceFormer, can autoregressively synthesize a sequence of realistic 3D facial motions with accurate lip movements.
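
To make the autoregressive idea concrete, here is a heavily simplified PyTorch sketch: a transformer decoder cross-attends to audio features and emits per-frame vertex offsets one frame at a time. All dimensions and module names are made up for illustration; the real FaceFormer adds a wav2vec 2.0 encoder, ALiBi-style temporal biases, and speaker-style embeddings, all omitted here.

    import torch
    import torch.nn as nn

    V = 5023 * 3   # flattened per-frame vertex offsets (FLAME topology)
    D = 128        # hypothetical feature width

    class TinyFaceDecoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.motion_in = nn.Linear(V, D)   # embed previously generated frames
            layer = nn.TransformerDecoderLayer(d_model=D, nhead=4, batch_first=True)
            self.decoder = nn.TransformerDecoder(layer, num_layers=2)
            self.motion_out = nn.Linear(D, V)  # predict the next frame's offsets

        @torch.no_grad()
        def generate(self, audio_feats, num_frames):
            # audio_feats: (1, T_audio, D) from a pretrained speech encoder
            frames = [torch.zeros(1, 1, V)]    # start token: zero motion
            for _ in range(num_frames):
                tgt = self.motion_in(torch.cat(frames, dim=1))
                h = self.decoder(tgt, audio_feats)          # cross-attend to audio
                frames.append(self.motion_out(h[:, -1:]))   # keep only the new frame
            return torch.cat(frames[1:], dim=1)             # (1, num_frames, V)

    model = TinyFaceDecoder()
    offsets = model.generate(torch.randn(1, 100, D), num_frames=30)
    print(offsets.shape)  # final animation = neutral template vertices + offsets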

 

Dependencies

    • Check the required Python packages in requirements.txt.
    • ffmpeg
    • MPI-IS/mesh  # Core functions for manipulating and visualizing meshes; requires Python 3.5+ and is supported on Linux and macOS. See the sanity-check snippet after this list.
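
As a quick sanity check that the mesh package imports correctly, the following loads a PLY template and prints its dimensions (the path assumes the VOCASET layout described below):

    from psbody.mesh import Mesh

    # Load the neutral FLAME template shipped with VOCA (path is an example).
    mesh = Mesh(filename='VOCASET/templates/FLAME_sample.ply')
    print(mesh.v.shape, mesh.f.shape)  # (5023, 3) vertices and triangle faces
    # mesh.show()  # opens the interactive viewer (needs a display)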

Data

VOCASET

Request the VOCASET data from https://voca.is.tue.mpg.de/. Place the downloaded files data_verts.npy, raw_audio_fixed.pkl, templates.pkl, and subj_seq_to_idx.pkl in the folder VOCASET. Download "FLAME_sample.ply" from VOCA and put it in VOCASET/templates.

Note: VOCASET is a 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio. The dataset has 12 subjects and 480 sequences of about 3-4 seconds each, with sentences chosen from an array of standard protocols that maximize phonetic diversity.
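
A minimal loading sketch, assuming the layout used by the original VOCA code (the pickles date from Python 2, hence encoding='latin1'); the exact field structure is an assumption, so verify it against your download:

    import pickle
    import numpy as np

    # Memory-map the large vertex array: one (5023, 3) frame per row.
    data_verts = np.load('VOCASET/data_verts.npy', mmap_mode='r')

    with open('VOCASET/templates.pkl', 'rb') as f:
        templates = pickle.load(f, encoding='latin1')  # subject -> neutral vertices
    with open('VOCASET/subj_seq_to_idx.pkl', 'rb') as f:
        seq_index = pickle.load(f, encoding='latin1')  # subject/sentence/frame -> row

    subject = next(iter(seq_index))
    print(subject, data_verts.shape, templates[subject].shape)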

BIWI

Request the BIWI dataset from the Biwi 3D Audiovisual Corpus of Affective Communication. The dataset contains the following subfolders:

      • 'faces' contains the binary (.vl) files for the tracked facial geometries.
      • 'rigid_scans' contains the templates stored as .obj files.
      • 'audio' contains audio signals stored as .wav files.

Place the folders 'faces' and 'rigid_scans' in BIWI and place the wav files in BIWI/wav.

In total, the corpus contains 1109 sentences uttered by 14 native English speakers (6 male, 8 female).
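
For completeness, a small reader for the tracked-geometry files. The .vl layout assumed here (a leading int32 vertex count followed by float32 xyz triples) is a guess based on common readers for this corpus, so verify it against the official documentation; the path is hypothetical:

    import numpy as np

    def read_vl(path):
        # Assumed layout: int32 vertex count, then count * 3 float32 coordinates.
        with open(path, 'rb') as f:
            n = int(np.fromfile(f, dtype=np.int32, count=1)[0])
            verts = np.fromfile(f, dtype=np.float32, count=3 * n)
        return verts.reshape(n, 3)

    verts = read_vl('BIWI/faces/F1_01/frame_001.vl')  # hypothetical path
    print(verts.shape)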

 

We gratefully acknowledge ETHZ-CVL for providing the B3D(AC)2 database and MPI-IS for releasing the VOCASET dataset. The implementation of wav2vec2 is built upon huggingface-transformers, and the temporal bias is modified from ALiBi. We use MPI-IS/mesh for mesh processing and VOCA/rendering for rendering. We thank the authors for their excellent works. Any third-party packages are owned by their respective authors and must be used under their respective licenses.

 

CodeTalker uses the same datasets.

 

 

Real-Time Talking Face Solutions

https://github.com/lipku/metahuman-stream/tree/main

 

 

Interesting Talking Face Solutions

Wav2Lip needs no introduction, but its results are honestly not great.


OpenTalker

Tencent's digital human projects:

video-retalking: the generated mouths look noticeably large. The training code is deliberately withheld; only inference code is provided.

 

MuseV

MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising

MuseTalk

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Link: https://www.youtube.com/watch?v=L5g5tuH_4a8. Can it also be used to swap in a different mouth?
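
A toy sketch of the latent-inpainting idea (not MuseTalk's actual network): zero out the mouth half of an aligned face crop and let an audio-conditioned generator repaint it. Every shape and module here is invented for illustration; MuseTalk itself works in a VAE latent space with Whisper audio features.

    import torch
    import torch.nn as nn

    class ToyInpainter(nn.Module):
        def __init__(self):
            super().__init__()
            self.img_enc = nn.Conv2d(3, 32, 3, padding=1)
            self.aud_proj = nn.Linear(384, 32)
            self.dec = nn.Conv2d(32, 3, 3, padding=1)

        def forward(self, masked_img, audio_emb):
            h = self.img_enc(masked_img)
            # Broadcast the audio embedding over all spatial positions.
            h = h + self.aud_proj(audio_emb)[:, :, None, None]
            return self.dec(h)

    img = torch.randn(1, 3, 256, 256)   # aligned face crop
    audio = torch.randn(1, 384)         # hypothetical per-frame audio embedding
    masked = img.clone()
    masked[:, :, 128:, :] = 0.0         # zero out the lower (mouth) half
    out = ToyInpainter()(masked, audio)
    print(out.shape)                    # a trained model repaints the mouth region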

AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations

Authors: Huawei Wei, Zejun Yang, Zhisheng Wang

Organization: Tencent Games Zhiji, Tencent

Link: https://github.com/Zejun-Yang/AniPortrait

EMOPortraits

This repo is reserved for the official implementation of EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars. The code is planned to be released in July 2024.

The prerequisite works behind SyncTalk: