SWivid/F5-TTS
SWivid/F5-TTS
Folders and files
Name | ||
---|---|---|
2 weeks ago
|
||
last month
|
||
last month
|
||
3 days ago
|
||
3 weeks ago
|
||
3 weeks ago
|
||
3 weeks ago
|
||
last month
|
||
last month
|
||
2 days ago
|
||
3 days ago
|
||
last month
|
Repository files navigation
F5-TTS: Diffusion Transformer with ConvNeXt V2, faster trained and inference.
E2 TTS: Flat-UNet Transformer, closest reproduction from paper.
Sway Sampling: Inference-time flow step sampling strategy, greatly improves performance
- 2024/10/08: F5-TTS & E2 TTS base models on 🤗 Hugging Face, 🤖 Model Scope, 🟣 Wisemodel.
# Create a python 3.10 conda env (you could also use virtualenv)
conda create -n f5-tts python=3.10
conda activate f5-tts
# Install pytorch with your CUDA version, e.g.
pip install torch==2.3.0+cu118 torchaudio==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
Then you can choose from a few options below:
pip install git+https://github.com/SWivid/F5-TTS.git
git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS
# git submodule update --init --recursive # (optional, if need bigvgan)
pip install -e .
If initialize submodule, you should add the following code at the beginning of src/third_party/BigVGAN/bigvgan.py
.
import os
import sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
# Build from Dockerfile
docker build -t f5tts:v1 .
# Or pull from GitHub Container Registry
docker pull ghcr.io/swivid/f5-tts:main
Currently supported features:
- Basic TTS with Chunk Inference
- Multi-Style / Multi-Speaker Generation
- Voice Chat powered by Qwen2.5-3B-Instruct
- Custom inference with more language support
# Launch a Gradio app (web interface)
f5-tts_infer-gradio
# Specify the port/host
f5-tts_infer-gradio --port 7860 --host 0.0.0.0
# Launch a share link
f5-tts_infer-gradio --share
# Run with flags
# Leave --ref_text "" will have ASR model transcribe (extra GPU memory usage)
f5-tts_infer-cli \
--model "F5-TTS" \
--ref_audio "ref_audio.wav" \
--ref_text "The content, subtitle or transcription of reference audio." \
--gen_text "Some text you want TTS model generate for you."
# Run with default setting. src/f5_tts/infer/examples/basic/basic.toml
f5-tts_infer-cli
# Or with your own .toml file
f5-tts_infer-cli -c custom.toml
# Multi voice. See src/f5_tts/infer/README.md
f5-tts_infer-cli -c src/f5_tts/infer/examples/multi/story.toml
- In order to have better generation results, take a moment to read detailed guidance.
- The Issues are very useful, please try to find the solution by properly searching the keywords of problem encountered. If no answer found, then feel free to open an issue.
Read training & finetuning guidance for more instructions.
# Quick start with Gradio web interface
f5-tts_finetune-gradio
Use pre-commit to ensure code quality (will run linters and formatters automatically)
pip install pre-commit
pre-commit install
When making a pull request, before each commit, run:
pre-commit run --all-files
Note: Some model components have linting exceptions for E722 to accommodate tensor notation
- E2-TTS brilliant work, simple and effective
- Emilia, WenetSpeech4TTS valuable datasets
- lucidrains initial CFM structure with also bfs18 for discussion
- SD3 & Hugging Face diffusers DiT and MMDiT code structure
- torchdiffeq as ODE solver, Vocos as vocoder
- FunASR, faster-whisper, UniSpeech for evaluation tools
- ctc-forced-aligner for speech edit test
- mrfakename huggingface space demo ~
- f5-tts-mlx Implementation with MLX framework by Lucas Newman
- F5-TTS-ONNX ONNX Runtime version by DakeQQ
If our work and codebase is useful for you, please cite as:
@article{chen-etal-2024-f5tts,
title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
journal={arXiv preprint arXiv:2410.06885},
year={2024},
}
Our code is released under MIT License. The pre-trained models are licensed under the CC-BY-NC license due to the training data Emilia, which is an in-the-wild dataset. Sorry for any inconvenience this may cause.
About
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· winform 绘制太阳,地球,月球 运作规律
· AI与.NET技术实操系列(五):向量存储与相似性搜索在 .NET 中的实现
· 超详细:普通电脑也行Windows部署deepseek R1训练数据并当服务器共享给他人
· 【硬核科普】Trae如何「偷看」你的代码?零基础破解AI编程运行原理
· 上周热点回顾(3.3-3.9)