Speech Synthesis with Coqui TTS
Tool Overview
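Coqui TTS is a high-performance deep learning library for text-to-speech (TTS) generation. It provides pretrained models in more than 1,100 languages, tools for training new models and fine-tuning existing ones, and utilities for dataset analysis. The XTTS-v2 model supports 16 languages: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu) and Korean (ko). The language list printed by the command-line tool further down additionally includes Hindi (hi).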
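Because XTTS-v2 is used with the language ids listed above, it can help to validate an id before starting a long synthesis run. The following is a minimal sketch using only the standard library; the helper name `normalize_language` and the alias table are hypothetical conveniences, not part of Coqui TTS.

```python
# Language ids for XTTS-v2, as listed in the paragraph above.
XTTS_V2_LANGUAGES = (
    "en", "es", "fr", "de", "it", "pt", "pl", "tr",
    "ru", "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko",
)

# Hypothetical aliases (an assumption for convenience, not defined by Coqui TTS).
ALIASES = {"zh": "zh-cn", "zh_cn": "zh-cn"}


def normalize_language(code: str) -> str:
    """Return an id from XTTS_V2_LANGUAGES, or raise ValueError if unknown."""
    key = code.strip().lower()
    key = ALIASES.get(key, key)
    if key not in XTTS_V2_LANGUAGES:
        raise ValueError(f"unsupported language id: {code!r}")
    return key
```

Calling `normalize_language("zh")` yields `zh-cn`, which is the spelling the language list uses for Chinese.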
Installation Steps
- To install conda, see "Python Environment Setup".
- conda create -n coqui
Create the virtual environment.
- conda activate coqui
Activate the virtual environment.
- conda install python=3.9.20
Install Python (>= 3.9, < 3.12).
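- pip install pypinyin
Dependency for synthesizing Chinese speech.
- pip install numpy
General dependency.
- pip install sounddevice
Audio playback library used by the program example.
- pip install TTS
Install Coqui TTS.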
- If installing TTS fails with the error
Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools"
you can install its fork instead: pip install coqui-tts
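The Python version constraint in the steps above (>= 3.9, < 3.12) can be checked from inside the activated environment before installing anything else. This is a minimal sketch; the function name `python_version_supported` is hypothetical, and the bounds are taken from the installation notes above.

```python
import sys

MIN_VERSION = (3, 9)   # inclusive lower bound from the installation notes
MAX_VERSION = (3, 12)  # exclusive upper bound from the installation notes


def python_version_supported(version=None) -> bool:
    """Return True if the (major, minor) version lies in [3.9, 3.12)."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION
```

Running `python_version_supported()` with no argument checks the interpreter that is currently active.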
Trying It Out
- Check the supported languages:
- tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 --list_language_idx
- When prompted, accept the license terms
Otherwise, I agree to the terms of the non-commercial CPML: https://coqui.ai/cpml
by entering Y.
- The supported languages are printed:
Available language ids: (Set --language_idx flag to one of these values to use the multi-lingual model. ['en', 'es', 'fr', 'de', 'it', 'pt', 'pl', 'tr', 'ru', 'nl', 'cs', 'ar', 'zh-cn', 'hu', 'ko', 'ja', 'hi']
- Check the supported speakers:
- tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 --list_speaker_idx
- Synthesis examples
- Synthesize Chinese speech
tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 --text "国家粮食和物资储备局29日发布数据显示:截至目前,全国累计收购秋粮1.2亿吨,收购进度快于上年,收购工作进展顺利。" --speaker_idx "Ana Florence" --language_idx zh-cn --use_cuda true
- Synthesize with a reference audio file that specifies the voice
tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 --text "国家粮食和物资储备局29日发布数据显示:截至目前,全国累计收购秋粮1.2亿吨,收购进度快于上年,收购工作进展顺利。" --language_idx zh-cn --speaker_wav e:/source.mp3 --use_cuda true
- Synthesize English speech
tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 --text "TTS is a library for advanced Text-to-Speech generation. TTS models that are not released open-source. They are here to show the potential. Models prefixed with a dot (.Jofish .Abe and .Janice) are real human voices." --speaker_idx "Ana Florence" --language_idx en --use_cuda true
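The commands above differ only in a few arguments, so they can be assembled from Python and passed to `subprocess.run`. This is a sketch under stated assumptions: `build_tts_command` is a hypothetical helper, it uses only flags that appear in the commands above, and it accepts exactly one of a built-in speaker name or a reference audio file as a simplification.

```python
import shlex

MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def build_tts_command(text, language, speaker=None, speaker_wav=None, use_cuda=False):
    """Build the argument list for the tts command-line tool."""
    # Simplification of this sketch: require exactly one voice source.
    if (speaker is None) == (speaker_wav is None):
        raise ValueError("pass exactly one of speaker or speaker_wav")
    cmd = ["tts", "--model_name", MODEL_NAME,
           "--text", text, "--language_idx", language]
    if speaker is not None:
        cmd += ["--speaker_idx", speaker]      # built-in speaker name
    else:
        cmd += ["--speaker_wav", speaker_wav]  # reference audio to imitate
    if use_cuda:
        cmd += ["--use_cuda", "true"]
    return cmd


def format_command(cmd):
    """Return a shell-quoted string, handy for logging the command."""
    return shlex.join(cmd)
```

To run the result, something like `subprocess.run(build_tts_command("Hello.", "en", speaker="Ana Florence"), check=True)` should work once the `tts` executable is on the PATH. Passing a list rather than a single string avoids quoting problems with text that contains spaces or quotation marks.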
Synthesis from a Program
# -*- coding: UTF-8 -*-
import torch
from TTS.api import TTS
import numpy as np
import sounddevice as sd
import soundfile as sf
from datetime import datetime
# Use the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
# List the available models
print(TTS().list_models())
print("Model initialization started:", datetime.now())
# tts_models/multilingual/multi-dataset/xtts_v2 is the model identifier
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
print("Model initialization finished:", datetime.now())
# Read the text to synthesize from a file
with open('demo.txt', 'r', encoding='utf-8') as source_file:
    content = source_file.read()
print("Text loaded:", datetime.now())
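# Reference audio file whose voice will be imitated
source_wav = 'source.mp3'
generated_voice = 'generated_voice.wav'
# Generate speech from the text and convert the samples to a NumPy array
wav = np.asarray(tts.tts(text=content, speaker_wav=source_wav, language="zh-cn"), dtype=np.float32)
# Use the model's output sample rate (24000 Hz for XTTS-v2) rather than a hard-coded value
rate = tts.synthesizer.output_sample_rate
# Play the audio
sd.play(wav, rate)
# Wait for playback to finish
sd.wait()
# Save the waveform to a file
sf.write(generated_voice, wav, rate)
# Alternatively, synthesize the text straight to an audio file
tts.tts_to_file(text=content, speaker_wav=source_wav, language="zh-cn", file_path="example.wav")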
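The script above prints timestamps around each stage, which leaves the subtraction to the reader. A small context manager can report the elapsed time directly. This is a minimal sketch using only the standard library; `timed` is a hypothetical helper name and is not part of Coqui TTS.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log=print):
    """Report how long the enclosed block took, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed stages are timed too.
        elapsed = time.perf_counter() - start
        log(f"{label}: {elapsed:.2f} s")
```

For example, placing the model construction inside `with timed("Model initialization"):` would log a single line with the elapsed seconds when the block exits.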