Speech-related datasets
http://www.openslr.org/resources.php
https://test.data-baker.com/#/data/index/compose
22050 Hz sampling rate.
Statistic | Value |
---|---|
Total Clips | 13,100 |
Total Words | 225,715 |
Total Characters | 1,308,678 |
Total Duration | 23:55:17 |
Mean Clip Duration | 6.57 sec |
Min Clip Duration | 1.11 sec |
Max Clip Duration | 10.10 sec |
Mean Words per Clip | 17.23 |
Distinct Words | 13,821 |
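As a quick sanity check, the derived rows of the table can be recomputed from the totals. This is a minimal sketch using only the figures listed above; the variable and function names are illustrative, not part of any dataset tooling.

```python
# Totals copied from the statistics table above.
total_clips = 13_100
total_words = 225_715
total_duration = "23:55:17"  # HH:MM:SS


def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


total_seconds = hms_to_seconds(total_duration)
mean_clip_sec = total_seconds / total_clips
mean_words_per_clip = total_words / total_clips

print(f"total duration: {total_seconds} s ({total_seconds / 3600:.2f} h)")
print(f"mean clip duration: {mean_clip_sec:.2f} s")
print(f"mean words per clip: {mean_words_per_clip:.2f}")
```

Both recomputed means round to the values in the table (6.57 sec and 17.23 words), so the totals and the averages are consistent with each other.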
- LibriTTS: https://openslr.org/60/
585 hours of read English speech at a 24 kHz sampling rate.
- VCTK (TORCHAUDIO.DATASETS.VCTK): https://datashare.ed.ac.uk/handle/10283/3443
110 English speakers with various accents. Each speaker reads about 400 sentences, roughly 44 hours in total.
- AISHELL-3: http://www.openslr.org/93/
Roughly 85 hours of emotion-neutral recordings from 218 native Mandarin Chinese speakers, 88,035 utterances in total.
- CSMSC: https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar (from https://github.com/jerrykuo7727/TTS-Taiwanese/blob/master/local/data_download.sh)
- Southern Min (Minnan):
Suisiann: https://suisiann-dataset.ithuan.tw/
Taiwanese Across Taiwan (TAT) Corpora: https://ieeexplore.ieee.org/abstract/document/9295019
https://www.colips.org/conferences/iscslp2022/Proceedings/papers/ISCSLP2022_P078.pdf
https://ieeexplore.ieee.org/abstract/document/9764940
CMU Wilderness: https://ieeexplore.ieee.org/abstract/document/8683536/
Common Voice: https://commonvoice.mozilla.org/zh-CN/datasets
OLR 2021: https://arxiv.org/pdf/2107.11113
ASR: requires at least 2,000 hours of data.
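To put the 2,000-hour figure in perspective, here is a rough sketch comparing it with the hour counts quoted for the corpora in the list. The numbers are the approximate ones from these notes. The corpora cover different languages and were not necessarily built for ASR, so the output only indicates scale and is not a recipe for combining them.

```python
# Target taken from the note above; corpus sizes are the approximate
# hour counts quoted in the list (not re-measured here).
ASR_TARGET_HOURS = 2000

corpus_hours = {
    "LibriTTS": 585,
    "VCTK": 44,
    "AISHELL-3": 85,
}

# Fraction of the ASR target that each corpus would cover on its own.
shares = {name: hours / ASR_TARGET_HOURS for name, hours in corpus_hours.items()}

for name, share in shares.items():
    print(f"{name}: {corpus_hours[name]} h = {share:.1%} of {ASR_TARGET_HOURS} h")
```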