Speaker Recognition with Various Classification Algorithms (Age-Group Recognition)
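Overview
During my internship I spent some time helping out on a speech recognition project: recognizing a speaker's age group from speech with various classification algorithms. This post summarizes the rough framework. The basic idea is: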
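- Obtain the corpus
  - TIMIT
- Extract features from the data and process them
  - MFCC/i-vector
  - LDA/PLDA/PCA
- Extract the utterance data and classify it with a classification algorithm
  - SVM/SVR/GMM/GBDT...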
Tools used: HTK (C, shell), Kaldi (C++, shell), LIBSVM (Python), scikit-learn (Python)
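Obtaining the corpus
TIMIT corpus: http://www.cnblogs.com/welen/p/3782804.html
PS:
- The TIMIT audio (the WAV files in the subfolders) is actually stored in SPHERE format; it can be converted with the sph2pipe tool that ships with Kaldi.
- TIMIT/DOC/SPKRINFO.TXT contains the speaker information, which serves as the basis for the class labels.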
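As a rough sketch of how the speaker information might be turned into class labels, the snippet below parses lines in the SPKRINFO.TXT layout and derives each speaker's age at recording time. It assumes whitespace-separated columns in the order ID, Sex, DR, Use, RecDate, BirthDate, with MM/DD/YY dates, `;` comment lines, and directory names formed as sex + ID. The two sample rows and the age-group boundaries are made up for illustration, so check them against your own copy of the corpus.

```python
from datetime import date

# Illustrative rows in the assumed SPKRINFO.TXT layout (values are made up):
# ID  Sex DR Use RecDate  BirthDate Ht Race Edu
SAMPLE = """\
; lines starting with ';' are comments
XXX0  F  1  TST  03/03/86  01/08/65  5'06"  WHT  BS
YYY0  M  2  TRN  02/20/86  ??/??/??  6'00"  WHT  MS
"""

def parse_date(text):
    """Parse MM/DD/YY (assumed to mean 19YY); return None if unknown."""
    try:
        month, day, year = (int(p) for p in text.split("/"))
        return date(1900 + year, month, day)
    except ValueError:
        return None

def speaker_ages(lines):
    """Map speaker directory name (sex + ID) to age in whole years at recording."""
    ages = {}
    for line in lines:
        fields = line.split()
        if not fields or line.lstrip().startswith(";") or len(fields) < 6:
            continue
        spk_id, sex = fields[0], fields[1]
        rec, birth = parse_date(fields[4]), parse_date(fields[5])
        if rec is None or birth is None:
            continue  # skip speakers with unknown dates
        # Subtract one year if the birthday had not yet occurred in the recording year.
        age = rec.year - birth.year - ((rec.month, rec.day) < (birth.month, birth.day))
        ages[sex + spk_id] = age
    return ages

def age_group(age, bounds=(30, 45)):
    """Hypothetical three-way split: 0 = under 30, 1 = 30 to 44, 2 = 45 and over."""
    return sum(age >= b for b in bounds)

ages = speaker_ages(SAMPLE.splitlines())
print(ages, {spk: age_group(a) for spk, a in ages.items()})
```

For a regression setup (SVR) the age itself could be the target, while for the classifiers the `age_group` index would be the label.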
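Extracting features and processing
Converting SPHERE files to WAV files
Kaldi's tools directory contains the SPHERE conversion tool sph2pipe (sph2pipe.exe on Windows):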
cd kaldi/kaldi-trunk/tools/sph2pipe_v2.5/
Usage:
sph2pipe -f wav sourcefile targetfile
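Because TIMIT's SPHERE files carry a .WAV extension, it can help to check which files still need converting. The sketch below looks only at the leading bytes: SPHERE files begin with the string `NIST_1A`, while regular WAV files begin with a RIFF/WAVE header. The helper names and the stand-in files are illustrative.

```python
import os
import tempfile

def is_sphere(path):
    """Return True if the file starts with the NIST SPHERE magic string."""
    with open(path, "rb") as f:
        return f.read(7) == b"NIST_1A"

def is_riff_wav(path):
    """Return True if the file has a RIFF/WAVE header (a regular WAV file)."""
    with open(path, "rb") as f:
        head = f.read(12)
    return head[:4] == b"RIFF" and head[8:12] == b"WAVE"

# Demo on two tiny stand-in files (header bytes only, no audio).
tmpdir = tempfile.mkdtemp()
sph_path = os.path.join(tmpdir, "SA1.WAV")
wav_path = os.path.join(tmpdir, "FAKS0_SA1.WAV")
with open(sph_path, "wb") as f:
    f.write(b"NIST_1A\n   1024\n")
with open(wav_path, "wb") as f:
    f.write(b"RIFF\x00\x00\x00\x00WAVEfmt ")
print(is_sphere(sph_path), is_riff_wav(sph_path))
print(is_sphere(wav_path), is_riff_wav(wav_path))
```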
Use the re_sph2pipe.py script to generate a file of sph2pipe conversion commands:
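# -*- coding: utf-8 -*-
# re_sph2pipe.py: write one sph2pipe command per TIMIT .WAV (SPHERE) file.
import os

rootdir = "E:/vc/TIMIT"  # local (Windows) copy of TIMIT that is walked
timitpath = "/home/zhangzd/kaldi/kaldi-trunk/TIMIT"  # TIMIT location on the Linux machine
sph2pipepath = "/home/zhangzd/kaldi/kaldi-trunk/tools/sph2pipe_v2.5/sph2pipe"

with open('E:/vc/data/mfcc/make_sph2pipe_file.txt', 'w') as f:
    for root, dirs, files in os.walk(rootdir):
        for fn in files:
            if fn.endswith('WAV'):
                # Source path on Linux: swap the local root for timitpath.
                sourcefile = timitpath + root[len(rootdir):] + "/" + fn
                # Target name: last 5 characters of the directory
                # (the speaker ID, e.g. FAKS0) plus the file name.
                targetfile = root[-5:] + "_" + fn
                s = sph2pipepath + " -f wav " + sourcefile + " " + targetfile + "\n"
                # os.walk yields backslashes on Windows; the commands run on Linux.
                f.write(s.replace('\\', '/'))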
The resulting command file make_sph2pipe_file.txt looks like this:
/home/zhangzd/kaldi/kaldi-trunk/tools/sph2pipe_v2.5/sph2pipe -f wav /home/zhangzd/kaldi/kaldi-trunk/TIMIT/CONVERT/SA1.WAV NVERT_SA1.WAV
/home/zhangzd/kaldi/kaldi-trunk/tools/sph2pipe_v2.5/sph2pipe -f wav /home/zhangzd/kaldi/kaldi-trunk/TIMIT/TIMIT/TEST/DR1/FAKS0/SA1.WAV FAKS0_SA1.WAV
/home/zhangzd/kaldi/kaldi-trunk/tools/sph2pipe_v2.5/sph2pipe -f wav /home/zhangzd/kaldi/kaldi-trunk/TIMIT/TIMIT/TEST/DR1/FAKS0/SA2.WAV FAKS0_SA2.WAV
...
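Finally, run the generated commands on Linux with a shell script:
#!/bin/sh
# Read make_sph2pipe_file.txt line by line, print each command, then run it.
while read line
do
    echo $line
    $line
done < make_sph2pipe_file.txt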
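The same loop can be sketched in Python, which makes it easy to stop at the first failing conversion. This is only an illustration: `load_commands` and `run_all` are hypothetical helper names, and the demo performs a dry run that prints each command instead of executing it.

```python
import shlex

def load_commands(lines):
    """Split each non-empty line into an argument list, as a shell would."""
    return [shlex.split(line) for line in lines if line.strip()]

def run_all(commands, runner):
    """Call runner(argv) for each command; stop at the first non-zero return code."""
    for argv in commands:
        code = runner(argv)
        if code != 0:
            return code
    return 0

sample = [
    "/home/zhangzd/kaldi/kaldi-trunk/tools/sph2pipe_v2.5/sph2pipe -f wav "
    "/home/zhangzd/kaldi/kaldi-trunk/TIMIT/TIMIT/TEST/DR1/FAKS0/SA1.WAV FAKS0_SA1.WAV\n",
]
commands = load_commands(sample)
# Dry run: print instead of executing. To really run the commands, pass
# runner=lambda argv: subprocess.call(argv) after `import subprocess`.
run_all(commands, runner=lambda argv: print(" ".join(argv)) or 0)
```

In real use, `sample` would be replaced by the lines read from make_sph2pipe_file.txt.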
PS: f.write(s.replace('\\','/')) is needed because Windows separates path components with \ (written as \\ inside a Python string literal), whereas Linux uses /.
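An alternative to string replacement is to let `pathlib` do the separator handling. The sketch below re-roots a Windows path under the Linux TIMIT directory. `to_linux_path` is a hypothetical helper, and the paths are the ones used in this post.

```python
from pathlib import PureWindowsPath, PurePosixPath

def to_linux_path(windows_path, windows_root, linux_root):
    """Re-root a Windows path under linux_root, using forward slashes."""
    relative = PureWindowsPath(windows_path).relative_to(PureWindowsPath(windows_root))
    return str(PurePosixPath(linux_root) / relative.as_posix())

result = to_linux_path(r"E:\vc\TIMIT\TIMIT\TEST\DR1\FAKS0\SA1.WAV",
                       "E:/vc/TIMIT",
                       "/home/zhangzd/kaldi/kaldi-trunk/TIMIT")
print(result)
```

The pure path classes never touch the file system, so this works the same on Windows and Linux.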
Generating MFCC features in Kaldi
In /home/zhangzd/kaldi/kaldi-trunk/egs/wsj/s5/steps/make_mfcc.sh, the code that extracts the features is:
$cmd JOB=1:$nj $logdir/make_mfcc_${name}.JOB.log \
compute-mfcc-feats --verbose=2 --config=$mfcc_config \
scp,p:$logdir/wav_${name}.JOB.scp ark:- \| \
copy-feats --compress=$compress ark:- \
ark,scp:$mfccdir/raw_mfcc_$name.JOB.ark,$mfccdir/raw_mfcc_$name.JOB.scp \
|| exit 1;
So the command for generating MFCCs boils down to:
compute-mfcc-feats --verbose=2 --config=config.txt scp,p:scp.txt ark:-|copy-feats ark:- ark,scp:mfcc.ark,mfcc.scp
config.txt has the following format:
--use-energy=false # only non-default option.
...
scp.txt has the following format (utterance ID, then the path to the WAV file):
FAKS0_SA1 /home/zhangzd/kaldi/kaldi-trunk/src/test/FAKS0_SA1.WAV
mfcc.scp has the following format:
FAKS0_SA1 /home/zhangzd/kaldi/kaldi-trunk/src/test/mfcc.ark
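mfcc.ark is generated automatically. The command writes mfcc.scp as well, and in the file Kaldi writes, each archive path is followed by a colon and a byte offset into the archive.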
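The input scp.txt can be generated rather than written by hand. The sketch below lists the converted WAV files in a directory and emits one `utterance-id path` line per file, sorted by ID. `make_wav_scp` is a hypothetical helper, and the demo uses empty stand-in files in a temporary directory.

```python
import os
import tempfile

def make_wav_scp(wav_dir):
    """Return 'utterance-id path' lines for every .WAV file in wav_dir, sorted by ID."""
    lines = []
    for name in sorted(os.listdir(wav_dir)):
        stem, ext = os.path.splitext(name)
        if ext.upper() == ".WAV":
            lines.append("%s %s" % (stem, os.path.join(wav_dir, name)))
    return lines

# Demo with empty stand-in files in a temporary directory.
demo_dir = tempfile.mkdtemp()
for name in ("FAKS0_SA2.WAV", "FAKS0_SA1.WAV", "notes.txt"):
    open(os.path.join(demo_dir, name), "w").close()
scp_lines = make_wav_scp(demo_dir)
with open(os.path.join(demo_dir, "scp.txt"), "w") as f:
    f.write("\n".join(scp_lines) + "\n")
print("\n".join(scp_lines))
```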
Generating MFCC features in HTK
With HTK this is even simpler:
HCopy -c config.txt -S scp.txt
config.txt has the following format:
SOURCEFORMAT = WAV # Gives the format of the speech files
TARGETKIND = MFCC_0_D_A # Identifier of the coefficients to use
# Unit = 0.1 micro-second :
WINDOWSIZE = 250000.0 # = 25 ms = length of a time frame
TARGETRATE = 100000.0 # = 10 ms = frame periodicity
NUMCEPS = 12 # Number of MFCC coeffs (here from c1 to c12)
USEHAMMING = T # Use of Hamming function for windowing frames
PREEMCOEF = 0.97 # Pre-emphasis coefficient
NUMCHANS = 26 # Number of filterbank channels
CEPLIFTER = 22 # Length of cepstral liftering
ENORMALIZE = T
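The numbers in this configuration can be sanity-checked with a little arithmetic. The sketch below converts HTK's 100 ns time units to milliseconds, computes the vector size implied by MFCC_0_D_A with NUMCEPS = 12, and estimates the frame count for a 3-second utterance at TIMIT's 16 kHz sampling rate. The frame-count formula counts only windows that fit entirely inside the signal, which is an assumption; HTK's exact edge handling may differ.

```python
HTK_UNIT_SECONDS = 100e-9  # HTK expresses durations in units of 100 ns

def htk_units_to_ms(units):
    """Convert a duration in HTK units to milliseconds."""
    return units * HTK_UNIT_SECONDS * 1000.0

def feature_dim(num_ceps, with_c0=True, deltas=True, accelerations=True):
    """Static coefficients (plus c0 for _0), repeated once more for each of _D and _A."""
    static = num_ceps + (1 if with_c0 else 0)
    return static * (1 + int(deltas) + int(accelerations))

def frame_count(num_samples, sample_rate, window_units, shift_units):
    """Frames that fit entirely inside the signal (edge handling is an assumption)."""
    window = int(round(window_units * HTK_UNIT_SECONDS * sample_rate))
    shift = int(round(shift_units * HTK_UNIT_SECONDS * sample_rate))
    if num_samples < window:
        return 0
    return (num_samples - window) // shift + 1

print(htk_units_to_ms(250000.0), htk_units_to_ms(100000.0))
print(feature_dim(12))
print(frame_count(48000, 16000, 250000.0, 100000.0))
```

With these settings each frame is a 39-dimensional vector: 12 cepstra plus c0, each with its delta and acceleration coefficients.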
scp.txt has the following format (source WAV file, then target MFCC file):
E:\vc\data\timit\FADG0_SA1.WAV E:\vc\data\mfcc\FADG0_SA1.mfcc
E:\vc\data\timit\FADG0_SA2.WAV E:\vc\data\mfcc\FADG0_SA2.mfcc
E:\vc\data\timit\FADG0_SI1279.WAV E:\vc\data\mfcc\FADG0_SI1279.mfcc
...
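To check the .mfcc files before passing them to a classifier, one can decode the header of an HTK parameter file. The sketch below assumes the layout described in the HTK Book: a 12-byte big-endian header holding the frame count, the sample period in 100 ns units, the bytes per vector, and the parameter kind code. The header in the demo is synthetic, and the qualifier bit values in the comment should be verified against the HTK Book.

```python
import struct

HTK_HEADER = struct.Struct(">iihh")  # big-endian: nSamples, sampPeriod, sampSize, parmKind
BASE_KINDS = {6: "MFCC", 7: "FBANK"}  # subset of HTK's base parameter kinds

def read_htk_header(data):
    """Decode the 12-byte header at the start of an HTK parameter file."""
    n_frames, period, vec_bytes, parm_kind = HTK_HEADER.unpack(data[:HTK_HEADER.size])
    return {
        "frames": n_frames,
        "frame_shift_ms": period / 10000.0,  # period is in 100 ns units
        "dim": vec_bytes // 4,               # 4-byte floats, assuming no _C compression
        "base_kind": BASE_KINDS.get(parm_kind & 0o77, "other"),
    }

# Synthetic header for illustration: 298 frames, 10 ms shift, 39 floats,
# MFCC (6) with qualifier bits _0 (0o20000), _D (0o400) and _A (0o1000).
fake = HTK_HEADER.pack(298, 100000, 39 * 4, 6 | 0o20000 | 0o400 | 0o1000)
info = read_htk_header(fake)
print(info)
```

For a real file, `data` would be the first 12 bytes read in binary mode; the frame vectors follow the header as big-endian floats unless HTK was configured to write in native byte order.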
Other
- i-vector
- VAD