[Sphinx] Notes on the acoustic model training pipeline

1. Tracing the training pipeline step by step

./script_pl/make_feats.pl -ctl test7_train.fileids   // convert the wav files to mfc feature files

./script_pl/RunAll.pl                                // start training

Below is the stage list from RunAll.pl; we will walk through each stage in turn.

("$ST::CFG_SCRIPT_DIR/00.verify/verify_all.pl",    // check that every training file is correctly formatted
     "$ST::CFG_SCRIPT_DIR/01.lda_train/slave_lda.pl",
     "$ST::CFG_SCRIPT_DIR/02.mllt_train/slave_mllt.pl",
     "$ST::CFG_SCRIPT_DIR/05.vector_quantize/slave.VQ.pl",
     "$ST::CFG_SCRIPT_DIR/10.falign_ci_hmm/slave_convg.pl",
     "$ST::CFG_SCRIPT_DIR/11.force_align/slave_align.pl",
     "$ST::CFG_SCRIPT_DIR/12.vtln_align/slave_align.pl",
     "$ST::CFG_SCRIPT_DIR/20.ci_hmm/slave_convg.pl",
     "$ST::CFG_SCRIPT_DIR/30.cd_hmm_untied/slave_convg.pl",
     "$ST::CFG_SCRIPT_DIR/40.buildtrees/slave.treebuilder.pl",
     "$ST::CFG_SCRIPT_DIR/45.prunetree/slave.state-tying.pl",
     "$ST::CFG_SCRIPT_DIR/50.cd_hmm_tied/slave_convg.pl",
     "$ST::CFG_SCRIPT_DIR/60.lattice_generation/slave_genlat.pl",
     "$ST::CFG_SCRIPT_DIR/61.lattice_pruning/slave_prune.pl",
     "$ST::CFG_SCRIPT_DIR/62.lattice_conversion/slave_conv.pl",
     "$ST::CFG_SCRIPT_DIR/65.mmie_train/slave_convg.pl",
     "$ST::CFG_SCRIPT_DIR/90.deleted_interpolation/deleted_interpolation.pl",
);

 

The first stage, 00.verify/verify_all.pl, checks that each training file is well formed; the phases in its log are explained below:
MODULE: 00 verify training files
O.S. is case sensitive ("A" != "a").
Phones will be treated as case sensitive.
    Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file. // check that every phone used in the dictionaries appears in the phonelist
        Found 1485 words using 65 phones
    Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary // check for duplicate dictionary entries; note that alternative pronunciations (e.g. Chinese heteronyms) must be written as separate marked entries, as in 曾(2) z eng
    Phase 3: CTL - Check general format; utterance length (must be positive); files exist // check that the control file (test.fileids) is well formed and that the listed files exist
    Phase 4: CTL - Checking number of lines in the transcript should match lines in control file // the transcription must have exactly as many lines as the fileids control file
    Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable. // estimate how much training data there is and whether the chosen n_tied_states is feasible
        Estimated Total Hours Training: 0.470841666666667  // about 28 minutes of audio
        This is a small amount of data, no comment at this time  // the data set is small, so the tool makes no recommendation here
    Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary // check that every word in the transcription appears in the dictionary
        Words in dictionary: 1482   // 1482 words in the main dictionary
        Words in filler dictionary: 3   // three filler (silence/noise) entries: sil, <s>, </s>
    Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once   // every phone in the transcription must be in the phonelist, and every phone in the phonelist must occur at least once (i.e. have at least one training example)
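Phase 6 is essentially a set-membership check of transcript words against the two dictionaries. A minimal sketch of the idea (the file formats follow the SphinxTrain conventions above, but the function and sample data are illustrative, not the actual verify_all.pl code):

```python
def words_in_dict(transcript_lines, dictionary, fillers):
    """Return the set of transcript words missing from both dictionaries."""
    known = set(dictionary) | set(fillers)
    missing = set()
    for line in transcript_lines:
        for tok in line.strip().split():
            # SphinxTrain transcripts end with an (utterance_id) tag; skip it.
            if tok.startswith("(") and tok.endswith(")"):
                continue
            if tok not in known:
                missing.add(tok)
    return missing

dictionary = {"HELLO": "hh ax l ow", "WORLD": "w er l d"}
fillers = {"<s>", "</s>", "SIL"}
print(words_in_dict(["<s> HELLO WORLD </s> (utt001)"], dictionary, fillers))  # set()
```

An empty result means Phase 6 passes; any word in the returned set would be reported as missing from the dictionary.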

 

The following stages are skipped because of settings in sphinx_train.cfg; they come into play when training other kinds of models.
MODULE: 01 Train LDA transformation
Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
MODULE: 02 Train MLLT transformation
Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
MODULE: 05 Vector Quantization
Skipped for continuous models
MODULE: 10 Training Context Independent models for forced alignment and VTLN
Skipped:  $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
Skipped:  $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
MODULE: 11 Force-aligning transcripts
Skipped:  $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
MODULE: 12 Force-aligning data for VTLN
Skipped:  $ST::CFG_VTLN set to 'no' in sphinx_train.cfg

Step 20 comes next; this is where iterative training begins.

The relevant settings in sphinx_train.cfg:

$CFG_MIN_ITERATIONS = 1; # BW Iterate at least this many times
$CFG_MAX_ITERATIONS = 10; # BW Don't iterate more than this, somethings likely wrong.

This means Baum-Welch runs for at least 1 and at most 10 iterations; both limits can be tuned later. Training stops either when the iteration cap is reached or when the per-iteration improvement (the convergence ratio, i.e. the change in overall per-frame likelihood between iterations) falls below a small threshold such as 0.0001, meaning further iterations change little. Note that more iterations do not automatically mean a more accurate model: once the likelihood has converged, extra iterations gain nothing and risk overfitting the training data.
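The loop and stopping rule can be sketched as follows; note that the "Convergence Ratio" in the logs is simply the difference between consecutive per-frame likelihoods (the 0.01 threshold here is illustrative, not a sphinx_train.cfg default):

```python
def train(step, min_iter=1, max_iter=10, threshold=0.01):
    """step(i) returns the overall per-frame log-likelihood after BW iteration i."""
    prev = None
    for i in range(1, max_iter + 1):
        ll = step(i)
        if prev is not None:
            ratio = ll - prev  # reported as "Convergence Ratio" in the logs
            if i >= min_iter and ratio < threshold:
                return i, ll   # likelihoods have converged
        prev = ll
    return max_iter, ll        # iteration cap reached ("Maximum desired iterations")

iters = iter([-7.35, -7.08, -2.0, 0.33, 0.335])
print(train(lambda i: next(iters), threshold=0.01))  # (5, 0.335)
```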

MODULE: 20 Training Context Independent models   // train the context-independent (monophone) models
    Phase 1: Cleaning up directories:  // reset the working directories
    accumulator...logs...qmanager...models...
    Phase 2: Flat initialize   // "flat" = uniform initialization of all parameters
mk_mdef_gen generates the CI model definition file (test.ci.mdef) from the phonelist, with one model per phone and 3 emitting states per model (-n_state_pm 3):

INFO: cmd_ln.c(691): Parsing command line:
/home/lijieqiong/sphinx/mytrain/data7/bin/mk_mdef_gen \
-phnlstfn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.phonelist \
-ocimdef /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.ci.mdef \
-n_state_pm 3

mk_flat creates the initial mixture weights and transition matrices used in training. transition_matrices is a [65x3x4] array: 65 phones means 65 models, each with 3 emitting states, and each state row holds 4 transition probabilities (self-loop, next state, and so on, including the exit transition). mixture_weights is a [195x1x1] array: 65 models x 3 states = 195 tied states, each with a single mixture weight (-nstream 1, -ndensity 1).

data7/bin/mk_flat \
-moddeffn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.ci.mdef \
-topo /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.topology \
-mixwfn /home/lijieqiong/sphinx/mytrain/data7/model_parameters/test.ci_cont_flatinitial/mixture_weights \
-tmatfn /home/lijieqiong/sphinx/mytrain/data7/model_parameters/test.ci_cont_flatinitial/transition_matrices \
-nstream 1 \
-ndensity 1
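The shapes mk_flat reports follow directly from the topology. A NumPy sketch of what a flat start looks like (the 0.5/0.5 transition values are an illustrative uniform choice, not the exact SphinxTrain initial values):

```python
import numpy as np

n_phones, n_states = 65, 3   # 65 CI models, 3 emitting states each
n_stream, n_density = 1, 1   # continuous model, 1 Gaussian per state

# Transition matrices: one (3 x 4) matrix per phone -- each emitting state
# can loop on itself or advance, including a transition into the exit state.
tmat = np.zeros((n_phones, n_states, n_states + 1))
for s in range(n_states):
    tmat[:, s, s] = 0.5      # self-loop
    tmat[:, s, s + 1] = 0.5  # advance

# Mixture weights: 65 models x 3 states = 195 tied states, each with a
# single (trivially uniform) mixture weight.
mixw = np.ones((n_phones * n_states, n_stream, n_density))
print(tmat.shape, mixw.shape)  # (65, 3, 4) (195, 1, 1)
```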

init_gau walks the file list in the training control file and scans every frame of every feature file; each frame carries 13 cepstral coefficients (-ceplen 13), and the accumulated sums for a global mean estimate are written to bwaccumdir/test_buff_1/gauden_count.

bin/init_gau \
-ctlfn /home/lijieqiong/sphinx/mytrain/data7/etc/test_train.fileids \
-part 1 \
-npart 1 \
-cepdir /home/lijieqiong/sphinx/mytrain/data7/feat \
-cepext mfc \
-accumdir /home/lijieqiong/sphinx/mytrain/data7/bwaccumdir/test_buff_1 \
-agc none \
-cmn current \
-varnorm no \
-feat 1s_c_d_dd \
-ceplen 13

norm turns the accumulated sums from the previous step into the global mean, written to test.ci_cont_flatinitial/globalmean:

bin/norm \
-accumdir /home/lijieqiong/sphinx/mytrain/data7/bwaccumdir/test_buff_1 \
-meanfn /home/lijieqiong/sphinx/mytrain/data7/model_parameters/test.ci_cont_flatinitial/globalmean

init_gau (second pass): using the global mean above, accumulate squared deviations to estimate the variance.

norm: normalize again to obtain the global variance.

cp_parm: copy the global mean and variance into each of the 65*3 = 195 tied states, so every state starts from the same single Gaussian.

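The whole flat-initialization pipeline above (init_gau, norm, init_gau, norm) amounts to a two-pass global mean/variance estimate. A sketch, with randomly generated 13-dimensional features standing in for the mfc files:

```python
import numpy as np

def global_mean_var(feature_files):
    """Two-pass global mean/variance over all frames of all files."""
    # Pass 1 (init_gau + norm): sum all frames, divide by total frame count.
    total, n = 0.0, 0
    for feats in feature_files:  # feats: (n_frames, n_dims) array
        total = total + feats.sum(axis=0)
        n += len(feats)
    mean = total / n
    # Pass 2 (init_gau + norm again): squared deviations from the global mean.
    sq = 0.0
    for feats in feature_files:
        sq = sq + ((feats - mean) ** 2).sum(axis=0)
    return mean, sq / n

rng = np.random.default_rng(0)
files = [rng.normal(size=(100, 13)) for _ in range(3)]
mean, var = global_mean_var(files)
print(mean.shape, var.shape)  # (13,) (13,)
```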
    Phase 3: Forward-Backward   // Baum-Welch training
        Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)  // first iteration of single-Gaussian training; progress runs from 0% to 100%
        0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
        Normalization for iteration: 1  // normalize the accumulators into new parameters
        Current Overall Likelihood Per Frame = -7.34599977581518  // average log-likelihood per frame for this iteration
        Baum welch starting for 1 Gaussian(s), iteration: 2 (1 of 1)   // second iteration of single-Gaussian training
        0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
This step had 14 ERROR messages and 0 WARNING messages.  Please check the log file for details.
        Normalization for iteration: 2
        Current Overall Likelihood Per Frame = -7.08191351156474
        Convergence Ratio = 0.264086264250444
        Baum welch starting for 1 Gaussian(s), iteration: 3 (1 of 1)
        0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
This step had 42 ERROR messages and 0 WARNING messages.  Please check the log file for details.
        Normalization for iteration: 3
        Current Overall Likelihood Per Frame = -1.99339567581038
        Convergence Ratio = 5.08851783575436
        Baum welch starting for 1 Gaussian(s), iteration: 4 (1 of 1)
        0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
This step had 70 ERROR messages and 0 WARNING messages.  Please check the log file for details.
        Normalization for iteration: 4
        Current Overall Likelihood Per Frame = 0.331860199053862
        Convergence Ratio = 2.32525587486424
        Baum welch starting for 1 Gaussian(s), iteration: 5 (1 of 1)
        0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
This step had 82 ERROR messages and 0 WARNING messages.  Please check the log file for details.
        Normalization for iteration: 5
        Current Overall Likelihood Per Frame = 1.5098667300257
        Convergence Ratio = 1.17800653097184
        Baum welch starting for 1 Gaussian(s), iteration: 6 (1 of 1)
        0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
This step had 94 ERROR messages and 1 WARNING messages.  Please check the log file for details.
        Normalization for iteration: 6
WARNING: This step had 0 ERROR messages and 3 WARNING messages.  Please check the log file for details.
        Current Overall Likelihood Per Frame = 2.41324954333952
        Convergence Ratio = 0.903382813313822
        Baum welch starting for 1 Gaussian(s), iteration: 7 (1 of 1)
        0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
This step had 100 ERROR messages and 1 WARNING messages.  Please check the log file for details.
        Normalization for iteration: 7
WARNING: This step had 0 ERROR messages and 3 WARNING messages.  Please check the log file for details.
        Current Overall Likelihood Per Frame = 3.09044610173611
        Convergence Ratio = 0.677196558396586
        Baum welch starting for 1 Gaussian(s), iteration: 8 (1 of 1)
        0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
This step had 110 ERROR messages and 1 WARNING messages.  Please check the log file for details.
        Normalization for iteration: 8
WARNING: This step had 0 ERROR messages and 3 WARNING messages.  Please check the log file for details.
        Current Overall Likelihood Per Frame = 3.71101464540463
        Convergence Ratio = 0.620568543668517
        Baum welch starting for 1 Gaussian(s), iteration: 9 (1 of 1)
        0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
This step had 114 ERROR messages and 1 WARNING messages.  Please check the log file for details.
        Normalization for iteration: 9
WARNING: This step had 0 ERROR messages and 3 WARNING messages.  Please check the log file for details.
        Current Overall Likelihood Per Frame = 4.15122418167385
        Convergence Ratio = 0.440209536269223
        Baum welch starting for 1 Gaussian(s), iteration: 10 (1 of 1)
        0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
This step had 122 ERROR messages and 1 WARNING messages.  Please check the log file for details.
        Normalization for iteration: 10
WARNING: This step had 0 ERROR messages and 3 WARNING messages.  Please check the log file for details.
        Maximum desired iterations 10 performed. Terminating CI training   // the 10-iteration cap is reached, so CI training stops
        Training completed after 10 iterations

Next, the context-dependent (triphone) models are trained.

Phase 1: Cleaning up directories:   // the working directories are cleared again
    accumulator... logs... qmanager...  completed 
Phase 2: Initialization

mk_mdef_gen performs the initialization for this stage: it scans the training transcripts and generates the untied model definition, counting how many distinct triphones and how many base phones occur:

bin/mk_mdef_gen \
-phnlstfn /home/lijieqiong/sphinx/mytrain/data7/etc/test.phone \
-dictfn /home/lijieqiong/sphinx/mytrain/data7/etc/test.dic \
-fdictfn /home/lijieqiong/sphinx/mytrain/data7/etc/test.filler \
-lsnfn /home/lijieqiong/sphinx/mytrain/data7/etc/test_train.transcription \
-ountiedmdef /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.untied.mdef \
-n_state_pm 

INFO: mk_mdef_gen.c(878): 65 n_base, 3316 n_tri  // 65 base phones and 3316 distinct triphones occur in the transcription

init_mixw performs the copy-CI-to-CD step: the CI mixture weights, means, and variances are read in and copied into the corresponding CD (triphone) models, so each triphone state starts from its base phone's parameters. The model counts it prints change between the two blocks below because the 3316 triphones are added on top of the 65 base models.

INFO: model_def_io.c(588): 65 total models defined (65 base, 0 tri)
INFO: model_def_io.c(589): 260 total states
INFO: model_def_io.c(590): 195 total tied states

Above: 65 base models only. Each model has 4 states in the mdef (3 emitting states plus one non-emitting exit state), so 65 x 4 = 260 total states and 65 x 3 = 195 tied (emitting) states.
Below: once the triphones are added there are 65 + 3316 = 3381 models, hence 3381 x 4 = 13524 total states and 3381 x 3 = 10143 tied states.

INFO: model_def_io.c(588): 3381 total models defined (65 base, 3316 tri)
INFO: model_def_io.c(589): 13524 total states
INFO: model_def_io.c(590): 10143 total tied states
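Both INFO blocks follow the same arithmetic: each model contributes 3 emitting (tied) states plus one non-emitting exit state in the mdef. A quick check:

```python
def mdef_counts(n_base, n_tri, n_state_pm=3):
    """Model/state counts for a Sphinx mdef with n_state_pm emitting states."""
    n_models = n_base + n_tri
    total_states = n_models * (n_state_pm + 1)  # + non-emitting exit state
    tied_states = n_models * n_state_pm         # emitting states only
    return n_models, total_states, tied_states

print(mdef_counts(65, 0))     # (65, 260, 195)
print(mdef_counts(65, 3316))  # (3381, 13524, 10143)
```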


Phase 3: Forward-Backward  // iterative forward-backward (Baum-Welch) training begins

Baum welch starting for iteration: 1 (1 of 1)

bw

Normalization for iteration: 1

norm

Current Overall Likelihood Per Frame = 2.99562145055707

Baum welch starting for iteration: 2 (1 of 1)

bw

Normalization for iteration: 2

norm

Current Overall Likelihood Per Frame = 10.2063900810682

Convergence Ratio = 2.40710274963821

Baum welch starting for iteration: 3 (1 of 1)

bw

As above: a BW iteration, then normalization, then the next BW iteration, each building on the result of the previous one; the overall per-frame likelihood increases every time. At this point an error is reported:

utt> 59 60 1608INFO: cmn.c(175): CMN: 10.64 -0.19 -0.10 0.05 -0.30 -0.07 -0.05 -0.04 -0.08 -0.05 -0.14 -0.13 -0.09
0 288 2 ERROR: "backward.c", line 430: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 331: 60 ignored  // utterance 60 is skipped: the audio could not be aligned to its transcript (the final state of the search was never reached). This usually means the recording and the text do not match, e.g. the audio contains speech that is missing from the transcript. Listening to the file against the text, however, revealed no obvious mismatch; to be investigated later.

After that, the next stage begins: tree building.

MODULE: 40 Build Trees  // build the decision trees

Phase 1: Cleaning up old log files...

Phase 2: Make Questions  // prepare the question set

make_quests: as the log below shows, about 20 questions are generated for each of the 3 state positions.

INFO: main.c(1108): Done building questions using state 0
INFO: main.c(1109): 20 questions from state 0
INFO: main.c(1108): Done building questions using state 1
INFO: main.c(1109): 20 questions from state 1
INFO: main.c(1108): Done building questions using state 2
INFO: main.c(1109): 21 questions from state 2
INFO: main.c(1114): Stored questions in /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.tree_questions

The generated question set is written to the model_architecture directory, in the following format:

QUESTION9  iong k n ui uxn w   // each question asks whether a triphone's context phone belongs to the listed set of phones; the yes/no answer is what splits the data during tree building
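Read this way, a question is just a phone set, and tree building partitions triphones by whether a context phone is in the set. A sketch (the base/left/right triphone tuples are illustrative, not the mdef's actual representation):

```python
QUESTION9 = {"iong", "k", "n", "ui", "uxn", "w"}

def split_by_question(triphones, qset, context="left"):
    """Partition triphones by whether the chosen context phone is in qset."""
    yes, no = [], []
    for base, left, right in triphones:
        ctx = left if context == "left" else right
        (yes if ctx in qset else no).append((base, left, right))
    return yes, no

tris = [("a", "k", "n"), ("a", "b", "n"), ("a", "w", "t")]
yes, no = split_by_question(tris, QUESTION9)
print(len(yes), len(no))  # 2 1
```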

 

Phase 3: Tree building  // build one decision tree for each state of each phone

Processing each phone with each state

a 0  // build the decision tree for state 0 of phone a. The builder iterates over the question set, evaluates the gain of each candidate split, and splits nodes greedily; the maximum number of nodes is capped (7 is configured here). The data for this tree is the set of triphone states whose base phone is a, partitioned by the answers to the questions; each branch corresponds to one question's split. Both an unpruned and a pruned version of each tree are kept.

bldtree

completed

a 1

bldtree

completed

a 2

bldtree

completed

ai 0

bldtree

completed

ai 1

bldtree

completed

ai 2 ... (bldtree runs like this for every state of every phone)

Next, the decision trees are pruned.

MODULE: 45 Prune Trees (2015-09-23 15:07)

mk_mdef_gen   // first regenerates a model definition covering all triphones (test.alltriphones.mdef), used below
Phase 1: Tree Pruning  // first, prune the trees
prunetree

bin/prunetree \
-itreedir /home/lijieqiong/sphinx/mytrain/data7/trees/test.unpruned \
-nseno 1000 \
-otreedir /home/lijieqiong/sphinx/mytrain/data7/trees/test.1000 \
-moddeffn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.alltriphones.mdef \
-psetfn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.tree_questions \
-minocc 0      // prune each tree's leaves back until the total number of tied states (senones) across all trees is reduced to -nseno 1000
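Conceptually, prunetree ranks all splits across all trees by the likelihood gain recorded during tree building and undoes the lowest-gain splits until the total leaf count reaches -nseno (1000 here). A simplified sketch over flat (tree_id, gain) records (the data structures are illustrative, not prunetree's actual format):

```python
def prune_to_nseno(splits, n_leaves, nseno):
    """splits: (tree_id, gain) for every split across all trees.
    Undoing one split merges two leaves into one (leaf count -1)."""
    splits = sorted(splits, key=lambda s: s[1])  # lowest gain first
    removed = []
    while n_leaves > nseno and splits:
        removed.append(splits.pop(0))
        n_leaves -= 1
    return n_leaves, removed

leaves, dropped = prune_to_nseno(
    [("a-0", 5.0), ("a-1", 0.1), ("b-0", 2.0)], n_leaves=6, nseno=4)
print(leaves)  # 4
```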


Phase 2: State Tying   // next, tie the states
tiestate maps every triphone state to a leaf (senone) of the pruned trees, producing the tied model definition test.1000.mdef:

bin/tiestate \
-imoddeffn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.alltriphones.mdef \
-omoddeffn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.1000.mdef \
-treedir /home/lijieqiong/sphinx/mytrain/data7/trees/test.1000 \
-psetfn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.tree_questions

Then training of the tied context-dependent models begins.

Phase 1: Cleaning up directories:
    accumulator... logs... qmanager...  completed 
Phase 2: Copy CI to CD initialize
Phase 3: Forward-Backward   // BW training again: apparently about 7 iterations with 1 Gaussian per state, then each Gaussian is split into 2 and BW iterates again, then split to 4, and so on
Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
Normalization for iteration: 1
Current Overall Likelihood Per Frame = 2.99562145055707
......

Finally the likelihoods converge and training completes:

Current Overall Likelihood Per Frame = 17.2036627024291

Convergence ratio = 0.0693319838056503
Likelihoods have converged! Baum Welch training completed!
******************************TRAINING COMPLETE*************************
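The Gaussian splitting between rounds of BW is a standard mixture-growing heuristic: each converged Gaussian is replaced by two copies whose means are nudged in opposite directions along the standard deviation. A sketch (the eps factor is illustrative, not the SphinxTrain value):

```python
import numpy as np

def split_gaussians(means, variances, eps=0.2):
    """Double the number of Gaussians: each (mean, var) becomes two
    Gaussians with means nudged +/- eps * stddev, variances kept."""
    std = np.sqrt(variances)
    new_means = np.concatenate([means - eps * std, means + eps * std], axis=0)
    new_vars = np.concatenate([variances, variances], axis=0)
    return new_means, new_vars

m = np.zeros((1, 13))
v = np.ones((1, 13))
m2, v2 = split_gaussians(m, v)
print(m2.shape, v2.shape)  # (2, 13) (2, 13)
```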

With that, the models are fully trained. The remaining stages below are skipped in the initial configuration.

MODULE: 60 Lattice Generation (2015-09-23 15:10)

Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg


MODULE: 61 Lattice Pruning (2015-09-23 15:10)

Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg


MODULE: 62 Lattice Format Conversion (2015-09-23 15:10)

Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg


MODULE: 65 MMIE Training (2015-09-23 15:10)

Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg


MODULE: 90 deleted interpolation (2015-09-23 15:10)

Skipped for continuous models

 

 
posted @ 2015-09-23 19:02  luoyinqq