1. HMM with discrete outputs (DHMM)
Maximum likelihood parameter estimation using EM (Baum Welch)
The script dhmm_em_demo.m gives an example of how to learn an HMM with discrete outputs. Let there be Q=2 states and O=3 output symbols. We create random stochastic matrices as follows.
O = 3;
Q = 2;
prior0 = normalise(rand(Q,1));         % initial state distribution
transmat0 = mk_stochastic(rand(Q,Q));  % state transition matrix
obsmat0 = mk_stochastic(rand(Q,O));    % observation (emission) matrix
Now we sample nex=20 sequences of length T=10 each from this model, to use as training data.
T = 10;   % length of each sequence
nex = 20; % number of training sequences
data = dhmm_sample(prior0, transmat0, obsmat0, nex, T);
Here data is 20x10. Now we make a random guess as to what the parameters are,
prior1 = normalise(rand(Q,1));         % initial guess for the state prior
transmat1 = mk_stochastic(rand(Q,Q));  % initial guess for the transition matrix
obsmat1 = mk_stochastic(rand(Q,O));    % initial guess for the observation (emission) matrix
and improve our guess using 5 iterations of EM...
[LL, prior2, transmat2, obsmat2] = dhmm_em(data, prior1, transmat1, obsmat1, 'max_iter', 5);
% prior2, transmat2, obsmat2 are the learned state prior, transition matrix, and observation (emission) matrix
LL(t) is the log-likelihood after iteration t, so we can plot the learning curve.
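For example, a minimal sketch using standard MATLAB plotting (LL is the output of the dhmm_em call above):
% Plot the EM learning curve; the log-likelihood should increase monotonically.
plot(LL, '-o');
xlabel('EM iteration');
ylabel('log-likelihood');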
Sequence classification
To evaluate the log-likelihood of a trained model given test data, proceed as follows:
loglik = dhmm_logprob(data, prior2, transmat2, obsmat2) % log-likelihood of the test data under the trained HMM
Note: the discrete alphabet is assumed to be {1, 2, ..., O}, where O = size(obsmat, 2). Hence data cannot contain any 0s.
To classify a sequence into one of k classes, train up k HMMs, one per class, and then compute the log-likelihood that each model gives to the test sequence; if the i'th model is the most likely, then declare the class of the sequence to be class i.
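A minimal sketch of this procedure, assuming the per-class training sequences are stored in a hypothetical cell array train_data{i}, the test sequence is a row vector test_seq, and k, Q, O are set for the task at hand:
% Train one DHMM per class, then classify by maximum log-likelihood.
models = cell(1, k);
for i = 1:k
  prior1 = normalise(rand(Q,1));
  transmat1 = mk_stochastic(rand(Q,Q));
  obsmat1 = mk_stochastic(rand(Q,O));
  [LL, prior2, transmat2, obsmat2] = dhmm_em(train_data{i}, prior1, transmat1, obsmat1, 'max_iter', 10);
  models{i} = struct('prior', prior2, 'transmat', transmat2, 'obsmat', obsmat2);
end
loglik = zeros(1, k);
for i = 1:k
  loglik(i) = dhmm_logprob(test_seq, models{i}.prior, models{i}.transmat, models{i}.obsmat);
end
[best_ll, class] = max(loglik);  % declare the sequence to belong to class 'class'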
Computing the most probable sequence (Viterbi)
First you need to evaluate B(i,t) = P(y_t | Q_t=i) for all t,i:
B = multinomial_prob(data, obsmat);
Then you can use
[path] = viterbi_path(prior, transmat, B)
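For example, to decode the first training sequence with the parameters learned above (a small sketch):
B = multinomial_prob(data(1,:), obsmat2);   % B(i,t) = P(y_t | Q_t=i) for sequence 1
path = viterbi_path(prior2, transmat2, B);  % most probable hidden state sequence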
2. HMM with mixture-of-Gaussians outputs (GHMM)
Maximum likelihood parameter estimation using EM (Baum Welch)
Let us generate nex=50 vector-valued sequences of length T=50; each vector has size O=2.
O = 2;
T = 50;
nex = 50;
data = randn(O,T,nex);
Now let us fit a mixture of M=2 Gaussians for each of the Q=2 states using K-means.
M = 2;
Q = 2;
left_right = 0;  % ergodic model (not left-to-right)
prior0 = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));
cov_type = 'full';  % covariance type: 'full', 'diag', or 'spherical'
[mu0, Sigma0] = mixgauss_init(Q*M, reshape(data, [O T*nex]), cov_type);
mu0 = reshape(mu0, [O Q M]);
Sigma0 = reshape(Sigma0, [O O Q M]);
mixmat0 = mk_stochastic(rand(Q,M));
Finally, let us improve these parameter estimates using EM.
[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
mhmm_em(data, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 2);
Since EM only finds a local optimum, good initialisation is crucial. The initialisation procedure illustrated above is very crude and is probably not adequate for real applications; see the BNT documentation for a real-world example of EM with mixtures of Gaussians.
What to do if the log-likelihood becomes positive?
It is possible for p(x) > 1 if p(x) is a probability density function, such as a Gaussian. (The requirements for a density are p(x) >= 0 for all x and ∫ p(x) dx = 1.) In practice this usually means your covariance is shrinking to a point/delta function, so you should increase the width of the prior (see below), or constrain the matrix to be spherical or diagonal, or clamp it to a large fixed constant (not learn it at all). It is also very helpful to ensure the components of the data vectors have small and comparable magnitudes (use e.g., KPMstats/standardize).
This is a well-known pathology of maximum likelihood estimation for Gaussian mixtures: the global optimum may place one mixture component on a single data point, and give it 0 covariance, and hence infinite likelihood. One usually relies on the fact that EM cannot find the global optimum to avoid such pathologies.
What to do if the log-likelihood decreases during EM?
Since I implicitly add a prior to every covariance matrix (see below), what increases is loglik + log(prior), but what I print is just loglik, which may occasionally decrease. This suggests that one of your mixture components is not getting enough data. Try a better initialization or fewer clusters (states).
What to do if the covariance matrix becomes singular?
Estimates of the covariance matrix often become singular if you have too little data, or if too few points are assigned to a cluster center due to a bad initialization of the means. In this case, you should constrain the covariance to be spherical or diagonal, or adjust the prior (see below), or try a better initialization.
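If your copy of mhmm_em supports the 'cov_type' option described in its help text (an assumption worth verifying in your version of the toolbox), a diagonal constraint can be requested directly when fitting, for example:
% Sketch: constrain each covariance to be diagonal during EM
% ('cov_type' option assumed to be available in mhmm_em).
[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
    mhmm_em(data, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 5, 'cov_type', 'diag');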
How do I add a prior to the covariance matrix?
Buried inside of KPMstats/mixgauss_Mstep you will see that cov_prior is initialized to 0.01*I. This is added to the maximum likelihood estimate after every M step. To change this, you will need to modify the mhmm_em function so it calls mixgauss_Mstep with a different value.
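Conceptually, the prior acts as a ridge term added to each maximum likelihood covariance estimate; the following is only a sketch of that effect (Sigma_ml is a hypothetical ML estimate for one component, not a toolbox variable):
cov_prior = 0.01 * eye(O);                 % default magnitude quoted above; increase to regularise more
Sigma_regularised = Sigma_ml + cov_prior;  % keeps the covariance well conditioned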
Sequence classification
To classify a sequence (e.g., of speech) into one of k classes (e.g., the digits 0-9), proceed as in the DHMM case above, but use the following procedure to compute likelihood:
loglik = mhmm_logprob(data, prior, transmat, mu, Sigma, mixmat);
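A sketch of the classification loop, assuming the k trained GHMMs are stored in a hypothetical struct array ghmm(i) with fields prior, transmat, mu, Sigma, mixmat, and test_seq is an OxT test sequence:
% Score the test sequence under each class-specific GHMM and pick the best.
loglik = zeros(1, k);
for i = 1:k
  loglik(i) = mhmm_logprob(test_seq, ghmm(i).prior, ghmm(i).transmat, ...
                           ghmm(i).mu, ghmm(i).Sigma, ghmm(i).mixmat);
end
[best_ll, class] = max(loglik);  % pick the class whose model explains the sequence best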
Computing the most probable sequence (Viterbi)
First you need to evaluate B(i,t) = P(y_t | Q_t=i) for all t,i:
B = mixgauss_prob(data(:,:,ex), mu, Sigma, mixmat);
where data(:,:,ex) is an OxT matrix holding a single sequence (O is the size of the observation vector). Finally, use
[path] = viterbi_path(prior, transmat, B);
3. HMM with Gaussian outputs
This is just like the mixture of Gaussians case, except we have M=1, and hence there is no mixing matrix.
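A minimal sketch, under the assumption that a Qx1 mixing matrix of ones behaves the same as having no mixture (reusing O, T, nex, data, prior0 and transmat0 from the GHMM example above):
% Single Gaussian per state: M = 1, trivial mixing weights.
M = 1;
[mu0, Sigma0] = mixgauss_init(Q*M, reshape(data, [O T*nex]), 'full');
mu0 = reshape(mu0, [O Q M]);
Sigma0 = reshape(Sigma0, [O O Q M]);
mixmat0 = ones(Q,1);  % one component per state, weight 1
[LL, prior1, transmat1, mu1, Sigma1] = ...
    mhmm_em(data, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 5);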
Online EM for discrete HMMs/ POMDPs
For some applications (e.g., reinforcement learning/ adaptive control), it is necessary to learn a model online. The script dhmm_em_online_demo gives an example of how to do this.