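Handling different kinds of data with the HMM (hidden Markov model) MATLAB toolbox
HMM toolbox download link:
Toolbox usage guide:
The rest of this post briefly describes how to lay out data for each case.
1. data is one-dimensional, and every training sequence has the same length.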
O = 3; Q = 2;
prior0 = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));
obsmat0 = mk_stochastic(rand(Q,O));
% Now we sample nex=20 sequences of length T=10 each from this model, to use as training data.
T = 10; nex = 20;
data = dhmm_sample(prior0, transmat0, obsmat0, nex, T);
% Here data is 20x10. Now we make a random guess as to what the parameters are,
prior1 = normalise(rand(Q,1));
transmat1 = mk_stochastic(rand(Q,Q));
obsmat1 = mk_stochastic(rand(Q,O));
% and improve our guess using 5 iterations of EM...
[LL, prior2, transmat2, obsmat2] = dhmm_em(data, prior1, transmat1, obsmat1, 'max_iter', 5);
loglik = dhmm_logprob(data, prior2, transmat2, obsmat2)
% loglik measures how well the data match the model: the larger it is, the closer the match, and 0 is the maximum.
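To see why loglik can never exceed 0, it helps to compute it by hand for one short sequence. The sketch below is my own illustration in base MATLAB, not the toolbox's code. It runs a scaled forward recursion on a toy discrete HMM whose parameter values and sequence (prior, transmat, obsmat, seq) are made up for the example. Each scaling factor is a conditional probability, so its log is at most 0, and so is the sum.

```matlab
% Toy discrete HMM (hypothetical values): Q = 2 hidden states, O = 3 symbols.
prior    = [0.6; 0.4];                  % Q x 1 initial state distribution
transmat = [0.7 0.3; 0.4 0.6];          % Q x Q, each row sums to 1
obsmat   = [0.5 0.4 0.1; 0.1 0.3 0.6];  % Q x O, each row sums to 1
seq      = [1 2 3 3 2];                 % one observation sequence, symbols in 1..O

% Scaled forward recursion.
alpha  = prior .* obsmat(:, seq(1));    % joint of first state and first symbol
loglik = 0;
for t = 1:numel(seq)
    if t > 1
        % Predict the next state, then weight by the emission probability.
        alpha = (transmat' * alpha) .* obsmat(:, seq(t));
    end
    scale  = sum(alpha);                % P(o_t | o_1..o_{t-1}), always <= 1
    loglik = loglik + log(scale);       % accumulate the log-likelihood
    alpha  = alpha / scale;             % renormalise to avoid underflow
end
disp(loglik)
```

With several training sequences, I would expect the per-sequence values to be added together, so longer or more numerous sequences push loglik further below 0. When comparing models, score the same data against each one.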
2. data is multi-dimensional, and every training sequence has the same length.
% Let us generate nex=50 vector-valued sequences of length T=50; each vector has size O=2.
O = 2;
T = 50;
nex = 50;
data = randn(O,T,nex);
% Now let us fit a mixture of M=2 Gaussians for each of the Q=2 states using K-means.
M = 2; Q = 2;
left_right = 0;
cov_type = 'full';   % must be defined before calling mixgauss_init
prior0 = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));
[mu0, Sigma0] = mixgauss_init(Q*M, reshape(data, [O T*nex]), cov_type);
mu0 = reshape(mu0, [O Q M]);
Sigma0 = reshape(Sigma0, [O O Q M]);
mixmat0 = mk_stochastic(rand(Q,M));
% Finally, let us improve these parameter estimates using EM.
[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = mhmm_em(data, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 2);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
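Note that the array format here is O*T*nex. Here is an example of how such an array is filled.
data0 = [x, y, z];
% data0 holds three-dimensional samples (O = 3) in T*nex rows:
% rows 1..T are the data for sequence 1, rows T+1..2*T are the data for sequence 2, and so on.
data = zeros(O,T,nex);   % preallocate the O x T x nex array
index = 1;
for k = 1:nex
    for j = 1:T
        data(:,j,k) = data0(index,:).';   % row of data0 becomes one O x 1 frame
        index = index + 1;
    end
end
% Writing data0 into data as shown above is all that is needed.
% To see how closely new data match this model (i.e. to classify them), use the trained parameters:
loglik = mhmm_logprob(data, prior1, transmat1, mu1, Sigma1, mixmat1);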
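The element-by-element copy from data0 into the O*T*nex array can also be done with one transpose and one reshape. This works because MATLAB stores arrays column by column. The sketch below checks that on a small made-up data0 whose sizes and values are only for illustration. The name data_fast is mine.

```matlab
% Small hypothetical example: O = 3 features, T = 4 frames, nex = 2 sequences.
O = 3; T = 4; nex = 2;
data0 = reshape(1:(T*nex*O), O, []).';   % (T*nex) x O, row r = [3r-2, 3r-1, 3r]

% Loop version: one frame at a time, sequence by sequence.
data_loop = zeros(O, T, nex);
index = 1;
for k = 1:nex
    for j = 1:T
        data_loop(:, j, k) = data0(index, :).';
        index = index + 1;
    end
end

% Vectorised version: transpose so each frame is a column, then fold the
% columns into T frames per sequence.
data_fast = reshape(data0.', [O T nex]);

disp(isequal(data_loop, data_fast))
```

The shortcut only applies when the rows of data0 are ordered sequence by sequence, as described above. If the rows are interleaved in some other way, keep the explicit loop.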
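3. data is multi-dimensional, and the training sequences have different lengths, i.e. how an HMM handles variable-length data.
This case is very common. For example, when recording continuous speech signals, the length (number of frames) differs from one recording to the next.
Suppose each observation is O-dimensional, the number of frames is T (which may differ for every sequence), and NEX is the number of training sequences.
Step 1: store each sequence as an O*T matrix in a cell array with NEX rows (named cell_data here). For example, my cell_data (screenshot):
Each of my observations is 8-dimensional, there are 4 training sequences in total, and each training sequence has a different length.
Step 2: training code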
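As a rough illustration of step 1, the sketch below builds a cell_data with the same shape as mine (O = 8, 4 sequences) from random numbers. The frame counts in lengths are invented for the example, and all_frames is a name I made up. It also shows how the sequences can be pooled into one O x (total frames) matrix, the form that a function expecting a single matrix needs.

```matlab
O = 8;                        % dimensionality of each observation
lengths = [5 7 3 6];          % hypothetical frame counts, one per sequence
NEX = numel(lengths);         % number of training sequences

% One O x T_n matrix per cell; T_n differs from cell to cell.
cell_data = cell(NEX, 1);
for n = 1:NEX
    cell_data{n} = randn(O, lengths(n));
end

% Pool all frames side by side: O x sum(lengths).
all_frames = [cell_data{:}];

disp(size(all_frames))
```

Pooling discards the sequence boundaries, so it is only suitable for steps that treat the frames as an unordered set, such as initialising the Gaussian mixtures. The boundaries are still available in cell_data.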
O = 8;           % dimensionality of each observation
M = 2;           % number of Gaussian mixture components per state
Q = 3;           % number of hidden states
train_num = 4;   % number of training sequences
data = [];
% initial guess of parameters
cov_type = 'full';
prior0 = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));
% concatenate all sequences into one O x (total frames) matrix for initialisation
for train_len = 1 : train_num
    data = [data, cell_data{train_len}];
end
[mu0, Sigma0] = mixgauss_init(Q*M, data, cov_type);
mu0 = reshape(mu0, [O Q M]);
Sigma0 = reshape(Sigma0, [O O Q M]);
mixmat0 = mk_stochastic(rand(Q,M));
[LL, HMM.prior, HMM.transmat, HMM.mu, HMM.Sigma, HMM.mixmat] = ...