HMM Study (5) - Forward Algorithm

Category: HMM Study, 2007-12-20 16:52

Original: http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html

 

 

 

Wangben at mcrc, hit, Harbin 2007.12.20

 

5 Forward Algorithm

5.1 Finding the probability of an observed sequence

1. Exhaustive search for solution

We want to find the probability of an observed sequence given an HMM - that is, the parameters (π, A, B) are known. Consider the weather example: we have an HMM describing the weather and its relation to the state of the seaweed, and we also have a sequence of seaweed observations. Suppose the observations for 3 consecutive days are (dry, damp, soggy) - on each of these days, the weather may have been sunny, cloudy or rainy. We can picture the observations and the possible hidden states as a trellis.

 

 

Each column in the trellis shows the possible states of the weather, and each state in one column is connected to each state in the adjacent columns. Each of these state transitions has a probability provided by the state transition matrix. Under each column is the observation at that time; the probability of this observation given any one of the states above it is provided by the confusion matrix.

 

 

One way to calculate the probability of the observed sequence is to enumerate every possible sequence of hidden states, compute for each one the probability that the model follows that sequence and produces the observations, and sum these probabilities. For the above example there are 3^3 = 27 possible weather sequences, and so the probability is

Pr(dry,damp,soggy | HMM) = Pr(dry,damp,soggy, sunny,sunny,sunny | HMM) + Pr(dry,damp,soggy, sunny,sunny,cloudy | HMM) + Pr(dry,damp,soggy, sunny,sunny,rainy | HMM) + . . . + Pr(dry,damp,soggy, rainy,rainy,rainy | HMM)

where each term is the joint probability of the observations and one weather sequence, i.e. Pr(observations | weather sequence) x Pr(weather sequence).
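The exhaustive sum can be sketched in a few lines of Python. The parameter values below are hypothetical, chosen only so that each row sums to 1; they are not taken from the tutorial, and the names `PI`, `A`, `B` and `exhaustive_probability` are illustrative.

```python
from itertools import product

# Hypothetical parameters for illustration only (not from the text).
PI = [0.5, 0.3, 0.2]            # initial probabilities: sunny, cloudy, rainy
A = [[0.6, 0.3, 0.1],           # A[i][j] = Pr(state j at t+1 | state i at t)
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
B = [[0.7, 0.2, 0.1],           # B[j][k] = Pr(observation k | state j)
     [0.3, 0.4, 0.3],           # observation order: dry, damp, soggy
     [0.1, 0.3, 0.6]]
DRY, DAMP, SOGGY = 0, 1, 2


def exhaustive_probability(obs):
    """Sum the joint probability of obs with every possible state sequence."""
    n = len(PI)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):   # n**T sequences
        p = PI[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


print(exhaustive_probability([DRY, DAMP, SOGGY]))
```

For three days and three states the loop visits 27 state sequences; the count grows as n^T, which is what motivates the recursion described next.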

 

Calculating the probability in this manner is computationally expensive, particularly with large models or long sequences. We can instead use the time invariance of the probabilities to reduce the complexity of the problem.

2. Reduction of complexity using recursion

Given an HMM, we will consider calculating the probability of observing a sequence recursively. We will first define a partial probability, which is the probability of reaching an intermediate state in the trellis. We then show how these partial probabilities are calculated at time t = 1 and at later times t = n (> 1).

Suppose throughout that the T-long observed sequence is

$Y = (y_1, y_2, \ldots, y_T)$

 

 


2a. Partial probabilities (α's)

Consider the trellis showing the states and first-order transitions for the observation sequence dry, damp, soggy.

 

 

We can calculate the probability of reaching an intermediate state in the trellis as the sum over all possible paths to that state, where each path contributes the probability that it is followed and produces the observations so far.

 

 

For example, the probability of it being cloudy at t = 2 is calculated from the three paths that enter the cloudy node at t = 2, one from each of the sunny, cloudy and rainy nodes at t = 1.

 

 

We denote the partial probability of state j at time t as $\alpha_t(j)$. This partial probability is calculated as

$\alpha_t(j)$ = Pr( observation | hidden state is j ) x Pr(all paths to state j at time t)

 

The partial probabilities for the final observation hold the probability of reaching those states going through all possible paths - e.g., for the above trellis, the final partial probabilities are calculated from all the paths through the trellis that end in each of the three states at t = 3.

 

 

 

It follows that the sum of these final partial probabilities is the sum over all possible paths through the trellis, and hence is the probability of observing the sequence given the HMM.

 

Section 3 (the Example part) introduces an animated example of the calculation of the probabilities.

 

2b. Calculating α's at time t = 1

We calculate partial probabilities as:

$\alpha_t(j)$ = Pr( observation | hidden state is j ) x Pr(all paths to state j at time t)

In the special case where t = 1, there are no paths to the state. The probability of being in a state at t = 1 is therefore the initial probability, i.e. Pr( state | t = 1) = π(state), and we therefore calculate partial probabilities at t = 1 as this probability multiplied by the associated observation probability:

 

 

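$\alpha_1(j) = \pi(j)\, b_j(y_1)$

Here α is the partial probability, π the initial probability, $b_j(y_1)$ the probability of the first observation $y_1$ given state j, j the state, and the subscript 1 the time.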

 

Thus the partial probability of state j at initialization depends on that state's initial probability together with the probability of observing what we see at that time. (Translator's note, marked as uncertain in the original post: compare Bayes' rule, Pr( j | Y ) = Pr( Y | j ) Pr( j ) / Pr( Y ).)
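As a small illustration of the t = 1 case, the sketch below multiplies each initial probability by the matching observation probability. The numbers are hypothetical, not from the tutorial, and `initial_alphas` is an illustrative name.

```python
# Hypothetical parameters for illustration only (not from the text).
PI = [0.5, 0.3, 0.2]            # pi(j) for sunny, cloudy, rainy
B = [[0.7, 0.2, 0.1],           # B[j][k] = Pr(observation k | state j)
     [0.3, 0.4, 0.3],           # observation order: dry, damp, soggy
     [0.1, 0.3, 0.6]]
DRY, DAMP, SOGGY = 0, 1, 2


def initial_alphas(first_obs):
    """alpha_1(j) = pi(j) * b_j(y_1) for every state j."""
    return [PI[j] * B[j][first_obs] for j in range(len(PI))]


alpha_1 = initial_alphas(DRY)
print(alpha_1)   # one partial probability per state
```

With these assumed values a dry first day gives the sunny state the largest partial probability, because it has both the highest initial probability and the highest chance of producing a dry observation.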

 

2c. Calculating α's at time t (> 1)

We recall that a partial probability is calculated as:

$\alpha_t(j)$ = Pr( observation | hidden state is j ) x Pr(all paths to state j at time t)

The first term of the product is available directly from the confusion matrix, so we now consider the second term, Pr(all paths to state j at time t), which is where the recursion comes in.

 

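To calculate the probability of getting to a state through all paths, we can calculate the probability of each path to that state and sum them - for example, the probability of reaching the cloudy state at t = 2 is the sum of the probabilities of the paths arriving from sunny, from cloudy and from rainy at t = 1.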

 

 

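The number of paths needed to calculate α increases exponentially as the length of the observation sequence increases, but the α's at time t-1 give the probability of reaching each state through all previous paths. We can therefore define the α's at time t in terms of those at time t-1. Writing the step from t to t+1, with $a_{ij}$ the probability of a transition from state i to state j, $b_j(y_{t+1})$ the probability of the observation at time t+1 given state j, and n the number of hidden states:

$\alpha_{t+1}(j) = b_j(y_{t+1}) \sum_{i=1}^{n} \alpha_t(i)\, a_{ij}$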

 

 

Thus we calculate the partial probabilities as the product of the appropriate observation probability (that is, the probability that state j produced what is actually seen at time t+1) with the sum of the probabilities of reaching that state at that time - the latter comes from the transition probabilities together with the α's from the preceding stage.

 

Notice that we have an expression to calculate α at time t+1 using only the partial probabilities at time t.
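One step of this recursion can be sketched as follows. The matrices and the starting α values are hypothetical, chosen only for illustration, and `forward_step` is an illustrative name rather than anything defined in the tutorial.

```python
# Hypothetical parameters for illustration only (not from the text).
A = [[0.6, 0.3, 0.1],           # A[i][j] = Pr(state j at t+1 | state i at t)
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
B = [[0.7, 0.2, 0.1],           # B[j][k] = Pr(observation k | state j)
     [0.3, 0.4, 0.3],           # observation order: dry, damp, soggy
     [0.1, 0.3, 0.6]]
DRY, DAMP, SOGGY = 0, 1, 2


def forward_step(alpha_prev, obs):
    """alpha_{t+1}(j) = b_j(y_{t+1}) * sum_i alpha_t(i) * a_ij."""
    n = len(alpha_prev)
    return [
        B[j][obs] * sum(alpha_prev[i] * A[i][j] for i in range(n))
        for j in range(n)
    ]


alpha_1 = [0.35, 0.09, 0.02]    # assumed partial probabilities at t = 1
alpha_2 = forward_step(alpha_1, DAMP)
print(alpha_2)                  # index 1 is the cloudy state at t = 2
```

The function never looks further back than `alpha_prev`, which is the point of the remark above: everything about earlier paths is already summarized in the previous α's.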

 

We can now calculate the probability of an observation sequence given an HMM recursively - i.e. we use the α's at t = 1 to calculate the α's at t = 2, the α's at t = 2 to calculate the α's at t = 3, and so on until t = T. The probability of the sequence given the HMM is then the sum of the partial probabilities at time t = T.
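Putting the initialization, the recursion and the final sum together gives the following minimal sketch. The parameters are hypothetical illustration values, not the tutorial's, and `forward_probability` is an illustrative name.

```python
# Hypothetical parameters for illustration only (not from the text).
PI = [0.5, 0.3, 0.2]            # initial probabilities: sunny, cloudy, rainy
A = [[0.6, 0.3, 0.1],           # A[i][j] = Pr(state j at t+1 | state i at t)
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
B = [[0.7, 0.2, 0.1],           # B[j][k] = Pr(observation k | state j)
     [0.3, 0.4, 0.3],           # observation order: dry, damp, soggy
     [0.1, 0.3, 0.6]]
DRY, DAMP, SOGGY = 0, 1, 2


def forward_probability(obs):
    """Return Pr(obs | HMM) as the sum of the partial probabilities at t = T."""
    n = len(PI)
    # t = 1: initial probability times observation probability
    alpha = [PI[j] * B[j][obs[0]] for j in range(n)]
    # t = 2, ..., T: each step uses only the alphas of the previous step
    for y in obs[1:]:
        alpha = [
            B[j][y] * sum(alpha[i] * A[i][j] for i in range(n))
            for j in range(n)
        ]
    return sum(alpha)


print(forward_probability([DRY, DAMP, SOGGY]))
```

Only the current list of α's is kept, since the final probability needs the values at t = T alone; storing every column would matter only if the intermediate α's were needed later.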

 

2d. Reduction of computational complexity

We can compare the computational complexity of calculating the probability of an observation sequence by exhaustive evaluation and by the recursive forward algorithm.

 

We have a sequence of T observations, O. We also have a Hidden Markov Model, λ = (π, A, B), with n hidden states.

 

An exhaustive evaluation would involve computing, for all possible execution sequences

$X = (x_1, x_2, \ldots, x_T)$

the quantity

$\sum_{X} \pi(x_1)\, b_{x_1}(y_1) \prod_{t=2}^{T} a_{x_{t-1} x_t}\, b_{x_t}(y_t)$

which sums the probability of observing what we do - note that the load here is exponential in T, since there are n^T execution sequences. Conversely, using the forward algorithm we can exploit knowledge of the previous time step to compute information about a new one - accordingly, the load will only be linear in T.
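To make the comparison concrete, the sketch below counts multiplications under simple assumptions of my own: each of the n^T paths costs 2T - 1 multiplications in the exhaustive sum, and each forward step costs n + 1 multiplications per state. The function names are illustrative.

```python
def exhaustive_multiplications(n, T):
    """n**T state sequences, each a product of 2*T terms (2*T - 1 multiplications)."""
    return (n ** T) * (2 * T - 1)


def forward_multiplications(n, T):
    """n products at t = 1, then n * (n + 1) products for each of the T - 1 later steps."""
    return n + (T - 1) * n * (n + 1)


for T in (3, 10, 20):
    print(T, exhaustive_multiplications(3, T), forward_multiplications(3, T))
```

Under these counting assumptions the forward cost grows by a fixed n(n + 1) per extra observation, while the exhaustive cost is multiplied by roughly n each time.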

 

3 Summary

Our aim is to find the probability of a sequence of observations given an HMM - Pr(observations | λ).

 

We reduce the complexity of calculating this probability by first calculating partial probabilities (α's). These represent the probability of getting to a particular state, s, at time t.

 

We then see that at time t = 1, the partial probabilities are calculated using the initial probabilities (from the π vector) and Pr(observation | state) (from the confusion matrix); also, the partial probabilities at time t (> 1) can be calculated using the partial probabilities at time t-1.

 

This definition of the problem is recursive, and the probability of the observation sequence is found by calculating the partial probabilities at time t = 1, 2, ..., T, and adding all α's at t = T.

 

Notice that computing the probability in this way is far less expensive than calculating the probabilities for all sequences and adding them.

5.2 Forward algorithm definition

We use the forward algorithm to calculate the probability, under a given HMM, of a T-long observation sequence

$Y = (y_1, y_2, \ldots, y_T)$

where each y is one of the observable set. Intermediate probabilities (α's) are calculated recursively by first calculating α for all states at t = 1:

$\alpha_1(j) = \pi(j)\, b_j(y_1)$

 

 

Then for each time step, t = 2, ..., T, the partial probability α is calculated for each state from the values at the previous step. Writing the step from t to t+1 (for t = 1, ..., T-1):

$\alpha_{t+1}(j) = b_j(y_{t+1}) \sum_{i=1}^{n} \alpha_t(i)\, a_{ij}$

that is, the product of the appropriate observation probability and the sum over all possible routes to that state, exploiting recursion by knowing these values already for the previous time step.

Finally, the sum of all partial probabilities at time T gives the probability of the observation sequence, given the HMM, λ:

$\Pr(Y \mid \lambda) = \sum_{j=1}^{n} \alpha_T(j)$

 

 

To recap, each partial probability (at time t > 1) is calculated from the partial probabilities of all the states at the previous time step.

 

Using the `weather' example, consider the calculation of α at t = 2 for the cloudy state (shown as a diagram in the original tutorial). This is the product of the appropriate observation probability b and the sum of the previous partial probabilities α multiplied by the transition probabilities a.

 

Summary

We use the forward algorithm to find the probability of an observed sequence given an HMM. It exploits recursion in the calculations to avoid the necessity for exhaustive calculation of all paths through the execution trellis.

 

Given this algorithm, it is straightforward to determine which of a number of HMMs best describes a given observation sequence - the forward algorithm is evaluated for each, and the one giving the highest probability is selected.
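Model selection of this kind can be sketched as below. Both models and all of their numbers are hypothetical (two hidden states each, one model tuned toward dry observations and one toward soggy ones); `forward_probability` and `best_model` are illustrative names.

```python
def forward_probability(pi, a, b, obs):
    """Pr(obs | model) by the forward recursion."""
    n = len(pi)
    alpha = [pi[j] * b[j][obs[0]] for j in range(n)]
    for y in obs[1:]:
        alpha = [
            b[j][y] * sum(alpha[i] * a[i][j] for i in range(n))
            for j in range(n)
        ]
    return sum(alpha)


DRY, DAMP, SOGGY = 0, 1, 2

# Hypothetical candidate models: (pi, A, B), observation order dry, damp, soggy.
MODELS = {
    "dry_climate": ([0.5, 0.5],
                    [[0.7, 0.3], [0.4, 0.6]],
                    [[0.8, 0.15, 0.05], [0.6, 0.3, 0.1]]),
    "wet_climate": ([0.5, 0.5],
                    [[0.7, 0.3], [0.4, 0.6]],
                    [[0.1, 0.3, 0.6], [0.05, 0.15, 0.8]]),
}


def best_model(obs):
    """Evaluate the forward algorithm for each model and keep the most probable."""
    return max(MODELS, key=lambda name: forward_probability(*MODELS[name], obs))


print(best_model([SOGGY, SOGGY, DAMP]))
```

Comparing raw probabilities in this way treats every candidate model as equally likely beforehand, which is the comparison the paragraph above describes.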

 

Example:

http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/forward_algorithm/s3_pg3.html

posted @ 2014-03-24 11:02