HMM Study Notes (6): Viterbi Algorithm
Original: http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html
Wangben at mcrc, hit, Harbin 2007.12.22
6 Viterbi Algorithm
6.1 Finding most probable sequence of hidden states
We often wish to take a particular HMM and determine, from an observation sequence, the most likely sequence of underlying hidden states that might have generated it.
1. Exhaustive search for a solution
We can use a picture of the execution trellis to visualise the relationship between the (hidden) states and the observations.
We can find the most probable sequence of hidden states by listing every possible sequence of hidden states and, for each one, calculating the probability that it occurs together with the observed sequence.
The most probable sequence of hidden states is the combination that maximises
Pr(hidden state combination | observed sequence).
Because the observed sequence is fixed, this is the same as maximising the joint probability Pr(observed sequence, hidden state combination), which takes the initial and transition probabilities into account as well as the observation probabilities.
For example, for the observation sequence in the trellis shown, the most probable sequence of hidden states is the one with the largest value among:
Pr(sunny,sunny,sunny | dry,damp,soggy), Pr(sunny,sunny,cloudy | dry,damp,soggy), Pr(sunny,sunny,rainy | dry,damp,soggy), . . . , Pr(rainy,rainy,rainy | dry,damp,soggy)
This approach is viable, but exhaustively calculating every combination is computationally expensive: with N hidden states and T observations there are N^T candidate sequences (27 for three states and three observations). As with the forward algorithm, we can use the time invariance of the probabilities to reduce the complexity of the calculation.
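The exhaustive search described above can be sketched in a few lines of Python. This is only an illustration: the initial, transition and observation probabilities below are invented for the example (they are not taken from the original tutorial), and the helper names `joint_probability` and `exhaustive_search` are hypothetical. Each candidate is scored by the joint probability of the hidden sequence and the observations.

```python
from itertools import product

# Hypothetical toy model; all numbers are invented for illustration.
states = ["sunny", "cloudy", "rainy"]
pi = {"sunny": 0.6, "cloudy": 0.2, "rainy": 0.2}
# trans[previous][current] = Pr(current state | previous state)
trans = {
    "sunny":  {"sunny": 0.5, "cloudy": 0.3, "rainy": 0.2},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy":  {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},
}
# emit[state][observation] = Pr(observation | state)
emit = {
    "sunny":  {"dry": 0.7, "damp": 0.2, "soggy": 0.1},
    "cloudy": {"dry": 0.3, "damp": 0.4, "soggy": 0.3},
    "rainy":  {"dry": 0.1, "damp": 0.3, "soggy": 0.6},
}


def joint_probability(path, observations):
    """Pr(hidden path and observations) under the toy model."""
    prob = pi[path[0]] * emit[path[0]][observations[0]]
    for prev, cur, obs in zip(path, path[1:], observations[1:]):
        prob *= trans[prev][cur] * emit[cur][obs]
    return prob


def exhaustive_search(observations):
    """Score every hidden-state sequence and keep the best one."""
    candidates = list(product(states, repeat=len(observations)))
    best = max(candidates, key=lambda p: joint_probability(p, observations))
    return best, joint_probability(best, observations), len(candidates)


best_path, best_prob, n_candidates = exhaustive_search(["dry", "damp", "soggy"])
print(best_path, best_prob, n_candidates)
```

With these made-up numbers the search examines 27 candidates. The count grows as N^T, which is why the recursion in the next section is preferable for longer sequences.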
6.2 Reducing complexity using recursion
We will consider recursively finding the most probable sequence of hidden states given an observation sequence and an HMM. We will first define the partial probability δ, which is the probability of reaching a particular intermediate state in the trellis along the most probable path to it. We then show how these partial probabilities are calculated at t = 1 and at t = n (n > 1).
These partial probabilities differ from those calculated in the forward algorithm: they represent the probability of the single most probable path to a state at time t, not the total over all paths to that state. (Translator's note: can a local optimum stand for the global one? Does this satisfy the greedy-choice property? And what would happen if Viterbi were used in place of the forward algorithm for evaluation?)
2a. Partial probabilities (δ's) and partial best paths
Consider the trellis below, showing the states and first-order transitions for the observation sequence dry, damp, soggy:
For each intermediate and terminating state in the trellis there is a most probable path to that state. So, for example, each of the three states at t = 3 will have a most probable path to it, perhaps like this:
We will call these paths partial best paths. Each of these partial best paths has an associated probability, the partial probability or δ. Unlike the partial probabilities in the forward algorithm, δ is the probability of the one (most probable) path to the state.
Thus δ(i,t) is the maximum probability of all sequences ending at state i at time t, and the partial best path is the sequence which achieves this maximal probability. Such a probability (and partial path) exists for each possible value of i and t.
In particular, each state at time t = T will have a partial probability and a partial best path. We find the overall best path by choosing the state with the maximum partial probability and choosing its partial best path.
2b. Calculating δ's at time t = 1
We calculate the partial probabilities as the probability of the most probable route to our current position (given particular knowledge such as the observation and the probabilities of the previous state). When t = 1 there is no preceding state, so a most probable path to a state does not sensibly exist; instead we use the probability of being in that state at t = 1 and seeing the observable state k1, i.e.
$$\delta(i, 1) = \pi(i)\, b_{i k_1}$$
where π(i) is the initial probability of state i and b_{i k_1} is the probability of observing k1 in state i. As in the forward algorithm, the initial probability is compounded by the appropriate observation probability.
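As a small illustration of the initialisation step, the sketch below multiplies each initial state probability by the probability of the first observation in that state. The numbers and the function name `initial_deltas` are assumptions made up for this example, not values from the original tutorial.

```python
# Hypothetical toy model; all numbers are invented for illustration.
# State order: sunny, cloudy, rainy. Observation order: dry, damp, soggy.
pi = [0.6, 0.2, 0.2]      # pi[i] = Pr(state i at t = 1)
B = [
    [0.7, 0.2, 0.1],      # B[i][k] = Pr(observation k | state i)
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
]


def initial_deltas(pi, B, first_obs):
    """delta(i, 1) = pi(i) * b(i, k1) for every state i."""
    return [pi[i] * B[i][first_obs] for i in range(len(pi))]


delta_1 = initial_deltas(pi, B, first_obs=0)   # first observation: dry
print(delta_1)
```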
2c. Calculating δ's at time t ( > 1 )
We now show that the partial probabilities δ at time t can be calculated in terms of the δ's at time t-1. (Translator's note: the greedy-choice property.)
Consider the trellis below:
We consider calculating the most probable path to the state X at time t; this path to X must pass through one of the states A, B or C at time (t-1).
Therefore the most probable path to X will be one of:
(sequence of states), . . ., A, X
(sequence of states), . . ., B, X
or
(sequence of states), . . ., C, X
We want to find which of the paths ending in AX, BX or CX has the maximum probability.
Recall that the Markov assumption says that the probability of a state occurring given a previous state sequence depends only on the previous n states. In particular, with a first-order Markov assumption, the probability of X occurring after a sequence depends only on the previous state.
Following this, the most probable path ending AX will be the most probable path to A followed by X, and the probability of this path will be
Pr (most probable path to A) . Pr (X | A) . Pr (observation | X)
So, the probability of the most probable path to X is:
$$\Pr(\text{most probable path to } X) = \max_{i \in \{A, B, C\}} \Pr(\text{most probable path to } i) \cdot \Pr(X \mid i) \cdot \Pr(\text{observation} \mid X)$$
where the first term is given by δ at t-1, the second by the transition probabilities and the third by the observation probabilities.
Generalizing the above expression, the probability of the partial best path to a state i at time t, when the observation k_t is seen, is:
$$\delta(i, t) = \max_j \left( \delta(j, t-1)\, a_{ji}\, b_{i k_t} \right)$$
where a_{ji} is the probability of a transition from state j to state i and b_{i k_t} is the probability of observing k_t in state i.
Here, for each possible previous state j, we take its partial probability, multiply by the transition probability into state i and by the appropriate observation probability, and then select the maximum of these products.
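One step of this recursion might look like the following sketch. The matrices and the starting δ values are invented for illustration, and `delta_step` is a hypothetical helper name. Because the observation probability does not depend on the previous state j, the code multiplies it in after taking the maximum, which gives the same value as maximising the full product.

```python
# Hypothetical toy model; all numbers are invented for illustration.
# State order: sunny, cloudy, rainy. Observation order: dry, damp, soggy.
A = [
    [0.5, 0.3, 0.2],      # A[j][i] = Pr(state i at t | state j at t-1)
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
]
B = [
    [0.7, 0.2, 0.1],      # B[i][k] = Pr(observation k | state i)
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
]


def delta_step(delta_prev, A, B, obs):
    """Compute delta(i, t) for all i from the deltas at t-1."""
    n = len(delta_prev)
    return [
        max(delta_prev[j] * A[j][i] for j in range(n)) * B[i][obs]
        for i in range(n)
    ]


delta_1 = [0.42, 0.06, 0.02]                 # assumed deltas at t = 1
delta_2 = delta_step(delta_1, A, B, obs=1)   # observation at t = 2: damp
print(delta_2)
```

Each step costs on the order of N^2 multiplications for N states, so a sequence of length T costs about T·N^2 rather than N^T.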
2d. Back pointers, φ's
Consider the trellis:
At each intermediate and end state we know the partial probability, δ(i,t). However, the aim is to find the most probable sequence of states through the trellis given an observation sequence, so we need some way of remembering the partial best paths through the trellis.
Recall that to calculate the partial probability δ at time t we only need the δ's for time t-1. Having calculated this partial probability, it is thus possible to record which preceding state was the one to generate δ(i,t), that is, in what state the system must have been at time t-1 if it is to arrive optimally at state i at time t. This recording (remembering) is done by holding for each state a back pointer φ which points to the predecessor that optimally provokes the current state.
Formally, we can write
$$\phi(i, t) = \arg\max_j \left( \delta(j, t-1)\, a_{ji} \right)$$
where a_{ji} is the probability of a transition from state j to state i. Here, the argmax operator selects the index j which maximises the bracketed expression.
Notice that this expression is calculated from the δ's of the preceding time step and the transition probabilities, and does not include the observation probability (unlike the calculation of the δ's themselves). This is because we want these φ's to answer the question `If I am here, by what route is it most likely I arrived?' This question relates to the hidden states, and the observation probability at time t is the same factor for every candidate predecessor j, so it cannot change which j wins and can be left out.
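Putting the δ recursion and the back pointers together gives a compact decoder. The following is a minimal sketch under stated assumptions rather than a reference implementation: the model numbers are invented, states and observations are represented by integer indices, and the function name `viterbi` and its argument layout are choices made for this example.

```python
# Hypothetical toy model; all numbers are invented for illustration.
# State order: sunny, cloudy, rainy. Observation order: dry, damp, soggy.
pi = [0.6, 0.2, 0.2]
A = [[0.5, 0.3, 0.2], [0.3, 0.4, 0.3], [0.2, 0.3, 0.5]]   # A[j][i]
B = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]   # B[i][k]


def viterbi(pi, A, B, observations):
    """Return (most probable state path, its joint probability)."""
    n = len(pi)
    # Initialisation: delta(i, 1) = pi(i) * b(i, k1)
    delta = [pi[i] * B[i][observations[0]] for i in range(n)]
    phi = []   # one list of back pointers per time step after the first
    for obs in observations[1:]:
        pointers, new_delta = [], []
        for i in range(n):
            # Back pointer: best predecessor, observation probability excluded
            best_j = max(range(n), key=lambda j: delta[j] * A[j][i])
            pointers.append(best_j)
            new_delta.append(delta[best_j] * A[best_j][i] * B[i][obs])
        phi.append(pointers)
        delta = new_delta
    # Termination: pick the most probable final state, then trace back
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for pointers in reversed(phi):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


names = ["sunny", "cloudy", "rainy"]
path, prob = viterbi(pi, A, B, [0, 1, 2])   # dry, damp, soggy
print([names[i] for i in path], prob)
```

Only the current column of δ values is kept, while every column of back pointers is stored, since the pointers are what the final trace-back needs.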
2e. Advantages of the approach
Using the Viterbi algorithm to decode an observation sequence carries two important advantages:
1. There is a reduction in computational complexity by using the recursion; this argument is exactly analogous to that used in justifying the forward algorithm. (Translator's note: or should this be iteration?)
2. The Viterbi algorithm has the very useful property of providing the best interpretation given the entire context of the observations. An alternative to it might be, for example, to decide on the execution sequence
$$i_1, i_2, \ldots, i_T$$
where
$$i_1 = \arg\max_j \left( \pi(j)\, b_{j k_1} \right), \qquad i_t = \arg\max_j \left( a_{i_{t-1} j}\, b_{j k_t} \right) \quad (t > 1)$$
Here, decisions are taken about a likely interpretation in a `left-to-right' manner, with an interpretation being guessed given an interpretation of the preceding stage (with initialisation from the π vector).
Because each decision is fixed as soon as it is made, this approach can wander away from the correct answer if there is a noise garble half way through the sequence.
Conversely, the Viterbi algorithm will look at the whole sequence before deciding on the most likely final state, and then `backtrack' through the φ pointers to indicate how it might have arisen. This is very useful in `reading through' isolated noise garbles, which are very common in live data.
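The difference between the two strategies can be seen on a small example. The sketch below implements the left-to-right decoder and compares it with the globally best path; to keep the block short, the global optimum is found by brute force, standing in for Viterbi, which returns the same path. The model numbers, the observation sequence and the helper names are all invented for illustration.

```python
from itertools import product

# Hypothetical toy model; all numbers are invented for illustration.
# State order: sunny, cloudy, rainy. Observation order: dry, damp, soggy.
pi = [0.6, 0.2, 0.2]
A = [[0.5, 0.3, 0.2], [0.3, 0.4, 0.3], [0.2, 0.3, 0.5]]   # A[j][i]
B = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]   # B[i][k]
n_states = len(pi)


def greedy_decode(observations):
    """Left-to-right decoding: commit to one state at each step."""
    path = [max(range(n_states), key=lambda j: pi[j] * B[j][observations[0]])]
    for obs in observations[1:]:
        prev = path[-1]
        path.append(max(range(n_states), key=lambda j: A[prev][j] * B[j][obs]))
    return path


def joint(path, observations):
    """Joint probability of a state path and the observations."""
    p = pi[path[0]] * B[path[0]][observations[0]]
    for t in range(1, len(path)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][observations[t]]
    return p


def global_best(observations):
    """Brute-force stand-in for Viterbi: the best path over all candidates."""
    candidates = product(range(n_states), repeat=len(observations))
    return list(max(candidates, key=lambda p: joint(p, observations)))


obs = [0, 2, 0, 0]   # dry, soggy, dry, dry: one odd reading in a dry spell
greedy_path = greedy_decode(obs)
best_path = global_best(obs)
print(greedy_path, joint(greedy_path, obs))
print(best_path, joint(best_path, obs))
```

With these made-up numbers the left-to-right decoder commits to rainy for the single soggy reading, whereas the globally best path passes through cloudy, and the global path has the higher joint probability.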
3. Section Summary
The Viterbi algorithm provides a computationally efficient way of analyzing observations of HMMs to recapture the most likely underlying state sequence. It exploits recursion to reduce computational load, and uses the context of the entire sequence to make judgments, thereby allowing good analysis of sequences that contain noise.
In use, the algorithm proceeds through an execution trellis calculating a partial probability for each cell, together with a back-pointer indicating how that cell could most probably be reached. On completion, the most likely final state is taken as correct, and the path to it traced back to t=1 via the back pointers.