HMM Part-of-Speech Tagging
An HMM is specified by the following components:
Q = q1 q2 ... qN | a set of N states
A = a11 a12 ... an1 ... ann | a transition probability matrix A, each a_ij representing the probability of moving from state i to state j, s.t. Σ_j a_ij = 1 ∀i
O = o1 o2 ... oT | a sequence of T observations, each one drawn from a vocabulary V = v1, v2, ..., vV
B = b_i(o_t) | a sequence of observation likelihoods, also called emission probabilities, each expressing the probability of an observation o_t being generated from a state i
q0, qF | a special start state and end (final) state that are not associated with observations, together with transition probabilities a01 a02 ... a0n out of the start state and a1F a2F ... anF into the end state
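As a minimal sketch of these components (the toy tag set, vocabulary, and all probabilities below are illustrative assumptions, not from the text), they can be written as plain Python dictionaries:

```python
# A toy HMM for POS tagging. Tags, words, and probabilities are invented
# for illustration only.

states = ["DET", "NOUN", "VERB"]          # Q: the set of N states (tags)

# A: transition probabilities a_ij = P(tag_j | tag_i);
# "<s>" plays the role of the special start state q0.
A = {
    "<s>":  {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1},
    "DET":  {"DET": 0.0, "NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}

# B: emission probabilities b_i(o_t) = P(word | tag).
B = {
    "DET":  {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "walk": 0.3, "park": 0.3},
    "VERB": {"walk": 0.6, "runs": 0.4},
}
```

Note that each row of A sums to 1, matching the constraint Σ_j a_ij = 1 ∀i above.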
Viterbi: a dynamic programming algorithm, structurally similar to the minimum edit distance algorithm (see the sketch below).
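Here is a sketch of a Viterbi decoder over the toy model above; the trellis of path probabilities and backpointers mirrors the DP table in minimum edit distance. The function name and the log-space handling are my choices, not from the text, and the special end state qF is omitted for simplicity:

```python
import math

def viterbi(words, states, A, B, start="<s>"):
    """Return the most probable tag sequence for `words` (log-space DP).
    The end state q_F is omitted for simplicity."""
    def logp(p):
        return math.log(p) if p > 0 else float("-inf")

    # Initialization: transitions out of the start state times first emission.
    trellis = [{s: logp(A[start].get(s, 0)) + logp(B[s].get(words[0], 0))
                for s in states}]
    backptr = [{}]

    # Recursion: each cell keeps the best path probability into that state.
    for t in range(1, len(words)):
        trellis.append({})
        backptr.append({})
        for s in states:
            best_prev = max(states,
                            key=lambda r: trellis[t - 1][r] + logp(A[r].get(s, 0)))
            trellis[t][s] = (trellis[t - 1][best_prev]
                             + logp(A[best_prev].get(s, 0))
                             + logp(B[s].get(words[t], 0)))
            backptr[t][s] = best_prev

    # Termination and backtrace.
    last = max(states, key=lambda s: trellis[-1][s])
    tags = [last]
    for t in range(len(words) - 1, 0, -1):
        tags.append(backptr[t][tags[-1]])
    return list(reversed(tags))

print(viterbi(["the", "dog", "walk"], states, A, B))  # ['DET', 'NOUN', 'VERB']
```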
Extending the HMM to Trigrams:
See the paper "TnT -- A Statistical Part-of-Speech Tagger" (Brants, 2000).
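The core idea in TnT is to condition each transition on the two previous tags rather than one, and to smooth the sparse trigram estimates by deleted interpolation of the unigram, bigram, and trigram maximum likelihood estimates:

```latex
% Trigram transition probability, smoothed by deleted interpolation (TnT):
P(t_i \mid t_{i-1}, t_{i-2}) =
    \lambda_1 \hat{P}(t_i)
  + \lambda_2 \hat{P}(t_i \mid t_{i-1})
  + \lambda_3 \hat{P}(t_i \mid t_{i-1}, t_{i-2}),
\qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
```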
Note
The HMM taggers described above are trained on hand-tagged data. A tagger using the EM algorithm can instead start from untagged data, but even a small amount of hand-tagged training data works better than EM. The EM-trained "pure HMM" tagger is probably best suited to cases for which no training data is available.
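As a sketch of what training on hand-tagged data means in practice (the corpus format and function name below are assumptions), the transition and emission probabilities are simply normalized counts, i.e. maximum likelihood estimates:

```python
from collections import Counter, defaultdict

def train_hmm(tagged_sentences):
    """MLE estimates from a hand-tagged corpus:
      P(tag_j | tag_i) = C(tag_i, tag_j) / C(tag_i)
      P(word  | tag)   = C(tag, word)    / C(tag)
    Each sentence is a list of (word, tag) pairs (an assumed format)."""
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    for sent in tagged_sentences:
        prev = "<s>"                      # special start state q0
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    A = {p: {t: c / sum(cnt.values()) for t, c in cnt.items()}
         for p, cnt in trans.items()}
    B = {t: {w: c / sum(cnt.values()) for w, c in cnt.items()}
         for t, cnt in emit.items()}
    return A, B

A, B = train_hmm([[("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")]])
```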