Differences Between CRFs and HMMs

As a side note: I would kindly ask you to maintain this (incomplete) list so that interested users have an easily accessible resource. At the moment, anyone looking for answers about CRFs and HMMs still has to dig through many papers and long technical reports.

In addition to the other, already good answers, I want to point out the distinctive features I find most noteworthy:

HMMs are generative models which try to model the joint distribution P(y, x). Such models therefore also have to model the distribution of the data P(x), which in turn means modelling the dependencies among the input features. These dependencies are sometimes undesirable (e.g. in NLP's POS tagging) and very often intractable to model/compute.
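For concreteness, a first-order HMM factorizes this joint distribution as follows (the standard formulation; the notation is mine, with $P(y_1 \mid y_0)$ read as the initial-state distribution $P(y_1)$):

$$P(y, x) = \prod_{t=1}^{T} P(y_t \mid y_{t-1})\, P(x_t \mid y_t)$$

Each observation $x_t$ depends only on its own state $y_t$, which is exactly the independence assumption that becomes problematic with rich, correlated features.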
CRFs are discriminative models which model P(y|x) directly. As such, they do not need to model P(x) explicitly and, depending on the task, may therefore yield higher performance, in part because they have fewer parameters to learn, e.g. in settings where generating samples is not required. Discriminative models are often more suitable when complex and overlapping features are used (since modelling their distribution is often hard).
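For comparison, the standard linear-chain CRF (as in [1]) models the conditional directly through weighted feature functions $f_k$:

$$P(y \mid x) = \frac{1}{Z(x)} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, x, t) \right), \qquad Z(x) = \sum_{y'} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y'_{t-1}, y'_t, x, t) \right)$$

Note that the normalizer $Z(x)$ sums over label sequences only, never over inputs, which is why $P(x)$ never has to be modelled.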
If you have such overlapping/complex features (as in POS tagging) you might want to consider CRFs since they can model these with their feature functions (keep in mind that you will usually have to feature-engineer these functions).
In general, CRFs are more powerful than HMMs due to their use of feature functions. For example, you can model functions like $\mathbb{1}(y_t = \text{NN},\ x_t = \text{Smith},\ \text{cap}(x_{t-1}) = \text{true})$, whereas in (first-order) HMMs the Markov assumption imposes a dependency only on the previous element. I therefore see CRFs as a generalization of HMMs.
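A minimal sketch of what such a feature function looks like in code (the function and tag names here are illustrative, not from any particular library):

```python
def cap(token):
    """True iff the token starts with an uppercase letter."""
    return token[:1].isupper()

def f_smith_nn(y_t, x, t):
    """Indicator feature: fires iff the current tag is NN, the current
    word is "Smith", and the previous word is capitalized."""
    return 1.0 if (t > 0 and y_t == "NN" and x[t] == "Smith" and cap(x[t - 1])) else 0.0

# e.g. f_smith_nn("NN", ["Mr.", "Smith", "spoke"], 1) fires, returning 1.0
```

An HMM has no place to put such a feature, since its emission distribution conditions each word on the current tag alone.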
Also note the difference between linear and general CRFs. Linear CRFs, like HMMs, only impose dependencies on the previous element, whereas with general CRFs you can impose dependencies on arbitrary elements (e.g. the first element of a sequence may be accessed at its very end).
In practice, you will see linear CRFs more often than general CRFs, since they usually allow easier inference. Inference in general CRFs is often intractable, leaving approximate inference as the only tractable option.
Inference in linear CRFs is done with the Viterbi algorithm, as in HMMs.
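A minimal sketch of Viterbi decoding in log space that covers both cases (all names are mine; for an HMM the scores would be log-probabilities, for a linear CRF the summed weighted feature scores):

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit):
    """Return the highest-scoring state sequence.

    log_start: (S,)   initial score of each state
    log_trans: (S, S) score of moving from state i to state j
    log_emit:  (T, S) per-position score of each state
    """
    T, S = log_emit.shape
    delta = log_start + log_emit[0]        # best score of any path ending in each state
    backptr = np.zeros((T, S), dtype=int)  # argmax predecessors for backtracking
    for t in range(1, T):
        # scores[i, j] = best path ending in i, then transitioning to j
        scores = delta[:, None] + log_trans + log_emit[t][None, :]
        backptr[t] = np.argmax(scores, axis=0)
        delta = np.max(scores, axis=0)
    # follow back-pointers from the best final state
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t][path[-1]]))
    return path[::-1]

# toy usage: 2 states, 3 time steps (HMM-style probabilities, taken to log space)
print(viterbi(np.log([0.6, 0.4]),
              np.log([[0.7, 0.3], [0.4, 0.6]]),
              np.log([[0.9, 0.2], [0.1, 0.8], [0.5, 0.5]])))
```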
Both HMMs and linear CRFs are typically trained with maximum-likelihood techniques such as gradient descent or quasi-Newton methods, or, in the case of HMMs, with Expectation Maximization (the Baum-Welch algorithm). If the optimization problems are convex, these methods all yield the optimal parameter set.
According to [1], the optimization problem for learning the linear CRF parameters is convex if all nodes have exponential family distributions and are observed during training.
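For reference, the log-likelihood gradient used by these gradient-based methods has, for a linear CRF, the standard "observed minus expected feature counts" form (per training sequence, without regularization; again following [1], notation as above):

$$\frac{\partial \ell}{\partial \lambda_k} = \sum_{t=1}^{T} f_k(y_{t-1}, y_t, x, t) \;-\; \sum_{t=1}^{T} \mathbb{E}_{y' \sim P(\cdot \mid x)}\!\left[ f_k(y'_{t-1}, y'_t, x, t) \right]$$

The expectation can be computed exactly with the forward-backward algorithm, which is what keeps training of linear CRFs tractable.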
[1] Sutton, Charles; McCallum, Andrew (2010), "An Introduction to Conditional Random Fields"

posted @ 2021-11-20 22:59  ZXYFrank