Since my working language is English, here is a summary, in English, of my understanding of the relationships and differences among several machine learning models: the Hidden Markov Model, the Maximum-Entropy Markov Model, the Conditional Random Field, and the Markov Random Field.

 

     MaxEnt: Maximum-Entropy model
     HMM: Hidden Markov Model
     MEMM: Maximum-Entropy Markov model
     CRF: Conditional Random Field
     MRF: Markov Random Field

     Keywords that will come up in this article:
     conditional independence, generative model, discriminative model, undirected graph model, directed graph model, factor graph

     1. Background on graph models and generative/discriminative models

     1.1) Directed vs. Undirected:

     If the model describes the probability dependency between the observations and the states/outputs with directed conditional distributions, it is a directed graph model.
     However, if the relations between observations and states are described by some arbitrary functions / potential functions / energy functions, it is an undirected graph model, because the model cannot judge which side is the cause and which is the effect.
     Some people say that an undirected graph expresses soft constraints among observations and outputs (you can NOT judge which one, observation or output, decides the other).

     Converting a directed graph to an undirected graph: factorize it by introducing a factor node between each observation node and output node!!!

     NOTE: a factor graph is always undirected! The observations and outputs are linked (I would rather say regularized / functioned / factorized) by the factor nodes.
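
     A minimal sketch of this conversion on a two-node model (just to fix ideas, not a general recipe):

     directed model:  p(x, y) = p(y) * p(x|y)
     factor graph:    introduce one factor node f between x and y, with f(x, y) = p(y) * p(x|y), so that p(x, y) = f(x, y)

     More generally, every conditional p(node | parents) of the directed graph becomes one factor over {node} + parents in the factor graph, and all edges to the factor node are undirected.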


     1.2) Generative vs. Discriminative:

     If the model tries to model the joint probability distribution over the observations and the states/outputs, it is a generative model;
     however, if the model tries to model the conditional probability distribution of the states/outputs given the observations, it is a discriminative model.
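
     A one-line way to see how the two are related (plain Bayes' rule, nothing specific to the models below):

     p(y|x) = p(x, y) / p(x),   where   p(x) = sum over all y' of p(x, y')

     So a generative model, which has p(x, y), can always produce the discriminative quantity p(y|x), but only by also spending effort on p(x); a discriminative model skips p(x) entirely.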

     2. How to classify HMM, MEMM, CRF and MRF

     2.1) Overview

     The differences and relations among these four models can be summarized in the following table:

                               |  generative model  |  discriminative model
     --------------------------+--------------------+-----------------------
     directed graph model      |  HMM               |  MEMM
     --------------------------+--------------------+-----------------------
     undirected graph model    |  ??                |  MaxEnt, CRF, MRF

     2.2) HMM vs. MEMM     

     So HMM and MEMM are directed graph models: HMM models "state decides observation", and this is why it is called "generative"; MEMM models "observation decides state", the opposite direction compared with HMM.
     HMM is a generative model because it models the joint probability p(x, y), where x is the observation and y is the state/output.
     But given an x and p(x, y), to predict p(y|x) we also need p(x).
     MEMM is a discriminative model because it directly models the conditional probability p(y|x).

     So MEMM does not need p(x), and moreover it can use other features of the observation, and even of the neighboring observations, which is impossible in HMM.
     Or we can say that HMM has "very strict independence assumptions" on the observations; in other words, HMM does NOT use features of the observations, neither of the single observation nor of its neighbors.
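
     To make the directional difference concrete, a sketch of the two factorizations over a length-n sequence (standard textbook forms, written in the same plain notation as the rest of this post):

     HMM  (joint):        p(x, y) = prod_i p(yi | yi-1) * p(xi | yi)
     MEMM (conditional):  p(y|x)  = prod_i p(yi | yi-1, xi)

     In the HMM each xi depends only on its own yi (the strict independence assumption mentioned above); in the MEMM each local factor p(yi | yi-1, xi) is itself a MaxEnt model and can use arbitrary features of xi and its neighbors.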

     It seems that MEMM is more powerful and simpler than HMM and CRF, but it has the "label bias problem". Note that HMM does NOT have the label bias problem, because HMM is a generative model (it has no per-state conditional normalization).

     Now the question is: what is the "label bias problem" in MEMM?
     Let's start from the form of MEMM given above: p(y1, ..., yn | x1, ..., xn) = prod_i p(yi | yi-1, xi). Each local conditional p(yi | yi-1, xi) is normalized per state, so it is affected by the number of outgoing transitions of yi-1.
     So in general, states with a small number of outgoing transitions take advantage over states with a big number.
     (Why? Because of the per-state normalization: the probability mass leaving each state must sum to 1, no matter how well or badly the observation matches.)
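
     A toy illustration (the numbers are invented purely for this sketch): suppose state A has a single outgoing transition A -> A'. Then p(A' | A, xi) = 1 for every observation xi, so the observation is completely ignored and a path passing through A keeps its full score at that step. Suppose state B has 5 outgoing transitions; its probability mass must be split, say roughly 0.2 each, so even when xi strongly supports one particular successor of B, a path through B is penalized at that step. Over a whole sequence, Viterbi decoding therefore drifts towards paths through states with few outgoing transitions; this is the label bias. A CRF avoids it by normalizing once, globally over the whole sequence, instead of per state.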

     2.3) CRF vs. MRF

     MRF is also called a Markov Network; from the name we know that the nodes in the network cannot be cleanly separated into observation nodes and output nodes. CRF, however, does separate them: the outputs are conditioned on an input sequence X (a discriminative flavour, right? Yes).
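
     In formulas (psi_c are potential functions over the cliques c of the graph; the exact cliques depend on the structure, so take this as a schematic sketch):

     MRF (joint):        p(x, y) = (1/Z)    * prod_c psi_c(x_c, y_c),   where Z    sums over all (x, y)
     CRF (conditional):  p(y|x)  = (1/Z(x)) * prod_c psi_c(y_c, x),     where Z(x) sums over y only

     The structural change is small, but the normalizer of the CRF depends on the observed x, which is exactly the "conditioned on an input sequence X" part.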
   
     2.4) Maximum Entropy
     MaxEnt is a discriminative model, corresponding to naive Bayes as the generative model.
     Naive Bayes models a single observation and its output by their joint probability, with the assumption that the observation features are conditionally independent given the output.
     Maximum Entropy, however, models the conditional dependency of the output on the observation: it picks the conditional distribution with maximum entropy among those consistent with the feature constraints, instead of using a plain conditional probability table.
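
     The parametric form this leads to is the usual log-linear one (feature functions fk with weights lambda_k; this is the standard form, not something specific to this post):

     p(y|x) = exp( sum_k lambda_k * fk(x, y) ) / Z(x),   Z(x) = sum over y' of exp( sum_k lambda_k * fk(x, y') )

     A MEMM plugs this form into every local p(yi | yi-1, xi); a linear-chain CRF keeps the same log-linear scores but uses one global Z(x) over the whole sequence, which is what removes the label bias problem.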

      3. Generative and Undirected Graph Model

     Is there any model that is an undirected graph model and a generative model?
     Yes: the Restricted Boltzmann Machine, and neural networks, for example.
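
     Taking the RBM as the concrete example for the empty cell in the table above (visible units v, hidden units h, weight matrix W, bias vectors a and b; this is the standard energy-based definition):

     E(v, h) = - a'v - b'h - v'Wh
     p(v, h) = exp( -E(v, h) ) / Z

     The edges between v and h are undirected (the model is defined through an energy function, not through conditionals), and p(v, h) is a joint distribution, so the RBM is both undirected and generative.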

posted on 2017-01-23 06:52  言龙