Introduction to probabilistic language modeling

Let's think about the probability of a sentence. By the chain rule, a joint probability factors into a product of conditional probabilities:

P(w1, w2, w3, w4)

= P(w1, w2, w3) * P(w4 | w1, w2, w3)

= P(w1, w2) * P(w3 | w1, w2) * P(w4 | w1, w2, w3)

= P(w1) * P(w2 | w1) * P(w3 | w1, w2) * P(w4 | w1, w2, w3)
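The chain-rule product above can be sketched in a few lines of Python. The conditional probabilities here are made-up toy values, just to show how the factors multiply:

```python
# Chain rule: P(w1..w4) = P(w1) * P(w2|w1) * P(w3|w1,w2) * P(w4|w1,w2,w3)
# These values are invented for illustration, not from any real model.
cond_probs = [
    0.1,  # P(w1)
    0.4,  # P(w2 | w1)
    0.3,  # P(w3 | w1, w2)
    0.5,  # P(w4 | w1, w2, w3)
]

sentence_prob = 1.0
for p in cond_probs:
    sentence_prob *= p

print(round(sentence_prob, 6))  # 0.1 * 0.4 * 0.3 * 0.5 = 0.006
```

Note that the later factors condition on the entire preceding history, which is exactly what becomes hard to estimate for long sentences.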

 

Markov Assumption

P(w1, w2, ..., wn) ≈ ∏i P(wi | wi-k, ..., wi-1)

That is, each word is assumed to depend only on the previous k words rather than the whole history.

For example, with k = 1 (a bigram model):

P(w1, w2, w3, w4)

= P(w1, w2, w3) * P(w4 | w3)

= P(w1, w2) * P(w3 | w2) * P(w4 | w3)

= P(w1) * P(w2 | w1) * P(w3 | w2) * P(w4 | w3)

With this approximation, each factor conditions on only the previous word, which makes the probabilities much easier to estimate from data.
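Under the bigram assumption, each factor can be estimated from corpus counts as P(wi | wi-1) = count(wi-1, wi) / count(wi-1). A minimal sketch with a tiny made-up corpus (the sentences and the `<s>` start marker are assumptions for illustration):

```python
from collections import Counter

# Tiny invented corpus; <s> marks the start of each sentence.
corpus = [
    ["<s>", "i", "like", "tea"],
    ["<s>", "i", "like", "coffee"],
    ["<s>", "you", "like", "tea"],
]

# Count single words and adjacent word pairs.
unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(
    (sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1)
)

def bigram_prob(prev, word):
    # P(word | prev) = count(prev, word) / count(prev)
    return bigrams[(prev, word)] / unigrams[prev]

def sentence_prob(sent):
    # Product of P(w_i | w_{i-1}) under the Markov assumption.
    p = 1.0
    for prev, word in zip(sent, sent[1:]):
        p *= bigram_prob(prev, word)
    return p

# P(i|<s>) * P(like|i) * P(tea|like) = 2/3 * 1 * 2/3 = 4/9
print(sentence_prob(["<s>", "i", "like", "tea"]))
```

A real model would also need smoothing to handle bigrams that never appear in the corpus, otherwise unseen pairs get probability zero.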

More generally, an N-gram model conditions each word on the previous N-1 words: bigrams, trigrams (3-grams), 4-grams, and so on.
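Extracting the N-grams themselves is straightforward; a small sketch for arbitrary N:

```python
def ngrams(tokens, n):
    # Slide a window of size n across the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["w1", "w2", "w3", "w4"]
print(ngrams(tokens, 2))  # bigrams:  [('w1','w2'), ('w2','w3'), ('w3','w4')]
print(ngrams(tokens, 3))  # trigrams: [('w1','w2','w3'), ('w2','w3','w4')]
```

Larger N captures more context but needs far more data, since the number of possible N-grams grows rapidly with N.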

posted on 2013-04-18 20:11 by MrMission