Introduction to probabilistic language modeling
Let's think about the probability of a sentence. By the chain rule:
P(w1, w2, w3, w4)
= P(w1, w2, w3) * P(w4| w1, w2, w3)
= P(w1, w2) * P(w3| w1, w2) * P(w4| w1, w2, w3)
= P(w1) * P(w2| w1) * P(w3| w1, w2) * P(w4| w1, w2, w3)
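As a toy sketch of the chain rule above (the corpus and words are hypothetical, chosen only for illustration), each conditional can be estimated by counting how often the full history continues with the given word:

```python
# Toy corpus (hypothetical): each sentence is a tuple of words.
corpus = [
    ("i", "like", "green", "tea"),
    ("i", "like", "black", "tea"),
    ("i", "drink", "green", "tea"),
    ("you", "like", "green", "tea"),
]

def p(history, word):
    """Estimate P(word | history) by counting full histories (no Markov assumption)."""
    matches = [s for s in corpus if s[:len(history)] == history]
    if not matches:
        return 0.0
    return sum(1 for s in matches if s[len(history)] == word) / len(matches)

sentence = ("i", "like", "green", "tea")
# Chain rule: P(w1) * P(w2|w1) * P(w3|w1,w2) * P(w4|w1,w2,w3)
prob = 1.0
for i, w in enumerate(sentence):
    prob *= p(sentence[:i], w)
print(prob)  # 0.75 * 2/3 * 0.5 * 1.0 = 0.25
```

Note the problem this reveals: conditioning on the full history means most long histories are never seen in the corpus, so their probabilities come out as 0. That is what motivates the Markov assumption below.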
Markov Assumption
Under a k-th order Markov assumption, each conditional looks back only k words:
P(wi | w1, w2, ..., wi-1) ≈ P(wi | wi-k, ..., wi-1)
For example, with k = 1 (a bigram model):
P(w1, w2, w3, w4)
≈ P(w1, w2, w3) * P(w4|w3)
≈ P(w1, w2) * P(w3|w2) * P(w4|w3)
≈ P(w1) * P(w2|w1) * P(w3|w2) * P(w4|w3)
This is much easier to estimate, because each factor conditions on only one previous word.
Models of this kind are called N-grams: a bigram (2-gram) conditions on one previous word, a trigram (3-gram) on two, a 4-gram on three, and so on.
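A minimal bigram model can be sketched as follows (again with a hypothetical toy corpus; sentences are padded with a "&lt;s&gt;" start marker so the first word also has a history):

```python
from collections import Counter

# Toy corpus (hypothetical), padded with a <s> start marker.
corpus = [
    ["<s>", "i", "like", "green", "tea"],
    ["<s>", "i", "like", "black", "tea"],
    ["<s>", "i", "drink", "green", "tea"],
    ["<s>", "you", "like", "green", "tea"],
]

bigrams = Counter()
context = Counter()
for sent in corpus:
    for prev, word in zip(sent, sent[1:]):
        bigrams[(prev, word)] += 1
        context[prev] += 1

def p_bigram(prev, word):
    """P(word | prev) under the first-order Markov (bigram) assumption."""
    return bigrams[(prev, word)] / context[prev]

def sentence_prob(words):
    """Approximate P(sentence) as a product of bigram probabilities."""
    prob = 1.0
    for prev, word in zip(["<s>"] + words, words):
        prob *= p_bigram(prev, word)
    return prob

print(sentence_prob(["i", "like", "green", "tea"]))
```

Because each factor needs only pairs of adjacent words, counts are far less sparse than with full histories; a trigram or 4-gram model works the same way with longer tuples as keys.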