eastcowboy

 

Smoothing of Language Model

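This post is meant to help readers get up to speed on language models (LMs) quickly.

Language models were first applied in speech recognition and then gradually spread to many other areas, including OCR, handwriting recognition, statistical machine translation, spelling correction, and information retrieval.

The basics of language modeling mainly cover: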

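(1)    Definition of an LM.

(2)    The N-gram as the main tool for LMs. Everything below refers to N-gram models.

(3)    The chain rule for LMs.

(4)    Maximum likelihood estimation (MLE) for LMs.

(5)    LM evaluation (cross-entropy, perplexity).

(6)    The various smoothing methods proposed to address data sparsity in LMs.

(7)    Classification of smoothing methods
-Back-off models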
   • Katz smoothing
   • Kneser-Ney smoothing
-Linear interpolation models
   • Additive smoothing
   • Absolute smoothing
   • Jelinek-Mercer smoothing
   • Witten-Bell smoothing
   • Interpolated Kneser-Ney smoothing

(8)    Other smoothing methods
Church-Gale Smoothing
Bayesian Smoothing

(9)    Good-Turing estimation
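
Items (4) to (6) can be made concrete with a small sketch. The code below is not taken from any of the references; it is a minimal illustration, assuming a bigram model, `<s>`/`</s>` sentence padding, and a closed vocabulary. The function names (`train_bigram_counts`, `bigram_prob`, `perplexity`) and the toy corpus are hypothetical. With `delta=0` the estimate is the plain MLE, which assigns zero probability to any unseen bigram; with `delta>0` it becomes additive smoothing, the simplest method in the list above.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Collect history counts, bigram counts, and the vocabulary.

    Each sentence is a list of words; it is padded with <s> and </s>
    (an assumption of this sketch, not something the outline prescribes).
    """
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(tokens[1:])  # <s> is only a history, never predicted
        for prev, cur in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts, vocab


def bigram_prob(prev, cur, history_counts, bigram_counts, vocab, delta=0.0):
    """P(cur | prev) with additive smoothing; delta=0 is the plain MLE."""
    numerator = bigram_counts[(prev, cur)] + delta
    denominator = history_counts[prev] + delta * len(vocab)
    return numerator / denominator if denominator > 0 else 0.0


def perplexity(sentences, history_counts, bigram_counts, vocab, delta=0.0):
    """Perplexity 2^H, where H is the per-token cross-entropy in bits."""
    log_prob = 0.0
    n_tokens = 0
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            p = bigram_prob(prev, cur, history_counts, bigram_counts, vocab, delta)
            if p == 0.0:
                return math.inf  # one zero-probability event ruins the whole score
            log_prob += math.log2(p)
            n_tokens += 1
    return 2 ** (-log_prob / n_tokens)


# Toy corpus (hypothetical data, for illustration only).
train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
held_out = [["dog", "sat"]]  # contains the unseen bigram (<s>, dog)

hc, bc, vocab = train_bigram_counts(train)
print("MLE perplexity:     ", perplexity(held_out, hc, bc, vocab, delta=0.0))
print("Additive perplexity:", perplexity(held_out, hc, bc, vocab, delta=1.0))
```

The held-out sentence uses only known words, but it starts with a bigram that never occurs in training. That is exactly the data sparsity problem of item (6): the MLE model cannot score the sentence at all, while the smoothed model can. The more refined methods in item (7) differ mainly in how much probability mass they take from seen events and how they redistribute it to unseen ones.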

References:

•   Stanley F. Chen and Joshua Goodman. An Empirical Study of Smoothing Techniques for Language Modeling. Technical Report TR-10-98, Computer Science Group, Harvard University, 1998. (Recommended)

•   Chengxiang Zhai and John Lafferty. A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 334-342, 2001.

•   Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing [Chapter 7, Statistical Inference: n-gram Models over Sparse Data]. The MIT Press.

•   Berlin Chen. Statistical Language Modeling for Speech Recognition. PowerPoint slides, 2003.

•   Smoothing Techniques in Statistical Language Modeling (统计语言建模中的平滑技术). Slides from the Software Lab of the Institute of Computing Technology, Chinese Academy of Sciences (LCC). (Recommended)

•   Liu Ting (刘挺). Language Models [online lecture notes, http://ir.hit.edu.cn/download/NLP_3.pdf]

•   http://dingo.sbs.arizona.edu/~sandiway/ling538/ (contains a large collection of computational linguistics lecture notes)

•   Statistical Methods in Computational Linguistics (contains a large collection of computational linguistics lecture notes).

•   The State of the Art in Language Modeling. PowerPoint slides.

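(Note: I recommend starting with the slides "Smoothing Techniques in Statistical Language Modeling". They follow the paper "An Empirical Study of Smoothing Techniques for Language Modeling" as their main thread and also cover material from many other papers. Reading the slides side by side with that paper saves a lot of time. If a specific point is still unclear, consult the other papers or slides.)

LM software tools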

CMU Statistical Toolkit documentation

posted on 2009-08-18 21:08 by eastcowboy
