Python Natural Language Processing Study Notes (58): Further Reading
6.9 Further Reading
Please consult http://www.nltk.org/ for further materials on this chapter and on how to install external machine learning packages, such as Weka, Mallet, TADM, and MEGAM. For more examples of classification and machine learning with NLTK, please see the classification HOWTOs at http://www.nltk.org/howto.
For a general introduction to machine learning, we recommend (Alpaydin, 2004). For a more mathematically rigorous introduction to the theory of machine learning, see (Hastie, Tibshirani, & Friedman, 2009). Excellent books on using machine learning techniques for NLP include (Abney, 2008), (Daelemans & Bosch, 2005), (Feldman & Sanger, 2007), (Segaran, 2007), and (Weiss et al., 2004). For more on smoothing techniques for language problems, see (Manning & Schütze, 1999). For more on sequence modeling, and especially hidden Markov models, see (Manning & Schütze, 1999) or (Jurafsky & Martin, 2008). Chapter 13 of (Manning, Raghavan, & Schütze, 2008) discusses the use of naive Bayes for classifying texts.
Many of the machine learning algorithms discussed in this chapter are numerically intensive, and as a result they can run slowly when coded naively in Python. For information on improving the efficiency of numerically intensive algorithms in Python, see (Kiusalaas, 2005).
The classification techniques described in this chapter can be applied to a very wide variety of problems. For example, (Agirre & Edmonds, 2007) uses classifiers to perform word-sense disambiguation, and (Melamed, 2001) uses classifiers to create parallel texts. Recent textbooks that cover text classification include (Manning, Raghavan, & Schütze, 2008) and (Croft, Metzler, & Strohman, 2009).
Much of the current research in applying machine learning techniques to NLP problems is driven by government-sponsored "challenges," in which several research organizations are each given the same development corpus and asked to build a system; the resulting systems are then compared on a reserved test set. Examples of these challenge competitions include the CoNLL Shared Tasks, the ACE competitions, the Recognizing Textual Entailment competitions, and the AQUAINT competitions. Consult http://www.nltk.org/ for a list of links to the web pages for these challenges.