Alex He

...Always hold on to hope and passion... a date with a stronger future self...

 

Code in NLTK

NLTK includes the following software modules (~120k lines of Python code):

Corpus readers: interfaces to many corpora
Tokenizers: whitespace, newline, blankline, word, treebank, sexpr, regexp, Punkt sentence segmenter
Stemmers: Porter, Lancaster, regexp
Taggers: regexp, n-gram, backoff, Brill, HMM, TnT
Chunkers: regexp, n-gram, named-entity
Parsers: recursive descent, shift-reduce, chart, feature-based, probabilistic, dependency, CCG, ...
Semantic interpretation: untyped lambda calculus, first-order models, DRT, glue semantics, hole semantics, parser interface
WordNet: WordNet interface, lexical relations, similarity, interactive browser
Classifiers: decision tree, maximum entropy, naive Bayes, Weka interface, megam
Clusterers: expectation maximization, agglomerative, k-means
Metrics: accuracy, precision, recall, windowdiff, distance metrics, inter-annotator agreement coefficients, word association measures, rank correlation
Estimation: uniform, maximum likelihood, Lidstone, Laplace, expected likelihood, heldout, cross-validation, Good-Turing, Witten-Bell
Miscellaneous: unification, chatbots, many utilities
NLTK-Contrib (less mature): categorial grammar (Lambek, CCG), finite-state automata, hadoop (MapReduce), kimmo, readability, textual entailment, timex, TnT interface, inter-annotator agreement
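To make the list a little more concrete, here is a minimal sketch that touches four of the modules above: a regexp tokenizer, the Porter stemmer, a distance metric, and the Laplace estimator. It assumes a recent NLTK 3.x install; the sample sentence, the toy counts, and `bins=3` are made up for illustration, and none of these calls should need a corpus download.

```python
from nltk.metrics.distance import edit_distance
from nltk.probability import FreqDist, LaplaceProbDist
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Illustrative input only; not taken from any NLTK corpus.
text = "The parsers were parsing tagged sentences."

# Tokenizers: keep runs of word characters, dropping punctuation.
tokens = RegexpTokenizer(r"\w+").tokenize(text)

# Stemmers: reduce each token to its Porter stem.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

# Metrics: Levenshtein edit distance between two strings.
distance = edit_distance("kitten", "sitting")

# Estimation: Laplace (add-one) smoothing over a frequency distribution.
# bins=3 is an assumed vocabulary size for this toy example.
freq = FreqDist(["a", "a", "b"])
laplace = LaplaceProbDist(freq, bins=3)
unseen_prob = laplace.prob("c")  # (0 + 1) / (3 + 3) for an unseen sample

print(tokens)
print(stems)
print(distance)
print(unseen_prob)
```

The other families (taggers, chunkers, parsers, classifiers) follow a similar import-and-call pattern, though several of them depend on trained models or corpora that have to be downloaded separately.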

Browse the source code: https://github.com/nltk/nltk/tree/master/nltk

Status: NLTK code is tested automatically with Jenkins at http://build.nltk.org/

Posted on 2011-11-01 15:30 by Alex木头
