Code in NLTK
NLTK includes the following software modules (~120k lines of Python code):
- Corpus readers
  - interfaces to many corpora
- Tokenizers
  - whitespace, newline, blankline, word, treebank, sexpr, regexp, Punkt sentence segmenter
- Stemmers
  - Porter, Lancaster, regexp
- Taggers
  - regexp, n-gram, backoff, Brill, HMM, TnT
- Chunkers
  - regexp, n-gram, named-entity
- Parsers
  - recursive descent, shift-reduce, chart, feature-based, probabilistic, dependency, CCG, ...
- Semantic interpretation
  - untyped lambda calculus, first-order models, DRT, glue semantics, hole semantics, parser interface
- WordNet
  - WordNet interface, lexical relations, similarity, interactive browser
- Classifiers
  - decision tree, maximum entropy, naive Bayes, Weka interface, megam
- Clusterers
  - expectation maximization, agglomerative, k-means
- Metrics
  - accuracy, precision, recall, windowdiff, distance metrics, inter-annotator agreement coefficients, word association measures, rank correlation
- Estimation
  - uniform, maximum likelihood, Lidstone, Laplace, expected likelihood, heldout, cross-validation, Good-Turing, Witten-Bell
- Miscellaneous
  - unification, chatbots, many utilities
- NLTK-Contrib (less mature)
  - categorial grammar (Lambek, CCG), finite-state automata, Hadoop (MapReduce), Kimmo, readability, textual entailment, timex, TnT interface, inter-annotator agreement
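The tokenizers and stemmers above can be tried without downloading any corpus data; a minimal sketch (the sample sentence is made up):

```python
from nltk.stem import LancasterStemmer, PorterStemmer
from nltk.tokenize import RegexpTokenizer, WhitespaceTokenizer

text = "The tokenizers split text; the stemmers strip affixes."

# whitespace tokenizer: split on runs of whitespace (keeps punctuation attached)
print(WhitespaceTokenizer().tokenize(text))

# regexp tokenizer: keep only runs of word characters
words = RegexpTokenizer(r"\w+").tokenize(text)

# Porter and Lancaster stem the same words with different aggressiveness
porter, lancaster = PorterStemmer(), LancasterStemmer()
print([(w, porter.stem(w), lancaster.stem(w)) for w in words])
```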
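The regexp, n-gram, and backoff taggers listed above compose: an n-gram tagger can fall back to a rule-based tagger for unseen words. A toy sketch with one hand-made training sentence (the rules and data are illustrative, not part of NLTK):

```python
from nltk.tag import RegexpTagger, UnigramTagger

# rule-based tagger: the first matching pattern wins (illustrative rules)
regexp = RegexpTagger([
    (r".*ing$", "VBG"),  # gerunds
    (r".*ed$", "VBD"),   # simple past
    (r".*", "NN"),       # default: noun
])

# unigram tagger trained on a single hand-made tagged sentence;
# words it has never seen fall through to the regexp backoff
train = [[("the", "DT"), ("dog", "NN"), ("barked", "VBD")]]
tagger = UnigramTagger(train, backoff=regexp)

print(tagger.tag(["the", "dog", "running"]))
```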
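The classifiers all share one interface: train on (feature dict, label) pairs, then classify feature dicts. A naive Bayes sketch on a made-up miniature of the familiar "gender from last letter of a name" example:

```python
from nltk.classify import NaiveBayesClassifier

# toy training data: guess a label from a single feature (made-up counts)
train = [
    ({"last": "a"}, "female"),
    ({"last": "a"}, "female"),
    ({"last": "k"}, "male"),
    ({"last": "o"}, "male"),
]
classifier = NaiveBayesClassifier.train(train)
print(classifier.classify({"last": "a"}))
```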
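The metrics and estimation modules can be exercised on tiny hand-made data; a sketch combining a distance metric, set-based precision/recall, and two of the frequency estimators (the word counts are made up):

```python
from nltk.metrics import edit_distance, precision, recall
from nltk.probability import FreqDist, LidstoneProbDist, MLEProbDist

# distance metric: Levenshtein edit distance
print(edit_distance("kitten", "sitting"))  # 3 edits

# set-based precision and recall against a reference set
reference, test = {"cat", "dog"}, {"dog", "fox"}
print(precision(reference, test), recall(reference, test))

# estimation: unsmoothed vs. smoothed distributions over the same counts
fd = FreqDist(["the", "the", "cat"])
mle = MLEProbDist(fd)            # maximum likelihood: 2/3 for "the"
lid = LidstoneProbDist(fd, 0.5)  # add-0.5 (Lidstone) smoothing
print(mle.prob("the"), lid.prob("the"))
```

Note the smoothed estimator reserves probability mass for unseen words, so `lid.prob` of an unseen word is nonzero while `mle.prob` is zero.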
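The untyped lambda calculus support can be reached through the logic parser; a minimal beta-reduction sketch (the predicate and constant names are made up):

```python
from nltk.sem import Expression

# parse a lambda term applied to a constant, then beta-reduce it
expr = Expression.fromstring(r"(\x.walk(x))(john)")
print(expr.simplify())  # walk(john)
```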
Browse the source code: https://github.com/nltk/nltk/tree/master/nltk
Status: NLTK is automatically tested with Jenkins: http://build.nltk.org/