1. Functions Defined for NLTK's Frequency Distributions

 

Example                       Description
fdist = FreqDist(samples)     create a frequency distribution containing the given samples
fdist[sample] += 1            increment the count for this sample
fdist['monstrous']            number of times the given sample occurred
fdist.freq('monstrous')       relative frequency of the given sample (its count divided by fdist.N())
fdist.N()                     total number of sample outcomes counted
fdist.most_common(n)          the n most common samples and their counts
for sample in fdist:          iterate over the samples
fdist.max()                   sample with the greatest count
fdist.tabulate()              tabulate the frequency distribution
fdist.plot()                  graphical plot of the frequency distribution
fdist.plot(cumulative=True)   cumulative plot of the frequency distribution
fdist1 |= fdist2              update fdist1 with counts from fdist2, keeping the larger count for each sample
fdist1 < fdist2               test if no sample occurs more often in fdist1 than in fdist2 (and the two differ)
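
The operations in the table above can be tried on a toy list of words. This is only a minimal sketch: the sample sentence and the variable names are made up for illustration, and it assumes NLTK 3, where FreqDist is a subclass of collections.Counter. The plotting calls are left out because they need matplotlib.

```python
from nltk import FreqDist

# Hypothetical toy data: 8 word tokens, "the" occurs 3 times.
samples = "the cat sat on the mat the end".split()

fdist = FreqDist(samples)        # build the distribution
fdist["cat"] += 1                # increment one count by hand

count_the = fdist["the"]         # 3
total = fdist.N()                # 9 (8 tokens + the manual increment)
freq_the = fdist.freq("the")     # 3 / 9
top_two = fdist.most_common(2)   # [('the', 3), ('cat', 2)]
most_frequent = fdist.max()      # 'the'

# Comparison: every count in `small` is <= the matching count in fdist,
# and the two distributions are not equal.
small = FreqDist(["the"])
is_smaller = small < fdist

# In-place union: for each sample the larger of the two counts is kept.
other = FreqDist(["the"] * 5 + ["dog"])
fdist |= other

for sample in fdist:             # iterate over the samples
    print(sample, fdist[sample])
```

If the counts should be added together rather than merged by maximum, `fdist.update(other)` is the usual Counter-style alternative.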

 

2. Some Word Comparison Operators

 

 

Function          Meaning
s.startswith(t)   test if s starts with t
s.endswith(t)     test if s ends with t
t in s            test if t is a substring of s
s.islower()       test if s contains cased characters and all are lowercase
s.isupper()       test if s contains cased characters and all are uppercase
s.isalpha()       test if s is non-empty and all characters in s are alphabetic
s.isalnum()       test if s is non-empty and all characters in s are alphanumeric
s.isdigit()       test if s is non-empty and all characters in s are digits
s.istitle()       test if s contains cased characters and is titlecased (i.e. all words in s have initial capitals)
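
These tests are ordinary Python string methods, so they can be checked without NLTK. The sketch below uses a made-up word list (the names `words` and `checks` are only for illustration) and filters it with a list comprehension, which is how such tests are typically applied to the words of a text.

```python
# Hypothetical word list chosen so that each test has a clear outcome.
words = ["Monstrous", "whale", "NLTK", "2019", "Moby Dick", "co-operate", "abc123"]

checks = {
    "startswith": "monstrous".startswith("mon"),  # True
    "endswith": "monstrous".endswith("ous"),      # True
    "in": "str" in "monstrous",                   # True
    "islower": "whale".islower(),                 # True
    "islower_digits": "2019".islower(),           # False: no cased characters
    "isupper": "NLTK".isupper(),                  # True
    "isalpha": "co-operate".isalpha(),            # False: '-' is not alphabetic
    "isalnum": "abc123".isalnum(),                # True
    "isdigit": "2019".isdigit(),                  # True
    "istitle": "Moby Dick".istitle(),             # True
}

# Typical use: select words from a list by one of the tests.
title_words = [w for w in words if w.istitle()]   # ['Monstrous', 'Moby Dick']
alpha_words = [w for w in words if w.isalpha()]   # ['Monstrous', 'whale', 'NLTK']

print(checks)
print(title_words)
print(alpha_words)
```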

 

3. Basic Corpus Functionality Defined in NLTK

Example                      Description
fileids()                    the files of the corpus
fileids([categories])        the files of the corpus corresponding to these categories
categories()                 the categories of the corpus
categories([fileids])        the categories of the corpus corresponding to these files
raw()                        the raw content of the corpus
raw(fileids=[f1,f2,f3])      the raw content of the specified files
raw(categories=[c1,c2])      the raw content of the specified categories
words()                      the words of the whole corpus
words(fileids=[f1,f2,f3])    the words of the specified fileids
words(categories=[c1,c2])    the words of the specified categories
sents()                      the sentences of the whole corpus
sents(fileids=[f1,f2,f3])    the sentences of the specified fileids
sents(categories=[c1,c2])    the sentences of the specified categories
abspath(fileid)              the location of the given file on disk
encoding(fileid)             the encoding of the file (if known)
open(fileid)                 open a stream for reading the given corpus file
root                         the path to the root of the locally installed corpus
readme()                     the contents of the README file of the corpus
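
To try these methods without downloading any corpus data, one option is to build a tiny categorized corpus in a temporary directory. The sketch below is an illustration under several assumptions: the two files, their contents and the category-per-subdirectory layout are invented, and CategorizedPlaintextCorpusReader is just one reader that supports categories. sents() is skipped because it needs the Punkt sentence tokenizer data to be installed.

```python
import os
import tempfile

from nltk.corpus.reader import CategorizedPlaintextCorpusReader

# Hypothetical two-file corpus; the sub-directory name is used as the category.
texts = {
    "news/a.txt": "The market rose today.",
    "fiction/b.txt": "The dragon slept.",
}

with tempfile.TemporaryDirectory() as root:
    # Write the files to disk under the temporary root.
    for fileid, text in texts.items():
        path = os.path.join(root, *fileid.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf8") as f:
            f.write(text)

    # cat_pattern extracts the category from each fileid (here: the folder name).
    corpus = CategorizedPlaintextCorpusReader(
        root, r".*\.txt", cat_pattern=r"(\w+)/.*"
    )

    all_files = corpus.fileids()
    all_categories = corpus.categories()
    news_files = corpus.fileids(categories=["news"])
    b_categories = corpus.categories(fileids=["fiction/b.txt"])
    news_raw = corpus.raw(categories=["news"])
    fiction_words = list(corpus.words(fileids=["fiction/b.txt"]))
    location = str(corpus.abspath("news/a.txt"))

print(all_files)
print(all_categories)
print(news_files, b_categories)
print(news_raw)
print(fiction_words)
print(location)
```

The same calls work on the corpora shipped with NLTK (for example `nltk.corpus.brown`) once the data has been downloaded.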

4. NLTK's Conditional Frequency Distributions

Example                                                Description
cfdist = ConditionalFreqDist(pairs)                    create a conditional frequency distribution from a list of (condition, sample) pairs
cfdist.conditions()                                    the conditions
cfdist[condition]                                      the frequency distribution for this condition
cfdist[condition][sample]                              count of the given sample for this condition
cfdist.tabulate()                                      tabulate the conditional frequency distribution
cfdist.tabulate(samples=[s1,s2], conditions=[c1,c2])   tabulation limited to the specified samples and conditions
cfdist.plot()                                          graphical plot of the conditional frequency distribution
cfdist.plot(samples=[s1,s2], conditions=[c1,c2])       graphical plot limited to the specified samples and conditions
cfdist1 < cfdist2                                      test if no sample occurs more often in cfdist1 than in cfdist2 (and the two differ)
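
A conditional frequency distribution is built from (condition, sample) pairs and keeps one FreqDist per condition. The sketch below uses invented pairs, with a genre as the condition and a word as the sample. The plot calls are omitted because they need matplotlib.

```python
from nltk import ConditionalFreqDist

# Hypothetical (condition, sample) pairs: condition = genre, sample = word.
pairs = [
    ("news", "the"),
    ("news", "market"),
    ("news", "the"),
    ("fiction", "the"),
    ("fiction", "dragon"),
]

cfdist = ConditionalFreqDist(pairs)

conditions = sorted(cfdist.conditions())        # ['fiction', 'news']
news_dist = cfdist["news"]                      # a FreqDist for one condition
news_the = cfdist["news"]["the"]                # 2
fiction_market = cfdist["fiction"]["market"]    # 0: never seen under 'fiction'

# Print a table restricted to chosen conditions and samples
# (both are passed as keyword arguments).
cfdist.tabulate(conditions=["news", "fiction"], samples=["the", "dragon"])
```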
posted on 2019-04-26 15:55 by 不同的日子丶看不同的云