Introduction to WordNet
Interface
from nltk.corpus import wordnet as wn  # use NLTK's WordNet interface
# Look up every synset of a word. A synset name has the form lemma.POS.number and denotes one sense.
# Note the difference: wn.synsets() returns a list, while wn.synset() returns a single Synset object.
wn.synsets('dog')
>> [Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'),
Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
# Look up by POS and offset; returns one synset (public form of the private _synset_from_pos_and_offset in recent NLTK releases)
wn.synset_from_pos_and_offset('n', 4543158)
>> Synset('wagon.n.01')
wn.synset('dog.n.01').lemma_names()  # all lemma names of one synset
>> ['dog', 'domestic_dog', 'Canis_familiaris']
Relationship between lemmas and synsets:
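The relationship runs in both directions: one synset groups several lemmas (synonyms), and one lemma name can appear in several synsets (senses). The following is a minimal sketch of both mappings. The helper names `lemma_synset_maps` and `demo` are hypothetical and not part of NLTK; only `wn.synsets()`, `Synset.name()` and `Synset.lemma_names()` are NLTK calls.

```python
from collections import defaultdict


def lemma_synset_maps(synsets):
    """Build both directions of the lemma <-> synset relationship.

    `synsets` is any iterable of objects exposing name() and lemma_names(),
    for example the list returned by wn.synsets(word).
    (Hypothetical helper, not an NLTK function.)
    """
    synset_to_lemmas = {}
    lemma_to_synsets = defaultdict(list)
    for synset in synsets:
        names = list(synset.lemma_names())
        synset_to_lemmas[synset.name()] = names
        for lemma_name in names:
            lemma_to_synsets[lemma_name].append(synset.name())
    return synset_to_lemmas, dict(lemma_to_synsets)


def demo(word="dog"):
    """Print both maps for one word; needs NLTK and the WordNet corpus."""
    from nltk.corpus import wordnet as wn  # imported lazily on purpose

    synset_to_lemmas, lemma_to_synsets = lemma_synset_maps(wn.synsets(word))
    for synset_name, lemma_names in synset_to_lemmas.items():
        print(synset_name, "->", lemma_names)
    print(word, "->", lemma_to_synsets.get(word, []))
```

Calling `demo('dog')` after downloading the corpus (`nltk.download('wordnet')`) prints one line per sense of the word, followed by the list of synsets that contain the lemma itself.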
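Number of synsets:
total: 117659
noun: 82115
verb: 13767
adjective: 18156 (including satellite adjectives)
adverb: 3621 (the remainder: 117659 - 82115 - 13767 - 18156)
(POS constants: ADJ, ADJ_SAT, ADV, NOUN, VERB = 'a', 's', 'r', 'n', 'v')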
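Counts like these can be reproduced by iterating over all synsets and tallying their POS tags. The sketch below assumes that satellite adjectives (tag 's') should be folded into the adjective count (tag 'a'), which is how the adjective figure above is usually reported. `count_by_pos` and `demo` are hypothetical helper names; `wn.all_synsets()` and `Synset.pos()` are the NLTK calls used.

```python
from collections import Counter


def count_by_pos(synsets, merge_satellites=True):
    """Tally synsets by POS tag ('n', 'v', 'a', 's', 'r').

    `synsets` is any iterable of objects exposing pos().
    With merge_satellites=True, the 's' count is added to 'a'.
    (Hypothetical helper, not an NLTK function.)
    """
    counts = Counter(synset.pos() for synset in synsets)
    if merge_satellites:
        satellites = counts.pop("s", 0)
        if satellites:
            counts["a"] += satellites
    return dict(counts)


def demo():
    """Print per-POS and total counts; needs NLTK and the WordNet corpus."""
    from nltk.corpus import wordnet as wn  # imported lazily on purpose

    counts = count_by_pos(wn.all_synsets())
    for pos, n in sorted(counts.items()):
        print(pos, n)
    print("total", sum(counts.values()))
```

The exact numbers depend on the WordNet version installed with NLTK, so they may differ from the figures listed above.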
Relations between synsets (pair counts were measured over noun, verb, and adjective synsets):
- hypernyms, instance_hypernyms: 89089 pairs
- hyponyms, instance_hyponyms (hyponyms is the inverse of hypernyms)
- member_holonyms, substance_holonyms, part_holonyms: 12293, 797, and 9097 pairs respectively
- member_meronyms, substance_meronyms, part_meronyms (meronyms is the inverse of holonyms)
- attributes: 1278 pairs
- entailments: 408 pairs
- causes: 220 pairs
- also_sees
- verb_groups
- similar_tos
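Each relation in the list above is a method on a Synset object that returns a list of related synsets, so pair counts can be obtained by summing the lengths of those lists. The notes do not say how the original counts were computed, so the sketch below is only one plausible way: it assumes each (source synset, target synset) pair is counted once, in the direction of the relation. `count_relation_pairs` and `demo` are hypothetical helper names.

```python
def count_relation_pairs(synsets, relation_names):
    """Sum the number of (source, target) pairs over the given relations.

    `synsets` is an iterable of objects whose relation methods
    (e.g. hypernyms(), part_holonyms()) return lists of targets.
    (Hypothetical helper, not an NLTK function.)
    """
    total = 0
    for synset in synsets:
        for relation in relation_names:
            total += len(getattr(synset, relation)())
    return total


def demo():
    """Print pair counts for a few relations; needs NLTK and the WordNet corpus."""
    from nltk.corpus import wordnet as wn  # imported lazily on purpose

    # Materialize noun, verb, and adjective synsets once, because
    # wn.all_synsets() returns a one-shot iterator.
    synsets = [s for pos in ("n", "v", "a") for s in wn.all_synsets(pos)]
    groups = [
        ("hypernyms", "instance_hypernyms"),
        ("member_holonyms",),
        ("substance_holonyms",),
        ("part_holonyms",),
        ("attributes",),
        ("entailments",),
        ("causes",),
    ]
    for group in groups:
        print(", ".join(group), count_relation_pairs(synsets, group))
```

Because inverse relations mirror each other, counting hyponyms instead of hypernyms (or meronyms instead of holonyms) should give the same pair totals.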
References:
http://www.nltk.org/howto/wordnet.html
http://www.nltk.org/index.html
http://www.nltk.org/_modules/nltk/corpus/reader/wordnet.html (source code of the WordNet reader in NLTK)