nltk_data Installation Summary

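Natural Language Processing with Python installs the corpus data used in the book via nltk.download(), which does not work very well. First, some readers report that the download is extremely slow. Second, the download links given in the book, on the NLTK site (http://nltk.org/nltk_data/), and in other users' blog posts (http://www.cnblogs.com/ToDoToTry/archive/2013/01/18/2865941.html) are all long out of date. After experimenting with the old links I found the NLTK Corpora pages, which should cover more datasets than the book uses. They are listed below for reference: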

1) NLTK Corpora: http://www.nltk.org/nltk_data/. The corpora are not bundled together; download only the ones you need.

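2) GitHub: https://github.com/nltk/nltk_data/tree/gh-pages. Everything comes as a single zip of a little over 340 MB. Extract the corpora folder and put it under c:\nltk_data, the default nltk_data location, and everything works. To use a different location, change the environment variable (NLTK_DATA); see the official page http://www.nltk.org/data.html for details.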
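The manual steps in option 2 can also be scripted. The following is only a minimal sketch using the standard library: the helper names install_corpora and use_data_dir are made up for illustration, and it assumes the downloaded zip contains a folder named corpora somewhere inside it (the exact archive layout may differ). NLTK reads the NLTK_DATA environment variable when it is imported, so use_data_dir has to be called before import nltk.

```python
import os
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_corpora(archive_path, data_dir):
    """Extract the zip and copy its 'corpora' folder to data_dir/corpora."""
    data_dir = Path(data_dir)
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(tmp)
        # Assumption: the archive holds a directory called 'corpora'.
        matches = [p for p in Path(tmp).rglob("corpora") if p.is_dir()]
        if not matches:
            raise FileNotFoundError("no 'corpora' folder found in archive")
        # Prefer the outermost match if several directories share the name.
        source = min(matches, key=lambda p: len(p.parts))
        target = data_dir / "corpora"
        shutil.copytree(source, target, dirs_exist_ok=True)  # Python 3.8+
    return target


def use_data_dir(data_dir):
    """Point NLTK at data_dir; call this before `import nltk`."""
    os.environ["NLTK_DATA"] = str(data_dir)


# Hypothetical usage (both paths are placeholders):
#   install_corpora(r"C:\Downloads\nltk_data.zip", r"D:\nltk_data")
#   use_data_dir(r"D:\nltk_data")
```

Setting the variable from Python only affects the current process. For a permanent change, set NLTK_DATA in the system environment variables as the official page describes.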

Following the official page http://www.nltk.org/data.html and the examples in the book, I ran one test from each to confirm that nltk_data is usable:

>>> from nltk.corpus import brown
>>> brown.words()
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
>>>
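The same check can be done without an interactive session. The sketch below assumes NLTK is installed; corpus_available is a made-up helper name, while nltk.data.find is the lookup function NLTK itself uses and raises LookupError when a resource is not on the search path.

```python
def corpus_available(name):
    """Return True if NLTK can locate corpora/<name> on its search path."""
    # Imported inside the function so the helper can be defined without NLTK.
    import nltk.data

    try:
        nltk.data.find("corpora/" + name)
    except LookupError:
        return False
    return True


# Hypothetical usage:
#   for name in ("brown", "gutenberg"):
#       print(name, "found" if corpus_available(name) else "missing")
```

If a corpus is reported missing even though the folder was copied, printing nltk.data.path shows which directories NLTK is actually searching.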

Posted 2014-06-04 17:16 by xiao dan feng
