Python + NLTK Natural Language Processing: Environment Setup
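First, follow http://nltk.org/install.html to download the required software. Besides NLTK itself you will need Python, NumPy, pandas, and matplotlib. Once everything is installed, start Python, run import nltk and then nltk.download() to fetch the corpora. In the downloader window, select All packages and click Download.
Note that the Download Directory can be changed, but the last level of the path must be nltk_data.
For example, it can be set to D:\nltk_data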
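The same download can be scripted instead of clicking through the downloader window. The following is a minimal sketch, not the only way to do it: `has_nltk_data_name` and `download_package` are hypothetical helper names (not part of NLTK), and it assumes `nltk.download()` is called with a package id and its `download_dir` argument.

```python
import os


def has_nltk_data_name(directory):
    """Return True if the last level of the path is exactly 'nltk_data'."""
    return os.path.basename(os.path.normpath(directory)) == "nltk_data"


def download_package(package, directory):
    """Download one NLTK package into `directory` (needs network access)."""
    if not has_nltk_data_name(directory):
        raise ValueError("last directory level must be 'nltk_data': %r" % (directory,))
    import nltk  # imported here so the path check also works without NLTK installed
    return nltk.download(package, download_dir=directory)
```

On Windows, a call such as `download_package("gutenberg", r"D:\nltk_data")` would fetch only the Gutenberg corpus, which is a much smaller download than All packages.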
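The downloader is slow, and downloads often fail. When that happens there are two alternatives:
1. Download the packages you need directly from http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml
2. Use a copy of the data that other people have packaged and shared online.
Note that manually downloaded packages must be placed inside a folder named nltk_data, otherwise the import fails. For example, when I put the data in a folder named NLTK, the import raised the following error.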
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
from nltk.book import *
File "E:\python2.7.11\lib\site-packages\nltk-3.2.4-py2.7.egg\nltk\book.py", line 20, in <module>
text1 = Text(gutenberg.words('melville-moby_dick.txt'))
File "E:\python2.7.11\lib\site-packages\nltk-3.2.4-py2.7.egg\nltk\corpus\util.py", line 116, in __getattr__
self.__load()
File "E:\python2.7.11\lib\site-packages\nltk-3.2.4-py2.7.egg\nltk\corpus\util.py", line 81, in __load
except LookupError: raise e
LookupError:
**********************************************************************
Resource u'corpora/gutenberg' not found. Please use the NLTK
Downloader to obtain the resource: >>> nltk.download()
Searched in:
- 'C:\\Users\\Administrator/nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
- 'E:\\python2.7.11\\nltk_data'
- 'E:\\python2.7.11\\lib\\nltk_data'
- 'C:\\Users\\Administrator\\AppData\\Roaming\\nltk_data'
NLTK searches the following paths. Because none of them existed (my folder was not named nltk_data), the resource could not be found:
- 'C:\\Users\\Administrator/nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
- 'E:\\python2.7.11\\nltk_data'
- 'E:\\python2.7.11\\lib\\nltk_data'
- 'C:\\Users\\Administrator\\AppData\\Roaming\\nltk_data'
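The lookup behind this error can be pictured as a loop over the search paths. The sketch below is a simplified illustration that uses only the standard library; it is not NLTK's actual code, and `find_resource` is a hypothetical name.

```python
import os


def find_resource(resource, search_paths):
    """Return the first directory in `search_paths` that contains `resource`, or None."""
    for base in search_paths:
        # A resource name such as 'corpora/gutenberg' maps to <base>/corpora/gutenberg
        candidate = os.path.join(base, *resource.split("/"))
        if os.path.exists(candidate):
            return base
    return None
```

With the data stored in a folder named NLTK, no entry in the list matches and the function returns None, which corresponds to the LookupError shown above.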
After renaming the folder to nltk_data so that it matches one of these paths, the import works.
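Renaming or moving the folder is not the only option. As an alternative sketch, NLTK also reads the NLTK_DATA environment variable when it is first imported, so an extra directory can be added to the search list. The directory below is a hypothetical example, not a path from this article.

```python
import os

# Hypothetical directory that holds manually downloaded NLTK data.
custom_dir = os.path.join(os.path.expanduser("~"), "corpora", "nltk_data")

# Keep any value that is already set and put the new directory first.
existing = os.environ.get("NLTK_DATA", "")
entries = [custom_dir] + [p for p in existing.split(os.pathsep) if p]
os.environ["NLTK_DATA"] = os.pathsep.join(entries)

# This has to run before the first `import nltk` in the same process.
```

After a later `import nltk`, printing `nltk.data.path` should show the directory. Calling `nltk.data.path.append(custom_dir)` after the import is another common way to get the same effect.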
On Linux the search paths are as follows, so the NLTK data has to be placed in one of these directories:
Searched in:
- '/root/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/usr/nltk_data'
- '/usr/lib/nltk_data'
I put mine under /usr (that is, /usr/nltk_data from the list above).
Now the import succeeds:
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
Let's run a quick test. The following command finds every occurrence of "monstrous" in text1 and shows it in context:
>>> text1.concordance('monstrous')
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us ,
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
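To make the idea behind this output concrete, here is a minimal pure-Python sketch of a concordance. It is a simplification and not NLTK's implementation: `simple_concordance` is a hypothetical function, and it measures context in tokens, whereas the NLTK output above is cut to a fixed character width. Like the output above, it matches the word regardless of case.

```python
def simple_concordance(tokens, word, width=4):
    """Return each occurrence of `word` with `width` tokens of context on both sides."""
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:  # case-insensitive match
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append("%s [%s] %s" % (left, token, right))
    return lines


# A few tokens taken from one of the matches shown above.
sample = "CHAPTER 55 Of the Monstrous Pictures of Whales . I shall".split()
for line in simple_concordance(sample, "monstrous"):
    print(line)  # CHAPTER 55 Of the [Monstrous] Pictures of Whales .
```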
The environment is now set up, so the real work of learning NLTK can begin.