
Configure the Stanford segmenter for NLTK

>>> from nltk.tokenize.stanford_segmenter import StanfordSegmenter
>>> segmenter = StanfordSegmenter(path_to_jar='stanford-segmenter-3.8.0.jar', path_to_sihan_corpora_dict='./data', path_to_model='./data/pku.gz', path_to_dict='./data/dict-chris6.ser.gz')
>>> sentence = u'这是斯坦福中文分词器测试'
>>> segmenter.segment(sentence)
u'\u8fd9 \u662f \u65af\u5766\u798f \u4e2d\u6587 \u5206\u8bcd\u5668 \u6d4b\u8bd5\n'
>>> segmenter.segment_file('test.simp.utf8')
u'\u9762\u5bf9 \u65b0 \u4e16\u7eaa \uff0c \u4e16\u754c \u5404\u56fd \u4eba\u6c11 \u7684 \u5171\u540c \u613f\u671b \u662f \uff1a \u7ee7\u7eed \u53d1\u5c55 \u4eba\u7c7b \u4ee5\u5f80 \u521b\u9020 \u7684 \u4e00\u5207 \u6587\u660e \u6210\u679c \uff0c \u514b\u670d 20 \u4e16\u7eaa \u56f0\u6270 \u7740 \u4eba\u7c7b \u7684 \u6218\u4e89 \u548c \u8d2b\u56f0 \u95ee\u9898 \uff0c \u63a8\u8fdb \u548c\u5e73 \u4e0e \u53d1\u5c55 \u7684 \u5d07\u9ad8 \u4e8b\u4e1a \uff0c \u521b\u9020 \u4e00\u4e2a \u7f8e\u597d \u7684 \u4e16\u754c \u3002\n'
>>> # Save the segmented text as UTF-8; io.open behaves the same on Python 2 and 3
>>> import io
>>> result = segmenter.segment(sentence)
>>> with io.open('outfile', 'w', encoding='utf-8') as outfile:
...     _ = outfile.write(result)
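
The segmenter returns one space-delimited string per call, with a trailing newline, as the outputs above show. Downstream code usually wants a list of tokens, so here is a minimal sketch of that conversion. The helper name `split_segmented` is hypothetical and is not part of NLTK; the sample string is copied from the `segment()` output above, so the snippet runs without the Stanford jar.

```python
# -*- coding: utf-8 -*-

def split_segmented(segmented):
    """Split space-delimited segmenter output into one token list per line.

    Hypothetical helper (not an NLTK function): it only assumes that tokens
    are separated by whitespace and sentences by newlines.
    """
    return [line.split() for line in segmented.splitlines() if line.strip()]


# Sample taken from the segment() output shown in the session above.
sample = u'\u8fd9 \u662f \u65af\u5766\u798f \u4e2d\u6587 \u5206\u8bcd\u5668 \u6d4b\u8bd5\n'

tokens = split_segmented(sample)
print(len(tokens[0]))  # number of tokens in the first (and only) line
```

The same helper can be applied to the string returned by `segment_file`, which yields one inner list per non-empty line of the input file.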

posted on 2017-07-30 12:41 by 华东博客


