Installing spaCy: A Step-by-Step Guide

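1. Step 1: Install spaCy

pip install spacy -i https://pypi.tuna.tsinghua.edu.cn/simple   # using the Tsinghua mirror

pip install spacy -i https://pypi.douban.com/simple   # or using the Douban mirror

Note the version of spaCy that gets installed; you will need it in Step 2.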
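If you did not catch the version in pip's output, it can also be read from Python. The following is a minimal sketch using the standard library's importlib.metadata; the helper name installed_version is my own and is not part of spaCy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed spaCy version string, or None if spaCy is not installed.
print(installed_version("spacy"))
```

Returning None instead of raising keeps the check usable in a notebook cell even before spaCy is installed.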

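2. Step 2: Download the matching spaCy model

GitHub download page: https://github.com/explosion/spacy-models/releases

On that page, browse the list of releases and download the model package that supports your installed spaCy version.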
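As a rough guide to what "matching" means: the spaCy documentation describes model versions as a.b.c, where a and b follow spaCy's major and minor version and c is the model's own revision. The sketch below encodes only that rule of thumb; the function name matches_spacy is hypothetical, and spaCy's own `python -m spacy validate` command remains the authoritative check.

```python
def matches_spacy(model_version, spacy_version):
    """Return True when model and spaCy share the same major.minor version."""
    return model_version.split(".")[:2] == spacy_version.split(".")[:2]


print(matches_spacy("3.7.1", "3.7.2"))  # True: both are 3.7
print(matches_spacy("3.7.1", "3.6.0"))  # False: 3.7 vs 3.6
```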

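Here, lg is short for "large". According to the official spaCy documentation, a model name starts with a language code (for example, en for English) and ends with a size suffix:

sm (as in en_core_web_sm-3.7.1): the small model;

md: the medium model;

lg: the large model;

trf: a transformer-based model.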
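To make the naming scheme concrete, here is a small sketch that splits a package name into its parts. It assumes the four-part lang_type_genre_size layout seen in names such as en_core_web_sm; parse_model_name and the SIZES table are my own illustration, not a spaCy API.

```python
# Size suffixes as described above.
SIZES = {"sm": "small", "md": "medium", "lg": "large", "trf": "transformer"}


def parse_model_name(package):
    """Split a name like 'en_core_web_sm-3.7.1' into its components.

    Raises ValueError if the name does not have exactly four '_'-separated parts.
    """
    name, _, version = package.partition("-")
    lang, model_type, genre, size = name.split("_")
    return {
        "lang": lang,
        "type": model_type,
        "genre": genre,
        "size": SIZES.get(size, size),
        "version": version,
    }


print(parse_model_name("en_core_web_sm-3.7.1"))
```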

[Recommendation: use the larger model, en_core_web_lg]

Reason: as the official spaCy documentation explains:

The words “dog”, “cat” and “banana” are all pretty common in English, so they’re part of the pipeline’s vocabulary, and come with a vector. The word “afskfsd” on the other hand is a lot less common and out-of-vocabulary – so its vector representation consists of 300 dimensions of 0, which means it’s practically nonexistent. If your application will benefit from a large vocabulary with more vectors, you should consider using one of the larger pipeline packages or loading in a full vector package, for example, en_core_web_lg, which includes 685k unique vectors.

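In plain terms: en_core_web_lg ships with a large number of unique word vectors, so similarity scores computed between tokens, docs, and so on are more meaningful.

en_core_web_sm, by contrast, does not include such word vectors, so its similarity results are unreliable and can be far off.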
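To see why a missing vector hurts, recall that spaCy's default similarity is the cosine similarity between vectors. The sketch below is plain Python, not spaCy code: the three-dimensional vectors are made up for illustration (real en_core_web_lg vectors have 300 dimensions), and returning 0.0 for an all-zero vector is a choice made for this sketch.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector carries no direction to compare
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


# Made-up toy vectors, for illustration only.
dog = [0.8, 0.1, 0.3]
cat = [0.7, 0.2, 0.4]
unknown = [0.0, 0.0, 0.0]  # stands in for an out-of-vocabulary word

print(cosine_similarity(dog, cat))      # close to 1: the vectors point the same way
print(cosine_similarity(dog, unknown))  # 0.0: nothing to compare against
```

A word without a vector contributes nothing to the comparison, which is why a package with more vectors gives more trustworthy scores.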

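You can then run the following at the command line (Anaconda Prompt, opened as administrator):

python -m spacy download en_core_web_lg   # remote download, relatively slow

Instead, I recommend downloading the .whl file from the GitHub releases page directly to a local directory (I put it in the directory where my Python interpreter is installed, here E:\Anaconda\installation). In my testing, this download was faster.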
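If you would rather build the download link than click through the release list, the sketch below assembles it from the model name and version. The URL layout is an assumption based on how the spacy-models releases are laid out on GitHub, and both function names are hypothetical; confirm the link on the releases page before relying on it.

```python
def wheel_filename(model, version):
    """File name of the wheel, e.g. en_core_web_lg-3.7.1-py3-none-any.whl."""
    return f"{model}-{version}-py3-none-any.whl"


def release_url(model, version):
    """Assumed direct download URL for a model wheel on GitHub."""
    base = "https://github.com/explosion/spacy-models/releases/download"
    return f"{base}/{model}-{version}/{wheel_filename(model, version)}"


print(release_url("en_core_web_lg", "3.7.1"))
```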

3. Step 3: Install the downloaded spaCy model locally

Go to the directory that holds the downloaded file: E:\Anaconda\installation

Open a cmd prompt there and run

pip install en_core_web_lg-3.7.1-py3-none-any.whl

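4. Step 4: Verify

If pip prints "Successfully installed en-core-web-lg-3.7.1", the whole spaCy installation process has completed.

Finally, run the following in a local Jupyter notebook to test it:

import spacy

nlp = spacy.load("en_core_web_lg")

If both lines run without errors, spaCy is installed correctly and the model can be loaded.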
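For a slightly more informative check, the sketch below reports which piece is missing instead of failing with a traceback. It relies on the fact that spaCy model packages are installed as importable Python packages; the function name model_status and its return strings are my own.

```python
import importlib.util


def model_status(name):
    """Report whether spaCy and the named model package can be found and loaded."""
    if importlib.util.find_spec("spacy") is None:
        return "spacy-missing"
    if importlib.util.find_spec(name) is None:
        return "model-missing"
    import spacy  # imported lazily so the check also works without spaCy

    nlp = spacy.load(name)
    # nlp.meta holds the package metadata, including language, name and version.
    return f"ok: {nlp.meta['lang']}_{nlp.meta['name']} {nlp.meta['version']}"


print(model_status("en_core_web_lg"))
```

A "model-missing" result means Step 3 did not install into the environment that the notebook is actually using.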

Writing this up took some effort. If it helped you, please give it a like so that more people can avoid the same detours. Thanks!

References:

[1] https://github.com/explosion/spacy-models/releases

[2] https://spacy.io/models/en#en_core_web_lg

[3] https://spacy.io/usage/spacy-101#annotations

Posted 2023-12-02 15:04 by AlphaGeek
