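spaCy: A Natural Language Processing Tool
Introduction to spaCy
spaCy's main features include tokenization, part-of-speech tagging, lemmatization, named entity recognition, and noun phrase extraction, among others.
Installing spaCy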
# pip install spacy
# python -m spacy download en_core_web_sm
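Once both commands have run, a quick check along the following lines can confirm that the library and the model are usable. This is a minimal sketch rather than an official procedure: the helper name load_english_pipeline is hypothetical, and the fallback to a blank English pipeline is an assumption added here so that the check still runs when the model download was skipped.

```python
import spacy


def load_english_pipeline(model_name="en_core_web_sm"):
    """Load a trained pipeline, falling back to a blank English tokenizer.

    Hypothetical helper for illustration; not part of spaCy itself.
    """
    try:
        return spacy.load(model_name)
    except OSError:
        # spacy.load raises OSError when the model package is not installed.
        # A blank pipeline can still tokenize, but it has no tagger,
        # parser, or named entity recognizer.
        return spacy.blank("en")


nlp = load_english_pipeline()
print("spaCy version:", spacy.__version__)
print("pipeline components:", nlp.pipe_names)
print([token.text for token in nlp("spaCy is installed.")])
```

If the list of pipeline components comes back empty, the fallback was used and the download step probably needs to be repeated.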
Basic spaCy Operations
GitHub: SpacyDemo.ipynb
#!/usr/bin/env python
# coding: utf-8
# get_ipython().system(u'pip install spacy')
# pip install spaCy -i https://pypi.tuna.tsinghua.edu.cn/simple
# get_ipython().system(u'python -m spacy download en_core_web_sm')
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp(u'This is a sentence.')
# #### 1. Tokenization
for token in doc:
    print(token)
# #### 2. Lemmatization
# lemma_ is the lemma as a string; lemma is its integer ID
for token in doc:
    print(token, token.lemma_, token.lemma)
# #### 3. Part-of-Speech (POS) Tagging
# pos_ is the tag as a string; pos is its integer ID
for token in doc:
    print(token, token.pos_, token.pos)
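# #### 4. Named Entity Recognition (NER)
# The sample sentence has no obvious named entities, so this loop will likely print nothing
for entity in doc.ents:
    print(entity, entity.label_, entity.label)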
# #### 5. Noun Phrase Extraction
for nounc in doc.noun_chunks:
    print(nounc)