Text Analysis Notes

Python Text Analysis Notes

Handling Chinese Stop Words

Download a stop word file (stopwords.txt) yourself. The code is as follows:

import jieba


def stopwordslist(filepath):
    # Read one stop word per line; the with block closes the file afterwards
    with open(filepath, 'r', encoding='utf-8') as f:
        stopwords = [line.strip() for line in f]
    return stopwords


# Segment a sentence and drop the stop words
def seg_sentence(sentence):
    sentence_seged = jieba.cut(sentence.strip())
    # Path to the stop word file; a set makes the membership test below fast
    stopwords = set(stopwordslist('/root/stopwords.txt'))
    outstr = ''
    for word in sentence_seged:
        if word not in stopwords and word != '\t':
            outstr += word
            outstr += " "
    return outstr
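
seg_sentence above reloads the stop word file on every call. A minimal sketch of one way around that is shown below: the stop words are loaded once and passed in. The names load_stopwords and remove_stopwords and the sample stop words are hypothetical and not part of the original notes. The tokenizer is a parameter, so str.split stands in here for jieba.cut, and unlike the original this version drops whitespace-only tokens and leaves no trailing space.

```python
from typing import Callable, Iterable, Set


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build a stop word set from lines of text, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


def remove_stopwords(sentence: str,
                     stopwords: Set[str],
                     tokenize: Callable[[str], Iterable[str]]) -> str:
    """Tokenize a sentence and join the tokens that are not stop words."""
    kept = [w for w in tokenize(sentence.strip())
            if w not in stopwords and w.strip()]
    return " ".join(kept)


# Hypothetical stop word lines; in practice they would come from
# open(path, encoding='utf-8'), which can be passed to load_stopwords directly.
stopwords = load_stopwords(["的\n", "了\n", "\n", "是\n"])

# str.split is a stand-in tokenizer for already space-separated text
result = remove_stopwords("我 是 学生 了", stopwords, str.split)
print(result)
```

With jieba installed, passing jieba.cut as the tokenize argument should give the same kind of filtering on unsegmented Chinese text, since jieba.cut accepts a string and yields the segmented words.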

posted @ 2019-08-14 18:54 翡翠嫩白菜
