Summary:
import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud
from scipy.misc import imread  # removed in SciPy 1.2+; imageio.imread is a common replacement

with open('lagou.txt', encoding='utf-8') as f:
    tmp_line = f.read()
jieba_cut=jieba.cut...
Summary:
import jieba
import sys
import jieba.analyse
from optparse import OptionParser

textrank = jieba.analyse.textrank
with open('lagoujobdatails.txt', encoding='utf-8') as f:
    tmp_line = f.read()
jie...
Summary:
Among these, "爬虫" (crawler), "spanclass", and "岗位职责" (job responsibilities) are junk data that the data cleaning step did not fully remove.
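One way to drop leftover tokens like these is to filter the segmented words against a small junk set before counting them. The following is a minimal sketch, not the author's code: `JUNK_TOKENS`, `filter_tokens`, and the sample token list are hypothetical names and data chosen for illustration, and the sample list stands in for real `jieba.cut(...)` output.

```python
from collections import Counter

# Junk tokens named in the note above; extend this set as more residue turns up.
JUNK_TOKENS = {"爬虫", "spanclass", "岗位职责"}


def filter_tokens(tokens, junk=JUNK_TOKENS):
    """Drop junk tokens and whitespace-only tokens from a token sequence."""
    return [t for t in tokens if t.strip() and t not in junk]


# Hypothetical segmented output, standing in for the result of jieba.cut(...)
sample_tokens = ["Python", "爬虫", "数据", "spanclass", "Python", "岗位职责", " ", "分析"]

clean_tokens = filter_tokens(sample_tokens)
counts = Counter(clean_tokens)
print(counts.most_common(2))
```

Filtering after segmentation keeps the cleaning rule in one place, so the same set can be reused for keyword extraction and for the word cloud input.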
Summary:
from nltk.corpus import PlaintextCorpusReader
import nltk

corpus_root = r"C:\Users\sun\AppData\Roaming\nltk_data\corpora\jieba"
file_pattern = r".*/.*\.txt"
ptb=PlaintextCorpusReader(corpus_root,file_pa...
Summary:
import jieba

with open('lagoujobdatails.txt', encoding='utf-8') as f:
    tmp_line = f.read()
jieba_cut = jieba.cut(tmp_line)
ans = ' '.join(jieba_cut)
with open('jieba5.txt','w',enc...
Summary:
words2=re.sub("[\s+\.\!\/_,$%^*(+\"\'\n]+|[+——;!,”。《》,。:“?、~@#¥%……&*()1234567①②③④)]+", "", words)
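The pattern above removes runs of whitespace, ASCII punctuation, and full-width punctuation by replacing them with an empty string. Note that inside a character class `+` is a literal plus sign, not a quantifier, and that the original class also strips the digits 1 to 7. Below is a simplified sketch of the same idea with a compiled raw-string pattern; the character set is a reduced, assumed subset (it keeps digits), and `PUNCT_RE`, `strip_punct`, and the sample sentence are hypothetical.

```python
import re

# Reduced character class: whitespace, common ASCII punctuation,
# and a handful of full-width marks. Digits are deliberately kept.
PUNCT_RE = re.compile(r"""[\s.!/_,$%^*()+"'~@#&;:?,。!?:;、《》“”()¥…]+""")


def strip_punct(text):
    """Remove every run of characters matched by PUNCT_RE."""
    return PUNCT_RE.sub("", text)


# Hypothetical input sentence for illustration.
words = "岗位职责:熟悉 Python,了解 jieba!"
words2 = strip_punct(words)
print(words2)
```

Compiling the pattern once and using a raw string avoids repeated escaping and makes it easier to see which characters are actually in the class.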
Summary:
A script file must not share its name with a module it imports (for example, a file named re.py that runs import re).