[python] Extracting the Chinese text from a web page and segmenting it into words
Summary:
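# -*- coding: utf-8 -*-

import urllib2
import re
import time
import jieba

url = "http://www.baidu.com"
# Download the page and decode the raw bytes as UTF-8 (Python 2).
html = urllib2.urlopen(url).read()
html = unicode(html, 'utf-8')
# Collect every run of Chinese characters in the range U+4E00..U+9FA5.
word = re.findall(ur"[\u4e00-\u9fa5]+", html)

# Concatenate the runs into a single string.
s = ""
for w in word:
    s += w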
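The summary stops before jieba is actually called, so the segmentation step is not shown. A minimal Python 3 sketch of the same idea might look like the following. The function names fetch_html, extract_chinese and segment are hypothetical, and passing the extracted string to jieba.cut is an assumption about how the rest of the post continues, not something the excerpt confirms.

```python
import re
import urllib.request

# Same character range as the pattern in the post: U+4E00..U+9FA5.
CHINESE_RUN = re.compile(r"[\u4e00-\u9fa5]+")


def fetch_html(url, encoding="utf-8"):
    """Download a page and decode it (assumes the page is UTF-8 encoded)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding)


def extract_chinese(html):
    """Join every run of Chinese characters found in the text."""
    return "".join(CHINESE_RUN.findall(html))


def segment(text):
    """Split Chinese text into a list of words with jieba.

    jieba is a third-party package; it is imported here so that the
    extraction helpers above work without it installed.
    """
    import jieba
    return list(jieba.cut(text))


# Offline check of the extraction step on a literal string.
sample = "<p>你好, world! 中文分词</p>"
chinese_only = extract_chinese(sample)  # "你好中文分词"
```

Against a live page the three helpers would be chained as segment(extract_chinese(fetch_html("http://www.baidu.com"))). Note that joining the runs into one string, as the post does, drops the boundaries between separate pieces of text, so the segmenter may merge words across them.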
posted @ 2014-01-15 17:25 colipso