Python Word Frequency Count

Counting word frequencies in the book Walden

A lambda expression has the form y = lambda x: x + 1, where x is the input parameter and x + 1 is the return value.

e.g. g = lambda x: x ** 2

g(2) returns 4
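As a small illustration of the form above, here is a sketch that puts the lambda next to an equivalent def and then uses a lambda as a sort key, which is the role it plays in the word count script. The names square, square_def, and pairs are made up for this example.

```python
# A lambda is an anonymous function limited to a single expression.
square = lambda x: x ** 2

# The same function written with def, for comparison.
def square_def(x):
    return x ** 2

# Hypothetical (word, count) pairs, sorted by the count in position 1.
pairs = [("pond", 3), ("the", 10), ("woods", 5)]
by_count = sorted(pairs, key=lambda x: x[1], reverse=True)

print(square(2))
print(by_count)
```

The key function receives each pair and returns the value to sort on, so x[1] picks the count rather than the word.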

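import re  # regular expressions; re.sub is used below

# Open the file in a with block so it is closed automatically.
with open('E:/Python培训/python_01/Walden.txt', 'r') as f:
    txt = f.read()  # read the whole text into txt

txt = txt.lower()  # lowercase so "The" and "the" count as one word
txt1 = re.sub('[().,?*\']', '', txt)  # sub = substitute: replace these symbols with '' (nothing)
words = txt1.split()  # split on whitespace into a list of words
words_index = set(words)  # a set holds each distinct word once
dic = {word: words.count(word) for word in words_index}  # word -> number of occurrences
res = sorted(dic.items(), key=lambda x: x[1], reverse=True)  # sort by count, highest first
print(res)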
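Calling words.count(word) once per distinct word rescans the whole list each time, which can get slow on a full-length book. One possible alternative, sketched here on the assumption that the same cleaning steps are wanted, is collections.Counter from the standard library, which tallies all words in a single pass. The function name count_words and the sample sentence are hypothetical, chosen so the example runs without the Walden file.

```python
import re
from collections import Counter

def count_words(text):
    """Return (word, count) pairs, most frequent first."""
    text = text.lower()
    # Same punctuation set as the script above, removed before splitting.
    cleaned = re.sub(r"[().,?*']", "", text)
    words = cleaned.split()
    # Counter tallies every word in one pass; most_common() sorts by count.
    return Counter(words).most_common()

# Hypothetical sample text standing in for the contents of Walden.txt.
sample = "I went to the woods. The woods were quiet, the pond was still."
res = count_words(sample)
print(res[:3])
```

To run it on the real file, pass the string returned by f.read() to count_words instead of sample.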

posted on 2018-08-11 10:28 by 墨殇浅尘
