A Simple Word Frequency Counter in Python (Source Code)

Requirement: count how often each word appears in the text of Walden.

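Approach: first, read the file and split it on whitespace to get a list of tokens. Second, loop over the list, stripping the punctuation from both ends of each token and lowercasing it. Third, remove duplicates to get the set of distinct words, and build a dictionary that maps each word to the number of times it occurs. Finally, sort the words by their counts in descending order and print them.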
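As a small illustration of these steps, here is a sketch that runs them on a single sentence instead of the file. The sample sentence is made up for the example and is not taken from walden.txt. The filter that drops empty strings is an extra step assumed here, because a token made up only of punctuation (such as `--`) strips down to an empty string.

```python
import string

# Hypothetical sample text, used only for illustration
sample = "Simplicity, simplicity, simplicity! I say -- let your affairs be as two or three."

# Step 1: split on whitespace
raw_words = sample.split()

# Step 2: strip punctuation from both ends of each token and lowercase it
cleaned = [w.strip(string.punctuation).lower() for w in raw_words]

# Extra step (an assumption of this sketch): drop tokens that became empty
cleaned = [w for w in cleaned if w]

# Step 3: remove duplicates, then map each distinct word to its count
unique = set(cleaned)
counts = {w: cleaned.count(w) for w in unique}

# Step 4: print the words by count, highest first
for w in sorted(counts, key=lambda x: counts[x], reverse=True):
    print('{}--{} times'.format(w, counts[w]))
```

Note that `strip` only removes punctuation at the start and end of a token, so a word such as `don't` keeps its apostrophe.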

Source code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author: qinjiaxi
import string

path = "C:\\Users\\Administrator\\Desktop\\walden.txt"
with open(path, 'r') as test:
    # words = test.read().split()
    # print(words)
    # for word in words:
    #     print('{}-{} times'.format(word, words.count(word)))
    # Strip leading/trailing punctuation from each token and lowercase the whole word
    words = [raw_word.strip(string.punctuation).lower() for raw_word in test.read().split()]
    words = [word for word in words if word]  # drop tokens that were only punctuation
    words_index = set(words)  # remove duplicates
    # Dict comprehension: the key is a word, the value is how many times it occurs in the file
    counts_dict = {index: words.count(index) for index in words_index}
# Sort the words by their counts, in descending order
for word in sorted(counts_dict, key=lambda x: counts_dict[x], reverse=True):
    print('{}--{} times'.format(word, counts_dict[word]))
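The listing above calls `words.count` once per distinct word, so the whole list is scanned again for every word. That can get slow on a book-length text. As an alternative, here is a minimal sketch that counts in a single pass with `collections.Counter` from the standard library. This is a swapped-in technique, not the original author's method. The function name `count_words` and the sample sentence are hypothetical, chosen only for this example.

```python
import string
from collections import Counter


def count_words(text):
    """Return a Counter mapping each cleaned, lowercased word to its count."""
    # Strip punctuation from both ends of each token and lowercase it
    cleaned = (raw.strip(string.punctuation).lower() for raw in text.split())
    # Skip tokens that were only punctuation and are now empty
    return Counter(word for word in cleaned if word)


# Hypothetical sample text; for the real file, pass in the result of test.read()
counts = count_words("The pond, the woods; the POND.")

# most_common() yields (word, count) pairs sorted by count, highest first
for word, n in counts.most_common():
    print('{}--{} times'.format(word, n))
```

To use it on the file, read the text inside the `with` block and pass it to `count_words`. The output format is the same as in the original listing.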

 

Posted on 2018-08-14 16:06 by 秦朗的天空
