A simple word frequency counter in Python (source code)
Requirement: count how often each word appears in the text of Walden.
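Approach: first, read the file and split it on whitespace to get a list of words. Second, loop over the list, stripping the punctuation from each element and lowercasing it. Third, remove duplicate elements. Fourth, build a dictionary that maps each word to the number of times it occurs. Finally, sort the entries by count in descending order.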
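To make the steps concrete, here is a minimal sketch that runs them on a short in-memory string instead of the Walden file. The function name `word_frequencies` and the sample sentence are illustrative only. Dropping empty tokens and breaking ties alphabetically are small additions of this sketch, not part of the original script.

```python
import string

def word_frequencies(text):
    """Run the steps on an in-memory string; return (word, count) pairs."""
    # Step 1: split on whitespace into a list of raw tokens.
    raw_words = text.split()
    # Step 2: strip punctuation from both ends and lowercase each token,
    # dropping tokens such as "--" that strip down to an empty string.
    words = [w.strip(string.punctuation).lower() for w in raw_words]
    words = [w for w in words if w]
    # Step 3: remove duplicates.
    unique_words = set(words)
    # Step 4: map each word to its count, then sort by count (highest first),
    # breaking ties alphabetically so the output order is stable.
    counts = {w: words.count(w) for w in unique_words}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

sample = "The pond, the woods -- and THE pond again."
for word, n in word_frequencies(sample):
    print("{}--{} times".format(word, n))
```

With this sample sentence, `the` (3 times) comes first, then `pond` (2 times), then the words that occur once, in alphabetical order.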
Source code:
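#!/usr/bin/env python3
#-*- coding:utf-8 -*-
#Author: qinjiaxi
import string

path = "C:\\Users\\Administrator\\Desktop\\walden.txt"
# encoding='utf-8' assumes the text file is saved as UTF-8
with open(path, 'r', encoding='utf-8') as test:
    # First attempt, kept for reference: counts raw tokens with punctuation attached
    # words = test.read().split()
    # print(words)
    # for word in words:
    #     print('{}-{} times'.format(word, words.count(word)))
    # Strip punctuation from both ends of each word and convert it to lowercase
    words = [raw_word.strip(string.punctuation).lower() for raw_word in test.read().split()]
    # Drop tokens that were only punctuation (e.g. "--"), which strip down to ""
    words = [word for word in words if word]
words_index = set(words)  # remove duplicates
# Dict comprehension: the key is each word, the value is how many times it appears in the file
counts_dict = {index: words.count(index) for index in words_index}
# Sort by the dictionary values (the counts), in descending order
for word in sorted(counts_dict, key=lambda x: counts_dict[x], reverse=True):
    print('{}--{} times'.format(word, counts_dict[word]))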
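The script calls `words.count(...)` once per distinct word, and each call rescans the whole list, so the work grows with (number of distinct words) × (total number of words). A common alternative, swapped in here rather than taken from the original, is `collections.Counter` from the standard library, which tallies every word in a single pass. This is a sketch under the same assumptions as the script (whitespace tokenization, punctuation stripped only at word edges); the helper names `count_words` and `count_words_in_file`, the UTF-8 encoding, and the command-line usage are all assumptions.

```python
import string
import sys
from collections import Counter

def count_words(text):
    """Tally words in one pass with collections.Counter."""
    # Same normalization as the script: strip punctuation at both ends, lowercase.
    words = (w.strip(string.punctuation).lower() for w in text.split())
    # Skip tokens that were nothing but punctuation (they strip down to "").
    return Counter(w for w in words if w)

def count_words_in_file(path, encoding="utf-8"):
    # Hypothetical helper; encoding="utf-8" is an assumption about the file.
    with open(path, "r", encoding=encoding) as f:
        return count_words(f.read())

if __name__ == "__main__":
    # Hypothetical usage: python wordfreq.py path/to/walden.txt
    if len(sys.argv) > 1:
        for word, n in count_words_in_file(sys.argv[1]).most_common():
            print("{}--{} times".format(word, n))
```

`most_common()` with no argument returns every word sorted by count, highest first; words with equal counts keep the order in which they were first encountered.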
Please credit the source if you repost this.
A little more effort every day: a little less worry, a little more happiness.
--->by晴朗sky