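A complete English word-frequency example that reads its input from a file
- Read in the string to be analyzed
- Split it into words
- Count the words in a dictionary
- Exclude grammatical (function) words
- Sort by count
- Output the top 20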
>>> fo=open('test.txt','w')
>>> fo.write('''Twinkle Twinkle Little Star (Declan's Prayer) - Declan Galbraith Twinkle twinkle little star, How I wonder what you are, Up above the world so high, Like a diamond in the sky, Star light, Star bright, The first star I see tonight, I wish I may, I wish I might, Have the wish I wish tonight, Twinkle twinkle little star, How I wonder what you are, I have so many wishes to make, But most of all is what I state, So just wonder, That I've been dreaming of, I wish that I can have owe her enough, I wish I may, I wish I might, Have the dream I dream tonight, Ooo baby Twinkle twinkle little star, How I wonder what you are, I want a girl who'll be all mine, And wants to say that I'm her guy, Someone's sweet that's for sure, I want to be the one shes looking for, I wish I may, I wish I might, Have the girl I wish tonight, Ooo baby Twinkle twinkle little star, How I wonder what you are, Up above the world so high, Like a diamond in the sky, Star light, Star bright, The first star I see tonight, I wish I may, I wish I might, Have the wish I wish tonight.''')
1138
>>> fo.close()
>>> fr=open('test.txt','r')
>>> fr.read()
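# Read the lyrics saved in test.txt
with open('test.txt', 'r') as fo:
    song = fo.read()

# Grammatical (function) words to leave out of the count
exc = {'the', 'in', 'to', 'a', 'of', 'and', 'on', 'what', 'that'}

song = song.lower()
# Delete punctuation, but turn hyphens and whitespace characters into
# spaces so that words on adjacent lines are not glued together
for ch in '''.,'()"''':
    song = song.replace(ch, '')
for ch in '-\n\t\u3000':
    song = song.replace(ch, ' ')
words = song.split()  # splits on any run of whitespace, no empty strings

# Count every distinct word that is not in the exclusion set
dic = {}
keys = set(words) - exc
for w in keys:
    dic[w] = words.count(w)

# Sort the (word, count) pairs by count, highest first
wc = list(dic.items())
wc.sort(key=lambda x: x[1], reverse=True)
print(wc)

# Print the top 20 (fewer if there are not 20 distinct words)
for item in wc[:20]:
    print(item)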
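
For comparison, the same steps (split, exclude, count, sort, top N) can also be sketched with the standard library's collections.Counter. This is only one possible variant, not the method used above: the function name top_words and the regular expression are assumptions made for illustration, and here an apostrophe inside a word is kept (so "i've" stays one word) instead of being deleted.

```python
import re
from collections import Counter

# Same exclusion set as the script above
EXCLUDED = {'the', 'in', 'to', 'a', 'of', 'and', 'on', 'what', 'that'}

def top_words(text, n=20, excluded=EXCLUDED):
    """Return the n most frequent (word, count) pairs, skipping excluded words.

    top_words is a hypothetical helper, not part of the original script.
    """
    # Assumption: a word is a run of letters, optionally joined by apostrophes
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(w for w in words if w not in excluded)
    # most_common sorts by count, highest first, and returns at most n pairs
    return counts.most_common(n)

sample = "Twinkle twinkle little star, how I wonder what you are"
print(top_words(sample, n=3))
# prints [('twinkle', 2), ('little', 1), ('star', 1)]
```

To run it on the file written earlier, pass the file's contents instead of the sample string, for example the result of reading test.txt inside a with open(...) block. Counter makes a single pass over the words, whereas calling words.count(w) for every key rescans the whole list each time.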