A complete English word-frequency count example using file I/O
1. Read in the string to be analyzed
# Write the sample lyrics to test.txt
fo = open('test.txt', 'w')
fo.write('''All the times that you rain on my parade And all the clubs you get in using my name You think you broke my heart Ohhh girl for goodness sake You think I'm crying Oh my ohhh, well I ain't! And I didn't wanna write a song 'Cause I didn't want anyone thinking I still care, I don't But, you still hit my phone up And baby I be moving on And I think you should be somethin' I don't wanna hold back Maybe you should know that My mama don't like you and she like's everyone''')
fo.close()

# Read the text back in as the string to analyze
fo = open('test.txt', 'r')
news = fo.read()
fo.close()
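2. Split the text and extract the words
news = news.lower()  # normalize to lowercase so "And" and "and" count together
for i in ',!':  # replace punctuation with spaces
    news = news.replace(i, ' ')
words = news.split()  # split on whitespace; avoids empty strings from repeated spaces
print(words)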
3. Count the words with a dictionary
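The post gives no code for this step. A minimal sketch of one way to do it, assuming `words` is the list produced in step 2 and that the result is stored under the name `dic`, which the sorting code later in the post expects. The short sample list here is only a stand-in so the snippet runs on its own.

```python
# Sample input standing in for the `words` list produced in step 2.
words = "you think you broke my heart you think".split()

# Count occurrences of each word in a plain dictionary.
dic = {}
for w in words:
    if w:  # skip empty strings, in case the list was built with split(' ')
        dic[w] = dic.get(w, 0) + 1

print(dic)
```

`dict.get(w, 0)` returns 0 for a word not seen yet, so no separate membership check is needed.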
4. Exclude grammatical words (stop words)
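The post does not show which words it excludes or how. A hedged sketch, assuming the counts live in a dictionary named `dic` as in the sorting code; the `exclude` set below is hypothetical and should be adjusted to the text being analyzed.

```python
# Hypothetical stop-word set; the original post does not list its excluded words.
exclude = {'the', 'and', 'you', 'i', 'my', 'a', 'to'}

# Sample counts standing in for the `dic` built in step 3.
dic = {'you': 3, 'think': 2, 'the': 1, 'heart': 1, 'my': 1}

# Drop each excluded word; pop with a default avoids a KeyError
# when a stop word never appeared in the text.
for w in exclude:
    dic.pop(w, None)

print(dic)
```

Removing entries after counting keeps the counting loop simple; filtering the word list before counting would work just as well.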
5. Sort
6. Output the top 20
wc = list(dic.items())
wc.sort(key=lambda x: x[1], reverse=True)  # sort by count, highest first
for item in wc[:20]:  # slicing avoids an IndexError when there are fewer than 20 words
    print(item)