1. Read in the string to analyze
fo = open('test.txt', 'w')
fo.write('''You gotta go and get angry at all of my honesty
You know I try but I don’t do too well with apologies
I hope I don’t run out of time, could someone call a referee?
Cause I just need one more shot at forgiveness
I know you know that I made those mistakes maybe once or twice
By once or twice I mean maybe a couple a hundred times
So let me, oh let me redeem, oh redeem, oh myself tonight
Cause I just need one more shot at second chances
Yeah, is it too late now to say sorry?
Cause I’m missing more than just your body
Is it too late now to say sorry?
Yeah I know that I let you down
Is it too late to say I’m sorry now?
I’m sorry, yeah
Sorry, yeah
Sorry
Yeah I know that I let you down
Is it too late to say sorry now?
I’ll take every single piece of the blame if you want me to
But you know that there is no innocent one in this game for two
I’ll go, I’ll go and then you go, you go out and spill the truth
Can we both say the words and forget this?
Is it too late now to say sorry?
Cause I’m missing more than just your body
Is it too late now to say sorry?
Yeah I know that I let you down
Is it too late to say I’m sorry now?
I’m not just trying to get you back on me
Cause I’m missing more than just your body
Is it too late now to say sorry?
Yeah I know that I let you down
Is it too late to say sorry now?
I’m sorry, yeah
Sorry, oh
Sorry
Yeah I know that I let you down
Is it too late to say sorry now?
I’m sorry, yeah
Sorry, oh
Sorry
Yeah I know that I let you down
Is it too late to say sorry now?''')
fo.close()
fo = open('test.txt', 'r')
sorry = fo.read()
fo.close()  # close the file once its contents have been read
2. Split the text and extract the words
sorry = sorry.lower()
for i in '?,':
    sorry = sorry.replace(i, ' ')
words = sorry.split()  # split on any whitespace (spaces and newlines) and drop empty strings
print(words)
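Step 2 only replaces `?` and `,` before splitting, so any other punctuation would stay attached to the words. As an alternative sketch (not the approach used in this post), a regular expression can pull the words out directly. The function name `tokenize` and the sample string are illustrative assumptions.

```python
import re

def tokenize(text):
    """Lowercase the text and return its words, keeping in-word apostrophes."""
    # One or more letters, optionally followed by apostrophe + letters (e.g. "i’m").
    # Both the straight (') and curly (’) apostrophe are accepted.
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

sample = "Is it too late now to say sorry?\nCause I’m missing more than just your body"
tokens = tokenize(sample)
print(tokens)
```

Because the pattern matches words rather than removing separators, newlines and punctuation never end up inside a token.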
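3. Count the words with a dictionary
dic = {}  # start with an empty dictionary
words.sort()  # sort the extracted words
d = set(words)  # the set d holds each distinct word once
for i in d:
    dic[i] = words.count(i)  # map each word i to the number of times it occurs
4. Exclude grammatical words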
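Excluding grammatical words (stop words) could be sketched as follows. The stop-word list `STOP_WORDS`, the helper `remove_stop_words`, and the sample counts are all hypothetical and only illustrate the idea. In practice the list would be tuned to the text being analyzed.

```python
# Hypothetical stop-word list; extend it to suit the text being analyzed.
STOP_WORDS = {"a", "an", "the", "and", "but", "of", "to", "at",
              "in", "is", "it", "that", "i", "you", "my", "me"}

def remove_stop_words(counts, stop_words=STOP_WORDS):
    """Return a copy of the word-count dictionary without stop words or empty keys."""
    return {w: n for w, n in counts.items() if w and w not in stop_words}

# Made-up counts, shaped like the dictionary built in step 3.
dic_example = {"sorry": 3, "to": 5, "the": 2, "late": 4, "": 7}
filtered = remove_stop_words(dic_example)
print(filtered)
```

Filtering the dictionary after counting leaves the original `words` list untouched, so the raw counts are still available if needed.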
5. Sort
wc = list(dic.items())
wc.sort(key=lambda x: x[1], reverse=True)  # sort by count, highest first
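Counting and sorting can also be done in one call with the standard library's `collections.Counter`. This is an alternative sketch, not the method used in this post. The helper name `top_words` and the sample word list are assumptions for illustration.

```python
from collections import Counter

def top_words(words, n=20):
    """Return up to n (word, count) pairs, most frequent first."""
    return Counter(words).most_common(n)

sample_words = "sorry yeah sorry oh sorry yeah".split()
print(top_words(sample_words, 2))
```

`most_common(n)` returns at most `n` pairs, so it also works when the text has fewer than `n` distinct words.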
6. Print the top 20
for item in wc[:20]:  # slicing avoids an IndexError when there are fewer than 20 entries
    print(item)
Output: