A complete English word-frequency counting example using a file

1. Read in the string to be analyzed

fo=open('text.txt','w')
fo.write('''Waking up I see that everything is okay 
The first time in my life and now it's so great
Slowing down I look around and I am so amazed
I think about the little things that make life great
I wouldn't change a thing about it
This is the best feeling
This innocence is brilliant, I hope that it will stay
This moment is perfect, please don't go away
I need you now
And I'll hold on to it, don't you let it pass you by
I found a place so safe, not a single tear
The first time in my life and now it's so clear
Feel calm, I belong, I'm so happy here
It's so strong and now I let myself be sincere
I wouldn't change a thing about it
This is the best feeling
This innocence is brilliant, I hope that it will stay
This moment is perfect, please don't go away
I need you now'''
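)
fo.close()  # close so the text is flushed to disk before the file is reopened
fo=open('text.txt','r')
day=fo.read().lower()  # lowercase so that 'The' and 'the' count as the same word
fo.close()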
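
The write-then-read round trip above can also be written with `with` blocks, which close the file automatically when the block ends. The following is only a minimal sketch: the helper name `roundtrip`, the file name `text_demo.txt`, and the short stand-in text are assumptions made for illustration and are not part of the original post.

```python
import os
import tempfile

def roundtrip(text, path):
    """Write text to path, then read it back (hypothetical helper)."""
    # 'with' closes the file on exit, so the data is flushed before reading
    with open(path, 'w', encoding='utf-8') as fo:
        fo.write(text)
    with open(path, 'r', encoding='utf-8') as fo:
        return fo.read()

# Stand-in text, not the full lyrics used in the post
sample = "Waking up I see that everything is okay\nThis is the best feeling"
path = os.path.join(tempfile.gettempdir(), 'text_demo.txt')
day_demo = roundtrip(sample, path)
print(day_demo == sample)
```

Because each `with` block closes its file, there is no need to remember a separate `close()` call between writing and reading.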

2. Split the text and extract the words

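# replace punctuation with spaces; apostrophes stay so "don't" remains one word
for i in ',.\"?':
    day=day.replace(i,' ')

# split() with no argument splits on any whitespace, including the line breaks
# in the lyrics, and never yields empty strings
words=day.split()
print(words)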
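
As an alternative to replacing punctuation characters one at a time, the words can be extracted in a single pass with a regular expression. This is a sketch under assumptions: the function name `tokenize` and the pattern are illustrative choices, not the method used in the post. The pattern keeps apostrophes that sit inside a word and treats everything else as a separator.

```python
import re

def tokenize(text):
    """Return lowercase words; apostrophes inside words are kept."""
    # [a-z]+(?:'[a-z]+)* matches runs of letters, optionally joined by apostrophes
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

print(tokenize("This moment is perfect, please don't go away\nI need you now"))
# → ['this', 'moment', 'is', 'perfect', 'please', "don't", 'go', 'away', 'i', 'need', 'you', 'now']
```

Unlike a fixed list of punctuation characters, this approach also drops symbols that were not anticipated, such as `!` or `;`.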

3. Build the count dictionary

dict={}            # word -> number of occurrences (note: this name shadows the built-in dict)
keys=set(words)    # the distinct words
print(keys)
for i in keys:
    dict[i]=words.count(i)
print(dict)

4. Exclude grammatical (function) words

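# words to leave out of the count; '' covers empty strings that splitting can leave behind
# the entries are lowercase, so they match only lowercased words
exc={'i','sincere','to','brilliant','the','innocence','of','so','and','were','','on','really'}

dict={}
keys=set(words)
keys=keys-exc      # set difference drops the excluded words
print(keys)
for i in keys:
    dict[i]=words.count(i)
print(dict)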

5. Sort

wc=list(dict.items())                    # list of (word, count) pairs
wc.sort(key=lambda x:x[1],reverse=True)  # sort by count, highest first
print(wc)
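
Steps 3 to 5 (count, exclude, sort) can be condensed with `collections.Counter` from the standard library. This is a sketch rather than the post's own method: the function name `top_words` and the small `demo_words` list are assumptions for illustration.

```python
from collections import Counter

def top_words(words, stop_words, n):
    """Count words, skipping stop words, and return the n most common."""
    # Counter tallies each word in one pass instead of calling list.count per word
    counts = Counter(w for w in words if w not in stop_words)
    # most_common(n) returns (word, count) pairs sorted by count, highest first
    return counts.most_common(n)

demo_words = ['this', 'is', 'the', 'best', 'feeling', 'this', 'is', 'perfect']
print(top_words(demo_words, {'the'}, 2))
# → [('this', 2), ('is', 2)]
```

Counting in a single pass matters for longer texts, where calling `words.count(i)` once per distinct word rescans the whole list each time.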

6. Print the top 20

# slicing avoids an IndexError when there are fewer than 20 distinct words
for i in wc[:20]:
    print(i)

Posted 2017-09-26 23:17 by 047连薇娜