A complete English word-frequency counting example using file I/O

 

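1. Read in the string to be analyzed

2. Split the text and extract the words

3. Count the words with a dictionary

4. Exclude grammatical function words

5. Sort

6. Output the TOP(20)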
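
The six steps above can also be sketched as a single function. This is only a minimal illustration, not the approach the script below takes: it assumes the standard-library `re` and `collections.Counter` modules, and the names `top_words`, `STOPWORDS`, and `sample` are made up for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word set for illustration; extend it as needed.
STOPWORDS = {'it', 'be', 'no', 'to', 'or'}

def top_words(text, stopwords=STOPWORDS, n=20):
    """Return the n most frequent (word, count) pairs in text."""
    # Steps 1-2: lower-case the text and extract words (letters and apostrophes).
    words = re.findall(r"[a-z']+", text.lower())
    # Steps 3-4: count every word that is not a stop word.
    counts = Counter(w for w in words if w not in stopwords)
    # Steps 5-6: most_common sorts by count, largest first, and keeps the top n.
    return counts.most_common(n)

sample = "So Beat It Just Beat It. You Better Run, You Better Do What You Can"
print(top_words(sample, n=3))
```

Using a regular expression keeps contractions such as `don't` together while dropping other punctuation, and `Counter` counts all words in one pass instead of calling `count` once per distinct word.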

 

fo=open('text','w')
fo.write('''Beat It - Michael Jackson
They Told Him Don't You Ever Come Around Here
Don't Wanna See Your Face You Better Disappear
The Fire's In Their Eyes And Their Words Are Really Clear
So Beat It Just Beat It
You Better Run You Better Do What You Can
Don't Wanna See No Blood Don't Be A Macho Man
You Wanna Be Tough Better Do What You Can
So Beat It But You Wanna Be Bad
Just Beat It Beat It Beat It Beat It
No One Wants To Be Defeated
Showin' How Funky Strong Is Your Fighter
It Doesn't Matter Who's Wrong Or Right
Just Beat It Beat It
Just Beat It Beat It
Just Beat It Beat It Just Beat It Beat It
They're Out To Get You Better Leave While You Can
Don't Wanna Be A Boy You Wanna Be A Man
You Wanna Stay Alive Better Do What You Can
So Beat It Just Beat It
You Have To Show Them That You're Really Not Scared
You're Playin' With Your Life This Ain't No Truth Or Dare
They'll Kick You Then They Beat You
Then They'll Tell You It's Fair
So Beat It But You Wanna Be Bad
Just Beat It Beat It Beat It Beat It
No One Wants To Be Defeated
Showin' How Funky Strong Is Your Fighter
It Doesn't Matter Who's Wrong Or Right
Just Beat It Beat It Beat It Beat It
No One Wants To Be Defeated
Showin' How Funky Strong Is Your Fighter
It Doesn't Matter Who's Wrong Or Right
Just Beat It Beat It Beat It Beat It Beat It...
Beat It Beat It Beat It Beat It
No One Wants To Be Defeated
Showin' How Funky Strong Is Your Fighter
It Doesn't Matter Who's Wrong Or Right
Who Just Beat It Beat It Beat It Beat It
No One Wants To Be Defeated
Showin' How Funky Strong Is Your Fighter
It Doesn't Matter Who's Wrong Or Who's Right
Just Beat It Beat It Beat It Beat It
No One Wants To Be Defeated
Showin' How Funky Strong Is Your Fighter
It Doesn't Matter Who's Wrong Or Right
Just Beat It Beat It Beat It Beat It
No One Wants To Be Defeated
Showin' How Funky Strong Is Your Fighter
It Doesn't Matter Who's Wrong Or Right
Just Beat It
-''')
fo.close()  # write the string to be analyzed to the file 'text'

fo=open('text','r')
s=fo.read()
fo.close()  # read the string to be analyzed back in, then close the file

s=s.lower()  # convert to lower case
for i in ',.?!-':
    s=s.replace(i,' ')  # replace punctuation with spaces
s=s.split()  # split on any whitespace (spaces and newlines) to extract the words
print(s)

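counts={}  # build a dictionary (named counts so it does not shadow the built-in dict)
exc={'it','be','no','to','or',''}  # grammatical words to exclude, plus the empty string in case splitting left any
keys=set(s)-exc  # the remaining distinct words become the dictionary keys
for i in keys:
    counts[i]=s.count(i)  # iterate over the keys and store each word's count
print(counts)

wc=list(counts.items())  # convert the dictionary into a list of (word, count) tuples
wc.sort(key=lambda x:x[1],reverse=True)  # sort by count, largest first
print(wc)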

for item in wc[:20]:
    print(item)  # output the top 20 (fewer if there are fewer distinct words)
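
The write-then-read round trip at the start of the script can also be written with `with` blocks, which close the file automatically even if an error occurs. The sketch below is an illustration under assumptions: the helper names `save_text` and `load_text` are hypothetical, and a temporary directory is used only so the example leaves no file behind.

```python
import os
import tempfile

# Hypothetical helper names; the temporary directory keeps the demo self-contained.
def save_text(path, text):
    # The with block closes the file even if write() raises.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

def load_text(path):
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'text')
    save_text(path, "Just Beat It\nBeat It\n")
    loaded = load_text(path)

print(loaded.split())
```

Because the file is closed as soon as each `with` block ends, there is no separate `close` call to forget.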

 

Posted on 2017-09-26 13:06 by L文斌