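A Complete English Word-Frequency Count Using a File

You can download a long English novel and analyze its word frequencies.

1. Read in the string to analyze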

a=open('book.txt','w')
a.write('''Dusty Brian has been wandering for three days and three nights, several days didn't eat enough.
  All of a sudden, the wall AD attracted him: "looking for a man, height 180 cm, shoe size 44, just with my son a weekend, I am willing to pay $500..."
  Fifteen minutes later, Brian get accommodation, a gaunt woman opened the door.
  After a simple communication, the woman out of the set of teddy bear doll clothing let Brian put on. She said, as long as the play with my son to go to bed, $500 is you. Woman took Brian into the door: "dear, see who come back?"
  A little boy ran out seven or eight years old. As the woman said, he is a mentally retarded son.
  The boy look at Brian: "dad, dad... are you come back?" Brian said: "yes!" Woman said: "a la hora Andrew, mom didn't cheat you. Dad's into a teddy bear look back!"
  Three people sat up after the meal, the little boy noisy going to play basketball in the yard. Brian nice dunk, let the boy see spent eyes: "dad, you are great!"
  It's time for bed. Brian to the little boy bathed and began to tell his story. Brian musical sound, let woman for a long time can't leave behind the door. Finally, the little boy fell asleep.
  Brian out of the bedroom, the woman has been boiled coffee. She gratefully said: "thanks." Brian red the face said: "tonight, I also had a good time! However, Andrew's dad?" "He is gone... six months ago, he was involved in the robberies, because rearguard action, by the police..." Woman began to SOB, "Andrew every day thinking of my father, I have to cheat, he said, dad will become a teddy bear home." Brian should go, woman took out a $500. Who knows, Brian refused.
  It turns out that Brian is a fugitive. Two months ago, he robbed a jewelry store, has been in hiding. Now, he decided to turn himself in. He doesn't want his wife and children like this pair of mother and child, every day immersed in sorrow.
  Brian left, the woman took a photo. Photograph, a policeman's smile is bright. Woman sobbed said: "my dear, just now, I almost did a foolish thing. I know, he is the murderer killed you. I know he was cornered, deliberately posted the advertisement to attract him. He is coming, but I give up, no poison in the coffee. Because, just that moment, I think, as is your teddy bear. I'm sorry, I will take good care of Andrew, one day, let him be your pride.''')
a.close()
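As a side note, the file can also be written and read back inside `with` blocks, which close it automatically even if an error occurs. This is only a minimal sketch: the helper `write_and_read` and the file name `book_demo.txt` are hypothetical, and a temporary directory is used so that `book.txt` is not touched.

```python
import os
import tempfile


def write_and_read(path, text):
    """Write text to path, then read it back; both files are closed by `with`."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()


# Hypothetical demo file in the system temp directory (not the post's book.txt)
demo_path = os.path.join(tempfile.gettempdir(), 'book_demo.txt')
content = write_and_read(demo_path, 'Brian left, the woman took a photo.')
print(content)
os.remove(demo_path)  # clean up the demo file
```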

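2. Split the text and extract the words

print('Read book.txt and split its contents into a list of words')
b=open('book.txt','r')
read=b.read()
b.close()
read=read.lower()
# Replace punctuation (including double quotes) with spaces so it does not stick to the words
for i in ',.!?:"':
    read=read.replace(i,' ')
# split() with no argument splits on any whitespace, so newlines and empty strings do not end up in the list
word=read.split()
print(word)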
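Replacing a fixed list of punctuation characters can miss other symbols, such as the dollar sign in `$500`. As an alternative sketch (not the approach used in this post), a regular expression can pull out only runs of letters, keeping apostrophes inside words like `didn't`. The function name `tokenize` is an assumption made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return runs of letters, keeping inner apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


sample = 'Brian said: "yes!" Several days didn\'t eat enough.\n  $500 is you.'
words = tokenize(sample)
print(words)
```

Digits and symbols are dropped entirely here, so `$500` does not produce a token; whether that is desirable depends on the analysis.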

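3. Count with a dictionary

4. Exclude grammatical (function) words

print('Convert the word list to a set, exclude the function words, and keep the rest as keys for counting:')
# Function words to ignore; the empty string is listed in case the split produced any
ex={'','the','to','on','we','as','a','at','and','you','i','of'}
exx={'is','for','he','him','in'}
ke=set(word)-ex
key=ke-exx
print(key)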
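Converting the word list to a set discards how often each word occurs, which is why the counts have to be looked up in the list again later. A small sketch of an alternative is to filter the list itself so that repeated words survive. The names `STOP_WORDS` and `remove_stop_words` are hypothetical; the set simply merges the two exclusion sets from this post.

```python
# Merged from the post's two exclusion sets (ex and exx)
STOP_WORDS = {'', 'the', 'to', 'on', 'we', 'as', 'a', 'at', 'and', 'you',
              'i', 'of', 'is', 'for', 'he', 'him', 'in'}


def remove_stop_words(words, stop_words=STOP_WORDS):
    """Return the words in their original order, without the stop words."""
    return [w for w in words if w not in stop_words]


filtered = remove_stop_words(['the', 'boy', 'look', 'at', 'brian', '', 'brian'])
print(filtered)
```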

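5. Sort

print('Sort by count:')
dic={}
# Count how many times each remaining word occurs in the full word list
for w in key:
    dic[w]=word.count(w)
# Turn the dictionary into (word, count) pairs and sort them by count, highest first
wc=list(dic.items())
wc.sort(key=lambda x:x[1],reverse=True)
print(wc)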
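Calling `word.count(w)` for every key rescans the whole word list each time, which gets slow on a full-length novel. As a hedged alternative, `collections.Counter` from the standard library counts everything in a single pass and can return the pairs already sorted. The word list below is made-up sample data, not the output for `book.txt`.

```python
from collections import Counter

# Made-up sample data standing in for the filtered word list
sample_words = ['brian', 'woman', 'brian', 'boy', 'brian', 'woman']

counts = Counter(sample_words)   # one pass over the list
ranked = counts.most_common()    # (word, count) pairs, highest count first
print(ranked)
```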

6. Output the TOP 20

print('Output the TOP 20:')
# Slicing avoids an IndexError if the text has fewer than 20 distinct words
for item in wc[:20]:
    print(item)
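Because the keys come from a set of strings, words with the same count may appear in a different order from one run to the next. One possible way to make the top list reproducible is to break ties alphabetically and take a slice, which also copes with texts that have fewer than 20 distinct words. The helper `top_n` is a hypothetical name used only in this sketch.

```python
def top_n(pairs, n=20):
    """Return up to n (word, count) pairs, highest count first, ties in alphabetical order."""
    ranked = sorted(pairs, key=lambda p: (-p[1], p[0]))
    return ranked[:n]


# Made-up sample pairs for illustration
sample_pairs = [('woman', 2), ('brian', 3), ('boy', 2), ('dad', 1)]
for pair in top_n(sample_pairs, 3):
    print(pair)
```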

7. A brief explanation of the output.

This is a short story about Brian……

 

posted on 2017-09-27 16:11  079刘洁婷