English Word Frequency Count

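  1. Preprocessing for word frequency counting
  2. Download the lyrics of an English song or an English article
  3. Replace all separators such as , . ? ! ’ : with spaces
  4. Convert all uppercase letters to lowercase
  5. Generate the word list
  6. Generate the word frequency counts
  7. Sort
  8. Exclude grammatical words: pronouns, articles, conjunctions
  9. Output the top 10 most frequent words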
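
Steps 3 to 5 above can be sketched as one small helper. This is only an illustrative sketch under assumptions, not the code used in this post: the name `tokenize`, the constant `SEPARATORS`, and the sample sentence are hypothetical, and it uses `str.maketrans` / `str.translate` in place of repeated `replace` calls.

```python
# Hypothetical helper illustrating steps 3-5; names and sample text are assumptions.
SEPARATORS = ',.?!’:"“”-%$'


def tokenize(text):
    """Lowercase the text, turn each separator into a space, and split into words."""
    # Map every separator character to a single space in one pass
    table = str.maketrans(SEPARATORS, ' ' * len(SEPARATORS))
    return text.lower().translate(table).split()


print(tokenize('Hello, World! Hello: it’s me.'))
```

Because the curly apostrophe is treated as a separator, a contraction such as "it’s" splits into "it" and "s", which is why a stop-word list for this approach needs an entry for the stray "s".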

 

        

# Read the whole lyrics file; the with-block closes it automatically
with open('whr.txt', 'r') as f:
    music = f.read()
# Convert all uppercase letters to lowercase
music = music.lower()
print('Text after lowercasing: ' + music + '\n')
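# Replace every separator character with a space
symbol = list(''',.?!’:"“”-%$''')
for p in symbol:
    music = music.replace(p, ' ')
# Print once, after all separators have been replaced
print('Text after replacing separators with spaces: ' + music + '\n')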
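# Split on whitespace to get the word list
split = music.split()
# Count whole words; music.count(i) would also match substrings (e.g. 'he' inside 'the')
word = {}
for i in split:
    word[i] = word.get(i, 0) + 1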
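# Grammatical words (articles, pronouns, conjunctions, ...) to exclude
words = '''
a an the in on to at and of is was are were i he she you your they us their our it or for be too do no
that s so as but it's don't
'''
prep = words.split()
for i in prep:
    # pop with a default does nothing when the word is not in the dict
    word.pop(i, None)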
# Sort by count, descending, and print the top 10 (or fewer if the text is short)
word = sorted(word.items(), key=lambda item: item[1], reverse=True)
for item in word[:10]:
    print(item)
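
The counting, filtering, and sorting steps can also be expressed with `collections.Counter` from the standard library. The following is a minimal sketch, not the approach used in this post: `top_words`, `STOP_WORDS`, and the sample sentence are hypothetical names and data chosen for illustration, and the stop-word set is deliberately shortened.

```python
from collections import Counter

# Hypothetical, shortened stop-word set; for illustration only.
STOP_WORDS = {'a', 'an', 'the', 'in', 'on', 'to', 'and', 'of', 'is', 'i', 'you', 'it'}


def top_words(tokens, n=10):
    """Return the n most frequent tokens that are not stop words."""
    # Counter tallies whole tokens, so 'he' is never counted inside 'the'
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    # most_common sorts by count, descending, and truncates to n entries
    return counts.most_common(n)


sample = 'the cat and the hat and the cat'.split()
print(top_words(sample, 2))
```

`most_common` returns fewer than `n` pairs when the text has fewer distinct words, so no index check is needed for short inputs.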

  

  

  

posted on 2018-03-26 10:45  140-吴华锐