English Word Frequency Count

Comprehensive exercise: English word frequency count

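  1. Preprocessing for word frequency counting
  2. Download the lyrics of an English song or an English article
  3. Replace all separators such as , . ? ! ' : with spaces
  4. Convert all uppercase letters to lowercase
  5. Build the word list
  6. Build the word frequency counts
  7. Sort by frequency
  8. Exclude grammatical words: pronouns, articles, conjunctions
  9. Print the TOP 10 most frequent words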
    word = '''
    Lately, I've been, I've been losing sleep
    Dreaming about the things that we could be
    But baby, I've been, I've been praying hard,
    Said, no more counting dollars
    We'll be counting stars, yeah we'll be counting stars
    I see this life like a swinging vine
    Swing my heart across the line
    And my face is flashing signs
    Seek it out and you shall find
    Old, but I'm not that old
    Young, but I'm not that bold
    I don't think the world is sold
    I'm just doing what we're told
    I feel something so right
    Doing the wrong thing
    I feel something so wrong
    Doing the right thing
    I could lie, couldn't I, could lie
    Everything that kills me makes me feel alive
    Lately, I've been, I've been losing sleep
    Dreaming about the things that we could be
    But baby, I've been, I've been praying hard,
    Said, no more counting dollars
    We'll be counting stars
    '''
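    # Punctuation marks to replace with spaces
    symbol = [",", ".", "!", "?", "'", ":", "-"]
    # Meaningless fragments left over once apostrophes become spaces (e.g. "I've" -> "i", "ve")
    words = ['t','ve','ll','m']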
    
    new_art = word
    for s in symbol:
        new_art = new_art.replace(s, ' ')  # Replace each punctuation mark in the text with a space

    new_art = new_art.lower()  # Convert to lowercase
    art_list = new_art.split()  # Split on whitespace into a list of words
    
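    dic = {}
    for i in art_list:
        # Count whole words in the list; str.count on the text would also match substrings (e.g. "be" inside "been")
        dic[i] = art_list.count(i)
    for i in words:
        dic.pop(i, None)  # Discard meaningless fragments; no error if the key is absent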
    
    new_dic = sorted(dic.items(),key=lambda x:x[1],reverse = True)
    
    for item in new_dic[:10]:
        print(item)  # Print the 10 most frequent words (fewer if the text has fewer distinct words)
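
Step 8 asks for pronouns, articles and conjunctions to be excluded, while the listing above only drops contraction fragments. As a rough sketch of one way to cover that step, the same pipeline can be written with `re` and `collections.Counter` from the standard library. The names `top_words`, `STOP_WORDS` and `FRAGMENTS` are made up for this illustration, and the stop-word set is a small hand-picked sample rather than a complete list.

```python
import re
from collections import Counter

# Hypothetical, deliberately incomplete stop-word sample (pronouns, articles, conjunctions).
STOP_WORDS = {"i", "we", "you", "it", "me", "my", "that",
              "the", "a", "an", "and", "but", "or", "so"}
# Fragments left behind when apostrophes act as separators ("we'll" -> "we", "ll").
FRAGMENTS = {"t", "ve", "ll", "m", "re", "s", "d"}


def top_words(text, n=10, exclude=frozenset()):
    """Return the n most frequent words in text, skipping any word in exclude."""
    # Lowercase first, then keep runs of letters; every other character is a separator.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Counter tallies each remaining word in a single pass over the token list.
    counts = Counter(t for t in tokens if t not in exclude)
    # most_common sorts by count, highest first.
    return counts.most_common(n)


sample = "We'll be counting stars, yeah we'll be counting stars"
print(top_words(sample, n=3, exclude=STOP_WORDS | FRAGMENTS))
```

Passing the full lyrics string `word` from the listing above in place of `sample` gives the TOP 10 for the whole song. Extending `STOP_WORDS` is the place to tune which grammatical words count as noise.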

posted @ lawliet9