Comprehensive Exercise: Word Frequency Counting
# Word frequency count for an English text (peng.txt)
with open('peng.txt', 'r', encoding='utf-8') as f:
    song = f.read()

sep = ''',.?—!"'''  # punctuation to strip
# Stop words to leave out of the ranking
exclude = {'the', 'and', 'i', 'in', "i'm", 'a', 'of', 'an', 'on', 'to', 'with'}

# Replace punctuation with spaces so the words split cleanly
for c in sep:
    song = song.replace(c, ' ')

swl = song.lower().split()  # every word, lowercased
sws = set(swl) - exclude    # distinct words minus stop words

# Count how often each remaining word occurs
swd = {}
for w in sws:
    swd[w] = swl.count(w)

# Sort by frequency, highest first
fl = list(swd.items())
fl.sort(key=lambda x: x[1], reverse=True)
for i in fl:
    print(i)

# Save the top 20 (or fewer, if the text has fewer distinct words)
with open('result.txt', 'w', encoding='utf-8') as f:
    for word, n in fl[:20]:
        f.write(word + ' ' + str(n) + '\n')
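The per-word `list.count` loop above rescans the whole word list once for every distinct word. As a sketch of an alternative, the standard library's `collections.Counter` does the tally in a single pass. The function name `top_words`, its parameters, and the sample sentence are made up for illustration; they are not part of the exercise.

```python
from collections import Counter

def top_words(text, stop_words, seps=''',.?—!"''', n=20):
    """Return the n most frequent words in text, ignoring stop_words."""
    # Replace each separator with a space so the words split cleanly
    for ch in seps:
        text = text.replace(ch, ' ')
    # Lowercase, split on whitespace, and drop the stop words
    words = [w for w in text.lower().split() if w not in stop_words]
    # most_common returns (word, count) pairs, highest count first
    return Counter(words).most_common(n)

# Hypothetical sample text, only for demonstration
sample = "The cat and the hat. The cat sat!"
result = top_words(sample, {'the', 'and'}, n=2)
print(result)
```

Because `most_common(n)` simply returns fewer pairs when the text has fewer than `n` distinct words, no extra length check is needed before writing the results to a file.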
# Word frequency count for a Chinese text (weicheng.txt), segmented with jieba
import jieba

with open('weicheng.txt', 'r', encoding='utf-8') as f:
    text = f.read()

p = ''',。‘’“”:;()!?、 '''  # punctuation to strip
# Tokens to leave out of the ranking
a = {'的', '\n', '\u3000', '曰', '之', '不', '人', '一', '大', '马', '来', '有', '于', '下', '此'}

for i in p:
    text = text.replace(i, '')

t = jieba.lcut(text)  # lcut already returns a list of words
print(t)

wl = list(set(t) - a)  # distinct words minus the excluded tokens
print(wl)

# Count occurrences in the segmented word list, not in the raw text
count = {}
for w in wl:
    count[w] = t.count(w)

# Sort by frequency, highest first
cl = list(count.items())
cl.sort(key=lambda x: x[1], reverse=True)
print(cl)

# Save the top 20; 'w' mode keeps reruns from appending duplicate results
with open('wcCount.txt', 'w', encoding='utf-8') as f:
    for word, n in cl[:20]:
        f.write(word + ':' + str(n) + '\n')
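When counting segmented words, counting tokens in the segmented list and counting substrings in the raw text can give different numbers, because a single character may also appear inside longer words. A small sketch of the difference follows. The sample sentence is made up, and the token list is written by hand as a stand-in for what a segmenter such as `jieba.lcut` might return; it is not actual jieba output.

```python
# Hypothetical sample; the token list is hand-written to stand in for
# segmenter output and is not produced by jieba here.
text = '大人说大学很大'
tokens = ['大人', '说', '大学', '很', '大']

# str.count matches the character wherever it appears, even inside other words
substring_count = text.count('大')

# list.count only matches list elements that are exactly this word
token_count = tokens.count('大')

print(substring_count, token_count)
```

For a word-frequency ranking, the token-based count is usually the one that matches what "word frequency" is meant to measure.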