Summary: # -*- coding=utf-8 -*- import feedparser import re import collections import math import sys reload(sys) sys.setdefaultencoding("utf8") def info_entropy(words): result = 0 total = sum([val for _, val in words.iteritems()]) for word, cnt in words.iteritems(): p = float(cnt) / total ...
posted @ 2013-09-23 09:10 阿黄的苹果 Views (583) Comments (0) Recommended (1)
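The `info_entropy` excerpt in the summary above is cut off right after the probability `p` is computed, so the rest of the function is not visible here. As a rough sketch of what such a function typically does, the version below computes the Shannon entropy of a word-count mapping in Python 3. The use of the natural logarithm and the handling of empty input are assumptions, not taken from the original post.

```python
import math


def info_entropy(words):
    """Shannon entropy of a {word: count} mapping.

    Assumption: natural logarithm (result in nats); the original
    excerpt is truncated before the logarithm is applied.
    """
    total = sum(words.values())
    if total == 0:
        # Assumption: an empty or all-zero mapping has zero entropy.
        return 0.0
    result = 0.0
    for cnt in words.values():
        if cnt > 0:
            p = cnt / total  # relative frequency of this word
            result -= p * math.log(p)
    return result
```

For example, `info_entropy({"a": 1, "b": 1})` gives `math.log(2)`, while a mapping with a single word gives zero, since there is no uncertainty about which word appears.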
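Summary: Word extraction based on a statistical method; the principle is explained here. The C# code is here, though it is rather messy. A Python version is also available here. using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.IO; namespace seg_word { class Program { static void Main(string[] args) { if (args.Length "); return; } ...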
posted @ 2013-09-23 09:04 阿黄的苹果 Views (463) Comments (0) Recommended (1)