Summary:
# -*- coding=utf-8 -*-
import feedparser
import re
import collections
import math
import sys

# Python 2 only: reset the default string encoding to UTF-8
reload(sys)
sys.setdefaultencoding("utf8")

def info_entropy(words):
    result = 0
    total = sum([val for _, val in words.iteritems()])
    for word, cnt in words.iteritems():
        p = float(cnt) / total
        ...
Read more
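The excerpt above is cut off before the entropy sum is finished. As a rough illustration only, here is a minimal Python 3 sketch of a Shannon entropy function over a word-count mapping. The name `shannon_entropy`, the use of the natural logarithm, and the handling of empty input are assumptions, not taken from the original post.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Return the Shannon entropy (in nats) of a mapping item -> count.

    Hypothetical helper: the log base and the empty-input behaviour
    are assumptions, since the original code is truncated.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for cnt in counts.values():
        if cnt > 0:
            p = cnt / total            # relative frequency of this item
            entropy -= p * math.log(p)  # accumulate -p * ln(p)
    return entropy


# Usage: entropy of the characters seen next to some candidate word
neighbors = Counter(["a", "b", "a", "c"])
print(shannon_entropy(neighbors))
```

A uniform distribution over k items gives ln(k), and a mapping with a single item gives 0, which is a quick way to sanity-check the function.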
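Summary:
Word extraction based on a statistical method; the principle is explained here. The code is here (C#), though it is rather messy. A Python version is also available here.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace seg_word
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length "); return; }
            ...
Read more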
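The summary does not say which statistics the extractor uses, and the linked explanation is not reproduced here. Purely as a sketch of one common approach, the Python 3 code below scores candidate substrings by frequency, internal cohesion, and the entropy of their left and right neighboring characters. The function names, the `max_len` and `min_freq` parameters, and the scoring formulas are all assumptions for illustration, not the author's implementation.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (nats) of a Counter; 0.0 when it is empty."""
    total = sum(counter.values())
    return -sum((c / total) * math.log(c / total) for c in counter.values()) if total else 0.0


def extract_candidates(text, max_len=3, min_freq=2):
    """Hypothetical scorer: returns (gram, count, cohesion, boundary_entropy) tuples."""
    freq = Counter()
    left = defaultdict(Counter)   # characters seen just before each gram
    right = defaultdict(Counter)  # characters seen just after each gram
    n = len(text)
    for i in range(n):
        for k in range(1, max_len + 1):
            if i + k > n:
                break
            gram = text[i:i + k]
            freq[gram] += 1
            if i > 0:
                left[gram][text[i - 1]] += 1
            if i + k < n:
                right[gram][text[i + k]] += 1

    results = []
    for gram, cnt in freq.items():
        if len(gram) < 2 or cnt < min_freq:
            continue
        p = cnt / n
        # Cohesion: worst case, over all split points, of how much more often
        # the gram occurs than its two parts would co-occur by chance.
        cohesion = min(
            p / ((freq[gram[:j]] / n) * (freq[gram[j:]] / n))
            for j in range(1, len(gram))
        )
        # Boundary freedom: the smaller of the left and right neighbor entropies.
        boundary = min(entropy(left[gram]), entropy(right[gram]))
        results.append((gram, cnt, cohesion, boundary))
    return results


# Usage: "ab" repeats with varied neighbors, so it is the only candidate kept
for row in extract_candidates("abxabyabz", max_len=2, min_freq=2):
    print(row)
```

In this kind of scheme, a candidate is usually kept only when both its cohesion and its boundary entropy exceed chosen thresholds; suitable thresholds depend on the corpus and are not specified in the source.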