nutch-1.7 custom development: adding the encoding to Content
Summary:
1 Detecting the encoding in nutch-1.7. Previously, in 1.2, this was done in org.apache.nutch.parse.html.HtmlParser: EncodingDetector detector = new EncodingDetector(conf); detector.autoDetectClues(content, true); detector.addClue(sniffCharacterEncoding(contentInOctets), "sniffed"); String enco...
posted @ 2013-08-12 15:39 雨渐渐