Post category: Search Engines

simhash: text deduplication
Summary: Reposted from https://github.com/julycoding/The-Art-Of-Programming-By-July/blob/master/ebook/zh/06.12.md and http://grunt1223.iteye.com/blog/964564. The origin of simhash: suppose that one day an interviewer...

posted @ 2014-06-11 10:08 雨渐渐

TF-IDF
Summary: Reference: http://www.ruanyifeng.com/blog/2013/03/tf-idf.html (very clearly written). package com.data.text.tfidf; import java.io.BufferedReader; import java.io.File; import java....

posted @ 2013-08-06 15:15 雨渐渐

Basic Lucene queries and the Lucene 3.0.1 API
Summary: Lucene 3.0.1 API: http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/core/overview-summary.html package com.tianditu.searchDemo; import java.io.File; import org.apache.lucene.analysis.KeywordAnalyzer; import org.apache.lucene.document.Document; import org.apache.lucene.queryParser.QueryPar

posted @ 2013-02-28 16:50 雨渐渐
