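Summary: Setting everything else aside, let's first look at the following code snippets, all of which achieve "real-time" search. Notes: 1. The author currently uses Lucene 3.5. 2. To check whether search is "real-time", a simple test is used: whether numDocs changes. 3. Please read "real-time" as used here carefully, and distinguish it from "near-real-time". Approach 1: indexWriter commits every time, and indexReader calls open(dir) every time. public void nrtOpenDir() { try { Document doc = new Document(); Field f = new Field("f", "test", Store.YES,
Read more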
February 2012 Archive
Summary: 1. Notes on the "ranking by popularity" problem. While researching, I found: ExternalFileField is handy for cases where you want to update a particular field in many documents more often than you want to update the rest of the documents. For example, suppose you have some kind of document rank based on number of views. You might want to update the r
Read more
Summary: This story is part of the DZone Solr-Lucene Zone, which is brought to you in collaboration with the Solr/Lucene Community. Visit the Solr-Lucene Zone for additional tutorials, videos, opinions, and other resources on this topic. Let's talk about spellcheckers. A spellchecker, as you may know, is that d
Read more
Summary: My initial interest in spell checking algorithms started when I had to fix some bugs in the spell checking code at work over a year ago. We use Jazzy, a Java implementation of GNU Aspell, as our spell checking library. Jazzy uses a combination of Metaphone and Levenshtein distance (aka Edit distance) to m
Read more
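The Levenshtein (edit) distance mentioned in the abstract above can be sketched in a few lines of Java. This is a minimal illustration of the standard dynamic-programming formulation, not Jazzy's or Lucene's implementation; the class and method names are hypothetical.

```java
// Hypothetical helper, for illustration only (not Jazzy's or Lucene's code).
final class EditDistance {

    // Returns the minimum number of single-character insertions,
    // deletions, and substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1]; // distances for the previous row
        int[] curr = new int[b.length() + 1]; // distances for the current row

        // Turning the empty prefix of a into b[0..j) takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Reuse the two arrays instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, `EditDistance.levenshtein("kitten", "sitting")` returns 3. A spellchecker would typically rank candidate corrections by this distance, often combined with a phonetic code such as Metaphone.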
Summary: In this tutorial I would like to talk a bit about Apache Lucene. Lucene is an open-source project that provides Java-based indexing and search technology. Using its API, it is easy to implement full-text search. I will deal with the Lucene Java version, but bear in mind that there is also a .NET port a
Read more
Summary: Reposted from: http://www.javacodegeeks.com/2010/05/did-you-mean-feature-lucene-spell.html Google's "Did you mean" feature. After making an introduction to Lucene in a previous post, now it is time to take it up a notch and create a more sophisticated application. You are most surely familiar with Goo
Read more
Summary: Reposted from: http://knowlspace.wordpress.com/2011/06/15/different-ways-to-implement-autosuggest-using-solr/ There are currently five techniques that can be used to create an auto-suggest functionality: 1- The TermsComponent 2- Facet Prefixes 3- The new Suggester component 4- Edge N-Grams 5- Wildcard queries. T
Read more
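To make technique 4 (Edge N-Grams) from the list above concrete, here is a minimal plain-Java sketch of what an edge n-gram expansion produces for a single term. It only illustrates the idea and is not Solr's filter implementation; the class name, method name, and parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Solr or Lucene).
final class EdgeNGrams {

    // Returns the prefixes of term whose lengths lie between minGram and
    // maxGram inclusive, shortest first. Lengths beyond the term are skipped.
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, term.length());
        for (int len = Math.max(1, minGram); len <= upper; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }
}
```

Indexing these prefixes at index time means that a partially typed query such as "so" can match the indexed term "solr" with an ordinary term lookup, which is why the approach suits auto-suggest.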
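Summary: A spellchecker corrects the search text that a user enters. For example, searching Baidu for "麻辣将" produces the suggestion shown in the figure below: We first use Lucene to implement this feature in a simple way. This post covers: a simple implementation, a brief introduction to the principles, and existing problems. Overview of the spellchecker in Lucene: Lucene's extension packages include a spellchecker, which makes it easy to implement spell checking, but the quality of the checking (the accuracy of its suggestions) has to be tuned and optimized by the developer. Steps for implementing spell checking with Lucene. Step 1: Build the index files the spellchecker needs. The spellchecker also relies on a Lucene index, except that it uses a special tokenization method and relevance scoring method. Building the spellche
Read more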
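Summary: SolrJ can be regarded as the Java client for Solr; it provides basic index maintenance, search, and other functions. SolrJ can communicate with the Solr server in two ways: first, over HTTP; second, by accessing the SolrCore directly (Solr configuration files, index files, and so on), with no HTTP communication required (without http). The class hierarchy of SolrJ's SolrServer is shown in the figure below: EmbeddedSolrServer is the second way; CommonsHttpSolrServer and LBHttpSolrServer are the first way, where LBHttpSolrServer adds load balancing (load balanced) on top of CommonsHttpSolrServer. In
Read more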
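Summary: Distributed Searching basics. On a single machine, search starts to struggle as the index grows larger. Solr lets us split the index (into multiple suitably sized indexes, called shards) and distribute them across multiple "servers". Solr accepts a search request on one server (single shard), dispatches it to each of the shards, and finally merges the results. For details, see: http://wiki.apache.org/solr/DistributedSearch 1. Running Distributed Searching with the shards parameter. We can add the shards parameter to a search request to run Distributed Searching; its format is: host:port/
Read more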
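Summary: Analysis of the Solr searching process (1): a rough walk-through of the searching process. I spent a while this afternoon looking at Solr's startup process; in detail it is quite tedious. For a change of approach, let's look at Solr's searching process first. 1. Intercept the request, parse it, and build the corresponding handler. Send a search request, for example: http://localhost:8983/solr3.5/core2/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on First, it is intercepted by SolrDispatchFilter. doFilter(ServletRequest request, ServletRespons
Read more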
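Summary: Sometimes, to meet specific needs, an existing tokenizer may have to be adjusted. To configure such a custom tokenizer, complete the following steps. (The basic idea is the same as configuring IKAnalyzer.) 1. Develop the custom tokenizer and extend Solr's BaseTokenizerFactory class. For example, the author adapted a tokenizer named OBOLowercaseTokenizer; its class hierarchy is shown in the figure below: Then write the BOBSolr class, which extends BaseTokenizerFactory. public class BOBSolr extends BaseTokenizerFactory { @Override public Tokenizer create(Reader input) { re.
Read more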
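Summary: The post http://www.cnblogs.com/huangfox/archive/2012/02/08/2342881.html describes how to deploy Solr into Eclipse; here we add IKAnalyzer on top of that setup. 1. Download the IKAnalyzer source and copy it into the solr3.5 project, as shown in the figure below: 2. Configure IKAnalyzer in schema.xml <!-- IKAnalyzer3.2.8 Chinese word segmentation --> <fieldType name="text" class="solr.TextField"> <analyzer type
Read more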
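Summary: Deploying Solr into Eclipse lets you debug and modify it, gives you more control, and is an effective way to learn Solr on your own. Environment: Eclipse: Eclipse Java EE IDE for Web Developers. Tomcat 6.0.35. Deployment steps: 1. Create a new [Dynamic Web project]. 2. Delete everything under WebContent, then copy the entire contents of apache-solr-3.5.0.war (found in the dist folder of the download) into WebContent. (The red error markers are an eyesore but otherwise harmless!) 3. Specify solrHome by adding the following to web.xml <env-entry> <env-entry-name>
Read more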
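Summary: [Quoted from 黑马王子's blog] Java's bitwise operators are: ~ bitwise NOT, & bitwise AND, | bitwise OR, ^ bitwise XOR, >> right shift, >>> unsigned right shift, << left shift. The first few are very simple; it is mainly the shift operations that are error-prone. First, be clear about the bit width of the operands: an int is 32 bits and a long is 64 bits. For example, int i = 1; the binary representation of i is: 00000000000000000000000000000001 long l = 1; the binary representation of l is: 00000000000000000000000000000000000000000000000000000000
Read more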
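The operators listed in the abstract above, and the reason operand width matters for shifts, can be seen in a short Java sketch. The class and method names below are hypothetical and exist only for illustration; the behavior shown is standard Java semantics.

```java
// Hypothetical demo class, for illustration only.
final class BitOpsDemo {

    // >> keeps the sign bit (arithmetic shift).
    static int signedRightShift(int value, int distance) {
        return value >> distance;
    }

    // >>> fills the vacated high bits with zeros (logical shift).
    static int unsignedRightShift(int value, int distance) {
        return value >>> distance;
    }

    // For an int, only the low 5 bits of the distance are used,
    // so shifting by 33 behaves like shifting by 1.
    static int leftShiftInt(int value, int distance) {
        return value << distance;
    }

    // For a long, the low 6 bits of the distance are used,
    // so shifting by 33 really moves the bits 33 positions.
    static long leftShiftLong(long value, int distance) {
        return value << distance;
    }

    public static void main(String[] args) {
        System.out.println(~5);                        // -6
        System.out.println(5 & 3);                     // 1
        System.out.println(5 | 3);                     // 7
        System.out.println(5 ^ 3);                     // 6
        System.out.println(signedRightShift(-8, 1));   // -4
        System.out.println(unsignedRightShift(-8, 1)); // 2147483644
        System.out.println(leftShiftInt(1, 33));       // 2
        System.out.println(leftShiftLong(1L, 33));     // 8589934592
    }
}
```

The last two lines show the pitfall the abstract warns about: the same shift distance gives different results for an int and a long, because the distance is reduced modulo the operand's bit width.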