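IKAnalyzer 4.0: a Chinese analyzer for apache-solr-4.0.0-ALPHA
I recently noticed that Solr has released 4.0 ALPHA, whose admin interface looks nicer than the 3.x one. After comparing IKAnalyzer with mmseg and with Lucene's SmartChineseAnalyzer, StandardAnalyzer, and CJKAnalyzer, I found IKAnalyzer the most pleasant to use. While configuring IKAnalyzer I discovered that some interfaces had changed, so I patched it according to the errors reported at startup. The result is this 4.0 build, which I have tested and confirmed to work.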
The new directory structure of IKAnalyzer is as follows:
IKAnalyzer 4.0 jar package ==> Download
After unpacking, put IKAnalyzer4.0.jar, IKAnalyzer.cfg, and stopword.dic into the lib directory under the solr directory.
Edit solrconfig.xml and add:
<lib dir="../../dist/" regex="apache-solr-analysis-extras-\d.*\.jar" />
<lib dir="../../contrib/analysis-extras/lucene-libs" regex=".*\.jar" />
Edit schema.xml and add:
<!-- IKAnalyzer -->
<fieldType name="text_ik" class="solr.TextField" >
  <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
  <analyzer type="index">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
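A field type only takes effect once a field refers to it. As a minimal sketch, a field declaration in schema.xml that uses the text_ik type could look like the following. The field name "content" and the indexed/stored settings are assumptions chosen for illustration; they are not part of the configuration above.

```xml
<!-- Hypothetical field: the name "content" is only an example.
     type="text_ik" points at the fieldType defined above. -->
<field name="content" type="text_ik" indexed="true" stored="true"/>
```

After restarting Solr, the Analysis page of the admin interface can be used to check how text in such a field is tokenized.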
While I am at it, here is the SmartChineseAnalyzer configuration as well:
<!-- Chinese -->
<fieldType name="text_zh-cn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PositionFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="lang/stopwords_zh-cn.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>
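If several fields should share the text_zh-cn type, one possible approach is a dynamic field rule in schema.xml, sketched below. The "*_zh" suffix pattern is a hypothetical naming convention assumed here for illustration, not something the configuration above requires.

```xml
<!-- Hypothetical rule: any field whose name ends in "_zh"
     (for example "title_zh") is analyzed with text_zh-cn. -->
<dynamicField name="*_zh" type="text_zh-cn" indexed="true" stored="true"/>
```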
If you spot any problems, please point them out so that we can all learn and improve together!