Intelligent Archive Search

 
  1. Configure pinyin search:

    Copy the two JAR files pinyin4j-2.5.0.jar and pinyinAnalyzer.jar into solr-8.5.0/server/solr-webapp/webapp/WEB-INF/lib, then edit the managed-schema file under solr-8.5.0/server/solr/conf.

    Add the following to that file. Here the fieldType is named text_pinyin:

    <fieldType name="text_pinyin" class="solr.TextField" positionIncrementGap="0">
      <analyzer type="index">
        <tokenizer class="org.apache.lucene.analysis.ik.IKTokenizerFactory"/>
        <filter class="com.shentong.search.analyzers.PinyinTransformTokenFilterFactory" minTermLenght="2"/>
        <filter class="com.shentong.search.analyzers.PinyinNGramTokenFilterFactory" maxGram="20" minGram="1"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="org.apache.lucene.analysis.ik.IKTokenizerFactory"/>
        <filter class="com.shentong.search.analyzers.PinyinTransformTokenFilterFactory" minTermLenght="2"/>
        <filter class="com.shentong.search.analyzers.PinyinNGramTokenFilterFactory" maxGram="20" minGram="1"/>
      </analyzer>
    </fieldType>
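
Defining the type has no effect until a field uses it. A minimal sketch of how that might look in managed-schema, assuming a hypothetical source field `title` and a hypothetical pinyin copy `title_pinyin` (neither name appears in the original configuration):

```xml
<!-- Hypothetical field names; replace them with fields from your own schema. -->
<!-- The pinyin field is only searched, so it does not need to be stored. -->
<field name="title_pinyin" type="text_pinyin" indexed="true" stored="false"/>

<!-- Copy the original text into the pinyin field at index time. -->
<copyField source="title" dest="title_pinyin"/>
```

After a schema change like this, the core has to be reloaded and existing documents reindexed before pinyin queries against `title_pinyin` return results.
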
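  2. Configure SuggestComponent: SuggestComponent provides users with automatic suggestions for query terms. Its main features are pluggable lookup implementations, pluggable term dictionaries (so you can choose the dictionary implementation flexibly), and distributed support.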

    The first step is to add a search component to solrconfig.xml and tell it to use SuggestComponent.

    <searchComponent name="suggest" class="solr.SuggestComponent">
      <lst name="suggester">
        <str name="name">mySuggester</str>
        <str name="lookupImpl">FuzzyLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">cat</str>
        <str name="weightField">price</str>
        <str name="suggestAnalyzerFieldType">string</str>
        <str name="buildOnStartup">false</str>
      </lst>
    </searchComponent>

    After adding the search component, a request handler must also be added to solrconfig.xml:

    <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
      <lst name="defaults">
        <str name="suggest">true</str>
        <str name="suggest.count">10</str>
      </lst>
      <arr name="components">
        <str>suggest</str>
      </arr>
    </requestHandler>
  3. Configure spell checking:

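    The SpellCheck component is designed to provide inline query suggestions based on other, similar terms.

    These suggestions can be based on terms in a field in Solr, an externally created text file, or fields in other Lucene indexes.

    Use the following configuration in solrconfig.xml: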

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">name</str>
        <str name="classname">solr.DirectSolrSpellChecker</str>
        <str name="distanceMeasure">internal</str>
        <float name="accuracy">0.5</float>
        <int name="maxEdits">2</int>
        <int name="minPrefix">1</int>
        <int name="maxInspections">5</int>
        <int name="minQueryLength">4</int>
        <int name="maxQueryLength">40</int>
        <float name="maxQueryFrequency">0.01</float>
        <float name="thresholdTokenFrequency">.01</float>
      </lst>
    </searchComponent>
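
The spellcheck component only runs when a request handler includes it. A minimal sketch of such a handler, assuming the handler name `/spell` and the default values shown here (they are illustrative choices, not part of the original configuration):

```xml
<!-- Assumed handler name and defaults; adjust to your own setup. -->
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <!-- Enable spell checking for every request to this handler. -->
    <str name="spellcheck">true</str>
    <!-- Must match the "name" of a spellchecker defined in the component. -->
    <str name="spellcheck.dictionary">default</str>
    <!-- Maximum number of suggestions returned per term. -->
    <str name="spellcheck.count">5</str>
  </lst>
  <!-- Run the spellcheck component after the standard query components. -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

A different dictionary can be selected per request by passing `spellcheck.dictionary` as a query parameter, which overrides the default above.
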

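    To use an external file as the spelling dictionary, use FileBasedSpellChecker. To use both dictionaries together, place both <lst name="spellchecker"> blocks inside a single spellcheck searchComponent rather than declaring the component twice.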

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="classname">solr.FileBasedSpellChecker</str>
        <str name="name">file</str>
        <str name="sourceLocation">spellings.txt</str>
        <str name="characterEncoding">UTF-8</str>
        <str name="spellcheckIndexDir">./spellcheckerFile</str>
      </lst>
    </searchComponent>

     

posted @ 2024-04-09 10:47  zwbsoft