(4) Lucene: Boosting Text Fields

I. Introduction

  1.1  Use case

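  • When searching, it is sometimes useful to assign different weights (boosts) to different keywords or index fields, so that content with a higher weight is easier for users to find and ranks nearer the top.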

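    A field boost is set before the index is written. At search time Lucene scores each matching document, and that score depends on the boost: with all other factors equal, a higher boost gives a higher score.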
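To make the relationship concrete, here is a minimal sketch of a simplified TF-IDF-style score. It is not Lucene's actual scoring code: the class `BoostScoreSketch`, the method `score`, and the formula itself (square root of the term frequency, times the boost, divided by the square root of the field length) are assumptions chosen only to show that, with everything else fixed, the score grows with the boost.

```java
// Toy model only: a simplified stand-in for a TF-IDF-style score,
// not Lucene's real implementation.
class BoostScoreSketch {

    /**
     * @param termFreq    how often the query term occurs in the field
     * @param fieldLength number of terms in the field
     * @param boost       index-time boost of the field (1.0f is neutral)
     */
    static double score(int termFreq, int fieldLength, float boost) {
        double tf = Math.sqrt(termFreq);
        double lengthNorm = 1.0 / Math.sqrt(fieldLength);
        return tf * lengthNorm * boost;
    }

    public static void main(String[] args) {
        // Same term frequency and field length, three different boosts.
        System.out.println(score(1, 4, 0.5f));
        System.out.println(score(1, 4, 1.0f));
        System.out.println(score(1, 4, 2.0f));
    }
}
```

In this toy model, doubling the boost doubles the score, and a boost below 1.0f lowers it.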

  1.2  Example

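import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

public class IndexTest2 {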

    private String ids[] = { "1", "2", "3", "4" };
    private String authors[] = { "Jack", "Marry", "John", "Json" };
    private String positions[] = { "accounting", "technician", "salesperson", "boss" };
    private String titles[] = { "Java is a good language.", "Java is a cross platform language", "Java powerful",
            "You should learn java" };
    private String contents[] = { "If possible, use the same JRE major version at both index and search time.",
            "When upgrading to a different JRE major version, consider re-indexing. ",
            "Different JRE major versions may implement different versions of Unicode,",
            "For example: with Java 1.4, `LetterTokenizer` will split around the character U+02C6," };

    /**
     * Creates the IndexWriter used to write the index.
     * 
     * @return an IndexWriter for the index directory
     * @throws IOException if the index directory cannot be opened
     */
    public IndexWriter getWriter() throws IOException {

        IndexWriter writer = null;
        Directory dir = FSDirectory.open(Paths.get("E:\\lucene3"));
        Analyzer analyzer = new StandardAnalyzer();
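        IndexWriterConfig conf = new IndexWriterConfig(analyzer);
        // Overwrite any existing index so that re-running index() does not add duplicate documents.
        conf.setOpenMode(IndexWriterConfig.OpenMode.CREATE);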

        writer = new IndexWriter(dir, conf);

        return writer;
    }

    /**
     * Builds the index.
     * 
     * @throws IOException if writing the index fails
     */
    @Test
    public void index() throws IOException {
        IndexWriter writer = getWriter();

        for (int i = 0; i < ids.length; i++) {
            Document doc = new Document();
            /**
             * A StringField is never tokenized, no matter how long the string is.
             * Use TextField when the value should be tokenized.
             */
            doc.add(new StringField("id", ids[i], Field.Store.YES));
            doc.add(new StringField("author", authors[i], Field.Store.YES));
            doc.add(new StringField("position", positions[i], Field.Store.YES));
            
            /**
             * Boost: double the weight of the title field when the author's position is "boss".
             */
            TextField field=new TextField("title", titles[i], Field.Store.YES);
            if(positions[i].equals("boss")) {
                field.setBoost(2.0f);
            }
            doc.add(field);
            doc.add(new TextField("content", contents[i], Field.Store.NO));
            
            writer.addDocument(doc);
        }
        writer.close();

    }

    /**
     * Searches the index by keyword.
     * @throws Exception if the index cannot be read
     */
    @Test
    public void search() throws Exception {

        // directory points to the folder that holds the index
        Directory directory = FSDirectory.open(Paths.get("E:\\lucene3"));
        IndexReader reader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);
        // key is the term to search for
        String key="java";
        Term t=new Term("title",key);
        Query query=new TermQuery(t);
        TopDocs hits=searcher.search(query, 20);
        System.out.println("Matched '"+key+"': "+hits.totalHits+" documents found");
        for(ScoreDoc scoreDoc:hits.scoreDocs) {
            Document doc=searcher.doc(scoreDoc.doc);
            System.out.println(doc.get("author"));
        }
        reader.close();
    }

}
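  • The boost is applied by the field.setBoost(2.0f) call inside index().
  • field.setBoost(2.0f) is not available in Lucene 7.0 and later; this article uses Lucene 5.5.0.
  • An index written with Lucene 5.5.0 can only be opened with Luke 5.5.0; other Luke versions report an error when opening it.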
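The comment in index() notes that a StringField is not tokenized while a TextField is. The following sketch imitates that difference with plain Java so it can be run without Lucene. The class `TokenizationSketch` and both helper methods are hypothetical stand-ins, and the "analyzer" here only lowercases and splits on non-alphanumeric characters; real Lucene analyzers do more.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: mimics the difference between an untokenized field
// (one term for the whole value) and an analyzed field (one term per word).
class TokenizationSketch {

    // Hypothetical stand-in for StringField: the whole value is a single term.
    static List<String> asSingleTerm(String value) {
        return Collections.singletonList(value);
    }

    // Hypothetical stand-in for TextField with a very simple analyzer:
    // lowercase, then split on anything that is not a letter or digit.
    static List<String> asAnalyzedTerms(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"));
    }

    public static void main(String[] args) {
        String title = "You should learn java";
        // A term lookup for "java" only matches when the value was tokenized.
        System.out.println(asSingleTerm(title).contains("java"));    // false
        System.out.println(asAnalyzedTerms(title).contains("java")); // true
    }
}
```

This is why the example indexes title as a TextField: the TermQuery for "java" in search() needs an individual "java" term to match against.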

 

  •  Result:

 

   1.3  Side note

  •  Without the boost, that is, with the following line removed from the code above:
field.setBoost(2.0f);
  • Result:

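  • This shows that the boost took effect. Json's position is "boss", so the boost on his title field was raised to 2.0f (a value below 1.0f lowers the weight instead).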
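The effect on ordering can be imitated without Lucene. The sketch below ranks the four example titles with a deliberately simplified score, the boost divided by the square root of the number of whitespace-separated words. The class `BoostRankingSketch`, the method `rank`, and the formula are assumptions for illustration; Lucene's analysis and scoring differ, so the output is that of the toy model and not a reproduction of the results above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: ranks the example titles by boost / sqrt(word count).
// Lucene's real scoring and analysis differ; the order is illustrative.
class BoostRankingSketch {

    static final String[] AUTHORS = { "Jack", "Marry", "John", "Json" };
    static final String[] POSITIONS = { "accounting", "technician", "salesperson", "boss" };
    static final String[] TITLES = { "Java is a good language.", "Java is a cross platform language",
            "Java powerful", "You should learn java" };

    /** Returns the authors ordered by descending toy score. */
    static List<String> rank(float bossBoost) {
        double[] scores = new double[TITLES.length];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < TITLES.length; i++) {
            // Only the document whose position is "boss" gets the boost.
            float boost = POSITIONS[i].equals("boss") ? bossBoost : 1.0f;
            int words = TITLES[i].split("\\s+").length;
            scores[i] = boost / Math.sqrt(words);
            order.add(i);
        }
        // Highest score first.
        order.sort(Comparator.comparingDouble((Integer i) -> -scores[i]));
        List<String> ranked = new ArrayList<>();
        for (int i : order) {
            ranked.add(AUTHORS[i]);
        }
        return ranked;
    }

    public static void main(String[] args) {
        System.out.println("no boost:   " + rank(1.0f));
        System.out.println("boost 2.0f: " + rank(2.0f));
        System.out.println("boost 0.5f: " + rank(0.5f));
    }
}
```

In the toy model, Json moves to the top with a boost of 2.0f and drops to the bottom with 0.5f, while the relative order of the other three authors stays the same.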

 

posted @ 2017-11-29 22:39  shyroke
Author: shyroke  Blog: http://www.cnblogs.com/shyroke/  Please credit the source when reposting.