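Lucene 3.6.2 package overview, a first Lucene example, the lukeall tool for inspecting index information, index contents as shown in Luke, and the index lookup process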

<span class="link_postdate">2014-12-07 23:39</span>
    <div class="bog_copyright">         
        <p class="copyright_p">Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission.</p>
    </div>


1  Lucene directory layout

2  lucene-core-3.6.2.jar is the core JAR for Lucene development

   The contrib directory holds a number of extension JARs.

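3  Example

Create the first Lucene project: lucene3_day1

   1) First convert the data into a Document object. Each piece of data becomes a Field(String name, String value, Field.Store store, Field.Index index).

   2) Specify the index location: Directory directory = FSDirectory.open(new File("index")); // the "index" directory under the current working directory

   3) Create the analyzer: Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);

   4) Write the index: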

IndexWriterConfig indexWriterConfig = new IndexWriterConfig(
      Version.LUCENE_36, analyzer);
IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);

// Write the document into the index
indexWriter.addDocument(document);
// Close the index writer
indexWriter.close();
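To give a feel for what the analyzer in step 3 contributes, here is a minimal plain-Java sketch of the general idea: text is split into tokens, lowercased, and common words are dropped before the terms go into the index. This is only an illustration. The class `ToyAnalyzer` and its stop-word list are hypothetical, and Lucene's `StandardAnalyzer` uses its own, more elaborate tokenization rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy stand-in for an analyzer. This is NOT Lucene's StandardAnalyzer. */
class ToyAnalyzer {
    // Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
    private static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "an", "the", "is", "of", "and"));

    /** Splits on anything that is not a letter or digit, lowercases, drops stop words. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<String>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            String term = token.toLowerCase(Locale.ROOT);
            // split() can yield an empty first token when the text starts with a separator
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [lucene, full, text, search, library]
        System.out.println(analyze("Lucene is a full-text search library"));
    }
}
```

The same kind of processing is applied to the query string at search time, which is why a query for "Lucene" can match a term that was stored in lowercase.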

Writing the example:

Project layout: (screenshot)

Article.java

package cn.toto.lucene.quickstart;

public class Article {
   private int id;
   private String title;
   private String content;

   /**
    * @return the id
    */
   public int getId() {
      return id;
   }

   /**
    * @param id the id to set
    */
   public void setId(int id) {
      this.id = id;
   }

   /**
    * @return the title
    */
   public String getTitle() {
      return title;
   }

   /**
    * @param title the title to set
    */
   public void setTitle(String title) {
      this.title = title;
   }

   /**
    * @return the content
    */
   public String getContent() {
      return content;
   }

   /**
    * @param content the content to set
    */
   public void setContent(String content) {
      this.content = content;
   }
}

package cn.toto.lucene.quickstart;

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

/**
 * @brief LuceneTest.java A test case that exercises Lucene
 * @attention
 * @author toto-pc
 * @date 2014-12-7
 * @note begin modify by 涂作权 2014/12/07 null
 */
public class LuceneTest {
   @Test
   public void buildIndex() throws Exception {
      Article article = new Article();
      article.setId(100);
      article.setTitle("Lucene quick start");
      article.setContent("Lucene provides a simple yet powerful API "
            + "for full-text indexing and search; in the Java world Lucene is "
            + "a mature, free, open-source tool.");

      // Convert the data to be indexed into a Document object (required by Lucene)
      Document document = new Document();
      document.add(new Field("id", // field name
            article.getId() + "", Store.YES, // store the original value so it can be read back
            Index.ANALYZED // run the value through the analyzer and index the resulting terms
      ));
      document.add(new Field("title", article.getTitle(), Store.YES, Index.ANALYZED));
      document.add(new Field("content", article.getContent(), Store.YES, Index.ANALYZED));

      // Build the index
      // Index directory location: the "index" directory under the current working directory
      Directory directory = FSDirectory.open(new File("index"));
      // Analyzer
      Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
      // Index writer
      IndexWriterConfig indexWriterConfig = new IndexWriterConfig(
            Version.LUCENE_36, analyzer);
      IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);

      // Write the document into the index
      indexWriter.addDocument(document);
      // Close the index writer
      indexWriter.close();
   }
}

Result of running the unit test: (screenshot)

Contents of the index directory after the run: (screenshot)

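4  The contents of the index can be inspected with the Luke tool (it is distributed as a JAR)

Download: http://code.google.com/p/luke/

How to open it: (screenshot)

If it cannot be opened that way, start it from the command line: go to the directory that contains the JAR, hold Shift and right-click, choose "Open command window here", and run: java -jar lukeall-3.5.0.jar

Screenshot of the tool: (screenshot)

Result after clicking OK: (screenshot)

The Overview tab shows the index information, and the Documents tab shows the document objects.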

5  Searching

The query code, added to the same test class as the code above, is as follows:

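// Additional imports needed in LuceneTest.java:
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

@Test
public void searchIndex() throws Exception {
   // Build the Query object: search by title
   String queryString = "Lucene";
   Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
   // QueryParser arguments: version, default field, analyzer
   QueryParser queryParser = new QueryParser(Version.LUCENE_36, "title", analyzer);
   Query query = queryParser.parse(queryString);

   // Search with the Query
   // Index directory location
   Directory directory = FSDirectory.open(new File("index"));
   IndexReader indexReader = IndexReader.open(directory);
   IndexSearcher indexSearcher = new IndexSearcher(indexReader);
   // Fetch at most the top 100 matching documents
   TopDocs topDocs = indexSearcher.search(query, 100);
   System.out.println("Number of matching records: " + topDocs.totalHits);

   // Read the results
   ScoreDoc[] scoreDocs = topDocs.scoreDocs;
   for (int i = 0; i < scoreDocs.length; i++) {
      // First get the document's internal ID
      int docID = scoreDocs[i].doc;
      Document document = indexSearcher.doc(docID);
      System.out.println("id:" + document.get("id"));
      System.out.println("title:" + document.get("title"));
      System.out.println("content:" + document.get("content"));
   }

   indexSearcher.close();
   // A reader passed to the IndexSearcher constructor is not closed by the searcher
   indexReader.close();
}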

Output: (screenshot)

 

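6  Index contents as shown in Luke

The information in the index falls into two main parts:

A  Index term entries

B  Document object information

   1) Every Field carries one Store setting and one Index setting.

   2) How the index content relates to the Document content:

      At search time, the index content is used to find the document object information.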
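The two parts of the index and the role of the Store and Index settings can be illustrated with a small plain-Java model. This is a sketch under stated assumptions, not how Lucene is implemented: the class `ToyIndexLibrary` and its boolean `store` / `index` parameters are hypothetical and only loosely mirror `Field.Store` and `Field.Index`.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of an index with two parts: term entries and stored documents. */
class ToyIndexLibrary {
    // Part A: "field:term" -> IDs of the documents that contain the term
    private final Map<String, Set<Integer>> termEntries = new HashMap<>();
    // Part B: document ID -> stored field values
    private final Map<Integer, Map<String, String>> storedDocs = new HashMap<>();

    /** Adds one field; 'store' and 'index' are independent choices. */
    void addField(int docId, String name, String value, boolean store, boolean index) {
        if (store) {
            // Keep the original value so it can be returned with search results
            storedDocs.computeIfAbsent(docId, k -> new HashMap<>()).put(name, value);
        }
        if (index) {
            // Naive whitespace tokenization (an assumption of this sketch)
            for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                termEntries.computeIfAbsent(name + ":" + token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Uses the term entries (part A) to find matching document IDs. */
    Set<Integer> lookup(String field, String term) {
        Set<Integer> ids = termEntries.get(field + ":" + term.toLowerCase(Locale.ROOT));
        return ids == null ? new TreeSet<Integer>() : ids;
    }

    /** Reads a stored value (part B); null if the field was not stored. */
    String stored(int docId, String field) {
        Map<String, String> doc = storedDocs.get(docId);
        return doc == null ? null : doc.get(field);
    }

    public static void main(String[] args) {
        ToyIndexLibrary lib = new ToyIndexLibrary();
        lib.addField(100, "title", "Lucene quick start", true, true);
        lib.addField(100, "content", "full text search", false, true);
        System.out.println(lib.lookup("content", "search")); // prints [100]
        System.out.println(lib.stored(100, "content"));      // prints null
        System.out.println(lib.stored(100, "title"));        // prints Lucene quick start
    }
}
```

In this model a field that is indexed but not stored can be searched yet its text cannot be read back, which is the kind of difference the Overview and Documents views in Luke make visible.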

 

7  The index lookup process
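As a rough illustration of the lookup process, the plain-Java sketch below walks through the steps in order: split the query into terms, look each term up in the term entries, rank the matching document IDs, and finally fetch the stored content for the top N. Everything here is an assumption made for illustration. The class `ToyLookup` is hypothetical, and ranking by the number of matching terms is a simplification that stands in for Lucene's real scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy walk-through of an index lookup; not Lucene's actual algorithm. */
class ToyLookup {
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final Map<Integer, String> storedTitles = new HashMap<>();

    void add(int docId, String title) {
        storedTitles.put(docId, title);
        for (String term : title.toLowerCase(Locale.ROOT).split("\\s+")) {
            List<Integer> docs = postings.computeIfAbsent(term, k -> new ArrayList<>());
            if (!docs.contains(docId)) {
                docs.add(docId);
            }
        }
    }

    /** Returns the stored titles of at most n matching documents, best match first. */
    List<String> search(String queryString, int n) {
        // Steps 1 and 2: split the query into terms and look each one up in the term entries
        Map<Integer, Integer> matchCounts = new HashMap<>();
        for (String term : queryString.toLowerCase(Locale.ROOT).split("\\s+")) {
            for (int docId : postings.getOrDefault(term, new ArrayList<>())) {
                matchCounts.merge(docId, 1, Integer::sum);
            }
        }
        // Step 3: rank by number of matching terms, ties broken by lower document ID
        List<Integer> ranked = new ArrayList<>(matchCounts.keySet());
        ranked.sort((a, b) -> {
            int byCount = matchCounts.get(b) - matchCounts.get(a);
            return byCount != 0 ? byCount : a - b;
        });
        // Step 4: use the document IDs to fetch the stored content of the top n
        List<String> results = new ArrayList<>();
        for (int i = 0; i < ranked.size() && i < n; i++) {
            results.add(storedTitles.get(ranked.get(i)));
        }
        return results;
    }

    public static void main(String[] args) {
        ToyLookup index = new ToyLookup();
        index.add(100, "Lucene quick start");
        index.add(101, "Lucene index search");
        index.add(102, "Java basics");
        // prints [Lucene index search, Lucene quick start]
        System.out.println(index.search("lucene search", 100));
    }
}
```

The last step corresponds to what `indexSearcher.doc(docID)` does in the search code of section 5: the search itself yields document IDs, and the stored fields are fetched by ID afterwards.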

 

posted @ 2017-11-28 15:32  星朝