Learning and Experimenting with Lucene
An experiment to see how Lucene is actually used.
References: http://www.importnew.com/12715.html (a fairly simple example)
http://www.yiibai.com/lucene/lucene_first_application.html (a more involved example)
Another example can be found here: http://www.tuicool.com/articles/aqIZNnE
The version I am using is fairly recent, 6.2.1. API documentation:
http://lucene.apache.org/core/6_2_1/core/index.html
First, create a Maven project in IntelliJ named lucene-demo. (Mainly based on http://www.importnew.com/12715.html )
Its pom.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.myapp</groupId>
    <artifactId>lucene-demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-core -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>6.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>6.2.1</version>
        </dependency>
    </dependencies>
</project>
Then create a package com.myapp.lucene containing the class LuceneDemo:
package com.myapp.lucene;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;

/**
 * Created by baidu on 16/10/20.
 */
public class LuceneDemo {

    // 0. Specify the analyzer for tokenizing text.
    //    The same analyzer should be used for indexing and searching.
    static StandardAnalyzer analyzer;
    static Directory index;

    static void prepareDoc() throws IOException {
        // 0. init analyzer
        analyzer = new StandardAnalyzer();

        // 1. create index
        index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter w = new IndexWriter(index, config);
        addDoc(w, "lucence tutorial", "123456");
        addDoc(w, "hi hi hi", "222");
        addDoc(w, "ok LUCENCE", "123");
        w.close();
    }

    static void addDoc(IndexWriter w, String text, String more) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("text", text, Field.Store.YES));
        doc.add(new StringField("more", more, Field.Store.YES));
        w.addDocument(doc);
    }

    static void search(String str) throws ParseException, IOException {
        // 2. query
        Query q = new QueryParser("text", analyzer).parse(str);

        // 3. search
        int listNum = 10;
        IndexReader reader = DirectoryReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopScoreDocCollector collector = TopScoreDocCollector.create(listNum);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        // 4. display
        System.out.printf("Found %d docs.\n", hits.length);
        for (int i = 0; i < hits.length; i++) {
            int docId = hits[i].doc;
            Document doc = searcher.doc(docId);
            System.out.printf("Doc %d: text: %s, more: %s\n", i + 1, doc.get("text"), doc.get("more"));
        }
        reader.close();
    }

    public static void main(String[] args) {
        try {
            prepareDoc();
            search("Lucence");
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }
}
Then run it; it succeeds. The query "Lucence" matches two of the three documents, since StandardAnalyzer lowercases terms at both index and query time:
Found 2 docs.
Doc 1: text: lucence tutorial, more: 123456
Doc 2: text: ok LUCENCE, more: 123

Process finished with exit code 0
Because a RAMDirectory is used, no actual directory or files should be created on disk.
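If you want the index persisted to disk instead, the RAMDirectory can be swapped for an FSDirectory. A minimal sketch under that assumption (the path /tmp/lucene-demo-index is only an illustrative choice, not part of the original demo):

import org.apache.lucene.store.FSDirectory;
import java.nio.file.Paths;

// Open (or create) an on-disk index directory instead of the in-memory RAMDirectory.
// Any writable directory works; the path below is just an example.
index = FSDirectory.open(Paths.get("/tmp/lucene-demo-index"));
IndexWriterConfig config = new IndexWriterConfig(analyzer);
IndexWriter w = new IndexWriter(index, config);
// ... the addDoc(...) calls and w.close() stay the same as in prepareDoc() above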
In addition, a few points in the code and logic are worth noting:
For content that needs to be tokenized we use TextField; for content that should not be tokenized, such as an id, we use StringField (see the sketch at the end of this section).
While coding I hit several compile errors about checked exceptions that had to be either caught and wrapped or declared with throws.
Some APIs have changed across versions and now take different parameters than before; the code above was adjusted to the current 6.2.1 requirements, and the newer signatures are generally simpler.
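To illustrate the TextField vs. StringField note above, here is a small sketch; the helper searchExact is my own addition rather than part of the original demo. A StringField value is indexed as a single untokenized term, so it only matches a TermQuery with the exact stored value, whereas TextField content is run through the analyzer.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

// Hypothetical helper, not part of the original LuceneDemo class.
// "more" is a StringField, so it is indexed as one untokenized term
// and must be matched exactly with a TermQuery.
static void searchExact(String more) throws IOException {
    IndexReader reader = DirectoryReader.open(index);
    IndexSearcher searcher = new IndexSearcher(reader);
    Query q = new TermQuery(new Term("more", more));
    TopDocs hits = searcher.search(q, 10);
    System.out.printf("Exact match on more=%s: %d docs\n", more, hits.totalHits);
    reader.close();
}

// searchExact("123456") finds the first document;
// searchExact("1234") finds nothing, because the id value is not tokenized into sub-terms.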