Lucene in Practice: IndexFile
After thinking about it for days, I have finally started learning Lucene.
Game Starts
References:
1. http://lucene.apache.org/core/4_9_0/demo/src-html/org/apache/lucene/demo/IndexFiles.html
2. http://www.ibm.com/developerworks/cn/java/j-lo-lucene1/
3. http://www.cnblogs.com/likehua/archive/2012/02/16/2354532.html
Required JARs
1) lucene-core-4.6.0.jar
2) lucene-analyzers-common-4.6.0.jar
3) lucene-queryparser-4.6.0.jar
http://archive.apache.org/dist/lucene/java/
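Main classes (see [Reference 2])
a) Document: wraps a document that is to be indexed
b) Field: describes one attribute of a document
c) Directory: a storage location, for example the directory on disk that holds the index
d) Analyzer: splits text into tokens
e) IndexWriterConfig: configuration for the IndexWriter
f) IndexWriter: the core class that creates the index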
What's Up
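How to update the index
After setting config.setOpenMode(OpenMode.CREATE); I assumed I had nothing to worry about, and wrote an IndexSearcher to test the index I had built.
I ran the main method of Indexer several times (which produced several index files instead of overwriting the old ones), and searching with IndexSearcher then returned duplicate results.
I had clearly set the open mode, so what was going on? I went looking for information.
Someone online suggested that the index might be locked, and indexWriter.isLocked(indexDir) did indeed return true. I promptly unlocked it with indexWriter.unlock(indexDir); and got an error. Lucene does not seem to like people doing that.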
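The `true` result above may have a simple cause: the commented-out check in the listing sits after the IndexWriter has been created, and an open IndexWriter holds the write lock on its Directory. The sketch below illustrates this under stated assumptions: it uses an in-memory RAMDirectory instead of the post's FSDirectory, and the class name `LockDemo` and method `lockStates` are made up for illustration. It assumes Lucene 4.6 on the classpath and Java 8 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo class, not part of the original post.
public class LockDemo {

    /** Returns { locked while a writer is open, locked after it is closed }. */
    public static boolean[] lockStates(Directory dir) {
        try {
            IndexWriterConfig config = new IndexWriterConfig(
                    Version.LUCENE_46, new StandardAnalyzer(Version.LUCENE_46));
            IndexWriter writer = new IndexWriter(dir, config);
            // isLocked is a static method; it reports whether the write lock is held.
            boolean whileOpen = IndexWriter.isLocked(dir);
            writer.close(); // closing the writer releases the write lock
            boolean afterClose = IndexWriter.isLocked(dir);
            return new boolean[] { whileOpen, afterClose };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] states = lockStates(new RAMDirectory());
        System.out.println("locked while open: " + states[0]);
        System.out.println("locked after close: " + states[1]);
    }
}
```

If this reading is right, the lock is the writer doing its job rather than a fault, and forcing it open with unlock while the writer is still alive is exactly what the lock is meant to prevent.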
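Reading further about Lucene's locking, I came across [Reference 3].
That is where the problem was. I then duplicated the call index(new File(data)); several times in the code so that they all ran with the same IndexWriter, and sure enough the index was overwritten.
But what about after a restart? indexWriter.deleteAll(); deletes everything in the index, so just rebuild it!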
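One possible explanation for the duplicates, which the post itself does not confirm: an open mode only takes effect on the IndexWriterConfig instance that is actually handed to the IndexWriter constructor, and a freshly created IndexWriterConfig defaults to CREATE_OR_APPEND. The following minimal sketch compares the two modes across separate writer sessions. It assumes Lucene 4.6, Java 8 or later, and an in-memory RAMDirectory; `OpenModeDemo` and `addOne` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo class, not part of the original post.
public class OpenModeDemo {

    /**
     * Opens a new writer on dir, adds one document, closes the writer and
     * returns how many documents the index then contains.
     * A null mode leaves the IndexWriterConfig default untouched.
     */
    public static int addOne(Directory dir, OpenMode mode) {
        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_46, new StandardAnalyzer(Version.LUCENE_46));
        if (mode != null) {
            config.setOpenMode(mode);
        }
        // The same config object that was configured is passed to the writer.
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            Document doc = new Document();
            doc.add(new StringField("name", "a.txt", Store.YES));
            writer.addDocument(doc);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Directory dir = new RAMDirectory();
        System.out.println("default mode, run 1: " + addOne(dir, null));
        System.out.println("default mode, run 2: " + addOne(dir, null));
        System.out.println("CREATE mode:         " + addOne(dir, OpenMode.CREATE));
    }
}
```

With the default mode each run appends to what is already there, which would look like duplicate search hits; with CREATE the writer starts from an empty index, so calling deleteAll() becomes optional.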
Always Be Coding
The code is based on [Reference 1].
package lucene;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {
    private static IndexWriter indexWriter;

    // index: the index directory; data: the directory of documents to index
    public static void index(String index, String data) {
        try {
            Directory indexDir = FSDirectory.open(new File(index)); // where the index is stored
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46); // tokenizer
            IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_46, analyzer);
            config.setOpenMode(OpenMode.CREATE);
            /* OpenMode controls whether an existing index is overwritten.
               APPEND opens an existing index.
               CREATE creates a new index or overwrites an existing one.
               CREATE_OR_APPEND creates a new index if one does not exist,
               otherwise it opens the index and documents will be appended.
            */
            // Pass the config prepared above. The original listing built a second
            // IndexWriterConfig here, so the CREATE setting never reached the writer
            // and the default mode, CREATE_OR_APPEND, was used instead.
            indexWriter = new IndexWriter(indexDir, config);
            indexWriter.deleteAll(); // the author's workaround: wipe whatever is already indexed
            // System.out.println(IndexWriter.isLocked(indexDir));
            // IndexWriter.unlock(indexDir);
            index(new File(data)); // build the index
            indexWriter.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void index(File dataFile) {
        if (dataFile.isDirectory()) { // recurse into directories
            File[] files = dataFile.listFiles();
            if (files == null) {
                return; // listFiles() returns null when the directory cannot be read
            }
            for (File file : files) {
                index(file);
            }
        } else {
            // try-with-resources closes the reader once the document has been added
            try (Reader reader = new FileReader(dataFile)) {
                Document doc = new Document(); // one Lucene document per file
                // Field(name, value, store): Store.YES indexes and stores the value, Store.NO only indexes it
                Field name = new StringField("name", dataFile.getName(), Store.YES); // note: StringField is not tokenized!
                doc.add(name);
                Field path = new StringField("path", dataFile.getAbsolutePath(), Store.YES);
                doc.add(path);
                Field content = new TextField("content", reader); // a Reader-based TextField is tokenized but not stored
                doc.add(content);
                indexWriter.addDocument(doc); // add the document to the index
            } catch (IOException e) { // also covers FileNotFoundException
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        index("C:/Users/Administrator/Desktop/df", "E:/data/data");
    }
}
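The recursive directory traversal in the listing can also be separated from the indexing step, which makes it easy to check which files would be indexed before touching Lucene at all. The sketch below is one way to do that with the standard library only (Java 8 or later); it is an alternative to the hand-written recursion, not what the referenced demo does, and `DataFiles` and `listRegularFiles` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original post.
public class DataFiles {

    /** Returns every regular file below root (at any depth), in sorted order. */
    public static List<Path> listRegularFiles(Path root) {
        // Files.walk visits root and everything beneath it; the stream must be closed.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Uses the first argument as the data directory, or the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : listRegularFiles(root)) {
            System.out.println(file);
        }
    }
}
```

Each returned Path could then be turned into a Document in the same way as in the listing, for example via `path.toFile()`.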
TO BE CONTINUED ……