Basics of commit, rollback and close in IndexWriter (API level)
commit:
Commits all pending changes (added & deleted documents, optimizations, segment merges, added indexes, etc.) to the index, and syncs all referenced index files, such that a reader will see the changes and the index updates will survive an OS or machine crash or power loss. Note that this does not wait for any running background merges to finish. This may be a costly operation, so you should test the cost in your application and do it only when really necessary.
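The Javadoc warns that commit is costly because it syncs every referenced index file. One common way to follow that advice is to commit in batches instead of after every document. The sketch below is plain Java, not Lucene code. `DocSink` is a hypothetical interface standing in for IndexWriter's `addDocument`/`commit`, and `BatchingCommitter` and `batchSize` are illustrative names.

```java
// Hypothetical minimal interface standing in for IndexWriter (not a Lucene type).
interface DocSink {
    void addDocument(String doc);
    void commit();
}

// Hypothetical helper: commits once every batchSize documents instead of after
// every add, so the cost of each commit is spread over a whole batch.
class BatchingCommitter {
    private final DocSink sink;
    private final int batchSize;
    private int sinceLastCommit = 0;

    BatchingCommitter(DocSink sink, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.sink = sink;
        this.batchSize = batchSize;
    }

    void add(String doc) {
        sink.addDocument(doc);
        if (++sinceLastCommit == batchSize) {
            flush();
        }
    }

    // Commits whatever is still pending; call this before shutting down.
    void flush() {
        if (sinceLastCommit > 0) {
            sink.commit();
            sinceLastCommit = 0;
        }
    }
}
```

The cost is that documents added since the last batch commit are not durable. They would not survive a crash, and readers opened on the directory cannot see them yet.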
rollback:
Close the IndexWriter without committing any changes that have occurred since the last commit (or since it was opened, if commit hasn't been called). This removes any temporary files that had been created, after which the state of the index will be the same as it was when commit() was last called or when this writer was first opened.
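Rollback works much like a system restore in an operating system. It returns the index to the state of the most recent commit. If the IndexWriter has not committed since it was opened, the index returns to the state it was in when the IndexWriter was opened.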
Note:
rollback also closes the current IndexWriter instance.
close:
Commits all changes to an index and closes all associated files. Note that this may be a costly operation, so, try to re-use a single writer instead of closing and opening a new one.
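The following toy in-memory model makes the three contracts concrete. `ToyWriter` is a hypothetical class in plain Java, not Lucene's implementation. A shared list stands for the committed index, and a private list holds changes buffered since the last commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the commit/rollback/close contract; not Lucene code.
class ToyWriter {
    private final List<String> committed;                    // stands for the committed index
    private final List<String> pending = new ArrayList<>();  // changes since the last commit
    private boolean open = true;

    ToyWriter(List<String> committed) {
        this.committed = committed; // a new writer starts from the last committed state
    }

    void addDocument(String doc) {
        ensureOpen();
        pending.add(doc);
    }

    // commit: publish all pending changes; the writer stays open.
    void commit() {
        ensureOpen();
        committed.addAll(pending);
        pending.clear();
    }

    // rollback: drop everything since the last commit, then close.
    void rollback() {
        ensureOpen();
        pending.clear();
        open = false;
    }

    // close: commit pending changes, then close (a no-op if already closed).
    void close() {
        if (!open) {
            return;
        }
        commit();
        open = false;
    }

    boolean isOpen() {
        return open;
    }

    private void ensureOpen() {
        if (!open) {
            throw new IllegalStateException("writer is closed");
        }
    }
}
```

In this model, a writer that has been rolled back rejects further adds. That matches the note above that rollback closes the IndexWriter.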
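The example below shows how the three operations behave and what to watch out for.
First, a class that manages the index directory (simple enough to need no explanation):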
import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.FSDirectory;
/**
 * @author huangfox
 * Manages the index directory.
 */
public class Dir {
    public static String path = "d:/realtime";
    public static FSDirectory dir = null;

    public static String getPath() {
        return path;
    }

    /** Lazily opens the index directory; returns null if it cannot be opened. */
    public static FSDirectory getDir() {
        if (dir == null) {
            try {
                // FSDirectory.open picks a suitable FSDirectory implementation for the platform
                dir = FSDirectory.open(new File(path));
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return dir;
    }

    public static void closeDir() {
        if (dir != null) {
            try {
                dir.close(); // Directory.close() declares IOException
            } catch (IOException e) {
                e.printStackTrace();
            }
            dir = null;
        }
    }
}
Next, the class that adds documents:
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
/**
 * @author huangfox
 * Tests adding documents.
 */
public class Writer {
    IndexWriter writer = null;
    FSDirectory dir = Dir.getDir();

    public Writer() {
        try {
            writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), MaxFieldLength.UNLIMITED);
            writer.setMaxBufferedDocs(4);
            System.out.println(writer.getMaxBufferedDocs() + ":max buffered docs");
            System.out.println(writer.getMaxMergeDocs() + ":max merge docs");
            System.out.println(writer.getRAMBufferSizeMB() + ":ram buffer size mb");
            System.out.println(writer.getMergeFactor() + ":merge factor");
        } catch (IOException e) {
            // also covers CorruptIndexException and LockObtainFailedException
            e.printStackTrace();
        }
    }

    /** Adds a document without committing. Returns 1 on success, -1 on failure. */
    public int add(String content) {
        return addDocument(content, false);
    }

    /** Adds a document and commits immediately. Returns 1 on success, -1 on failure. */
    public int addc(String content) {
        return addDocument(content, true);
    }

    private int addDocument(String content, boolean commit) {
        try {
            printWriterState();
            // Simulated failure: empty input triggers the rollback path.
            // Checked before building the Field, which rejects a null value.
            if (content == null || content.trim().equals("")) {
                throw new Exception("Simulated exception, triggering rollback!");
            }
            Document doc = new Document();
            doc.add(new Field("f", content, Store.YES, Index.ANALYZED));
            writer.addDocument(doc);
            if (commit) {
                writer.commit();
            }
            return 1;
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
            try {
                // rollback() discards everything since the last commit AND closes the writer,
                // so this Writer instance cannot be used afterwards.
                writer.rollback();
                System.out.println("Rolled back...");
            } catch (IOException e1) {
                e1.printStackTrace();
            }
        }
        return -1;
    }

    /** Prints the version and document count seen by the writer's near-real-time reader. */
    private void printWriterState() throws IOException {
        IndexReader r = writer.getReader();
        try {
            System.out.println("version : " + r.getVersion());
            System.out.println("num doc : " + r.numDocs());
        } finally {
            r.close(); // getReader() opens a new reader on every call, so close it
        }
    }

    /**
     * Closes the IndexWriter instance (this commits pending changes).
     */
    public void close() {
        if (writer != null) {
            try {
                writer.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
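This class provides two methods for adding documents.
The difference:
add: adds the document without committing;
addc: adds the document and commits immediately.
Both share a single IndexWriter instance.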
Next, a class for testing search:
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
/**
 * @author huangfox
 * Search.
 */
public class Search {
    IndexSearcher searcher = null;

    public Search() {
        FSDirectory dir = Dir.getDir();
        try {
            // The searcher sees the index as of the last commit at the moment it is opened.
            searcher = new IndexSearcher(dir);
        } catch (IOException e) {
            // also covers CorruptIndexException
            e.printStackTrace();
        }
    }

    /**
     * Reopens the IndexSearcher: closes the old instance and opens a new one on the latest commit.
     */
    public void reopen() {
        closeSearcher();
        FSDirectory dir = Dir.getDir();
        try {
            searcher = new IndexSearcher(dir);
            System.out.println(searcher.getIndexReader().numDocs());
            long version = searcher.getIndexReader().getVersion();
            System.out.println("version : " + version);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Runs a query and prints the matching documents.
     * @param queryString
     */
    public void search(String queryString) {
        try {
            Query query = new QueryParser(Version.LUCENE_30, "f", new StandardAnalyzer(Version.LUCENE_30)).parse(queryString);
            TopScoreDocCollector results = TopScoreDocCollector.create(10, true);
            searcher.search(query, results);
            System.out.println("total hits :" + results.getTotalHits());
            // topDocs() returns at most the 10 hits the collector was created for
            ScoreDoc[] docs = results.topDocs().scoreDocs;
            for (int i = 0; i < docs.length; i++) {
                System.out.println(searcher.doc(docs[i].doc));
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }

    /**
     * Closes the IndexSearcher instance.
     */
    public void closeSearcher() {
        if (this.searcher != null) {
            try {
                this.searcher.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
            this.searcher = null;
        }
    }
}
Note the reopen method: it closes the current IndexSearcher and constructs a new instance.
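Why reopening is needed can be modeled with point-in-time snapshots. A searcher is pinned to the commit that was current when it was opened. Reopening only yields something new if a newer commit exists. The classes below are a hypothetical plain-Java model, not Lucene code. As far as we know, Lucene 3.0's `IndexReader.reopen()` documents a similar contract: it returns the same instance when the index has not changed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the index directory: an ordered list of commits.
class ToyDirectory {
    private final List<List<String>> commits = new ArrayList<>();

    ToyDirectory() {
        commits.add(Collections.<String>emptyList()); // generation 0: empty index
    }

    void commit(List<String> docs) {
        commits.add(Collections.unmodifiableList(new ArrayList<>(docs)));
    }

    int latestGeneration() {
        return commits.size() - 1;
    }

    List<String> docsAt(int generation) {
        return commits.get(generation);
    }
}

// Hypothetical point-in-time searcher: sees only the commit it was opened on.
class ToySearcher {
    final ToyDirectory dir;
    final int generation;

    ToySearcher(ToyDirectory dir) {
        this.dir = dir;
        this.generation = dir.latestGeneration(); // snapshot taken at open time
    }

    int hits(String prefix) {
        int n = 0;
        for (String d : dir.docsAt(generation)) {
            if (d.startsWith(prefix)) {
                n++;
            }
        }
        return n;
    }

    // Returns this instance if nothing was committed since it was opened, else a new searcher.
    ToySearcher reopen() {
        return dir.latestGeneration() == generation ? this : new ToySearcher(dir);
    }
}
```

Unlike the article's `reopen`, which always closes and rebuilds, this variant skips the rebuild when no commit has happened. That is cheaper when reopen is called often.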
Finally, a main control program for running the tests:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
/**
 * @author huangfox
 * Control program.
 */
public class MainApp {
    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        Writer w = new Writer();
        Search s = new Search();
        String order = "";
        while (true) {
            try {
                System.out.println("Enter command:");
                order = reader.readLine();
                if (order == null || order.equals("e")) {
                    // "e" (or end of input) exits: close the writer, the searcher and the directory
                    w.close();
                    s.closeSearcher();
                    Dir.closeDir();
                    break;
                } else if (order.equals("add")) {
                    System.out.println("Field content:");
                    w.add(reader.readLine());
                } else if (order.equals("addc")) {
                    System.out.println("Field content:");
                    w.addc(reader.readLine());
                } else if (order.equals("sea")) {
                    System.out.println("Query:");
                    s.search(reader.readLine());
                } else if (order.equals("reopen")) {
                    s.reopen();
                    System.out.println("searcher was reopened...");
                }
                // any other input is ignored and the prompt is shown again
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
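Running the tests:
Test 1:
Purpose:
While the current IndexWriter instance is still open, a newly added document becomes visible to an IndexSearcher only if both of these conditions hold:
the IndexWriter commits after adding the document;
the IndexSearcher instance is reopened.
Procedure:
add a document;
search (expected: not found);
addc a document;
search (expected: still not found);
reopen;
search (expected: found).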
Results:
2147483647:max merge docs
16.0:ram buffer size mb
10:merge factor
Enter command:
add
Field content:
fox1
version : 1287380905032
num doc : 0
Enter command:
sea
Query:
fox*
total hits :0    // Note: a document that was added but not committed is invisible to the IndexSearcher.
Enter command:
addc
Field content:
fox2
version : 1287380905032
num doc : 1    // Note: this count comes from the writer's near-real-time reader (getReader()), which already sees the uncommitted fox1.
Enter command:
sea
Query:
fox*
total hits :0    // Note: the document was committed, but the IndexSearcher was not reopened, so it is still invisible.
Enter command:
reopen
2
version : 1287380905033
searcher was reopened...
Enter command:
sea
Query:
fox*
total hits :2
Document<stored,indexed,tokenized<f:fox1>>
Document<stored,indexed,tokenized<f:fox2>>    // Note: after reopening the IndexSearcher, both documents are visible.
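Conclusion:
For updates to the current index to become visible to an IndexSearcher opened on the directory, the IndexWriter must commit after the update, and the IndexSearcher instance must be reopened.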
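The sequence of Test 1 can be replayed against a small hypothetical model (plain Java, not Lucene). The model has a list of commits, a writer-side buffer, and searchers pinned to a commit generation. `ToyIndex` and its methods are illustrative names. The model also shows why fox1 shows up after `addc("fox2")`: per the Javadoc quoted at the top, commit publishes *all* pending changes, not just the latest document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of Test 1 (not Lucene code): a list of commits plus a writer buffer.
class ToyIndex {
    private final List<List<String>> commits = new ArrayList<>();
    private final List<String> buffer = new ArrayList<>();

    ToyIndex() {
        commits.add(new ArrayList<String>()); // generation 0: the empty index
    }

    // Like Writer.add: buffer the document, do not commit.
    void add(String doc) {
        buffer.add(doc);
    }

    // Like Writer.addc: buffer the document, then commit everything buffered so far.
    void addc(String doc) {
        buffer.add(doc);
        List<String> next = new ArrayList<>(commits.get(commits.size() - 1));
        next.addAll(buffer);
        buffer.clear();
        commits.add(next);
    }

    // Opening (or reopening) a searcher pins it to the latest commit generation.
    int openSearcher() {
        return commits.size() - 1;
    }

    // Counts documents starting with prefix, as seen by a searcher pinned to a generation.
    int hits(int generation, String prefix) {
        int n = 0;
        for (String d : commits.get(generation)) {
            if (d.startsWith(prefix)) {
                n++;
            }
        }
        return n;
    }
}
```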
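Test 2:
Purpose:
rollback discards all updates made since the most recent commit.
Procedure:
Test 1 has already added two documents (fox1, fox2).
add a document (fox3);
add a document (fox4);
add an empty string, which raises an exception and triggers rollback (the IndexWriter instance is now closed);
restart the control program (MainApp);
addc a document (fox5);
reopen;
sea (expected: fox1, fox2 and fox5).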
Results:
The output of the rollback triggered by the simulated exception is omitted here.
2147483647:max merge docs
16.0:ram buffer size mb
10:merge factor
Enter command:
addc
Field content:
fox5
version : 1287380905033
num doc : 2
Enter command:
reopen
3
version : 1287380905034
searcher was reopened...
Enter command:
sea
Query:
fox*
total hits :3
Document<stored,indexed,tokenized<f:fox1>>
Document<stored,indexed,tokenized<f:fox2>>
Document<stored,indexed,tokenized<f:fox5>>
Enter command:
The results match the expectations.
Conclusion:
rollback discards every update made since the most recent commit.
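In Test 2 the control program had to be restarted because rollback closes the writer. The following is a minimal sketch of an alternative that opens a fresh writer right after the rollback. It is plain Java and uses hypothetical names (`MiniWriter`, `RecoveringWriter`, a `Supplier` factory); it is not Lucene's API. In Lucene terms the factory would construct a new IndexWriter on the same Directory, which assumes the rolled-back writer has released its write lock when it closed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical toy writer (not Lucene code): rollback drops pending docs and closes it.
class MiniWriter {
    private final List<String> disk;                        // committed documents
    private final List<String> pending = new ArrayList<>(); // added since the last commit
    private boolean open = true;

    MiniWriter(List<String> disk) {
        this.disk = disk;
    }

    void add(String doc) {
        checkOpen();
        pending.add(doc);
    }

    void commit() {
        checkOpen();
        disk.addAll(pending);
        pending.clear();
    }

    void rollback() {
        checkOpen();
        pending.clear();
        open = false;
    }

    private void checkOpen() {
        if (!open) {
            throw new IllegalStateException("writer is closed");
        }
    }
}

// Hypothetical wrapper: after a rollback it opens a fresh writer instead of requiring a restart.
class RecoveringWriter {
    private final Supplier<MiniWriter> factory;
    private MiniWriter current;

    RecoveringWriter(Supplier<MiniWriter> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    // Returns true on success; an empty doc simulates the failure used in Test 2.
    boolean add(String doc, boolean commit) {
        if (doc == null || doc.trim().isEmpty()) {
            current.rollback();      // discard everything since the last commit (closes the writer)
            current = factory.get(); // open a new writer on the committed state
            return false;
        }
        current.add(doc);
        if (commit) {
            current.commit();
        }
        return true;
    }
}
```

Replaying Test 2 in this model (fox1 and fox2 committed; add fox3 and fox4; empty input; addc fox5) leaves fox1, fox2 and fox5 committed, matching the observed result.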
------------------------------------
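Back when I did lab experiments in school, I always found writing lab reports tedious and pointless...
Looking back now, writing one is a purposeful, planned process. It helps you straighten out your thinking, reach conclusions step by step, and distill what you have learned.
The experiment above is simple, but it made me appreciate the value of writing a lab report!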