Implementing near-real-time indexing with Lucene
Related write-up: https://blog.csdn.net/cdnight/article/details/40273519
Near-real-time (NRT) search makes content searchable before the IndexWriter has committed it.
How the index is refreshed:
Only a commit on the IndexWriter fully syncs the data held in the in-memory RAM directory to files on disk.
IndexWriter provides an API for obtaining a near-real-time reader. That call triggers a flush, which produces a new segment but does not commit (no fsync), so it saves I/O. The new segment is included in the newly opened reader, and the updates are visible through that reader.
Therefore, as long as each new search obtains a fresh reader from the IndexWriter, it can see the latest content. The cost of this operation is only a flush, which is much cheaper than a commit.
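A minimal sketch of this flush-only reader, using the same Lucene 4.x-era API and index path as the rest of this post (the method name and field values are just for illustration):
public void testNrtReader() throws IOException {
    Directory directory = FSDirectory.open(new File("/root/data/03"));
    IndexWriter writer = new IndexWriter(directory,
            new IndexWriterConfig(Version.LATEST, new StandardAnalyzer()));

    Document doc = new Document();
    doc.add(new StringField("id", "99", Store.YES));
    doc.add(new TextField("title", "nrt demo doc", Store.YES));
    writer.addDocument(doc);                 // note: no commit here

    // Flushes new segments and returns a reader that can see them,
    // but does not fsync anything to disk (applyAllDeletes = true).
    DirectoryReader nrtReader = DirectoryReader.open(writer, true);
    IndexSearcher searcher = new IndexSearcher(nrtReader);
    TopDocs hits = searcher.search(new TermQuery(new Term("title", "nrt")), 10);
    System.out.println(hits.totalHits);      // the uncommitted doc is visible

    nrtReader.close();
    writer.rollback();                       // discard the uncommitted change
}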
Lucene organizes an index as multiple segments under one index directory. New documents go into new, small segments, and these small segments are merged together every so often; because of merging, the total number of segments stays small and overall search speed stays fast.
To avoid read/write conflicts, Lucene only ever creates new segments, and for any active reader it deletes old segments only after they are no longer in use.
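Merging is driven by the writer's merge policy; a minimal sketch of configuring it (TieredMergePolicy is Lucene's default, and the numbers below are only illustrative, not recommendations):
// The merge policy decides when small segments get merged into larger ones.
IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, new StandardAnalyzer());
TieredMergePolicy mergePolicy = new TieredMergePolicy();
mergePolicy.setSegmentsPerTier(10.0);        // allow roughly 10 segments per tier before merging
mergePolicy.setMaxMergedSegmentMB(512.0);    // cap the size of merged segments
config.setMergePolicy(mergePolicy);
IndexWriter writer = new IndexWriter(FSDirectory.open(new File("/root/data/03")), config);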
A flush only writes data into the operating system's buffers; as long as those buffers are not full, there is no disk I/O.
A commit writes everything in the in-memory buffers through to disk and is a purely disk-bound, heavyweight operation. One reason it is so costly is that the main structure of a Lucene index, the postings (inverted lists), is stored as tightly packed VInt-encoded deltas. Merging the postings of the same term therefore requires a merge sort: read them out, merge them, and write them back out.
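As a rough illustration (this is not Lucene's actual codec, just the general delta plus variable-length-integer idea the postings format is built on), encoding a sorted docID list as gaps keeps most values small enough to fit in a single byte:
static void vintDeltaDemo() {
    // Illustration only: a sorted docID list stored as VInt-encoded gaps.
    // Small gaps take one byte each, which is why postings are compact
    // and why merging them means decoding, merging, and re-encoding.
    int[] docIds = {3, 5, 20, 21, 100};
    java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
    int prev = 0;
    for (int docId : docIds) {
        int delta = docId - prev;              // store the gap, not the absolute id
        prev = docId;
        while ((delta & ~0x7F) != 0) {         // 7 data bits per byte;
            out.write((delta & 0x7F) | 0x80);  // high bit set means "more bytes follow"
            delta >>>= 7;
        }
        out.write(delta);
    }
    System.out.println(docIds.length + " docIDs encoded in " + out.size() + " bytes"); // 5 bytes here
}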
How near-real-time search with SearcherManager works:
Lucene implements near-real-time search through the NRTManager class. "Near real time" here means that changes to the index are reflected back to the calling program within a very short time via a tracking thread. NRTManager wraps an IndexWriter and exposes its add/update/delete methods (addDocument, deleteDocument, and so on) to clients. All of these operations happen in memory, so as long as you never call IndexWriter's commit method, the index on disk does not change; remember to commit after updating so that the changes are also written to disk. (Since Lucene 4.4, NRTManager's role is played by TrackingIndexWriter together with ControlledRealTimeReopenThread, which is what the code below uses.)
After an update, there are two ways for the caller to get an up-to-date IndexSearcher:
The first is to run an NRTManagerReopenThread. This thread tracks changes to the in-memory index and calls maybeReopen on each change, keeping the newest generation of the index open. The caller obtains its IndexSearcher by asking NRTManager for a SearcherManager via getSearcherManager and then acquiring an IndexSearcher from it; after use, the caller returns the IndexSearcher with SearcherManager's release, and finally remembers to close the NRTManagerReopenThread.
The second is to skip the NRTManagerReopenThread and call NRTManager's maybeReopen directly to obtain the latest IndexSearcher.
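In current Lucene versions the equivalent of this second approach is to call SearcherManager.maybeRefresh() (or maybeRefreshBlocking()) yourself right before acquiring a searcher. A minimal sketch, assuming a writer-backed SearcherManager like the one created in the code below:
// Second approach: no reopen thread; refresh the SearcherManager on demand.
// maybeRefreshBlocking() is cheap if nothing has changed since the last refresh.
void searchLatest(SearcherManager sm) throws IOException {
    sm.maybeRefreshBlocking();              // pick up any uncommitted changes
    IndexSearcher searcher = sm.acquire();
    try {
        TopDocs results = searcher.search(new TermQuery(new Term("title", "test")), 100);
        System.out.println(results.totalHits);
    } finally {
        sm.release(searcher);               // always release what you acquire
    }
}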
Searching the index with a SearcherManager:
public void testSearch() throws IOException {
    Directory directory = FSDirectory.open(new File("/root/data/03"));
    // SearcherManager opened directly on the directory (no IndexWriter, so no NRT view)
    SearcherManager sm = new SearcherManager(directory, null);
    IndexSearcher searcher = sm.acquire();
    // Equivalent without SearcherManager:
    // IndexReader reader = DirectoryReader.open(directory);
    // IndexSearcher searcher = new IndexSearcher(reader);
    Query query = new TermQuery(new Term("title", "test"));
    TopDocs results = searcher.search(query, null, 100);
    System.out.println(results.totalHits);
    ScoreDoc[] docs = results.scoreDocs;
    for (ScoreDoc doc : docs) {
        System.out.println("doc internal id:" + doc.doc + " ,doc score:" + doc.score);
        Document document = searcher.doc(doc.doc);
        System.out.println("id:" + document.get("id") + " ,title:" + document.get("title"));
    }
    sm.release(searcher);
    sm.close();
}
Updating a document and searching it near-real-time (no commit):
public void testUpdateAndSearch() throws IOException, InterruptedException {
    Directory directory = FSDirectory.open(new File("/root/data/03"));
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
    config.setOpenMode(OpenMode.CREATE_OR_APPEND);
    IndexWriter writer = new IndexWriter(directory, config);
    // TrackingIndexWriter assigns a generation to every change so that
    // the reopen thread can tell when that change becomes searchable.
    TrackingIndexWriter trackingWriter = new TrackingIndexWriter(writer);
    SearcherManager sm = new SearcherManager(writer, true, null);
    // Reopen at most every 60s, and at least every 1s when a caller is waiting.
    ControlledRealTimeReopenThread<IndexSearcher> thread =
            new ControlledRealTimeReopenThread<IndexSearcher>(trackingWriter, sm, 60, 1);
    thread.setDaemon(true);
    thread.setName("NRT Index Manager Thread");
    thread.start();

    Document doc = new Document();
    Field idField = new StringField("id", "3", Store.YES);
    Field titleField = new TextField("title", "test for 3", Store.YES);
    doc.add(idField);
    doc.add(titleField);
    long generation = trackingWriter.updateDocument(new Term("id", "2"), doc);
    // Block until the reopen thread has made this generation searchable.
    // No commit happens here: the change lives only in memory until writer.commit().
    // Alternatively, without the reopen thread: sm.maybeRefresh();
    thread.waitForGeneration(generation);

    IndexSearcher searcher = sm.acquire();
    Query query = new TermQuery(new Term("title", "test"));
    TopDocs results = searcher.search(query, null, 100);
    System.out.println(results.totalHits);
    ScoreDoc[] docs = results.scoreDocs;
    for (ScoreDoc scoreDoc : docs) {
        System.out.println("doc internal id:" + scoreDoc.doc + " ,doc score:" + scoreDoc.score);
        Document document = searcher.doc(scoreDoc.doc);
        System.out.println("id:" + document.get("id") + " ,title:" + document.get("title"));
    }
    sm.release(searcher);
    sm.close();
    // In real code, also close the reopen thread and commit/close the writer on shutdown.
}
Building the index:
public void testBuildIndex() throws IOException {
    Directory directory = FSDirectory.open(new File("/root/data/03"));
    // Directory directory = new RAMDirectory();   // in-memory alternative
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
    config.setOpenMode(OpenMode.CREATE);   // overwrite any existing index
    IndexWriter writer = new IndexWriter(directory, config);

    Document doc1 = new Document();
    Field idField1 = new StringField("id", "1", Store.YES);
    Field titleField1 = new TextField("title", "test for 1", Store.YES);
    doc1.add(idField1);
    doc1.add(titleField1);
    writer.addDocument(doc1);

    Document doc2 = new Document();
    Field idField2 = new StringField("id", "2", Store.YES);
    Field titleField2 = new TextField("title", "test for 2", Store.YES);
    doc2.add(idField2);
    doc2.add(titleField2);
    writer.addDocument(doc2);

    writer.commit();
    writer.close();
}