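Lucene Index Operations: Add, Delete, Update, and Search
Preface
Anyone working on search has probably come across Lucene. It is open source and easy to pick up, and the official API documentation is enough to write small demos. Built on an inverted index, it delivers fast retrieval. This post walks through simple implementations of incrementally adding to an index, deleting from it, querying by keyword, and updating it.
What I currently find inconvenient is that, for full-text search over file contents, you have to write the file-reading code yourself (Solr provides this out of the box). Index creation is also fairly slow and leaves plenty of room for optimization, which calls for more careful study.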
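To make the inverted index mentioned in the preface more concrete, here is a minimal plain-Java sketch. It is not Lucene's actual implementation: the class `TinyInvertedIndex` and its methods are hypothetical, and tokenization is assumed to be nothing more than lowercasing and splitting on non-alphanumeric characters. The point is that each term maps to the IDs of the documents containing it, so a keyword lookup is one map access rather than a scan over every document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical toy inverted index; illustrates the idea only, not Lucene internals. */
final class TinyInvertedIndex {
    // term -> sorted IDs of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    // stored original text, addressed by document ID
    private final List<String> docs = new ArrayList<>();

    /** Stores the text, indexes its terms, and returns the new document ID. */
    int add(String text) {
        int docId = docs.size();
        docs.add(text);
        // Crude tokenizer (an assumption): lowercase, split on non-alphanumerics.
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** Returns the IDs of the documents containing the term, via one map lookup. */
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("hello,man!");
        index.add("goodbye,man!");
        index.add("hello,woman!");
        System.out.println(index.search("man"));   // prints [0, 1]
        System.out.println(index.search("hello")); // prints [0, 2]
    }
}
```

Note that "woman" is indexed as its own term, so a search for "man" does not match the third document.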
Creating an index
The previous post already covered the overall flow of creating an index with Lucene; here is a brief recap:
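```java
// Create the analyzer first; the writer configuration needs it.
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
// FSDirectory.open takes a File here, matching the rest of this post.
Directory directory = FSDirectory.open(new File("/tmp/testindex"));
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
IndexWriter iwriter = new IndexWriter(directory, config);
Document doc = new Document();
String text = "This is the text to be indexed.";
doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
// Without this call the document never reaches the index.
iwriter.addDocument(doc);
iwriter.close();
```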
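1 Create a Directory that points at the index directory
2 Create the analyzer and the IndexWriter
3 Create a Document to hold the data and add it to the IndexWriter
4 Close the IndexWriter, which commits the changes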
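Adding to an index incrementally
Lucene supports incremental indexing: new documents can be added without affecting what is already indexed, and Lucene merges the index files automatically at an appropriate time.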
```java
/**
 * Add a document to the index.
 *
 * @throws Exception
 */
public static void insert() throws Exception {
    String text5 = "hello,goodbye,man,woman";
    Date date1 = new Date();
    analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    directory = FSDirectory.open(new File(INDEX_DIR));

    IndexWriterConfig config = new IndexWriterConfig(
            Version.LUCENE_CURRENT, analyzer);
    indexWriter = new IndexWriter(directory, config);

    Document doc1 = new Document();
    doc1.add(new TextField("filename", "text5", Store.YES));
    doc1.add(new TextField("content", text5, Store.YES));
    indexWriter.addDocument(doc1);

    indexWriter.commit();
    indexWriter.close();

    Date date2 = new Date();
    System.out.println("Insert took: " + (date2.getTime() - date1.getTime()) + "ms\n");
}
```
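Deleting from an index
Deletion also goes through IndexWriter, via its deleteDocuments method. You can delete by keyword (a Term), which removes every document that contains that term. If you only want to remove a single document, it is best to define a unique ID field and delete by that field.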
```java
/**
 * Delete documents from the index.
 *
 * @param str the keyword to delete by
 * @throws Exception
 */
public static void delete(String str) throws Exception {
    Date date1 = new Date();
    analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    directory = FSDirectory.open(new File(INDEX_DIR));

    IndexWriterConfig config = new IndexWriterConfig(
            Version.LUCENE_CURRENT, analyzer);
    indexWriter = new IndexWriter(directory, config);

    // Removes every document whose "filename" field contains this term.
    indexWriter.deleteDocuments(new Term("filename", str));

    indexWriter.close();

    Date date2 = new Date();
    System.out.println("Delete took: " + (date2.getTime() - date1.getTime()) + "ms\n");
}
```
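The difference between deleting by keyword and deleting by a unique ID can be pictured with a small in-memory model. This is plain Java, not Lucene code: the class `DeleteByTermSketch`, its `Doc` type, and both delete methods are hypothetical, and the sample documents simply mirror the ones indexed elsewhere in this post. It only aims to show why a keyword can remove several documents at once while a unique ID removes at most one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory model of delete-by-term; not Lucene code. */
final class DeleteByTermSketch {
    static final class Doc {
        final String id;          // unique, matched as a whole value
        final Set<String> terms;  // terms produced from the content
        Doc(String id, String... terms) {
            this.id = id;
            this.terms = new HashSet<>(Arrays.asList(terms));
        }
    }

    /** Removes every document whose content contains the term; returns how many. */
    static int deleteByContentTerm(List<Doc> docs, String term) {
        int before = docs.size();
        docs.removeIf(d -> d.terms.contains(term));
        return before - docs.size();
    }

    /** Removes the document with this ID; at most one if IDs are unique. */
    static int deleteById(List<Doc> docs, String id) {
        int before = docs.size();
        docs.removeIf(d -> d.id.equals(id));
        return before - docs.size();
    }

    /** Builds a fresh list mirroring the four sample documents of this post. */
    static List<Doc> sampleDocs() {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("text1", "hello", "man"));
        docs.add(new Doc("text2", "goodbye", "man"));
        docs.add(new Doc("text3", "hello", "woman"));
        docs.add(new Doc("text4", "goodbye", "woman"));
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(deleteByContentTerm(sampleDocs(), "man")); // prints 2
        System.out.println(deleteById(sampleDocs(), "text3"));        // prints 1
    }
}
```

In Lucene itself, a Term passed to deleteDocuments is matched against the terms as they were indexed, so an ID field is usually indexed as a single untokenized value (for example with StringField) rather than run through an analyzer.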
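Updating an index
Lucene has no true in-place update. You can update the documents matched by a term on a given field, but under the hood it deletes the matching documents and then adds the new document.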
```java
/**
 * Update a document in the index.
 *
 * @throws Exception
 */
public static void update() throws Exception {
    String text1 = "update,hello,man!";
    Date date1 = new Date();
    analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    directory = FSDirectory.open(new File(INDEX_DIR));

    IndexWriterConfig config = new IndexWriterConfig(
            Version.LUCENE_CURRENT, analyzer);
    indexWriter = new IndexWriter(directory, config);

    Document doc1 = new Document();
    doc1.add(new TextField("filename", "text1", Store.YES));
    doc1.add(new TextField("content", text1, Store.YES));

    // Deletes the documents matching the term, then adds doc1.
    indexWriter.updateDocument(new Term("filename", "text1"), doc1);

    indexWriter.close();

    Date date2 = new Date();
    System.out.println("Update took: " + (date2.getTime() - date1.getTime()) + "ms\n");
}
```
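The delete-then-add behaviour described in this section can be sketched with a small in-memory model. This is plain Java rather than Lucene code, and the class `UpdateSketch` with its `Doc` type is hypothetical. It illustrates two consequences of the approach: the replacement is a whole new document, so any field left out of it is gone, and an update whose term matches nothing behaves like a plain add.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of update-as-delete-then-add; not Lucene code. */
final class UpdateSketch {
    static final class Doc {
        final String filename;
        final String content;
        Doc(String filename, String content) {
            this.filename = filename;
            this.content = content;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    void add(Doc doc) {
        docs.add(doc);
    }

    /** Deletes every document with a matching filename, then appends the replacement. */
    void update(String filename, Doc replacement) {
        docs.removeIf(d -> d.filename.equals(filename)); // step 1: delete
        docs.add(replacement);                           // step 2: add
    }

    List<Doc> all() {
        return docs;
    }

    public static void main(String[] args) {
        UpdateSketch index = new UpdateSketch();
        index.add(new Doc("text1", "hello,man!"));
        index.add(new Doc("text2", "goodbye,man!"));
        index.update("text1", new Doc("text1", "update,hello,man!"));
        for (Doc d : index.all()) {
            System.out.println(d.filename + " -> " + d.content);
        }
        // prints:
        // text2 -> goodbye,man!
        // text1 -> update,hello,man!
    }
}
```

The updated document ends up at the end of the list because it is a newly added entry, not a modified copy of the old one.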
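Querying the index by keyword
Lucene offers many kinds of queries, which are not covered in detail here. A search returns an array of ScoreDoc, loosely comparable to a ResultSet. Each hit's document can then be loaded, and the desired content read from it by field name.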
```java
/**
 * Keyword search.
 *
 * @param str the query string
 * @throws Exception
 */
public static void search(String str) throws Exception {
    directory = FSDirectory.open(new File(INDEX_DIR));
    analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    DirectoryReader ireader = DirectoryReader.open(directory);
    IndexSearcher isearcher = new IndexSearcher(ireader);

    // Parse the query string against the "content" field.
    QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content", analyzer);
    Query query = parser.parse(str);

    // Fetch up to 1000 hits, with no filter.
    ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
    for (int i = 0; i < hits.length; i++) {
        Document hitDoc = isearcher.doc(hits[i].doc);
        System.out.println(hitDoc.get("filename"));
        System.out.println(hitDoc.get("content"));
    }
    ireader.close();
    directory.close();
}
```
Full code
```java
package test;

import java.io.File;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class TestLucene {
    // Index directory
    private static String INDEX_DIR = "D:\\luceneIndex";
    private static Analyzer analyzer = null;
    private static Directory directory = null;
    private static IndexWriter indexWriter = null;

    public static void main(String[] args) {
        try {
            // index();
            search("man");
            // insert();
            // delete("text5");
            // update();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Update a document in the index.
     *
     * @throws Exception
     */
    public static void update() throws Exception {
        String text1 = "update,hello,man!";
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text1", Store.YES));
        doc1.add(new TextField("content", text1, Store.YES));

        indexWriter.updateDocument(new Term("filename", "text1"), doc1);

        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Update took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }

    /**
     * Delete documents from the index.
     *
     * @param str the keyword to delete by
     * @throws Exception
     */
    public static void delete(String str) throws Exception {
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        indexWriter.deleteDocuments(new Term("filename", str));

        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Delete took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }

    /**
     * Add a document to the index.
     *
     * @throws Exception
     */
    public static void insert() throws Exception {
        String text5 = "hello,goodbye,man,woman";
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text5", Store.YES));
        doc1.add(new TextField("content", text5, Store.YES));
        indexWriter.addDocument(doc1);

        indexWriter.commit();
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Insert took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }

    /**
     * Build the index.
     *
     * @throws Exception
     */
    public static void index() throws Exception {
        String text1 = "hello,man!";
        String text2 = "goodbye,man!";
        String text3 = "hello,woman!";
        String text4 = "goodbye,woman!";

        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text1", Store.YES));
        doc1.add(new TextField("content", text1, Store.YES));
        indexWriter.addDocument(doc1);

        Document doc2 = new Document();
        doc2.add(new TextField("filename", "text2", Store.YES));
        doc2.add(new TextField("content", text2, Store.YES));
        indexWriter.addDocument(doc2);

        Document doc3 = new Document();
        doc3.add(new TextField("filename", "text3", Store.YES));
        doc3.add(new TextField("content", text3, Store.YES));
        indexWriter.addDocument(doc3);

        Document doc4 = new Document();
        doc4.add(new TextField("filename", "text4", Store.YES));
        doc4.add(new TextField("content", text4, Store.YES));
        indexWriter.addDocument(doc4);

        indexWriter.commit();
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Index creation took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }

    /**
     * Keyword search.
     *
     * @param str the query string
     * @throws Exception
     */
    public static void search(String str) throws Exception {
        directory = FSDirectory.open(new File(INDEX_DIR));
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        DirectoryReader ireader = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);

        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content", analyzer);
        Query query = parser.parse(str);

        ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
        for (int i = 0; i < hits.length; i++) {
            Document hitDoc = isearcher.doc(hits[i].doc);
            System.out.println(hitDoc.get("filename"));
            System.out.println(hitDoc.get("content"));
        }
        ireader.close();
        directory.close();
    }
}
```