[Repost] Lucene performance tuning, with data
The data is quite old, but it is still a useful reference:
lucene.commit.batch.size=0
lucene.commit.time.interval=0
These properties enable batched commits. You can set either how many document changes a batch contains (a commit happens after X documents are modified) or a time interval in milliseconds (a commit happens every X milliseconds).
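The two triggers can be sketched as a small decision helper. This is only an illustration: CommitBatchPolicy and its method are hypothetical names, not part of Lucene or of the code behind these properties. It also assumes that leaving both values at 0 means "commit on every change", which the text above does not state, and it checks the time trigger only when a document changes, whereas a real implementation might use a scheduler.

```java
/**
 * Hypothetical helper (not a Lucene class) that decides when a batched
 * commit is due: after a number of document changes, or after a time interval.
 * A value of 0 disables the corresponding trigger.
 */
final class CommitBatchPolicy {
    private final int batchSize;        // plays the role of lucene.commit.batch.size
    private final long timeIntervalMs;  // plays the role of lucene.commit.time.interval
    private int pendingChanges = 0;
    private long lastCommitMs;

    CommitBatchPolicy(int batchSize, long timeIntervalMs, long startMs) {
        this.batchSize = batchSize;
        this.timeIntervalMs = timeIntervalMs;
        this.lastCommitMs = startMs;
    }

    /** Records one document change and reports whether a commit should run now. */
    boolean onDocumentChanged(long nowMs) {
        pendingChanges++;
        // Assumption: with both triggers disabled, every change is committed.
        boolean bothDisabled = batchSize <= 0 && timeIntervalMs <= 0;
        boolean sizeReached = batchSize > 0 && pendingChanges >= batchSize;
        boolean timeReached = timeIntervalMs > 0 && nowMs - lastCommitMs >= timeIntervalMs;
        if (bothDisabled || sizeReached || timeReached) {
            pendingChanges = 0;
            lastCommitMs = nowMs;
            return true;
        }
        return false;
    }
}
```

With a batch size of 3, the third change triggers a commit and the counter starts over. With a time interval of 30000, the first change seen 30 seconds or more after the last commit triggers one.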
lucene.buffer.size=16
This value is passed to IndexWriter's setRAMBufferSizeMB method. It is the maximum memory, in megabytes, that Lucene uses before flushing documents; a higher number means fewer disk writes.
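As a rough sketch of how such a setting might be read before it reaches setRAMBufferSizeMB, the helper below parses lucene.buffer.size from a java.util.Properties object. LuceneBufferSetting is a hypothetical name, and falling back to 16 for missing or invalid values is an assumption based on the default shown above. The Lucene call itself is left as a comment so that the example does not depend on a particular Lucene version.

```java
import java.util.Properties;

/** Hypothetical helper for reading the buffer size setting; not part of Lucene. */
final class LuceneBufferSetting {
    // Assumed default, taken from the lucene.buffer.size=16 line above.
    static final double DEFAULT_MB = 16.0;

    /** Returns lucene.buffer.size in megabytes, or the default if absent or invalid. */
    static double ramBufferSizeMb(Properties props) {
        String raw = props.getProperty("lucene.buffer.size");
        if (raw == null) {
            return DEFAULT_MB;
        }
        try {
            double mb = Double.parseDouble(raw.trim());
            // Reject zero, negative, NaN and infinite values.
            return (mb > 0 && !Double.isInfinite(mb)) ? mb : DEFAULT_MB;
        } catch (NumberFormatException e) {
            return DEFAULT_MB;
        }
    }
    // The returned value would then be handed to setRAMBufferSizeMB(double),
    // which lives on IndexWriter in older Lucene releases and on
    // IndexWriterConfig in later ones.
}
```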
These 3 new properties make reindexing faster. In my stress tests I saw a 30% improvement in certain configurations. I also learned the following while tweaking Lucene properties:
A higher lucene.buffer.size helps a lot during reindexing, but there is an upper limit beyond which reindexing does not get any faster. For example, 32MB and 48MB gave the same results, but 32MB was much better than the default of 16MB.
It's a bad idea to set lucene.merge.factor to a very high number: there will be many more disk writes, which degrades performance. Keep it at 10.
Set lucene.autocommit.documents.interval to the number of documents in the index, so that only 1 commit happens. I expected this property to bring bigger gains, but it made reindexing only around 15% faster.
Setting a higher lucene.optimize.interval can bring some improvement, but since the reindex process also runs searches, it's important to optimize the index often during reindexing. You need to find a balance.
My stress tests consisted of reindexing 30,000 blog entries on an Intel quad-core 2.66GHz machine with 4GB of RAM running 32-bit Ubuntu. The best result was around 4 minutes with this configuration:
lucene.commit.time.interval=30000
lucene.merge.factor=10
lucene.optimize.interval=10000
lucene.buffer.size=48
The worst result was 7:35 minutes (not counting the run where I set lucene.merge.factor to 1000):
lucene.commit.time.interval=0
lucene.merge.factor=50
lucene.optimize.interval=1000
lucene.buffer.size=16
Here are the other results:
lucene.commit.time.interval=0
lucene.merge.factor=15
lucene.optimize.interval=30000
lucene.buffer.size=16
7:30 minutes
lucene.commit.time.interval=0
lucene.merge.factor=10
lucene.optimize.interval=100
lucene.buffer.size=16
7:18 minutes
lucene.commit.time.interval=10000
lucene.merge.factor=10
lucene.optimize.interval=100
lucene.buffer.size=16
6:23 minutes
lucene.commit.time.interval=1000
lucene.merge.factor=10
lucene.optimize.interval=100
lucene.buffer.size=16
6:00 minutes
lucene.commit.time.interval=30000
lucene.merge.factor=10
lucene.optimize.interval=100
lucene.buffer.size=32
5:00 minutes
lucene.commit.time.interval=30000
lucene.merge.factor=10
lucene.optimize.interval=100
lucene.buffer.size=48
5:00 minutes
lucene.commit.time.interval=30000
lucene.merge.factor=10
lucene.optimize.interval=50000
lucene.buffer.size=48
5:00 minutes
lucene.commit.time.interval=15000
lucene.merge.factor=10
lucene.optimize.interval=10000
lucene.buffer.size=48
5:00 minutes
lucene.commit.time.interval=30000
lucene.merge.factor=10
lucene.optimize.interval=1000
lucene.buffer.size=48
4:30 minutes
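To compare any two of the timings above, a small helper can convert them to seconds and compute the relative reduction. This is a sketch for the arithmetic only; ReindexTimings is a hypothetical name and was not part of the original tests.

```java
/** Hypothetical helper for comparing the "m:ss" timings listed above. */
final class ReindexTimings {
    /** Parses "m:ss" (for example "7:35") into a number of seconds. */
    static int toSeconds(String minutesAndSeconds) {
        String[] parts = minutesAndSeconds.trim().split(":");
        return Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
    }

    /** Percentage by which the 'after' time is shorter than the 'before' time. */
    static double reductionPercent(String before, String after) {
        int beforeSeconds = toSeconds(before);
        int afterSeconds = toSeconds(after);
        return 100.0 * (beforeSeconds - afterSeconds) / beforeSeconds;
    }
}
```

For example, going from the worst listed run (7:35, or 455 seconds) to the 4:30 run (270 seconds) is a reduction of 185 seconds, roughly 40.7% of the original time.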
Author: today4king
Source: https://www.cnblogs.com/jinzhao/archive/2012/04/12/2444440.html
License: This work is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International license.