Why MongoDB Memory Usage Is High: An Analysis
2023-06-09 15:14 abce
The WT (WiredTiger) cache setting (cacheSizeGB) only controls the memory used by the WT storage engine, not the memory used by the whole mongod instance.
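In a MongoDB/WT deployment, memory is needed in many other places as well:
·WT compresses data on disk, but data held in memory is not compressed
·By default WT does not fsync on every commit, so journal (log) records are also buffered in memory. In addition, to use I/O more efficiently, WT splits I/O requests into chunks, which also takes some memory
·WT keeps multiple versions of a record in memory
·WT verifies checksums of the data in the cache
·MongoDB itself needs memory to handle connections, aggregations and other server-side code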
To check how much memory the instance is using:
> db.serverStatus().mem
{
    "bits" : 64,
    "resident" : 6931,
    "virtual" : 12383,
    "supported" : true
}
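Where:
·bits: whether the mongod instance is 32-bit or 64-bit
·resident: roughly equivalent to the amount of RAM, in MiB, currently used by the database process
·virtual: the amount of virtual memory used by the process, in MiB
·supported: whether the underlying system supports extended memory information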
Common scenarios of high memory usage
1. High memory usage by the WT engine
To check how the WT engine is using memory:
> db.serverStatus().wiredTiger.cache
{
    "application threads page read from disk to cache count" : 221561964,
    "application threads page read from disk to cache time (usecs)" : NumberLong("21902911799"),
    "application threads page write from cache to disk count" : 11929121,
    "application threads page write from cache to disk time (usecs)" : 119207458,
    "bytes allocated for updates" : 74468710,
    "bytes belonging to page images in the cache" : NumberLong("5350309948"),
    "bytes belonging to the history store table in the cache" : 47652,
    "bytes currently in the cache" : NumberLong("5426249734"),
    "bytes dirty in the cache cumulative" : NumberLong("788539520241"),
    "bytes not belonging to page images in the cache" : 75939786,
    "bytes read into cache" : NumberLong("11331422628708"),
    "bytes written from cache" : NumberLong("538082043341"),
    "cache overflow score" : 0,
    "checkpoint blocked page eviction" : 2515,
    "checkpoint of history store file blocked non-history store page eviction" : 0,
    "eviction calls to get a page" : 222623846,
    "eviction calls to get a page found queue empty" : 1558561,
    "eviction calls to get a page found queue empty after locking" : 2561019,
    "eviction currently operating in aggressive mode" : 0,
    "eviction empty score" : 0,
    "eviction gave up due to detecting an out of order on disk value behind the last update on the chain" : 0,
    "eviction gave up due to detecting an out of order tombstone ahead of the selected on disk update" : 0,
    "eviction gave up due to detecting an out of order tombstone ahead of the selected on disk update after validating the update chain" : 0,
    "eviction gave up due to detecting out of order timestamps on the update chain after the selected on disk update" : 0,
    "eviction passes of a file" : 12029193,
    "eviction server candidate queue empty when topping up" : 1056789,
    "eviction server candidate queue not empty when topping up" : 1630370,
    "eviction server evicting pages" : 0,
    "eviction server slept, because we did not make progress with eviction" : 8284783,
    "eviction server unable to reach eviction goal" : 0,
    "eviction server waiting for a leaf page" : 39546587,
    "eviction state" : 64,
    "eviction walk most recent sleeps for checkpoint handle gathering" : 118,
    "eviction walk target pages histogram - 0-9" : 7814276,
    "eviction walk target pages histogram - 10-31" : 1239602,
    "eviction walk target pages histogram - 128 and higher" : 0,
    "eviction walk target pages histogram - 32-63" : 679302,
    "eviction walk target pages histogram - 64-128" : 2296013,
    "eviction walk target pages reduced due to history store cache pressure" : 0,
    "eviction walk target strategy both clean and dirty pages" : 74687,
    "eviction walk target strategy only clean pages" : 11230619,
    "eviction walk target strategy only dirty pages" : 723887,
    "eviction walks abandoned" : 206303,
    "eviction walks gave up because they restarted their walk twice" : 7297449,
    "eviction walks gave up because they saw too many pages and found no candidates" : 320600,
    "eviction walks gave up because they saw too many pages and found too few candidates" : 11151,
    "eviction walks reached end of tree" : 14992319,
    "eviction walks restarted" : 0,
    "eviction walks started from root of tree" : 7757919,
    "eviction walks started from saved location in tree" : 4271274,
    "eviction worker thread active" : 4,
    "eviction worker thread created" : 0,
    "eviction worker thread evicting pages" : 218236828,
    "eviction worker thread removed" : 0,
    "eviction worker thread stable number" : 0,
    "files with active eviction walks" : 0,
    "files with new eviction walks started" : 7694870,
    "force re-tuning of eviction workers once in a while" : 0,
    "forced eviction - history store pages failed to evict while session has history store cursor open" : 6,
    "forced eviction - history store pages selected while session has history store cursor open" : 1243,
    "forced eviction - history store pages successfully evicted while session has history store cursor open" : 0,
    "forced eviction - pages evicted that were clean count" : 67803,
    "forced eviction - pages evicted that were clean time (usecs)" : 88110,
    "forced eviction - pages evicted that were dirty count" : 6442,
    "forced eviction - pages evicted that were dirty time (usecs)" : 16444442,
    "forced eviction - pages selected because of a large number of updates to a single item" : 6247,
    "forced eviction - pages selected because of too many deleted items count" : 41589,
    "forced eviction - pages selected count" : 93559,
    "forced eviction - pages selected unable to be evicted count" : 260,
    "forced eviction - pages selected unable to be evicted time" : 245,
    "hazard pointer blocked page eviction" : 122473,
    "hazard pointer check calls" : 218364087,
    "hazard pointer check entries walked" : 1636852354,
    "hazard pointer maximum array length" : 1,
    "history store score" : 0,
    "history store table insert calls" : 8602760,
    "history store table insert calls that returned restart" : 0,
    "history store table max on-disk size" : 0,
    "history store table on-disk size" : 36864,
    "history store table out-of-order resolved updates that lose their durable timestamp" : 0,
    "history store table out-of-order updates that were fixed up by reinserting with the fixed timestamp" : 0,
    "history store table reads" : 0,
    "history store table reads missed" : 0,
    "history store table reads requiring squashed modifies" : 0,
    "history store table truncation by rollback to stable to remove an unstable update" : 0,
    "history store table truncation by rollback to stable to remove an update" : 0,
    "history store table truncation to remove an update" : 0,
    "history store table truncation to remove range of updates due to key being removed from the data page during reconciliation" : 1349,
    "history store table truncation to remove range of updates due to out-of-order timestamp update on data page" : 0,
    "history store table writes requiring squashed modifies" : 86,
    "in-memory page passed criteria to be split" : 71721,
    "in-memory page splits" : 19164,
    "internal pages evicted" : 565347,
    "internal pages queued for eviction" : 630835,
    "internal pages seen by eviction walk" : 28818968,
    "internal pages seen by eviction walk that are already queued" : 98864,
    "internal pages split during eviction" : 183,
    "leaf pages split during eviction" : 54266,
    "maximum bytes configured" : NumberLong("7837057024"),
    "maximum page size at eviction" : 360,
    "modified pages evicted" : 758270,
    "modified pages evicted by application threads" : 0,
    "operations timed out waiting for space in cache" : 0,
    "overflow pages read into cache" : 0,
    "page split during eviction deepened the tree" : 9,
    "page written requiring history store records" : 327330,
    "pages currently held in the cache" : 2741,
    "pages evicted by application threads" : 773,
    "pages evicted in parallel with checkpoint" : 1802978,
    "pages queued for eviction" : 267242512,
    "pages queued for eviction post lru sorting" : 271117576,
    "pages queued for urgent eviction" : 6514928,
    "pages queued for urgent eviction during walk" : 970,
    "pages queued for urgent eviction from history store due to high dirty content" : 0,
    "pages read into cache" : 221574300,
    "pages read into cache after truncate" : 144789,
    "pages read into cache after truncate in prepare state" : 0,
    "pages requested from the cache" : 1532689858,
    "pages seen by eviction walk" : 376103054,
    "pages seen by eviction walk that are already queued" : 5006678,
    "pages selected for eviction unable to be evicted" : 143333,
    "pages selected for eviction unable to be evicted because of active children on an internal page" : 14363,
    "pages selected for eviction unable to be evicted because of failure in reconciliation" : 11,
    "pages selected for eviction unable to be evicted because of race between checkpoint and out of order timestamps handling" : 0,
    "pages walked for eviction" : NumberLong("4275968401"),
    "pages written from cache" : 13366790,
    "pages written requiring in-memory restoration" : 61027,
    "percentage overhead" : 8,
    "the number of times full update inserted to history store" : 3683821,
    "the number of times reverse modify inserted to history store" : 4918939,
    "tracked bytes belonging to internal pages in the cache" : 1264324,
    "tracked bytes belonging to leaf pages in the cache" : NumberLong("5424985410"),
    "tracked dirty bytes in the cache" : 14941279,
    "tracked dirty pages in the cache" : 11,
    "unmodified pages evicted" : 217443882
}
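Most of these counters are cumulative, so a quick first check is to compare the current cache size and dirty bytes against the configured maximum. The following is a minimal sketch rather than an official tool: the helper names toNum and cacheSummary are hypothetical, and it assumes the field names shown in the output above. It works on a plain object, so it can be fed db.serverStatus().wiredTiger.cache in mongosh or a saved copy in Node.js.

```javascript
// Convert shell numeric wrappers (e.g. NumberLong) or plain numbers to a JS number.
// Assumption: wrapper objects expose toNumber(); anything else goes through Number().
function toNum(v) {
  if (v !== null && typeof v === "object" && typeof v.toNumber === "function") {
    return v.toNumber();
  }
  return Number(v);
}

// Express current cache usage and dirty data as a percentage of the configured maximum.
function cacheSummary(cache) {
  const max = toNum(cache["maximum bytes configured"]);
  const used = toNum(cache["bytes currently in the cache"]);
  const dirty = toNum(cache["tracked dirty bytes in the cache"]);
  return {
    usedPct: (used / max) * 100,
    dirtyPct: (dirty / max) * 100,
  };
}

// Values copied from the sample output above.
const sampleCache = {
  "maximum bytes configured": 7837057024,
  "bytes currently in the cache": 5426249734,
  "tracked dirty bytes in the cache": 14941279,
};

const summary = cacheSummary(sampleCache);
console.log(
  summary.usedPct.toFixed(1) + "% of the cache in use, " +
  summary.dirtyPct.toFixed(2) + "% dirty"
);
```

For the sample above the cache is roughly 69% full with very little dirty data, which suggests the WT cache itself is not under pressure in this instance.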
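2. High memory usage by sessions and connections
·The background thread serving each connection can use up to 1 MB of memory, typically a few hundred KB
·The read and write buffers of TCP connections are controlled by the kernel parameters tcp_rmem and tcp_wmem
·Each request has its own context, and several temporary buffers may be allocated for it. When freed, these buffers first go back to the TCMalloc cache and are only later released to the operating system. Memory shortages are often caused by TCMalloc not releasing these temporary buffers promptly
Use db.serverStatus().tcmalloc to inspect this. TCMalloc cache size = pageheap_free_bytes + total_free_bytes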
> db.serverStatus().tcmalloc
Formatted output:
> var mem = db.serverStatus().tcmalloc;
> mem.tcmalloc.formattedString
------------------------------------------------
MALLOC:     5222431464 ( 4980.5 MiB) Bytes in use by application
MALLOC: +   1980387328 ( 1888.6 MiB) Bytes in page heap freelist
MALLOC: +     61509992 (   58.7 MiB) Bytes in central cache freelist
MALLOC: +         8960 (    0.0 MiB) Bytes in transfer cache freelist
MALLOC: +     31453360 (   30.0 MiB) Bytes in thread cache freelists
MALLOC: +     34078720 (   32.5 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   7329869824 ( 6990.3 MiB) Actual memory used (physical + swap)
MALLOC: +   3715723264 ( 3543.6 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  11045593088 (10533.9 MiB) Virtual address space used
MALLOC:
MALLOC:          31525              Spans in use
MALLOC:            214              Thread heaps in use
MALLOC:           4096              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
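The TCMalloc cache size can also be computed directly from the numeric fields. The sketch below is illustrative only: the helper names toNum and tcmallocCacheBytes are hypothetical, and it assumes that total_free_bytes is the sum of the central cache, transfer cache and thread cache freelists shown in the formatted output. The input is the inner tcmalloc subdocument, i.e. db.serverStatus().tcmalloc.tcmalloc.

```javascript
// Convert shell numeric wrappers (e.g. NumberLong) or plain numbers to a JS number.
// Assumption: wrapper objects expose toNumber(); anything else goes through Number().
function toNum(v) {
  if (v !== null && typeof v === "object" && typeof v.toNumber === "function") {
    return v.toNumber();
  }
  return Number(v);
}

// Memory TCMalloc holds in its free lists but has not yet returned to the OS.
function tcmallocCacheBytes(t) {
  return toNum(t.pageheap_free_bytes) + toNum(t.total_free_bytes);
}

// Values reconstructed from the formatted output above.
// Assumption: total_free_bytes = central + transfer + thread cache freelists.
const sampleTcmalloc = {
  pageheap_free_bytes: 1980387328,
  total_free_bytes: 61509992 + 8960 + 31453360,
};

const cacheBytes = tcmallocCacheBytes(sampleTcmalloc);
console.log((cacheBytes / (1024 * 1024)).toFixed(1) + " MiB held in the TCMalloc cache");
```

In this sample close to 2 GiB sits in TCMalloc free lists, which is a sizable share of the roughly 6.8 GiB resident size reported by db.serverStatus().mem.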
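3. High memory usage by metadata
4. High memory usage caused by index builds
A replica set secondary normally uses a buffer of about 256 MB to replay data. After an index is created on the primary, the secondary may need more than 256 MB of memory for the replay.
In versions before MongoDB 4.2, indexes could be built in the background on the primary, and the serial replay of an index build could consume up to 500 MB of memory. In MongoDB 4.2 and later, indexes can no longer be built in the background, and secondaries can replay index builds in parallel. This needs more memory, and building several indexes at once may cause the instance to run out of memory (OOM).
5. High memory usage by the query plan cache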
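Finally:
The goal of memory optimization is not to minimize memory usage. Rather, it seeks a balance between resource consumption and performance. Ideally, memory stays sufficient and stable and system performance is unaffected.
The following approaches are recommended for optimizing memory usage:
·Control the number of concurrent connections. According to performance test results, around 100 persistent connections to a database are enough. By default, a MongoDB driver keeps a connection pool of up to 100 connections to the backend. If there are many clients, the pool size of each client must be reduced. We recommend keeping the number of persistent connections to one database below 1,000; beyond that, the overhead of memory and multi-thread context switching can grow and delay request processing.
·Reduce the memory overhead of individual requests. For example, create indexes to reduce the number of collection scans and in-memory sorts.
·If the number of connections is normal but memory usage keeps growing, consider upgrading the memory configuration. Otherwise, OOM errors and heavy cache eviction can cause system performance to drop sharply.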
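The connection-count advice can be turned into simple arithmetic. The sketch below is a rough planning aid under stated assumptions, not a MongoDB API: the function names poolSizePerClient and connectionMemoryMB are hypothetical, the 1,000-connection cap and the pool size of 100 come from the recommendations above, and the 1 MB per connection figure is the upper bound mentioned in the connection section.

```javascript
// Choose a per-client pool size so that all clients together stay under a total cap.
// Defaults follow the text: at most 1,000 connections overall, 100 per driver pool.
function poolSizePerClient(clients, totalCap = 1000, driverDefault = 100) {
  if (!Number.isInteger(clients) || clients < 1) {
    throw new RangeError("clients must be a positive integer");
  }
  const fairShare = Math.floor(totalCap / clients);
  // Never exceed the driver default, and never go below one connection per client.
  return Math.max(1, Math.min(driverDefault, fairShare));
}

// Upper-bound estimate of per-connection thread memory, assuming up to 1 MB each.
function connectionMemoryMB(connections, perConnectionMB = 1) {
  return connections * perConnectionMB;
}

for (const clients of [5, 40, 200]) {
  const size = poolSizePerClient(clients);
  const total = clients * size;
  console.log(
    clients + " clients -> pool size " + size + ", up to " + total +
    " connections, about " + connectionMemoryMB(total) + " MB worst case"
  );
}
```

The computed value would typically be applied through the driver's maxPoolSize connection option. Note that once there are more clients than the cap allows, the floor of one connection per client means the total exceeds the cap, so the number of clients itself has to be reduced.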