Analyzing the Causes of High MongoDB Memory Usage

  abce

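The WT cache setting (cacheSizeGB) only controls the memory used by the WiredTiger (WT) storage engine, not the memory used by the whole mongod instance.

In a MongoDB/WT deployment, memory is needed in many other places as well:

·WT compresses data on disk, but the data held in memory is not compressed

·By default WT does not fsync on every commit, so log records are also held in memory. In addition, to use I/O more efficiently, WT groups I/O requests into chunks, which also takes some memory

·WT keeps multiple versions of a record in memory

·WT keeps checksums of the data in the cache

·MongoDB itself needs memory for handling connections, aggregations, and similar server-side work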

 

Check how much memory the instance is using:

> db.serverStatus().mem
{ "bits" : 64, "resident" : 6931, "virtual" : 12383, "supported" : true }

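Where:

·bits: whether the mongod binary is 32-bit or 64-bit

·resident: roughly the amount of physical memory (RAM) currently used by the database process, in MiB

·virtual: the amount of virtual memory used by the process, in MiB

·supported: whether the underlying system supports extended memory information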
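Since cacheSizeGB only bounds the WT cache, it can help to see how much resident memory lies outside that cache. The following is a minimal sketch, not a MongoDB API: `nonCacheResidentMiB` and `toNum` are hypothetical helper names, and the sample values are the `resident` figure above together with the "bytes currently in the cache" statistic reported by `db.serverStatus().wiredTiger.cache`.

```javascript
// Convert shell 64-bit wrappers (objects exposing toNumber()) or plain
// numbers to a JavaScript number. Assumption: large counters may arrive
// wrapped, as in NumberLong("...").
function toNum(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// Resident memory (MiB) that is not accounted for by the WT cache.
// mem     : the db.serverStatus().mem document
// wtCache : the db.serverStatus().wiredTiger.cache document
function nonCacheResidentMiB(mem, wtCache) {
  const MIB = 1024 * 1024;
  const cacheMiB = toNum(wtCache["bytes currently in the cache"]) / MIB;
  return toNum(mem.resident) - cacheMiB;
}

// In the shell the inputs would come from a single serverStatus call:
//   const s = db.serverStatus();
//   nonCacheResidentMiB(s.mem, s.wiredTiger.cache);
const outsideCache = nonCacheResidentMiB(
  { resident: 6931 },
  { "bytes currently in the cache": 5426249734 }
);
console.log(outsideCache.toFixed(1) + " MiB resident outside the WT cache");
```

With these sample figures roughly a quarter of the resident memory sits outside the WT cache, which is the part the scenarios in this article try to explain.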

 

Common scenarios of high memory usage

1. High memory usage by the WT engine

Check the WT engine's memory usage:

> db.serverStatus().wiredTiger.cache
{
        "application threads page read from disk to cache count" : 221561964,
        "application threads page read from disk to cache time (usecs)" : NumberLong("21902911799"),
        "application threads page write from cache to disk count" : 11929121,
        "application threads page write from cache to disk time (usecs)" : 119207458,
        "bytes allocated for updates" : 74468710,
        "bytes belonging to page images in the cache" : NumberLong("5350309948"),
        "bytes belonging to the history store table in the cache" : 47652,
        "bytes currently in the cache" : NumberLong("5426249734"),
        "bytes dirty in the cache cumulative" : NumberLong("788539520241"),
        "bytes not belonging to page images in the cache" : 75939786,
        "bytes read into cache" : NumberLong("11331422628708"),
        "bytes written from cache" : NumberLong("538082043341"),
        "cache overflow score" : 0,
        "checkpoint blocked page eviction" : 2515,
        "checkpoint of history store file blocked non-history store page eviction" : 0,
        "eviction calls to get a page" : 222623846,
        "eviction calls to get a page found queue empty" : 1558561,
        "eviction calls to get a page found queue empty after locking" : 2561019,
        "eviction currently operating in aggressive mode" : 0,
        "eviction empty score" : 0,
        "eviction gave up due to detecting an out of order on disk value behind the last update on the chain" : 0,
        "eviction gave up due to detecting an out of order tombstone ahead of the selected on disk update" : 0,
        "eviction gave up due to detecting an out of order tombstone ahead of the selected on disk update after validating the update chain" : 0,
        "eviction gave up due to detecting out of order timestamps on the update chain after the selected on disk update" : 0,
        "eviction passes of a file" : 12029193,
        "eviction server candidate queue empty when topping up" : 1056789,
        "eviction server candidate queue not empty when topping up" : 1630370,
        "eviction server evicting pages" : 0,
        "eviction server slept, because we did not make progress with eviction" : 8284783,
        "eviction server unable to reach eviction goal" : 0,
        "eviction server waiting for a leaf page" : 39546587,
        "eviction state" : 64,
        "eviction walk most recent sleeps for checkpoint handle gathering" : 118,
        "eviction walk target pages histogram - 0-9" : 7814276,
        "eviction walk target pages histogram - 10-31" : 1239602,
        "eviction walk target pages histogram - 128 and higher" : 0,
        "eviction walk target pages histogram - 32-63" : 679302,
        "eviction walk target pages histogram - 64-128" : 2296013,
        "eviction walk target pages reduced due to history store cache pressure" : 0,
        "eviction walk target strategy both clean and dirty pages" : 74687,
        "eviction walk target strategy only clean pages" : 11230619,
        "eviction walk target strategy only dirty pages" : 723887,
        "eviction walks abandoned" : 206303,
        "eviction walks gave up because they restarted their walk twice" : 7297449,
        "eviction walks gave up because they saw too many pages and found no candidates" : 320600,
        "eviction walks gave up because they saw too many pages and found too few candidates" : 11151,
        "eviction walks reached end of tree" : 14992319,
        "eviction walks restarted" : 0,
        "eviction walks started from root of tree" : 7757919,
        "eviction walks started from saved location in tree" : 4271274,
        "eviction worker thread active" : 4,
        "eviction worker thread created" : 0,
        "eviction worker thread evicting pages" : 218236828,
        "eviction worker thread removed" : 0,
        "eviction worker thread stable number" : 0,
        "files with active eviction walks" : 0,
        "files with new eviction walks started" : 7694870,
        "force re-tuning of eviction workers once in a while" : 0,
        "forced eviction - history store pages failed to evict while session has history store cursor open" : 6,
        "forced eviction - history store pages selected while session has history store cursor open" : 1243,
        "forced eviction - history store pages successfully evicted while session has history store cursor open" : 0,
        "forced eviction - pages evicted that were clean count" : 67803,
        "forced eviction - pages evicted that were clean time (usecs)" : 88110,
        "forced eviction - pages evicted that were dirty count" : 6442,
        "forced eviction - pages evicted that were dirty time (usecs)" : 16444442,
        "forced eviction - pages selected because of a large number of updates to a single item" : 6247,
        "forced eviction - pages selected because of too many deleted items count" : 41589,
        "forced eviction - pages selected count" : 93559,
        "forced eviction - pages selected unable to be evicted count" : 260,
        "forced eviction - pages selected unable to be evicted time" : 245,
        "hazard pointer blocked page eviction" : 122473,
        "hazard pointer check calls" : 218364087,
        "hazard pointer check entries walked" : 1636852354,
        "hazard pointer maximum array length" : 1,
        "history store score" : 0,
        "history store table insert calls" : 8602760,
        "history store table insert calls that returned restart" : 0,
        "history store table max on-disk size" : 0,
        "history store table on-disk size" : 36864,
        "history store table out-of-order resolved updates that lose their durable timestamp" : 0,
        "history store table out-of-order updates that were fixed up by reinserting with the fixed timestamp" : 0,
        "history store table reads" : 0,
        "history store table reads missed" : 0,
        "history store table reads requiring squashed modifies" : 0,
        "history store table truncation by rollback to stable to remove an unstable update" : 0,
        "history store table truncation by rollback to stable to remove an update" : 0,
        "history store table truncation to remove an update" : 0,
        "history store table truncation to remove range of updates due to key being removed from the data page during reconciliation" : 1349,
        "history store table truncation to remove range of updates due to out-of-order timestamp update on data page" : 0,
        "history store table writes requiring squashed modifies" : 86,
        "in-memory page passed criteria to be split" : 71721,
        "in-memory page splits" : 19164,
        "internal pages evicted" : 565347,
        "internal pages queued for eviction" : 630835,
        "internal pages seen by eviction walk" : 28818968,
        "internal pages seen by eviction walk that are already queued" : 98864,
        "internal pages split during eviction" : 183,
        "leaf pages split during eviction" : 54266,
        "maximum bytes configured" : NumberLong("7837057024"),
        "maximum page size at eviction" : 360,
        "modified pages evicted" : 758270,
        "modified pages evicted by application threads" : 0,
        "operations timed out waiting for space in cache" : 0,
        "overflow pages read into cache" : 0,
        "page split during eviction deepened the tree" : 9,
        "page written requiring history store records" : 327330,
        "pages currently held in the cache" : 2741,
        "pages evicted by application threads" : 773,
        "pages evicted in parallel with checkpoint" : 1802978,
        "pages queued for eviction" : 267242512,
        "pages queued for eviction post lru sorting" : 271117576,
        "pages queued for urgent eviction" : 6514928,
        "pages queued for urgent eviction during walk" : 970,
        "pages queued for urgent eviction from history store due to high dirty content" : 0,
        "pages read into cache" : 221574300,
        "pages read into cache after truncate" : 144789,
        "pages read into cache after truncate in prepare state" : 0,
        "pages requested from the cache" : 1532689858,
        "pages seen by eviction walk" : 376103054,
        "pages seen by eviction walk that are already queued" : 5006678,
        "pages selected for eviction unable to be evicted" : 143333,
        "pages selected for eviction unable to be evicted because of active children on an internal page" : 14363,
        "pages selected for eviction unable to be evicted because of failure in reconciliation" : 11,
        "pages selected for eviction unable to be evicted because of race between checkpoint and out of order timestamps handling" : 0,
        "pages walked for eviction" : NumberLong("4275968401"),
        "pages written from cache" : 13366790,
        "pages written requiring in-memory restoration" : 61027,
        "percentage overhead" : 8,
        "the number of times full update inserted to history store" : 3683821,
        "the number of times reverse modify inserted to history store" : 4918939,
        "tracked bytes belonging to internal pages in the cache" : 1264324,
        "tracked bytes belonging to leaf pages in the cache" : NumberLong("5424985410"),
        "tracked dirty bytes in the cache" : 14941279,
        "tracked dirty pages in the cache" : 11,
        "unmodified pages evicted" : 217443882
}
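Of the many counters above, three are usually enough for a first look: "maximum bytes configured", "bytes currently in the cache", and "tracked dirty bytes in the cache". The sketch below turns them into fill percentages. `cacheUsage` and `toNum` are hypothetical helper names introduced here for illustration, and the sample document contains only the three values copied from the output above.

```javascript
// Convert shell 64-bit wrappers (objects exposing toNumber()) or plain
// numbers to a JavaScript number.
function toNum(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// Summarize how full the WT cache is, as percentages of the configured maximum.
// cache : the db.serverStatus().wiredTiger.cache document
function cacheUsage(cache) {
  const max = toNum(cache["maximum bytes configured"]);
  const used = toNum(cache["bytes currently in the cache"]);
  const dirty = toNum(cache["tracked dirty bytes in the cache"]);
  return {
    usedPct: (100 * used) / max,   // share of the cache that is occupied
    dirtyPct: (100 * dirty) / max, // share of the cache holding dirty data
  };
}

// Values copied from the sample output above.
const usage = cacheUsage({
  "maximum bytes configured": 7837057024,
  "bytes currently in the cache": 5426249734,
  "tracked dirty bytes in the cache": 14941279,
});
console.log(usage.usedPct.toFixed(1) + "% used, " + usage.dirtyPct.toFixed(2) + "% dirty");
```

For this instance the cache is about 69% full with very little dirty data, so the WT cache is within its configured limit and memory pressure, if any, would have to come from elsewhere.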

 

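2. High memory usage by session connections

·The background thread serving each connection can use up to 1 MB of memory; in practice it is usually a few hundred KB

·The read and write buffers of TCP connections are controlled by the kernel parameters tcp_rmem and tcp_wmem

·Each request has its own context. Several temporary buffers may be allocated for a request. When freed, these buffers first go back to the TCMalloc cache and are only later released to the operating system. In many cases memory runs short because TCMalloc does not release these temporary buffers promptly

Use db.serverStatus().tcmalloc to check: TCMalloc cache size = pageheap_free_bytes + total_free_bytes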

> db.serverStatus().tcmalloc

Formatted output:

> var mem = db.serverStatus().tcmalloc;
> mem.tcmalloc.formattedString
------------------------------------------------
MALLOC:     5222431464 ( 4980.5 MiB) Bytes in use by application
MALLOC: +   1980387328 ( 1888.6 MiB) Bytes in page heap freelist
MALLOC: +     61509992 (   58.7 MiB) Bytes in central cache freelist
MALLOC: +         8960 (    0.0 MiB) Bytes in transfer cache freelist
MALLOC: +     31453360 (   30.0 MiB) Bytes in thread cache freelists
MALLOC: +     34078720 (   32.5 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   7329869824 ( 6990.3 MiB) Actual memory used (physical + swap)
MALLOC: +   3715723264 ( 3543.6 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  11045593088 (10533.9 MiB) Virtual address space used
MALLOC:
MALLOC:          31525              Spans in use
MALLOC:            214              Thread heaps in use
MALLOC:           4096              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
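The formula given earlier (TCMalloc cache size = pageheap_free_bytes + total_free_bytes) can be evaluated directly on the tcmalloc document. This is a sketch under assumptions: `tcmallocCacheBytes` and `toNum` are hypothetical helpers, the field names are the ones used in the text, and the sample `total_free_bytes` is taken here to be the sum of the central, transfer, and thread cache freelists shown in the formatted output, since that raw field is not printed above.

```javascript
// Convert shell 64-bit wrappers (objects exposing toNumber()) or plain
// numbers to a JavaScript number.
function toNum(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// Memory held in TCMalloc's own caches: freed by mongod but not yet
// returned to the operating system.
// tc : the db.serverStatus().tcmalloc document
function tcmallocCacheBytes(tc) {
  const t = tc.tcmalloc;
  return toNum(t.pageheap_free_bytes) + toNum(t.total_free_bytes);
}

// Sample values derived from the formatted output above.
const cached = tcmallocCacheBytes({
  tcmalloc: {
    pageheap_free_bytes: 1980387328,            // "Bytes in page heap freelist"
    total_free_bytes: 61509992 + 8960 + 31453360, // assumed: central + transfer + thread freelists
  },
});
console.log((cached / (1024 * 1024)).toFixed(1) + " MiB held in TCMalloc caches");
```

In the sample above this is close to 2 GiB out of roughly 7 GiB of physical memory in use, which matches the pattern described in this section: buffers that the server has already freed but that still count against the process.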

 

3. High memory usage by metadata

 

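4. High memory usage caused by index creation

Normally a replica set secondary uses a buffer of about 256 MB to replay data. After an index is created on the primary, a secondary may need more than 256 MB to replay the operation.

In MongoDB versions before 4.2, indexes can be built in the background on the primary, and the serial replay of an index build may consume up to 500 MB of memory. In MongoDB 4.2 and later, the background option is deprecated and secondaries can replay index builds in parallel. This needs more memory, and the instance may hit an OOM error when several indexes are built at once.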

 

5. High memory usage by the query plan cache

 

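Finally:

The goal of memory optimization is not to minimize memory usage. Rather, it is to find a balance between resource consumption and performance. Ideally, memory stays sufficient and stable, and system performance is not affected.

The following methods are recommended for optimizing memory usage:

·Control the number of concurrent connections. Based on performance test results, about 100 persistent connections to a database is a reasonable figure. By default, a MongoDB driver maintains a connection pool of up to 100 connections to the backend. If there are many clients, reduce the pool size for each client. We recommend keeping the number of persistent connections to one database at or below 1,000. Beyond that, memory overhead and multithreaded context-switching overhead can grow and delay request processing.

·Reduce the memory overhead of a single request. For example, create indexes to cut down on collection scans and in-memory sorts.

·If the number of connections is normal but memory usage keeps growing, consider upgrading the memory configuration. Otherwise, OOM errors and heavy cache eviction may cause system performance to drop sharply.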
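As a rough illustration of the first recommendation, the per-client pool size can be derived from the number of clients and the overall connection budget. This is only a sketch: `maxPoolSizePerClient` is a hypothetical helper, the 1,000 and 100 defaults are the figures quoted above, and `db.example.internal` is a placeholder host.

```javascript
// Largest per-client pool size that keeps the total number of pooled
// connections within totalLimit, never exceeding the driver default.
function maxPoolSizePerClient(clientCount, totalLimit = 1000, driverDefault = 100) {
  if (!Number.isInteger(clientCount) || clientCount < 1) {
    throw new RangeError("clientCount must be a positive integer");
  }
  const share = Math.floor(totalLimit / clientCount);
  // Each client needs at least one connection. With more clients than
  // totalLimit the budget cannot be met by pool sizing alone.
  return Math.max(1, Math.min(driverDefault, share));
}

// Example: 40 application instances sharing one database.
const poolSize = maxPoolSizePerClient(40);

// maxPoolSize is a standard connection string option; the host is a placeholder.
const uri = "mongodb://db.example.internal:27017/?maxPoolSize=" + poolSize;
console.log(uri);
```

With up to 10 clients the driver default of 100 already fits the budget. Beyond that the pool size shrinks in proportion to the number of clients.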
