Troubleshooting memory modules (DIMMs) on Linux
Two memory errors were detected. Application name: kernel. Log content:
hangzhou-jishuan-DDS0248 kernel: sbridge: HANDLING MCE MEMORY ERROR
hangzhou-jishuan-DDS0248 kernel: EDAC MC0: CE row 5, channel 0, label CPU_SrcID#0_Channel#3_DIMM#1:1 Unknown error(s): memory scrubbing on FATAL area : cpu=0 Err=0008:00c1 (ch=1), addr = 0x1c9bea000 = socket=0, Channel=3(mask=8), rank=5
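The field that matters for locating the module is the label (CPU_SrcID#0_Channel#3_DIMM#1). As a minimal sketch, the helper below pulls that label out of log text on stdin. The function name edac_label is made up for illustration, and the pattern assumes labels always have the CPU_SrcID#n_Channel#n_DIMM#n shape seen in this log; other platforms may label DIMMs differently.

```shell
# Hypothetical helper: print every EDAC DIMM label found on stdin.
# Assumes the label format CPU_SrcID#<n>_Channel#<n>_DIMM#<n> shown in the log above.
edac_label() {
    grep -o 'CPU_SrcID#[0-9]*_Channel#[0-9]*_DIMM#[0-9]*'
}
```

One possible use is counting errors per DIMM from the kernel log: dmesg | edac_label | sort | uniq -c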
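How do we determine which DIMM this is?
First, collect the server's memory layout. (You can pass this information to the hardware vendor's engineer when filing a repair request; remember to tell them it is for reference only.)
Shell command: dmidecode | grep -A 9 -B 6 DIMM | grep Bank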
Bank Locator: BRANCH 0 CHANNEL 1 DIMM 0
Bank Locator: BRANCH 0 CHANNEL 1 DIMM 1
Bank Locator: BRANCH 0 CHANNEL 2 DIMM 0
Bank Locator: BRANCH 0 CHANNEL 2 DIMM 1
Bank Locator: BRANCH 0 CHANNEL 3 DIMM 0
Bank Locator: BRANCH 0 CHANNEL 3 DIMM 1
Bank Locator: BRANCH 1 CHANNEL 1 DIMM 0
Bank Locator: BRANCH 1 CHANNEL 1 DIMM 1
Bank Locator: BRANCH 1 CHANNEL 2 DIMM 0
Bank Locator: BRANCH 1 CHANNEL 2 DIMM 1
Bank Locator: BRANCH 1 CHANNEL 3 DIMM 0
Bank Locator: BRANCH 1 CHANNEL 3 DIMM 1
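The DIMMs are numbered 1 to 12 from top to bottom in this listing. The label in the error message, CPU_SrcID#0_Channel#3_DIMM#1, gives CPU_SrcID 0, CHANNEL 3, DIMM 1.
Taking CPU_SrcID 0 as BRANCH 0, this matches the sixth entry (BRANCH 0 CHANNEL 3 DIMM 1), so the sixth DIMM is the faulty one. In other words, it sits in the memory region controlled by the first CPU, on channel 3, with DIMM id 1.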
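The lookup described above can be sketched as a small shell function. It is illustrative only: dimm_position is a hypothetical name, and it assumes that SrcID corresponds to BRANCH and that the Bank Locator lines use the "BRANCH x CHANNEL y DIMM z" wording shown in this listing. Both assumptions are platform-specific, so treat the result as a hint, not a definitive slot number.

```shell
# Hypothetical helper: print the 1-based position of the Bank Locator line
# that matches an EDAC label. Bank Locator lines are read from stdin.
# Assumption: SrcID maps to BRANCH, as in the listing above.
dimm_position() {
    label=$1
    # Pull the three numbers (SrcID, Channel, DIMM) out of the label.
    set -- $(printf '%s\n' "$label" |
        sed -n 's/.*SrcID#\([0-9]*\)_Channel#\([0-9]*\)_DIMM#\([0-9]*\).*/\1 \2 \3/p')
    [ $# -eq 3 ] || return 1
    awk -v want="BRANCH $1 CHANNEL $2 DIMM $3" '
        /Bank Locator:/ {
            n++                                  # count Bank Locator lines in order
            sub(/^.*Bank Locator:[ \t]*/, "")    # keep only the locator value
            sub(/[ \t\r]+$/, "")                 # drop trailing whitespace
            if ($0 == want) { print n; found = 1 }
        }
        END { exit !found }                      # non-zero status when nothing matched
    '
}
```

Usage, run as root because dmidecode reads the firmware tables: dmidecode -t memory | grep 'Bank Locator' | dimm_position 'CPU_SrcID#0_Channel#3_DIMM#1'
With the twelve lines listed above as input, this prints 6.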