Understanding Linux numastat
The NUMA statistics, and my reading of them, are as follows:
[root@localhost kernel]# numastat
                           node0           node1
numa_hit             26668467593     28643793617
numa_miss               49206566        19035412
numa_foreign            19035412        49206544
interleave_hit             63894           63259
local_node           26668451458     19175681813
other_node              49222701      9487147404
[root@localhost kernel]# expr 28643793617 + 19035412
28662829029
[root@localhost kernel]# expr 19175681813 + 9487147404
28662829217------------------------ for node1, numa_hit + numa_miss does not equal local_node + other_node.
[root@localhost kernel]# expr 26668451458 + 49222701
26717674159
[root@localhost kernel]# expr 26668467593 + 49206566
26717674159------------------------ for node0, numa_hit + numa_miss equals local_node + other_node.
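At first glance: this machine has two CPU sockets, each with several cores, so in terms of memory access paths two nodes are all it needs.
With only two nodes, node0's numa_miss should equal node1's numa_foreign.
For node0, numa_hit + numa_miss equals local_node + other_node. For node1 the two sums differ, although by the same reasoning they should be equal too.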
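The two relationships above can be checked mechanically. The sketch below hard-codes the counters from the first numastat run. The dictionary layout and the helper names (`sample`, `per_node_gap`, `miss_foreign_gap`) are my own and exist only for illustration.

```python
# Counters copied from the first numastat run above, as (node0, node1).
sample = {
    "numa_hit":     (26668467593, 28643793617),
    "numa_miss":    (49206566, 19035412),
    "numa_foreign": (19035412, 49206544),
    "local_node":   (26668451458, 19175681813),
    "other_node":   (49222701, 9487147404),
}


def per_node_gap(stats, node):
    """Return (local_node + other_node) - (numa_hit + numa_miss) for one node."""
    by_intent = stats["numa_hit"][node] + stats["numa_miss"][node]
    by_origin = stats["local_node"][node] + stats["other_node"][node]
    return by_origin - by_intent


def miss_foreign_gap(stats, node):
    """Return numa_miss on `node` minus numa_foreign on the other node.

    Only meaningful on a two-node machine, where a miss on one node
    must be a foreign allocation of the other.
    """
    other = 1 - node
    return stats["numa_miss"][node] - stats["numa_foreign"][other]


for node in (0, 1):
    print(f"node{node}: per-node gap = {per_node_gap(sample, node)}, "
          f"miss/foreign gap = {miss_foreign_gap(sample, node)}")
```

For this sample the per-node gap is 0 on node0 and 188 on node1, and node0's numa_miss is 22 ahead of node1's numa_foreign. Both gaps are tiny next to counters in the tens of billions.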
The kernel defines these counters as follows:
enum zone_stat_item {
#ifdef CONFIG_NUMA
NUMA_HIT, /* allocated in intended node */
NUMA_MISS, /* allocated in non intended node */
NUMA_FOREIGN, /* was intended here, hit elsewhere */
NUMA_INTERLEAVE_HIT, /* interleaver preferred this zone */
NUMA_LOCAL, /* allocation from local node */
NUMA_OTHER, /* allocation from other node */
#endif
};
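After reading the code, it occurred to me that these counters keep changing while numastat reads them, so a discrepancy this small is within the expected margin of error.
How large the difference is depends on the moment the counters are sampled. If the sums do not match, just run numastat a few more times.
For example, when I ran it again a little later, they matched: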
[root@localhost kernel]# numastat
                           node0           node1
numa_hit             27490751188     29654323053
numa_miss               52691771        19585046
numa_foreign            19585046        52691771
interleave_hit             63894           63259
local_node           27490734704     19826774263
other_node              52708255      9847133836
[root@localhost kernel]# expr 27490734704 + 52708255
27543442959
[root@localhost kernel]# expr 27490751188 + 52691771
27543442959
[root@localhost kernel]# expr 29654323053 + 19585046
29673908099
[root@localhost kernel]# expr 19826774263 + 9847133836
29673908099
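Instead of typing `expr` four times per run, the same check can be scripted. This is a minimal sketch that assumes the classic table format shown above (a header row followed by one row per counter). `parse_numastat` and `is_consistent` are hypothetical helper names, not part of any numastat tooling.

```python
def parse_numastat(text):
    """Turn classic `numastat` output into {counter_name: [value per node]}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Counter rows look like "numa_hit 123 456". The header row, shell
        # prompts and blank lines have non-numeric fields and are skipped.
        if len(fields) >= 2 and all(f.isdigit() for f in fields[1:]):
            stats[fields[0]] = [int(f) for f in fields[1:]]
    return stats


def is_consistent(stats):
    """Check hit + miss == local + other on every node, and that the
    machine-wide totals of numa_miss and numa_foreign agree."""
    nodes = range(len(stats["numa_hit"]))
    per_node_ok = all(
        stats["numa_hit"][n] + stats["numa_miss"][n]
        == stats["local_node"][n] + stats["other_node"][n]
        for n in nodes
    )
    return per_node_ok and sum(stats["numa_miss"]) == sum(stats["numa_foreign"])


# The second numastat run from above, pasted as text.
second_run = """\
node0 node1
numa_hit 27490751188 29654323053
numa_miss 52691771 19585046
numa_foreign 19585046 52691771
interleave_hit 63894 63259
local_node 27490734704 19826774263
other_node 52708255 9847133836
"""

print(is_consistent(parse_numastat(second_run)))
```

To check a live system, the text could come from `subprocess.run(["numastat"], capture_output=True, text=True).stdout` instead of a pasted string. An occasional inconsistent reading is expected on a busy machine, for the sampling reason described above.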
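Someone may ask: looking at the data, why are numa_hit and local_node so close to each other (on node0, at least)? That puzzled me at first too, until I read
the numastat man page carefully: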
numa_hit is memory successfully allocated on this node as intended.
numa_miss is memory allocated on this node despite the process preferring some different node. Each numa_miss has a numa_foreign on another node.
numa_foreign is memory intended for this node, but actually allocated on some different node. Each numa_foreign has a numa_miss on another node.
interleave_hit is interleaved memory successfully allocated on this node as intended.
local_node is memory allocated on this node while a process was running on it.
other_node is memory allocated on this node while a process was running on some other node.
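numa_hit counts the allocations that I intended for this node and that were in fact served from this node. local_node counts the allocations served from this node while my process was running on one of this node's CPUs. It sounds a bit convoluted.
For example, suppose I allocate memory and specify node0, and the allocation succeeds there: node0's numa_hit goes up by 1. If my process is running on node0 at that moment, node0's local_node goes up by 1; if it is running on node1, node0's other_node goes up by 1 instead.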
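The bookkeeping can be written down as a toy model. This sketch is based only on the man page definitions quoted above, not on the kernel source. The function `record_allocation` and its parameters are hypothetical names chosen for illustration.

```python
from collections import Counter


def record_allocation(stats, preferred, actual, running_on):
    """Update per-node counters for a single allocation.

    preferred:  node the allocation was intended for
    actual:     node the memory really came from
    running_on: node whose CPU the process was running on
    """
    # Intent view: did the memory land where it was meant to?
    if actual == preferred:
        stats[actual]["numa_hit"] += 1
    else:
        stats[actual]["numa_miss"] += 1        # landed here unintentionally
        stats[preferred]["numa_foreign"] += 1  # was meant for `preferred`
    # Locality view: was the process running on the node that served it?
    if actual == running_on:
        stats[actual]["local_node"] += 1
    else:
        stats[actual]["other_node"] += 1


stats = {0: Counter(), 1: Counter()}
# The example from the text: ask for node0, get node0, while running on node1.
record_allocation(stats, preferred=0, actual=0, running_on=1)
# A fallback: ask for node1, get node0, while running on node1.
record_allocation(stats, preferred=1, actual=0, running_on=1)
# The common case: ask for node1, get node1, while running on node1.
record_allocation(stats, preferred=1, actual=1, running_on=1)

for node, counters in stats.items():
    print(f"node{node}: {dict(counters)}")
```

In this model every allocation bumps exactly one of numa_hit/numa_miss and exactly one of local_node/other_node on the node that served it, which is why the two sums agree per node once the counters are read at a consistent moment. It also shows how numa_hit and local_node can drift far apart: an allocation that explicitly targets a node from a process running elsewhere counts as a hit and as other_node at the same time.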