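Uneven CPU binding problem
I recently ran into a problem where I/O threads were bound unevenly across the cores. The symptoms looked like this: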
top - 10:14:24 up 2 days, 13:42, 13 users, load average: 53.83, 50.37, 48.42
Tasks: 1217 total, 8 running, 1209 sleeping, 0 stopped, 0 zombie
%Cpu0 : 14.7 us, 16.4 sy, 0.0 ni, 2.7 id, 65.6 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu1 : 12.5 us, 12.5 sy, 0.0 ni, 3.4 id, 70.8 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu2 : 11.2 us, 14.2 sy, 0.0 ni, 6.4 id, 67.8 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu3 : 11.0 us, 15.1 sy, 0.0 ni, 4.0 id, 69.2 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu4 : 16.1 us, 79.0 sy, 0.0 ni, 0.7 id, 3.0 wa, 0.0 hi, 1.3 si, 0.0 st
%Cpu5 : 12.1 us, 13.1 sy, 0.0 ni, 0.0 id, 74.2 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu6 : 11.3 us, 12.3 sy, 0.0 ni, 4.3 id, 71.3 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu7 : 12.3 us, 12.6 sy, 0.0 ni, 9.3 id, 65.6 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu8 : 11.8 us, 10.5 sy, 0.0 ni, 24.3 id, 53.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu9 : 12.0 us, 14.0 sy, 0.0 ni, 30.0 id, 43.3 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu10 : 10.7 us, 13.4 sy, 0.0 ni, 27.5 id, 47.3 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu11 : 10.3 us, 13.6 sy, 0.0 ni, 11.0 id, 64.8 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu12 : 10.3 us, 13.6 sy, 0.0 ni, 8.0 id, 67.8 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu13 : 9.3 us, 13.9 sy, 0.0 ni, 9.6 id, 66.9 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu14 : 9.7 us, 13.8 sy, 0.0 ni, 6.0 id, 69.8 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu15 : 9.1 us, 14.4 sy, 0.0 ni, 8.4 id, 67.4 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu16 : 9.1 us, 12.8 sy, 0.0 ni, 11.7 id, 65.4 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu17 : 10.7 us, 12.7 sy, 0.0 ni, 7.0 id, 69.2 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu18 : 10.1 us, 13.8 sy, 0.0 ni, 5.7 id, 70.1 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu19 : 9.1 us, 13.8 sy, 0.0 ni, 4.7 id, 71.8 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu20 : 3.1 us, 4.1 sy, 0.0 ni, 6.8 id, 85.8 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu21 : 4.7 us, 6.8 sy, 0.0 ni, 12.2 id, 76.4 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu22 : 4.1 us, 5.1 sy, 0.0 ni, 14.7 id, 76.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu23 : 7.4 us, 9.8 sy, 0.0 ni, 3.0 id, 79.1 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu24 : 7.4 us, 11.8 sy, 0.0 ni, 6.8 id, 73.6 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu25 : 2.4 us, 3.4 sy, 0.0 ni, 7.1 id, 86.9 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu26 : 7.4 us, 10.1 sy, 0.0 ni, 4.7 id, 77.2 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu27 : 3.4 us, 4.0 sy, 0.0 ni, 11.1 id, 81.5 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu28 : 10.9 us, 15.2 sy, 0.0 ni, 21.5 id, 51.3 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu29 : 5.7 us, 8.4 sy, 0.0 ni, 19.3 id, 66.2 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu30 : 4.1 us, 5.1 sy, 0.0 ni, 23.6 id, 66.9 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu31 : 3.7 us, 8.4 sy, 0.0 ni, 31.4 id, 56.1 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu32 : 6.4 us, 10.0 sy, 0.0 ni, 82.9 id, 0.0 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu33 : 2.0 us, 3.7 sy, 0.0 ni, 94.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu34 : 6.0 us, 9.3 sy, 0.0 ni, 84.1 id, 0.0 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu35 : 4.4 us, 5.1 sy, 0.0 ni, 89.9 id, 0.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu36 : 13.2 us, 12.8 sy, 0.0 ni, 73.6 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
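Oddly, only the lower-numbered CPUs show any I/O wait; the last few (Cpu32 through Cpu36) show almost none.
Looking at the code that binds threads to cores, I found: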
cpunum = (unsigned int)GetCpuNum();
if (0 != cpunum) {
    cpu_affi = 1 << cpunum;   /* one-bit mask for the target core */
    set_ret = SetAffinity(cpu_affi);
}
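At first glance this looks fine. gdb was not available for debugging, so the cause only surfaced later when a colleague read through the code and found that cpu_affi is declared as: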
unsigned int cpu_affi=0;
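An unsigned int is only 32 bits wide here, so the mask has bits for Cpu0 through Cpu31 and nothing else. At best the threads can be bound to the first 32 cores; the cores after that can never be selected, which matches the idle Cpu32 through Cpu36 in the top output above.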
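To make the truncation concrete, here is a minimal sketch in C. The helper names cpu_mask64 and truncated_mask32 are made up for illustration and are not part of the original code. It assumes the mask is meant to hold one bit per core and that a 64-bit mask is wide enough for the machine in question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a one-bit affinity mask for core `cpu`.
 * The constant is 64 bits wide, so the shift is well defined for
 * cores 0..63 (a plain `1 << cpu` is an int shift and is not). */
static uint64_t cpu_mask64(unsigned int cpu)
{
    assert(cpu < 64);              /* a 64-bit mask only covers 64 cores */
    return UINT64_C(1) << cpu;
}

/* Hypothetical helper: what is left of that mask once it is stored
 * in a 32-bit variable, as with `unsigned int cpu_affi`. */
static uint32_t truncated_mask32(unsigned int cpu)
{
    return (uint32_t)cpu_mask64(cpu);   /* bits 32..63 are dropped */
}
```

With these helpers, cpu_mask64(36) is 0x1000000000, while truncated_mask32(36) is 0, so a 32-bit mask cannot express "core 36" at all. Widening cpu_affi would only help if SetAffinity (whose signature is not shown here) also accepts a wider mask. On Linux, building a cpu_set_t with CPU_ZERO and CPU_SET and passing it to sched_setaffinity avoids a fixed-width integer mask altogether.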