Notes - man - Linux load | ssar

Linux load calculation

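How load is calculated: walk every CPU and add up each CPU's active count, nr_active = nr_running + nr_uninterruptible. (This is the conceptual model; the kernel avoids the full walk by having each CPU fold its changes into a global counter, calc_load_tasks.)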

The global load average is an exponentially decaying average of nr_running + nr_uninterruptible.

Once every LOAD_FREQ (defined as 5*HZ+1 ticks, about 5 seconds):

  nr_active = 0;
  for cpu in cpus:
      nr_active += cpu->nr_running + cpu->nr_uninterruptible;

  avenrun[n] = avenrun[n] * exp_n + nr_active * (1 - exp_n)

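About exp_n: first, imagine it is not there, in which case avenrun[n] = nr_active. But load is recalculated only once every 5001 ms (LOAD_FREQ with HZ=1000), and nr_active is just an instantaneous sample taken at that moment, so averaging it with the previous value makes more sense. The simplest weighting would be half and half, but a better choice is an exponentially weighted moving average: the weight of each sample decays exponentially with its age, so the closer a sample is to the present, the larger its weight, and the average follows recent trends more closely. exp_n is that decay factor, with a different value for the 1-, 5- and 15-minute averages.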
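The decay factors can be written out explicitly. Assuming the nominal 5 s update interval, the n-th average uses a time constant T_n of 1, 5 or 15 minutes, and the kernel stores exp_n as an 11-bit fixed-point number (the EXP_1, EXP_5 and EXP_15 constants):

$$
\mathrm{exp}_n = e^{-5\,\mathrm{s}/T_n},\qquad T_n \in \{60\,\mathrm{s},\ 300\,\mathrm{s},\ 900\,\mathrm{s}\}
$$

$$
e^{-5/60} \approx 0.9200\ (\times 2048 \approx 1884),\qquad e^{-5/300} \approx 0.9835\ (\approx 2014),\qquad e^{-5/900} \approx 0.9945\ (\approx 2037)
$$

Unrolling the recurrence avenrun_k[n] = exp_n · avenrun_{k-1}[n] + (1 - exp_n) · nr_active_k over k updates shows where the exponentially decaying weights come from:

$$
\mathrm{avenrun}_k[n] = (1-\mathrm{exp}_n)\sum_{j=0}^{k-1} \mathrm{exp}_n^{\,j}\,\mathrm{nr\_active}_{k-j} \;+\; \mathrm{exp}_n^{\,k}\,\mathrm{avenrun}_0[n]
$$

A sample taken j updates ago carries weight (1 - exp_n) · e^{-5j/T_n}, so its influence falls to 1/e of its initial weight after T_n, i.e. after 12 updates for the 1-minute average.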

ssar architecture

https://gitee.com/anolis/ssar

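The resident background service sresar collects data and saves it under /var/log/sre_proc/data/; the ssar command reads that data to produce its reports.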

ssar usage example

date +%FT%T; sleep 5s; ./stress-ng -t 25 --mutex 1 
$ssar load5s -b 2024-08-25T23:29:39 -r 1

collect_datetime       threads    load1   runq load5s stype sstate zstate    act act_rto   actr actr_rto   actd 
2024-08-25T23:29:43        314     1.15      1      1 5s    N      U           -       -      -        -      - 
2024-08-25T23:29:48        317     1.22      3      2 5s    N      U           -       -      -        -      - 
2024-08-25T23:29:53        317     1.28      3      2 5s    N      U           -       -      -        -      - 
2024-08-25T23:29:58        317     1.34      3      2 5s    N      U           -       -      -        -      - 
2024-08-25T23:30:03        317     1.47      1      3 5s    N      U           -       -      -        -      - 

Problem using ssar load2p -c: ReadLoadrdFileData failed. Make sure the param -c is correct, act field is not -.

root@192.168.99.124:~ $ssar load2p -c 2024-08-25T23:30:00
ReadLoadrdFileData failed. Make sure the param -c <collect time> is correct, act field is not -.

Fix: start ssar under gdb with gdb --args ssar load2p -c 2024-08-25T23:30:00, set a breakpoint on ReadLoadrdFileData, and, reading along in the source, print it_path:

Breakpoint 1, ReadLoadrdFileData (seq_option=..., it_list_load2p_t=empty std::__cxx11::list) at ssar.cpp:2283
2283    ssar.cpp: No such file or directory.
Missing separate debuginfos, use: dnf debuginfo-install libgcc-10.3.1-10.oe2203.x86_64 libstdc++-10.3.1-10.oe2203.x86_64 zlib-1.2.11-24.oe2203.x86_64
(gdb) n
2284    in ssar.cpp
(gdb) p it_path
$1 = "/var/log/sre_proc/data/2024082523/20240825233000_loadrd"

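Then check which files end in _loadrd and pass the corresponding time to -c instead (2024-08-25T23:47:39 for the file listed below):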

root@192.168.99.124:~ $ll /var/log/sre_proc/data/2024082523/*_loadrd
-rw-r--r--    1 root     root            50 Aug 25 23:47 /var/log/sre_proc/data/2024082523/20240825234739_loadrd

Building and installing ssar

$ make 
make -C conf
make[1]: Entering directory '/home/aim/aim/ssar/conf'
gzip -c ssar.1       > ssar.1.gz 
gzip -c zh_CN.ssar.1 > zh_CN.ssar.1.gz
make[1]: Leaving directory '/home/aim/aim/ssar/conf'
make -C ssar
make[1]: Entering directory '/home/aim/aim/ssar/ssar'
g++ -g -std=c++11 -rdynamic -DCPPTOML_USE_MAP ssar.cpp -o ssar -lz
...
make[1]: Leaving directory '/home/aim/aim/ssar/ssar'
make -C sresar
make[1]: Entering directory '/home/aim/aim/ssar/sresar'
gcc -g -std=gnu99 -rdynamic -c toml.c        -o toml.o
gcc -g -std=gnu99 -rdynamic -c utils.c       -o utils.o
gcc -g -std=gnu99 -rdynamic -c collection.c  -o collection.o
gcc -g -std=gnu99 -rdynamic -c readprocess.c -o readprocess.o
gcc -g -std=gnu99 -rdynamic -c sresar.c      -o sresar.o
gcc -g -std=gnu99 -rdynamic toml.o utils.o collection.o readprocess.o sresar.o -o sresar -lpthread -lm -lz
make[1]: Leaving directory '/home/aim/aim/ssar/sresar'
root@192.168.99.124:/home/ssar $make V=1 install
install -d                           /etc/ssar/
install conf/ssar.conf               /etc/ssar/
install conf/sys.conf                /etc/ssar/
install -d                           /usr/src/os_health/ssar/
install conf/sresar.service          /usr/src/os_health/ssar/
install -d                           /usr/bin/
install ssar/ssar                    /usr/bin/ssar
install ssar/ssar+.py                /usr/bin/ssar+
install ssar/tsar2.py                /usr/bin/tsar2
install sresar/sresar                /usr/bin/sresar
install -d                           /run/lock/os_health/
touch                                /run/lock/os_health/sresar.pid
cp -f /usr/src/os_health/ssar/sresar.service /etc/systemd/system/sresar.service
chown root:root /etc/systemd/system/sresar.service
systemctl daemon-reload
if [ systemctl is-enabled sresar.service ]; then \
    systemctl disable sresar.service;  \
fi
/bin/sh: line 1: [: is-enabled: binary operator expected
systemctl enable sresar.service
Created symlink /etc/systemd/system/multi-user.target.wants/sresar.service → /etc/systemd/system/sresar.service.
if systemctl is-active sresar.service; then \
    systemctl stop sresar.service; \
fi
inactive
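systemctl start sresar.service

The "[: is-enabled: binary operator expected" line comes from the install rule itself: "[ systemctl is-enabled sresar.service ]" hands three words to test, which then expects the middle one to be a binary operator. test exits with an error status, so the disable step is simply skipped and the service is still enabled and started. The intended check is "if systemctl is-enabled sresar.service; then", the same form the is-active check uses.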

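Linux load is not a good metric for judging how severe a problem is or for narrowing down its cause: it lumps runnable tasks together with tasks in uninterruptible sleep and says nothing about how long any of them actually wait. Scheduling latency (sched latency) is a better choice.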

Reference: Linux Load Average: Algorithm, Implementation and Practical Guide (2023)


posted @ 2024-08-26 00:51  LiYanbin