CPU利用率和CPU负荷(CPU usage vs CPU load)

对于CPU的性能监测，通常用top指令能显示出两个指标：cpu 利用率和cpu负荷。

其中%Cpu相关的内容：

us表示用户进程cpu利用率，sy表示系统内核进程cpu利用率，ni表示运行正常进程消耗的 CPU 时间百分比，id表示idle time，

wa表示IO waiting time，hi表示硬中断（Hardware IRQ）占用CPU的百分比；

si表示软中断（Software Interrupts）占用CPU的百分比；

st表示steal time：在内存紧张环境下，pagein 强制对不同的页面进行的 steal 操作。虚拟服务占用的CPU时间百分比。

(%steal 不为 0 说明，当前 OS 是在虚拟机调度器的管理下运行的，且存在其它 OS 也被虚拟机调度器管理。)

其中load average有3个值，分别记录了当前1min，5min,15min的系统平均负载。

用uptime指令也能显示这3个值：

root@Ubuntu01:~# uptime
02:55:15 up 43 min, 1 user, load average: 0.09, 0.25, 0.13

CPU usage：

cpu usage或cpu utilization即 cpu 利用率，就是程序对CPU时间片的占用情况。参见https://en.wikipedia.org/wiki/CPU_time。

cpu 利用率是基于 /proc/stat 文件中的内容得到的：

详细说明见参考文档。

=> 进程cpu使用率:
基于 /proc/<pid>/stat 文件计算
进程的总Cpu 时间计算公式（该值包括其所有线程的 cpu 时间）
processCpuTime = utime + stime + cutime + cstime

=> 线程的cpu使用率:
基于 /proc/<pid>/task/<tid>/stat 文件计算
线程Cpu 时间计算公式为
threadCpuTime = utime + stime

CPU load：

load average 表示的是CPU的负载，包含的信息不是CPU的使用率状况，而是在一段时间内CPU正在处理以及等待CPU处理的进程数之和的统计信息，也就是CPU使用队列的长度的统计信息。这个数字越小越好。参见https://en.wikipedia.org/wiki/Load_%28computing%29的解释：

that CPU load information based upon the CPU queue length does much better in load balancing compared to CPU utilization (CPU usage). The reason CPU queue length did better is probably because when a host is heavily loaded, its CPU utilization is likely to be close to 100% and it is unable to reflect the exact load level of the utilization. In contrast, CPU queue lengths can directly reflect the amount of load on a CPU.

如果load average值长期大于系统CPU的个数则说明CPU很繁忙，负载很高，可能会影响系统性能，导致系统卡顿响应时间长等等。

一般能够被接受的值是 load average <= CPU核数 *0.7。

cpu load 是从 /proc/loadavg 中读取的；

root@Ubuntu01:~# cat /proc/loadavg
0.00 0.00 0.00 1/272 20911

相关指令：

除了上面提及的top和uptime指令，还有这些：

1) 显示cpu信息：

root@Ubuntu01:~# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s):           1
NUMA node(s):        1

root@Ubuntu01:~# cat /proc/cpuinfo

root@Ubuntu01:~# grep 'model name' /proc/cpuinfo | wc -l （获取cpu数量）
1

2) vmstat：
#vmstat 10 5 （表示10秒钟内取样5次）

其中cpu相关的内容：

us表示用户进程cpu利用率，sy表示系统内核进程cpu利用率，id表示idle time，wa表示IO waiting time，st表示steal time（在内存紧张环境下，pagein 强制对不同的页面进行的 steal 操作。虚拟服务占用的CPU时间百分比）。

Note: system中cs表示上下文切换Context Switch。

3) mpstat:

%guest：Percentage of time spent by the CPU or CPUs to run a virtual processor.

%gnice：Percentage of time spent by the CPU or CPUs to run a niced guest.

==> totalCpuTime = user + nice + system + idle + iowait + irq + softirq + steal + guest + guest_nice
在纯粹的物理机上（即其上未跑其它 guest OS ，自身也未作为 guest OS 被虚拟机调度器管理），steal/guest/guest_nice 值应该都为 0 ；除此之外，上述值就应该不为 0 .

4) sar监控CPU:

# sar -u 6 3 （表示6秒钟内取样3次）