Linux sar performance analysis
1. Enabling sar
~$ sar
Cannot open /var/log/sysstat/sa28: No such file or directory Please check if data collecting is enable
Fix: edit /etc/default/sysstat (for example, vim /etc/default/sysstat).
The file contains:
#
# Default settings for /etc/init.d/sysstat, /etc/cron.d/sysstat
# and /etc/cron.daily/sysstat files
#
# Should sadc collect system activity informations? Valid values
# are "true" and "false". Please do not put other values, they
# will be overwritten by debconf!
ENABLED="false"
Change ENABLED="false" to ENABLED="true", then restart sysstat: sudo service sysstat restart
2. Using sar
2.1 What sar reports
- Overall CPU utilization
- Per-CPU utilization
- Memory usage
- Overall I/O
- Per-device I/O
- Network statistics
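2.2 Common sar command format
sar [ options ] [ <interval> [ <count> ] ]
where:
interval: sampling interval, in seconds;
count: number of samples; if omitted, sar keeps sampling indefinitely;
options: command-line options.
sar has many options; only the common ones are listed here:
-A: all reports combined
-u: overall CPU utilization statistics
-v: inode, file, and other kernel table statistics
-d: activity of each block device
-r: memory and swap space statistics
-b: I/O and transfer rate statistics
-a: file read/write activity
-c: process statistics (processes created per second)
-R: memory page statistics
-y: TTY (terminal) device activity
-w: task creation and system switching (context switch) activity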
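2.3 Overall CPU utilization (-u)
With -u, sar reports overall CPU utilization; when no option is given, -u is the default. The following command samples every 3 seconds, 2 times, and prints overall CPU utilization:
sar 3 2 (or sar -u 3 2)
Output fields:
CPU: all means the values are averages over all CPUs.
%usr: percentage of CPU time spent executing in user mode.
%nice: percentage of CPU time spent in user mode running processes with a nice (adjusted) priority.
%system: percentage of CPU time spent executing in kernel mode.
%iowait: percentage of time the CPU was idle while the system had outstanding disk I/O requests.
%steal: percentage of time the virtual CPU spent in involuntary wait while the hypervisor was servicing another virtual processor.
%idle: percentage of time the CPU was idle with no outstanding disk I/O requests.
1. A high %iowait suggests the disks are an I/O bottleneck.
2. If %idle is high but the system responds slowly, the CPU may be waiting for memory allocation; in that case, consider adding memory.
3. If %idle stays below 1% for a sustained period, the system's CPU capacity is insufficient, and the CPU is the resource most in need of attention.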
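sar derives these percentages from the kernel's cumulative CPU time counters (on Linux, the aggregate "cpu" line of /proc/stat): it takes one snapshot at the start and one at the end of each interval and divides each counter's increase by the total increase. A minimal sketch of that arithmetic, assuming a Linux /proc/stat with at least eight numbers on the cpu line; the struct and function names are hypothetical, and real sar also handles guest time, counter resets and per-CPU lines.

```c
#include <assert.h>
#include <stdio.h>

/* Cumulative CPU time in clock ticks, in the order of the first eight
   numbers on the aggregate "cpu" line of /proc/stat. */
struct cpu_ticks {
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

struct cpu_pct {
    double usr, nice, system, iowait, steal, idle;
};

/* Reads the aggregate "cpu" line of /proc/stat. Returns 0 on success. */
int read_cpu_ticks(struct cpu_ticks *t) {
    FILE *f = fopen("/proc/stat", "r");
    if (f == NULL) return -1;
    int n = fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &t->user, &t->nice, &t->system, &t->idle,
                   &t->iowait, &t->irq, &t->softirq, &t->steal);
    fclose(f);
    return n == 8 ? 0 : -1;
}

/* Percentages for the interval between snapshot a (earlier) and b (later).
   As in plain `sar -u`, interrupt time (irq + softirq) is counted as %system.
   This sketch does not separate out guest time. */
struct cpu_pct cpu_pct_between(const struct cpu_ticks *a, const struct cpu_ticks *b) {
    unsigned long long d_user = b->user - a->user;
    unsigned long long d_nice = b->nice - a->nice;
    unsigned long long d_sys = (b->system - a->system) + (b->irq - a->irq)
                             + (b->softirq - a->softirq);
    unsigned long long d_idle = b->idle - a->idle;
    unsigned long long d_iowait = b->iowait - a->iowait;
    unsigned long long d_steal = b->steal - a->steal;
    unsigned long long total = d_user + d_nice + d_sys + d_idle + d_iowait + d_steal;
    assert(total > 0); /* the snapshots must be taken some time apart */
    struct cpu_pct p;
    p.usr = 100.0 * d_user / total;
    p.nice = 100.0 * d_nice / total;
    p.system = 100.0 * d_sys / total;
    p.iowait = 100.0 * d_iowait / total;
    p.steal = 100.0 * d_steal / total;
    p.idle = 100.0 * d_idle / total;
    return p;
}
```

Calling read_cpu_ticks() twice with sleep(3) in between and passing both snapshots to cpu_pct_between() gives one interval of roughly what sar -u 3 prints.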
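2.4 Per-CPU utilization (-P)
The "-P ALL" option prints statistics for each core.
In the output, the CPU column shows 0, 1, 2, 3, 4, 5 for the corresponding cores. A single core can also be selected: "-P 1" shows statistics for the second core (cores are numbered from 0).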
2.5 Memory usage (-r)
With -r, sar reports memory statistics. The following command samples every second and prints memory statistics twice:
sar -r 1 2
2.6 Overall I/O (-b)
With -b, sar reports disk I/O activity ("Report I/O and transfer rate statistics."):
sar -b 3 2
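Output fields:
tps: total number of requests (reads plus writes) issued to the disk devices per second, i.e. rtps + wtps. For efficiency, a request is not necessarily processed as soon as it is issued; adjacent requests may be merged, and tps counts requests after merging.
rtps: read requests per second
wtps: write requests per second
bread/s: data read from the devices per second, in blocks (512-byte sectors)
bwrtn/s: data written to the devices per second, in blocks (512-byte sectors)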
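Each -b column is a rate: the increase of a cumulative counter between two samples divided by the interval length, with tps simply rtps + wtps. A sketch under the assumption that the read/write request and sector counters have already been summed over all devices (on Linux such counters are exposed in /proc/diskstats); the struct and function names are hypothetical.

```c
#include <assert.h>

/* Cumulative counters summed over all block devices (hypothetical layout).
   Sectors are 512 bytes, matching sar's "blocks". */
struct io_counters {
    unsigned long long rd_ios, rd_sectors, wr_ios, wr_sectors;
};

struct io_rates {
    double tps, rtps, wtps, bread_s, bwrtn_s;
};

/* Per-second rates between an earlier snapshot a and a later snapshot b. */
struct io_rates io_rates_between(const struct io_counters *a,
                                 const struct io_counters *b, double interval_s) {
    assert(interval_s > 0.0);
    struct io_rates r;
    r.rtps = (double)(b->rd_ios - a->rd_ios) / interval_s;
    r.wtps = (double)(b->wr_ios - a->wr_ios) / interval_s;
    r.tps = r.rtps + r.wtps; /* tps is the sum of read and write requests */
    r.bread_s = (double)(b->rd_sectors - a->rd_sectors) / interval_s;
    r.bwrtn_s = (double)(b->wr_sectors - a->wr_sectors) / interval_s;
    return r;
}
```

For example, 30 extra reads and 30 extra writes over a 3-second interval give rtps = 10, wtps = 10 and tps = 20.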
2.7 Per-device I/O (-d)
With -d, sar reports statistics for each disk; adding -p prints device names in the sdX form (instead of dev<major>-<minor>):
sar -d -p 3 2
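Output fields:
rd_sec/s: sectors read from the device per second
wr_sec/s: sectors written to the device per second
avgrq-sz: average size (in sectors) of the requests issued to the device
avgqu-sz: average queue length of the requests issued to the device
await: average time (in milliseconds) for I/O requests issued to the device to be served, including the time spent waiting in the queue
svctm: average service time (in milliseconds) of I/O requests, excluding queue wait time
%util: percentage of elapsed time during which I/O requests were being issued to the device (device busy time); despite the name, it is not a CPU percentage.
Notes:
- If %util is close to 100%, the device is saturated with I/O requests and the disk is likely a bottleneck.
- If svctm is close to await, I/O requests spend almost no time waiting; if await is much larger than svctm, the I/O queue is long and responses are slow, and tuning is needed.
- A large avgqu-sz also indicates that many I/O requests are waiting.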
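The relationships in these notes follow from how the columns are derived. The sketch below computes them from the differences between two readings of one device's /proc/diskstats counters, broadly the approach iostat and sar take on Linux; the struct and function names are hypothetical, and the interval is assumed to be in milliseconds. Because await includes queue time and svctm does not, a large gap between them means requests are waiting in the queue.

```c
#include <assert.h>

/* Differences between two readings of one device's /proc/diskstats line.
   The *_ticks fields are in milliseconds. */
struct disk_delta {
    unsigned long long rd_ios, rd_sectors, rd_ticks;
    unsigned long long wr_ios, wr_sectors, wr_ticks;
    unsigned long long io_ticks;      /* ms during which the device had I/O in flight */
    unsigned long long time_in_queue; /* weighted ms, summed over all requests */
};

struct disk_ext {
    double rd_sec_s, wr_sec_s, avgrq_sz, avgqu_sz, await, svctm, util;
};

struct disk_ext disk_ext_stats(const struct disk_delta *d, double interval_ms) {
    assert(interval_ms > 0.0);
    struct disk_ext e = {0};
    double nr_ios = (double)(d->rd_ios + d->wr_ios);
    e.rd_sec_s = d->rd_sectors * 1000.0 / interval_ms;
    e.wr_sec_s = d->wr_sectors * 1000.0 / interval_ms;
    e.avgqu_sz = d->time_in_queue / interval_ms;  /* average requests in flight */
    e.util = 100.0 * d->io_ticks / interval_ms;   /* busy time, not CPU time */
    if (nr_ios > 0) {
        e.avgrq_sz = (double)(d->rd_sectors + d->wr_sectors) / nr_ios;
        e.await = (double)(d->rd_ticks + d->wr_ticks) / nr_ios; /* queue + service */
        e.svctm = d->io_ticks / nr_ios;                          /* service only */
    }
    return e;
}
```

For example, 100 completed requests in a 1000 ms interval with 400 ms of busy time and 500 ms of combined read/write time give %util = 40, svctm = 4 ms and await = 5 ms.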
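2.8 Network statistics (-n)
With -n, sar reports network usage; the keyword "DEV" after -n shows statistics for network interfaces such as eth0 and eth1:
sar -n DEV 1 1
The main output fields are: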
IFACE: Name of the network interface for which statistics are reported.
rxpck/s: Total number of packets received per second.
txpck/s: Total number of packets transmitted per second.
rxkB/s: Total number of kilobytes(kB) received per second.
txkB/s: Total number of kilobytes(kB) transmitted per second.
rxcmp/s: Number of compressed packets received per second (for cslip etc.).
txcmp/s: Number of compressed packets transmitted per second.
rxmcst/s: Number of multicast packets received per second.
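Like the other sar reports, these columns are per-second rates computed from cumulative per-interface counters (on Linux, /proc/net/dev). A sketch assuming two snapshots of those counters are already available and that 1 kB = 1024 bytes; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Cumulative per-interface counters (hypothetical layout). */
struct net_counters {
    unsigned long long rx_bytes, rx_packets, rx_multicast, tx_bytes, tx_packets;
};

struct net_rates {
    double rxpck_s, txpck_s, rxkB_s, txkB_s, rxmcst_s;
};

/* Per-second rates between an earlier snapshot a and a later snapshot b. */
struct net_rates net_rates_between(const struct net_counters *a,
                                   const struct net_counters *b, double interval_s) {
    assert(interval_s > 0.0);
    struct net_rates r;
    r.rxpck_s = (double)(b->rx_packets - a->rx_packets) / interval_s;
    r.txpck_s = (double)(b->tx_packets - a->tx_packets) / interval_s;
    r.rxkB_s = (double)(b->rx_bytes - a->rx_bytes) / 1024.0 / interval_s;
    r.txkB_s = (double)(b->tx_bytes - a->tx_bytes) / 1024.0 / interval_s;
    r.rxmcst_s = (double)(b->rx_multicast - a->rx_multicast) / interval_s;
    return r;
}
```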
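2.9 Saving sar data (-o)
Finally, saving sar data: the -o option writes sar statistics to a specified file, and a saved file can be read back with the -f option:
linux:~ # sar -n DEV 1 10 -o sar.out
linux:~ # sar -d 1 10 -f sar.out   # view historical I/O data
linux:~ # sar -u 1 10 -f sar.out   # view historical CPU data (1-second interval, 10 samples)
Compared with redirecting the output to a file, -o stores the readings in sar's binary format and can retain more system resource information.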
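C++ profiling tool gperftools
A program I wrote recently hit a performance bottleneck: with the CPU fully saturated, its consumption rate could not keep up, so messages piled up unconsumed and were eventually discarded by Kafka.
So I started tuning the program to raise its consumption rate. I found this tool, used it to identify the operations consuming the most CPU time, adjusted them, and repeated the test-and-tune cycle until the program performed well.
Only the CPU profiler from gperftools is covered here; Google's toolkit also includes heap analysis tools, which I may write about if I use them later.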
Build and install
Get the source from gperftools on GitHub:
wget https://github.com/gperftools/gperftools/archive/gperftools-2.7.tar.gz
tar xvf gperftools-2.7.tar.gz
cd gperftools-2.7
./configure
make -j8
sudo make install
On 64-bit systems, libunwind is also required:
cd libunwind-1.3.1
./configure
make && sudo make install
Using the CPU profiler
Link against the profiler library at build time:
g++ [...] -o proc -lprofiler
With CMake:
target_link_libraries (
...
"profiler"
)
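Programs that exit on their own
Include gperftools/profiler.h and wrap the code to be profiled in ProfilerStart() and ProfilerStop():
#include <gperftools/profiler.h>
int main(int argc, const char* argv[]) {
    ProfilerStart("test.prof");  // start sampling; results go to test.prof
    // ... code to be profiled ...
    ProfilerStop();              // stop sampling and write test.prof
    return 0;
}
The report test.prof is written when ProfilerStop() runs.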
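Long-running programs
For programs that run continuously, such as servers or consumers that loop forever and never exit on their own, profiling can be toggled with a switch so that the program still produces a report:
#include <gperftools/profiler.h>
#include <signal.h>
#include <iostream>
// Toggles profiling on each SIGUSR1.
// Note: std::cout and ProfilerStart()/ProfilerStop() are not async-signal-safe,
// so calling them from a signal handler is risky; treat this as a debugging aid.
void setGperfStatus(int signum) {
    static bool is_open = false;
    if (signum != SIGUSR1) {
        return;
    }
    if (!is_open) {  // start
        is_open = true;
        ProfilerStart("test.prof");
        std::cout << "ProfilerStart success" << std::endl;
    } else {  // stop
        is_open = false;
        ProfilerStop();
        std::cout << "ProfilerStop success" << std::endl;
    }
}
int main(int argc, const char* argv[]) {
    signal(SIGUSR1, setGperfStatus);
    // ... main loop ...
}
Once the program is running, send the signal with kill to turn the profiler on or off:
kill -s SIGUSR1 $PID
After profiling is switched on, let the program run until enough samples have been collected, then send the signal again to stop profiling and write the report.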
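Because the handler above does real work inside a signal handler, a commonly used alternative is to let the handler only set a flag and have the main loop perform the toggle. A minimal C sketch of that pattern, assuming the program has a main loop that can call a polling function on every iteration; start_profiling() and stop_profiling() are hypothetical stand-ins so the sketch builds without gperftools, and in a real build they would call ProfilerStart("test.prof") and ProfilerStop().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Written by the handler, consumed by the main loop. Assigning to a
   volatile sig_atomic_t is safe inside a signal handler. */
static volatile sig_atomic_t toggle_requested = 0;
static int profiling = 0;

static void on_sigusr1(int signum) {
    (void)signum;
    toggle_requested = 1;
}

/* Hypothetical stand-ins for ProfilerStart("test.prof") / ProfilerStop(). */
static void start_profiling(void) { profiling = 1; puts("ProfilerStart success"); }
static void stop_profiling(void) { profiling = 0; puts("ProfilerStop success"); }

/* Installs the SIGUSR1 handler. Returns 0 on success. */
int install_toggle_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Call once per iteration of the main loop: outside the handler,
   so starting or stopping the profiler here is safe. */
void poll_profiler_toggle(void) {
    if (!toggle_requested) return;
    toggle_requested = 0;
    if (profiling) stop_profiling(); else start_profiling();
}

int is_profiling(void) { return profiling; }
```

The trade-off is latency: the toggle takes effect only at the next loop iteration, which is usually acceptable for profiling a service.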
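Profile report
Profiling works by sampling. By default the profiler takes 100 samples per second, so each sample represents 10 milliseconds of CPU time.
The sampling rate can be changed with the CPUPROFILE_FREQUENCY environment variable.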
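Since each sample stands for 1/frequency seconds of CPU time, sample counts in a report convert to time by dividing by the sampling rate. A sketch of that conversion, assuming CPUPROFILE_FREQUENCY holds a plain positive integer in Hz and falling back to the default of 100; the function names are hypothetical, and the sketch does not model any limits the profiler itself may apply to the rate.

```c
#include <assert.h>
#include <stdlib.h>

/* Sampling rate in Hz: CPUPROFILE_FREQUENCY if it is a positive integer,
   otherwise the default of 100 samples per second. */
long profile_frequency_hz(void) {
    const char *s = getenv("CPUPROFILE_FREQUENCY");
    if (s != NULL) {
        char *end;
        long hz = strtol(s, &end, 10);
        if (end != s && *end == '\0' && hz > 0) return hz;
    }
    return 100;
}

/* Approximate CPU time, in seconds, represented by a number of samples. */
double samples_to_seconds(long samples, long hz) {
    assert(hz > 0);
    return (double)samples / hz;
}
```

For example, 250 samples at the default 100 Hz correspond to about 2.5 seconds of CPU time.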
The pprof tool shipped with gperftools converts the .prof file into a human-readable form; several output formats are supported.
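Text output
pprof proc test.prof --text > test.txt
Columns
Each line has six columns, in order:
- samples in the function itself (excluding the functions it calls)
- percentage of samples in the function itself (excluding the functions it calls)
- cumulative percentage so far, i.e. the running total of the previous column
- samples in the function including the functions it calls
- percentage of samples in the function including the functions it calls
- function name
A sample count corresponds to the CPU time consumed.
The CPU time of a function as a whole includes the CPU time consumed by the functions it calls.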
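The percentage columns are derived from the two sample counts: the flat and cumulative percentages divide by the total number of samples, and the third column accumulates the flat percentage down the list. A sketch that prints rows in roughly this layout, assuming the per-function counts are already known and sorted by flat samples in descending order; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

struct prof_row {
    const char *name;
    long flat; /* samples in the function itself */
    long cum;  /* samples in the function and everything it calls */
};

/* Prints flat, flat%, sum%, cum, cum%, name for each row and returns the
   final running sum%, which is 100 when every sample is listed. */
double print_text_profile(const struct prof_row *rows, int n, long total) {
    assert(total > 0);
    double sum_pct = 0.0;
    for (int i = 0; i < n; ++i) {
        double flat_pct = 100.0 * rows[i].flat / total;
        double cum_pct = 100.0 * rows[i].cum / total;
        sum_pct += flat_pct; /* running total of flat% */
        printf("%6ld %5.1f%% %5.1f%% %6ld %5.1f%% %s\n",
               rows[i].flat, flat_pct, sum_pct, rows[i].cum, cum_pct, rows[i].name);
    }
    return sum_pct;
}
```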
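Graph output
pprof proc test.prof --svg > test.svg
pprof proc test.prof --pdf > test.pdf
(--web generates the same SVG graph and opens it in a web browser instead of writing it to standard output.)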
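Nodes
Each node shows:
- the function name, or the class name plus function name
- samples in the function itself, excluding the functions it calls (with percentage)
- samples including the functions it calls (with percentage); omitted if the function calls no other functions
Directed edges
Edges point from caller to callee; the number on an edge is the CPU time consumed by the callee when called from that caller.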
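Filtering
Use --focus or --ignore to concentrate on, or filter out, particular functions:
pprof proc test.prof --gv --focus=vsnprintf # show only call paths involving vsnprintf
pprof proc test.prof --gv --ignore=snprintf # drop call paths involving snprintf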
Source: 今天也继续开心涅普涅普
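3. Performance testing tool GNU gprof
Code profiling
When optimizing software, programmers should focus on the parts of the program that run most often, because that is where optimization pays off. Using real data to analyze precisely where an application spends its time is called code profiling. Almost every development platform now supports code profiling; this article covers GNU gprof, a code profiler for C/C++ on Linux.
PS: besides C/C++, gprof can also profile Pascal and Fortran 77 programs.
gprof
GNU gprof is a program profiler for the Linux platform (Unix also has prof). With gprof you can collect runtime statistics for C/C++ programs, such as the time spent in each function, the number of times each function is called, and the call relationships between functions. gprof helps locate a program's bottlenecks so that the functions consuming the most CPU time can be tuned.
PS: gprof measures only user-mode CPU time, not kernel-mode CPU time. gprof cannot help with I/O bottlenecks: a long-running I/O operation may consume very little CPU time.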
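How to use gprof
Using gprof is simple; follow these steps:
1. Compile the code with the -pg flag.
2. Run the program to generate profiling data.
3. Run gprof on the profiling data to get a readable report.
Let's walk through an example: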
test.c:
#include <stdio.h>
void func();
void a() {
    printf("Inside a()\n");
    int i = 0;
    for (; i < 0xffffff; ++i);  /* busy loop to consume CPU time */
    func();
    return;
}
static void b() {
    printf("Inside b()\n");
    int i = 0;
    for (; i < 0xffffff; ++i);
    return;
}
int main() {
    printf("Inside main()\n");
    int i = 0;
    for (; i < 0xfffff; ++i);
    a();
    b();
    return 0;
}
PS: the for loops exist only to consume execution time. Build without optimization; otherwise the compiler may remove these empty loops.
func.c:
#include <stdio.h>
void func() {
    printf("Inside func\n");
    int i = 0;
    for (; i < 0xffffff; ++i);
    return;
}
Step 1: compile the code above with -pg
The gcc documentation describes -pg as follows:
-pg : Generate extra code to write profile information suitable for the analysis program gprof. You must use this option when compiling the source files you want data about, and you must also use it when linking.
In other words, -pg is required both when compiling and when linking, so pass it to a single command that does both:
$ gcc -Wall -pg test.c func.c -o test
Step 2: run the program
$ ls
func.c makefile test test.c
$ ./test
Inside main()
Inside a()
Inside func
Inside b()
$ ls
func.c gmon.out makefile test test.c
A new file, gmon.out, now appears in the directory; this is what gprof analyzes.
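Step 3: analyze with gprof
gprof turns gmon.out into a human-readable report containing two tables: the flat profile, which shows how much time each function took, and the call graph, which shows how the functions call each other.
Redirect both tables to analysis.txt:
$ gprof test gmon.out > analysis.txt
Contents of analysis.txt: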
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 33.69     10.42    10.42        1    10.42    10.42  b
 33.65     20.82    10.41        1    10.41    20.81  a
 33.65     31.23    10.41        1    10.41    10.41  func
  0.13     31.27     0.04                             main
 %         the percentage of the total running time of the
time       program used by this function.
cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.
 self      the number of seconds accounted for by this
seconds    function alone. This is the major sort for this
           listing.
calls      the number of times this function was invoked, if
           this function is profiled, else blank.
 self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
           else blank.
 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, if this
           function is profiled, else blank.
name       the name of the function. This is the minor sort
           for this listing. The index shows the location of
           the function in the gprof listing. If the index is
           in parenthesis it shows where it would appear in
           the gprof listing if it were to be printed.
Copyright (C) 2012-2014 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.
Call graph (explanation follows)
granularity: each sample hit covers 2 byte(s) for 0.03% of 31.27 seconds
index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.04   31.23                 main [1]
               10.41   10.41       1/1           a [2]
               10.42    0.00       1/1           b [3]
-----------------------------------------------
               10.41   10.41       1/1           main [1]
[2]     66.6   10.41   10.41       1         a [2]
               10.41    0.00       1/1           func [4]
-----------------------------------------------
               10.42    0.00       1/1           main [1]
[3]     33.3   10.42    0.00       1         b [3]
-----------------------------------------------
               10.41    0.00       1/1           a [2]
[4]     33.3   10.41    0.00       1         func [4]
-----------------------------------------------
This table describes the call tree of the program, and was sorted by
the total amount of time spent in each function and its children.
Each entry in this table consists of several lines. The line with the
index number at the left hand margin lists the current function.
The lines above it list the functions that called this function,
and the lines below it list the functions this one called.
This line lists:
index A unique number given to each element of the table.
Index numbers are sorted numerically.
The index number is printed next to every function name so
it is easier to look up where the function is in the table.
% time This is the percentage of the `total' time that was spent
in this function and its children. Note that due to
different viewpoints, functions excluded by options, etc,
these numbers will NOT add up to 100%.
self This is the total amount of time spent in this function.
children This is the total amount of time propagated into this
function by its children.
called This is the number of times the function was called.
If the function called itself recursively, the number
only includes non-recursive calls, and is followed by
a `+' and the number of recursive calls.
name The name of the current function. The index number is
printed after it. If the function is a member of a
cycle, the cycle number is printed between the
function's name and the index number.
For the function's parents, the fields have the following meanings:
self This is the amount of time that was propagated directly
from the function into this parent.
children This is the amount of time that was propagated from
the function's children into this parent.
called This is the number of times this parent called the
function `/' the total number of times the function
was called. Recursive calls to the function are not
included in the number after the `/'.
name This is the name of the parent. The parent's index
number is printed after it. If the parent is a
member of a cycle, the cycle number is printed between
the name and the index number.
If the parents of the function cannot be determined, the word
`<spontaneous>' is printed in the `name' field, and all the other
fields are blank.
For the function's children, the fields have the following meanings:
self This is the amount of time that was propagated directly
from the child into the function.
children This is the amount of time that was propagated from the
child's children to the function.
called This is the number of times the function called
this child `/' the total number of times the child
was called. Recursive calls by the child are not
listed in the number after the `/'.
name This is the name of the child. The child's index
number is printed after it. If the child is a
member of a cycle, the cycle number is printed
between the name and the index number.
If there are any cycles (circles) in the call graph, there is an
entry for the cycle-as-a-whole. This entry shows who called the
cycle (as parents) and the members of the cycle (as children.)
The `+' recursive calls entry shows the number of function calls that
were internal to the cycle, and the calls entry for each member shows,
for that member, how many times it was called from other members of
the cycle.
Copyright (C) 2012-2014 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.
Index by function name
[2] a [4] func
[3] b [1] main
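The flat-profile columns follow from the definitions in the legend above: % time is a function's self seconds over the total, cumulative seconds is a running sum, and the two per-call columns divide self seconds, and self plus children seconds, by the call count. A sketch of that arithmetic, assuming the per-function seconds and call counts are already known; the struct and function names are hypothetical, and per-call values are printed in seconds, like the s/call columns in the report.

```c
#include <assert.h>
#include <stdio.h>

struct flat_entry {
    const char *name;
    double self_s;     /* seconds attributed to the function itself */
    double children_s; /* seconds attributed to its callees */
    long calls;        /* 0 when no call count is available */
};

/* Prints flat-profile style rows (entries sorted by self_s, descending)
   and returns the total number of seconds. */
double print_flat_profile(const struct flat_entry *e, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) total += e[i].self_s;
    assert(total > 0.0);
    double cumulative = 0.0;
    for (int i = 0; i < n; ++i) {
        cumulative += e[i].self_s;
        printf("%6.2f %9.2f %8.2f", 100.0 * e[i].self_s / total, cumulative, e[i].self_s);
        if (e[i].calls > 0) {
            /* self s/call and total s/call (self + children) */
            printf(" %8ld %8.2f %8.2f", e[i].calls, e[i].self_s / e[i].calls,
                   (e[i].self_s + e[i].children_s) / e[i].calls);
        } else {
            printf(" %26s", ""); /* leave the per-call columns blank */
        }
        printf("  %s\n", e[i].name);
    }
    return total;
}
```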
Use -a to suppress static (private) functions:
$ gprof -a test gmon.out > analysis.txt
Use -b (brief) to suppress the verbose field explanations:
$ gprof -b test gmon.out > analysis.txt
This produces:
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 33.69     10.42    10.42        1    10.42    10.42  b
 33.65     20.82    10.41        1    10.41    20.81  a
 33.65     31.23    10.41        1    10.41    10.41  func
  0.13     31.27     0.04                             main
Call graph
granularity: each sample hit covers 2 byte(s) for 0.03% of 31.27 seconds
index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.04   31.23                 main [1]
               10.41   10.41       1/1           a [2]
               10.42    0.00       1/1           b [3]
-----------------------------------------------
               10.41   10.41       1/1           main [1]
[2]     66.6   10.41   10.41       1         a [2]
               10.41    0.00       1/1           func [4]
-----------------------------------------------
               10.42    0.00       1/1           main [1]
[3]     33.3   10.42    0.00       1         b [3]
-----------------------------------------------
               10.41    0.00       1/1           a [2]
[4]     33.3   10.41    0.00       1         func [4]
-----------------------------------------------
Index by function name
[2] a [4] func
[3] b [1] main
Use -p to print only the flat profile:
$ gprof -p test gmon.out > analysis.txt
Use -p<function> to print the flat profile for that function only.
For example, print the flat profile for a() only:
$ gprof -pa test gmon.out > analysis.txt
Use -P to suppress the flat profile:
$ gprof -P test gmon.out > analysis.txt
Use -q to print only the call graph:
$ gprof -q test gmon.out > analysis.txt
Use -q<function> to print the call graph for that function only.
For example, print the call graph for a() only:
$ gprof -qa test gmon.out > analysis.txt
Use -Q to suppress the call graph:
$ gprof -Q test gmon.out > analysis.txt
These options can be combined.