
perf script examples

Summary

1. To avoid stack frames being collapsed, use --call-graph dwarf. This makes the perf.data file about 20 times larger, and perf script is somewhat slower to parse it.

2. Recording with --call-graph fp requires building with -fno-omit-frame-pointer, and even then it has no effect for some functions. See: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

3. On Intel CPUs, consider --call-graph lbr. It has the lowest overhead, but the stack depth is limited. See: https://gaomf.cn/2019/10/30/perf_stack_traceback

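4. You can also inspect hot caller chains or callee chains directly in perf report. Reference: https://docs.ceph.com/en/latest/dev/perf/

sudo perf report --call-graph caller (or --call-graph callee) displays the call stack as a caller chain or a callee chain respectively.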

5. Use -s pid to view hotspots per thread: sudo perf report --show-cpu-utilization -s pid

6. To show the flame graph per thread: ./FlameGraph/stackcollapse-perf.pl --tid

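7. If the flame graph has too many thin spikes ("hairs"), the workload is probably being interrupted frequently. Pass --reverse --inverted to flamegraph.pl to view the inverted, stack-reversed merged graph and identify which interrupts are hot.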
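As a rough illustration of the three recording modes in items 1 to 3, the sketch below maps a mode name to the matching perf record option. The function name call_graph_args is hypothetical and not part of perf. The optional size for dwarf follows the "dwarf,SIZE" form from the perf record man page, where 8192 bytes is the documented default.

```shell
#!/bin/bash
# Hypothetical helper (not part of perf): print the --call-graph option
# for a given unwinding mode. It only builds a string; it does not run perf.
call_graph_args() {
    case "$1" in
        fp)    echo "--call-graph fp" ;;                  # needs -fno-omit-frame-pointer builds
        lbr)   echo "--call-graph lbr" ;;                 # Intel only, limited stack depth
        dwarf) echo "--call-graph dwarf,${2:-8192}" ;;    # large perf.data, slower parsing
        *)     echo "unknown call-graph mode: $1" >&2; return 1 ;;
    esac
}

# Example use (PID 1234 is a placeholder):
#   sudo perf record -F 999 $(call_graph_args dwarf 16384) -p 1234 -- sleep 20
```

Keeping the mode in one place makes it easy to re-record the same target with a cheaper mode when the dwarf output becomes too large.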

Open questions

1. Can flame graphs be shown per thread?

2. Off-CPU flame graphs, wakeup flame graphs, chain graphs

3. Memory flame graphs

4. I/O flame graphs

5. Stall Cycles

6. CPI
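For question 1, one possible approach builds on the --tid option of stackcollapse-perf.pl: filter the folded output down to a single thread before rendering. The sketch below assumes that with --tid the first frame of each folded line has the form comm-PID/TID. The function name filter_thread is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: keep only the folded stacks that belong to one thread.
# Assumption: the first frame looks like "comm-PID/TID" (stackcollapse-perf.pl --tid).
filter_thread() {
    local tid=$1
    awk -F';' -v tid="$tid" '{
        first = $1
        sub(/ [0-9]+$/, "", first)       # single-frame line: drop the trailing count
        n = split(first, parts, "/")     # the TID follows the last "/"
        if (n >= 2 && parts[n] == tid) print
    }'
}

# Example use (12345 is a placeholder TID):
#   filter_thread 12345 < perf.data.folded | ./FlameGraph/flamegraph.pl > thread-12345.svg
```

Filtering at the folded stage avoids re-recording: one perf.data file can yield a separate graph for each thread of interest.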

refs: https://zhuanlan.zhihu.com/p/73385693

#!/bin/bash -x
# Run perf -v and install perf as prompted if it is missing
# Fetch the FlameGraph scripts
# git clone https://ghproxy.com/https://github.com/brendangregg/FlameGraph.git

PERF_DIR=`date +%Y-%m-%d-%H.%M.%S`
PERF_BASE_NAME=${PERF_DIR}/perf.data

# perf record -p takes a comma-separated PID list; -d, joins multiple matches
PERF_TARGET=$(pgrep -d, dp_)
PERF_TIME=20


function print_ret()
{
    # $1: exit status of the previous step, $2: name of that step
    if [[ $1 -ne 0 ]]; then
        echo -e '\033[32;44;1m'"$2 failed!"'\033[0m'
        exit "$1"
    else
        echo "$2 succeeded."
    fi
}

mkdir -p ${PERF_DIR}

:<<BLOCK
#        -g
           Enables call-graph (stack chain/backtrace) recording.

       --call-graph
           Setup and enable call-graph (stack chain/backtrace) recording, implies -g. Default is "fp".

               Allows specifying "fp" (frame pointer) or "dwarf"
               (DWARF's CFI - Call Frame Information) or "lbr"
               (Hardware Last Branch Record facility) as the method to collect
               the information used to show the call graphs.

               In some systems, where binaries are build with gcc
               --fomit-frame-pointer, using the "fp" method will produce bogus
               call graphs, using "dwarf", if available (perf tools linked to
               the libunwind or libdw library) should be used instead.
               Using the "lbr" method doesn't require any compiler options. It
               will produce call graphs from the hardware LBR registers. The
               main limitation is that it is only available on new Intel
               platforms, such as Haswell. It can only get user call chain. It
               doesn't work with branch stack sampling at the same time.

               When "dwarf" recording is used, perf also records (user) stack dump
               when sampled.  Default size of the stack dump is 8192 (bytes).
               User can change the size by passing the size after comma like
               "--call-graph dwarf,4096".
BLOCK
# -g = --call-graph fp
# --call-graph lbr
# --call-graph dwarf uses the same debug information as gdb
# perf script output fields can be chosen with: -F comm,pid,tid,cpu,time,event,ip,sym,dso,trace
sudo perf record -F 999 --call-graph dwarf -p ${PERF_TARGET} -o ${PERF_BASE_NAME} -- sleep ${PERF_TIME} 
print_ret $? "perf record"

sudo perf script -i ${PERF_BASE_NAME} >${PERF_BASE_NAME}.unfolded
print_ret $? "perf script"

./FlameGraph/stackcollapse-perf.pl --tid ${PERF_BASE_NAME}.unfolded > ${PERF_BASE_NAME}.folded
print_ret $? "./FlameGraph/stackcollapse-perf.pl"

# Add --reverse --inverted to render the inverted, stack-reversed graph instead
./FlameGraph/flamegraph.pl ${PERF_BASE_NAME}.folded > ${PERF_BASE_NAME}.svg
print_ret $? "./FlameGraph/flamegraph.pl"

exit 0
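The folded file produced by the script can also be inspected as text before rendering. As a minimal sketch of what the stack-reversed view merges on, the function below sums sample counts by leaf frame (the last frame of each stack). This can hint at hot interrupt handlers when the graph looks hairy. It assumes the folded format of semicolon-separated frames followed by a space and a sample count. The name leaf_totals is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: total the samples per leaf frame in folded stacks.
# Assumption: each input line is "frame1;frame2;...;frameN COUNT".
leaf_totals() {
    awk 'NF {
        count = $NF                      # sample count is the last field
        stack = $0
        sub(/ [0-9]+$/, "", stack)       # strip the trailing count
        n = split(stack, frames, ";")
        total[frames[n]] += count        # frames[n] is the leaf frame
    }
    END { for (f in total) printf "%d %s\n", total[f], f }' | sort -rn
}

# Example use:
#   leaf_totals < perf.data.folded | head -20
```

The output is one line per leaf function, largest total first, which gives a quick ranking without opening the SVG.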