bcc - execsnoop performance (unfinished)
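The bcc programs used so far consist of two parts: one written in Python and one written in C. The Python part loads the BPF program, operates on the BPF program's maps, and processes the collected data. The C part is compiled to BPF bytecode by the LLVM compiler, checked for safety by the BPF verifier, and then loaded into the kernel to run. Unfamiliar functions that appear in the Python and C code can be looked up in the two references below.
Python functions: Python link
C functions: link
bcc installation: bcc_install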
bcc program book: book url
https://www.ebpf.top/post/bpf_learn_path/
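The Python/C split described above can be sketched as follows. This is a minimal illustration, not part of execsnoop: the names `BPF_TEXT`, `trace_execve`, and `run` are made up for this example, and it assumes bcc is installed and the script runs as root. The C text is only compiled and loaded when `run()` is called.

```python
# C part: compiled to BPF bytecode by LLVM when BPF(text=...) is constructed.
# "trace_execve" is a hypothetical function name chosen for this sketch.
BPF_TEXT = r"""
int trace_execve(struct pt_regs *ctx) {
    bpf_trace_printk("execve called\n");
    return 0;
}
"""


def run():
    """Python part: load the program, attach it, and read its output."""
    from bcc import BPF  # imported lazily so this file can be read without bcc

    b = BPF(text=BPF_TEXT)  # compile, verify, and load into the kernel
    # Resolve the architecture-specific symbol for the execve syscall.
    b.attach_kprobe(event=b.get_syscall_fnname("execve"),
                    fn_name="trace_execve")
    b.trace_print()  # print lines from the kernel trace pipe until Ctrl-C
```

Calling `run()` as root should print one line per `execve()`; the full tool shown later replaces `bpf_trace_printk` with a perf buffer so that structured events reach Python.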
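What is bcc
- The bcc open-source project: https://github.com/iovisor/bcc
- The eBPF virtual machine uses instructions similar to assembly language, which are very hard to write programs in directly. bcc provides a Python library named bcc that simplifies the development of eBPF applications.
- bcc ships a large collection of ready-made eBPF programs that can be used directly; the project's tool distribution diagram gives a sense of how much they cover.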
https://github.com/brendangregg/perf-tools/blob/master/execsnoop
The perf-tools execsnoop (a bash script built on ftrace) is implemented as follows:
#!/bin/bash
#
# execsnoop - trace process exec() with arguments.
#             Written using Linux ftrace.
#
# This shows the execution of new processes, especially short-lived ones that
# can be missed by sampling tools such as top(1).
#
# USAGE: ./execsnoop [-hrt] [-n name]
#
# REQUIREMENTS: FTRACE and KPROBE CONFIG, sched:sched_process_fork tracepoint,
# and either the sys_execve, stub_execve or do_execve kernel function. You may
# already have these on recent kernels. And awk.
#
# This traces exec() from the fork()->exec() sequence, which means it won't
# catch new processes that only fork(). With the -r option, it will also catch
# processes that re-exec. It makes a best-effort attempt to retrieve the program
# arguments and PPID; if these are unavailable, 0 and "[?]" are printed
# respectively. There is also a limit to the number of arguments printed (by
# default, 8), which can be increased using -a.
#
# This implementation is designed to work on older kernel versions, and without
# kernel debuginfo. It works by dynamic tracing an execve kernel function to
# read the arguments from the %si register. The sys_execve function is tried
# first, then stub_execve and do_execve. The sched:sched_process_fork
# tracepoint is used to get the PPID. This program is a workaround that should be
# improved in the future when other kernel capabilities are made available. If
# you need a more reliable tool now, then consider other tracing alternatives
# (eg, SystemTap). This tool is really a proof of concept to see what ftrace can
# currently do.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# See the execsnoop(8) man page (in perf-tools) for more info.
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
#  This program is free software; you can redistribute it and/or
#  modify it under the terms of the GNU General Public License
#  as published by the Free Software Foundation; either version 2
#  of the License, or (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software Foundation,
#  Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
#
#  (http://www.gnu.org/copyleft/gpl.html)
#
# 07-Jul-2014   Brendan Gregg   Created this.

### default variables
tracing=/sys/kernel/debug/tracing
flock=/var/tmp/.ftrace-lock; wroteflock=0
opt_duration=0; duration=; opt_name=0; name=; opt_time=0; opt_reexec=0
opt_argc=0; argc=8; max_argc=16; ftext=
trap ':' INT QUIT TERM PIPE HUP  # sends execution to end tracing section

function usage {
    cat <<-END >&2
USAGE: execsnoop [-hrt] [-a argc] [-d secs] [name]
                 -d seconds      # trace duration, and use buffers
                 -a argc         # max args to show (default 8)
                 -r              # include re-execs
                 -t              # include time (seconds)
                 -h              # this usage message
                 name            # process name to match (REs allowed)
  eg,
       execsnoop                 # watch exec()s live (unbuffered)
       execsnoop -d 1            # trace 1 sec (buffered)
       execsnoop grep            # trace process names containing grep
       execsnoop 'udevd$'        # process names ending in "udevd"

See the man page and example file for more info.
END
    exit
}

function warn {
    if ! eval "$@"; then
        echo >&2 "WARNING: command failed \"$@\""
    fi
}

function end {
    # disable tracing
    echo 2>/dev/null
    echo "Ending tracing..." 2>/dev/null
    cd $tracing
    warn "echo 0 > events/kprobes/$kname/enable"
    warn "echo 0 > events/sched/sched_process_fork/enable"
    warn "echo -:$kname >> kprobe_events"
    warn "echo > trace"
    (( wroteflock )) && warn "rm $flock"
}

function die {
    echo >&2 "$@"
    exit 1
}

function edie {
    # die with a quiet end()
    echo >&2 "$@"
    exec >/dev/null 2>&1
    end
    exit 1
}

### process options
while getopts a:d:hrt opt
do
    case $opt in
    a)      opt_argc=1; argc=$OPTARG ;;
    d)      opt_duration=1; duration=$OPTARG ;;
    r)      opt_reexec=1 ;;
    t)      opt_time=1 ;;
    h|?)    usage ;;
    esac
done
shift $(( $OPTIND - 1 ))
if (( $# )); then
    opt_name=1
    name=$1
    shift
fi
(( $# )) && usage

### option logic
(( opt_pid && opt_name )) && die "ERROR: use either -p or -n."
(( opt_pid )) && ftext=" issued by PID $pid"
(( opt_name )) && ftext=" issued by process name \"$name\""
(( opt_file )) && ftext="$ftext for filenames containing \"$file\""
(( opt_argc && argc > max_argc )) && die "ERROR: max -a argc is $max_argc."
if (( opt_duration )); then
    echo "Tracing exec()s$ftext for $duration seconds (buffered)..."
else
    echo "Tracing exec()s$ftext. Ctrl-C to end."
fi

### select awk
if (( opt_duration )); then
    [[ -x /usr/bin/mawk ]] && awk=mawk || awk=awk
else
    # workarounds for mawk/gawk fflush behavior
    if [[ -x /usr/bin/gawk ]]; then
        awk=gawk
    elif [[ -x /usr/bin/mawk ]]; then
        awk="mawk -W interactive"
    else
        awk=awk
    fi
fi

### check permissions
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE? debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"

### ftrace lock
[[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) $flock"
echo $$ > $flock || die "ERROR: unable to write $flock."
wroteflock=1

### build probe
if [[ -x /usr/bin/getconf ]]; then
    bits=$(getconf LONG_BIT)
else
    bits=64
    [[ $(uname -m) == i* ]] && bits=32
fi
(( offset = bits / 8 ))
function makeprobe {
    func=$1
    kname=execsnoop_$func
    kprobe="p:$kname $func"
    i=0
    while (( i < argc + 1 )); do
        # p:kname do_execve +0(+0(%si)):string +0(+8(%si)):string ...
        kprobe="$kprobe +0(+$(( i * offset ))(%si)):string"
        (( i++ ))
    done
}
# try in this order: sys_execve, stub_execve, do_execve
makeprobe sys_execve

### setup and begin tracing
echo nop > current_tracer
if ! echo $kprobe >> kprobe_events 2>/dev/null; then
    makeprobe stub_execve
    if ! echo $kprobe >> kprobe_events 2>/dev/null; then
        makeprobe do_execve
        if ! echo $kprobe >> kprobe_events 2>/dev/null; then
            edie "ERROR: adding a kprobe for execve. Exiting."
        fi
    fi
fi
if ! echo 1 > events/kprobes/$kname/enable; then
    edie "ERROR: enabling kprobe for execve. Exiting."
fi
if ! echo 1 > events/sched/sched_process_fork/enable; then
    edie "ERROR: enabling sched:sched_process_fork tracepoint. Exiting."
fi
echo "Instrumenting $func"
(( opt_time )) && printf "%-16s " "TIMEs"
printf "%6s %6s %s\n" "PID" "PPID" "ARGS"

#
# Determine output format. It may be one of the following (newest first):
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#           TASK-PID   CPU#    TIMESTAMP  FUNCTION
# To differentiate between them, the number of header fields is counted,
# and an offset set, to skip the extra column when needed.
#
offset=$($awk 'BEGIN { o = 0; }
    $1 == "#" && $2 ~ /TASK/ && NF == 6 { o = 1; }
    $2 ~ /TASK/ { print o; exit }' trace)

### print trace buffer
warn "echo > trace"
( if (( opt_duration )); then
    # wait then dump buffer
    sleep $duration
    cat -v trace
else
    # print buffer live
    cat -v trace_pipe
fi ) | $awk -v o=$offset -v opt_name=$opt_name -v name=$name \
    -v opt_duration=$opt_duration -v opt_time=$opt_time -v kname=$kname \
    -v opt_reexec=$opt_reexec '
    # common fields
    $1 != "#" {
        # task name can contain dashes
        comm = pid = $1
        sub(/-[0-9][0-9]*/, "", comm)
        sub(/.*-/, "", pid)
    }

    $1 != "#" && $(4+o) ~ /sched_process_fork/ {
        cpid=$0
        sub(/.* child_pid=/, "", cpid)
        sub(/ .*/, "", cpid)
        getppid[cpid] = pid
        delete seen[pid]
    }

    $1 != "#" && $(4+o) ~ kname {
        if (seen[pid])
            next
        if (opt_name && comm !~ name)
            next

        #
        # examples:
        # ... arg1="/bin/echo" arg2="1" arg3="2" arg4="3" ...
        # ... arg1="sleep" arg2="2" arg3=(fault) arg4="" ...
        # ... arg1="" arg2=(fault) arg3="" arg4="" ...
        # the last example is uncommon, and may be a race.
        #
        if ($0 ~ /arg1=""/) {
            args = comm " [?]"
        } else {
            args=$0
            sub(/ arg[0-9]*=\(fault\).*/, "", args)
            sub(/.*arg1="/, "", args)
            gsub(/" arg[0-9]*="/, " ", args)
            sub(/"$/, "", args)
            if ($0 !~ /\(fault\)/)
                args = args " [...]"
        }

        if (opt_time) {
            time = $(3+o); sub(":", "", time)
            printf "%-16s ", time
        }
        printf "%6s %6d %s\n", pid, getppid[pid], args
        if (!opt_duration)
            fflush()
        if (!opt_reexec) {
            seen[pid] = 1
            delete getppid[pid]
        }
    }

    $0 ~ /LOST.*EVENT[S]/ { print "WARNING: " $0 > "/dev/stderr" }
'

### end tracing
end
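The core of the script above is `makeprobe`, which builds the ftrace kprobe definition that reads the argv pointers through the `%si` register. The sketch below restates that string construction in Python so the resulting definition is easy to inspect. `make_probe` is a name invented for this illustration; it only builds the string and does not write to `kprobe_events`.

```python
def make_probe(func, argc=8, bits=64):
    """Build the kprobe definition the way the script's makeprobe does.

    Returns (kname, kprobe). The loop runs argc + 1 times, matching
    `while (( i < argc + 1 ))` in the bash version.
    """
    offset = bits // 8  # pointer size in bytes: 8 on 64-bit, 4 on 32-bit
    kname = "execsnoop_" + func
    # Each fetch argument dereferences argv[i] and reads it as a string.
    fields = ["+0(+%d(%%si)):string" % (i * offset) for i in range(argc + 1)]
    return kname, "p:%s %s %s" % (kname, func, " ".join(fields))


if __name__ == "__main__":
    kname, kprobe = make_probe("sys_execve", argc=2)
    print(kname)
    print(kprobe)
```

The script appends the resulting string to `kprobe_events` and falls back from `sys_execve` to `stub_execve` and then `do_execve` when the write fails.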
The Python version, which depends on bcc and BPF, is as follows:

#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# execsnoop Trace new processes via exec() syscalls.
#           For Linux, uses BCC, eBPF. Embedded C.
#
# USAGE: execsnoop [-h] [-T] [-t] [-x] [-q] [-n NAME] [-l LINE]
#                  [--max-args MAX_ARGS]
#
# This currently will print up to a maximum of 19 arguments, plus the process
# name, so 20 fields in total (MAXARG).
#
# This won't catch all new processes: an application may fork() but not exec().
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 07-Feb-2016   Brendan Gregg   Created this.

from __future__ import print_function
from bcc import BPF
from bcc.utils import ArgString, printb
import bcc.utils as utils
import argparse
import re
import time
import pwd
from collections import defaultdict
from time import strftime


def parse_uid(user):
    try:
        result = int(user)
    except ValueError:
        try:
            user_info = pwd.getpwnam(user)
        except KeyError:
            raise argparse.ArgumentTypeError(
                "{0!r} is not valid UID or user entry".format(user))
        else:
            return user_info.pw_uid
    else:
        # Maybe validate if UID < 0 ?
        return result


# arguments
examples = """examples:
    ./execsnoop           # trace all exec() syscalls
    ./execsnoop -x        # include failed exec()s
    ./execsnoop -T        # include time (HH:MM:SS)
    ./execsnoop -U        # include UID
    ./execsnoop -u 1000   # only trace UID 1000
    ./execsnoop -u user   # get user UID and trace only them
    ./execsnoop -t        # include timestamps
    ./execsnoop -q        # add "quotemarks" around arguments
    ./execsnoop -n main   # only print command lines containing "main"
    ./execsnoop -l tpkg   # only print command where arguments contains "tpkg"
    ./execsnoop --cgroupmap ./mappath  # only trace cgroups in this BPF map
"""
parser = argparse.ArgumentParser(
    description="Trace exec() syscalls",
    formatter_class=argparse.RawDescriptionHelpFormatter,
    epilog=examples)
parser.add_argument("-T", "--time", action="store_true",
    help="include time column on output (HH:MM:SS)")
parser.add_argument("-t", "--timestamp", action="store_true",
    help="include timestamp on output")
parser.add_argument("-x", "--fails", action="store_true",
    help="include failed exec()s")
parser.add_argument("--cgroupmap",
    help="trace cgroups in this BPF map only")
parser.add_argument("-u", "--uid", type=parse_uid, metavar='USER',
    help="trace this UID only")
parser.add_argument("-q", "--quote", action="store_true",
    help="Add quotemarks (\") around arguments."
    )
parser.add_argument("-n", "--name",
    type=ArgString,
    help="only print commands matching this name (regex), any arg")
parser.add_argument("-l", "--line",
    type=ArgString,
    help="only print commands where arg contains this line (regex)")
parser.add_argument("-U", "--print-uid", action="store_true",
    help="print UID column")
parser.add_argument("--max-args", default="20",
    help="maximum number of arguments parsed and displayed, defaults to 20")
parser.add_argument("--ebpf", action="store_true",
    help=argparse.SUPPRESS)
args = parser.parse_args()

# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>
#include <linux/fs.h>

#define ARGSIZE  128

enum event_type {
    EVENT_ARG,
    EVENT_RET,
};

struct data_t {
    u32 pid;  // PID as in the userspace term (i.e. task->tgid in kernel)
    u32 ppid; // Parent PID as in the userspace term (i.e task->real_parent->tgid in kernel)
    u32 uid;
    char comm[TASK_COMM_LEN];
    enum event_type type;
    char argv[ARGSIZE];
    int retval;
};

#if CGROUPSET
BPF_TABLE_PINNED("hash", u64, u64, cgroupset, 1024, "CGROUPPATH");
#endif
BPF_PERF_OUTPUT(events);

static int __submit_arg(struct pt_regs *ctx, void *ptr, struct data_t *data)
{
    bpf_probe_read(data->argv, sizeof(data->argv), ptr);
    events.perf_submit(ctx, data, sizeof(struct data_t));
    return 1;
}

static int submit_arg(struct pt_regs *ctx, void *ptr, struct data_t *data)
{
    const char *argp = NULL;
    bpf_probe_read(&argp, sizeof(argp), ptr);
    if (argp) {
        return __submit_arg(ctx, (void *)(argp), data);
    }
    return 0;
}

int syscall__execve(struct pt_regs *ctx,
    const char __user *filename,
    const char __user *const __user *__argv,
    const char __user *const __user *__envp)
{

    u32 uid = bpf_get_current_uid_gid() & 0xffffffff;

    UID_FILTER

#if CGROUPSET
    u64 cgroupid = bpf_get_current_cgroup_id();
    if (cgroupset.lookup(&cgroupid) == NULL) {
        return 0;
    }
#endif

    // create data here and pass to submit_arg to save stack space (#555)
    struct data_t data = {};
    struct task_struct *task;

    data.pid = bpf_get_current_pid_tgid() >> 32;

    task = (struct task_struct *)bpf_get_current_task();
    // Some kernels, like Ubuntu 4.13.0-generic, return 0
    // as the real_parent->tgid.
    // We use the get_ppid function as a fallback in those cases. (#1883)
    data.ppid = task->real_parent->tgid;

    bpf_get_current_comm(&data.comm, sizeof(data.comm));
    data.type = EVENT_ARG;

    __submit_arg(ctx, (void *)filename, &data);

    // skip first arg, as we submitted filename
    #pragma unroll
    for (int i = 1; i < MAXARG; i++) {
        if (submit_arg(ctx, (void *)&__argv[i], &data) == 0)
             goto out;
    }

    // handle truncated argument list
    char ellipsis[] = "...";
    __submit_arg(ctx, (void *)ellipsis, &data);
out:
    return 0;
}

int do_ret_sys_execve(struct pt_regs *ctx)
{
#if CGROUPSET
    u64 cgroupid = bpf_get_current_cgroup_id();
    if (cgroupset.lookup(&cgroupid) == NULL) {
        return 0;
    }
#endif

    struct data_t data = {};
    struct task_struct *task;

    u32 uid = bpf_get_current_uid_gid() & 0xffffffff;
    UID_FILTER

    data.pid = bpf_get_current_pid_tgid() >> 32;
    data.uid = uid;

    task = (struct task_struct *)bpf_get_current_task();
    // Some kernels, like Ubuntu 4.13.0-generic, return 0
    // as the real_parent->tgid.
    // We use the get_ppid function as a fallback in those cases. (#1883)
    data.ppid = task->real_parent->tgid;

    bpf_get_current_comm(&data.comm, sizeof(data.comm));
    data.type = EVENT_RET;
    data.retval = PT_REGS_RC(ctx);
    events.perf_submit(ctx, &data, sizeof(data));

    return 0;
}
"""

bpf_text = bpf_text.replace("MAXARG", args.max_args)

if args.uid:
    bpf_text = bpf_text.replace('UID_FILTER',
        'if (uid != %s) { return 0; }' % args.uid)
else:
    bpf_text = bpf_text.replace('UID_FILTER', '')
if args.cgroupmap:
    bpf_text = bpf_text.replace('CGROUPSET', '1')
    bpf_text = bpf_text.replace('CGROUPPATH', args.cgroupmap)
else:
    bpf_text = bpf_text.replace('CGROUPSET', '0')
if args.ebpf:
    print(bpf_text)
    exit()

# initialize BPF
b = BPF(text=bpf_text)
execve_fnname = b.get_syscall_fnname("execve")
b.attach_kprobe(event=execve_fnname, fn_name="syscall__execve")
b.attach_kretprobe(event=execve_fnname, fn_name="do_ret_sys_execve")

# header
if args.time:
    print("%-9s" % ("TIME"), end="")
if args.timestamp:
    print("%-8s" % ("TIME(s)"), end="")
if args.print_uid:
    print("%-6s" % ("UID"), end="")
print("%-16s %-6s %-6s %3s %s" % ("PCOMM", "PID", "PPID", "RET", "ARGS"))

class EventType(object):
    EVENT_ARG = 0
    EVENT_RET = 1

start_ts = time.time()
argv = defaultdict(list)

# This is best-effort PPID matching. Short-lived processes may exit
# before we get a chance to read the PPID.
# This is a fallback for when fetching the PPID from task->real_parent->tgid
# returns 0, which happens in some kernel versions.
def get_ppid(pid):
    try:
        with open("/proc/%d/status" % pid) as status:
            for line in status:
                if line.startswith("PPid:"):
                    return int(line.split()[1])
    except IOError:
        pass
    return 0

# process event
def print_event(cpu, data, size):
    event = b["events"].event(data)
    skip = False

    if event.type == EventType.EVENT_ARG:
        argv[event.pid].append(event.argv)
    elif event.type == EventType.EVENT_RET:
        if event.retval != 0 and not args.fails:
            skip = True
        if args.name and not re.search(bytes(args.name), event.comm):
            skip = True
        if args.line and not re.search(bytes(args.line),
                                       b' '.join(argv[event.pid])):
            skip = True
        if args.quote:
            argv[event.pid] = [
                b"\"" + arg.replace(b"\"", b"\\\"") + b"\""
                for arg in argv[event.pid]
            ]

        if not skip:
            if args.time:
                printb(b"%-9s" % strftime("%H:%M:%S").encode('ascii'), nl="")
            if args.timestamp:
                printb(b"%-8.3f" % (time.time() - start_ts), nl="")
            if args.print_uid:
                printb(b"%-6d" % event.uid, nl="")
            ppid = event.ppid if event.ppid > 0 else get_ppid(event.pid)
            ppid = b"%d" % ppid if ppid > 0 else b"?"
            argv_text = b' '.join(argv[event.pid]).replace(b'\n', b'\\n')
            printb(b"%-16s %-6d %-6s %3d %s" % (event.comm, event.pid,
                   ppid, event.retval, argv_text))
        try:
            del(argv[event.pid])
        except Exception:
            pass


# loop with callback to print_event
b["events"].open_perf_buffer(print_event)
while 1:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        exit()
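The event protocol in the script above is worth isolating: the kprobe emits one `EVENT_ARG` record per argument, the kretprobe emits a single `EVENT_RET` record, and user space joins them by PID. The following sketch replays that logic on plain dictionaries so it can run without bcc or root. The `assemble` function and the dictionary layout are assumptions made for this illustration, not part of the tool.

```python
from collections import defaultdict

EVENT_ARG, EVENT_RET = 0, 1  # same values as EventType in the script


def assemble(events, include_failed=False):
    """Group EVENT_ARG records by PID and emit one row per EVENT_RET.

    `events` is an iterable of dicts standing in for perf buffer records.
    Returns a list of (pid, retval, joined_args) tuples.
    """
    pending = defaultdict(list)  # pid -> arguments seen so far
    rows = []
    for ev in events:
        pid = ev["pid"]
        if ev["type"] == EVENT_ARG:
            pending[pid].append(ev["argv"])
        elif ev["type"] == EVENT_RET:
            # Always drop the buffered arguments, printed or not.
            args = pending.pop(pid, [])
            if ev["retval"] != 0 and not include_failed:
                continue  # mirrors the default behaviour without -x
            rows.append((pid, ev["retval"], b" ".join(args)))
    return rows


if __name__ == "__main__":
    sample = [
        {"type": EVENT_ARG, "pid": 10, "argv": b"/bin/ls"},
        {"type": EVENT_ARG, "pid": 10, "argv": b"-l"},
        {"type": EVENT_RET, "pid": 10, "retval": 0},
    ]
    print(assemble(sample))
```

Because arguments are buffered per PID until the return event arrives, a lost `EVENT_RET` would leave entries behind in the buffer; the real tool has the same property with its `argv` dictionary.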
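At present, both versions rely on kprobes: the bash version registers a kprobe through ftrace, and the Python version attaches a kprobe and a kretprobe through bcc.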