drgn: How the Linux Kernel Team at Meta Debugs the Kernel at Scale

Original article:
https://developers.facebook.com/blog/post/2021/12/09/drgn-how-linux-kernel-team-meta-debugs-kernel-scale/

drgn (pronounced “dragon”) is a debugger that exposes the types and variables in a program for easy, expressive scripting in Python. The Linux kernel team at Meta originally built drgn to make it easier to investigate the kinds of difficult Linux kernel bugs that the team encounters at Meta. The team has since added further use cases for it, like monitoring and userspace memory profiling. This post discusses why we built drgn, how drgn works, how to use it, and what the team would like to do with drgn next. The drgn project is open source and available in the drgn GitHub repo for you to try.

Why drgn?

Meta uses Linux for most of its infrastructure. The Linux kernel team at Meta has a comprehensive validation and phased rollout procedure to help catch bugs before they are in production. Of course, some rare bugs aren’t caught until the kernel is deployed at a huge scale. Debugging issues like this often required traversing and sifting through lots of interconnected data structures, which was difficult or tedious with existing tools.

The kernel team built drgn because it wanted a tool with:

  • Better scripting support than the crash utility
  • More natural scripting than GDB
  • Better Linux kernel support than GDB

How to Use drgn

To use drgn for debugging purposes, install drgn and get debugging symbols for your running kernel. drgn can debug a kernel core dump or the running kernel. To debug the running kernel, simply run sudo drgn. This starts a Python interpreter that is preloaded with the drgn environment. Namely, it provides the prog variable, which is drgn’s representation of the program that can be used to access variables, types, and more:

$ sudo drgn
>>> prog.type("struct list_head")
struct list_head {
	struct list_head *next;
	struct list_head *prev;
}
>>> prog.variable("jiffies_64")
(u64)4295733889
# Shorthand:
>>> prog["jiffies_64"]
(u64)4295735240

The drgn interpreter pretty-prints variables by default, but the prog.variable() method and prog[] syntax return one of the core building blocks of drgn: drgn.Object. A drgn object represents a variable, constant, function, or computed value. It has several fields and methods that can be used to get information about the variable:

>>> hex(prog["jiffies_64"].address_)
'0xffffffff8fa07980'
>>> prog["jiffies_64"].type_
typedef __u64 u64   

More importantly, a drgn object can be used in expressions just like in the actual program, like doing arithmetic or accessing structure members or array elements:

>>> prog["jiffies_64"] + 1
(u64)4295933335
>>> prog["init_task"].comm
(char [16])"swapper/0"
>>> prog["init_task"].comm[0]
(char)115          
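
To make the idea concrete, here is a toy model in plain Python of how a wrapper type can follow C's fixed-width unsigned arithmetic while printing in the same (type)value style as the session above. The TypedUInt class is hypothetical and is not part of drgn's API; it only sketches the operator-overloading approach under that assumption.

```python
class TypedUInt:
    """Toy stand-in for a typed unsigned integer (hypothetical, not drgn's API)."""

    def __init__(self, type_name, bits, value):
        self.type_name = type_name
        self.bits = bits
        # Truncate to the declared width, as C does for unsigned types.
        self.value = value & ((1 << bits) - 1)

    def __add__(self, other):
        rhs = other.value if isinstance(other, TypedUInt) else int(other)
        # The result keeps the left operand's type and wraps on overflow.
        return TypedUInt(self.type_name, self.bits, self.value + rhs)

    def __repr__(self):
        return f"({self.type_name}){self.value}"


jiffies = TypedUInt("u64", 64, 4295733889)
print(jiffies + 1)                            # (u64)4295733890
print(TypedUInt("u64", 64, 2**64 - 1) + 1)    # (u64)0
```

The second line shows the point of carrying the type along: a plain Python int would grow past 64 bits, while the typed value wraps around the way the C program's variable would.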

drgn also provides helpers, which are predefined functions for working with common data structures and subsystems:

>>> for task in for_each_task(prog):
...     if task_state_to_char(task) == "R":
...         print(task.comm)
...
(char [16])"drgn"
>>> path_lookup(prog, "/etc/hosts")
(struct path){
       .mnt = (struct vfsmount *)0xffff9806008557e0,
       .dentry = (struct dentry *)0xffff980601678900,
}   

Helpers aren’t magic; they are just Python functions that use drgn objects and some knowledge of the Linux kernel.
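
As a rough illustration of what such a helper involves, the sketch below walks a kernel-style intrusive linked list using only Python's ctypes module. It is not drgn code: the Task structure is a hypothetical, heavily simplified stand-in for struct task_struct, and container_of and list_for_each_entry here are toy reimplementations whose names merely mirror the kernel macros.

```python
import ctypes


class ListHead(ctypes.Structure):
    pass


# Self-referential struct, so the fields are assigned after the class exists.
ListHead._fields_ = [
    ("next", ctypes.POINTER(ListHead)),
    ("prev", ctypes.POINTER(ListHead)),
]


class Task(ctypes.Structure):
    # Hypothetical, simplified stand-in for struct task_struct.
    _fields_ = [
        ("pid", ctypes.c_int),
        ("comm", ctypes.c_char * 16),
        ("tasks", ListHead),
    ]


def list_add_tail(new, head):
    """Link the ListHead `new` in just before `head`."""
    prev = head.prev.contents
    new.next = ctypes.pointer(head)
    new.prev = ctypes.pointer(prev)
    prev.next = ctypes.pointer(new)
    head.prev = ctypes.pointer(new)


def container_of(ptr, struct_type, member):
    """Return the struct that embeds the ListHead that `ptr` points to."""
    offset = getattr(struct_type, member).offset
    return struct_type.from_address(ctypes.addressof(ptr.contents) - offset)


def list_for_each_entry(struct_type, head, member):
    """Yield every struct linked into the list anchored at `head`."""
    head_addr = ctypes.addressof(head)
    pos = head.next
    while ctypes.addressof(pos.contents) != head_addr:
        yield container_of(pos, struct_type, member)
        pos = pos.contents.next


head = ListHead()
head.next = head.prev = ctypes.pointer(head)  # empty list points at itself
names = [b"swapper/0", b"systemd", b"kthreadd"]
tasks = [Task(pid=i, comm=name) for i, name in enumerate(names)]
for t in tasks:
    list_add_tail(t.tasks, head)

print([(t.pid, t.comm) for t in list_for_each_entry(Task, head, "tasks")])
```

The interesting step is container_of: the list links point at the embedded list node, not at the enclosing structure, so the helper subtracts the member's offset to get back to the structure itself.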

Finally, drgn can also unwind stack traces and access local variables:

>>> trace = prog.stack_trace(508)
>>> trace
#0  context_switch (kernel/sched/core.c:4940:2)
#1  __schedule (kernel/sched/core.c:6287:8)
#2  schedule (kernel/sched/core.c:6366:3)
#3  io_schedule (kernel/sched/core.c:8389:2)
#4  wait_on_page_bit_common (mm/filemap.c:1356:4)
#5  __lock_page (mm/filemap.c:1648:2)
#6  lock_page (./include/linux/pagemap.h:625:3)
#7  pagecache_get_page (mm/filemap.c:1910:4)
#8  find_or_create_page (./include/linux/pagemap.h:420:9)
#9  cluster_pages_for_defrag (fs/btrfs/ioctl.c:1249:10)
#10 btrfs_defrag_file (fs/btrfs/ioctl.c:1549:10)
#11 btrfs_ioctl_defrag (fs/btrfs/ioctl.c:3179:9)
#12 btrfs_ioctl (fs/btrfs/ioctl.c:4935:10)
#13 vfs_ioctl (fs/ioctl.c:51:10)
#14 __do_sys_ioctl (fs/ioctl.c:874:11)
#15 __se_sys_ioctl (fs/ioctl.c:860:1)
#16 __x64_sys_ioctl (fs/ioctl.c:860:1)
#17 do_syscall_x64 (arch/x86/entry/common.c:50:14)
#18 do_syscall_64 (arch/x86/entry/common.c:80:7)
#19 entry_SYSCALL_64+0x7c/0x15b (arch/x86/entry/entry_64.S:113)
#20 0x7f843e30d59b
>>> trace[6]
#6 at 0xffffffff8e62e573 (pagecache_get_page+0x433/0x4df) in lock_page at ./include/linux/pagemap.h:625:3 (inlined)
>>> inode_path(trace[6]["page"].mapping.host)
b'usr/bin/myservice'          

The drgn user guide goes into more details about these concepts.

How Meta Uses drgn

Meta has used drgn to investigate many Linux kernel bugs.

In addition to ad hoc debugging sessions, we have also integrated drgn into our automatic kernel core dump collection system. Whenever we have a kernel crash, we use kdump to capture a core dump. When the machine reboots, we use drgn to automatically collect stack traces and information about what processes were running at the time of the crash, what BPF programs were loaded, etc. That information is sent to Meta’s monitoring systems, including Scuba. This is used both for automatic alerts of frequent crashes and for manual investigations.

drgn is not limited to debugging. We have also used drgn to collect statistics and monitor kernel internals that are not otherwise exposed. For example, we have published scripts to count RCU callbacks, inspect cgroup slab allocator usage, list BPF programs and maps, and monitor the iocost cgroup I/O controller.

Finally, although drgn was initially developed for the Linux kernel, it is designed as a generic library for introspecting program types and state. As such, the Capacity Engineering and Analysis team at Meta uses drgn as the backend for getting type information for their Object Introspection memory profiler. This profiler was used to implement some multi-gigabyte memory savings in a critical Meta service.

How drgn Works

The internals of drgn are not drastically different from other debuggers, although it has some important optimizations that make it more pleasant to use. drgn consumes two main inputs: the memory of the program and the debugging symbols for the program. The source of the memory depends on what is being debugged:

  • For a crashed program, drgn uses a core dump (which on Linux is an ELF file).
  • For a running program, drgn uses the /proc/[pid]/mem pseudo-file.
  • For a kernel crash, drgn can use an ELF core dump (e.g., /proc/vmcore) or a compressed dump generated by makedumpfile.
  • For the running kernel, drgn uses the /proc/kcore pseudo-file, which is also formatted like an ELF file.
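
Because several of these sources are ELF files, a tool can tell what it has been handed by reading the first few bytes. The sketch below is a minimal, standard-library-only check based on the ELF header layout (a 16-byte e_ident followed by a 16-bit e_type, where the value 4 means a core file). The function names are made up for illustration, and this is not how drgn itself opens its inputs.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ET_CORE = 4  # e_type value for core files in the ELF specification


def elf_file_type(header):
    """Return the e_type field of an ELF header, or None if it is not ELF."""
    if len(header) < 18 or header[:4] != ELF_MAGIC:
        return None
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field immediately after the 16-byte e_ident.
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return e_type


def looks_like_core_dump(path):
    """Check a file on disk; reading kernel pseudo-files needs root."""
    with open(path, "rb") as f:
        return elf_file_type(f.read(18)) == ET_CORE


# Synthetic header: magic, 64-bit class, little-endian, version 1, padding, e_type.
fake = ELF_MAGIC + bytes([2, 1, 1]) + bytes(9) + struct.pack("<H", ET_CORE)
print(elf_file_type(fake) == ET_CORE)
print(elf_file_type(b"definitely not an ELF file"))
```

On a real system, looks_like_core_dump could be pointed at a saved core file; the demo uses a synthetic header so that it runs without special privileges.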

On Linux, the standard debugging information format is DWARF. DWARF provides information about the locations of variables, layouts of types, how to unwind the stack at a given instruction pointer address, and more. drgn mostly uses libdw from elfutils to process ELF and DWARF data. However, when starting up, drgn uses a custom-built, parallelized DWARF parser to quickly build a cache of DWARF information indexed by the names of types, variables, etc. As a result, drgn starts up 5 times as fast as GDB, which provides a better user experience. drgn also has programming language-specific logic for emulating types and operators (currently mainly C, with C++ in progress) and instruction set architecture-specific logic for accessing CPU registers, page tables, and more (currently mainly x86_64).
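
The shape of such a name index can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: the UNITS data stands in for parsed compilation units, and index_unit and build_index are hypothetical names. The real indexer lives in drgn's native code and reads DWARF directly, so this shows only the overall structure (scan units independently, then merge into one name-keyed table), not the actual implementation or its performance.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for parsed compilation units: (unit name, [(kind, name), ...]).
UNITS = [
    ("kernel/sched/core.c", [("function", "schedule"), ("type", "task_struct")]),
    ("mm/filemap.c", [("function", "pagecache_get_page"), ("type", "page")]),
    ("kernel/time/timer.c", [("variable", "jiffies_64"), ("type", "task_struct")]),
]


def index_unit(unit):
    """Scan one unit and return a partial index: name -> [(kind, unit name)]."""
    unit_name, entries = unit
    partial = defaultdict(list)
    for kind, name in entries:
        partial[name].append((kind, unit_name))
    return partial


def build_index(units, workers=4):
    """Index units concurrently, then merge the partial results."""
    index = defaultdict(list)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the merged index is deterministic.
        for partial in pool.map(index_unit, units):
            for name, hits in partial.items():
                index[name].extend(hits)
    return dict(index)


index = build_index(UNITS)
print(index["jiffies_64"])
print(index["task_struct"])
```

With the index built once at startup, a lookup by name becomes a dictionary access instead of a scan over all debugging information. Python threads are used here only to show the structure, since they do not run CPU-bound work in parallel.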

The core drgn functionality is implemented in a native library, libdrgn. Python bindings are built on top of libdrgn, and the command-line interface and helpers are built on top of those bindings.

Next Steps

The long-term vision for drgn is to provide programmatic access to anything and everything about a program’s internals, enabling a wide variety of debugging and monitoring tasks. However, since we originally developed it with a focus on debugging the Linux kernel, there are several incomplete or missing features:

  • C++ support is in progress as part of the aforementioned Object Introspection project. Rust support could also be added in the future.
  • Userspace debugging is missing support for listing threads and attaching to threads to pause and resume them.
  • drgn is currently read-only. It cannot set breakpoints, modify memory, or call functions in the program (yet).
  • drgn could always use more helper functions for areas that are not already covered.

Conclusion

drgn is a powerful tool that makes debugging complex programs much easier. It is being actively developed to add more complete support for C++ and userspace applications as well as to continue improving its support for the Linux kernel. See the documentation, try it out, and report feedback, feature requests, or bugs on the issue tracker.

To learn more about Meta Open Source, visit our open source site, subscribe to our YouTube channel, or follow us on Twitter and Facebook.

posted @ 2023-06-14 10:37 dolinux