Troubleshooting infinite loops and deadlocks with pstack
The pstack command prints a stack trace for every thread of a process. Just run pstack $pid. The command must be run by the owner of process $pid or by root.
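This time CPU usage hit 100% while memory usage looked normal, so I suspected an infinite loop somewhere. On a colleague's suggestion I inspected the process with pstack. After running pstack several times, I found that the call stack always stopped at the same place. Reading the code at that location showed that it was indeed written incorrectly.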
This is a really powerful command, every bit as powerful as strace!
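The "run pstack several times and see where the stack stops" step can be automated. The following is only a minimal sketch: the helper names top_frames and sample are made up for illustration, and the regular expressions assume the output format shown in this post (a "Thread N (... (LWP id)):" header followed by "#0 0xADDR in function ()" frames). Other pstack versions may print frames differently.

```python
import re
import subprocess
import time
from collections import Counter

# Assumed header format: "Thread 7 (Thread 1084229984 (LWP 4552)):"
THREAD_RE = re.compile(r"^Thread \d+ \(.*\(LWP (\d+)\)\):")
# Assumed innermost-frame format: "#0 0x000000302afc63dc in epoll_wait () from ..."
FRAME0_RE = re.compile(r"^#0\s+0x[0-9a-fA-F]+ in (.+?) \(")


def top_frames(pstack_text):
    """Map each LWP id to the function name found in its frame #0."""
    frames = {}
    lwp = None
    for raw in pstack_text.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            lwp = int(header.group(1))
            continue
        frame = FRAME0_RE.match(line)
        if frame and lwp is not None:
            frames[lwp] = frame.group(1)
    return frames


def sample(pid, rounds=5, interval=1.0):
    """Run pstack `rounds` times and count each (LWP, frame #0) pair seen."""
    counts = Counter()
    for _ in range(rounds):
        result = subprocess.run(
            ["pstack", str(pid)], capture_output=True, text=True, check=True
        )
        counts.update(top_frames(result.stdout).items())
        time.sleep(interval)
    return counts
```

Calling sample(4551) returns a Counter keyed by (LWP, function). A pair whose count equals the number of rounds was at the same innermost frame in every sample. Threads that are merely sleeping or waiting on I/O also look stable, so a stable frame in application code is only a hint and should be checked against per-thread CPU usage.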
Below is the stack output for a process with seven threads:
pstack 4551
Thread 7 (Thread 1084229984 (LWP 4552)):
#0 0x000000302afc63dc in epoll_wait () from /lib64/tls/libc.so.6
#1 0x00000000006f0730 in ub::EPollEx::poll ()
#2 0x00000000006f172a in ub::NetReactor::callback ()
#3 0x00000000006fbbbb in ub::UBTask::CALLBACK ()
#4 0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5 0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6 0x0000000000000000 in ?? ()
Thread 6 (Thread 1094719840 (LWP 4553)):
#0 0x000000302afc63dc in epoll_wait () from /lib64/tls/libc.so.6
#1 0x00000000006f0730 in ub::EPollEx::poll ()
#2 0x00000000006f172a in ub::NetReactor::callback ()
#3 0x00000000006fbbbb in ub::UBTask::CALLBACK ()
#4 0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5 0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6 0x0000000000000000 in ?? ()
Thread 5 (Thread 1105209696 (LWP 4554)):
#0 0x000000302b80baa5 in __nanosleep_nocancel ()
#1 0x000000000079e758 in comcm::ms_sleep ()
#2 0x00000000006c8581 in ub::UbClientManager::healthyCheck ()
#3 0x00000000006c8471 in ub::UbClientManager::start_healthy_check ()
#4 0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5 0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6 0x0000000000000000 in ?? ()
Thread 4 (Thread 1115699552 (LWP 4555)):
#0 0x000000302b80baa5 in __nanosleep_nocancel ()
#1 0x0000000000482b0e in armor::armor_check_thread ()
#2 0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#3 0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#4 0x0000000000000000 in ?? ()
Thread 3 (Thread 1126189408 (LWP 4556)):
#0 0x000000302af8f1a5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1 0x000000302af8f010 in sleep () from /lib64/tls/libc.so.6
#2 0x000000000044c972 in Business_config_manager::run ()
#3 0x0000000000457b83 in Thread::run_thread ()
#4 0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5 0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6 0x0000000000000000 in ?? ()
Thread 2 (Thread 1136679264 (LWP 4557)):
#0 0x000000302af8f1a5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1 0x000000302af8f010 in sleep () from /lib64/tls/libc.so.6
#2 0x00000000004524bb in Process_thread::sleep_period ()
#3 0x0000000000452641 in Process_thread::run ()
#4 0x0000000000457b83 in Thread::run_thread ()
#5 0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#6 0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#7 0x0000000000000000 in ?? ()
Thread 1 (Thread 182894129792 (LWP 4551)):
#0 0x000000302af8f1a5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1 0x000000302af8f010 in sleep () from /lib64/tls/libc.so.6
#2 0x0000000000420d79 in Ad_preprocess::run ()
#3 0x0000000000450ad0 in main ()
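To read a dump like the one above more quickly, each thread can be labelled by its innermost frame. This is a rough sketch rather than a definitive tool: the function names parse_stacks and summarize are hypothetical, and the BLOCKING_CALLS set is an illustrative, non-exhaustive assumption about which frame #0 symbols mean "blocked or sleeping".

```python
import re

# Illustrative assumption: frame #0 symbols that indicate a waiting thread.
BLOCKING_CALLS = {
    "epoll_wait",
    "__nanosleep_nocancel",
    "nanosleep",
    "poll",
    "select",
    "__lll_lock_wait",
}

# Assumed formats, taken from the listing in this post.
HEADER_RE = re.compile(r"^Thread (\d+) \(")
FRAME_RE = re.compile(r"^#\d+\s+0x[0-9a-fA-F]+ in (.+?) \(")


def parse_stacks(text):
    """Return {thread number: [function names, innermost first]}."""
    stacks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = int(header.group(1))
            stacks[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            stacks[current].append(frame.group(1))
    return stacks


def summarize(text):
    """Return {thread number: (state, innermost function)}."""
    summary = {}
    for thread, frames in parse_stacks(text).items():
        if not frames:
            continue
        state = "waiting" if frames[0] in BLOCKING_CALLS else "running"
        summary[thread] = (state, frames[0])
    return summary
```

Applied to the listing above, every one of the seven threads comes out as "waiting", since each frame #0 is either epoll_wait or __nanosleep_nocancel. A thread spinning in an infinite loop would instead show up as "running" at the same application function across repeated dumps, while threads parked in a lock-wait call in every dump would be a hint of a deadlock.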