The pstack and strace commands

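strace traces the low-level system calls a program makes and can show when each call started and how long it took; pstack prints the function call stack of the process with a given PID.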

1.strace

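strace is a Linux user-space tracer for diagnostics, debugging and instruction. We use it to monitor the interaction between user-space processes and the kernel, such as system calls, signal delivery and process state changes.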

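strace is most often used to trace the system calls a process makes and the signals it receives. On Linux a process cannot access hardware directly: when it needs a device (for example to read a file from disk or receive network data), it must switch from user mode to kernel mode and go through a system call. strace captures each system call a process makes, including its arguments, its return value and the time it took.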
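Here is a minimal sketch of what strace observes, assuming Linux with glibc. The hypothetical helper `write_message()` writes a string to a file. Each libc wrapper it calls ends in a system call, and the explicit `syscall(SYS_write, ...)` shows that the same kernel entry point can also be reached directly. If a program calls it and runs under `strace -e trace=openat,write,close ./prog`, the trace should show one line per call. On recent glibc, `open()` typically appears in the trace as `openat`.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: writes msg plus a newline to path and returns the
 * number of bytes written, or -1 on error. Every step is a system call. */
long write_message(const char *path, const char *msg)
{
    /* open(2); recent glibc issues it as openat(AT_FDCWD, ...) */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write(2) through the ordinary libc wrapper */
    ssize_t n = write(fd, msg, strlen(msg));

    /* the same kernel entry point reached through the generic syscall(2);
       strace reports it as an ordinary write() */
    long m = syscall(SYS_write, fd, "\n", 1);

    close(fd);                          /* close(2) */
    if (n < 0 || m < 0)
        return -1;
    return (long)n + m;
}
```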

strace -o output.txt -T -tt -e trace=all -p 28779

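This traces every system call of process 28779 (-e trace=all). It records the time spent in each call (-T) and the call's start time in human-readable hour:minute:second form with microseconds (-tt), and it writes the result to output.txt (-o).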
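With `-T`, strace appends the time spent in each call in angle brackets at the end of the line (for example `<0.000021>`). With `-tt`, it prefixes each line with `HH:MM:SS.microseconds`. The two hypothetical helpers below pull those fields out of a single line, which is useful for post-processing a file such as output.txt. They are a sketch that assumes exactly that layout. They are not part of strace.

```c
#include <stdio.h>
#include <string.h>

/* Returns the time spent in the call, in seconds, from the trailing
 * "<0.000021>" field that -T appends; -1.0 if the line has none
 * (for example an "<unfinished ...>" line). */
double syscall_duration(const char *line)
{
    const char *lt = strrchr(line, '<');   /* the -T field is the last '<' */
    double secs;
    char gt;

    if (lt == NULL || sscanf(lt, "<%lf%c", &secs, &gt) != 2 || gt != '>')
        return -1.0;
    return secs;
}

/* Converts the -tt prefix "HH:MM:SS.uuuuuu" to seconds since midnight;
 * -1.0 if the line does not start with such a timestamp. */
double syscall_start(const char *line)
{
    int h, m, s, us;

    if (sscanf(line, "%d:%d:%d.%d", &h, &m, &s, &us) != 4)
        return -1.0;
    return h * 3600.0 + m * 60.0 + s + us / 1e6;
}
```

To get a rough picture of where the time goes, you could add up syscall_duration() over every line of the file. The -c option prints a built-in summary of the same kind.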

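Options:

-c Count the time, calls and errors for each system call and print a summary.
-d Print strace's own debugging output on standard error.
-f Also trace child processes created by fork.
-ff With -o filename, write each process's trace to filename.pid, where pid is that process's ID.
-F Attempt to follow vfork calls. Without it, vfork is not followed even when -f is given.
-h Print a short help message.
-i Print the instruction pointer at the time of each system call.
-q Suppress messages about attaching and detaching.
-r Print a relative timestamp on entry to each system call.
-t Prefix each line of output with the time of day.
-tt Like -t, but with microseconds.
-ttt Microsecond precision, with the time printed as seconds since the epoch.
-T Show the time spent in each system call.
-v Print unabbreviated versions of environment, stat, termios and similar structures. These appear so often that they are abbreviated by default.
-V Print strace's version.
-x Print non-ASCII strings in hexadecimal.
-xx Print all strings in hexadecimal.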
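-a column
Align return values at the given column (default 40).
-e expr
A qualifying expression that controls which events are traced and how. Its format is:
[qualifier=][!]value1[,value2]...
qualifier is one of trace, abbrev, verbose, raw, signal, read, write, and value is a qualifier-dependent symbol or number. The default qualifier is trace. A leading exclamation mark negates the set of values. For example:
-eopen is equivalent to -e trace=open and traces only open calls, while -e trace=!open traces every call except open. There are also two special values, all and none.
Note that some shells use ! for history expansion, so you may need to escape it with a backslash (for example -e trace=\!open).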
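-e trace=set
Trace only the specified system calls. For example, -e trace=open,close,read,write traces only these four calls. The default is trace=all.
-e trace=file
Trace all system calls that take a file name as an argument.
-e trace=process
Trace only system calls involved in process management.
-e trace=network
Trace all network-related system calls.
-e trace=signal
Trace all signal-related system calls.
-e trace=ipc
Trace all IPC (inter-process communication) related system calls.
-e abbrev=set
Abbreviate the printing of each member of large structures for the specified system calls. The default is abbrev=all; -v has the same effect as abbrev=none.
-e raw=set
Print the arguments of the specified system calls raw (undecoded), in hexadecimal.
-e signal=set
Trace only the specified signals. The default is all. For example, signal=!SIGIO (or signal=!io) means SIGIO is not traced.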
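-e read=set
Dump, in hexadecimal and ASCII, all data read from the file descriptors in the set. For example:
-e read=3,5
-e write=set
Dump all data written to the file descriptors in the set.
-o filename
Write strace's output to filename instead of standard error.
-p pid
Attach to the process with ID pid and trace it.
-s strsize
Set the maximum length of strings to print (default 32). Filenames are always printed in full.
-u username
Run the traced command with the UID and GID of username.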
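The small parser sketch below makes the `[qualifier=][!]value1[,value2]...` grammar concrete. It is not strace's own parser, and newer strace releases accept more syntax than it handles. `struct strace_expr` and `parse_strace_expr()` are hypothetical names. The sketch shows three rules: a missing qualifier means `trace`, a `!` negates the whole value list, and a misspelled qualifier such as `strace=signal` is rejected.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_VALUES 16

/* Hypothetical representation of one -e expression. */
struct strace_expr {
    char qualifier[16];           /* trace, abbrev, verbose, raw, signal, read, write */
    bool negated;                 /* '!' in front of the value list */
    int nvalues;
    char values[MAX_VALUES][32];
};

/* Parses "[qualifier=][!]value1[,value2]..." into *out.
 * Returns 0 on success, -1 on malformed input. */
int parse_strace_expr(const char *expr, struct strace_expr *out)
{
    static const char *known[] = {"trace", "abbrev", "verbose", "raw",
                                  "signal", "read", "write"};
    const char *eq = strchr(expr, '=');
    const char *list = expr;

    strcpy(out->qualifier, "trace");              /* default qualifier */
    if (eq != NULL) {
        size_t len = (size_t)(eq - expr);
        bool ok = false;
        for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
            if (strlen(known[i]) == len && strncmp(expr, known[i], len) == 0)
                ok = true;
        if (!ok)
            return -1;                            /* unknown qualifier */
        memcpy(out->qualifier, expr, len);
        out->qualifier[len] = '\0';
        list = eq + 1;
    }

    out->negated = (*list == '!');
    if (out->negated)
        list++;

    out->nvalues = 0;
    while (*list != '\0') {                       /* split on commas */
        size_t len = strcspn(list, ",");
        if (len == 0 || len >= sizeof out->values[0] || out->nvalues == MAX_VALUES)
            return -1;
        memcpy(out->values[out->nvalues], list, len);
        out->values[out->nvalues][len] = '\0';
        out->nvalues++;
        list += len;
        if (*list == ',')
            list++;
    }
    return out->nvalues > 0 ? 0 : -1;
}
```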

2.pstack

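pstack is a shell script that drives gdb to print the stack trace of a running process. It can hint at a potential deadlock, but the hint is only a clue, and confirming the deadlock takes further analysis with gdb.

On some distributions pstack is shipped as part of the gdb package. Its only argument is the PID of the process to inspect.

pstack prints the stacks of every thread in the process, which makes it a natural tool for spotting deadlocks.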
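pstack examines another process from the outside. For comparison, a program can print its own stack with glibc's `backtrace()` and `backtrace_symbols_fd()` from `<execinfo.h>`, as in the minimal sketch below. It has some limits compared with pstack. It covers only the calling thread, it is glibc-specific, and function names generally appear only if the program is linked with `-rdynamic`. `dump_own_stack()` is a hypothetical name.

```c
#include <execinfo.h>
#include <unistd.h>

/* Writes the calling thread's stack, one frame per line, to fd and
 * returns the number of frames captured (at most 64 here). */
int dump_own_stack(int fd)
{
    void *frames[64];
    int depth = backtrace(frames, 64);

    backtrace_symbols_fd(frames, depth, fd);
    return depth;
}
```

For example, calling dump_own_stack(STDERR_FILENO) inside a thread function would print only that thread's frames. pstack, by contrast, prints the frames of every thread.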

A program that deadlocks:

#include <stdio.h>

#include <unistd.h>
#include <pthread.h>

// *) Just include this header; no other code changes are needed
#include "dead_lock_stub.h"

pthread_mutex_t mutex_1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex_2 = PTHREAD_MUTEX_INITIALIZER;

void *thread_rountine_1(void *args)
{
    pthread_mutex_lock(&mutex_1);
    sleep(1);
    pthread_mutex_lock(&mutex_2);

    pthread_mutex_unlock(&mutex_2);
    pthread_mutex_unlock(&mutex_1);
    return (void *)(0);
}

void *thread_rountine_2(void *args)
{
    pthread_mutex_lock(&mutex_2);
    sleep(1);
    pthread_mutex_lock(&mutex_1);

    pthread_mutex_unlock(&mutex_1);
    pthread_mutex_unlock(&mutex_2);
    return (void *)(0);
}

int main()
{

    // *) Add this line to start the deadlock checker thread
    DeadLockGraphic::getInstance().start_check();

    pthread_t tid1, tid2;
    pthread_create(&tid1, NULL, thread_rountine_1, NULL);
    pthread_create(&tid2, NULL, thread_rountine_2, NULL);

    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);
    return 0;
}

# g++ -g -Wall -Werror dead_lock.cpp -pthread -o test    // the stub header is C++, so build with g++

# ./test    // the process deadlocks and hangs

# pstack $pid

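Threads 2 and 3 are both blocked on a lock, each waiting for another thread to release the lock it wants. That alone does not prove a deadlock, so we continue with gdb.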

(3) Analyzing with GDB

# gdb -p $pid

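# info threads    // print information about all threads

An asterisk (*) marks the thread gdb currently has selected. Switch to the second thread to inspect it:

# thread 2    // switch to thread 2; its thread ID is 0x7f645e122710, and the LWP value is the kernel's ID for this thread (light-weight process), which uniquely identifies it and makes it easy to follow while debugging

# frame 3    // select and print frame #3; every function call pushes a frame onto the stack, and frame selects one of those frames

# p mutex_1    // print mutex_1; its __owner field is the LWP of the thread that currently holds it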

# thread 3

# frame 3

# p mutex_2

LWP 3688 is waiting for mutex_1, which LWP 3687 owns, while LWP 3687 is waiting for mutex_2, which LWP 3688 owns: a deadlock.
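The `__owner` field that gdb prints can also be read from C, which shows concretely what "owned by LWP 3687" means. This sketch relies on glibc internals: the `__data.__owner` member of `pthread_mutex_t` is not a public API and may change. It also assumes Linux with glibc's default settings (no lock elision). The helpers `mutex_owner_lwp()` and `current_lwp()` are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc internals, NOT a public API: returns the LWP recorded in the
 * mutex's __data.__owner field (0 when unlocked), the value gdb prints. */
int mutex_owner_lwp(const pthread_mutex_t *m)
{
    return m->__data.__owner;
}

/* LWP (kernel thread ID) of the calling thread, as gdb labels threads. */
int current_lwp(void)
{
    return (int)syscall(SYS_gettid);
}
```

In a deadlocked process, comparing each blocked thread's LWP with the `__owner` of the mutex it waits on is the same cross-check that was done by hand in gdb above.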

Deadlock detection code (dead_lock_stub.h):

#ifndef __DEAD_LOCK_STUB_H__
#define __DEAD_LOCK_STUB_H__

#include <stdio.h>
#include <stdint.h>

#include <unistd.h>
#include <pthread.h>

#include <deque>
#include <vector>
#include <map>

struct thread_graphic_vertex_t {
    int indegress;
    std::vector<uint64_t> vertexs;
    
    thread_graphic_vertex_t()
        : indegress(0) {
    }
};

class DeadLockGraphic {

public:
    static DeadLockGraphic &getInstance() {
        static DeadLockGraphic instance;
        return instance;
    }

    void lock_before(uint64_t thread_id, uint64_t lock_addr) {
        pthread_mutex_lock(&m_mutex);
        // (A) m_thread_apply_lock: record thread_id => lock_addr (the thread is about to wait for this lock)
        m_thread_apply_lock[thread_id] = lock_addr;
        pthread_mutex_unlock(&m_mutex);
    }

    void lock_after(uint64_t thread_id, uint64_t lock_addr) {
        pthread_mutex_lock(&m_mutex);
        // (B) m_thread_apply_lock: the thread got the lock, so remove thread_id => lock_addr
        m_thread_apply_lock.erase(thread_id);

        // (A) m_lock_belong_thread: add lock_addr => thread_id (this thread now owns the lock)
        m_lock_belong_thread[lock_addr] = thread_id;
        pthread_mutex_unlock(&m_mutex);
    }
    
    void unlock_after(uint64_t thread_id, uint64_t lock_addr) {
        pthread_mutex_lock(&m_mutex);
        // (B) m_lock_belong_thread: the lock was released, so remove lock_addr => thread_id
        m_lock_belong_thread.erase(lock_addr);
        pthread_mutex_unlock(&m_mutex);
    }

    void check_dead_lock()
    {
        std::map<uint64_t, uint64_t> lock_belong_thread;
        std::map<uint64_t, uint64_t> thread_apply_lock;

        pthread_mutex_lock(&m_mutex);
        lock_belong_thread = m_lock_belong_thread;
        thread_apply_lock = m_thread_apply_lock;
        pthread_mutex_unlock(&m_mutex);

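        // Build the wait-for graph: an edge thd_id1 -> thd_id2 means thd_id1
        // is waiting for a lock currently held by thd_id2. Iterate over the
        // snapshots taken above, not the member maps, because other threads
        // may modify the members concurrently.
        std::map<uint64_t, thread_graphic_vertex_t> graphics;
        for ( std::map<uint64_t, uint64_t>::const_iterator iter = thread_apply_lock.begin(); 
                iter != thread_apply_lock.end(); iter++  ) {
            uint64_t thd_id1 = iter->first;
            uint64_t lock_id = iter->second; 
            if ( lock_belong_thread.find(lock_id) == lock_belong_thread.end() ) {
                continue;
            }           
            
            uint64_t thd_id2 = lock_belong_thread[lock_id];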
            
            if ( graphics.find(thd_id1) == graphics.end() ) {
                graphics[thd_id1] = thread_graphic_vertex_t();
            }
            if ( graphics.find(thd_id2) == graphics.end() ) {
                graphics[thd_id2] = thread_graphic_vertex_t();
            }

            // store the directed edge thd_id1 -> thd_id2
            graphics[thd_id1].vertexs.push_back(thd_id2);
            // and increment the in-degree of thd_id2
            graphics[thd_id2].indegress++;
        }

        // Step 1: enqueue every vertex whose in-degree is 0
        uint64_t counter = 0;
        std::deque<uint64_t> graphics_queue;
        for ( std::map<uint64_t, thread_graphic_vertex_t>::const_iterator iter = graphics.begin();
                iter != graphics.end(); iter++ ) {
            uint64_t thd_id = iter->first;
            const thread_graphic_vertex_t &gvert = iter->second;
            if ( gvert.indegress == 0 ) {
                graphics_queue.push_back(thd_id);
                counter ++;
            }
        }
            
        // Step 2: remove queued vertices, decrementing the in-degree of their successors
        while ( !graphics_queue.empty() ) {
            uint64_t thd_id = graphics_queue.front();
            graphics_queue.pop_front();

            const thread_graphic_vertex_t &gvert = graphics[thd_id];
            // walk the outgoing edges
            for ( size_t i = 0; i < gvert.vertexs.size(); i++ ) {
                uint64_t thd_id2 = gvert.vertexs[i];
                graphics[thd_id2].indegress --;
                if ( graphics[thd_id2].indegress == 0 ) {
                    graphics_queue.push_back(thd_id2);
                    counter++;
                }
            }
        }

        // Step 3: vertices that were never removed lie on a cycle or are reachable from one, i.e. a deadlock
        if ( counter != graphics.size() ) {
            printf("Found Dead Lock!!!!!!!!!!!!\n");
        } else {
            printf("No Dead Lock Found.\n");
        }        

    }

    void start_check() {
        pthread_t tid;
        pthread_create(&tid, NULL, thread_rountine, (void *)(this));
    }

    static void *thread_rountine(void *args) {
        DeadLockGraphic *ptr_graphics = static_cast<DeadLockGraphic *>(args);
        while ( true ) {
            // check once every ten seconds
            sleep(10);
            ptr_graphics->check_dead_lock();
        }
    }

private:
    
    // lock address => owning thread
    std::map<uint64_t, uint64_t> m_lock_belong_thread;

    // thread => address of the lock it is waiting to acquire
    std::map<uint64_t, uint64_t> m_thread_apply_lock; 

    pthread_mutex_t m_mutex;


private:
    DeadLockGraphic() {
        pthread_mutex_init(&m_mutex, NULL);
    }
    ~DeadLockGraphic() {
        pthread_mutex_destroy(&m_mutex);
    }
private:
    DeadLockGraphic(const DeadLockGraphic &) {
    }
    DeadLockGraphic& operator=(const DeadLockGraphic &) {
        return *this;    
    }

};

#include <sys/syscall.h>

#define gettid() syscall(__NR_gettid)

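// Intercept lock: record "waiting for" before and "owns" after, tracking which thread holds which lock.
// No trailing semicolon after while (false): the caller's own ';' ends the statement, so the macro is also safe inside if/else.
#define pthread_mutex_lock(x)                                                                       \
    do {                                                                                            \
        DeadLockGraphic::getInstance().lock_before(gettid(), reinterpret_cast<uint64_t>(x));        \
        pthread_mutex_lock(x);                                                                      \
        DeadLockGraphic::getInstance().lock_after(gettid(), reinterpret_cast<uint64_t>(x));         \
    } while (false)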

// Intercept unlock: after releasing, remove the lock-to-thread ownership record
#define pthread_mutex_unlock(x)                                                                     \
    do {                                                                                            \
        pthread_mutex_unlock(x);                                                                    \
        DeadLockGraphic::getInstance().unlock_after(gettid(), reinterpret_cast<uint64_t>(x));       \
    } while (false)



#endif

g++ -g main.cpp -lpthread -o dead_sample
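`check_dead_lock()` applies Kahn's topological-sort algorithm to a wait-for graph. A cycle exists exactly when some vertices can never be removed. The sketch below runs the same three steps in plain C. It assumes the threads are numbered 0..n-1 and that the graph is given as an adjacency matrix. `has_deadlock()` and `MAX_THREADS` are hypothetical names.

```c
#include <stdbool.h>

#define MAX_THREADS 8   /* hypothetical upper bound for this sketch */

/* wait_for[i][j] is true when thread i waits for a lock held by thread j.
 * Returns true if the wait-for graph contains a cycle, i.e. a deadlock.
 * Requires 0 <= n <= MAX_THREADS. */
bool has_deadlock(bool wait_for[MAX_THREADS][MAX_THREADS], int n)
{
    int indegree[MAX_THREADS] = {0};
    int queue[MAX_THREADS];          /* each vertex is enqueued at most once */
    int head = 0, tail = 0, removed = 0;

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (wait_for[i][j])
                indegree[j]++;

    /* Step 1: start with the vertices no edge points to */
    for (int i = 0; i < n; i++)
        if (indegree[i] == 0)
            queue[tail++] = i;

    /* Step 2: remove them one by one, lowering their successors' in-degree */
    while (head < tail) {
        int v = queue[head++];
        removed++;
        for (int j = 0; j < n; j++)
            if (wait_for[v][j] && --indegree[j] == 0)
                queue[tail++] = j;
    }

    /* Step 3: any vertex left over lies on a cycle or is reachable from one */
    return removed != n;
}
```

For the program above, thread 0 (waiting for mutex_2) and thread 1 (waiting for mutex_1) form the two-vertex cycle that this check reports.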

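3. Deadlock concepts

3.1 What is a deadlock
A mutex makes threads (or processes) access a critical resource one at a time: while one thread holds the lock, any other thread that requests it must wait. A deadlock is a standstill in which several threads compete for resources and each waits for another. Without outside intervention, they will wait forever.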

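3.2 Causes
① Insufficient system resources: the system does not have enough resources to meet the threads' needs, so they get stuck competing for them.
② Improper execution order: threads request and release resources in an unsafe order.
③ Improper resource allocation.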

4. Deadlock examples

I. When do deadlocks happen

1. Suppose we have the following code:

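#include <mutex>

std::mutex mutex;  // a global mutex
void B();

void A()
{
     mutex.lock();
     // operate on shared data
     B(); // call B() while still holding the lock
     mutex.unlock();
     return;
}
  
void B()
{
     mutex.lock();   // the same thread locks the same mutex again: deadlock
     // operate on shared data
     mutex.unlock();
     return;
}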

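Here A() calls B() while it still holds the mutex, and B() tries to lock the same non-recursive mutex again. The thread ends up waiting for a lock it holds itself, so it deadlocks. For std::mutex this relock is formally undefined behavior, and in practice it usually hangs.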
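If a lock might be re-entered by mistake, a POSIX error-checking mutex turns the hang into an error code. When the thread that already owns a `PTHREAD_MUTEX_ERRORCHECK` mutex locks it again, the call returns `EDEADLK`. The minimal pthreads sketch below reproduces the A()/B() situation (the function name is hypothetical). If nested locking is actually intended, `PTHREAD_MUTEX_RECURSIVE` is the mutex type that allows it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>

/* A() locks, then B() locks again on the same thread. With an
 * error-checking mutex the second lock fails instead of hanging.
 * Returns the result of the second pthread_mutex_lock(). */
int relock_errorcheck(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);          /* A() takes the lock */
    rc = pthread_mutex_lock(&m);     /* B() asks again: EDEADLK, no hang */

    pthread_mutex_unlock(&m);        /* only one lock is actually held */
    pthread_mutex_destroy(&m);
    return rc;
}
```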

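2. Suppose we have the following code:

#include <mutex>

std::mutex mutex;  // a global mutex
void A(bool some_condition)
{
     mutex.lock();
     // operate on shared data
     if (some_condition)
     {
          return;   // returns with the mutex still locked
     }
     mutex.unlock();
     return;
}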

The if branch returns directly without calling unlock, so the mutex is never released, and the next thread that calls A() deadlocks.
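A common C remedy is to give the function a single exit path that always unlocks. The sketch below uses pthreads, and `update_shared()`, `data_lock` and `shared_counter` are hypothetical names standing in for A() and its shared data. In C++, `std::lock_guard` gives the same guarantee, because its destructor unlocks on every return path.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter = 0;

/* Hypothetical stand-in for A(): whichever branch runs, control leaves
 * through the single unlock at the end, so the mutex is never left held. */
int update_shared(bool error_happens)
{
    int rc = 0;

    pthread_mutex_lock(&data_lock);
    if (error_happens) {
        rc = -1;
        goto out;                /* instead of returning with the lock held */
    }
    shared_counter++;            /* operate on the shared data */
out:
    pthread_mutex_unlock(&data_lock);
    return rc;
}
```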

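II. Another summary

Whatever the cause, the risk of deadlock is always there. So which deadlocks come up most often? Let's go through them one by one.

(1) Forgetting to release a lock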

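#include <windows.h>

CRITICAL_SECTION cs;  // initialized elsewhere with InitializeCriticalSection(&cs)

void data_process(bool error_happens)
{
  EnterCriticalSection(&cs);
  if (error_happens)
    return;                 // leaves without LeaveCriticalSection: cs stays locked
  LeaveCriticalSection(&cs);
}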

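(2) A single thread requesting the same lock twice

#include <windows.h>

// Note: a Windows CRITICAL_SECTION may be re-entered by the thread that owns it,
// so this pattern only deadlocks with a non-recursive lock (e.g. a default
// pthread mutex or std::mutex); with a critical section it merely nests.
CRITICAL_SECTION cs;  // initialized elsewhere with InitializeCriticalSection(&cs)
void do_something();  // defined elsewhere

void sub_func()
{
  EnterCriticalSection(&cs);
  do_something();
  LeaveCriticalSection(&cs);
}
  
void data_process()
{
  EnterCriticalSection(&cs);
  sub_func();   // requests cs again on the same thread
  LeaveCriticalSection(&cs);
}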

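(3) Two threads requesting multiple locks

#include <windows.h>

CRITICAL_SECTION cs1, cs2;  // both initialized elsewhere
void do_something1();       // defined elsewhere
void do_something2();

// Thread 1 runs data_process1() while thread 2 runs data_process2().
void data_process1()
{
  EnterCriticalSection(&cs1);
  EnterCriticalSection(&cs2);   // blocks if thread 2 already holds cs2
  do_something1();
  LeaveCriticalSection(&cs2);
  LeaveCriticalSection(&cs1);
}
  
void data_process2()
{
  EnterCriticalSection(&cs2);
  EnterCriticalSection(&cs1);   // blocks if thread 1 already holds cs1
  do_something2();
  LeaveCriticalSection(&cs1);
  LeaveCriticalSection(&cs2);
}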
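data_process1 takes cs1 and then cs2, while data_process2 takes cs2 and then cs1. If each thread gets its first lock before the other gets its second, both wait forever. The usual fix is a single global lock order. The pthreads sketch below orders the two mutexes by address, so the order in which the caller names them no longer matters. It mirrors the example's names, and `lock_pair()`, `unlock_pair()` and `run_both()` are hypothetical helpers.

```c
#include <pthread.h>
#include <stdint.h>

/* Always lock two mutexes in one global order (here: by address),
 * whatever order the caller names them in. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {   /* normalise the order */
        pthread_mutex_t *t = a;
        a = b;
        b = t;
    }
    pthread_mutex_lock(a);
    if (b != a)
        pthread_mutex_lock(b);
}

void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    if (b != a)
        pthread_mutex_unlock(b);
}

static pthread_mutex_t cs1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cs2 = PTHREAD_MUTEX_INITIALIZER;
static long work_done = 0;              /* protected by both locks */

/* Names cs1 first ... */
static void *data_process1(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        lock_pair(&cs1, &cs2);
        work_done++;
        unlock_pair(&cs1, &cs2);
    }
    return arg;
}

/* ... this one names cs2 first, yet both lock in the same real order. */
static void *data_process2(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        lock_pair(&cs2, &cs1);
        work_done++;
        unlock_pair(&cs2, &cs1);
    }
    return arg;
}

/* Runs both workers concurrently and returns the total work done. */
long run_both(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, data_process1, NULL);
    pthread_create(&t2, NULL, data_process2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return work_done;
}
```

The addresses are compared as uintptr_t rather than as pointers. In C, a relational comparison between pointers to unrelated objects is undefined.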

(4) Circular lock requests

/*
 * A - B
 * |   |
 * C - D
 */

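Suppose A, B, C and D are eating together, with one chopstick between each pair of neighbours. To eat, a person must first pick up the chopstick on the left and then the one on the right. Now let everyone start eating at the same time. It is quite possible that everyone picks up their left chopstick (or everyone picks up their right one) and then waits for the other one. Everyone wants to eat, and no one is willing to give up the chopstick they already hold, so in the end nobody can eat.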
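One way to break the standoff is to drop the "hold and wait" part. Each diner takes the left chopstick, only tries the right one, and puts the left one back if the right one is busy. The pthreads sketch below assumes four diners (A to D) and a fixed number of meals each. `DINERS`, `MEALS` and the function names are hypothetical. In theory all diners could keep retrying in lockstep (livelock). `sched_yield()` makes that unlikely in practice, and a fixed global lock order is the stricter alternative.

```c
#include <pthread.h>
#include <sched.h>

#define DINERS 4                 /* A, B, C, D */
#define MEALS 1000               /* hypothetical number of meals each */

static pthread_mutex_t chopstick[DINERS];
static int meals_eaten[DINERS];  /* each slot written only by its own diner */

/* Take the left chopstick, only *try* the right one; if it is busy,
 * put the left one back so a neighbour can finish. */
static void *diner(void *arg)
{
    int id = (int)(long)arg;
    pthread_mutex_t *left = &chopstick[id];
    pthread_mutex_t *right = &chopstick[(id + 1) % DINERS];

    while (meals_eaten[id] < MEALS) {
        pthread_mutex_lock(left);
        if (pthread_mutex_trylock(right) == 0) {
            meals_eaten[id]++;           /* eat */
            pthread_mutex_unlock(right);
            pthread_mutex_unlock(left);
        } else {
            pthread_mutex_unlock(left);  /* give the left one back */
            sched_yield();
        }
    }
    return NULL;
}

/* Runs one dinner and returns the total number of meals eaten. */
int run_dinner(void)
{
    pthread_t t[DINERS];
    int total = 0;

    for (int i = 0; i < DINERS; i++) {
        pthread_mutex_init(&chopstick[i], NULL);
        meals_eaten[i] = 0;
    }
    for (int i = 0; i < DINERS; i++)
        pthread_create(&t[i], NULL, diner, (void *)(long)i);
    for (int i = 0; i < DINERS; i++) {
        pthread_join(t[i], NULL);
        total += meals_eaten[i];
    }
    for (int i = 0; i < DINERS; i++)
        pthread_mutex_destroy(&chopstick[i]);
    return total;
}
```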

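Summary:

(1) The risk of deadlock is always present, but we should keep the scope of that risk as small as possible.
(2) Resolving a deadlock after it happens is extremely costly.
(3) The best way to handle deadlocks is to catch them while writing the program, as far as possible.
(4) Multithreading is a double-edged sword: it improves efficiency but brings the risk of deadlock.
(5) Some programs can tolerate a deadlock (at worst you restart the machine), but others cannot.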


Reposted from herryone123

posted @ 2022-02-13 16:49 PKICA