[ceph] The Admin Socket mechanism | How the ceph dump commands work

Contents

Front end: Python

Back end: Admin Socket


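The Admin Socket mechanism is a form of inter-process communication.

The front end is a Python process; the back end is a C++ process.

Flow of a ceph command: command line --> Python front end --> Admin Socket back end

Front end: Python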

https://blog.csdn.net/bandaoyu/article/details/123746755

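Back end: Admin Socket

Any mature project must expose interfaces that act like probes, letting us inspect what is going on inside a running process; a process should not be a black box. In Ceph, the Admin Socket provides this capability.

Through the Admin Socket you can not only view the current configuration and the runtime state of a process, but also change configuration, fetch logs, and so on.

The help output below shows that Ceph offers many commands for inspecting its internal state.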

root@test3:~# ceph daemon /var/run/ceph/ceph-osd.4.asok help

{ "config get": "config get <field>: get the config value",
  "config set": "config set <field> <val> [<val> ...]: set a config variable",
  "config show": "dump current config settings",
  "dump_blacklist": "dump blacklisted clients and times",
  "dump_historic_ops": "show slowest recent ops",
  "dump_op_pq_state": "dump op priority queue state",

…… 
}
  
  
root@test3:~# ceph daemon /var/run/ceph/ceph-mon.*.asok help

{ "add_bootstrap_peer_hint": "add peer address as potential bootstrap peer for cluster bringup",
  "config get": "config get <field>: get the config value",
  "config set": "config set <field> <val> [<val> ...]: set a config variable",
  "config show": "dump current config settings",
  "get_command_descriptions": "list available commands",
  "git_version": "get git sha1",
  "help": "list available commands",
  "log dump": "dump recent log entries to log file",
……
}
  
  
  
root@test3:~# ceph daemon /var/run/ceph/ceph-mds.*.asok help

{ "config get": "config get <field>: get the config value",
  "config set": "config set <field> <val> [<val> ...]: set a config variable",
  "config show": "dump current config settings",
  "get_command_descriptions": "list available commands",
  "git_version": "get git sha1",
……
}

For example, you can view the current configuration of each Ceph daemon:

ceph daemon /var/run/ceph/ceph-mds.*.asok config show
ceph daemon /var/run/ceph/ceph-mon.*.asok config show
ceph daemon /var/run/ceph/ceph-osd.4.asok config show

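How is this implemented? It relies on Ceph's AdminSocket mechanism.


CephContext creates an AdminSocket object, which is essentially a thread.

Each ceph-mon/ceph-osd/ceph-mds process creates an AdminSocket thread that responds to the user's inspection commands.

The differing help output from the OSD, MON, and MDS above shows that they support some commands in common and some of their own.
The commonly supported commands are: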

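  • config show        show all configuration options
  • config get         get one configuration option
  • config set         set one configuration option
  • log flush          flush log entries to the log file
  • log dump           dump recent log entries to the log file
  • log reopen         reopen the log file
  • perf dump          dump the perf counter values
  • perf schema        dump the perf counter schema
  • version            version information
  • git_version        git version (sha1)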

Most of the common commands are registered in the CephContext constructor:

 _admin_hook = new CephContextHook(this);
  _admin_socket->register_command("perfcounters_dump", "perfcounters_dump", _admin_hook, "");
  _admin_socket->register_command("1", "1", _admin_hook, "");
  _admin_socket->register_command("perf dump", "perf dump", _admin_hook, "dump perfcounters value");
  _admin_socket->register_command("perfcounters_schema", "perfcounters_schema", _admin_hook, "");
  _admin_socket->register_command("2", "2", _admin_hook, "");
  _admin_socket->register_command("perf schema", "perf schema", _admin_hook, "dump perfcounters schema");
  _admin_socket->register_command("config show", "config show", _admin_hook, "dump current config settings");
  _admin_socket->register_command("config set", "config set name=var,type=CephString name=val,type=CephString,n=N", _admin_hook, "config set <field> <val> [<val> ...]: set a config variable");
  _admin_socket->register_command("config get", "config get name=var,type=CephString", _admin_hook, "config get <field>: get the config value");
  _admin_socket->register_command("log flush", "log flush", _admin_hook, "flush log entries to log file");
  _admin_socket->register_command("log dump", "log dump", _admin_hook, "dump recent log entries to log file");
  _admin_socket->register_command("log reopen", "log reopen", _admin_hook, "reopen log file");

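(One AdminSocketHook, here _admin_hook, can serve multiple commands; the hook distinguishes between them internally.)

A CephContextHook is created first and passed as an argument to each registration. register_command itself is simple: it records the association between a command string and a hook. What that association is used for is analyzed later.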

int AdminSocket::register_command(std::string command, std::string cmddesc, AdminSocketHook *hook, std::string help)
{
  int ret;
  m_lock.Lock();
  if (m_hooks.count(command)) {
    ldout(m_cct, 5) << "register_command " << command << " hook " << hook << " EEXIST" << dendl;
    ret = -EEXIST;
  } else {
    ldout(m_cct, 5) << "register_command " << command << " hook " << hook << dendl;
    m_hooks[command] = hook;
    m_descs[command] = cmddesc;
    m_help[command] = help;
    ret = 0;
  }
  m_lock.Unlock();
  return ret;
}
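
To make the command-to-hook association concrete, here is a minimal, self-contained sketch of such a registry. It is only an illustration: `Hook`, `EchoHook`, and `CommandRegistry` are hypothetical names, not Ceph classes, and `std::mutex` stands in for Ceph's own lock type.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for AdminSocketHook: one hook may serve many commands.
struct Hook {
  virtual ~Hook() {}
  virtual bool call(const std::string &command, std::string &out) = 0;
};

// Illustrative hook that just reports which command it was asked to run.
struct EchoHook : Hook {
  bool call(const std::string &command, std::string &out) override {
    out = "handled: " + command;
    return true;
  }
};

// Hypothetical registry mirroring the m_hooks / m_descs / m_help maps.
class CommandRegistry {
  std::mutex lock_;
  std::map<std::string, Hook *> hooks_;
  std::map<std::string, std::string> descs_;
  std::map<std::string, std::string> help_;

public:
  // Returns 0 on success, -EEXIST if the command is already registered.
  int register_command(const std::string &command, const std::string &desc,
                       Hook *hook, const std::string &help) {
    std::lock_guard<std::mutex> guard(lock_);
    if (hooks_.count(command))
      return -EEXIST;
    hooks_[command] = hook;
    descs_[command] = desc;
    help_[command] = help;
    return 0;
  }

  // Returns the hook for an exact command string, or nullptr if unknown.
  Hook *find(const std::string &command) {
    std::lock_guard<std::mutex> guard(lock_);
    std::map<std::string, Hook *>::iterator it = hooks_.find(command);
    return it == hooks_.end() ? nullptr : it->second;
  }
};
```

Registering "perf dump", "perfcounters_dump", and "1" against the same hook object is what makes several command strings behave as aliases of one another.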

From the registration code above, we can see that the following three commands are equivalent:
 

ceph daemon /var/run/ceph/ceph-mds.bfudz.asok perf dump
ceph daemon /var/run/ceph/ceph-mds.bfudz.asok perfcounters_dump
ceph daemon /var/run/ceph/ceph-mds.bfudz.asok 1

The following three commands are likewise equivalent:

ceph daemon /var/run/ceph/ceph-mds.bfudz.asok perf schema
ceph daemon /var/run/ceph/ceph-mds.bfudz.asok perfcounters_schema
ceph daemon /var/run/ceph/ceph-mds.bfudz.asok 2


ceph-osd, ceph-mon, and ceph-mds also register commands of their own; searching the source for register_command will find them.


The thread behind the AdminSocket is created from the common_init_finish function:

void common_init_finish(CephContext *cct)
{
  ceph::crypto::init(cct);
  cct->start_service_thread();
  if (cct->_conf->lockdep) {
    g_lockdep = true;
    ldout(cct,0) << "lockdep is enabled" << dendl;
    lockdep_register_ceph_context(cct);
  }
}

start_service_thread in turn calls the AdminSocket's init function:
 

void CephContext::start_service_thread()
{
  pthread_spin_lock(&_service_thread_lock);
  if (_service_thread) {
    pthread_spin_unlock(&_service_thread_lock);
    return;
  }
  _service_thread = new CephContextServiceThread(this);
  _service_thread->create();
  pthread_spin_unlock(&_service_thread_lock);
  // make logs flush on_exit()
  if (_conf->log_flush_on_exit)
    _log->set_flush_on_exit();
  // Trigger callbacks on any config observers that were waiting for
  // it to become safe to start threads.
  _conf->set_val("internal_safe_to_start_threads", "true");
  _conf->call_all_observers();
  // start admin socket
  if (_conf->admin_socket.length())
    _admin_socket->init(_conf->admin_socket);  // <-- AdminSocket::init is called here
}


Next, let us look at the AdminSocket initialization function, init:

bool AdminSocket::init(const std::string &path)
{
  ldout(m_cct, 5) << "init " << path << dendl;
  /* Set up things for the new thread */
  std::string err;
  int pipe_rd = -1, pipe_wr = -1;
  err = create_shutdown_pipe(&pipe_rd, &pipe_wr);
  if (!err.empty()) {
    lderr(m_cct) << "AdminSocketConfigObs::init: error: " << err << dendl;
    return false;
  }
  int sock_fd;
  err = bind_and_listen(path, &sock_fd);
  if (!err.empty()) {
    lderr(m_cct) << "AdminSocketConfigObs::init: failed: " << err << dendl;
    close(pipe_rd);
    close(pipe_wr);
    return false;
  }
  /* Create new thread */
  m_sock_fd = sock_fd;
  m_shutdown_rd_fd = pipe_rd;
  m_shutdown_wr_fd = pipe_wr;
  m_path = path;
  m_version_hook = new VersionHook;
  register_command("0", "0", m_version_hook, "");
  register_command("version", "version", m_version_hook, "get ceph version");
  register_command("git_version", "git_version", m_version_hook, "get git sha1");
  m_help_hook = new HelpHook(this);
  register_command("help", "help", m_help_hook, "list available commands");
  m_getdescs_hook = new GetdescsHook(this);
  register_command("get_command_descriptions", "get_command_descriptions",
  m_getdescs_hook, "list available commands");
  create();
  add_cleanup_file(m_path.c_str());
  return true;
}


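First a pipe is created. The read end's file descriptor is stored in m_shutdown_rd_fd and the write end's in m_shutdown_wr_fd.
As the names suggest, this pipe carries the shutdown notification: once the AdminSocket thread has been created, there must be a way to tell it to exit promptly.
The exit request is written to the write end of the pipe, while the thread watches the read end through an I/O multiplexing call; as soon as m_shutdown_rd_fd becomes readable, the thread knows it should exit.
We will not go into further detail here.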
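
The shutdown-pipe idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Ceph implementation: `request_shutdown` and `shutdown_requested` are hypothetical helper names, and the sketch polls only the pipe, whereas the AdminSocket thread also polls its listening socket.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// Hypothetical helper: ask the polling thread to stop by writing one byte
// to the write end of the shutdown pipe. Returns true if the byte was written.
bool request_shutdown(int pipe_wr) {
  char byte = 0;
  return write(pipe_wr, &byte, 1) == 1;
}

// Hypothetical helper: wait up to timeout_ms for the read end to become
// readable. Returns true if a shutdown request is pending; a timeout or a
// poll error (including EINTR) is reported as false.
bool shutdown_requested(int pipe_rd, int timeout_ms) {
  struct pollfd pfd;
  pfd.fd = pipe_rd;
  pfd.events = POLLIN;
  pfd.revents = 0;
  int ret = poll(&pfd, 1, timeout_ms);
  return ret > 0 && (pfd.revents & POLLIN);
}
```

Because the wake-up arrives as an ordinary readable file descriptor, the same poll call can wait for client connections and for the shutdown request at once.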

The most important job of the AdminSocket is to listen for incoming requests. This is done with a socket, which is set up in the bind_and_listen function:

std::string AdminSocket::bind_and_listen(const std::string &sock_path, int *fd)
{
  ldout(m_cct, 5) << "bind_and_listen " << sock_path << dendl;

  struct sockaddr_un address;
  if (sock_path.size() > sizeof(address.sun_path) - 1) {
    ostringstream oss;
    oss << "AdminSocket::bind_and_listen: "
    << "The UNIX domain socket path " << sock_path << " is too long! The "
    << "maximum length on this system is "
    << (sizeof(address.sun_path) - 1);
    return oss.str();
  }
  int sock_fd = socket(PF_UNIX, SOCK_STREAM, 0);
  if (sock_fd < 0) {
    int err = errno;
    ostringstream oss;
    oss << "AdminSocket::bind_and_listen: "
    << "failed to create socket: " << cpp_strerror(err);
    return oss.str();
  }
  int r = fcntl(sock_fd, F_SETFD, FD_CLOEXEC);
  if (r < 0) {
    r = errno;
    TEMP_FAILURE_RETRY(::close(sock_fd));
    ostringstream oss;
    oss << "AdminSocket::bind_and_listen: failed to fcntl on socket: " << cpp_strerror(r);
    return oss.str();
  }
  memset(&address, 0, sizeof(struct sockaddr_un));
  address.sun_family = AF_UNIX;
  snprintf(address.sun_path, sizeof(address.sun_path),
     "%s", sock_path.c_str());
  if (bind(sock_fd, (struct sockaddr*)&address,
     sizeof(struct sockaddr_un)) != 0) {
    int err = errno;
    if (err == EADDRINUSE) {
      AdminSocketClient client(sock_path);
      bool ok;
      client.ping(&ok);
      if (ok) {
        ldout(m_cct, 20) << "socket " << sock_path << " is in use" << dendl;
        err = EEXIST;
      } else {
        ldout(m_cct, 20) << "unlink stale file " << sock_path << dendl;
        TEMP_FAILURE_RETRY(unlink(sock_path.c_str()));
        if (bind(sock_fd, (struct sockaddr*)&address,
                 sizeof(struct sockaddr_un)) == 0) {
          err = 0;
        } else {
          err = errno;
        }
      }
    }
    if (err != 0) {
      ostringstream oss;
      oss << "AdminSocket::bind_and_listen: "
     << "failed to bind the UNIX domain socket to '" << sock_path
     << "': " << cpp_strerror(err);
      close(sock_fd);
      return oss.str();
    }
  }
  if (listen(sock_fd, 5) != 0) {
    int err = errno;
    ostringstream oss;
    oss << "AdminSocket::bind_and_listen: "
     << "failed to listen to socket: " << cpp_strerror(err);
    close(sock_fd);
    TEMP_FAILURE_RETRY(unlink(sock_path.c_str()));
    return oss.str();
  }
  *fd = sock_fd;
  return "";
}


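What this function does is easy to follow and quite conventional:
1 Create a UNIX domain socket (the function takes the socket path as an input parameter)
2 bind it to that path; if the address is already in use, ping the existing socket and, if nothing answers, unlink the stale file and bind again
3 listen
The path passed in is the following: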

root@test3:/var/run/ceph# ceph daemon /var/run/ceph/ceph-mds.bfudz.asok config get admin_socket
{ "admin_socket": "\/var\/run\/ceph\/ceph-mds.bfudz.asok"}
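
For the other side of this path, a client has to build the same kind of sockaddr_un and connect to it. The following is a minimal sketch under stated assumptions: `fill_unix_address` and `connect_unix_socket` are hypothetical helpers written for this illustration and are not taken from Ceph's AdminSocketClient.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// Hypothetical helper: fill a sockaddr_un for `path`.
// Returns 0 on success, or -ENAMETOOLONG if the path does not fit in
// sun_path (the same limit that bind_and_listen checks).
int fill_unix_address(const std::string &path, struct sockaddr_un *addr) {
  if (path.size() > sizeof(addr->sun_path) - 1)
    return -ENAMETOOLONG;
  std::memset(addr, 0, sizeof(*addr));
  addr->sun_family = AF_UNIX;
  std::memcpy(addr->sun_path, path.c_str(), path.size());
  return 0;
}

// Hypothetical helper: connect to the UNIX domain socket at `path`.
// Returns the connected fd, or a negative errno value on failure.
int connect_unix_socket(const std::string &path) {
  struct sockaddr_un addr;
  int r = fill_unix_address(path, &addr);
  if (r < 0)
    return r;
  int fd = socket(AF_UNIX, SOCK_STREAM, 0);
  if (fd < 0)
    return -errno;
  if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
    int err = errno;
    close(fd);
    return -err;
  }
  return fd;
}
```

A caller would pass the .asok path shown above and, on success, use the returned descriptor to send a command and read the reply.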

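Once the socket has been set up, nothing is calling accept on it yet, because the thread has not been created. Returning to init, we see that it then registers a few more commands: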
 

m_version_hook = new VersionHook;
  register_command("0", "0", m_version_hook, "");
  register_command("version", "version", m_version_hook, "get ceph version");
  register_command("git_version", "git_version", m_version_hook, "get git sha1");
  m_help_hook = new HelpHook(this);
  register_command("help", "help", m_help_hook, "list available commands");
  m_getdescs_hook = new GetdescsHook(this);
  register_command("get_command_descriptions", "get_command_descriptions",
  m_getdescs_hook, "list available commands");

These commands are not very important; they mostly report version information, as shown below.
 

root@test3:/var/run/ceph# ceph daemon /var/run/ceph/ceph-mds.bfudz.asok version
{"version":"0.67.9-222-g014b35f"}
root@test3:/var/run/ceph# ceph daemon /var/run/ceph/ceph-mds.bfudz.asok git_version
{"git_version":"014b35fc1ee0a1ad1f699a3705f3481a88614d36"}
root@test3:/var/run/ceph#


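Finally, init calls create. create is an old friend, already covered in the earlier analysis of Log: for the Thread class, all it does is create a thread. The key question is which function the thread runs.
As with Log, it is the entry function. The AdminSocket class has its own entry function, which is the function its thread executes: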
 

void* AdminSocket::entry()
{
  ldout(m_cct, 5) << "entry start" << dendl;
  while (true) {
    struct pollfd fds[2];
    memset(fds, 0, sizeof(fds));
    fds[0].fd = m_sock_fd;
    fds[0].events = POLLIN | POLLRDBAND;
    fds[1].fd = m_shutdown_rd_fd;
    fds[1].events = POLLIN | POLLRDBAND;
    int ret = poll(fds, 2, -1);
    if (ret < 0) {
      int err = errno;
      if (err == EINTR) {
        continue;
      }
      lderr(m_cct) << "AdminSocket: poll(2) error: '"
                   << cpp_strerror(err) << dendl;
      return PFL_FAIL;
    }
    if (fds[0].revents & POLLIN) {
      // Send out some data
      do_accept();
    }
    if (fds[1].revents & POLLIN) {
      // Parent wants us to shut down
      return PFL_SUCCESS;
    }
  }
  ldout(m_cct, 5) << "entry exit" << dendl;
}


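This thread function is fairly simple: it polls the socket fd and the read end of the pipe.
1 The read end of the pipe signals when to exit.
2 The socket fd receives commands sent by the user.

Handling a command sent by the user is the job of the do_accept function: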

bool AdminSocket::do_accept()
{
    struct sockaddr_un address;
    socklen_t address_length = sizeof(address);
    ldout(m_cct, 30) << "AdminSocket: calling accept" << dendl;
    int connection_fd = accept(m_sock_fd, (struct sockaddr *) &address,
                               &address_length);
    ldout(m_cct, 30) << "AdminSocket: finished accept" << dendl;
    if (connection_fd < 0)
    {
        int err = errno;
        lderr(m_cct) << "AdminSocket: do_accept error: '"
                     << cpp_strerror(err) << dendl;
        return false;
    }

    char cmd[1024];
    int pos = 0;
    string c;
    while (1)
    {
        int ret = safe_read(connection_fd, &cmd[pos], 1);
        if (ret <= 0)
        {
            lderr(m_cct) << "AdminSocket: error reading request code: "
                         << cpp_strerror(ret) << dendl;
            close(connection_fd);
            return false;
        }
        //ldout(m_cct, 0) << "AdminSocket read byte " << (int)cmd[pos] << " pos " << pos << dendl;
        if (cmd[0] == '\0')
        {
            // old protocol: __be32
            if (pos == 3 && cmd[0] == '\0')
            {
                switch (cmd[3])  // <-- old protocol: look up the command by its numeric code
                {
                case 0:
                    c = "0";
                    break;
                case 1:
                    c = "perfcounters_dump"; // <-- code 1 maps to perfcounters_dump, stored in c
                    break;
                case 2:
                    c = "perfcounters_schema";
                    break;
                default:
                    c = "foo";
                    break;
                }
                break;
            }
        }
        else
        {
            // new protocol: null or \n terminated string
            if (cmd[pos] == '\n' || cmd[pos] == '\0')
            {
                cmd[pos] = '\0';
                c = cmd; // <-- new protocol: the request is a string, copied into c as-is
                break;
            }
        }
        pos++;
    }

    bool rval = false;

    map<string, cmd_vartype> cmdmap;
    string format;
    vector<string> cmdvec;
    stringstream errss;
    cmdvec.push_back(cmd);
    if (!cmdmap_from_json(cmdvec, &cmdmap, errss))
    {
        ldout(m_cct, 0) << "AdminSocket: " << errss << dendl;
        return false;
    }
    cmd_getval(m_cct, cmdmap, "format", format);
    if (format != "json" && format != "json-pretty" &&
            format != "xml" && format != "xml-pretty")
        format = "json-pretty";
    cmd_getval(m_cct, cmdmap, "prefix", c);

    string firstword;
    if (c.find(" ") == string::npos)
        firstword = c;
    else
        firstword = c.substr(0, c.find(" "));

    m_lock.Lock();
    map<string, AdminSocketHook *>::iterator p;
    string match = c;
    while (match.size())
    {
        p = m_hooks.find(match); // <-- look up the AdminSocketHook registered for this command string
        if (p != m_hooks.end())
            break;

        // drop right-most word
        size_t pos = match.rfind(' ');
        if (pos == std::string::npos)
        {
            match.clear(); // we fail
            break;
        }
        else
        {
            match.resize(pos);
        }
    }

    bufferlist out;
    if (p == m_hooks.end())
    {
        lderr(m_cct) << "AdminSocket: request '" << c << "' not defined" << dendl;
    }
    else
    {
        string args;
        if (match != c)
            args = c.substr(match.length() + 1);
        bool success = p->second->call(match, cmdmap, format, out); // invoke the hook's call() to handle the request
        if (!success)
        {
            ldout(m_cct, 0) << "AdminSocket: request '" << match << "' args '" << args
                            << "' to " << p->second << " failed" << dendl;
            out.append("failed");
        }
        else
        {
            ldout(m_cct, 5) << "AdminSocket: request '" << match << "' '" << args
                            << "' to " << p->second
                            << " returned " << out.length() << " bytes" << dendl;
        }
        uint32_t len = htonl(out.length());
        int ret = safe_write(connection_fd, &len, sizeof(len));
        if (ret < 0)
        {
            lderr(m_cct) << "AdminSocket: error writing response length "
                         << cpp_strerror(ret) << dendl;
        }
        else
        {
            if (out.write_fd(connection_fd) >= 0)
                rval = true;
        }
    }
    m_lock.Unlock();

    TEMP_FAILURE_RETRY(close(connection_fd));
    return rval;
}

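This function is somewhat long but not complicated. In short, when a client tries to connect, the thread's poll call notices it and do_accept is entered.
do_accept first calls accept to establish the connection with the client, and then communication begins.

safe_read reads the command sent by the client, one byte at a time. As mentioned earlier, the set of commands the AdminSocket supports is finite, and they are all registered in advance.
If the client sends a registered command, the matching hook handles it and the result is returned to the client, preceded by its length as a 4-byte integer in network byte order.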
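
The framing used by do_accept can be isolated into two small functions. This is a sketch based only on the code quoted above; `parse_request` and `frame_response` are hypothetical helper names, and the sketch works on an in-memory buffer rather than reading a socket byte by byte.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: decode one request from the bytes received so far.
// Returns true and fills `command` once a complete request is present.
bool parse_request(const std::string &bytes, std::string &command) {
  if (bytes.empty())
    return false;
  if (bytes[0] == '\0') {
    // Old protocol: a 4-byte big-endian code; the fourth byte selects
    // the command, as in the switch statement of do_accept.
    if (bytes.size() < 4)
      return false;
    switch (bytes[3]) {
    case 0: command = "0"; break;
    case 1: command = "perfcounters_dump"; break;
    case 2: command = "perfcounters_schema"; break;
    default: command = "foo"; break;
    }
    return true;
  }
  // New protocol: a string terminated by '\n' or '\0'.
  std::string::size_type end = bytes.find_first_of(std::string("\n\0", 2));
  if (end == std::string::npos)
    return false;
  command = bytes.substr(0, end);
  return true;
}

// Hypothetical helper: prefix a payload with its length as a 4-byte
// big-endian integer, matching the htonl() call before the reply is written.
std::string frame_response(const std::string &payload) {
  uint32_t len = htonl(static_cast<uint32_t>(payload.size()));
  std::string framed(reinterpret_cast<const char *>(&len), sizeof(len));
  framed += payload;
  return framed;
}
```

A client reading the reply would therefore read four bytes first, convert them with ntohl, and then read exactly that many bytes of payload.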

Each command string is associated with an AdminSocketHook, and one AdminSocketHook can serve multiple commands:
 

std::map<std::string,AdminSocketHook*> m_hooks

For example, config show, config get, and perf dump all map to CephContextHook, as mentioned earlier.
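
The lookup in do_accept does not require an exact match: it keeps dropping the right-most word of the request until a registered command is found, so that trailing words can be treated as arguments. A minimal sketch of that loop follows; `match_command` is a hypothetical name, and the map's `int` values are placeholders standing in for hook pointers.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: find the longest registered command that is a
// word-wise prefix of `request`. Returns the matched key, or "" if none.
std::string match_command(const std::map<std::string, int> &hooks,
                          const std::string &request) {
  std::string match = request;
  while (!match.empty()) {
    if (hooks.count(match))
      return match;
    // Drop the right-most word and try again.
    std::string::size_type pos = match.rfind(' ');
    if (pos == std::string::npos)
      return "";
    match.resize(pos);
  }
  return "";
}
```

With "config get" registered, a request of "config get admin_socket" matches "config get", and the remainder "admin_socket" is what do_accept logs as the arguments.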

The function that actually handles the command lives on the corresponding hook:
 

class CephContextHook : public AdminSocketHook {
  CephContext *m_cct;
public:
  CephContextHook(CephContext *cct) : m_cct(cct) {}
  bool call(std::string command, cmdmap_t& cmdmap, std::string format,
   bufferlist& out) {
    m_cct->do_command(command, cmdmap, format, &out);
    return true;
  }
};


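The line in do_accept that invokes p->second->call(match, cmdmap, format, out) dispatches to the call function of the matching hook; for CephContextHook, that is the call method of this class.

CephContextHook's call method simply forwards to CephContext::do_command, shown next.

For example, the perf dump command executes _perf_counters_collection->dump_formatted(f, false);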

void CephContext::do_command(std::string command, cmdmap_t &cmdmap,
                             std::string format, bufferlist *out)
{
    Formatter *f = new_formatter(format);
    if (!f)
        f = new_formatter("json-pretty");
    stringstream ss;
    for (cmdmap_t::iterator it = cmdmap.begin(); it != cmdmap.end(); ++it)
    {
        if (it->first != "prefix")
        {
            ss << it->first << ":" << cmd_vartype_stringify(it->second) << " ";

        }
    }
    lgeneric_dout(this, 1) << "do_command '" << command << "' '"
                           << ss.str() << dendl;
    if (command == "perfcounters_dump" || command == "1" ||
            command == "perf dump")
    {
        _perf_counters_collection->dump_formatted(f, false);

    }
    else if (command == "perfcounters_schema" || command == "2" ||
             command == "perf schema")
    {
        _perf_counters_collection->dump_formatted(f, true);
    }
    else
    {
        f->open_object_section(command.c_str());
        if (command == "config show")
        {
            _conf->show_config(f);
        }
        else if (command == "config set")
        {
            std::string var;
            std::vector<std::string> val;

            if (!(cmd_getval(this, cmdmap, "var", var)) ||
                    !(cmd_getval(this, cmdmap, "val", val)))
            {
                f->dump_string("error", "syntax error: 'config set <var> <value>'");
            }
            else
            {
                // val may be multiple words
                string valstr = str_join(val, " ");
                int r = _conf->set_val(var.c_str(), valstr.c_str());
                if (r < 0)
                {
                    f->dump_stream("error") << "error setting '" << var << "' to '" << valstr << "': " << cpp_strerror(r);
                }
                else
                {
                    ostringstream ss;
                    _conf->apply_changes(&ss);
                    f->dump_string("success", ss.str());
                }
            }
        }
        else if (command == "config get")
        {
            std::string var;
            if (!cmd_getval(this, cmdmap, "var", var))
            {
                f->dump_string("error", "syntax error: 'config get <var>'");
            }
            else
            {
                char buf[4096];
                memset(buf, 0, sizeof(buf));
                char *tmp = buf;
                int r = _conf->get_val(var.c_str(), &tmp, sizeof(buf));
                if (r < 0)
                {
                    f->dump_stream("error") << "error getting '" << var << "': " << cpp_strerror(r);
                }
                else
                {
                    f->dump_string(var.c_str(), buf);
                }
            }

        }
        else if (command == "log flush")
        {
            _log->flush();
        }
        else if (command == "log dump")
        {
            _log->dump_recent();

        }
        else if (command == "log reopen")
        {
            _log->reopen_log_file();
        }
        else
        {
            assert(0 == "registered under wrong command");
        }
        f->close_section();
    }
    f->flush(*out);
    delete f;
    lgeneric_dout(this, 1) << "do_command '" << command << "' '" << ss.str()
                           << "result is " << out->length() << " bytes" << dendl;
};

Unsurprisingly, every command registered earlier is handled in do_command.

  

Reposted from: http://blog.chinaunix.net/uid-24774106-id-5059727.html

posted on 2022-10-04 01:21  bdy
