[Ceph] The Admin Socket mechanism | How the ceph dump commands work
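The Admin Socket mechanism ---> inter-process communication
The front end is a Python process; the back end is a C++ process.
How a ceph command is delivered: command line --> handled by the Python front end --> handled by the Admin Socket back end
Front end: Python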
https://blog.csdn.net/bandaoyu/article/details/123746755
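Back end: Admin Socket
Any mature project has to expose an interface that works like a probe, letting us inspect what is going on inside a running process; a process should not be a black box. In Ceph, the Admin Socket provides this capability.
The Admin Socket can show the current configuration and the runtime state of a process, and it can also change configuration, fetch logs, and so on.
As the help output below shows, Ceph offers many commands for inspecting its internal state.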
root@test3:~# ceph daemon /var/run/ceph/ceph-osd.4.asok help
{ "config get": "config get <field>: get the config value",
"config set": "config set <field> <val> [<val> ...]: set a config variable",
"config show": "dump current config settings",
"dump_blacklist": "dump blacklisted clients and times",
"dump_historic_ops": "show slowest recent ops",
"dump_op_pq_state": "dump op priority queue state",
……
}
root@test3:~# ceph daemon /var/run/ceph/ceph-mon.*.asok help
{ "add_bootstrap_peer_hint": "add peer address as potential bootstrap peer for cluster bringup",
"config get": "config get <field>: get the config value",
"config set": "config set <field> <val> [<val> ...]: set a config variable",
"config show": "dump current config settings",
"get_command_descriptions": "list available commands",
"git_version": "get git sha1",
"help": "list available commands",
"log dump": "dump recent log entries to log file",
……
}
root@test3:~# ceph daemon /var/run/ceph/ceph-mds.*.asok help
{ "config get": "config get <field>: get the config value",
"config set": "config set <field> <val> [<val> ...]: set a config variable",
"config show": "dump current config settings",
"get_command_descriptions": "list available commands",
"git_version": "get git sha1",
……
}
For example, you can view the current configuration of each Ceph daemon:
ceph daemon /var/run/ceph/ceph-mds.*.asok config show
ceph daemon /var/run/ceph/ceph-mon.*.asok config show
ceph daemon /var/run/ceph/ceph-osd.4.asok config show
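How is this implemented? It relies on Ceph's AdminSocket mechanism.
CephContext creates an AdminSocket object, which is essentially a thread.
The ceph-mon, ceph-osd, and ceph-mds processes each create an AdminSocket thread that answers inspection commands from the user.
As the differing help output for OSD, MON, and MDS above shows, some commands are shared and some are specific to each daemon.
The commands they all support are:
- config show: show all configuration options
- config get: get one configuration option
- config set: set one configuration option
- log flush: flush log entries to the log file
- log dump: dump the most recent log entries to the log file
- log reopen: reopen the log file
- perf dump: output the performance counters
- perf schema: output the schema (types) of the performance counters
- version: version information
- git_version: git version information
Most of these common commands are registered in the CephContext constructor: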
_admin_hook = new CephContextHook(this);
_admin_socket->register_command("perfcounters_dump", "perfcounters_dump", _admin_hook, "");
_admin_socket->register_command("1", "1", _admin_hook, "");
_admin_socket->register_command("perf dump", "perf dump", _admin_hook, "dump perfcounters value");
_admin_socket->register_command("perfcounters_schema", "perfcounters_schema", _admin_hook, "");
_admin_socket->register_command("2", "2", _admin_hook, "");
_admin_socket->register_command("perf schema", "perf schema", _admin_hook, "dump perfcounters schema");
_admin_socket->register_command("config show", "config show", _admin_hook, "dump current config settings");
_admin_socket->register_command("config set", "config set name=var,type=CephString name=val,type=CephString,n=N", _admin_hook, "config set <field> <val> [<val> ...]: set a config variable");
_admin_socket->register_command("config get", "config get name=var,type=CephString", _admin_hook, "config get <field>: get the config value");
_admin_socket->register_command("log flush", "log flush", _admin_hook, "flush log entries to log file");
_admin_socket->register_command("log dump", "log dump", _admin_hook, "dump recent log entries to log file");
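_admin_socket->register_command("log reopen", "log reopen", _admin_hook, "reopen log file");
(A single AdminSocketHook, here _admin_hook, can serve several commands; the hook tells them apart internally.)
A CephContextHook is created first and then passed to every register_command call. register_command itself is simple: it records the association between a command string and a hook. What that association is used for is covered later, in the discussion of do_accept.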
int AdminSocket::register_command(std::string command, std::string cmddesc, AdminSocketHook *hook, std::string help)
{
int ret;
m_lock.Lock();
if (m_hooks.count(command)) {
ldout(m_cct, 5) << "register_command " << command << " hook " << hook << " EEXIST" << dendl;
ret = -EEXIST;
} else {
ldout(m_cct, 5) << "register_command " << command << " hook " << hook << dendl;
m_hooks[command] = hook;
m_descs[command] = cmddesc;
m_help[command] = help;
ret = 0;
}
m_lock.Unlock();
return ret;
}
From the registration code we can tell that the following three commands are equivalent:
ceph daemon /var/run/ceph/ceph-mds.bfudz.asok perf dump
ceph daemon /var/run/ceph/ceph-mds.bfudz.asok perfcounters_dump
ceph daemon /var/run/ceph/ceph-mds.bfudz.asok 1
The following three commands are equivalent as well:
ceph daemon /var/run/ceph/ceph-mds.bfudz.asok perf schema
ceph daemon /var/run/ceph/ceph-mds.bfudz.asok perfcounters_schema
ceph daemon /var/run/ceph/ceph-mds.bfudz.asok 2
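The equivalence follows from the fact that several command strings are registered against the same hook object. A minimal sketch of that idea is shown below; `Hook` and `Registry` are hypothetical, simplified stand-ins for AdminSocketHook and AdminSocket (no locking, no descriptions or help text), not the real Ceph classes.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

// Hypothetical stand-in for AdminSocketHook: it only records the
// command name it was last invoked with.
struct Hook {
  std::string last_command;
  bool call(const std::string& command) {
    last_command = command;
    return true;
  }
};

// Simplified registry: command string -> hook pointer.
class Registry {
  std::map<std::string, Hook*> hooks_;

public:
  // Returns 0 on success and -EEXIST if the name is already taken,
  // mirroring the return values of AdminSocket::register_command.
  int register_command(const std::string& command, Hook* hook) {
    if (hooks_.count(command))
      return -EEXIST;
    hooks_[command] = hook;
    return 0;
  }

  // Returns the hook registered under `command`, or nullptr if none.
  Hook* find(const std::string& command) const {
    auto it = hooks_.find(command);
    return it == hooks_.end() ? nullptr : it->second;
  }
};
```

Registering "perf dump", "perfcounters_dump", and "1" with the same `Hook*` makes all three names resolve to one handler, which is why the three command lines above produce the same output.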
ceph-osd, ceph-mon, and ceph-mds each register some commands of their own; searching the source for register_command turns them up.
The thread behind the AdminSocket is created from the common_init_finish function:
void common_init_finish(CephContext *cct)
{
ceph::crypto::init(cct);
cct->start_service_thread();
if (cct->_conf->lockdep) {
g_lockdep = true;
ldout(cct,0) << "lockdep is enabled" << dendl;
lockdep_register_ceph_context(cct);
}
}
start_service_thread in turn calls the AdminSocket's init function:
void CephContext::start_service_thread()
{
pthread_spin_lock(&_service_thread_lock);
if (_service_thread) {
pthread_spin_unlock(&_service_thread_lock);
return;
}
_service_thread = new CephContextServiceThread(this);
_service_thread->create();
pthread_spin_unlock(&_service_thread_lock);
// make logs flush on_exit()
if (_conf->log_flush_on_exit)
_log->set_flush_on_exit();
// Trigger callbacks on any config observers that were waiting for
// it to become safe to start threads.
_conf->set_val("internal_safe_to_start_threads", "true");
_conf->call_all_observers();
// start admin socket
if (_conf->admin_socket.length())
_admin_socket->init(_conf->admin_socket); // <-- AdminSocket::init is called here
}
Next, let us look at the AdminSocket initialization function, init:
bool AdminSocket::init(const std::string &path)
{
ldout(m_cct, 5) << "init " << path << dendl;
/* Set up things for the new thread */
std::string err;
int pipe_rd = -1, pipe_wr = -1;
err = create_shutdown_pipe(&pipe_rd, &pipe_wr);
if (!err.empty()) {
lderr(m_cct) << "AdminSocketConfigObs::init: error: " << err << dendl;
return false;
}
int sock_fd;
err = bind_and_listen(path, &sock_fd);
if (!err.empty()) {
lderr(m_cct) << "AdminSocketConfigObs::init: failed: " << err << dendl;
close(pipe_rd);
close(pipe_wr);
return false;
}
/* Create new thread */
m_sock_fd = sock_fd;
m_shutdown_rd_fd = pipe_rd;
m_shutdown_wr_fd = pipe_wr;
m_path = path;
m_version_hook = new VersionHook;
register_command("0", "0", m_version_hook, "");
register_command("version", "version", m_version_hook, "get ceph version");
register_command("git_version", "git_version", m_version_hook, "get git sha1");
m_help_hook = new HelpHook(this);
register_command("help", "help", m_help_hook, "list available commands");
m_getdescs_hook = new GetdescsHook(this);
register_command("get_command_descriptions", "get_command_descriptions",
m_getdescs_hook, "list available commands");
create();
add_cleanup_file(m_path.c_str());
return true;
}
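init first creates a pipe. The file descriptor of the read end is stored in m_shutdown_rd_fd and that of the write end in m_shutdown_wr_fd.
As the variable names suggest, this pipe carries the shutdown notification: once the AdminSocket thread exists, there has to be a way to tell it to exit promptly.
To request an exit, something is written to the write end of the pipe. The thread watches the read end through an I/O multiplexing call (poll), and as soon as m_shutdown_rd_fd becomes readable it knows it can exit.
We come back to this when we reach the entry function.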
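The shutdown-pipe idea can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the Ceph implementation: `Wakeup`, `wait_for_event`, and `request_shutdown` are hypothetical names, and unlike AdminSocket::entry the sketch checks the shutdown descriptor before the request descriptor.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <unistd.h>

// Hypothetical result type: why the poller woke up.
enum class Wakeup { Request, Shutdown, Error };

// Block until either the listening descriptor or the read end of the
// shutdown pipe becomes readable.
Wakeup wait_for_event(int listen_fd, int shutdown_rd_fd) {
  struct pollfd fds[2] = {};
  fds[0].fd = listen_fd;
  fds[0].events = POLLIN;
  fds[1].fd = shutdown_rd_fd;
  fds[1].events = POLLIN;
  for (;;) {
    int ret = poll(fds, 2, -1);  // -1: wait without a timeout
    if (ret < 0) {
      if (errno == EINTR)
        continue;                // interrupted by a signal, retry
      return Wakeup::Error;
    }
    if (fds[1].revents & POLLIN)
      return Wakeup::Shutdown;   // the owner asked us to stop
    if (fds[0].revents & POLLIN)
      return Wakeup::Request;    // a client is waiting
    if ((fds[0].revents | fds[1].revents) & (POLLERR | POLLHUP | POLLNVAL))
      return Wakeup::Error;      // a descriptor went bad
  }
}

// Called by the owning thread: a single byte is enough to wake the poller.
bool request_shutdown(int shutdown_wr_fd) {
  char c = 0;
  return write(shutdown_wr_fd, &c, 1) == 1;
}
```

The point of the pipe is that poll can only wait on file descriptors, so turning "please exit" into a readable descriptor lets one blocking call wait for both events.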
The AdminSocket's main job is to listen for incoming requests. It does so through a socket, which is set up in the bind_and_listen function:
std::string AdminSocket::bind_and_listen(const std::string &sock_path, int *fd)
{
ldout(m_cct, 5) << "bind_and_listen " << sock_path << dendl;
struct sockaddr_un address;
if (sock_path.size() > sizeof(address.sun_path) - 1) {
ostringstream oss;
oss << "AdminSocket::bind_and_listen: "
<< "The UNIX domain socket path " << sock_path << " is too long! The "
<< "maximum length on this system is "
<< (sizeof(address.sun_path) - 1);
return oss.str();
}
int sock_fd = socket(PF_UNIX, SOCK_STREAM, 0);
if (sock_fd < 0) {
int err = errno;
ostringstream oss;
oss << "AdminSocket::bind_and_listen: "
<< "failed to create socket: " << cpp_strerror(err);
return oss.str();
}
int r = fcntl(sock_fd, F_SETFD, FD_CLOEXEC);
if (r < 0) {
r = errno;
TEMP_FAILURE_RETRY(::close(sock_fd));
ostringstream oss;
oss << "AdminSocket::bind_and_listen: failed to fcntl on socket: " << cpp_strerror(r);
return oss.str();
}
memset(&address, 0, sizeof(struct sockaddr_un));
address.sun_family = AF_UNIX;
snprintf(address.sun_path, sizeof(address.sun_path),
"%s", sock_path.c_str());
if (bind(sock_fd, (struct sockaddr*)&address,
sizeof(struct sockaddr_un)) != 0) {
int err = errno;
if (err == EADDRINUSE) {
AdminSocketClient client(sock_path);
bool ok;
client.ping(&ok);
if (ok) {
ldout(m_cct, 20) << "socket " << sock_path << " is in use" << dendl;
err = EEXIST;
} else {
ldout(m_cct, 20) << "unlink stale file " << sock_path << dendl;
TEMP_FAILURE_RETRY(unlink(sock_path.c_str()));
if (bind(sock_fd, (struct sockaddr*)&address,
sizeof(struct sockaddr_un)) == 0) {
err = 0;
} else {
err = errno;
}
}
}
if (err != 0) {
ostringstream oss;
oss << "AdminSocket::bind_and_listen: "
<< "failed to bind the UNIX domain socket to '" << sock_path
<< "': " << cpp_strerror(err);
close(sock_fd);
return oss.str();
}
}
if (listen(sock_fd, 5) != 0) {
int err = errno;
ostringstream oss;
oss << "AdminSocket::bind_and_listen: "
<< "failed to listen to socket: " << cpp_strerror(err);
close(sock_fd);
TEMP_FAILURE_RETRY(unlink(sock_path.c_str()));
return oss.str();
}
*fd = sock_fd;
return "";
}
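What this function does is easy to follow; it is the classic sequence:
1 create a UNIX domain socket (socket)
2 bind it to the path passed in as sock_path (bind)
3 listen
The path passed in is the value of the admin_socket configuration option: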
root@test3:/var/run/ceph# ceph daemon /var/run/ceph/ceph-mds.bfudz.asok config get admin_socket
{ "admin_socket": "\/var\/run\/ceph\/ceph-mds.bfudz.asok"}
With the socket set up, nothing calls accept on it yet because the thread has not been created. Back in init, a few more commands are registered:
m_version_hook = new VersionHook;
register_command("0", "0", m_version_hook, "");
register_command("version", "version", m_version_hook, "get ceph version");
register_command("git_version", "git_version", m_version_hook, "get git sha1");
m_help_hook = new HelpHook(this);
register_command("help", "help", m_help_hook, "list available commands");
m_getdescs_hook = new GetdescsHook(this);
register_command("get_command_descriptions", "get_command_descriptions",
m_getdescs_hook, "list available commands");
These commands are not very important; they mostly report version information, as shown below, so we skip over them.
root@test3:/var/run/ceph# ceph daemon /var/run/ceph/ceph-mds.bfudz.asok version
{"version":"0.67.9-222-g014b35f"}
root@test3:/var/run/ceph# ceph daemon /var/run/ceph/ceph-mds.bfudz.asok git_version
{"git_version":"014b35fc1ee0a1ad1f699a3705f3481a88614d36"}
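init ends by calling create. create is an old friend, already covered in the earlier analysis of Log: for the Thread class, all it does is create a thread. The key question is which function that thread runs.
As with Log, it is the entry function. The AdminSocket class has its own entry function, and that is what the AdminSocket thread executes: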
void* AdminSocket::entry()
{
ldout(m_cct, 5) << "entry start" << dendl;
while (true) {
struct pollfd fds[2];
memset(fds, 0, sizeof(fds));
fds[0].fd = m_sock_fd;
fds[0].events = POLLIN | POLLRDBAND;
fds[1].fd = m_shutdown_rd_fd;
fds[1].events = POLLIN | POLLRDBAND;
int ret = poll(fds, 2, -1);
if (ret < 0) {
int err = errno;
if (err == EINTR) {
continue;
}
lderr(m_cct) << "AdminSocket: poll(2) error: '"
<< cpp_strerror(err) << dendl;
return PFL_FAIL;
}
if (fds[0].revents & POLLIN) {
// Send out some data
do_accept();
}
if (fds[1].revents & POLLIN) {
// Parent wants us to shut down
return PFL_SUCCESS;
}
}
ldout(m_cct, 5) << "entry exit" << dendl;
}
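This thread function is fairly simple. It polls the socket fd and the read end of the pipe:
1 the read end of the pipe tells the thread when to exit
2 the socket fd receives commands sent by the user
Handling a command from the user is the job of the do_accept function: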
bool AdminSocket::do_accept()
{
struct sockaddr_un address;
socklen_t address_length = sizeof(address);
ldout(m_cct, 30) << "AdminSocket: calling accept" << dendl;
int connection_fd = accept(m_sock_fd, (struct sockaddr *) &address,
&address_length);
ldout(m_cct, 30) << "AdminSocket: finished accept" << dendl;
if (connection_fd < 0)
{
int err = errno;
lderr(m_cct) << "AdminSocket: do_accept error: '"
<< cpp_strerror(err) << dendl;
return false;
}
char cmd[1024];
int pos = 0;
string c;
while (1)
{
int ret = safe_read(connection_fd, &cmd[pos], 1);
if (ret <= 0)
{
lderr(m_cct) << "AdminSocket: error reading request code: "
<< cpp_strerror(ret) << dendl;
close(connection_fd);
return false;
}
//ldout(m_cct, 0) << "AdminSocket read byte " << (int)cmd[pos] << " pos " << pos << dendl;
if (cmd[0] == '\0')
{
// old protocol: __be32
if (pos == 3 && cmd[0] == '\0')
{
switch (cmd[3]) // look up the command by its numeric code
{
case 0:
c = "0";
break;
case 1:
c = "perfcounters_dump"; // code 1 maps to perfcounters_dump, stored in c
break;
case 2:
c = "perfcounters_schema";
break;
default:
c = "foo";
break;
}
break;
}
}
else
{
// new protocol: null or \n terminated string
if (cmd[pos] == '\n' || cmd[pos] == '\0')
{
cmd[pos] = '\0';
c = cmd; // a text command is copied into c as-is
break;
}
}
pos++;
}
bool rval = false;
map<string, cmd_vartype> cmdmap;
string format;
vector<string> cmdvec;
stringstream errss;
cmdvec.push_back(cmd);
if (!cmdmap_from_json(cmdvec, &cmdmap, errss))
{
ldout(m_cct, 0) << "AdminSocket: " << errss << dendl;
return false;
}
cmd_getval(m_cct, cmdmap, "format", format);
if (format != "json" && format != "json-pretty" &&
format != "xml" && format != "xml-pretty")
format = "json-pretty";
cmd_getval(m_cct, cmdmap, "prefix", c);
string firstword;
if (c.find(" ") == string::npos)
firstword = c;
else
firstword = c.substr(0, c.find(" "));
m_lock.Lock();
map<string, AdminSocketHook *>::iterator p;
string match = c;
while (match.size())
{
p = m_hooks.find(match); // look up the AdminSocketHook registered for this command string
if (p != m_hooks.end())
break;
// drop right-most word
size_t pos = match.rfind(' ');
if (pos == std::string::npos)
{
match.clear(); // we fail
break;
}
else
{
match.resize(pos);
}
}
bufferlist out;
if (p == m_hooks.end())
{
lderr(m_cct) << "AdminSocket: request '" << c << "' not defined" << dendl;
}
else
{
string args;
if (match != c)
args = c.substr(match.length() + 1);
bool success = p->second->call(match, cmdmap, format, out); // invoke the hook's call() to handle the request
if (!success)
{
ldout(m_cct, 0) << "AdminSocket: request '" << match << "' args '" << args
<< "' to " << p->second << " failed" << dendl;
out.append("failed");
}
else
{
ldout(m_cct, 5) << "AdminSocket: request '" << match << "' '" << args
<< "' to " << p->second
<< " returned " << out.length() << " bytes" << dendl;
}
uint32_t len = htonl(out.length());
int ret = safe_write(connection_fd, &len, sizeof(len));
if (ret < 0)
{
lderr(m_cct) << "AdminSocket: error writing response length "
<< cpp_strerror(ret) << dendl;
}
else
{
if (out.write_fd(connection_fd) >= 0)
rval = true;
}
}
m_lock.Unlock();
TEMP_FAILURE_RETRY(close(connection_fd));
return rval;
}
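The function is somewhat long but not complicated. In short, when a client tries to connect, the thread's poll call notices and the thread enters do_accept.
do_accept first calls accept to establish the connection with the client, and then the two sides start talking.
safe_read reads the command sent by the client, one byte at a time. As mentioned earlier, the AdminSocket supports a limited set of commands, all of which were registered beforehand.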
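The read loop distinguishes two request formats by the first byte. The sketch below decodes a request from an in-memory buffer instead of a socket to show the rule on its own; `decode_request` is a hypothetical helper, and it leaves out the later step in do_accept where the text is passed through cmdmap_from_json and the "prefix" field is extracted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode one request. Returns false if the buffer ends before a complete
// request has been seen.
bool decode_request(const std::string& bytes, std::string* command) {
  if (bytes.empty())
    return false;
  if (bytes[0] == '\0') {
    // Old protocol: a 4-byte big-endian integer. Because the first byte
    // is zero, only the last byte is inspected, as in do_accept.
    if (bytes.size() < 4)
      return false;
    switch (bytes[3]) {
      case 0:  *command = "0"; break;
      case 1:  *command = "perfcounters_dump"; break;
      case 2:  *command = "perfcounters_schema"; break;
      default: *command = "foo"; break;  // placeholder for unknown codes
    }
    return true;
  }
  // New protocol: text terminated by '\n' or '\0'.
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    if (bytes[i] == '\n' || bytes[i] == '\0') {
      command->assign(bytes, 0, i);
      return true;
    }
  }
  return false;
}
```

This is also why the numeric aliases "0", "1", and "2" are registered: they keep clients that still speak the old binary protocol working.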
If the command the client sent is a registered one, the matching hook handles it and the result is returned to the client.
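The lookup in do_accept is a longest-prefix match on whole words: it tries the full request string first and drops the right-most word until a registered command is found. A minimal sketch follows; `match_command` is a hypothetical helper, and the `int` values in the map are placeholders for the hook pointers stored in m_hooks.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Returns the longest registered command that is a word-wise prefix of
// `request`, or an empty string if nothing matches. `args` receives the
// words that follow the matched command.
std::string match_command(const std::map<std::string, int>& hooks,
                          const std::string& request, std::string* args) {
  std::string match = request;
  while (!match.empty()) {
    if (hooks.count(match))
      break;                       // found a registered command
    std::size_t pos = match.rfind(' ');
    if (pos == std::string::npos) {
      match.clear();               // no words left to drop: no match
      break;
    }
    match.resize(pos);             // drop the right-most word
  }
  args->clear();
  if (!match.empty() && match != request)
    *args = request.substr(match.length() + 1);
  return match;
}
```

With "config get" registered, the request "config get admin_socket" matches "config get" and leaves "admin_socket" as the argument string.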
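Every command string is associated with an AdminSocketHook, and one AdminSocketHook can serve several commands:
std::map<std::string,AdminSocketHook*> m_hooks
For example, config show, config get, and perf dump all map to the CephContextHook, as mentioned earlier.
The code that actually handles each command lives in the corresponding hook: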
class CephContextHook : public AdminSocketHook {
CephContext *m_cct;
public:
CephContextHook(CephContext *cct) : m_cct(cct) {}
bool call(std::string command, cmdmap_t& cmdmap, std::string format,
bufferlist& out) {
m_cct->do_command(command, cmdmap, format, &out);
return true;
}
};
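The p->second->call(match, cmdmap, format, out) line in do_accept (highlighted in the original post) invokes the call method of the matching hook; for CephContextHook, that is the call method of this class.
CephContextHook's call method simply delegates to CephContext::do_command, shown below.
For example, for the perf dump command it runs: _perf_counters_collection->dump_formatted(f, false);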
void CephContext::do_command(std::string command, cmdmap_t &cmdmap,
std::string format, bufferlist *out)
{
Formatter *f = new_formatter(format);
if (!f)
f = new_formatter("json-pretty");
stringstream ss;
for (cmdmap_t::iterator it = cmdmap.begin(); it != cmdmap.end(); ++it)
{
if (it->first != "prefix")
{
ss << it->first << ":" << cmd_vartype_stringify(it->second) << " ";
}
}
lgeneric_dout(this, 1) << "do_command '" << command << "' '"
<< ss.str() << dendl;
if (command == "perfcounters_dump" || command == "1" ||
command == "perf dump")
{
_perf_counters_collection->dump_formatted(f, false);
}
else if (command == "perfcounters_schema" || command == "2" ||
command == "perf schema")
{
_perf_counters_collection->dump_formatted(f, true);
}
else
{
f->open_object_section(command.c_str());
if (command == "config show")
{
_conf->show_config(f);
}
else if (command == "config set")
{
std::string var;
std::vector<std::string> val;
if (!(cmd_getval(this, cmdmap, "var", var)) ||
!(cmd_getval(this, cmdmap, "val", val)))
{
f->dump_string("error", "syntax error: 'config set <var> <value>'");
}
else
{
// val may be multiple words
string valstr = str_join(val, " ");
int r = _conf->set_val(var.c_str(), valstr.c_str());
if (r < 0)
{
f->dump_stream("error") << "error setting '" << var << "' to '" << valstr << "': " << cpp_strerror(r);
}
else
{
ostringstream ss;
_conf->apply_changes(&ss);
f->dump_string("success", ss.str());
}
}
}
else if (command == "config get")
{
std::string var;
if (!cmd_getval(this, cmdmap, "var", var))
{
f->dump_string("error", "syntax error: 'config get <var>'");
}
else
{
char buf[4096];
memset(buf, 0, sizeof(buf));
char *tmp = buf;
int r = _conf->get_val(var.c_str(), &tmp, sizeof(buf));
if (r < 0)
{
f->dump_stream("error") << "error getting '" << var << "': " << cpp_strerror(r);
}
else
{
f->dump_string(var.c_str(), buf);
}
}
}
else if (command == "log flush")
{
_log->flush();
}
else if (command == "log dump")
{
_log->dump_recent();
}
else if (command == "log reopen")
{
_log->reopen_log_file();
}
else
{
assert(0 == "registered under wrong command");
}
f->close_section();
}
f->flush(*out);
delete f;
lgeneric_dout(this, 1) << "do_command '" << command << "' '" << ss.str()
<< "result is " << out->length() << " bytes" << dendl;
};
Unsurprisingly, do_command handles every command that was registered in the CephContext constructor.
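Seen from the client side, the exchange handled by do_accept is small: send the command as a terminated string, then read a 4-byte big-endian length followed by that many bytes of output. The sketch below shows this framing over an already connected descriptor. `send_request`, `read_reply`, and the two helpers are hypothetical names, connecting to the .asok path is left out, and whether a plain string or a JSON document is expected as the command depends on the Ceph version, so treat the payload format as an assumption.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Write exactly `len` bytes, retrying on EINTR and short writes.
static bool write_all(int fd, const void* buf, std::size_t len) {
  const char* p = static_cast<const char*>(buf);
  while (len > 0) {
    ssize_t n = write(fd, p, len);
    if (n < 0) {
      if (errno == EINTR) continue;
      return false;
    }
    p += n;
    len -= static_cast<std::size_t>(n);
  }
  return true;
}

// Read exactly `len` bytes; fails on error or early end of stream.
static bool read_all(int fd, void* buf, std::size_t len) {
  char* p = static_cast<char*>(buf);
  while (len > 0) {
    ssize_t n = read(fd, p, len);
    if (n < 0) {
      if (errno == EINTR) continue;
      return false;
    }
    if (n == 0) return false;  // peer closed the connection
    p += n;
    len -= static_cast<std::size_t>(n);
  }
  return true;
}

// Send the command including its terminating NUL byte.
bool send_request(int fd, const std::string& command) {
  return write_all(fd, command.c_str(), command.size() + 1);
}

// Read the reply: length in network byte order, then the payload.
bool read_reply(int fd, std::string* reply) {
  uint32_t len_be = 0;
  if (!read_all(fd, &len_be, sizeof(len_be)))
    return false;
  uint32_t len = ntohl(len_be);  // undo the server's htonl()
  reply->resize(len);
  return len == 0 || read_all(fd, &(*reply)[0], len);
}
```

The length prefix is what lets the client know when the output is complete without relying on the server closing the connection first.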