KingbaseES V8R6 Cluster Operations Case: Using es_server for Node Communication When root SSH Trust Is Disabled

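Case description:
In general-purpose production environments, security policy may forbid SSH trust (passwordless login) for the root user between cluster nodes. With early KingbaseES V8R6 cluster releases, this means that when the cluster is started with the sys_monitor.sh script, the nodes cannot reach each other over ssh and the cluster fails to start. This case describes how to connect the nodes through es_server and es_client instead of SSH trust, so that the cluster nodes can communicate normally.

Tips:
Newer KingbaseES V8R6 releases can use the securecmdd tool for inter-node communication in environments where SSH trust is not allowed. This case applies to versions that do not support securecmdd, where es_server communication can serve as an interim solution.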

Applicable version:

KingbaseES V8R6

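As shown in the figure below, because a trusted connection for the root user cannot be established, sys_monitor.sh fails to start the cluster:

I. Configure and start es_server (all nodes)

1. es_server configuration:

2. Start es_server: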

[kingbase@node3 bin]$ ./esHAmodel.sh start
[kingbase@node3 bin]$ ps -ef |grep es_server
kingbase 28024     1  0 15:18 pts/2    00:00:00 /home/kingbase/cluster/R6HA/KHA/kingbase/bin/es_server

[kingbase@node3 bin]$ netstat -an |grep 8890
tcp        0      0 0.0.0.0:8890            0.0.0.0:*               LISTEN  
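
The listening check above can be scripted so that it is easy to repeat on every node. The following is a minimal sketch, assuming netstat-style output on stdin and the port 8890 seen above; the function name port_is_listening is hypothetical and not part of KingbaseES.

```shell
# Sketch (hypothetical helper): succeed if the given TCP port appears in
# LISTEN state in netstat-style output read from stdin.
port_is_listening() {
    port="$1"
    awk -v port="$port" '
        # Only consider tcp/tcp6 rows whose last column is LISTEN.
        $1 ~ /^tcp/ && $NF == "LISTEN" {
            # Column 4 is the local address; the port follows the last colon.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# Usage on a cluster node:
#   netstat -an | port_is_listening 8890 && echo "es_server is listening"
```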

3. Test the connection to es_server:

[kingbase@node3 bin]$ ./es_client --help
es-client 
Usage:
es-client [OPTION...] -o
Options:
  -U, --username=NAME    username for ES authentication
  -h, --host=HOSTNAME    ES Server host
  -p, --port=PORT        ES Server port number
  -W, --password         password
  -d, --debug            enable debug message (optional)
  -?, --help             print this help

  -o, --option           use user-define cmd: like "ls ."

[kingbase@node3 bin]$ ./es_client -h 192.168.7.248 -U kingbase -W 123456 -o "hostname"
node1

[kingbase@node3 bin]$ ./es_client -h 192.168.7.249 -U kingbase -W 123456 -o "hostname"
node2

---As shown above, the connection test between es_client and es_server succeeded.
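
The per-node test above can be wrapped in a loop to check several nodes at once. This is a sketch only: it uses the -h, -U, -W and -o options exactly as invoked above, and it assumes that es_client exits with a non-zero status when the connection fails, which the original output does not confirm. The names ES_CLIENT and check_es_nodes are hypothetical.

```shell
# Path to es_client; can be overridden. ./es_client matches the call above.
ES_CLIENT="${ES_CLIENT:-./es_client}"

# check_es_nodes USER PASSWORD HOST...
# Runs "hostname" on each host through es_client and prints one line per host.
# Assumption: es_client returns non-zero when it cannot connect.
check_es_nodes() {
    user="$1"; password="$2"; shift 2
    failed=0
    for host in "$@"; do
        if name=$("$ES_CLIENT" -h "$host" -U "$user" -W "$password" -o "hostname" 2>/dev/null) \
            && [ -n "$name" ]; then
            echo "$host OK $name"
        else
            echo "$host FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Usage from the bin directory, with the addresses used above:
#   check_es_nodes kingbase 123456 192.168.7.248 192.168.7.249
```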

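II. Configure repmgr.conf to support bmj connections

As shown in the figure below: in the sys_monitor.sh script, if bmj=on, es_server and es_client are used for communication, so repmgr.conf must be modified to enable bmj communication.

1. Configure repmgr.conf (all nodes):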

[kingbase@node3 bin]$ cat ../etc/repmgr.conf
# Enable bmj
on_bmj=on
node_id=3
node_name=node243
......
ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22'
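
Because the setting has to be present on every node, a small check can confirm it before the cluster is restarted. The sketch below only looks for an uncommented on_bmj line whose value is on, as in the listing above; the function name bmj_enabled is hypothetical, and treating the last matching line as the effective one is an assumption about how the file is parsed.

```shell
# bmj_enabled CONF_FILE
# Succeeds if the last uncommented "on_bmj=" line in the file has the value "on".
# Assumption: when the key is repeated, the last occurrence is the effective one.
bmj_enabled() {
    conf="$1"
    value=$(sed -n 's/^[[:space:]]*on_bmj[[:space:]]*=[[:space:]]*//p' "$conf" \
        | tail -n 1 | tr -d "'\" \t\r")
    [ "$value" = "on" ]
}

# Usage from the bin directory:
#   bmj_enabled ../etc/repmgr.conf && echo "bmj is enabled"
```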

As shown in the figure below, when on_bmj=on is configured, the cluster nodes communicate through es_server instead of ssh:

III. Test cluster startup with sys_monitor.sh

[kingbase@node3 bin]$ ./sys_monitor.sh restart
.......
 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node248 | standby |   running | node243  | running | 3589  | no      | 0 second(s) ago    
 2  | node249 | witness | * running | node243  | running | 23739 | no      | 0 second(s) ago    
 3  | node243 | primary | * running |          | running | 30496 | no      | n/a                
sh: /etc/cron.d/KINGBASECRON: Permission denied
sh: /etc/logrotate.d/kingbase: Permission denied
chown: changing ownership of ‘/etc/logrotate.d/kingbase’: Operation not permitted
chmod: changing permissions of ‘/etc/logrotate.d/kingbase’: Operation not permitted
sh: /etc/cron.d/KINGBASECRON: Permission denied
sh: /etc/logrotate.d/kingbase: Permission denied
chown: changing ownership of ‘/etc/logrotate.d/kingbase’: Operation not permitted
chmod: changing permissions of ‘/etc/logrotate.d/kingbase’: Operation not permitted
sh: /etc/cron.d/KINGBASECRON: Permission denied
sh: /etc/logrotate.d/kingbase: Permission denied
chown: changing ownership of ‘/etc/logrotate.d/kingbase’: Operation not permitted
chmod: changing permissions of ‘/etc/logrotate.d/kingbase’: Operation not permitted
2021-03-01 15:26:44 Done.

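As shown in the figure below, the sys_monitor.sh script hits permission errors when it accesses the "/etc/cron.d/KINGBASECRON" and "/etc/logrotate.d/kingbase" files during startup:

Tips:

1) /etc/cron.d/KINGBASECRON is the scheduled task created when the repmgr cluster starts; it is used to start the repmgrd process.
2) /etc/logrotate.d/kingbase is the configuration file used to rotate the hamgr.log and kbha.log logs.

Configuration related to /etc/cron.d/KINGBASECRON in the sys_monitor.sh script:

Configuration related to /etc/logrotate.d/kingbase in the sys_monitor.sh script:

1) Change the permissions of the /etc/cron.d/KINGBASECRON file (as shown in the figure below) (all nodes)

2) Change the permissions of /etc/logrotate.d/kingbase (all nodes)

Change the owner of the kingbase file (all nodes):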
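
Once the permissions and ownership have been changed, it can be useful to confirm that the cluster user is actually able to write both files before restarting the cluster. The following is a minimal sketch of such a check; report_unwritable is a hypothetical helper, and it only tests writability for the user that runs it.

```shell
# report_unwritable FILE...
# Prints every path the current user cannot write to (missing files included)
# and returns non-zero if any such path was found.
report_unwritable() {
    status=0
    for path in "$@"; do
        if [ ! -w "$path" ]; then
            echo "not writable: $path"
            status=1
        fi
    done
    return "$status"
}

# Usage as the kingbase user on each node:
#   report_unwritable /etc/cron.d/KINGBASECRON /etc/logrotate.d/kingbase
```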

Comment out the statements in the sys_monitor.sh script that change the owner and permissions of the kingbase configuration file:

function init_log_rotate()
{
_host="$1"
_final_target_file="/etc/logrotate.d/kingbase"
eval _rep_log_file=`grep log_file ${rep_conf} | awk -F '=' '{print $2}'`
execute_command ${super_user} $host "\
echo -e '# Generate by sys_monitor.sh at `date`\n\
${kbha_file} {\n\
        weekly\n\
        maxsize 100M\n\
        su ${execute_user} ${execute_user}\n\
        create 0600 ${execute_user} ${execute_user}\n\
        rotate 3\n\
        copytruncate\n\
        dateext\n\
}\n\
${_rep_log_file} {\n\
        weekly\n\
        maxsize 100M\n\
        su ${execute_user} ${execute_user}\n\
        create 0600 ${execute_user} ${execute_user}\n\
        rotate 3\n\
        copytruncate\n\
        dateext\n\
}\n\
' > ${_final_target_file}"
#execute_command ${super_user} $host "chown ${super_user}:${super_user} ${_final_target_file}"
#execute_command ${super_user} $host "chmod 644 ${_final_target_file}"
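
If the same change has to be made on several nodes, the two statements can be commented out with sed instead of by hand. The sketch below is an assumption-laden illustration, not a vendor tool: it assumes the function starts with a line beginning "function init_log_rotate" and ends with a line containing only "}", as in the listing above, and the helper name comment_out_logrotate_perms is hypothetical. It keeps a .bak copy so the result can be compared with the original.

```shell
# comment_out_logrotate_perms SCRIPT
# Inside init_log_rotate only, prefix the execute_command lines that run
# chown or chmod with "#". A backup is kept as SCRIPT.bak.
comment_out_logrotate_perms() {
    script="$1"
    cp -p "$script" "$script.bak" || return 1
    # Writing back with ">" keeps the mode (execute bit) of the original file.
    sed -E '/^function init_log_rotate/,/^[}][[:space:]]*$/ {
        /^[[:space:]]*execute_command .*"(chown|chmod) / s/^/#/
    }' "$script.bak" > "$script"
}

# Usage from the bin directory, followed by a review of the change:
#   comment_out_logrotate_perms ./sys_monitor.sh
#   diff ./sys_monitor.sh.bak ./sys_monitor.sh
```

Lines that are already commented out are left untouched, so running the helper twice does not add a second "#".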

As shown in the figure below:

IV. Test cluster startup

[kingbase@node3 bin]$ ./sys_monitor.sh restart
2021-03-01 15:52:08 Ready to stop all DB ...
......
[2021-03-01 14:50:47] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log"
2021-03-01 15:52:53 repmgrd on "[192.168.7.249]" start success.
 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node248 | standby |   running | node243  | running | 13909 | no      | 0 second(s) ago    
 2  | node249 | witness | * running | node243  | running | 28830 | no      | n/a                
 3  | node243 | primary | * running |          | running | 6643  | no      | n/a                
2021-03-01 15:52:53 Done.

As shown in the figure below, the cluster starts normally.
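
For repeated restarts, the status table printed above can be checked automatically rather than by eye. The following sketch reads that table from stdin and assumes the column layout shown above, with Status as the fourth "|"-separated column; all_nodes_running is a hypothetical name, and the repmgrd column is not examined.

```shell
# all_nodes_running
# Reads the cluster status table from stdin. Succeeds only if at least one
# node row is present and every node row has the status "running"
# (an optional leading "*" marker is ignored).
all_nodes_running() {
    awk -F'|' '
        # Node rows are the ones whose first column is a numeric ID.
        $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
            rows++
            s = $4
            gsub(/^[ \t*]+|[ \t]+$/, "", s)
            if (s != "running") bad++
        }
        END { exit (rows > 0 && bad == 0) ? 0 : 1 }
    '
}

# Usage, with the table saved to a file (status.txt is a placeholder name):
#   all_nodes_running < status.txt && echo "all nodes running"
```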

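Appendix: Troubleshooting /etc/logrotate.d/kingbase permission errors

As shown in the figure below, starting the cluster with the sys_monitor.sh script produces the following errors:

Solution attempted: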

[root@node3 ~]# which chmod
/usr/bin/chmod
[root@node3 ~]# which chown
/usr/bin/chown

[root@node3 ~]# ls -lh /usr/bin/chown
-rwxr-xr-x. 1 root root 62K Nov 20  2015 /usr/bin/chown
[root@node3 ~]# ls -lh /usr/bin/chmod
-rwxr-xr-x. 1 root root 58K Nov 20  2015 /usr/bin/chmod

[root@node3 ~]# chmod u+s /usr/bin/chown
[root@node3 ~]# chmod u+s /usr/bin/chmod

[root@node3 ~]# ls -lh /usr/bin/chmod
-rwsr-xr-x. 1 root root 58K Nov 20  2015 /usr/bin/chmod
[root@node3 ~]# ls -lh /usr/bin/chown
-rwsr-xr-x. 1 root root 62K Nov 20  2015 /usr/bin/chown

[root@node3 ~]# ls -lh /etc/logrotate.d/kingbase 
-rw-r--r--. 1 kingbase kingbase 492 Mar  1 15:52 /etc/logrotate.d/kingbase

[root@node3 ~]# su - kingbase
Last login: Mon Mar  1 15:51:39 CST 2021 on pts/1
Last failed login: Mon Mar  1 15:58:21 CST 2021 from :0 on :0
There was 1 failed login attempt since the last successful login.
[kingbase@node3 ~]$ chown root.root /etc/logrotate.d/kingbase 
[kingbase@node3 ~]$ ls -lh  /etc/logrotate.d/kingbase 
-rw-r--r--. 1 root root 492 Mar  1 15:52 /etc/logrotate.d/kingbase
[kingbase@node3 ~]$ chown kingbase.kingbase /etc/logrotate.d/kingbase 
[kingbase@node3 ~]$ ls -lh  /etc/logrotate.d/kingbase 
-rw-r--r--. 1 kingbase kingbase 492 Mar  1 15:52 /etc/logrotate.d/kingbase

# Manually run "sh /etc/logrotate.d/kingbase"
[kingbase@node3 bin]$ sh /etc/logrotate.d/kingbase
/etc/logrotate.d/kingbase: line 2: /home/kingbase/cluster/R6HA/KHA/kingbase/bin/../kbha.log: Permission denied
/etc/logrotate.d/kingbase: line 3: weekly: command not found
/etc/logrotate.d/kingbase: line 4: maxsize: command not found

[kingbase@node3 kingbase]$ chmod u+x kbha.log
[kingbase@node3 kingbase]$ sh /etc/logrotate.d/kingbase
/etc/logrotate.d/kingbase: line 2: /home/kingbase/cluster/R6HA/KHA/kingbase/bin/../kbha.log: Text file busy
/etc/logrotate.d/kingbase: line 3: weekly: command not found
/etc/logrotate.d/kingbase: line 4: maxsize: command not found
Password: 
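
The "command not found" messages above appear because sh interprets the logrotate directives (weekly, maxsize, and so on) as commands: /etc/logrotate.d/kingbase is a configuration file for logrotate, not a shell script. A configuration file can instead be checked with logrotate's debug mode (-d), which prints what would be done without rotating any logs. The wrapper below is a minimal sketch; check_logrotate_conf is a hypothetical name, and logrotate may not be in the PATH of a non-root user.

```shell
# check_logrotate_conf CONF_FILE
# Runs logrotate in debug mode (-d) against the file; no logs are rotated
# and the logrotate state file is not updated.
check_logrotate_conf() {
    conf="$1"
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 1
    fi
    if ! command -v logrotate >/dev/null 2>&1; then
        echo "logrotate not found in PATH" >&2
        return 2
    fi
    logrotate -d "$conf"
}

# Usage:
#   check_logrotate_conf /etc/logrotate.d/kingbase
```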

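Even after the steps above, the "sh /etc/logrotate.d/kingbase" error still appeared when starting the cluster with the sys_monitor.sh script. The problem was resolved only after modifying the sys_monitor.sh script, that is, commenting out the chown and chmod statements as described earlier.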

posted @ 2021-12-30 17:29  天涯客1224