KingbaseES V8R6 Cluster Operations Case: Replacing ssh Trust with es_server Communication

     

Original article:

[Case: switching a KingbaseES R6 cluster to es_server after root ssh is disabled - KINGBASE研究院 - 博客园](https://www.cnblogs.com/kingbase/p/15774419.html)

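**Case description:**
In production environments on general-purpose machines, security policy may forbid ssh trust connections for the root user between cluster nodes. On early KingbaseES V8R6 clusters, the nodes then cannot reach each other over ssh when the cluster is started with the sys_monitor.sh script, and cluster startup fails. This case describes how to connect the nodes through es_server and es_client instead of ssh trust, so that the cluster nodes can communicate normally.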

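**Tips:**
`Newer KingbaseES V8R6 releases can use the securecmdd tool for inter-node communication in environments where ssh trust is not available. This case applies to versions that do not support securecmdd, where es_server communication can serve as a transitional solution.`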

**Applicable version:**

KingbaseES V8R6


As shown below, sys_monitor.sh cannot start the cluster because trust connections for the root user cannot be established:
![](https://img2020.cnblogs.com/blog/2420370/202112/2420370-20211230171515497-1661707954.png)

**I. Configure and start es_server (all nodes)**

**1. es_server configuration:**
![](https://img2020.cnblogs.com/blog/2420370/202112/2420370-20211230171554570-8192285.png)

**2. Start es_server:**

[kingbase@node3 bin]$ ./esHAmodel.sh start
[kingbase@node3 bin]$ ps -ef |grep es_server
kingbase 28024 1 0 15:18 pts/2 00:00:00 /home/kingbase/cluster/R6HA/KHA/kingbase/bin/es_server

[kingbase@node3 bin]$ netstat -an |grep 8890
tcp 0 0 0.0.0.0:8890 0.0.0.0:* LISTEN
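
The port check above can be wrapped in a small helper so it is easy to repeat on every node. This is a minimal sketch, not part of the KingbaseES tooling: the function name `port_listening` is hypothetical, and port 8890 is simply the port seen in the netstat output above. It reads `netstat -an` style text on stdin and succeeds if a TCP socket is in LISTEN state on the given port.

```shell
# Hypothetical helper: succeed if netstat-style text on stdin shows
# a TCP socket in LISTEN state on the port given as $1.
port_listening() {
    # Field 4 is the local address (e.g. 0.0.0.0:8890), field 6 the state.
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage on a cluster node (assumes netstat is installed):
#   netstat -an | port_listening 8890 && echo "es_server port is open"
```

Reading from stdin keeps the helper independent of how the socket list is produced.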


**3. Test the es_server connection:**

[kingbase@node3 bin]$ ./es_client --help
es-client
Usage:
es-client [OPTION...] -o
Options:
-U, --username=NAME username for ES authentication
-h, --host=HOSTNAME ES Server host
-p, --port=PORT ES Server port number
-W, --password password
-d, --debug enable debug message (optional)
-?, --help print this help

-o, --option use user-define cmd: like "ls ."

[kingbase@node3 bin]$ ./es_client -h 192.168.7.248 -U kingbase -W 123456 -o "hostname"
node1

[kingbase@node3 bin]$ ./es_client -h 192.168.7.249 -U kingbase -W 123456 -o "hostname"
node2

---As shown above, the connection test between es_client and es_server succeeded.
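
With more than a couple of nodes, the same test can be run in a loop. The sketch below is only an illustration: `check_es_nodes`, `ES_CLIENT`, `ES_USER` and `ES_PASSWORD` are hypothetical names, and only the es_client options shown in the help text above (`-h`, `-U`, `-W`, `-o`) are used. Since the exit status of es_client on failure is not documented here, the sketch also treats an empty reply as a failure.

```shell
# Hypothetical helper: run `hostname` on each node through es_client and
# report which nodes answer. ES_CLIENT, ES_USER and ES_PASSWORD are
# assumed variable names, not settings read by KingbaseES itself.
ES_CLIENT="${ES_CLIENT:-./es_client}"
ES_USER="${ES_USER:-kingbase}"

check_es_nodes() {
    failed=0
    for node in "$@"; do
        if name=$("$ES_CLIENT" -h "$node" -U "$ES_USER" -W "${ES_PASSWORD:-}" -o "hostname" 2>/dev/null) \
            && [ -n "$name" ]; then
            echo "OK   $node -> $name"
        else
            echo "FAIL $node"
            failed=1
        fi
    done
    return "$failed"
}

# Usage from the bin directory (IP addresses taken from the test above):
#   ES_PASSWORD=... check_es_nodes 192.168.7.248 192.168.7.249
```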


**II. Configure repmgr.conf to support bmj connections**

===As shown below, the sys_monitor.sh script communicates through es_server and es_client when bmj is on, so repmgr.conf must be modified to enable bmj communication.===
![](https://img2020.cnblogs.com/blog/2420370/202112/2420370-20211230171836952-227728149.png)

**1. Configure repmgr.conf (all nodes):**

[kingbase@node3 bin]$ cat ../etc/repmgr.conf

# Enable bmj

on_bmj=on
node_id=3
node_name=node243
......
ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22'


As shown below, when on_bmj=on is configured, the cluster nodes use es_server instead of ssh to communicate with each other:
![](https://img2020.cnblogs.com/blog/2420370/202112/2420370-20211230171937251-562089674.png)
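
Because the setting has to be present on every node, a quick check of each repmgr.conf can help before restarting the cluster. The following is a minimal sketch; `bmj_enabled` is a hypothetical name, and the tolerance for spaces and quotes around the value is a choice of this sketch rather than a statement about how sys_monitor.sh parses the file.

```shell
# Hypothetical helper: succeed if the given repmgr.conf contains an
# uncommented on_bmj=on line (spaces and quotes around the value tolerated).
bmj_enabled() {
    grep -Eq "^[[:space:]]*on_bmj[[:space:]]*=[[:space:]]*['\"]?on['\"]?[[:space:]]*(#.*)?\$" "$1"
}

# Usage, run from the bin directory as in the example above:
#   bmj_enabled ../etc/repmgr.conf && echo "bmj enabled" || echo "bmj NOT enabled"
```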

**III. Test cluster startup with sys_monitor.sh**

[kingbase@node3 bin]$ ./sys_monitor.sh restart
.......
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node248 | standby | running | node243 | running | 3589 | no | 0 second(s) ago
2 | node249 | witness | * running | node243 | running | 23739 | no | 0 second(s) ago
3 | node243 | primary | * running | | running | 30496 | no | n/a
sh: /etc/cron.d/KINGBASECRON: Permission denied
sh: /etc/logrotate.d/kingbase: Permission denied
chown: changing ownership of ‘/etc/logrotate.d/kingbase’: Operation not permitted
chmod: changing permissions of ‘/etc/logrotate.d/kingbase’: Operation not permitted
sh: /etc/cron.d/KINGBASECRON: Permission denied
sh: /etc/logrotate.d/kingbase: Permission denied
chown: changing ownership of ‘/etc/logrotate.d/kingbase’: Operation not permitted
chmod: changing permissions of ‘/etc/logrotate.d/kingbase’: Operation not permitted
sh: /etc/cron.d/KINGBASECRON: Permission denied
sh: /etc/logrotate.d/kingbase: Permission denied
chown: changing ownership of ‘/etc/logrotate.d/kingbase’: Operation not permitted
chmod: changing permissions of ‘/etc/logrotate.d/kingbase’: Operation not permitted
2021-03-01 15:26:44 Done.


As shown below, the sys_monitor.sh script hits permission errors at startup when it accesses "/etc/cron.d/KINGBASECRON" and "/etc/logrotate.d/kingbase":
![](https://img2020.cnblogs.com/blog/2420370/202112/2420370-20211230172006207-938144038.png)

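**Tips:**

1) /etc/cron.d/KINGBASECRON is the scheduled task created when the repmgr cluster starts; it is used to start the repmgrd process.
2) /etc/logrotate.d/kingbase is the configuration file used to rotate the hamgr.log and kbha.log logs.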


Configuration related to /etc/cron.d/KINGBASECRON in the sys_monitor.sh script:
![](https://img2020.cnblogs.com/blog/2420370/202112/2420370-20211230172046242-1233653960.png)

Configuration related to /etc/logrotate.d/kingbase in the sys_monitor.sh script:
![](https://img2020.cnblogs.com/blog/2420370/202112/2420370-20211230172125119-521090712.png)

1) Change the permissions on the /etc/cron.d/KINGBASECRON file (as shown below) (all nodes)
![](https://img2020.cnblogs.com/blog/2420370/202112/2420370-20211230172146695-276671904.png)

2) Change the permissions on /etc/logrotate.d/kingbase (all nodes)
![](https://img2020.cnblogs.com/blog/2420370/202112/2420370-20211230172210804-1058364912.png)

Change the owner of the kingbase file (all nodes):
![](https://img2020.cnblogs.com/blog/2420370/202112/2420370-20211230172232718-2033934551.png)
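
The exact commands appear only in the screenshots above, so the sketch below does not try to reproduce them. It only verifies the intended outcome, namely that the user who runs sys_monitor.sh can write to both files. `check_writable` is a hypothetical helper name.

```shell
# Hypothetical helper: print, for each path, whether the current user can
# write to it; return non-zero if any path is missing or not writable.
check_writable() {
    rc=0
    for path in "$@"; do
        if [ -w "$path" ]; then
            echo "writable      $path"
        else
            echo "NOT writable  $path"
            rc=1
        fi
    done
    return "$rc"
}

# Usage as the kingbase user, on every node:
#   check_writable /etc/cron.d/KINGBASECRON /etc/logrotate.d/kingbase
```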

Comment out the statements in the sys_monitor.sh script that change the owner and permissions of the kingbase configuration file:

function init_log_rotate()
{
    _host="$1"
    _final_target_file="/etc/logrotate.d/kingbase"
    eval _rep_log_file=$(grep log_file ${rep_conf} | awk -F '=' '{print $2}')
    execute_command ${super_user} $host "
    echo -e '# Generate by sys_monitor.sh at $(date)\n
    ${kbha_file} {\n
    weekly\n
    maxsize 100M\n
    su ${execute_user} ${execute_user}\n
    create 0600 ${execute_user} ${execute_user}\n
    rotate 3\n
    copytruncate\n
    dateext\n
    }\n
    ${_rep_log_file} {\n
    weekly\n
    maxsize 100M\n
    su ${execute_user} ${execute_user}\n
    create 0600 ${execute_user} ${execute_user}\n
    rotate 3\n
    copytruncate\n
    dateext\n
    }\n
    ' > ${_final_target_file}"

    # Commented out: without root access these calls fail with "Operation not permitted"
    # execute_command ${super_user} $host "chown ${super_user}:${super_user} ${_final_target_file}"

    # execute_command ${super_user} $host "chmod 644 ${_final_target_file}"
}


As shown below:
![](https://img2020.cnblogs.com/blog/2420370/202112/2420370-20211230172314121-1358970462.png)
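
If the same change has to be made on several nodes, it can be scripted instead of edited by hand. The sketch below is an assumption-laden illustration: `comment_out_logrotate_perms` is a hypothetical name, and the pattern assumes the chown/chmod calls look like the ones quoted above (an `execute_command` line that mentions `_final_target_file`). Other functions in the real script may match the same pattern, so the diff should be reviewed before the modified script is used.

```shell
# Hypothetical helper: comment out the execute_command lines that
# chown/chmod ${_final_target_file} in the script given as $1.
# The original is kept next to it with a .bak suffix.
comment_out_logrotate_perms() {
    script="$1"
    # Refuse to overwrite an earlier backup.
    [ -e "$script.bak" ] && { echo "backup $script.bak already exists" >&2; return 1; }
    cp -p "$script" "$script.bak" || return 1
    # Redirecting into the existing file keeps its owner and mode.
    sed -E '/^[[:space:]]*execute_command .*"(chown|chmod) .*_final_target_file/ s/^/#/' \
        "$script.bak" > "$script"
}

# Usage from the bin directory, followed by a review of the change:
#   comment_out_logrotate_perms ./sys_monitor.sh
#   diff ./sys_monitor.sh.bak ./sys_monitor.sh
```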

**IV. Test cluster startup**

[kingbase@node3 bin]$ ./sys_monitor.sh restart
2021-03-01 15:52:08 Ready to stop all DB ...
......
[2021-03-01 14:50:47] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log"
2021-03-01 15:52:53 repmgrd on "[192.168.7.249]" start success.
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node248 | standby | running | node243 | running | 13909 | no | 0 second(s) ago
2 | node249 | witness | * running | node243 | running | 28830 | no | n/a
3 | node243 | primary | * running | | running | 6643 | no | n/a
2021-03-01 15:52:53 Done.


As shown below, the cluster starts normally:
![](https://img2020.cnblogs.com/blog/2420370/202112/2420370-20211230172528866-1895035951.png)
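
The node table printed at the end of the restart can also be checked mechanically, for example in a post-restart script. This is a minimal sketch that assumes the table layout shown above (columns separated by `|`, Status in the fourth column, an optional leading `*`). `all_nodes_running` is a hypothetical name.

```shell
# Hypothetical helper: read the node table printed by sys_monitor.sh on
# stdin and fail if any node row has a Status other than "running".
all_nodes_running() {
    awk -F'|' '
        NF >= 9 && $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
            rows++
            status = $4
            gsub(/[* \t]/, "", status)      # "* running" becomes "running"
            if (status != "running") { print "not running:" $2; bad = 1 }
        }
        END { exit (rows > 0 && !bad) ? 0 : 1 }
    '
}

# Usage:
#   ./sys_monitor.sh restart | all_nodes_running && echo "cluster is up"
```

The helper also fails when no node rows are found, so a truncated or empty output is not mistaken for a healthy cluster.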

**Appendix: troubleshooting /etc/logrotate.d/kingbase permissions**

As shown below, starting the cluster with the sys_monitor.sh script produced the following errors:
![](https://img2020.cnblogs.com/blog/2420370/202112/2420370-20211230172607064-1659072055.png)

**Attempted fix:**

[root@node3 ~]# which chmod
/usr/bin/chmod
[root@node3 ~]# which chown
/usr/bin/chown

[root@node3 ~]# ls -lh /usr/bin/chown
-rwxr-xr-x. 1 root root 62K Nov 20 2015 /usr/bin/chown
[root@node3 ~]# ls -lh /usr/bin/chmod
-rwxr-xr-x. 1 root root 58K Nov 20 2015 /usr/bin/chmod

[root@node3 ~]# chmod u+s /usr/bin/chown
[root@node3 ~]# chmod u+s /usr/bin/chmod

[root@node3 ~]# ls -lh /usr/bin/chmod
-rwsr-xr-x. 1 root root 58K Nov 20 2015 /usr/bin/chmod
[root@node3 ~]# ls -lh /usr/bin/chown
-rwsr-xr-x. 1 root root 62K Nov 20 2015 /usr/bin/chown

[root@node3 ~]# ls -lh /etc/logrotate.d/kingbase
-rw-r--r--. 1 kingbase kingbase 492 Mar 1 15:52 /etc/logrotate.d/kingbase

[root@node3 ~]# su - kingbase
Last login: Mon Mar 1 15:51:39 CST 2021 on pts/1
Last failed login: Mon Mar 1 15:58:21 CST 2021 from :0 on :0
There was 1 failed login attempt since the last successful login.
[kingbase@node3 ~]$ chown root.root /etc/logrotate.d/kingbase
[kingbase@node3 ~]$ ls -lh /etc/logrotate.d/kingbase
-rw-r--r--. 1 root root 492 Mar 1 15:52 /etc/logrotate.d/kingbase
[kingbase@node3 ~]$ chown kingbase.kingbase /etc/logrotate.d/kingbase
[kingbase@node3 ~]$ ls -lh /etc/logrotate.d/kingbase
-rw-r--r--. 1 kingbase kingbase 492 Mar 1 15:52 /etc/logrotate.d/kingbase

Manually run "sh /etc/logrotate.d/kingbase":

[kingbase@node3 bin]$ sh /etc/logrotate.d/kingbase
/etc/logrotate.d/kingbase: line 2: /home/kingbase/cluster/R6HA/KHA/kingbase/bin/../kbha.log: Permission denied
/etc/logrotate.d/kingbase: line 3: weekly: command not found
/etc/logrotate.d/kingbase: line 4: maxsize: command not found

[kingbase@node3 kingbase]$ chmod u+x kbha.log
[kingbase@node3 kingbase]$ sh /etc/logrotate.d/kingbase
/etc/logrotate.d/kingbase: line 2: /home/kingbase/cluster/R6HA/KHA/kingbase/bin/../kbha.log: Text file busy
/etc/logrotate.d/kingbase: line 3: weekly: command not found
/etc/logrotate.d/kingbase: line 4: maxsize: command not found
Password:


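===After the steps above, starting the cluster with the sys_monitor.sh script still produced the "sh /etc/logrotate.d/kingbase" error; the problem was resolved only after the sys_monitor.sh script was modified.===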
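
Since setting the setuid bit on chown and chmod did not resolve the error, there is no reason in this case to leave it in place. With the bit set, any local user can change the ownership of any file, as the chown test above demonstrates. A minimal sketch for checking which binaries still carry the bit follows; `list_setuid` is a hypothetical name.

```shell
# Hypothetical helper: print those of the given files that still carry
# the setuid bit, so it can be removed as root with `chmod u-s <file>`.
list_setuid() {
    for f in "$@"; do
        [ -u "$f" ] && echo "$f"
    done
    return 0
}

# Usage as root:
#   list_setuid /usr/bin/chmod /usr/bin/chown
#   chmod u-s /usr/bin/chmod /usr/bin/chown
```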
posted @ 2022-01-07 11:36 KINGBASE研究院