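MariaDB Galera: troubleshooting a new node that cannot join the cluster
This MariaDB database had been in service for some time. Recently, for high availability, it had to be converted into a MariaDB Galera cluster.
Setup: three nodes. node1 is the existing database node; node2 and node3 are new.
OS: CentOS Stream release 8
MariaDB version: mariadb-server-10.3.28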
1. Database installation
Install the database packages on the new nodes:

    yum install mariadb mariadb-server python3-PyMySQL -y

Install the Galera packages on all nodes:

    yum install galera mariadb-server-galera -y

Initialize the database on node2 and node3:

    systemctl enable mariadb.service
    systemctl start mariadb.service
    mysql_secure_installation
    # Set the root password, then answer the prompts:
    Remove anonymous users? [Y/n] y
    Disallow root login remotely? [Y/n] n
    Remove test database and access to it? [Y/n] y
    Reload privilege tables now? [Y/n] y
2. Galera configuration
The configuration for node1 is shown below. On the other nodes, change the wsrep_node_name and wsrep_node_address parameters.

    # cat /etc/my.cnf.d/galera.cnf
    [mysqld]
    bind-address=0.0.0.0
    binlog_format=ROW
    default-storage-engine=innodb
    innodb_autoinc_lock_mode=2
    innodb_buffer_pool_size=122M
    wsrep_auto_increment_control=1
    wsrep_causal_reads=0
    wsrep_certify_nonPK=1
    wsrep_cluster_name="my_wsrep_cluster"
    wsrep_node_name=node1
    wsrep_node_address=192.168.1.1
    wsrep_cluster_address="gcomm://192.168.1.1,192.168.1.2,192.168.1.3"
    wsrep_convert_LOCK_to_trx=0
    wsrep_debug=0
    wsrep_drupal_282555_workaround=0
    wsrep_max_ws_rows=0
    wsrep_max_ws_size=2147483647
    wsrep_notify_cmd=
    wsrep_on=ON
    wsrep_provider=/usr/lib64/galera/libgalera_smm.so
    wsrep_provider_options="gcache.size=300M; gcache.page_size=300M"
    wsrep_retry_autocommit=1
    wsrep_slave_threads=1
    wsrep_sst_method=rsync
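Because only the node name and node address differ between nodes, the other nodes' files can be derived from node1's. The following is a minimal sketch, assuming the file contains one wsrep_node_name line and one wsrep_node_address line; the function name `render_galera_conf` and the node2 values in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a galera.cnf on stdin and print a copy with the
# two per-node settings replaced. Nothing is written to disk.
render_galera_conf() {
    node_name=$1
    node_addr=$2
    sed -e "s|^wsrep_node_name=.*|wsrep_node_name=${node_name}|" \
        -e "s|^wsrep_node_address=.*|wsrep_node_address=${node_addr}|"
}

# Example (run on node1, then copy the result to node2; values are assumed):
#   render_galera_conf node2 192.168.1.2 < /etc/my.cnf.d/galera.cnf > galera.cnf.node2
```

Printing to stdout rather than editing in place makes it easy to review the result before copying it to the target node.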
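3. Starting the services
On node1, run the command below. If the mariadb service is already running, stop it first.

    galera_new_cluster

On node2 and node3, run:

    systemctl restart mariadb.service

Under normal circumstances, that should be all that is needed.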
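One way to confirm that the nodes have joined is to read the wsrep_cluster_size status variable on any node. This is a minimal sketch, assuming the mysql client is available and root can log in locally; the helper name `cluster_size` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the value of wsrep_cluster_size from the
# tab-separated output produced by the mysql client in batch mode.
cluster_size() {
    awk -F '\t' '$1 == "wsrep_cluster_size" { print $2 }'
}

# Example, on any node (prompts for the root password):
#   mysql -u root -p -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'" | cluster_size
# With all three nodes joined, the printed value should be 3.
```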
4. The problem
After completing the steps above, I found that node2 and node3 would not start:

    # systemctl restart mariadb
    Job for mariadb.service failed because a fatal signal was delivered to the control process.
    See "systemctl status mariadb.service" and "journalctl -xe" for details.

Reinstalling, adjusting the configuration: nothing I tried made any difference.
Errors in the log:

    # grep -Ei "err|war" /var/log/mariadb/mariadb.log
    WSREP_SST: [ERROR] Parent mysqld process (PID:379774) terminated unexpectedly. (20221009 11:01:54.254)
    2022-10-09 11:01:59 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
    2022-10-09 11:02:00 1 [Warning] WSREP: Gap in state sequence. Need state transfer.
    2022-10-09 11:02:00 1 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (ab886bd7-46d6-11ed-8a83-fe4004c311ab): 1 (Operation not permitted)
    2022-10-09 11:02:01 0 [Warning] WSREP: 0.0 (node-1): State transfer to 1.0 (node-2) failed: -255 (Unknown error 255)
    2022-10-09 11:02:01 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():780: Will never receive state. Need to abort.
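Searching Google and Baidu turned up nothing that solved it. What I tried:
a. Deleting the galera.cache, grastate.dat, and gvwstate.dat files (no effect). I even removed all Galera-related configuration and files and recreated or reinstalled them, to no avail.
b. Changing TimeoutSec in mariadb.service (no effect).
c. Changing the order of the addresses in wsrep_cluster_address (no effect). This never looked promising, but I was running out of ideas.
d. Firewall, SELinux, and so on (no effect).
I also tried a few odder suggestions, none of which helped.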
Eventually I happened to notice an rsync error in /var/log/messages:

    rsyncd[380389]: rsyncd version 3.1.3 starting, listening on port 4444
    rsyncd[380409]: connect from node1 (192.168.0.1)
    rsyncd[380409]: rsync to rsync_sst/ from node1 (192.168.0.1)
    rsyncd[380409]: rsync: on remote machine: --sparse-block=1024: unknown option
    rsyncd[380409]: rsync error: requested action not supported (code 4) at clientserver.c(971) [Receiver=3.1.3]
    rsyncd[380389]: sent 0 bytes received 0 bytes total size 0
    rsyncd[380605]: rsyncd version 3.1.3 starting, listening on port 4444
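Since the mariadb.log errors never mention this, it can help to scan the system log for options that the rsync daemon rejected. The sketch below assumes syslog lines shaped like the ones shown above; the function name `rejected_rsync_options` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and print each option that
# an rsync daemon reported as unknown, one per line.
rejected_rsync_options() {
    sed -n 's/.*rsyncd\[[0-9]*]: .*: \(--[^ :]*\): unknown option.*/\1/p'
}

# Example:
#   rejected_rsync_options < /var/log/messages
```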
Together with the rsync-related entries in mariadb.log:

    2022-10-09 3:27:39 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
    2022-10-09 3:27:39 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.0.2' --datadir '/var/lib/mysql/' --parent '3693645' --mysqld-args --basedir=/usr'
    2022-10-09 3:27:40 2 [Note] WSREP: Prepared SST request: rsync|192.168.0.2:4444/rsync_sst
    2022-10-09 3:27:40 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    2022-10-09 3:27:40 2 [Note] WSREP: Assign initial position for certification: 237433, protocol version: 4
    2022-10-09 3:27:40 0 [Note] WSREP: Service thread queue flushed.
    2022-10-09 3:27:40 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (ab886bd7-46d6-11ed-8a83-fe4004c311ab): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():467. IST will be unavailable.
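I suspected rsync itself: the installed build was probably too old to recognize the --sparse-block=1024 option, so the state transfer failed and mariadb could not start.
So I updated rsync:

    # yum update rsync

Then I started mariadb again:

    # systemctl restart mariadb

It started successfully, much to my relief.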
Old version: rsync-3.1.3-14.el8.2.x86_64
New version: rsync-3.1.3-19.el8.x86_64
Both packages report rsync 3.1.3, but only the newer package release lists --sparse-block in its help output:

    # rpm -qa |grep rsync
    rsync-3.1.3-14.el8.2.x86_64
    # rsync --help |grep sparse
     -S, --sparse                turn sequences of nulls into sparse blocks

    # rpm -qa |grep rsync
    rsync-3.1.3-19.el8.x86_64
    # rsync --help |grep sparse
     -S, --sparse                turn sequences of nulls into sparse blocks
         --sparse-block=SIZE     set block size used to handle sparse files
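The same help-output check can be wrapped up and run on every node before it joins the cluster. This is a minimal sketch rather than an official pre-flight test; the function name `has_sparse_block` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `rsync --help` output on stdin and succeed only
# if the --sparse-block option is listed.
has_sparse_block() {
    grep -q -e '--sparse-block'
}

# Example, to run on each node:
#   if rsync --help | has_sparse_block; then
#       echo "rsync supports --sparse-block"
#   else
#       echo "rsync lacks --sparse-block; consider: yum update rsync"
#   fi
```

Checking for the option itself, rather than comparing version strings, matters here because both packages report the same rsync version.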