作者:@张扶摇
本文为作者原创,转载请注明出处:https://www.cnblogs.com/zhangshengdong/p/12171963.html
目录
一、简介
1.1MHA角色
二、MHA的工具
三、MHA部署过程
3.1.1 配置
3.1.2 环境规划
3.1.3 配置一主多从
3.2 MHA配置
3.2.1 master权限授予
3.2.2 ssh互信
3.2.3 安装mha包
3.2.4 MHA管理节点配置
3.2.5 MHA节点检测
3.2.6 MHA启动
3.2.7 MHA模拟故障
3.2.8 修复已损坏的节点加入MHA中
MHA 问题集锦
一、简介
MHA的逻辑是,为了保证其MySQL的高可用,会有一个StandBy状态的master.在mysql故障切换的过程中,MHA 能做到在 0~30 秒内自动完成数据库的故障切换操作,并且在进行故障切换的过程中,MHA 能最大程度的保证数据的一致性,以达到相对意义上的高可用。
1.1MHA角色
如下图,整个 MHA 架构分为
- MHA Manager 节点
- MHA Node 节点
其中 MHA Manager 节点是单点部署,MHA Node 节点是部署在每个需要监控的 MySQL 集群节点上的。MHA Manager 会定时探测集群中的 Master 节点,当 Master 出现故障时,它可以自动将最新数据的 Standby Master 或 Slave 提升为新的 Master,然后将其他的 Slave 重新指向新的 Master。
二、MHA的工具
Manager节点:
masterha_check_ssh
:MHA 依赖的 ssh 环境监测工具;masterha_check_repl
:MYSQL 复制环境检测工具;masterga_manager
:MHA 服务主程序;masterha_check_status
:MHA 运行状态探测工具;masterha_master_monitor
:MYSQL master 节点可用性监测工具;masterha_master_swith:master
:节点切换工具;masterha_conf_host
:添加或删除配置的节点;masterha_stop
:关闭 MHA 服务的工具。
Node节点:(这些工具通常由MHA Manager的脚本触发,无需人为操作)
save_binary_logs
:保存和复制 master 的二进制日志;apply_diff_relay_logs
:识别差异的中继日志事件并应用于其他 slave;purge_relay_logs
:清除中继日志(不会阻塞 SQL 线程);
自定义扩展:
secondary_check_script
:通过多条网络路由检测master的可用性;master_ip_failover_script
:更新application使用的masterip;report_script
:发送报告;init_conf_load_script
:加载初始配置参数;master_ip_online_change_script
;更新master节点ip地址。
三、MHA部署过程
3.1.1 配置
MHA 对 MYSQL 复制环境有特殊要求,例如各节点都要开启二进制日志及中继日志,各从节点必须显示启用其read-only
属性,并关闭relay_log_purge
功能等,这里对配置做事先说明。
3.1.2 环境规划
机器名 | IP | 角色 | 备注 |
---|---|---|---|
manager | 172.30.200.100 | manager控制器 | 用于管理和故障切换 |
master | 172.30.200.101 | 数据库主服务器 | 开启binlog,relay-log。关闭relay_log_purge |
slave1 | 172.30.200.102 | 数据库从服务器 | 开启binlog,relay-log。关闭relay_log_purge |
slave2 | 172.30.200.103 | 数据库从服务器 | 开启binlog,relay-log。关闭relay_log_purge |
在各个节点的/etc/hosts文件配置内容添加如下:
[root@localhost ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.30.200.100 arpmgr
172.30.200.101 arpmaster
172.30.200.102 arpslave1
172.30.200.103 arpslave2
创建binlog的目录
mkdir -p /data/mysqldata/binlog
chown -R mysql:mysql /data/mysqldata/binlog
101节点配置:
server-id = 200101
log-bin = /data/mysqldata/binlog/mysql-bin
binlog_format= row
max_binlog_size= 512m
relay-log = /data/mysqldata/binlog/relay-bin
expire-logs-days = 14
lower_case_table_names = 1
character-set-server = utf8
log_slave_updates = 1
102节点配置:
server-id = 200102
log-bin = /data/mysqldata/binlog/mysql-bin
binlog_format= row
max_binlog_size= 512m
relay-log = /data/mysqldata/binlog/relay-bin
expire-logs-days = 14
read_only = ON
relay_log_purge = 0
lower_case_table_names = 1
character-set-server = utf8
log_slave_updates = 1
103节点配置:
server-id = 200103
log-bin = /data/mysqldata/binlog/mysql-bin
binlog_format= row
max_binlog_size= 512m
relay-log = /data/mysqldata/binlog/relay-bin
read_only = ON
relay_log_purge = 0
expire-logs-days = 14
lower_case_table_names = 1
character-set-server = utf8
log_slave_updates = 1
3.1.3 配置一主多从
master节点配置:
MariaDB [(none)]>grant replication slave,replication client on *.* to 'repl'@'172.30.200.%' identified by 'repl7101';
MariaDB [(none)]> show master status;
+------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 | 548 | | |
+------------------+----------+--------------+------------------+
1 row in set (0.000 sec)
slave节点配置:
grant replication slave,replication client on *.* to 'repl'@'172.30.200.%' identified by 'repl7101';
change master to master_host='172.30.200.101',
master_user='repl',
master_password='repl7101',
master_log_file='mysql-bin.000001',
master_log_pos=548;
start slave;
show slave status\G;
至此,一主多从配置完毕。
3.2 MHA配置
3.2.1 master权限授予
可以在所有节点上面配置,其拥有管理权限,目前只需在master结点上设置权限:
grant all on *.* to 'mhaadmin'@'172.30.%.%' identified by 'mha7101';
grant all on *.* to 'mhaadmin'@'arpmgr' identified by 'mha7101';
3.2.2 ssh互信
四个节点都执行如下语句:
ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub root@arpmgr
然后在arpmgr
结点上面,可以看到authorized_keys
文件的信息内容如下:
[root@localhost .ssh]# cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDY3yFhR5uzcFEpd+q+1Uw/cRF9ZRraygms7OZoefLFzY/ydSi6yYuCighG8WquvRep7XDNjFI71HAUagSoXiyPoCe1lqEnzpxSc+fQpIeQqEhUmLJ2bk+R83EskzwRGh+S/D4yp/swWz1vRgUGoTWevLCs33q7ZrsM8i+jB0uwZmzOV+CyQAPW9vLkRjZa4y1sx65lbR0HbdTQWQYZ4IyZauoU8XQjAIOs/CdLw2nBt8dPO53jT7NS7Ywx6eu/Wj9k/sYVVZT3jTb+pBIVs+Du5+tdUDX5aLKzxINpLlqNhorNevoC9iE0Ame1qvYonQfyWQ52Ae0y+58vFfG6PyV3 root@localhost.localdomain
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/ZPihYSC6ArawKRU75aQRVSFsQ5S89SrYHGWdzyluB4spj+UDUmWH1kLGYr715/HD5hh22KdLmIs7R4jviOeao1HK52fpMvklYaNtYRHuV63Zkg5sOLvLfhrHdta9wuHlW1NyWx75+wIl2LvKBRtnSddwf5ZvitJ/kChf2gpNhHAWidyjGsPoJdr0OBCNHvz1y6oON6cnMb07ExaIjptRnkbCOU0QSVjFq4+Jmh8zTTbJC2up50s15gSfWXH0+WLXmJXJGkvgHdSYqw4vJt/l25f5qAKKZsfnyfC0iyct4GyHPF6trpvQ/c2lqr/Rg4xLWgdxlyt4aBJYl5adIRK/ root@localhost.localdomain
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDba26wV0KwQNTb4pKuiFDCcVMNRLGMXSiJC8ucN4/KIqzoOYJ747QL8GL5F8ePnRaZ1rtOwdjnlTiC0a4Tcg4JLs+JSnJgzvepuixmGgSJfLbJ36iN1WFh6fP2GZEDdR7Qum4sBUpQyYJ20Kf9rKfQQv2wq6csK5IlFk/OoO+zTySauLnYvRxvKY2avVDXPPFJvpqimKXn59MIAoJr6YEKvncbYyqvrSgUy7klZDys9IIjYcWfO7VKjQ5bwbHrrKtNbedME+KPQld7e8ZVL66Omik4Z6ip7DQEHRKWMmuBIpL99AgOOjPLbzJFWLUPOwvy3DtmEBnZ+0NVf/1obC11 root@localhost.localdomain
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCwrgZtGC31EgixeY4SVl4h64m1r8LdL3hM4Be2/I+6Xw7hCzZyKKTAFgz9W/ukfx6WmZwoqp1VO/7Jp6KO1FhYOi5u0q6J1KIObFNp+3E6cB2P0q39WqmZpQ9cNPYrbs9U2Ej0L0JwUtf/xLh334PaSlv/LcNy+p1dWya2OqsBeraiXZ4MgEBzcb+0twkpfpD327VgT/mRHPmA6fPRJOOJti1u4isHeotE4i13YIqQYfBfmbfiLdXKAvgI8FuTf0i91Re/FUBOgBfBcJbqIQNR0Nh5wZ/LvNxkstDQvypZIZwiK+wN+aZZOQ7jF/+997Z9QQleC9OOoHOJR7+fisLb root@localhost.localdomain
正好有四个形如ssh-rsa
相关 的密钥信息。
把如上的公钥信息,拷贝到其余四台服务器上面:
scp authorized_keys root@arpmaster:~/.ssh/
scp authorized_keys root@arpslave1:~/.ssh/
scp authorized_keys root@arpslave2:~/.ssh/
测试ssh是否可用
[root@localhost .ssh]# ssh arpmaster
[root@localhost ~]# ssh arpslave1
[root@localhost ~]# ssh arpslave2
[root@localhost ~]# ssh arpmgr
3.2.3 安装mha包
mha安装包分为两个,一个是node
,另外一个是manager
。
四个节点安装:mha4mysql-node-0.57-0.el7.centos.noarch.rpm
管理节点安装:mha4mysql-manager-0.57-0.el7.centos.noarch.rpm
在安装``mha4mysql-node-0.57-0.el7.centos.noarch.rpm过程中,有对
perl-DBD-mysql,
perl-DBI`前置依赖,安装步骤如下:
yum install perl-DBD-mysql perl-DBI
在安装``mha4mysql-manager-0.57-0.el7.centos.noarch.rpm过程中,有对
perl`前置依赖,安装步骤如下:
安装yum 扩展包
yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker perl-Config-Tiny perl-Log-Dispatch-* perl-Parallel-ForkManager
然后安装信息,都成功,如下:
[root@localhost ~]# rpm -ivh mha4mysql-node-0.57-0.el7.centos.noarch.rpm
准备中... ################################# [100%]
正在升级/安装...
1:mha4mysql-node-0.58-0.el7.centos ################################# [100%]
[root@localhost ~]# rpm -ivh mha4mysql-manager-0.57-0.el7.centos.noarch.rpm
准备中... ################################# [100%]
正在升级/安装...
1:mha4mysql-manager-0.58-0.el7.cent################################# [100%]
0.58中有一个super_read_only不可用在mariadb,所以使用0.57版本。
3.2.4 MHA管理节点配置
[root@localhost ~]# cd /etc/mha_master/
[root@localhost mha_master]# vi /etc/mha_master/mha.cnf
配置文件内容如下:
[server default]
user=mhaadmin
password=mha7101
manager_workdir=/etc/mha_master/app1
manager_log=/etc/mha_master/manager.log
remote_workdir=/data/mha_master/app1
repl_user=repl
repl_password=repl7101
ping_interval=1
[server1]
hostname=172.30.200.101
ssh_port=22
[server2]
hostname=172.30.200.102
ssh_port=22
candidate_master=1
[server3]
hostname=172.30.200.103
ssh_port=22
no_master=1
3.2.5 MHA节点检测
-
在管理节点检测ssh连通性如下:
[root@localhost ~]# masterha_check_ssh -conf=/etc/mha_master/mha.cnf
有如下日志,代表正常:
Thu Jan 9 14:43:09 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Thu Jan 9 14:43:09 2020 - [info] Reading application default configuration from /etc/mha_master/mha.cnf.. root@172.30.200.103(172.30.200.103:22) to root@172.30.200.101(172.30.200.101:22).. Thu Jan 9 14:43:11 2020 - [debug] ok. Thu Jan 9 14:43:11 2020 - [debug] Connecting via SSH from root@172.30.200.103(172.30.200.103:22) to root@172.30.200.102(172.30.200.102:22).. Thu Jan 9 14:43:11 2020 - [debug] ok. Thu Jan 9 14:43:12 2020 - [info] All SSH connection tests passed successfully.
-
检测MySQL replication是否正常
masterha_check_repl --conf=/etc/mha_master/mha.cnf
有如下日志,说明正常:
Thu Jan 9 14:44:54 2020 - [info] Slaves settings check done. Thu Jan 9 14:44:54 2020 - [info] 172.30.200.101(172.30.200.101:3306) (current master) +--172.30.200.102(172.30.200.102:3306) +--172.30.200.103(172.30.200.103:3306) Thu Jan 9 14:44:54 2020 - [info] Checking replication health on 172.30.200.102.. Thu Jan 9 14:44:54 2020 - [info] ok. Thu Jan 9 14:44:54 2020 - [info] Checking replication health on 172.30.200.103.. Thu Jan 9 14:44:54 2020 - [info] ok. Thu Jan 9 14:44:54 2020 - [warning] master_ip_failover_script is not defined. Thu Jan 9 14:44:54 2020 - [warning] shutdown_script is not defined. Thu Jan 9 14:44:54 2020 - [info] Got exit code 0 (Not master dead). MySQL Replication Health is OK.
3.2.6 MHA启动
-
启动mha manager:
nohup masterha_manager --conf=/etc/mha_master/mha.cnf &> /etc/mha_master/manager.log &
-
检测master节点状态:
[root@localhost ~]# masterha_check_status --conf=/etc/mha_master/mha.cnf mha (pid:31709) is running(0:PING_OK), master:172.30.200.101
说明主数据库
172.30.200.101
启动正常。 -
关闭mha manager:
masterha_stop -conf=/etc/mha_master/mha.cnf
3.2.7 MHA模拟故障
-
master
直接kill mysql节点[root@localhost ~]# ps -ef |grep mysql root 19864 1 0 08:51 ? 00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --datadir=/data/mysqldata --pid-file=/data/mysqldata/localhost.localdomain.pid mysql 19976 19864 0 08:51 ? 00:00:13 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/data/mysqldata --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --log-error=/data/mysqldata/mysqld.log --pid-file=/data/mysqldata/localhost.localdomain.pid --socket=/tmp/mysql.sock root 22166 21525 0 14:55 pts/0 00:00:00 grep --color=auto mysql [root@localhost ~]# kill -9 19864 19976
-
MHA
转移日志。[root@localhost ~]# tail -f /etc/mha_master/manager.log From: 172.30.200.101(172.30.200.101:3306) (current master) +--172.30.200.102(172.30.200.102:3306) +--172.30.200.103(172.30.200.103:3306) To: 172.30.200.102(172.30.200.102:3306) (new master) +--172.30.200.103(172.30.200.103:3306) Master 172.30.200.101(172.30.200.101:3306) is down! Check MHA Manager logs at localhost.localdomain:/etc/mha_master/manager.log for details. Started automated(non-interactive) failover. The latest slave 172.30.200.102(172.30.200.102:3306) has all relay logs for recovery. Selected 172.30.200.102(172.30.200.102:3306) as a new master. 172.30.200.102(172.30.200.102:3306): OK: Applying all logs succeeded. 172.30.200.103(172.30.200.103:3306): This host has the latest relay log events. Generating relay diff files from the latest slave succeeded. 172.30.200.103(172.30.200.103:3306): OK: Applying all logs succeeded. Slave started, replicating from 172.30.200.102(172.30.200.102:3306) 172.30.200.102(172.30.200.102:3306): Resetting slave info succeeded. Master failover to 172.30.200.102(172.30.200.102:3306) completed successfully.
从上述日志来看,
172.30.200.102
已经成为了新的master,而172.30.200.103
还是slave
数据库。
3.2.8 修复已损坏的节点加入MHA中
由于这里是实验环境,可以不到处mysqldump的备份。如果是生产环境恢复,可以停掉slave的SQL thread,记住对应的pos的位置,然后备份出数据,保证数据一致性之后,同步数据,恢复损坏的结点。
change master to master_host='172.30.200.102',
master_user='repl',
master_password='repl7101',
master_log_file='mysql-bin.000003',
master_log_pos=401;
查看slave状态:
MariaDB [(none)]> start slave;
MariaDB [(none)]> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.30.200.102
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000003
Read_Master_Log_Pos: 401
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 555
Relay_Master_Log_File: mysql-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
再次启动,如下:
[root@localhost ~]# nohup masterha_manager --conf=/etc/mha_master/mha.cnf &> /etc/mha_master/manager.log &
[root@localhost ~]# masterha_check_status --conf=/etc/mha_master/mha.cnf
mha (pid:31905) is running(0:PING_OK), master:172.30.200.101
至此,MHA实验完毕。由于生产环境会用到VIP,后续会继续编写。
MHA 问题集锦
错误一
日志错误:
Thu Jan 9 11:31:36 2020 - [info] Connecting to root@172.30.200.102(172.30.200.102:22)..
Can't exec "mysqlbinlog": 没有那个文件或目录 at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 106.
mysqlbinlog version command failed with rc 1:0, please verify PATH, LD_LIBRARY_PATH, and client options
解决方法:
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
错误二
日志错误:
Checking if super_read_only is defined and turned on..DBI connect(';host=172.30.200.102;port=3306','mhaadmin',...) failed: Access denied for user 'mhaadmin'@'arpslave1' (using password: YES) at /usr/share/perl5/vendor_perl/MHA/SlaveUtil.pm line 239
解决方法:
manager节点,执行:
grant all on *.* to 'mhaadmin'@'arpmgr' identified by 'mha7101';
grant all on *.* to 'mhaadmin'@'arpmaster' identified by 'mha7101';
grant all on *.* to 'mhaadmin'@'arpslave1' identified by 'mha7101';
grant all on *.* to 'mhaadmin'@'arpslave2' identified by 'mha7101';
错误三
日志如下:
Testing mysql connection and privileges..sh: mysql: 未找到命令
mysql command failed with rc 127:0!
解决方法:
ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
感谢您的阅读,如果您觉得阅读本文对您有帮助,请点一下“推荐”按钮。本文欢迎各位转载,但是转载文章之后必须在文章页面中给出作者和原文连接。
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· .NET Core 中如何实现缓存的预热?
· 从 HTTP 原因短语缺失研究 HTTP/2 和 HTTP/3 的设计差异
· AI与.NET技术实操系列:向量存储与相似性搜索在 .NET 中的实现
· 基于Microsoft.Extensions.AI核心库实现RAG应用
· Linux系列:如何用heaptrack跟踪.NET程序的非托管内存泄露
· TypeScript + Deepseek 打造卜卦网站:技术与玄学的结合
· Manus的开源复刻OpenManus初探
· 三行代码完成国际化适配,妙~啊~
· .NET Core 中如何实现缓存的预热?
· 如何调用 DeepSeek 的自然语言处理 API 接口并集成到在线客服系统