Troubleshooting MHA 0.56: installation and usage
1. masterha_check_ssh --conf=/etc/mha/app1.cnf
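If this check fails, I'd say ninety percent of the time the cause is SSH connectivity between the nodes. Make absolutely sure that every node can log in to every other node with a key and no password!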
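To see which hop is failing before rerunning masterha_check_ssh, each SSH login can be tried non-interactively. The helper below is a hypothetical sketch (check_ssh_mesh is not part of MHA) that assumes OpenSSH and bash; it uses BatchMode=yes so that a missing key shows up as a failure instead of a password prompt. Run it on every node, since every node has to reach every other one. For any FAIL, generating a key with ssh-keygen and copying it with ssh-copy-id user@host is the usual fix.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MHA): check passwordless SSH from this node
# to each listed host. BatchMode=yes makes ssh fail instead of prompting.
check_ssh_mesh() {
  # usage: check_ssh_mesh <user> <host> [<host> ...]
  local user=$1 host rc=0
  shift
  for host in "$@"; do
    # -n: don't read stdin; "true" is a harmless remote command
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true; then
      echo "OK   $user@$host"
    else
      echo "FAIL $user@$host"
      rc=1
    fi
  done
  return "$rc"
}

# Example, run on every node (hosts taken from the logs in this post):
#   check_ssh_mesh root 103.75.1.22 103.75.1.23 103.75.1.24
```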
2. masterha_check_repl --conf=/etc/mha/app1.cnf
(1) Error:
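The error says, roughly, that the replication user (copyuser) has no privileges on one of the nodes. The fix is simply to create this user on every node. If master-slave replication is already running, remember to run stop slave; on the nodes first and then create the user on each of them.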
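A sketch of the per-node fix, assuming MySQL 5.7 (the version in the logs below) and placeholder values for the user name, host pattern and password. The helper repl_user_sql is hypothetical and only prints SQL. As an extra precaution beyond the steps above, it wraps the statements in SET SQL_LOG_BIN=0, so a user created locally on one node is not written to its binlog and replicated to the others, where it would collide with the copy created there by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SQL that creates the replication user on ONE node.
# SQL_LOG_BIN=0 keeps these statements out of this node's binlog, so they are not
# replicated to nodes where the same user is created by hand.
# Assumes MySQL 5.7.6+ (CREATE USER IF NOT EXISTS) and a password without quotes.
repl_user_sql() {
  # usage: repl_user_sql <user> <host-pattern> <password>
  local user=$1 host=$2 pass=$3
  cat <<EOF
SET SQL_LOG_BIN=0;
CREATE USER IF NOT EXISTS '${user}'@'${host}' IDENTIFIED BY '${pass}';
GRANT REPLICATION SLAVE ON *.* TO '${user}'@'${host}';
SET SQL_LOG_BIN=1;
EOF
}

# Example (placeholder values; repeat for every node):
#   for h in 103.75.1.22 103.75.1.23 103.75.1.24; do
#     repl_user_sql copyuser '103.75.1.%' 'ChangeMe' | mysql -h "$h" -u root -p
#   done
```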
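With MHA, I believe every database server should have binary logging and relay logging enabled, identical grants, and essentially identical configuration files, since any slave may be promoted to master. With that in place, I don't expect installing and running MHA to run into many problems, though I can't yet say for sure that this is the right approach.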
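To compare configuration files across nodes, a hypothetical check like the one below can confirm that each node's my.cnf at least sets server-id, log-bin and relay-log. It is only a bash sketch: it greps for uncommented option lines, treats - and _ in option names as equivalent (as MySQL does), and ignores which [section] the option sits in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a my.cnf sets the options MHA relies on.
# Only uncommented lines count; '-' and '_' are treated alike, as MySQL does.
# It does not check which [section] the option sits in.
check_mycnf_keys() {
  # usage: check_mycnf_keys <path-to-my.cnf>
  local cnf=$1 key pat missing=0
  for key in server-id log-bin relay-log; do
    # e.g. log-bin -> ^[[:space:]]*log[-_]bin[[:space:]]*(=|$)
    pat="^[[:space:]]*${key//-/[-_]}[[:space:]]*(=|\$)"
    if grep -Eq "$pat" "$cnf"; then
      echo "found   $key"
    else
      echo "missing $key"
      missing=1
    fi
  done
  return "$missing"
}

# Example (the path is an assumption; run on every MySQL node):
#   check_mycnf_keys /etc/my.cnf
```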
(2) Error:
Tue Apr 30 09:26:44 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Apr 30 09:26:44 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Tue Apr 30 09:26:44 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
Tue Apr 30 09:26:44 2019 - [info] MHA::MasterMonitor version 0.56.
Tue Apr 30 09:26:45 2019 - [info] GTID failover mode = 0
Tue Apr 30 09:26:45 2019 - [info] Dead Servers:
Tue Apr 30 09:26:45 2019 - [info] Alive Servers:
Tue Apr 30 09:26:45 2019 - [info] 103.75.1.22(103.75.1.22:3306)
Tue Apr 30 09:26:45 2019 - [info] 103.75.1.23(103.75.1.23:3306)
Tue Apr 30 09:26:45 2019 - [info] 103.75.1.24(103.75.1.24:3306)
Tue Apr 30 09:26:45 2019 - [info] Alive Slaves:
Tue Apr 30 09:26:45 2019 - [info] 103.75.1.23(103.75.1.23:3306) Version=5.7.25-log (oldest major version between slaves) log-bin:enabled
Tue Apr 30 09:26:45 2019 - [info] Replicating from 103.75.1.22(103.75.1.22:3306)
Tue Apr 30 09:26:45 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Apr 30 09:26:45 2019 - [info] 103.75.1.24(103.75.1.24:3306) Version=5.7.25-log (oldest major version between slaves) log-bin:enabled
Tue Apr 30 09:26:45 2019 - [info] Replicating from 103.75.1.22(103.75.1.22:3306)
Tue Apr 30 09:26:45 2019 - [info] Current Alive Master: 103.75.1.22(103.75.1.22:3306)
Tue Apr 30 09:26:45 2019 - [info] Checking slave configurations..
Tue Apr 30 09:26:45 2019 - [info] read_only=1 is not set on slave 103.75.1.24(103.75.1.24:3306).
Tue Apr 30 09:26:45 2019 - [info] Checking replication filtering settings..
Tue Apr 30 09:26:45 2019 - [info] binlog_do_db= , binlog_ignore_db=
Tue Apr 30 09:26:45 2019 - [info] Replication filtering check ok.
Tue Apr 30 09:26:45 2019 - [info] GTID (with auto-pos) is not supported
Tue Apr 30 09:26:45 2019 - [info] Starting SSH connection tests..
Tue Apr 30 09:26:53 2019 - [info] All SSH connection tests passed successfully.
Tue Apr 30 09:26:53 2019 - [info] Checking MHA Node version..
Tue Apr 30 09:26:57 2019 - [info] Version check ok.
Tue Apr 30 09:26:57 2019 - [info] Checking SSH publickey authentication settings on the current master..
Tue Apr 30 09:26:58 2019 - [info] HealthCheck: SSH to 103.75.1.22 is reachable.
Tue Apr 30 09:26:59 2019 - [info] Master MHA Node version is 0.56.
Tue Apr 30 09:26:59 2019 - [info] Checking recovery script configurations on 103.75.1.22(103.75.1.22:3306)..
Tue Apr 30 09:26:59 2019 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data --output_file=/data/mastermha/app1//save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000008
Tue Apr 30 09:26:59 2019 - [info] Connecting to root@103.75.1.22(103.75.1.22:22)..
Failed to save binary log: Binlog not found from /data! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again.
at /usr/bin/save_binary_logs line 123
eval {...} called at /usr/bin/save_binary_logs line 70
main::main() called at /usr/bin/save_binary_logs line 66
Tue Apr 30 09:27:00 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln158] Binlog setting check failed!
Tue Apr 30 09:27:00 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln405] Master configuration failed.
Tue Apr 30 09:27:00 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/bin/masterha_check_repl line 48
Tue Apr 30 09:27:00 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Tue Apr 30 09:27:00 2019 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
Solution:
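If you have set a custom path for the binary logs, you must specify master_binlog_dir=<directory containing the binary logs> in the MHA configuration file. In my case I simply commented out the line master_binlog_dir=/data in app1.cnf (#master_binlog_dir=/data). With it commented out, MHA falls back to its default search paths /var/lib/mysql,/var/log/mysql (visible in the successful run at the end), which is where this master's binlogs actually are.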
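To find the right value for master_binlog_dir, one option is to ask the master for @@log_bin_basename (available in MySQL 5.7) and take its directory part; and because a commented-out line is ignored, it also helps to see which value is actually active in app1.cnf. Both helpers below are hypothetical bash sketches that assume sed and dirname are available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around master_binlog_dir in the MHA config.

# Print the app1.cnf line for a given @@log_bin_basename value,
# e.g. /var/lib/mysql/master-bin -> master_binlog_dir=/var/lib/mysql
mha_binlog_dir_line() {
  printf 'master_binlog_dir=%s\n' "$(dirname "$1")"
}

# Print the master_binlog_dir value that is actually active (uncommented)
# in the given MHA config file; prints nothing if it is unset or commented out.
current_binlog_dir() {
  sed -n 's/^[[:space:]]*master_binlog_dir[[:space:]]*=[[:space:]]*//p' "$1"
}

# Example (assumes the mysql client can log in on the master):
#   base=$(mysql -u root -p -N -B -e 'SELECT @@log_bin_basename')
#   mha_binlog_dir_line "$base"
#   current_binlog_dir /etc/mha/app1.cnf
```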
(3) Error output:
Tue Apr 30 10:04:21 2019 - [info] Checking replication health on 103.75.1.23..
Tue Apr 30 10:04:21 2019 - [info] ok.
Tue Apr 30 10:04:21 2019 - [info] Checking replication health on 103.75.1.24..
Tue Apr 30 10:04:21 2019 - [info] ok.
Tue Apr 30 10:04:21 2019 - [warning] master_ip_failover_script is not defined.
Tue Apr 30 10:04:21 2019 - [warning] shutdown_script is not defined.
Tue Apr 30 10:04:21 2019 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
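These lines appear at the very end of the check and mean that neither of these two scripts is defined. Strictly speaking they are only warnings (the check still exits with code 0), but with them undefined, starting the manager simply hung for me. The fix is to add master_ip_failover_script=<path to the script> to app1.cnf.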
Script location: http://control.blog.sina.com.cn/admin/article/article_edit.php?blog_id=b4fca5310102yan0
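Because MHA only warns about undefined scripts, they are easy to overlook in a long log. A hypothetical helper like the one below lists which of the two parameters named in the warnings are not set (uncommented) in app1.cnf; it is a plain grep-based bash sketch, not an MHA tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the script parameters from the warnings above
# that are not set (uncommented) in an MHA config file.
mha_undefined_scripts() {
  # usage: mha_undefined_scripts <app1.cnf>
  local cnf=$1 param
  for param in master_ip_failover_script shutdown_script; do
    grep -Eq "^[[:space:]]*${param}[[:space:]]*=" "$cnf" || echo "$param"
  done
}

# Example:
#   mha_undefined_scripts /etc/mha/app1.cnf
# The fix described above adds a line such as (path as used later in this post):
#   master_ip_failover_script=/data/mastermha/app1/master_ip_failover
```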
(4) Error:
103.75.1.22(103.75.1.22:3306) (current master)
+--103.75.1.23(103.75.1.23:3306)
+--103.75.1.24(103.75.1.24:3306)
Tue Apr 30 10:44:55 2019 - [info] Checking replication health on 103.75.1.23..
Tue Apr 30 10:44:55 2019 - [info] ok.
Tue Apr 30 10:44:55 2019 - [info] Checking replication health on 103.75.1.24..
Tue Apr 30 10:44:55 2019 - [info] ok.
Tue Apr 30 10:44:55 2019 - [info] Checking master_ip_failover_script status:
Tue Apr 30 10:44:55 2019 - [info] /data/mastermha/app1/master_ip_failover --command=status --ssh_user=root --orig_master_host=103.75.1.22 --orig_master_ip=103.75.1.22 --orig_master_port=3306
Tue Apr 30 10:44:55 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. Can't exec "/data/mastermha/app1/master_ip_failover": Permission denied at /usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm line 68.
Tue Apr 30 10:44:55 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Tue Apr 30 10:44:55 2019 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
Tue Apr 30 10:44:55 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln226] Failed to get master_ip_failover_script status with return code 1:0.
Tue Apr 30 10:44:55 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/bin/masterha_check_repl line 48
Tue Apr 30 10:44:55 2019 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Tue Apr 30 10:44:55 2019 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
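I went through a lot of material on this one, convinced that my master_ip_failover script itself was broken. It wasn't: the script simply had no execute permission (note the "Permission denied" in the log).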
Solution: make it executable!
chmod +x /data/mastermha/app1/master_ip_failover
Run the check again and the problem is gone!
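Since masterha_check_repl runs these scripts on the manager host (the "Can't exec ... Permission denied" comes from ManagerUtil.pm), a pre-flight check of every *_script entry in app1.cnf can catch this earlier. This is a hypothetical bash sketch; it assumes one key=value per line and an unquoted path, and it only checks the first word of each value.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: every *_script=path entry in an MHA config
# must point at an executable file on the manager host.
# Assumes one key=value per line and an unquoted path; only the first word
# of the value is checked (anything after it is treated as arguments).
mha_check_script_perms() {
  # usage: mha_check_script_perms <app1.cnf>
  local cnf=$1 rc=0 path _rest
  while read -r path _rest; do
    if [ -x "$path" ]; then
      echo "ok              $path"
    else
      echo "NOT EXECUTABLE  $path"
      rc=1
    fi
  done < <(sed -n 's/^[[:space:]]*[a-z_]*_script[[:space:]]*=[[:space:]]*//p' "$cnf")
  return "$rc"
}

# Example:
#   mha_check_script_perms /etc/mha/app1.cnf || echo "fix with: chmod +x <path>"
```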
Here is the final, successful run:
[root@localhost ~]# chmod +x /data/mastermha/app1/master_ip_failover
[root@localhost ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Tue Apr 30 10:51:59 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Apr 30 10:51:59 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Tue Apr 30 10:51:59 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
Tue Apr 30 10:51:59 2019 - [info] MHA::MasterMonitor version 0.56.
Tue Apr 30 10:52:00 2019 - [info] GTID failover mode = 0
Tue Apr 30 10:52:00 2019 - [info] Dead Servers:
Tue Apr 30 10:52:00 2019 - [info] Alive Servers:
Tue Apr 30 10:52:00 2019 - [info] 103.75.1.22(103.75.1.22:3306)
Tue Apr 30 10:52:00 2019 - [info] 103.75.1.23(103.75.1.23:3306)
Tue Apr 30 10:52:00 2019 - [info] 103.75.1.24(103.75.1.24:3306)
Tue Apr 30 10:52:00 2019 - [info] Alive Slaves:
Tue Apr 30 10:52:00 2019 - [info] 103.75.1.23(103.75.1.23:3306) Version=5.7.25-log (oldest major version between slaves) log-bin:enabled
Tue Apr 30 10:52:00 2019 - [info] Replicating from 103.75.1.22(103.75.1.22:3306)
Tue Apr 30 10:52:00 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Apr 30 10:52:00 2019 - [info] 103.75.1.24(103.75.1.24:3306) Version=5.7.25-log (oldest major version between slaves) log-bin:enabled
Tue Apr 30 10:52:00 2019 - [info] Replicating from 103.75.1.22(103.75.1.22:3306)
Tue Apr 30 10:52:00 2019 - [info] Current Alive Master: 103.75.1.22(103.75.1.22:3306)
Tue Apr 30 10:52:00 2019 - [info] Checking slave configurations..
Tue Apr 30 10:52:00 2019 - [info] read_only=1 is not set on slave 103.75.1.24(103.75.1.24:3306).
Tue Apr 30 10:52:00 2019 - [info] Checking replication filtering settings..
Tue Apr 30 10:52:00 2019 - [info] binlog_do_db= , binlog_ignore_db=
Tue Apr 30 10:52:00 2019 - [info] Replication filtering check ok.
Tue Apr 30 10:52:00 2019 - [info] GTID (with auto-pos) is not supported
Tue Apr 30 10:52:00 2019 - [info] Starting SSH connection tests..
Tue Apr 30 10:52:07 2019 - [info] All SSH connection tests passed successfully.
Tue Apr 30 10:52:07 2019 - [info] Checking MHA Node version..
Tue Apr 30 10:52:11 2019 - [info] Version check ok.
Tue Apr 30 10:52:11 2019 - [info] Checking SSH publickey authentication settings on the current master..
Tue Apr 30 10:52:12 2019 - [info] HealthCheck: SSH to 103.75.1.22 is reachable.
Tue Apr 30 10:52:14 2019 - [info] Master MHA Node version is 0.56.
Tue Apr 30 10:52:14 2019 - [info] Checking recovery script configurations on 103.75.1.22(103.75.1.22:3306)..
Tue Apr 30 10:52:14 2019 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/data/mastermha/app1//save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000008
Tue Apr 30 10:52:14 2019 - [info] Connecting to root@103.75.1.22(103.75.1.22:22)..
Creating /data/mastermha/app1 if not exists.. ok.
Checking output directory is accessible or not.. ok.
Binlog found at /var/lib/mysql, up to master-bin.000008
Tue Apr 30 10:52:16 2019 - [info] Binlog setting check done.
Tue Apr 30 10:52:16 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Apr 30 10:52:16 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=103.75.1.23 --slave_ip=103.75.1.23 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=5.7.25-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Tue Apr 30 10:52:16 2019 - [info] Connecting to root@103.75.1.23(103.75.1.23:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to relay-log.000005
Temporary relay log file is /var/lib/mysql/relay-log.000005
Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Apr 30 10:52:17 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=103.75.1.24 --slave_ip=103.75.1.24 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=5.7.25-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Tue Apr 30 10:52:17 2019 - [info] Connecting to root@103.75.1.24(103.75.1.24:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to relay-log.000006
Temporary relay log file is /var/lib/mysql/relay-log.000006
Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Apr 30 10:52:19 2019 - [info] Slaves settings check done.
Tue Apr 30 10:52:19 2019 - [info]
103.75.1.22(103.75.1.22:3306) (current master)
+--103.75.1.23(103.75.1.23:3306)
+--103.75.1.24(103.75.1.24:3306)
Tue Apr 30 10:52:19 2019 - [info] Checking replication health on 103.75.1.23..
Tue Apr 30 10:52:19 2019 - [info] ok.
Tue Apr 30 10:52:19 2019 - [info] Checking replication health on 103.75.1.24..
Tue Apr 30 10:52:19 2019 - [info] ok.
Tue Apr 30 10:52:19 2019 - [info] Checking master_ip_failover_script status:
Tue Apr 30 10:52:19 2019 - [info] /data/mastermha/app1/master_ip_failover --command=status --ssh_user=root --orig_master_host=103.75.1.22 --orig_master_ip=103.75.1.22 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig bond1:1 down==/sbin/ifconfig bond1:1 103.75.1.30/26===
Checking the Status of the script.. OK
SIOCSIFADDR: No such device
SIOCSIFNETMASK: No such device
SIOCGIFADDR: No such device
SIOCSIFBROADCAST: No such device
bond1:1: unknown interface: No such device
Tue Apr 30 10:52:21 2019 - [info] OK.
Tue Apr 30 10:52:21 2019 - [warning] shutdown_script is not defined.
Tue Apr 30 10:52:21 2019 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
3. masterha_manager --conf=/etc/mha/app1.cnf
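This is where I got stuck. After some digging I found that the manager has to be started differently: in the background with nohup, with its output sent to a log file.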
[root@localhost ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf > /data/mastermha/app1/manager.log 2>&1 &
[1] 2190
That is the start command; it needs the configuration file and a log file.
[root@localhost ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 monitoring program is now on initialization phase(10:INITIALIZING_MONITOR). Wait for a while and try checking again.
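At first the status check reports that monitoring is still initializing. Wait a while and run it again, and you will see that the manager has started successfully: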
[root@localhost ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:2190) is running(0:PING_OK), master:103.75.1.22
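Rather than rerunning masterha_check_status by hand until the initialization phase ends, the polling can be scripted. The sketch below is a hypothetical bash wrapper (wait_mha_ping_ok is not an MHA command); it only looks for the PING_OK marker seen in the output above and gives up after a configurable number of tries. The defaults of 30 tries, 5 seconds apart, are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: poll masterha_check_status until the manager reports
# PING_OK (the marker seen in the output above) or the retries run out.
wait_mha_ping_ok() {
  # usage: wait_mha_ping_ok <app1.cnf> [tries] [delay_seconds]
  local conf=$1 tries=${2:-30} delay=${3:-5} i=0 out=""
  while [ "$i" -lt "$tries" ]; do
    # "|| true": the status command may exit non-zero while the manager is not up yet
    out=$(masterha_check_status --conf="$conf" 2>&1) || true
    case "$out" in
      *PING_OK*) echo "$out"; return 0 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no PING_OK after $tries tries; last status: $out" >&2
  return 1
}

# Example:
#   wait_mha_ping_ok /etc/mha/app1.cnf
```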
A professional brick-mover for years and still moving bricks; steady accumulation pays off in the end~