Troubleshooting collection (continuously updated)

1. MHA check reports an error

(1) Symptom

Running masterha_check_repl -conf=/etc/masterha/app1.cnf fails with the following output:

[root@manager scripts]# masterha_check_repl -conf=/etc/masterha/app1.cnf
Tue Sep  7 02:59:56 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Sep  7 02:59:56 2021 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Sep  7 02:59:56 2021 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Sep  7 02:59:56 2021 - [info] MHA::MasterMonitor version 0.57.
Tue Sep  7 02:59:57 2021 - [info] GTID failover mode = 0
Tue Sep  7 02:59:57 2021 - [info] Dead Servers:
Tue Sep  7 02:59:57 2021 - [info] Alive Servers:
Tue Sep  7 02:59:57 2021 - [info]   192.168.122.10(192.168.122.10:3306)
Tue Sep  7 02:59:57 2021 - [info]   192.168.122.11(192.168.122.11:3306)
Tue Sep  7 02:59:57 2021 - [info]   192.168.122.12(192.168.122.12:3306)
Tue Sep  7 02:59:57 2021 - [info] Alive Slaves:
Tue Sep  7 02:59:57 2021 - [info]   192.168.122.11(192.168.122.11:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Tue Sep  7 02:59:57 2021 - [info]     Replicating from 192.168.122.10(192.168.122.10:3306)
Tue Sep  7 02:59:57 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Sep  7 02:59:57 2021 - [info]   192.168.122.12(192.168.122.12:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Tue Sep  7 02:59:57 2021 - [info]     Replicating from 192.168.122.10(192.168.122.10:3306)
Tue Sep  7 02:59:57 2021 - [info] Current Alive Master: 192.168.122.10(192.168.122.10:3306)
Tue Sep  7 02:59:57 2021 - [info] Checking slave configurations..
Tue Sep  7 02:59:57 2021 - [warning]  relay_log_purge=0 is not set on slave 192.168.122.11(192.168.122.11:3306).
Tue Sep  7 02:59:57 2021 - [warning]  relay_log_purge=0 is not set on slave 192.168.122.12(192.168.122.12:3306).
Tue Sep  7 02:59:57 2021 - [info] Checking replication filtering settings..
Tue Sep  7 02:59:57 2021 - [info]  binlog_do_db= , binlog_ignore_db= 
Tue Sep  7 02:59:57 2021 - [info]  Replication filtering check ok.
Tue Sep  7 02:59:57 2021 - [info] GTID (with auto-pos) is not supported
Tue Sep  7 02:59:57 2021 - [info] Starting SSH connection tests..
Tue Sep  7 03:00:00 2021 - [info] All SSH connection tests passed successfully.
Tue Sep  7 03:00:00 2021 - [info] Checking MHA Node version..
Tue Sep  7 03:00:00 2021 - [info]  Version check ok.
Tue Sep  7 03:00:00 2021 - [info] Checking SSH publickey authentication settings on the current master..
Tue Sep  7 03:00:00 2021 - [info] HealthCheck: SSH to 192.168.122.10 is reachable.
Tue Sep  7 03:00:00 2021 - [info] Master MHA Node version is 0.57.
Tue Sep  7 03:00:00 2021 - [info] Checking recovery script configurations on 192.168.122.10(192.168.122.10:3306)..
Tue Sep  7 03:00:00 2021 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/data --output_file=/tmp/save_binary_logs_test --manager_version=0.57 --start_file=master-bin.000001 
Tue Sep  7 03:00:00 2021 - [info]   Connecting to root@192.168.122.10(192.168.122.10:22).. 
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /usr/local/mysql/data, up to master-bin.000001
Tue Sep  7 03:00:00 2021 - [info] Binlog setting check done.
Tue Sep  7 03:00:00 2021 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Sep  7 03:00:00 2021 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.122.11 --slave_ip=192.168.122.11 --slave_port=3306 --workdir=/tmp --target_version=5.7.17-log --manager_version=0.57 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
Tue Sep  7 03:00:00 2021 - [info]   Connecting to root@192.168.122.11(192.168.122.11:22).. 
mysqlbinlog: [ERROR] unknown variable 'default-character-set=utf8'
mysqlbinlog version command failed with rc 7:0, please verify PATH, LD_LIBRARY_PATH, and client options
 at /usr/local/bin/apply_diff_relay_logs line 493.
Tue Sep  7 03:00:01 2021 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln208] Slaves settings check failed!
Tue Sep  7 03:00:01 2021 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln416] Slave configuration failed.
Tue Sep  7 03:00:01 2021 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/local/bin/masterha_check_repl line 48.
Tue Sep  7 03:00:01 2021 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Tue Sep  7 03:00:01 2021 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

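(2) Cause

The failure comes down to this line, raised on 192.168.122.11: [ERROR] unknown variable 'default-character-set=utf8'
mysqlbinlog reads the [client] group of my.cnf but does not recognize the default-character-set option, so the mysqlbinlog version check inside apply_diff_relay_logs fails.

(3) Solution

On 192.168.122.11 (and any other slave hosts with the same configuration), remove the offending setting from /etc/my.cnf and restart the mysql service. Rerunning the check then passes: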

[root@manager scripts]# masterha_check_repl -conf=/etc/masterha/app1.cnf
Tue Sep  7 03:10:16 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Sep  7 03:10:16 2021 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Sep  7 03:10:16 2021 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Sep  7 03:10:16 2021 - [info] MHA::MasterMonitor version 0.57.
Tue Sep  7 03:10:17 2021 - [info] GTID failover mode = 0
Tue Sep  7 03:10:17 2021 - [info] Dead Servers:
Tue Sep  7 03:10:17 2021 - [info] Alive Servers:
Tue Sep  7 03:10:17 2021 - [info]   192.168.122.10(192.168.122.10:3306)
Tue Sep  7 03:10:17 2021 - [info]   192.168.122.11(192.168.122.11:3306)
Tue Sep  7 03:10:17 2021 - [info]   192.168.122.12(192.168.122.12:3306)
Tue Sep  7 03:10:17 2021 - [info] Alive Slaves:
Tue Sep  7 03:10:17 2021 - [info]   192.168.122.11(192.168.122.11:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Tue Sep  7 03:10:17 2021 - [info]     Replicating from 192.168.122.10(192.168.122.10:3306)
Tue Sep  7 03:10:17 2021 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Sep  7 03:10:17 2021 - [info]   192.168.122.12(192.168.122.12:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Tue Sep  7 03:10:17 2021 - [info]     Replicating from 192.168.122.10(192.168.122.10:3306)
Tue Sep  7 03:10:17 2021 - [info] Current Alive Master: 192.168.122.10(192.168.122.10:3306)
Tue Sep  7 03:10:17 2021 - [info] Checking slave configurations..
Tue Sep  7 03:10:17 2021 - [info]  read_only=1 is not set on slave 192.168.122.11(192.168.122.11:3306).
Tue Sep  7 03:10:17 2021 - [warning]  relay_log_purge=0 is not set on slave 192.168.122.11(192.168.122.11:3306).
Tue Sep  7 03:10:17 2021 - [info]  read_only=1 is not set on slave 192.168.122.12(192.168.122.12:3306).
Tue Sep  7 03:10:17 2021 - [warning]  relay_log_purge=0 is not set on slave 192.168.122.12(192.168.122.12:3306).
Tue Sep  7 03:10:17 2021 - [info] Checking replication filtering settings..
Tue Sep  7 03:10:17 2021 - [info]  binlog_do_db= , binlog_ignore_db= 
Tue Sep  7 03:10:17 2021 - [info]  Replication filtering check ok.
Tue Sep  7 03:10:17 2021 - [info] GTID (with auto-pos) is not supported
Tue Sep  7 03:10:17 2021 - [info] Starting SSH connection tests..
Tue Sep  7 03:10:20 2021 - [info] All SSH connection tests passed successfully.
Tue Sep  7 03:10:20 2021 - [info] Checking MHA Node version..
Tue Sep  7 03:10:20 2021 - [info]  Version check ok.
Tue Sep  7 03:10:20 2021 - [info] Checking SSH publickey authentication settings on the current master..
Tue Sep  7 03:10:20 2021 - [info] HealthCheck: SSH to 192.168.122.10 is reachable.
Tue Sep  7 03:10:20 2021 - [info] Master MHA Node version is 0.57.
Tue Sep  7 03:10:20 2021 - [info] Checking recovery script configurations on 192.168.122.10(192.168.122.10:3306)..
Tue Sep  7 03:10:20 2021 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/data --output_file=/tmp/save_binary_logs_test --manager_version=0.57 --start_file=master-bin.000001 
Tue Sep  7 03:10:20 2021 - [info]   Connecting to root@192.168.122.10(192.168.122.10:22).. 
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /usr/local/mysql/data, up to master-bin.000001
Tue Sep  7 03:10:20 2021 - [info] Binlog setting check done.
Tue Sep  7 03:10:20 2021 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Sep  7 03:10:20 2021 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.122.11 --slave_ip=192.168.122.11 --slave_port=3306 --workdir=/tmp --target_version=5.7.17-log --manager_version=0.57 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
Tue Sep  7 03:10:20 2021 - [info]   Connecting to root@192.168.122.11(192.168.122.11:22).. 
  Checking slave recovery environment settings..
    Opening /usr/local/mysql/data/relay-log.info ... ok.
    Relay log found at /usr/local/mysql/data, up to relay-log-bin.000006
    Temporary relay log file is /usr/local/mysql/data/relay-log-bin.000006
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Sep  7 03:10:21 2021 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.122.12 --slave_ip=192.168.122.12 --slave_port=3306 --workdir=/tmp --target_version=5.7.17-log --manager_version=0.57 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
Tue Sep  7 03:10:21 2021 - [info]   Connecting to root@192.168.122.12(192.168.122.12:22).. 
  Checking slave recovery environment settings..
    Opening /usr/local/mysql/data/relay-log.info ... ok.
    Relay log found at /usr/local/mysql/data, up to relay-log-bin.000006
    Temporary relay log file is /usr/local/mysql/data/relay-log-bin.000006
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Sep  7 03:10:21 2021 - [info] Slaves settings check done.
Tue Sep  7 03:10:21 2021 - [info] 
192.168.122.10(192.168.122.10:3306) (current master)
 +--192.168.122.11(192.168.122.11:3306)
 +--192.168.122.12(192.168.122.12:3306)

Tue Sep  7 03:10:21 2021 - [info] Checking replication health on 192.168.122.11..
Tue Sep  7 03:10:21 2021 - [info]  ok.
Tue Sep  7 03:10:21 2021 - [info] Checking replication health on 192.168.122.12..
Tue Sep  7 03:10:21 2021 - [info]  ok.
Tue Sep  7 03:10:21 2021 - [info] Checking master_ip_failover_script status:
Tue Sep  7 03:10:21 2021 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.122.10 --orig_master_ip=192.168.122.10 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.122.200===

Checking the Status of the script.. OK 
Tue Sep  7 03:10:21 2021 - [info]  OK.
Tue Sep  7 03:10:21 2021 - [warning] shutdown_script is not defined.
Tue Sep  7 03:10:21 2021 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
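
The fix in (3) could be scripted roughly as follows. This is a minimal sketch, not part of the original procedure: the function names find_charset_option and disable_charset_option are made up for illustration, and it assumes the option sits on its own line in the file you pass in.

```shell
#!/bin/sh
# Hypothetical helper: list the lines of a my.cnf-style file that set
# default-character-set, the option mysqlbinlog rejected in the log above.
find_charset_option() {
    # $1: path to the config file, e.g. /etc/my.cnf
    grep -n '^[[:space:]]*default-character-set' "$1"
}

# Hypothetical helper: comment those lines out in place and keep the
# previous version of the file as <file>.bak.
disable_charset_option() {
    sed -i.bak 's/^\([[:space:]]*default-character-set\)/#\1/' "$1"
}
```

For example, find_charset_option /etc/my.cnf on each slave shows the offending lines, and disable_charset_option /etc/my.cnf comments them out. It comments out every occurrence, including any under a [mysql] group where the option is valid, so review the output of find_charset_option first.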

2. Remote login fails

(1) Symptom

Transferring a file with scp fails:

[root@master1 mysql-mmm]# scp mmm_common.conf root@192.168.122.101:/etc/mysql-mmm/
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:Ie+O9BOd/5wHoir8c++ToKTEpNOPK/5earrpbb886ms.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /root/.ssh/known_hosts:1
ECDSA host key for 192.168.122.101 has changed and you have requested strict checking.
Host key verification failed.
lost connection

Logging in over ssh fails in the same way:

[root@master1 mysql-mmm]# ssh root@192.168.122.101
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:Ie+O9BOd/5wHoir8c++ToKTEpNOPK/5earrpbb886ms.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /root/.ssh/known_hosts:1
ECDSA host key for 192.168.122.101 has changed and you have requested strict checking.
Host key verification failed.

The receiving host, however, can still log in to the sending host:

[root@slave2 ~]# ssh root@192.168.122.10
The authenticity of host '192.168.122.10 (192.168.122.10)' can't be established.
ECDSA key fingerprint is SHA256:tDS2skRigRL3zhDYuyo71fqxY+Hp0TNLSOD3qNVdjzA.
ECDSA key fingerprint is MD5:5f:14:4d:e0:62:72:13:c6:ca:81:19:f0:8b:da:9a:ee.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.122.10' (ECDSA) to the list of known hosts.
root@192.168.122.10's password: 
Permission denied, please try again.
root@192.168.122.10's password: 
Last failed login: Wed Sep  8 01:19:39 CST 2021 from 192.168.122.101 on ssh:notty
There was 1 failed login attempt since the last successful login.
Last login: Wed Sep  8 00:11:35 2021 from 192.168.122.1
[root@master1 ~]# 

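(2) Cause

The failure comes down to: ECDSA host key for 192.168.122.101 has changed and you have requested strict checking.
This host had connected to that IP before, so its local known_hosts file already holds a public key for it. The remote host's system was changed afterwards, or the IP was reassigned, so the key the remote host now presents no longer matches the stored one and ssh refuses to connect.

(3) Solution

Open ~/.ssh/known_hosts and delete the public key entry for the affected IP.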

[root@master1 mysql-mmm]# vi ~/.ssh/known_hosts

192.168.122.11 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBHvs6y2TVG3xGbSyMsvZDiPPL/s2PnAYz5JZStd0Et+J1iNiOuPvYHRZMjOPgMUH8ypDf2Xs59d8Vi4UgcMy01c=
192.168.122.12 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBFRvfh9BZ4IdYp4glTQh+rTqxrXbVtCxaMyy4Nl+BqakL2UX8F1GCgmMFAJJd5OC48F5ouwUHqcpeNOPh5PXQww=
192.168.122.100 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBHDgMaNHdzIs11NrXabPSmjGrpeZs87ZD9aCOV95QUxkw3uE/eHvVOy9IwSUW5CrYnxqRFfk8vkx5L7ihEsmll8=
## Delete the entry for the following IP, then log in again
192.168.122.101 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBBc9IAocnXR5wW0j/cMHIKeuelaLaCdr0QNexc5mJTMWZ4yl/9o50WS/0aoobpTHOlLhpvYbWT2GmYTc3/AMapU=
[root@master1 mysql-mmm]# ssh root@192.168.122.101
The authenticity of host '192.168.122.101 (192.168.122.101)' can't be established.
ECDSA key fingerprint is SHA256:Ie+O9BOd/5wHoir8c++ToKTEpNOPK/5earrpbb886ms.
ECDSA key fingerprint is MD5:bf:42:2d:44:59:8a:81:b5:20:e6:90:73:b1:c5:85:fe.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.122.101' (ECDSA) to the list of known hosts.
root@192.168.122.101's password: 
Last login: Tue Sep  7 23:32:28 2021 from 192.168.122.1
[root@slave2 ~]#
## Login succeeded
[root@master1 mysql-mmm]# scp mmm_common.conf root@192.168.122.101:/etc/mysql-mmm/
The authenticity of host '192.168.122.101 (192.168.122.101)' can't be established.
ECDSA key fingerprint is SHA256:Ie+O9BOd/5wHoir8c++ToKTEpNOPK/5earrpbb886ms.
ECDSA key fingerprint is MD5:bf:42:2d:44:59:8a:81:b5:20:e6:90:73:b1:c5:85:fe.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.122.101' (ECDSA) to the list of known hosts.
root@192.168.122.101's password: 
mmm_common.conf                           100%  842     2.3MB/s   00:00    
## Remote file transfer succeeded
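
The manual edit above can also be done non-interactively. The sketch below is a plain-text filter that mirrors the vi edit; forget_host is a made-up name, and it only handles unhashed entries like the ones shown here. For hashed known_hosts entries, ssh-keygen -R is the standard tool.

```shell
#!/bin/sh
# Hypothetical helper: drop every known_hosts line whose host field lists
# the given host/IP exactly. Hashed entries (|1|...) are not matched.
forget_host() {
    host=$1
    file=${2:-$HOME/.ssh/known_hosts}   # second argument is optional
    # Keep a backup, then rewrite the file without the matching lines.
    cp "$file" "$file.bak" &&
    awk -v h="$host" '{
        n = split($1, names, ",")       # the host field may be "name,ip"
        for (i = 1; i <= n; i++) if (names[i] == h) next
        print
    }' "$file.bak" > "$file"
}
```

Running forget_host 192.168.122.101 removes only that entry. The match is exact, so the entry for 192.168.122.10 is left alone even though it is a prefix of the removed address.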

3. mysqld service not started in Docker

(1) Symptom

[root@docker compose_lnmp]# docker exec -it mysql bash
[root@mysql bin]# systemctl status mysqld
Failed to get D-Bus connection: Operation not permitted

(2) Cause

The Dockerfile is as follows:

FROM centos:7
MAINTAINER this is mysql image <lnmp>
RUN yum -y install ncurses ncurses-devel bison cmake pcre-devel zlib-devel gcc gcc-c++ make;useradd -M -s /sbin/nologin mysql
ADD mysql-boost-5.7.20.tar.gz /usr/local/src/
WORKDIR /usr/local/src/mysql-5.7.20/
RUN cmake \
-DCMAKE_INSTALL_PREFIX=/usr/local/mysql \
-DMYSQL_UNIX_ADDR=/usr/local/mysql/mysql.sock \
-DSYSCONFDIR=/etc \
-DSYSTEMD_PID_DIR=/usr/local/mysql \
-DDEFAULT_CHARSET=utf8  \
-DDEFAULT_COLLATION=utf8_general_ci \
-DWITH_EXTRA_CHARSETS=all \
-DWITH_INNOBASE_STORAGE_ENGINE=1 \
-DWITH_ARCHIVE_STORAGE_ENGINE=1 \
-DWITH_BLACKHOLE_STORAGE_ENGINE=1 \
-DWITH_PERFSCHEMA_STORAGE_ENGINE=1 \
-DMYSQL_DATADIR=/usr/local/mysql/data \
-DWITH_BOOST=boost \
-DWITH_SYSTEMD=1;make -j4;make install
ADD my.cnf /etc/my.cnf
EXPOSE 3306
RUN chown -R mysql:mysql /usr/local/mysql/;chown mysql:mysql /etc/my.cnf
WORKDIR /usr/local/mysql/bin/
RUN ./mysqld \
--initialize-insecure \
--user=mysql \
--basedir=/usr/local/mysql \
--datadir=/usr/local/mysql/data;cp /usr/local/mysql/usr/lib/systemd/system/mysqld.service /usr/lib/systemd/system/;systemctl enable mysqld
ENV PATH=/usr/local/mysql/bin:/usr/local/mysql/lib:$PATH
VOLUME [ "/usr/local/mysql" ]
CMD ["/usr/sbin/init"]

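There is no permission to execute /usr/local/mysql/bin/mysqld, so initialization did not complete. The D-Bus error itself means systemctl cannot reach systemd, which is typical of a container started without extended privileges.

(3) Solution

Method 1:

Use the --privileged option for the container (for example, on docker run).

Method 2:

In the Dockerfile, add RUN chmod 777 /usr/local/mysql/bin/mysqld before the RUN ./mysqld ... step.

Method 3:

After entering the container, run /usr/local/mysql/bin/mysqld manually.

Method 4:

In the compose file, add privileged: true to the service so that the container runs with extended (root-level) privileges.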
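
Methods 1 and 4 both come down to giving the container extended privileges. A minimal sketch of the docker run form, assuming the image built from the Dockerfile above; the wrapper name run_privileged_init and whatever container and image names you pass in are placeholders, not values from the original setup.

```shell
#!/bin/sh
# Hypothetical wrapper: start a systemd-based image with extended privileges
# so that systemctl inside the container can reach systemd.
run_privileged_init() {
    # $1: container name; $2: image name (both chosen by the caller)
    docker run -d --privileged --name "$1" "$2" /usr/sbin/init
}
```

After run_privileged_init <name> <image>, docker exec -it <name> bash followed by systemctl status mysqld should no longer fail with the D-Bus error. Note that --privileged grants the container broad access to the host, so it is best kept to test environments like this one.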

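4. Disk alert: overlay2 files too large

(1) Alert

Capacity alert on the /data directory.

(2) Cause

The Docker directory is located under /data, and investigation found two overlay2 directories taking up too much space.

(3) Solution

Use the following command to find the container that owns the directory (xxxxxxxxxx stands for the overlay2 directory ID), then reschedule that pod to another node.
docker ps -q | xargs docker inspect --format '{{.State.Pid}}, {{.Id}}, {{.Name}}, {{.GraphDriver.Data.WorkDir}}' | grep "xxxxxxxxxx"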
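
The lookup can be wrapped in a small function so the directory ID is passed as an argument. This is only a sketch: find_overlay_owner is a made-up name, and it uses the same docker inspect format string as the command in this section.

```shell
#!/bin/sh
# Hypothetical helper: print "pid, id, name, workdir" for each running
# container whose overlay2 work directory contains the given ID.
find_overlay_owner() {
    # $1: overlay2 directory ID (or any fragment of the work directory path)
    ids=$(docker ps -q)
    # With no running containers there is nothing to inspect.
    [ -n "$ids" ] || return 1
    # $ids is left unquoted on purpose: one argument per container ID.
    docker inspect --format \
        '{{.State.Pid}}, {{.Id}}, {{.Name}}, {{.GraphDriver.Data.WorkDir}}' \
        $ids | grep -F -- "$1"
}
```

To find which IDs to look up in the first place, something like du -sh <docker-root>/overlay2/* | sort -rh | head lists the largest directories, where <docker-root> is wherever the Docker data directory lives under /data.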

posted @ 2021-09-07 04:38  丨君丶陌