MySQL参数max_connect_errors分析
一、问题引入
最近一台mysql服务器报错:ERROR 1129 (00000): Host 'xxx' is blocked because of many connection errors. Unblock with 'mysqladmin flush-hosts'。
本质原因:
因为同一个IP在短时间内产生太多中断的数据库连接(超过max_connect_errors的最大值)而导致的
If more than this many successive connection requests from a host are interrupted without a successful connection, the server blocks that host from further connections. You can unblock blocked hosts by flushing the host cache. To do so, issue a FLUSH HOSTS statement or execute a mysqladmin flush-hosts command. If a connection is established successfully within fewer than max_connect_errors attempts after a previous connection was interrupted, the error count for the host is cleared to zero. However, once a host is blocked, flushing the host cache is the only way to unblock it. The default is 100.
翻译:如果MySQL服务器连续接收到了来自于同一个主机的请求,而且这些连续的请求全部都没有成功的建立连接就被中断了,当这些连续的请求的累计值大于max_connect_errors的设定值时,MySQL服务器就会阻止这台主机后续的所有请求。相信一开始你看到这些资料,也会被“many successive connection requests from a host are interrupted without a successful connection”给弄懵,其实这个就是因为由于网络异常而中止数据库连接
There seems to be confusion around that variable. It does not really block hosts for repeated invalid passwords but for aborted connections due to network errors.
二、实验验证
#创建测试用户 mysql> grant all privileges on *.* to test@'%' identified by 'mysql'; Query OK, 0 rows affected, 1 warning (0.01 sec) #查看变量max_connect_errors并设置值 mysql> show variables like '%max_connect_errors%'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | max_connect_errors | 100 | +--------------------+-------+ 1 row in set (0.13 sec) mysql> set global max_connect_errors=3; Query OK, 0 rows affected (0.00 sec) mysql> show variables like '%max_connect_errors%'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | max_connect_errors | 3 | +--------------------+-------+ 1 row in set (0.00 sec) #在另外一台测试机器,以错误的密码去连接这个MySQL数据库,如下所示,即使前面输入了三次错误密码,第四次输入是也没有碰到上面错误。那么可以排除这个变量与密码错误输入有关系 [root@docker02 ~]# mysql -utest -pmysql11 -h 192.168.11.10 ERROR 1045 (28000): Access denied for user 'test'@'192.168.11.11' (using password: YES) [root@docker02 ~]# mysql -utest -pmysql11 -h 192.168.11.10 ERROR 1045 (28000): Access denied for user 'test'@'192.168.11.11' (using password: YES) [root@docker02 ~]# mysql -utest -pmysql11 -h 192.168.11.10 ERROR 1045 (28000): Access denied for user 'test'@'192.168.11.11' (using password: YES) [root@docker02 ~]# mysql -utest -pmysql11 -h 192.168.11.10 ERROR 1045 (28000): Access denied for user 'test'@'192.168.11.11' (using password: YES) [root@docker02 ~]# mysql -utest -pmysql11 -h 192.168.11.10 ERROR 1045 (28000): Access denied for user 'test'@'192.168.11.11' (using password: YES)
关于某个IP输入了错误密码,MySQL会在performance_schema数据库下的host_cache表中记录。它会累计记录在COUNT_AUTHENTICATION_ERRORS字段
host_cache的字段是统计被视为“阻塞”的连接错误的数量(根据max_connect_errors系统变量进行评估)。 只计算协议握手错误,并且仅用于通过验证的主机(HOST_VALIDATED = YES)
SUM_CONNECT_ERRORS:The number of connection errors that are deemed “blocking” (assessed against the max_connect_errors system variable). Only protocol handshake errors are counted, and only for hosts that passed validation (HOST_VALIDATED = YES).
MySQL客户端与数据库建立连接需要发起三次握手协议,正常情况下,这个时间非常短,但是一旦网络异常,网络超时等因素出现,就会导致这个握手协议无法完成,MySQL有个参数connect_timeout,它是MySQL服务端进程mysqld等待连接建立完成的时间,单位为秒。如果超过connect_timeout时间范围内,仍然无法完成协议握手话,MySQL客户端会收到异常,异常消息类似于: Lost connection to MySQL server at 'XXX', system error: errno,该变量默认是10秒
mysql> show variables like 'connect_timeout'; +-----------------+-------+ | Variable_name | Value | +-----------------+-------+ | connect_timeout | 10 | +-----------------+-------+ 1 row in set (0.01 sec)
构造一个网络超时引起的数据库连接被中断案例吧,我们用Linux下的netem与tc命令模拟构造出复杂环境下的网络传输延时案例,如下设置后,此时从测试服务器去访问MySQL服务器,都会出现延时11秒
[root@docker02 ~]# ping 192.168.11.10 PING 192.168.11.10 (192.168.11.10) 56(84) bytes of data. 64 bytes from 192.168.11.10: icmp_seq=1 ttl=64 time=0.686 ms 64 bytes from 192.168.11.10: icmp_seq=2 ttl=64 time=0.472 ms 64 bytes from 192.168.11.10: icmp_seq=3 ttl=64 time=0.465 ms 64 bytes from 192.168.11.10: icmp_seq=4 ttl=64 time=0.512 ms ^C --- 192.168.11.10 ping statistics --- 4 packets transmitted, 4 received, 0% packet loss, time 3002ms rtt min/avg/max/mdev = 0.465/0.533/0.686/0.094 ms [root@docker02 ~]# tc qdisc add dev eth0 root netem delay 11000ms [root@docker02 ~]# [root@docker02 ~]# ping 192.168.11.10 PING 192.168.11.10 (192.168.11.10) 56(84) bytes of data. 64 bytes from 192.168.11.10: icmp_seq=1 ttl=64 time=11001 ms 64 bytes from 192.168.11.10: icmp_seq=2 ttl=64 time=11000 ms From 192.168.11.11 icmp_seq=10 Destination Host Unreachable From 192.168.11.11 icmp_seq=11 Destination Host Unreachable From 192.168.11.11 icmp_seq=12 Destination Host Unreachable From 192.168.11.11 icmp_seq=13 Destination Host Unreachable 64 bytes from 192.168.11.10: icmp_seq=3 ttl=64 time=11001 ms 64 bytes from 192.168.11.10: icmp_seq=4 ttl=64 time=11000 ms 64 bytes from 192.168.11.10: icmp_seq=5 ttl=64 time=11000 ms 64 bytes from 192.168.11.10: icmp_seq=6 ttl=64 time=11000 ms 64 bytes from 192.168.11.10: icmp_seq=7 ttl=64 time=11000 ms
在测试服务器连接MySQL数据库,如下所示(注意,如果你是在通过ssh连接这台服务器的话,此时在测试服务器上操作会相当慢。当然你也可以在MySQL服务器模拟网络延时,或者你将connect_timeout和网络延时都设小一点)
[root@docker02 ~]# mysql -utest -pmysql -h 192.168.11.10 ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet', system error: 0
由于网络延时超过10秒,导致连接MySQL失败,此时,你在MySQL服务器上查询host_cache表时,那么你就会看到SUM_CONNECT_ERRORS变成1了,COUNT_HANDSHAKE_ERRORS也变成了1
反复这样折腾三次,那么你会看到SUM_CONNECT_ERRORS变成3了,COUNT_HANDSHAKE_ERRORS也变成了3了
用netem与tc 命令在测试服务器上取消网络延迟模拟,然后去测试连接MySQL数据库
此时就能构造出“ERROR 1129 (HY000): Host 'xxx' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'”错误了。
三、解决方案
解决方法比较多,不过有些方案都是临时方案。临时方案是指标不治本。关键还是需要解决网络错误(这个往往需要求助网络管理人员或系统管理员)
1)将变量max_connection_errors的值设置为一个更大的值
这个临时方案只是延迟触发IP被禁止访问的条件而已,而且在复杂情况或高并发的情况下,需要设置一个很大的值,否则很容易就会再次被触发。另外,变量只对当前环境生效,如果重启就会失效,如果需要永久有效,可以在my.cnf配置文件里面配置。
mysql> show variables like 'max_connect_errors'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | max_connect_errors | 3 | +--------------------+-------+ 1 row in set (0.03 sec) mysql> set global max_connect_errors=150; Query OK, 0 rows affected (0.00 sec)
2)使用flush hosts
#方式一 mysql> flush hosts; Query OK, 0 rows affected (0.00 sec) mysql> select * from performance_schema.host_cache\G Empty set (0.00 sec) #方式二 [root@docker01 ~]# mysqladmin -uroot -pmysql flush-host
host cache解释说明:
The MySQL server maintains a host cache in memory that contains information about clients: IP address, host name, and error information. The server uses this cache for nonlocal TCP connections. It does not use the cache for TCP connections established using a loopback interface address (127.0.0.1 or ::1), or for connections established using a Unix socket file, named pipe, or shared memory.
MySQL服务器在内存中维护一个包含客户端信息的缓存:IP地址,主机名和错误信息等。 服务器会将非本地TCP连接信息缓存起来。它不会缓存使用环回接口地址(127.0.0.1或者:: 1)建立的TCP连接,或者使用Unix套接字文件,命名管道或共享内存建立的连接。host cache信息可以通过performance_schema数据库下的host_cache表查询。
3)将变量host_cache_size设置为0
最不靠谱的解决方法,只是让MySQL服务器不记录host cache信息而已。完全可以忽略这个方法
mysql> show variables like '%host_cache_size%'; +-----------------+-------+ | Variable_name | Value | +-----------------+-------+ | host_cache_size | 279 | +-----------------+-------+ 1 row in set (0.00 sec) mysql> set global host_cache_size=0; Query OK, 0 rows affected (0.00 sec) mysql> show variables like '%host_cache_size%'; +-----------------+-------+ | Variable_name | Value | +-----------------+-------+ | host_cache_size | 0 | +-----------------+-------+ 1 row in set (0.00 sec) mysql> select * from performance_schema.host_cache; Empty set (0.00 sec)
四、参考资料
http://mysqlblog.fivefarmers.com/2013/08/08/understanding-max_connect_errors/
https://dev.mysql.com/doc/refman/5.6/en/host-cache.html
https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_max_connect_errors
-------------------------------------------
个性签名:独学而无友,则孤陋而寡闻。做一个灵魂有趣的人!