一个MYSQL监控与CONNECTION_CONTROL引起的问题
适用范围
mysql 任何版本
问题概述
客户反应我们监控工具占用了太多连接,已出现连接数告警,影响生产环境了,总共最大连接数为1000,我们监控软件达到800多!
如下:
但是比较疑惑的是该监控已在上周停掉了服务,并且该帐号也已手动drop 掉。 为什么还能访问到DB呢!
问题原因
通过对该问题分析主要是由以下两个原因影起:
1.监控软件原因,针对以上问题和我们研发确认,除了mysql_exporter 会采集mysql信息外,proxy 上的slowmon_collector 服务也会时不时的去采集DB信息。只不过该DB我早已取消纳管了,但为什么还是会去采集DB信息。这需要研发确认才能回复!
2.CONNECTION_CONTROL BUG, 该帐号已被删除,所以不能正常连接到DB,而 CONNECTION_CONTROL 会记录当该帐号连接失败达到 connection_control_failed_connections_threshold 指定次数时就会延迟创建连接。而延迟创建连接这段时间,是会占用连接数的。
为了证实这个问题,我特意在mysql 8.0.30上还原了该问题。
#1.插件准备
#安装 CONNECTION_CONTROL 插件
mysql>INSTALL PLUGIN CONNECTION_CONTROL SONAME 'connection_control.so';
mysql> INSTALL PLUGIN CONNECTION_CONTROL_FAILED_LOGIN_ATTEMPTS SONAME 'connection_control.so';
#检查CONNECTION_CONTROL 插件是否正常
mysql> SELECT PLUGIN_NAME, PLUGIN_STATUS
-> FROM INFORMATION_SCHEMA.PLUGINS
-> WHERE PLUGIN_NAME LIKE 'connection%';
+------------------------------------------+---------------+
| PLUGIN_NAME | PLUGIN_STATUS |
+------------------------------------------+---------------+
| CONNECTION_CONTROL | ACTIVE |
| CONNECTION_CONTROL_FAILED_LOGIN_ATTEMPTS | ACTIVE |
+------------------------------------------+---------------+
#检查CONNECTION_CONTROL 参数
mysql> show variables connection_control%';
+-------------------------------------------------+------------+
| Variable_name | Value |
+-------------------------------------------------+------------+
| connection_control_failed_connections_threshold | 3 |
| connection_control_max_connection_delay | 2147483647 |
| connection_control_min_connection_delay | 1000 |
+-------------------------------------------------+------------+
3 rows in set (0.02 sec)
#2 用户准备
#创建测试用户
mysql>
mysql> create user test@'%' identified by 'test' ;
Query OK, 0 rows affected (1.29 sec)
#给用户授权
mysql> grant all privileges on *.* to 'test'@'%' ;
Query OK, 0 rows affected (5.39 sec)
#确定测试用户可以访问DB
[root@s2ahuzcloud01 run]# mysql -u test -p'test' -h 192.168.1.51 -P 3306
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 13
Server version: 8.0.30-commercial MySQL Enterprise Server - Commercial
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
mysql
#3.开启测试
#删掉用户
mysql> drop user test@'%' ;
Query OK, 0 rows affected (0.71 sec)
#刷新权限
mysql> FLUSH PRIVILEGES ;
Query OK, 0 rows affected (0.12 sec)
#设置最大连接数
mysql> set global max_connections=5 ;
Query OK, 0 rows affected (0.01 sec)
mysql> show variables like '%max_connections%' ;
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| max_connections | 5 |
| mysqlx_max_connections | 100 |
+------------------------+-------+
2 rows in set (0.02 sec)
#4 创建并发连接
#同时开四个窗口连接
[root@s2ahuzcloud01 ~]# while true ; do mysql -u test -p'test' -h 192.168.1.51 -P 3306; done
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'test'@'192.168.1.51' (using password: YES)
mysql: [Warning] Using a password on the command line interface can be insecure.
#在次检查用户是否存
mysql>
mysql> select user ,host from mysql.user where user ='test' ;
Empty set (0.01 sec)
#查看连接数
mysql> show processlist ;
+----+-----------------+--------------------+------+---------+------+--------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-----------------+--------------------+------+---------+------+--------------------------------------+------------------+
| 6 | event_scheduler | localhost | NULL | Daemon | 5852 | Waiting on empty queue | NULL |
| 9 | root | localhost | NULL | Query | 0 | init | show processlist |
| 47 | test | 192.168.1.51:37622 | NULL | Connect | 12 | Waiting in connection_control plugin | NULL |
| 48 | test | 192.168.1.51:37624 | NULL | Connect | 9 | Waiting in connection_control plugin | NULL |
| 49 | test | 192.168.1.51:37626 | NULL | Connect | 7 | Waiting in connection_control plugin | NULL |
| 50 | test | 192.168.1.51:37628 | NULL | Connect | 5 | Waiting in connection_control plugin | NULL |
+----+-----------------+--------------------+------+---------+------+--------------------------------------+------------------+
6 rows in set (0.01 sec)
##使用正常的业务帐号却恩为连接数过多无法登录
[root@s2ahuzcloud01 run]# mysql -u root -p'root' --socket=/u01/mysql/data/uat1/run/mysql3306.sock
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1040 (HY000): Too many connections
补充
CONNECTION_CONTROL 插件下,任何不存在用户连接DB,都会占用DB的连接数
#多窗口并发访问, username 随机
while true ; do mysql -u test123 -p'test' -h 192.168.1.51 -P 3306; done
mysql> show processlist ;
+----+-----------------+--------------------+------+---------+------+--------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-----------------+--------------------+------+---------+------+--------------------------------------+------------------+
| 6 | event_scheduler | localhost | NULL | Daemon | 200 | Waiting on empty queue | NULL |
| 9 | root | localhost | NULL | Query | 0 | init | show processlist |
| 51 | test123 | 192.168.1.51:41792 | NULL | Connect | 16 | Waiting in connection_control plugin | NULL |
| 52 | test123 | 192.168.1.51:41794 | NULL | Connect | 16 | Waiting in connection_control plugin | NULL |
| 53 | test123 | 192.168.1.51:41796 | NULL | Connect | 11 | Waiting in connection_control plugin | NULL |
| 54 | test123 | 192.168.1.51:41798 | NULL | Connect | 8 | Waiting in connection_control plugin | NULL |
+----+-----------------+--------------------+------+---------+------+--------------------------------------+------------------+
6 rows in set (0.00 sec)
解决方案
通过以上问题分析,即已明确了问题的原因,就可以通过以下方式避免该问题发生:
- 更新监控程序,确保监控程去不会去访问未纳管的DB
- 在必须使用CONNECTION_CONTROL 插件的场景
a. 将connection_control_min_connection_delay设置为足够小。这样就会很快释放掉连接。
b. 将connection_control_failed_connections_threshold 参数设置到足够大 或 设置为0 不限制,这也将失去connection_control插件的作用。
set global connection_control_failed_connections_threshold =0;
3.在有可以使用CONNECTION_CONTROL 插件的场景,可以卸载掉该插件。
mysql> UNINSTALL PLUGIN CONNECTION_CONTROL_FAILED_LOGIN_ATTEMPTS;
mysql> UNINSTALL PLUGIN CONNECTION_CONTROL;
当然做好监控告警,将问题消除在潜在阶段,仍然是有效的手段,如果不是我们提前检查,可能就对生产产生了事故。 在对帐号有操作更新时,建议定期重启。