PT-ONLINE-CHANGE-SCHEMA 无法连接从库
问题现状:
在执行ONLINE DDL时,PT工具报出异常,无法连接从库:
Cannot connect to A=utf8,D=test,P=3306,h=192.168.252.100,p=...,u=root
执行是完成了, 但这个警告忽略了从库,如果从库有异常,这个时候是不会管的。
分析:
环境ROOT密码不一致。
执行ONLINE DDL的用户为ROOT。
问题模拟:
1. 将主从库的ROOT密码设置为不一致:
mysql> set sql_log_bin=0; Query OK, 0 rows affected (0.00 sec) mysql> SET PASSWORD FOR 'root'@'localhost' = PASSWORD('xxxxx'); Query OK, 0 rows affected (0.01 sec) mysql> grant all privileges on *.* to root@'localhost' identified by 'xxxxxx'; Query OK, 0 rows affected (0.00 sec) mysql> commit; Query OK, 0 rows affected (0.00 sec) mysql> flush privileges; Query OK, 0 rows affected (0.00 sec) mysql> set sql_log_bin=1; Query OK, 0 rows affected (0.00 sec) mysql>
2. 创建一张数据库,21855条测试数据:
mysql> use test; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> create table test_row(id int(9) auto_increment primary key not null, name varchar(9)); Query OK, 0 rows affected (0.02 sec) root@redis1:~# cat insert_row.sh #!/bin/bash for i in `seq 1 21855` do /usr/local/percona/bin/mysql -uroot -pxxxx -S /database/mysql/3306/run/mysql.sock -D test -e"insert into test_row(name) values('aa');" done exit 0
3. 用PT-ONLINE工具为此表增加一个AGE字段:
root@redis1:~# pt-online-schema-change --critical-load Threads_running=100 --user=root --host=localhost --ask-pass --port=3306 --charset=utf8 --nodrop-old-table --chunk-size=10000 --alter="add column age int(9)" D=test,t=test_row --nocheck-replication-filters --exec Enter MySQL password: Cannot connect to A=utf8,D=test,P=3306,h=192.168.252.100,p=...,u=root No slaves found. See --recursion-method if host redis1 has slaves. Not checking slave lag because no slaves were found and --check-slave-lag was not specified. Operation, tries, wait: analyze_table, 10, 1 copy_rows, 10, 0.25 create_triggers, 10, 1 drop_triggers, 10, 1 swap_tables, 10, 1 update_foreign_keys, 10, 1 Altering `test`.`test_row`... Creating new table... Created new table test._test_row_new OK. Altering new table... Altered `test`.`_test_row_new` OK. 2017-12-11T06:19:21 Creating triggers... 2017-12-11T06:19:21 Created triggers OK. 2017-12-11T06:19:21 Copying approximately 21787 rows... Cannot connect to A=utf8,D=test,P=3306,h=192.168.252.100,p=...,u=root 2017-12-11T06:19:21 Copied rows OK. 2017-12-11T06:19:21 Analyzing new table... 2017-12-11T06:19:21 Swapping tables... 2017-12-11T06:19:21 Swapped original and new tables OK. Not dropping old table because --no-drop-old-table was specified. 2017-12-11T06:19:21 Dropping triggers... 2017-12-11T06:19:21 Dropped triggers OK. Successfully altered `test`.`test_row`.
4. 检查数据:
mysql> desc test_row; ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id: 21866 Current database: test +-------+------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------+------------+------+-----+---------+----------------+ | id | int(9) | NO | PRI | NULL | auto_increment | | name | varchar(9) | YES | | NULL | | | age | int(9) | YES | | NULL | | +-------+------------+------+-----+---------+----------------+ 3 rows in set (0.00 sec) mysql> select count(*) from test_row; +----------+ | count(*) | +----------+ | 21855 | +----------+ 1 row in set (0.01 sec)
如真实情况一样, 字段已经添加上了,数据也正确。
解决问题的方案:
1. 将ROOT密码设置为一致。
经测试, 此方案依旧有警告信息,连接不上从库。
2. 设置一个新的账户,可以用来检测从库也可以执行DDL的用户。
2.1 授权账户:
mysql> GRANT RELOAD, PROCESS, SUPER, LOCK TABLES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'root'@'192.168.252.203' identified by 'xxxxx'; Query OK, 0 rows affected (0.00 sec) mysql> GRANT ALL PRIVILEGES ON `test`.* TO 'root'@'192.168.252.203'; Query OK, 0 rows affected (0.00 sec) mysql> flush privileges; Query OK, 0 rows affected (0.00 sec) mysql>
2.2 执行ONLINE DDL操作:
root@redis1:~# pt-online-schema-change --critical-load Threads_running=100 --user=root --host=192.168.252.203 --ask-pass --port=3306 --charset=utf8 --nodrop-old-table --chunk-size=10000 --alter="drop column age" D=test,t=test_row --nocheck-replication-filters --exec Enter MySQL password: Found 2 slaves: redis1 redis2 Will check slave lag on: redis1 redis2 Operation, tries, wait: analyze_table, 10, 1 copy_rows, 10, 0.25 create_triggers, 10, 1 drop_triggers, 10, 1 swap_tables, 10, 1 update_foreign_keys, 10, 1 Altering `test`.`test_row`... Creating new table... Created new table test._test_row_new OK. Waiting forever for new table `test`.`_test_row_new` to replicate to redis2... Altering new table... Altered `test`.`_test_row_new` OK. 2017-12-11T07:04:29 Creating triggers... 2017-12-11T07:04:29 Created triggers OK. 2017-12-11T07:04:29 Copying approximately 21787 rows... 2017-12-11T07:04:30 Copied rows OK. 2017-12-11T07:04:30 Analyzing new table... 2017-12-11T07:04:30 Swapping tables... 2017-12-11T07:04:30 Swapped original and new tables OK. Not dropping old table because --no-drop-old-table was specified. 2017-12-11T07:04:30 Dropping triggers... 2017-12-11T07:04:30 Dropped triggers OK. Successfully altered `test`.`test_row`. root@redis1:~#
从库被正常连接,并被正常检测 .