Mysql InnoDB Cluster集群日常维护命令

很重要:请保证集群中的数据库表都存在主键，不然集群会挂掉；

一.检查cluster各个节点是否可用

[cluster@node-1 ~]$ mysqlsh

mysql-js> dba.checkInstanceConfiguration("root@node-1:3305");

#在各节点配置之后, 创建cluster集群之前,会显示如下内容：
Please provide the password for "root@node-1:3305": 
Validating instance...
 
The instance 'node-1:3305' is valid for Cluster usage
{
    "status": "ok"
}

# 如果集群已经创建，会返回
Dba.checkInstanceConfiguration: The instance 'root@node-1:3305' is already part of an InnoDB Cluster (RuntimeError)

二.查看集群状态（主节点才能查看）

1.连接集群主节点

# mysqlsh

mysql-js> shell.connect("root@node-1:3305");
Please provide the password for 'root@node-1:3305': 
Creating a Session to 'root@node-1:3305'
Your MySQL connection id is 2567660
No default schema selected; type \use <schema> to set one.

#如果从远程登陆
[root@db-node03 ~]# mysqlsh --uri root@node-1:3305

2.获取集群实例

mysql-js> cluster=dba.getCluster();
<Cluster:myCluster>

3.查看集群状态

mysql-js> cluster.status();
{
    "clusterName": "myCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "node-1:3305", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "node-1:3305": {
                "address": "node-1:3305", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "node-2:3305": {
                "address": "node-2:3305", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "node-3:3305": {
                "address": "node-3:3305", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }
        }
    }
}

三.集群节点状态

- ONLINE       - 节点状态正常。
- OFFLINE      - 实例在运行，但没有加入任何Cluster。
- RECOVERING   - 实例已加入Cluster，正在同步数据。
- ERROR        - 同步数据发生异常。
- UNREACHABLE  - 与其他节点通讯中断，可能是网络问题，可能是节点crash。
- MISSING      - 节点已加入集群，但未启动group replication

四.InnoDB Cluster集群维护的命令帮助

#会列出集群维护的指令
mysql-js> dba.help();
#列出详细指令的用法
mysql-js> dba.help('deploySandboxInstance')

五.日常使用的几个重要命令 (mysqlsh的JS语法)

dba.checkInstanceConfiguration("root@hostname:3306")     #检查节点配置实例，用于加入cluster之前
 
dba.rebootClusterFromCompleteOutage('myCluster');        #重启 
 
dba.dropMetadataSchema();                                #删除schema
 
var cluster = dba.getCluster('myCluster')                #获取当前集群
 
cluster.checkInstanceState("root@hostname:3306")         #检查cluster里节点状态
 
cluster.rejoinInstance("root@hostname:3306")             #重新加入节点，我本地测试的时候发现rejoin一直无效，每次是delete后
 
addcluster.dissolve({force：true})                       #删除集群
 
cluster.addInstance("root@hostname:3306")                #增加节点
 
cluster.removeInstance("root@hostname:3306")             #删除节点
 
cluster.removeInstance('root@host:3306',{force:true})    #强制删除节点
 
cluster.dissolve({force:true})                           #解散集群
 
cluster.describe();                                      #集群描述

六.节点服务器重启后未加入集群（重新加入集群）（多出现主节点）

#如果查询节点状态为：
 "status": "(MISSING)"
#重新加入命令
shell.connect("root@node-1:3305");
cluster=dba.getCluster();
cluster.rejoinInstance("root@node-2:3305")
#如果rejoinInstance失败，提示remove重新添加如下：
cluster.removeInstance('root@node-1:3305');
cluster.addInstance('root@node-1:3305');

七.集群中所有服务器重启，所有节点都offline，直接获取集群信息失败

查询数据库SELECT * FROM performance_schema.replication_group_members;
仅显示单机 'MEMBER_STATE' = 'offline'

使用SELECT clusters.cluster_id,clusters.cluster_name from mysql_innodb_cluster_metadata.clusters；
查找集群名称
执行rebootClusterFromCompleteOutage命令，恢复集群
+------------+--------------+
| cluster_id | cluster_name |
+------------+--------------+
|          1 | myCluster    |
+------------+--------------+

#指令如下：
shell.connect("root@node-1:3305");
dba.rebootClusterFromCompleteOutage('myCluster');

八.RESET MASTER 和RESET SLAVE 命令的使用方法注意事项

RESET MASTER 
删除所有index file 中记录的所有binlog 文件，将日志索引文件清空，创建一个新的日志文件，这个命令通常仅仅用于第一次用于搭建主从关系的时的主库，
注意
  reset master 不同于purge binary log的两处地方
1 reset master 将删除日志索引文件中记录的所有binlog文件，创建一个新的日志文件 起始值从000001 开始，然而purge binary log 命令并不会修改记录binlog的顺序的数值
2 reset master 不能用于有任何slave 正在运行的主从关系的主库。因为在slave 运行时刻 reset master 命令不被支持，resetmaster 将master 的binlog从000001 开始记录,slave 记录的master log 则是reset master 时主库的最新的binlog,从库会报错无法找的指定的binlog文件。

In MySQL 5.6.5 and later, RESET MASTER also clears the values of the gtid_purged system variable (known as gtid_lost in MySQL 5.6.8 and earlier) as well as the global value of the gtid_executed (gtid_done, prior to MySQL 5.6.9) system variable (but not its session value); that is, executing this statement sets each of these values to an empty string ('')

RESET SLAVE 
reset slave 将使slave 忘记主从复制关系的位置信息。该语句将被用于干净的启动, 它删除master.info文件和relay-log.info 文件以及所有的relay log 文件并重新启用一个新的relaylog文件。
使用reset slave之前必须使用stop slave 命令将复制进程停止。

注 所有的relay log将被删除不管他们是否被SQL thread进程完全应用(这种情况发生于备库延迟以及在备库执行了stop slave 命令),存储复制链接信息的master.info文件将被立即清除,如果SQL thread 正在复制临时表的过程中，执行了stop slave ，并且执行了reset slave，这些被复制的临时表将被删除。

RESET SLAVE ALL
在 5.6 版本中 reset slave 并不会清理存储于内存中的复制信息比如 master host, master port, master user, or master password,也就是说如果没有使用change master 命令做重新定向，执行start slave 还是会指向旧的master 上面。
当从库执行reset slave之后,将mysqld shutdown 复制参数将被重置。
在5.6.3 版本以及以后 使用使用 RESET SLAVE ALL 来完全的清理复制连接参数信息。(Bug #11809016)
RESET SLAVE ALL does not clear the IGNORE_SERVER_IDS list set by CHANGE MASTER TO. This issue is fixed in MySQL 5.7. (Bug #18816897)
In MySQL 5.6.7 and later, RESET SLAVE causes an implicit commit of an ongoing transaction. See Section 13.3.3, “Statements That Cause an Implicit Commit”.

九.停止集群

mysql-js> shell.connect("root@node-1:3305");
mysql-js> \sql
#停止集群
mysql-sql> stop group_replication;

十.故障节点修复后，重新加入集群

1.初始化故障节点
2.导出正常节点的数据库，并传到故障节点
3.故障节点导入数据库
4.重启故障节点 MySQL
5.将故障节点重新加入集群

问题解决

问题1

MySQL JS > cluster=dba.getCluster();
Dba.getCluster: An open session is required to perform this operation. (RuntimeError)

解决1

需要先连接到集群实例

MySQL JS > shell.connect("root@vm001:3306");

问题2

metadata exists, instance belongs to that metadata, but GR is not active

Dba.getCluster: This function is not available through a session to a standalone instance (metadata exists, instance belongs to that metadata, but GR is not active) (RuntimeError)

解决2

此时可尝试重启集群：

dba.rebootClusterFromCompleteOutage()；

若依然不能解决，则使用cmd方式登录MySQL，然后启动该节点的group replication：

set global group_replication_bootstrap_group=on;
start group_replication;
set global group_replication_bootstrap_group=off;

问题3

metadata exists, instance does not belong to that metadata, and GR is not active

调用\connect连接到某个从节点，然后var cluster = dba.getCluster()时报：

Dba.getCluster: This function is not available through a session to a standalone instance (metadata exists, instance does not belong to that metadata, and GR is not active) (RuntimeError)

此时可删除从节点元数据，重新加入集群中：

dba.dropMetadataSchema();

执行过程中会报一个ERROR，提示是否禁用super_reade_only，输入y即可。
对于非当前节点报的此错误，首先要\connect连接到错误节点。

posted @ 2022-05-13 17:59 风吹稻香阅读(3146) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

风吹稻香

Mysql InnoDB Cluster集群 日常维护命令