代码改变世界

MySQL复制中slave延迟监控

2017-10-31 12:41  abce  阅读(512)  评论(0编辑  收藏  举报

通常是根据 Seconds_Behind_Master 的值来判断slave的延迟。这么做大部分情况下尚可接受,但并不够准确,而应该考虑更多因素。

 

查看slave的状态

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.20
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000150
          Read_Master_Log_Pos: 226326418
               Relay_Log_File: abce-relay-bin.000003
                Relay_Log_Pos: 77106856
        Relay_Master_Log_File: mysql-bin.000149
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1046292913
              Relay_Log_Space: 330964012
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 30510
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
1 row in set (0.00 sec)

mysql> 

从查询结果可以看出,slave落后master的时间是30510秒(即Seconds_Behind_Master的值)

查看一下slave上的复制进程在做什么:

mysql> show full processlist\G
*************************** 1. row ***************************
     Id: 4
   User: system user
   Host: 
     db: NULL
Command: Connect
   Time: 6018
  State: Waiting for master to send event
   Info: NULL
*************************** 2. row ***************************
     Id: 5
   User: system user
   Host: 
     db: abce
Command: Connect
   Time: 30501
  State: updating
   Info: delete from s_item_e
        where
          
           (
             ng_user_id = 23880
             and dt_shift_date in
              (  
                 '2017-10-30 00:00:00'
              , 
                 '2017-10-29 00:00:00'
              ) 
           )
          or  
           (
             ng_user_id = 24853
             and dt_shift_date in
              (  
                 '2017-10-30 00:00:00'
              , 
                 '2017-10-29 00:00:00'
              ) 
           )
		......

sql线程在做updating操作,具体内容是从表中删除数据。Time的值是30501。通常简单的sql不会执行30501秒。

在REPLICATION进程中,Time这列的值还可能有几种情况:
1、SQL线程当前执行的binlog(实际上是relay log)中的timestamp和IO线程最新的timestamp的差值,这就是通常大家认为的Seconds_Behind_Master值,并不是某个SQL的实际执行耗时;
2、SQL线程当前如果没有活跃SQL在执行的话,Time值就是SQL线程的idle time;

而IO线程的Time值则是该线程自从启动以来的总时长(多少秒),如果系统时间在IO线程启动后发生修改的话,可能会导致该Time值异常,比如变成负数,或者非常大。

 

可以根据以下顺序进行延迟的监控:

1.首先比较 Read_Master_Log_Pos 和 Master_Log_File 是否有差异
2.如果 Relay_Master_Log_File 和 Master_Log_File 是一样的话,再看 Exec_Master_Log_Pos 和 Read_Master_Log_Pos 的差异,对比SQL线程比IO线程慢了多少个binlog事件;
3.如果 Relay_Master_Log_File 和 Master_Log_File 不一样,那说明延迟可能较大,需要从MASTER上取得binlog status,判断当前的binlog和MASTER上的差距;

  

查看主库的binary日志:

mysql> SHOW BINARY LOGS;
+------------------+------------+
| Log_name         | File_size  |
+------------------+------------+
| mysql-bin.000142 | 1073778812 |
| mysql-bin.000143 | 1073798740 |
| mysql-bin.000144 | 1073910052 |
| mysql-bin.000145 | 1073781405 |
| mysql-bin.000146 | 1073845859 |
| mysql-bin.000147 | 1073746199 |
| mysql-bin.000148 | 1073863295 |
| mysql-bin.000149 | 1073822822 |
| mysql-bin.000150 |  226343255 |
+------------------+------------+
9 rows in set (0.00 sec)

mysql> 

计算延迟应该是mysql-bin.000149的最大事件数减去已经被执行完的事件数,
即 1073822822 - 1046292913 = 27529909 个binlog event,再加上mysql-bin.000150这个binlog已经产生的226343255个binlog event,共226343255个binlog event。即:1073822822 - 1046292913 + 226343255

 

查看过程中也可以设置pager,只查看关注的几个status值:

mysql> pager cat | egrep -i 'system user|Exec_Master_Log_Pos|Seconds_Behind_Master|Read_Master_Log_Pos'
PAGER set to 'cat | egrep -i 'system user|Exec_Master_Log_Pos|Seconds_Behind_Master|Read_Master_Log_Pos''
mysql> 

这时有活跃SQL,Time值是30583,Seconds_Behind_Master是30565 
mysql>  show processlist; show slave status\G
|  4 | system user |           | NULL   | Connect |  7168 | Waiting for master to send event | NULL                                                                                                 |
|  5 | system user |           | abce   | Connect | 30583 | updating                         | delete from s_item_e
3 rows in set (0.00 sec)

          Read_Master_Log_Pos: 226364447
          Exec_Master_Log_Pos: 1057661705
        Seconds_Behind_Master: 30565
1 row in set (0.00 sec)

mysql>