MySQL并行复制死锁源码解析

最近一个MySQL 5.7.21备库告警当天的备份失败,登录上去看的时候发现前一天的备份任务还没有结束,通过查看日志发现无法备份成功的原因是一直无法获取FTWRL锁,登录MySQL查看会话状态发现其中几个复制worker一致处于异常状态,下发STOP SLAVE命令时命令也一直被卡住,当时的会话状态如下:
> show processlist;
+---------+-----------------+----------------------+------+---------+----------+---------------------------------------------+-----------------------------+
| Id      | User            | Host                 | db   | Command | Time     | State                                       | Info                        |
+---------+-----------------+----------------------+------+---------+----------+---------------------------------------------+-----------------------------+
|       1 | event_scheduler | localhost            | NULL | Daemon  | 47179666 | Waiting on empty queue                      | NULL                        |
|     101 | system user     |                      | NULL | Connect | 47085737 | Waiting for master to send event            | NULL                        |
|     102 | system user     |                      | NULL | Connect |   112325 | Waiting for dependent transaction to commit | NULL                        |
|     103 | system user     |                      | NULL | Connect |   116446 | Waiting for commit lock                     | NULL                        |
|     104 | system user     |                      | NULL | Connect |   116434 | Waiting for preceding transaction to commit | NULL                        |
|     105 | system user     |                      | NULL | Connect |   590564 | Waiting for an event from Coordinator       | NULL                        |
|     106 | system user     |                      | NULL | Connect |  2902104 | Waiting for an event from Coordinator       | NULL                        |
|     107 | system user     |                      | NULL | Connect |  7057083 | Waiting for an event from Coordinator       | NULL                        |
|     108 | system user     |                      | NULL | Connect | 10948598 | Waiting for an event from Coordinator       | NULL                        |
|     109 | system user     |                      | NULL | Connect | 47085737 | Waiting for an event from Coordinator       | NULL                        |
|     110 | system user     |                      | NULL | Connect | 47085737 | Waiting for an event from Coordinator       | NULL                        |
|     111 | system user     |                      | NULL | Connect | 47085737 | Waiting for an event from Coordinator       | NULL                        |
|     112 | system user     |                      | NULL | Connect | 47085737 | Waiting for an event from Coordinator       | NULL                        |
|     113 | system user     |                      | NULL | Connect | 47085737 | Waiting for an event from Coordinator       | NULL                        |
|     114 | system user     |                      | NULL | Connect | 47085737 | Waiting for an event from Coordinator       | NULL                        |
|     115 | system user     |                      | NULL | Connect | 47085737 | Waiting for an event from Coordinator       | NULL                        |
|     116 | system user     |                      | NULL | Connect | 47085737 | Waiting for an event from Coordinator       | NULL                        |
|     117 | system user     |                      | NULL | Connect | 47085737 | Waiting for an event from Coordinator       | NULL                        |
|     118 | system user     |                      | NULL | Connect | 47085737 | Waiting for an event from Coordinator       | NULL                        |
|     119 | system user     |                      | NULL | Connect | 47085737 | Waiting for an event from Coordinator       | NULL                        |
|     120 | system user     |                      | NULL | Connect | 47085737 | Waiting for an event from Coordinator       | NULL                        |
|     121 | system user     |                      | NULL | Connect | 47085737 | Waiting for an event from Coordinator       | NULL                        |
|     122 | system user     |                      | NULL | Connect | 47085737 | Waiting for an event from Coordinator       | NULL                        |
| 4409099 | backupxxx       | xxx.xxx.xxx.xx:48300 | NULL | Query   |   112222 | Waiting for commit lock                     | FLUSH TABLES WITH READ LOCK |
+---------+-----------------+----------------------+------+---------+----------+---------------------------------------------+-----------------------------+
根据当时的会话状态我们可以得知是发生了元数据锁等待,不过由于这个实例并没有开启元数据锁跟踪,所以并不能通过metadata_locks表来确认元数据锁的详细情况。不过有经验的DBA通过会话状态及故障现象应该能够猜测出来是发生了并行复制死锁,所以当时为了保留现场并快速恢复复制,就打了一个pstack便于后面详细分析问题,并通过kill掉当时的备份任务和会话来解除死锁状态,使复制能够正常运行下去。
接下来我们来分析一下当时的堆栈,还原一下当时MySQL的状态及问题发生的原因。
将堆栈归类,并将可能对我们分析问题有用的线程堆栈提取整理后如下:
#1. Waiting for dependent transaction to commit
pthread_cond_wait,native_cond_wait(thr_cond.h:140),my_cond_wait(thr_cond.h:140),inline_mysql_cond_wait(thr_cond.h:140),Mts_submode_logical_clock::wait_for_last_committed_trx(thr_cond.h:140),
Mts_submode_logical_clock::schedule_next_event(rpl_mts_submode.cc:657),schedule_next_event(log_event.cc:2768),Log_event::get_slave_worker(log_event.cc:2909),Log_event::apply_event(log_event.cc:3468),
apply_event_and_update_pos(rpl_slave.cc:4713),exec_relay_log_event(rpl_slave.cc:5240),handle_slave_sql(rpl_slave.cc:5240),pfs_spawn_thread(pfs.cc:2190),start_thread(libpthread.so.0),clone(libc.so.6) #2. 持有意向全局读锁(IX) Waiting for preceding transaction to commit pthread_cond_wait,native_cond_wait(thr_cond.h:140),my_cond_wait(thr_cond.h:140),inline_mysql_cond_wait(thr_cond.h:140),Commit_order_manager::wait_for_its_turn(thr_cond.h:140),
MYSQL_BIN_LOG::ordered_commit(binlog.cc:9245),MYSQL_BIN_LOG::commit(binlog.cc:8529),ha_commit_trans(handler.cc:1796),trans_commit(transaction.cc:239),Xid_log_event::do_commit(log_event.cc:7016),
Xid_apply_log_event::do_apply_event_worker(log_event.cc:7111),slave_worker_exec_job_group(rpl_rli_pdb.cc:2594),handle_slave_worker(rpl_slave.cc:6161),pfs_spawn_thread(pfs.cc:2190),
start_thread(libpthread.so.0),clone(libc.so.6) #3. STOP SLAVE 等待worker完成当前任务并退出 pthread_cond_timedwait,native_cond_timedwait(thr_cond.h:129),my_cond_timedwait(thr_cond.h:129),inline_mysql_cond_timedwait(thr_cond.h:129),terminate_slave_thread(thr_cond.h:129),
terminate_slave_thread(rpl_slave.cc:1665),terminate_slave_threads(rpl_slave.cc:1665),stop_slave(rpl_slave.cc:10172),stop_slave(rpl_slave.cc:618),stop_slave_cmd(rpl_slave.cc:759),
mysql_execute_command(sql_parse.cc:3438),mysql_parse(sql_parse.cc:5582),dispatch_command(sql_parse.cc:1458),do_command(sql_parse.cc:999),handle_connection(connection_handler_per_thread.cc:300),
pfs_spawn_thread(pfs.cc:2190),start_thread(libpthread.so.0),clone(libc.so.6) #4. 等待获取意向全局读锁(IX) Waiting for commit lock pthread_cond_timedwait,native_cond_timedwait(thr_cond.h:129),my_cond_timedwait(thr_cond.h:129),inline_mysql_cond_timedwait(thr_cond.h:129),MDL_wait::timed_wait(thr_cond.h:129),
MDL_context::acquire_lock(mdl.cc:3695),ha_commit_trans(handler.cc:1761),trans_commit(transaction.cc:239),Xid_log_event::do_commit(log_event.cc:7016),
Xid_apply_log_event::do_apply_event_worker(log_event.cc:7111),slave_worker_exec_job_group(rpl_rli_pdb.cc:2594),handle_slave_worker(rpl_slave.cc:6161),
pfs_spawn_thread(pfs.cc:2190),start_thread(libpthread.so.0),clone(libc.so.6) #5. 备份线程 等待获取全局读锁(X) Waiting for commit lock pthread_cond_timedwait,native_cond_timedwait(thr_cond.h:129),my_cond_timedwait(thr_cond.h:129),inline_mysql_cond_timedwait(thr_cond.h:129),MDL_wait::timed_wait(thr_cond.h:129),
MDL_context::acquire_lock(mdl.cc:3695),Global_read_lock::make_global_read_lock_block_commit(lock.cc:1196),reload_acl_and_cache(sql_reload.cc:234),mysql_execute_command(sql_parse.cc:4170),
mysql_parse(sql_parse.cc:5582),dispatch_command(sql_parse.cc:1458),do_command(sql_parse.cc:999),handle_connection(connection_handler_per_thread.cc:300),pfs_spawn_thread(pfs.cc:2190),
start_thread(libpthread.so.0),clone(libc.so.6)
我们给这5个线程编了号并大致说明了一下线程的状态,接下来我们详细分析一下每个线程的状态。
先看第一个线程#1所处的函数为Mts_submode_logical_clock::wait_for_last_committed_trx:
    thd->ENTER_COND(&rli->logical_clock_cond, &rli->mts_gaq_LOCK,
                    &stage_worker_waiting_for_commit_parent, &old_stage);
    do
    {
      mysql_cond_wait(&rli->logical_clock_cond, &rli->mts_gaq_LOCK);
    }
    while ((!rli->info_thd->killed && !is_error) &&
           !clock_leq(last_committed_arg, estimate_lwm_timestamp()));
其所处的stage为“Waiting for dependent transaction to commit”。目前其停留在do...while循环中,其正常需要满足两个条件才能跳出循环,第一:last_committed <= lwm,即满足事务满足并行回放的条件;第二:接收到logical_clock_cond条件变量,logical_clock_cond条件变量为worker回放完事务后会判断当前是否还有事务等待,如果存在则发送logical_clock_cond条件变量唤醒等待的coordinator线程。
线程#2所处的函数为Commit_order_manager::wait_for_its_turn,该函数被函数MYSQL_BIN_LOG::ordered_commit调用,其逻辑处于两阶段提交的FLSH STAGE之前,用于判断从库提交事务时是否需要按照和主库一样的顺序进行提交(slave_preserve_commit_order)。线程所处的位置为:
    while (queue_front() != worker->id)
    {
      if (unlikely(worker->found_order_commit_deadlock()))
      {
        mysql_mutex_unlock(&m_mutex);
        thd->EXIT_COND(&old_stage);
        DBUG_RETURN(true);
      }
      mysql_cond_wait(cond, &m_mutex);
    }
其也处在一个while循环中,其跳出循环的条件为只有当前worker是worker队列的头部时才可以。worker应用事务时,不仅会把事务分配给worker,还会将worker注册到工作队列中,其入队的堆栈如下:
#0  Commit_order_manager::queue_push (this=0x7fb1a8007f30, index=0) at /soft/mysql-5.7.21/sql/rpl_slave_commit_order_manager.h:123
#1  0x000000000183ba79 in Commit_order_manager::register_trx (this=0x7fb1a8007f30, worker=0x7fb1a8022a60) at /soft/mysql-5.7.21/sql/rpl_slave_commit_order_manager.cc:49
#2  0x000000000183adb9 in Mts_submode_logical_clock::get_least_occupied_worker (this=0x7fb1a8007ee0, rli=0xb340c70, ws=0xb343fa8, ev=0x7fb1a8691ce0)
    at /soft/mysql-5.7.21/sql/rpl_mts_submode.cc:916
#3  0x00000000017978b9 in Log_event::get_slave_worker (this=0x7fb1a8691ce0, rli=0xb340c70) at /soft/mysql-5.7.21/sql/log_event.cc:2954
#4  0x0000000001798f77 in Log_event::apply_event (this=0x7fb1a8691ce0, rli=0xb340c70) at /soft/mysql-5.7.21/sql/log_event.cc:3468
#5  0x000000000180b5a4 in apply_event_and_update_pos (ptr_ev=0x7fb255aca8c0, thd=0x7fb1a8000950, rli=0xb340c70) at /soft/mysql-5.7.21/sql/rpl_slave.cc:4713
#6  0x000000000180cc3c in exec_relay_log_event (thd=0x7fb1a8000950, rli=0xb340c70) at /soft/mysql-5.7.21/sql/rpl_slave.cc:5240
#7  0x00000000018131e3 in handle_slave_sql (arg=0xb2a31b0) at /soft/mysql-5.7.21/sql/rpl_slave.cc:7377
#8  0x0000000001ce0555 in pfs_spawn_thread (arg=0x7fb1b00ddab0) at /soft/mysql-5.7.21/storage/perfschema/pfs.cc:2190
#9  0x00007fb2f7745ea5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fb2f660bb0d in clone () from /lib64/libc.so.6
其出队的堆栈如下:
#0  Commit_order_manager::queue_pop (this=0x7fb1a8007f30) at /soft/mysql-5.7.21/sql/rpl_slave_commit_order_manager.h:111
#1  0x000000000183bee9 in Commit_order_manager::unregister_trx (this=0x7fb1a8007f30, worker=0x7fb1a8022a60) at /soft/mysql-5.7.21/sql/rpl_slave_commit_order_manager.cc:126
#2  0x00000000017cd0a4 in Stage_manager::enroll_for (this=0x2d1c418 <mysql_bin_log+4568>, stage=Stage_manager::FLUSH_STAGE, thd=0x7fb1ac000950, stage_mutex=0x0)
    at /soft/mysql-5.7.21/sql/binlog.cc:2000
#3  0x00000000017dd652 in MYSQL_BIN_LOG::change_stage (this=0x2d1b240 <mysql_bin_log>, thd=0x7fb1ac000950, stage=Stage_manager::FLUSH_STAGE, queue=0x7fb1ac000950, leave_mutex=0x0,
    enter_mutex=0x2d1b248 <mysql_bin_log+8>) at /soft/mysql-5.7.21/sql/binlog.cc:8854
#4  0x00000000017de2f6 in MYSQL_BIN_LOG::ordered_commit (this=0x2d1b240 <mysql_bin_log>, thd=0x7fb1ac000950, all=false, skip_commit=false) at /soft/mysql-5.7.21/sql/binlog.cc:9251
#5  0x00000000017dcbbe in MYSQL_BIN_LOG::commit (this=0x2d1b240 <mysql_bin_log>, thd=0x7fb1ac000950, all=false) at /soft/mysql-5.7.21/sql/binlog.cc:8529
#6  0x0000000001632b78 in trans_commit_stmt (thd=0x7fb1ac000950) at /soft/mysql-5.7.21/sql/transaction.cc:463
#7  0x000000000153895a in mysql_execute_command (thd=0x7fb1ac000950, first_level=true) at /soft/mysql-5.7.21/sql/sql_parse.cc:5004
#8  0x0000000001539f91 in mysql_parse (thd=0x7fb1ac000950, parser_state=0x7fb255a88a80) at /soft/mysql-5.7.21/sql/sql_parse.cc:5582
#9  0x000000000179cf62 in Query_log_event::do_apply_event (this=0x7fb1a8691ce0, rli=0x7fb1a8022a60, query_arg=0x7fb1a869f2d7 "create table t(id int)", q_len_arg=22)
    at /soft/mysql-5.7.21/sql/log_event.cc:4711
#10 0x000000000179be51 in Query_log_event::do_apply_event (this=0x7fb1a8691ce0, rli=0x7fb1a8022a60) at /soft/mysql-5.7.21/sql/log_event.cc:4431
#11 0x00000000017bf204 in Log_event::do_apply_event_worker (this=0x7fb1a8691ce0, w=0x7fb1a8022a60) at /soft/mysql-5.7.21/sql/log_event.cc:758
#12 0x0000000001834bee in Slave_worker::slave_worker_exec_event (this=0x7fb1a8022a60, ev=0x7fb1a8691ce0) at /soft/mysql-5.7.21/sql/rpl_rli_pdb.cc:1833
#13 0x0000000001836bb8 in slave_worker_exec_job_group (worker=0x7fb1a8022a60, rli=0xb340c70) at /soft/mysql-5.7.21/sql/rpl_rli_pdb.cc:2594
#14 0x000000000180f9aa in handle_slave_worker (arg=0x7fb1a8022a60) at /soft/mysql-5.7.21/sql/rpl_slave.cc:6161
#15 0x0000000001ce0555 in pfs_spawn_thread (arg=0x7fb1a8027000) at /soft/mysql-5.7.21/storage/perfschema/pfs.cc:2190
#16 0x00007fb2f7745ea5 in start_thread () from /lib64/libpthread.so.0
即事务在ordered_commit中当在切换到FLUSH STAGE的过程中会进行出队,因为此时已经可以通过FLSUH STAGE的&LOCK_LOG互斥量来保证该事务组一定会比后续的事务组先提交。
线程#3是我们下发STOP SLAVE命令的线程,其处于terminate_slave_thread函数中,其正在退出SQL线程:
  while (*slave_running)                        // Should always be true
  {
    DBUG_PRINT("loop", ("killing slave thread"));

    mysql_mutex_lock(&thd->LOCK_thd_data);
    /*
      Error codes from pthread_kill are:
      EINVAL: invalid signal number (can't happen)
      ESRCH: thread already killed (can happen, should be ignored)
    */
#ifndef _WIN32
    int err MY_ATTRIBUTE((unused))= pthread_kill(thd->real_id, SIGUSR1);
    DBUG_ASSERT(err != EINVAL);
#endif
    thd->awake(THD::NOT_KILLED);
    mysql_mutex_unlock(&thd->LOCK_thd_data);

    /*
      There is a small chance that slave thread might miss the first
      alarm. To protect againts it, resend the signal until it reacts
    */
    struct timespec abstime;
    set_timespec(&abstime,2);
#ifndef DBUG_OFF
    int error=
#endif
      mysql_cond_timedwait(term_cond, term_lock, &abstime);    // 在这等待
    if ((*stop_wait_timeout) >= 2)
      (*stop_wait_timeout)= (*stop_wait_timeout) - 2;
    else if (*slave_running)
    {
      if (need_lock_term)
        mysql_mutex_unlock(term_lock);
      DBUG_RETURN (1);
    }
    DBUG_ASSERT(error == ETIMEDOUT || error == 0);
  }

  DBUG_ASSERT(*slave_running == 0);

  if (need_lock_term)
    mysql_mutex_unlock(term_lock);
  DBUG_RETURN(0);
}
其也是处于一个while循环中,循环判断接收线程退出条件变量term_cond,退出条件变量的触发只有当worker线程发现了mi->abort_slave = true退出标记并完成当前的event_group之后才会退出(详细分析可见https://www.cnblogs.com/wagaga/p/17286379.html),所以如果当前worker线程存在未完成的event,用户的STOP SLAVE命令就无法执行完成。
线程#4正处于ha_commit_trans函数中,正在等待获取commit lock:
if (rw_trans && !ignore_global_read_lock)
    {
      /*
        Acquire a metadata lock which will ensure that COMMIT is blocked
        by an active FLUSH TABLES WITH READ LOCK (and vice versa:
        COMMIT in progress blocks FTWRL).

        We allow the owner of FTWRL to COMMIT; we assume that it knows
        what it does.
      */
      MDL_REQUEST_INIT(&mdl_request,
                       MDL_key::COMMIT, "", "", MDL_INTENTION_EXCLUSIVE,
                       MDL_EXPLICIT);

      DBUG_PRINT("debug", ("Acquire MDL commit lock"));
      if (thd->mdl_context.acquire_lock(&mdl_request,
                                        thd->variables.lock_wait_timeout))
      {
        ha_rollback_trans(thd, all);
        DBUG_RETURN(1);
      }
      release_mdl= true;

      DEBUG_SYNC(thd, "ha_commit_trans_after_acquire_commit_lock");
}
即事务在提交之前需要先获取意向独占元数据锁(commit lock),如果当前实例存在全局读锁,则事务就会被阻塞,不能提交。
此时让我们再回头看线程#2,发现其堆栈中间也有ha_commit_trans函数,说明其已经获取到了意向独占元数据锁。
最后看线程#5,其很简单,就是由于xtrabackup备份而触发的需要获取全局读锁,且正处于等待状态。
此时,我们把待分析的线程都分析了一遍,这时候我们就可以理清他们之间的关系了:
  1. 线程#1可能在等待线程#2,也可能在等待线程#4,不过不重要,因为它也只是受害者
  2. 线程#2持有元数据意向独占锁(IX),并等待线程#4的提交
  3. 线程#3等待线程#2和#4完成后停止复制线程,不重要,受害者
  4. 线程#4想要获取意向独占锁(IX),但是却被阻塞,所以在等待线程#5
  5. 线程#5想要获取独占锁(X),但是却被阻塞,所以在等待线程#2
此时死锁就产生了,线程#2由于slave_preserve_commit_order的原因需要等待线程#4,而线程#4等待线程#5,线程#5又等待线程#2。所以我们在最开始kill掉备份会话(线程#5)就可以解决该问题了。
posted @ 2024-01-18 17:54  吃饭端住碗  阅读(139)  评论(0编辑  收藏  举报