20.锁的监控与处理浅谈
1.前言
在Mysql中少不了与各式各样的锁打交道,比如说共享锁与排他锁等等,其中有时我们可能会遇到锁等待的情况,比如说同时开启两个事务对某一行记录进行修改,其中一个事务修改完但是未提交,另一个事务也修改,那么此事它就会遇到锁等待的情况(就是阻塞-->block)
2. show status like 'innodb_row_lock%'
root@localhost 17:19: [liulin]> show status like 'innodb_row_lock%'; +-------------------------------+-------+ | Variable_name | Value | +-------------------------------+-------+ | Innodb_row_lock_current_waits | 1 | # 当前处于锁等待的个数 | Innodb_row_lock_time | 51068 | #InnoDB 表等待获取行锁的总时间开销,单位毫秒,
| Innodb_row_lock_time_avg | 25534 | | Innodb_row_lock_time_max | 51068 | | Innodb_row_lock_waits | 2 | # 这个表示数据库从启动开始直到现在共发生锁等待的次数(包括目前正在处于锁等待的个数) +-------------------------------+-------+
3. select * from information_schema.innodb_trx where trx_state='lock wait'\G;
root@localhost 20:28: [information_schema]> select * from information_schema.innodb_trx where trx_state='lock wait'\G; *************************** 1. row *************************** trx_id: 35182 trx_state: LOCK WAIT trx_started: 2021-07-31 17:18:11 trx_requested_lock_id: 35182:116:3:11 trx_wait_started: 2021-07-31 20:27:29 trx_weight: 2 trx_mysql_thread_id: 37 trx_query: update t1 set name='liulin1' where id=1 trx_operation_state: starting index read trx_tables_in_use: 1 trx_tables_locked: 1 trx_lock_structs: 2 trx_lock_memory_bytes: 1136 trx_rows_locked: 4 trx_rows_modified: 0 trx_concurrency_tickets: 0 trx_isolation_level: REPEATABLE READ trx_unique_checks: 1 trx_foreign_key_checks: 1 trx_last_foreign_key_error: NULL trx_adaptive_hash_latched: 0 trx_adaptive_hash_timeout: 0 trx_is_read_only: 0 trx_autocommit_non_locking: 0 1 row in set (0.00 sec)
- 这个表示查找处于锁等待的事务的详细信息,通俗的说是它目前想要获取写锁,但是该锁被被的事务提前一步获取且还未释放,因此它想要获取就处于了阻塞状态了(blocked).
- 一般这里我们我们会设置一下锁超时等待时间的,参数为innodb_lock_wait_timeout,默认是50s,表示innoDB 事务请求行锁的超时时间限制参数
- select * from information_schema.innodb_trx :可以查找出当前数据库中所有未提交的事务
4. select * from sys.innodb_lock_waits; 查看被锁事务和锁源事务的信息
root@localhost 20:48: [sys]> select * from innodb_lock_waits \G; *************************** 1. row *************************** wait_started: 2021-07-31 20:48:26 wait_age: 00:00:08 wait_age_secs: 8 locked_table: `liulin`.`t1` locked_index: PRIMARY locked_type: RECORD waiting_trx_id: 35182 waiting_trx_started: 2021-07-31 17:18:11 waiting_trx_age: 03:30:23 waiting_trx_rows_locked: 7 waiting_trx_rows_modified: 0 waiting_pid: 37 waiting_query: update t1 set name='liulin1' where id=1 waiting_lock_id: 35182:116:3:11 waiting_lock_mode: X blocking_trx_id: 35181 blocking_pid: 36 blocking_query: NULL blocking_lock_id: 35181:116:3:11 blocking_lock_mode: X blocking_trx_started: 2021-07-31 17:17:12 blocking_trx_age: 03:31:22 blocking_trx_rows_locked: 1 blocking_trx_rows_modified: 1 sql_kill_blocking_query: KILL QUERY 36 sql_kill_blocking_connection: KILL 36 1 row in set, 3 warnings (0.00 sec)
其中:waiting_trx_id:被锁的事务id waiting_pid :被锁的线程id
blocking_trx_id:锁源的事务id blocking_pid :表示锁源的线程id
5. select * from performance_schema.threads where processlist_id='xxx'
root@localhost 21:01: [performance_schema]> select * from threads where processlist_id=36\G; *************************** 1. row *************************** THREAD_ID: 62 NAME: thread/sql/one_connection TYPE: FOREGROUND PROCESSLIST_ID: 36 PROCESSLIST_USER: root PROCESSLIST_HOST: localhost PROCESSLIST_DB: liulin PROCESSLIST_COMMAND: Sleep PROCESSLIST_TIME: 13441 PROCESSLIST_STATE: NULL PROCESSLIST_INFO: update t1 set name='liulin1' where id=1 PARENT_THREAD_ID: NULL ROLE: NULL INSTRUMENTED: YES HISTORY: YES CONNECTION_TYPE: Socket THREAD_OS_ID: 4548 1 row in set (0.00 sec)
6. select * from performance_schema.events_statements_current where thread_id='xxx';
root@localhost 22:24: [performance_schema]> select * from performance_schema.events_statements_current where thread_id='62'\G; *************************** 1. row *************************** THREAD_ID: 62 EVENT_ID: 46 END_EVENT_ID: 46 EVENT_NAME: statement/sql/update SOURCE: TIMER_START: 7601338054145000 TIMER_END: 7601339393285000 TIMER_WAIT: 1339140000 LOCK_TIME: 277000000 SQL_TEXT: update t1 set name='liulin1' where id=1 DIGEST: e63566e2378e6dbf044cfe5345a79a9a DIGEST_TEXT: UPDATE `t1` SET NAME = ? WHERE `id` = ? CURRENT_SCHEMA: liulin OBJECT_TYPE: NULL OBJECT_SCHEMA: NULL OBJECT_NAME: NULL OBJECT_INSTANCE_BEGIN: NULL MYSQL_ERRNO: 0 RETURNED_SQLSTATE: 00000 MESSAGE_TEXT: Rows matched: 1 Changed: 1 Warnings: 0 ERRORS: 0 WARNINGS: 0 ROWS_AFFECTED: 1 ROWS_SENT: 0 ROWS_EXAMINED: 1 CREATED_TMP_DISK_TABLES: 0 CREATED_TMP_TABLES: 0 SELECT_FULL_JOIN: 0 SELECT_FULL_RANGE_JOIN: 0 SELECT_RANGE: 0 SELECT_RANGE_CHECK: 0 SELECT_SCAN: 0 SORT_MERGE_PASSES: 0 SORT_RANGE: 0 SORT_ROWS: 0 SORT_SCAN: 0 NO_INDEX_USED: 0 NO_GOOD_INDEX_USED: 0 NESTING_EVENT_ID: NULL NESTING_EVENT_TYPE: NULL NESTING_EVENT_LEVEL: 0 1 row in set (0.00 sec)
总结:
show status like 'innodb_row_lock%' ---> select * from information_schema.innodb.trx where trx_state='lock wait' -->
select * from sys.innodb_lock_waits ---> select * from performance_schema.threads where processlist_id='xxx' --->
select * from performance_schema.events_statements_current where thread_id='xxx';
其中,第二个命令如果是只是查看锁等待的事务的详细信息的话,我们可以直接用第三个命令代替,因为第二个命令还可以查看所有状态的事务的详细信息。然后第三个命令也是查看锁源事务和被锁事务的一些详细信息,然后第四个命令可以针对连接线程id查看其内部的Mysql线程id,最后第五个命令最后再根据第四个查到的Mysql内部的线程id查看到该线程id对应的当前线程id中对应的事件。
案例篇:
背景:
硬件环境:DELL R720 , 16核,48G,SAS 900G, RAID10
在巡检的时候发现,在某个时间段的cpu压力非常高
排查和处理过程
1.通过top命令查看,发现Mysqld进程占比非常高
2.用vmstat查看发现大量的CPU是被用作了sys和wait,us处于正常
3,这时怀疑是Mysql的锁或者sql语句出现了问题
4.通过show status like 'innodb_row_lock%'查看有大量的所等待以及通过查看慢日志发现有少量的慢语句
5.其中关于慢日志可以通过pt-query-disget工具进行分析