代码改变世界

Performance Schema: Measure Multi-Threaded Slave Activity

2022-08-03 14:53  abce  阅读(33)  评论(0编辑  收藏  举报

Performance Schema

In many types of database workloads, using a multi-threaded slave from 5.6+ helps improve replication performance. I’ve had a number of users enable this feature, but have not seen anyone ask how each thread is performing. Here’s a quick way with Performance_Schema to measure the amount of multi-threaded slave activity on each thread—after you have already configured MTS on your slave of course.

First, we need to enable the statements instruments:

slave1> UPDATE setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE 'events_statements_%'; 
Query OK, 2 rows affected (0.00 sec) 
Rows matched: 3  Changed: 2  Warnings: 0 

Next, let’s find the THREAD_ID for our slave workers:

slave1> SELECT THREAD_ID, NAME FROM threads WHERE NAME LIKE '%worker';
+-----------+-------------------------+
| THREAD_ID | NAME                    |
+-----------+-------------------------+ 
|        28 | thread/sql/slave_worker | 
|        29 | thread/sql/slave_worker | 
|        30 | thread/sql/slave_worker | 
|        31 | thread/sql/slave_worker | 
+-----------+-------------------------+ 
4 rows in set (0.00 sec) 

And then we capture measurements:

slave1> SELECT THREAD_ID, EVENT_NAME, COUNT_STAR FROM events_statements_summary_by_thread_by_event_name WHERE THREAD_ID IN (28,29,30,31) AND COUNT_STAR > 0; 
+-----------+----------------------+------------+ 
| THREAD_ID | EVENT_NAME           | COUNT_STAR | 
+-----------+----------------------+------------+ 
|        28 | statement/sql/update |         48 | 
|        28 | statement/sql/insert |         24 | 
|        28 | statement/sql/delete |         24 | 
|        28 | statement/sql/begin  |         24 | 
|        29 | statement/sql/update |         68 | 
|        29 | statement/sql/insert |         34 | 
|        29 | statement/sql/delete |         34 | 
|        29 | statement/sql/begin  |         34 | 
|        30 | statement/sql/update |       1864 | 
|        30 | statement/sql/insert |        932 | 
|        30 | statement/sql/delete |        932 | 
|        30 | statement/sql/begin  |        932 | 
|        31 | statement/sql/update |      40320 | 
|        31 | statement/sql/insert |      20160 | 
|        31 | statement/sql/delete |      20160 | 
|        31 | statement/sql/begin  |      20160 | 
+-----------+----------------------+------------+ 
16 rows in set (0.01 sec)
  

As you can see above, one of the multi-threadedthreads is doing more work compared to the other three. This is a sysbench on the master in the database doing [--trx-rate=10], compared to the other three which are only doing [--trx-rate=2|5].

Below is another example, this time including the event times. With a slightly modified sysbench test, I’ve designed the tests to send large updates but one with a low transaction rate, another with a high transaction rate, and the rest in between. You can see specifically in threads 32 and 34, that the former is performing a lower amount of [UPDATES], but spending more time per event.

slave1> SELECT THREAD_ID, EVENT_NAME, COUNT_STAR,
->   SUM_TIMER_WAIT/1000000 AS SUM_TIMER_WAIT_MS,
    ->   (SUM_TIMER_WAIT/COUNT_STAR)/1000000 AS AVG_TIME_MS
    -> FROM events_statements_summary_by_thread_by_event_name
    -> WHERE THREAD_ID IN (31,32,33,34) AND COUNT_STAR > 0; 
+-----------+----------------------+------------+-------------------+--------------+ 
| THREAD_ID | EVENT_NAME           | COUNT_STAR | SUM_TIMER_WAIT_MS | AVG_TIME_MS  | 
+-----------+----------------------+------------+-------------------+--------------+ 
|        31 | statement/sql/update |          3 |          403.9690 | 134.65633333 | 
|        31 | statement/sql/insert |          1 |           91.5340 |  91.53400000 | 
|        31 | statement/sql/delete |          1 |           75.9540 |  75.95400000 | 
|        31 | statement/sql/begin  |          1 |           11.6800 |  11.68000000 | 
|        32 | statement/sql/update |         75 |        25298.5090 | 337.31345333 | 
|        32 | statement/sql/insert |         25 |         2221.1410 |  88.84564000 | 
|        32 | statement/sql/delete |         25 |         2187.8970 |  87.51588000 | 
|        32 | statement/sql/begin  |         25 |          321.0220 |  12.84088000 | 
|        33 | statement/sql/update |       4728 |      1008736.6000 | 213.35376481 | 
|        33 | statement/sql/insert |       1576 |       138815.0570 |  88.08061992 | 
|        33 | statement/sql/delete |       1576 |       136161.9060 |  86.39714848 | 
|        33 | statement/sql/begin  |       1576 |        20498.2300 |  13.00649112 | 
|        34 | statement/sql/update |      70668 |     12304784.2380 | 174.12101995 | 
|        34 | statement/sql/insert |      23556 |      2083454.5350 |  88.44687277 | 
|        34 | statement/sql/delete |      23556 |      2041202.7690 |  86.65319957 | 
|        34 | statement/sql/begin  |      23556 |       303710.4860 |  12.89312642 | 
+-----------+----------------------+------------+-------------------+--------------+ 
16 rows in set (0.00 sec)

Exploring a bit further, one thing I noticed is that when using [binlog_format=ROW] for replication, you will only see the [BEGIN] events instrumented from [performance_schema]. I think this is a bug so I reported it here.

slave1> SELECT THREAD_ID, EVENT_NAME, COUNT_STAR,     
->   SUM_TIMER_WAIT/1000000 AS SUM_TIMER_WAIT_MS,
    ->   (SUM_TIMER_WAIT/COUNT_STAR)/1000000 AS AVG_TIME_MS
    -> FROM events_statements_summary_by_thread_by_event_name
    -> WHERE THREAD_ID IN (40,41,42,43) AND COUNT_STAR > 0; 
+-----------+---------------------+------------+-------------------+-------------+ 
| THREAD_ID | EVENT_NAME          | COUNT_STAR | SUM_TIMER_WAIT_MS | AVG_TIME_MS | 
+-----------+---------------------+------------+-------------------+-------------+ 
|        40 | statement/sql/begin |         16 |          258.6500 | 16.16562500 | 
|        41 | statement/sql/begin |         91 |         1526.4320 | 16.77397802 | 
|        42 | statement/sql/begin |        589 |        10131.4500 | 17.20110357 | 
|        43 | statement/sql/begin |       5022 |        85584.0250 | 17.04182099 | 
+-----------+---------------------+------------+-------------------+-------------+ 
4 rows in set (0.01 sec)

With MySQL 5.7, MTS has an additional improvement with how parallel replication is applied. By default, transactions are applied in parallel per database. With the new [slave_parallel_type] feature, another option called [LOGICAL_CLOCK]  is introduced that allows parallelization intra-database as well. I did some small tests below, with same workload as above. First with the default [DATABASE] type:

+-----------+----------------------+------------+-------------------+--------------+
| THREAD_ID | EVENT_NAME           | COUNT_STAR | SUM_TIMER_WAIT_MS | AVG_TIME_MS  | 
+-----------+----------------------+------------+-------------------+--------------+ 
|        65 | statement/sql/update |         51 |         9974.1310 | 195.57119608 | 
|        65 | statement/sql/insert |         17 |         2037.7080 | 119.86517647 | 
|        65 | statement/sql/delete |         17 |         2144.4800 | 126.14588235 | 
|        65 | statement/sql/begin  |         17 |          293.0650 |  17.23911765 | 
|        66 | statement/sql/update |        255 |        67131.4980 | 263.26077647 |
|        66 | statement/sql/insert |         85 |         9629.5520 | 113.28884706 |
|        66 | statement/sql/delete |         85 |         9854.2920 | 115.93284706 | 
|        66 | statement/sql/begin  |         85 |         1405.0200 |  16.52964706 | 
|        67 | statement/sql/update |       1215 |       344712.6330 | 283.71410123 | 
|        67 | statement/sql/insert |        405 |        48000.0110 | 118.51854568 | 
|        67 | statement/sql/delete |        405 |        53222.3010 | 131.41308889 | 
|        67 | statement/sql/begin  |        405 |         6836.9070 |  16.88125185 | 
|        68 | statement/sql/update |       5943 |      1820669.3290 | 306.35526317 | 
|        68 | statement/sql/insert |       1981 |       241513.1400 | 121.91476022 | 
|        68 | statement/sql/delete |       1981 |       245022.2450 | 123.68614084 | 
|        68 | statement/sql/begin  |       1981 |        32978.3390 |  16.64731903 | 
+-----------+----------------------+------------+-------------------+--------------+

And then with [LOGICAL_CLOCK]:

+-----------+----------------------+------------+-------------------+--------------+
| THREAD_ID | EVENT_NAME           | COUNT_STAR | SUM_TIMER_WAIT_MS | AVG_TIME_MS  |
+-----------+----------------------+------------+-------------------+--------------+
|        59 | statement/sql/update |       9486 |      2667681.7170 | 281.22303574 |
|        59 | statement/sql/insert |       3162 |       376350.1650 | 119.02282258 |
|        59 | statement/sql/delete |       3162 |       388606.5460 | 122.89897090 |
|        59 | statement/sql/begin  |       3162 |        53988.2600 |  17.07408602 |
|        60 | statement/sql/update |        903 |       362853.2080 | 401.83079513 |
|        60 | statement/sql/insert |        301 |        36507.3090 | 121.28674086 |
|        60 | statement/sql/delete |        301 |        37437.2550 | 124.37626246 |
|        60 | statement/sql/begin  |        301 |         5210.4110 |  17.31033555 |
|        61 | statement/sql/update |         42 |        23394.0330 | 557.00078571 |
|        61 | statement/sql/insert |         14 |         1671.5900 | 119.39928571 |
|        61 | statement/sql/delete |         14 |         1720.1230 | 122.86592857 |
|        61 | statement/sql/begin  |         14 |          246.1490 |  17.58207143 |
+-----------+----------------------+------------+-------------------+--------------+

With [LOGICAL_LOCK] and [slave_preserve_commit_order] enabled:

+-----------+----------------------+------------+-------------------+--------------+
| THREAD_ID | EVENT_NAME           | COUNT_STAR | SUM_TIMER_WAIT_MS | AVG_TIME_MS  |
+-----------+----------------------+------------+-------------------+--------------+
|        71 | statement/sql/update |       8097 |      2216461.7170 | 273.73863369 |
|        71 | statement/sql/insert |       2699 |       322933.1620 | 119.64918933 |
|        71 | statement/sql/delete |       2699 |       326944.2380 | 121.13532345 |
|        71 | statement/sql/begin  |       2699 |        45860.5720 |  16.99169026 |
|        72 | statement/sql/update |        807 |       256668.2730 | 318.05238290 |
|        72 | statement/sql/insert |        269 |        32952.5570 | 122.50021190 |
|        72 | statement/sql/delete |        269 |        33346.3060 | 123.96396283 |
|        72 | statement/sql/begin  |        269 |         4650.1010 |  17.28662082 |
|        73 | statement/sql/update |         33 |         6823.1170 | 206.76112121 |
|        73 | statement/sql/insert |         11 |         1512.5810 | 137.50736364 |
|        73 | statement/sql/delete |         11 |         1593.5790 | 144.87081818 |
|        73 | statement/sql/begin  |         11 |          188.3910 |  17.12645455 |
+-----------+----------------------+------------+-------------------+--------------+

Combining the [INSERT], [UPDATE], and [DELETE] counts from these 3 tests, [LOGICAL_CLOCK] tops with around 17k events within 120 seconds of the same workload. The [DATABASE]  type was the slowest, recording only about 12k events committed on the slave.

Lastly, with MySQL 5.7 a new instrument has been added and enabled for metrics to be collected. You can also measure the per-transaction rate instead of per event class.

slave1> UPDATE setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE 'vents_transactions%'; 
Query OK, 0 rows affected (0.00 sec) 
Rows matched: 0  Changed: 0  Warnings: 0   

slave1> UPDATE setup_instruments SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME = 'transaction'; 
Query OK, 1 row affected (0.00 sec) 
Rows matched: 1  Changed: 1  Warnings: 0

And a quick measurement:

slave1> SELECT THREAD_ID, EVENT_NAME, COUNT_STAR,
->   SUM_TIMER_WAIT/1000000 AS SUM_TIMER_WAIT_MS,     
    ->   (SUM_TIMER_WAIT/COUNT_STAR)/1000000 AS AVG_TIME_MS     
    -> FROM events_transactions_summary_by_thread_by_event_name     
    -> WHERE THREAD_ID IN (     
    ->     SELECT THREAD_ID FROM replication_applier_status_by_worker     
    -> ) AND COUNT_STAR > 0; 
+-----------+-------------+------------+-------------------+----------------+ 
| THREAD_ID | EVENT_NAME  | COUNT_STAR | SUM_TIMER_WAIT_MS | AVG_TIME_MS    | 
+-----------+-------------+------------+-------------------+----------------+ 
|        71 | transaction |       4849 |     50323942.6710 | 10378.21049103 | 
|        72 | transaction |        487 |      6421399.4580 | 13185.62517043 | 
|        73 | transaction |         17 |       319024.9280 | 18766.17223529 | 
+-----------+-------------+------------+-------------------+----------------+ 
3 rows in set (0.00 sec)

Of course, not only statement, events, and transactions can be measured; you can also measure wait times and stages as well. Happy instrumenting!

This article was written by Jervin Real.

 

https://dzone.com/articles/performance-schema-measure-multi-threaded-slave-ac