Percona-Tookit工具包之pt-stalk
Preface
We have a lot of methods to diagnose problems in our system such as strace,pstack,gstack,gdb,pt-pmp,etc.But sometimes there will be some fitful performance issues which are not so easy to trace.Thus,pt-stalk may help us in diagnosing these kind of problems.
Introduce
pt-stalk is a tool to collect detail diagnostic data base on triggers you specified such as gdb,oprofile,strace,tcpdump.The trigger is not the conception of trigger in database.They're different at all.pt-stalk provides various options to collect comprehensive data you need.It's really useful and helpful in performance diagnosing.Let's see some details of it.
Procedure
Usage
1 pt-stalk [OPTIONS]
Common Parameters
1 --collect //Collect diagostic data based on spedified triggers. 2 --collect-gdb //Collect gdb data.It prints stack traces from all threads. 3 --collect-oprofile //Collect oprofile data. 4 --collect-strace //Collect strace data.Do not specify it when "--collect-gdb" has been used. 5 --collect-tcpdump //Collect tcpdump data. 6 --cycles //Limite the times when triggering condition occurs(default "5"). 7 --dest //Specify the position to store diagnostic data(default "/var/lib/pt-stalk"). 8 --disk-bytes-free //Prevent the lack of least disk space.It does not collect data until it has the necessary disk space(default "100M",valid surffixes of unit is k,M,G and T). 9 --disk-pct-free //It's similar with "--disk-bytes-free" but specify the percentage of disk space. 10 --function //Specify the contents to watch for the triggers(default "status",other value is "processlist" or <yourfilename>). 11 --iterations //Limit the times to collect.It will run forever if not give a specific value. 12 --log //Specify the postion to record logs(default "/var/log/pt-stalk.log").It only generates when deamonized. 13 --match //The pattern to specify when using "--function processlist". 14 --mysql-only //Merely collect the MySQL relevant diagnostic data but disk space is the exception. 15 --retention-time //Specify the purge days of diagnostic data(default "30"). 16 --run-time //Specify how many seconds to collect diagnostic data(default "30").It should not be longer than the value of "--sleep". 17 --sleep //Specify how many seconds to sleep after last collection.It used to prevent too many operations of collection(default "300"). 18 --stalk //Watch and wait for trigger occur(default "yes"). 19 --no_stalk //The option can be specify if you want to collect diagnostic data immediately without waiting for trigger to occur. 20 --threshold //Specify the maximum value for collecting(default "25"). 21 --variable //Specify the variables to compared against(default "Threads_running").
Examples
Generate benchmark with sysbench.
1 [root@zlm2 08:07:54 ~/sysbench-1.0/src/lua] 2 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=aaron8219 --mysql-db=sysbench --tables=10 --table-size=100000 --mysql-storage-engine=innodb cleanup 3 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2) 4 5 Dropping table 'sbtest1'... 6 Dropping table 'sbtest2'... 7 Dropping table 'sbtest3'... 8 Dropping table 'sbtest4'... 9 Dropping table 'sbtest5'... 10 Dropping table 'sbtest6'... 11 Dropping table 'sbtest7'... 12 Dropping table 'sbtest8'... 13 Dropping table 'sbtest9'... 14 Dropping table 'sbtest10'... 15 16 [root@zlm2 08:08:08 ~/sysbench-1.0/src/lua] 17 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=aaron8219 --mysql-db=sysbench --tables=10 --table-size=100000 --mysql-storage-engine=innodb prepare 18 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2) 19 20 Creating table 'sbtest1'... 21 Inserting 100000 records into 'sbtest1' 22 Creating a secondary index on 'sbtest1'... 23 Creating table 'sbtest2'... 24 Inserting 100000 records into 'sbtest2' 25 Creating a secondary index on 'sbtest2'... 26 Creating table 'sbtest3'... 27 Inserting 100000 records into 'sbtest3' 28 Creating a secondary index on 'sbtest3'... 29 Creating table 'sbtest4'... 30 Inserting 100000 records into 'sbtest4' 31 Creating a secondary index on 'sbtest4'... 32 Creating table 'sbtest5'... 33 Inserting 100000 records into 'sbtest5' 34 Creating a secondary index on 'sbtest5'... 35 Creating table 'sbtest6'... 36 Inserting 100000 records into 'sbtest6' 37 Creating a secondary index on 'sbtest6'... 38 Creating table 'sbtest7'... 39 Inserting 100000 records into 'sbtest7' 40 Creating a secondary index on 'sbtest7'... 41 Creating table 'sbtest8'... 42 Inserting 100000 records into 'sbtest8' 43 Creating a secondary index on 'sbtest8'... 44 Creating table 'sbtest9'... 45 Inserting 100000 records into 'sbtest9' 46 Creating a secondary index on 'sbtest9'... 47 Creating table 'sbtest10'... 48 Inserting 100000 records into 'sbtest10' 49 Creating a secondary index on 'sbtest10'... 50 51 [root@zlm2 08:09:23 ~/sysbench-1.0/src/lua] 52 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=aaron8219 --mysql-db=sysbench --threads=3 --time=3600 --report-interval=10 --rand-type=uniform run 53 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2) 54 55 Running the test with following options: 56 Number of threads: 3 57 Report intermediate results every 10 second(s) 58 Initializing random number generator from current time 59 60 61 Initializing worker threads... 62 63 Threads started! 64 65 [ 10s ] thds: 3 tps: 258.68 qps: 5176.49 (r/w/o: 3624.11/1034.82/517.56) lat (ms,95%): 15.83 err/s: 0.00 reconn/s: 0.00 66 [ 20s ] thds: 3 tps: 286.25 qps: 5726.15 (r/w/o: 4008.67/1144.89/572.60) lat (ms,95%): 15.00 err/s: 0.00 reconn/s: 0.00 67 [ 30s ] thds: 3 tps: 270.82 qps: 5416.13 (r/w/o: 3790.80/1083.69/541.64) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00 68 [ 40s ] thds: 3 tps: 280.98 qps: 5619.75 (r/w/o: 3934.26/1123.53/561.97) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00 69 [ 50s ] thds: 3 tps: 298.42 qps: 5968.38 (r/w/o: 4177.83/1193.70/596.85) lat (ms,95%): 14.46 err/s: 0.00 reconn/s: 0.00 70 [ 60s ] thds: 3 tps: 278.88 qps: 5578.20 (r/w/o: 3904.92/1115.52/557.76) lat (ms,95%): 15.83 err/s: 0.00 reconn/s: 0.00 71 [ 70s ] thds: 3 tps: 280.91 qps: 5617.66 (r/w/o: 3932.21/1123.63/561.82) lat (ms,95%): 15.83 err/s: 0.00 reconn/s: 0.00 72 [ 80s ] thds: 3 tps: 281.68 qps: 5632.83 (r/w/o: 3942.77/1126.71/563.35) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00 73 [ 90s ] thds: 3 tps: 281.60 qps: 5631.55 (r/w/o: 3942.07/1126.39/563.10) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00 74 [ 100s ] thds: 3 tps: 287.62 qps: 5753.86 (r/w/o: 4028.02/1150.49/575.35) lat (ms,95%): 15.55 err/s: 0.00 reconn/s: 0.00 75 [ 110s ] thds: 3 tps: 308.99 qps: 6180.45 (r/w/o: 4326.12/1236.35/617.97) lat (ms,95%): 13.95 err/s: 0.00 reconn/s: 0.00 76 ... //Omitted.
Collect diagnostic data using pt-stalk.
1 [root@zlm2 08:14:36 /data/mysql/mysql3306] 2 #pt-stalk --host localhost --port 3306 --user root --password Passw0rd --collect-gdb --cycles 1 --variable Threads_connect --threshold 2 3 mysql: [Warning] Using a password on the command line interface can be insecure. 4 2018_07_09_08_15_46 Starting /usr/bin/pt-stalk --function=status --variable=Threads_connect --threshold=2 --match= --cycles=1 --interval=1 --iterations= --run-time=30 --sleep=300 --dest=/var/lib/pt-stalk --prefix= --notify-by-email= --log=/var/log/pt-stalk.log --pid=/var/run/pt-stalk.pid --plugin= 5 mysqladmin: [Warning] Using a password on the command line interface can be insecure. 6 2018_07_09_08_15_46 Detected value is empty; something failed? Trigger exit status: 0 7 mysqladmin: [Warning] Using a password on the command line interface can be insecure. 8 2018_07_09_08_15_47 Detected value is empty; something failed? Trigger exit status: 0 9 mysqladmin: [Warning] Using a password on the command line interface can be insecure. 10 2018_07_09_08_15_48 Detected value is empty; something failed? Trigger exit status: 0 11 mysqladmin: [Warning] Using a password on the command line interface can be insecure. 12 2018_07_09_08_15_50 Detected value is empty; something failed? Trigger exit status: 0 13 mysqladmin: [Warning] Using a password on the command line interface can be insecure. 14 2018_07_09_08_15_51 Detected value is empty; something failed? Trigger exit status: 0 15 mysqladmin: [Warning] Using a password on the command line interface can be insecure. 16 2018_07_09_08_15_52 Detected value is empty; something failed? Trigger exit status: 0 17 mysqladmin: [Warning] Using a password on the command line interface can be insecure. 18 2018_07_09_08_15_53 Detected value is empty; something failed? Trigger exit status: 0 19 mysqladmin: [Warning] Using a password on the command line interface can be insecure. 20 2018_07_09_08_15_54 Detected value is empty; something failed? Trigger exit status: 0 21 mysqladmin: [Warning] Using a password on the command line interface can be insecure. 22 2018_07_09_08_15_55 Detected value is empty; something failed? Trigger exit status: 0 23 mysqladmin: [Warning] Using a password on the command line interface can be insecure. 24 2018_07_09_08_15_56 Detected value is empty; something failed? Trigger exit status: 0 25 ^C2018_07_09_08_15_57 Caught signal, exiting 26 2018_07_09_08_15_57 All subprocesses have finished 27 2018_07_09_08_15_57 Exiting because OKTORUN is false 28 2018_07_09_08_15_57 /usr/bin/pt-stalk exit status 1 29 30 //The value of "--variable" should be "Threads_connected". 31 //Modify the correct value and run it again. 32 33 [root@zlm2 08:15:57 /data/mysql/mysql3306] 34 #pt-stalk --host localhost --port 3306 --user root --password Passw0rd --collect-gdb --cycles 1 --variable Threads_connected --threshold 2 35 mysql: [Warning] Using a password on the command line interface can be insecure. 36 2018_07_09_08_19_39 Starting /usr/bin/pt-stalk --function=status --variable=Threads_connected --threshold=2 --match= --cycles=1 --interval=1 --iterations= --run-time=30 --sleep=300 --dest=/var/lib/pt-stalk --prefix= --notify-by-email= --log=/var/log/pt-stalk.log --pid=/var/run/pt-stalk.pid --plugin= 37 mysqladmin: [Warning] Using a password on the command line interface can be insecure. 38 2018_07_09_08_19_39 Check results: status(Threads_connected)=5, matched=yes, cycles_true=1 39 2018_07_09_08_19_39 Collect 1 triggered 40 2018_07_09_08_19_39 Collect 1 PID 5921 41 2018_07_09_08_19_39 Collect 1 done 42 2018_07_09_08_19_39 Sleeping 300 seconds after collect 43 mysqladmin: [Warning] Using a password on the command line interface can be insecure. 44 2018_07_09_08_24_39 Check results: status(Threads_connected)=6, matched=yes, cycles_true=1 45 2018_07_09_08_24_39 Collect 2 triggered 46 2018_07_09_08_24_39 Collect 2 PID 7398 47 2018_07_09_08_24_39 Collect 2 done 48 2018_07_09_08_24_40 Sleeping 300 seconds after collect 49 ^C2018_07_09_08_25_56 Caught signal, exiting //Terminate to collect diagnostic data by "Ctrl+C" 50 2018_07_09_08_25_56 Waiting up to 90 seconds for subprocesses to finish... 51 2018_07_09_08_25_56 Exiting because OKTORUN is false 52 2018_07_09_08_25_56 /usr/bin/pt-stalk exit status 1
Check the value of "show global status ... ".
1 (root@localhost mysql3306.sock)[sysbench]>show global status like '%Threads%'; 2 +------------------------+-------+ 3 | Variable_name | Value | 4 +------------------------+-------+ 5 | Delayed_insert_threads | 0 | 6 | Slow_launch_threads | 0 | 7 | Threads_cached | 1 | 8 | Threads_connected | 5 | //Threads_connected has been up to 5. 9 | Threads_created | 5 | 10 | Threads_running | 1 | 11 +------------------------+-------+ 12 6 rows in set (0.04 sec) 13 14 15 (root@localhost mysql3306.sock)[sysbench]>show global status like '%Threads%'; 16 +------------------------+-------+ 17 | Variable_name | Value | 18 +------------------------+-------+ 19 | Delayed_insert_threads | 0 | 20 | Slow_launch_threads | 0 | 21 | Threads_cached | 3 | 22 | Threads_connected | 6 | //Second time,Threads_connected has been up to 6. 23 | Threads_created | 9 | 24 | Threads_running | 1 | 25 +------------------------+-------+ 26 6 rows in set (0.01 sec)
Check the output files in default directory("/var/lib/pt-stalk").
1 [root@zlm2 08:31:41 /var/lib/pt-stalk] 2 #ls -lrt 3 total 3928 4 -rw-r--r-- 1 root root 383 Jul 9 08:19 2018_07_09_08_19_39-trigger 5 -rw-r--r-- 1 root root 9845 Jul 9 08:19 2018_07_09_08_19_39-pmap 6 -rw-r--r-- 1 root root 16554 Jul 9 08:19 2018_07_09_08_19_39-variables 7 -rw-r--r-- 1 root root 4352 Jul 9 08:19 2018_07_09_08_19_39-innodbstatus1 8 -rw-r--r-- 1 root root 384 Jul 9 08:19 2018_07_09_08_19_39-mutex-status1 9 -rw-r--r-- 1 root root 8507 Jul 9 08:19 2018_07_09_08_19_39-ps 10 -rw-r--r-- 1 root root 10356 Jul 9 08:19 2018_07_09_08_19_39-lsof 11 -rw-r--r-- 1 root root 5280 Jul 9 08:19 2018_07_09_08_19_39-opentables1 12 -rw-r--r-- 1 root root 8204 Jul 9 08:19 2018_07_09_08_19_39-top 13 -rw-r--r-- 1 root root 28343 Jul 9 08:19 2018_07_09_08_19_39-sysctl 14 -rw-r--r-- 1 root root 139 Jul 9 08:20 2018_07_09_08_19_39-disk-space 15 -rw-r--r-- 1 root root 1084769 Jul 9 08:20 2018_07_09_08_19_39-mysqladmin 16 -rw-r--r-- 1 root root 2782 Jul 9 08:20 2018_07_09_08_19_39-vmstat 17 -rw-r--r-- 1 root root 24750 Jul 9 08:20 2018_07_09_08_19_39-procstat 18 -rw-r--r-- 1 root root 14403 Jul 9 08:20 2018_07_09_08_19_39-diskstats 19 -rw-r--r-- 1 root root 72469 Jul 9 08:20 2018_07_09_08_19_39-procvmstat 20 -rw-r--r-- 1 root root 51960 Jul 9 08:20 2018_07_09_08_19_39-netstat_s 21 -rw-r--r-- 1 root root 296850 Jul 9 08:20 2018_07_09_08_19_39-slabinfo 22 -rw-r--r-- 1 root root 33210 Jul 9 08:20 2018_07_09_08_19_39-interrupts 23 -rw-r--r-- 1 root root 48390 Jul 9 08:20 2018_07_09_08_19_39-netstat 24 -rw-r--r-- 1 root root 37260 Jul 9 08:20 2018_07_09_08_19_39-meminfo 25 -rw-r--r-- 1 root root 17100 Jul 9 08:20 2018_07_09_08_19_39-df 26 -rw-r--r-- 1 root root 40753 Jul 9 08:20 2018_07_09_08_19_39-processlist 27 -rw-r--r-- 1 root root 4555 Jul 9 08:20 2018_07_09_08_19_39-innodbstatus2 28 -rw-r--r-- 1 root root 74476 Jul 9 08:20 2018_07_09_08_19_39-transactions 29 -rw-r--r-- 1 root root 5 Jul 9 08:20 2018_07_09_08_19_39-hostname 30 -rw-r--r-- 1 root root 385 Jul 9 08:20 2018_07_09_08_19_39-mutex-status2 31 -rw-r--r-- 1 root root 5280 Jul 9 08:20 2018_07_09_08_19_39-opentables2 32 -rw-r--r-- 1 root root 325 Jul 9 08:20 2018_07_09_08_19_39-vmstat-overall 33 -rw-r--r-- 1 root root 24763 Jul 9 08:20 2018_07_09_08_19_39-output 34 35 //There're a series of files(starts with "tigger" and ends with "output") each time we collect the diagnostic data. 36 37 -rw-r--r-- 1 root root 383 Jul 9 08:24 2018_07_09_08_24_39-trigger 38 -rw-r--r-- 1 root root 10251 Jul 9 08:24 2018_07_09_08_24_39-pmap 39 -rw-r--r-- 1 root root 16554 Jul 9 08:24 2018_07_09_08_24_39-variables 40 -rw-r--r-- 1 root root 4353 Jul 9 08:24 2018_07_09_08_24_39-innodbstatus1 41 -rw-r--r-- 1 root root 8583 Jul 9 08:24 2018_07_09_08_24_39-ps 42 -rw-r--r-- 1 root root 386 Jul 9 08:24 2018_07_09_08_24_39-mutex-status1 43 -rw-r--r-- 1 root root 28343 Jul 9 08:24 2018_07_09_08_24_39-sysctl 44 -rw-r--r-- 1 root root 10534 Jul 9 08:24 2018_07_09_08_24_39-lsof 45 -rw-r--r-- 1 root root 5280 Jul 9 08:24 2018_07_09_08_24_39-opentables1 46 -rw-r--r-- 1 root root 8053 Jul 9 08:24 2018_07_09_08_24_39-top 47 -rw-r--r-- 1 root root 139 Jul 9 08:25 2018_07_09_08_24_39-disk-space 48 -rw-r--r-- 1 root root 2783 Jul 9 08:25 2018_07_09_08_24_39-vmstat 49 -rw-r--r-- 1 root root 1084769 Jul 9 08:25 2018_07_09_08_24_39-mysqladmin 50 -rw-r--r-- 1 root root 24899 Jul 9 08:25 2018_07_09_08_24_39-procstat 51 -rw-r--r-- 1 root root 48390 Jul 9 08:25 2018_07_09_08_24_39-netstat 52 -rw-r--r-- 1 root root 296850 Jul 9 08:25 2018_07_09_08_24_39-slabinfo 53 -rw-r--r-- 1 root root 52230 Jul 9 08:25 2018_07_09_08_24_39-netstat_s 54 -rw-r--r-- 1 root root 33210 Jul 9 08:25 2018_07_09_08_24_39-interrupts 55 -rw-r--r-- 1 root root 37260 Jul 9 08:25 2018_07_09_08_24_39-meminfo 56 -rw-r--r-- 1 root root 14460 Jul 9 08:25 2018_07_09_08_24_39-diskstats 57 -rw-r--r-- 1 root root 17100 Jul 9 08:25 2018_07_09_08_24_39-df 58 -rw-r--r-- 1 root root 72914 Jul 9 08:25 2018_07_09_08_24_39-procvmstat 59 -rw-r--r-- 1 root root 44370 Jul 9 08:25 2018_07_09_08_24_39-processlist 60 -rw-r--r-- 1 root root 4445 Jul 9 08:25 2018_07_09_08_24_39-innodbstatus2 61 -rw-r--r-- 1 root root 76282 Jul 9 08:25 2018_07_09_08_24_39-transactions 62 -rw-r--r-- 1 root root 5 Jul 9 08:25 2018_07_09_08_24_39-hostname 63 -rw-r--r-- 1 root root 387 Jul 9 08:25 2018_07_09_08_24_39-mutex-status2 64 -rw-r--r-- 1 root root 5280 Jul 9 08:25 2018_07_09_08_24_39-opentables2 65 -rw-r--r-- 1 root root 325 Jul 9 08:25 2018_07_09_08_24_39-vmstat-overall 66 -rw-r--r-- 1 root root 24763 Jul 9 08:25 2018_07_09_08_24_39-output 67 68 [root@zlm2 08:33:37 /var/lib/pt-stalk] 69 #cat 2018_07_09_08_24_39-trigger 70 2018_07_09_08_24_39 Check results: status(Threads_connected)=6, matched=yes, cycles_true=1 71 2018_07_09_08_24_39 pt-stalk ran with --function=status --variable=Threads_connected --threshold=2 --match= --cycles=1 --interval=1 --iterations= --run-time=30 --sleep=300 --dest=/var/lib/pt-stalk --prefix= --notify-by-email= --log=/var/log/pt-stalk.log --pid=/var/run/pt-stalk.pid --plugin= 72 73 //The trigger file records the options we have used. 74 75 [root@zlm2 08:34:17 /var/lib/pt-stalk] 76 #cat 2018_07_09_08_24_39-vmstat-overall 77 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- 78 r b swpd free buff cache si so bi bo in cs us sy id wa st 79 18 2 9692 60716 0 583316 0 0 11 139 92 597 4 1 96 0 0 80 4 0 10440 78828 0 569672 0 25 89 1965 1455 12046 80 20 0 0 0 81 82 //The vmstat-overall records the vmstat output. 83 //each file contains the relevant diagnostic data about what their name called.I'm not going to demonstrate all of them.
Take care of you remain disk space,pt-stalk won't run anymore if you're out of space.
1 [root@zlm2 08:25:56 /data/mysql/mysql3306] 2 #pt-stalk --host localhost --port 3306 --user root --password Passw0rd --collect --cycles 1 --variable Threads_connected --threshold 2 --sleep 30 --demonize 3 Cannot open /tmp/pt-stalk.8940.FSboRq/po/daemonize: No space left on device at -e line 13, <$fh> chunk 274. 4 No long attribute in option spec /tmp/pt-stalk.8940.FSboRq/po/ask-pass 5 6 [root@zlm2 08:51:27 /data/mysql/mysql3306] 7 #df -h 8 ]Filesystem Size Used Avail Use% Mounted on 9 /dev/mapper/centos-root 8.4G 8.4G 20K 100% / //The root of linux has been out of space. 10 devtmpfs 488M 0 488M 0% /dev 11 tmpfs 497M 0 497M 0% /dev/shm 12 tmpfs 497M 6.6M 491M 2% /run 13 tmpfs 497M 0 497M 0% /sys/fs/cgroup 14 /dev/sda1 497M 118M 379M 24% /boot 15 none 87G 81G 5.9G 94% /vagrant
Summary
- pt-stalk is another tool in diagnosing system performance and similar with pt-pmp but not the same.
- pt-stalk will generate many statistic files which can really help you collecting information in almost every aspect.
- pt-stalk has four main triggers:gdb,oprofile,strace,tcpdump.It's flexible to collect data with diffrent dimensionality.
版权声明:本文为博主原创文章,如需转载请保留此声明及博客链接,谢谢!
博客地址: http://www.cnblogs.com/aaron8219 & http://blog.csdn.net/aaron8219
博客地址: http://www.cnblogs.com/aaron8219 & http://blog.csdn.net/aaron8219