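A note on a small bug in /usr/bin/pt-stalk
Monitoring showed that network traffic on one of our MySQL servers spiked every day at around 4 a.m., so I wanted to capture status data at that time and see what the server was actually doing. After some searching, pt-stalk looked like a good choice. It is a powerful tool; see man pt-stalk for the details.
The bug is illustrated below.
Version: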
percona-toolkit-2.2.14-1.noarch
Bug description:
Running the following from an interactive shell
/usr/bin/pt-stalk --user=dba --password=xxx --no-stalk --run-time 60 --iterations 1 --daemonize &>/var/log/ptstalk.log
succeeds, and the mysql binary it finds is /usr/local/bin/mysql.
But when the same command is run from crontab:
# crontab -l
21 19 * * * /usr/bin/pt-stalk --user=dba --password=xxx --no-stalk --run-time 60 --iterations 1 --daemonize &>/var/log/ptstalk.log
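it fails with a message saying the mysql command cannot be found and that the path should be checked.
The error points to line 1482: Cannot execute mysql. Check that it is in PATH.
Opening the script at line 1482 shows the following: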
# Check that mysql and mysqladmin are in PATH. If not, we're
# already dead in the water, so don't bother with cmd line opts,
# just error and exit.
[ -n "$(mysql --help)" ] \
|| die "Cannot execute mysql. Check that it is in PATH."
[ -n "$(mysqladmin --help)" ] \
|| die "Cannot execute mysqladmin. Check that it is in PATH."
# Now that we have the cmd line opts, check that we can actually
# connect to MySQL.
[ -n "$(mysql $EXT_ARGV -e 'SELECT 1')" ] \
|| die "Cannot connect to MySQL. Check that MySQL is running and that the options after -- are correct."
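The first two checks above depend on command substitution capturing standard output only. When the shell cannot find mysql, the "command not found" message goes to standard error, the captured string is empty, and the -n test fails. Below is a minimal sketch of that pattern. The helper name can_run is hypothetical and not part of pt-stalk, and unlike the original lines it silences standard error.

```shell
#!/bin/sh
# can_run is a hypothetical helper, not part of pt-stalk.
# It succeeds only if "<cmd> --help" writes something to standard output.
can_run() {
    [ -n "$("$1" --help 2>/dev/null)" ]
}

# A bare command name is resolved through the current PATH,
# just like the bare "mysql" in the script excerpt.
if can_run mysql; then
    echo "mysql: ok"
else
    echo "mysql: not executable with PATH=$PATH"
fi
```

Run once from a terminal and once from a cron job, this should print different results on a host like the one described here.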
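I added a "which mysql" call right before the check and made the script exit there, then compared the log output of a manual run with that of a crontab run:
1. Running /usr/bin/pt-stalk manually prints the path: /usr/local/bin/mysql
2. Running the command from crontab prints: which: no mysql in (/usr/bin:/bin)
So the environment under crontab differs from the interactive one: cron's PATH is only /usr/bin:/bin, so /usr/local/bin is never searched.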
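The comparison above can be reproduced without waiting for cron by resolving the tools against a given PATH value. This is a rough sketch: the helper name missing_in_path is hypothetical, and the command list is only a sample of tools that pt-stalk looks up.

```shell
#!/bin/sh
# missing_in_path is a hypothetical helper, not part of pt-stalk.
# Usage: missing_in_path "<PATH value>" cmd1 cmd2 ...
# Prints each command that cannot be resolved with the given PATH.
missing_in_path() {
    search_path=$1
    shift
    for cmd in "$@"; do
        # Do the lookup in a subshell so the caller's PATH stays untouched.
        ( PATH=$search_path; command -v "$cmd" >/dev/null 2>&1 ) \
            || echo "$cmd"
    done
}

# /usr/bin:/bin is the PATH that cron reported in the log above.
missing_in_path "/usr/bin:/bin" mysql mysqladmin iostat mpstat vmstat \
    gdb strace tcpdump sysctl pmap dmesg
```

Every name it prints is a tool the cron job would not find. As an alternative that this note does not cover, cronie and Vixie cron also accept a PATH= assignment line at the top of the crontab.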
Solutions:
1. Create symlinks
ln -fs /usr/local/mysql/bin/* /usr/bin/
2. Edit the script so that it calls mysql by its absolute path
# Check that mysql and mysqladmin are in PATH. If not, we're
# already dead in the water, so don't bother with cmd line opts,
# just error and exit.
[ -n "$(/usr/local/bin/mysql --help)" ] \
|| die "Cannot execute mysql. Check that it is in PATH."
[ -n "$(/usr/local/bin/mysqladmin --help)" ] \
|| die "Cannot execute mysqladmin. Check that it is in PATH."
# Now that we have the cmd line opts, check that we can actually
# connect to MySQL.
[ -n "$(/usr/local/bin/mysql $EXT_ARGV -e 'SELECT 1')" ] \
|| die "Cannot connect to MySQL. Check that MySQL is running and that the options after -- are correct."
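I found that method 2 only gets past this particular error. The script then runs to completion, but the output is incomplete: because of the same environment problem, many other commands in the script cannot run. See the variables the script defines: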
set -u
CMD_GDB="${CMD_GDB:-"$(_which gdb)"}"
CMD_IOSTAT="${CMD_IOSTAT:-"$(_which iostat)"}"
CMD_MPSTAT="${CMD_MPSTAT:-"$(_which mpstat)"}"
CMD_MYSQL="${CMD_MYSQL:-"$(_which mysql)"}"
CMD_MYSQLADMIN="${CMD_MYSQLADMIN:-"$(_which mysqladmin)"}"
CMD_OPCONTROL="${CMD_OPCONTROL:-"$(_which opcontrol)"}"
CMD_OPREPORT="${CMD_OPREPORT:-"$(_which opreport)"}"
CMD_PMAP="${CMD_PMAP:-"$(_which pmap)"}"
CMD_STRACE="${CMD_STRACE:-"$(_which strace)"}"
CMD_SYSCTL="${CMD_SYSCTL:-"$(_which sysctl)"}"
CMD_TCPDUMP="${CMD_TCPDUMP:-"$(_which tcpdump)"}"
CMD_VMSTAT="${CMD_VMSTAT:-"$(_which vmstat)"}"
CMD_DMESG="${CMD_DMESG:-"$(_which dmesg)"}"
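Each CMD_* line above keeps a value inherited from the environment and otherwise falls back to a PATH lookup. The sketch below shows that parameter-expansion pattern in isolation. The _which_sketch function is a hypothetical stand-in for pt-stalk's _which, whose body is not shown in this note, and report is likewise a made-up helper.

```shell
#!/bin/sh
set -u

# Hypothetical stand-in for pt-stalk's _which helper.
# Prints the resolved path, or nothing when the tool is not found.
_which_sketch() {
    command -v "$1" 2>/dev/null || true
}

# Hypothetical helper that shows whether a lookup produced a value.
report() {
    if [ -n "$2" ]; then
        echo "$1=$2"
    else
        echo "$1 is empty: not found in PATH"
    fi
}

# Same shape as the CMD_* lines: an inherited value wins,
# otherwise the lookup result (possibly empty) is used.
CMD_IOSTAT="${CMD_IOSTAT:-"$(_which_sketch iostat)"}"
CMD_TCPDUMP="${CMD_TCPDUMP:-"$(_which_sketch tcpdump)"}"

report CMD_IOSTAT "$CMD_IOSTAT"
report CMD_TCPDUMP "$CMD_TCPDUMP"
```

With cron's short PATH, a lookup like this can come back empty for tools that live elsewhere, which would be consistent with the incomplete output described above.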
So the best approach is to create the symlinks:
ln -fs /usr/local/mysql/bin/* /usr/bin/
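Because -f combined with a wildcard replaces any file of the same name already in /usr/bin, a more cautious variant links only the binaries that are needed and skips existing targets. This is a sketch under that assumption. The helper name link_tools is hypothetical, and the directories in the example call are simply the ones from the command above.

```shell
#!/bin/sh
# link_tools is a hypothetical helper, not part of pt-stalk.
# Usage: link_tools <source_dir> <target_dir> name1 name2 ...
# Links only the named binaries and refuses to replace anything
# that already exists in the target directory.
link_tools() {
    src=$1
    dst=$2
    shift 2
    for name in "$@"; do
        if [ ! -x "$src/$name" ]; then
            echo "skip $name: not executable in $src" >&2
        elif [ -e "$dst/$name" ] || [ -L "$dst/$name" ]; then
            echo "skip $name: $dst/$name already exists" >&2
        else
            ln -s "$src/$name" "$dst/$name" && echo "linked $dst/$name"
        fi
    done
}

# Example call (needs root):
# link_tools /usr/local/mysql/bin /usr/bin mysql mysqladmin
```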
3. Also make sure the account used to run the command has sufficient privileges; without them some commands cannot run either.
--user --password
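After the run, check the file /var/lib/pt-stalk/2015_09_25_09_44_01-output in the default directory; it contains detailed execution information.
4. When the command finishes it produces the following files, which can be used to pinpoint the problem based on the conditions at that time: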
2015_09_25_09_44_01-df
2015_09_25_09_44_01-disk-space
2015_09_25_09_44_01-diskstats
2015_09_25_09_44_01-hostname
2015_09_25_09_44_01-innodbstatus1
2015_09_25_09_44_01-innodbstatus2
2015_09_25_09_44_01-interrupts
2015_09_25_09_44_01-iostat
2015_09_25_09_44_01-iostat-overall
2015_09_25_09_44_01-lsof
2015_09_25_09_44_01-meminfo
2015_09_25_09_44_01-mpstat
2015_09_25_09_44_01-mpstat-overall
2015_09_25_09_44_01-mutex-status1
2015_09_25_09_44_01-mutex-status2
2015_09_25_09_44_01-mysqladmin
2015_09_25_09_44_01-netstat
2015_09_25_09_44_01-netstat_s
2015_09_25_09_44_01-opentables1
2015_09_25_09_44_01-opentables2
2015_09_25_09_44_01-output
2015_09_25_09_44_01-pmap
2015_09_25_09_44_01-processlist
2015_09_25_09_44_01-procstat
2015_09_25_09_44_01-procvmstat
2015_09_25_09_44_01-ps
2015_09_25_09_44_01-slabinfo
2015_09_25_09_44_01-sysctl
2015_09_25_09_44_01-top
2015_09_25_09_44_01-transactions
2015_09_25_09_44_01-trigger
2015_09_25_09_44_01-variables
2015_09_25_09_44_01-vmstat
2015_09_25_09_44_01-vmstat-overall
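One quick way to see whether a run collected everything is to list the files for one timestamp and flag the empty ones. This is a minimal sketch. The helper name list_capture is hypothetical, and treating an empty file as a sign that a collector could not run is an assumption, not documented pt-stalk behavior.

```shell
#!/bin/sh
# list_capture is a hypothetical helper, not part of pt-stalk.
# Usage: list_capture <dest_dir> <timestamp_prefix>
# Prints one line per collected file and marks the empty ones.
list_capture() {
    dir=$1
    prefix=$2
    for f in "$dir/$prefix"-*; do
        [ -e "$f" ] || continue    # the glob matched nothing
        if [ -s "$f" ]; then
            echo "ok    ${f##*/}"
        else
            echo "EMPTY ${f##*/}"
        fi
    done
}

# Directory and timestamp are the ones from the listing above.
list_capture /var/lib/pt-stalk 2015_09_25_09_44_01
```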