Managing Process Timeouts
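I ran into a new scenario:
Data has to be copied from two online tt sources to two offline tt targets. With the earlier approach, the two copy processes run serially, which does not meet the business requirement. How can this be handled?
My first idea was to submit both at once as cmd1;cmd2, but that still runs them serially. So what is the fix?
The most direct way is to run both commands in the background (cmd1 & cmd2 &), monitor how long each process has been running, and kill any process that exceeds the timeout. But how can the pid of each process be obtained precisely?
After a fair amount of investigation, this is the solution I settled on: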
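import time
import psutil

def waitPid(name, timeOut):
    # name: list of command-line tokens that were used to start the process
    for proc in psutil.process_iter():  # iterate over the current processes
        try:
            cmdline = proc.cmdline()  # command line of the process, as a list
        except psutil.Error:  # the process has exited or is not accessible
            continue
        # match only if every token in name appears in the command line
        if all(key in cmdline for key in name):
            print("pid: %s" % proc.pid)
            time.sleep(timeOut)
            # the process may have exited by now; check before killing it
            if proc.is_running():
                print("cmd: kill -9 %s" % proc.pid)
                try:
                    proc.kill()  # sends SIGKILL, same effect as kill -9
                except psutil.NoSuchProcess:
                    pass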
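The matching rule inside waitPid can be pulled out into a small helper and checked on its own. The sketch below is only an illustration: the helper name cmdline_matches and the sample command line are made up, not taken from the original code. It shows that each token has to equal a whole command-line argument, so a substring of an argument does not match.

```python
def cmdline_matches(cmdline, tokens):
    """Return True if every token equals some whole argument in cmdline."""
    return all(token in cmdline for token in tokens)


# A made-up command line, split into arguments the way psutil reports it.
sample = ["java", "-cp", "demo.jar", "com.example.Main", "member_cart", "10000"]

print(cmdline_matches(sample, ["com.example.Main", "member_cart"]))  # True
print(cmdline_matches(sample, ["member"]))  # False: only a substring of an argument
```

This is why the full argument list is passed as name: the more tokens it contains, the less likely an unrelated process is matched and killed.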
Usage:
namelist=["com.aliyun.timetunnel.demo.TTReadAndWrite", "member_cart", "0324153555NBJ3YZX0", "e62375c5-6bcb-4bde-900b-0a38c2f6b218", "1490842758", "10000", "member_cart_blink_mufeng", "5fa49387-e8da-461f-a835-abf03f9b9d4c"]
waitPid(namelist, 1)  # first kill any existing process of the same task; otherwise the newly submitted process may stay in the sleep state and never run
cmd = "java -cp utils/blink_test-1.0-shaded.jar com.aliyun.timetunnel.demo.TTReadAndWrite member_cart 0324153555NBJ3YZX0 e62375c5-6bcb-4bde-900b-0a38c2f6b218 1490842758 10000 member_cart_blink_mufeng 5fa49387-e8da-461f-a835-abf03f9b9d4c >aa.log 2>&1 &"
os.system(cmd)  # needs "import os"; returns immediately because cmd ends with &
namelist=["com.aliyun.timetunnel.demo.TTReadAndWrite", "member_cart", "0324153555NBJ3YZX0", "e62375c5-6bcb-4bde-900b-0a38c2f6b218", "1490842758", "10000", "member_cart_blink_mufeng", "5fa49387-e8da-461f-a835-abf03f9b9d4c"]
waitPid(namelist, timeOut)  # timeOut: maximum run time in seconds, set by the caller
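If the script launches the commands itself, searching the process list may not be necessary, because subprocess.Popen returns a handle that already carries the pid. The following is a minimal sketch of that alternative, assuming Python 3. The function name run_parallel and the two placeholder commands are made up for illustration and are not part of the original solution.

```python
import subprocess
import sys
import time


def run_parallel(commands, timeout):
    """Start all commands at once; kill any still running after timeout seconds.

    Returns a list of exit codes, in the same order as commands.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]  # each has its own .pid
    deadline = time.time() + timeout  # one shared deadline for all commands
    codes = []
    for proc in procs:
        remaining = max(0, deadline - time.time())
        try:
            codes.append(proc.wait(timeout=remaining))
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX, like kill -9
            codes.append(proc.wait())  # reap the killed process
    return codes


if __name__ == "__main__":
    # Two placeholder commands: one finishes quickly, one would run for 30 s.
    fast = [sys.executable, "-c", "print('done')"]
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_parallel([fast, slow], timeout=2))
```

Because all processes are started before any of them is waited on, they run concurrently, and the single deadline bounds the total wall-clock time rather than the time per command.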
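A scenario that comes up often at work:
For example, a data import hangs because the upstream has no data, or a command hangs while executing. In such cases a timeout check is needed: if the run exceeds a given time, the program exits. How should this be written?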
There are two ways to write it, a shell version and a Python version. The shell version is as follows:
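function timeout()
{
    # usage: timeout cmd arg1 arg2 ... (pass the command and its arguments as separate words)
    waitfor=300    # timeout in seconds
    "$@" &         # run the command in the background
    commandpid=$!
    # watchdog: kill the command if it is still running after $waitfor seconds
    ( sleep $waitfor ; kill -9 $commandpid > /dev/null 2>&1 ) &
    watchdog=$!
    wait $commandpid > /dev/null 2>&1 && echo "success"
    # the command is done, so stop the watchdog (not $PPID, which is the parent of this shell)
    kill $watchdog > /dev/null 2>&1
}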
Python version:
import time
from subprocess import Popen, PIPE

def sys_command_outstatuserr(cmd, timeout=120):
    # Run cmd through the shell; return (stdout, exit code, stderr).
    # Note: output is only read after the command exits, so a command that
    # fills the pipe buffer will block until the timeout is reached.
    p = Popen(cmd, stdout=PIPE, stderr=PIPE, shell=True)
    t_beginning = time.time()
    print("t_beginning:%s" % t_beginning)
    while True:
        if p.poll() is not None:  # the command has finished
            res = p.communicate()
            exitcode = p.returncode
            return res[0], exitcode, res[1]
        seconds_passed = time.time() - t_beginning
        print("seconds_passed:%s" % seconds_passed)
        if timeout and seconds_passed > timeout:
            p.terminate()  # timed out: send SIGTERM to the shell running cmd
            p.wait()  # reap the process so it does not linger as a zombie
            out, exitcode, err = '', 128, 'Timeout'
            return out, exitcode, err
        time.sleep(10)  # polling interval in seconds
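On Python 3 the polling loop can be replaced by the timeout parameter of Popen.communicate. The following is a minimal sketch under that assumption, for POSIX systems only. The function name run_with_timeout is made up for illustration, and the ('', 128, 'Timeout') return value simply mirrors the convention used above. Starting the command in its own session lets the timeout branch kill the shell together with any children it spawned.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout=120):
    """Run a shell command; return (stdout, exit code, stderr).

    On timeout the whole process group is killed and ('', 128, 'Timeout')
    is returned. POSIX only: relies on start_new_session and os.killpg.
    """
    p = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,  # return str instead of bytes (Python 3.7+)
        start_new_session=True,  # new session, so the shell leads its own group
    )
    try:
        out, err = p.communicate(timeout=timeout)  # reads output while waiting
    except subprocess.TimeoutExpired:
        os.killpg(p.pid, signal.SIGKILL)  # kill the shell and its children
        p.communicate()  # reap the process and close the pipes
        return "", 128, "Timeout"
    return out, p.returncode, err


if __name__ == "__main__":
    print(run_with_timeout("echo hello", timeout=5))
    print(run_with_timeout("sleep 30", timeout=1))
```

Since communicate drains stdout and stderr while it waits, a command that produces a lot of output does not stall on a full pipe, and the timeout takes effect without a 10-second polling granularity.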