Monitoring a process with Python and restarting it
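Recently our company's game server kept going down. The boss had employees log in to the server at random times to check whether it had died, and they were practically turning into robots. That is how this Python script for monitoring the process automatically came about.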
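The approach:
1. Set up a thread timer that runs a system command every 20 s to check whether a process with the given name exists.
2. If it does not exist, restart it; if it does exist, do nothing further.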
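Step 1 can be made a little more precise than searching the whole tasklist text for a substring. A substring test gives false positives (for example "Server.exe" also matches inside "CloudControlServer.exe"). The sketch below assumes you run `tasklist /FO CSV /NH` (CSV format, no header row) and have already decoded its output (GBK on Chinese-locale Windows). It then parses the rows with the csv module, matches the image name exactly, and collects PIDs as a bonus. The function names and the sample output are hypothetical, for illustration only.

```python
import csv
from typing import Dict, List


def parse_tasklist_csv(output: str) -> Dict[str, List[int]]:
    """Map lower-cased image name -> PIDs, from `tasklist /FO CSV /NH` output."""
    pids: Dict[str, List[int]] = {}
    for row in csv.reader(output.splitlines()):
        # Skip blank lines and anything that is not a "name","pid",... row
        if len(row) < 2 or not row[1].isdigit():
            continue
        pids.setdefault(row[0].lower(), []).append(int(row[1]))
    return pids


def is_running(output: str, exe_name: str) -> bool:
    # Exact, case-insensitive image-name match instead of a substring search
    return exe_name.lower() in parse_tasklist_csv(output)


# Made-up sample in the CSV layout tasklist produces (not real server output)
SAMPLE = (
    '"System Idle Process","0","Services","0","8 K"\r\n'
    '"CloudControlServer.exe","12345","Console","1","52,340 K"\r\n'
)
```

With real data you would feed it something like `subprocess.run(['tasklist', '/FO', 'CSV', '/NH'], capture_output=True).stdout.decode('gbk')`. The PID list is also handy later when the process has to be killed.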
The code is simple:
```python
import datetime
import os
import subprocess
from threading import Timer

# Raw string so the backslashes in the Windows path are not read as escape sequences
PROCESS_CMD = r"start C:\Progra~1\CloudControlServer\CloudControlServer.exe"

count = 0        # times the process was found missing and restarted
error_count = 0  # despite the name: times the process was found running
timer = None


def restart_process(process_name):
    global count, error_count, timer
    # List running processes; the output is GBK-encoded on Chinese-locale Windows
    red = subprocess.Popen('tasklist', stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE, shell=True)
    tasklist_str = red.stdout.read().decode(encoding='gbk')
    # Executable name taken from the command, e.g. "CloudControlServer.exe"
    re_path = process_name.split("\\")[-1]
    formattime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    if re_path not in tasklist_str:
        # Optional alerts (connect_emai/sendmail are the author's helpers, not shown):
        # obj = connect_emai()
        # sendmail('Program crashed, restarting...', obj)
        # urllib.request.urlopen("http://159.138.131.148/server_offline.html")
        count += 1
        print(formattime + ' anomaly #' + str(count) + ': process not found, restarting')
        os.system(process_name)  # "start" launches the exe without blocking this script
        # sendmail('Restarted successfully!', obj)
        print('yes,connected')
    else:
        error_count += 1
        print(formattime + ' check #' + str(error_count) + ': process is running')
    # Re-arm the timer so the check repeats every 20 seconds
    timer = Timer(20, restart_process, (process_name,))
    timer.start()


timer = Timer(20, restart_process, (PROCESS_CMD,))
timer.start()
```
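One weakness of re-arming a new Timer at the end of every call: if a check raises (say, tasklist or wmic output cannot be decoded), the re-arm line is never reached and monitoring silently stops. A minimal alternative sketch, assuming you drop the Timer lines from the check function: one daemon thread loops on `threading.Event.wait`, catches per-check errors, and can be stopped cleanly. `run_every` is a hypothetical helper name, not something from the original script.

```python
import threading
from typing import Callable


def run_every(interval: float, job: Callable[[], None],
              stop: threading.Event) -> threading.Thread:
    """Call job() every `interval` seconds on a daemon thread until `stop` is set."""
    def loop() -> None:
        # Event.wait returns True as soon as stop is set, ending the loop immediately
        while not stop.wait(interval):
            try:
                job()
            except Exception as exc:  # one failed check must not end monitoring
                print('check failed:', exc)

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```

Usage would look like `stop = threading.Event(); run_every(20, lambda: restart_process(PROCESS_CMD), stop)`, and `stop.set()` shuts it down. A single long-lived thread also avoids creating a fresh thread every 20 seconds.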
Done!!!
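Then came a new requirement: monitor the CPU, and if it stays above 80%, proactively kill the process and restart it. I reached for psutil, the awesome cross-platform system library (it powers the /state endpoint later on); this first version, though, reads the CPU load with the Windows wmic command. The code: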
```python
import os
import re
import subprocess
import time
from threading import Timer

PROCESS_CMD = r"start C:\Progra~1\CloudControlServer\CloudControlServer.exe"
timer = None


def get_cpu_load():
    # "wmic cpu get LoadPercentage" prints a header line, then the load value
    res = subprocess.Popen('wmic cpu get LoadPercentage', stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE, shell=True)
    res_str = res.stdout.read().decode(encoding='gbk')
    return int(re.findall(r'\d+', res_str)[0])


def look_cpu(process_name):
    global timer
    num = get_cpu_load()
    if num > 80:
        print('CPU load above 80%')
        time.sleep(10)
        num_twice = get_cpu_load()
        # Two readings 10 s apart within 5 points: treat the load as sustained, kill and restart
        if abs(num - num_twice) < 5:
            tasklist = subprocess.Popen('tasklist | findstr CloudControlServer.exe',
                                        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                        shell=True)
            res = tasklist.stdout.read().decode(encoding='gbk')
            # The PID is the first standalone number in the row; PIDs can exceed 4 digits
            match = re.search(r'\s(\d+)\s', res)
            if match:
                cmd = 'taskkill /f /pid %s' % match.group(1)
                time.sleep(0.5)
                print(cmd)
                os.system(cmd)
            os.system(process_name)
    print('Monitoring CPU, current load: %s%%' % num)
    # Check again in 30 seconds
    timer = Timer(30, look_cpu, (process_name,))
    timer.start()


timer = Timer(30, look_cpu, (PROCESS_CMD,))
timer.start()
```
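The code above decides the load is "sustained" by comparing two readings taken 10 s apart. If you want "stays above 80%" to mean literally every recent sample, a sliding window is a simple swap-in technique (not the author's method). The sketch below assumes one CPU percentage per call, which could come from `get_cpu_load()` above or from `psutil.cpu_percent(interval=1)`. The class name and default window size are hypothetical.

```python
from collections import deque
from typing import Deque


class SustainedLoad:
    """Report True once the last `window` samples are all above `threshold` percent."""

    def __init__(self, threshold: float = 80.0, window: int = 3) -> None:
        self.threshold = threshold
        # deque(maxlen=...) drops the oldest sample automatically
        self.samples: Deque[float] = deque(maxlen=window)

    def add(self, percent: float) -> bool:
        self.samples.append(percent)
        # Only fire once the window is full and every sample in it is over the threshold
        return (len(self.samples) == self.samples.maxlen
                and all(s > self.threshold for s in self.samples))
```

Each timer tick would call `detector.add(get_cpu_load())` and kill and restart the process only when it returns True. One brief spike is then never enough to trigger a restart.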
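On the third day, though, the boss had yet another requirement: a web front end that exposes CPU and memory information through an API and supports remote restarts. My idea was to use Python's built-in HTTP server library, which spares you raw socket programming: you just give it an IP and a port. Here I used wsgiref.simple_server.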
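```python
import json
import os
import re
import subprocess
from wsgiref.simple_server import make_server

import psutil

PROCESS_CMD = r"start C:\Progra~1\CloudControlServer\CloudControlServer.exe"


def remote_restart_process(process_name):
    # The original calls this helper without showing it; this body is a stand-in:
    # kill the process by image name, then relaunch it
    exe = process_name.split("\\")[-1]
    os.system('taskkill /f /im %s' % exe)
    os.system(process_name)


# Web service application function (WSGI)
def application(environ, start_response):
    path = environ.get('PATH_INFO')
    json_headers = [('Content-Type', 'application/json')]
    # CPU status
    if path == '/cpu':
        res = subprocess.Popen('wmic cpu get LoadPercentage', stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, shell=True)
        res_str = res.stdout.read().decode(encoding='gbk')
        resp = {'cpu': re.findall(r'\d+', res_str)[0]}
        start_response('200 OK', json_headers)
        return [json.dumps(resp).encode(encoding='utf-8')]
    # CPU + memory status
    elif path == '/state':
        cpu = psutil.cpu_percent(interval=1)  # sample over 1 s; a bare first call returns 0.0
        memory = psutil.virtual_memory()
        memory_lv = float(memory.used) / float(memory.total) * 100
        res = {'cpu': cpu, 'memory': memory_lv}
        start_response('200 OK', json_headers)
        return [json.dumps(res).encode(encoding='utf-8')]
    # Restart-process API
    elif path == '/restart_process':
        # To reboot the whole machine instead: os.system('shutdown.exe -r')
        remote_restart_process(PROCESS_CMD)
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'success']
    # Any other path: answer 404 rather than returning None, which breaks the server
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'not found']


# Three API endpoints:
#   ip:8060/cpu              CPU information
#   ip:8060/state            CPU + memory status
#   ip:8060/restart_process  restart the process
# Start the web server that serves the API on port 8060 (all interfaces)
httpserver = make_server('', 8060, application)
httpserver.serve_forever()
```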
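As more endpoints get added, the if/elif chain in application grows, and any path nobody anticipated needs its own fallback. One possible design, sketched here under the assumption that every endpoint returns a JSON-serializable dict, is a dispatch table: a dict from path to handler, plus uniform 404 and 500 responses. `make_json_app` and the demo handlers are hypothetical names for illustration; real handlers would run the wmic/psutil queries or the restart.

```python
import json
from typing import Callable, Dict

# A route handler takes no arguments and returns a JSON-serializable dict
Handler = Callable[[], dict]


def make_json_app(routes: Dict[str, Handler]):
    """Build a WSGI app that dispatches PATH_INFO through a dict of handlers."""
    def app(environ, start_response):
        handler = routes.get(environ.get('PATH_INFO', '/'))
        if handler is None:
            status, payload = '404 Not Found', {'error': 'not found'}
        else:
            try:
                status, payload = '200 OK', handler()
            except Exception as exc:  # report handler failures as JSON instead of crashing
                status, payload = '500 Internal Server Error', {'error': str(exc)}
        body = json.dumps(payload).encode('utf-8')
        start_response(status, [('Content-Type', 'application/json'),
                                ('Content-Length', str(len(body)))])
        return [body]
    return app


# Hypothetical stand-in handlers for illustration only
def demo_state() -> dict:
    return {'cpu': 12.5, 'memory': 40.0}


def demo_fail() -> dict:
    raise RuntimeError('wmic not available')


app = make_json_app({'/state': demo_state, '/broken': demo_fail})
```

To serve it, pass it to wsgiref the same way: `make_server('', 8060, app).serve_forever()`. Adding an endpoint then means adding one dict entry.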