cloudera-scm-agent dead but pid file exists

Problem 1:

Error description:

/opt/cm-5.7.0/etc/init.d/cloudera-scm-agent status

cloudera-scm-agent dead but pid file exists

Check the log at /opt/cm-5.7.0/log/cloudera-scm-agent/cloudera-scm-agent.log:

No socket could be created on ('testintf.novalocal', 9000) -- [Errno 99] Cannot assign requested address
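
To spot this failure quickly in a long agent log, the line above can be matched with a regular expression. The following is a minimal Python 3 sketch, not part of the Cloudera tooling; the pattern is derived only from the single log line shown here, and the names `BIND_ERROR` and `find_bind_errors` are made up for illustration.

```python
import re

# Pattern built from the one log line quoted in this post (an assumption:
# other agent versions may word the message differently).
BIND_ERROR = re.compile(
    r"No socket could be created on \('(?P<host>[^']+)', (?P<port>\d+)\)"
    r" -- \[Errno (?P<errno>\d+)\] (?P<reason>.+)"
)

def find_bind_errors(lines):
    """Yield (host, port, errno, reason) for every bind-failure line."""
    for line in lines:
        match = BIND_ERROR.search(line)
        if match:
            yield (
                match.group("host"),
                int(match.group("port")),
                int(match.group("errno")),
                match.group("reason").strip(),
            )
```

It can be fed an open file object, for example `find_bind_errors(open(path))` with `path` pointing at the agent log, since iterating a file yields its lines.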

 

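This is mainly a network configuration problem: the agent cannot bind to the address that its hostname resolves to.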

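1. Run the following command (Python 2 syntax, matching the Python 2.6 that the agent uses) to see which hostname and IP are resolved through /etc/hosts:

python -c 'import socket; print socket.getfqdn(), socket.gethostbyname(socket.getfqdn())'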

A well-formed hosts file looks like this:
127.0.0.1 localhost.xxxx localhost
111.222.333.444 aa.aa  aa
555.666.777.888 bb.bb   bb

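The command prints the FQDN followed by its IP, for example: aa.aa 111.222.333.444

The IP must be the same as the one shown by ifconfig (if the host has both a public and a private address, use the private one),
and the hostname must be the same as the one returned by the hostname command.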


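2. Servers on the same private network can reach each other directly, but this does not work through their public IPs. A CDH cluster needs a large number of ports open between hosts, so when setting server_host in config.ini, use the private IP.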
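
The two checks above can also be scripted. The following is a rough Python 3 sketch (the one-liner in step 1 is Python 2), not something taken from Cloudera Manager: it gathers the hostname, the FQDN and the resolved IP, and then tries to bind a socket to that IP, which is the operation that fails with Errno 99 in the log. The function names `can_bind` and `resolution_report` are hypothetical.

```python
import socket

def can_bind(ip, port=0):
    """Return True if a TCP socket can be bound to (ip, port) on this host.

    Port 0 lets the OS pick any free port, so only the address is tested.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, port))
        return True
    except OSError:
        # On Linux, Errno 99 (EADDRNOTAVAIL) ends up here when no local
        # interface owns the IP.
        return False
    finally:
        sock.close()

def resolution_report():
    """Collect the values that step 1 asks you to compare by hand."""
    hostname = socket.gethostname()      # what the `hostname` command reports
    fqdn = socket.getfqdn()
    try:
        ip = socket.gethostbyname(fqdn)
    except OSError:
        ip = None                        # the FQDN does not resolve at all
    return {
        "hostname": hostname,
        "fqdn": fqdn,
        "ip": ip,
        "names_match": hostname == fqdn,
        "bindable": ip is not None and can_bind(ip),
    }
```

If `bindable` comes back False, the resolved IP is probably not assigned to any local interface, which usually means the /etc/hosts entry points at a public or stale address.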




 

 

 

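Problem 2: When installing through the CM web UI, the agent service does not start and its host is not managed, so the agent installation in the UI fails. The installation wizard reports an error with the following hints: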

 Solution:

1. One cause is that the following two commands return different hostnames:

python -c 'import socket; print socket.getfqdn(), socket.gethostbyname(socket.getfqdn())'

  and

hostname

  Make sure both report the same hostname.

 

 

 

2.   Ensure that port 7182 is accessible on the Cloudera Manager Server (check firewall rules).

   telnet  112.35.23.45 7182

   ps -ef |grep PID?  

 

 

3.  Ensure that ports 9000 and 9001 are free on the host being added.

  netstat -an | grep 9000

  netstat -an | grep 9001

 

4.  Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details)

This directory only exists after the agent service has started; if the agent fails to start, it will not be there.
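
The port checks in steps 2 and 3 can be approximated without telnet or netstat. This is a minimal Python 3 sketch under stated assumptions: `port_reachable` and `port_free` are illustrative names, and `port_free` only tests whether a plain bind succeeds, so it may report a port as busy while old connections are still in TIME_WAIT.

```python
import socket

def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds, like telnet."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False

def port_free(port, ip="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to the port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, port))
        return True
    except OSError:
        return False
    finally:
        sock.close()
```

On the host being added, `port_reachable("112.35.23.45", 7182)` would mirror the telnet example in step 2 (replace the IP with your Cloudera Manager Server), and `port_free(9000)` and `port_free(9001)` correspond to step 3.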

 

 

Problem 3:

[22/Oct/2018 18:49:13 +0000] 3131 MainThread agent ERROR Failed! trying again in 1 second(s)
Traceback (most recent call last):
  File "/opt/cm-5.7.1/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/agent.py", line 2161, in connect_to_new_supervisor
    self.get_supervisor_process_info()
  File "/opt/cm-5.7.1/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/agent.py", line 2183, in get_supervisor_process_info
    self.identifier = self.supervisor_client.supervisor.getIdentification()
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/opt/cm-5.7.1/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/xmlrpc.py", line 470, in request
    '' )
ProtocolError: <ProtocolError for 127.0.0.1/RPC2: 401 Unauthorized>
[22/Oct/2018 18:49:13 +0000] 3131 MainThread agent ERROR Failed to connect to newly launched supervisor. Agent will exit

Solution:

Kill the leftover supervisord process and restart the agent. It may take a few attempts before the agent comes up.

 

posted @ 2018-10-19 12:25  duaner92