How to restart CDH, and how to fix a restart that fails

To restart CDH, run the following:

service cloudera-scm-server-db restart
service cloudera-scm-server restart
service cloudera-scm-agent restart    # this one must also be run on every slave node
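
The restart order above can be wrapped in a small script. This is only a minimal sketch: the function names, the `DRY_RUN` switch, and the use of `ssh` to reach the slaves are assumptions for illustration, not something the original setup provides. It assumes root access on the master and key-based `ssh` to each slave.

```shell
#!/usr/bin/env bash
# Sketch: restart the Cloudera Manager services in the order given above.
# DRY_RUN=1 prints each command instead of executing it (hypothetical switch).

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Database first, then the server, then the local agent.
restart_cm_master() {
    local svc
    for svc in cloudera-scm-server-db cloudera-scm-server cloudera-scm-agent; do
        run service "$svc" restart || return 1
    done
}

# Restart the agent on each slave host passed as an argument.
# Assumes ssh access to those hosts; the host names are supplied by the caller.
restart_agents() {
    local host
    for host in "$@"; do
        run ssh "$host" service cloudera-scm-agent restart || return 1
    done
}
```

Running `DRY_RUN=1 restart_cm_master` first shows the commands without touching any service; `restart_agents slave1 slave2` would then cover the slaves (the host names here are placeholders).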

 

 

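When starting the cloudera-scm-server service, you may find that it dies on its own after a while, and that the status check then returns "cloudera-scm-server dead but pid file exists".

What follows covers the case where the root cause is that cloudera-scm-server-db did not start properly.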

 

[Walkthrough]

cloudera-scm-server starts, then dies on its own a little later:

[root@gyvm-4 data]# service cloudera-scm-server start  
Starting cloudera-scm-server:                              [  OK  ]  
[root@gyvm-4 data]#   
[root@gyvm-4 data]# service cloudera-scm-server status  
cloudera-scm-server (pid  60761) is running...  
[root@gyvm-4 data]# service cloudera-scm-server status  
cloudera-scm-server (pid  60761) is running...  
[root@gyvm-4 data]# service cloudera-scm-server status  
cloudera-scm-server (pid  60761) is running...  
[root@gyvm-4 data]# service cloudera-scm-server status  
cloudera-scm-server dead but pid file exists  
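
At this point I wanted to do a full restart of both cloudera-scm-server-db and cloudera-scm-server,

only to find that cloudera-scm-server-db could not be restarted, because it would not even stop: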

[root@gyvm-4 data]# service cloudera-scm-server-db stop  
waiting for server to shut down............................................................... failed  
pg_ctl: server does not shut down  

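server-db could not be stopped because a leftover pid file was making status report the wrong state. After deleting that file and checking status again, it turns out server-db had in fact already stopped: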

[root@gyvm-4 data]# cd /var/lib/cloudera-scm-server-db/data  
[root@gyvm-4 data]# service cloudera-scm-server-db status  
pg_ctl: server is running (PID: 17378)  
/usr/bin/postgres "-D" "/var/lib/cloudera-scm-server-db/data"  
[root@gyvm-4 data]# rm postmaster.pid  
rm: remove regular file `postmaster.pid'? y  
[root@gyvm-4 data]# service cloudera-scm-server-db status  
pg_ctl: no server running  
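
Before deleting a pid file by hand, it can help to confirm that it really is stale. The following is a minimal sketch of such a check; `pidfile_state` is a hypothetical helper, not part of CDH or PostgreSQL. It relies on the first line of `postmaster.pid` being the postmaster's PID, and on `kill -0`, which sends no signal and only tests whether the process exists.

```shell
#!/usr/bin/env bash
# Sketch: classify a pid file as "missing", "alive" or "stale".
pidfile_state() {
    local file="$1" pid=""
    [ -f "$file" ] || { echo "missing"; return 0; }

    # Only the first line is read; in postmaster.pid it holds the PID.
    read -r pid < "$file"

    # An empty or non-numeric first line cannot point at a live process.
    case "$pid" in
        ''|*[!0-9]*) echo "stale"; return 0 ;;
    esac

    # Run this as root (as in the transcript): for a non-root user, kill -0
    # also fails on processes owned by someone else.
    if kill -0 "$pid" 2>/dev/null; then
        echo "alive"
    else
        echo "stale"
    fi
}
```

For example, `pidfile_state /var/lib/cloudera-scm-server-db/data/postmaster.pid` printing `stale` would suggest the file can be removed, while `alive` means a process with that PID still exists and should be looked at first.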

Starting server-db at this point fails:

[root@gyvm-4 data]# service cloudera-scm-server-db start  
DB initialization done.  
waiting for server to start...............................................................could not start server  

Checking the log shows that TCP/IP port 7432 is already in use:

[root@gyvm-4 cloudera-scm-server]# tail db.log   
LOG:  could not bind IPv4 socket: Address already in use  
HINT:  Is another postmaster already running on port 7432? If not, wait a few seconds and retry.  
LOG:  could not bind IPv6 socket: Address already in use  
HINT:  Is another postmaster already running on port 7432? If not, wait a few seconds and retry.  
WARNING:  could not create listen socket for "*"  
FATAL:  could not create any TCP/IP sockets  
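
The port number can also be pulled out of the log automatically. This is a rough sketch that matches the HINT wording shown above; `conflicting_ports` is a hypothetical helper name, and other PostgreSQL versions may word the message differently.

```shell
#!/usr/bin/env bash
# Sketch: read PostgreSQL log text on stdin and print every port named in an
# "Is another postmaster already running on port N?" hint, once per port.
conflicting_ports() {
    sed -n 's/.*already running on port \([0-9][0-9]*\)?.*/\1/p' | sort -un
}
```

Run from the directory that holds the log, `conflicting_ports < db.log` would print the contested port, or nothing if the log contains no such hint.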

Kill the process that is occupying the port:

[root@gyvm-4 cloudera-scm-server]# netstat -ntp | grep 7432  
tcp        0      0 192.168.1.17:7432           192.168.1.17:49784          ESTABLISHED 37118/postgres        
tcp        0      0 192.168.1.17:7432           192.168.1.8:35818           ESTABLISHED 36807/postgres        
tcp        0      0 192.168.1.17:7432           192.168.1.17:49779          ESTABLISHED 37060/postgres        
tcp        0      0 192.168.1.17:49783          192.168.1.17:7432           ESTABLISHED 36306/java            
tcp        0      0 192.168.1.17:7432           192.168.1.8:35813           ESTABLISHED 36778/postgres        
tcp        0      0 192.168.1.17:49779          192.168.1.17:7432           ESTABLISHED 36306/java            
tcp        0      0 192.168.1.17:49784          192.168.1.17:7432           ESTABLISHED 36306/java            
tcp        0      0 192.168.1.17:49778          192.168.1.17:7432           ESTABLISHED 36306/java            
tcp        0      0 192.168.1.17:7432           192.168.1.17:49778          ESTABLISHED 37059/postgres        
tcp        0      0 192.168.1.17:7432           192.168.1.8:35814           ESTABLISHED 36779/postgres        
tcp        0      0 192.168.1.17:7432           192.168.1.8:35817           ESTABLISHED 36804/postgres        
tcp        0      0 192.168.1.17:7432           192.168.1.17:49783          ESTABLISHED 37117/postgres        
[root@gyvm-4 cloudera-scm-server]# kill -9 37118  
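
The netstat listing above mixes two kinds of lines: postgres processes whose local side is port 7432, and the java client that merely connects to it. The sketch below filters `netstat -ntp` output down to the PIDs on the server side of the port. `pids_on_local_port` is a hypothetical helper, and it assumes the column layout shown above (local address in column 4, PID/program in column 7).

```shell
#!/usr/bin/env bash
# Sketch: read `netstat -ntp` output on stdin and print the unique PIDs of
# processes whose LOCAL address uses the given port.
pids_on_local_port() {
    local port="$1"
    awk -v port="$port" '
        $1 ~ /^tcp/ {
            # Take the text after the last ":" so IPv6 addresses also work.
            n = split($4, addr, ":")
            if (addr[n] == port) {
                split($7, owner, "/")          # "37118/postgres" -> "37118"
                if (owner[1] ~ /^[0-9]+$/) print owner[1]
            }
        }' | sort -un
}
```

Usage would be `netstat -ntp | pids_on_local_port 7432`. Adding `-l` (`netstat -ntlp`) lists listening sockets instead of established connections, which points more directly at the process that holds the port. Sending a plain `kill` (SIGTERM) before resorting to `kill -9` gives the process a chance to shut down cleanly.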

Start server-db again, which now succeeds, then start the server, which also succeeds:

[root@gyvm-4 data]# service cloudera-scm-server-db start  
DB initialization done.  
waiting for server to start.... done  
server started  
  
[root@gyvm-4 data]# service cloudera-scm-server start  
Starting cloudera-scm-server:                              [  OK  ]  

The Cloudera management UI is now reachable again.

 

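[Conclusion]

The root cause was that cloudera-scm-server-db was not running properly, yet its pid file, postmaster.pid, had been left behind.

Because of that file, checking the status of cloudera-scm-server-db gave a misleading answer: it reported the service as running.

In that state, every attempt to start cloudera-scm-server failed.

cloudera-scm-server-db itself could not start because the port it needs (7432) was already in use.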

 

posted @ 2020-04-14 20:49  Simon92