Starting and Stopping a Storm Cluster with Scripts

Every time I had to log in to each machine by hand to start the Storm cluster. I looked around online and couldn't find anything like Hadoop's start-all.sh, so I ended up writing my own script.

This involves running shell commands remotely. If passwordless ssh login is not configured, you need an expect script to automate the interactive login; see here for how to use it. If passwordless ssh login is configured, the remote ssh approach can be found here.
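For reference, a minimal sketch of the expect approach, assuming expect is installed; the password below is a placeholder, not a value from a real setup:

# feed an expect script on stdin; the quoted delimiter stops the local
# shell from expanding anything inside the heredoc
expect <<'EOF'
set timeout 10
# spawn the ssh session and answer the password prompt automatically
spawn ssh hadoop@192.168.178.93 "jps"
expect "password:"
send "your_password\r"
expect eof
EOF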

Stopping:

Stopping first, since it is a bit more involved than starting. To stop, we ssh to each machine and kill the relevant process. jps | grep supervisor finds the supervisor process, and the awk text-processing tool then extracts the process id from that line. See here for an introduction to awk.
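For example, jps prints one "pid name" pair per line, so on a worker node the pipeline looks roughly like this (the pid is made up for illustration):

$ jps
4242 supervisor
1234 Jps
$ jps | grep supervisor | awk '{print $1}'
4242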

While writing the script you may hit the warning "here-document at line 36 delimited by end-of-file (wanted `eof')". The fix is to remove the whitespace before and after the second EOF delimiter.
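In other words, bash only recognizes the closing delimiter when it sits alone at the start of a line with nothing before or after it (<<- tolerates leading tabs, but never spaces). A quick illustration:

# broken: the closing delimiter is indented, so bash reads on to
# end-of-file and prints the warning above
ssh -T hadoop@192.168.178.93 <<eeooff
    jps
    eeooff

# fixed: the delimiter starts in column 0 with nothing after it
ssh -T hadoop@192.168.178.93 <<eeooff
    jps
eeooff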

After ssh-ing to a node, the script's variables are local, so the remote node cannot see them. This is the local-variable vs. remote-variable problem; at first I had no idea why the script wasn't working and spent a whole afternoon figuring it out...

The remote-variable problem with ssh:
   1. Set a temporary variable on the remote side.
ssh hadoop@192.168.178.93 'my_var=/tmp/hequn_script.txt; cat $my_var;my_var=123;echo $my_var' 
    If you use double quotes instead, as below, the local shell expands the variables before the command is sent, so the temporary variable gets clobbered in transit (see the sketch after this list).
"my_var=/tmp/hequn_script.txt; cat $my_var;my_var=123;echo $my_var"
   2. Return the variable to the local script.
proc_id=$(ssh -T hadoop@192.168.178.$COUNTER 'cat /tmp/hequn_script.txt')
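A quick way to see the difference between the two quoting styles (outputs noted in the comments):

my_var=local_value

# double quotes: the LOCAL shell expands $my_var before ssh sends the
# command, so the remote side actually runs `echo local_value`
ssh hadoop@192.168.178.93 "my_var=remote_value; echo $my_var"    # prints: local_value

# single quotes: the command goes over verbatim and the REMOTE shell
# expands it, so the remote assignment takes effect
ssh hadoop@192.168.178.93 'my_var=remote_value; echo $my_var'    # prints: remote_value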

The script to stop the Storm cluster is as follows (the lines stopping the Spark and Hadoop clusters at the end are omitted here):

#!/bin/bash

#########################stop Storm##########################
##kill core (the Storm UI process shows up in jps as "core")
proc=core
proc_id=`jps | grep $proc | awk '{print $1}'`

if [ -n "$proc_id" ]
then
    kill -9 $proc_id
    echo "kill core!"
else
    echo "no core to be killed!"
fi

##kill nimbus
proc=nimbus
proc_id=`jps | grep $proc | awk '{print $1}'`

if [ -n "$proc_id" ]
then
    kill -9 $proc_id
    echo "kill nimbus!"
else
    echo "no nimbus to be killed!"
fi


##kill zookeeper on 192.168.178.92-94
COUNTER=91
proc=QuorumPeerMain
while [ $COUNTER -le 93 ]; do
    let COUNTER=COUNTER+1
    #1)get proc_id: write it to a temp file on the remote node
    #  (no whitespace around the closing heredoc delimiter!)
    ssh -T hadoop@192.168.178.$COUNTER <<eeooff
        jps | grep $proc | awk '{print \$1}' > /tmp/hequn_script.txt
eeooff
    proc_id=$(ssh -T hadoop@192.168.178.$COUNTER 'cat /tmp/hequn_script.txt')

    #2)kill proc based on proc_id
    if [ -n "$proc_id" ]; then
        ssh -T hadoop@192.168.178.$COUNTER <<eeooff
            kill -9 $proc_id
eeooff
        echo "kill 192.168.178.$COUNTER's QuorumPeerMain!"
    else
        echo "192.168.178.$COUNTER has no QuorumPeerMain to be killed!"
    fi
done


##kill supervisors
COUNTER=92
proc=supervisor
while [ $COUNTER -le 101 ]; do
    let COUNTER=COUNTER+1
    #skip addresses .96 through .100
    if [ $COUNTER -ge 96 -a $COUNTER -le 100 ]; then
        continue
    fi

    #1)get proc_id
    ssh -T hadoop@192.168.178.$COUNTER <<eeooff
        jps | grep $proc | awk '{print \$1}' > /tmp/hequn_script.txt
eeooff
    proc_id=$(ssh -T hadoop@192.168.178.$COUNTER 'cat /tmp/hequn_script.txt')

    #2)kill proc based on proc_id
    if [ -n "$proc_id" ]; then
        ssh -T hadoop@192.168.178.$COUNTER <<eeooff
            kill -9 $proc_id
eeooff
        echo "kill 192.168.178.$COUNTER's supervisor!"
    else
        echo "192.168.178.$COUNTER has no supervisor to be killed!"
    fi
done


exit

Starting:

#!/bin/bash

#########################start Storm##########################
##start zookeeper
echo "=======Start Storm!======="
cd /home/hadoop
usr/zookeeper/bin/zkServer.sh start
ssh -T hadoop@192.168.178.93 'usr/zookeeper/bin/zkServer.sh start'
ssh -T hadoop@192.168.178.94 'usr/zookeeper/bin/zkServer.sh start'

usr/zookeeper/bin/zkServer.sh status
ssh -T hadoop@192.168.178.93 'usr/zookeeper/bin/zkServer.sh status'
ssh -T hadoop@192.168.178.94 'usr/zookeeper/bin/zkServer.sh status'


##start nimbus
cd /home/hadoop
nohup usr/storm/bin/storm nimbus > /dev/null 2>&1 &
echo "Start nimbus!"

##start ui
cd /home/hadoop
nohup usr/storm/bin/storm ui > /dev/null 2>&1 &
echo "Start UI!"


##start supervisors
cd /home/hadoop
COUNTER=92
while [ $COUNTER -le 101 ]; do
    let COUNTER=COUNTER+1
    #skip addresses .96 through .100, same as in the stop script
    if [ $COUNTER -ge 96 -a $COUNTER -le 100 ]; then
        continue
    fi

    ssh -T hadoop@192.168.178.$COUNTER 'nohup usr/storm/bin/storm supervisor > /dev/null 2>&1 &'
    echo "192.168.178.$COUNTER's supervisor has been started!"
done

echo "=======Storm has been started!======="
exit
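One caveat with the supervisor loop above: a process backgrounded over ssh can be killed as soon as the session closes unless its stdin is redirected as well. If the supervisors die right after the script finishes, this variant of the ssh line usually helps:

ssh -T hadoop@192.168.178.$COUNTER 'nohup usr/storm/bin/storm supervisor < /dev/null > /dev/null 2>&1 &'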

posted on 2014-11-06 13:23 by hequn8128
