Spark Startup Flow (Standalone): Analysis
1. start-all.sh in effect runs `java -cp ... Master` and `java -cp ... Worker`.
2. When the Master starts, it first creates an RpcEnv object, which is responsible for all communication logic.
3. Through its RpcEnv, the Master creates an Endpoint; the Master itself is an Endpoint, and Workers can communicate with it.
4. When a Worker starts, it also creates an RpcEnv object.
5. Through its RpcEnv, the Worker creates its own Endpoint.
6. Through its RpcEnv, the Worker establishes a connection to the Master and obtains an RpcEndpointRef object, through which it can communicate with the Master.
7. The Worker registers with the Master; the registration includes its hostname, port, number of CPU cores, and amount of memory.
8. The Master receives the Worker's registration and keeps the registration info in an in-memory table, which also holds an RpcEndpointRef reference back to the Worker.
9. The Master replies to the Worker that the registration has been received and succeeded.
10. After receiving the success response, the Worker starts sending periodic heartbeats to the Master.
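Steps 7-10 can be sketched as a tiny message exchange. The sketch below models the two sides with plain shell functions; it is only an illustration of the handshake order, not Spark's actual RPC API, and all names and values (`master_register`, hadoop201, port 7078) are made up for the example.

```shell
#!/usr/bin/env bash
# Toy model of the registration handshake (steps 7-10): the "Master" keeps a
# one-entry registry standing in for its in-memory worker table.
WORKER_TABLE=""

# Master side: record the registration (step 8) and reply success (step 9)
master_register() {   # args: host port cores memory_mb
  WORKER_TABLE="$1:$2 cores=$3 mem=${4}MB"
  REPLY="RegisteredWorker $1:$2"
}

# Worker side: send hostname, port, cores, and memory (step 7) ...
master_register hadoop201 7078 4 2048

# ... and on a success reply, begin periodic heartbeats (step 10)
if [[ "$REPLY" == RegisteredWorker* ]]; then
  for i in 1 2 3; do
    echo "Heartbeat hadoop201:7078"
  done
fi
echo "master table: $WORKER_TABLE"
```

In the real implementation both sides run concurrently and the messages travel over the RpcEndpointRef obtained in step 6; here the "reply" is just a shared variable.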
1. Analysis of start-master.sh, the Master startup script
start-master.sh
# Resolve SPARK_HOME
if [ -z "${SPARK_HOME}" ]; then
  export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi

# NOTE: This exact class name is matched downstream by SparkSubmit.
# Any changes need to be reflected there.
CLASS="org.apache.spark.deploy.master.Master"

if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
  echo "Usage: ./sbin/start-master.sh [options]"
  pattern="Usage:"
  pattern+="\|Using Spark's default log4j profile:"
  pattern+="\|Registered signal handlers for"

  "${SPARK_HOME}"/bin/spark-class $CLASS --help 2>&1 | grep -v "$pattern" 1>&2
  exit 1
fi

ORIGINAL_ARGS="$@"

. "${SPARK_HOME}/sbin/spark-config.sh"

. "${SPARK_HOME}/bin/load-spark-env.sh"

if [ "$SPARK_MASTER_PORT" = "" ]; then
  SPARK_MASTER_PORT=7077
fi

if [ "$SPARK_MASTER_HOST" = "" ]; then
  case `uname` in
    (SunOS)
      SPARK_MASTER_HOST="`/usr/sbin/check-hostname | awk '{print $NF}'`"
      ;;
    (*)
      SPARK_MASTER_HOST="`hostname -f`"
      ;;
  esac
fi

if [ "$SPARK_MASTER_WEBUI_PORT" = "" ]; then
  SPARK_MASTER_WEBUI_PORT=8080
fi

# Delegate to spark-daemon.sh to actually launch the class
"${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 \
  --host $SPARK_MASTER_HOST --port $SPARK_MASTER_PORT --webui-port $SPARK_MASTER_WEBUI_PORT \
  $ORIGINAL_ARGS
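The three `if` blocks above implement simple fall-back defaults: a setting is assigned only when unset or empty, so anything exported beforehand wins. Extracted into a standalone, runnable sketch (the hostname value is illustrative):

```shell
#!/usr/bin/env bash
# The defaulting rules from start-master.sh, isolated so they can be tested:
# each setting falls back only when unset or empty.
resolve_master_settings() {
  if [ "$SPARK_MASTER_PORT" = "" ]; then SPARK_MASTER_PORT=7077; fi
  if [ "$SPARK_MASTER_HOST" = "" ]; then SPARK_MASTER_HOST="$(hostname -f)"; fi
  if [ "$SPARK_MASTER_WEBUI_PORT" = "" ]; then SPARK_MASTER_WEBUI_PORT=8080; fi
}

SPARK_MASTER_HOST=hadoop201        # an explicit value survives
unset SPARK_MASTER_PORT            # unset: falls back to 7077
unset SPARK_MASTER_WEBUI_PORT      # unset: falls back to 8080
resolve_master_settings
echo "rpc=$SPARK_MASTER_HOST:$SPARK_MASTER_PORT webui-port=$SPARK_MASTER_WEBUI_PORT"
```

This is why exporting `SPARK_MASTER_HOST`/`SPARK_MASTER_PORT` in spark-env.sh (loaded just above by load-spark-env.sh) overrides the 7077/8080 defaults.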
spark-daemon.sh
...

execute_command() {
  if [ -z ${SPARK_NO_DAEMONIZE+set} ]; then
      # Finally starts the Master as a background daemon process
      nohup -- "$@" >> $log 2>&1 < /dev/null &
      newpid="$!"

      echo "$newpid" > "$pid"

      # Poll for up to 5 seconds for the java process to start
      for i in {1..10}
      do
        if [[ $(ps -p "$newpid" -o comm=) =~ "java" ]]; then
          break
        fi
        sleep 0.5
      done

      sleep 2
      # Check if the process has died; in that case we'll tail the log so the user can see
      if [[ ! $(ps -p "$newpid" -o comm=) =~ "java" ]]; then
        echo "failed to launch: $@"
        tail -2 "$log" | sed 's/^/ /'
        echo "full log in $log"
      fi
  else
      "$@"
  fi
}

...
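The daemonize-then-poll pattern in execute_command can be reproduced with any long-running command. The sketch below uses `sleep` in place of the java process and /tmp paths in place of Spark's log/pid locations; it is a demonstration of the pattern, not the Spark script itself.

```shell
#!/usr/bin/env bash
# Reproduce execute_command's pattern: detach with nohup, record the pid,
# then poll briefly to confirm the child process is alive.
log=/tmp/demo-daemon.log
pid=/tmp/demo-daemon.pid

nohup -- sleep 30 >> "$log" 2>&1 < /dev/null &   # `sleep` stands in for java
newpid="$!"
echo "$newpid" > "$pid"

# Poll for up to 5 seconds, as the script does for the java process
launched=no
for i in {1..10}; do
  if [[ $(ps -p "$newpid" -o comm=) =~ "sleep" ]]; then
    launched=yes
    break
  fi
  sleep 0.5
done
echo "pid=$newpid launched=$launched"

kill "$newpid" 2>/dev/null   # clean up the demo daemon
```

Redirecting stdin from /dev/null and detaching with `nohup ... &` is what lets the Master keep running after the launching shell exits; the pid file is what stop-master.sh later reads to kill the process.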
The class being launched
/opt/module/spark-standalone/bin/spark-class org.apache.spark.deploy.master.Master \
  --host hadoop201 \
  --port 7077 \
  --webui-port 8080
The launch command assembled by bin/spark-class:
/opt/module/jdk1.8.0_172/bin/java \
  -cp /opt/module/spark-standalone/conf/:/opt/module/spark-standalone/jars/* \
  -Xmx1g org.apache.spark.deploy.master.Master \
  --host hadoop201 \
  --port 7077 \
  --webui-port 8080
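The `--host`/`--port`/`--webui-port` arguments are parsed by the Master itself at startup (in Spark, by a Scala argument-parsing class). A shell stand-in for that parsing shows what the Master ends up with; the loop and variable names here are illustrative, not Spark's code:

```shell
#!/usr/bin/env bash
# Shell stand-in for the Master's command-line parsing: walk the argument
# list and pick out the three options shown in the launch command above.
parse_master_args() {
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --host)       HOST="$2";       shift 2 ;;
      --port)       PORT="$2";       shift 2 ;;
      --webui-port) WEBUI_PORT="$2"; shift 2 ;;
      *)            shift ;;   # ignore anything else in this sketch
    esac
  done
}

parse_master_args --host hadoop201 --port 7077 --webui-port 8080
echo "rpc endpoint: $HOST:$PORT   web UI: http://$HOST:$WEBUI_PORT"
```

The host/port pair becomes the address the Master's RpcEnv binds to (the same spark://host:7077 URL that Workers connect to in the registration flow above), and the webui-port serves the monitoring page.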