SparkShell(sparkSql) on k8s
We have no Zeppelin deployed on our k8s cluster, so it is sometimes inconvenient to query data with spark-shell/spark-sql, especially when the data volume is large. Below I describe how to run a pod on k8s and then start spark-shell/spark-sql inside that pod, which makes ad-hoc queries easy.
(Of course, if your local machine has a fixed IP, or you can use a tunneling/dynamic-DNS service such as 花生壳, you can run spark-shell/spark-sql in client mode directly and request resources from k8s.)
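For the local client-mode variant mentioned above, the key point is that executors inside the cluster must be able to reach the driver, so spark.driver.host has to be your externally reachable address. A rough sketch, where the API server address and public IP are placeholders (not values from this post):

```shell
# Hedged sketch only: running spark-shell in client mode from a local machine.
# <api-server-host>, <your-public-ip> are placeholders you must fill in;
# the image tag matches the one used later in this post.
/opt/spark/bin/spark-shell \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode client \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.container.image=student2021/spark:301p \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.driver.host=<your-public-ip> \
  --conf spark.driver.port=14040
```

If the driver is not reachable at that host/port, executors will start and then fail to register, so check firewall and NAT rules first.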
#step1 create a pod as spark-client
cat <<EOF >spark-client.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: spark-client
  name: spark-client
spec:
  containers:
  - name: spark-client
    image: student2021/spark:301p
    imagePullPolicy: Always
    securityContext:
      allowPrivilegeEscalation: false
      runAsUser: 0
    command:
    - sh
    - -c
    - "exec tail -f /dev/null"
  restartPolicy: Never
  serviceAccount: spark
EOF
kubectl apply -n spark-job -f spark-client.yaml

#step2 enter spark-client pod and run spark-shell or spark-sql
kubectl -n spark-job exec -it spark-client -- sh
export SPARK_USER=spark
driver_host=$(grep spark-client /etc/hosts | cut -f 1)
echo $driver_host
/opt/spark/bin/spark-shell --conf spark.jars.ivy=/tmp/.ivy \
  --master k8s://localhost:18080 \
  --deploy-mode client \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.container.image=student2021/spark:301p \
  --conf spark.kubernetes.container.image.pullPolicy=Always \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.driver.pod.name=spark-client \
  --conf spark.executor.instances=4 \
  --conf spark.executor.memory=4g \
  --conf spark.driver.memory=4g \
  --conf spark.driver.host=${driver_host} \
  --conf spark.driver.port=14040

### If you want to use a headless service instead of the pod IP, expose the pod:
kubectl -n spark-job expose pod spark-client --port=14040 --type=ClusterIP --cluster-ip=None
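The driver_host lookup in step 2 works because Kubernetes writes the pod's own IP into /etc/hosts as a tab-separated line containing the pod name. A small self-contained sketch of that extraction, simulating the /etc/hosts line instead of reading the real file (the IP here is a made-up example):

```shell
# Simulate the line k8s adds to the pod's /etc/hosts, e.g.
# "10.244.1.7<TAB>spark-client", then extract the first tab-separated field
# exactly as step 2 does. 10.244.1.7 is a hypothetical pod IP.
hosts_line=$(printf '10.244.1.7\tspark-client')
driver_host=$(printf '%s\n' "$hosts_line" | grep spark-client | cut -f 1)
echo "$driver_host"   # -> 10.244.1.7
```

Note that `cut -f 1` splits on tabs by default; if your /etc/hosts uses spaces instead of a tab, `awk '{print $1}'` is a more robust alternative.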