SparkShell(sparkSql) on k8s

We have not set up Zeppelin on our k8s cluster, so it is sometimes inconvenient to query data with spark-shell/spark-sql, especially when the data volume is large. Below I describe how to run a pod on k8s and then run spark-shell/spark-sql inside that pod, which makes ad-hoc queries much easier.

(Of course, if your local machine has a fixed IP, or you can use a dynamic-DNS service such as 花生壳 (Oray), you can simply run spark-shell/spark-sql in client mode from your machine and request resources from k8s directly.)
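For reference, a minimal sketch of that local client-mode variant might look like the following; the API server URL and the public hostname are placeholder assumptions, not values from this post, and you would substitute your own:

#spark-shell in client mode from outside the cluster -- a sketch, not tested here
#<apiserver-host> and my-laptop.example.com are assumed placeholders
/opt/spark/bin/spark-shell \
--master k8s://https://<apiserver-host>:6443 \
--deploy-mode client \
--conf spark.kubernetes.namespace=spark-job \
--conf spark.kubernetes.container.image=student2021/spark:301p \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.executor.instances=2 \
--conf spark.driver.host=my-laptop.example.com \
--conf spark.driver.port=14040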

#step1 create a pod as spark-client
cat <<EOF >spark-client.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: spark-client
  name: spark-client
spec:
  containers:
  - name: spark-client
    image: student2021/spark:301p
    imagePullPolicy: Always
    securityContext:
        allowPrivilegeEscalation: false
        runAsUser: 0
    command:
      - sh
      - -c
      - "exec tail -f /dev/null"
  restartPolicy: Never
  serviceAccountName: spark
EOF
kubectl apply -n spark-job -f spark-client.yaml
#step2 enter the spark-client pod and run spark-shell or spark-sql
kubectl -n spark-job exec -it spark-client -- sh
export SPARK_USER=spark
#grab the pod's own IP from /etc/hosts; executors connect back to this address
driver_host=$(grep spark-client /etc/hosts | awk '{print $1}')
echo $driver_host
#run spark-shell in client mode; executors are requested from the k8s API server
#inside a pod the in-cluster endpoint kubernetes.default.svc is reachable via the
#mounted service-account token
/opt/spark/bin/spark-shell --conf spark.jars.ivy=/tmp/.ivy \
--master k8s://https://kubernetes.default.svc \
--deploy-mode client \
--conf spark.kubernetes.namespace=spark-job \
--conf spark.kubernetes.container.image=student2021/spark:301p \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.driver.pod.name=spark-client \
--conf spark.executor.instances=4 \
--conf spark.executor.memory=4g \
--conf spark.driver.memory=4g \
--conf spark.driver.host=${driver_host} \
--conf spark.driver.port=14040
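###(optional) from another terminal, verify that executor pods were allocated;
###Spark on k8s labels executor pods with spark-role=executor
kubectl -n spark-job get pods -l spark-role=executor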
###if you want the executors to reach the driver via a headless service, run the following
###note: spark-client is a bare pod, so expose the pod rather than a deployment
kubectl -n spark-job expose pod spark-client --port=14040 --type=ClusterIP --cluster-ip=None
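With the headless service in place, executors can resolve the driver through a stable DNS name rather than the pod IP, so the two driver-address settings above could instead be written as below (a sketch; the cluster.local suffix assumes the default cluster domain):

--conf spark.driver.host=spark-client.spark-job.svc.cluster.local \
--conf spark.driver.port=14040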

 
