Spark on K8S - Client Mode

Configure the spark service account

apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: spark-role
rules:
- apiGroups: [""]
  resources: ["*"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: default
subjects:
- kind: ServiceAccount
  name: spark
  namespace: default
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
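
Once the manifests above are applied, it can be worth confirming that the binding took effect before going further. The following is a minimal sketch, not part of the original setup: the function name `check_spark_rbac` is hypothetical, and the verb list is an assumption about what a client-mode driver typically does with executor pods, not an official minimum.

```shell
# Hypothetical helper: ask the API server whether the service account
# may perform common pod operations in the given namespace.
# Assumption: the verbs below are illustrative, not an official minimum.
check_spark_rbac() {
  local namespace="${1:-default}"
  local account="${2:-spark}"
  local verb
  local failed=0
  for verb in create get list watch delete; do
    # `kubectl auth can-i` exits non-zero when the action is denied.
    if kubectl auth can-i "$verb" pods \
        --namespace "$namespace" \
        --as "system:serviceaccount:${namespace}:${account}" >/dev/null 2>&1; then
      echo "ok: ${verb} pods"
    else
      echo "denied: ${verb} pods"
      failed=1
    fi
  done
  return "$failed"
}

# Usage: check_spark_rbac default spark
```

Impersonating the service account with `--as` requires that your own user is allowed to impersonate it, which is usually the case for cluster administrators.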

Configure the spark container. Spark programs are submitted from inside this container in client mode, so this container also acts as the driver.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spark-client
      component: spark-client
  template:
    metadata:
      labels:
        app: spark-client
        component: spark-client
    spec:
      containers:
      - name: spark-client
        image: spark-py:2.4.6
        workingDir: /opt/spark
        command: ["/bin/bash", "-c", "while true;do echo hello;sleep 6000;done"]
      serviceAccountName: spark
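
To work inside this container later, the pod name generated by the Deployment has to be looked up first. A small sketch, assuming the Deployment was applied to the `default` namespace; the helper name `spark_client_pod` is hypothetical, and the label selector is taken from the manifest above.

```shell
# Hypothetical helper: print the name of the first pod created by the
# spark-client Deployment, selected by the labels from its pod template.
spark_client_pod() {
  local namespace="${1:-default}"
  kubectl get pods --namespace "$namespace" \
    --selector app=spark-client,component=spark-client \
    --output jsonpath='{.items[0].metadata.name}'
}

# Usage (opens an interactive shell in the container):
#   kubectl exec -it "$(spark_client_pod default)" -- /bin/bash
```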

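Configure a Service so that the Spark executors can connect to the Spark driver. Any port works, as long as it matches the spark.driver.port passed to spark-submit.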

apiVersion: v1
kind: Service
metadata:
  namespace: default
  name: spark-client-service
spec:
  selector:
    app: spark-client
  ports:
    - protocol: TCP
      port: 7321
      targetPort: 7321
  clusterIP: None
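
Because the Service is headless (`clusterIP: None`), its DNS name resolves directly to the IP of the spark-client pod, which is what the executors use to reach the driver. The sketch below builds the fully qualified name so it can be checked from inside the cluster. The helper name `driver_host_fqdn` is hypothetical, and `cluster.local` is an assumed cluster domain (a common default that your cluster may override).

```shell
# Hypothetical helper: build the in-cluster DNS name of the driver Service.
# Assumption: the cluster domain is "cluster.local" unless passed explicitly.
driver_host_fqdn() {
  local service="${1:-spark-client-service}"
  local namespace="${2:-default}"
  local domain="${3:-cluster.local}"
  printf '%s.%s.svc.%s\n' "$service" "$namespace" "$domain"
}

# Usage, from a shell inside any pod in the cluster:
#   getent hosts "$(driver_host_fqdn)"
```

The short name `spark-client-service` is enough when the executors run in the same namespace as the Service; the full name is mainly useful for debugging or for cross-namespace setups.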

Log in to the spark container and submit the Spark job in client mode, setting spark.driver.host and spark.driver.port:

bin/spark-submit \
    --master k8s://https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT_HTTPS} \
    --deploy-mode client \
    --name spark-test \
    --conf spark.executor.instances=3 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=spark-py:2.4.6 \
    --conf spark.driver.host=spark-client-service \
    --conf spark.driver.port=7321 \
    /opt/spark/examples/src/main/python/wordcount.py \
    /opt/spark/examples/src/main/python/wordcount.py
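
While the job runs, the executors show up as separate pods, and Spark labels them with `spark-role=executor`. A rough way to confirm that the requested three executors came up is to count the running ones from a second terminal. This is only a sketch; the function name `count_running_executors` is hypothetical.

```shell
# Hypothetical helper: count executor pods that are currently Running.
# Relies on the spark-role=executor label that Spark sets on executor pods.
count_running_executors() {
  local namespace="${1:-default}"
  kubectl get pods --namespace "$namespace" \
    --selector spark-role=executor \
    --field-selector status.phase=Running \
    --no-headers 2>/dev/null | wc -l | tr -d '[:space:]'
}

# Usage: count_running_executors default
```

If the count stays at zero, the executor pod logs are usually the first place to look, since a driver host or port that the executors cannot reach is reported there.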
posted @ moon~light