[Spark] Spark on k8s

Prerequisites:

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T21:04:39Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

In actual testing, compatibility problems show up if the server is on a newer version (1.19 or above); only Spark 3.0 supports those versions.

Building the image:

Mainly based on this article: https://www.jianshu.com/p/da20133ecfea

The steps are as follows:

  1. Download the corresponding Spark release. This example uses Spark 3.0; download it from https://archive.apache.org/dist/spark/spark-3.0.0/ and choose spark-3.0.0-bin-hadoop2.7.tgz.
  2. Extract it; in this example the extracted directory is spark.
  3. Put your own Spark application jar into the spark directory; this example simply reuses the bundled spark/examples/jars/spark-examples_2.12-3.0.0.jar.
  4. Run ./bin/docker-image-tool.sh -r <repourl> -t 3.0.0 build to produce the <repourl>/spark:3.0.0 image. To build the Python image as well:

    bin/docker-image-tool.sh -r <repourl> -t 3.0.0 -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile build

  5. docker push <repourl>/spark:3.0.0
  6. Run spark-submit; the jar you added ends up under /opt/spark inside the image.
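
For reference, the whole sequence can be strung together as shell commands. This is only a sketch: <repourl> is a placeholder for your registry, and the cp line for your own application jar is commented out because this example just reuses the bundled examples jar.

# download and unpack the Spark 3.0.0 distribution
wget https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop2.7.tgz
tar -xzf spark-3.0.0-bin-hadoop2.7.tgz && mv spark-3.0.0-bin-hadoop2.7 spark
cd spark

# copy your own application jar in so it ends up under /opt/spark in the image
# cp /path/to/my-app.jar examples/jars/

# build and push the JVM image (produces <repourl>/spark:3.0.0)
./bin/docker-image-tool.sh -r <repourl> -t 3.0.0 build
docker push <repourl>/spark:3.0.0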

Run:

spark-submit \
--master k8s://https://k8sapi:6443 --deploy-mode cluster \
--name spark-pi --class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=repourl/spark:3.0.0 \
--conf spark.kubernetes.submission.waitAppCompletion=false \
local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar

By default this runs as the default service account; if that account lacks sufficient permissions, the job will eventually fail.

Authorizing a service account

The Spark driver needs permission to create, run, and watch executor pods, so a service account with those permissions must be configured. This example uses a service account named spark; the commands are:

kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
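
To sanity-check the binding before submitting, kubectl's impersonation support can be used; the verbs below are just the core ones the driver needs, not an exhaustive list:

# verify that the spark service account can manage pods in the default namespace
kubectl auth can-i create pods --as=system:serviceaccount:default:spark -n default
kubectl auth can-i watch pods --as=system:serviceaccount:default:spark -n default
kubectl auth can-i delete pods --as=system:serviceaccount:default:spark -n default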


The final submit command:

spark-submit \
--master k8s://https://k8sapiserver:6443 --deploy-mode cluster \
--name spark-pi --class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=repourl/spark:3.0.0 \
--conf spark.kubernetes.submission.waitAppCompletion=false \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar


Authentication:

Spark gained Kubernetes support after 2.2, and support only became official in 3.0, so many features are in fact stripped down.

The quickest solution is described at the bottom of this section.

In my view, the most complicated part of running Spark on Kubernetes is authentication. The Spark 2.4 official documentation covers it poorly: http://spark.apache.org/docs/2.4.1/running-on-kubernetes.html

It exposes many authentication parameters, such as spark.kubernetes.authenticate.submission.caCertFile and spark.kubernetes.authenticate.submission.clientKeyFile, but configuring these certificates is a complicated process.

  • Exception:
javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

This happens because the self-signed certificate is not trusted; ca.pem has to be imported into the JVM keystore. See https://blog.csdn.net/ljskr/article/details/84570573
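
A minimal sketch of that import, assuming the CA is embedded in the local kubeconfig and that spark-submit runs on a Java 8 JVM using the default cacerts store (default password changeit):

# extract the cluster CA from the kubeconfig (first cluster entry assumed)
kubectl config view --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' | base64 -d > ca.pem

# import it into the JVM trust store used by spark-submit
keytool -importcert -trustcacerts -alias k8s-ca -file ca.pem \
  -keystore "$JAVA_HOME/jre/lib/security/cacerts" -storepass changeit -noprompt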

  • Exception:
2020/10/15 11:09:49.147 WARN WatchConnectionManager : Exec Failure: HTTP 403, Status: 403 - pods "spark-pi-1602731387162-driver" is forbidden: User "system:anonymous" cannot watch resource "pods" in API group "" in the namespace "default"

This means the client itself is not trusted. One fix is to add these parameters to spark-submit:

--conf spark.kubernetes.authenticate.submission.clientKeyFile=/root/admin-key.pem
--conf spark.kubernetes.authenticate.submission.clientCertFile=/root/admin.pem
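
Where those pem files come from depends on how the cluster was set up. If the kubeconfig embeds the client credentials, one possible way to dump them (a sketch; the first user entry is assumed) is:

# dump the client certificate and key of the first kubeconfig user
kubectl config view --raw -o jsonpath='{.users[0].user.client-certificate-data}' | base64 -d > /root/admin.pem
kubectl config view --raw -o jsonpath='{.users[0].user.client-key-data}' | base64 -d > /root/admin-key.pem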

Or, for testing only (this grants cluster-admin to anonymous users), run:

kubectl create clusterrolebinding test:anonymous --clusterrole=cluster-admin --user=system:anonymous

Any single misconfigured certificate here will cause problems.


So the most practical configuration approach is:

Copy the .kube directory into $HOME on the machine running spark-submit.
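
In other words, make the submitting machine look like a machine that can already run kubectl. A sketch, assuming the kubeconfig lives on a master node reachable as k8s-master:

# copy the kubeconfig from a cluster node to the machine running spark-submit
mkdir -p ~/.kube
scp root@k8s-master:/root/.kube/config ~/.kube/config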

Why this works:

Spark uses the io.fabric8 kubernetes-client library. Although Spark exposes a pile of parameters, by default the library still looks for ~/.kube/config.

Relevant code:

https://github.com/apache/spark/blob/branch-2.4/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala#L66

https://github.com/fabric8io/kubernetes-client/blob/74cc63df9b6333d083ee24a6ff3455eaad0a6da8/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/Config.java#L538

Note that this authentication only takes effect at spark-submit time. Once the command has been submitted and the driver pod has been created, the pem files no longer play any role.

Authentication in the driver phase:

Roughly speaking, Spark on k8s works by having the driver pod create and destroy executor pods, so the driver pod needs fairly broad permissions. This requires configuring RBAC; see http://spark.apache.org/docs/2.4.1/running-on-kubernetes.html#rbac

Once RBAC is configured, the driver also needs to be told which identity to run as, so the following parameter is required as well:

--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark

Finally

Kubernetes client/server version compatibility is poor.

A Kubernetes 1.15 server needs roughly version 4.6 of the fabric8 client, and Spark only upgraded to that client version in 2.4.7.

参考: https://www.waitingforcode.com/apache-spark/setting-up-apache-spark-kubernetes-microk8s/read#unknownhostexception_kubernetes_default_svc_try_again

So pay close attention to client/server version mismatches when testing.
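
A quick way to check which fabric8 client a given Spark distribution bundles (assuming the jar follows the usual kubernetes-client-<version>.jar naming):

ls $SPARK_HOME/jars | grep -i kubernetes-client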

The final submit command was:

/Users/zhouwenyang/Desktop/tmp/spark/bin/spark-submit \
--master k8s://https://knode1:6443 --deploy-mode cluster \
--name spark-pi --class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=vm2173:5000/spark:2.4.7 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.submission.waitAppCompletion=false \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.7.jar

Official Spark RBAC documentation: https://spark.apache.org/docs/3.0.0/running-on-kubernetes.html#rbac

Issue log

  • Distributing a file to every node

sparkContext.addFile() can distribute a file to every node on both standalone and YARN clusters. The method's documentation reads:

Add a file to be downloaded with this Spark job on every node.
If a file is added during execution, it will not be available until the next TaskSet starts.
Params:
path – can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, use SparkFiles.get(fileName) to find its download location.
Note:
A path can be added only once. Subsequent additions of the same path are ignored.

According to that documentation the file should reach every node, but in actual testing on Kubernetes it had no effect. I suspect this is a bug. [to be continued]
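
As a possible alternative (not verified here), files can also be shipped at submit time with --files; in cluster mode on Kubernetes, Spark 3.0 additionally expects spark.kubernetes.file.upload.path to point at a shared, Hadoop-compatible staging location when the files are local. A sketch, with s3a://some-bucket/spark-upload and /path/to/config.txt as placeholders:

spark-submit \
--master k8s://https://k8sapiserver:6443 --deploy-mode cluster \
--name spark-pi --class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.container.image=repourl/spark:3.0.0 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.file.upload.path=s3a://some-bucket/spark-upload \
--files /path/to/config.txt \
local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar

Executors should then be able to resolve the file with SparkFiles.get("config.txt"), the same lookup the addFile documentation describes.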
