Kubernetes: The Job Controller
Job Controller
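Deployment and DaemonSet controllers manage long-running, daemon-style service applications. The Job controller, in contrast, schedules Pod objects to run one-off tasks. When the process in the container finishes normally, it is not restarted. Instead, the Pod object is put into the "Completed" state.
If the process in the container terminates because of an error, the configured restartPolicy decides whether it is restarted. A Pod object may terminate unexpectedly before it finishes because its node failed. In that case it is rescheduled. The state transitions of a Job controller's Pod objects are shown in the figure below: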
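In practice, some jobs need to run more than once, and users can configure them to run serially or in parallel. Job controller objects of this kind therefore come in two forms:
- Serial Job with a single work queue: a series of one-off jobs runs one after another, so only one Pod object exists at any given moment.
- Parallel Job with multiple work queues: you can set the number of work queues equal to the number of jobs, so that each queue runs exactly one job. Alternatively, a limited number of work queues can run a larger number of jobs (fewer queues than total jobs), which is equivalent to running several serial job queues side by side.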
The Job controller is typically used for tasks that "complete" after running for a while, such as computations or backups.
1. Creating a Job Object
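The only required field nested in a Job controller's spec is template, and it is used the same way as in Deployment and other controllers. The Job automatically adds the labels "job-name=JOB_NAME" and "controller-uid=UID" to its Pod objects. It then uses a label selector on the controller-uid label to associate them with itself. Note that Job belongs to the batch API group, version v1 ("batch/v1"). The following manifest (job-example.yaml) defines a Job controller. It was exported from a running cluster, so it also contains fields the system fills in, such as selector and the controller-uid and job-name labels: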
```yaml
kind: Job
apiVersion: batch/v1
metadata:
  name: elasticsearch-logging-curator-elasticsearch-curator-1655485200
  namespace: kubesphere-logging-system
  labels:
    app: elasticsearch-curator
    release: elasticsearch-logging-curator
  annotations:
    revisions: >-
      {"1":{"status":"completed","succeed":1,"desire":1,"uid":"9acdb1d4-eb50-4309-9534-8848450fd732","start-time":"2022-06-18T01:00:06+08:00","completion-time":"2022-06-18T01:00:09+08:00"}}
spec:
  parallelism: 1
  completions: 1
  backoffLimit: 6
  selector:
    matchLabels:
      controller-uid: 9acdb1d4-eb50-4309-9534-8848450fd732
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: elasticsearch-curator
        controller-uid: 9acdb1d4-eb50-4309-9534-8848450fd732
        job-name: elasticsearch-logging-curator-elasticsearch-curator-1655485200
        release: elasticsearch-logging-curator
    spec:
      volumes:
        - name: config-volume
          configMap:
            name: elasticsearch-logging-curator-elasticsearch-curator-config
            defaultMode: 420
      containers:
        - name: elasticsearch-curator
          image: >-
            registry.cn-beijing.aliyuncs.com/kubesphereio/elasticsearch-curator:v5.7.6
          command:
            - curator/curator
          args:
            - '--config'
            - /etc/es-curator/config.yml
            - /etc/es-curator/action_file.yml
          resources: {}
          volumeMounts:
            - name: config-volume
              mountPath: /etc/es-curator
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Never
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext:
        runAsUser: 16
      schedulerName: default-scheduler
```
The spec.restartPolicy field of a Pod template defaults to "Always", which is not allowed for a Job controller. You must therefore set restartPolicy explicitly in the Pod template to either "Never" or "OnFailure".
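A hand-written Job needs far fewer fields than the exported manifest above, because the system generates the selector and the Pod labels. The sketch below is only an illustration. The name job-example, the alpine image and the echo/sleep command are assumptions and do not come from the cluster shown above.

```yaml
# Minimal Job sketch; name, image and command are illustrative assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: job-example          # hypothetical name
spec:
  # No selector or Pod labels are written here: the Job controller adds the
  # job-name and controller-uid labels and the matching selector itself.
  template:
    spec:
      containers:
        - name: myjob
          image: alpine      # assumed image
          command: ["/bin/sh", "-c", "echo hello; sleep 10"]
      # Only "Never" or "OnFailure" are accepted for Jobs; "Always" is rejected.
      restartPolicy: Never
```

If it is saved as job-example.yaml, "kubectl apply -f job-example.yaml" would create it. With restartPolicy set to Never, a failed container is not restarted inside the same Pod. The Job controller creates a new Pod instead, up to backoffLimit times.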
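After the Job is created with "kubectl create" or "kubectl apply", you can check its status. Older kubectl versions show two separate columns: DESIRED is the number of Pods expected to run, and SUCCESSFUL is the number that completed successfully. Newer versions, as in the output below, combine these into a single COMPLETIONS column in the form succeeded/desired: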
```
[root@mh-k8s-master-prd-243-24 ~]# kubectl get jobs --all-namespaces
NAMESPACE                   NAME                                                             COMPLETIONS   DURATION   AGE
istio-system                jaeger-es-index-cleaner-1655481300                               1/1           3s         2d14h
istio-system                jaeger-es-index-cleaner-1655567700                               1/1           4s         38h
istio-system                jaeger-es-index-cleaner-1655654100                               1/1           3s         14h
kubesphere-logging-system   elasticsearch-logging-curator-elasticsearch-curator-1655485200   1/1           3s         2d13h
kubesphere-logging-system   elasticsearch-logging-curator-elasticsearch-curator-1655571600   1/1           4s         37h
kubesphere-logging-system   elasticsearch-logging-curator-elasticsearch-curator-1655658000   1/1           4s         13h
kubesphere-system           openpitrix-import-job                                            1/1           6m35s      69d
testkube                    jetstack-cert-manager-startupapicheck                            0/1           56d        56d
[root@mh-k8s-master-prd-243-24 ~]# kubectl get job jaeger-es-index-cleaner-1655481300 -n istio-system
NAME                                 COMPLETIONS   DURATION   AGE
jaeger-es-index-cleaner-1655481300   1/1           3s         2d14h
[root@mh-k8s-master-prd-243-24 ~]#
```
The detailed description shows the label selector in use and the labels of the matching Pod resources:
```
[root@mh-k8s-master-prd-243-24 ~]# kubectl describe jobs jaeger-es-index-cleaner-1655481300 -n istio-system
Name:             jaeger-es-index-cleaner-1655481300
Namespace:        istio-system
Selector:         controller-uid=c0e4757a-a438-4206-8e66-e7eb16c6cf3c
Labels:           app=jaeger
                  app.kubernetes.io/component=cronjob-es-index-cleaner
                  app.kubernetes.io/instance=jaeger
                  app.kubernetes.io/managed-by=jaeger-operator
                  app.kubernetes.io/name=jaeger-es-index-cleaner
                  app.kubernetes.io/part-of=jaeger
                  controller-uid=c0e4757a-a438-4206-8e66-e7eb16c6cf3c
                  job-name=jaeger-es-index-cleaner-1655481300
Annotations:      revisions:
                    {"1":{"status":"completed","succeed":1,"uid":"c0e4757a-a438-4206-8e66-e7eb16c6cf3c","start-time":"2022-06-17T23:55:07+08:00","completion-t...
Controlled By:    CronJob/jaeger-es-index-cleaner
Parallelism:      1
Completions:      <unset>
Start Time:       Fri, 17 Jun 2022 23:55:07 +0800
Completed At:     Fri, 17 Jun 2022 23:55:10 +0800
Duration:         3s
Pods Statuses:    0 Running / 1 Succeeded / 0 Failed
Pod Template:
  Labels:           app=jaeger
                    app.kubernetes.io/component=cronjob-es-index-cleaner
                    app.kubernetes.io/instance=jaeger
                    app.kubernetes.io/managed-by=jaeger-operator
                    app.kubernetes.io/name=jaeger-es-index-cleaner
                    app.kubernetes.io/part-of=jaeger
                    controller-uid=c0e4757a-a438-4206-8e66-e7eb16c6cf3c
                    job-name=jaeger-es-index-cleaner-1655481300
  Annotations:      linkerd.io/inject: disabled
                    prometheus.io/scrape: false
                    sidecar.istio.io/inject: false
  Service Account:  jaeger
  Containers:
   jaeger-es-index-cleaner:
    Image:      registry.cn-beijing.aliyuncs.com/kubesphereio/jaeger-es-index-cleaner:1.17
    Port:       <none>
    Host Port:  <none>
    Args:
      7
      http://elasticsearch-logging-data.kubesphere-logging-system.svc:9200
    Environment Variables from:
      jaeger-secret  Secret  Optional: false
    Environment:
      INDEX_PREFIX:  logstash
    Mounts:          <none>
  Volumes:           <none>
Events:              <none>
[root@mh-k8s-master-prd-243-24 ~]#
```
2. Parallel Jobs
```
[root@mh-k8s-master-prd-243-24 ~]# kubectl explain job
KIND:     Job
VERSION:  batch/v1

DESCRIPTION:
     Job represents the configuration of a single job.

FIELDS:
   apiVersion   <string>
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind   <string>
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata   <Object>
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec   <Object>
     Specification of the desired behavior of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

   status   <Object>
     Current status of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

[root@mh-k8s-master-prd-243-24 ~]#
```
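To make the Job controller run multiple tasks serially, set the parallelism field job.spec.parallelism to 1 and set the total number of tasks in job.spec.completions. A parallelism greater than 1 lets up to that many Pods run at the same time.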
```
[root@mh-k8s-master-prd-243-24 ~]# kubectl explain job.spec
KIND:     Job
VERSION:  batch/v1

RESOURCE: spec <Object>

DESCRIPTION:
     Specification of the desired behavior of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

     JobSpec describes how the job execution will look like.

FIELDS:
   activeDeadlineSeconds   <integer>
     Specifies the duration in seconds relative to the startTime that the job
     may be active before the system tries to terminate it; value must be
     positive integer

   backoffLimit   <integer>
     Specifies the number of retries before marking this job failed. Defaults
     to 6

   completions   <integer>
     Specifies the desired number of successfully finished pods the job should
     be run with. Setting to nil means that the success of any pod signals the
     success of all pods, and allows parallelism to have any positive value.
     Setting to 1 means that parallelism is limited to 1 and the success of that
     pod signals the success of the job. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

   manualSelector   <boolean>
     manualSelector controls generation of pod labels and pod selectors. Leave
     `manualSelector` unset unless you are certain what you are doing. When
     false or unset, the system pick labels unique to this job and appends
     those labels to the pod template. When true, the user is responsible for
     picking unique labels and specifying the selector. Failure to pick a unique
     label may cause this and other jobs to not function correctly. However, You
     may see `manualSelector=true` in jobs that were created with the old
     `extensions/v1beta1` API. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#specifying-your-own-pod-selector

   parallelism   <integer>
     Specifies the maximum desired number of pods the job should run at any
     given time. The actual number of pods running in steady state will be less
     than this number when ((.spec.completions - .status.successful) <
     .spec.parallelism), i.e. when the work left to do is less than max
     parallelism. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

   selector   <Object>
     A label query over pods that should match the pod count. Normally, the
     system sets this field for you. More info:
     https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors

   template   <Object> -required-
     Describes the pod that will be created when executing a job. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

   ttlSecondsAfterFinished   <integer>
     ttlSecondsAfterFinished limits the lifetime of a Job that has finished
     execution (either Complete or Failed). If this field is set,
     ttlSecondsAfterFinished after the Job finishes, it is eligible to be
     automatically deleted. When the Job is being deleted, its lifecycle
     guarantees (e.g. finalizers) will be honored. If this field is unset, the
     Job won't be automatically deleted. If this field is set to zero, the Job
     becomes eligible to be deleted immediately after it finishes. This field is
     alpha-level and is only honored by servers that enable the
     TTLAfterFinished feature.

[root@mh-k8s-master-prd-243-24 ~]#
```
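Among the fields listed above, ttlSecondsAfterFinished lets finished Jobs be cleaned up automatically instead of being deleted by hand. The "alpha-level" note in the output reflects the cluster version used here. The TTL-after-finished mechanism became stable in Kubernetes v1.23, so only older clusters may need the TTLAfterFinished feature gate. In the sketch below, the name job-ttl-demo and the 300-second value are illustrative assumptions:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-ttl-demo             # hypothetical name
spec:
  # The Job becomes eligible for automatic deletion (together with its Pods)
  # 300 seconds after it finishes, whether it ended Complete or Failed.
  ttlSecondsAfterFinished: 300
  template:
    spec:
      containers:
        - name: myjob
          image: alpine          # assumed image
          command: ["/bin/sh", "-c", "sleep 20"]
      restartPolicy: OnFailure
```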
Below is an example Job controller that runs a task 5 times serially:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-multi
spec:
  completions: 5
  template:
    spec:
      containers:
        - name: myjob
          image: alpine
          command: ["/bin/sh", "-c", "sleep 20"]
      restartPolicy: OnFailure
```
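The job-multi manifest leaves parallelism unset, so it defaults to 1 and the five Pods run one after another. Setting parallelism above 1 turns it into the multi-work-queue parallel Job described at the beginning. In the sketch below, at most 2 Pods run at once until 5 completions are reached, so there are fewer queues than total jobs. The name job-multi-parallel is hypothetical.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-multi-parallel       # hypothetical name
spec:
  completions: 5                 # total number of successful Pods required
  parallelism: 2                 # at most 2 Pods run at the same time
  template:
    spec:
      containers:
        - name: myjob
          image: alpine
          command: ["/bin/sh", "-c", "sleep 20"]
      restartPolicy: OnFailure
```

Each task sleeps 20 seconds, so the Pods would run in roughly three rounds (2 + 2 + 1). That comes to about 60 seconds, compared with about 100 seconds for the serial version, ignoring scheduling and image-pull overhead.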
3. Scaling a Job
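A Job controller's job.spec.parallelism defines its degree of parallelism, which is the number of Pod objects running at the same time. You can change it at runtime to alter the number of queues and so scale the Job up or down. The command is the same one used earlier for Deployment objects, "kubectl scale --replicas". Newer kubectl releases have deprecated scaling Jobs this way. Updating spec.parallelism directly, for example with "kubectl patch", has the same effect.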
```shell
kubectl scale jobs job-multi --replicas=2
```
4. Deleting a Job
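Once its Pods have finished running, a Job controller no longer consumes system resources. You can keep it or delete it with the resource deletion command, as needed. However, a Job's container application may never terminate normally. If its restartPolicy is also set to restart the container (OnFailure), the Job may stay stuck in an endless cycle of restarts and errors. Two fields in the Job spec limit this: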
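- job.spec.activeDeadlineSeconds <integer>: the Job's deadline, i.e. its maximum active duration. A job that is still running after this duration is terminated.
- job.spec.backoffLimit <integer>: the number of retries before the job is marked as failed. Defaults to 6.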
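For example, the following configuration fragment sets the number of retries on failure to 5. It also terminates the job if it has not completed within 100 seconds: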
```yaml
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 100
```
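The sketch below places both limits in a complete manifest. Its container always exits with an error, so the retries are actually triggered. The name job-fail-demo and the failing command are hypothetical and serve only to demonstrate the limits.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-fail-demo            # hypothetical name
spec:
  backoffLimit: 5                # mark the Job Failed after 5 retries
  activeDeadlineSeconds: 100     # terminate the Job if still active after 100 s
  template:
    spec:
      containers:
        - name: myjob
          image: alpine
          # Always exits with status 1, so every attempt fails.
          command: ["/bin/sh", "-c", "sleep 5; exit 1"]
      restartPolicy: Never
```

Whichever limit is reached first ends the Job, and the Job's Failed condition then carries the reason BackoffLimitExceeded or DeadlineExceeded. The controller waits between retries with an exponentially increasing delay (10 s, 20 s, 40 s, and so on). Because of this, the 100-second deadline may be reached before all 5 retries are used up.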