Kubernetes: The Job Controller
Job Controller
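Unlike the Deployment and DaemonSet controllers, which manage long-running, daemon-style service applications, the Job controller schedules Pod objects to run one-off tasks. When the process in the container finishes normally, it is not restarted. Instead, the Pod object is placed in the "Completed" state.
If the process in the container terminates because of an error, the configuration (restartPolicy) decides whether it is restarted. If an unfinished Pod object terminates unexpectedly because its node fails, the Job controller creates a replacement Pod, which is scheduled again. The state transitions of a Job controller's Pod objects are shown in the figure below: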
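In practice, some tasks need to run more than once, and users can configure them to run serially or in parallel. There are two kinds of such Job objects:
- Serial Job with a single work queue: the task runs several times as a sequence of one-off runs, so only one Pod object exists at any given moment.
- Parallel Job with multiple work queues: you set the number of work queues, that is, how many Pods run at once. Each queue can run exactly one task, giving as many queues as tasks. Alternatively, a limited number of queues can run a larger number of tasks (fewer queues than tasks), which amounts to running several serial queues side by side.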
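Below is a minimal sketch of the second case, with one queue per task. It assumes a hypothetical Job named job-parallel-fixed and uses a placeholder sleep command instead of real work. Because completions and parallelism have the same value, all the Pods start at once:

```yaml
# Hypothetical example: 3 tasks, one Pod ("queue") per task, all running concurrently
apiVersion: batch/v1
kind: Job
metadata:
  name: job-parallel-fixed      # hypothetical name
spec:
  completions: 3                # total number of Pods that must finish successfully
  parallelism: 3                # at most 3 Pods run at the same time
  template:
    spec:
      containers:
        - name: worker
          image: alpine
          command: ["/bin/sh", "-c", "sleep 20"]   # placeholder workload
      restartPolicy: Never      # Jobs only accept Never or OnFailure
```

Setting parallelism lower than completions turns the same manifest into the limited-queue variant. In that variant, the remaining tasks start as earlier Pods finish.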
The Job controller is typically used for tasks that "complete" after running for a while, such as computations or backups.
1. Creating a Job Object
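The only required field nested in a Job's spec is template, which works the same way as in Deployment and other controllers. The Job controller automatically adds the labels "job-name=JOB_NAME" and "controller-uid=UID" to its Pod objects. It then uses a label selector on the controller-uid label to associate those Pods with the Job. Note that Job belongs to the batch API group (apiVersion "batch/v1"). The following resource manifest (job-example.yaml) defines a Job controller. It appears to have been exported from a running cluster, so it already contains the selector and labels that the controller generated automatically. A hand-written manifest normally omits these fields: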
```yaml
kind: Job
apiVersion: batch/v1
metadata:
  name: elasticsearch-logging-curator-elasticsearch-curator-1655485200
  namespace: kubesphere-logging-system
  labels:
    app: elasticsearch-curator
    release: elasticsearch-logging-curator
  annotations:
    revisions: >-
      {"1":{"status":"completed","succeed":1,"desire":1,"uid":"9acdb1d4-eb50-4309-9534-8848450fd732","start-time":"2022-06-18T01:00:06+08:00","completion-time":"2022-06-18T01:00:09+08:00"}}
spec:
  parallelism: 1
  completions: 1
  backoffLimit: 6
  selector:
    matchLabels:
      controller-uid: 9acdb1d4-eb50-4309-9534-8848450fd732
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: elasticsearch-curator
        controller-uid: 9acdb1d4-eb50-4309-9534-8848450fd732
        job-name: elasticsearch-logging-curator-elasticsearch-curator-1655485200
        release: elasticsearch-logging-curator
    spec:
      volumes:
        - name: config-volume
          configMap:
            name: elasticsearch-logging-curator-elasticsearch-curator-config
            defaultMode: 420
      containers:
        - name: elasticsearch-curator
          image: >-
            registry.cn-beijing.aliyuncs.com/kubesphereio/elasticsearch-curator:v5.7.6
          command:
            - curator/curator
          args:
            - '--config'
            - /etc/es-curator/config.yml
            - /etc/es-curator/action_file.yml
          resources: {}
          volumeMounts:
            - name: config-volume
              mountPath: /etc/es-curator
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Never
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext:
        runAsUser: 16
      schedulerName: default-scheduler
```
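In a Pod template, spec.restartPolicy defaults to "Always", which is not valid for a Job controller, and the API server rejects a Job whose template uses it. You must therefore explicitly set restartPolicy in the Pod template to "Never" or "OnFailure".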
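A hand-written manifest can be much shorter than the exported one above. The following sketch uses a hypothetical name (job-example) and a placeholder command. It leaves out the selector and the controller-uid/job-name labels, which the Job controller adds itself, and sets restartPolicy explicitly:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-example            # hypothetical name
spec:
  # No selector and no controller-uid/job-name labels here:
  # the Job controller generates and adds them on creation.
  template:
    spec:
      containers:
        - name: myjob
          image: alpine
          command: ["/bin/sh", "-c", "echo done"]   # placeholder task
      # Must be Never or OnFailure; the Pod default (Always) is not accepted for Jobs
      restartPolicy: Never
```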
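After creating the Job with "kubectl create" or "kubectl apply", you can check its status. Older kubectl versions showed two columns: DESIRED, the desired number of successfully completed Pods, and SUCCESSFUL, the number of Pods that have completed successfully. Newer versions, as in the output below, merge them into a single COMPLETIONS column in the form succeeded/desired: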
```console
[root@mh-k8s-master-prd-243-24 ~]# kubectl get jobs --all-namespaces
NAMESPACE                   NAME                                                             COMPLETIONS   DURATION   AGE
istio-system                jaeger-es-index-cleaner-1655481300                               1/1           3s         2d14h
istio-system                jaeger-es-index-cleaner-1655567700                               1/1           4s         38h
istio-system                jaeger-es-index-cleaner-1655654100                               1/1           3s         14h
kubesphere-logging-system   elasticsearch-logging-curator-elasticsearch-curator-1655485200   1/1           3s         2d13h
kubesphere-logging-system   elasticsearch-logging-curator-elasticsearch-curator-1655571600   1/1           4s         37h
kubesphere-logging-system   elasticsearch-logging-curator-elasticsearch-curator-1655658000   1/1           4s         13h
kubesphere-system           openpitrix-import-job                                            1/1           6m35s      69d
testkube                    jetstack-cert-manager-startupapicheck                            0/1           56d        56d
[root@mh-k8s-master-prd-243-24 ~]# kubectl get job jaeger-es-index-cleaner-1655481300 -n istio-system
NAME                                 COMPLETIONS   DURATION   AGE
jaeger-es-index-cleaner-1655481300   1/1           3s         2d14h
[root@mh-k8s-master-prd-243-24 ~]#
```
The detailed information shows the label selector in use and the labels of the matching Pod objects:
```console
[root@mh-k8s-master-prd-243-24 ~]# kubectl describe jobs jaeger-es-index-cleaner-1655481300 -n istio-system
Name:           jaeger-es-index-cleaner-1655481300
Namespace:      istio-system
Selector:       controller-uid=c0e4757a-a438-4206-8e66-e7eb16c6cf3c
Labels:         app=jaeger
                app.kubernetes.io/component=cronjob-es-index-cleaner
                app.kubernetes.io/instance=jaeger
                app.kubernetes.io/managed-by=jaeger-operator
                app.kubernetes.io/name=jaeger-es-index-cleaner
                app.kubernetes.io/part-of=jaeger
                controller-uid=c0e4757a-a438-4206-8e66-e7eb16c6cf3c
                job-name=jaeger-es-index-cleaner-1655481300
Annotations:    revisions:
                  {"1":{"status":"completed","succeed":1,"uid":"c0e4757a-a438-4206-8e66-e7eb16c6cf3c","start-time":"2022-06-17T23:55:07+08:00","completion-t...
Controlled By:  CronJob/jaeger-es-index-cleaner
Parallelism:    1
Completions:    <unset>
Start Time:     Fri, 17 Jun 2022 23:55:07 +0800
Completed At:   Fri, 17 Jun 2022 23:55:10 +0800
Duration:       3s
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
Pod Template:
  Labels:           app=jaeger
                    app.kubernetes.io/component=cronjob-es-index-cleaner
                    app.kubernetes.io/instance=jaeger
                    app.kubernetes.io/managed-by=jaeger-operator
                    app.kubernetes.io/name=jaeger-es-index-cleaner
                    app.kubernetes.io/part-of=jaeger
                    controller-uid=c0e4757a-a438-4206-8e66-e7eb16c6cf3c
                    job-name=jaeger-es-index-cleaner-1655481300
  Annotations:      linkerd.io/inject: disabled
                    prometheus.io/scrape: false
                    sidecar.istio.io/inject: false
  Service Account:  jaeger
  Containers:
   jaeger-es-index-cleaner:
    Image:      registry.cn-beijing.aliyuncs.com/kubesphereio/jaeger-es-index-cleaner:1.17
    Port:       <none>
    Host Port:  <none>
    Args:
      7
      http://elasticsearch-logging-data.kubesphere-logging-system.svc:9200
    Environment Variables from:
      jaeger-secret  Secret  Optional: false
    Environment:
      INDEX_PREFIX:  logstash
    Mounts:          <none>
  Volumes:           <none>
Events:              <none>
[root@mh-k8s-master-prd-243-24 ~]#
```
2. Parallel Jobs
```console
[root@mh-k8s-master-prd-243-24 ~]# kubectl explain job
KIND:     Job
VERSION:  batch/v1

DESCRIPTION:
     Job represents the configuration of a single job.

FIELDS:
   apiVersion   <string>
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind   <string>
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata   <Object>
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec   <Object>
     Specification of the desired behavior of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

   status   <Object>
     Current status of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

[root@mh-k8s-master-prd-243-24 ~]#
```
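To make the Job controller run multiple tasks serially, set the parallelism field job.spec.parallelism to 1 (its default value). Then set the total number of tasks with the job.spec.completions field.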
```console
[root@mh-k8s-master-prd-243-24 ~]# kubectl explain job.spec
KIND:     Job
VERSION:  batch/v1

RESOURCE: spec <Object>

DESCRIPTION:
     Specification of the desired behavior of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

     JobSpec describes how the job execution will look like.

FIELDS:
   activeDeadlineSeconds   <integer>
     Specifies the duration in seconds relative to the startTime that the job
     may be active before the system tries to terminate it; value must be
     positive integer

   backoffLimit   <integer>
     Specifies the number of retries before marking this job failed. Defaults to
     6

   completions   <integer>
     Specifies the desired number of successfully finished pods the job should
     be run with. Setting to nil means that the success of any pod signals the
     success of all pods, and allows parallelism to have any positive value.
     Setting to 1 means that parallelism is limited to 1 and the success of that
     pod signals the success of the job. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

   manualSelector   <boolean>
     manualSelector controls generation of pod labels and pod selectors. Leave
     `manualSelector` unset unless you are certain what you are doing. When
     false or unset, the system pick labels unique to this job and appends those
     labels to the pod template. When true, the user is responsible for picking
     unique labels and specifying the selector. Failure to pick a unique label
     may cause this and other jobs to not function correctly. However, You may
     see `manualSelector=true` in jobs that were created with the old
     `extensions/v1beta1` API. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#specifying-your-own-pod-selector

   parallelism   <integer>
     Specifies the maximum desired number of pods the job should run at any
     given time. The actual number of pods running in steady state will be less
     than this number when ((.spec.completions - .status.successful) <
     .spec.parallelism), i.e. when the work left to do is less than max
     parallelism. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

   selector   <Object>
     A label query over pods that should match the pod count. Normally, the
     system sets this field for you. More info:
     https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors

   template   <Object> -required-
     Describes the pod that will be created when executing a job. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

   ttlSecondsAfterFinished   <integer>
     ttlSecondsAfterFinished limits the lifetime of a Job that has finished
     execution (either Complete or Failed). If this field is set,
     ttlSecondsAfterFinished after the Job finishes, it is eligible to be
     automatically deleted. When the Job is being deleted, its lifecycle
     guarantees (e.g. finalizers) will be honored. If this field is unset, the
     Job won't be automatically deleted. If this field is set to zero, the Job
     becomes eligible to be deleted immediately after it finishes. This field is
     alpha-level and is only honored by servers that enable the TTLAfterFinished
     feature.

[root@mh-k8s-master-prd-243-24 ~]#
```
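The completions description above notes that when completions is left unset, the success of any Pod signals the success of the whole Job. Work-queue style Jobs rely on this pattern. The sketch below makes two assumptions. The name job-wq is hypothetical. The sleep command is a placeholder for a worker that would pull items from some external queue until it is empty and then exit successfully:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-wq                 # hypothetical name
spec:
  # completions is deliberately left unset: once any Pod succeeds,
  # the Job is treated as successful and no new Pods are started
  parallelism: 3               # three workers share the (assumed) queue
  template:
    spec:
      containers:
        - name: worker
          image: alpine
          # placeholder: a real worker would drain the queue, then exit 0
          command: ["/bin/sh", "-c", "sleep 20"]
      restartPolicy: OnFailure
```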
Below is an example Job controller that runs its task 5 times serially:
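```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-multi
spec:
  completions: 5        # five successful runs in total
  # parallelism is not set, so it defaults to 1: the Pods run one after another
  template:
    spec:
      containers:
        - name: myjob
          image: alpine
          command: ["/bin/sh", "-c", "sleep 20"]
      restartPolicy: OnFailure
```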
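To run the tasks in parallel, the same manifest only needs a parallelism value greater than 1. In this sketch (hypothetical name job-multi-parallel), at most two Pods run at a time. The controller stops creating new Pods once five have completed successfully:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-multi-parallel     # hypothetical name
spec:
  completions: 5               # five successful Pods in total
  parallelism: 2               # at most two Pods running at any moment
  template:
    spec:
      containers:
        - name: myjob
          image: alpine
          command: ["/bin/sh", "-c", "sleep 20"]
      restartPolicy: OnFailure
```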
3. Scaling a Job
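The parallelism defined by job.spec.parallelism is the number of Pod objects that run at the same time. It can be changed at runtime, which changes the number of queues and so scales the Job out or in. Older kubectl versions used the same command as for Deployment objects, "kubectl scale --replicas":
```console
kubectl scale jobs job-multi --replicas=2
```
Using kubectl scale on Jobs was later deprecated and removed from kubectl. On newer versions, change spec.parallelism directly with kubectl edit or kubectl patch, for example `kubectl patch job job-multi -p '{"spec":{"parallelism":2}}'`.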
4. Deleting a Job
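Once its Pods have finished running, a Job controller no longer consumes compute resources. Users can keep it or delete it with the usual resource deletion command (kubectl delete job). However, a Job's containerized application may never terminate normally. If its restartPolicy is also set to restart (OnFailure), the Job can get stuck in an endless loop of restarts and errors. Two fields in job.spec put a bound on this: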
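- job.spec.activeDeadlineSeconds <integer>: the Job's deadline, i.e. the maximum length of time it may stay active. A Job that runs longer than this is terminated.
- job.spec.backoffLimit <integer>: the number of retries before the Job is marked as failed. Defaults to 6.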
For example, the following snippet sets the retry limit to 5. It also terminates the Job if it has not completed within 100 seconds:
```yaml
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 100
```
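A finished Job does not have to be deleted by hand with kubectl delete job. The ttlSecondsAfterFinished field in the kubectl explain job.spec output above can have the Job and its Pods removed automatically. The sketch below combines it with the two limits above. It assumes a cluster with the TTL-after-finished feature enabled: the feature was alpha in the version whose explain output is shown and became stable in Kubernetes 1.23. The name job-ttl is hypothetical:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-ttl                  # hypothetical name
spec:
  ttlSecondsAfterFinished: 3600  # delete the Job (and its Pods) one hour after it finishes
  backoffLimit: 5                # give up after 5 retries
  activeDeadlineSeconds: 100     # terminate the Job if still running after 100 s
  template:
    spec:
      containers:
        - name: myjob
          image: alpine
          command: ["/bin/sh", "-c", "sleep 20"]   # placeholder task
      restartPolicy: Never
```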