Kubernetes: the Job Controller

Job Controller

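  Unlike the Deployment and DaemonSet controllers, which manage long-running, daemon-style service applications, the Job controller schedules Pod objects to run one-off tasks. When the process in the container finishes normally, it is not restarted; instead, the Pod object is placed in the "Completed" state.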

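  If the process in the container terminates because of an error, the configuration determines whether it is restarted. A Pod object that has not finished and is terminated unexpectedly because its node failed is rescheduled.

  [Figure: state transitions of the Pod objects managed by a Job controller]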

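  In practice, some tasks need to run more than once, and users can configure them to run serially or in parallel. In summary, there are two kinds of Job controller objects of this type: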

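    • Serial Job with a single work queue: the task is executed several times, one one-off run after another, so only one Pod object exists at any given moment.
    • Parallel Job with multiple work queues: the number of work queues can be set equal to the number of tasks, so that each queue runs exactly one task. Alternatively, a limited number of queues can run a larger number of tasks (fewer queues than total tasks), which amounts to running several serial queues side by side.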

  The Job controller is typically used to manage tasks that run for a while and then "complete", such as computations or backup operations.

1. Creating a Job Object

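  The only required field nested in a Job's spec is template, which is used the same way as in Deployment and other controllers. The Job automatically adds the labels "job-name=JOB_NAME" and "controller-uid=UID" to its Pod objects and uses a label selector on the controller-uid label to associate with them. Note that Job belongs to the "batch" API group, so its apiVersion is "batch/v1". The following manifest (job-example.yaml) defines a Job controller: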

kind: Job
apiVersion: batch/v1
metadata:
  name: elasticsearch-logging-curator-elasticsearch-curator-1655485200
  namespace: kubesphere-logging-system
  labels:
    app: elasticsearch-curator
    release: elasticsearch-logging-curator
  annotations:
    revisions: >-
      {"1":{"status":"completed","succeed":1,"desire":1,"uid":"9acdb1d4-eb50-4309-9534-8848450fd732","start-time":"2022-06-18T01:00:06+08:00","completion-time":"2022-06-18T01:00:09+08:00"}}
spec:
  parallelism: 1
  completions: 1
  backoffLimit: 6
  selector:
    matchLabels:
      controller-uid: 9acdb1d4-eb50-4309-9534-8848450fd732
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: elasticsearch-curator
        controller-uid: 9acdb1d4-eb50-4309-9534-8848450fd732
        job-name: elasticsearch-logging-curator-elasticsearch-curator-1655485200
        release: elasticsearch-logging-curator
    spec:
      volumes:
        - name: config-volume
          configMap:
            name: elasticsearch-logging-curator-elasticsearch-curator-config
            defaultMode: 420
      containers:
        - name: elasticsearch-curator
          image: >-
            registry.cn-beijing.aliyuncs.com/kubesphereio/elasticsearch-curator:v5.7.6
          command:
            - curator/curator
          args:
            - '--config'
            - /etc/es-curator/config.yml
            - /etc/es-curator/action_file.yml
          resources: {}
          volumeMounts:
            - name: config-volume
              mountPath: /etc/es-curator
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Never
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext:
        runAsUser: 16
      schedulerName: default-scheduler

  The Pod template's spec.restartPolicy defaults to "Always", which is not valid for a Job, so the Pod template must explicitly set restartPolicy to "Never" or "OnFailure".
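
  The restartPolicy constraint can be checked before a manifest is ever submitted. The following is a minimal Python sketch and not part of the original article: build_job_manifest is a hypothetical helper, and the image and command are borrowed from the job-multi example later in this article. It uses only the standard library and prints JSON, which kubectl also accepts as a manifest format.

```python
import json

# restartPolicy values that are valid in a Job's Pod template;
# the Pod default "Always" is not among them.
ALLOWED_RESTART_POLICIES = {"Never", "OnFailure"}


def build_job_manifest(name, image, command, restart_policy="Never"):
    """Return a minimal batch/v1 Job manifest as a dict (hypothetical helper)."""
    if restart_policy not in ALLOWED_RESTART_POLICIES:
        raise ValueError(
            f"restartPolicy must be one of {sorted(ALLOWED_RESTART_POLICIES)}, "
            f"got {restart_policy!r}"
        )
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # template is the only required field of the Job spec
            "template": {
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                    "restartPolicy": restart_policy,
                }
            }
        },
    }


if __name__ == "__main__":
    manifest = build_job_manifest(
        "myjob", "alpine", ["/bin/sh", "-c", "sleep 20"], "OnFailure"
    )
    print(json.dumps(manifest, indent=2))
```

  The printed JSON could be saved to a file and passed to "kubectl create -f"; calling the helper with "Always" raises a ValueError instead of producing a manifest.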

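  After creating the Job with "kubectl create" or "kubectl apply", you can check its status. In the output below, the COMPLETIONS column shows the number of Pods that have succeeded over the desired number of completions, and DURATION shows how long the Job has been running or took to finish (older kubectl versions printed DESIRED and SUCCESSFUL columns instead):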

[root@mh-k8s-master-prd-243-24 ~]# kubectl get jobs --all-namespaces
NAMESPACE                   NAME                                                             COMPLETIONS   DURATION   AGE
istio-system                jaeger-es-index-cleaner-1655481300                               1/1           3s         2d14h
istio-system                jaeger-es-index-cleaner-1655567700                               1/1           4s         38h
istio-system                jaeger-es-index-cleaner-1655654100                               1/1           3s         14h
kubesphere-logging-system   elasticsearch-logging-curator-elasticsearch-curator-1655485200   1/1           3s         2d13h
kubesphere-logging-system   elasticsearch-logging-curator-elasticsearch-curator-1655571600   1/1           4s         37h
kubesphere-logging-system   elasticsearch-logging-curator-elasticsearch-curator-1655658000   1/1           4s         13h
kubesphere-system           openpitrix-import-job                                            1/1           6m35s      69d
testkube                    jetstack-cert-manager-startupapicheck                            0/1           56d        56d
[root@mh-k8s-master-prd-243-24 ~]# kubectl get job jaeger-es-index-cleaner-1655481300 -n istio-system
NAME                                 COMPLETIONS   DURATION   AGE
jaeger-es-index-cleaner-1655481300   1/1           3s         2d14h
[root@mh-k8s-master-prd-243-24 ~]# 

  The detailed description shows the label selector in use and the labels of the matching Pod objects:

[root@mh-k8s-master-prd-243-24 ~]# kubectl describe jobs jaeger-es-index-cleaner-1655481300 -n istio-system
Name:           jaeger-es-index-cleaner-1655481300
Namespace:      istio-system
Selector:       controller-uid=c0e4757a-a438-4206-8e66-e7eb16c6cf3c
Labels:         app=jaeger
                app.kubernetes.io/component=cronjob-es-index-cleaner
                app.kubernetes.io/instance=jaeger
                app.kubernetes.io/managed-by=jaeger-operator
                app.kubernetes.io/name=jaeger-es-index-cleaner
                app.kubernetes.io/part-of=jaeger
                controller-uid=c0e4757a-a438-4206-8e66-e7eb16c6cf3c
                job-name=jaeger-es-index-cleaner-1655481300
Annotations:    revisions:
                  {"1":{"status":"completed","succeed":1,"uid":"c0e4757a-a438-4206-8e66-e7eb16c6cf3c","start-time":"2022-06-17T23:55:07+08:00","completion-t...
Controlled By:  CronJob/jaeger-es-index-cleaner
Parallelism:    1
Completions:    <unset>
Start Time:     Fri, 17 Jun 2022 23:55:07 +0800
Completed At:   Fri, 17 Jun 2022 23:55:10 +0800
Duration:       3s
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
Pod Template:
  Labels:           app=jaeger
                    app.kubernetes.io/component=cronjob-es-index-cleaner
                    app.kubernetes.io/instance=jaeger
                    app.kubernetes.io/managed-by=jaeger-operator
                    app.kubernetes.io/name=jaeger-es-index-cleaner
                    app.kubernetes.io/part-of=jaeger
                    controller-uid=c0e4757a-a438-4206-8e66-e7eb16c6cf3c
                    job-name=jaeger-es-index-cleaner-1655481300
  Annotations:      linkerd.io/inject: disabled
                    prometheus.io/scrape: false
                    sidecar.istio.io/inject: false
  Service Account:  jaeger
  Containers:
   jaeger-es-index-cleaner:
    Image:      registry.cn-beijing.aliyuncs.com/kubesphereio/jaeger-es-index-cleaner:1.17
    Port:       <none>
    Host Port:  <none>
    Args:
      7
      http://elasticsearch-logging-data.kubesphere-logging-system.svc:9200
    Environment Variables from:
      jaeger-secret  Secret  Optional: false
    Environment:
      INDEX_PREFIX:  logstash
    Mounts:          <none>
  Volumes:           <none>
Events:              <none>
[root@mh-k8s-master-prd-243-24 ~]# 
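
  The way the controller-uid selector picks out the Job's Pods can be illustrated with a few lines of Python. This is a simplified sketch of equality-based label matching, not the actual controller code: matches is a hypothetical helper, the selector and the first label set are copied from the describe output above, and the second label set is invented for contrast.

```python
def matches(match_labels, pod_labels):
    """True if every key/value pair of the selector appears in the Pod's labels."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


# Selector and Pod template labels taken from the describe output above.
selector = {"controller-uid": "c0e4757a-a438-4206-8e66-e7eb16c6cf3c"}
job_pod_labels = {
    "app": "jaeger",
    "controller-uid": "c0e4757a-a438-4206-8e66-e7eb16c6cf3c",
    "job-name": "jaeger-es-index-cleaner-1655481300",
}
# Hypothetical Pod from another Job: same app label, different controller-uid.
other_pod_labels = {"app": "jaeger", "controller-uid": "some-other-uid"}

if __name__ == "__main__":
    print(matches(selector, job_pod_labels))    # True
    print(matches(selector, other_pod_labels))  # False
```

  Because the UID is unique to each Job object, selecting on controller-uid keeps Pods of different Jobs apart even when they share application labels such as app=jaeger.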

2. Parallel Jobs

[root@mh-k8s-master-prd-243-24 ~]#  kubectl explain job
KIND:     Job
VERSION:  batch/v1

DESCRIPTION:
     Job represents the configuration of a single job.

FIELDS:
   apiVersion	<string>
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind	<string>
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata	<Object>
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec	<Object>
     Specification of the desired behavior of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

   status	<Object>
     Current status of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

[root@mh-k8s-master-prd-243-24 ~]# 

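  Setting the parallelism field job.spec.parallelism to 1 and setting the total number of tasks in job.spec.completions makes the Job controller run multiple tasks serially. A parallelism value greater than 1 lets that many Pods run at the same time.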

[root@mh-k8s-master-prd-243-24 ~]#  kubectl explain job.spec
KIND:     Job
VERSION:  batch/v1

RESOURCE: spec <Object>

DESCRIPTION:
     Specification of the desired behavior of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

     JobSpec describes how the job execution will look like.

FIELDS:
   activeDeadlineSeconds	<integer>
     Specifies the duration in seconds relative to the startTime that the job
     may be active before the system tries to terminate it; value must be
     positive integer

   backoffLimit	<integer>
     Specifies the number of retries before marking this job failed. Defaults to
     6

   completions	<integer>
     Specifies the desired number of successfully finished pods the job should
     be run with. Setting to nil means that the success of any pod signals the
     success of all pods, and allows parallelism to have any positive value.
     Setting to 1 means that parallelism is limited to 1 and the success of that
     pod signals the success of the job. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

   manualSelector	<boolean>
     manualSelector controls generation of pod labels and pod selectors. Leave
     `manualSelector` unset unless you are certain what you are doing. When
     false or unset, the system pick labels unique to this job and appends those
     labels to the pod template. When true, the user is responsible for picking
     unique labels and specifying the selector. Failure to pick a unique label
     may cause this and other jobs to not function correctly. However, You may
     see `manualSelector=true` in jobs that were created with the old
     `extensions/v1beta1` API. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#specifying-your-own-pod-selector

   parallelism	<integer>
     Specifies the maximum desired number of pods the job should run at any
     given time. The actual number of pods running in steady state will be less
     than this number when ((.spec.completions - .status.successful) <
     .spec.parallelism), i.e. when the work left to do is less than max
     parallelism. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

   selector	<Object>
     A label query over pods that should match the pod count. Normally, the
     system sets this field for you. More info:
     https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors

   template	<Object> -required-
     Describes the pod that will be created when executing a job. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

   ttlSecondsAfterFinished	<integer>
     ttlSecondsAfterFinished limits the lifetime of a Job that has finished
     execution (either Complete or Failed). If this field is set,
     ttlSecondsAfterFinished after the Job finishes, it is eligible to be
     automatically deleted. When the Job is being deleted, its lifecycle
     guarantees (e.g. finalizers) will be honored. If this field is unset, the
     Job won't be automatically deleted. If this field is set to zero, the Job
     becomes eligible to be deleted immediately after it finishes. This field is
     alpha-level and is only honored by servers that enable the TTLAfterFinished
     feature.

[root@mh-k8s-master-prd-243-24 ~]# 

  The following Job controller example runs a task 5 times serially:

apiVersion: batch/v1
kind: Job
metadata:
  name: job-multi
spec:
  completions: 5
  template:
    spec:
      containers:
      - name: myjob
        image: alpine
        command: ["/bin/sh", "-c", "sleep 20"]
      restartPolicy: OnFailure
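
  The rule quoted in the kubectl explain output above (fewer Pods than parallelism run once the remaining work, completions minus succeeded Pods, is smaller than parallelism) can be worked through for job-multi. The sketch below is illustrative Python, not controller code: active_pods and rounds are hypothetical helpers, and it assumes that every Pod succeeds and that Pods started together finish together.

```python
def active_pods(completions, parallelism, succeeded):
    """Pods running at once: capped by parallelism and by the work left to do."""
    remaining = completions - succeeded
    return max(0, min(parallelism, remaining))


def rounds(completions, parallelism):
    """Number of Pods in each successive round until all completions are reached."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1 for the Job to make progress")
    succeeded = 0
    per_round = []
    while succeeded < completions:
        running = active_pods(completions, parallelism, succeeded)
        per_round.append(running)
        succeeded += running  # assumption: every Pod in the round succeeds
    return per_round


if __name__ == "__main__":
    print(rounds(5, 1))  # [1, 1, 1, 1, 1]
    print(rounds(5, 2))  # [2, 2, 1]
```

  With each Pod sleeping 20 seconds, five serial rounds take roughly 100 seconds, while a parallelism of 2 would need three rounds, roughly 60 seconds, ignoring scheduling and image pull time.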

3. Scaling a Job

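  The parallelism defined in job.spec.parallelism is the number of Pod objects that run at the same time. This value can be adjusted at runtime to change the number of work queues, which scales the Job up or down. On older kubectl versions the command is the same as for Deployment objects, namely "kubectl scale --replicas" (support for scaling Jobs this way was deprecated and later removed from kubectl; on such versions, change spec.parallelism directly, for example with "kubectl edit" or "kubectl patch"):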

kubectl scale jobs job-multi --replicas=2

4. Deleting a Job

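  Once its Pod objects have finished running, a Job controller no longer consumes system resources. Users can keep it as needed or delete it with a resource deletion command such as "kubectl delete". However, if the containerized application of a Job can never finish normally and its restartPolicy is set to restart it, the Job may be stuck in an endless loop of restarts and errors. Two fields of the Job spec bound this behavior: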

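  • job.spec.activeDeadlineSeconds <integer>: the Job's deadline, which specifies its maximum active duration; a Job that runs longer than this is terminated.
  • job.spec.backoffLimit <integer>: the number of retries before the Job is marked as failed; the default is 6.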

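  For example, the following configuration fragment sets the number of retries on failure to 5 and terminates the Job if it has not completed within 100 seconds: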

spec:
  backoffLimit: 5
  activeDeadlineSeconds: 100
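
  How the two limits interact can be sketched as a small decision function. This is a deliberately simplified Python model rather than the controller's real logic: job_outcome is a hypothetical helper, the defaults mirror the fragment above, and the exact way retries are counted differs between restart policies and Kubernetes versions. The returned strings follow the failure reasons Kubernetes reports for Jobs, "BackoffLimitExceeded" and "DeadlineExceeded".

```python
def job_outcome(failed_pods, elapsed_seconds,
                backoff_limit=5, active_deadline_seconds=100):
    """Classify a Job under a simplified model of its two limits.

    Assumption: the Job fails once failures exceed backoff_limit, or once
    it has been active for active_deadline_seconds. Checking the retry
    limit first when both hold is also an assumption of this sketch.
    """
    if failed_pods > backoff_limit:
        return "BackoffLimitExceeded"
    if elapsed_seconds >= active_deadline_seconds:
        return "DeadlineExceeded"
    return "Active"


if __name__ == "__main__":
    print(job_outcome(failed_pods=5, elapsed_seconds=30))   # Active
    print(job_outcome(failed_pods=6, elapsed_seconds=30))   # BackoffLimitExceeded
    print(job_outcome(failed_pods=2, elapsed_seconds=100))  # DeadlineExceeded
```

  The point of the model is that either limit alone is enough to stop the Job: a task that keeps failing quickly hits backoffLimit, while one that hangs without failing is caught by activeDeadlineSeconds.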
posted @ 2022-06-20 14:36  左扬