Repost: Kubernetes CronJobs: Basics, Tutorial, and Troubleshooting — concurrency
Original: Kubernetes CronJobs: Basics, Tutorial, and Troubleshooting | Komodor
What Are Kubernetes CronJobs?
A Kubernetes Job is a workload controller object that performs specific tasks on a cluster. It differs from most controller objects such as Deployments and ReplicaSets, which need to constantly reconcile the current state of the cluster with a desired configuration.
A Job has a much more limited function: it runs pods until they complete a specified task, and then terminates them.
A CronJob is the same as a regular Job, only it creates jobs on a schedule (with a syntax similar to the Linux cron utility). CronJobs are used to perform regularly scheduled tasks such as backups and report generation. You can define the tasks to run at specific times or repeat indefinitely (for example, daily or weekly).
This article is a part of a series on Kubernetes Troubleshooting.
What Are the Benefits of Kubernetes CronJobs?
CronJobs are highly useful because they run in their own separate containers, letting you run an operation in the exact containers you need. This allows you to pin each CronJob to a specific version of a container image, update each CronJob individually, and customize it with any specific dependencies it needs.
You can also choose how many resources a CronJob receives by setting resource requests – the minimum resources a node must have available for the CronJob's pod to be scheduled on it – and resource limits to avoid overloading the node.
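For example, requests and limits sit on the container spec inside the CronJob's jobTemplate. A minimal sketch, with placeholder name, image, and values:

containers:
- name: report-generator          # placeholder name
  image: report-image:1.0         # placeholder image
  resources:
    requests:
      cpu: 100m                   # illustrative values, not recommendations
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi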
Another useful capability of CronJobs is their built-in retry policy. If a CronJob fails, you can define whether it should run again and how many times it should retry.
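In practice, this retry behavior is controlled by the Job template's backoffLimit field together with the pod's restartPolicy. A minimal sketch (the backoff value is illustrative):

jobTemplate:
  spec:
    backoffLimit: 4               # retry up to 4 times before marking the job failed
    template:
      spec:
        restartPolicy: OnFailure  # restart the failing container in place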
Related content: read our guide to fixing the Kubernetes node not ready error.
Kubernetes CronJobs: Quick Tutorial
How to Create Kubernetes CronJob
Creating a CronJob is very similar to creating a regular Job. We’ll need to define a YAML manifest file that includes the Job name, which containers to run, and commands to execute on the containers.
To create a Kubernetes CronJob:
1. Create a YAML file in a text editor.
nano [mycronjob].yaml
2. The CronJob YAML configuration should look something like this. Pay attention to the spec.schedule field, which defines when and how often the Job should run. We explain the cron schedule syntax in the following section.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox:1.28
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello World
          restartPolicy: OnFailure
A few important points about this code:
- The spec.schedule field specifies that the job should run once per day at midnight.
- The spec.template.containers field specifies which container the CronJob should run – a BusyBox image.
- Within the containers field we define a series of shell commands to run on the container – in this case, printing “Hello World” to the console.
- The restartPolicy can be either Never or OnFailure. If you use the latter, your code needs to be able to handle the possibility of a restart after failure.
3. Create your CronJob in the cluster using this command:
kubectl apply -f [filename].yaml
4. Run the following command to monitor task execution:
kubectl get cronjob --watch
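The output should look something like the following (exact columns vary by Kubernetes version); once the schedule fires, the ACTIVE and LAST SCHEDULE columns update:

NAME    SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello   0 0 * * *   False     0        <none>          12s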
CronJob Schedule Syntax
The cron schedule syntax used by the spec.schedule field consists of five space-separated fields, each representing one time unit.
| FIELD | MINUTES | HOURS | DAY OF MONTH | MONTH | WEEKDAY (0 = Sunday) |
|---|---|---|---|---|---|
| Values | 0-59 | 0-23 | 1-31 | 1-12 | 0-6 |
| Example 1 | 0 | 21 | * | * | 4 |
| Example 2 | 0 | 0 | 12 | * | 5 |
| Example 3 | * | */1 | * | * | * |
Here is what each of the examples means:
- Example 1 runs the job at 9 pm every Thursday
- Example 2 runs the job every Friday at midnight and also on the 12th of each month at midnight
- Example 3 runs the job every minute – the */n syntax means the job repeats at every n-unit interval
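For reference, here are a few more schedule strings built from the same syntax:

"*/5 * * * *"    # every 5 minutes
"0 */2 * * *"    # at minute 0 of every second hour
"30 6 * * 1"     # at 6:30 am every Monday
"0 0 1 * *"      # at midnight on the first day of every month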
Kubernetes CronJobs Monitoring and Considerations
One CronJob can serve as the model for various jobs, but you may need to adjust it. Here are some considerations when defining a CronJob.
CronJob Concurrency Policy
CronJobs have embedded concurrency controls (a major difference from Unix cron) that let you disable concurrent execution, although Kubernetes enables concurrency by default. With concurrency enabled, a scheduled CronJob run will start even if the last run is incomplete. Concurrency is not desirable for jobs that require sequential execution.
You can control concurrency by configuring the concurrency policy on CronJob objects. You can set one of three values:
- Allow – this is the default setting.
- Forbid – prevents concurrent runs. Kubernetes skips scheduled starts if the last run hasn’t finished.
- Replace – terminates incomplete runs when the next job is scheduled, allowing the new run to proceed.
Setting the policy to Forbid or Replace on a CronJob ensures that only a single run of that job is active at any time, as sketched below.
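A minimal sketch of such a CronJob, with a placeholder name and a deliberately long-running command to illustrate the effect:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: sequential-job            # placeholder name
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid       # skip a scheduled run if the previous run is still active
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: worker
            image: busybox:1.28
            command: ["/bin/sh", "-c", "echo working; sleep 600"]
          restartPolicy: OnFailure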
Starting Deadline
A starting deadline determines if your scheduled CronJob run can start. This concept is specific to Kubernetes, defining how long each job run is eligible to begin after the scheduled time has lapsed. It is useful for jobs with disabled concurrency when job runs cannot always start on schedule.
The startingDeadlineSeconds field controls this value. For example, a starting deadline of 15 seconds allows a limited delay – a run scheduled for 10:00:00 can still start if the previous run ends at 10:00:14, but not if the previous run ends at 10:00:15 or later.
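In the manifest, this is a single optional field on the CronJob spec. A sketch matching the example above:

spec:
  schedule: "0 10 * * *"          # scheduled for 10:00
  startingDeadlineSeconds: 15     # a run that cannot begin within 15 seconds is counted as missed
  concurrencyPolicy: Forbid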
Retaining Job History
Two other values are the successful jobs history limit and the failed jobs history limit. They control how many completed jobs of each type are retained (by default, three successful and one failed job). You can raise these values to keep the history for longer, which is useful for debugging.
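Both limits are plain integer fields on the CronJob spec. For example, to retain a longer history for debugging (the values are illustrative):

spec:
  successfulJobsHistoryLimit: 10  # default is 3
  failedJobsHistoryLimit: 5       # default is 1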
CronJob Monitoring
Kubernetes allows you to monitor CronJobs with mechanisms like the kubectl command. The get command provides a CronJob’s definition and job run details. Jobs created by a CronJob carry the CronJob name with an appended start timestamp. After identifying an individual job, you can use a kubectl command to retrieve container logs:
$ kubectl logs job/example-cron-1648239040
Kubernetes CronJobs Errors
Kubernetes Not Scheduling CronJob
This error arises when CronJobs don’t fire as scheduled on Kubernetes. Manually firing the Job shows that it is functioning, yet the Pod for the cron job doesn’t appear.
Sample Scenario
A CronJob is scheduled to fire every 60 seconds on a MicroK8s instance but never does. The user manually fires the Job with the following command:
k create job --from=cronjob/demo-cron-job demo-cron-job
While the Job runs after this command, it doesn’t run as scheduled. Here is the manifest of the API object in YAML format:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: demo-cron-job
  namespace: {{ .Values.global.namespace }}
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/release-name: {{ .Release.Name }}
    app.kubernetes.io/release-namespace: {{ .Release.Namespace }}
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Replace
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: demo-cron-job
            image: demoImage
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - /usr/bin/curl -k http://restfulservices/api/demo-job
          restartPolicy: OnFailure
A possible resolution is to restart the entire namespace and redeploy. In such a case, however, you should first check the following details:
- Was the CronJob scheduling working at some point and then stopped?
- Do any cron pods show a “failed” status? If they do, check those pods for the reason behind the failure.
- Use the following command to see if the CronJob resource has anything in the events:
kubectl describe cronjob demo-cron-job -n tango
- Does the code that the CronJob runs take more than a minute to complete? In that case, the schedule is too congested and needs loosening.
- The CronJob controller has built-in restrictions; for example, if a CronJob misses more than 100 consecutive scheduled runs, the controller freezes it and stops scheduling it. Check for this and other restrictions in the Kubernetes CronJob documentation.
- Do you scale the cluster down when it isn’t in use?
- Are there any third-party webhooks or plugins installed in the cluster? Such webhooks can interfere with pod creation.
- Does the namespace have any jobs created? Use the following command to check:
kubectl get jobs -n tango
If there are several job objects, investigate to see why they didn’t generate pods.
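A describe of an affected job, plus the namespace’s recent events, will usually surface the reason (the job name is a placeholder):

kubectl describe job <job-name> -n tango
kubectl get events -n tango --sort-by=.lastTimestamp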
Kubernetes CronJob Stops Scheduling Jobs
This error arises when a CronJob stops scheduling the specified job. It commonly occurs after some part of the job has failed consistently several times in a row.
Sample Scenario
The user scheduled a CronJob that functioned for some time before it stopped scheduling new jobs. The Job included a step that pulled a container image, and this step failed. The manifest is shown below:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  labels:
    app.kubernetes.io/instance: demo-cron-job
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: cron
    helm.sh/chart: cron-0.1.0
  name: demo-cron-job
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        spec:
          containers:
          - args:
            - -c
            - npm run script
            command:
            - /bin/sh
            env:
            image:
            imagePullPolicy: Always
            name: cron
            resources: {}
            securityContext:
              runAsUser: 1000
            terminationMessagePath: /dev/demo-termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: "0/30 * * * *"
  successfulJobsHistoryLimit: 3
  suspend: false
status: {}
Here, the spec.restartPolicy field is set to Never, so the entire Pod fails whenever a container in the Pod fails. However, the manifest doesn’t include the .spec.backoffLimit field, which specifies how many times the Job retries before being considered failed. The Job therefore falls back to the default value of 6, meaning it tries to pull the container image six times before being marked as a failed job.
Here are some possible resolutions:
- Specify the .spec.backoffLimit field and set a higher value.
- Set spec.restartPolicy to OnFailure so the Pod stays on the node and only the failing container reruns.
- Consider setting imagePullPolicy to IfNotPresent. Unless images are retagged, this avoids re-pulling the image on every job start.

These fixes are combined in the sketch below.
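Put together, a sketch of the relevant part of the job template with these fixes applied (the image name and backoff value are illustrative):

jobTemplate:
  spec:
    backoffLimit: 10                  # raised from the default of 6
    template:
      spec:
        restartPolicy: OnFailure      # rerun only the failing container
        containers:
        - name: cron
          image: demo-image:1.0       # placeholder image
          imagePullPolicy: IfNotPresent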
Error Status on Kubernetes Cron Job with Connection Refused
This error arises when the CronJob involves communicating with an API endpoint. If the endpoint doesn’t respond successfully, the job shows an error status.
Sample Scenario
The user has a cron job that runs an image of the concerned application and hits one of its REST API endpoints. The manifest of the job is as follows:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: demo-cronjob
  labels:
    app: {{ .Release.Name }}
    chart: {{ .Chart.Name }}-{{ .Chart.Version }}
    release: {{ .Release.Name }}
spec:
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  startingDeadlineSeconds: 1800
  jobTemplate:
    spec:
      template:
        metadata:
          name: demo-cronjob
          labels:
            app: demo
        spec:
          restartPolicy: OnFailure
          containers:
          - name: demo
            image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
            command: ["/bin/sh", "-c", "curl http://localhost:8080/demo"]
            readinessProbe:
              httpGet:
                path: "/demojob"
                port: 8081
              initialDelaySeconds: 300
              periodSeconds: 60
              timeoutSeconds: 30
              failureThreshold: 3
            livenessProbe:
              httpGet:
                path: "/demojob"
                port: 8081
              initialDelaySeconds: 300
              periodSeconds: 60
              timeoutSeconds: 30
              failureThreshold: 3
            resources:
              requests:
                cpu: 200m
                memory: 4Gi
              limits:
                cpu: 1
                memory: 8Gi
  schedule: "*/40 * * * *"
The user then faces the following error:
curl: (7) Failed to connect to localhost port 8080: Connection refused
Here, the issue is that the user has provided command and args values that override the container image’s own command and arguments. The command overrides the image’s default entrypoint, so the application never starts and nothing listens on port 8080. A possible resolution is to use a shell script that first sets up and runs the REST application, and then lets the Job communicate with its endpoint, as sketched below.
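A sketch of such a wrapper script, assuming the image starts its server with npm run start and serves the /demo endpoint on port 8080 (both are placeholders for whatever the image actually runs):

#!/bin/sh
# Start the REST application in the background (placeholder start command).
npm run start &

# Poll until the endpoint answers, then exit with the result (placeholder URL).
for i in $(seq 1 30); do
  if curl -sf http://localhost:8080/demo; then
    exit 0
  fi
  sleep 2
done
echo "application never became reachable" >&2
exit 1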
Kubernetes Troubleshooting with Komodor
The troubleshooting process in Kubernetes is complex and, without the right tools, can be stressful, ineffective and time-consuming. Some best practices can help minimize the chances of things breaking down, but eventually something will go wrong – simply because it can.
This is the reason why we created Komodor, a tool that helps dev and ops teams stop wasting their precious time looking for needles in (hay)stacks every time things go wrong.
Acting as a single source of truth (SSOT) for all of your k8s troubleshooting needs, Komodor offers:
- Change intelligence: Every issue is a result of a change. Within seconds we can help you understand exactly who did what and when.
- In-depth visibility: A complete activity timeline, showing all code and config changes, deployments, alerts, code diffs, pod logs, and more – all within one pane of glass with easy drill-down options.
- Insights into service dependencies: An easy way to understand cross-service changes and visualize their ripple effects across your entire system.
- Seamless notifications: Direct integration with your existing communication channels (e.g., Slack) so you’ll have all the information you need, when you need it.
If you are interested in checking out Komodor, use this link to sign up for a Free Trial.
Reference:
Running Automated Tasks with a CronJob | Kubernetes
Concurrency Policy
The .spec.concurrencyPolicy field is also optional. It specifies how to treat concurrent executions of a job that is created by this cron job. The spec may specify only one of the following concurrency policies:
- Allow (default): The cron job allows concurrently running jobs
- Forbid: The cron job does not allow concurrent runs; if it is time for a new job run and the previous job run hasn't finished yet, the cron job skips the new job run
- Replace: If it is time for a new job run and the previous job run hasn't finished yet, the cron job replaces the currently running job run with a new job run
Note that concurrency policy only applies to the jobs created by the same cron job. If there are multiple cron jobs, their respective jobs are always allowed to run concurrently.