Repost: Kubernetes CronJobs: Basics, Tutorial, and Troubleshooting — concurrency
Original: Kubernetes CronJobs: Basics, Tutorial, and Troubleshooting | Komodor
What Are Kubernetes CronJobs?
A Kubernetes Job is a workload controller object that performs specific tasks on a cluster. It differs from most controller objects such as Deployments and ReplicaSets, which need to constantly reconcile the current state of the cluster with a desired configuration.
A Job has a much more limited function: it runs pods until they complete a specified task, and then terminates them.
A CronJob is the same as a regular Job, only it creates jobs on a schedule (with a syntax similar to the Linux cron utility). CronJobs are used to perform regularly scheduled tasks such as backups and report generation. You can define the tasks to run at specific times or repeat indefinitely (for example, daily or weekly).
This article is a part of a series on Kubernetes Troubleshooting.
What Are the Benefits of Kubernetes CronJobs?
CronJobs are highly useful because they run in their own separate containers, letting you run an operation in the exact containers you need. This allows you to pin each CronJob to a specific version of a container image, update each CronJob individually, and customize it with any specific dependencies it needs.
You can also choose how many resources a CronJob receives by setting resource requests – the minimum resources a node must have available for the CronJob's pod to be scheduled on it – and resource limits to avoid overloading the node.
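For example, requests and limits sit on the container spec inside the CronJob's jobTemplate. A minimal sketch, with placeholder name, image, and values:

containers:
- name: report-generator          # placeholder name
  image: report-image:1.0         # placeholder image
  resources:
    requests:
      cpu: 100m                   # illustrative values, not recommendations
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi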
Another useful capability of CronJobs is their built-in retry policy. If a CronJob fails, you can define whether it should run again and how many times it should retry.
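In practice, this retry behavior is controlled by the Job template's backoffLimit field together with the pod's restartPolicy. A minimal sketch (the backoff value is illustrative):

jobTemplate:
  spec:
    backoffLimit: 4               # retry up to 4 times before marking the job failed
    template:
      spec:
        restartPolicy: OnFailure  # restart the failing container in place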
Related content: read our guide to fixing the Kubernetes node not ready error.
Kubernetes CronJobs: Quick Tutorial
How to Create Kubernetes CronJob
Creating a CronJob is very similar to creating a regular Job. We’ll need to define a YAML manifest file that includes the Job name, which containers to run, and commands to execute on the containers.
To create a Kubernetes CronJob:
1. Create a YAML file in a text editor.
nano [mycronjob].yaml
2. The CronJob YAML configuration should look something like this. Pay attention to the spec.schedule field, which defines when and how often the Job should run. We explain the cron schedule syntax in the following section.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox:1.28
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello World
          restartPolicy: OnFailure
A few important points about this code:
- The spec.schedule field specifies that the job should run once per day at midnight.
- The spec.template.containers field specifies which container the CronJob should run – a BusyBox image.
- Within the containers field we define a series of shell commands to run on the container – in this case, printing “Hello World” to the console.
- The restartPolicy can be either Never or OnFailure. If you use the latter, your code needs to be able to handle the possibility of a restart after failure.
3. Create your CronJob in the cluster using this command:
kubectl apply -f [filename].yaml
4. Run the following command to monitor task execution:
kubectl get cronjob --watch
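The output should look something like the following (exact columns vary by Kubernetes version); once the schedule fires, the ACTIVE and LAST SCHEDULE columns update:

NAME    SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello   0 0 * * *   False     0        <none>          12s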
CronJob Schedule Syntax
The cron schedule syntax used by the spec.schedule field consists of five space-separated fields, each representing one time unit.
| FIELD | MINUTES | HOURS | DAY OF MONTH | MONTH | WEEKDAY (0 = Sunday) |
|---|---|---|---|---|---|
| Values | 0-59 | 0-23 | 1-31 | 1-12 | 0-6 |
| Example 1 | 0 | 21 | * | * | 4 |
| Example 2 | 0 | 0 | 12 | * | 5 |
| Example 3 | * | */1 | * | * | * |
Here is what each of the examples means:
- Example 1 runs the job at 9 pm every Thursday
- Example 2 runs the job every Friday at midnight and also on the 12th of each month at midnight
- Example 3 runs the job every minute – the */n syntax means the job repeats at every n-unit interval
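For reference, here are a few more schedule strings built from the same syntax:

"*/5 * * * *"    # every 5 minutes
"0 */2 * * *"    # at minute 0 of every second hour
"30 6 * * 1"     # at 6:30 am every Monday
"0 0 1 * *"      # at midnight on the first day of every month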
Kubernetes CronJobs Monitoring and Considerations
One CronJob can serve as the model for various jobs, but you may need to adjust it. Here are some considerations when defining a CronJob.
CronJob Concurrency Policy
CronJobs have embedded concurrency controls (a major difference from Unix cron) that let you disable concurrent execution, although Kubernetes enables concurrency by default. With concurrency enabled, a scheduled CronJob run will start even if the last run is incomplete. Concurrency is not desirable for jobs that require sequential execution.
You can control concurrency by configuring the concurrency policy on CronJob objects. You can set one of three values:
- Allow – this is the default setting.
- Forbid – prevents concurrent runs. Kubernetes skips scheduled starts if the last run hasn’t finished.
- Replace – terminates incomplete runs when the next job is scheduled, allowing the new run to proceed.
Setting the policy to Forbid or Replace on a CronJob ensures that only a single run of that job is active at any time, as sketched below.
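A minimal sketch of such a CronJob, with a placeholder name and a deliberately long-running command to illustrate the effect:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: sequential-job            # placeholder name
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid       # skip a scheduled run if the previous run is still active
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: worker
            image: busybox:1.28
            command: ["/bin/sh", "-c", "echo working; sleep 600"]
          restartPolicy: OnFailure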
Starting Deadline
A starting deadline determines if your scheduled CronJob run can start. This concept is specific to Kubernetes, defining how long each job run is eligible to begin after the scheduled time has lapsed. It is useful for jobs with disabled concurrency when job runs cannot always start on schedule.
The startingDeadlineSeconds field controls this value. For example, a starting deadline of 15 seconds allows a limited delay – a run scheduled for 10:00:00 can still start if the previous run ends at 10:00:14, but not if the previous run ends at 10:00:15 or later.
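In the manifest, this is a single optional field on the CronJob spec. A sketch matching the example above:

spec:
  schedule: "0 10 * * *"          # scheduled for 10:00
  startingDeadlineSeconds: 15     # a run that cannot begin within 15 seconds is counted as missed
  concurrencyPolicy: Forbid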
Retaining Job History
Two other values are the successful jobs history limit and the failed jobs history limit. They control how many completed jobs of each type are retained (by default, three successful and one failed job). You can raise these values to keep the history for longer, which is useful for debugging.
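Both limits are plain integer fields on the CronJob spec. For example, to retain a longer history for debugging (the values are illustrative):

spec:
  successfulJobsHistoryLimit: 10  # default is 3
  failedJobsHistoryLimit: 5       # default is 1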
CronJob Monitoring
Kubernetes allows you to monitor CronJobs with mechanisms like the kubectl command. The get command provides a CronJob’s definition and job run details. Jobs created by a CronJob carry the CronJob name with an appended start timestamp. After identifying an individual job, you can use a kubectl command to retrieve container logs:
$ kubectl logs job/example-cron-1648239040
Kubernetes CronJobs Errors
Kubernetes Not Scheduling CronJob
This error arises when CronJobs don’t fire as scheduled on Kubernetes. Manually firing the Job shows that it is functioning, yet the Pod for the cron job doesn’t appear.
Sample Scenario
A CronJob is scheduled to fire every 60 seconds on a MicroK8s instance but never does. The user manually fires the Job with the following command:
k create job --from=cronjob/demo-cron-job demo-cron-job
While the Job runs after this command, it doesn’t run as scheduled. Here is the manifest of the API object in YAML format:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: demo-cron-job
  namespace: {{ .Values.global.namespace }}
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/release-name: {{ .Release.Name }}
    app.kubernetes.io/release-namespace: {{ .Release.Namespace }}
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Replace
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: demo-cron-job
            image: demoImage
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - /usr/bin/curl -k http://restfulservices/api/demo-job
          restartPolicy: OnFailure
A possible resolution is to restart the entire namespace and redeploy. In such a case, however, you should first check the following details:
- Was the CronJob scheduling working at some point and then stopped?
- Do any cron pods show a “failed” status? If they do, check those pods for the reason behind the failure.
- Use the following command to see if the CronJob resource has anything in the events:
kubectl describe cronjob demo-cron-job -n tango
- Does the code that the CronJob runs take more than a minute to complete? In that case, the schedule is too congested and needs loosening.
- The CronJob controller has built-in restrictions; for example, if a CronJob misses more than 100 consecutive scheduled runs, the controller freezes it and stops scheduling it. Check for this and other restrictions in the Kubernetes CronJob documentation.
- Do you scale the cluster down when it isn’t in use?
- Are there any third-party webhooks or plugins installed in the cluster? Such webhooks can interfere with pod creation.
- Does the namespace have any jobs created? Use the following command to check:
kubectl get jobs -n tango
If there are several job objects, investigate to see why they didn’t generate pods.
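A describe of an affected job, plus the namespace’s recent events, will usually surface the reason (the job name is a placeholder):

kubectl describe job <job-name> -n tango
kubectl get events -n tango --sort-by=.lastTimestamp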
Kubernetes CronJob Stops Scheduling Jobs
This error arises when a CronJob stops scheduling the specified job. It commonly occurs after some part of the job has failed consistently several times in a row.
Sample Scenario
The user scheduled a CronJob that functioned for some time before it stopped scheduling new jobs. The Job included a step that pulled a container image, and this step failed. The manifest is shown below:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  labels:
    app.kubernetes.io/instance: demo-cron-job
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: cron
    helm.sh/chart: cron-0.1.0
  name: demo-cron-job
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        spec:
          containers:
          - args:
            - -c
            - npm run script
            command:
            - /bin/sh
            env:
            image:
            imagePullPolicy: Always
            name: cron
            resources: {}
            securityContext:
              runAsUser: 1000
            terminationMessagePath: /dev/demo-termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: "0/30 * * * *"
  successfulJobsHistoryLimit: 3
  suspend: false
status: {}
Here, the spec.restartPolicy field is set to Never, so the entire Pod fails whenever a container in the Pod fails. However, the manifest doesn’t include the .spec.backoffLimit field, which specifies how many times the Job retries before being considered failed. The Job therefore falls back to the default value of 6, meaning it tries to pull the container image six times before being marked as a failed job.
Here are some possible resolutions:
- Specify the .spec.backoffLimit field and set a higher value.
- Set spec.restartPolicy to OnFailure so the Pod stays on the node and only the failing container reruns.
- Consider setting imagePullPolicy to IfNotPresent. Unless images are retagged, this avoids re-pulling the image on every job start.

These fixes are combined in the sketch below.
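Put together, a sketch of the relevant part of the job template with these fixes applied (the image name and backoff value are illustrative):

jobTemplate:
  spec:
    backoffLimit: 10                  # raised from the default of 6
    template:
      spec:
        restartPolicy: OnFailure      # rerun only the failing container
        containers:
        - name: cron
          image: demo-image:1.0       # placeholder image
          imagePullPolicy: IfNotPresent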
Error Status on Kubernetes Cron Job with Connection Refused
This error arises when the CronJob involves communicating with an API endpoint. If the endpoint doesn’t respond successfully, the job shows an error status.
Sample Scenario
The user has a cron job that runs an image of the concerned application and hits one of its REST API endpoints. The manifest of the job is as follows:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: demo-cronjob
  labels:
    app: {{ .Release.Name }}
    chart: {{ .Chart.Name }}-{{ .Chart.Version }}
    release: {{ .Release.Name }}
spec:
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  startingDeadlineSeconds: 1800
  jobTemplate:
    spec:
      template:
        metadata:
          name: demo-cronjob
          labels:
            app: demo
        spec:
          restartPolicy: OnFailure
          containers:
          - name: demo
            image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
            command: ["/bin/sh", "-c", "curl http://localhost:8080/demo"]
            readinessProbe:
              httpGet:
                path: "/demojob"
                port: 8081
              initialDelaySeconds: 300
              periodSeconds: 60
              timeoutSeconds: 30
              failureThreshold: 3
            livenessProbe:
              httpGet:
                path: "/demojob"
                port: 8081
              initialDelaySeconds: 300
              periodSeconds: 60
              timeoutSeconds: 30
              failureThreshold: 3
            resources:
              requests:
                cpu: 200m
                memory: 4Gi
              limits:
                cpu: 1
                memory: 8Gi
  schedule: "*/40 * * * *"
The user then faces the following error:
curl: (7) Failed to connect to localhost port 8080: Connection refused
Here, the issue is that the user has provided command and args values that override the container image’s own command and arguments. The command overrides the image’s default entrypoint, so the application never starts and nothing listens on port 8080. A possible resolution is to use a shell script that first sets up and runs the REST application, and then lets the Job communicate with its endpoint, as sketched below.
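A sketch of such a wrapper script, assuming the image starts its server with npm run start and serves the /demo endpoint on port 8080 (both are placeholders for whatever the image actually runs):

#!/bin/sh
# Start the REST application in the background (placeholder start command).
npm run start &

# Poll until the endpoint answers, then exit with the result (placeholder URL).
for i in $(seq 1 30); do
  if curl -sf http://localhost:8080/demo; then
    exit 0
  fi
  sleep 2
done
echo "application never became reachable" >&2
exit 1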
Kubernetes Troubleshooting with Komodor
The troubleshooting process in Kubernetes is complex and, without the right tools, can be stressful, ineffective and time-consuming. Some best practices can help minimize the chances of things breaking down, but eventually something will go wrong – simply because it can.
This is the reason why we created Komodor, a tool that helps dev and ops teams stop wasting their precious time looking for needles in (hay)stacks every time things go wrong.
Acting as a single source of truth (SSOT) for all of your k8s troubleshooting needs, Komodor offers:
- Change intelligence: Every issue is a result of a change. Within seconds we can help you understand exactly who did what and when.
- In-depth visibility: A complete activity timeline, showing all code and config changes, deployments, alerts, code diffs, pod logs, and more – all within one pane of glass with easy drill-down options.
- Insights into service dependencies: An easy way to understand cross-service changes and visualize their ripple effects across your entire system.
- Seamless notifications: Direct integration with your existing communication channels (e.g., Slack) so you’ll have all the information you need, when you need it.
If you are interested in checking out Komodor, use this link to sign up for a Free Trial.
Reference:
Running Automated Tasks with a CronJob | Kubernetes
Concurrency Policy
The .spec.concurrencyPolicy field is also optional. It specifies how to treat concurrent executions of a job that is created by this cron job. The spec may specify only one of the following concurrency policies:
- Allow (default): The cron job allows concurrently running jobs
- Forbid: The cron job does not allow concurrent runs; if it is time for a new job run and the previous job run hasn't finished yet, the cron job skips the new job run
- Replace: If it is time for a new job run and the previous job run hasn't finished yet, the cron job replaces the currently running job run with a new job run
Note that concurrency policy only applies to the jobs created by the same cron job. If there are multiple cron jobs, their respective jobs are always allowed to run concurrently.