Debugging cron jobs

The cron scheduler that runs your jobs is battle-tested and reliable, but it can be challenging to get new jobs running, and inscrutable when old jobs start failing.
This guide will show you how to identify a failing cron job, validate its schedule, and debug common problems.

How to debug your cron jobs

At Cronitor we've been collecting and analyzing data about cron job failures since 2014. Some jobs do fail in their own unique way, but we've observed common patterns that will help you debug and fix your cron jobs most of the time.

The rest of this guide is divided into two sections that focus on debugging new cron jobs and fixing old jobs that start to fail, covering common causes and their suggested solutions.

If you're adding a new cron job and it is not working, this guide covers:

Verify your cron schedule
Cron job schedule expressions are quirky and difficult to write. If your job didn't run when you expected it to, the easiest thing to rule out is a mistake with the cron expression.
Understand cron environment differences
A common experience is to have a job that works flawlessly when run at the command line but fails whenever it's run by cron. When this happens, check for these common issues:
1. The command has an unresolvable relative path like ../scripts. (Try an absolute path.)
2. The job uses environment variables. (Cron does not load .bashrc and similar files.)
3. The command uses advanced bash features. (Cron uses /bin/sh by default.)
Tip: Our free software, CronitorCLI, includes a shell command to test run any job the way cron does.
Check for problems with permissions
Invalid permissions can cause your cron jobs to fail in at least 3 ways. This guide will cover them all.

If an old cron job has stopped working, this guide explores:

Check your cron status
Figure out how to check if your cron job is running at all, and diagnose common errors using the cron log.
Something is consuming all system resources
Over a long enough time there is a lot that can go wrong on a server and the most stable cron job is no match for a disk that is full or an OS that can't spawn new threads. Check all the usual graphs to rule this out.
You've reached an inflection point
Cron jobs are often used for batch processing and other data-intensive tasks that can reveal the constraints of your stack. Jobs often work fine until your data size grows to a point where queries start timing out or file transfers are too slow.
Infrastructure drift occurs
When app configuration and code changes are deployed, it can be easy to overlook the cron jobs on each server. The result is infrastructure drift: hosts are retired or credentials change, and the forgotten cron jobs break.
Jobs have begun to overlap themselves
Cron is a very simple scheduler that starts a job at the scheduled time, even if the previous invocation is still running. A small slow-down can lead to a pile-up of overlapped jobs sucking up available resources.
You've added a new bug in your code, or triggered an old one
Sometimes a failure has nothing to do with cron. It can be difficult to thoroughly test cron jobs in a development environment and a bug might exist only in production.

Want alerts if your cron jobs stop working?

Monitor your cron jobs with Cronitor to easily collect output, capture errors and alert you when something goes wrong.

How to fix a cron job that is not running when expected

When you suspect that a cron job is not running when you expect, you may find you have very little hard evidence in the form of log entries or stack traces to guide your debugging. This section will cover the steps to methodically locate your job and diagnose the problem.

1. Locate the scheduled job

Cron jobs are run by a system daemon called crond that watches several locations where crontab files can contain scheduled jobs. The first step to understanding why your job didn't start when expected is to find where your job is scheduled. Tip: If you know where your job is scheduled, skip this step.

Search manually for cron jobs on your server

Check your user crontab with crontab -l

```
dev01: ~ $ crontab -l
# Edit this file to introduce tasks to be run by cron.
# m h  dom mon dow   command
5 4 * * *      /var/cronitor/bin/database-backup.sh
```

Jobs are commonly created by adding a crontab file in /etc/cron.d/
System-level cron jobs can also be added as a line in /etc/crontab
Sometimes for easy scheduling, jobs are added to /etc/cron.hourly/, /etc/cron.daily/, /etc/cron.weekly/ or /etc/cron.monthly/
It's possible that the job was created in the crontab of another user. Go through each user's crontab using crontab -u username -l
For a complete walk through of these options, see our guide covering where cron jobs are saved
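
The system-wide locations listed above can be checked in one pass. The following is a minimal sketch rather than a complete scanner: list_cron_files is a hypothetical helper name, and its optional root argument exists only so the function can be pointed at a scratch directory. User crontabs are not covered and still need crontab -u username -l.

```shell
# list_cron_files [ROOT]: print the system cron files that exist under ROOT.
# ROOT defaults to "" (the real filesystem); it is a hypothetical parameter
# added here so the function can be tried against a scratch directory.
list_cron_files() {
    root="${1:-}"
    for path in \
        "$root/etc/crontab" \
        "$root"/etc/cron.d/* \
        "$root"/etc/cron.hourly/* \
        "$root"/etc/cron.daily/* \
        "$root"/etc/cron.weekly/* \
        "$root"/etc/cron.monthly/*
    do
        # An unmatched glob is left as a literal pattern, so test for a real file.
        if [ -f "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
    return 0
}
```

Calling list_cron_files with no argument scans the real /etc locations. Piping the result into xargs grep -l database-backup.sh is one way to narrow down which file schedules a given job.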

Or, scan for cron jobs automatically with CronitorCLI

Install CronitorCLI for free. Paste each instruction into a terminal and execute:

```
wget https://cronitor.io/dl/linux_amd64.tar.gz
sudo tar xvf linux_amd64.tar.gz -C /usr/local/bin/
sudo cronitor configure --api-key {{ api_key }} # optional, for Cronitor users
```

Run cronitor list to scan your system for cron jobs.

If you can't find your job but believe it was previously scheduled, double check that you are on the correct server.

If you know you are, then try to rule out the possibility that the job was once scheduled but accidentally deleted. In many systems, crontab files are controlled by a central configuration service like Ansible, and this might overwrite crontab files that have been directly edited. Another common mistake when working with crontab files is to mistype crontab -r when you meant to type crontab -e. This one character difference, one key apart on most keyboards, will delete the crontab without requiring a confirmation prompt.

2. Validate your job schedule

Once you have found your job, verify that it's scheduled correctly. Cron schedules are used widely because they are expressive and powerful, but like regular expressions they are difficult to read. We suggest using Crontab Guru to validate your schedule.

Paste the schedule expression from your crontab into the text field on Crontab Guru
Verify that the plaintext translation of your schedule is correct, and that the next scheduled execution times match your expectations
Check that the effective server timezone matches your expectation. In addition to checking system time using date, check your crontab file for TZ or CRON_TZ timezone declarations that would override system settings. For example, CRON_TZ=America/New_York
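
To check the timezone point above, a crontab file can be searched for TZ or CRON_TZ lines. This is a small sketch under assumptions: show_cron_timezone is a hypothetical helper name, and it only inspects the single file you pass to it.

```shell
# show_cron_timezone FILE: print TZ or CRON_TZ declarations found in a
# crontab-format file, or report that the system timezone applies.
show_cron_timezone() {
    if grep -E '^[[:space:]]*(CRON_)?TZ[[:space:]]*=' "$1"; then
        return 0
    fi
    echo "no TZ or CRON_TZ override; system time applies ($(date +%Z))"
}
```

For a user crontab, one way to use it is to save the output of crontab -l to a temporary file and pass that file to show_cron_timezone.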

3. Check your permissions

Invalid permissions can cause your cron jobs to fail in at least 3 ways:

Jobs added as files in a /etc/cron.*/ directory must be owned by root. Files owned by other users will be ignored, and you may see a message similar to WRONG FILE OWNER in your syslog.
The command must be executable by the user that cron is running your job as. For example, if your ubuntu user crontab invokes a script like database-backup.sh, ubuntu must have permission to execute the script. The most direct way is to ensure that the ubuntu user owns the file and then ensure execute permissions are available using chmod +x database-backup.sh.
The user account must be allowed to use cron. First, if a /etc/cron.allow file exists, the user must be listed. Separately, the user cannot be in a /etc/cron.deny list.
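
The cron.allow and cron.deny rules can be turned into a quick check. The sketch below applies the two rules exactly as stated in this list, which makes it conservative; your cron implementation may consult only one of the two files. user_may_use_cron is a hypothetical helper name, and the file arguments are parameters only so the check can be tried on copies.

```shell
# user_may_use_cron USER [ALLOW_FILE] [DENY_FILE]
# Fails if an allow file exists and USER is not in it, or if a deny file
# exists and USER is in it. The file paths default to the standard ones.
user_may_use_cron() {
    user="$1"
    allow="${2:-/etc/cron.allow}"
    deny="${3:-/etc/cron.deny}"
    if [ -f "$allow" ] && ! grep -qxF -- "$user" "$allow"; then
        echo "$user is not listed in $allow"
        return 1
    fi
    if [ -f "$deny" ] && grep -qxF -- "$user" "$deny"; then
        echo "$user is listed in $deny"
        return 1
    fi
    echo "$user passes the cron.allow and cron.deny checks"
}
```

Reading these files may require root, so run the check with sudo if the files exist but are not readable by your user.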

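Related to the permissions problem, ensure that if your command string contains a %, it is escaped with a backslash (\%). Cron treats an unescaped % as a newline and passes everything after the first one to the command as standard input, so the command that actually runs is cut short.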
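
A stray % is easy to miss in a long command, for example in a date format such as date +%F. As a rough check, the sketch below lists the lines of a crontab file that contain a % not preceded by a backslash. find_unescaped_percent is a hypothetical helper name, and the pattern is deliberately simple, so treat its output as lines to review rather than confirmed errors.

```shell
# find_unescaped_percent FILE: print crontab lines (with line numbers) that
# contain a % not preceded by a backslash. Comment lines are skipped.
find_unescaped_percent() {
    grep -n -E '(^|[^\\])%' "$1" | grep -v -E '^[0-9]+:[[:space:]]*#'
    return 0
}
```

No output means no unescaped % was found in the file.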

4. Check that your cron job is running by finding the attempted execution in your logs

When a command is run on schedule, cron will write the activity to a log file. By grepping the log for the name of the command you found in a crontab file you can validate your job and see that it's scheduled correctly and cron is running. If you're unfamiliar with some of these concepts, head over to our guide on checking if a cron job is running for more detailed step-by-step instructions.

Begin by grepping for the command (on this Ubuntu server, in /var/log/syslog). You will probably need root or sudo access, and be aware of log rotation activity.
```
dev01: ~ $ grep database-backup.sh /var/log/syslog
Aug  5 04:05:01 dev01 CRON[2128]: (ubuntu) CMD (/var/cronitor/bin/database-backup.sh)
```
If you can't find your command in the syslog it could be that the log has been rotated or cleared since your job ran. If possible, rule that out by updating the job to run every minute by changing its schedule to * * * * *.
If your command doesn't appear as an entry in syslog within 2 minutes the problem could be with the underlying cron daemon known as crond. Rule this out quickly by verifying that cron is running by looking up its process ID. If cron is not running no process ID will be returned.
```
dev01: ~ $ pgrep cron
323
```
If you've located your job in a crontab file but persistently cannot find it referenced in syslog, double check that crond has correctly loaded your crontab file. The easiest way to do this is to force a reparse of your crontab by running EDITOR=true crontab -e from your command prompt. If everything is up to date you will see a message like No modification made. Any other message indicates that your crontab file was not reloaded after a previous update but has now been updated. This will also ensure that your crontab file is free of syntax errors.

If you can see in syslog that your job was scheduled and attempted to run correctly, but it still did not produce the expected result, you can assume there is a problem with the command you are trying to run.
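
The log location differs between distributions, so the grep shown earlier may need a different file. The sketch below tries the two locations I am confident about, /var/log/syslog on Debian and Ubuntu and /var/log/cron on RHEL and CentOS. cron_log_entries is a hypothetical helper name, and other systems may log elsewhere.

```shell
# cron_log_entries COMMAND [LOGFILE...]: print cron log lines that mention
# COMMAND. With no LOGFILE arguments it tries /var/log/syslog and
# /var/log/cron; unreadable or missing files are skipped silently.
cron_log_entries() {
    cmd="$1"
    shift
    if [ "$#" -eq 0 ]; then
        set -- /var/log/syslog /var/log/cron
    fi
    for log in "$@"; do
        if [ -r "$log" ]; then
            # Keep only lines written by the cron daemon (CRON or CROND).
            grep -F -- "$cmd" "$log" | grep 'CRON'
        fi
    done
    return 0
}
```

For example, sudo sh -c with this function defined, or running it as root, lets cron_log_entries database-backup.sh read logs that are not world-readable.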

How to debug unexpected cron job failures

If you've discovered that a cron job is failing that was previously running normally, the right question to ask is "what has changed?" This section will show you techniques for identifying common problems and re-creating the failure.

1. Test run your command like cron does

When cron runs your command the environment is different from your normal command prompt in subtle but important ways. The first step to troubleshooting is to simulate the cron environment and run your command in an interactive shell.

Run any command like cron does with CronitorCLI

Install CronitorCLI for free. Paste each instruction into a terminal and execute:

```
wget https://cronitor.io/dl/linux_amd64.tar.gz
sudo tar xvf linux_amd64.tar.gz -C /usr/local/bin/
sudo cronitor configure --api-key {{ api_key }} # optional, for Cronitor users
```

To force a scheduled cron job to run immediately, use cronitor select to scan your system and present a list of jobs to choose from.
To simulate running any command the way cron does, use cronitor shell.

Or, manually test run a command like cron does

If you are parsing a file in /etc/cron.d/ or /etc/crontab, each line is allowed to have an effective "run as" username after the schedule and before the command itself. If this applies to your job, or if your job is in another user's crontab, begin by opening a bash prompt as that user with sudo -u username bash.
By default, cron will run your command using /bin/sh, not the bash or zsh prompt you are familiar with. Double check your crontab file for an optional SHELL=/bin/bash declaration. If using the default /bin/sh shell, certain features that work in bash, like the [[ condition ]] test syntax, will cause syntax errors under cron.
Unlike your interactive shell, cron doesn't load your bashrc or bash_profile so any environment variables defined there are missing in your cron jobs. This is true even if you have a SHELL=/bin/bash declaration. To simulate this, create a command prompt with a clean environment.
```
dev01: ~ $ env -i /bin/sh
```
By default, cron will run commands with your home directory as the current working directory. To ensure you are running the command like cron does, run cd ~ at your prompt. For a step-by-step guide to determine the right directory to use, see our guide on understanding the crontab working directory.
Paste the command to run (everything after the schedule or declared username) into the command prompt. If cron is unable to run your command, this should fail too, hopefully with a useful error message. Common errors include invalid permissions, command not found, and command line syntax errors.
For a step-by-step walk through, see our guide on how to run a command like cron does.
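
The manual steps above can be combined into one helper. This is a sketch under assumptions, not an exact reproduction of any particular cron: run_like_cron is a hypothetical name, and PATH=/usr/bin:/bin is a common cron default that your system may override.

```shell
# run_like_cron 'COMMAND': run COMMAND roughly the way cron would for the
# current user: from the home directory, under /bin/sh, with a near-empty
# environment. The PATH value is a common cron default and is an assumption.
run_like_cron() {
    (
        cd "${HOME:-/}" || exit 1
        env -i \
            HOME="${HOME:-/}" \
            LOGNAME="$(id -un 2>/dev/null)" \
            PATH=/usr/bin:/bin \
            SHELL=/bin/sh \
            /bin/sh -c "$1"
    )
}
```

For example, run_like_cron '/var/cronitor/bin/database-backup.sh' runs the backup script without anything exported by your .bashrc. To test a job that belongs to another user, open a prompt as that user first and define the function there.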

If you can reproduce the failure, you might be given clues in the form of error messages or exit codes that can help you diagnose the problem. If no useful error message is given, double check any application logs your job is expected to produce, and ensure that you are not redirecting log and error messages away from where you are looking. In Linux, command >> /path/to/file appends standard output to the specified file, and command >> /path/to/file 2>&1 appends both standard output and standard error. Determine if your command has a verbose output or debug log flag that can be added to see additional details at runtime. Ideally, if your job is failing under cron it will fail here too, and you will see a useful error message that explains the failure. Common errors include file not found, misconfigured permissions, and command line syntax errors.
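
The difference between the two redirections is easy to see with a throwaway command. In this sketch, noisy_job is a made-up stand-in for a real cron command, and the log files are created in a temporary directory.

```shell
# noisy_job is a stand-in for a real cron command: it writes one line to
# standard output and one line to standard error.
noisy_job() {
    echo "processed 10 records"
    echo "warning: disk almost full" >&2
}

log_dir=$(mktemp -d)

# Standard output only: the warning is not captured in the file.
noisy_job >> "$log_dir/stdout-only.log"

# Standard output and standard error, both appended to the same file.
noisy_job >> "$log_dir/combined.log" 2>&1

# Compare line counts to see which file captured the warning.
wc -l "$log_dir/stdout-only.log" "$log_dir/combined.log"
```

If a job's log file looks clean while the job is failing, a missing 2>&1 is one possible reason the error never reached the file.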

2. Check for overlapping jobs

At Cronitor our data shows that runtime durations increase over time for a large percentage of cron jobs. As your dataset or userbase grows it's normal to find yourself in a situation where cron starts an instance of your job before the previous one has finished. Depending on the nature of your job this might not be a problem, but several undesired side effects are possible:

Unexpected server or database load could impact other users
Locking of shared resources could cause deadlocks and prevent your jobs from ever completing successfully
Creation of an unanticipated race condition that might result in records being processed multiple times, amplifying server load and possibly impacting customers

To verify if any instances of a job are running on your server presently, grep your process list. In this example, 3 job invocations are running simultaneously:

```
dev01: ~ $ ps aux | grep database-backup.sh
ubuntu           1343   0.0  0.1  2585948  12004   ??  S    31Jul18   1:04.15 /var/cronitor/bin/database-backup.sh
ubuntu           3659   0.0  0.1  2544664    952   ??  S     1Aug18   0:34.35 /var/cronitor/bin/database-backup.sh
ubuntu           7309   0.0  0.1  2544664   8012   ??  S     2Aug18   0:18.01 /var/cronitor/bin/database-backup.sh
```

To quickly recover from this, first kill the overlapping jobs and then watch closely when your command is next scheduled to run. It's possible that a one-time failure cascaded into several overlapping instances. If it becomes clear that the job often takes longer than the interval between job invocations you may need to take additional steps, e.g.:

Increase the duration between invocations of your job. For example, if your job runs every minute now, consider running it every other minute.
Use a tool like flock to ensure that only a single instance of your command is running at any given time. Using flock is easy. After installing from apt-get or yum you only need to prefix the command in your crontab with flock and the path to a lock file (the lock file path below is only an example):
```
dev01: ~ $ crontab -l
# Edit this file to introduce tasks to be run by cron.
# m h  dom mon dow   command
* * * * *      flock -w 0 /tmp/database-backup.lock /var/cronitor/bin/database-backup.sh
```

Read more about flock from the man page.
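
Where flock cannot be installed, a similar guard can be sketched in plain shell. Note that this swaps in a different technique: it uses mkdir as the lock, because mkdir either creates the directory or fails, so two invocations cannot both succeed. run_exclusive and the lock path are hypothetical names chosen for illustration.

```shell
# run_exclusive LOCKDIR COMMAND [ARGS...]: run COMMAND only if LOCKDIR can
# be created, and remove LOCKDIR afterwards. Returns the command's exit
# status, or 1 if another run already holds the lock.
run_exclusive() {
    lockdir="$1"
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still holds $lockdir, skipping" >&2
        return 1
    fi
    status=0
    "$@" || status=$?
    rmdir "$lockdir"
    return "$status"
}
```

A job script could call run_exclusive /tmp/database-backup.lockdir /var/cronitor/bin/database-backup.sh. Unlike flock, this lock is not released if the process is killed before rmdir runs, so a stale directory has to be removed by hand.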

What to do if nothing else works

Here are a few things you can try if you've followed this guide and find that your job works flawlessly when run from the cron-like command prompt but fails to complete successfully under crontab.

First get the most basic cron job working with a command like date >> /tmp/cronlog. This command will simply append the execution time to the log file each time it runs. Schedule this to run every minute and tail the logfile for results.
If your basic command works, replace it with your command. As a sanity check, verify that it works.
If your command works by invoking a runtime like python some-command.py, perform a few checks to determine that the runtime version and environment are correct. Each language runtime has quirks that can cause unexpected behavior under crontab.
- For python you might find that your web app is using a virtual environment you need to invoke in your crontab.
- For node a common problem is falling back to a much older version bundled with the distribution.
- When using php you might run into the issue that custom settings or extensions that work in your web app are not working under cron or the command line because a different php.ini is loaded.
If you are using a runtime to invoke your command double check that it's pointing to the expected version from within the cron environment.
If nothing else works, restart the cron daemon and hope it helps. Stranger things have happened. The way to do this varies from distro to distro, so it's best to ask Google.
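
To check which runtime a cron job would pick up, you can resolve the name under a minimal PATH. This is a sketch under an assumption: /usr/bin:/bin is a common cron default, but a PATH line in your crontab would change the result. which_in_cron_env is a hypothetical helper name.

```shell
# which_in_cron_env NAME: print the path that NAME resolves to under a
# minimal PATH. Prints nothing and returns non-zero if NAME is not found.
which_in_cron_env() {
    env -i PATH=/usr/bin:/bin /bin/sh -c 'command -v "$1"' sh "$1"
}
```

Compare the output of which_in_cron_env node with command -v node at your normal prompt. If the two differ, or the first prints nothing, using the absolute path to the runtime in your crontab is one way to remove the ambiguity.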
