Cron job troubleshooting guide

Cron has been the standard job scheduler on Unix and Unix-like systems since the 1970s. It is widely used and usually very reliable. This guide walks you through locating cron jobs on your server, validating their schedules, and troubleshooting why they fail to start or exit with unexpected errors.

Why cron jobs fail

  • There are subtle environment differences

    Even if you're setting up a cron job on a server where you've already deployed your application, differences in the shell and execution context can make it painful to get your job working.

  • Crontab syntax errors are easy to make

    Some schedule expressions, while widely accepted, are not standard and might not always work as expected.

  • System resources have been depleted

    Even the most stable cron job is no match for a disk that is full or an OS that can't spawn new processes.

  • Jobs live outside your main application

    When app configuration and code changes are deployed it can be easy to overlook the cron jobs on each server.

  • Batch jobs have a different performance profile

    Cron jobs are often used for batch processing, event sourcing and other data intensive tasks that can reveal the constraints of your stack. Jobs may work fine until your data size grows to a point where a new bottleneck is reached.

  • Cron jobs can feel invisible and small failures remain unseen

    Once created, cron jobs are out of sight and small or intermittent failures can go unnoticed for a long time. We created Cronitor to solve the observability problems of cron jobs and give you instant alerts when things go wrong.

  • Cron is a simple scheduler that doesn't do much more than start your job on time

    Cron will not retry failed jobs, it does not care if runs of a job overlap, and it reports a failed job no differently than a successful one.

What to do if your cron job doesn't start

  1. Locate the scheduled job

    Cron jobs are run by a system daemon called crond that watches several locations for scheduled jobs. The first step to understanding why your job didn't start when expected is to find where your job is scheduled.

    • Check your user crontab with crontab -l
      dev01: ~ $ crontab -l
      # Edit this file to introduce tasks to be run by cron.
      # m h  dom mon dow   command
      5 4 * * *      /var/cronitor/bin/database-backup.sh
    • Jobs are commonly created by adding a crontab file in /etc/cron.d/ (editor's note: this is the best practice)
    • System-level cron jobs can also be added as a line in /etc/crontab
    • Sometimes for easy scheduling, jobs are added to /etc/cron.hourly/, /etc/cron.daily/, /etc/cron.weekly/ or /etc/cron.monthly/
    • It's possible that the job was created in the crontab of another user. Go through each user's crontab using crontab -u username -l
    • If you can't find your job but believe it was previously scheduled, double-check that you are on the correct server.
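
    The locations listed above can be searched in one pass. The following is a minimal sketch rather than a definitive tool: find_cron_job is a hypothetical helper name, and which paths you pass to it is up to you.

```shell
#!/bin/sh
# find_cron_job PATTERN PATH...
# Hypothetical helper: search crontab files and directories for a command name.
find_cron_job() {
    pattern=$1
    shift
    # -r recurse into directories, -s stay quiet about missing or unreadable paths,
    # -H always print the file name, -n print the line number,
    # -F treat PATTERN as a fixed string rather than a regular expression.
    grep -rsHnF -- "$pattern" "$@"
}
```

    From a root shell, a call might look like find_cron_job database-backup.sh /etc/crontab /etc/cron.d /etc/cron.hourly /etc/cron.daily /etc/cron.weekly /etc/cron.monthly /var/spool/cron. The spool directory is an assumption here: per-user crontabs commonly live under /var/spool/cron or /var/spool/cron/crontabs, depending on the distribution.
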
  2. Validate your job schedule

    Once you have found your job, verify that it's scheduled correctly. Cron schedules are commonly used because they are expressive and powerful, but like regular expressions can sometimes be difficult to read. We suggest using Crontab Guru to validate your schedule.

    • Paste the schedule expression from your crontab into the text field on Crontab Guru
    • Verify that the plaintext translation of your schedule is correct, and that the next scheduled execution times match your expectations
    • Check that the effective server timezone matches your expectation. In addition to checking system time using date, check your crontab file for TZ or CRON_TZ timezone declarations that would override system settings. For example, CRON_TZ=America/New_York
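
    The timezone check in the last step can be scripted. This is a small sketch, assuming timezone overrides appear as plain TZ= or CRON_TZ= lines; cron_tz_declarations is a hypothetical helper name.

```shell
#!/bin/sh
# cron_tz_declarations [FILE...]
# Hypothetical helper: print TZ or CRON_TZ assignments found in crontab text.
# With no FILE argument it reads standard input.
cron_tz_declarations() {
    # Anchoring at the start of the line avoids matching "TZ=" inside a command.
    grep -E '^[[:space:]]*(CRON_TZ|TZ)[[:space:]]*=' "$@"
}
```

    For a user crontab, run crontab -l | cron_tz_declarations; for a system file, run cron_tz_declarations /etc/crontab. If nothing is printed, the job runs in the system timezone, which date +%Z reports.
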
  3. Check your permissions

    Invalid permissions can cause your cron jobs to fail in at least 3 ways:

    1. Jobs added as files in a /etc/cron.*/ directory must be owned by root. Files owned by other users will be ignored and you may see a message similar to WRONG FILE OWNER in your syslog.
    2. The command must be executable by the user that cron is running your job as. For example if your ubuntu user crontab invokes a script like database-backup.sh, ubuntu must have permission to execute the script. The most direct way is to ensure that the ubuntu user owns the file and then ensure execute permissions are available using chmod +x database-backup.sh.
    3. The user account must be allowed to use cron. If an /etc/cron.allow file exists, the user must be listed in it. If it does not exist but /etc/cron.deny does, the user must not be listed in /etc/cron.deny.
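
    The allow and deny check in item 3 can be expressed as a short function. This is a sketch under assumptions: cron_allowed is a hypothetical name, the files are assumed to hold one username per line, and the precedence shown (cron.deny is consulted only when cron.allow is absent) follows the crontab(1) manual page for Vixie-derived crons. Other cron implementations may differ.

```shell
#!/bin/sh
# cron_allowed USER [ALLOW_FILE] [DENY_FILE]
# Hypothetical helper: succeed if USER passes the allow/deny check.
cron_allowed() {
    user=$1
    allow=${2:-/etc/cron.allow}
    deny=${3:-/etc/cron.deny}
    if [ -f "$allow" ]; then
        # An allow file exists: the user must appear on a line of its own.
        grep -qxF -- "$user" "$allow"
    elif [ -f "$deny" ]; then
        # No allow file: the user must not be listed in the deny file.
        ! grep -qxF -- "$user" "$deny"
    else
        # Neither file exists: the outcome depends on the distribution, so this
        # sketch reports success and leaves that case to the reader.
        return 0
    fi
}
```

    For example, cron_allowed ubuntu && echo "ubuntu may use cron". For item 2, sudo -u ubuntu test -x /var/cronitor/bin/database-backup.sh checks execute permission as the user the job runs as.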

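    Separately from permissions, if your command string contains a %, escape it with a backslash. Cron treats an unescaped % as a newline and passes everything after the first one to the command as standard input, so date +%F must be written as date +\%F in a crontab.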

  4. Find the attempted execution in syslog

    When cron attempts to run a command, it logs it in syslog. By grepping syslog for the name of the command you found in a crontab file you can validate that your job is scheduled correctly and cron is running.

    • Begin by grepping for the command in /var/log/syslog (You will probably need root or sudo access.)
      dev01: ~ $ grep database-backup.sh /var/log/syslog
      Aug  5 04:05:01 dev01 CRON[2128]: (ubuntu) CMD (/var/cronitor/bin/database-backup.sh)
    • If you can't find your command in the syslog it could be that the log has been rotated or cleared since your job ran. If possible, rule that out by updating the job to run every minute by changing its schedule to * * * * *.
    • If your command doesn't appear as an entry in syslog within 2 minutes the problem could be with the underlying cron daemon known as crond. Rule this out quickly by verifying that cron is running by looking up its process ID. If cron is not running no process ID will be returned.
      dev01: ~ $ pgrep cron
      323
    • If you've located your job in a crontab file but persistently cannot find it referenced in syslog, double check that crond has correctly loaded your crontab file. The easiest way to do this is to force a reparse of your crontab by running EDITOR=true crontab -e from your command prompt. If everything is up to date you will see a message like No modification made. Any other message indicates that your crontab file was not reloaded after a previous update but has now been updated. This will also ensure that your crontab file is free of syntax errors.
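
    The log search in this step can be wrapped in a helper that keeps only the lines where cron recorded a command start. This is a sketch: cron_runs is a hypothetical name, and it assumes the "CMD (" marker shown in the sample log entry above.

```shell
#!/bin/sh
# cron_runs COMMAND LOGFILE...
# Hypothetical helper: print cron's "CMD" log entries that mention COMMAND.
cron_runs() {
    cmd=$1
    shift
    # -h drops file name prefixes, -s stays quiet about unreadable or missing logs,
    # -F treats COMMAND as a fixed string. The second grep keeps only the lines
    # where cron logged that it started a command.
    grep -hsF -- "$cmd" "$@" | grep -F 'CMD ('
}
```

    A call such as sudo sh, then cron_runs database-backup.sh /var/log/syslog /var/log/cron covers two common locations. The second path is an assumption for Red Hat-family systems. On hosts that log only to the systemd journal there may be no file to grep at all.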

If you can see in syslog that your job was scheduled and attempted to run correctly but still did not produce the expected result you can assume there is a problem with the command you are trying to run.

What to do if your cron job fails unexpectedly

  1. Run your command like cron does

    When cron runs your command the environment is different from your normal command prompt in subtle but important ways. The first step to troubleshooting is to simulate the cron environment and run your command in an interactive shell.

    • If you are parsing a file in /etc/cron.d/ or /etc/crontab, each line is allowed to have an effective "run as" username after the schedule and before the command itself. If this applies to your job, or if your job is in another user's crontab, begin by opening a bash prompt as that user: sudo -u username bash
    • By default, cron will run your command using /bin/sh, not the bash or zsh shell you are familiar with. Double check your crontab file for an optional SHELL=/bin/bash declaration. Under the default /bin/sh, bash-only features such as [[ ... ]] conditional expressions can cause syntax errors under cron.
    • Unlike your interactive shell, cron doesn't load your .bashrc or .bash_profile, so any environment variables defined there are unavailable in your cron jobs. This is true even if you have a SHELL=/bin/bash declaration. To simulate this, create a command prompt with a clean environment.
      dev01: ~ $ env -i /bin/sh
      $
    • By default, cron will run commands with your home directory as the current working directory. To ensure you are running the command like cron does, run cd ~ at your prompt.
    • Paste the command to run (everything after the schedule or declared username) into the command prompt. If crontab is unable to run your command, this should fail too and will hopefully contain a useful error message. Common errors include invalid permissions, command not found, and command line syntax errors.
    • If no useful error message is available, double check any application logs your job is expected to produce, and make sure output and error messages are not being redirected somewhere you are not looking. On Linux, command >> /path/to/file appends standard output to the specified file, and command >> /path/to/file 2>&1 appends both standard output and standard error. Determine if your command has a verbose output or debug log flag that can be added to see additional details at runtime.
    • Ideally, if your job is failing under cron it will fail here too, giving you a chance to modify the command and re-run it until it succeeds.
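
    The steps above (clean environment, /bin/sh, home directory as working directory) can be combined into one helper. This is a sketch, not an exact replica of cron: run_like_cron is a hypothetical name, and the HOME, LOGNAME, PATH and SHELL values mirror the defaults described in crontab(5) for Vixie-derived crons, which is an assumption for other implementations or for crontabs that set their own variables.

```shell
#!/bin/sh
# run_like_cron 'COMMAND STRING'
# Hypothetical helper: run a command with roughly the environment cron provides.
run_like_cron() (
    # The parentheses run the body in a subshell, so the cd does not affect
    # the calling shell.
    cd "$HOME" || exit 1
    # env -i clears the inherited environment; only the variables listed here
    # are visible to the command.
    exec env -i \
        HOME="$HOME" \
        LOGNAME="$(id -un)" \
        PATH=/usr/bin:/bin \
        SHELL=/bin/sh \
        /bin/sh -c "$1"
)
```

    Paste the command portion of the crontab line as a single quoted argument, for example run_like_cron '/var/cronitor/bin/database-backup.sh'. Remember that a \% in the crontab corresponds to a plain % here.
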
  2. Check for overlapping jobs

    At Cronitor our data shows that runtime durations increase over time for a large percentage of cron jobs. As your dataset and user base grow, it's normal to find yourself in a situation where cron starts an instance of your job before the previous one has finished. Depending on the nature of your job this might not be a problem, but several undesired side effects are possible:

    • Unexpected server or database load could impact other users
    • Locking of shared resources could cause deadlocks and prevent your jobs from ever completing successfully
    • Creation of an unanticipated race condition that might result in records being processed multiple times, amplifying server load and possibly impacting customers

    To verify if any instances of a job are running on your server presently, grep your process list. In this example, 3 job invocations are running simultaneously:

    dev01: ~ $ ps aux | grep database-backup.sh
    ubuntu           1343   0.0  0.1  2585948  12004   ??  S    31Jul18   1:04.15 /var/cronitor/bin/database-backup.sh
    ubuntu           3659   0.0  0.1  2544664    952   ??  S     1Aug18   0:34.35 /var/cronitor/bin/database-backup.sh
    ubuntu           7309   0.0  0.1  2544664   8012   ??  S     2Aug18   0:18.01 /var/cronitor/bin/database-backup.sh
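
    A count of running instances is often easier to act on than a raw listing. The following sketch uses pgrep, which is assumed to be installed (it is part of procps on most Linux systems); running_instances is a hypothetical helper name.

```shell
#!/bin/sh
# running_instances PATTERN
# Hypothetical helper: count processes whose full command line matches PATTERN.
running_instances() {
    # pgrep -f matches against the whole command line and never lists itself,
    # so the count is not inflated the way "ps aux | grep" can be by the grep
    # process. PATTERN is a regular expression, so "." matches any character.
    pgrep -f -- "$1" | wc -l | tr -d ' '
}
```

    For example, running_instances database-backup.sh prints the number of matching processes. A script that contains the pattern in its own command line would be counted too.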

    To quickly recover from this, first kill the overlapping jobs and then watch closely when your command is next scheduled to run. It's possible that a one time failure cascaded into several overlapping instances. If it becomes clear that the job often takes longer than the interval between job invocations you may need to take additional steps, e.g.:

    • Increase the duration between invocations of your job. For example if your job runs every minute now, consider running it every other minute.
    • Use a tool like flock to ensure that only a single instance of your command is running at any given time. flock ships with the util-linux package, which most Linux distributions install by default. Prefix the command in your crontab with flock and a lock file path (the path below is only an example):
      dev01: ~ $ crontab -l
      # Edit this file to introduce tasks to be run by cron.
      # m h  dom mon dow   command
      * * * * *      flock -w 0 /tmp/database-backup.lock /var/cronitor/bin/database-backup.sh

      Read more about flock from the man page.
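
      The same protection can live inside a script instead of the crontab line. This sketch uses the file-descriptor form of flock described in its man page; with_lock is a hypothetical helper name, and descriptor 9 is an arbitrary choice.

```shell
#!/bin/sh
# with_lock LOCKFILE COMMAND [ARGS...]
# Hypothetical helper: run COMMAND only if nothing else holds a lock on LOCKFILE.
with_lock() {
    lockfile=$1
    shift
    (
        # File descriptor 9 is opened on LOCKFILE by the redirection below.
        # flock -n fails immediately instead of waiting for the lock.
        flock -n 9 || { echo "already running: $lockfile" >&2; exit 1; }
        "$@"
    ) 9>"$lockfile"
}
```

      The lock is released when the subshell and the command it started have exited, so a crashed job does not leave a stale lock behind. The lock file itself remains on disk.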

What to do if nothing else works

Here are a few things you can try if you've followed this guide and find that your job works flawlessly when run from the cron-like command prompt but fails to complete successfully under crontab.

  • First get the most basic cron job working with a command like date >> /tmp/cronlog. This command simply appends the current time to the log file each time it runs. Schedule it to run every minute and tail the log file for results.

  • If your basic command works, replace it with your own command and verify that it also runs.

  • If your command works by invoking a runtime, like python some-command.py, perform a few checks to determine that the runtime version and environment are correct. Each language runtime has quirks that can cause unexpected behavior under crontab.

    • For python you might find that your web app is using a virtual environment you need to invoke in your crontab.
    • For node a common problem is falling back to a much older version bundled with the distribution.
    • When using php you might run into the issue that custom settings or extensions that work in your web app are not working under cron or commandline because a different php.ini is loaded.

    If you are using a runtime to invoke your command double check that it's pointing to the expected version from within the cron environment.
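
    One way to make that check is to resolve the runtime name with the search path cron uses. This is a sketch: runtime_under_cron is a hypothetical name, and PATH=/usr/bin:/bin is the default described in crontab(5), which is an assumption if your crontab or cron implementation sets a different PATH.

```shell
#!/bin/sh
# runtime_under_cron NAME
# Hypothetical helper: show which executable NAME resolves to with cron's PATH.
runtime_under_cron() {
    # env -i drops the interactive PATH, so version managers and
    # virtual environments activated in your shell are not visible here.
    env -i PATH=/usr/bin:/bin /bin/sh -c 'command -v "$1"' sh "$1"
}
```

    Compare runtime_under_cron node with command -v node at your normal prompt. If the two paths differ, or the first prints nothing, use the full path to the runtime in your crontab.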

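  • If nothing else works, restart the cron daemon and hope it helps; stranger things have happened. The way to do this varies from distro to distro. On systemd-based systems it is typically sudo systemctl restart cron (Debian, Ubuntu) or sudo systemctl restart crond (Red Hat family); otherwise check your distribution's documentation.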