Monitor MySQL Replication Lag

Configuring a MySQL replica is a relatively straightforward process. However, one often overlooked detail is monitoring replication lag: how far the replica has fallen behind the master database. In this tutorial we will use Cronitor's heartbeat monitoring (which provides a simple mechanism to report ad-hoc failures of any type of service) to alert us when the lag between the MySQL replica and master passes a pre-defined threshold, or when replication halts.

What causes MySQL Replication Lag?

Lag in MySQL replication can be caused by a variety of issues and is often very difficult to debug. Query-based MySQL replication is single-threaded and locks tables just like any other query execution. Thus, long-running queries and queries that take many locks during execution are among the most common causes of replication lag. The Percona team has written an excellent set of articles that explain the topic in more depth.

Bash script to alert when MySQL replication lags

The script below reports a failure if the MySQL replica either stops replicating or falls more than a minute behind the master. Your monitor can be configured to alert you to failures in a variety of ways, including email, SMS, and third-party integrations.
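A minimal sketch of such a check might look like the following. It rests on several assumptions: the `mysql` client can read its credentials from `~/.my.cnf`, the server still accepts `SHOW SLAVE STATUS` (newer MySQL releases use `SHOW REPLICA STATUS` with renamed fields), and `PING_URL` holds the ping URL copied from your own monitor. The `state` query parameter, the variable names, and the helper functions are illustrative, so check them against your monitor's documentation before relying on them.

```shell
#!/usr/bin/env bash
# Sketch of a replication lag check. Field names match SHOW SLAVE STATUS;
# newer MySQL releases use SHOW REPLICA STATUS with renamed fields.

# Alert threshold in seconds (one minute by default).
MAX_LAG_SECONDS="${MAX_LAG_SECONDS:-60}"
# Placeholder: your monitor's ping URL. When empty, the result is only printed.
PING_URL="${PING_URL:-}"

# Print the value of one field from `SHOW SLAVE STATUS\G` output on stdin.
status_field() {
  awk -F': ' -v key="$1" '{ sub(/^[ \t]+/, "", $1) } $1 == key { print $2; exit }'
}

# Read the status text on stdin and print "ok" or a short failure reason.
evaluate_status() {
  local status io sql lag
  status="$(cat)"
  io="$(printf '%s\n' "$status" | status_field Slave_IO_Running)"
  sql="$(printf '%s\n' "$status" | status_field Slave_SQL_Running)"
  lag="$(printf '%s\n' "$status" | status_field Seconds_Behind_Master)"

  # Both replication threads must be running (empty output also fails here).
  if [ "$io" != "Yes" ] || [ "$sql" != "Yes" ]; then
    echo "replication stopped"
    return
  fi

  case "$lag" in
    ''|*[!0-9]*)
      # NULL or missing: the lag cannot be determined.
      echo "lag unknown"
      ;;
    *)
      if [ "$lag" -gt "$MAX_LAG_SECONDS" ]; then
        echo "lag ${lag}s exceeds ${MAX_LAG_SECONDS}s"
      else
        echo "ok"
      fi
      ;;
  esac
}

# Send the result to the monitor, or print it when no URL is configured.
report() {
  local state="complete"
  [ "$1" = "ok" ] || state="fail"
  if [ -z "$PING_URL" ]; then
    echo "$1"
  else
    # Assumed query parameter; confirm it in your monitor's documentation.
    curl -fsS -m 10 "${PING_URL}?state=${state}" >/dev/null
  fi
}

report "$(mysql -e 'SHOW SLAVE STATUS\G' 2>/dev/null | evaluate_status)"
```

Running the script without `PING_URL` set prints the result instead of pinging, which is a convenient way to try it on the replica first. A failed or empty `mysql` call is treated as stopped replication, so a broken check reports a failure rather than passing silently.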

The best way to run this script is as a cron job on the server that your replica is running on. The frequency at which it runs is entirely up to you; we run ours every 5 minutes.
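As a rough illustration of the five-minute schedule, the snippet below builds a crontab that includes an entry for the check. The install path `/usr/local/bin/check_replication_lag.sh` and the `build_crontab` helper are hypothetical; substitute wherever you saved the script.

```shell
# Hypothetical install path for the lag check; adjust to your setup.
SCRIPT_PATH="/usr/local/bin/check_replication_lag.sh"
# Standard cron syntax: run every 5 minutes and discard the output.
CRON_ENTRY="*/5 * * * * $SCRIPT_PATH >/dev/null 2>&1"

# Print the current crontab with our entry appended, dropping any existing
# line for the same script so repeated runs do not create duplicates.
build_crontab() {
  { crontab -l 2>/dev/null || true; } | grep -vF "$SCRIPT_PATH" || true
  echo "$CRON_ENTRY"
}

# Review the output first, then install it with: build_crontab | crontab -
build_crontab
```

Printing the result before piping it into `crontab -` lets you confirm that the existing entries survive unchanged.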

Tip: If you're still maintaining your own MySQL backup and restore scripts, check out SnapShooter's MySQL backup service.