Drawing inspiration from the replication chapter of Jeremy and Derek's MySQL book, I recently set up monitoring to measure how far a MySQL slave is behind the master.
I created a simple table to store a timestamp.
CREATE TABLE `heartbeat` ( `timestamp` datetime NOT NULL default '0000-00-00 00:00:00' ); INSERT INTO heartbeat SET timestamp = NOW();
The master updates the timestamp every second using a simple PHP script called
write_mysql_heartbeat.php.
<?php
$dbHost = 'dbhost';
$dbUser = 'mysql_admin';
$dbPass = 'PASSWORD';
$dbName = 'mysql_admin';
$conn = mysql_connect($dbHost, $dbUser, $dbPass) or die('Failed to connect to database');
mysql_select_db($dbName) or die('Failed to select database: ' . $dbName);
define_syslog_variables();
openlog('mysql_heartbeat', LOG_PERROR | LOG_PID, LOG_DAEMON);
$sql = 'UPDATE heartbeat SET timestamp = now()';
while (true) {
while (mysql_ping($conn) === false) {
syslog(LOG_WARNING, 'Lost connection to database. Will retry in 30 seconds.');
sleep(30);
}
if (! mysql_query($sql, $conn)) {
syslog(LOG_ERR, 'Failed to insert record: ' . mysql_error());
die();
}
sleep(1);
}
I have restartd make
sure that write_mysql_heartbeat.php keeps running with the following entry in
/etc/restartd.conf:
write_mysql_heartbeat write_mysql_heartbeat "php4 -q /usr/local/bin/write_mysql_heartbeat.php"
On the slave side, I wrote a simple munin plugin to check how far the slave is behind the master. Munin is also configured to send alerts through nagios if the slave gets too far behind.
Here's /etc/munin/plugins/mysql_slave_delay:
#!/usr/bin/php4
<?php
$dbHost = 'slave-1';
$dbUser = 'mysql_admin';
$dbPass = 'PASSWORD';
$dbName = 'mysql_admin';
if ($argv[1] == 'config') {
# The host name this plugin is for. (Can be overridden to have
# one machine answer for several)
# The title of the graph
echo "graph_title MySQL Slave Delay\n";
# Arguments to "rrdtool graph". In this case, tell it that the
# lower limit of the graph is '0', and that 1k=1000 (not 1024)
echo "graph_args --base 1000 -l 0\n";
# The Y-axis label
echo "graph_vlabel seconds\n";
# We want Cur/Min/Avg/Max unscaled (i.e. 0.42 seconds instead of
# 420 milliseconds)
echo "graph_scale no\n";
# Graph category. Defaults to 'other'
echo "graph_category mysql\n";
# The fields. "label" is used in the legend. "label" is the only
# required subfield.
echo "seconds.label seconds\n";
# These two are optional. They are only used if you have
# configured your munin to tell a Nagios-server about any
# problems
echo "seconds.warning 30\n";
echo "seconds.critical 120\n";
# This one is purely to add an explanation to the web page. The first
# one is for the graph itself, while the second one is for the field
# "seconds".
echo "graph_info This graph shows how many seconds this MySQL slave is behind the master\n";
echo "seconds.info Number of seconds behind master\n";
# Last, if run with the "config"-parameter, quit here (don't
# display any data)
exit;
}
$conn = mysql_connect($dbHost, $dbUser, $dbPass) or die('Failed to connect to database');
mysql_select_db($dbName) or die('Failed to select database: ' . $dbName);
define_syslog_variables();
openlog('mysql_heartbeat', LOG_PERROR | LOG_PID, LOG_DAEMON);
$sql = 'SELECT UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(timestamp) from heartbeat';
if (false === ($result = mysql_query($sql, $conn))) {
syslog(LOG_ERR, 'Failed to get record: ' . mysql_error());
die();
}
if (mysql_num_rows($result) != 1) {
syslog(LOG_ERR, 'Failed to find record for heartbeat: ' . $hostname);
die();
}
list($delay) = mysql_fetch_array($result);
echo "seconds.value $delay\n";

The slave is falling behind while backups are running. I still need to do
something about that.

The state is that great fiction by which everyone tries to live at the expense of everyone else. - Frederic Bastiat