Sun, 11 Sep 2005

Monitoring MySQL Replication Slaves

Drawing inspiration from the replication chapter of Jeremy and Derek's MySQL book, I recently set up monitoring to measure how far a MySQL slave is behind the master.

I created a simple table to store a timestamp.

CREATE TABLE `heartbeat` (
  `timestamp` datetime NOT NULL default '0000-00-00 00:00:00'
);
INSERT INTO heartbeat SET timestamp = NOW();
Master

The master updates the timestamp every second using a simple PHP script called write_mysql_heartbeat.php.

<?php

$dbHost = 'dbhost';
$dbUser = 'mysql_admin';
$dbPass = 'PASSWORD';
$dbName = 'mysql_admin';

$conn = mysql_connect($dbHost, $dbUser, $dbPass) or die('Failed to connect to database');
mysql_select_db($dbName) or die('Failed to select database: ' . $dbName);

define_syslog_variables();
openlog('mysql_heartbeat', LOG_PERROR | LOG_PID, LOG_DAEMON);

$sql = 'UPDATE heartbeat SET timestamp = now()';
while (true) {
        while (mysql_ping($conn) === false) {
                syslog(LOG_WARNING, 'Lost connection to database.  Will retry in 30 seconds.');
                sleep(30);
        }
        if (! mysql_query($sql, $conn)) {
                syslog(LOG_ERR, 'Failed to insert record: ' . mysql_error());
                die();
        }
        sleep(1);
}

I have restartd make sure that write_mysql_heartbeat.php keeps running with the following entry in /etc/restartd.conf:
write_mysql_heartbeat write_mysql_heartbeat "php4 -q /usr/local/bin/write_mysql_heartbeat.php"

Slave

On the slave side, I wrote a simple munin plugin to check how far the slave is behind the master. Munin is also configured to send alerts through nagios if the slave gets too far behind.

Here's /etc/munin/plugins/mysql_slave_delay:

#!/usr/bin/php4
<?php

$dbHost = 'slave-1';
$dbUser = 'mysql_admin';
$dbPass = 'PASSWORD';
$dbName = 'mysql_admin';

if ($argv[1] == 'config') {
        # The host name this plugin is for. (Can be overridden to have
        # one machine answer for several)

        # The title of the graph
        echo "graph_title MySQL Slave Delay\n";
        # Arguments to "rrdtool graph". In this case, tell it that the
        # lower limit of the graph is '0', and that 1k=1000 (not 1024)
        echo "graph_args --base 1000 -l 0\n";
        # The Y-axis label
        echo "graph_vlabel seconds\n";
        # We want Cur/Min/Avg/Max unscaled (i.e. 0.42 seconds instead of
        # 420 milliseconds)
        echo "graph_scale no\n";
        # Graph category. Defaults to 'other'
        echo "graph_category mysql\n";
        # The fields. "label" is used in the legend. "label" is the only
        # required subfield.
        echo "seconds.label seconds\n";
        # These two are optional. They are only used if you have
        # configured your munin to tell a Nagios-server about any
        # problems
        echo "seconds.warning 30\n";
        echo "seconds.critical 120\n";
        # This one is purely to add an explanation to the web page. The first
        # one is for the graph itself, while the second one is for the field
        # "seconds".
        echo "graph_info This graph shows how many seconds this MySQL slave is behind the master\n";
        echo "seconds.info Number of seconds behind master\n";

        # Last, if run with the "config"-parameter, quit here (don't
        # display any data)
        exit;
}

$conn = mysql_connect($dbHost, $dbUser, $dbPass) or die('Failed to connect to database');
mysql_select_db($dbName) or die('Failed to select database: ' . $dbName);

define_syslog_variables();
openlog('mysql_heartbeat', LOG_PERROR | LOG_PID, LOG_DAEMON);

$sql = 'SELECT UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(timestamp) from heartbeat';
if (false === ($result = mysql_query($sql, $conn))) {
        syslog(LOG_ERR, 'Failed to get record: ' . mysql_error());
        die();
}

if (mysql_num_rows($result) != 1) {
        syslog(LOG_ERR, 'Failed to find record for heartbeat: ' . $hostname);
        die();
}
list($delay) = mysql_fetch_array($result);

echo "seconds.value $delay\n";
Output



The slave is falling behind while backups are running. I still need to do something about that.

tech | Comments | Permanent Link

The state is that great fiction by which everyone tries to live at the expense of everyone else. - Frederic Bastiat