Drawing inspiration from the replication chapter of Jeremy and Derek's MySQL book, I recently set up monitoring to measure how far a MySQL slave is behind the master.
I created a simple table to store a timestamp.
CREATE TABLE `heartbeat` ( `timestamp` datetime NOT NULL default '0000-00-00 00:00:00' ); INSERT INTO heartbeat SET timestamp = NOW();
The master updates the timestamp every second using a simple PHP script called
write_mysql_heartbeat.php
.
<?php $dbHost = 'dbhost'; $dbUser = 'mysql_admin'; $dbPass = 'PASSWORD'; $dbName = 'mysql_admin'; $conn = mysql_connect($dbHost, $dbUser, $dbPass) or die('Failed to connect to database'); mysql_select_db($dbName) or die('Failed to select database: ' . $dbName); define_syslog_variables(); openlog('mysql_heartbeat', LOG_PERROR | LOG_PID, LOG_DAEMON); $sql = 'UPDATE heartbeat SET timestamp = now()'; while (true) { while (mysql_ping($conn) === false) { syslog(LOG_WARNING, 'Lost connection to database. Will retry in 30 seconds.'); sleep(30); } if (! mysql_query($sql, $conn)) { syslog(LOG_ERR, 'Failed to insert record: ' . mysql_error()); die(); } sleep(1); }
I have restartd make
sure that write_mysql_heartbeat.php keeps running with the following entry in
/etc/restartd.conf:
write_mysql_heartbeat write_mysql_heartbeat "php4 -q /usr/local/bin/write_mysql_heartbeat.php"
On the slave side, I wrote a simple munin plugin to check how far the slave is behind the master. Munin is also configured to send alerts through nagios if the slave gets too far behind.
Here's /etc/munin/plugins/mysql_slave_delay:
#!/usr/bin/php4 <?php $dbHost = 'slave-1'; $dbUser = 'mysql_admin'; $dbPass = 'PASSWORD'; $dbName = 'mysql_admin'; if ($argv[1] == 'config') { # The host name this plugin is for. (Can be overridden to have # one machine answer for several) # The title of the graph echo "graph_title MySQL Slave Delay\n"; # Arguments to "rrdtool graph". In this case, tell it that the # lower limit of the graph is '0', and that 1k=1000 (not 1024) echo "graph_args --base 1000 -l 0\n"; # The Y-axis label echo "graph_vlabel seconds\n"; # We want Cur/Min/Avg/Max unscaled (i.e. 0.42 seconds instead of # 420 milliseconds) echo "graph_scale no\n"; # Graph category. Defaults to 'other' echo "graph_category mysql\n"; # The fields. "label" is used in the legend. "label" is the only # required subfield. echo "seconds.label seconds\n"; # These two are optional. They are only used if you have # configured your munin to tell a Nagios-server about any # problems echo "seconds.warning 30\n"; echo "seconds.critical 120\n"; # This one is purely to add an explanation to the web page. The first # one is for the graph itself, while the second one is for the field # "seconds". echo "graph_info This graph shows how many seconds this MySQL slave is behind the master\n"; echo "seconds.info Number of seconds behind master\n"; # Last, if run with the "config"-parameter, quit here (don't # display any data) exit; } $conn = mysql_connect($dbHost, $dbUser, $dbPass) or die('Failed to connect to database'); mysql_select_db($dbName) or die('Failed to select database: ' . $dbName); define_syslog_variables(); openlog('mysql_heartbeat', LOG_PERROR | LOG_PID, LOG_DAEMON); $sql = 'SELECT UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(timestamp) from heartbeat'; if (false === ($result = mysql_query($sql, $conn))) { syslog(LOG_ERR, 'Failed to get record: ' . mysql_error()); die(); } if (mysql_num_rows($result) != 1) { syslog(LOG_ERR, 'Failed to find record for heartbeat: ' . $hostname); die(); } list($delay) = mysql_fetch_array($result); echo "seconds.value $delay\n";
The slave is falling behind while backups are running. I still need to do
something about that.
The state is that great fiction by which everyone tries to live at the expense of everyone else. - Frederic Bastiat