Wed, 22 Jun 2005

High-Availability SMTP with UCARP on Debian

At customer request, we're going to start offering outbound SMTP service to Postica customers. Doing so requires a much greater guarantee of availability than is required when only accepting mail from other MTAs. MTAs are able to use multiple MX records when attempting to deliver mail, and will queue mail if none of the MX hosts are available. MUAs, on the other hand, can generally only be configured with a single hostname to use as the SMTP server for outbound mail, and tend to show the user an unpleasant error message if there is a problem connecting to the SMTP server.

To provide high-availability, load-balanced SMTP service, I decided to use round-robin DNS in combination with CARP, specifically the UCARP implementation. CARP is a protocol for failing over an IP address between hosts, very similar to VRRP.

I installed the Debian ucarp package on two servers. Each server is the preferred server for one ucarp-managed IP address and the backup for the other; the service hostname in DNS points to both addresses. I also installed the iputils-arping package, which is used to send gratuitous ARPs when an IP address moves to a new server and the MAC address associated with it therefore changes. Note that the arping program in the iputils-arping package is different from the one in the arping package.
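As a quick check of the DNS side of this setup, the sketch below counts the distinct IPv4 addresses a name resolves to. The function name count_addresses and the hostname smtp.example.com are placeholders of mine, not names from this setup, and the input is assumed to be one address per line, as printed by dig +short.

```shell
#!/bin/sh
# Hypothetical helper: count distinct IPv4 addresses read from stdin.
# Lines that are not dotted-quad addresses (e.g. CNAME targets) are ignored.
count_addresses() {
    grep -E '^[0-9]+(\.[0-9]+){3}$' | sort -u | wc -l | tr -d ' '
}

# Example use (placeholder hostname):
#   dig +short A smtp.example.com | count_addresses
```

With round-robin DNS in place, the count for the SMTP hostname should be 2, one for each ucarp-managed address.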

I added two up options to /etc/network/interfaces on each server to start one ucarp process for each IP address when the physical interface to which the ucarp addresses are bound is brought up.

auto eth0
iface eth0 inet static
   up ucarp -i eth0 -s -v 201 -p secretPassword -a \
     --upscript=/etc/ucarp/ --downscript=/etc/ucarp/ -P \
     -z -k 10 --daemonize
   up ucarp -i eth0 -s -v 202 -p secretPassword -a \
     --upscript=/etc/ucarp/ --downscript=/etc/ucarp/ -P \
     -z -k 0 --daemonize
   down pkill ucarp

The interfaces file is essentially the same on the second server, but the values of the -k arguments (the advertisement skew, which determines priority) are swapped. If you were running ucarp on multiple interfaces, you probably wouldn't want to kill all ucarp processes when bringing one interface down; you might want to use start-stop-daemon with --make-pidfile and --background instead of ucarp's --daemonize option.
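A minimal sketch of that start-stop-daemon approach follows, assuming one pidfile per virtual host ID. The function names, the pidfile naming scheme, and the /usr/sbin/ucarp path are my assumptions rather than part of the setup described here; the remaining ucarp arguments are passed through unchanged, without --daemonize, since start-stop-daemon does the backgrounding.

```shell
#!/bin/sh
# Hypothetical helpers: run one ucarp process per virtual host ID (vhid),
# each with its own pidfile, so one can be stopped without the others.

UCARP=/usr/sbin/ucarp   # assumed location of the ucarp binary

# Print the pidfile path for a vhid (the naming scheme is an assumption).
ucarp_pidfile() {
    printf '/var/run/ucarp-%s.pid\n' "$1"
}

# Start ucarp for one vhid; all further arguments go to ucarp as given.
ucarp_start() {
    vhid=$1
    shift
    start-stop-daemon --start --background --make-pidfile \
        --pidfile "$(ucarp_pidfile "$vhid")" \
        --exec "$UCARP" -- -v "$vhid" "$@"
}

# Stop only the ucarp process recorded for this vhid, then remove its pidfile.
ucarp_stop() {
    vhid=$1
    start-stop-daemon --stop --pidfile "$(ucarp_pidfile "$vhid")" \
        --exec "$UCARP" && rm -f "$(ucarp_pidfile "$vhid")"
}
```

The down option for an interface would then stop only the vhids bound to that interface, rather than running pkill ucarp.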

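The --upscript and --downscript arguments tell ucarp what scripts to run when taking over or releasing an IP address, respectively. Here's an example of each, the up script first and then the down script:

#! /bin/sh
# Discard error output.
exec 2> /dev/null

# Add the address to the interface that ucarp passes as $1.
/sbin/ip addr add dev "$1"
# Keep a background arping sending unsolicited ARPs while we are master.
start-stop-daemon --start --pidfile /var/run/ucarp-arping. \
  --make-pidfile --background --exec /usr/sbin/arping -- -q -U

#! /bin/sh
# Discard error output.
exec 2> /dev/null

# Remove the address from the interface that ucarp passes as $1.
/sbin/ip addr del dev "$1"
# Stop the background arping and clean up its pidfile.
start-stop-daemon --stop --pidfile /var/run/ucarp-arping. \
  --exec /usr/sbin/arping
rm /var/run/ucarp-arping.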

In theory, it should only be necessary to send a single gratuitous ARP (or maybe a couple). I had a problem when using vrrpd, though, in which the backup host would briefly become the master, the ARP table on the router would get updated with the MAC address of the new master, and then that host would go back to being the backup. The other host thought it was the master the entire time, and so did not send any ARP updates, leaving the IP address unreachable until the router's ARP table was updated. I don't know if this could occur using CARP, but I prefer to play it safe and have the master continue to send unsolicited ARPs by using start-stop-daemon to spawn a long-running arping process.

In summary, round-robin DNS is used to balance the load across the two servers, and in the event that one of the servers goes down, both IP addresses will be handled by a single server.
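One way to confirm the failover behavior is to check, on each server, which of the two addresses it currently holds. The sketch below is only an illustration: holds_address is a name I made up, the 192.0.2.x addresses are documentation placeholders, and the input is assumed to be the one-line-per-address output of ip -o -4 addr show.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the address given as $1 appears in
# `ip -o -4 addr show` output read from stdin.
holds_address() {
    awk -v addr="$1" '
        # Field 4 is ADDRESS/PREFIX in the one-line output format.
        { split($4, parts, "/"); if (parts[1] == addr) found = 1 }
        END { exit !found }
    '
}

# Example use (placeholder address):
#   ip -o -4 addr show dev eth0 | holds_address 192.0.2.10 && echo "master"
```

After one server is taken down, the surviving server should report holding both addresses.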

