Keepalived and high availability: Advanced topics

This article closes out a three-part foundational Keepalived series and covers some advanced high availability concepts.

Posted: April 1, 2020 by Anthony Critelli (Sudoer)

If you read my first article on using Keepalived for managing simple failover in clusters, then you will recall that VRRP uses the concept of a priority when determining which server will be the active master. The server with the highest priority “wins” and will act as the master, holding onto the VIP and servicing requests. Keepalived provides several useful methods to adjust priority based on the state of your system. In this article, you will explore several of these mechanisms, along with Keepalived’s ability to run scripts when a server’s state changes.

I will only be showing the configuration on server1 for these examples. At this point, you are probably comfortable with the configuration needed on server2 if you have been reading the entire series. If not, take a moment to review the first and second articles of this series before continuing.

Network symbols in the diagrams available via VRT Network Equipment Extension, CC BY-SA 3.0.

Keepalived does a great job of triggering a failover when advertisements aren’t received, such as when the active master dies completely or is unreachable for some other reason. However, you will often find that more fine-grained trigger mechanisms are necessary. For example, your application may run its own health checks to determine the ability of the app to service client requests. You wouldn’t want an unhealthy app server to remain the active master just because it was alive and sending VRRP advertisements.

Note: I found that the version of Keepalived available via the standard package repositories contained bugs that prevented some of the examples below from working properly. If you run into issues, you may want to install Keepalived from source, as described in the previous article.

Tracking processes

One of the most common Keepalived setups involves tracking a process on the server to determine the health of the host. For example, you might set up a pair of highly available webservers and trigger a failover if Apache stops running on one of them.

Keepalived makes this easy through its track_process configuration directives. In the example below, I’ve set up Keepalived to watch the httpd process with a weight of 10. As long as httpd is running, the advertised priority will be 254 (244 + 10 = 254). If httpd stops running, then the priority will drop to 244 and trigger a failover (assuming that a similar configuration exists on server2).

server1# cat keepalived.conf
vrrp_track_process track_apache {
      process httpd
      weight 10
}

vrrp_instance VI_1 {
      state MASTER
      interface eth0
      virtual_router_id 51
      priority 244
      advert_int 1
      authentication {
         auth_type PASS
         auth_pass 12345
      }
      virtual_ipaddress {
         192.168.122.200/24
      }
      track_process {
         track_apache
      }
}

With this configuration in place (and Apache installed and running on both servers), you can test out a failover scenario by stopping Apache and watching the VIP move from server1 to server2:

server1# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0             UP             192.168.122.101/24 192.168.122.200/24 fe80::5054:ff:fe82:d66e/64

server1# systemctl stop httpd

server1# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0             UP             192.168.122.101/24 fe80::5054:ff:fe82:d66e/64

server2# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0             UP             192.168.122.102/24 192.168.122.200/24 fe80::5054:ff:fe04:2c5d/64
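
If the priority arithmetic is hard to keep straight, it can help to write it down. The snippet below is only a sketch of the calculation described above, not how Keepalived implements it internally. The effective_priority function name is hypothetical, and the sketch assumes a positive weight that is added only while the tracked process is running.

```shell
#!/bin/bash
# Illustrative only: mimic the priority math for a tracked process with a
# positive weight. effective_priority is a hypothetical helper name.
# Arguments: base priority, weight, running flag (1 = running, 0 = stopped).
effective_priority() {
    local base="$1" weight="$2" running="$3"
    if [ "$running" -eq 1 ]; then
        # The weight is added while the tracked process is running.
        echo $(( base + weight ))
    else
        # Without the process, only the base priority is advertised.
        echo "$base"
    fi
}

# Values from the example configuration: priority 244, weight 10.
effective_priority 244 10 1   # httpd running
effective_priority 244 10 0   # httpd stopped
```

With the values from the configuration above, this prints 254 and then 244. For the drop to 244 to cause a failover, server2 has to advertise something higher while its own httpd is running. As an illustration, a hypothetical base priority of 243 on server2 with the same weight would give 253.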

Tracking files

Keepalived also has the ability to make priority decisions based on the contents of a file, which can be useful if you’re running an application that can write values to this file. For example, you might have a background process in your app that periodically performs a health check and writes a value to a file based on the overall health of the application.

The Keepalived man page explains that file tracking is based on the configured weight for the file:

“value will be read as a number in text from the file. If the weight configured against the track_file is 0, a non-zero value in the file will be treated as a failure status, and a zero value will be treated as an OK status, otherwise the value will be multiplied by the weight configured in the track_file statement. If the result is less than -253 any VRRP instance or sync group monitoring the script will transition to the fault state (the weight can be 254 to allow for a negative value being read from the file).”

I will keep things simple and use a weight of 1 for the track file in this example. This configuration will take the numerical value in the file at /var/run/my_app/vrrp_track_file and multiply it by 1.

server1# cat keepalived.conf
vrrp_track_file track_app_file {
      file /var/run/my_app/vrrp_track_file
}

vrrp_instance VI_1 {
      state MASTER
      interface eth0
      virtual_router_id 51
      priority 244
      advert_int 1
      authentication {
         auth_type PASS
         auth_pass 12345
      }
      virtual_ipaddress {
         192.168.122.200/24
      }
      track_file {
         track_app_file weight 1
      }
}

You can now create the file with a starting value and restart Keepalived. The priority can be seen in tcpdump output, as discussed in the second article of this series.

server1# mkdir /var/run/my_app
server1# echo 5 > /var/run/my_app/vrrp_track_file
server1# systemctl restart keepalived
server1# tcpdump proto 112
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:19:32.191562 IP server1 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 249, authtype simple, intvl 1s, length 20

You can see that the advertised priority is 249, which is the value in the file (5) multiplied by the weight (1) and added to the base priority (244). Similarly, changing the value in the file to 6 raises the advertised priority to 250:

server1# echo 6 > /var/run/my_app/vrrp_track_file
server1# tcpdump proto 112
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:20:43.214940 IP server1 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 250, authtype simple, intvl 1s, length 20
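
In a real application, the background health check would update this file on its own. The sketch below shows one way that might look: it writes the new value to a temporary file and renames it into place, to reduce the chance that a reader sees a half-written value. Both function names (write_track_value and tracked_priority) are hypothetical, and the second one only reproduces the arithmetic from this section for illustration.

```shell
#!/bin/bash
# Hypothetical helper: write a health value to a track file by way of a
# temporary file in the same directory, then rename it into place.
write_track_value() {
    local file="$1" value="$2" tmp
    # mktemp creates the file with mode 0600, so the reader must be the
    # same user or root.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if printf '%s\n' "$value" > "$tmp" && mv -f "$tmp" "$file"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}

# Illustration of the arithmetic: base priority + (file value * weight).
tracked_priority() {
    local base="$1" weight="$2" file="$3" value
    value=$(cat "$file") || return 1
    echo $(( base + value * weight ))
}
```

For example, after write_track_value /var/run/my_app/vrrp_track_file 6, calling tracked_priority 244 1 /var/run/my_app/vrrp_track_file prints 250, which matches the tcpdump output above.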

Tracking interfaces

For servers with multiple interfaces, it can be useful to adjust the priority of the Keepalived instance based on the status of an interface. For example, a load balancer with a frontend VIP and a backend connection to an internal network might want to trigger a Keepalived failover if the connection to the backend network goes down. This can be accomplished with the track_interface configuration:

server1# cat keepalived.conf
vrrp_instance VI_1 {
      state MASTER
      interface eth0
      virtual_router_id 51
      priority 244
      advert_int 1
      authentication {
         auth_type PASS
         auth_pass 12345
      }
      virtual_ipaddress {
         192.168.122.200/24
      }
      track_interface {
         ens9 weight 5
      }
}

The configuration above assigns a weight of 5 to the status of interface ens9. This will cause server1 to assume a priority of 249 (244 + 5 = 249) as long as ens9 is up. If ens9 goes down, then the priority will drop down to 244 (and trigger a failover, assuming that server2 is configured in the same way). You can test this on a multi-interface server by turning down an interface and watching the VIP move between hosts:

server1# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0             UP             192.168.122.101/24 192.168.122.200/24 fe80::5054:ff:fe82:d66e/64
ens9             UP             192.168.122.15/24 fe80::7444:5ec4:8015:722f/64

server1# ip link set ens9 down

server1# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0             UP             192.168.122.101/24 fe80::5054:ff:fe82:d66e/64
ens9             DOWN

server2# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128
ens9             UP             192.168.122.119/24 fe80::fc9f:8999:b93e:d491/64
eth0             UP             192.168.122.102/24 192.168.122.200/24 fe80::5054:ff:fe04:2c5d/64
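
Rather than reading the ip -br a output by eye on each host, you can script the check. The helper below is a small sketch (holds_vip is a hypothetical name) that reads ip -br a style output on standard input and succeeds only if one of the address fields exactly matches the VIP. It assumes the brief output format shown above, where addresses start at the third column.

```shell
#!/bin/bash
# Hypothetical helper: exit 0 if the given address (with prefix length)
# appears as an address field in `ip -br a` output read from stdin.
holds_vip() {
    awk -v vip="$1" '
        # Fields 1 and 2 are the interface name and state; the rest are addresses.
        { for (i = 3; i <= NF; i++) if ($i == vip) found = 1 }
        END { exit !found }
    '
}
```

On a live host, this could be used as ip -br a | holds_vip 192.168.122.200/24 && echo "VIP is here".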

Tracking scripts

You’ve seen that Keepalived offers plenty of useful built-in check methods for determining the health and subsequent VRRP priority of a host. However, sometimes more complex environments require the use of custom tooling, such as health check scripts, to meet their needs. Thankfully, Keepalived also has the ability to run an arbitrary script to determine the health of a host. You can adjust the weight of the script, but I’m going to keep things simple for this example: a script that returns 0 will indicate success, while a script that returns anything else will indicate that the Keepalived instance should enter the fault state.

The script is a simple ping to everyone’s favorite 8.8.8.8 Google DNS server, as seen below. In your environment, you will likely use a more complex script to perform whatever health checks you need.

server1# cat /usr/local/bin/keepalived_check.sh
#!/bin/bash

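# The exit status of ping becomes the exit status of the script:
# 0 if 8.8.8.8 replied within 1 second, non-zero otherwise.
/usr/bin/ping -c 1 -W 1 8.8.8.8 > /dev/null 2>&1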

You will notice that I used a timeout of 1 second for ping (-W 1). When writing Keepalived check scripts, it’s a good idea to keep them lightweight and fast. You don’t want a broken server staying the master for a long time because your script is slow.
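
One way to keep a more complex check script fast is to put a hard time limit on every command it runs. The sketch below wraps each check in timeout from GNU coreutils. The run_check name is hypothetical, and the sketch assumes that timeout is installed on the host.

```shell
#!/bin/bash
# Hypothetical helper: run one check command with a hard time limit.
# Usage: run_check SECONDS COMMAND [ARGS...]
run_check() {
    local limit="$1"
    shift
    # timeout kills the command once the limit is reached and returns a
    # non-zero status (124 with GNU coreutils), so a hung check counts
    # as a failure instead of stalling the script.
    timeout "$limit" "$@" > /dev/null 2>&1
}
```

A check script built on this helper would call it once per check and exit non-zero on the first failure, for example: run_check 1 /usr/bin/ping -c 1 8.8.8.8 || exit 1.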

The Keepalived configuration for a check script is shown below:

server1# cat keepalived.conf
vrrp_script keepalived_check {
      script "/usr/local/bin/keepalived_check.sh"
      interval 1
      timeout 5
      rise 3
      fall 3
}

vrrp_instance VI_1 {
      state MASTER
      interface eth0
      virtual_router_id 51
      priority 244
      advert_int 1
      authentication {
         auth_type PASS
         auth_pass 12345
      }
      virtual_ipaddress {
         192.168.122.200/24
      }
      track_script {
         keepalived_check
      }
}

This looks a lot like the configuration that you’ve been working with, but the vrrp_script block has a few unique directives:

interval: How often the script should be run (1 second).
timeout: How long to wait for the script to return (5 seconds).
rise: How many times the script must return successfully in order for the host to be considered “healthy.” In this example, the script must return successfully 3 times. This helps to prevent a “flapping” condition where a single failure (or success) causes the Keepalived state to quickly flip back and forth.
fall: How many times the script must return unsuccessfully (or time out) in order for the host to be considered “unhealthy.” This functions as the reverse of the rise directive.
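
To see how rise and fall dampen flapping, it can help to simulate them. The function below is an illustration only, not Keepalived's code: simulate_rise_fall is a hypothetical name, it assumes that the host starts out healthy, and it assumes that consecutive results are what count.

```shell
#!/bin/bash
# Illustrative simulation of rise/fall behavior.
# Arguments: rise, fall, then a sequence of check script exit codes.
simulate_rise_fall() {
    local rise="$1" fall="$2" state="healthy" ok=0 bad=0 rc
    shift 2
    for rc in "$@"; do
        if [ "$rc" -eq 0 ]; then
            ok=$(( ok + 1 )); bad=0
            # Only a run of $rise consecutive successes restores health.
            if [ "$state" = "unhealthy" ] && [ "$ok" -ge "$rise" ]; then
                state="healthy"
            fi
        else
            bad=$(( bad + 1 )); ok=0
            # Only a run of $fall consecutive failures marks the host unhealthy.
            if [ "$state" = "healthy" ] && [ "$bad" -ge "$fall" ]; then
                state="unhealthy"
            fi
        fi
        echo "exit=$rc state=$state"
    done
}

# rise 3, fall 3, followed by ten simulated check results.
simulate_rise_fall 3 3 0 1 1 0 1 1 1 0 0 0
```

In this run, the first two failures are interrupted by a success and do not change the state. The later run of three failures marks the host unhealthy, and it takes three successes in a row to bring it back.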

You can test this configuration by forcing the script to fail. In the example below, I added an iptables rule that blocks traffic to 8.8.8.8. The health check started failing, and the VIP disappeared a few seconds later, once the check had failed three times (the fall value). I then removed the rule, and the VIP reappeared once the check had succeeded three times (the rise value).

server1# iptables -I OUTPUT -d 8.8.8.8 -j DROP
server1# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0             UP             192.168.122.101/24 fe80::5054:ff:fe82:d66e/64

server1# iptables -D OUTPUT -d 8.8.8.8 -j DROP
server1# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0             UP             192.168.122.101/24 192.168.122.200/24 fe80::5054:ff:fe82:d66e/64

A quick tip about scripts in Keepalived: They can be run as a user other than root. While I didn’t demonstrate that in these examples, take a look at the man page and ensure that you’re using the least privileged user possible to avoid any negative security implications from your check script.

Notify scripts

I’ve been discussing ways to trigger Keepalived responses based on external conditions. However, you probably also want to trigger actions when Keepalived transitions from one state to another. For example, you might want to stop a service when Keepalived enters the backup state, or you might want to kick off an email to an administrator. Keepalived allows you to do this with notify scripts.

Keepalived provides several notify directives that call scripts only on particular states (notify_master, notify_backup, etc.), but I’m going to focus on the bare notify directive, as it is the most flexible. When a script in the notify directive is called, it receives four additional arguments (after any arguments that are passed to the script itself).

Listed in order, these are:

Group or instance: Indication of whether the notify is triggered by a VRRP group (not discussed in this series) or a particular VRRP instance.
Name of the group or instance
State that the group or instance is transitioning into
The priority

Taking a look at an example makes this clearer. The script and Keepalived configuration look like this:

server1# cat /usr/local/bin/keepalived_notify.sh
#!/bin/bash

# Overwrite the status file so that it always holds the most recent transition.
echo "$1 $2 has transitioned to the $3 state with a priority of $4" > /var/run/keepalived_status

server1# cat keepalived.conf
vrrp_script keepalived_check {
      script "/usr/local/bin/keepalived_check.sh"
      interval 1
      timeout 5
      rise 3
      fall 3
}

vrrp_instance VI_1 {
      state MASTER
      interface eth0
      virtual_router_id 51
      priority 244
      advert_int 1
      authentication {
         auth_type PASS
         auth_pass 12345
      }
      virtual_ipaddress {
         192.168.122.200/24
      }
      track_script {
         keepalived_check
      }
      notify "/usr/local/bin/keepalived_notify.sh"
}

The above configuration will call the /usr/local/bin/keepalived_notify.sh script each time a Keepalived state transition occurs. Since the same check script is in place, you can easily inspect the initial state and then trigger a transition:

server1# cat /var/run/keepalived_status
INSTANCE VI_1 has transitioned to the MASTER state with a priority of 244

server1# iptables -A OUTPUT -d 8.8.8.8 -j DROP
server1# cat /var/run/keepalived_status
INSTANCE VI_1 has transitioned to the FAULT state with a priority of 244

server1# iptables -D OUTPUT -d 8.8.8.8 -j DROP
server1# cat /var/run/keepalived_status
INSTANCE VI_1 has transitioned to the MASTER state with a priority of 244

You can see that the command line arguments correspond to the ones that I described at the beginning of this section. Obviously this is a simple example, but notify scripts can perform plenty of complex actions, such as adjusting routing rules or triggering other scripts. They’re a useful way to take external actions based on Keepalived state changes.
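
A notify script that does more than log usually branches on the state argument. The sketch below shows one possible shape. The handle_notify name is hypothetical, the actions are placeholders that only print a message, and it assumes that the script is configured without any arguments of its own, so that the four Keepalived arguments arrive as $1 through $4.

```shell
#!/bin/bash
# Hypothetical notify handler: pick an action based on the new state.
# Arguments (in the order described above): type, name, state, priority.
handle_notify() {
    local kind="$1" name="$2" state="$3" priority="$4" action
    case "$state" in
        MASTER) action="start serving" ;;
        BACKUP) action="stand by" ;;
        FAULT)  action="alert an administrator" ;;
        *)      action="ignore" ;;
    esac
    # Placeholder: a real script would run commands here instead of printing.
    echo "$kind $name -> $state (priority $priority): $action"
}

# Only act when all four arguments were passed.
if [ "$#" -ge 4 ]; then
    handle_notify "$@"
fi
```

Keeping the decision in a single case statement makes it easy to see at a glance what happens on each transition, and the default branch keeps unexpected states from triggering an action by accident.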

Wrapping up

This article closed out a foundational Keepalived series with some advanced concepts. You learned how to trigger Keepalived priority and state changes based on external events, such as process status, interface changes, and even the results of external scripts. You also learned how to trigger notify scripts in response to Keepalived state changes. You can combine two or more of these approaches to build a highly available pair of Linux servers that respond to multiple external stimuli and ensure that traffic always reaches a healthy IP address that can serve client requests.

