LVS Ubuntu

Intro

This page describes how I set up LVS load balancing using IPVS in LVS-DR mode. I wanted to test the viability of LVS to load balance VMware's VDM broker. I decided to set up a single director to start with, fronting two tiny HTTP servers serving simple pages. Here's the diagram:

The director uses IPVS (part of the 2.6 Linux kernel) to load balance. It's enabled by default in the Ubuntu Server 8.04 kernel, so I chose that as the OS. The two WWW servers could run anything; I chose the Hercules virtual appliance, which runs a custom tiny Linux OS with kernel 2.6.17-7 and thttpd. You can download it from http://istanbul.sourceforge.net. Don't be confused by it having ESX in the name; you can run it with VMware Player (free) or VMware Workstation (not free). You should be able to do the same with Windows machines too. The VIP is the IP of the service: it is the address the clients use, and they connect to it on the director, where it is configured as an alias interface. The VIP is also an alias interface on the WWW servers, so that they accept packets sent to that address. If you use different addresses, make sure they are all on the same segment and subnet.

IP choice

Choose the IP for your service - the virtual IP (VIP). Mine was 10.15.16.113.

WWW setup

  1. I downloaded Hercules (http://prdownloads.sourceforge.net/istanbul/Hercules-1.3-esx3.zip?download)
  2. I unzipped it
  3. I opened it with VMware Workstation (you could use VMware Player)
  4. I powered it on
  5. Watch it boot and get the IP
  6. Check with your web browser that you can get to it at http://<IP>
  7. Clear your web browser's cache (the no-cache commands in the HTML seem to be ignored by Firefox)
  8. SSH to it as root with password root
  9. Replace /chroot/htdocs/index.html with the HTML below
  10. Refresh the page and check you get one line of text
  11. Place the script below as /etc/init.d/S91lvs
  12. Make it executable with chmod +x /etc/init.d/S91lvs
  13. Shutdown the VM and clone it to get the second one
  14. Power them both on

/chroot/htdocs/index.html:

<html>
<head>
<meta http-equiv="Pragma" content="no-cache">
<!-- Pragma content set to no-cache tells the browser not to cache the page
This may or may not work in IE -->

<meta http-equiv="expires" content="0">
<!-- Setting expires to 0 means the page expires immediately.
Any value less than one sets the page to expire at some time in the past
and not be cached. This may not work with Navigator -->
<title>Fake WWW server 1</title>
</head>
<body>
This is fake WWW server 1
</body>
</html>

/etc/init.d/S91lvs: (not too pretty, could be tidied up a lot)

#!/bin/sh

# VIP installation for LVS-DR
# Ian Gibbs flash666 at yahoo dot com

IPEND=113
VIP=10.15.16.$IPEND
NETWORK_INIT_SCRIPT="/etc/init.d/S40network"

service=LVS

case "$1" in

    start)
        echo -n "Starting $service: "
        # Down the network
        $NETWORK_INIT_SCRIPT stop
        echo ""

        # Alter ARP behaviour so this host never answers ARP for the VIP
        echo "Modifying kernel ARP params..."
        echo 1 > /proc/sys/net/ipv4/conf/eth0/arp_ignore
        echo 2 > /proc/sys/net/ipv4/conf/eth0/arp_announce
        echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
        echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce

        # Bring the interface back up (doing it this way prevents ARP broadcasts you don't want)
        $NETWORK_INIT_SCRIPT start

        # Install the VIP on a loopback alias
        /sbin/ifconfig lo:$IPEND $VIP broadcast $VIP netmask 255.255.255.255 up
        echo "Added VIP locally:"
        /sbin/ifconfig lo:$IPEND

        # Install a host route for VIP $VIP on device lo:$IPEND
        /sbin/route add -host $VIP dev lo:$IPEND
        echo "Modified routing table:"
        /bin/netstat -rn
        ;;

    stop)
        echo -n "Stopping $service: "
        /sbin/ifconfig lo:$IPEND down
        ;;

    *)
        echo "Usage: $0 {start|stop}"
        ;;
esac
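
A quick way to confirm the script took effect is to read the sysctls back out of /proc. This is a hedged sketch of such a check: arp_ignore=1 means the host answers ARP only for addresses configured on the interface the request arrived on (so the VIP on the lo alias is never advertised), and arp_announce=2 makes outgoing ARP use the best local source address. Interfaces that don't exist on a given host are skipped.

```shell
#!/bin/sh
# Read back the ARP sysctls that S91lvs sets. A missing path just means
# that interface doesn't exist on this host.
for p in all/arp_ignore all/arp_announce eth0/arp_ignore eth0/arp_announce; do
    f=/proc/sys/net/ipv4/conf/$p
    if [ -r "$f" ]; then
        printf '%s = %s\n' "$p" "$(cat "$f")"
    else
        printf '%s: interface not present\n' "$p"
    fi
done
```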

I didn't bother statically defining the IPs on these two VMs; I just let them boot and used their DHCP addresses in the director setup below. That seemed to be fine. Once you have both powered on, check that you can SSH to them and that you can retrieve their web pages. At this point the pages will be identical; modify the second server's index.html to say "Fake WWW server 2" so you can tell them apart.
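
Before pointing the director at them, it's worth scripting that sanity check. Here's a hedged sketch with the HTTP fetch stubbed out using the expected page bodies; on the live network you would fetch with something like wget -qO- instead:

```shell
#!/bin/sh
# Check that the two real servers return distinguishable pages.
RIP1=10.15.16.39
RIP2=10.15.16.43
fetch() {   # stub; real version: wget -qO- "http://$1/"
    case "$1" in
        "$RIP1") echo "This is fake WWW server 1" ;;
        "$RIP2") echo "This is fake WWW server 2" ;;
    esac
}
page1=$(fetch "$RIP1")
page2=$(fetch "$RIP2")
if [ "$page1" != "$page2" ]; then
    echo "servers are distinguishable"
else
    echo "pages identical: edit index.html on the second server"
fi
```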

Director setup

I did this in a VM with 256Mb RAM.

  1. Install Ubuntu Server 8.04 LTS, choosing to install only the OpenSSH server
  2. apt-get update
  3. apt-get upgrade
  4. Reboot to move to the latest kernel
  5. apt-get upgrade. If any packages are "kept back", you will need to apt-get install them individually to upgrade them.
  6. apt-get install ipvsadm
  7. Save the following script as /root/lvs-setup.sh and run it as root

/root/lvs-setup.sh:

#!/bin/sh
IPEND=113
VIP=10.15.16.$IPEND
RIP1=10.15.16.39
RIP2=10.15.16.43
 
#director is not gw for realservers: leave icmp redirects on
echo 'setting icmp redirects (1 on, 0 off) '
echo "1" >/proc/sys/net/ipv4/conf/all/send_redirects
echo "1" >/proc/sys/net/ipv4/conf/default/send_redirects
echo "1" >/proc/sys/net/ipv4/conf/eth0/send_redirects

#add ethernet device and routing for VIP $VIP
/sbin/ifconfig eth0:$IPEND $VIP broadcast $VIP netmask 255.255.255.255
/sbin/route add -host $VIP dev eth0:$IPEND
#listing ifconfig info for VIP $VIP
/sbin/ifconfig eth0:$IPEND

#check VIP $VIP is reachable from self (director)
/bin/ping -c 1 $VIP
#listing routing info for VIP $VIP
/bin/netstat -rn

#setup_ipvsadm_table
#clear ipvsadm table
/sbin/ipvsadm -C
#installing LVS services with ipvsadm
#add http to VIP with round robin scheduling
/sbin/ipvsadm -A -t $VIP:http -s rr

#forward http to realserver using direct routing with weight 1
/sbin/ipvsadm -a -t $VIP:http -r $RIP1 -g -w 1
#check realserver reachable from director
ping -c 1 $RIP1

#forward http to realserver using direct routing with weight 1
/sbin/ipvsadm -a -t $VIP:http -r $RIP2 -g -w 1
#check realserver reachable from director
ping -c 1 $RIP2

#displaying ipvsadm settings
/sbin/ipvsadm

That's it. If you now run ipvsadm on the director you should get something similar to this:

root@director1:~# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.15.16.113:www rr
  -> 10.15.16.43:www              Route   1      0          0
  -> 10.15.16.39:www              Route   1      0          0

You can now use your browser to go to http://<VIP> and you should get one of the two web servers. Refreshing may or may not get you a different one; you might need to clear your browser's cache each time. Mine had a predilection for number 2. To test it properly, I installed the links command-line browser (similar to lynx) on my Kubuntu 8.04 machine:

 apt-get install links

This allows you to retrieve the page without caching it and print out the contents in a nicely formatted way:

root@igibbs-girflet:~# links -dump http://10.15.16.113
  This is fake WWW server 2
root@igibbs-girflet:~# links -dump http://10.15.16.113
  This is fake WWW server 1
root@igibbs-girflet:~# links -dump http://10.15.16.113
  This is fake WWW server 2
root@igibbs-girflet:~# links -dump http://10.15.16.113
  This is fake WWW server 1

As you can see every time I retrieved the page I got sent to a different server. Looking good! Back on the director, ipvsadm confirms two connections have been sent to each:

root@director1:~# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.15.16.113:www rr
  -> 10.15.16.43:www              Route   1      0          2
  -> 10.15.16.39:www              Route   1      0          2

They are listed as inactive because the TCP connections have been closed. Now I went nuts:

root@igibbs-girflet:~# for i in `seq 1 20`; do echo -n "$i: "; links -dump http://10.15.16.113; done
1:  This is fake WWW server 1
2:  This is fake WWW server 2
3:  This is fake WWW server 1
4:  This is fake WWW server 2
5:  This is fake WWW server 1
6:  This is fake WWW server 2
7:  This is fake WWW server 1
8:  This is fake WWW server 2
9:  This is fake WWW server 1
10:  This is fake WWW server 2
11:  This is fake WWW server 1
12:  This is fake WWW server 2
13:  This is fake WWW server 1
14:  This is fake WWW server 2
15:  This is fake WWW server 1
16:  This is fake WWW server 2
17:  This is fake WWW server 1
18:  This is fake WWW server 2
19:  This is fake WWW server 1
20:  This is fake WWW server 2

This command retrieves the page twenty times without a pause, synchronously. Normally I put a pause in these sorts of things, but I wanted to hammer it. The twenty iterations went by within a second. Pretty fast. I upped it to 1000 iterations and got a stopwatch out: it took under 5s. At 200 requests per second (and that's limited by the time taken to return the web page, not even the maximum it could cope with), I decided this was going to be plenty fast enough for me.
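
If you want the stopwatch step to be repeatable, something like this sketch will do it. The fetch is stubbed with true so the harness itself can be checked offline; the commented links call is the real thing:

```shell
#!/bin/sh
# Time a batch of N page fetches against the VIP and report the rate.
VIP=10.15.16.113
N=1000
fetch() { true; }   # real test: fetch() { links -dump "http://$VIP" >/dev/null; }
start=$(date +%s)
i=1
while [ "$i" -le "$N" ]; do
    fetch
    i=$((i + 1))
done
elapsed=$(( $(date +%s) - start ))
[ "$elapsed" -gt 0 ] || elapsed=1   # guard against sub-second runs
echo "$N requests in ${elapsed}s, about $((N / elapsed)) req/s"
```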

Adding failover

Load-balancing is all well and good, but we need the director to not send traffic to a dead server. IPVS doesn't do this on its own; you need to add a program to do your checking, and then have it "tell" IPVS to stop using that real server. I did this with ldirectord. On the director:

  1. apt-get install ldirectord
  2. ldirectord does the ipvs setup for us, so I modified /root/lvs-setup.sh as below.
  3. Create /etc/ha.d/ldirectord.cf as below.
  4. Reboot
  5. Run /root/lvs-setup.sh
  6. Rerun the 20-iteration test. It should work as before.

/root/lvs-setup.sh:

#!/bin/sh
IPEND=113
VIP=10.15.16.$IPEND
IFACE=eth0

#director is not gw for realservers: leave icmp redirects on
echo 'setting icmp redirects (1 on, 0 off) '
echo "1" >/proc/sys/net/ipv4/conf/all/send_redirects
echo "1" >/proc/sys/net/ipv4/conf/default/send_redirects
echo "1" >/proc/sys/net/ipv4/conf/$IFACE/send_redirects

#add ethernet device and routing for VIP $VIP
/sbin/ifconfig $IFACE:$IPEND $VIP broadcast $VIP netmask 255.255.255.255
/sbin/route add -host $VIP dev $IFACE:$IPEND
#listing ifconfig info for VIP $VIP
/sbin/ifconfig $IFACE:$IPEND

#check VIP $VIP is reachable from self (director)
/bin/ping -c 1 $VIP
#listing routing info for VIP $VIP
/bin/netstat -rn

/etc/ha.d/ldirectord.cf:

checktimeout=3
checkinterval=5
autoreload=yes
logfile="/var/log/ldirectord.log"
quiescent=yes
virtual=10.15.16.113:80
#    fallback=127.0.0.1:80
    real=10.15.16.39:80 gate
    real=10.15.16.43:80 gate
    service=http
    request="index.html"
    receive="fake"
    scheduler=wlc
    protocol=tcp
    checktype=negotiate
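
To make the checktype=negotiate settings above concrete, this is roughly what the check amounts to: fetch the request URL from each real server and look for the receive string in the body. A hedged sketch only, with the HTTP GET stubbed out (ldirectord's real implementation is a Perl script):

```shell
#!/bin/sh
# Sketch of a "negotiate" health check: GET the request URL, grep for the
# receive string. The fetch is stubbed with a canned page; on the director
# it would be something like: wget -qO- "http://10.15.16.39/$request"
request="index.html"
receive="fake"
fetch() { echo "This is fake WWW server 1"; }   # stub for the HTTP GET
if fetch "$request" | grep -q "$receive"; then
    echo "check passed: real server keeps its weight"
else
    echo "check failed: ldirectord quiesces it (weight 0)"
fi
```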

Now your HA is working. Start by looking at the current status:

root@director1:~# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.15.16.113:www wlc
  -> 10.15.16.43:www              Route   1      0          10
  -> 10.15.16.39:www              Route   1      0          10

Note that both web servers have a weight of 1. Now disconnect the NIC of WWW1. Within 5s, you get this:

root@director1:~# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.15.16.113:www wlc
  -> 10.15.16.43:www              Route   1      0          10
  -> 10.15.16.39:www              Route   0      0          10

The weight of WWW1 has changed to 0. Re-run your 20-iteration test and you'll see all the connections go to WWW2. Reconnect the NIC and re-run the test: at first all the connections will go to WWW1, and on the second run they will be shared between 1 and 2 equally. This is because we have switched the LVS scheduler from round robin (rr) to weighted least-connection (wlc); you can see it in the ipvsadm output. IPVS will load up the newly recovered WWW1 until it is as loaded as WWW2, and then carry on sharing equally. Note that ldirectord is not just checking that it can ping the servers; it is making an HTTP request and looking for the word 'fake' in the response. If you change /chroot/htdocs/index.html on one of the WWWs to remove the word 'fake', you'll see it gets weighted to 0 as well. Brilliant!
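
The weight column is also easy to watch programmatically. Here's a hedged sketch that pulls each real server's weight out of ipvsadm-style output and flags quiesced servers; a captured sample stands in for the live ipvsadm command:

```shell
#!/bin/sh
# Flag real servers that ldirectord has quiesced (weight 0).
# Replace the sample with: /sbin/ipvsadm
sample='TCP  10.15.16.113:www wlc
  -> 10.15.16.43:www              Route   1      0          10
  -> 10.15.16.39:www              Route   0      0          10'
echo "$sample" | awk '/->/ { gsub(/:.*/, "", $2); print $2, ($4 == 0 ? "DOWN" : "up") }'
```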

Testing with VDM

I now wanted to reproduce my test against real VDM servers. I installed VMware View Manager (the new name for VDM) 3.0.0 build 127642 on two Windows Server 2003 32-bit machines, and made the second one a replica of the first. I first tested that I could use the two machines directly, by establishing a test desktop, connecting to it, and making sure that the RDP connection stayed up for at least five minutes.

Finally, I needed to add a 'fake' loopback interface to the Windows servers, just as I had done on the Hercules servers above; it makes the machine accept packets addressed to the VIP that it would otherwise ignore. To do this, I followed the instructions at http://support.microsoft.com/kb/842561. Having added that, you should then have a Local Area Connection 2 listed in *Network Connections*.

  1. Rename this to something sensible. I used "Loopback adaptor for View Load Balancing"
  2. Disable it
  3. Give it a static IP of the VIP, and a netmask of 255.255.255.0. No gateway and no DNS servers.
  4. Open the registry at _HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces_. In here are all the interfaces (some you can see in Network Connections and some you can't). Find the one whose IPAddress matches the VIP; that's the loopback adaptor you just added. Set the SubnetMask value in this entry to 255.255.255.255.
  5. Open the properties of the adaptor and check that it now has a netmask of 255.255.255.255.
  6. Enable it

I also made some changes to the director so that clients would always get sent to the same machine. The updated conf files are below.

/root/lvs-setup.sh:

#!/bin/sh
IPEND=4
VIP=10.13.246.$IPEND
IFACE=eth1

#director is not gw for realservers: leave icmp redirects on
echo 'setting icmp redirects on'
echo "1" >/proc/sys/net/ipv4/conf/all/send_redirects
echo "1" >/proc/sys/net/ipv4/conf/default/send_redirects
echo "1" >/proc/sys/net/ipv4/conf/$IFACE/send_redirects
echo 1 > /proc/sys/net/ipv4/vs/expire_quiescent_template

#add ethernet device and routing for VIP $VIP
/sbin/ifconfig $IFACE:$IPEND $VIP broadcast $VIP netmask 255.255.255.255
/sbin/route add -host $VIP dev $IFACE:$IPEND
#listing ifconfig info for VIP $VIP
/sbin/ifconfig $IFACE:$IPEND

#check VIP $VIP is reachable from self (director)
/bin/ping -c 1 $VIP
#listing routing info for VIP $VIP
/bin/netstat -rn

/etc/ha.d/ldirectord.cf:

checktimeout=3
checkinterval=5
autoreload=yes
logfile="/var/log/ldirectord.log"
quiescent=yes
virtual=10.13.246.4:443
#    fallback=127.0.0.1:443
    real=10.13.246.62:443 gate 1000
    real=10.13.246.63:443 gate 1000
    service=https
    request="index.jsp"
    receive="Requirements"
    scheduler=sh
    protocol=tcp
    checktype=negotiate
    quiescent=no

I then tested this setup with two computers, and it seemed to work quite well. However, because clients are distinguished by their source address, I would need a computer for each concurrent user to test the balancing properly. Because we need persistence for VDM (i.e. we need the load balancer to always send a client to the same VDM connection server), we have a choice of using LVS persistence (the ipvsadm -p option, keyed on client IP) or the sh scheduler (a static hash of the client's source IP). Either way we'd need a lot of test computers to try it out. For the moment I chose to accept that as a limitation of my testing; possibly later I could run up lots of VMs to simulate a larger number of clients. Instead I used a JMeter test to simulate two clients with lots of activity.
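
For reference, LVS persistence can be driven from ldirectord itself via the persistent directive, whose timeout in seconds maps to ipvsadm's -p option. This is a hedged sketch only; the 300-second timeout is an illustrative value I did not test:

```
virtual=10.13.246.4:443
    real=10.13.246.62:443 gate 1000
    real=10.13.246.63:443 gate 1000
    service=https
    scheduler=wlc
    persistent=300
    protocol=tcp
```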

Running an automated load test

An automated load test exists that uses JMeter to stress the user-facing XML parts of VDM, the bits that go through the load balancer. I implemented the test as follows. I created two test users in AD, viewtest1 and viewtest2, with the same password, and put them in a group. I created a new non-persistent desktop pool in VDM, tested that it was working, and entitled the newly created group to it. There is a step in the test called get-desktop-connection, which is disabled by default. This is because to use it you need one available VM for every test run that will occur in three minutes; each call to get-desktop-connection marks a VM with a pending session for the next three minutes, effectively rendering it unavailable. If your test rig is configured to repeat the test rapidly, you will quickly have all the VMs marked with pending sessions, and the test will then start to fail with NoServersAvailableExceptions. If you do decide to enable this step, set all the connection servers in VDM to Direct Connect mode and make sure there are lots of VMs in the desktop pool. I made a new Ubuntu Server 8.04 VM with 256Mb RAM. Then I:

  1. apt-get install sun-java6-jre xauth libxtst6 libxi6
  2. mkdir -p /usr/local/jmeter/tests
  3. Downloaded JMeter 2.3.2 (binary) from http://jakarta.apache.org/site/downloads/downloads_jmeter.cgi into /usr/local/jmeter
  4. Extracted the contents of the tarball
  5. Opened /usr/local/jmeter/jakarta-jmeter-2.3.2/bin/jmeter in an editor
  6. Changed the HEAP line to read HEAP="-Xms200m -Xmx200m"
  7. Changed the NEW line similarly with values of 100m
  8. Downloaded the jmx file and placed it in /usr/local/jmeter/tests
  9. SSHd to the VM from another Linux machine with X11 forwarding enabled (ssh -X)
  10. Ran /usr/local/jmeter/jakarta-jmeter-2.3.2/bin/jmeter. The GUI should come up.
  11. File > Open and load the jmx file downloaded earlier.
  12. Go to the Variables section and update USER_PREFIX, PASSWORD, DOMAIN and DESKTOP, then save the test

Removing the director SPOF

All well and good, but now we have introduced a SPOF in the form of the director: if that goes down, all users will be screwed. We need a second director to take over if that happens. So, bring on heartbeat.

-- Ian Gibbs