Introduction
Here is a little tutorial on how to configure a standard RHEL cluster. Configuring a RHEL cluster is quite easy, but the documentation is sparse and not well organized. We will configure a 4-node cluster with shared storage and heartbeat traffic on a dedicated NIC (separate from the main data link).
Cluster configuration goals
- Shared storage
- HA-LVM: LVM failover configuration (similar to HP ServiceGuard), which is different from the clustered logical volume manager (clvm)!!
- Bonded main data link (e.g. bond0 -> eth0 + eth1)
- Heartbeat on a separate data link (e.g. eth2)
Cluster installation steps
OS installation
First we performed a full CentOS 5.5 installation using kickstart; we also installed the cluster packages (a minimal kickstart example follows the list):
- cman
- rgmanager
- qdiskd
- ccs_tools
or
- @clustering (kickstart group)
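For reference, the relevant part of a kickstart file could look like the sketch below. The group and package names are the ones listed above; verify them against your repositories before use.
%packages
@base
# cluster packages, either as a group...
@clustering
# ...or listed explicitly:
# cman
# rgmanager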
Networking configuration
We configure 2 different data links:
- Main data link (for applications)
- Heartbeat data link (for cluster communication)
The main data link (bond0) uses Ethernet bonding over 2 physical interfaces (eth0, eth1). This configuration keeps the network available when one of the network paths fails.
Cluster communication (heartbeat) uses a dedicated Ethernet link (eth2), configured on a different network and VLAN.
To obtain such a configuration, create the file /etc/sysconfig/network-scripts/ifcfg-bond0 from scratch and fill it as below:
DEVICE=bond0
IPADDR=<your server main IP address (eg. 10.200.56.41)>
NETMASK=<your server main network mask (eg. 255.255.255.0)>
NETWORK=<your server main network (eg. 10.200.56.0)>
BROADCAST=<your server main network broadcast (eg. 10.200.56.255)>
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
BONDING_OPTS='miimon=100 mode=1'
GATEWAY=<your server main default gateway (eg. 10.200.56.1)>
TYPE=Ethernet
You can customize BONDING_OPTS. Please see the bonding documentation.
Modify /etc/sysconfig/network-scripts/ifcfg-eth{0,1}:
DEVICE=<eth0 or eth1, etc...>
USERCTL=no
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
HWADDR=<your eth MAC address (eg. 00:23:7d:3c:18:40)>
ONBOOT=yes
TYPE=Ethernet
Modify the heartbeat NIC configuration /etc/sysconfig/network-scripts/ifcfg-eth2:
DEVICE=eth2
HWADDR=<your eth MAC address (eg. 00:23:7D:3C:CE:96)>
ONBOOT=yes
BOOTPROTO=none
TYPE=Ethernet
NETMASK=<your server heartbeat network mask (eg. 255.255.255.0)>
IPADDR=<your server heartbeat IP address (eg. 192.168.133.41)>
Note that the heartbeat interface eth2 has no default gateway configured. Normally one is not required, unless this node is outside the other nodes' network and no specific static routes exist.
Add this line to /etc/modprobe.conf:
alias bond0 bonding
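After restarting the network you can verify that the bond came up correctly. This check is not part of the original procedure, just a convenience:
# service network restart
# cat /proc/net/bonding/bond0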
Add to /etc/hosts the information about each cluster node and replicate the file among the nodes:
# These are examples!!!
10.200.56.41   artu.yourdomain.com       artu
192.168.133.41 h-artu.yourdomain.com     h-artu
10.200.56.42   ginevra.yourdomain.com    ginevra
192.168.133.42 h-ginevra.yourdomain.com  h-ginevra
10.200.56.43   morgana.yourdomain.com    morgana
192.168.133.43 h-morgana.yourdomain.com  h-morgana
10.200.56.44   lancelot.yourdomain.com   lancelot
192.168.133.44 h-lancelot.yourdomain.com h-lancelot
Logical Volume Manager configuration
We chose not to use the clustered logical volume manager (clvmd, sometimes called LVMFailover) but to use HA-LVM instead. HA-LVM is totally different from clvmd and is quite similar to HP ServiceGuard behaviour.
HA-LVM features
- No need to run any daemon (unlike clvmd aka LVMFailover)
- Each volume group can be activated exclusively on one node at a time
- Volume group configuration is not replicated automatically among the nodes (you need to run vgscan on the other nodes)
- The implementation does not depend on the cluster status (it can work without the cluster running at all)
HA-LVM howto
Configure /etc/lvm/lvm.conf as below.
Substitute the existing filter with:
filter = [ "a/dev/mpath/.*/", "a/c[0-9]d[0-9]p[0-9]$/", "a/sd*/", "r/.*/" ]
Check locking_type:
locking_type = 1
Substitute the existing volume_list with:
volume_list = [ "vg00", "<quorum disk volume group>", "@<hostname related to heartbeat nic>" ]
Where:
- vg00 is the name of the root volume group (always active)
- <quorum disk volume group> is the name of the quorum disk volume group (always active)
- @<hostname related to heartbeat nic> is a tag. Each volume group can carry one tag at a time. The cluster LVM agent tags a volume group with the hostname present in the cluster configuration in order to activate it, and LVM activates only volume groups carrying such a tag. In this way each tagged volume group can be activated and accessed by only one node at a time (because of the volume_list setting). A manual sketch of this tagging is shown below.
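To make the tagging mechanism concrete, this is roughly what happens when a volume group is activated on a node. It is a manual sketch for illustration only: in normal operation the rgmanager lvm resource agent performs these steps for you, and vg_app01 is a hypothetical volume group name.
# tag the VG with this node's hostname, then activate it
# vgchange --addtag $(hostname) vg_app01
# vgchange -ay vg_app01
# ... and the reverse when the service leaves the node:
# vgchange -an vg_app01
# vgchange --deltag $(hostname) vg_app01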
At the end, remember to regenerate the initrd!
# mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
Storage configuration
Depending on your storage system, you may need to configure multipathing; each node should be able to access the same LUNs.
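If you use device-mapper multipath, a quick way to check that every node sees the same LUNs is to list the multipath devices on each of them (standard multipath tooling, not specific to this setup):
# multipath -ll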
Quorum disk
The quorum disk is a 20MB LUN shared from the storage to all cluster nodes. This disk is used by the cluster as a tie-breaker in case of split-brain events. Each node writes its own state information to the quorum disk. If some nodes experience network problems, the quorum disk ensures that only one group of nodes (the right one) forms the cluster, preventing a split-brain!
Quorum disk creation
First be sure that each node can see the same 20MB LUN. Then, on the first node, create a physical volume:
# pvcreate /dev/mpath1
create a dedicated volume group:
# vgcreate -s 8 vg_qdisk /dev/mpath1
create a logical volume and extend it to the maximum volume group size:
# lvcreate -l <max_vg_pe> -n lv_qdisk vg_qdisk
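To find the value to use for <max_vg_pe> (the number of free physical extents in the volume group), you can query LVM; this is just a generic hint:
# vgdisplay vg_qdisk | grep Free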
Make sure that this volume group is present in volume_list inside /etc/lvm/lvm.conf: it should be activated on all nodes!
On the other nodes perform a:
# vgscan
The quorum disk volume group should appear.
Quorum disk configuration
Now we have to populate the quorum disk with the right information. To do this, type:
# mkqdisk -c /dev/vg_qdisk/lv_qdisk -l <your_cluster_name>
Note that it is not required to use your cluster name as the quorum disk label, but it is recommended.
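You can double-check the result from any node by listing the quorum disks it can see:
# mkqdisk -L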
You also need to create a heuristic script to help qdiskd when acting as a tie-breaker. Create /usr/share/cluster/check_eth_link.sh:
#!/bin/sh
# Network link status checker
ethtool $1 | grep -q "Link detected.*yes"
exit $?
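Remember to make the script executable and to copy it to every node, since the heuristic runs locally on each of them:
# chmod +x /usr/share/cluster/check_eth_link.sh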
Now activate the quorum disk:
# service qdiskd start
# chkconfig qdiskd on
Logging configuration
In order to get useful logging you can choose to send rgmanager output to a dedicated file.
Add these lines to /etc/syslog.conf:
# Red Hat Cluster
local4.*    /var/log/rgmanager
Add /var/log/rgmanager to the logrotate syslog settings in /etc/logrotate.d/syslog:
/var/log/messages /var/log/secure /var/log/maillog /var/log/spooler /var/log/boot.log /var/log/cron /var/log/rgmanager {
    sharedscripts
    postrotate
        /bin/kill -HUP `cat /var/run/syslogd.pid 2> /dev/null` 2> /dev/null || true
        /bin/kill -HUP `cat /var/run/rsyslogd.pid 2> /dev/null` 2> /dev/null || true
    endscript
}
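After changing /etc/syslog.conf, restart syslog so that the new rule takes effect (standard service handling, not specific to the cluster):
# service syslog restart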
Modify this line in /etc/cluster/cluster.conf:
<rm log_facility="local4" log_level="5">
Increment the /etc/cluster/cluster.conf version and update it on all nodes:
# ccs_tool update /etc/cluster/cluster.conf
Cluster configuration
To configure the cluster you can choose between:
- Luci web interface
- Manual xml configuration
Configuring cluster using luci
In order to use the luci web interface you need to activate the ricci service on all nodes and luci on one node only:
(on all nodes)
# chkconfig ricci on
# service ricci start
(on one chosen node only)
# chkconfig luci on
# luci_admin init
# service luci restart
Please note that luci_admin init must be executed only the first time and before starting the luci service, otherwise luci will be unusable.
Now connect to luci: https://node_with_luci.mydomain.com:8084. Here you can create a cluster, add nodes, create services, failover domains, etc.
See Recommended cluster configuration to learn the right settings for the cluster.
Configuring the cluster by editing the XML
You can also configure a cluster manually by editing its main config file /etc/cluster/cluster.conf. To create the config skeleton use:
# ccs_tool create
The newly created config file is not yet usable: you still have to configure the cluster settings, add nodes, create services, failover domains, etc.
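While editing, it can be handy to check that the file is still well-formed XML. xmllint comes with libxml2 and is not part of the cluster suite; this is only a syntax check, not a cluster schema validation:
# xmllint --noout /etc/cluster/cluster.conf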
When the config file is complete, copy it to all nodes and start the cluster in this way:
(on all nodes)
# chkconfig cman on
# chkconfig rgmanager on
# service cman start
# service rgmanager start
See Recommended cluster configuration to learn the right settings for the cluster.
See Useful cluster commands to learn some useful console cluster commands to use.
Recommended cluster configuration
Below is the /etc/cluster/cluster.conf file of a fully configured cluster.
For commenting purposes, the file is split into several consecutive parts:
<?xml version="1.0"?>
<cluster alias="jcaps_prd" config_version="26" name="jcaps_prd">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="h-lancelot.yourdomain.com" nodeid="1" votes="1">
      <fence/>
    </clusternode>
    <clusternode name="h-artu.yourdomain.com" nodeid="2" votes="1">
      <fence/>
    </clusternode>
    <clusternode name="h-morgana.yourdomain.com" nodeid="3" votes="1">
      <fence/>
    </clusternode>
  </clusternodes>
  <cman expected_votes="4"/>
  <fencedevices/>
This is the first part of the XML cluster config file.
- The first line describes the cluster name and the config_version. Each time you modify the XML you must increment config_version by 1 before updating the config on all nodes.
- The fence daemon line is the default one.
- The clusternodes stanza contains the nodes of the cluster. Note that the name property contains the FQDN of the node, and this name determines which interface is used for cluster communication. In this example we do not use the main hostname but the hostname bound to the interface we chose as cluster communication channel.
- Note also that the <fence/> line is required, even though we do not use any fence device here. Due to the nature of HA-LVM, access to the data should be exclusive to one node at a time.
- cman expected_votes is 4: each of the three nodes gives 1 vote and the quorum disk (configured below) contributes the remaining vote.
<rm log_facility="local4" log_level="5">
  <failoverdomains>
    <failoverdomain name="jcaps_prd" nofailback="0" ordered="0" restricted="1">
      <failoverdomainnode name="h-lancelot.yourdomain.com" priority="1"/>
      <failoverdomainnode name="h-artu.yourdomain.com" priority="1"/>
      <failoverdomainnode name="h-morgana.yourdomain.com" priority="1"/>
    </failoverdomain>
  </failoverdomains>
  <resources/>
This section begins the resource manager configuration (<rm ...>).
- The resource manager section can be configured for logging. rgmanager logs to syslog; here we configured the log_facility and the log level. The facility we specified allows us to log to a separate file (see the logging configuration above).
- We also configured a failover domain containing all cluster nodes. We want a service to be able to switch to any cluster node, but you can configure different behaviours here.
<service autostart="1" domain="jcaps_prd" exclusive="0" name="subversion" recovery="relocate">
  <ip address="10.200.56.60" monitor_link="1"/>
  <lvm name="vg_subversion_apps" vg_name="vg_subversion_apps"/>
  <lvm name="vg_subversion_data" vg_name="vg_subversion_data"/>
  <fs device="/dev/vg_subversion_apps/lv_apps" force_fsck="1" force_unmount="1" fsid="61039" fstype="ext3" mountpoint="/apps/subversion" name="svn_apps" self_fence="0">
    <fs device="/dev/vg_subversion_data/lv_repositories" force_fsck="1" force_unmount="1" fsid="3193" fstype="ext3" mountpoint="/apps/subversion/repositories" name="svn_repositories" self_fence="0"/>
  </fs>
  <script file="/my_cluster_scripts/subversion/subversion.sh" name="subversion"/>
</service>
This section contains the services of the cluster (similar to HP ServiceGuard packages).
- We choose the failover domain (in this case our failover domain contains all nodes, so the service can run on any node).
- We add an IP address resource (always use monitor_link!).
- We also use an HA-LVM resource (<lvm ...>). Every VG specified here is tagged with the node name when it is activated, which means it can be activated only on the node where the service is running (and only on that node!). Note: if you do not specify any LV, all the LVs inside the VG will be activated!
- Next there are also <fs ...> tags for mounting filesystem resources. It is recommended to use force_unmount and force_fsck.
- You can also specify a custom script for starting applications/services and so on, as sketched below. Please note that the script must be LSB compliant, i.e. it must handle start|stop|status. Note also that the default cluster behaviour is to run the script with the status argument every 30 seconds; if status does not return 0, the service will be marked as failed (and will probably be restarted/relocated).
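A minimal skeleton for such a script could look like the sketch below. The svnserve command, pidfile path and repository path are only illustrative placeholders for this subversion example; adapt them to your application.
#!/bin/sh
# Minimal LSB-style wrapper used as a cluster <script> resource (illustrative sketch).
# The cluster calls this script with start, stop and status.

PIDFILE=/var/run/svnserve.pid

case "$1" in
  start)
    # start the application and record its pid for later checks
    svnserve -d --pid-file "$PIDFILE" -r /apps/subversion/repositories
    ;;
  stop)
    # stop the application if a pidfile is present
    if [ -f "$PIDFILE" ]; then
      kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
    fi
    ;;
  status)
    # must return 0 only when the application is really running
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")"
    ;;
  *)
    echo "Usage: $0 {start|stop|status}"
    exit 1
    ;;
esac
exit $?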
</rm>
This section closes the resource manager configuration (closes XML tag).
<totem consensus="4800" join="60" token="20000" token_retransmits_before_loss_const="20"/>
This is a crucial part of the cluster configuration: here you specify the failure detection time of the cluster.
- Red Hat recommends that the CMAN membership (token) timeout value be at least two times the qdiskd timeout value. Here the value is 20 seconds.
<quorumd interval="2" label="jcaps_prd_qdisk" min_score="2" tko="5" votes="1">
  <heuristic interval="2" program="/usr/share/cluster/check_eth_link.sh bond0" score="3"/>
</quorumd>
Here we configure the quorum disk to be used by the cluster.
- We chose a quorum timeout value of 10 seconds (quorumd interval * tko = 2 * 5), which is half of the token timeout (20 seconds).
- We also added a heuristic script to check the network health. This helps qdiskd take a decision when a split-brain happens.
</cluster>
This concludes the configuration file, closing the XML tags still open.
Useful cluster commands
- ccs_tool update /etc/cluster/cluster.conf (update cluster.conf among all nodes)
- clustat (see cluster status)
- clusvcadm -e <service> (enable/start a service)
- clusvcadm -d <service> (disable/stop service)
- vgs -o vg_name,vg_size,vg_tags (show all volume groups names, size and tags)
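When you need to move a service to a specific node, clusvcadm also accepts a target member:
- clusvcadm -r <service> -m <node> (relocate a service to the given node)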
Resources
- RedHat Cluster Suite developer wiki: http://sources.redhat.com/cluster/wiki
- RHCS Configuration guide: http://www.redhat.com/docs/manuals/csgfs
- Migrating HP Serviceguard to RedHat Cluster Suite: 4AA1-xxxxENN (May 2009)