Methods to deal with a split-brain situation:
1. Redundant heartbeat paths
Network-port communication plus serial-port communication, so a single failed link does not make a healthy peer look dead
2. I/O fencing
The remaining nodes cut the failed node off from shared storage, either by power-cycling it through a power fencing device or by disabling its storage port
3. Quorum disk
A quorum disk is a kind of I/O fencing, but the reboot action is executed by the failed node's own quorum daemon. It also has an additional feature: contributing votes to the cluster. If you want the last standing node to keep a multi-node cluster running, a quorum disk appears to be the only solution.
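As a side note, RHCS can also fence a node on demand, which is useful for verifying that fencing actually works before you rely on it (fence_node is part of the standard cluster tooling; the hostname is from the example later in this article):
#manually fence a node through its configured fence device
fence_node station2.example.com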
RHCS (Red Hat Cluster Suite) Quorum disk facts
– A shared block device (SCSI/iSCSI/FC, etc.); the required device size is approximately 10 MiB
– Supports a maximum of 16 nodes; node IDs must be sequentially ordered
– The quorum disk contributes votes; in a multi-node cluster, the last standing node together with the quorum disk's votes can still keep the cluster running
– single node's votes + 1 <= quorum disk votes < total node votes
– Failure of the shared quorum disk won't result in cluster failure, as long as quorum disk votes < total node votes
– Each node writes its own health information to its own region of the disk; health is determined by external heuristic programs such as ping
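As a quick sanity check of that vote range, here is the arithmetic for the three-node example below, where each node carries 2 votes (plain bash, just a sketch):
#valid quorum disk votes for 3 nodes with 2 votes each
NODE_VOTES=2; NODES=3
echo $((NODE_VOTES + 1))            # lower bound: 3 (must beat any single node)
echo $((NODE_VOTES * NODES - 1))    # upper bound: 5 (must stay below total node votes)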
Set up quorum disk
#initialise the quorum disk once, from any node
mkqdisk -c /dev/sdx -l myqdisk
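To confirm the label took, list the quorum disks from any node that sees the device (the same mkqdisk -L used further down):
#verify the new quorum disk is visible and labelled
mkqdisk -L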
Add quorum disk to cluster
Use luci or system-config-cluster to add the quorum disk; the following is the resulting XML in cluster.conf
<clusternodes>
<clusternode name="station1.example.com" nodeid="1" votes="2">
<fence/>
</clusternode>
<clusternode name="station2.example.com" nodeid="2" votes="2">
<fence/>
</clusternode>
<clusternode name="station3.example.com" nodeid="3" votes="2">
<fence/>
</clusternode>
</clusternodes>
#expected votes = total node votes + quorum disk votes = (2+2+2) + 3 = 9
<cman expected_votes="9"/>
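If expected_votes needs to change on a live cluster, cman_tool can set it at runtime without a restart; run this on one node:
#adjust expected votes on the running cluster
cman_tool expected -e 9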
#Health check results are written to the quorum disk every 2 secs (interval)
#if health checks keep failing past 5 tries (tko), i.e. 10 (2*5) secs, the node is rebooted by its own quorum daemon
#Each heuristic runs every 2 secs and earns 1 point if the program exits with status 0; min_score="2" means both heuristics below must pass
<quorumd interval="2" label="myqdisk" min_score="2" tko="5" votes="3">
<heuristic interval="2" program="ping -c1 -t1 192.168.1.60" score="1"/>
<heuristic interval="2" program="ping -c1 -t1 192.168.1.254" score="1"/>
</quorumd>
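A heuristic need not be ping: any program exiting 0 on success earns its score. A minimal sketch of a custom heuristic script (the path and script are hypothetical, not part of RHCS):
#!/bin/bash
#hypothetical heuristic /usr/local/bin/check-gw.sh:
#exit 0 => node earns the heuristic's score; non-zero => no score
ping -c1 -t1 192.168.1.254 >/dev/null 2>&1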
Start quorum disk daemon
The daemon is also one of the daemons started automatically by cman
service qdiskd start
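To start it at every boot as well (standard RHEL init tooling):
chkconfig qdiskd on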
Check quorum disk information
$ mkqdisk -L -d
mkqdisk v0.6.0
/dev/disk/by-id/scsi-1IET_00010002:
/dev/disk/by-uuid/55fbf858-df75-493b-a764-5640be5a9b46:
/dev/sdc:
Magic: eb7a62c2
Label: myqdisk
Created: Sat May 7 05:56:35 2011
Host: station2.example.com
Kernel Sector Size: 512
Recorded Sector Size: 512
Status block for node 1
Last updated by node 1
Last updated on Sat May 7 15:09:37 2011
State: Master
Flags: 0000
Score: 0/0
Average Cycle speed: 0.001500 seconds
Last Cycle speed: 0.000000 seconds
Incarnation: 4dc4d1764dc4d176
Status block for node 2
Last updated by node 2
Last updated on Sun May 8 01:09:38 2011
State: Running
Flags: 0000
Score: 0/0
Average Cycle speed: 0.001000 seconds
Last Cycle speed: 0.000000 seconds
Incarnation: 4dc55e164dc55e16
Status block for node 3
Last updated by node 3
Last updated on Sat May 7 15:09:38 2011
State: Running
Flags: 0000
Score: 0/0
Average Cycle speed: 0.001500 seconds
Last Cycle speed: 0.000000 seconds
Incarnation: 4dc4d2f04dc4d2f0
The cluster is still running with the last node standing
Please note total votes = quorum = 5 = 2 (the last node's votes) + 3 (the quorum disk's votes); if the quorum disk's vote count were less than (a single node's votes + 1), the cluster would not have survived.
$cman_tool status
..
Nodes: 1
Expected votes: 9
Quorum device votes: 3
Total votes: 5
Quorum: 5
..
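The Quorum value reported above is consistent with a simple-majority rule over expected_votes; a quick bash arithmetic check (a sketch, not a cman command):
#quorum = floor(expected_votes / 2) + 1
echo $((9 / 2 + 1))    # prints 5, matching "Quorum: 5" above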
hi mohan,
I ran into a problem while using a quorum disk, but I read somewhere that I can also configure quorum without a quorum disk and use network quorum instead. Can you help me clear up this confusion?
I am configuring on RHEL 6.3, and I usually have to update cluster.conf manually, and the cluster hangs…
kernel: dlm: closing connection to node 2
Can you help me?
Thanks!
Quorum with a disk is the most recommended setup, since it directly addresses the split-brain problem. For the "kernel: dlm: closing connection to node 2" message, check /var/log/messages.
Stop the NetworkManager service.
Check quorum disk information
$ mkqdisk -L -d
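On RHEL 6 the NetworkManager advice above translates to something like this (assuming stock init scripts; cman does not support running alongside NetworkManager):
service NetworkManager stop
chkconfig NetworkManager off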