**RHCS (Red Hat Cluster Suite) quorum disk**
Published 2012-08-31

**Methods to deal with a split-brain situation**
1. Redundant heartbeat path: network-port communication plus serial-port communication.
2. I/O fencing: the surviving nodes separate the failed node from its storage, either by shutting down / rebooting it through its power port or by cutting off its storage port.
3. Quorum disk: a form of I/O fencing, except that the reboot is executed by the failed node's own quorum daemon. It also has an additional feature: contributing votes to the cluster. If you want the last standing node to keep a multi-node cluster running, a quorum disk appears to be the only solution.

**RHCS (Red Hat Cluster Suite) quorum disk facts**
- A shared block device (SCSI/iSCSI/FC, ...); the device size requirement is only about 10 MiB.
- Supports a maximum of 16 nodes; node IDs must be sequentially ordered.
- The quorum disk can contribute votes.
  In a multi-node cluster, this vote lets the last standing node keep the cluster running.
- Sizing rule: single node votes + 1 <= quorum disk votes < total node votes.
- Failure of the shared quorum disk will not bring the cluster down, as long as quorum disk votes < total node votes.
- Each node writes its own health information to its own region of the disk; health is determined by external check programs such as ping.

**Setup quorum disk**

```
# initialise the quorum disk once, on any one node
mkqdisk -c /dev/sdx -l myqdisk
```

**Add quorum disk to cluster**
Use luci or system-config-cluster to add the quorum disk; the following is the resulting cluster.conf XML:

```
<clusternodes>
  <clusternode name="station1.example.com" nodeid="1" votes="2">
    <fence/>
  </clusternode>
  <clusternode name="station2.example.com" nodeid="2" votes="2">
    <fence/>
  </clusternode>
  <clusternode name="station3.example.com" nodeid="3" votes="2">
    <fence/>
  </clusternode>
</clusternodes>

<!-- expected votes = 9 = total node votes + quorum disk votes = (2+2+2) + 3 -->
<cman expected_votes="9"/>

<!-- The health check result is written to the quorum disk every 2 secs (interval). -->
<!-- If health checks fail for tko=5 consecutive intervals (2*5 = 10 secs), the node is rebooted by its quorum daemon. -->
<!-- Each heuristic runs every 2 secs and earns its score when the program exits 0. -->
<quorumd interval="2" label="myqdisk" min_score="2" tko="5" votes="3">
  <heuristic interval="2" program="ping -c1 -t1 192.168.1.60" score="1"/>
  <heuristic interval="2" program="ping -c1 -t1 192.168.1.254" score="1"/>
</quorumd>
```
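A heuristic program can be anything executable: qdiskd only inspects the exit status (0 earns the heuristic's score). So instead of a bare ping command, a heuristic can be a small script. A minimal sketch, assuming the targets are passed as arguments; the function name and argument convention here are illustrative, not part of RHCS:

```shell
#!/bin/sh
# Sketch of a qdiskd heuristic: exit 0 only if every target answers a single ping.
# -c1: send one probe; -W1: wait at most 1 second for the reply.
check_targets() {
    for ip in "$@"; do
        ping -c1 -W1 "$ip" >/dev/null 2>&1 || return 1
    done
    return 0
}

# qdiskd would run this script via the heuristic's program="..." attribute
# and read only the exit status.
check_targets "$@"
```

Keep such scripts fast: each heuristic must complete well within its interval, or the score is lost.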
**Start quorum disk daemon**
qdiskd is one of the daemons automatically started by cman:

```
service qdiskd start
```

**Check quorum disk information**

```
$ mkqdisk -L -d
mkqdisk v0.6.0
/dev/disk/by-id/scsi-1IET_00010002:
/dev/disk/by-uuid/55fbf858-df75-493b-a764-5640be5a9b46:
/dev/sdc:
Magic:                eb7a62c2
Label:                myqdisk
Created:              Sat May  7 05:56:35 2011
Host:                 station2.example.com
Kernel Sector Size:   512
Recorded Sector Size: 512

Status block for node 1
Last updated by node 1
Last updated on Sat May  7 15:09:37 2011
State: Master
Flags: 0000
Score: 0/0
Average Cycle speed: 0.001500 seconds
Last Cycle speed: 0.000000 seconds
Incarnation: 4dc4d1764dc4d176

Status block for node 2
Last updated by node 2
Last updated on Sun May  8 01:09:38 2011
State: Running
Flags: 0000
Score: 0/0
Average Cycle speed: 0.001000 seconds
Last Cycle speed: 0.000000 seconds
Incarnation: 4dc55e164dc55e16

Status block for node 3
Last updated by node 3
Last updated on Sat May  7 15:09:38 2011
State: Running
Flags: 0000
Score: 0/0
Average Cycle speed: 0.001500 seconds
Last Cycle speed: 0.000000 seconds
Incarnation: 4dc4d2f04dc4d2f0
```
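Exactly one node holds the Master state at a time. To pull the master's node id out of `mkqdisk -L -d` output you can filter the status blocks with awk; a sketch, with the dump stubbed by a here-document so it runs standalone (on a real cluster you would pipe `mkqdisk -L -d` in instead):

```shell
# Extract the node id whose status block reports "State: Master".
# The here-doc stands in for real `mkqdisk -L -d` output.
master_node=$(awk '/^Status block for node/ {node=$5}
                   /^State: Master/        {print node; exit}' <<'EOF'
Status block for node 1
State: Master
Status block for node 2
State: Running
EOF
)
echo "$master_node"   # prints 1 for this stubbed dump
```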
**The cluster is still running with the last node standing**
Note that total votes = quorum votes = 5 = 2 + 3. If the quorum disk's vote were less than (single node votes + 1), the cluster would not have survived:

```
$ cman_tool status
..
Nodes: 1
Expected votes: 9
Quorum device votes: 3
Total votes: 5
Quorum: 5
..
```
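The survival arithmetic can be checked directly: quorum is a simple majority of expected votes (which matches the cman_tool output above), and the sizing rule node votes + 1 <= qdisk votes < total node votes guarantees a lone node plus the qdisk still reaches it. A quick check with this article's example values (three nodes at 2 votes each, qdisk at 3):

```shell
# Quorum math for the example cluster: 3 nodes x 2 votes each, qdisk = 3 votes.
NODES=3; NODE_VOTES=2; QDISK_VOTES=3

EXPECTED=$((NODES * NODE_VOTES + QDISK_VOTES))  # 3*2 + 3 = 9, matches expected_votes="9"
QUORUM=$((EXPECTED / 2 + 1))                    # 9/2 + 1 = 5, matches "Quorum: 5"
LAST_NODE=$((NODE_VOTES + QDISK_VOTES))         # 2 + 3 = 5, the lone survivor's total

echo "expected=$EXPECTED quorum=$QUORUM last_node=$LAST_NODE"
[ "$LAST_NODE" -ge "$QUORUM" ] && echo "last node stays quorate"
```

With a qdisk vote of only 2 (i.e. not at least node votes + 1), the survivor would hold 4 of 5 required votes and the cluster would lose quorum.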