Cluster Admin: Interview Question

Cluster Administration
1 What is a Cluster
A cluster is two or more computers (called nodes or members) that work together to perform a task.
2 What are the types of cluster
Storage
High Availability
Load Balancing
High Performance
3 What is CMAN
CMAN is Cluster Manager. It manages cluster quorum and cluster membership.
CMAN runs on each node of a cluster
4 What is Cluster Quorum
Quorum is a voting algorithm used by CMAN.
CMAN keeps track of cluster quorum by monitoring the number of nodes in the cluster.
If more than half of the members of a cluster are active, the cluster is said to be in quorum.
If half or fewer of the members are active, the cluster has lost quorum; it is said to be down and all cluster activities are stopped.
Quorum is defined as the minimum set of hosts required in order to provide service and is used to prevent split-brain situations.
The quorum algorithm used by the RHCS cluster is called “simple majority quorum”, which means that more than half of the hosts must be online and communicating in order to provide service.
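The simple-majority rule above can be sketched as a small shell calculation. This is only an illustration of the arithmetic, assuming one vote per node; the function names quorum_threshold and is_quorate are hypothetical helpers, not part of CMAN.

```shell
#!/bin/sh
# Minimum votes needed for quorum under simple majority: floor(n / 2) + 1.
# $1 = total expected votes (assumed here to be one vote per node)
quorum_threshold() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit status 0) when the active votes reach the threshold.
# $1 = votes currently active, $2 = total expected votes
is_quorate() {
  [ "$1" -ge "$(quorum_threshold "$2")" ]
}

# Show the threshold for a few cluster sizes.
for total in 2 3 4 5; do
  echo "expected votes: $total, quorum needs: $(quorum_threshold "$total")"
done
```

Note that a two-node cluster needs both votes under this rule, which is why the quorum disk described below matters for that case.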
5 What is split-brain
It is a condition where two instances of the same cluster are running and trying to access the same resource at the same time, which corrupts cluster integrity.
A cluster must maintain quorum to prevent split-brain issues.
6 What is Quorum disk
In a 2 node cluster, the quorum disk acts as a tie-breaker and prevents split-brain issues.
If a node has access to the network and the quorum disk, it is active.
If a node has lost access to the network or the quorum disk, it is inactive and can be fenced.
A quorum disk, known as a qdisk, is a small partition on SAN storage used to enhance quorum. It generally carries enough votes to allow even a single node to take quorum during a cluster partition. It does this by using configured heuristics, that is, custom tests, to decide which node or partition is best suited for providing clustered services during a cluster reconfiguration.
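The vote arithmetic behind the tie-breaker can be sketched as follows. The sketch assumes one vote per node and a qdisk that carries (node count - 1) votes, which is a common sizing but an assumption here; all function names are hypothetical.

```shell
#!/bin/sh
# Assumption: each node has one vote and the qdisk carries (nodes - 1) votes.
qdisk_votes()      { echo $(( $1 - 1 )); }
expected_votes()   { echo $(( $1 + $(qdisk_votes "$1") )); }
quorum_threshold() { echo $(( $1 / 2 + 1 )); }

# $1 = nodes in the cluster
# $2 = nodes in this partition
# $3 = 1 if this partition holds the qdisk, 0 otherwise
partition_is_quorate() {
  votes=$2
  if [ "$3" -eq 1 ]; then
    votes=$(( votes + $(qdisk_votes "$1") ))
  fi
  [ "$votes" -ge "$(quorum_threshold "$(expected_votes "$1")")" ]
}

# Two-node cluster: 3 expected votes, quorum at 2.
if partition_is_quorate 2 1 1; then echo "node with qdisk: quorate"; fi
if ! partition_is_quorate 2 1 0; then echo "node without qdisk: not quorate"; fi
```

Under these assumptions only the side that holds the qdisk can reach quorum, so the two halves of a partitioned two-node cluster can never both be active.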
7 What is RGManager
RGManager manages and provides failover capabilities for collections of cluster resources called services, resource groups, or resource trees.
In the event of a node failure, RGManager will relocate the clustered service to another node with minimal service disruption. You can also restrict services to certain nodes, such as restricting httpd to one group of nodes while mysql can be restricted to a separate set of nodes.
When the cluster membership changes, openais tells the cluster that it needs to recheck its resources. This causes rgmanager, the resource group manager, to run. It will examine what changed and then will start, stop, migrate or recover cluster resources as needed.
Within rgmanager, one or more resources are brought together as a service. This service is then optionally assigned to a failover domain, a subset of nodes that can have preferential ordering.
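The idea of preferential ordering in a failover domain can be sketched in shell. This is not rgmanager's own logic, only an illustration that assumes a lower priority number means a more preferred node; pick_failover_node and the node names are hypothetical.

```shell
#!/bin/sh
# Pick the most preferred online member of an ordered failover domain.
# $1 = space-separated "name:priority" domain members (lower number = preferred)
# $2 = space-separated names of nodes that are currently online
pick_failover_node() {
  for member in $1; do
    name=${member%%:*}
    prio=${member##*:}
    for up in $2; do
      if [ "$name" = "$up" ]; then
        echo "$prio $name"
      fi
    done
  done | sort -n | head -n 1 | cut -d' ' -f2
}

# Both nodes online: the priority 1 member is chosen.
pick_failover_node "node1:1 node2:2" "node1 node2"
# node1 has failed: the service would move to node2.
pick_failover_node "node1:1 node2:2" "node2"
```

If no domain member is online, the function prints nothing, which mirrors a service that has nowhere to run inside a restricted domain.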
8 What is Fencing
Fencing is the disconnection of a node from the cluster’s shared storage. Fencing cuts off I/O from shared storage, thus ensuring data integrity. The cluster infrastructure performs fencing through the fence daemon, fenced.
Power fencing: A fencing method that uses a power controller to power off an inoperable node.
Storage fencing: A fencing method that disables the Fibre Channel port that connects storage to an inoperable node.
Other fencing: Several other fencing methods that disable I/O or power of an inoperable node, including IBM Bladecenters, PAP, DRAC/MC, HP ILO, IPMI, IBM RSA II, and others.
9 How to manually fence an inactive node
# fence_ack_manual -n <nodename>
10 How to see shared IP address (Cluster Resource) if ifconfig doesn’t show it
# ip addr list
11 What is DLM
A lock manager is a traffic cop that controls access to resources in the cluster.
As implied in its name, DLM is a distributed lock manager and runs in each cluster node; lock management is distributed across all nodes in the cluster. GFS2 and CLVM use locks from the lock manager.
12 What is Conga
This is a comprehensive user interface for installing, configuring, and managing Red Hat High Availability Add-On.
Luci: This is the application server that provides the user interface for Conga. It allows users to manage cluster services. It can be run from outside the cluster environment.
Ricci: This is a service daemon that manages distribution of the cluster configuration. Users pass configuration details using the Luci interface, and the configuration is loaded into corosync for distribution to cluster nodes. Ricci is accessible only among cluster members.
13 What is OpenAIS or Corosync
OpenAIS is the heart of the cluster. All other components operate through this component, and no cluster component can work without it. Further, it is shared between both Pacemaker and RHCS clusters.
In Red Hat clusters, openais is configured via the central cluster.conf file. In Pacemaker clusters, it is configured directly in openais.conf.
14 What is Totem
The totem protocol defines message passing within the cluster and it is used by openais. A token is passed around all the nodes in the cluster, and the timeout in fencing is actually a token timeout. The counter, then, is the number of lost tokens that are allowed before a node is considered dead.
The totem protocol supports something called ‘rrp’, Redundant Ring Protocol. Through rrp, you can add a second backup ring on a separate network to take over in the event of a failure in the first ring. In RHCS, these rings are known as “ring 0” and “ring 1”.
15 What is CLVM
CLVM is ideal in that by using DLM, the distributed lock manager, it won’t allow access to cluster members outside of openais’s closed process group, which, in turn, requires quorum.
It is ideal because it can take one or more raw devices, known as “physical volumes”, or simply PVs, and combine their raw space into one or more “volume groups”, known as VGs. These volume groups then act just like a typical hard drive and can be “partitioned” into one or more “logical volumes”, known as LVs. These LVs are where Xen’s domU virtual machines will exist and where we will create our GFS2 clustered file system.
16 What is GFS2
It works much like a standard filesystem, with user-land tools like mkfs.gfs2, fsck.gfs2 and so on. The major difference is that it and clvmd use the cluster’s distributed locking mechanism provided by the dlm_controld daemon. Once formatted, the GFS2-formatted partition can be mounted and used by any node in the cluster’s closed process group. All nodes can then safely read from and write to the data on the partition simultaneously.
17 What is the importance of DLM
One of the major roles of a cluster is to provide distributed locking on clustered storage. In fact, storage software cannot be clustered without using DLM, as provided by the dlm_controld daemon and using openais’s virtual synchrony via CPG.
Through DLM, all nodes accessing clustered storage are guaranteed to get POSIX locks, called plocks, in the same order across all nodes. Both CLVM and GFS2 rely on DLM, though other clustered storage, like OCFS2, uses it as well.
18 What is CCS_TOOL
We can use ccs_tool, the “cluster configuration system (tool)”, to push the new cluster.conf to the other node and upgrade the cluster’s version in one shot.
ccs_tool update /etc/cluster/cluster.conf
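Before an edited cluster.conf is pushed, its config_version attribute has to be higher than the version the cluster is running. A minimal sketch of bumping it with sed is shown here; bump_config_version is a hypothetical helper, and it assumes the attribute appears once in the file in the form config_version="N".

```shell
#!/bin/sh
# Increment the config_version="N" attribute in a cluster.conf file.
# $1 = path to the file; prints the new version number on success.
bump_config_version() {
  file=$1
  # Extract the current version number from the first matching line.
  current=$(sed -n 's/.*config_version="\([0-9][0-9]*\)".*/\1/p' "$file" | head -n 1)
  [ -n "$current" ] || return 1
  next=$(( current + 1 ))
  # Rewrite the attribute via a temporary copy, then replace the original.
  sed "s/config_version=\"$current\"/config_version=\"$next\"/" "$file" > "$file.new" &&
    mv "$file.new" "$file" &&
    echo "$next"
}

# Typical use on a cluster node (not run here):
#   bump_config_version /etc/cluster/cluster.conf
#   ccs_tool update /etc/cluster/cluster.conf
```

Working on a copy of the file first and checking the result before running ccs_tool update is the safer habit, since a malformed cluster.conf affects every node.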
19 What is CMAN_TOOL
It is the Cluster Manager tool. It can be used to view the nodes and status of the cluster.
cman_tool nodes
cman_tool status
20 What is clustat
clustat is used to see what state the cluster’s resources are in.
21 What is clusvcadm
clusvcadm is a tool to manage resources in a cluster.
clusvcadm -e <service> -m <node> : Enable the <service> on the specified <node>. When a <node> is not specified, the local node where the command was run is assumed.
clusvcadm -d <service> -m <node> : Disable the <service>.
clusvcadm -l : Locks the <service> prior to a cluster shutdown. The only action allowed when a <service> is frozen is disabling it. This allows you to stop the <service> so that rgmanager doesn’t try to recover it (restart, in our two services). Once quorum is dissolved and the cluster is shut down, the service is unlocked and returns to normal operation next time the node regains quorum.
clusvcadm -u : Unlocks a <service>, should you change your mind and decide not to stop the cluster.
22 What is luci_admin init
This command is run to create the Luci admin user and set its password.
service luci start, chkconfig luci on
Default port for Luci web server is 8084
