Commands: LVM vs. VxVM

   Task                  LVM                                  VxVM
1. Scan/refresh disks    fdisk -l                             vxdctl enable
2. Initialize a disk     pvcreate /dev/sdb*                   vxdisksetup -i disk_0
3. Create a group        vgcreate oravg /dev/sdb*             vxdg init oradg disk_0 disk_1
4. Create a volume       lvcreate -L 3G -n ora_lv oravg       vxassist -g oradg make oravol 3g
5. Make a filesystem     mkfs.ext3 /dev/oravg/ora_lv          mkfs.vxfs /dev/vx/rdsk/oradg/oravol
6. Mount                 mount -t ext3 /dev/oravg/ora_lv /oravol
                         mount -t vxfs /dev/vx/dsk/oradg/oravol /oravol
LVM                   Veritas equivalent
fdisk                 vxdctl enable
pvcreate              vxdisksetup -i lun
vgcreate              vxdg -g dg_name adddisk device=lun
lvcreate              vxassist -g
mkfs                  mkfs -F vxfs
mount                 mount
/etc/fstab            /etc/fstab
Linux   = native multipathing
Solaris = native multipathing
Veritas = Veritas Dynamic Multipathing (DMP)
Veritas Cluster Server
For the end of the week, we’re going to continue with the theme of sparse-but-hopefully-useful information: quick little “crib sheets” (preceded by paragraphs and paragraphs of stilted ramblings by the lunatic who pens this blog’s content 😉 For this Friday, we’re going to come back around and take a look at Veritas Cluster Server (VCS) troubleshooting. If you’re interested in more specific examples of problems, solutions and suggestions with regards to VCS, check out all the VCS-related posts from the past year or so. Hopefully you’ll be able to find something useful in our archives, as well. These simple suggestions should work equally well for Unix as well as Linux, if you choose to go the VCS route rather than some less costly one 🙂
And, here we go again; quick, pointed bullets of info. Bite-sized bits of troubleshooting advice that focus on solving the problem, rather than understanding it. That sounds awful, I know, but, sometimes, you have to get things done and, let’s face it, if it’s the job or your arse, who cares about the why? Leave that for philosophers and academics. Plus, since you fix problems so fast, you’ll have plenty of time to read up on the ramifications of your actions later 😉
The setup: Your site is down. It’s a small cluster configuration with only two nodes and redundant nic’s, attached network disk, etc. All you know is that the problem is with VCS (although it’s probably indirectly due to a hardware issue). Something has gone wrong with VCS and it’s, obviously, not responding correctly to whatever terrible accident of nature has occurred. You don’t have much more to go on than that. The person you receive your briefing from thinks the entire clustered server setup (hardware, software, cabling, power, etc.) is a bookmark in IE 😉
Now, one by one, in a fashion that zigs on purpose, but has a tendency to zag, here are a few things to look at right off the bat when assessing a situation like this one. Perhaps next week, we’ll look into more advanced troubleshooting (and, of course, you can find lots of specific “weird VCS problem” solutions in our VCS archives)
1. Check if the cluster is working at all.
Log into one of the cluster nodes as root (or a user with equivalent privilege – who shouldn’t exist 😉 and run
host1 # hastatus -summary
or
host1 # hasum <-- both do the same thing, basically
Ex:
host1 # hastatus -summary
-- SYSTEM STATE
-- System State Frozen
A host1 RUNNING 0
A host2 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService host1 Y N OFFLINE
B ClusterService host2 Y N ONLINE
B SG_NIC host1 Y N ONLINE
B SG_NIC host2 Y N OFFLINE
B SG_ONE host1 Y N ONLINE
B SG_ONE host2 Y N OFFLINE
B SG_TWO host1 Y N OFFLINE
B SG_TWO host2 Y N OFFLINE
Clearly, your situation is bad: A normal VCS status should indicate that all nodes in the cluster are “RUNNING” (which these are). However, it should also show all service groups as being ONLINE on at least one of the nodes, which isn't the case above with SG_TWO (Service Group 2).
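The "is every group ONLINE somewhere?" scan above is easy to automate. Here's a minimal sketch: a helper that reads `hastatus -summary` output and prints any service group with no ONLINE instance on any node. The sample text is the output from this post, not a live cluster, and the function name is my own invention.

```shell
# Print service groups that are not ONLINE on any node.
# Feed it the output of `hastatus -summary` on stdin.
offline_groups() {
  awk '$1 == "B" { seen[$2]; if ($NF == "ONLINE") online[$2] }
       END { for (g in seen) if (!(g in online)) print g }'
}

# Sample from the post: ClusterService is ONLINE on host2, SG_TWO nowhere.
sample='B ClusterService host1 Y N OFFLINE
B ClusterService host2 Y N ONLINE
B SG_TWO host1 Y N OFFLINE
B SG_TWO host2 Y N OFFLINE'

printf '%s\n' "$sample" | offline_groups
```

On a live node you'd pipe the real thing: `hastatus -summary | offline_groups`.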
2. Check for cluster communication problems. Here we want to determine if a service group is failing because of any heartbeat failure (The VCS cluster, that is, not another administrator 😉
Check on GAB first, by running:
host1 # gabconfig -a
Ex:
host1 # gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 3a1501 membership 01
Port h gen 3a1505 membership 01
This output is okay. You would know you had a problem at this point if any of the following conditions were true:
If no port “a” memberships were present (0 and 1 above), this could indicate a problem with gab or llt (looked at next)
If no port "h" memberships were present (0 and 1 above), this could indicate a problem with had.
If starting llt causes it to stop immediately, check your heartbeat cabling and llt setup.
Try starting gab, if it's down, with:
host1 # /etc/init.d/gab start
If you're running the command on a node that isn't operational, gab won't be seeded, which means you'll need to force it, like so:
host1 # /sbin/gabconfig -c -x
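The two port checks above can be scripted as a quick sanity test on `gabconfig -a` output. A hedged sketch: the helper name is mine, and the sample text stands in for the live command.

```shell
# Check whether a given GAB port (e.g. "a" for GAB, "h" for had)
# appears in `gabconfig -a` output read from stdin.
gab_port_present() {  # usage: gab_port_present <port-letter> < output
  grep -q "^Port $1 gen .* membership"
}

# Sample output from the post.
sample='GAB Port Memberships
===============================================================
Port a gen 3a1501 membership 01
Port h gen 3a1505 membership 01'

printf '%s\n' "$sample" | gab_port_present a && echo "port a present: GAB is up"
printf '%s\n' "$sample" | gab_port_present h && echo "port h present: had is up"
```

Live, that would be `gabconfig -a | gab_port_present a || echo "check gab/llt"`.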
3. Check on LLT, now, since there may be something wrong there (even though it wasn't indicated above)
LLT will most obviously present as a crucial part of the problem if your "hastatus -summary" gives you a message that it "can't connect to the server." This will prompt you to check all cluster communication mechanisms (some of which we've already covered).
First, bang out a quick:
host1 # lltconfig
on the command line to see if llt is running at all.
If llt isn't running, be sure to check your console and system messages file (syslog, possibly messages, and any logs in /var/log/VRTSvcs/... - usually the "engine log" is worth a quick look). As a rule, I usually do
host1 # ls -tr
when I'm in the VCS log directory to see which log got written to last, and work backward from there. This puts the most recently updated file last in the listing. My assumption is that any pertinent errors got written to one of the fresher log files 🙂 Look in these logs for any messages about bad llt configurations or files, such as /etc/llttab, /etc/llthosts and /etc/VRTSvcs/conf/sysname. Also, make sure those three files contain valid entries that "match" <-- This is very important. If you refer to the same facility by 3 different names, even though they all point back to the same IP, VCS can become addled and drop the ball.
Examples of invalid entries in LLT config files would include "node numbers" outside the range of 0 to 31 and "cluster numbers" outside the range of 0 to 255.
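Those two ranges are easy to check mechanically. A minimal sketch, assuming the standard file formats (node number then name in llthosts, `set-cluster N` in llttab); the function name and demo file paths are made up for illustration.

```shell
# Validate the numeric ranges called out above:
# node numbers in llthosts must be 0-31, set-cluster in llttab 0-255.
check_llt_ranges() {  # usage: check_llt_ranges llthosts_file llttab_file
  awk '$1 !~ /^[0-9]+$/ || $1 > 31 { bad = 1 } END { exit bad }' "$1" \
    || { echo "invalid node number in $1"; return 1; }
  awk '$1 == "set-cluster" && ($2 !~ /^[0-9]+$/ || $2 > 255) { bad = 1 }
       END { exit bad }' "$2" \
    || { echo "invalid cluster number in $2"; return 1; }
  echo "llt ranges look sane"
}

# Demo copies mirroring the examples later in this post.
printf '0 host1\n1 host2\n' > /tmp/llthosts.demo
printf 'set-node host1\nset-cluster 100\n' > /tmp/llttab.demo
check_llt_ranges /tmp/llthosts.demo /tmp/llttab.demo
```

Point it at /etc/llthosts and /etc/llttab (or saved copies) on a suspect node.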
Now, if LLT "is" running, check its status, like so:
host # lltstat -nvv <-- This will let you know if llt on the separate nodes within the cluster can communicate with one another.
Of course, verify physical connections, as well. Also, see our previous post on dlpiping for more low-level-connection VCS troubleshooting tips.
Ex:
host1 # lltstat -vvn
LLT node information:
Node State Link Status Address
0 prsbn012 OPEN
ce0 DOWN
ce1 DOWN
HB172.1 UP 00:03:BA:9D:57:91
HB172.2 UP 00:03:BA:0E:F1:DE
HB173.1 UP 00:03:BA:9D:57:92
HB173.2 UP 00:03:BA:0E:D0:BE
1 prsbn015 OPEN
ce3 UP 00:03:BA:0E:CE:09
ce5 UP 00:03:BA:0E:F4:6B
HB172.1 UP 00:03:BA:9D:5C:69
HB172.2 UP 00:03:BA:0E:CE:08
HB173.1 UP 00:03:BA:0E:F4:6A
HB173.2 UP 00:03:BA:9D:5C:6A
host1 # cat /etc/llttab <-- pardon the lack of low-pri links. We had to build this cluster on the cheap 😉
set-node /etc/VRTSvcs/conf/sysname
set-cluster 100
link ce0 /dev/ce:0 - ether 0x1051 -
link ce1 /dev/ce:1 - ether 0x1052 -
exclude 7-31
host1 # cat /etc/llthosts
0 host1
1 host2
host1 # cat /etc/VRTSvcs/conf/sysname
host1
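The "match" requirement from above (sysname must agree with llthosts) can also be automated. A sketch with a hypothetical helper name; file paths are parameters so you can run it against copies.

```shell
# Check that the name in the sysname file appears as a node name
# in the llthosts file, per the "entries must match" rule above.
sysname_matches_llthosts() {  # usage: sysname_matches_llthosts sysname_file llthosts_file
  name=$(cat "$1")
  awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' "$2"
}

# Demo files mirroring the post's examples.
printf 'host1\n' > /tmp/sysname.demo
printf '0 host1\n1 host2\n' > /tmp/llthosts.demo2
sysname_matches_llthosts /tmp/sysname.demo /tmp/llthosts.demo2 \
  && echo "sysname matches llthosts"
```

On a live node: `sysname_matches_llthosts /etc/VRTSvcs/conf/sysname /etc/llthosts`.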
If llt is down, or you think it might be the problem, either start it or restart it with:
host1 # /etc/init.d/llt.rc start
or
host1 # /etc/init.d/llt.rc stop
host1 # /etc/init.d/llt.rc start
And, that's where we'll end it today. There's still a lot more to cover (we haven't even given the logs more than their minimum due), but that's for next week.
Until then, have a pleasant and relaxing weekend 🙂
Veritas Cluster Server (VCS) Command line
VCS can be divided into two important parts:
Cluster Communication:
Low Latency Transport (LLT) and Group Membership Services/Atomic Broadcast (GAB) are responsible for heartbeat and cluster communication.
LLT status
lltconfig -a list – List all the MAC addresses in cluster
lltstat -l – Lists information about each configured LLT link
lltstat [-nvv|-n] – Verify status of links in cluster
Starting and stopping LLT
lltconfig -c – Start the LLT service
lltconfig -U – stop the LLT running
GAB status
gabconfig -a – List memberships; verify if GAB is operating
gabdiskhb -l – Check the disk heartbeat status
gabdiskx -l – lists all the exclusive GAB disks and their membership information
Starting and stopping GAB
gabconfig -c -n seed_number – Start the GAB
gabconfig -U – Stop the GAB
HAD:
Stands for High Availability daemon. HAD is responsible for all the cluster functionality.
The commands for Veritas start with “ha” meaning high availability. For example, ‘hastart’, ‘hastop’, ‘hares’ etc. Listed below are commands sorted by category which are used for most day to day operation/management of VCS.
Cluster Status
hastatus -summary – Outputs the status of cluster
hasys -display – Displays the cluster operation status
Start or Stop services
hastart [-force|-stale] – ‘force’ is used to load local configuration
hasys -force 'system' – start the cluster using config file from the mentioned “system”
hastop -local [-force|-evacuate] – ‘local’ option will stop the service only on the system you type the command
hastop -sys 'system' [-force|-evacuate] – ‘sys’ stops had on the system you specify
hastop -all [-force] – ‘all’ stops had on all systems in the cluster
Change VCS Configuration online
haconf -makerw – makes VCS configuration in read/write mode
haconf -dump -makero – Dumps the configuration changes
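Every online configuration change has to be bracketed by those two commands, or the config is left writable. A dry-run sketch that just prints the full sequence around whatever ha* command you pass, for review before pasting it in; the function name and the `my_res` resource are placeholders.

```shell
# Print (don't run) a complete makerw / change / dump-makero sequence.
plan_config_change() {  # usage: plan_config_change <ha-command...>
  echo "haconf -makerw"
  echo "$*"
  echo "haconf -dump -makero"
}

plan_config_change hares -modify my_res Critical 0
```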
Agent Operations
haagent -start agent_name -sys system – Starts an agent
haagent -stop agent_name -sys system – Stops an agent
Cluster Operations
haclus -display – Displays cluster information and status
haclus -enable LinkMonitoring – Enables heartbeat link monitoring in the GUI
haclus -disable LinkMonitoring – Disables heartbeat link monitoring in the GUI
Add and Delete Users
hauser -add user_name – Adds a user with read/write access
hauser -add VCSGuest – Adds a user with read-only access
hauser -modify user_name – Modifies a users password
hauser -delete user_name – Deletes a user
hauser -display [user_name] – Displays all users if username is not specified
System Operations
hasys -list – List systems in the cluster
hasys -display – Get detailed information about each system
hasys -add system – Add a system to cluster
hasys -delete system – Delete a system from cluster
Resource Types
hatype -list – List resource types
hatype -display [type_name] – Get detailed information about a resource type
hatype -resources type_name – List all resources of a particular type
hatype -add resource_type – Add a resource type
hatype -modify .... – Set the value of static attributes
hatype -delete resource_type – Delete a resource type
Resource Operations
hares -list – List all resources
hares -dep [resource] – List a resource’s dependencies
hares -display [resource] – Get detailed information about a resource
hares -add resource_type service_group – Add a resource
hares -modify resource attribute_name value – Modify the attributes of the new resource
hares -delete resource – Delete a resource
hares -online resource -sys systemname – Bring a resource online on the given system
hares -offline resource -sys systemname – Take a resource offline on the given system
hares -probe resource -sys system – Cause a resource’s agent to immediately monitor the resource on a particular system
hares -clear resource [-sys system] – Clear a faulted resource
hares -local resource attribute_name value – Make a resource’s attribute value local
hares -global resource attribute_name value – Make a resource’s attribute value global
hares -link parent_res child_res – Specify a dependency between two resources
hares -unlink parent_res child_res – Remove the dependency relationship between two resources
Service Group Operations
hagrp -list – List all service groups
hagrp -resources [service_group] – List a service group’s resources
hagrp -dep [service_group] – List a service group’s dependencies
hagrp -display [service_group] – Get detailed information about a service group
hagrp -online groupname -sys systemname – Start a service group and bring its resources online
hagrp -offline groupname -sys systemname – Stop a service group and take its resources offline
hagrp -switch groupname -to systemname – Switch a service group from one system to another
hagrp -freeze service_group [-persistent] – Gets into maintenance mode. Freeze a service group. This will disable online and offline operations
hagrp -unfreeze service_group [-persistent] – Take the service group out of maintenance mode
hagrp -enable service_group [-sys system] – Enable a service group
hagrp -disable service_group [-sys system] – Disable a service group
hagrp -enableresources service_group – Enable all the resources in a service group
hagrp -disableresources service_group – Disable all the resources in a service group
hagrp -link parent_group child_group relationship – Specify the dependency relationship between two service groups
hagrp -unlink parent_group child_group – Remove the dependency between two service groups
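The freeze/unfreeze pair above is the usual maintenance-mode bracket. A dry-run helper that prints the sequence for a given group instead of running it, so it can be reviewed first; the function name and `SG_ONE` are placeholders from this post's examples.

```shell
# Print (don't run) the maintenance-mode bracket for a service group.
plan_freeze_maintenance() {  # usage: plan_freeze_maintenance service_group
  echo "hagrp -freeze $1 -persistent"
  echo "# ...perform maintenance on $1..."
  echo "hagrp -unfreeze $1 -persistent"
}

plan_freeze_maintenance SG_ONE
```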
VCS Startup Process
Please verify that the cables are set up for the heartbeat network. You can tcpdump from one server NIC’s MAC address to another to verify connectivity.
Step 1:
LLT (Low Latency Transport) starts up first, brought up with the “lltconfig -c” command. It reads the /etc/llttab and /etc/llthosts files and establishes the heartbeat network. The heartbeat network is a private network where VCS status information is exchanged by all systems within a VCS cluster. These networks require each system in the cluster to have a dedicated NIC, connected to a private hub. VCS requires a minimum of two dedicated communication channels between each system in a cluster. LLT is a low-overhead networking protocol that runs in the kernel. Because it runs in the kernel, it is capable of handling kernel-to-kernel communications.
Examples of files are as below:
#cat /etc/llthosts
0 node0
1 node1
.
.
n nodeN
In the example below, Linux systems will have interface names such as “eth0”/“eth1”. If using a different device, replace “ce” with “qfe0”/“qfe1”, etc.
#cat /etc/llttab
set-node
set-cluster
link ce2 /dev/ce:2 - ether - -
link ce3 /dev/ce:3 - ether - -
link-lowpri ce4 /dev/ce:4 - ether - -
start
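Since VCS wants at least two dedicated links (as noted above), counting the link lines in llttab is a cheap pre-flight check. A sketch; the helper name and demo file are made up, and the demo mirrors the example llttab above.

```shell
# Count link/link-lowpri entries in a given llttab file.
count_llt_links() {
  awk '$1 == "link" || $1 == "link-lowpri" { n++ } END { print n + 0 }' "$1"
}

# Demo file mirroring the example above.
cat > /tmp/llttab.links.demo <<'EOF'
set-node host1
set-cluster 100
link ce2 /dev/ce:2 - ether - -
link ce3 /dev/ce:3 - ether - -
link-lowpri ce4 /dev/ce:4 - ether - -
EOF

links=$(count_llt_links /tmp/llttab.links.demo)
[ "$links" -ge 2 ] && echo "$links links configured: OK"
```

Run it against /etc/llttab on each node; fewer than 2 means you're one cable failure away from trouble.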
Verification of startup can be done using the “lltstat -n” command. “*” marks the local node (the one on which the command was run)
#lltstat -n
Node State Links
* 0
1
2
.
.
n
Step 2:
GAB (Group Membership Services/Atomic Broadcast) starts next. It executes /etc/gabtab and checks for other GABs to establish cluster membership. GAB runs over Low Latency Transport (LLT) and uses broadcasts to distribute cluster configuration information and ensure that each system has a synchronized view of the cluster, including the state of each system, service group, and resource.
# cat /etc/gabtab
/sbin/gabconfig -c -n5 # for a 5 node cluster
GAB can be started using “gabconfig -c” and verified by using “gabconfig -a”. Below is example output for a 5-node cluster. Port ‘a’ runs the GAB service and port ‘h’ runs the VCS daemon (HAD)
# gabconfig -a
GAB Port Memberships
=========================================
Port a gen 11ff05 membership 012345
Port h gen 11ff09 membership 012345
Step 3:
After both LLT and GAB are loaded, hashadow starts, which in turn loads HAD (High Availability Daemon). HAD reads /etc/VRTSvcs/conf/config/main.cf, types.cf, and all included .cf files mentioned in the main.cf file.
HAD checks whether other HADs are available and registers with GAB. If there are no other HADs, it loads main.cf into HAD memory. The same process happens when HAD starts on the other nodes: the HAD on the first node loads main.cf and the other .cf files from the local system, and all other HADs load their configuration from the first HAD.
After starting up, HAD knows all the service groups and resources from main.cf. It calls the respective agents to check whether the resources are currently online or offline and, based on main.cf, onlines or offlines the service groups on the respective nodes.
Cluster is started up with “hastart” command. The status can be verified using “hastatus -sum”
VCS Logfile: /var/
Setup SAN disk for use in a Linux Veritas cluster
For this particular exercise we’re going to go through the entire process of provisioning disk for use in a VCS cluster.
We will use EMC Symmetrix disk zoned and masked to a RHEL 4u6 host as the foundation.
Get the disk(s) presented to the host, observing that each is visible down multiple paths.
# inq -showvol
Inquiry utility, Version V7.3-771 (Rev 0.0) (SIL Version V6.3.0.0 (Edit Level 771)
Copyright (C) by EMC Corporation, all rights reserved.
For help type inq -h.
-----------------------------------------------------------------------------
DEVICE :VEND :PROD :REV :SER NUM :Volume :CAP(kb)
-----------------------------------------------------------------------------
/dev/sda :EMC :SYMMETRIX :5771 :0123456789 : 00617: 2880
/dev/sdb :EMC :SYMMETRIX :5771 :0123456789 : 00204: 35654400
/dev/sdc :EMC :SYMMETRIX :5771 :0123456789 : 00206: 35654400
/dev/sdd :EMC :SYMMETRIX :5771 :0123456789 : 00208: 35654400
/dev/sde :EMC :SYMMETRIX :5771 :0123456789 : 0020A: 35654400
/dev/sdf :EMC :SYMMETRIX :5771 :0123456789 : 0020C: 35654400
/dev/sdg :EMC :SYMMETRIX :5771 :0123456789 : 0020E: 35654400
/dev/sdh :EMC :SYMMETRIX :5771 :0123456789 : 00210: 35654400
/dev/sdi :EMC :SYMMETRIX :5771 :0123456789 : 00212: 35654400
/dev/sdj :EMC :SYMMETRIX :5771 :0123456789 : 00214: 35654400
/dev/sdk :EMC :SYMMETRIX :5771 :0123456789 : 00263: 35654400
/dev/sdl :EMC :SYMMETRIX :5771 :0123456789 : 00265: 35654400
/dev/sdm :EMC :SYMMETRIX :5771 :0123456789 : 00267: 35654400
/dev/sdn :EMC :SYMMETRIX :5771 :0123456789 : 00269: 35654400
/dev/sdo :EMC :SYMMETRIX :5771 :0123456789 : 0026B: 35654400
/dev/sdp :EMC :SYMMETRIX :5771 :0123456789 : 00617: 2880
/dev/sdq :EMC :SYMMETRIX :5771 :0123456789 : 00204: 35654400
/dev/sdr :EMC :SYMMETRIX :5771 :0123456789 : 00206: 35654400
/dev/sds :EMC :SYMMETRIX :5771 :0123456789 : 00208: 35654400
/dev/sdt :EMC :SYMMETRIX :5771 :0123456789 : 0020A: 35654400
/dev/sdu :EMC :SYMMETRIX :5771 :0123456789 : 0020C: 35654400
/dev/sdv :EMC :SYMMETRIX :5771 :0123456789 : 0020E: 35654400
/dev/sdw :EMC :SYMMETRIX :5771 :0123456789 : 00210: 35654400
/dev/sdx :EMC :SYMMETRIX :5771 :0123456789 : 00212: 35654400
/dev/sdy :EMC :SYMMETRIX :5771 :0123456789 : 00214: 35654400
/dev/sdz :EMC :SYMMETRIX :5771 :0123456789 : 00263: 35654400
/dev/sdaa :EMC :SYMMETRIX :5771 :0123456789 : 00265: 35654400
/dev/sdab :EMC :SYMMETRIX :5771 :0123456789 : 00267: 35654400
/dev/sdac :EMC :SYMMETRIX :5771 :0123456789 : 00269: 35654400
/dev/sdad :EMC :SYMMETRIX :5771 :0123456789 : 0026B: 35654400
See what disks Veritas can see.
vxdisk -o alldgs list
Initialize the disk for the first time. This needs to be repeated for each individual disk.
/etc/vx/bin/vxdisksetup -i DEVICE format=cdsdisk
See if the initialization worked correctly.
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
EMC0_0 auto:cdsdisk - (dg_grp) online
EMC0_1 auto:cdsdisk - (dg_grp) online
EMC0_2 auto:cdsdisk - (dg_grp) online
EMC0_3 auto:cdsdisk - (dg_grp) online
EMC0_4 auto:cdsdisk - (dg_grp) online
EMC0_5 auto:cdsdisk - (dg_grp) online
EMC0_6 auto:cdsdisk - (dg_grp) online
EMC0_7 auto:cdsdisk - (dg_grp) online
EMC0_8 auto:cdsdisk - (dg_grp) online
EMC0_9 auto:cdsdisk - (dg_grp) online
EMC0_10 auto:cdsdisk - (dg_grp) online
EMC0_11 auto:cdsdisk - (dg_grp) online
EMC0_12 auto:cdsdisk - (dg_grp) online
EMC0_13 auto:cdsdisk - (dg_grp) online
cciss/c0d0 auto:none - - online invalid
All device(s) (e.g. EMC0_n) now show as online.
Create the disk group.
vxdg init dg_name dg_internal_name01=DEVICE
The dg_name is the name of your disk group while dg_internal_name01 is the name of the first disk. In our case dg_internal_name01=EMC0_0.
Add any additional disk to the disk group.
vxdg -g dg_name adddisk dg_internal_name02=EMC0_n+1
Note that EMC0_n+1 is the next free disk that you are attempting to add. So dg_internal_name02=EMC0_1 (remember we started with EMC0_0).
To create the volume:
vxassist -g dg_name make lv_name [size] dg_internal_nameN
Repeat as necessary.
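The init-then-add loop above is repetitive across 14 disks, so here's a dry-run sketch that prints the commands for a list of devices instead of executing them; review the output, then paste it in. The disk group and device names follow this post's placeholders.

```shell
# Print (don't run) the vxdisksetup/vxdg commands for a set of devices:
# first device initializes the group, the rest are added to it.
plan_dg() {  # usage: plan_dg dg_name dev1 dev2 ...
  dg=$1; shift; i=0
  for dev in "$@"; do
    i=$((i + 1))
    name=$(printf 'dg_internal_name%02d' "$i")
    echo "/etc/vx/bin/vxdisksetup -i $dev format=cdsdisk"
    if [ "$i" -eq 1 ]; then
      echo "vxdg init $dg $name=$dev"
    else
      echo "vxdg -g $dg adddisk $name=$dev"
    fi
  done
}

plan_dg dg_name EMC0_0 EMC0_1
```

Swap in the real group name and the full `EMC0_0 ... EMC0_13` list when you're satisfied with the plan.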
Finally, create the file system:
mkfs -t vxfs /dev/vx/rdsk/dg_name/lv_name
At this point the volumes are now available to be defined as a mount resource in VCS.