From HACMP 4.4.1 Admin Guide SC23-4279-02 Page 387

Handling Disk Failures

  Handling shared disk failures differs depending on the type of disk and whether it is a
  concurrent access configuration.
   • SCSI non-RAID—You must shut down the nodes sharing the disks.
   • SCSI RAID—May require no downtime, depending on the capabilities of the
     array.
   • SSA non-RAID—Requires manual intervention, but you can replace disks
     with no system downtime.
   • SSA RAID—No downtime necessary.

   See Chapter 5, Maintaining Shared LVM Components in a Concurrent Access
   Environment, for information on the concurrent access procedure.
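Whichever procedure applies, the first step is identifying which shared disks belong to which volume group, which lspv reports on each node. The following is a minimal sketch of filtering lspv-style output for disks not yet assigned to any volume group (replacement candidates); the PVIDs shown are invented sample values, and the disk names match the sample cluster described below:

```shell
# Sample lspv-style output: disk name, PVID, volume group.
# The PVIDs are invented placeholders for illustration only.
lspv_output='hdisk1  0001234500cc6abc  vg1
hdisk2  0001234500578def  vg1
hdisk3  0001234500aa0123  None'

# Print disks whose third column shows no volume group assignment.
echo "$lspv_output" | awk 'tolower($3) == "none" { print $1 }'
```

On a live node you would pipe the real `lspv` output through the same awk filter instead of the sample text.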

Replacing a Failed SSA Non-RAID Disk

Note: AIX 4.3.3 includes a new utility, replacepv. See the AIX man page to
determine whether you can use this utility to replace your failed disk.

This section describes the process of replacing a mirrored disk drive using
C-SPOC commands. The sample cluster is set up as follows:
   • Two nodes: NodeA and NodeB
   • Two cascading resource groups; each node has top priority for one of
     the resource groups:
        NodeArg: NodeA, NodeB
        NodeBrg: NodeB, NodeA
   • One shared volume group: vg1 (includes hdisk1 and hdisk2)
   • vg1 is included in the resource group NodeArg
   • vg1 has one mirrored logical volume (lv01) and one filesystem (/fs1)
   • The logical volume is mirrored on the two disks in the volume group:

        > lslv -m lv01
        lv01:/fs1
        LP    PP1   PV1      PP2   PV2      PP3   PV3
        0001  0111  hdisk2   0056  hdisk1

   • Initial disk configuration on both nodes:

        hdisk#    pdisk#    vg defined?
        hdisk1    pdisk0    vg1
        hdisk2    pdisk1    vg1
        hdisk3    pdisk2    None

When HACMP starts on the cluster, NodeA varies on vg1:

        > lsvg -o
        NodeB:
        NodeA:  vg1

Procedure to Replace a Failed Disk

1. Unmirror the shared volume group:
   a. From NodeB, execute:
      smitty cl_admin / Cluster LVM / Shared vg / Unmirror a shared vg
   b. Select: NodeArg vg1. These choices appear:
         None
         NodeA hdisk1
         NodeA hdisk2
   c. Select: NodeA hdisk2
   d. lslv on NodeA shows the logical volume is no longer mirrored:
         > lslv -m lv01
         lv01:/fs1
         LP    PP1   PV1      PP2   PV2      PP3   PV3
         0001  0056  hdisk1

2. Remove the physical volume from the shared volume group:
   a. From NodeB, execute:
      smitty cl_admin / Cluster LVM / Shared vg / Set vg characteristics /
      Remove a pv from vg
   b. Select: NodeArg vg1. These choices appear:
         NodeA hdisk1
         NodeA hdisk2
   c. Select: NodeA hdisk2
   d. Set Force deallocation of physical partitions to yes.
   e. On both nodes, lspv shows hdisk2 removed from vg1:
         hdisk1   vg1
         hdisk2   none
         hdisk3   none

3. Remove the disk from the cluster:
   a. From NodeB, execute:
      smitty cl_admin / Cluster Physical VM / Remove a disk from the cluster
   b. Select: NodeA NodeB. These choices appear:
         0cc6
         0578
   c. Select: 0578 (hdisk2)
   d. Do not keep the definition in the database.
   e. On both nodes, lspv shows hdisk2 is removed:
         hdisk1   vg1
         hdisk3   none
   f. On both nodes, pdisk1 is still present and unpaired:
         hdisk1: pdisk0
         hdisk3: pdisk2
         pdisk0: hdisk1
         pdisk1:
         pdisk2: hdisk3

4. Pull the failed disk and replace it with a new disk.

5. Add the disk to the cluster:
   a. From NodeB, execute:
      smitty cl_admin / Cluster Physical VM / Add a disk to the cluster
   b. Select: NodeA NodeB
   c. Select: hdisk ssar. For the SSA logical disk drive connection address,
      select (from F4): 0004ac7c036500d
   d. On both nodes, lspv shows the new disk added as hdisk2:
         hdisk1   vg1
         hdisk2   none
         hdisk3   none
   e. On both nodes, ssaxlate shows hdisk2 is unpaired with a pdisk:
         hdisk1: pdisk0
         hdisk2:
         hdisk3: pdisk2
         pdisk0: hdisk1
         pdisk1:
         pdisk2: hdisk3

6. Add the disk to the shared volume group:
   a. Execute:
      smitty cl_admin / Cluster LVM / Shared vg / Set characteristics of vg /
      Add a pv to vg
   b. Select: NodeArg vg1. These choices appear:
         NodeA hdisk2
         NodeA hdisk3
   c. Select: NodeA hdisk2
   d. At the prompt "All data on the physical volume will be destroyed.
      Continue?", answer yes.
   e. On both nodes, lspv shows hdisk2 added to vg1:
         hdisk1   vg1
         hdisk2   vg1
         hdisk3   none

7. Mirror the shared volume group:
   a. Execute:
      smitty cl_admin / Cluster LVM / Shared vg / Mirror a shared vg
   b. Select: NodeArg vg1. These choices appear:
         None
         NodeA hdisk1
         NodeA hdisk2
   c. Select: NodeA hdisk2
   d. On NodeA (where the vg is varied on), lslv shows the logical volume is
      mirrored again:
         > lslv -m lv01
         lv01:/fs1
         LP    PP1   PV1      PP2   PV2      PP3   PV3
         0001  0056  hdisk1   0056  hdisk2
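For reference, the C-SPOC menus above drive operations across the cluster that correspond roughly to the base AIX LVM and device commands shown in this sketch. This is an illustrative single-node outline, not the C-SPOC procedure itself: it defaults to a dry run that only prints each command, so it can be reviewed safely; the disk and volume group names match the sample cluster and are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical single-node sketch of the replacement steps, expressed as the
# base AIX commands underlying the C-SPOC menus. DRY_RUN=1 (default) only
# prints each command instead of executing it.
VG=vg1          # shared volume group from the sample cluster
FAILED=hdisk2   # failed logical disk
PDISK=pdisk1    # SSA physical disk paired with the failed hdisk
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "$@"       # dry run: show the command only
    else
        "$@"            # live run: execute it (AIX only)
    fi
}

run unmirrorvg "$VG" "$FAILED"   # step 1: drop the mirror copy on the failed disk
run reducevg "$VG" "$FAILED"     # step 2: remove the pv from the vg
run rmdev -dl "$FAILED"          # step 3: delete the hdisk definition
run rmdev -dl "$PDISK"           #         and the pdisk definition
# step 4: physically replace the drive, then:
run cfgmgr                       # step 5: configure the new disk
run extendvg "$VG" "$FAILED"     # step 6: add the new pv to the vg
run mirrorvg "$VG" "$FAILED"     # step 7: re-mirror onto the new disk
run syncvg -v "$VG"              #         synchronize stale partitions
```

In an HACMP cluster, prefer the C-SPOC menus: they keep the ODM definitions consistent on all nodes, which these single-node commands do not.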