Troubleshooting ServeRAID Subsystems in a Cluster Environment
Problem: The ServeRAID Administration Utility program shows physical devices in the DDD state.
Action:
Note: It is normal for disks that have been moved or have failed over to be displayed
in the DDD state if the shared disk display feature has not been enabled.
In this case the disks shown in the DDD state are not really defective.
Problem: ServeRAID shared logical drives do not fail over properly.
Action:
If the resource type is shown as "physical disk", the localquorum option was not specified properly when MSCS was installed.
To correct this problem, you must reinstall the high-availability cluster solution using Microsoft Windows NT.
Refer to Chapter 4, 'Installing a High-Availability Cluster Solution Using Windows NT' for instructions.
The SCSI heartbeat connection must be connected to the third channel of the ServeRAID adapter pair that has the quorum drive connected to it.
No disks can be installed on this heartbeat channel.
If you choose to move the quorum drive to another ServeRAID adapter, you must also move the SCSI heartbeat cable on both servers to the new
quorum ServeRAID adapter pair. For more information, see 'ServeRAID Considerations'.
Problem: RAID level 5 logical disks cannot be accessed by the operating system after a failover.
Action: Use the ServeRAID Administration Utility program to check the state of the
logical disk drive to ensure that it is not blocked. Using the utility program, select the logical disk drive and look for a Blocked state of Yes.
If the logical disk drive is blocked, make sure all physical disks that are part of the logical drive are in the ONL state. If not all physical disks are in the ONL
state, then a disk might have gone bad during a failover or during the resynchronization process after a failover.
Data integrity cannot be guaranteed in this case, and the array has been blocked to prevent the possibility of incorrect data being read from the logical drive.
Reinitialize and synchronize the logical drive and restore the data from a backup source.
Depending on the type of data contained on the logical drive and the availability of a recent backup copy, you can instead unblock the drive
and continue normal operation, or replace and rebuild one or more DDD disks.
However, if you do not reinitialize, synchronize, and restore the drive, be aware that some data on the disk drive could be lost or corrupted.
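The decision described above can be summarized as a small sketch. This is only an illustration of the logic in this section, not part of any ServeRAID tool: the function name `blocked_drive_action` and the result strings are hypothetical, and the blocked flag and disk states are assumed to have been read by hand from the ServeRAID Administration Utility program.

```python
# Hypothetical helper: the names and result strings are illustrative only.
# State abbreviations (ONL, DDD) follow the ones used in this section.
NO_ACTION = "no action: logical drive is not blocked"
NOT_COVERED = "blocked, but all member disks are ONL: not covered by this section"
RESTORE = ("reinitialize, synchronize, and restore from backup "
           "(or unblock and accept possible data loss)")


def blocked_drive_action(blocked, physical_states):
    """Suggest a next step for a RAID level 5 logical drive after a failover."""
    if not blocked:
        return NO_ACTION
    # Normalize so that "onl" and " ONL " compare equal to "ONL".
    states = [s.strip().upper() for s in physical_states]
    if all(s == "ONL" for s in states):
        return NOT_COVERED
    # At least one member disk is not online, so data integrity is in doubt.
    return RESTORE
```

For example, `blocked_drive_action(True, ["ONL", "DDD", "ONL"])` returns the `RESTORE` suggestion, because one member disk is not in the ONL state.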
Problem: If one of the cluster nodes fails and the surviving node takes
over the cluster resources, occasionally one or more of the IP address resources stays in the online pending state for several minutes after
moving over to the surviving node. After this, the resource goes to the failed state and the following error message is displayed in the
surviving node's system log (as viewed with the Event Viewer).
For example, NT Event Log message:
Date:     ???     Event ID: 1069
Time:     ???     Source:   ClusSvc
User:     N/A     Type:     Error
Computer: ???     Category: (4)
Description:
Cluster resource 'ip address resource name' failed
Action: Do the following:
A message appears stating that the resource must be restarted for the change to take effect.
Problem: After one of the cluster nodes (servers) has been shut down normally
and the surviving node (server) takes over the cluster resources, occasionally one or more of
the IBM ServeRAID logical disk resources stays in the 'online pending' state for several minutes
after moving over to the surviving node (server), when viewed with the Cluster Administrator.
After this, the resource goes to the 'failed' state and the following error message is
displayed in the surviving node's (server's) system log (as viewed with the Event Viewer).
For example, NT Event Log message:
Date:     ???     Event ID: 1069
Time:     ???     Source:   ClusSvc
User:     N/A     Type:     Error
Computer: ???     Category: (4)
Description:
Cluster resource 'IBM ServeRAID Logical Disk name' failed
Action: No action is necessary to bring the resource online after the failover.
MSCS will successfully re-attempt to bring this resource online on the surviving node within about four minutes.
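If event log entries are exported as text in the layout shown in the examples above, the Event ID 1069 pattern can be recognized with a short script. The following is a minimal sketch under that assumption. The function names `parse_cluster_event` and `is_resource_failure` are hypothetical, and the field labels are taken only from the example entries in this section, not from any Windows NT API.

```python
import re

# Field labels exactly as they appear in the example log entries in this section.
_FIELD_RE = re.compile(
    r"(Date|Time|User|Computer|Event ID|Source|Type|Category):\s*(\S+)"
)
_RESOURCE_RE = re.compile(r"Cluster resource '([^']+)' failed")


def parse_cluster_event(text):
    """Extract the labelled fields and the failed resource name from one entry."""
    fields = {label: value for label, value in _FIELD_RE.findall(text)}
    match = _RESOURCE_RE.search(text)
    fields["Resource"] = match.group(1) if match else None
    return fields


def is_resource_failure(fields):
    """Return True for the Event ID 1069 / ClusSvc error pattern shown above."""
    return (
        fields.get("Event ID") == "1069"
        and fields.get("Source") == "ClusSvc"
        and fields.get("Type") == "Error"
        and fields.get("Resource") is not None
    )
```

A match only identifies the entry. Whether any action is needed still depends on the resource type, as described in the Action text for each problem.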
Problem: You cannot reinstall the ServeRAID Windows NT Cluster Solution.
If a previous version of the IBM ServeRAID Cluster Solution has been uninstalled, a message incorrectly appears asking if you want to
perform an upgrade when you attempt to reinstall the IBM ServeRAID Windows NT Cluster Solution.
Action: You must delete the C3E76E53-F841-11D0-BFA1-08005AB8ED05 registry key.
To delete the registry key, do the following:
Problem: The following warning message is displayed when running the IPSHAHTO program on a node:
"Warning: CONFIG_SYNC with 0xA0 command FAILED on Adapter #"
In this situation, one or more HSP/SHS devices are defined on adapter pairs, or READY (RDY) devices are not part of any
logical array configuration on an adapter pair in a cluster.
Action: If all shared disk resources moved successfully when running the IPSHAHTO program,
it is safe to ignore the error message and no further action is required.
If shared disk resources fail to move when running the IPSHAHTO program, then perform a low-level format
on all HSP/SHS and RDY devices that are not part of any logical array configuration on an adapter pair in a cluster.
Refer to the instructions on low-level formatting HSP/SHS drives in the
ServeRAID Adapter Installation and User's Guide (P/N 4227022) for further details.
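The rule above for choosing which devices need a low-level format can be sketched as follows. This is an illustration only: the function name `devices_to_low_level_format` and the `(name, state, in_array)` tuple layout are assumptions made for the example, and the device list is assumed to be compiled by hand from the ServeRAID configuration.

```python
# States named in this section as candidates for a low-level format.
CANDIDATE_STATES = {"HSP", "SHS", "RDY"}


def devices_to_low_level_format(all_resources_moved, devices):
    """Return names of devices to low-level format after running IPSHAHTO.

    devices is a list of (name, state, in_array) tuples (hypothetical layout).
    """
    if all_resources_moved:
        # All shared disk resources moved: the warning can be ignored.
        return []
    return [
        name
        for name, state, in_array in devices
        if state.upper() in CANDIDATE_STATES and not in_array
    ]
```

For example, with `[("disk1", "ONL", True), ("disk2", "HSP", False), ("disk3", "RDY", False)]` and `all_resources_moved=False`, the function returns `["disk2", "disk3"]`.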
Problem: Array identifiers and logical drive numbers might change during a failover condition.
Action: By design, the array identifiers and logical drive numbers may change during a failover condition.
Consistency between the merge identifiers and Windows NT sticky drive letters is maintained, while the ordering process during
a failover condition is controlled by the Microsoft Cluster Management Software and the available array identifiers on the
surviving node.