Troubleshooting ServeRAID II Subsystems in a Cluster Environment
Problem: The ServeRAID Administration Utility program shows physical devices in the DDD state.
Action:
Note: It is normal for disks that have been moved or have failed over to be displayed in the DDD state if the
shared disk display feature has not been enabled.
In this case the disks shown in the DDD state are not really defective.
Problem: ServeRAID shared logical drives do not fail over properly.
Action:
If the resource type is shown as physical disk, the localquorum option was not specified properly when MSCS was installed.
To correct this problem, you must reinstall the high-availability cluster solution using Microsoft Windows NT.
Refer to Chapter 3, 'Installing the ServeRAID II Adapter for a High-Availability Cluster Solution Using Windows NT' on
page 7 for instructions.
The SCSI heartbeat connection must be connected to the third channel of the ServeRAID adapter pair that has
the quorum drive connected to it. No disks can be installed on this heartbeat channel.
If you choose to move the quorum drive to another ServeRAID II adapter, you must also move the SCSI heartbeat cable on both
servers to the new quorum ServeRAID adapter pair.
For more information, see 'ServeRAID II Considerations'.
Problem: RAID level-5 logical disks cannot be accessed by the operating system after a failover.
Action: Use the ServeRAID Administration Utility program to check the state of the logical disk drive to ensure that it
is not blocked.
Using the utility program, select the logical disk drive and look for Blocked state Yes.
If the logical disk drive is blocked, make sure all physical disks that are part of the logical drive are in the ONL state. If not all
physical disks are in the ONL state, a disk might have gone bad during a failover or during the resynchronization process
after a failover.
Data integrity cannot be guaranteed in this case and the array has been blocked to prevent the
possibility of incorrect data being read from the logical drive.
To be sure the data is correct, reinitialize and synchronize the logical drive, and then restore the data from a backup source.
Alternatively, depending on the type of data contained on the logical drive and the availability of a recent backup copy,
you can unblock the drive and continue normal operation, or replace and rebuild one or more DDD disks. However, if you do
not reinitialize, synchronize, and restore the drive, be aware that some data on the disk drive could be lost or corrupted.
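The checks described above can be summarized as a small decision routine. The following Python sketch is illustrative only: the LogicalDrive record and the triage function are hypothetical names that model what an administrator reads from the ServeRAID Administration Utility display. They are not part of any ServeRAID software.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogicalDrive:
    """Hypothetical record of values read from the utility display."""
    blocked: bool            # the "Blocked state" value shown for the drive
    disk_states: List[str]   # state of each member disk, e.g. "ONL" or "DDD"


def triage(drive: LogicalDrive) -> str:
    """Return a short label for the situation described in the text."""
    if not drive.blocked:
        return "not-blocked"
    if all(state == "ONL" for state in drive.disk_states):
        # The text asks you to confirm this but gives no further step here.
        return "blocked-all-disks-online"
    # At least one member disk is not ONL: data integrity is not guaranteed.
    return "blocked-disk-not-online"
```

For example, triage(LogicalDrive(True, ["ONL", "DDD", "ONL"])) returns "blocked-disk-not-online", which is the case where the text recommends reinitializing, synchronizing, and restoring from backup.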
Problem: If one of the cluster servers fails and the surviving server takes over the cluster resources,
occasionally one or more of the IP address resources will stay in the online pending state for several minutes
after moving over to the surviving server. After this, the resource will go to the failed state and the following
error message will be displayed in the surviving server's system log (as viewed with the Event Viewer).
For example:
Windows NT Event Log Message:
Date:      ???     Event ID:  1069
Time:      ???     Source:    ClusSvc
User:      N/A     Type:      Error
Computer:  ???     Category:  (4)
Description:
Cluster resource 'ip address resource name' failed
Action: No action is necessary to bring the resource online after the failover. After about three minutes MSCS will
successfully reattempt to bring this resource online on the surviving server. However, the following workaround will
reduce the time for the IP addresses to come online.
Note: Changes will take effect the next time the resource is brought online.
Problem: After one of the cluster servers has been shut down normally and the surviving server takes over
the cluster resources, occasionally one or more of the IBM ServeRAID logical disk resources will stay in the
online pending state for several minutes after moving over to the surviving server (when viewed with the
Cluster Administrator).
After this, the resource will go to the failed state and the following error message will be displayed in the surviving server's system log (as viewed with the Event Viewer).
For example:
Windows NT Event Log Message:
Date:      ???     Event ID:  1069
Time:      ???     Source:    ClusSvc
User:      N/A     Type:      Error
Computer:  ???     Category:  (4)
Description:
Cluster resource 'IBM ServeRAID Logical Disk name' failed
Action: No action is necessary to bring the resource online after the failover.
MSCS will successfully reattempt to bring this resource online on the surviving server within about four minutes.
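Both event log examples in this section carry Event ID 1069 from the ClusSvc source and a description of the form "Cluster resource '...' failed". As a rough sketch, assuming the event text has been copied out of the Event Viewer in the layout shown above, the following Python code recognizes such a message. The function names and the sample text are illustrative assumptions, not part of MSCS or ServeRAID.

```python
import re
from typing import Optional


def parse_event(text: str) -> dict:
    """Extract the fields used in this section from pasted event text."""
    event_id = re.search(r"Event ID:\s*(\d+)", text)
    source = re.search(r"Source:\s*(\S+)", text)
    resource = re.search(r"Cluster resource '(.+?)' failed", text)
    return {
        "event_id": int(event_id.group(1)) if event_id else None,
        "source": source.group(1) if source else None,
        "resource": resource.group(1) if resource else None,
    }


def failed_resource(text: str) -> Optional[str]:
    """Return the resource name if this is a ClusSvc 1069 failure, else None."""
    event = parse_event(text)
    if event["event_id"] == 1069 and event["source"] == "ClusSvc":
        return event["resource"]
    return None


# Sample text modeled on the example above; "???" fields are left as shown.
SAMPLE = """Date: ??? Event ID: 1069
Time: ??? Source: ClusSvc
User: N/A Type: Error
Computer: ??? Category: (4)
Description:
Cluster resource 'IBM ServeRAID Logical Disk name' failed"""
```

Calling failed_resource(SAMPLE) returns the resource name from the description line. A match only identifies the message; as the text states, MSCS retries the resource on its own within a few minutes.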
Problem: You cannot reinstall the ServeRAID Windows NT Cluster Solution.
If a previous version of the IBM ServeRAID Cluster Solution has been uninstalled, a message incorrectly appears when you attempt
to reinstall the IBM ServeRAID Windows NT Cluster Solution, asking if you want to perform an upgrade.
Action: You must delete the C3E76E53-F841-11D0-BFA1-08005AB8ED05 registry key.
To delete the registry key, do the following:
Problem: The following warning message is displayed when running the IPSHAHTO program on a server:
"Warning: CONFIG_SYNC with 0xA0 command FAILED on Adapter #"
and one or more HSP/SHS devices are defined on adapter pairs, or READY (RDY) devices are not part of any
logical array configuration on an adapter pair in a cluster.
Action: If all shared disk resources moved successfully when running the IPSHAHTO program, it is safe to ignore
the error message and no further action is required.
If shared disk resources fail to move when running the IPSHAHTO program, perform a low-level format on all
HSP/SHS and RDY devices that are not part of any logical array configuration on an adapter pair in a cluster.
Refer to the instructions on low-level formatting HSP/SHS drives in 'ServeRAID Adapter Installation and User's Guide'
(P/N 4227022) for further details.
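The rule above for deciding which devices need a low-level format can be written as a simple filter. This Python sketch is only a model of the rule: the Device record, its fields, and the devices_to_format function are hypothetical and do not correspond to IPSHAHTO or any ServeRAID interface. It assumes the "not part of any logical array" condition applies to HSP/SHS devices as well as RDY devices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    """Hypothetical description of one physical device on an adapter pair."""
    label: str
    state: str             # e.g. "ONL", "HSP", "SHS", "RDY"
    array: Optional[str]   # array identifier, or None if in no array


def devices_to_format(devices: List[Device], all_resources_moved: bool) -> List[Device]:
    """List devices to low-level format after the CONFIG_SYNC warning."""
    if all_resources_moved:
        # The warning can be ignored; nothing needs formatting.
        return []
    # Assumption: the "no array" condition applies to all three states.
    return [
        d for d in devices
        if d.state in {"HSP", "SHS", "RDY"} and d.array is None
    ]
```

If every shared disk resource moved successfully, the function returns an empty list, matching the statement that no further action is required.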
Problem: Array identifiers and logical drive numbers might change during a failover condition.
Action: By design, the array identifiers and logical drive numbers may change during a failover condition.
Consistency between the merge identifiers and Windows NT sticky drive letters is maintained, while the ordering
process during a failover condition is controlled by the Microsoft Cluster Management Software and the available array
identifiers on the surviving server.