NETFINITY FIBRE: RECOVERY PROCEDURES FOR FAILED DRIVES


Subject: NETFINITY FIBRE: RECOVERY PROCEDURES FOR FAILED DRIVES
New Netfinity server RETAIN Tip: 

    Record number:       H201156
    Device:              D/T8681
    Model:               M
    Hit count:           UHC00000
    Success count:       USC0000
    Publication code:    PC50
    Tip key:
    Date created:        O99/12/08
    Date last altered:   A99/12/30
    Owning B.U.:         USA


Abstract: NETFINITY FIBRE: RECOVERY PROCEDURES FOR FAILED DRIVES


SYMPTOM:

The Netfinity Fibre Channel RAID Controller has at least one failed drive.


PROBLEM ISOLATION AIDS:


FIX:

NOTICE: The following information applies only to drives that are part of the same LUN.

REMEMBER: With Fibre Channel, it is not always obvious which drives belong to which LUN.
The problem is compounded when the system has more than one failed drive.

If there is any doubt, contact the HelpCenter for your geography
(for example, in the U.S., contact the IBM HelpCenter at 1-800-772-2227) before taking any corrective action.

Note: The "Worldwide HelpCenter Phone List" is available at the following URL:

http://www.pc.ibm.com/support?page=YAST-3P2QYL

******************** Important Guidelines ********************

DRIVE REPLACEMENT (rebuilding a defunct drive):


The proper power sequences are:

Power Up: Power on the drive enclosures first, then the fibre controller, then the host system.
Power Down: Power off the host system first, then the fibre controller, then the drive enclosures.
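
The power-down sequence is the power-up sequence in reverse. A minimal Python sketch of that relationship follows, for illustration only. The component labels come from the sequences above; the function name is a hypothetical helper and not part of any IBM utility.

```python
# Illustrative only: labels mirror the sequences described in this tip.
# power_down_order is a hypothetical helper, not an IBM tool.
POWER_UP_ORDER = ["drive enclosures", "fibre controller", "host system"]


def power_down_order(power_up_order):
    """Return the power-down order, which is the power-up order reversed."""
    return list(reversed(power_up_order))


if __name__ == "__main__":
    print("Power up:  ", " -> ".join(POWER_UP_ORDER))
    print("Power down:", " -> ".join(power_down_order(POWER_UP_ORDER)))
```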



NOTICE: Before you rebuild a drive, review the following guidelines and general information.


GUIDELINES FOR THE REBUILD OPERATION:

REMEMBER: The replacement hard disk drive must have a capacity equal to or greater than the failed drive.

GENERAL INFORMATION (about the rebuild operation):

A physical hard disk drive can enter the rebuild state if:

You physically replace a failed drive that is part of a critical logical drive.

When you physically replace a failed drive in a critical logical drive, the
Netfinity Fibre controller rebuilds the data on the new physical drive before
it changes the logical drive state back to Optimal.

Recovering from a MULTIPLE Drive Failure:

The Netfinity Fibre controller will rebuild a defunct drive automatically when either of the following conditions exists:

A hot-spare drive with a capacity equal to or greater than the capacity of
the defunct drive is available the moment the drive fails.

A replacement drive of equal or greater capacity is installed in the same location, and
the LUN is in a sufficient state (degraded) to rebuild the data.
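
Each of the conditions above can be expressed as a simple check. The following Python sketch is illustrative only: the `Drive` class, the function names, the megabyte capacity unit, and the lowercase state string "degraded" are assumptions made for this example and do not correspond to any controller interface.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    """Hypothetical record for a physical drive (capacity unit is assumed)."""
    capacity_mb: int


def hot_spare_can_cover(failed, hot_spares):
    """True if some hot spare is at least as large as the failed drive."""
    return any(spare.capacity_mb >= failed.capacity_mb for spare in hot_spares)


def replacement_can_rebuild(failed, replacement, same_location, lun_state):
    """True if a replacement drive meets the conditions described above:
    installed in the same location, capacity equal to or greater than the
    failed drive, and the LUN in the degraded state."""
    return (
        same_location
        and replacement.capacity_mb >= failed.capacity_mb
        and lun_state == "degraded"
    )
```

For example, a 9100 MB failed drive would be covered by a 9100 MB or larger hot spare, but not by a 4500 MB one.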

When the Netfinity Fibre controller communicates with a hard disk drive and receives an unexpected response, the controller marks the drive as failed in order to avoid any potential data loss.
For example, this can occur after a power loss to any of the components in the SCSI Netfinity Fibre subsystem.
In this case, the Netfinity Fibre controller errs on the side of safety and no longer writes to that drive, even though the drive may not be defective in any way.


For multiple failed drives in a RAID 1 or RAID 5 array, data is lost. Data recovery may be attempted by bringing all but the first drive that was marked failed back to the Optimal state.

Please contact the HelpCenter for help in determining the failure order of the drives.

  1.  Undefine any hot-spare (HSP) drives. An automatic rebuild may destroy the data if the drives are revived in the wrong order.
  2.  Revive all but the first drive to fail (chronologically) in the array.
     The LUN should then be in a degraded state. If this cannot be determined, please contact the HelpCenter for assistance.
  3.  From the operating system, run CHKDSK in read-only mode.
     This verifies that the drives have been revived in the correct order and that the data is intact.

    NOTE: If CHKDSK does not complete successfully, you may have revived the drives in the wrong order.

     Do NOT remove power from the controller or the drives.
     Contact the HelpCenter immediately for assistance in attempting to recover the data.

  4.  Back up the data, which is now known to be good.
     This ensures that if the problem recurs before the root cause of the failure is found, a minimal amount of data is lost.
  5.  Once the backup has completed, reconstruct the last drive in the LUN.
     This should return the LUN to Optimal status and will better protect the data until the root cause can be found.
  6.  Once the rebuild completes, identify the cause of the failed drives, such as a bad cable or backplane.
     If the reconstruction fails, the last drive may be the source of the error.

    NOTE: If you do not know the order in which the drives went offline, do not revive any drives until you have established the order in which they failed.
     Do NOT remove power from the controller or the drives.

  7.  If no cause can be found, contact the HelpCenter for assistance in determining root cause.
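
The selection rule in step 2 can be sketched as follows. This Python example is illustrative only: the function name and the drive identifiers are hypothetical, and the failure order is assumed to have been established already (with the HelpCenter's help if needed). The sketch deliberately refuses to produce a plan when the order is unknown, in keeping with the NOTE above.

```python
def plan_revive(failure_order):
    """Split failed drives into (first_failed, drives_to_revive).

    failure_order: drive identifiers listed earliest failure first.
    The first drive to fail is left failed (it is reconstructed later,
    after the backup); every other failed drive is revived.
    """
    if not failure_order:
        raise ValueError("failure order unknown: do not revive any drives")
    if len(set(failure_order)) != len(failure_order):
        raise ValueError("each drive may appear only once in the failure order")
    if len(failure_order) < 2:
        raise ValueError("procedure applies only to multiple failed drives")
    first_failed = failure_order[0]
    drives_to_revive = list(failure_order[1:])
    return first_failed, drives_to_revive


if __name__ == "__main__":
    # Hypothetical identifiers, earliest failure first.
    first, revive = plan_revive(["drive-3", "drive-1", "drive-5"])
    print("Leave failed, reconstruct after backup:", first)
    print("Revive:", ", ".join(revive))
```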

