Impending Drive Failure (High Data Availability Risk)

What Caused the Problem?

A drive is reporting internal errors that could cause the drive to fail. If this drive fails before you follow these recovery steps, the logical drives in the array will fail and all data on the logical drives will be lost. The Recovery Guru Details area provides specific information you will need as you follow the recovery steps.
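The risk described above comes down to counting failures. As a rough illustration only, the Python sketch below assumes that RAID 0 has no redundancy and that RAID 1, 3, and 5 are each guaranteed to survive only one drive failure. The names TOLERATED_FAILURES and survives_another_failure are hypothetical and are not part of the storage management software.

```python
# Assumption: guaranteed number of drive failures each RAID level can absorb.
TOLERATED_FAILURES = {0: 0, 1: 1, 3: 1, 5: 1}


def survives_another_failure(raid_level, drives_already_failed):
    """Return True if the array would still hold data after one more drive fails."""
    return drives_already_failed + 1 <= TOLERATED_FAILURES[raid_level]


# Optimal RAID 0: no drive has failed yet, but there is no redundancy at all.
print(survives_another_failure(0, 0))
# Degraded RAID 5: one drive has already failed, so the redundancy is used up.
print(survives_another_failure(5, 1))
```

Under these assumptions both calls report that the array would not survive, which is why every case covered by this topic is treated as a high data availability risk.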

Caution
Resolve this problem immediately, because data will be lost if the indicated drive fails before you follow these recovery steps.

Caution
Electrostatic discharge can damage sensitive components. Use a grounding wrist strap or other anti-static precautions before removing or handling components.

Important Notes

If the current status/RAID level of the logical drives is...             Then go to...

Optimal RAID 0                                                           Recovering RAID 0
Degraded RAID 1, 3, or 5                                                 Recovering Degraded Logical Drives
RAID 1, 3, or 5 with a hot spare drive currently being reconstructed     Recovering with a Reconstructing Hot Spare
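The selection above can also be read as a small decision function. The following Python sketch is illustrative only: choose_procedure and its parameters are hypothetical names, and the status strings are taken from this topic rather than from any programming interface.

```python
REDUNDANT_LEVELS = (1, 3, 5)


def choose_procedure(status, raid_level, hot_spare_reconstructing=False):
    """Map the logical drive state to the name of the procedure to follow."""
    if raid_level in REDUNDANT_LEVELS and hot_spare_reconstructing:
        return "Recovering with a Reconstructing Hot Spare"
    if raid_level == 0 and status == "Optimal":
        return "Recovering RAID 0"
    if raid_level in REDUNDANT_LEVELS and status == "Degraded":
        return "Recovering Degraded Logical Drives"
    # Any other combination is not covered by this topic.
    raise ValueError(f"no procedure listed for {status} RAID {raid_level}")


print(choose_procedure("Degraded", 5))
```

The hot spare case is checked first in this sketch because an array with a reconstructing hot spare would otherwise also match the degraded case.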

Recovering RAID 0

Use the following procedure if the affected logical drives are RAID 0.

Recovery Steps

1

Stop all I/O to the affected logical drives.

2

Back up all data on the affected logical drives. (Step 4 will destroy all data on the affected logical drives.)

Note: To the operating system (OS), a failed logical drive is exactly the same as a failed non-RAID drive. Refer to the OS documentation for any special requirements concerning failed drives and perform them where necessary.

3

Note: If you have FlashCopy logical drives associated with the affected logical drives, these FlashCopy logical drives will no longer be valid once you fail the drive in step 4.

Perform any necessary operations on the FlashCopy logical drives and then delete them.

4

Highlight the affected drive in the Physical View of the Subsystem Management Window and select Drive>>Fail. The affected logical drives become Failed.

5

Remove the failed drive (its fault indicator light should be on).

6

Wait 30 seconds, then insert the new drive. Its fault indicator light may be lit for a short time (one minute or less).

Note: Wait until the new drive is ready (its fault indicator light must be off) before attempting to initialize the logical drives in step 7.

7

Highlight the array associated with the replaced drive in the Logical View of the Subsystem Management Window and select Array>>Initialize.

  • The logical drives in the array are initialized, one at a time.
  • To monitor initialization progress for a logical drive, highlight the logical drive in the Logical View of the Subsystem Management Window and select Logical Drive>>Properties. Note that once the operation in progress has completed, the progress bar is no longer displayed in the Properties dialog.
  • When initialization is completed, all logical drives in the array are Optimal.

Note: Save this procedure by selecting Save As. Once you perform step 7 and the failure is fixed, you will not be able to access the information in steps 10 and 11 from the Recovery Guru.

8

If desired, re-create any FlashCopy logical drives that you deleted in step 3.

9

Select Recheck to rerun the Recovery Guru to ensure that the failure has been fixed.

10

Add the affected logical drives back to the operating system. You may need to reboot the system to see the re-initialized logical drives.

Note: Do not start I/O to these logical drives until after you restore from backup.

11

Restore the data for the affected logical drives from backup.
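Because step 4 of the procedure above destroys the data, steps 1 through 3 act as preconditions for it. The Python sketch below is one hedged way to express that check. The function blockers_before_failing_drive and its parameters are hypothetical, and the caller is assumed to know the three facts it asks for.

```python
def blockers_before_failing_drive(io_stopped, backup_complete, flashcopy_count):
    """List the reasons it is not yet safe to select Drive>>Fail (step 4)."""
    blockers = []
    if not io_stopped:
        blockers.append("step 1: I/O to the affected logical drives is still running")
    if not backup_complete:
        blockers.append("step 2: the data has not been backed up")
    if flashcopy_count > 0:
        blockers.append("step 3: associated FlashCopy logical drives still exist")
    return blockers


# Example: I/O is stopped and FlashCopy logical drives are gone, but no backup yet.
for reason in blockers_before_failing_drive(True, False, 0):
    print("Not ready:", reason)
```

An empty list means that none of the three preconditions is outstanding.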

Recovering Degraded Logical Drives

Use the following procedure if the affected logical drives are degraded RAID 1, 3, or 5. You will need two replacement drives for this procedure.

Caution

An Impending Drive Failure means that the affected drive is likely to fail. If it fails while you are replacing the failed drive (steps 3 and 4 below), you will lose all data on the affected logical drives.

Recovery Steps

1

Although it is not required, you should stop all I/O to the affected logical drives to reduce the possibility of data loss.

2

Although it is not required, you should back up all data on the affected logical drives.

3

Remove the failed drive (the drive currently failed, not the drive with the Impending Drive Failure). The fault indicator light for the drive should be on.

4

Wait 30 seconds, then insert the new drive. Its fault indicator light may be lit for a short time (one minute or less).

  • Data reconstruction should begin on the new drive (its fault light will go off and the activity indicator lights of the other drives in the array will start blinking).
  • The icon for each logical drive in the affected array changes to Optimal after the logical drive is reconstructed.
  • To monitor the progress of reconstruction on the affected logical drives or change the reconstruction rate, highlight the logical drive in the Logical View of the Subsystem Management Window; then, select Logical Drive>>Properties. Note that once the operation in progress has completed, the progress bar is no longer displayed in the Properties dialog.
  • If the drive with the Impending Drive Failure fails while data is being reconstructed on the replaced drive, all affected logical drives will fail, and all data will be lost. If this occurs, rerun Recovery Guru and fix the errors reported.

5

Wait until all affected logical drives have returned to an Optimal status. Resume I/O to the affected logical drives, if you stopped it in step 1.

6

Highlight the Impending Drive Failure drive in the Physical View of the Subsystem Management Window and select Drive>>Fail. The logical drives in the array return to a Degraded state.

7

Remove the failed drive (its fault indicator light should be on).

8

Wait 30 seconds, then insert the new drive. Its fault indicator light may be lit for a short time (one minute or less).

9

Select Recheck to rerun the Recovery Guru to ensure that the failure has been fixed.
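The wait in steps 4 and 5 above (do not fail the second drive until every affected logical drive is Optimal again) can be pictured as a polling loop. The Python sketch below is illustrative only. It assumes a hypothetical reader-supplied callable, get_statuses, that returns a mapping from logical drive name to a status string such as "Degraded", "Optimal", or "Failed"; the storage management software is not assumed to provide such a function.

```python
import time


def wait_until_optimal(get_statuses, poll_seconds=30.0, timeout_seconds=None,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until every logical drive is Optimal; stop early if any has Failed."""
    start = clock()
    while True:
        statuses = get_statuses()
        if any(status == "Failed" for status in statuses.values()):
            # The drive with the Impending Drive Failure failed during reconstruction.
            raise RuntimeError("a logical drive failed; rerun the Recovery Guru")
        if statuses and all(status == "Optimal" for status in statuses.values()):
            return statuses
        if timeout_seconds is not None and clock() - start >= timeout_seconds:
            raise TimeoutError("logical drives did not return to Optimal in time")
        sleep(poll_seconds)


# Example with canned readings in place of a real status source.
readings = iter([{"LD1": "Degraded"}, {"LD1": "Optimal"}])
print(wait_until_optimal(lambda: next(readings), poll_seconds=0))
```

The sleep and clock parameters exist only so that the loop can be exercised without real delays.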

Recovering with a Reconstructing Hot Spare

Use the following procedure if the affected logical drives are RAID 1, 3, or 5 and a hot spare drive is currently being reconstructed.

Caution

An Impending Drive Failure means that the affected drive is likely to fail. If it fails while the hot spare is reconstructing, you will lose all data on the affected logical drives. For this reason, you should stop all I/O to the affected logical drives and back up all data on the affected logical drives before replacing the drives.

Recovery Steps

1

Although it is not required, you should stop all I/O to the affected logical drives to reduce the possibility of data loss.

2

Although it is not required, you should back up all data on the affected logical drives.

3

Wait for the hot spare drive to finish reconstructing.

  • All logical drive icons will be Optimal when reconstruction is completed.
  • If the affected drive fails during reconstruction, the affected logical drives will fail, and all data will be lost. If this occurs, rerun the Recovery Guru and fix the errors reported, starting with the drive with the Impending Drive Failure.

4

Select Recheck to rerun the Recovery Guru to ensure that the failure has been fixed.