Recovering from an Unresponsive Controller or Storage Subsystem Condition

Recovery Steps

A storage subsystem can have an Unresponsive Status for several reasons. Use the following steps to determine a possible cause and solution.

Important: It can take up to 5 minutes for the Enterprise Management software to detect that a storage subsystem has become unresponsive or has become responsive again. Before performing the actions below, wait at least that long before concluding that the storage subsystem is still unresponsive.
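If you track subsystem status from a script, the waiting period described above can be made explicit. The following is a minimal sketch only: read_status is a hypothetical callable that you would have to supply, because this topic does not describe any programmatic interface to the Enterprise Management software. The status string "Unresponsive" and the 30-second polling interval are likewise assumptions.

```python
import time
from typing import Callable

# "Up to 5 minutes" comes from the note above; everything else is illustrative.
DETECTION_WINDOW_SECONDS = 5 * 60


def still_unresponsive_after_window(
    read_status: Callable[[], str],  # hypothetical: returns the current status text
    window: float = DETECTION_WINDOW_SECONDS,
    poll_interval: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if the status stays "Unresponsive" for the whole window."""
    deadline = clock() + window
    while True:
        if read_status() != "Unresponsive":
            return False  # status changed; no recovery action is needed yet
        remaining = deadline - clock()
        if remaining <= 0:
            return True  # window elapsed and the status never changed
        sleep(min(poll_interval, remaining))
```

Only when this function returns True would it make sense to start the recovery steps. The clock and sleep parameters exist so that the logic can be exercised without actually waiting five minutes.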

1

Check the Device Tree to see if all storage subsystems in the management domain are unresponsive. If so, check the storage management station network connection to ensure that it can reach the network; otherwise, continue with step 2.

2

Ensure that the controllers are installed and that there is power to the storage subsystem. If there is a problem, correct it. Otherwise:

Storage Subsystem Management Method    Go to...
Directly Managed                       Step 3
Host-Agent Managed                     Step 4

3

For a directly managed storage subsystem:

a

Ensure the controller(s) are network-accessible. To do so, use the ping command to verify that the controller can be reached. Use the form ping <host name or controller IP address>. If verification is successful, continue to step b. If verification is unsuccessful, go to step c.

b

Remove the storage subsystem with the Unresponsive Status from the Enterprise Management Window, then use the Add Device option to add the storage subsystem again. If the storage subsystem returns to Optimal Status, you are finished with this procedure. If the storage subsystem does not return to Optimal Status, complete steps c and d.

c

Check the Ethernet cables to ensure that there is no visible damage, and that they are securely connected.

d

Make sure the appropriate network configuration tasks have been performed (for example, IP addresses assigned to each controller). Refer to the storage management software installation guide for details.

If there is a cable or network accessibility problem, fix the problem; otherwise go to step 5.
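The ping check in step a, and the branch to step b or step c that follows it, can be sketched in a short script. This is an illustration under stated assumptions, not part of the storage management software: it assumes a ping executable is on the PATH, that Windows takes -n for the packet count while other systems take -c, and the function names are made up for this example.

```python
import platform
import subprocess
from typing import List


def build_ping_command(target: str, count: int = 4) -> List[str]:
    """Build a ping invocation for the local OS (Windows uses -n, others use -c)."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", count_flag, str(count), target]


def is_reachable(target: str, count: int = 4, timeout: float = 30.0) -> bool:
    """Return True if ping exits with status 0 for the host name or IP address."""
    try:
        result = subprocess.run(
            build_ping_command(target, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Ping hung or is not installed: treat the target as not verified.
        return False
    return result.returncode == 0


def next_substep(ping_succeeded: bool) -> str:
    """Map the ping result to the next sub-step named in the procedure."""
    return "b" if ping_succeeded else "c"
```

For example, next_substep(is_reachable("<host name or controller IP address>")) returns "b" when the controller answers and "c" when it does not. The exit status is only an approximation of reachability, so read the ping output yourself if the result is surprising.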

4

For a host-agent managed storage subsystem:

a

Ensure the host is network-accessible. To do so, use the ping command to verify that the host can be reached. Use the form ping <host name or IP address>. If verification is successful, continue to step b. If verification is unsuccessful, go to step c.

b

Remove the host with the Unresponsive Status from the Enterprise Management Window, then use the Add Device option to add the host again. If the host returns to Optimal Status, you are finished with this procedure. If the host does not return to Optimal Status, complete steps c through f.

c

Ensure that the host is turned on and operational and that the host adapters have been installed.

d

Check all external cables and switches or hubs to ensure that there is no visible damage and that they are securely connected.

e

Ensure the host-agent software is installed and running. If you started the host system before it was connected to the controllers in the storage subsystem, the host-agent software cannot detect the controllers. In that case, ensure the connections are secure and then restart the host-agent software. Refer to the storage management software installation guide for information on how to restart the host-agent software.

f

If you have recently replaced or added the controller, restart the host-agent software so that the new controller is recognized.

If there is a problem, make the appropriate host modifications; otherwise, continue with step 5.
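Sub-steps c through f are manual checks performed in order. If you record their outcomes, a small helper can report the first one that still needs attention. This is a sketch only: the sub-step labels and descriptions are taken from the procedure above, while the data layout and the function name are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Sub-steps c through f of the host-agent procedure, in the order given above.
HOST_AGENT_CHECKS: List[Tuple[str, str]] = [
    ("c", "host is turned on and operational, with host adapters installed"),
    ("d", "external cables, switches, and hubs are undamaged and securely connected"),
    ("e", "host-agent software is installed and running"),
    ("f", "host-agent software was restarted after a controller was replaced or added"),
]


def first_failed_check(results: Dict[str, bool]) -> Optional[Tuple[str, str]]:
    """Return the first sub-step not confirmed as passing, or None if all passed.

    A sub-step missing from `results` counts as not yet verified.
    """
    for label, description in HOST_AGENT_CHECKS:
        if not results.get(label, False):
            return label, description
    return None
```

A return value of None corresponds to "continue with step 5"; any other value names the sub-step where a host modification is needed.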

5

Check with other administrators to see if a firmware upgrade was performed on the controllers from another storage management station. If a firmware upgrade was performed, the Enterprise Management software on your management station may not be able to locate the new Subsystem Management Window software needed to manage the storage subsystem with the new version of firmware.

If this is the problem, contact technical support; otherwise, continue with step 6.

6

Determine if there is an excessive amount of network traffic to one or more controllers. This is a self-correcting problem, because the Enterprise Management software periodically retries (in the background) to establish communication with the controllers in the storage subsystem. If the storage subsystem was unresponsive and a subsequent attempt to connect to the storage subsystem succeeds, the storage subsystem becomes responsive.

For a directly managed storage subsystem, determine if management operations are taking place on the storage subsystem from other storage management stations. There is a controller-determined limit to the number of TCP/IP connections that can be made to the controller before it stops responding to subsequent connection attempts. The type of management operations being performed and the number of management sessions taking place together determine the number of TCP/IP connections made to a controller. This is a self-correcting problem, because after some TCP/IP connections terminate, the controller then becomes responsive to other connection attempts.
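The self-correcting behavior described above can be observed from a management station by retrying a TCP connection a few times with a pause between attempts. The sketch below is illustrative only: this topic does not state which port the controller listens on, so the port is a required argument that you must supply, and the attempt count and delay are arbitrary defaults.

```python
import socket
import time
from typing import Callable


def wait_until_accepting(
    host: str,
    port: int,  # assumption: supplied by the caller; not documented in this topic
    attempts: int = 5,
    delay: float = 10.0,
    connect_timeout: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry a TCP connection; return True as soon as one attempt succeeds."""
    for attempt in range(attempts):
        try:
            # The connection is closed immediately so the probe does not hold
            # one of the controller's limited TCP/IP connections open.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            if attempt < attempts - 1:
                sleep(delay)  # give other sessions time to terminate
    return False
```

Each probe briefly uses one TCP/IP connection itself, so keep the attempt count low and the delay generous when the controller may already be at its connection limit.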

7

If the storage subsystem is still unresponsive, the controllers are probably faulty. Contact technical support.

Related Topics

Learn About Unresponsive Storage Subsystem Conditions