If several node icons in a frame report a failure (either the nodes are not responding or several adapters are inactive), there may be a network problem.
If the failing nodes or communication adapters are on the same Local Area Network (LAN), verify the LAN hardware. If you determine that the hardware is functioning properly, call the IBM Support Center. Otherwise, follow local procedures for servicing your hardware.
If the nodes are not on the same LAN, diagnose the nodes individually as described in Action 2 - Diagnose individual nodes.
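The triage above can be sketched as a small shell helper. It assumes a hypothetical input format, one failing node per line written as "node_name lan_name"; PSSP does not produce this list, so you would build it from your own network records. The function name triage_failures is also illustrative.

```shell
#!/bin/sh
# Hypothetical helper: reads "node_name lan_name" pairs on stdin and reports
# which branch of the procedure applies. The input format is an assumption.
triage_failures() {
    awk '
        NF >= 2 { lans[$2] = 1; n++ }
        END {
            count = 0
            for (l in lans) count++
            if (n == 0)          print "no-failures"
            else if (n == 1)     print "single-node: diagnose the node individually"
            else if (count == 1) print "same-lan: verify the LAN hardware"
            else                 print "different-lans: diagnose nodes individually"
        }'
}
```

For example, `printf 'node01 lanA\nnode02 lanA\n' | triage_failures` reports that the LAN hardware should be verified first.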
If an individual node icon in a frame reports a failure, use the SP Hardware Perspective to display the Nodes Status page in the Node notebook for the failing node.
If it shows that the node power is off, turn the node's power on.
If it shows that the node power is on or if the problem persists, call IBM hardware support.
If a node is not responding to a network command, you can access the node through its tty. One way is to select the node in the SP Hardware Perspective and perform an open tty action on it. Another way is to issue the command:
s1term -w frame_number slot_number
where frame_number is the frame number of the node and slot_number is the slot number of the node.
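As a minimal sketch, the s1term invocation above can be wrapped in a function that checks its arguments before the command is issued. The function name s1term_cmd is hypothetical, and the function only prints the command rather than running it, so it can be tried on a machine where PSSP is not installed.

```shell
#!/bin/sh
# Hypothetical wrapper: builds the s1term command line after checking that
# the frame and slot arguments consist only of digits. It prints the command
# instead of executing it.
s1term_cmd() {
    frame=$1
    slot=$2
    for value in "$frame" "$slot"; do
        case $value in
            ''|*[!0-9]*)
                echo "usage: s1term_cmd frame_number slot_number" >&2
                return 1
                ;;
        esac
    done
    echo "s1term -w $frame $slot"
}
```

For a node in frame 1, slot 5, `s1term_cmd 1 5` prints `s1term -w 1 5`, which can then be issued on the control workstation.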
Using either method, you can log in to the node and check the hostname, network interfaces, network routes, and hostname resolution to determine why the node is not responding. The Appendix "IP Address and Host Name Changes for SP Systems" in PSSP: Administration Guide contains a procedure for changing hostnames and IP addresses.
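The four checks named above could be gathered into one pass, sketched below. The choice of commands (hostname, ifconfig -a, netstat -rn, and host) is an assumption; the text names only what to check, not which commands to use. The function names run_check and check_node_network are illustrative.

```shell
#!/bin/sh
# Run on the node after logging in through the tty. Each check prints a
# labeled section and is skipped with a note if the command is not installed.
run_check() {
    label=$1
    shift
    echo "== $label =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "($1 exited with a non-zero status)"
    else
        echo "($1 not found, skipped)"
    fi
}

# Assumed command choices for the four items the procedure lists.
check_node_network() {
    run_check "hostname" hostname
    run_check "network interfaces" ifconfig -a
    run_check "network routes" netstat -rn
    run_check "hostname resolution" host "$(hostname 2>/dev/null)"
}
```

Calling check_node_network prints the four sections in order, so a wrong hostname, a missing interface address, a missing route, or a failed name lookup can be spotted in one listing.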
If you are using IPv6 alias addresses, verify that each network (Ethernet or token ring) adapter on the affected system has a valid IPv4 address defined. To verify the adapter IP addresses on the control workstation and nodes, run the SYSMAN_test command. This command issues an error message if the node does not have valid IPv4 addresses for all Ethernet and token ring adapters that are used by the SP system. See Verifying System Management installation.
If the ping and telnet commands are successful, but hostResponds still shows Node Not Responding, there may be something wrong with the Topology Services (hats) subsystem. Perform these steps: