Diagnosis Guide

Error symptoms, responses, and recoveries

Use the following table to diagnose problems with the boot component of PSSP. Locate the symptom and perform the action described in the following table.

If you have a symptom that is not in the table, or the recovery action does not correct the problem, see Information to collect before contacting the IBM Support Center and contact the IBM Support Center.

Table 11. Boot symptoms

Symptom: Node's LED/LCD stops with a value.
Recovery: See Action 1 - Investigate LED/LCD.

Symptom: Error messages are in the console.log file.
Recovery: See Action 2 - Examine the log file.

Symptom: Node never becomes available.
Recovery: See Action 3 - Boot a node in maintenance mode.

Symptom: setup_server fails with an authorization problem.
Recovery: See Action 5 - Check for multiple Boot/Install servers in RRA mode, secure shell mode, or with AIX Authorization for Remote Commands set to "none".

Symptom: Problems with devices, including the hard disk.
Recovery: See Action 4 - Boot a node in diagnostic mode.


Action 1 - Investigate LED/LCD

This is most likely an AIX software or hardware problem. Refer to the entry for the LED/LCD value to determine the cause of the problem. Values that are specific to the RS/6000 SP System appear in SP-specific LED/LCD values. The hardware manuals for your particular node and the AIX messages contain the remaining LED/LCD values. See Other LED/LCD codes.

Follow any diagnostic or repair actions documented for the LED/LCD. Be sure to reinstall the node if the hard disk is replaced. If an Ethernet card or I/O planar is replaced, be sure to follow the procedure in the reconfiguration chapter of PSSP: Installation and Migration Guide for this repair.

After this problem is resolved, the node's boot process should proceed past the failing LED/LCD value and the boot should complete.

Action 2 - Examine the log file

This is an error with a command issued as a result of the boot process, or a subsystem invoked by the boot process. Examine the errors in the console.log. See Console log. Refer to the appropriate chapter in this book for the failing component. Since errors have a cascading effect, correct the earliest problem first, and then note its effect on later failures.

Once the problem is corrected, reboot the node.
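Because errors cascade, locate the earliest failure before anything else. A minimal sketch of that search, using a throwaway sample log (the real log location is described under Console log; the log contents here are invented for illustration):

```shell
# Create a sample log purely to illustrate; on a real node, examine the
# actual console.log described under "Console log".
cat > /tmp/console.log <<'EOF'
boot: starting rc.sp
fsck: error: filesystem check failed on /dev/hd4
mount: cannot mount /usr
EOF

# Print only the first (earliest) error or failure line, with its line number.
grep -niE 'error|fail' /tmp/console.log | head -n 1
```

The later `mount` failure is a downstream effect of the `fsck` error, so the earliest match is the one to chase first.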

Action 3 - Boot a node in maintenance mode

This is a fatal boot error. Check the console.log. See Console log. See the AIX error log for failures. If the node is not accessible via the telnet command or a remote command, try the s1term command to reach the node.

If s1term cannot reach the node, boot the node in maintenance mode by issuing these commands:

  1. spbootins -r maintenance -l node_number
  2. nodecond frame_number slot_number
  3. An s1term window will be opened.
  4. In the s1term window, choose these menu options:
    1. start a limited function maintenance shell
    2. mount the root volume group and start a shell
  5. Examine the console.log and AIX error log, follow diagnostic procedures for the problem, and correct it.

Once the problem is corrected, reboot the node.
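The first two steps above can be wrapped in a small helper. This is a dry-run sketch that only prints the PSSP commands rather than running them; node 12 in frame 1, slot 12 is a hypothetical example:

```shell
# Dry-run sketch of a maintenance-mode boot: spbootins sets the node's boot
# response, nodecond network-boots the node and opens an s1term window.
# This version only echoes the commands so it can be reviewed before use.
boot_maintenance() {
  node_number=$1; frame_number=$2; slot_number=$3
  echo "spbootins -r maintenance -l $node_number"
  echo "nodecond $frame_number $slot_number"
}

boot_maintenance 12 1 12   # hypothetical node/frame/slot
```

Removing the `echo`s would execute the real commands, which should only be done on the control workstation.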

Action 4 - Boot a node in diagnostic mode

You can boot a node from the Ethernet network in diagnostic mode when you want to run diagnostics on any device attached to that node, including the hard disk. When a node is booted in diagnostic mode, it brings up the diagnostics menu, just as if the diag command had been issued from AIX. But, because the node's hard disk is not used as the boot device, you can format, certify, and diagnose the hard disk, and download microcode.

Caution: Formatting destroys all data on that disk.

To boot a node in diagnostics mode, use the spbootins command. For example, to boot node 12 in diagnostics mode, issue:

spbootins -r diag -l 12

After the spbootins command has been issued, the next time the node is network booted, it will boot using the diagnostic image served over the network. The tty console will open on the display, and you will be able to select actions as in the AIX diag command. See AIX 5L Version 5.1 Commands Reference for a full description of the diag command.

A node in Diagnostic mode will NFS mount the Shared Product Object Tree (SPOT) from the boot/install server for use by the diagnostic image on the node. If device support is not present in the SPOT of the boot/install server, the device will not be supported by diagnostics on the node.
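To confirm device support before booting, you can list the filesets installed in the SPOT on the boot/install server with the NIM lslpp operation. This dry-run sketch only prints the command; the SPOT resource name and the device fileset are assumptions chosen for illustration:

```shell
# Dry run: print the NIM command that lists a SPOT's installed filesets,
# filtered for one device fileset. Both names below are hypothetical;
# substitute your SPOT resource and the fileset for the device in question.
SPOT=spot_aix51
FILESET=devices.pci.14100401
echo "nim -o lslpp $SPOT | grep $FILESET"
```

If the fileset is missing, install it into the SPOT on the boot/install server before booting the node in diagnostic mode.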

Action 5 - Check for multiple Boot/Install servers in RRA mode, secure shell mode, or with AIX Authorization for Remote Commands set to "none"

Using multiple Boot/Install servers with Restricted Root Access is not recommended and is not automatically supported by PSSP. However, depending on the size of your system and network loads, it may not be possible to install your SP system with only a single Boot/Install server.

Boot/Install servers are NIM masters and require remote command access to both the control workstation and the nodes that they serve. PSSP does not automatically create the correct entries in the authorization files to allow the remote commands to function.

To use multiple Boot/Install servers, follow this procedure to manually establish the correct authorizations on your system:

  1. On the control workstation, update the authorization files according to the setting of the auth_root_rcmd attribute, adding the corresponding entry:

    An entry for the Boot/Install server node hostname in the /.rhosts file.

    An entry for the Boot/Install server node remote command principal in the /.klogin file.

    An entry for the self-host and the spbgroot principal for the Boot/Install server node.
  2. On the Boot/Install server node, edit the /etc/sysctl.conf file and include these entries:

    The last entry is included only if you are initiating a node install customization.
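As an illustration of the /.rhosts case in step 1, the entry grants root on the named Boot/Install server remote-command access; the hostname below is hypothetical:

```
# /.rhosts on the control workstation (hostname is a placeholder)
bisnode1.example.com root
```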

In order for these changes to take effect, you must stop and restart sysctld on both the control workstation and all Boot/Install servers:

stopsrc -s sysctld
startsrc -s sysctld
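Since the restart must happen on the control workstation and on every Boot/Install server, it can be scripted with the PSSP dsh distributed shell. A dry-run sketch that only prints the commands (the server hostnames are hypothetical):

```shell
# Dry run: print the sysctld restart commands for the control workstation
# and each Boot/Install server. "bis1" and "bis2" are hypothetical hostnames;
# substitute your actual Boot/Install server nodes.
echo "stopsrc -s sysctld; startsrc -s sysctld"   # run locally on the CWS
for node in bis1 bis2; do
  echo "dsh -w $node 'stopsrc -s sysctld; startsrc -s sysctld'"
done
```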
