IBM Books

Diagnosis Guide


Error symptoms, responses, and recoveries

Use the following table to diagnose problems with the node installation component of PSSP. Locate the symptom and perform the action described in the following table.

If you have a symptom that is not in the table, or the recovery action does not correct the problem, see Information to collect before contacting the IBM Support Center and contact the IBM Support Center.

Table 10. Node Installation Symptoms

Symptom | Recovery
setup_server failure. | See Action 1 - Correct NIM environment.
setup_server fails with an authorization problem. | In Diagnosing boot problems, see Action 5 - Check for multiple Boot/Install servers in RRA mode, secure shell mode, or with AIX Authorization for Remote Commands set to none.
nodecond command failure. | See Action 17 - Correct nodecond command.
Node LED/LCD stops at E105. | See Action 19 - Correct network problem.
Node LED/LCD stops at 231/E1F7. | See Action 2 - Correct bootp failure.
Node LED/LCD stops at 233. | See Action 20 - Perform reconfiguration of the node.
Node LED/LCD stops at 260. | See Action 3 - Send boot image to node.
Node LED/LCD stops at c48. | See Action 18 - Correct bosinst_data file.
Node LED/LCD stops at u03. | See Action 4 - Send install_info file to node.
Node LED/LCD stops at u57. | See Action 5 - Send config_info file.
Node LED/LCD stops at u79. | See Action 6 - Send script.cust file.
Node LED/LCD stops at u50. | See Action 7 - Send tuning.cust file.
Node LED/LCD stops at u54. | See Action 8 - Send spfbcheck file.
Node LED/LCD stops at u56. | See Action 9 - Send psspfb_script file to the boot/install server.
Node LED/LCD stops at u58. | See Action 10 - Send psspfb_script file to the control workstation.
Node LED/LCD stops at u80. | See Action 16 - Send PSSP install images.
Node LED/LCD stops at u87. | See Action 15 - Check script.cust file.
Node LED/LCD stops at u62. | See Action 11 - Send spsec_overrides file.
Node LED/LCD stops at u67. | See Action 12 - Send krb.conf file.
Node LED/LCD stops at u68. | See Action 13 - Send krb.realms File.
Node LED/LCD stops at u69. | See Action 14 - Send krb-srvtab file.
Error message indicating that firstboot.cust failed running copy_env_files. | See Action 21 - Check installation with secure remote command option enabled.

Actions

Action 1 - Correct NIM environment

This is a problem configuring the NIM environment. Refer to PSSP: Messages Reference for the specific errors that are issued by the setup_server command. Follow the repair action described.

Once the repair is complete, run setup_server again. The command should now complete successfully.

Action 2 - Correct bootp failure

This is a node bootp failure. Perform the following steps:

  1. Verify that the node's bootp response is set to install by issuing:
    splstdata -b -l node_number
    

    If the node's bootp response is not set properly, reset it using the spbootins command and restart the installation.

  2. On the node's boot/install server, run setup_server and verify that there are no errors. If there are errors, refer to the entries for the message numbers given in PSSP: Messages Reference and follow the instructions there.
  3. Verify that there is an entry created in the /etc/bootptab file on the node's boot/install server. If there is no entry created after completing these steps, see Information to collect before contacting the IBM Support Center and contact the IBM Support Center.
  4. In a separate window, restart the bootp daemon in foreground mode.
    1. Edit /etc/inetd.conf and comment out the entry containing bootpd by inserting a "#" at the beginning of the line.
    2. Refresh inetd by issuing the command: refresh -s inetd.
    3. Start the bootp daemon in the foreground by issuing bootpd -s -d -d -d. The repeated -d flags set the level of trace output.
    4. The bootp daemon will now run and report status and errors to stdout.
  5. In a separate window, open a read-only console to the node by using the s1term command.
  6. Reissue nodecond to network boot the node.
  7. Observe the output from the bootpd command.

Typical problems that may be encountered include:

Once the problem is corrected, reissue the nodecond command and the install should proceed normally.
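Steps 4.1 through 4.3 can be sketched as a small shell helper. This is a minimal sketch, not part of PSSP: the function name comment_out_service is hypothetical, and it assumes that a plain "#" prefix is enough to disable the entry. It keeps a backup of the file it edits. The refresh and bootpd commands in the usage comment are the ones given in the steps above.

```shell
# Comment out every active line in an inetd.conf-style file that contains
# the given string, keeping a backup copy in FILE.bak.
# comment_out_service is a hypothetical helper name, not a PSSP command.
comment_out_service() {
    file=$1
    pattern=$2
    cp "$file" "$file.bak" || return 1
    awk -v pat="$pattern" '
        /^[ \t]*#/     { print; next }           # already commented: keep as is
        index($0, pat) { print "#" $0; next }    # active match: comment it out
                       { print }                 # everything else: unchanged
    ' "$file.bak" > "$file"
}

# Usage on the boot/install server:
#   comment_out_service /etc/inetd.conf bootpd
#   refresh -s inetd
#   bootpd -s -d -d -d
```

After the diagnosis is finished, restoring /etc/inetd.conf.bak and refreshing inetd again would return the daemon to its normal mode.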

Action 3 - Send boot image to node

The problem is that the tftp transfer of the boot image to the node failed. Perform these steps:

  1. On the node's boot/install server, run setup_server and verify that there are no errors. If there are errors, refer to the entries for the message numbers given in PSSP: Messages Reference and follow the instructions there.
  2. Verify that the boot image exists in the /tftpboot directory on the boot/install server. This file is named with the node's long reliable hostname.
  3. Verify that the permissions on the file are 644 or greater.
  4. Verify that the entry for the tftp daemon is not commented out in the /etc/inetd.conf file. If this is the problem, uncomment it and refresh inetd by running refresh -s inetd.
  5. Verify that the /etc/tftpaccess.ctl file on the boot/install server contains a line similar to
    allow:/tftpboot
    

Once the problem is corrected, reissue the nodecond command. The install should proceed normally.
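Steps 2 through 5 can be combined into one check, sketched below. The function name check_tftp_file is hypothetical and not a PSSP command. The sketch reads "644 or greater" as "readable by owner, group, and others", which is an interpretation rather than something the steps state, and it treats any line beginning with tftp as the active inetd.conf entry.

```shell
# Report problems that would prevent a tftp transfer of one file.
# Arguments: file to serve, inetd.conf path, tftpaccess.ctl path, allow entry.
# check_tftp_file is a hypothetical helper name, not a PSSP command.
check_tftp_file() {
    target=$1
    inetd_conf=$2
    access_ctl=$3
    allow_entry=$4
    problems=0

    if [ ! -f "$target" ]; then
        echo "missing: $target"
        problems=1
    else
        # Characters 2, 5, and 8 of the ls mode string are the read bits
        # for owner, group, and others.
        mode=$(ls -ld "$target" | cut -c1-10)
        case $mode in
            ?r??r??r??) ;;
            *) echo "not world-readable: $target ($mode)"; problems=1 ;;
        esac
    fi

    # An active entry starts with "tftp"; a leading "#" means commented out.
    if ! grep -q '^[[:space:]]*tftp[[:space:]]' "$inetd_conf"; then
        echo "no active tftp entry in $inetd_conf"
        problems=1
    fi

    if ! grep -q "^allow:$allow_entry" "$access_ctl"; then
        echo "no allow:$allow_entry line in $access_ctl"
        problems=1
    fi
    return $problems
}

# Usage on the boot/install server (node_hostname is a placeholder):
#   check_tftp_file /tftpboot/node_hostname /etc/inetd.conf \
#       /etc/tftpaccess.ctl /tftpboot
```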

Action 4 - Send install_info file to node

The problem is that the node failed to tftp the install_info file. Perform these steps:

  1. Verify that the node_name.install_info file exists on the node's boot/install server in the /tftpboot directory.

    If it is not present, verify that the node's bootp response is set to install, and run setup_server.

    If there are errors, refer to the entries for the message numbers given in PSSP: Messages Reference and follow the instructions there. If the file still does not exist, see Information to collect before contacting the IBM Support Center and contact the IBM Support Center.

  2. Verify that the permissions on the file are 644 or greater.
  3. Verify that the entry for the tftp daemon is not commented out in the /etc/inetd.conf file. If this is the problem, uncomment it and refresh inetd by running the command refresh -s inetd.
  4. Verify that /etc/tftpaccess.ctl on the boot/install server contains a line similar to
    allow:/tftpboot
    

Once the problem is corrected, the node should move past the failing LED/LCD and continue the installation.

Action 5 - Send config_info file

The problem is that the node failed to tftp the config_info file. Perform these steps:

  1. Verify that the node_name.config_info file exists on the node's boot/install server in the /tftpboot directory.

    If it is not present, verify that the node's bootp response is set to install, and run setup_server.

    If there are errors, refer to the entries for the message numbers given in PSSP: Messages Reference and follow the instructions there. If the file still does not exist, see Information to collect before contacting the IBM Support Center and contact the IBM Support Center.

  2. Verify that the permissions on the file are 644 or greater.
  3. Verify that the entry for the tftp daemon is not commented out in the /etc/inetd.conf file. If this is the problem, uncomment it and refresh inetd by running refresh -s inetd.
  4. Verify that /etc/tftpaccess.ctl on the boot/install server contains a line similar to
    allow:/tftpboot
    

Once the problem is corrected, the node should move past the failing LED/LCD and continue the installation.

Action 6 - Send script.cust file

The problem is that the node failed to tftp the script.cust file. Perform these steps:

  1. Verify that the script.cust file exists on the node's boot/install server in the /tftpboot directory.

    If it is not present, verify that the node's bootp response is set to install, and run setup_server.

    If there are errors, refer to the entries for the message numbers given in PSSP: Messages Reference and follow the instructions there. If the file still does not exist, see Information to collect before contacting the IBM Support Center and contact the IBM Support Center.

  2. Verify that the permissions on the file are 644 or greater.
  3. Verify that the entry for the tftp daemon is not commented out in the /etc/inetd.conf file. If this is the problem, uncomment it and refresh inetd by running refresh -s inetd.
  4. Verify that /etc/tftpaccess.ctl on the boot/install server contains a line similar to
    allow:/tftpboot
    

Once the problem is corrected, the node should move past the failing LED/LCD and continue the installation.

Action 7 - Send tuning.cust file

The problem is that the node failed to tftp the tuning.cust file. Perform these steps:

  1. Verify that the tuning.cust file exists on the node's boot/install server in the /tftpboot directory.

    If it is not present, verify that the node's bootp response is set to install, and run setup_server.

    If there are errors, refer to the entries for the message numbers given in PSSP: Messages Reference and follow the instructions there. If the file still does not exist, see Information to collect before contacting the IBM Support Center and contact the IBM Support Center.

  2. Verify that the permissions on the file are 644 or greater.
  3. Verify that the entry for the tftp daemon is not commented out in the /etc/inetd.conf file. If this is the problem, uncomment it and refresh inetd by running refresh -s inetd.
  4. Verify that /etc/tftpaccess.ctl on the boot/install server contains a line similar to
    allow:/tftpboot
    

Once the problem is corrected, the node should move past the failing LED/LCD and continue the installation.

Action 8 - Send spfbcheck file

The problem is that the node failed to tftp the spfbcheck file. Perform these steps:

  1. Verify that the spfbcheck file exists on the node's boot/install server in the /tftpboot directory.

    If it is not present, verify that the node's bootp response is set to install, and run setup_server.

    If there are errors, refer to the entries for the message numbers given in PSSP: Messages Reference and follow the instructions there. If the file still does not exist, see Information to collect before contacting the IBM Support Center and contact the IBM Support Center.

  2. Verify that the permissions on the file are 644 or greater.
  3. Verify that the entry for the tftp daemon is not commented out in the /etc/inetd.conf file. If this is the problem, uncomment it and refresh inetd by running refresh -s inetd.
  4. Verify that /etc/tftpaccess.ctl on the boot/install server contains a line similar to
    allow:/tftpboot
    

Once the problem is corrected, the node should move past the failing LED/LCD and continue the installation.

Action 9 - Send psspfb_script file to the boot/install server

The problem is that the node failed to tftp the psspfb_script file. Perform these steps:

  1. Verify that the psspfb_script file exists on the node's boot/install server in the /tftpboot directory.

    If it is not present, verify that the node's bootp response is set to install, and run setup_server.

    If there are errors, refer to the entries for the message numbers given in PSSP: Messages Reference and follow the instructions there. If the file still does not exist, see Information to collect before contacting the IBM Support Center and contact the IBM Support Center.

  2. Verify that the permissions on the file are 644 or greater.
  3. Verify that the entry for the tftp daemon is not commented out in the /etc/inetd.conf file. If this is the problem, uncomment it and refresh inetd by running refresh -s inetd.
  4. Verify that /etc/tftpaccess.ctl on the boot/install server contains a line similar to
    allow:/tftpboot
    

Once the problem is corrected, the node should move past the failing LED/LCD and continue the installation.

Action 10 - Send psspfb_script file to the control workstation

The problem is that the node failed to tftp the psspfb_script file. Perform these steps:

  1. Verify that the psspfb_script file exists on the control workstation in the /tftpboot directory.

    If it is not present, verify that the node's bootp response is set to install, and run setup_server.

    If there are errors, refer to the entries for the message numbers given in PSSP: Messages Reference and follow the instructions there. If the file still does not exist, see Information to collect before contacting the IBM Support Center and contact the IBM Support Center.

  2. Verify that the permissions on the file are 644 or greater.
  3. Verify that the entry for the tftp daemon is not commented out in the /etc/inetd.conf file. If this is the problem, uncomment it and refresh inetd by running refresh -s inetd.
  4. Verify that /etc/tftpaccess.ctl on the control workstation contains a line similar to
    allow:/tftpboot
    

Once the problem is corrected, the node should move past the failing LED/LCD and continue the installation.
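Actions 4 through 10 each check for one file in /tftpboot, so the existence checks can be run together. The following is a minimal sketch under that assumption: the function name missing_tftpboot_files is hypothetical, the file list is taken from the action titles above, and node_name stands for the prefix of the install_info and config_info files.

```shell
# Print the name of each expected file that is absent from a directory.
# The file list comes from Actions 4 through 10; node_name is the prefix
# used for the install_info and config_info files.
# missing_tftpboot_files is a hypothetical helper name, not a PSSP command.
missing_tftpboot_files() {
    dir=$1
    node_name=$2
    for f in "$node_name.install_info" "$node_name.config_info" \
             script.cust tuning.cust spfbcheck psspfb_script; do
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# Usage on the boot/install server or control workstation
# (node_name is a placeholder):
#   missing_tftpboot_files /tftpboot node_name
```

An empty result means that every listed file is present. It says nothing about permissions or the tftp daemon, which still need the checks in steps 2 through 4 of each action.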

Action 11 - Send spsec_overrides file

The problem is that the node failed to tftp the spsec_overrides file. Perform these steps:

  1. Verify that the spsec_overrides file exists on the control workstation in the /spdata/sys1/spsec directory.

    If it is not present, verify that the node's bootp response is set to install, and run setup_server.

    If there are errors, refer to the entries for the message numbers given in PSSP: Messages Reference and follow the instructions there. If the file still does not exist, see Information to collect before contacting the IBM Support Center and contact the IBM Support Center.

  2. Verify that the permissions on the file are 644 or greater.
  3. Verify that the entry for the tftp daemon is not commented out in the /etc/inetd.conf file. If this is the problem, uncomment it and refresh inetd by running refresh -s inetd.
  4. Verify that /etc/tftpaccess.ctl on the boot/install server contains a line similar to
    allow:/spdata/sys1/spsec/
    

Once the problem is corrected, the node should move past the failing LED/LCD and continue the installation.

Action 12 - Send krb.conf file

The problem is that the node failed to tftp the krb.conf file. Perform these steps:

  1. Verify that the krb.conf file exists on the node's boot/install server in the /etc directory.

    If it is not present, verify that the node's bootp response is set to install, and run setup_server.

    If there are errors, refer to the entries for the message numbers given in PSSP: Messages Reference and follow the instructions there. If the file still does not exist, see Information to collect before contacting the IBM Support Center and contact the IBM Support Center.

  2. Verify that the permissions on the file are 644 or greater.
  3. Verify that the entry for the tftp daemon is not commented out in the /etc/inetd.conf file. If this is the problem, uncomment it and refresh inetd by running refresh -s inetd.
  4. Verify that /etc/tftpaccess.ctl on the boot/install server contains a line similar to
    allow:/etc/krb.conf
    

Once the problem is corrected, the node should move past the failing LED/LCD and continue the installation.

Action 13 - Send krb.realms File

The problem is that the node failed to tftp the krb.realms file. Perform these steps:

  1. Verify that the krb.realms file exists on the node's boot/install server in the /etc directory.

    If it is not present, verify that the node's bootp response is set to install, and run setup_server.

    If there are errors, refer to the entries for the message numbers given in PSSP: Messages Reference and follow the instructions there. If the file still does not exist, see Information to collect before contacting the IBM Support Center and contact the IBM Support Center.

  2. Verify that the permissions on the file are 644 or greater.
  3. Verify that the entry for the tftp daemon is not commented out in the /etc/inetd.conf file. If this is the problem, uncomment it and refresh inetd by running refresh -s inetd.
  4. Verify that /etc/tftpaccess.ctl on the boot/install server contains a line similar to
    allow:/etc/krb.realms
    

Once the problem is corrected, the node should move past the failing LED/LCD and continue the installation.
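Actions 11 through 13 each expect a different allow line in /etc/tftpaccess.ctl. The sketch below lists the entries that are absent. The function name missing_allow_entries is hypothetical, and the sketch looks only for an exactly matching line, which is stricter than the "a line similar to" wording in the steps. An entry that is reported as missing may therefore still be covered by a broader allow line.

```shell
# Print each required allow entry that has no exactly matching line in a
# tftpaccess.ctl-style file. First argument: file; remaining: entries.
# missing_allow_entries is a hypothetical helper name, not a PSSP command.
missing_allow_entries() {
    ctl=$1
    shift
    for entry in "$@"; do
        # -F: fixed string, -x: whole line must match, -q: no output
        grep -Fxq "allow:$entry" "$ctl" || echo "$entry"
    done
}

# Usage on the boot/install server, with the entries named in
# Actions 11 through 13:
#   missing_allow_entries /etc/tftpaccess.ctl \
#       /spdata/sys1/spsec/ /etc/krb.conf /etc/krb.realms
```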

Action 14 - Send krb-srvtab file

The problem is that the node failed to copy the srvtab file. Perform these steps:

  1. Verify that the node_name-new-srvtab file exists on the control workstation in the /spdata/sys1/k4srvtabs directory. For nodes running levels of PSSP earlier than PSSP 3.2, this file is on the node's boot/install server in the /tftpboot directory.

    If it is not present, verify that the node's bootp response is set to install, and run setup_server.

    If there are errors, refer to the entries for the message numbers given in PSSP: Messages Reference and follow the instructions there. If the file still does not exist, see Information to collect before contacting the IBM Support Center and contact the IBM Support Center.

  2. Verify that the permissions on the file are 644 or greater.
  3. Boot the node, and verify that the /etc/krb-srvtab file exists.

Once the problem is corrected, the node should move past the failing LED/LCD and continue the installation.

Action 15 - Check script.cust file

The problem is that running script.cust is causing a hang condition. Investigate /tftpboot/script.cust. This is a user-supplied script. Look for any problems that might cause the node to hang.

Once the problem is corrected, the node can be reinstalled, and the installation should succeed.

Action 16 - Send PSSP install images

The problem is that a PSSP directory failed to mount. Perform these steps to verify that the directory is exported on the node's boot/install server:

  1. Obtain the node's code version by issuing
    splstdata -b -l node_number
    
  2. Verify that the directory /spdata/sys1/install/pssplpp/code_version exists on the boot/install server and that it contains the PSSP install images.

    If it is not present or is empty, create the directory and place the appropriate PSSP install images in it. Then run setup_server and reinstall the node.

  3. Issue the exportfs command on the boot/install server. In the command output, look for a line for /spdata/sys1/install/pssplpp that contains the failing node's hostname.

    If the line is not present, run setup_server on the node's boot/install server.

    If there are errors, refer to the entries for the message numbers given in PSSP: Messages Reference and follow the instructions there. If the export still does not succeed, see Information to collect before contacting the IBM Support Center and contact the IBM Support Center.

  4. If the export appears in the exportfs output, you may be experiencing an NFS problem. Follow the AIX procedures for diagnosing NFS problems.

Once the problem is corrected, reissue the nodecond command, and the install should proceed normally.
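The check in step 3 can be sketched as a filter over the exportfs output. The function name export_lists_node is hypothetical. The sketch assumes only that the export line begins with the directory path and contains the hostname somewhere after it. It uses a plain substring match, so a hostname that is a prefix of another hostname can give a false positive.

```shell
# Read exportfs-style output on standard input and succeed only if some line
# starts with the given directory and also mentions the given hostname.
# export_lists_node is a hypothetical helper name, not a PSSP command.
export_lists_node() {
    awk -v dir="$1" -v host="$2" '
        index($0, dir) == 1 && index($0, host) > 0 { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage on the boot/install server (node_hostname is a placeholder):
#   exportfs | export_lists_node /spdata/sys1/install/pssplpp node_hostname \
#       || echo "export not found; run setup_server"
```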

Action 17 - Correct nodecond command

The problem is that nodecond was unable to complete the network boot. Consult the log file listed in the error message to determine the exact cause of the error. Typical errors include:

After correcting the problem, reissue the nodecond command. The install should proceed normally.

Action 18 - Correct bosinst_data file

This is a noprompt installation failure. There is a mismatch in the data provided in the noprompt bosinst_data file being used by the node and the actual configuration of the node. The most common mismatch is the specification of a physical disk that does not exist on the node. To determine the nature of the failure, perform the following steps:

  1. Open a write console to the node using the s1term command.
  2. On the console there should be a panel displaying the current choices, and the error that the installation encountered.
  3. If the physical disk specified in the SDR is not present on the system:
    1. Use the spchvgobj command to update the SDR.
    2. Reissue the setup_server command.
    3. Reinstall the node.
  4. If the physical disk specified in the SDR is attached to the node, perform hardware diagnostics to determine why the device is not being configured. After correcting the problem, reinstall the node.

Once the problem is corrected, the node should proceed with the install.

Action 19 - Correct network problem

This generally indicates a network problem. It may be caused by a bad Ethernet card or by an Ethernet adapter being set to an incorrect duplex setting. You can verify the duplex setting on your nodes using the lsattr command. For example:

busio                        Bus I/O address                   False
busintr                      Bus interrupt level               False
intr_priority 3              Interrupt priority                False
tx_que_size   64             TRANSMIT queue size               True
rx_que_size   32             RECEIVE queue size                True
full_duplex   no             Full duplex                       True
use_alt_addr  no             Enable ALTERNATE ETHERNET address True
alt_addr      0x000000000000 ALTERNATE ETHERNET address        True
 

Verify that the full_duplex setting is correct on all nodes for your particular network environment.

If all the adapters are correctly set, perform node diagnostics to determine if there is a bad adapter card in your system. If so, contact IBM Hardware Support to have it replaced. If none of these measures resolve the problem, record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center for further assistance.
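To compare the duplex setting across several nodes, the value column can be extracted from the lsattr output. The following is a minimal sketch: the function name full_duplex_value is hypothetical, and the adapter name ent0 in the usage comment is an assumption that must be replaced with the adapter actually used for installation.

```shell
# Read lsattr -E style output on standard input and print the value column
# of the full_duplex attribute.
# full_duplex_value is a hypothetical helper name, not a PSSP command.
full_duplex_value() {
    awk '$1 == "full_duplex" { print $2 }'
}

# Usage on a node (ent0 is an assumed adapter name):
#   lsattr -E -l ent0 | full_duplex_value
```

If the adapter has no full_duplex attribute, the function prints nothing.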

Action 20 - Perform reconfiguration of the node

This may be caused when a node is replaced with a different type of node without following the procedures documented in PSSP: Installation and Migration Guide. This causes an incorrect setting for the platform type in the NIM client object. If this is the case, follow the procedure for replacing a node with a different type of node in "Reconfiguring the RS/6000 SP System" of PSSP: Installation and Migration Guide. This will cause the NIM client object to be recreated properly.

Action 21 - Check installation with secure remote command option enabled

PSSP 3.4 provides the ability to replace rsh and rcp calls in the PSSP code with secure remote commands and secure remote copy calls.

The root user must be able to run secure remote commands from the control workstation to the nodes without password or passphrase prompts. This normally means that a root public key generated at the control workstation must have been installed on the nodes and the control workstation. In addition, if the boot/install server node for the node is not the control workstation, the boot/install server node's public key must have also been installed on the node.

If secrshell is enabled, and during installation the dsh command fails, you should check that root can issue the secure remote command and secure copy command from the control workstation to the nodes without being prompted for a password or passphrase.

Check that the rcmd_pgm, dsh_remote_cmd, and remote_copy_cmd attributes of the SDR SP_Restricted class are correct and consistent. The command splstdata -e displays the current values of these attributes.

If the system administrator is setting the $RCMD_PGM, $DSH_REMOTE_CMD and $REMOTE_COPY_CMD environment variables to override the setting in the SDR, check these variables to make sure that they are consistent. Use these commands:

echo $RCMD_PGM
echo $DSH_REMOTE_CMD
echo $REMOTE_COPY_CMD

to check that the remote shell command choice is accurate and consistent with the executable defined by the $RCMD_PGM, $DSH_REMOTE_CMD and $REMOTE_COPY_CMD environment variables.
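The three echo commands print a blank line for any variable that is not set, which is easy to overlook. As a small sketch, the hypothetical helper below prints all three variables with labels and marks empty ones explicitly. It only displays the values and does not judge whether they are consistent.

```shell
# Print each override variable with its value, or "(unset)" when it is empty,
# so the three settings can be compared side by side.
# show_remote_cmd_overrides is a hypothetical helper name, not a PSSP command.
show_remote_cmd_overrides() {
    printf 'RCMD_PGM=%s\n' "${RCMD_PGM:-(unset)}"
    printf 'DSH_REMOTE_CMD=%s\n' "${DSH_REMOTE_CMD:-(unset)}"
    printf 'REMOTE_COPY_CMD=%s\n' "${REMOTE_COPY_CMD:-(unset)}"
}
```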

If the dsh_remote_cmd or remote_copy_cmd attributes of the SDR SP_Restricted class are null, the remote command and remote copy methods used must be in or linked to the /bin directory. For example, if rcmd_pgm=rsh and dsh_remote_cmd and remote_copy_cmd are null, then the executables /bin/rsh and /bin/rcp must exist.

If root cannot issue a secure command to the node without being prompted for a password or passphrase, this will cause a secure remote command install of the nodes to fail. Check the following:

