Use the following table to diagnose problems with the PSSP software for all
servers. Locate the symptom and perform the action described in this
table.
Table 86. SP-attached Server and Clustered Enterprise Server symptoms
Symptom | Recovery |
---|---|
The ssp.basic file set is not installed. | See Action 1 - Install ssp.basic file set. |
An external hardware daemon or its directory has not been created on the control workstation. | See Action 2 - Check permissions of ssp.basic. |
The SDR Frame object was not created for a server. | See Action 3 - Correct SDR Frame object. |
The SDR Node object was not created for a server. | See Action 4 - Correct SDR Node object. |
The hardmon daemon is not running. | See Action 5 - Correct System Monitor polling interval. |
The external hardware daemon is not running or not responding. | See Action 6 - Investigate external hardware daemon failure. |
The SAMI communication is not available. | See Action 7 - Restore SAMI communication. |
The S1 communication is not available. | See Action 8 - Restore S1 communication. |
The switch port number for the server is not set correctly. | See Action 9 - Correct switch port number in SDR Syspar_map object. |
Action 1 - Install ssp.basic file set
Perform Installation test 1 - Verify the ssp.basic file set to confirm that this is the problem. If it is, install the ssp.basic file set using the installp command, and then repeat Installation test 1 - Verify the ssp.basic file set.
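The check-then-install sequence of Action 1 can be sketched as a small shell function. This is a minimal sketch, assuming a POSIX shell on the control workstation; the function name `ensure_ssp_basic` is hypothetical, and the installation source (a device or directory holding the PSSP images) is site-specific, so it is passed in as an argument rather than assumed. It relies on `lslpp -l` returning a non-zero status when the file set is not installed.

```sh
#!/bin/sh
# Hypothetical helper: install ssp.basic only if lslpp reports it missing.
# $1 = installation device or directory (site-specific; an assumption).
ensure_ssp_basic() {
    src=$1
    if [ -z "$src" ]; then
        echo "usage: ensure_ssp_basic <device-or-directory>" >&2
        return 2
    fi
    # lslpp exits non-zero when the file set is not installed.
    if lslpp -l ssp.basic >/dev/null 2>&1; then
        echo "ssp.basic is already installed"
        return 0
    fi
    # -a apply, -c commit, -g include prerequisites,
    # -X expand file systems if needed, -d installation source.
    installp -acgXd "$src" ssp.basic
}
```

For example, `ensure_ssp_basic /dev/cd0` installs from the first CD-ROM drive. Afterwards, repeat Installation test 1 to confirm the result.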
Action 2 - Check permissions of ssp.basic
Perform Installation test 2 - Check external hardware daemon to confirm that this is the problem. If it is, install the ssp.basic file set using the installp command. If any of the permissions or file attributes do not match those shown in Installation test 2 - Check external hardware daemon, issue the chmod or chown command, as appropriate, to correct them. Then repeat Installation test 1 - Verify the ssp.basic file set and Installation test 2 - Check external hardware daemon.
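Comparing a file's attributes with the expected values can be scripted before running chmod or chown. The sketch below is illustrative only: the function `check_attrs` is hypothetical, and because the expected modes and owners listed in Installation test 2 are not reproduced in this section, they are passed in as arguments instead of being hard-coded. It parses `ls -ld` output and compares only the first 10 characters of the mode field, so that ACL or security-context markers some systems append are ignored.

```sh
#!/bin/sh
# Hypothetical helper: compare a path's mode string and owner with the
# values listed in Installation test 2 (supplied by the caller).
# Usage: check_attrs <path> <expected-mode-string> <expected-owner>
check_attrs() {
    path=$1 want_mode=$2 want_owner=$3
    if [ ! -e "$path" ]; then
        echo "MISSING $path"
        return 1
    fi
    # ls -ld prints: <mode> <links> <owner> <group> ... <path>
    set -- $(ls -ld "$path")
    mode=$(printf '%s' "$1" | cut -c1-10)
    owner=$3
    result=0
    if [ "$mode" != "$want_mode" ]; then
        echo "MODE $path: $mode (expected $want_mode)"
        result=1
    fi
    if [ "$owner" != "$want_owner" ]; then
        echo "OWNER $path: $owner (expected $want_owner)"
        result=1
    fi
    return $result
}
```

Any `MODE` or `OWNER` line it prints identifies an attribute to correct with chmod or chown.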
Action 3 - Correct SDR Frame object
Perform Configuration test 1 - Check SDR Frame object to confirm that this is the problem. Perform these steps:
If the splstdata command returns the message:
splstdata: 0022-001 The repository cannot be accessed. Return code was 80.
refer to Diagnosing SDR problems to determine why the SDR cannot be accessed.
For S70, S7A, and S80 servers, the value must be SAMI.
After correcting the problem, repeat Configuration test 1 - Check SDR Frame object.
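When working through the steps above, it helps to separate an SDR access failure from a genuinely missing Frame object. This minimal sketch, with the hypothetical function name `check_sdr_frame_access`, runs `splstdata -f` and looks for message 0022-001 in its combined output; it assumes nothing about the normal output format and simply passes it through.

```sh
#!/bin/sh
# Hypothetical helper: run splstdata -f and report whether the output
# contains message 0022-001 ("The repository cannot be accessed"),
# which points to an SDR problem rather than a missing Frame object.
check_sdr_frame_access() {
    out=$(splstdata -f 2>&1)
    case $out in
        *0022-001*)
            echo "SDR not accessible: see Diagnosing SDR problems"
            return 1 ;;
    esac
    printf '%s\n' "$out"
}
```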
Action 4 - Correct SDR Node object
Perform Configuration test 2 - Check SDR Node object to confirm that this is the problem. Perform these steps:
If the splstdata command returns the message:
splstdata: 0022-001 The repository cannot be accessed. Return code was 80.
refer to Diagnosing SDR problems to determine why the SDR cannot be accessed.
After correcting the problem, repeat Configuration test 2 - Check SDR Node object.
Action 5 - Correct System Monitor polling interval
Perform Operational test 1 - Check hardmon status to confirm that this is the problem. Perform these steps:
If hardmon is not active, start it by issuing the command:
startsrc -s hardmon
Then perform Operational test 1 - Check hardmon status again to determine whether hardmon started successfully.
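Checking the status and starting the daemon can be combined into one sketch. It assumes the usual `lssrc -s` layout (a header line, then one line per subsystem with the status in the last column); the function names `hardmon_status` and `ensure_hardmon` are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print the SRC status of hardmon ("active",
# "inoperative", ...) from the last field of the second lssrc line.
hardmon_status() {
    lssrc -s hardmon | awk 'NR == 2 { print $NF }'
}

# Start hardmon through the SRC if it is not already active.
ensure_hardmon() {
    status=$(hardmon_status)
    if [ "$status" = "active" ]; then
        echo "hardmon is active"
        return 0
    fi
    echo "hardmon is ${status:-unknown}; starting it"
    startsrc -s hardmon
}
```

After `ensure_hardmon` starts the daemon, rerun `hardmon_status` (or Operational test 1) to see whether it stayed active.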
To check the arguments that hardmon is started with, issue the command:
odmget -q subsysname=hardmon SRCsubsys
The output is similar to the following, which is the default:
SRCsubsys:
	subsysname = "hardmon"
	synonym = ""
	cmdargs = "-r 5"
	path = "/usr/lpp/ssp/bin/hardmon"
	uid = 0
	auditid = 0
	standin = "/dev/console"
	standout = "/dev/console"
	standerr = "/dev/console"
	action = 1
	multi = 0
	contact = 2
	svrkey = 0
	svrmtype = 0
	priority = 20
	signorm = 15
	sigforce = 15
	display = 1
	waittime = 15
	grpname = ""
If the cmdargs attribute is not "-r 5", correct this by issuing the following command:
chssys -s hardmon -a "-r 5"
Then reissue the odmget command to verify that the new cmdargs attribute is "-r 5".
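The cmdargs check and correction can also be scripted. This sketch assumes odmget prints stanza output with one `attribute = "value"` pair per line, and uses only the odmget and chssys commands given in this action; `odm_cmdargs` and `check_hardmon_cmdargs` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical helper: extract the cmdargs value from odmget stanza
# output read on standard input.
odm_cmdargs() {
    sed -n 's/^[[:space:]]*cmdargs = "\(.*\)"[[:space:]]*$/\1/p'
}

# Reset hardmon's arguments to the default "-r 5" if they differ.
check_hardmon_cmdargs() {
    args=$(odmget -q subsysname=hardmon SRCsubsys | odm_cmdargs)
    if [ "$args" = "-r 5" ]; then
        echo "cmdargs OK"
        return 0
    fi
    echo "cmdargs is \"$args\"; resetting to \"-r 5\""
    chssys -s hardmon -a "-r 5"
}
```

Running `check_hardmon_cmdargs` a second time acts as the verification step, because it rereads the ODM entry.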
After correcting the problem, repeat Operational test 1 - Check hardmon status.
Action 6 - Investigate external hardware daemon failure
Perform Operational test 2 - Check external hardware daemons to determine whether the external hardware daemon is running, and Operational test 3 - Check frame responsiveness to determine whether it is responding. If either test produces an error, perform these steps:
The messages in the SP hardware log are also recorded in the AIX error log. To obtain full details of all SP hardware messages in this log, issue the command:
errpt -aN sphwlog
You may want to redirect the output to a file, because the output can be large.
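Redirecting the detailed report to a file might look like the following sketch. The function name `save_sphwlog` and the default path `/tmp/sphwlog.txt` are illustrative assumptions; choose any location with enough free space.

```sh
#!/bin/sh
# Hypothetical helper: save detailed SP hardware entries from the AIX
# error log to a file and report how many lines were written.
save_sphwlog() {
    outfile=${1:-/tmp/sphwlog.txt}
    errpt -aN sphwlog > "$outfile" || return
    lines=$(wc -l < "$outfile" | tr -d ' ')
    echo "$lines lines written to $outfile"
}
```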
Issue the command:
hmcmds -G boot_supervisor F:0
where F is the frame number of the server. This notifies the System Monitor that the external hardware daemon has stopped. The System Monitor then starts the daemon.
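To avoid mistyping the frame argument, the hmcmds call can be wrapped so that the frame number is validated before it is sent. This is a sketch only; `notify_frame_daemon` is a hypothetical name, and the command it issues is exactly the one shown in this step.

```sh
#!/bin/sh
# Hypothetical helper: tell the System Monitor that the external
# hardware daemon for frame $1 has stopped (hmcmds ... F:0).
notify_frame_daemon() {
    frame=$1
    case $frame in
        ''|*[!0-9]*)
            echo "frame number must be numeric" >&2
            return 2 ;;
    esac
    hmcmds -G boot_supervisor "$frame:0"
}
```

For example, `notify_frame_daemon 3` issues `hmcmds -G boot_supervisor 3:0`.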
Issue the command:
hmreinit
The System Resource Controller then restarts the System Monitor daemon (hardmon), and hardmon restarts all of the external hardware daemons. The command also runs the SDR_config command, which updates the SDR as necessary.
Issue the command:
stopsrc -s hardmon
to stop the System Monitor daemon (hardmon), and then issue the command:
splstdata -f
to see which ttys (tty and s1_tty) your external hardware daemons need. Refer to Configuration test 1 - Check SDR Frame object for typical output. For S70, S7A, and S80 servers, if one or more of the server's ttys has a corresponding entry in the /etc/locks/ directory, delete those entries and repeat this step. A server may be prevented from starting if either of its two required ttys is locked.
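Clearing the lock entries for a server's ttys can be sketched as follows. It assumes lock files in /etc/locks are named after the tty device (for example, /etc/locks/tty0), takes the tty and s1_tty values reported by splstdata -f as arguments, and should only be run after hardmon has been stopped with stopsrc. The function `clear_tty_locks` and the `LOCKDIR` override (used to point it at a scratch directory) are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: remove /etc/locks entries for the given ttys.
# Accepts either /dev/ttyN or ttyN. Run only after stopping hardmon.
clear_tty_locks() {
    lockdir=${LOCKDIR:-/etc/locks}
    for tty in "$@"; do
        name=${tty##*/}    # strip any leading /dev/
        if [ -e "$lockdir/$name" ]; then
            echo "removing lock $lockdir/$name"
            rm -f "$lockdir/$name"
        fi
    done
}
```

For an S70 server whose splstdata -f output lists tty0 and tty1, `clear_tty_locks tty0 tty1` removes any locks on both.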
After correcting the problem, repeat Operational test 2 - Check external hardware daemons and Operational test 3 - Check frame responsiveness.
Action 7 - Restore SAMI communication
Perform Operational test 4 - Check SAMI communications to determine whether SAMI communication is available. If you receive an error result, perform these steps:
TYPE   : smitty
SELECT : devices
SELECT : TTY
SELECT : Change / Show Characteristics of a TTY
Select the TTY of interest and press ENTER.
Check the "Enable LOGIN" value.
After correcting the problem, repeat Operational test 4 - Check SAMI communications.
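The same "Enable LOGIN" value can be read without SMIT through the tty's login attribute. This sketch assumes the default `lsattr -El <tty> -a login` output of the form `login <value> Enable LOGIN True`. Because the required value is not stated in this section, the caller supplies it. The functions `tty_login` and `check_tty_login` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical helper: print a tty's login attribute (the value SMIT
# shows as "Enable LOGIN").
tty_login() {
    lsattr -El "$1" -a login | awk '$1 == "login" { print $2 }'
}

# Compare the login attribute with the value the caller expects.
check_tty_login() {
    tty=$1 want=$2
    got=$(tty_login "$tty")
    if [ "$got" = "$want" ]; then
        echo "$tty: login=$got"
        return 0
    fi
    echo "$tty: login=${got:-unknown}, expected $want"
    return 1
}
```

The same check applies to the S1 tty used in Operational test 5.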
Action 8 - Restore S1 communication
Perform Operational test 5 - Check S1 communications to confirm that this is the problem. If you receive error results, perform these steps:
TYPE   : smitty
SELECT : devices
SELECT : TTY
SELECT : Change / Show Characteristics of a TTY
Select the TTY of interest and press ENTER.
Check the "Enable LOGIN" value.
After correcting the problem, repeat Operational test 5 - Check S1 communications.
Action 9 - Correct switch port number in SDR Syspar_map object
Perform Configuration test 3 - Check SDR Syspar_map object to confirm that the switch port number for the server is set incorrectly. Perform these steps:
If you receive the message:
0025-080 The SDR routine could not connect to server.
or some other message indicating a problem with the System Data Repository (SDR), refer to Diagnosing SDR problems.
Refer to PSSP: Planning Volume 2 for information on determining a valid switch port number, and on situations in which you may not want the SDR configuration command to assign one automatically in a clustered enterprise server system. For a description of the spframe command, see PSSP: Command and Technical Reference.
After correcting the problem, repeat Configuration test 3 - Check SDR Syspar_map object.