These Diagnostic Procedures test the installation, configuration, and operation of all servers.
Use these tests to check that the server has been installed properly.
This test verifies that the ssp.basic file set has been installed correctly. All server device drivers are included in the ssp.basic file set. Issue this lslpp command on the control workstation:
lslpp -l ssp.basic
The output is similar to the following:
Path: /usr/lib/objrepos
  ssp.basic  3.1.0.8  COMMITTED  SP System Support Package
Path: /etc/objrepos
  ssp.basic  3.1.0.8  COMMITTED  SP System Support Package
Good results are indicated if entries for ssp.basic exist. Proceed to Installation test 2 - Check external hardware daemon.
Error results are indicated in all other cases. Try to determine why the file set was not installed, and either install it, or contact the IBM Support Center.
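The pass condition for this test (entries for ssp.basic exist) can be checked mechanically. The following is a minimal sketch, not part of PSSP: the function name check_ssp_basic is hypothetical, and it assumes each file set entry appears on its own line with the file set name as the first field.

```shell
# Hypothetical helper (not a PSSP command): reads `lslpp -l ssp.basic`
# output on stdin and succeeds if at least one entry line names ssp.basic.
check_ssp_basic() {
    awk '
        $1 == "ssp.basic" { found = 1 }   # entry line for the file set
        END { exit (found ? 0 : 1) }      # 0 = good results, 1 = error results
    '
}
```

A possible use on the control workstation is `lslpp -l ssp.basic | check_ssp_basic && echo "ssp.basic is installed"`.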
This test verifies that the appropriate external hardware daemon and its directory have been created on the control workstation. Issue these commands on the control workstation:
The output is similar to the following:
-r-x------ 1 bin bin 47166 Sep 15 13:00 /usr/lpp/ssp/install/bin/s70d
The output is similar to the following:
drwxr-xr-x 2 bin bin 512 Sep 15 13:01 s70d
Good results are indicated if output for all the commands is similar to the examples provided. Proceed to Configuration test 1 - Check SDR Frame object.
Error results are indicated in all other cases. Record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
Use these tests to check that all servers have been configured properly.
This test verifies that the SDR Frame object was created. The configuration data for all servers must reside in the SDR Frame object. On the control workstation, issue the command:
splstdata -f
For each server you should see a line of output similar to the following. The numbers may be different.
frame# tty       s1_tty    frame_type hardware_protocol
----------------------------------------------------------
3      /dev/tty2 /dev/tty3 ""         SAMI
Good results are indicated if all of the following are true:
If all of these conditions are true, proceed to Configuration test 2 - Check SDR Node object.
Error results are indicated if not all of these conditions are true. Attempt to fix the SDR data by issuing the spframe command with the appropriate parameters, or contact the IBM Support Center. For a description of the spframe command, refer to PSSP: Command and Technical Reference.
Repeat this test after issuing the spframe command. If the test still fails, record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
This test verifies that the SDR Node object (for S70) was created. The configuration data for all servers must reside in the appropriate SDR object. For each S70, S7A and S80 server, issue this command on the control workstation:
splstdata -n -l N
where N is the node number of the server. If you do not know the node number, issue the command: spmon -G -d to determine node numbers.
For each S70, S7A and S80 server, you should see output similar to the following, which is output from splstdata -n -l 33:
node# frame# slot# slots initial_hostname reliable_hostname dcehostname default_route processor_type processors_installed description
--------------------------------------------------------------------------
33    3      1     1     wild3n01.ppd.pok wild3n01.ppd.pok  ""          9.114.130.130 MP             4                    7017-S70
Good results are indicated if an appropriate entry exists for each server in the system. Proceed to Configuration test 3 - Check SDR Syspar_map object.
Error results are indicated in all other cases. Attempt to fix the SDR data by issuing the spframe command with the appropriate parameters. If a particular entry exists, but contains incorrect information, you must first delete that Frame object by issuing the spdelfram command. For a description of these commands, refer to PSSP: Command and Technical Reference.
Repeat this test after issuing the spframe command. If the test still fails, record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
This test verifies that the SDR Syspar_map object for all servers was created correctly. The switch port number for the server is stored in the Syspar_map object of the SDR. For an SP-attached server, the switch port number is defined using the spframe command; for clustered enterprise servers, it can optionally be assigned automatically by the SDR configuration command. On the control workstation, issue the command once for each server:
SDRGetObjects Syspar_map node_number==N switch_node_number
where N is the node number of the server.
If you do not know the node number, issue the command: spmon -G -d to determine node numbers. For each command, you should see output similar to the following, which is output from SDRGetObjects Syspar_map node_number==17 switch_node_number:
switch_node_number
5
Good results are indicated if the switch port number that is returned matches the value requested when the server was originally defined. For systems with an SP Switch, this should be the switch port number associated with the port to which the server is cabled on the SP Switch. For systems without a switch, this should be any unused valid switch port number on the system. For clustered systems, this can be any unused value in the range 0 to 511. Proceed to Operational test 1 - Check hardmon status.
Error results are indicated if no entry exists for the node, or the returned value is incorrect. Attempt to fix the SDR data by issuing the spframe command with the appropriate parameters. If a Frame object already exists for this server, you must first delete that Frame object by issuing the spdelfram command. For a description of these commands, see PSSP: Command and Technical Reference. For information on assigning valid switch port numbers for all servers, see PSSP: Planning Volume 2.
Repeat this test after issuing the spframe command. If the test still fails, record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
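The comparison between the returned switch port number and the value requested when the server was defined can be sketched as follows. The function name check_switch_port is hypothetical, and the sketch assumes the SDRGetObjects output consists of the switch_node_number header line followed by a line holding the value.

```shell
# Hypothetical helper: reads SDRGetObjects output (a header line followed by
# the value) on stdin and compares the value with the expected port number.
check_switch_port() {
    expected=$1
    awk -v want="$expected" '
        $1 == "switch_node_number" { next }    # skip the header line
        NF > 0 { got = $1; seen = 1 }          # remember the value line
        END { exit ((seen && got == want) ? 0 : 1) }
    '
}
```

For node 17 with an expected switch port number of 5, a possible use is `SDRGetObjects Syspar_map node_number==17 switch_node_number | check_switch_port 5`. The function fails both when no entry is returned and when the value differs, which corresponds to the two error cases described above.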
Use these tests to check that all servers are operating properly.
This test verifies that the System Monitor (hardmon) is active and running correctly. External hardware daemons cannot run if hardmon is not running. Issue the commands:
The output is similar to the following:
Subsystem         Group            PID     Status
hardmon                            42532   active
The output is similar to the following:
root 42532 5966 0 Sep 15 0 9:42 /usr/lpp/ssp/bin/hardmon -r 5
Good results are indicated if all of the following are true:
If these conditions are met, proceed to Operational Test 2 - Check external hardware daemons.
Error results are indicated if these conditions are not met. To determine why hardmon is not running, or why the argument to the -r flag is not 5, refer to Diagnosing System Monitor problems.
Repeat this test after taking any action suggested in Diagnosing System Monitor problems. If the test still fails, record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
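One condition named in this test is that hardmon runs with 5 as the argument to the -r flag. The following sketch checks a process listing for that condition. The function name hardmon_poll_rate_ok is hypothetical, and the sketch assumes the listing comes from a command such as ps -ef, with the daemon path and its arguments as separate fields at the end of the line.

```shell
# Hypothetical helper: reads a process listing on stdin and succeeds if a
# hardmon process from /usr/lpp/ssp/bin is present with "-r 5".
hardmon_poll_rate_ok() {
    awk '
        {
            for (i = 1; i < NF; i++)
                if ($i == "/usr/lpp/ssp/bin/hardmon") {
                    # look for "-r 5" among the arguments that follow
                    for (j = i + 1; j < NF; j++)
                        if ($j == "-r" && $(j + 1) == "5") ok = 1
                }
        }
        END { exit (ok ? 0 : 1) }
    '
}
```

Matching the full daemon path as a field keeps an unrelated line, such as the grep process itself, from being counted.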
This test verifies that the s70d daemons are running. If you have one or more S70, S7A or S80 servers, whether they are SP-attached or clustered enterprise servers, issue the command:
ps -ef | grep s70d
For S70, S7A and S80 servers, a line of output for each server is similar to the following:
root 23384 42532 1 Sep 15 2 79:08 /usr/lpp/ssp/install/bin/s70d -d 0 5 1 7 /dev/tty2 /dev/tty3
Good results are indicated if all of the following are true:
To help verify the correctness of the last two parameters, issue the command:
splstdata -f
which produces output that shows what the communication ports are expected to be.
If you receive good results, proceed to Operational test 3 - Check frame responsiveness.
Error results are indicated if any of these conditions are not met. Record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
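To compare the last two parameters of each s70d daemon with the communication ports reported by splstdata -f, it can help to extract them from the ps output. This is a minimal sketch: the function name s70d_ports is hypothetical, and it assumes the daemon path appears as its own field, followed by at least two parameters.

```shell
# Hypothetical helper: reads `ps -ef` output on stdin and prints the last two
# parameters (the two communication ports) of every s70d daemon line.
s70d_ports() {
    awk '
        {
            for (i = 1; i <= NF; i++)
                if ($i == "/usr/lpp/ssp/install/bin/s70d" && NF >= i + 2) {
                    print $(NF - 1), $NF
                    break
                }
        }
    '
}
```

Running `ps -ef | s70d_ports` prints one pair per daemon, which can then be compared by eye with the tty and s1_tty columns of the splstdata -f output.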
This test verifies that the frames are responding. External hardware daemons cannot run properly if their frames are not responding. To verify that a particular frame is responding, issue the following command on the control workstation:
hmmon -GQv controllerResponds F:0
where F is the frame number of the server that you are checking. Repeat this test for each server in the system.
The output is similar to the following:
frame F, slot 00: TRUE frame responding to polls
Good results are indicated if the value is TRUE for each server. Proceed to Operational test 4 - Check SAMI communications.
If the value is FALSE, or you do not get any output, the test may have encountered an error. During normal operation, this value may occasionally switch to FALSE, which may simply mean that the daemon is busy and cannot respond to an individual System Monitor request in a timely manner. Therefore, if you get a FALSE value, repeat the hmmon command several more times, waiting at least five seconds between invocations. If the value is consistently FALSE after several attempts, treat this as an error result.
In the case of error results, perform these steps:
Repeat this test after performing these steps. If the test still fails, record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
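The retry rule for a FALSE value (repeat the hmmon command several times, waiting at least five seconds between invocations) can be sketched as a small loop. The function name frame_responds and the RETRY_WAIT variable are hypothetical, and the number of attempts is left to the caller because the procedure only says "several".

```shell
# Hypothetical helper: runs the given probe command up to $1 times, pausing
# between attempts, and succeeds as soon as one attempt reports TRUE.
# RETRY_WAIT is an assumed knob (seconds); the procedure asks for at least five.
frame_responds() {
    attempts=$1
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@" 2>/dev/null | grep -q 'TRUE'; then
            return 0                       # frame answered this poll
        fi
        if [ "$n" -lt "$attempts" ]; then
            sleep "${RETRY_WAIT:-5}"       # wait before the next invocation
        fi
        n=$((n + 1))
    done
    return 1                               # consistently FALSE or no output
}
```

For frame 3, a possible use is `frame_responds 4 hmmon -GQv controllerResponds 3:0`. A nonzero return after all attempts corresponds to the error results described above.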
This test verifies that the SAMI communication is available. For each server, issue the following command twice on the control workstation, waiting at least five seconds between the two commands:
hmmon -GQs F:0,1
where F is the frame number of the server that you are checking. Repeat this test for each server in the system.
This is example output for an S70 in frame 3:
3 0 nodefail1          FALSE 0x8802 node 01 I2C not responding
3 0 nodeLinkOpen1      FALSE 0x8813 node 01 serial link open
3 0 diagByte           0     0x8823 diagnosis return code
3 0 timeTicks          38701 0x8830 supervisor timer ticks
3 0 type               2     0x883a supervisor type
3 0 codeVersion        769   0x883b supervisor code version
3 0 daemonPollRate     5     0x8867 hardware monitor poll rate
3 0 controllerResponds TRUE  0x88a8 frame responding to polls
3 1 nodePower          TRUE  0x944a DC-DC power on
3 1 serialLinkOpen     FALSE 0x949d serial link is open
3 1 DPOinProgress      FALSE 0x950b delayed power off active
3 1 SRChasMessage      FALSE 0x9512 SRC contains a message
3 1 SPCNhasMessage     FALSE 0x9513 SPCN contains a message
3 1 LCDhasMessage      FALSE 0x9506 LED/LCD contains a message
3 1 src                BLANK 0x9510 System Reference Code
3 1 spcn               BLANK 0x9511 System Power Cntl Network
3 1 hardwareStatus     72    0x94f3 hardware status byte
3 1 diagByte           15    0x9423 diagnosis return code
3 1 timeTicks          23850 0x9430 supervisor timer ticks
3 1 type               10    0x943a supervisor type
3 1 codeVersion        769   0x943b supervisor code version
3 1 lcd1               BLANK 0x94f4 LCD line 1
3 1 lcd2               BLANK 0x94f5 LCD line 2
Good results are indicated if all of the following are true:
In this case, proceed to Operational test 5 - Check S1 communications.
Error results are indicated in all other cases. Record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
This test verifies that the S1 communication is available. For each server, issue the following command on the control workstation:
s1term -G F 1
where F is the frame number of the server that you are checking. Repeat this test for each server in the system.
Good results are indicated if the AIX login prompt is displayed. If you do not see the login prompt after issuing the s1term command, try pressing the Enter key a second time. If you still do not see the login prompt, treat this as an error result.
To correct the problem, perform these steps:
Repeat this test after performing these steps. If the test still fails, record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.