These procedures check the installation, configuration, and operation of the System Monitor.
Use these tests to check that the System Monitor is installed properly.
This test verifies that the ssp.basic file set has been installed correctly. System Monitor function is included in the ssp.basic file set.
Issue this lslpp command on the control workstation:
lslpp -l ssp.basic
Good results are indicated by output similar to the following:
Path: /usr/lib/objrepos
  ssp.basic   3.1.0.8   COMMITTED   SP System Support Package
Path: /etc/objrepos
  ssp.basic   3.1.0.8   COMMITTED   SP System Support Package
In this case, proceed to Installation test 2 - Check System Monitor files.
Error results are indicated if entries for ssp.basic do not exist. In this case, try to determine why the file set was not installed, and either attempt to install it, or contact the IBM Support Center.
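The presence check described above can be scripted. The following is a minimal sketch, assuming that lslpp -l prints one line per installed entry with the file set name as the first field. The function name count_fileset_entries is hypothetical and is not part of PSSP or AIX.

```shell
# Hypothetical helper (not a PSSP command): count the entries for a file set
# in `lslpp -l` output read from standard input. Assumes one entry per line
# with the file set name as the first whitespace-separated field.
count_fileset_entries() {
    awk -v name="$1" '$1 == name { n++ } END { print n + 0 }'
}

# Possible usage on the control workstation:
#   lslpp -l ssp.basic 2>/dev/null | count_fileset_entries ssp.basic
# A result of 0 corresponds to the error case (no ssp.basic entries).
```

A count of zero only shows that the entries are missing. It does not explain why the file set was not installed.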
This test verifies that the System Monitor daemon and associated commands and configuration files have been created in the proper directory on the control workstation. Issue these commands and verify that all these files exist:
ls -l /usr/lpp/ssp/bin/hardmon
ls -l /usr/lpp/ssp/bin/hmadm
ls -l /usr/lpp/ssp/bin/hmcmds
ls -l /usr/lpp/ssp/bin/hmmon
ls -l /usr/lpp/ssp/bin/spmon
ls -l /usr/lpp/ssp/bin/s1term
ls -l /usr/lpp/ssp/bin/spsvrmgr
ls -l /usr/lpp/ssp/bin/hmckacls
ls -l /usr/lpp/ssp/bin/hmgetacls
ls -l /usr/lpp/ssp/bin/hmdceobj
ls -l /usr/lpp/ssp/bin/spmon_itest
ls -l /usr/lpp/ssp/bin/spmon_ctest
ls -l /usr/lpp/ssp/install/bin/hmreinit
ls -l /spdata/sys1/spmon/hmacls
ls -l /spdata/sys1/spmon/hmthresholds
ls -l /spdata/sys1/spmon/hwevents
ls -l /spdata/sys1/ucode
Notes:
Good results are indicated if all of the files exist and the files that are located in a bin directory are executable. Proceed to Installation test 3 - Run the spmon_itest command.
Error results are indicated if one or more of these files do not exist. In this case, try reinstalling your SP system, or contact the IBM Support Center. If you have already reinstalled your SP system, resume diagnostics with Installation test 1 - Check ssp.basic file set.
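The two conditions for good results (every file exists, and files in a bin directory are executable) can be checked in one pass. The following is a minimal sketch. The function name check_monitor_files is hypothetical, and the paths to check are passed as arguments rather than built in.

```shell
# Hypothetical helper (not a PSSP command): report each path that is missing,
# and each path under a bin directory that exists but is not executable.
# Returns non-zero if any problem was found.
check_monitor_files() {
    rc=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            rc=1
        else
            case "$f" in
                */bin/*)
                    # Only files located in a bin directory must be executable.
                    if [ ! -x "$f" ]; then
                        echo "not executable: $f"
                        rc=1
                    fi
                    ;;
            esac
        fi
    done
    return $rc
}

# Possible usage with some of the paths from this test:
#   check_monitor_files /usr/lpp/ssp/bin/hardmon /usr/lpp/ssp/bin/hmmon \
#       /spdata/sys1/spmon/hmacls /spdata/sys1/ucode
```

No output and a zero return status correspond to good results for the paths that were passed in.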
This test verifies that the system monitor is installed correctly. Issue this command on the control workstation:
spmon_itest
Good results are indicated by output similar to:
spmon_itest: Start spmon installation verification test
spmon_itest: Verification Succeeded
In this case, proceed to Configuration test 1 - Check /etc/services file.
Error results are indicated in all other cases. Check the file /var/adm/SPlogs/spmon/spmon_itest.log for error messages, and take appropriate action based on the messages. Repeat this test after taking corrective actions based on the messages from spmon_itest.
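When this test is run from a script, the success message can be detected automatically. The following is a minimal sketch, assuming that a successful run prints a line ending in "Verification Succeeded" as in the sample output. The function name verification_succeeded is hypothetical.

```shell
# Hypothetical helper (not a PSSP command): succeed only if the text on
# standard input contains a line ending in "Verification Succeeded".
verification_succeeded() {
    grep -q 'Verification Succeeded[[:space:]]*$'
}

# Possible usage on the control workstation:
#   if spmon_itest 2>&1 | verification_succeeded; then
#       echo "good results"
#   else
#       echo "check the spmon_itest log for error messages"
#   fi
```

Because the documented output is only "similar to" the sample, treat a failed match as a prompt to read the log, not as proof of an installation problem.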
Use these tests to check that the System Monitor is configured properly.
This test verifies that there is an entry for hardmon in the /etc/services file on the control workstation. Browse the file /etc/services and look for hardmon in the left column:
hardmon 8435/tcp
Good results are indicated if this entry is present. Proceed to Configuration test 2 - Check SDR Frame class.
Error results are indicated if this entry is not present. It should have been made during the installation of the SP system. Contact the IBM Support Center.
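Instead of browsing the file, the entry can be looked up directly. The following is a minimal sketch, assuming the usual services file layout of a service name followed by a port/protocol field. The function name hardmon_service_entry is hypothetical.

```shell
# Hypothetical helper (not a PSSP command): print the port/protocol field of
# the first hardmon entry in a services-format file given as $1. Commented-out
# lines are skipped because their first field is not exactly "hardmon".
# Prints nothing and returns non-zero when no entry exists.
hardmon_service_entry() {
    awk '$1 == "hardmon" { print $2; found = 1; exit }
         END { exit !found }' "$1"
}

# Possible usage on the control workstation:
#   hardmon_service_entry /etc/services || echo "hardmon entry not present"
```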
This test verifies that the SDR Frame class exists, that all MACN values are the same for all frames, and that these values are the same as the output of the vhostname command.
Issue these commands on the control workstation:
Good results are indicated if all of the MACN values, for each frame in the Frame object, are identical to the value returned by the vhostname command. Proceed to Configuration test 3 - Check SDR SP_ports class.
Error results are indicated if there are differences in the values returned by these two commands. In this case, perform one of these corrective actions:
Repeat this test after performing one of the corrective actions.
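The comparison of MACN values against the vhostname output can be sketched as a small filter. The function name mismatched_macn is hypothetical. The SDRGetObjects invocation in the usage comment, including the -x flag to suppress the header line, is an assumption and should be checked against PSSP: Command and Technical Reference.

```shell
# Hypothetical helper (not a PSSP command): read MACN values, one per line,
# on standard input and print every value that differs from the expected
# host name given as $1. Returns non-zero if any value differs.
mismatched_macn() {
    awk -v want="$1" 'NF && $1 != want { print $1; bad = 1 }
                      END { exit bad + 0 }'
}

# Possible usage on the control workstation (command form is an assumption):
#   SDRGetObjects -x Frame MACN | mismatched_macn "$(vhostname)"
# No output corresponds to good results: every frame has the same MACN value,
# and that value matches the vhostname output.
```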
This test verifies that the SDR SP_ports class exists, and that an object exists whose daemon attribute is hardmon. The test also verifies that the hostname and port attributes for this object are correct.
Issue these commands on the control workstation:
The output is similar to:
daemon    hostname                port
hardmon   sup1.ppd.pok.ibm.com    8435
haemd     ""                      10000
Good results are indicated if all of these conditions are true:
In this case, proceed to Configuration test 4 - Check SDR Syspar class.
Error results are indicated if one or more of these conditions is not true. In this case, perform one of these corrective actions:
This test verifies that the SDR Syspar class exists, and that one object exists for each system partition. Issue this command on the control workstation:
SDRGetObjects Syspar
Good results are indicated if the Syspar class exists, and there is an object for each system partition. In this case, proceed to Configuration test 5 - Check SDR Switch class.
Error results are indicated in all other cases. If the Syspar class does not exist or it is empty, contact the IBM Support Center. If the Syspar class exists and is not empty, but there is not an object for each system partition, perform one of these actions:
Do not perform this test unless your system has one or more switches.
This test verifies that the SDR Switch class exists, and that one object exists for each switch in the system. Issue the following command on the control workstation:
SDRGetObjects Switch
Good results are indicated if the Switch class exists, and there is an object for each switch in the system. In this case, proceed to Configuration test 6 - Check SDR Syspar_map class.
Error results are indicated in all other cases. If the Switch class does not exist, contact the IBM Support Center. If the Switch class exists, but it is empty, or it does not contain one object for each switch in the system, perform one of these actions:
After running SDR_config, if there is still a problem with the SDR Switch class, it may be because the SDR_config command obtains information from hardmon, and hardmon is inactive or having problems. To determine if this is the case, see Operational test 1 - Check that hardmon Is active.
This test verifies that the SDR Syspar_map class exists, and that one or more objects exist. Issue this command on the control workstation:
SDRGetObjects Syspar_map
Good results are indicated if the Syspar_map class exists, and there is at least one object whose used attribute is 1. Proceed to Configuration test 7 - Check SDR NodeExpansion class.
Error results are indicated in all other cases. If the Syspar_map class does not exist, contact the IBM Support Center. If the Syspar_map class exists, perform one of these actions:
Repeat this test after taking the suggested actions.
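The good-results condition for the Syspar_map class (at least one object whose used attribute is 1) can be counted from the command output. The following is a minimal sketch, assuming that SDRGetObjects prints a header line of attribute names followed by one whitespace-separated row per object, and that no value before the used column is empty or contains spaces. The function name count_used_objects is hypothetical.

```shell
# Hypothetical helper (not a PSSP command): locate the "used" column from the
# header line on standard input, then count the rows whose value in that
# column is 1. Prints 0 if the column is not found.
count_used_objects() {
    awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "used") col = i; next }
         col && $col == 1 { n++ }
         END { print n + 0 }'
}

# Possible usage on the control workstation:
#   SDRGetObjects Syspar_map | count_used_objects
# A result of 1 or more corresponds to good results.
```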
Do not perform this test unless your system has one or more Expansion Nodes.
This test verifies that the SDR NodeExpansion class exists, and that one object exists for each Expansion Node in the system. Issue the following command on the control workstation:
SDRGetObjects NodeExpansion
Good results are indicated if the NodeExpansion class exists, and there is an object for each Expansion Node in the system. Proceed to Configuration test 8 - Check frame, node, and switch supervisor cards.
Error results are indicated in all other cases. If the NodeExpansion class does not exist, contact the IBM Support Center. If the NodeExpansion class exists, perform one of these actions:
Repeat this test after taking the suggested actions.
This test verifies that all frame, node, and switch supervisor cards that support microcode download contain the latest level of microcode. Issue this command on the control workstation:
smit supervisor
Select either of the two buttons labeled List Supervisors That Require Action.
Good results are indicated by output similar to:
spsvrmgr: All specified supervisor hardware is current and active. No further action is required at this time.
Proceed to Configuration test 9 - Run the spmon_ctest command.
Error results are indicated in all other cases. If this test fails, it means that one or more of the supervisor cards require some action. Either refer to the entry for the spsvrmgr command in PSSP: Command and Technical Reference, or use smit to take the action for all of the supervisor cards, or one at a time:
Issue this command:
smit supervisor
To take the required action on all supervisor cards, select the button labeled:
Update *ALL* Supervisors That Require Action
To take the required action on a single supervisor card, select the button labeled:
Update Selectable Supervisors That Require Action
Repeat this test after taking the suggested actions. If this test fails again after taking the required action on all supervisor cards that require action, contact the IBM Support Center.
This test verifies that the System Monitor is configured correctly. Issue this command on the control workstation:
spmon_ctest
Good results are indicated if the output is similar to:
spmon_ctest: Start spmon configuration verification test
spmon_ctest: Verification Succeeded
Proceed to Operational test 1 - Check that hardmon Is active.
Error results are indicated in all other cases. Check the file /var/adm/SPlogs/spmon_ctest.log for error messages, and take appropriate action based on the messages. Repeat this test after taking corrective action.
Use these tests to check that the System Monitor is operating properly.
This test verifies that the System Monitor (hardmon) is active and running correctly. Issue these commands:
Output from the lssrc command is similar to:
Subsystem         Group            PID     Status
 hardmon                           42532   active
Output from the ps command is similar to:
root 42532 5966 0 Sep 15 0 9:42 /usr/lpp/ssp/bin/hardmon -r 5
Good results are indicated if all of the following are true:
This means that the hardmon daemon polls each frame supervisor, including external hardware daemons, for state information, every five seconds. This is the default. If the hardmon daemon uses a value other than 5 for the -r flag argument, the daemon is not running according to IBM recommendations.
In this case, proceed to Operational test 2 - Run query command.
Error results are indicated in all other cases. To correct, see Action 2 - Start the hardmon daemon. Repeat this test after taking corrective actions. If the problem persists, contact the IBM Support Center.
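The -r flag argument discussed above can be extracted from the ps output and compared against the default of 5. The following is a minimal sketch. The function name hardmon_poll_interval is hypothetical, and the ps -ef invocation in the usage comment is an assumption about how the process line is obtained.

```shell
# Hypothetical helper (not a PSSP command): print the argument that follows
# the -r flag in a hardmon process line read on standard input. Prints
# nothing if the flag is absent.
hardmon_poll_interval() {
    awk '{ for (i = 1; i < NF; i++)
               if ($i == "-r") { print $(i + 1); exit } }'
}

# Possible usage on the control workstation (ps form is an assumption):
#   interval=$(ps -ef | grep '[h]ardmon' | hardmon_poll_interval)
#   [ "$interval" = "5" ] || echo "hardmon is not using the default -r value"
```

The bracketed pattern in the grep command keeps grep from matching its own process line.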
This test verifies that you can run a typical System Monitor query command. Issue this command:
hmmon -GQ 1:0-17
Good results are indicated if you get state output for slot 0 (the frame itself) in frame 1 (which must always exist), and any nodes in frame 1. An example is:
frame 001, slot 00:
node 01 I2C not responding      FALSE
node 02 I2C not responding      TRUE
node 03 I2C not responding      FALSE
node 04 I2C not responding      TRUE
node 05 I2C not responding      FALSE
node 06 I2C not responding      TRUE
node 07 I2C not responding      TRUE
node 08 I2C not responding      TRUE
node 09 I2C not responding      FALSE
node 10 I2C not responding      TRUE
node 11 I2C not responding      FALSE
node 12 I2C not responding      FALSE
node 13 I2C not responding      FALSE
node 14 I2C not responding      FALSE
node 15 I2C not responding      TRUE
node 16 I2C not responding      FALSE
switch I2C not responding       FALSE
controller tail is active       TRUE
node 01 serial link open        FALSE
node 02 serial link open        FALSE
. . .
Error results are indicated in all other cases. This test can fail for several reasons. Read any error messages that are output by the hmmon command, and check the hardmon log /var/adm/SPlogs/spmon/hmlogfile.ddd, where ddd is the Julian date of when the file was created, for any new messages. Refer to relevant actions in Error symptoms, responses, and recoveries.
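Because the log file name ends in a date suffix, it can be convenient to pick the most recently modified log instead of working out ddd by hand. The following is a minimal sketch. The function name latest_hmlogfile is hypothetical, and the default directory is taken from the path given above.

```shell
# Hypothetical helper (not a PSSP command): print the most recently modified
# hmlogfile.ddd in the directory given as $1, defaulting to the log
# directory named in this test. Prints nothing if no log file exists.
latest_hmlogfile() {
    ls -t "${1:-/var/adm/SPlogs/spmon}"/hmlogfile.* 2>/dev/null | head -n 1
}

# Possible usage on the control workstation:
#   tail -n 20 "$(latest_hmlogfile)"
```

Selecting by modification time finds the log that hardmon wrote to last, which is where new messages from a failed hmmon query would be expected.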