Use this table to diagnose problems with the System Monitor component of
PSSP. Locate the symptom and perform the action described in the
table.
Table 55. System Monitor symptoms
Symptom | Recovery |
---|---|
The ssp.basic file set is not installed. | See Action 1 - Install ssp.basic. |
The hardmon daemon is not running. | See Action 2 - Start the hardmon daemon. |
The hardmon daemon keeps terminating and then restarting. | See Action 3 - Investigate the hardmon daemon. |
A System Monitor command, for example hmmon, does not work correctly. | See Action 4 - Investigate System Monitor command problems. |
Hardmon performance is poor. | See Action 5 - Check paging space and CPU utilization. |
Action 1 - Install ssp.basic
Run Installation test 1 - Check ssp.basic file set to verify that this is a problem. Install the ssp.basic file set by issuing the installp command. Then run Installation test 1 - Check ssp.basic file set again to verify that the problem has been resolved.
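The check-then-install sequence can be scripted roughly as follows. This is a minimal sketch, not the documented installation test. It assumes that `lslpp -l ssp.basic` exits with a nonzero status when the file set is missing, and that the installation media is at a device passed as the first argument (the default /dev/cd0 is a hypothetical example). The helper names `ssp_basic_installed` and `ensure_ssp_basic` are illustrative only.

```sh
# Sketch: install ssp.basic only if it is not already installed.
# ASSUMPTION: lslpp exits nonzero when the file set is not installed.
# ASSUMPTION: the PSSP installation media is at the device given as $1
# (hypothetical default: /dev/cd0).

# Succeeds (status 0) if lslpp reports ssp.basic as installed.
ssp_basic_installed() {
    lslpp -l ssp.basic >/dev/null 2>&1
}

# Install ssp.basic from the given device unless it is already present.
ensure_ssp_basic() {
    device=${1:-/dev/cd0}
    if ssp_basic_installed; then
        echo "ssp.basic is already installed"
        return 0
    fi
    echo "ssp.basic not found; installing from $device"
    # -a apply, -g also install prerequisites,
    # -X expand file systems if needed, -d installation device
    installp -agXd "$device" ssp.basic
}
```

For example, `ensure_ssp_basic /dev/cd0` on the control workstation; afterwards, rerun Installation test 1 as the procedure requires.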
Action 2 - Start the hardmon daemon
Run Operational test 1 - Check that hardmon Is active to verify that this is a problem.
If the hardmon daemon is not running on the control workstation, start it by issuing the command:
startsrc -s hardmon
Run Operational test 1 - Check that hardmon Is active to verify that hardmon was started successfully.
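As a quick complement to Operational test 1, the SRC lssrc command reports the status of a subsystem. The sketch below starts hardmon only when lssrc does not report it as active. It assumes the usual lssrc layout (a Subsystem, Group, PID, Status header followed by one line per subsystem, with the status in the last column). The function names are hypothetical, and the sketch does not replace the documented test.

```sh
# Sketch: start hardmon only if SRC does not report it as active.
# ASSUMPTION: "lssrc -s hardmon" prints a header line, then a line whose
# first field is "hardmon" and whose last field is the status
# (for example "active" or "inoperative").

# Read lssrc output on stdin and print the status of hardmon.
hardmon_status() {
    awk '$1 == "hardmon" { print $NF }'
}

# Start hardmon through SRC unless it is already active.
ensure_hardmon() {
    status=$(lssrc -s hardmon | hardmon_status)
    if [ "$status" = "active" ]; then
        echo "hardmon is active"
        return 0
    fi
    echo "hardmon status is '${status:-unknown}'; starting it"
    startsrc -s hardmon
}
```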
Action 3 - Investigate the hardmon daemon
An incorrect polling interval can cause problems with the hardmon daemon. The polling interval is set when the System Resource Controller (SRC) starts the hardmon daemon, and its value is taken from the cmdargs attribute of the hardmon SRCsubsys object in the ODM. Check the polling interval by issuing the ODM command:
odmget -q subsysname=hardmon SRCsubsys
The output is similar to the following, which shows the default settings:
SRCsubsys:
	subsysname = "hardmon"
	synonym = ""
	cmdargs = "-r 5"
	path = "/usr/lpp/ssp/bin/hardmon"
	uid = 0
	auditid = 0
	standin = "/dev/console"
	standout = "/dev/console"
	standerr = "/dev/console"
	action = 1
	multi = 0
	contact = 2
	svrkey = 0
	svrmtype = 0
	priority = 20
	signorm = 15
	sigforce = 15
	display = 1
	waittime = 15
	grpname = ""
If the cmdargs attribute is not "-r 5", correct it by issuing the command:
chssys -s hardmon -a "-r 5"
Then reissue the odmget command to verify that the cmdargs attribute is now "-r 5", and run Operational test 1 - Check that hardmon Is active to verify that the problem is resolved.
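The odmget check can be automated by extracting the cmdargs value from the stanza output. This is a minimal sketch that assumes odmget prints one `attribute = "value"` pair per line and treats "-r 5" as the expected default, as stated in this section. The names `get_cmdargs`, `check_hardmon_cmdargs`, and `expected_args` are hypothetical.

```sh
# Sketch: verify, and if needed correct, the hardmon polling interval.
# ASSUMPTION: odmget prints one 'attribute = "value"' pair per line.
expected_args='-r 5'

# Read odmget stanza output on stdin and print the cmdargs value
# without the surrounding quotes.
get_cmdargs() {
    sed -n 's/^[[:space:]]*cmdargs = "\(.*\)"[[:space:]]*$/\1/p'
}

# Compare the current cmdargs with the expected value and reset it
# with chssys if they differ.
check_hardmon_cmdargs() {
    current=$(odmget -q subsysname=hardmon SRCsubsys | get_cmdargs)
    if [ "$current" = "$expected_args" ]; then
        echo "cmdargs is \"$current\" (default)"
        return 0
    fi
    echo "cmdargs is \"$current\"; resetting to \"$expected_args\""
    chssys -s hardmon -a "$expected_args"
}
```

After the sketch resets the value, the procedure above still applies: reissue odmget and run Operational test 1.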
Possible causes of this problem and the corrective actions are:
1. Repair the /spdata/sys1/spmon/hmthresholds file, referring to the comments at the beginning of the file. If the file cannot be repaired, restore it from regular system backups or from the PSSP installation media.
2. Apart from a system error, the only reason that a log file cannot be opened is that the directory /var/adm/SPlogs/spmon does not exist. Verify that this directory exists, and create it if it does not.
3. Examine the file /var/adm/SPlogs/spmon/hmlogfile.ddd, where ddd is the Julian date, for any error messages related to SP Security Services. Also check the AIX error log for error messages by issuing the errpt command. Take any action suggested to correct these errors.
4. The correction is the same as for 3.
5. The correction is the same as for 3.
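The directory check and the log file lookup described above can be sketched as two small helpers. This assumes that "ddd" in hmlogfile.ddd is the three-digit day of the year as printed by `date +%j`. The variable `SPMON_LOG_DIR` and both function names are hypothetical, introduced so the directory can be overridden for testing.

```sh
# Sketch: make sure the System Monitor log directory exists and
# locate the hardmon log file for a given day.
# ASSUMPTION: "ddd" in hmlogfile.ddd is the three-digit day of the year,
# as printed by "date +%j".
SPMON_LOG_DIR=${SPMON_LOG_DIR:-/var/adm/SPlogs/spmon}

# Create the log directory if it is missing.
ensure_spmon_log_dir() {
    if [ -d "$SPMON_LOG_DIR" ]; then
        echo "$SPMON_LOG_DIR exists"
    else
        echo "creating $SPMON_LOG_DIR"
        mkdir -p "$SPMON_LOG_DIR"
    fi
}

# Print the path of the hardmon log file for today, or for the
# three-digit day of the year given as $1.
hmlogfile_path() {
    day=${1:-$(date +%j)}
    echo "$SPMON_LOG_DIR/hmlogfile.$day"
}
```

For example, `grep -i security "$(hmlogfile_path)"` is one rough way to scan today's file. The search string is only a guess, not the documented message text. Follow up with errpt to review the AIX error log.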
Action 4 - Investigate System Monitor command problems
Possible causes of this problem and the corrective actions are:
If the problem is authentication or authorization, refer to Diagnosing SP Security Services problems.
For example, the commands:
hmmon -Q 1:0
hmmon -Q 1:17
produce error messages because the -G flag was not specified and the targets are frame 1 (1:0) and the switch in frame 1 (1:17).
As another example, if the current system partition is named PART1, and frame 3 node 8 is in the system partition named PART2, the command:
hmmon -Q 3:8
produces an error message because the -G flag was not specified and frame 3 node 8 is not in the current system partition.
If the -G flag was not specified, refer to the entry for the particular command in PSSP: Command and Technical Reference.
Make sure that all objects in the SDR Syspar_map class reflect correct partitioning information. An error in one of these SDR objects can cause hardmon to be unable to locate nodes correctly.
Do not specify a target frame, node, or switch that does not exist in the system.
Verify that the rs232 tty cables from the control workstation to the frame that is not responding are connected correctly. Note that the S70, S7A and S80 type server frames have two rs232 tty cables attached to the control workstation. All other frames have one rs232 tty cable attached to the control workstation.
If the rs232 tty cables were not connected to the proper frames or servers, and you have already configured these frames using the spframe command, perform these steps:
Run the corrective action for cause 4. If the problem is not resolved, check for a frame ID mismatch by issuing the command:
hmmon -GQv controllerIDMismatch F:0
where F is the number of the frame on which the command is not working.
If the value is TRUE, the supervisor of this frame believes that it is attached to a frame other than the one it is physically attached to. To correct this, issue the command:
hmcmds -G setid F:0
where F is the number of the frame on which the command is not working.
To verify the correction, wait 5 seconds, reissue the hmmon command, and check that the value of controllerIDMismatch is FALSE.
Stop the hmreinit process with the kill command, and then reissue the hmreinit command.
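The frame ID mismatch check described above can be sketched as a helper that queries the mismatch, resets the frame ID if needed, waits, and queries again. The exact output layout of `hmmon -GQv` is not given here, so the sketch assumes the output contains the word TRUE when a mismatch is present. The function names and the `HMMON_WAIT` variable (which overrides the 5-second wait) are hypothetical.

```sh
# Sketch: detect and correct a frame ID mismatch for frame $1.
# ASSUMPTION: the output of "hmmon -GQv controllerIDMismatch F:0"
# contains the word TRUE when the mismatch is present.
# HMMON_WAIT (hypothetical) overrides the 5-second wait.

# Succeeds if hmmon reports a controller ID mismatch for the frame.
frame_id_mismatch() {
    hmmon -GQv controllerIDMismatch "$1:0" | grep -q TRUE
}

# Reset the frame ID when a mismatch is reported, then recheck.
fix_frame_id() {
    frame=$1
    if ! frame_id_mismatch "$frame"; then
        echo "frame $frame: no controller ID mismatch"
        return 0
    fi
    echo "frame $frame: controller ID mismatch; resetting frame ID"
    hmcmds -G setid "$frame:0"
    sleep "${HMMON_WAIT:-5}"
    if frame_id_mismatch "$frame"; then
        echo "frame $frame: mismatch still reported" >&2
        return 1
    fi
    echo "frame $frame: controllerIDMismatch no longer reports TRUE"
}
```

For example, `fix_frame_id 3` for frame 3. Run the cabling check first, as the procedure requires.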
Action 5 - Check paging space and CPU utilization
Possible causes of this problem and the corrective actions are:
Check that the paging space is adequate and adjust it if necessary.
Use the vmstat command to check the overall CPU utilization.
Check the CPU utilization of the hardmon and logging daemons. One method is to issue these commands:
If overall CPU utilization is very high and cannot be attributed to the hardmon or logging daemons, look for other processes that are consuming CPU resources.
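The CPU checks above can be approximated with two small filters. They are not necessarily the commands the original procedure lists. `cpu_busy` reads vmstat output and prints us+sy for each sample. It locates the columns by name and uses the last "us" and "sy" header fields, because AIX vmstat also labels its system-call column "sy". `daemon_cpu` reads `ps -e -o pcpu,comm` output, assumed to have %CPU in the first column and the command name in the second. Treating "splogd" as the logging daemon is an assumption. Both function names are hypothetical.

```sh
# Sketch: summarize CPU utilization from vmstat output on stdin.
# Prints us+sy (percent busy) for each sample line.
# ASSUMPTION: the column header line starts with "r b" and contains
# "us" and "sy" CPU columns; the LAST "us"/"sy" columns are used.
cpu_busy() {
    awk '
        # Column header line: remember the CPU us and sy positions.
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) {
                if ($i == "us") us = i
                if ($i == "sy") sy = i
            }
            next
        }
        # Data lines start with a number once the header has been seen.
        us && sy && $1 ~ /^[0-9]+$/ { print $us + $sy }
    '
}

# Print "command %CPU" for the named processes from ps output on stdin.
# ASSUMPTION: input comes from "ps -e -o pcpu,comm"; the default names
# "hardmon splogd" are an assumption about the logging daemon.
daemon_cpu() {
    awk -v names="${1:-hardmon splogd}" '
        BEGIN { n = split(names, want, " "); for (i = 1; i <= n; i++) keep[want[i]] = 1 }
        NR > 1 && ($2 in keep) { print $2, $1 }
    '
}
```

For example, `vmstat 5 3 | cpu_busy` and `ps -e -o pcpu,comm | daemon_cpu` would print the busy percentage per sample and the daemons' CPU shares. The first vmstat sample typically reports averages since boot. For the paging space check, `lsps -a` lists paging space usage.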