Diagnostic procedures for the Logging daemon consist of installation verification tests, configuration verification tests and operational verification tests.
Use these tests to check that the Logging daemon has been properly installed.
This test verifies that the splogd and setup_logd files have been installed in directory /usr/lpp/ssp/bin/ and that they are executable. On the control workstation, issue the commands:
Verify that these two files exist and are executable.
Good results are indicated if both files exist and are executable. Proceed to Installation test 2 - Verify hwevents file.
Error results are indicated in all other cases. Record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
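The existence and permission check in this test can also be scripted. The following is a minimal sketch, assuming a POSIX shell; the function name check_installed is hypothetical and is not part of PSSP.

```shell
#!/bin/sh
# check_installed: hypothetical helper, not part of PSSP.
# Prints one line per file and returns non-zero if any file is
# missing or is not executable.
check_installed() {
    rc=0
    for f in "$@"; do
        if [ -f "$f" ] && [ -x "$f" ]; then
            echo "OK      $f"
        else
            echo "FAILED  $f"
            rc=1
        fi
    done
    return "$rc"
}
```

On the control workstation, it could be called as check_installed /usr/lpp/ssp/bin/splogd /usr/lpp/ssp/bin/setup_logd. A zero return code corresponds to good results.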
Verify that the hwevents file is in the proper location. Refer to hwevents file, which describes how to determine the file's location.
Good results are indicated if the file exists in the proper location. Proceed to Configuration test 1 - Check for state logging.
Error results are indicated in all other cases. Create an hwevents file in the proper location, as described in hwevents file. You can find a sample file in "RS/6000 SP Files and Other Technical Information" of PSSP: Command and Technical Reference. Modify the file as appropriate. Repeat this test. If the problem persists, record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
Use these tests to check that the Logging daemon has been properly configured.
If you expected splogd to create file /var/adm/SPlogs/spmon/splogd.state_changes.timestamp, but it was not created, verify that the hwevents file contains the proper line to indicate enabling of state logging. Otherwise, proceed to Configuration test 2 - Check /etc/syslog.conf and /var/adm/SPlogs/SPdaemon.log files.
Locate the hwevents file that is currently being used by splogd. Refer to hwevents file, which describes where the hwevents file is located. Verify that there exists a line in this file of the following format:
* * * = * c SP_STATE_LOG
Good results are indicated if the hwevents file contains the uncommented line described. Proceed to Configuration test 2 - Check /etc/syslog.conf and /var/adm/SPlogs/SPdaemon.log files.
Error results are indicated in all other cases. Add the line to the hwevents file, and proceed to Configuration test 2 - Check /etc/syslog.conf and /var/adm/SPlogs/SPdaemon.log files.
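Looking for the uncommented line can be done with grep. The following is a minimal sketch, assuming a POSIX shell; the function name has_hwevents_entry is hypothetical, and the path to the hwevents file must be the one determined as described in hwevents file.

```shell
#!/bin/sh
# has_hwevents_entry: hypothetical helper, not part of PSSP.
# Usage: has_hwevents_entry FILE FLAG NAME
# Succeeds if FILE contains a line of the form
#   * * * = * FLAG NAME
# with any amount of blank space between the fields. Because the
# line must begin with "*", a commented-out copy does not match.
has_hwevents_entry() {
    ws='[[:space:]]+'
    grep -Eq "^[[:space:]]*\*${ws}\*${ws}\*${ws}=${ws}\*${ws}$2${ws}$3([[:space:]]|\$)" "$1"
}
```

For this test the call would take the form has_hwevents_entry FILE c SP_STATE_LOG, where FILE is the hwevents file in use. The same function can check for the error logging line by passing b SP_ERROR_LOG instead.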
If you expected splogd to write to the syslog, but it did not, verify the contents of file /etc/syslog.conf. Also, verify the existence of file /var/adm/SPlogs/SPdaemon.log. Otherwise, go to Configuration test 3 - Check hwevents file.
In file /etc/syslog.conf verify that there exists one of the following three lines:
daemon.notice /var/adm/SPlogs/SPdaemon.log
daemon.debug /var/adm/SPlogs/SPdaemon.log
daemon.info /var/adm/SPlogs/SPdaemon.log
Verify that file /var/adm/SPlogs/SPdaemon.log exists, and that its permissions are set to 644.
Good results are indicated if the /etc/syslog.conf file contains one of the three lines described, and the SPdaemon.log file exists with permissions 644. Proceed to Configuration test 3 - Check hwevents file.
Error results are indicated in all other cases. Perform these steps:
daemon.notice /var/adm/SPlogs/SPdaemon.log
daemon.debug /var/adm/SPlogs/SPdaemon.log
daemon.info /var/adm/SPlogs/SPdaemon.log
setup_logd
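Both conditions of this test, the /etc/syslog.conf entry and the permissions of the log file, can be checked from a script. The following is a minimal sketch, assuming a POSIX shell and a simple selector (one facility.level per line); the function names syslog_conf_ok and log_mode_is_644 are hypothetical and are not part of PSSP.

```shell
#!/bin/sh
# Hypothetical helpers, not part of PSSP.

# syslog_conf_ok CONF LOG: succeeds if an uncommented line of CONF
# routes daemon.notice, daemon.debug or daemon.info to LOG.
syslog_conf_ok() {
    awk -v target="$2" '
        $1 ~ /^daemon\.(notice|debug|info)$/ && $2 == target { found = 1 }
        END { exit !found }
    ' "$1"
}

# log_mode_is_644 LOG: succeeds if LOG is a regular file whose
# mode string is -rw-r--r-- (octal 644).
log_mode_is_644() {
    [ -f "$1" ] || return 1
    mode=$(ls -ld "$1" | cut -c1-10)
    [ "$mode" = "-rw-r--r--" ]
}
```

For this test the arguments would be /etc/syslog.conf and /var/adm/SPlogs/SPdaemon.log. Good results correspond to both functions returning zero.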
If you expected splogd to write error information to the AIX error log, but it did not, verify that the hwevents file contains the proper line to enable error logging. Otherwise, proceed to Configuration test 5 - Check user exits.
Locate the hwevents file that is currently being used by splogd. Refer to hwevents file, which describes the location of the hwevents file. Verify that there exists a line in this file of the following format:
* * * = * b SP_ERROR_LOG
Good results are indicated if the hwevents file contains the uncommented line described here. Proceed to Configuration test 4 - Check SPMON error record templates.
Error results are indicated in all other cases. Add the line to the hwevents file and proceed to Configuration test 4 - Check SPMON error record templates.
This test verifies that the error record templates for SPMON have been defined. Issue the command:
errpt -t | grep SPMON_
The output is similar to:
001BB5DD SPMON_EMSG102_ER PERM H COMMUNICATIONS SUBSYSTEM FAILURE
0D1620A8 SPMON_INFO103_TR UNKN H THRESHOLD HAS BEEN EXCEEDED
3E6F3CE7 SPMON_INFO101_TR UNKN H ELECTRICAL POWER RESUMED
4CEF5A08 SPMON_EMSG100_ER PERM H LINK ERROR
6469E2D8 SPMON_EMSG108_ER PERM S UNABLE TO COMMUNICATE WITH REMOTE NODE
76A4FAD9 SPMON_EMSG104_EM PEND H IMPENDING WORKSTATION SUBSYSTEM FAILURE
8D9F2E66 SPMON_EMSG103_EM PEND H EQUIPMENT MALFUNCTION
93FA22BC SPMON_INFO102_TR UNKN H SOFTWARE
A1843F1E SPMON_EMSG101_ER PERM H UNABLE TO COMMUNICATE WITH REMOTE NODE
E406336B SPMON_EMSG106_ER PERM H RESOURCES NOT ACTIVE
E720BFB5 SPMON_INFO100_TR UNKN H POWER OFF DETECTED
E91A5929 SPMON_INFO104_TR TEMP H PROBLEM RESOLVED
F708903E SPMON_EMSG107_ER PERM H LOSS OF ELECTRICAL POWER
Good results are indicated if the output is similar to the example output. That is, about 13 lines are listed in which the second column begins with SPMON_. The exact number of lines depends on the software level. Proceed to Configuration test 5 - Check user exits.
Error results are indicated in all other cases. Determine why error record templates for SPMON have not been defined. You may need to contact the IBM Support Center.
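Counting the SPMON templates, rather than reading the list by eye, makes it easier to compare against the expected number. The following is a minimal sketch, assuming a POSIX shell; the function name count_spmon_templates is hypothetical and is not part of PSSP.

```shell
#!/bin/sh
# count_spmon_templates: hypothetical helper, not part of PSSP.
# Reads `errpt -t` output on standard input and prints the number
# of lines whose second column begins with SPMON_.
count_spmon_templates() {
    awk '$2 ~ /^SPMON_/ { n++ } END { print n + 0 }'
}
```

It would be used as errpt -t | count_spmon_templates. A count of zero means that no SPMON templates are defined.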
If you expected splogd to run user exits, but they are not running, verify that the hwevents file contains the proper line for the user exit. Otherwise, proceed to Operational test 1 - Check System Monitor daemon.
Locate the hwevents file that is currently being used by splogd. Refer to hwevents file, which describes the location of the hwevents file. Verify that there exists a line in this file, for the user exit you wish to run, as described in the hwevents entry of "RS/6000 SP Files and Other Technical Information" in PSSP: Command and Technical Reference.
Good results are indicated if the hwevents file contains the proper line. Operational test 5 - Check user exit mechanism will verify that the mechanism for running user exits is working. Proceed to Operational test 1 - Check System Monitor daemon.
Error results are indicated in all other cases. Add the appropriate line to the hwevents file. Refer to the hwevents file entry of "RS/6000 SP Files and Other Technical Information" in PSSP: Command and Technical Reference, for information about the format of the user exit line.
Use these tests to check that the Logging daemon is operating properly.
This test verifies that the hardware monitor daemon (hardmon) is currently running. The Logging daemon will not operate properly if hardmon is not running. Issue the command:
lssrc -s hardmon
Look for the entry whose Subsystem is hardmon.
Good results are indicated if the Status column is active. Proceed to Operational test 2 - Check for frames responding.
Error results are indicated in all other cases. Issue the command:
startsrc -s hardmon
to start the hardware monitor daemon. Repeat this test, after taking the suggested action. If this test still fails, refer to Diagnosing System Monitor problems to determine why hardmon will not run.
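The Status column can be extracted from the lssrc output so that a script can act on it. The following is a minimal sketch, assuming a POSIX shell and assuming that Status is the last column of the row whose first column is the subsystem name; the function name src_status is hypothetical and is not part of PSSP.

```shell
#!/bin/sh
# src_status: hypothetical helper, not part of PSSP.
# Usage: lssrc -s NAME | src_status NAME
# Prints the Status value of subsystem NAME. Assumes Status is the
# last field of the row whose first field is NAME, which also holds
# when the PID column is empty.
src_status() {
    awk -v name="$1" '$1 == name { print $NF }'
}
```

For this test it would be used as lssrc -s hardmon | src_status hardmon. Good results correspond to the output active. The same function applies to any other subsystem, such as splogd.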
This test verifies that the frames are responding. The Logging daemon will not operate properly if the frames are not responding. To verify that a particular frame is responding, issue the command:
hmmon -GQv controllerResponds F:0
where F is the frame number you are checking. If there is more than one frame, you should check all frames. For example, if you have frames 1 through 5, you would issue the command:
hmmon -GQv controllerResponds 1-5:0
The output is similar to the following:
frame F, slot 00: TRUE frame responding to polls
Good results are indicated if the value is TRUE for each frame. Proceed to Operational test 3 - Check Logging daemon status.
Error results are indicated if the value is FALSE, or you do not receive any output. Refer to Diagnosing System Monitor problems.
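When several frames are checked at once, the pass or fail decision can be automated. The following is a minimal sketch, assuming a POSIX shell and assuming that the value (TRUE or FALSE) appears on the same line as the text "frame responding to polls", as in the example output; the function name frames_respond is hypothetical and is not part of PSSP.

```shell
#!/bin/sh
# frames_respond: hypothetical helper, not part of PSSP.
# Reads `hmmon -GQv controllerResponds ...` output on standard input.
# Succeeds only if at least one "frame responding to polls" line is
# present and every such line reports TRUE. No output at all is
# treated as a failure, matching the error results of this test.
frames_respond() {
    awk '
        /frame responding to polls/ {
            seen++
            if ($0 !~ /TRUE/) bad++
        }
        END { exit !(seen > 0 && bad == 0) }
    '
}
```

For frames 1 through 5 it would be used as hmmon -GQv controllerResponds 1-5:0 | frames_respond.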
This test verifies if the Logging daemon is active, that is, if the splogd daemon is currently running. Issue the command:
lssrc -s splogd
Look for the entry whose Subsystem is splogd.
Good results are indicated if the Status column is active. Proceed to Operational test 4 - Check error daemon status.
Error results are indicated in all other cases. Issue the command:
startsrc -s splogd
Repeat this test. If the problem persists, record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
This test verifies that the error daemon is running. Perform this test if entries that you expect to receive are not written to the error log. If you choose to skip this test because you are not having a problem with error log entries, proceed to Operational test 5 - Check user exit mechanism.
Issue the command:
ps -ef | grep errdemon
The output is similar to:
root 3206 1 0 Jun 19 - 0:40 /usr/lib/errdemon
Good results are indicated by output showing that the errdemon is running. Proceed to Operational test 5 - Check user exit mechanism.
Error results are indicated in all other cases. Determine why the errdemon is not running. Start the errdemon by issuing the command:
/usr/lib/errdemon
Repeat this test. If the problem persists, record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
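Note that ps -ef | grep errdemon can list the grep process itself, which also contains the string errdemon, so a match alone does not prove that the daemon is running. The following is a minimal sketch of a check that avoids this, assuming a POSIX shell; the function name errdemon_running is hypothetical and is not part of PSSP.

```shell
#!/bin/sh
# errdemon_running: hypothetical helper, not part of PSSP.
# Usage: ps -ef | errdemon_running
# Succeeds if a listed command line contains /usr/lib/errdemon.
# The bracket expression [/] matches a literal "/", but the pattern
# text itself does not contain "/usr/lib/errdemon", so the grep
# process started here cannot match its own entry in the ps output.
errdemon_running() {
    grep -q '[/]usr/lib/errdemon'
}
```

A zero return code corresponds to good results.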
This test verifies that the mechanism used by splogd, to run user exits specified in the hwevents file, is working correctly. If you choose to skip this test because you are not having a problem with running user exits, proceed to Operational test 6 - Check writing to syslog file.
Notes:
For example, if the system contains frames 1 through 5, run the command:
hmmon -GQs 1-5:
to find one. Choose a node in a frame that meets these criteria.
In subsequent steps, the frame number will be referred to as F and the node number as N. Always substitute the numeric frame number for the letter F and the numeric node number for the letter N.
F N serialLinkOpen = * c /tmp/splogd_test
where F and N are the frame number and node number from step 5.
For example, if frame number is 3 and node number is 12, the line is:
3 12 serialLinkOpen = * c /tmp/splogd_test
Save this file.
stopsrc -s splogd
The output is a message similar to:
The stop of the splogd Subsystem was completed successfully.
startsrc -s splogd
You should get a message similar to:
The splogd Subsystem has been started. Subsystem PID is nnnnn.
lssrc -s splogd
and verify that the Status column indicates active.
s1term -G F N
where F and N are the frame number and node number from step 5.
hmmon -GQs F:N | grep serialLinkOpen
where F and N are the frame number and node number from step 5.
The value of the serialLinkOpen attribute should now be TRUE.
Good results are indicated if the file /tmp/splogd_test_output did not exist when checked in step 10, but existed when checked in step 14. Proceed to Operational test 6 - Check writing to syslog file.
Error results are indicated in all other cases. Perform these steps:
daemon.notice /var/adm/SPlogs/SPdaemon.log
daemon.debug /var/adm/SPlogs/SPdaemon.log
daemon.info /var/adm/SPlogs/SPdaemon.log
setup_logd
This test verifies that the mechanism used by splogd, to write to the syslog (if specified in the hwevents file) is working correctly. Perform this test only if there is a node that is currently powered on, for which it is safe to power the node off and then back on. If you choose to skip this test, because you are not having a problem with syslog entries, or you are not able to power off and on a node, proceed to Operational test 7 - Check writing to the AIX Error Log.
Notes:
Perform these steps:
* * * = * b SP_ERROR_LOG
daemon.info /var/adm/SPlogs/SPdaemon.log
stopsrc -s splogd
The output is a message similar to:
The stop of the splogd Subsystem was completed successfully.
startsrc -s splogd
You should get a message similar to:
The splogd Subsystem has been started. Subsystem PID is nnnnn.
lssrc -s splogd
and verify that the Status column indicates active for Subsystem splogd.
stopsrc -s syslogd
startsrc -s syslogd
hmcmds -Gv off F:N
hmcmds -Gv on F:N
where F is the frame number, and N is the node number.
stopsrc -s syslogd
startsrc -s syslogd
Good results are indicated if the file /var/adm/SPlogs/SPdaemon.log contains a line with the current date and time, similar to:
Sep 2 12:42:05 sup1 sphwlog[19826]: LP=PSSP,Fn=splogd.c,SID=I,L#=1339, Information; Node F:N; powerLED; Power is off.
Proceed to Operational test 7 - Check writing to the AIX Error Log.
Error results are indicated in all other cases. Record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
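Finding the current entry in a long SPdaemon.log is easier with a filter on today's date. The following is a minimal sketch, assuming a POSIX shell and assuming the syslog timestamp layout shown in the sample line (month abbreviation, day of month, time, host name); the function name todays_sphwlog_lines is hypothetical and is not part of PSSP.

```shell
#!/bin/sh
# todays_sphwlog_lines: hypothetical helper, not part of PSSP.
# Usage: todays_sphwlog_lines LOGFILE
# Prints the sphwlog lines in LOGFILE that carry today's date and
# returns non-zero if there are none. Only the date is compared;
# the time of day must still be checked by eye.
todays_sphwlog_lines() {
    month=$(LC_ALL=C date '+%b')
    day=$(LC_ALL=C date '+%d' | sed 's/^0//')
    grep -E "^${month} +${day} [0-9:]+ .* sphwlog\[" "$1"
}
```

For this test the argument would be /var/adm/SPlogs/SPdaemon.log. Compare the time on the printed lines with the time at which the node was powered off.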
This test verifies that the mechanism used by splogd, to write to the error log (if specified in the hwevents file), is working correctly. Perform this test only if there is a node that is currently powered on, for which it is safe to power the node off and then back on. If you choose to skip this test, because you are not having a problem with error log entries, or you are not able to power off and on a node, proceed to Operational test 8 - Check writing to splogd.state_changes file.
Notes:
Perform these steps:
* * * = * b SP_ERROR_LOG
stopsrc -s splogd
The output is a message similar to:
The stop of the splogd Subsystem was completed successfully.
startsrc -s splogd
You should get a message similar to:
The splogd Subsystem has been started. Subsystem PID is nnnnn.
lssrc -s splogd
Verify that the Status column indicates active for Subsystem splogd.
hmcmds -Gv off F:N
hmcmds -Gv on F:N
where F is the frame number, and N is the node number.
Good results are indicated if the error log contains these two entries, and their timestamps show that they were made since you performed step 5:
LABEL: SPMON_INFO100_TR
LABEL: SPMON_INFO101_TR
The SPMON_INFO100_TR entry indicates that Power Off was detected. The SPMON_INFO101_TR entry indicates that Power On was detected. You can view the error log by running the command:
errpt -a
You should redirect the output to a file. Proceed to Operational test 8 - Check writing to splogd.state_changes file.
Error results are indicated in all other cases. Record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
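Because errpt -a produces long output, the search for the two labels can be scripted. The following is a minimal sketch, assuming a POSIX shell and assuming that each entry in the errpt -a output has a line beginning with LABEL: followed by the label; the function name power_cycle_logged is hypothetical and is not part of PSSP.

```shell
#!/bin/sh
# power_cycle_logged: hypothetical helper, not part of PSSP.
# Usage: errpt -a | power_cycle_logged
# Succeeds if both SPMON_INFO100_TR (power off detected) and
# SPMON_INFO101_TR (power on detected) appear as LABEL values.
# Timestamps are not examined; confirm separately that the entries
# were made during this test.
power_cycle_logged() {
    awk '
        $1 == "LABEL:" && $2 == "SPMON_INFO100_TR" { saw_off = 1 }
        $1 == "LABEL:" && $2 == "SPMON_INFO101_TR" { saw_on = 1 }
        END { exit !(saw_off && saw_on) }
    '
}
```

The same function can read a file to which the errpt -a output was redirected.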
This test verifies that the mechanism used by splogd, to write to the /var/adm/SPlogs/spmon/splogd.state_changes.timestamp file, if specified by the hwevents file, is working correctly. If you choose to skip this test, because you are not having a problem with the writing to the splogd.state_changes.timestamp file, you are finished with Logging daemon tests.
Notes:
Perform these steps:
For example, if the system contains frames 1 through 5, issue the command hmmon -GQs 1-5: to find one. Choose a node in a frame that meets these criteria.
In subsequent steps, the frame number will be referred to as F and the node number as N. Always substitute the numeric frame number for the letter F and the numeric node number for the letter N.
* * * = * c SP_STATE_LOG
Save this file.
stopsrc -s splogd
You should receive a message similar to:
The stop of the splogd Subsystem was completed successfully.
startsrc -s splogd
You should receive a message similar to:
The splogd Subsystem has been started. Subsystem PID is nnnnn.
lssrc -s splogd
Verify that the Status column indicates active.
s1term -G F N
where F and N are the frame number and node number from step 2.
hmmon -GQs F:N | grep serialLinkOpen
where F and N are the frame number and node number from step 2.
The value of the serialLinkOpen attribute should now be TRUE.
F N serialLinkOpen TRUE
F N serialLinkOpen FALSE
Good results are indicated if the file /var/adm/SPlogs/spmon/splogd.state_changes.timestamp was created, and contained the state change, when checked in step 11. You are finished with SP Logging Daemon testing.
Error results are indicated in all other cases. Record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
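The final check of this test, that the state change was written to the newest splogd.state_changes file, can be scripted. The following is a minimal sketch, assuming a POSIX shell and assuming that entries contain the frame number, node number, serialLinkOpen and TRUE or FALSE separated by blank space, as in the lines shown in this test; the function name state_change_logged is hypothetical and is not part of PSSP.

```shell
#!/bin/sh
# state_change_logged: hypothetical helper, not part of PSSP.
# Usage: state_change_logged DIR FRAME NODE
# Looks in the most recently modified DIR/splogd.state_changes.*
# file for a serialLinkOpen entry for FRAME and NODE. Returns
# non-zero if no such file exists or no entry is found.
state_change_logged() {
    newest=$(ls -t "$1"/splogd.state_changes.* 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    grep -Eq "(^|[[:space:]])$2[[:space:]]+$3[[:space:]]+serialLinkOpen[[:space:]]+(TRUE|FALSE)" "$newest"
}
```

For this test DIR would be /var/adm/SPlogs/spmon, with the frame and node numbers chosen in step 2.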