IBM Books

Diagnosis Guide


Diagnostic procedures

Diagnostic procedures for the Logging daemon consist of installation verification tests, configuration verification tests and operational verification tests.

Installation verification tests

Use these tests to check that the Logging daemon has been properly installed.

Installation test 1 - Verify splogd and setup_logd files

This test verifies that the splogd and setup_logd files have been installed in directory /usr/lpp/ssp/bin/ and that they are executable. On the control workstation, issue the commands:

  1. ls -l /usr/lpp/ssp/bin/splogd
  2. ls -l /usr/lpp/ssp/bin/setup_logd

Verify that these two files exist and are executable.

Good results are indicated if both files exist and are executable. Proceed to Installation test 2 - Verify hwevents file.

Error results are indicated in all other cases. Record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
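
The two ls -l checks in this test can also be combined into one scripted check. The following is a minimal sketch, not part of PSSP; the function name check_executables is hypothetical.

```shell
# Hypothetical helper: succeeds only if every named path is a regular,
# executable file. Prints the first path that fails the check.
check_executables() {
    for f in "$@"; do
        if [ ! -f "$f" ] || [ ! -x "$f" ]; then
            echo "missing or not executable: $f"
            return 1
        fi
    done
    return 0
}
# Example:
#   check_executables /usr/lpp/ssp/bin/splogd /usr/lpp/ssp/bin/setup_logd
```

A zero return code corresponds to good results for this test.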

Installation test 2 - Verify hwevents file

Verify that the hwevents file is in the proper location. Refer to hwevents file, which describes how to determine the file's location.

Good results are indicated if the file exists in the proper location. Proceed to Configuration test 1 - Check for state logging.

Error results are indicated in all other cases. Create an hwevents file in the proper location, as described in hwevents file. You can find a sample file in "RS/6000 SP Files and Other Technical Information" of PSSP: Command and Technical Reference. Modify the file as appropriate. Repeat this test. If the problem persists, record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.

Configuration verification tests

Use these tests to check that the Logging daemon has been properly configured.

Configuration test 1 - Check for state logging

If you expected splogd to create file /var/adm/SPlogs/spmon/splogd.state_changes.timestamp, but it was not created, verify that the hwevents file contains the proper line to indicate enabling of state logging. Otherwise, proceed to Configuration test 2 - Check /etc/syslog.conf and /var/adm/SPlogs/SPdaemon.log files.

Locate the hwevents file that is currently being used by splogd. Refer to hwevents file, which describes where the hwevents file is located. Verify that there exists a line in this file of the following format:

    *      *     *            =       *     c    SP_STATE_LOG 
Note:
The seven fields must be separated by white space. The line must not begin with a '#' character; a line that begins with '#' is ignored as a comment.

Good results are indicated if the hwevents file contains the uncommented line described. Proceed to Configuration test 2 - Check /etc/syslog.conf and /var/adm/SPlogs/SPdaemon.log files.

Error results are indicated in all other cases. Add the line to the hwevents file, and proceed to Configuration test 2 - Check /etc/syslog.conf and /var/adm/SPlogs/SPdaemon.log files.
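
The check for an uncommented, seven-field line can be scripted. The following is a minimal sketch under the assumption that the line uses exactly the wildcard form shown in this test; the function name hwevents_has_line and its argument order are hypothetical.

```shell
# Hypothetical helper: succeeds if FILE contains an uncommented line with
# exactly seven white-space-separated fields of the form
#   *  *  *  =  *  KIND  ACTION
# A commented line fails because its first field is "#" or starts with "#".
hwevents_has_line() {
    awk -v kind="$2" -v act="$3" '
        NF == 7 &&
        $1 == "*" && $2 == "*" && $3 == "*" && $4 == "=" && $5 == "*" &&
        $6 == kind && $7 == act { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
# Example (substitute the actual location of your hwevents file):
#   hwevents_has_line /spdata/sys1/spmon/hwevents c SP_STATE_LOG
```

The same function can check for the error logging line by passing b and SP_ERROR_LOG as the last two arguments.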

Configuration test 2 - Check /etc/syslog.conf and /var/adm/SPlogs/SPdaemon.log files

If you expected splogd to write to the syslog, but it did not, verify the contents of file /etc/syslog.conf. Also, verify the existence of file /var/adm/SPlogs/SPdaemon.log. Otherwise, go to Configuration test 3 - Check hwevents file.

In file /etc/syslog.conf verify that there exists one of the following three lines:

              daemon.notice   /var/adm/SPlogs/SPdaemon.log
              daemon.debug    /var/adm/SPlogs/SPdaemon.log
              daemon.info     /var/adm/SPlogs/SPdaemon.log

Verify that file /var/adm/SPlogs/SPdaemon.log exists, and that its permissions are set to 644.

Good results are indicated if the /etc/syslog.conf file contains one of the three lines described, and the SPdaemon.log file exists with permissions 644. Proceed to Configuration test 3 - Check hwevents file.

Error results are indicated in all other cases. Perform these steps:

  1. Delete the following lines, if any of them exist, from the /etc/syslog.conf file. Do not simply comment these lines out. They must be deleted from the file for this step.
                     daemon.notice         /var/adm/SPlogs/SPdaemon.log
                     daemon.debug          /var/adm/SPlogs/SPdaemon.log
                     daemon.info           /var/adm/SPlogs/SPdaemon.log
    
  2. Issue the command:
    setup_logd
    
  3. Repeat this test. If the problem persists, record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
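
Both conditions of this test (one of the three daemon lines is present, and the log file exists with permissions 644) can be checked together. The following is a minimal sketch; the function name check_syslog_setup is hypothetical, and the permission check assumes that ls -l prints the mode string as its first field.

```shell
# Hypothetical helper: CONF is the syslog configuration file and LOGFILE is
# the log it should point at. Prints each problem found; succeeds if none.
check_syslog_setup() {
    conf=$1 logfile=$2 rc=0
    if ! awk -v lf="$logfile" '
            ($1 == "daemon.notice" || $1 == "daemon.debug" ||
             $1 == "daemon.info") && $2 == lf { found = 1 }
            END { exit (found ? 0 : 1) }' "$conf"; then
        echo "no daemon.notice, daemon.debug, or daemon.info line for $logfile"
        rc=1
    fi
    if [ ! -f "$logfile" ]; then
        echo "$logfile does not exist"
        rc=1
    else
        # Mode 644 appears as -rw-r--r-- in the first field of ls -l.
        case $(ls -l "$logfile" | awk '{ print $1 }') in
            -rw-r--r--*) ;;
            *) echo "$logfile permissions are not 644"; rc=1 ;;
        esac
    fi
    return $rc
}
# Example:
#   check_syslog_setup /etc/syslog.conf /var/adm/SPlogs/SPdaemon.log
```

A commented-out line does not satisfy the check, because its first field then begins with '#'.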

Configuration test 3 - Check hwevents file

If you expected splogd to write error information to the AIX error log, but it did not, verify that the hwevents file contains the proper line to enable error logging. Otherwise, proceed to Configuration test 5 - Check user exits.

Locate the hwevents file that is currently being used by splogd. Refer to hwevents file, which describes the location of the hwevents file. Verify that there exists a line in this file of the following format:

  *      *       *                  =      *     b    SP_ERROR_LOG
Note:
The seven fields must be separated by white space. The line must not begin with a '#' character; a line that begins with '#' is ignored as a comment.

Good results are indicated if the hwevents file contains the uncommented line described here. Proceed to Configuration test 4 - Check SPMON error record templates.

Error results are indicated in all other cases. Add the line to the hwevents file and proceed to Configuration test 4 - Check SPMON error record templates.

Configuration test 4 - Check SPMON error record templates

This test verifies that the error record templates for SPMON have been defined. Issue the command:

errpt -t | grep SPMON_

The output is similar to:

001BB5DD SPMON_EMSG102_ER   PERM H  COMMUNICATIONS SUBSYSTEM FAILURE       
0D1620A8 SPMON_INFO103_TR   UNKN H  THRESHOLD HAS BEEN EXCEEDED            
3E6F3CE7 SPMON_INFO101_TR   UNKN H  ELECTRICAL POWER RESUMED               
4CEF5A08 SPMON_EMSG100_ER   PERM H  LINK ERROR                             
6469E2D8 SPMON_EMSG108_ER   PERM S  UNABLE TO COMMUNICATE WITH REMOTE NODE 
76A4FAD9 SPMON_EMSG104_EM   PEND H  IMPENDING WORKSTATION SUBSYSTEM FAILURE
8D9F2E66 SPMON_EMSG103_EM   PEND H  EQUIPMENT MALFUNCTION                  
93FA22BC SPMON_INFO102_TR   UNKN H  SOFTWARE                               
A1843F1E SPMON_EMSG101_ER   PERM H  UNABLE TO COMMUNICATE WITH REMOTE NODE 
E406336B SPMON_EMSG106_ER   PERM H  RESOURCES NOT ACTIVE                   
E720BFB5 SPMON_INFO100_TR   UNKN H  POWER OFF DETECTED                     
E91A5929 SPMON_INFO104_TR   TEMP H  PROBLEM RESOLVED                       
F708903E SPMON_EMSG107_ER   PERM H  LOSS OF ELECTRICAL POWER                

Good results are indicated if the output is similar to the example output. That is, there are about 13 lines where the data in the second column begins with SPMON_. The exact number of lines may differ, depending on the software level. Proceed to Configuration test 5 - Check user exits.

Error results are indicated in all other cases. Determine why error record templates for SPMON have not been defined. You may need to contact the IBM Support Center.
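
Counting the SPMON_ templates by eye can be replaced with a small filter. The following is a minimal sketch; the function name count_spmon_templates is hypothetical.

```shell
# Hypothetical helper: reads `errpt -t` output on standard input and prints
# how many templates have a label (second column) beginning with SPMON_.
count_spmon_templates() {
    awk '$2 ~ /^SPMON_/ { n++ } END { print n + 0 }'
}
# Example:
#   errpt -t | count_spmon_templates
```

A count of 0 means that no SPMON templates are defined. A count near 13 matches the example output.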

Configuration test 5 - Check user exits

If you expected splogd to run user exits, but they are not running, verify that the hwevents file contains the proper line for the user exit. Otherwise, proceed to Operational test 1 - Check System Monitor daemon.

Locate the hwevents file that is currently being used by splogd. Refer to hwevents file, which describes the location of the hwevents file. Verify that there exists a line in this file, for the user exit you wish to run, as described in the hwevents entry of "RS/6000 SP Files and Other Technical Information" in PSSP: Command and Technical Reference.

Good results are indicated if the hwevents file contains the proper line. Operational test 5 - Check user exit mechanism will verify that the mechanism for running user exits is working. Proceed to Operational test 1 - Check System Monitor daemon.

Error results are indicated in all other cases. Add the appropriate line to the hwevents file. Refer to the hwevents file entry of "RS/6000 SP Files and Other Technical Information" in PSSP: Command and Technical Reference, for information about the format of the user exit line.

Operational verification tests

Use these tests to check that the Logging daemon is operating properly.

Operational test 1 - Check System Monitor daemon

This test verifies if the hardware monitor daemon is active, that is, if the hardmon daemon is currently running. The Logging daemon will not operate properly if hardmon is not running. Issue the command:

 lssrc -s hardmon

Look for the entry whose Subsystem is hardmon.

Good results are indicated if the Status column is active. Proceed to Operational test 2 - Check for frames responding.

Error results are indicated in all other cases. Issue the command:

startsrc -s hardmon

to start the hardware monitor daemon. Repeat this test, after taking the suggested action. If this test still fails, refer to Diagnosing System Monitor problems to determine why hardmon will not run.
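
The Status check can be scripted for use in a larger health check. The following is a minimal sketch; the function name subsystem_is_active is hypothetical, and it assumes that Status is the last column of the subsystem's row in the lssrc output.

```shell
# Hypothetical helper: reads `lssrc -s NAME` output on standard input and
# succeeds only if the row for subsystem NAME ends in "active".
# Assumption: Status is the last column of that row.
subsystem_is_active() {
    awk -v name="$1" '
        $1 == name { status = $NF }
        END { exit (status == "active" ? 0 : 1) }'
}
# Example:
#   lssrc -s hardmon | subsystem_is_active hardmon || startsrc -s hardmon
```

Because the subsystem name is an argument, the same function applies to any subsystem that lssrc reports, such as splogd.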

Operational test 2 - Check for frames responding

This test verifies if the frames are responding. The Logging daemon will not operate properly if the frames are not responding. To verify that a particular frame is responding, issue the command:

hmmon -GQv controllerResponds F:0

where F is the frame number you are checking. If there is more than one frame, you should check all frames. For example, if you have frames 1 through 5, you would issue the command:

hmmon -GQv controllerResponds 1-5:0

The output is similar to the following:

  frame F, slot 00:
     TRUE  frame responding to polls

Good results are indicated if the value is TRUE for each frame. Proceed to Operational test 3 - Check Logging daemon status.

Error results are indicated if the value is FALSE, or you do not receive any output. Refer to Diagnosing System Monitor problems.
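
When many frames are checked at once, a filter can list only the frames that are not responding. The following is a minimal sketch; the function name frames_not_responding is hypothetical, and it assumes the two-line output layout shown in this test.

```shell
# Hypothetical helper: reads `hmmon -GQv controllerResponds ...` output on
# standard input and prints each frame whose value is FALSE. Fails if any
# frame is not responding, or if there was no output at all.
frames_not_responding() {
    awk '
        $1 == "frame" { frame = $2; sub(/,$/, "", frame); next }
        $1 == "TRUE"  { seen++; next }
        $1 == "FALSE" { seen++; bad++; print frame }
        END { exit (seen > 0 && bad == 0 ? 0 : 1) }'
}
# Example:
#   hmmon -GQv controllerResponds 1-5:0 | frames_not_responding
```

An empty result with a zero return code corresponds to good results for this test.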

Operational test 3 - Check Logging daemon status

This test verifies if the Logging daemon is active, that is, if the splogd daemon is currently running. Issue the command:

lssrc -s splogd

Look for the entry whose Subsystem is splogd.

Good results are indicated if the Status column is active. Proceed to Operational test 4 - Check error daemon status.

Error results are indicated in all other cases. Issue the command:

startsrc -s splogd

Repeat this test. If the problem persists, record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.

Operational test 4 - Check error daemon status

This test verifies that the error daemon is running. Perform this test if entries that you expect to receive are not written to the error log. If you choose to skip this test because you are not having a problem with error log entries, proceed to Operational test 5 - Check user exit mechanism.

Issue the command:

ps -ef | grep errdemon

The output is similar to:

root  3206     1   0   Jun 19      -  0:40 /usr/lib/errdemon

Good results are indicated by output showing that the errdemon is running. Proceed to Operational test 5 - Check user exit mechanism.

Error results are indicated in all other cases. Determine why the errdemon is not running. Start the errdemon by issuing the command:

/usr/lib/errdemon

Repeat this test. If the problem persists, record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
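
Note that ps -ef | grep errdemon can also list the grep process itself. A stricter check compares the command field exactly. The following is a minimal sketch; the function name errdemon_is_running is hypothetical, and it assumes that errdemon runs without arguments, as in the example output.

```shell
# Hypothetical helper: reads `ps -ef` output on standard input and succeeds
# if some process has /usr/lib/errdemon as the last field of its row.
errdemon_is_running() {
    awk '$NF == "/usr/lib/errdemon" { found = 1 }
         END { exit (found ? 0 : 1) }'
}
# Example:
#   ps -ef | errdemon_is_running || echo "errdemon is not running"
```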

Operational test 5 - Check user exit mechanism

This test verifies that the mechanism used by splogd, to run user exits specified in the hwevents file, is working correctly. If you choose to skip this test because you are not having a problem with running user exits, proceed to Operational test 6 - Check writing to syslog file.

Notes:

  1. This test cannot be run if any of the following are true:

  2. This test assumes that the hwevents file is in the /spdata/sys1/spmon/ directory. If the file is in another directory, you may still follow these instructions, but substitute the actual path to the location of your hwevents file. Refer to hwevents file, which describes how to determine the location of the hwevents file.

Perform these steps:

  1. Create a trivial program in C, Perl, or another language that simply creates a file named /tmp/splogd_test_output, and writes some text to it. Name the program /tmp/splogd_test. This assumes that these two files do not already exist. If they do, substitute your own file names in this procedure.
  2. Verify that running /tmp/splogd_test creates file /tmp/splogd_test_output.
  3. Remove file /tmp/splogd_test_output.
  4. Copy the /spdata/sys1/spmon/hwevents file to a backup copy, so that the original hwevents file can be restored at the end of this test.
  5. Run the hmmon command to find a node that has the serialLinkOpen attribute and whose current value of this attribute is FALSE.

    For example, if the system contains frames 1 through 5, run the command:

    hmmon -GQs 1-5:
    

    to find one. Choose a node in a frame that meets these criteria.

    In subsequent steps, the frame number will be referred to as F and the node number as N. Always substitute the numeric frame number for the letter F and the numeric node number for the letter N.

  6. Edit file /spdata/sys1/spmon/hwevents and place the '#' character in the leftmost column of every line in the file. This makes every line a comment. Add the following line to this file, in any location:
      F  N  serialLinkOpen  =  *  c  /tmp/splogd_test 
    

    where F and N are the frame number and node number from step 5.

    For example, if frame number is 3 and node number is 12, the line is:

    3   12   serialLinkOpen  =  *  c  /tmp/splogd_test
    

    Save this file.

  7. Issue this command to stop splogd:
    stopsrc -s splogd
    

    The output is a message similar to:

    The stop of the splogd Subsystem was completed successfully.
    
  8. Issue this command to start splogd:
    startsrc -s splogd
    

    You should get a message similar to:

    The splogd Subsystem has been started. Subsystem PID is nnnnn.
    
  9. Issue this command:
    lssrc -s splogd
    

    and verify that the Status column indicates active.

  10. Verify that the file /tmp/splogd_test_output has not yet been created.
  11. Issue the following command to open up an s1term:
    s1term -G F N
    

    where F and N are the frame number and node number from step 5.

  12. Wait 10 seconds and, in another window, issue the command:
    hmmon -GQs F:N | grep serialLinkOpen
    

    where F and N are the frame number and node number from step 5.

    The value of the serialLinkOpen attribute should now be TRUE.

  13. Close the s1term (opened in step 11), by typing the terminal interrupt key, which is usually <Ctrl-c>.
  14. Verify that the file /tmp/splogd_test_output has been created.
  15. Copy the backup copy of the hwevents file that you created in step 4, to /spdata/sys1/spmon/hwevents, to restore this file.
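
Steps 1 through 3 call for a trivial program in any language. One possible version, written as a shell script, is sketched here. The function name make_test_exit and the text written to the output file are hypothetical; the function writes the program, runs it once to confirm that it works, and then removes the output file.

```shell
# Hypothetical helper for steps 1 through 3: writes a trivial user exit to
# PROG that creates OUT, runs it once to confirm it works, then removes OUT.
# Refuses to proceed if either path already exists.
make_test_exit() {
    prog=$1 out=$2
    if [ -e "$prog" ] || [ -e "$out" ]; then
        echo "$prog or $out already exists; choose other names"
        return 1
    fi
    cat > "$prog" <<EOF
#!/bin/sh
echo "splogd user exit ran" > "$out"
EOF
    chmod +x "$prog"
    "$prog" && [ -f "$out" ] && rm -f "$out"
}
# Example:
#   make_test_exit /tmp/splogd_test /tmp/splogd_test_output
```

A zero return code means that the program exists, is executable, created its output file when run, and that the output file has been removed again, which is the state expected at step 10.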

Good results are indicated if the file /tmp/splogd_test_output did not exist when checked in step 10, but existed when checked in step 14. Proceed to Operational test 6 - Check writing to syslog file.

Error results are indicated in all other cases. Perform these steps:

  1. Delete the following lines, if any exist, from the /etc/syslog.conf file. Do not simply comment these lines out. They must be deleted from the file for this step.
                daemon.notice         /var/adm/SPlogs/SPdaemon.log
                daemon.debug          /var/adm/SPlogs/SPdaemon.log
                daemon.info           /var/adm/SPlogs/SPdaemon.log
    
  2. Issue the command:
    setup_logd
    
  3. Repeat this test. If the problem persists, record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.

Operational test 6 - Check writing to syslog file

This test verifies that the mechanism used by splogd, to write to the syslog (if specified in the hwevents file) is working correctly. Perform this test only if there is a node that is currently powered on, for which it is safe to power the node off and then back on. If you choose to skip this test, because you are not having a problem with syslog entries, or you are not able to power off and on a node, proceed to Operational test 7 - Check writing to the AIX Error Log.

Notes:

  1. This test cannot be run if any of the following are true:

  2. This test assumes that the hwevents file is in the /spdata/sys1/spmon/ directory. If the file is in another directory, you may still follow these instructions, but substitute the actual path to the location of your hwevents file. Refer to hwevents file, which describes how to determine the location of the hwevents file.

Perform these steps:

  1. Edit file /spdata/sys1/spmon/hwevents and add the following line, if it does not already exist. Make sure this line is not commented out (# in leftmost column).
        *    *    *    =    *    b    SP_ERROR_LOG
    
  2. Copy the /etc/syslog.conf file to a backup copy, so that it can be restored at the end of this test.
  3. Modify the original /etc/syslog.conf file so that all lines are commented out (# in leftmost column), and add the following line:
      daemon.info           /var/adm/SPlogs/SPdaemon.log
    
  4. Issue this command to stop splogd:
    stopsrc -s splogd
    

    The output is a message similar to:

    The stop of the splogd Subsystem was completed successfully.
    
  5. Issue this command to start splogd:
    startsrc -s splogd
    

    You should get a message similar to:

    The splogd Subsystem has been started. Subsystem PID is nnnnn.
    
  6. Issue the following command:
    lssrc -s splogd
    

    and verify that the Status column indicates active for Subsystem splogd.

  7. Issue the following commands to stop and start the syslogd daemon, so that it rereads the syslog.conf file:
    stopsrc  -s syslogd
    startsrc -s syslogd
    
  8. Choose a node N, in a frame F, that is currently powered on, which you can safely power off and then back on. Power off and then power on this node by issuing the commands:
    hmcmds -Gv off  F:N
    hmcmds -Gv on   F:N
    

    where F is the frame number, and N is the node number.

  9. Copy the backup copy of the syslog.conf file that you created in step 2, to /etc/syslog.conf, to restore this file.
  10. Issue the following commands to stop and start the syslogd daemon, so that it rereads the original syslog.conf file:
    stopsrc  -s syslogd
    startsrc -s syslogd
    
  11. If you modified the original hwevents file in step 1, you may want to comment out the line that you had added or uncommented, and repeat steps 4 and 5.

Good results are indicated if the file /var/adm/SPlogs/SPdaemon.log contains a line with the current date and time, written since you powered the node off in step 8. The line is similar to:

 Sep  2 12:42:05 sup1 sphwlog[19826]: LP=PSSP,Fn=splogd.c,SID=I,L#=1339,
 Information; Node F:N; powerLED; Power is off.

Proceed to Operational test 7 - Check writing to the AIX Error Log.

Error results are indicated in all other cases. Record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.

Operational test 7 - Check writing to the AIX Error Log

This test verifies that the mechanism used by splogd, to write to the error log (if specified in the hwevents file), is working correctly. Perform this test only if there is a node that is currently powered on, for which it is safe to power the node off and then back on. If you choose to skip this test, because you are not having a problem with error log entries, or you are not able to power off and on a node, proceed to Operational test 8 - Check writing to splogd.state_changes file.

Notes:

  1. This test cannot be run if any of the following are true:

  2. This test assumes that the hwevents file is in the /spdata/sys1/spmon/ directory. If the file is in another directory, you may still follow these instructions, but substitute the actual path to the location of your hwevents file. Refer to hwevents file, which describes how to determine the location of the hwevents file.

Perform these steps:

  1. Edit file /spdata/sys1/spmon/hwevents and add the following line, if it does not already exist. Make sure this line is not commented out (# in leftmost column).
        *    *    *    =    *    b    SP_ERROR_LOG
    
  2. Issue this command to stop splogd:
    stopsrc -s splogd
    

    The output is a message similar to:

    The stop of the splogd Subsystem was completed successfully.
    
  3. Issue this command to start splogd:
    startsrc -s splogd
    

    You should get a message similar to:

    The splogd Subsystem has been started. Subsystem PID is nnnnn.
    
  4. Issue the following command:
    lssrc -s splogd
    

    Verify that the Status column indicates active for Subsystem splogd.

  5. Choose a node N, in a frame F, that is currently powered on and which you can safely power off and back on. Power off and then power on this node, by issuing the commands:
    hmcmds -Gv off F:N
    hmcmds -Gv on F:N
    

    where F is the frame number, and N is the node number.

  6. If you modified the original hwevents file in step 1, you may want to comment out the line that you had added or uncommented, and repeat steps 2 and 3.

Good results are indicated if you see these two entries in the error log, and their timestamps show that they were made since you performed step 5:

LABEL:  SPMON_INFO100_TR
LABEL:  SPMON_INFO101_TR

The SPMON_INFO100_TR entry indicates that Power Off was detected. The SPMON_INFO101_TR entry indicates that Power On was detected. You can view the error log by running the command:

errpt -a 

You should redirect the output to a file. Proceed to Operational test 8 - Check writing to splogd.state_changes file.

Error results are indicated in all other cases. Record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.
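
Because errpt -a output can be long, a filter can confirm that both labels are present. The following is a minimal sketch; the function name has_power_cycle_entries is hypothetical. It does not examine timestamps, so you must still confirm that the entries were made since step 5.

```shell
# Hypothetical helper: reads `errpt -a` output on standard input and succeeds
# only if both the power-off and the power-on labels appear.
has_power_cycle_entries() {
    awk '
        $1 == "LABEL:" && $2 == "SPMON_INFO100_TR" { pwr_off = 1 }
        $1 == "LABEL:" && $2 == "SPMON_INFO101_TR" { pwr_on = 1 }
        END { exit (pwr_off && pwr_on ? 0 : 1) }'
}
# Example:
#   errpt -a | has_power_cycle_entries && echo "both entries found"
```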

Operational test 8 - Check writing to splogd.state_changes file

This test verifies that the mechanism used by splogd, to write to the /var/adm/SPlogs/spmon/splogd.state_changes.timestamp file, if specified by the hwevents file, is working correctly. If you choose to skip this test, because you are not having a problem with the writing to the splogd.state_changes.timestamp file, you are finished with Logging daemon tests.

Notes:

  1. This test cannot be run if any of the following are true:

  2. This test assumes that the hwevents file is in the /spdata/sys1/spmon/ directory. If the file is in another directory, you may still follow these instructions, but substitute the actual path to the location of your hwevents file. Refer to hwevents file, which describes how to determine the location of the hwevents file.

Perform these steps:

  1. Copy the /spdata/sys1/spmon/hwevents file to a backup copy, so that the hwevents file can be restored at the end of this test.
  2. Run the hmmon command to find a node that has the serialLinkOpen attribute and whose current value of this attribute is FALSE.

    For example, if the system contains frames 1 through 5, issue the command hmmon -GQs 1-5: to find one. Choose a node in a frame that meets these criteria.

    In subsequent steps, the frame number will be referred to as F and the node number as N. Always substitute the numeric frame number for the letter F and the numeric node number for the letter N.

  3. Edit file /spdata/sys1/spmon/hwevents and place the '#' character in the leftmost column of every line in the file. This makes every line a comment. Add the following line to this file in any location:
       *      *     *            =       *     c    SP_STATE_LOG
    

    Save this file.

  4. Copy file /var/adm/SPlogs/spmon/splogd.state_changes.timestamp to a backup file, and then delete /var/adm/SPlogs/spmon/splogd.state_changes.timestamp. The timestamp part of this file name is numeric, related to the current date.
  5. Issue the following command to stop splogd:
    stopsrc -s splogd
    

    You should receive a message similar to:

    The stop of the splogd Subsystem was completed successfully.
    
  6. Issue the following command to start splogd:
    startsrc -s splogd
    

    You should receive a message similar to:

    The splogd Subsystem has been started. Subsystem PID is nnnnn.
    
  7. Issue the following command:
    lssrc -s splogd
    

    Verify that the Status column indicates active.

  8. Issue the following command to open up an s1term:
    s1term -G F N
    

    where F and N are the frame number and node number from step 2.

  9. Wait 10 seconds and, in another window, issue the command:
    hmmon -GQs F:N | grep serialLinkOpen
    

    where F and N are the frame number and node number from step 2.

    The value of the serialLinkOpen attribute should now be TRUE.

  10. Close the s1term (opened in step 8), by typing the terminal interrupt key, which is usually <Ctrl-c>.
  11. Verify that the file /var/adm/SPlogs/spmon/splogd.state_changes.timestamp has been created, and the state change is present in the file. The timestamp part of the file name will actually appear as numbers, related to the current date. You should see lines similar to the following inside the file:
           F N serialLinkOpen TRUE
           F N serialLinkOpen FALSE
    
  12. Copy the backup copy of the hwevents file that you created in step 1, to /spdata/sys1/spmon/hwevents to restore this file.
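
Steps 1 and 3 (back up the hwevents file, comment out every line, and add one line) can be done without an editor. The following is a minimal sketch; the function name isolate_hwevents_line and the .orig backup suffix are hypothetical.

```shell
# Hypothetical helper for steps 1 and 3: saves FILE as FILE.orig, writes
# every original line back with '#' in the leftmost column, and appends LINE.
# The .orig suffix for the backup copy is an assumption.
isolate_hwevents_line() {
    file=$1 line=$2
    cp "$file" "$file.orig" || return 1
    awk '{ print "#" $0 }' "$file.orig" > "$file" || return 1
    printf '%s\n' "$line" >> "$file"
}
# Example:
#   isolate_hwevents_line /spdata/sys1/spmon/hwevents '* * * = * c SP_STATE_LOG'
# To restore the file afterward (step 12):
#   cp /spdata/sys1/spmon/hwevents.orig /spdata/sys1/spmon/hwevents
```

You must still stop and start splogd, as described in steps 5 and 6, for the change to take effect.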

Good results are indicated if the file /var/adm/SPlogs/spmon/splogd.state_changes.timestamp was created, and contained the state change, when checked in step 11. You are finished with SP Logging Daemon testing.

Error results are indicated in all other cases. Record all relevant information, see Information to collect before contacting the IBM Support Center, and contact the IBM Support Center.

