IBM Books

Diagnosis Guide


Diagnostic procedures

Use these procedures to determine the cause of an SDR failure.

Check current system partition

To find out which system partition you are currently in, issue the following command:

echo $SP_NAME

If SP_NAME is set, SDR commands are directed to the sdrd daemon for the hostname or IP address that SP_NAME represents. Make sure that your SP_NAME is not set to an unexpected partition. If SP_NAME is not set, SDR commands are directed to the primary partition that is defined in the /etc/SDR_dest_info file. See Action 2 - Analyze system or network changes for the expected format of the /etc/SDR_dest_info file.
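The rule above can be sketched as a small shell function. This is only an illustration: sdr_target is a hypothetical helper name, not a PSSP command, and it does nothing more than report what the SP_NAME variable implies.

```shell
# sdr_target: hypothetical helper (not part of PSSP).
# Reports where SDR commands will be directed, based only on SP_NAME.
sdr_target() {
    if [ -n "${SP_NAME:-}" ]; then
        echo "SP_NAME=$SP_NAME: SDR commands go to the sdrd for that hostname or IP address"
    else
        echo "SP_NAME is not set: SDR commands go to the primary partition in /etc/SDR_dest_info"
    fi
}
```

Calling sdr_target before running SDR commands makes an unexpected partition setting visible at a glance.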

Query the state of the sdrd

The sdrd daemons run under AIX System Resource Controller (SRC) control, so they can be queried with the lssrc command. There is one daemon for each system partition, and all of the daemons are in the sdr group, so lssrc -g sdr reports on every daemon at once. To query a single system partition, issue:

lssrc -s sdr.partition_name

where partition_name is the short hostname for the sdrd being queried.

Stop and start the sdrd

To stop or start all of the sdrd daemons, use the -g flag on the stopsrc or startsrc commands. This targets the whole sdr group. For example:

stopsrc -g sdr

Sometimes the sdrd does not stop right away. The lssrc command can be used to make sure that an sdrd daemon has stopped. If the sdrd daemon has not stopped, the kill -9 command can be used to bring it down immediately.
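Because the daemon may take a moment to stop, it can help to poll before resorting to kill -9. The following is a minimal sketch under stated assumptions: wait_until_stopped and sdr_group_active are hypothetical helper names, and the check assumes that lssrc prints the word "active" in the status column for a running subsystem.

```shell
# wait_until_stopped TRIES DELAY CHECK_CMD [ARGS...]   (hypothetical helper)
# Polls CHECK_CMD, which must succeed while the daemon is still running.
# Returns 0 as soon as CHECK_CMD fails, or 1 if it still succeeds after TRIES polls.
wait_until_stopped() {
    tries=$1; delay=$2; shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" || return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Assumption: a running subsystem shows "active" in lssrc output.
sdr_group_active() {
    lssrc -g sdr | grep -q active
}

# Possible use after stopping the group:
#   stopsrc -g sdr
#   wait_until_stopped 10 3 sdr_group_active || echo "sdrd still active"
```

If the function returns nonzero, the daemon is still running after the polling window, and the kill -9 step described above applies.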

The sdr command can also be used to start and stop the SDR. Issuing the command:

/usr/lpp/ssp/bin/sdr reset

stops and starts the SDR in the current partition. Using the -spname option, a specific partition's sdrd can be stopped, started, or reset.

Check the sdrd processes using the ps command

To see which sdrd daemons are running, and to spot more than one daemon in the same system partition or a daemon started with an incorrect system partition name, issue the command:

ps -ef | grep sdrd 

This can be used to find multiple sdrd daemons that are running in the same system partition. This situation is an error.
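Spotting duplicates by eye becomes tedious when there are many partitions. The sketch below groups the ps -ef output by command line and reports any sdrd command line that appears more than once. It rests on an assumption that is not stated in this guide: that each sdrd command line carries its partition identifier as an argument, so that identical command lines mean the same partition. The name find_duplicate_sdrd is hypothetical.

```shell
# find_duplicate_sdrd: hypothetical helper; reads "ps -ef" output on stdin.
# ASSUMPTION: the partition identifier appears in each sdrd command line,
# so two identical command lines indicate two daemons for one partition.
find_duplicate_sdrd() {
    awk '
        /sdrd/ && !/grep/ {                      # skip the grep process itself
            if (match($0, /[^ ]*sdrd.*$/)) {     # command path through end of line
                cmd = substr($0, RSTART)
                sub(/ +$/, "", cmd)              # trim trailing blanks
                count[cmd]++
            }
        }
        END {
            for (c in count)
                if (count[c] > 1)
                    print count[c] " instances: " c
        }
    '
}

# Possible use:
#   ps -ef | find_duplicate_sdrd
```

No output means that no sdrd command line was repeated. If the assumption about the command line does not hold on your system, compare the ps output manually instead.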

Check for sdrd memory leaks and CPU utilization

To see if there are sdrd memory leaks, issue these commands:

ps gvc | grep PID 
ps gvc | grep sdrd 

The first command prints the ps column headings and the second prints the entry for each sdrd. Issue these commands at regular intervals to check for a memory leak.

The SIZE and RSS fields indicate memory usage. They should not grow larger constantly over time. Be aware that the first time a class is accessed, it is cached in the sdrd, causing memory usage to increase. Also, a larger SP system has bigger sdr classes and therefore an sdrd that requires more memory.

The following sequence may be entered at the command line or put into a script to track sdrd CPU usage:

ps gvc | grep PID > /tmp/sdrd.psgvc
while true
do
  ps gvc | grep sdrd >> /tmp/sdrd.psgvc
  sleep 60
done

The %CPU field indicates how much CPU the sdrd is consuming. This will increase when clients are making many requests of the SDR. Typically, this peaks when nodes are booted and when IBM Virtual Shared Disks and GPFS are initialized.
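Once /tmp/sdrd.psgvc has collected samples, a short summary makes trends easier to read than the raw lines. The following sketch assumes the file layout produced by the sequence above (one heading line, then one line per sample) and locates the SIZE, RSS, and %CPU columns by heading name rather than by position. The function name summarize_sdrd_log is hypothetical.

```shell
# summarize_sdrd_log FILE PID   (hypothetical helper)
# FILE: heading line from "ps gvc | grep PID", then sampled sdrd lines.
# Prints first and last SIZE and RSS, and the peak %CPU, for one PID.
summarize_sdrd_log() {
    awk -v pid="$2" '
        NR == 1 {                                # map heading names to column numbers
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $(col["PID"]) == pid {
            size = $(col["SIZE"]); rss = $(col["RSS"]); cpu = $(col["%CPU"]) + 0
            if (!seen) { first_size = size; first_rss = rss; seen = 1 }
            last_size = size; last_rss = rss
            if (cpu > max_cpu) max_cpu = cpu
            n++
        }
        END {
            if (!seen) { print "no samples for PID " pid; exit 1 }
            printf "samples=%d SIZE %d->%d RSS %d->%d peak%%CPU=%.1f\n",
                   n, first_size, last_size, first_rss, last_rss, max_cpu
        }
    ' "$1"
}

# Possible use (replace sdrdpid with the process id of the sdrd):
#   summarize_sdrd_log /tmp/sdrd.psgvc sdrdpid
```

A SIZE or RSS value that keeps rising across many samples, beyond the initial class caching, is the pattern to look for.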

Check for an sdrd hang

A command that connects to the sdrd of each system partition to retrieve system partition information is:

splstdata -p

If this command hangs when trying to connect to a particular system partition, the sdrd of that partition is most likely hung.
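To avoid waiting indefinitely on a hung daemon, the command can be run under a time limit. The sketch below uses only basic shell job control, on the assumption that no separate timeout utility is available. The name run_with_timeout is hypothetical, and the 60-second limit in the usage comment is an arbitrary illustration.

```shell
# run_with_timeout SECS CMD [ARGS...]   (hypothetical helper)
# Runs CMD in the background and sends it SIGTERM if it is still running
# after SECS seconds. Returns the exit status of CMD (nonzero if it was killed).
run_with_timeout() {
    secs=$1; shift
    "$@" &
    cmd_pid=$!
    # Watchdog: terminate the command if it outlives the limit.
    ( sleep "$secs"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog=$!
    wait "$cmd_pid"
    status=$?
    # The command finished or was killed, so the watchdog is no longer needed.
    kill "$watchdog" 2>/dev/null
    wait "$watchdog" 2>/dev/null
    return "$status"
}

# Possible use:
#   run_with_timeout 60 splstdata -p || echo "splstdata -p failed or timed out"
```

A nonzero status after the full time limit has elapsed suggests that the command was hung on one of the partitions.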

If this happens, use the dbx command to search for more information about the hang:

dbx -a sdrdpid 

where sdrdpid is the process id of the sdrd that is hung.

Within dbx, the following subcommands are used to gather more information:

Check for an sdrd core dump

If there is a core dump from the sdrd, it is located in the /var/adm/SPlogs/sdr directory. To force a core dump, issue:

kill -ABRT sdrdpid

where sdrdpid is the process id of the sdrd that is to be dumped. Copy the core file to a safe place.

SDR verification test

The SDR verification test, SDR_test, runs various SDR commands to make sure that objects and files can be created, updated, queried and deleted. See PSSP: Command and Technical Reference for command details. SDR administrator access is required to run this command. To run the command, issue:

/usr/lpp/ssp/bin/SDR_test 

To run the command via SMIT, do the following:

  1. Issue the command: smit SP_verify
  2. Select System Data Repository

Command results are logged in /var/adm/SPlogs/SDR_test.log as well as being sent to stdout and stderr, unless quiet mode is selected. If the command is issued by a user other than root, results are logged in /tmp/SDR_test.log.

Good results indicate that verification succeeded, as in this example:

SDR_test: Start SDR command line verification test
SDR_test: Verification succeeded  

Error results are indicated by SDR_test error messages. One message is issued for each failing test. There is also an error message from the specific SDR command that failed.

At the end of the test, there is a message with the number of tests that failed, such as:

SDR_test: 0037-213 Verification failed with 10 errors.
 See /var/adm/SPlogs/SDR_test.log
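When the test is run from a script, the failure count can be pulled out of the log. The sketch below matches the wording of the summary message shown above and prints the number of errors from the most recent such message in the file. The function name sdr_test_error_count is hypothetical, and the pattern assumes the message text does not differ from the example.

```shell
# sdr_test_error_count LOGFILE   (hypothetical helper)
# Prints the error count from the last "Verification failed with N errors"
# message in LOGFILE. Prints nothing if no such message is present.
sdr_test_error_count() {
    sed -n 's/.*Verification failed with \([0-9][0-9]*\) error.*/\1/p' "$1" | tail -1
}

# Possible use:
#   sdr_test_error_count /var/adm/SPlogs/SDR_test.log
```

Empty output means that the log contains no failure summary, which is consistent with a successful run but should be confirmed by the "Verification succeeded" message.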

Look for MBCS data in the SDR

With National Language Support, two new fields have been added to the SP class in the SDR. These fields control whether MBCS (Multi-Byte Character Set) data may be written to the SDR. These fields are: SDR_ASCII_only and admin_locale.

The SDRScan routine looks through the SDR classes and files and returns a return code of 1 if any of them contain MBCS data. The file names, class names, and attribute names where the MBCS data is found are displayed by the command, unless the quiet option is chosen. MBCS data is only a problem if SDR_ASCII_only is true, or if the data is in a language other than the admin_locale language.

SDRScan can be run only by root on the control workstation. Issue this command:

SDRScan

You will receive messages for any non-ASCII data in the SDR. If there is no non-ASCII data, no messages are issued, and the return code will be zero. If you are running ksh, issue the command:

echo $? 

to see the return code.
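Because $? is overwritten by every subsequent command, it is safest to save it immediately after SDRScan finishes. The sketch below shows one way to do that and to interpret the two return codes described above. The function name report_sdrscan_rc is hypothetical, and any return code other than 0 or 1 is reported as unexpected, since this section does not describe others.

```shell
# report_sdrscan_rc RC   (hypothetical helper)
# Interprets an SDRScan return code as described in this section.
report_sdrscan_rc() {
    case "$1" in
        0) echo "no MBCS data found in the SDR" ;;
        1) echo "MBCS data found: check SDR_ASCII_only and admin_locale" ;;
        *) echo "unexpected return code $1" ;;
    esac
}

# Possible use, as root on the control workstation:
#   SDRScan
#   rc=$?                      # save the code before running anything else
#   report_sdrscan_rc "$rc"
```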

