Use these procedures to determine the cause of an SDR failure.
To find out which system partition you are currently in, issue the following command:
echo $SP_NAME
If SP_NAME is set, SDR commands are directed to the sdrd daemon for the hostname or IP address that SP_NAME represents. Make sure that your SP_NAME is not set to an unexpected partition. If SP_NAME is not set, SDR commands are directed to the primary partition that is defined in the /etc/SDR_dest_info file. See Action 2 - Analyze system or network changes for the expected format of the /etc/SDR_dest_info file.
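The following ksh-compatible sketch reports where SDR commands from the current shell would be sent. It assumes that the primary partition appears in /etc/SDR_dest_info on a line of the form primary:address. Check that assumption against the format described in Action 2 before relying on it. The function name sdr_target is hypothetical, and the file path can be passed as an argument for testing.

```shell
# sdr_target [DEST_INFO_FILE]
# Hypothetical helper: shows which sdrd the SDR commands in this shell target.
# Assumes a "primary:<address>" line in the destination file.
sdr_target() {
    dest_file=${1:-/etc/SDR_dest_info}
    if [ "${SP_NAME+set}" = set ]; then
        # SP_NAME, when set, takes precedence over the destination file.
        echo "SP_NAME: $SP_NAME"
    else
        primary=$(awk -F: '$1 == "primary" { print $2; exit }' "$dest_file")
        echo "primary: $primary"
    fi
}
```

Run sdr_target with no arguments to check the real /etc/SDR_dest_info file.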
The sdrd daemons are under AIX System Resource Controller (SRC) control, so you can query them with the lssrc -g command. There is one daemon for each system partition, and all of the daemons are in the sdr group. To query a single system partition, issue:
lssrc -s sdr.partition_name
where partition_name is the short hostname for the sdrd being queried.
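When you query the whole group with lssrc -g sdr, a quick way to spot a partition whose sdrd is down is to filter out the active subsystems. This sketch assumes the usual lssrc layout (Subsystem, Group, PID, Status) with the status as the last field on each line; sdr_not_active is a hypothetical name.

```shell
# sdr_not_active
# Reads "lssrc -g sdr" output on standard input and prints every subsystem
# whose status (last field) is not "active", skipping the header and blank lines.
sdr_not_active() {
    awk 'NR > 1 && NF > 0 && $NF != "active" { print $1, $NF }'
}
```

For example: lssrc -g sdr | sdr_not_active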
To stop or start all of the sdrd daemons, use the -g flag on the stopsrc or startsrc commands. This targets the whole sdr group. For example:
stopsrc -g sdr
Sometimes the sdrd does not stop right away. The lssrc command can be used to make sure that an sdrd daemon has stopped. If the sdrd daemon has not stopped, the kill -9 command can be used to bring it down immediately.
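The stop-and-verify procedure can be scripted. This sketch is illustrative only: stop_sdrd and srcsub_pid are hypothetical names, the 30-second default wait is an arbitrary choice, and it assumes that lssrc -s shows a PID column only while the subsystem still has a process.

```shell
# srcsub_pid
# Reads "lssrc -s" output on standard input and prints the PID column, if any.
# Assumes status lines have four fields (Subsystem Group PID Status) when a
# process exists and three fields when it does not.
srcsub_pid() {
    awk 'NR > 1 && NF == 4 { print $3 }'
}

# stop_sdrd PARTITION [WAIT_SECONDS]
# Hypothetical helper: asks SRC to stop sdr.PARTITION, polls lssrc once a
# second, and sends kill -9 if a PID is still reported after the wait.
stop_sdrd() {
    subsys="sdr.$1"
    wait_secs=${2:-30}
    stopsrc -s "$subsys" || return 1
    while [ "$wait_secs" -gt 0 ]; do
        pid=$(lssrc -s "$subsys" | srcsub_pid)
        if [ -z "$pid" ]; then
            return 0
        fi
        sleep 1
        wait_secs=$((wait_secs - 1))
    done
    pid=$(lssrc -s "$subsys" | srcsub_pid)
    if [ -n "$pid" ]; then
        echo "stop_sdrd: $subsys did not stop; sending kill -9 to $pid" >&2
        kill -9 "$pid"
    fi
}
```

Polling before using kill -9 gives stopsrc a chance to shut the daemon down cleanly first.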
The sdr command can also be used to start and stop the SDR. Issuing the command:
/usr/lpp/ssp/bin/sdr reset
stops and starts the SDR in the current partition. Using the -spname option, a specific partition's sdrd can be stopped, started, or reset.
To see which sdrd daemons are running, including duplicate daemons in the same system partition or daemons started with an incorrect system partition name, issue the command:
ps -ef | grep sdrd
Use this output to find multiple sdrd daemons running in the same system partition. This situation is an error.
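To make the duplicate check less error-prone, you can count sdrd processes per partition. This sketch assumes that each sdrd command line ends with an argument identifying its partition (for example, an address); verify this against the ps -ef output on your control workstation. The function name sdrd_duplicates is hypothetical.

```shell
# sdrd_duplicates
# Reads "ps -ef" output on standard input and prints each partition argument
# that has more than one sdrd process, followed by the count. Lines that
# contain "grep" are skipped so the search command itself is not counted.
sdrd_duplicates() {
    awk '!/grep/ {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /(^|\/)sdrd$/) { count[$NF]++; break }
         }
         END { for (p in count) if (count[p] > 1) print p, count[p] }'
}
```

For example: ps -ef | sdrd_duplicates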
To see if there are sdrd memory leaks, issue these commands:
ps gvc | grep PID
ps gvc | grep sdrd
Issue these commands at regular intervals to check for a memory leak.
The SIZE and RSS fields indicate memory usage. They should not grow continuously over time. Be aware that the first time a class is accessed, it is cached in the sdrd, which increases memory usage. Also, a larger SP system has larger SDR classes and therefore an sdrd that requires more memory.
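To compare samples without reading them by eye, you can summarize the first and last SIZE and RSS values for each sdrd process. This sketch assumes the samples were saved to a file whose first line is the ps gvc header (such as /tmp/sdrd.psgvc from the sequence that follows), and it locates the SIZE and RSS columns by name from that header. The name sdrd_mem_growth is hypothetical. Growth right after a class is first accessed is expected caching, not a leak.

```shell
# sdrd_mem_growth FILE
# Summarizes saved "ps gvc" samples: for each PID, prints the first and last
# SIZE and RSS values. Column positions are taken from the header line.
sdrd_mem_growth() {
    awk '
        $1 == "PID" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        ("SIZE" in col) && ("RSS" in col) {
            pid = $1
            if (!(pid in first_size)) {
                order[++n] = pid
                first_size[pid] = $(col["SIZE"])
                first_rss[pid] = $(col["RSS"])
            }
            last_size[pid] = $(col["SIZE"])
            last_rss[pid] = $(col["RSS"])
        }
        END {
            for (k = 1; k <= n; k++) {
                p = order[k]
                printf "%s SIZE=%s->%s RSS=%s->%s\n", p, first_size[p], last_size[p], first_rss[p], last_rss[p]
            }
        }' "$1"
}
```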
The following sequence may be entered at the command line or put into a script to track sdrd CPU usage:
ps gvc | grep PID > /tmp/sdrd.psgvc
while true
do
    ps gvc | grep sdrd >> /tmp/sdrd.psgvc
    sleep 60
done
The %CPU field indicates how much CPU the sdrd is consuming. This will increase when clients are making many requests of the SDR. Typically, this peaks when nodes are booted and when IBM Virtual Shared Disks and GPFS are initialized.
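Once the sequence above has collected samples for a while, you can pick out the highest %CPU reading for each sdrd process instead of scanning the whole file. This sketch assumes that the first line of the file is the ps gvc header and locates the %CPU column by name; sdrd_peak_cpu is a hypothetical name. A high peak during node boot or IBM Virtual Shared Disk and GPFS initialization is expected.

```shell
# sdrd_peak_cpu FILE
# For each PID in saved "ps gvc" samples, prints the highest %CPU value seen,
# in the order in which the PIDs first appear.
sdrd_peak_cpu() {
    awk '
        $1 == "PID" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        ("%CPU" in col) {
            pid = $1
            cpu = $(col["%CPU"]) + 0
            if (!(pid in peak)) {
                order[++n] = pid
                peak[pid] = cpu
            } else if (cpu > peak[pid]) {
                peak[pid] = cpu
            }
        }
        END { for (k = 1; k <= n; k++) print order[k], peak[order[k]] }' "$1"
}
```

For example: sdrd_peak_cpu /tmp/sdrd.psgvc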
A command that connects to the sdrd of each system partition to retrieve system partition information is:
splstdata -p
If this command hangs when trying to connect to a particular system partition, the sdrd of that partition is most likely hung.
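Because a hung sdrd makes splstdata -p wait indefinitely, a script that checks partitions unattended may want a time limit. This portable sketch runs a command in the background and kills it if it has not finished within a given number of seconds. The name run_with_timeout is hypothetical, and returning 124 on a timeout is simply a convention borrowed from GNU timeout.

```shell
# run_with_timeout SECONDS COMMAND [ARG...]
# Runs COMMAND in the background and polls it once a second. Returns the
# command's own exit status, or 124 if it was killed after SECONDS.
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!
    elapsed=0
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            kill "$cmd_pid" 2>/dev/null
            wait "$cmd_pid" 2>/dev/null
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    wait "$cmd_pid"
}
```

For example: run_with_timeout 300 splstdata -p || echo "splstdata -p did not complete"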
If splstdata -p hangs, use the dbx command to gather more information about the hang:
dbx -a sdrdpid
where sdrdpid is the process id of the sdrd that is hung.
Within dbx, the following subcommands are used to gather more information:
If there is a core dump from the sdrd, it is located in the /var/adm/SPlogs/sdr directory. To force a core dump, issue:
kill -ABRT sdrdpid
where sdrdpid is the process id of the sdrd that is to be dumped. Copy the core file to a safe place.
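The copy step can be scripted. This sketch assumes that the dump is written as a file named core in /var/adm/SPlogs/sdr, and it adds a timestamp to the copy so that successive saves do not overwrite each other. The name save_sdr_core and the destination directory you pass to it are hypothetical.

```shell
# save_sdr_core DEST_DIR [CORE_DIR]
# Copies CORE_DIR/core (default CORE_DIR: /var/adm/SPlogs/sdr) to
# DEST_DIR/core.sdrd.<timestamp>, preserving its mode and timestamps.
save_sdr_core() {
    dest=$1
    core_dir=${2:-/var/adm/SPlogs/sdr}
    if [ ! -f "$core_dir/core" ]; then
        echo "save_sdr_core: no core file in $core_dir" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    stamp=$(date +%Y%m%d%H%M%S)
    cp -p "$core_dir/core" "$dest/core.sdrd.$stamp"
}
```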
The SDR verification test, SDR_test, runs various SDR commands to make sure that objects and files can be created, updated, queried and deleted. See PSSP: Command and Technical Reference for command details. SDR administrator access is required to run this command. To run the command, issue:
/usr/lpp/ssp/bin/SDR_test
To run the command via SMIT, do the following:
Good results indicate that verification succeeded, as in this example:
SDR_test: Start SDR command line verification test
SDR_test: Verification succeeded
Error results are indicated by SDR_test error messages. One message is issued for each failing test. There is also an error message from the specific SDR command that failed.
At the end of the test, there is a message with the number of tests that failed, such as:
SDR_test: 0037-213 Verification failed with 10 errors. See /var/adm/SPlogs/SDR_test.log
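If you run SDR_test from a script, you can extract the failure count from its final message. This sketch relies only on the two message texts shown above; sdr_test_errors is a hypothetical name, and it exits with status 2 if neither message appears. The detailed messages remain in /var/adm/SPlogs/SDR_test.log.

```shell
# sdr_test_errors
# Reads SDR_test output on standard input. Prints 0 for "Verification
# succeeded", the error count N for "Verification failed with N errors",
# and exits with status 2 if neither message is found.
sdr_test_errors() {
    awk '
        /Verification succeeded/ { print 0; found = 1; exit }
        /Verification failed with/ {
            for (i = 1; i < NF; i++)
                if ($i == "with") { print $(i + 1); found = 1; exit }
        }
        END { if (!found) exit 2 }'
}
```

For example: /usr/lpp/ssp/bin/SDR_test 2>&1 | sdr_test_errors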
With National Language Support, two new fields have been added to the SP class in the SDR. These fields control whether MBCS (Multi-Byte Character Set) data may be written to the SDR. These fields are: SDR_ASCII_only and admin_locale.
The SDRScan routine looks through the SDR classes and files and returns a return code of 1 if any classes or files in the SDR contain MBCS data. Unless the quiet option is chosen, the command displays the file names, and the class and attribute names, where the MBCS data is found. MBCS data is only a problem if SDR_ASCII_only is true, or if the data is in a language other than the admin_locale language.
SDRScan can be run only by root on the control workstation. Issue this command:
SDRScan
You will receive messages for any non-ASCII data in the SDR. If there is no non-ASCII data, no messages are issued, and the return code will be zero. If you are running ksh, issue the command:
echo $?
to see the return code.
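In a script, you can act on the return code directly instead of echoing it. This sketch treats 0 as no non-ASCII data and 1 as MBCS data found, as described above; reporting any other value as unexpected is an assumption, not documented SDRScan behavior. The name check_sdrscan is hypothetical, and passing a different command in place of SDRScan is only for testing.

```shell
# check_sdrscan [COMMAND [ARG...]]
# Runs SDRScan (or the given command), reports what its return code means,
# and returns that same code.
check_sdrscan() {
    if [ $# -eq 0 ]; then
        set -- SDRScan
    fi
    rc=0
    "$@" || rc=$?
    case $rc in
        0) echo "SDRScan: no non-ASCII data found" ;;
        1) echo "SDRScan: MBCS data found in the SDR" ;;
        *) echo "SDRScan: unexpected return code $rc" ;;
    esac
    return "$rc"
}
```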