Use the following table to diagnose problems with the Job Switch Resource
Table (JSRT) Services component of PSSP. Locate the symptom and perform
the corresponding recovery actions.
Table 65. Job Switch Resource Table (JSRT) Services symptoms
Symptom | Recovery |
---|---|
Cannot load or unload a Job Switch Resource Table (JSRT) on a node. | "Action 1 - Verify JSRT Services installation", "Action 2 - Check the JSRT services log file", "Action 3 - Request more detailed log information", "Action 5 - Check the switch_node_number file", "Action 6 - Check the current status of JSRT Services for a node" |
Cannot obtain the status of a JSRT window. | "Action 1 - Verify JSRT Services installation" |
Cannot run the switchtbld daemon. | |
st_status command hangs. No messages are displayed. | |
Cannot load program st_status. | |
st_status returns ST_SYSTEM_ERROR or ST_LOADED_BYOTHER for a window. | |
st_status returns ST_RESERVED status for a window. | |
The st_status command runs in DCE and compatibility mode and issues the message: swtbl_status: 2511-508 Time out while waiting for a response from host hostname. | "Action 13 - Ensure that the client and master DCE servers are running." |
API returned ST_NOT_AUTHEN, ST_NOT_AUTHOR or ST_SECURITY_ERROR. | "Action 2 - Check the JSRT services log file", "Action 3 - Request more detailed log information", "Action 12 - Ensure that the caller is a member of the appropriate DCE security group" |
When issued on the control workstation, the st_verify script checks the installation of the JSRT Services on every node that is defined in the current system partition. When issued on a single node, it verifies the installation of the JSRT Services on only that node.
The user must be logged in as root in order to perform this action. Run installation verification tests using SMIT, or from the command line, to ensure that installation is complete.
Using SMIT:
TYPE smit SP_verify (the Installation/Configuration Menu appears)
SELECT Job Switch Resource Table Services Installation
PRESS Enter
Using the command line, enter: /usr/lpp/ssp/bin/st_verify
The st_verify script checks that the correct files and directories were installed and that the necessary entries exist in the files. The files and directories are /etc/services, /etc/inittab, and /etc/inetd.conf.
The st_verify script produces an output log, located in /var/adm/SPlogs/st/st_verify.log (by default) or in a location that you specify. After completion, a message is written to stdout stating whether the verification passed or failed. If a failure occurred, examine the log for a list of the errors that were found.
Good results are indicated when a message similar to the following is written to stdout:
Verifying installation of the Job Switch Resource Table Services on node 0.
JSRT Services installation verification SUCCESSFUL on node 0.
Check /var/adm/SPlogs/st/st_verify.log file for further details.
Error results are indicated when a message similar to the following is written to stdout:
Verifying installation of the Job Switch Resource Table Services on node 6.
JSRT Services installation verification FAILED on node 6.
Total of 1 errors found on node 6.
If the test fails, check the /var/adm/SPlogs/st/st_verify.log file (or specified log) for further details about which files or directories are missing. Reinstall the ssp.st file set if necessary.
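When st_verify is run from a script, the pass or fail message on stdout can be checked mechanically. The following is a minimal sketch, not part of PSSP: the function name st_verify_result is hypothetical, and it assumes that the messages keep the wording shown in the examples above.

```shell
# Hypothetical helper (not shipped with PSSP): classify st_verify stdout.
# Reads the text on stdin and prints PASSED, FAILED, or UNKNOWN.
# Assumes the message wording shown in this section's examples.
st_verify_result() {
    awk '
        /installation verification FAILED/     { failed = 1 }
        /installation verification SUCCESSFUL/ { passed = 1 }
        END {
            # A single failing node makes the whole run count as failed.
            if (failed)      print "FAILED"
            else if (passed) print "PASSED"
            else             print "UNKNOWN"
        }'
}

# Example use, logged in as root:
#   /usr/lpp/ssp/bin/st_verify | st_verify_result
```

Because a run from the control workstation reports on every node in the partition, the sketch treats any FAILED message as a failure of the whole run. The log file remains the place to find out which files or directories are missing.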
The JSRT Services maintain a single log file, st_log, in the /var/adm/SPlogs/st directory. The log exists on every node where the services are used. For example, if the swtbl_load_job API is used, entries are found on the local node where the API was invoked, and entries are also found on the nodes that were being loaded by the swtbl_load_job API.
Examine the logs and correct any obvious problems that have been identified.
The following table indicates return codes that may appear in the log. They are defined in /usr/lpp/ssp/include/st_client.h.
If a return code of 2 (ST_NOT_AUTHOR), 3 (ST_NOT_AUTHEN), or 20
(ST_SECURITY_ERROR) appears in the log, see Diagnosing SP Security Services problems and Diagnosing Per Node Key Management (PNKM) problems.
Table 66. JSRT Services return codes
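When scanning st_log by script, it can help to translate the three security-related return codes named above into their symbolic names. This is a minimal sketch: the function name st_security_rc_name is hypothetical, and it covers only the codes that this section states explicitly. For every other code, consult st_client.h.

```shell
# Hypothetical helper: name the three security-related return codes
# mentioned in this section. Other codes are deliberately not guessed.
st_security_rc_name() {
    case "$1" in
        2)  echo "ST_NOT_AUTHOR" ;;
        3)  echo "ST_NOT_AUTHEN" ;;
        20) echo "ST_SECURITY_ERROR" ;;
        *)  echo "see /usr/lpp/ssp/include/st_client.h" ;;
    esac
}
```

For example, st_security_rc_name 20 prints ST_SECURITY_ERROR, which points to the SP Security Services and PNKM diagnostic procedures.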
To facilitate debugging, you can set several environment variables before invoking a JSRT Service. The variables provide more detailed information in the st_log file.
The first environment variable is SWTBLAPIERRORMSGS, and it must be set to yes within the caller's environment.
For example, as a ksh user, enter:
export SWTBLAPIERRORMSGS=yes
This is an example of the more detailed log information for a call to swtbl_load_table:
Thu Jun 18 10:12:22 1998: swtbl_load_table: INPUT PARAMETERS: uid - 0 pid - 19118 job_key - 1 requestor_node - k10n11.ppd.pok.ibm.com num_tasks - 1 job_desb - load client test
Thu Jun 18 10:12:22 1998: swtbl_load_table: INPUT PARAMETERS virtual task id=0 switch_node_number=10 window_id=1
The second environment variable is SWTBLSECMSGLEVEL. It can be set
to an integer value that represents a level of detail. See Table 67. If a value greater than 4 is used, the highest level
of tracing (4) is assumed.
Table 67. SWTBLSECMSGLEVEL environment variable values
Value | Level of detail |
---|---|
0 | No tracing is performed. |
1 | Trace error conditions. |
2 | Trace significant flow or data. |
3 | Trace calls and returns from function. |
4 | Trace flows and data - maximum detail. |
For example, as a ksh user, enter:
export SWTBLSECMSGLEVEL=4
This is an example of the more detailed log information for a call to swtbl_load_job:
Wed Aug 25 09:43:25 1999: Ssobj: spsec_authenticate_client failed - cl=c187n12.ppd.pok.ibm.com, gr=switchtbld-status, inf=2502-611 An argument is missing or not valid.
Wed Aug 25 09:43:25 1999: Ssobj::addError(eC=7, s=3, r=0, e=0, Error 0, ex=0, er=0, sH = c187n12.ppd.pok.ibm.com
Wed Aug 25 09:43:25 1999: Ssobj::addError(g=switchtbld-status, sE=1)
Wed Aug 25 09:43:25 1999: Ssobj::addError(sA=2502-611 An argument is missing or not valid.)
Wed Aug 25 09:43:25 1999: Ssobj::addError() - Return 7, p=201322e8
Wed Aug 25 09:43:25 1999: Ssobj::returnErrors(s=3, host=c187n12.ppd.pok.ibm.com)
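Both variables can be set together before the failing JSRT Service is invoked again. The following is a minimal sketch for a ksh or POSIX shell user. The helper effective_secmsg_level is hypothetical and only mirrors the rule stated above, that values greater than 4 are treated as 4. It does not change how JSRT Services interpret the variable.

```shell
# Hypothetical helper: report the trace level that would be assumed for a
# requested SWTBLSECMSGLEVEL value (values above 4 are treated as 4).
effective_secmsg_level() {
    if [ "$1" -gt 4 ]; then
        echo 4
    else
        echo "$1"
    fi
}

# Turn on both kinds of extra logging for the commands that follow.
export SWTBLAPIERRORMSGS=yes
export SWTBLSECMSGLEVEL=$(effective_secmsg_level 7)
```

After the problem has been captured in st_log, unset both variables or set SWTBLSECMSGLEVEL back to 0 so that tracing is no longer performed.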
The JSRT Services maintain a set of data files that are located in the /spdata/sys1/st directory on every node. Verify that the directory exists and that the files have root access. Note that the files are not created until the load or unload services have been invoked. These files are also removed when a node is rebooted.
The /spdata/sys1/st/switch_node_number file contains a single integer that represents the switch node number of the node. This file can be read using an AIX editor, or by issuing the cat command. The integer in the file should match the switch_node_number attribute in the SDR Node class for that node. Run the /usr/lpp/ssp/bin/st_set_switch_number installation script on every node to create the switch_node_number file and set the correct value.
The following messages in the st_log indicate that there is a problem with the switch node number:
/usr/lpp/ssp/bin/st_set_switch_number
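The comparison between the switch_node_number file and the SDR value can be sketched as follows. The function name check_switch_node_number is hypothetical. The expected value is passed in by the caller, who is assumed to have looked up the switch_node_number attribute in the SDR Node class separately.

```shell
# Hypothetical helper: compare the integer stored in a switch_node_number
# file with the value expected from the SDR Node class.
#   $1 = expected switch node number (looked up in the SDR by the caller)
#   $2 = file to check (defaults to the path named in this section)
check_switch_node_number() {
    expected=$1
    file=${2:-/spdata/sys1/st/switch_node_number}
    if [ ! -r "$file" ]; then
        echo "MISSING"
        return 1
    fi
    # The file holds a single integer; strip surrounding whitespace.
    actual=$(tr -d ' \t\n' < "$file")
    if [ "$actual" = "$expected" ]; then
        echo "OK"
    else
        echo "MISMATCH file=$actual sdr=$expected"
        return 1
    fi
}
```

A result of MISSING or MISMATCH suggests running /usr/lpp/ssp/bin/st_set_switch_number on that node. Keep in mind that the file may legitimately be absent if no load or unload service has been invoked since the last reboot.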
The st_status command shows you the current status of the JSRT windows on all nodes or on the node specified. This tells you whether the JSRT windows are loaded, unloaded, reserved by another subsystem, or in error.
To show the status of JSRT windows on all nodes within the current system partition, issue st_status.
To show the status of all JSRT windows on node k10n15, issue: st_status k10n15. Output similar to the following appears:
*************************************************************
Status from node: k10n15 User: root
Load request from: k10n15 Pid: 12494 Uid: 0
Job Description: No_job_description_given
Time of request:Wed_Jan_24_13:38:21_EDT_1998
Adapter: /dev/css0
Memory Allocated: 10000
Window id: 0
*************************************************************
Node k10n15 adapter /dev/css0 window 1 returned ST_RESERVED.
Window 1 is RESERVED by VSD.
*************************************************************
Node k10n15 adapter /dev/css0 window 2 returned ST_SWITCH_NOT_LOADED
*************************************************************
Node k10n15 adapter /dev/css0 window 3 returned ST_SWITCH_NOT_LOADED
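On a node with many windows, the output can be condensed to one line per window that reported a return code. This is a minimal sketch, not part of PSSP: the function name summarize_st_status is hypothetical, and it assumes that st_status prints each "window N returned ST_..." message on its own line.

```shell
# Hypothetical helper: condense st_status output read from stdin into
# "node adapter window status" lines, one per window that reported a
# return code. Loaded windows, which are shown as a detail block rather
# than a "returned" message, are not listed.
summarize_st_status() {
    awk '/ window [0-9]+ returned ST_/ {
        status = $8
        sub(/\.$/, "", status)   # drop the trailing period, if any
        print $2, $4, $6, status
    }'
}

# Example use:
#   st_status k10n15 | summarize_st_status
```

For the sample output shown for node k10n15, this would list window 1 as ST_RESERVED and windows 2 and 3 as ST_SWITCH_NOT_LOADED.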
If the st_status command hangs, it is trying to contact a node which has lost its TCP/IP communication path. If the command is issued without any arguments, it will try to contact every node in the current SP system partition. If one of the nodes is down on the Ethernet, the command will not return until the TCP/IP timeout value has been reached.
Issue the SDRGetObjects command to determine the nodes in the current SP system partition. For example:
SDRGetObjects Node reliable_hostname
Issue the AIX ping command using the names returned under reliable_hostname from the SDRGetObjects command to determine connectivity.
Now, reissue the st_status command with a specific list of node names.
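The ping step can be sketched as a small filter that keeps only the host names that answer. The function names probe_with_ping and filter_reachable are hypothetical. The sketch assumes that ping accepts -c 1 to send a single request, and that the first line of SDRGetObjects output is the reliable_hostname column header.

```shell
# Probe one host with a single ICMP echo; returns 0 when the host answers.
probe_with_ping() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# Read host names on stdin and print those for which the probe succeeds.
# $1 (optional) names the probe function; the default is probe_with_ping.
filter_reachable() {
    probe=${1:-probe_with_ping}
    while read -r host; do
        [ -n "$host" ] || continue
        # Keep the probe from consuming the remaining host names on stdin.
        if "$probe" "$host" </dev/null; then
            echo "$host"
        fi
    done
}

# Example use (the header line is removed before probing):
#   SDRGetObjects Node reliable_hostname | grep -v reliable_hostname |
#       filter_reachable
```

The names printed by the filter are the ones to pass to st_status. Nodes that do not appear in the list are the likely cause of the hang and should be investigated separately.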
The st_status command is compiled with a LIBPATH of /usr/lpp/ssp/lib:/usr/lib:/lib. If your LIBPATH conflicts with this, then issue the command as follows:
LIBPATH=/usr/lpp/ssp/lib st_status
The st_clean_table command forces the unload of the Job Switch Resource Table for a specified window on the specified node. See the st_clean_table command man page for more details.
The chgcss command can be used to view and manipulate reserved user space windows. See the chgcss command man page for more details.
The swtbl_adapter_resources API returns the configured resources of the specified adapter on the node from which it is invoked. See the swtbl_adapter_resources API man page for more details.
These are the DCE security groups defined for the JSRT Services:
Table 68. DCE security groups for JSRT Services
If the client DCE servers are running and the master DCE server is not, a timeout can occur even if DCE and compatibility mode is specified. Either stop the client DCE servers by issuing the stopsrc command, or start the master DCE server by issuing the startsrc command.