Administration Guide

Topology Services procedures

Normally, the Topology Services subsystem runs without requiring administrator intervention. Occasionally, you might need to check the status of the subsystem.

Displaying the status of the Topology Services daemon

You can display the operational status of the Topology Services daemon by issuing the lssrc command. Topology Services monitors the SP Ethernet and the SP switch networks; however, only the SP Ethernet is monitored from the control workstation. To see the status of both networks, run the command on a node that is up on the switch.

On the control workstation, enter:

```
lssrc -ls hats.syspar_name
```

where syspar_name is the name of the system partition.

On a node, enter:

```
lssrc -ls hats
```
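If you collect this status from a script, the only difference between the two invocations is the subsystem name. The following Python sketch builds the command for either case and runs it; the helper names hats_status_command and hats_status are hypothetical, and the second helper assumes lssrc is on the PATH of the machine where it runs.

```python
import subprocess
from typing import List, Optional


def hats_status_command(syspar_name: Optional[str] = None) -> List[str]:
    """Return the lssrc argument vector for the Topology Services daemon.

    On the control workstation the subsystem name carries the system
    partition suffix (hats.<syspar_name>); on a node it is plain "hats".
    """
    subsystem = f"hats.{syspar_name}" if syspar_name else "hats"
    return ["lssrc", "-ls", subsystem]


def hats_status(syspar_name: Optional[str] = None) -> str:
    """Run lssrc and return its standard output (requires lssrc on PATH)."""
    result = subprocess.run(
        hats_status_command(syspar_name),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if lssrc fails
    )
    return result.stdout
```

For example, hats_status_command("c47s") produces the control workstation form for the c47s partition, and hats_status_command() produces the node form.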

In response, the lssrc command writes the status information to the standard output. The information includes:

The information provided by the lssrc -s hats command (short form).
Six lines for each network on which this node has an adapter, including the following information:
- The network name.
- The network index.
- The number of defined members: the number of adapters that the configuration reports as existing on this network.
- The number of members: the number of adapters currently in the membership group.
- The state of the membership group, denoted by S (Stable), U (Unstable), or D (Disabled).
- Adapter ID: the address and instance number of the local adapter in this membership group.
- Group ID: the address and instance number of the membership group. The address of the membership group is also the address of the Group Leader.
- The adapter interface name.
- HB Interval, which corresponds to the Frequency attribute in the SDR TS_Config class. There is a value for each network as well as a default value, and the two can differ.
- HB Sensitivity, which corresponds to the Sensitivity attribute in the SDR TS_Config class. There is a value for each network as well as a default value, and the two can differ.
- Two lines of network adapter statistics.
- The PID of the NIM (network interface module) for the network.
The number of clients connected and the client process IDs and command names.
Configuration Instance: the instance number of the machines list file (machines.lst).
The address of the Control Workstation.
Whether the daemon is working in a DCE security environment. If it is, the version number of the key used for mutual authentication is also included.
The segments pinned: NONE, a combination of TEXT and DATA, or PROC.
The sizes of the text, static data, and dynamic data segments of the process, and the number of outstanding memory allocations that have no corresponding free operation.
Whether the daemon is processing a refresh request.
Daemon process CPU time, both in user and kernel modes.
The number of page faults and the number of times the process has been swapped out.
The number of nodes that are seen as reachable (up) from the local node and the number of nodes that are seen as not reachable (down).
A list of nodes that are either up or down, whichever list is smaller. The list of nodes that are down includes only the nodes that are configured and have at least one adapter which Topology Services monitors. Nodes are specified in the list using the format:

N1-N2(I1) N3-N4(I2)...

where N1 is the initial node in a range, N2 is the final node in the range, and I1 is the increment. For example, 5-9(2) specifies nodes 5, 7, and 9. If the increment is 1, it is omitted. If a range contains only one node, only that node number is specified.
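The range notation can be expanded into individual node numbers, or produced from a list of node numbers, with a short Python sketch like the following. The function names are hypothetical, and compress_nodes uses a simple greedy rule (a range is emitted only for three or more evenly spaced nodes), so its output is a valid encoding of the same nodes but not necessarily the exact string the daemon prints.

```python
import re
from typing import Iterable, List

# One token: N, N1-N2, or N1-N2(I)
_TOKEN = re.compile(r"^(\d+)(?:-(\d+)(?:\((\d+)\))?)?$")


def expand_nodes(spec: str) -> List[int]:
    """Expand a list such as '1-3 5-9(2) 13' into [1, 2, 3, 5, 7, 9, 13]."""
    nodes: List[int] = []
    for token in spec.split():
        match = _TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognized node token: {token!r}")
        first = int(match.group(1))
        last = int(match.group(2)) if match.group(2) else first
        step = int(match.group(3)) if match.group(3) else 1  # omitted increment means 1
        if step <= 0 or last < first:
            raise ValueError(f"invalid node range: {token!r}")
        nodes.extend(range(first, last + 1, step))
    return nodes


def compress_nodes(nodes: Iterable[int]) -> str:
    """Encode node numbers in N1-N2(I) form, greedily grouping runs of 3+ nodes."""
    ns = sorted(set(nodes))
    parts: List[str] = []
    i = 0
    while i < len(ns):
        end = i
        if i + 2 < len(ns):
            step = ns[i + 1] - ns[i]
            # Extend the run while the spacing stays the same.
            while end + 1 < len(ns) and ns[end + 1] - ns[end] == step:
                end += 1
        if end - i + 1 >= 3:
            suffix = "" if step == 1 else f"({step})"
            parts.append(f"{ns[i]}-{ns[end]}{suffix}")
            i = end + 1
        else:
            parts.append(str(ns[i]))
            i += 1
    return " ".join(parts)
```

For example, expand_nodes("5-9(2)") yields [5, 7, 9], and compress_nodes([1, 2, 3, 5, 7, 9]) yields "1-3 5-9(2)".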

The following is an example of the output from the lssrc -ls hats command on a node for the c47s system partition:

```
Subsystem         Group            PID     Status
 hats             hats             16366   active
Network Name   Indx  Defd    Mbrs  St   Adapter ID      Group ID
SPether        [ 0]   17      16    S   9.114.61.199    9.114.61.200
SPether        [ 0] en0                 0x32ef11dd      0x3334530a
HB Interval = 1 secs.  Sensitivity = 4 missed beats
Packets sent    : UDP 4900376 ICMP 2 Errors: 0 No mbuf: 0
Packets received: UDP 7742835 ICMP 247 Dropped: 0
NIM's PID: 20454
SPswitch       [ 1]    16     13    S   9.114.61.143    9.114.61.144
SPswitch       [ 1] css0                0x32ef1208      0x3338a235
HB Interval = 1 secs.  Sensitivity = 4 missed beats
Packets sent    : UDP 4900237 ICMP 2 Errors: 0 No mbuf: 0
Packets received: UDP 8526975 ICMP 256 Dropped: 0
NIM's PID: 12332
  2 locally connected Clients with PIDs:
haemd( 13202) hagsd( 14978)
  Configuration Instance = 986880803
  Default: HB Interval = 1 secs. Sensitivity = 4 missed beats
  Control Workstation IP address = 9.114.61.125
  Daemon employs no security
  Segments pinned: Text Data.
  Text segment size: 706 KB. Static data segment size: 317 KB.
  Dynamic data segment size: 2123. Number of outstanding malloc: 354
  User time 2449 secs. System time 2973 secs.
  Number of page faults: 140. Process swapped out 0 times.
  Number of nodes up: 16. Number of nodes down: 1.
  Nodes down: 13
```

The output of the lssrc -l command is displayed in the language locale installed on the SP node where the hats daemon is running, which in this case is US English. Because different nodes in one SP system can have different language locales, the output of the lssrc -l command might not be legible to you. In such a heterogeneous SP system environment, take care to run such commands only on nodes that use a language locale you understand.

The two networks being monitored in the last example are named SPether and SPswitch. The SPether network has 17 adapters defined, and 16 of them are members of the group. The group is in the stable state. The addresses of the local adapter and of the Group Leader are displayed, as is the interface name of the local adapter. In the SPswitch network, 16 adapters are defined and 13 of them are members of the group. Both networks have the same tunables, which happen to have the default values. The daemon on the node has two clients: the Group Services daemon (hagsd) and the Event Management daemon (haemd). Of the 17 nodes in this configuration, including the control workstation, only node 13 is detected as being down.
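If you need to check group membership from a script, the first line of each network's block can be parsed as in the following Python sketch. It assumes the column layout shown in the sample output (which might differ between releases or locales) and uses hypothetical names (NetworkStatus, parse_networks). It extracts the defined and member counts so that you can see how many adapters are missing from each group.

```python
import re
from dataclasses import dataclass
from typing import List

# First line of each network block, for example:
# SPether        [ 0]   17      16    S   9.114.61.199    9.114.61.200
_NETWORK_LINE = re.compile(
    r"^(?P<name>\S+)\s+\[\s*(?P<index>\d+)\]\s+"
    r"(?P<defined>\d+)\s+(?P<members>\d+)\s+(?P<state>[SUD])\s+"
    r"(?P<adapter>\S+)\s+(?P<group>\S+)$"
)


@dataclass
class NetworkStatus:
    name: str
    index: int
    defined: int          # adapters the configuration reports for the network
    members: int          # adapters currently in the membership group
    state: str            # S (Stable), U (Unstable), or D (Disabled)
    adapter_address: str  # address of the local adapter
    group_address: str    # address of the group, which is also the Group Leader

    @property
    def missing(self) -> int:
        """Defined adapters that are not currently group members."""
        return self.defined - self.members


def parse_networks(output: str) -> List[NetworkStatus]:
    """Extract one NetworkStatus per network from lssrc -ls hats output."""
    networks = []
    for line in output.splitlines():
        match = _NETWORK_LINE.match(line.strip())
        if match is None:
            continue  # header, interface-name, statistics, and other lines
        networks.append(
            NetworkStatus(
                name=match["name"],
                index=int(match["index"]),
                defined=int(match["defined"]),
                members=int(match["members"]),
                state=match["state"],
                adapter_address=match["adapter"],
                group_address=match["group"],
            )
        )
    return networks
```

Applied to the sample output, this sketch reports SPether with 1 missing adapter and SPswitch with 3, matching the description above.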

The chapter on Topology Services in the book PSSP: Diagnosis Guide explains how to use the lssrc command to help diagnose problems related to the subsystem. It also explains the error log entries that the subsystem might generate.

Using Topology Services with DCE security

In a message-based distributed system such as Topology Services, security is achieved by:

Authenticating the identity of message senders.
Identifying and discarding replayed messages.

Before PSSP 3.2, Topology Services did not use any authentication or message-replay identification mechanism. In PSSP 3.2, you can set a security policy for an entire system partition, and the Topology Services daemon complies with that policy. If the security policy requires it, the Topology Services daemon uses the DCE security services for daemon-to-daemon mutual authentication and also enables message-replay identification. In this case, the system clocks on all the nodes in a system partition must be synchronized to within 120 seconds of each other to prevent false message-replay identification. In the context of security, Topology Services is an SP trusted service.
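The 120-second requirement can be checked with a sketch like the following. It assumes you have already gathered clock readings (seconds since the epoch) from each node at approximately the same moment; how you collect them is outside the scope of this sketch, and the function names are hypothetical. Requiring the spread between the earliest and latest readings to be at most 120 seconds guarantees that every pair of nodes is within the limit.

```python
from typing import Mapping

# Maximum clock difference tolerated when DCE message-replay
# identification is enabled (from the text above).
MAX_CLOCK_SKEW_SECONDS = 120.0


def clock_spread(readings: Mapping[str, float]) -> float:
    """Seconds between the earliest and latest node clock readings."""
    if not readings:
        return 0.0
    values = readings.values()
    return max(values) - min(values)


def clocks_synchronized(readings: Mapping[str, float],
                        limit: float = MAX_CLOCK_SKEW_SECONDS) -> bool:
    """True if every pair of readings differs by at most `limit` seconds."""
    return clock_spread(readings) <= limit
```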

In PSSP 3.2, the system administrator can enable any combination of the following authentication methods for use by all SP trusted services in a partition:

DCE, where authentication is based on the DCE security services and message-replay identification is enabled.
Compatibility (compat), where authentication is performed in a manner that is compatible with a pre-PSSP 3.2 version of the subsystem.

Since Topology Services did not use any authentication method before PSSP 3.2, enabling any combination of authentication methods that includes compat is equivalent to performing no authentication in Topology Services. For example, if the authentication methods are compat and DCE, the Topology Services daemon does not perform authentication. Topology Services enforces mutual authentication only when DCE is the sole authentication method enabled. Because DCE authentication by SP trusted services is available only in PSSP 3.2, Topology Services enforces mutual authentication only when all nodes in the SP system partition are running PSSP 3.2.
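The rule in the preceding paragraph reduces to a single set comparison. In this Python sketch the method names dce and compat are taken from the text, while the function name is hypothetical; the sketch does not parse lsauthpts output, whose exact format is not shown here, and expects the enabled method names to be supplied by the caller.

```python
from typing import Iterable


def hats_enforces_mutual_auth(enabled_methods: Iterable[str]) -> bool:
    """Return True only when DCE is the sole enabled authentication method.

    Any combination that includes compat (including compat alone), or an
    empty set of methods, means Topology Services performs no authentication.
    """
    methods = {method.strip().lower() for method in enabled_methods}
    return methods == {"dce"}
```

For example, hats_enforces_mutual_auth(["compat", "dce"]) returns False, which matches the compat-plus-DCE case described above.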

You can use the /bin/lsauthpts command to determine which authentication methods are currently enabled for SP trusted services in the SP system partition.

When DCE is the only authentication method enabled, the Topology Services subsystem assumes the identities of different service principals at different stages for authentication purposes:

ssp/spbgroot is used to authenticate with the SDR daemon so that Topology Services scripts can update information in the SDR during configuration, initialization, and refresh.
rsct/hats is used for daemon-to-daemon mutual authentication during normal operation.

Those service principals and associated key files are configured and initialized during system installation. This is done, for example, by using the smit spauth_config command. (See the book PSSP: Installation and Migration Guide for more information.) The key file associated with the rsct/hats principal is important for the authentication process. Its path name is /spdata/sys1/keyfiles/rsct/syspar_name/hats, where syspar_name is the name of the SP system partition.
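The principal used at each stage and the location of the rsct/hats key file can be captured in a small Python sketch. The principal names and the path pattern come from the text above; the stage labels and the function name are hypothetical.

```python
from pathlib import PurePosixPath

# Principals used by Topology Services when DCE is the only method enabled.
# The dictionary keys are hypothetical labels for the two stages.
PRINCIPAL_BY_STAGE = {
    "sdr_update": "ssp/spbgroot",     # configuration, initialization, refresh
    "daemon_to_daemon": "rsct/hats",  # mutual authentication in normal operation
}

KEYFILE_ROOT = PurePosixPath("/spdata/sys1/keyfiles/rsct")


def hats_keyfile_path(syspar_name: str) -> PurePosixPath:
    """Key file for the rsct/hats principal in the given system partition."""
    return KEYFILE_ROOT / syspar_name / "hats"
```

For the c47s partition used in the earlier example, hats_keyfile_path("c47s") gives /spdata/sys1/keyfiles/rsct/c47s/hats.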

Determining the authentication method enabled in a partition

Security has an important effect on the behavior of a Topology Services daemon: if the Topology Services subsystem on the control workstation cannot determine the authentication method to be used, then all Topology Services daemons in that SP system partition will cease their operations and terminate. If this happens, messages in the AIX error log explain the failure and possible solutions.

When the Topology Services startup script on the control workstation determines that it is being initialized for the first time after installation, it uses the /bin/lsauthpts command to determine which authentication methods are enabled in the SP system partition. If DCE is the only authentication method enabled, the script places special flags in the machines.lst file to indicate that mutual authentication is to be performed. The startup script takes the same action during a Topology Services refresh, placing flags in the machines.lst file that indicate the authentication method.

Whenever daemons come up on the control workstation and on the nodes, they start mutual authentication only if the flags in the machines.lst file indicate that they should. Otherwise, no authentication is done. See Initializing Topology Services daemon for more on the initialization process.

Changing the authentication method and key

You can change the authentication methods enabled in an SP system partition by using the /bin/chauthpts command. This command automatically invokes a refresh operation on each of the SP trusted services. For a Topology Services refresh, as noted earlier, the startup script attempts to determine the new authentication method, and the daemon complies accordingly. Also as discussed earlier, any combination of authentication methods other than DCE alone is equivalent to no authentication.

In rare circumstances, primarily when you suspect that the key currently used for Topology Services mutual authentication has been compromised, you might need to change the key. Changing the key involves some manual steps. Perform these steps with extreme caution: any mishap can adversely affect the normal operation of the entire SP system partition.

To add a key to the key file, do the following:

1. Run the following command on the control workstation:
```
dcecp -c keytab add rsct/syspar_name/hats \
-member /.../cell_name/rsct/syspar_name/hats \
-registry -random
```
where:
- syspar_name is the name of the SP system partition.
- cell_name is the name of the DCE cell.
2. Distribute the key file to all nodes in the SP system partition. Here are some ways to do that:
- You can use the dsh facility to run the command /usr/lpp/ssp/bin/spauthconfig on all the nodes in the SP system partition.
- You can use the pcp command to copy the key file /spdata/sys1/keyfiles/rsct/syspar_name/hats to all the nodes in the SP system partition.
- If the circumstances are appropriate, you can reboot all the nodes in the SP system partition.
3. Confirm that all the nodes in the SP system partition have received the new key. The key files must be identical across the entire partition for the Topology Services daemons to be able to communicate.
4. Refresh Topology Services by running the following command on the control workstation:
```
/usr/sbin/rsct/bin/hatsctrl -r
```
The refresh is propagated automatically to all reachable nodes in the partition. Upon receiving the refresh, a Topology Services daemon retrieves the new key and starts the transition from the old key to the new key. The transition completes within minutes, after which all Topology Services daemons authenticate with the new key.
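The confirmation step above, which requires identical key files on every node, can be sketched in Python by comparing SHA-256 digests rather than the key material itself. This assumes each node reports a SHA-256 digest of its copy of /spdata/sys1/keyfiles/rsct/syspar_name/hats (gathered, for example, by using dsh to run a checksum tool on each node); the collection method and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import List, Mapping


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_nodes(reference_digest: str,
                     node_digests: Mapping[str, str]) -> List[str]:
    """Return the nodes whose reported key file digest differs from the reference."""
    return sorted(node for node, value in node_digests.items()
                  if value != reference_digest)
```

A typical use computes the reference with file_digest on the control workstation's key file and refreshes Topology Services only when mismatched_nodes returns an empty list.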
