IBM Books

Administration Guide


Configuring and operating Topology Services

The following sections describe how the components of the Topology Services subsystem work together to provide topology services. Included are discussions of the following Topology Services tasks:

Configuring Topology Services

The RSCT software should have been installed as part of the installation of the PSSP software with AIX 4.3.3 or as part of AIX 5L 5.1. The Topology Services subsystem is contained in the ...basic.rte and ...basic.sp file sets.

The syspar_ctrl command configures all of the system partition-sensitive subsystems. The person who installs PSSP issues the syspar_ctrl command during installation of the control workstation. The syspar_ctrl command is executed automatically on the nodes when the nodes are installed. The syspar_ctrl command is also executed automatically when system partitions are created or destroyed. For more information on using this command, see the books PSSP: Installation and Migration Guide and PSSP: Command and Technical Reference.

The hatsctrl command provides a number of functions for controlling the operation of the Topology Services subsystem. You can use it to add, start, stop, delete, clean, trace, and refresh the subsystem, as described in the sections that follow.

Except for the clean function, hatsctrl affects the Topology Services subsystem in the current system partition, that is, the system partition that is specified by the SP_NAME environment variable.

The SDR stores node and adapter information, as well as some tunable data. The class that holds this data is called the TS_Config class. This class holds information that controls some of the operation of Topology Services. The following is a list of the attributes in this class and a brief description of each:

Frequency
Controls how often Topology Services sends a heartbeat to its neighbors. The value is interpreted as the number of seconds between heartbeats. The minimum and default value is 1. On a system with a high amount of paging activity, keep this number as high as is practical, so that delays in sending heartbeats are less likely to be treated as failures.

Sensitivity
Controls the number of missed heartbeat messages that will cause a Death in Family message to be sent to the Group Leader. A heartbeat is not considered missing until twice the interval indicated by the Frequency attribute has elapsed. The default sensitivity value is 4.

Run_FixedPri
Run the daemon with a fixed priority. Because Topology Services is a real-time application, it must avoid scheduling conflicts. A value of 1 indicates that the daemon runs with a fixed priority; 0 indicates that it does not.

FixedPriValue
This is the actual fixed priority level that is used. The daemon will accept values greater than or equal to 10. The default is 38.

Log_Length
This is the approximate number of lines that a log file can hold before it wraps. The default is 5,000.

Pinning
This controls the memory pinning strategy. Text causes the daemon to attempt to pin its text pages, Data attempts to pin its data pages, Process attempts to pin all pages, and None causes no pages to be pinned. The default is Text.

On RS/6000 systems with heavy or unusual load characteristics, it might be necessary to adjust the Frequency and Sensitivity settings. See Operating Topology Services daemon for more information.
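As a rough guide, the time needed to detect a failed peer follows from these two attributes. The sketch below is an assumption derived from the attribute descriptions above (a grace period of twice the Frequency interval, then Sensitivity missed heartbeats), not an exact RSCT formula:

```shell
# Back-of-the-envelope failure-detection estimate (an assumption based
# on the Frequency and Sensitivity descriptions, not an exact formula):
# a heartbeat is not counted missing until twice the Frequency interval,
# and Sensitivity missed heartbeats trigger a Death in Family message.
FREQUENCY=1      # default: seconds between heartbeats
SENSITIVITY=4    # default: missed heartbeats tolerated

DETECT=$((2 * FREQUENCY + SENSITIVITY * FREQUENCY))
echo "approximate detection time: ${DETECT} seconds"
```

Raising either attribute lengthens this window, trading slower failure detection for fewer false indications under load.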

Adding the subsystem

If the hatsctrl command is running on the control workstation, the first step in the add function is to select a Topology Services daemon communications port number and save it in the Syspar_ports SDR class. This port number is then placed in the /etc/services file. If the hatsctrl command is running on a node, the port number is fetched from the SDR and placed in the /etc/services file. Port numbers are selected from the range 10000 through 10100.
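The effect of this step can be sketched with a made-up entry. The service name hats.sp_prod and port 10000 below are illustrative assumptions; the real values come from the Syspar_ports SDR class:

```shell
# Sketch of the /etc/services entry created by the add function.
# The service name and port are made-up examples for illustration.
SERVICES=$(mktemp)
echo "hats.sp_prod    10000/udp    # Topology Services daemon" > "$SERVICES"

# Verify the chosen port falls in the documented 10000-10100 range.
PORT=$(awk '$1 == "hats.sp_prod" {split($2, a, "/"); print a[1]}' "$SERVICES")
if [ "$PORT" -ge 10000 ] && [ "$PORT" -le 10100 ]; then
    echo "port ${PORT} is in the allowed range"
fi
rm -f "$SERVICES"
```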

The second step is to add the Topology Services daemon to the System Resource Controller (SRC) using the mkssys command. On the control workstation, the IP address of the system partition is an argument to the hats command in the SRC subsystem specification.

The third step is to add an entry in the /etc/inittab file so that the Topology Services daemon will be started during boot. However, if hatsctrl is running on a High Availability Control Workstation (HACWS), an entry is not made in the /etc/inittab file. Instead, HACWS manages the starting and stopping of the Topology Services daemon.
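An /etc/inittab entry of roughly the following shape is what this step produces. The identifier and command shown here are illustrative assumptions, since the exact entry varies by release; inittab fields are identifier:runlevel:action:command:

```shell
# Hypothetical /etc/inittab entry for hats (illustrative only; the
# actual command path and redirection depend on the PSSP release).
ENTRY='hats:2:once:/usr/bin/startsrc -s hats'

ID=$(echo "$ENTRY" | cut -d: -f1)
ACTION=$(echo "$ENTRY" | cut -d: -f3)
echo "entry ${ID} uses action ${ACTION}, so hats is started once per boot"
```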

Note that if the hatsctrl add function terminates with an error, you can rerun the command after fixing the problem. The command takes into account any steps that already completed successfully.

Starting and stopping the subsystem

The start and stop functions of the hatsctrl command run the startsrc and stopsrc commands, respectively. However, hatsctrl automatically specifies the subsystem argument to these SRC commands.
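For illustration, the subsystem name that hatsctrl supplies is, by assumption here, qualified with the partition name on the control workstation, while on a node it is simply hats. The partition name sp_prod is made up:

```shell
# Build the SRC subsystem name that hatsctrl would pass to
# startsrc/stopsrc on the control workstation (assumed naming;
# hatsctrl supplies the correct name for you).
SYSPAR=sp_prod
SUBSYS="hats.${SYSPAR}"
echo "startsrc -s ${SUBSYS}"
echo "stopsrc  -s ${SUBSYS}"
```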

Deleting the subsystem

The delete function of the hatsctrl command removes the subsystem from the SRC, removes the entry from /etc/inittab, and removes the Topology Services daemon communications port number from /etc/services. It does not remove anything from the SDR, because the Topology Services subsystem might still be configured on other nodes in the system partition.
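The /etc/services cleanup can be sketched as follows. hatsctrl performs this itself; the file contents and service name here are made up:

```shell
# Sketch of the delete step that removes the hats port entry from
# /etc/services (illustrative only; hatsctrl does this for you).
SERVICES=$(mktemp)
printf 'hats.sp_prod 10000/udp\nftp 21/tcp\n' > "$SERVICES"

# Drop the hats entry, leaving all other services untouched.
grep -v '^hats\.' "$SERVICES" > "${SERVICES}.new"
REMAINING=$(wc -l < "${SERVICES}.new")
echo "entries remaining after delete: ${REMAINING}"
rm -f "$SERVICES" "${SERVICES}.new"
```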

Cleaning the subsystem

The clean function of the hatsctrl command performs the same actions as the delete function, except that it performs them in all system partitions.

The clean function does not remove anything from the SDR. This function is provided to support restoring the system to a known state, where the known state is the (possibly restored copy of the) SDR database.

Tracing the subsystem

The tracing function of the hatsctrl command is provided to supply additional problem determination information when it is requested by the IBM Support Center. Normally, you should not turn tracing on because it might slightly degrade Topology Services subsystem performance and can consume large amounts of disk space in the /var file system.

Initializing Topology Services daemon

Normally, the Topology Services daemon is started by an entry in the /etc/inittab file using the startsrc command. If necessary, you can start the Topology Services daemon using the hatsctrl command or the startsrc command directly. The first part of initialization is done by the startup script, hats. It starts the hatsd daemon, which completes the initialization steps.

Understanding the initialization process

During this initialization, the startup script does the following:

  1. Determine the number of the node by running the /usr/lpp/ssp/install/bin/node_number command. Node 0 is the control workstation.
  2. Start the Network Interface Modules (NIMs).
  3. Obtain the name of the system partition from the Syspar SDR class. (Remember, one instance of the Topology Services daemon runs on the control workstation for each system partition to which the Topology Services subsystem was added.)
  4. If running on the control workstation, build a condensed version of the node and adapter configuration called the Machines List file. The name of the Machines List file is machines.lst and can be found in the Topology Services current working directory /var/ha/run/hats.syspar_name, where syspar_name is the name of the system partition.

    The Machines List file also contains tunable information from the TS_Config SDR class. The startup script then compares this file with the one stored in the SDR and, if they differ, updates the SDR copy to reflect the changes.

  5. If not running on the control workstation, retrieve the machines.lst file from the SDR.
  6. Perform file maintenance in the log directory and current working directory to remove the oldest log and rename any core files that might have been generated.
  7. Start the Topology Services hatsd daemon.
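The working-directory naming from step 4 can be sketched as follows; the partition name sp_prod is a made-up example:

```shell
# Path construction for the Topology Services working directory,
# which embeds the system partition name (sp_prod is illustrative).
SYSPAR_NAME=sp_prod
WORKDIR="/var/ha/run/hats.${SYSPAR_NAME}"
MLIST="${WORKDIR}/machines.lst"
echo "Machines List file: ${MLIST}"
```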

The daemon then continues the initialization with the following steps.

  1. Read the current Machines List file and initialize internal data structures.
  2. Initialize daemon-to-daemon communication, as well as client communication.
  3. For each local adapter defined, form a membership consisting of only the local adapter.

The daemon is now in its initialized state and ready to communicate with Topology Services daemons on other nodes. The intent is to expand each singleton membership group formed during initialization to contain as many members as possible. Each adapter has an offset associated with it. Only other adapter membership groups with the same offset can join together to form a larger membership group. Eventually, as long as all the adapters in a particular network can communicate with each other, there will be a single group to which all adapters belong.

Merging all adapters into a single group

Initially, the subsystem starts out as N singleton groups, one for each node. Each daemon is the Group Leader of its own singleton group and knows, from the configuration information, which other adapters could join the group. The next step is to begin proclaiming to subordinate nodes.

The proclaim logic tries to find members as efficiently as possible. For the first three proclaim cycles, daemons proclaim only to their own subnet, and if the subnet is broadcast-capable, that message is broadcast. Given that all daemons start out as singletons, this evolves into M groups, where M is the number of subnets that span this heartbeat ring. On the fourth proclaim cycle, those M Group Leaders send proclaims to adapters that are outside their local subnet. This causes groups to merge into larger and larger groups until they have coalesced into a single group.
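As a toy illustration of this progression, with made-up addresses on two subnets:

```shell
# Toy illustration of the merge progression: four singleton groups
# first coalesce into one group per subnet (M groups), and the M
# Group Leaders later merge into a single group. Addresses are made up.
ADAPTERS="9.114.61.1 9.114.61.2 9.114.62.1 9.114.62.2"
SUBNETS=$(for A in $ADAPTERS; do echo "${A%.*}"; done | sort -u)
M=$(echo "$SUBNETS" | wc -l)
echo "4 singletons -> ${M} per-subnet groups -> 1 final group"
```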

From the time the groups are formed as singletons until they reach a stabilization point, the groups are considered unstable. The stabilization point is reached when a heartbeat ring has no group changes for an interval of 10 times the heartbeat send interval. Up to that point, proclaiming continues on a four-cycle pattern: three cycles proclaim only to the local subnet, and one cycle proclaims to adapters outside the local subnet. After the heartbeat ring has reached stability, proclaim messages go out to all adapters not currently in the group, regardless of the subnet to which they belong. Adapter groups that are unstable are not used when computing the node connectivity graph.

Operating Topology Services daemon

Normal operation of the Topology Services subsystem does not require administrative intervention. The subsystem is designed to recover from temporary failures, such as node failures or failures of individual Topology Services daemons. Topology Services also provides indications of higher level system failures. However, there are some operational characteristics of interest to system administrators, and after adding or removing nodes or adapters, you might need to refresh the subsystem.

Defaults and limitations

The maximum node number allowed is 2047. The maximum number of networks that the subsystem can monitor is 16. The default number of networks monitored is 2: the SP Ethernet and the SP switch.

Topology Services is meant to be sensitive to network response, and this sensitivity is tunable. However, other conditions can degrade the ability of Topology Services to accurately report on adapter or node membership. One such condition is the failure of AIX to schedule the daemon process in a timely manner, which can cause daemons to be significantly late in sending their heartbeats. This can happen because the interrupt rate is too high, the rate of paging activity is too high, or because of other problems. If the daemon is prevented from running for long enough, the node might be considered down by its peer daemons. The node down indication, when propagated to subsystems such as Virtual Shared Disks and General Parallel File System, causes those subsystems to perform recovery procedures, undesirable in this case, and take over the resources and roles of the node. Whenever these conditions exist, analyze the problem carefully to fully understand it.

Because Topology Services is a real-time process, do not intentionally deprive it of CPU time; doing so can cause false failure indications.

Prior to AIX 4.2.1, Topology Services required you to set the network option nonlocsrcroute to 1 to enable IP source routing (the default is 0). AIX 4.2.1 and later releases include the following network options for IP source routing:

nonlocsrcroute
ipsrcroutesend
ipsrcrouterecv
ipsrcrouteforward

Topology Services sets all four of these options to 1 so that the reliable message feature, which utilizes IP source routing, will continue to work. Disabling any of these network options can prevent the reliable message feature from working properly.
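Assuming the four options in question are the AIX source-routing tunables named nonlocsrcroute, ipsrcroutesend, ipsrcrouterecv, and ipsrcrouteforward, they are set with the AIX no command. The loop below only prints the commands rather than running them:

```shell
# The four source-routing network options (assumed set), each set to 1
# on AIX with the "no" command; this loop only prints the commands.
OPTIONS="nonlocsrcroute ipsrcroutesend ipsrcrouterecv ipsrcrouteforward"
for OPT in $OPTIONS; do
    echo "no -o ${OPT}=1"
done
COUNT=$(echo "$OPTIONS" | wc -w)
echo "options set: ${COUNT}"
```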

Tuning the Topology Services subsystem

The default settings for the frequency and sensitivity tunable attributes discussed in Configuring Topology Services are overly aggressive for SP system partitions that have more than 128 nodes or heavy load conditions. Using the default settings in those environments can result in false failure indications. When deciding which settings are suitable for your system, consider the size of your system partition and its load conditions.

These tunable attributes are typically set after the PSSP software is installed and configured on the control workstation and before installing it on the nodes. After PSSP has been installed and configured on the nodes, you must refresh the subsystem after making any tuning adjustments.

You can adjust the tunable attributes by using the hatstune command. For example, to change the frequency attribute to the value 2 and then refresh the Topology Services subsystem, use the command:

hatstune -f 2 -r

For more information about the hatstune command, see the book PSSP: Command and Technical Reference.

For more SP tuning information, see http://www.rs6000.ibm.com/support/sp/.

Refreshing the Topology Services daemon

When your system configuration is changed (such as by adding or removing nodes, adapters, or frames), the Topology Services subsystem needs to be refreshed before it can recognize the new configuration.

In normal circumstances the procedure for changing your system configuration, as described in the book PSSP: Installation and Migration Guide, would have been followed. One of the steps is to run the syspar_ctrl -r -G command from the control workstation. Running that command does cause Topology Services and other partition-sensitive subsystems to refresh themselves. As discussed in Initializing Topology Services daemon, the Topology Services startup script rebuilds the configuration and verifies its validity. The machines.lst configuration file is generated and the SDR copy of the file is updated to allow the nodes to retrieve the new copy. The daemon will be refreshed and then continue to operate using the new configuration. In unusual circumstances, where for some reason that step was not done or was not successful, you can run the syspar_ctrl -r -G command so that Topology Services and other subsystems can recognize the changes.

To refresh only the Topology Services subsystem, run the hatsctrl -r command from the control workstation. It will cause all the reachable nodes in the SP system partition to be refreshed.

Note that if there are nodes in the partition that are unreachable while Topology Services is active, they will not be refreshed. If the connectivity problem is later resolved and Topology Services on such a node is restarted, the node refreshes itself to remove the old configuration. Otherwise, it will not acknowledge nodes or adapters that are part of the new configuration but not in its old copy of the configuration.

