IBM Books

Administration Guide


Configuring and operating Group Services

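The following sections describe how the components of the Group Services subsystem work together to provide group services. They cover configuring Group Services, initializing the Group Services daemon, Group Services initialization errors, and Group Services daemon operation.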

Configuring Group Services

The RSCT software should have been installed as part of the installation of the PSSP software with AIX 4.3.3 or as part of AIX 5L 5.1. The Group Services subsystem is contained in the ...basic.rte and ...basic.sp file sets.

After the components are installed, the subsystem must be configured for operation. Group Services configuration is performed by the hagsctrl command, which is invoked by the syspar_ctrl command.

The syspar_ctrl command configures all of the system-partition-sensitive subsystems. The person who installs PSSP issues the syspar_ctrl command during installation of the control workstation. The syspar_ctrl command is executed automatically on the nodes when the nodes are installed. The syspar_ctrl command is also executed automatically when system partitions are created or destroyed. For more information on using the syspar_ctrl command, see the books RS/6000 SP: Installation Guide and PSSP: Command and Technical Reference.

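The hagsctrl command provides a number of functions for controlling the operation of the Group Services subsystem. You can use it to add, start, stop, delete, and clean the subsystem, and to turn tracing on or off. Each function is described in the sections that follow.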

Except for the clean function, hagsctrl affects the Group Services subsystem in the current system partition, that is, the system partition that is specified by the SP_NAME environment variable.

Adding the subsystem

If the hagsctrl command is running on the control workstation, the first step in the add function is to select a Group Services daemon communications port number and save it in the Syspar_ports SDR class. This port number is then placed in the /etc/services file. If the hagsctrl command is running on a node, the port number is fetched from the SDR and placed in the /etc/services file. Port numbers are selected from the range 10000 through 10100.
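The port selection in this first step can be illustrated with a minimal sketch. The text states only the range (10000 through 10100); how hagsctrl actually chooses a port within it is not described here. The sketch assumes the lowest port not already listed in /etc/services-style text is chosen. The function names and the sample entries are hypothetical.

```python
import re

PORT_RANGE = range(10000, 10101)  # 10000 through 10100 inclusive, per the text


def used_ports(services_text):
    """Collect port numbers that appear in /etc/services-style text."""
    ports = set()
    for line in services_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = re.search(r"\s(\d+)/(?:tcp|udp)\b", line)
        if match:
            ports.add(int(match.group(1)))
    return ports


def pick_port(services_text):
    """Return the lowest free port in the range, or None if all are taken.

    Choosing the lowest free port is an assumption made for illustration.
    """
    taken = used_ports(services_text)
    for port in PORT_RANGE:
        if port not in taken:
            return port
    return None


# Hypothetical file contents, for illustration only.
sample = """\
ftp        21/tcp
hats.demo  10000/udp   # hypothetical entry
"""
print(pick_port(sample))
```

Returning None when the range is exhausted lets a caller report the condition explicitly rather than fall back to a port outside the documented range.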

The second step is to add the Group Services daemon to the System Resource Controller (SRC) using the mkssys command. The system partition name is an argument to the hagsd program in the SRC subsystem specification.

The third step is to add an entry to the /etc/inittab file so that the Group Services daemon will be started during boot. However, if hagsctrl is running on a workstation with the High Availability Control Workstation (HACWS) component of PSSP, no entry is made in the /etc/inittab file. Instead, the HACWS component manages the starting and stopping of the Group Services daemon.

Note that if the hagsctrl add function terminates with an error, the command can be rerun after the problem is fixed. The command takes into account any steps that already completed successfully.
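The rerun behavior can be pictured as a sequence of steps that are each skipped when already complete. The following is a simplified model under that assumption, not the hagsctrl implementation. The step names, the state dictionary, and the simulated failure are all hypothetical.

```python
# Hypothetical labels for the three steps described above.
STEP_NAMES = ["port in /etc/services", "SRC subsystem", "/etc/inittab entry"]


def run_add(state, failing=None):
    """Perform each step not yet recorded in state; return the steps done now.

    'failing' names a step that raises, to simulate an error mid-run.
    """
    performed = []
    for name in STEP_NAMES:
        if state.get(name):
            continue  # step completed in an earlier run, so skip it
        if name == failing:
            raise RuntimeError(f"{name}: simulated failure")
        state[name] = True
        performed.append(name)
    return performed


state = {}
try:
    run_add(state, failing="SRC subsystem")
except RuntimeError as err:
    print("first run stopped:", err)

# After the problem is fixed, the rerun performs only the remaining steps.
print("rerun performed:", run_add(state))
```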

Starting and stopping the subsystem

The start and stop functions of the hagsctrl command simply run the startsrc and stopsrc commands, respectively. However, hagsctrl automatically specifies the subsystem argument to these SRC commands.

Deleting the subsystem

The delete function of the hagsctrl command removes the subsystem from the SRC, removes the entry from /etc/inittab, and removes the Group Services daemon communications port number from /etc/services. It does not remove anything from the SDR, because the Group Services subsystem may still be configured on other nodes in the operational domain.

Cleaning the subsystem

The clean function of the hagsctrl command performs the same actions as the delete function, but in all system partitions rather than only the current one. In addition, it removes the Group Services daemon remote client communications port number from the /etc/services file.

The clean function does not remove anything from the SDR. This function is provided to support restoring the system to a known state, where the known state is the (possibly restored copy of the) SDR database.

Tracing the subsystem

The tracing function of the hagsctrl command is provided to supply additional problem determination information when it is requested by the IBM Support Center. Normally, tracing should not be turned on, because it might slightly degrade Group Services subsystem performance and can consume large amounts of disk space in the /var file system.

Initializing Group Services daemon

Normally, the Group Services daemon is started by an entry in the /etc/inittab file, via the startsrc command. If necessary, the Group Services daemon can be started using the hagsctrl command or the startsrc command directly.

During initialization, the Group Services daemon performs the following steps:

  1. It gets the number of the node on which it is running using the /usr/lpp/ssp/install/bin/node_number command. Node 0 is the control workstation.
  2. It tries to connect to the Topology Services subsystem. If the connection cannot be established because the Topology Services subsystem is not running, the daemon retries the connection every 20 seconds until it succeeds. Until the connection is established, the Group Services daemon periodically writes an AIX error log entry, and no clients can connect to the Group Services subsystem.
  3. It performs actions that are necessary to become a daemon. This includes establishing communications with the SRC subsystem so that it can return status in response to SRC commands.
  4. It establishes the Group Services domain, which is the set of nodes within the SP system partition in which a Group Services daemon is executing.

    At this point, one of the GS daemons establishes itself as the GS nameserver. For details, see Establishing the GS nameserver.

    Until the domain is established, no GS client requests to join or subscribe to groups are processed.

  5. It enters the main control loop.

    In this loop, the Group Services daemon waits for requests from GS clients, messages from other Group Services daemons, messages from the Topology Services subsystem, and requests from the SRC for status.
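The connection retry in step 2 can be sketched as a simple loop. This is a minimal model, assuming a fixed 20-second interval as stated in the text. The try_connect and log_error callables are hypothetical stand-ins for the real connection attempt and the AIX error log write, and logging on every failed attempt is a simplification of "periodically".

```python
import time

RETRY_INTERVAL = 20  # seconds between attempts, per the text


def connect_with_retry(try_connect, log_error, sleep=time.sleep):
    """Call try_connect() until it succeeds; return the number of attempts.

    try_connect and log_error are hypothetical callables supplied by the caller.
    """
    attempts = 0
    while True:
        attempts += 1
        if try_connect():
            return attempts
        log_error(f"attempt {attempts}: Topology Services not available")
        sleep(RETRY_INTERVAL)


# Simulated run: the connection succeeds on the third attempt, and the
# injected sleep function records each delay instead of waiting.
outcomes = iter([False, False, True])
errors, delays = [], []
attempts = connect_with_retry(lambda: next(outcomes), errors.append, delays.append)
print(attempts, delays)
```

Passing the sleep function as a parameter keeps the loop testable without real delays.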

Establishing the GS nameserver

The Group Services subsystem must be able to keep track of the groups that its clients want to form. To do this, it establishes a GS nameserver within each domain (a domain is the set of running nodes within each SP system partition). The GS nameserver is responsible for keeping track of all client groups that are created in that domain.

To ensure that only one node becomes a GS nameserver, Group Services uses the following protocol:

  1. When each daemon is connected to the Topology Services subsystem, it waits for Topology Services to tell it which nodes are currently running in this system partition.
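  2. Based on the input from Topology Services, each daemon finds the lowest-numbered running node in the domain. The daemon compares its own node number to the lowest-numbered node and performs one of the following: if its own node is the lowest-numbered node, it waits for the other running nodes to nominate it as the GS nameserver; otherwise, it sends a nomination message to the lowest-numbered node.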
  3. Once all running nodes have nominated the GS nameserver-to-be and a coronation timer (about 20 seconds) has expired, the nominee sends an insert message to the nodes. All nodes must acknowledge this message. When they do, the nominee becomes the established GS nameserver, and it sends a commit message to all of the nodes.
  4. At this point, the Group Services domain is established, and requests by clients to join or subscribe to groups are processed.
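The selection rule in this protocol can be sketched in a few lines. This is a simplified model, assuming the candidate is the lowest-numbered running node and that every other running node must acknowledge the insert message before the nameserver is established. The coronation timer and actual message passing are omitted, and the function and parameter names are hypothetical.

```python
def elect_nameserver(running_nodes, acknowledging=None):
    """Return the node number of the established GS nameserver, or None.

    running_nodes: node numbers reported as running.
    acknowledging: nodes that acknowledge the insert message; defaults to
    all other running nodes (hypothetical parameter for illustration).
    """
    running = set(running_nodes)
    candidate = min(running)          # lowest-numbered running node
    others = running - {candidate}    # these nodes nominate the candidate
    if acknowledging is None:
        acknowledging = others
    # The insert message must be acknowledged by all of the other nodes.
    if not others <= set(acknowledging):
        return None
    return candidate


print(elect_nameserver([5, 0, 3]))                     # control workstation is node 0
print(elect_nameserver([5, 3, 7], acknowledging=[5]))  # node 7 never acknowledges
```

Because every daemon applies the same rule to the same list of running nodes, all daemons arrive at the same candidate.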

This description applies when all nodes are booted simultaneously, such as at initial system power-on. Often, however, a Group Services daemon is already running on at least one node (for example, the control workstation) and is already established as the domain's GS nameserver. In that case, the GS nameserver waits only for Topology Services to identify the newly running nodes. The GS nameserver then sends the newly running nodes proclaim messages that direct them to nominate it as nameserver. Once those nodes nominate the GS nameserver, it executes one or more insert protocols to insert the newly running nodes into the domain.

Group Services initialization errors

The Group Services subsystem creates AIX error log entries to indicate severe internal problems. For most of these, the best response is to contact the IBM Support Center.

However, if you get a message that there has been no heartbeat connection for some time, it could mean that the Topology Services subsystem is not running.

To check the status of the Topology Services subsystem, issue the lssrc -g hats command. If the response indicates that the Topology Services subsystem is inoperative, try to restart it using the hatsctrl -s command. If you are unable to restart it, call the IBM Support Center.
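When this check is scripted, the response from lssrc -g hats has to be scanned for inoperative subsystems. The sketch below parses captured command output rather than running the command. It assumes the usual lssrc column layout (Subsystem, Group, PID, Status) with the status in the last column. The sample text and the subsystem names in it are illustrative, not real output.

```python
def inoperative_subsystems(lssrc_text):
    """Return subsystem names whose last column reads 'inoperative'."""
    names = []
    for line in lssrc_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2 and fields[-1] == "inoperative":
            names.append(fields[0])
    return names


# Illustrative text only; the subsystem names and PID are hypothetical.
sample = """\
Subsystem         Group            PID          Status
 hats.demo1       hats             12345        active
 hats.demo2       hats                          inoperative
"""
print(inoperative_subsystems(sample))
```

A non-empty result indicates that a restart with hatsctrl -s is the next step to try.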

Group Services daemon operation

Normal operation of the Group Services subsystem requires no administrative intervention. The subsystem normally recovers from temporary failures, such as node failures or failures of Group Services daemons, automatically. However, there are some operational characteristics that might be of interest to administrators.

Due to AIX per-process file descriptor limits, the Group Services subsystem can support a maximum of approximately 2000 GS clients on each node.

The maximum node number that can be contained within a domain is 2047, although Group Services can support node numbers in the range 0 to 65535.

The maximum number of groups to which a GS client can subscribe or that a GS client can join is equivalent to the largest value containable in a signed integer variable.

The maximum number of groups allowed within a domain is 65,535.

These limits are the theoretical maximum limits. In practice, the amount of memory available to the Group Services daemon and its clients will reduce the limits to smaller values.

