Group Services is a distributed subsystem of the IBM RS/6000 Cluster
Technology (RSCT) software that can run on any IBM pSeries or RS/6000 system. The RSCT software provides a set of high
availability services to the PSSP software. The other services in the
RSCT software are the Event Management and Topology Services distributed
subsystems. These three distributed subsystems operate within a
domain. A domain is a set of IBM
pSeries or RS/6000 nodes where the RSCT components execute and, exclusively of
other nodes, provide their services. On the SP system, a domain is an
SP system partition. Note that a node might be in more than one RSCT
domain; the control workstation is a member of each system partition and,
therefore, a member of each RSCT domain. When a node is a member of
more than one domain, there is an executing copy of each RSCT component for
each domain.
The function of the Group Services subsystem is to provide other subsystems with a distributed coordination and synchronization service. These other subsystems that depend upon Group Services are called client subsystems. Each client subsystem forms one or more groups by having its processes connect to the Group Services subsystem and use the various Group Services interfaces. A process of a client subsystem is called a GS client. For example, Event Management is a Group Services client subsystem. The Event Manager daemon on each node is a GS client.
A group consists of two pieces of information: a membership list of the processes that have joined the group, and a group state value.
Group Services guarantees that all processes that are joined to a group see the same values for the group information, and that they see all changes to the group information in the same order. In addition, the processes may initiate changes to the group information via protocols that are controlled by Group Services.
A GS client that has joined a group is called a provider. A GS client that wishes only to monitor a group, without being able to initiate changes in the group, is called a subscriber.
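The relationship between providers, subscribers, and the group information can be illustrated with a small toy model. This is only a sketch in Python, not the Group Services programming interface: the class names `Group` and `Client` and the methods `join`, `subscribe`, and `propose_state` are hypothetical, and the single in-process list stands in for the distributed ordering that Group Services actually provides.

```python
class Client:
    """Hypothetical stand-in for a GS client process."""

    def __init__(self, name):
        self.name = name
        self.seen = []  # changes delivered to this client, in order


class Group:
    """Toy model of a group: a membership list plus a state value."""

    def __init__(self, name):
        self.name = name
        self.members = []      # membership list (the providers)
        self.state = None      # group state value
        self.subscribers = []  # clients that only monitor the group

    def _deliver(self, change):
        # Every provider and subscriber receives every change in the same order.
        for client in self.members + self.subscribers:
            client.seen.append(change)

    def join(self, client):
        # Joining makes the client a provider.
        self.members.append(client)
        self._deliver(("join", client.name))

    def subscribe(self, client):
        # Subscribing lets the client watch the group without changing it.
        self.subscribers.append(client)

    def propose_state(self, client, value):
        # Only providers may initiate changes to the group information.
        if client not in self.members:
            raise PermissionError("only providers may propose changes")
        self.state = value
        self._deliver(("state", value))
```

In this model a subscriber that calls `propose_state` gets a `PermissionError`, while two providers that joined before a series of state changes end up with identical tails in their `seen` lists.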
Once a GS client has initialized its connection to Group Services, it can join a group and become a provider. All other GS clients that have already joined the group (those that have already become providers) are told as part of a join protocol about the new providers that wish to join. The existing providers can either accept new joiners unconditionally (by establishing a one-phase join protocol) or vote on the protocol (by establishing an n-phase protocol). During a vote, they can choose to approve the protocol and accept the new providers into the group, or reject the protocol and refuse to allow the new providers to join.
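The two ways of handling a join can be sketched as a single function. The function name `run_join` and the `vote` callback are hypothetical, and the rule that a single rejecting provider refuses the whole set of joiners is an assumption of this sketch rather than something stated above.

```python
def run_join(providers, joiners, vote=None):
    """Return the membership list after a join protocol (toy model).

    vote is None     -> one-phase join: joiners are accepted unconditionally.
    vote is callable -> voted join: vote(provider, joiners) returns True to
                        approve; any False refuses all joiners (assumption).
    """
    if vote is None or all(vote(p, joiners) for p in providers):
        return list(providers) + list(joiners)
    # Rejected: the membership list is left unchanged.
    return list(providers)
```

For example, `run_join(["a"], ["b"])` models an unconditional join, while passing a `vote` function lets each existing provider inspect the joiners before they are admitted.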
Group Services monitors the status of all the processes that are joined to a group. If either the process or the node on which a process is executing fails, Group Services initiates a failure protocol that informs the remaining providers in the group that one or more providers have been lost.
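The effect of a failure protocol on the membership list can be sketched as follows. The function `failure_protocol` and the representation of a provider as a `(node, process)` pair are assumptions made for illustration; the real subsystem detects failures itself rather than being told about them.

```python
def failure_protocol(membership, failed_nodes=(), failed_processes=()):
    """Toy failure protocol over a list of (node, process) providers.

    Returns (remaining, notifications), where notifications maps each
    remaining provider to the list of providers that were lost.
    """
    failed_nodes = set(failed_nodes)
    failed_processes = set(failed_processes)
    # A provider is lost if its process failed or its whole node failed.
    lost = [m for m in membership
            if m[0] in failed_nodes or m in failed_processes]
    remaining = [m for m in membership if m not in lost]
    # Remaining providers are informed only when something was actually lost.
    notifications = {m: list(lost) for m in remaining} if lost else {}
    return remaining, notifications
```

A node failure removes every provider on that node in one protocol, which is why the notification carries a list of lost providers rather than a single one.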
Join and failure protocols are used to modify the membership list of the group. Any provider in the group may also propose protocols to modify the state value of the group. All protocols are either unconditional (one-phase) protocols, which are automatically approved and not voted on, or conditional (n-phase) protocols, which are voted on by the providers.
During each phase of an n-phase protocol, each provider can take application-specific action and must vote to approve, reject, or continue the protocol. The protocol completes when it is either approved (the proposed changes become established in the group), or rejected (the proposed changes are dropped).
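The voting loop of an n-phase protocol can be sketched as below. The names `Vote` and `run_n_phase` are hypothetical, and the tally rule is an assumption of this sketch: any reject vote rejects the protocol, otherwise any continue vote starts another phase, otherwise the protocol is approved. The `max_phases` limit exists only so that the example always terminates.

```python
from enum import Enum


class Vote(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    CONTINUE = "continue"


def run_n_phase(providers, max_phases=10):
    """Run a toy n-phase protocol.

    providers: callables that take the phase number and return a Vote.
    Returns (outcome, phase), where outcome is "approved" or "rejected".
    """
    for phase in range(1, max_phases + 1):
        # Each provider takes its application-specific action, then votes.
        votes = [provider(phase) for provider in providers]
        if Vote.REJECT in votes:
            return "rejected", phase   # proposed changes are dropped
        if Vote.CONTINUE in votes:
            continue                   # at least one provider wants another phase
        return "approved", phase       # proposed changes become established
    return "rejected", max_phases      # sketch-only safety limit reached
```

A provider that needs several rounds of application-specific work can keep voting to continue until it is ready, and then vote to approve in a later phase.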
For more conceptual information about the Group Services subsystem, see the book PSSP Group Services Programming Guide and Reference.