IBM Books

Group Services Programming Guide and Reference


Overview of Group Services concepts

An application may consist of multiple processes that run on multiple nodes of an SP system. The set of nodes that is defined to Group Services is called a Group Services domain. A Group Services domain is the set of nodes that makes up a system partition. If there is only one system partition in the SP system, the Group Services domain consists of all of the nodes in the SP system. From the standpoint of the Group Services subsystem, the system partition in which it is running is "the system."

Group membership

Each group that is maintained by the Group Services subsystem is uniquely named. Any authorized process in a Group Services domain may create a new group. Any authorized process in the domain may ask to become a member of a group. This request is called a join request or joining the group. If the join request is successful, the process becomes a provider for the group.

Any authorized process in the domain may ask to monitor the group. This request is called a subscribe request or subscribing to the group. If the subscribe request is successful, the process becomes a subscriber for the group.

The term GS client refers to both providers and subscribers. A process that has registered to use the Group Services subsystem, but has not yet become a provider or subscriber, is also referred to as a GS client.

The domain control environment variable, which is described in Group Services domains and in ha_gs_init subroutine, must be set and exported in a GS client's environment. Its value must be the name of the system partition in which the GS client and the Group Services daemon are running. On a node, this is the system partition to which the node belongs. On the control workstation, there is a Group Services daemon for each system partition. The value of the domain control environment variable identifies the system partition and the particular Group Services daemon to which the GS client will connect.

A group may have members on multiple nodes in the domain, and each node may have multiple members.

For each group, the Group Services subsystem maintains consistent group state data. A group's state consists of the membership list and the group state value:

Membership list

A group's membership list contains the list of providers in the group. Each provider is identified by a provider identifier. The Group Services subsystem maintains the list in the following order: the oldest provider (that is, the first provider to join the group) is at the head of the list, and the youngest is at the end. All of the group's providers and subscribers see the same ordering of the list.

The membership list is modified when providers join or leave the group. In addition to voluntarily leaving a group, a provider may leave involuntarily, due to the failure of the provider process itself or the failure of the node on which it is running. An involuntary leave is called a failure leave and is initiated by the Group Services subsystem. Finally, a provider may be expelled from the group at the request of a provider.
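The ordering rule for the membership list can be illustrated with a small model. This is only a sketch of the behavior described above, not part of the GSAPI; the class name MembershipList and the provider identifiers are hypothetical.

```python
class MembershipList:
    """Illustrative model of a group's provider list, oldest provider first."""

    def __init__(self):
        self._providers = []  # index 0 holds the oldest provider

    def join(self, provider_id):
        # A newly admitted provider is the youngest, so it goes at the end.
        if provider_id in self._providers:
            raise ValueError(f"{provider_id} is already a provider")
        self._providers.append(provider_id)

    def leave(self, provider_id):
        # Voluntary leaves, failure leaves, and expels all remove the entry.
        # The relative order of the remaining providers does not change.
        self._providers.remove(provider_id)

    def snapshot(self):
        # Every provider and subscriber sees this same ordering.
        return tuple(self._providers)


members = MembershipList()
for pid in ("p1", "p2", "p3"):
    members.join(pid)
members.leave("p1")   # the oldest provider leaves
members.join("p4")    # a new provider becomes the youngest
print(members.snapshot())
```

After the oldest provider leaves, the next-oldest provider moves to the head of the list, and the new provider is appended at the end.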

Group state value

The state value of a group is defined by the application that is using the GSAPI and is controlled by the providers in a way that is meaningful to the application. It is a byte field whose length may vary between 1 and 256 bytes. The Group Services subsystem does not interpret the state value.
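An application might guard the length rule before proposing a new state value. The following is a minimal sketch under that assumption; check_state_value is a hypothetical helper, not a GSAPI routine, and the content of the bytes is left uninterpreted, as in the subsystem itself.

```python
MIN_STATE_LEN = 1     # shortest state value described above, in bytes
MAX_STATE_LEN = 256   # longest state value described above, in bytes


def check_state_value(value):
    """Return the value as bytes if its length is within the allowed range."""
    if not isinstance(value, (bytes, bytearray)):
        raise TypeError("a state value is a byte field")
    if not MIN_STATE_LEN <= len(value) <= MAX_STATE_LEN:
        raise ValueError(
            f"state value length {len(value)} is outside "
            f"{MIN_STATE_LEN}..{MAX_STATE_LEN} bytes"
        )
    return bytes(value)


state = check_state_value(b"ACTIVE")
```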

The group state is available to surviving providers despite node, communication adapter, and network failures. However, the group state does not survive the dissolution of a group. If all of the providers fail, the group state is lost.

Changing membership and state value

Any provider in the group can ask Group Services to modify the state value and can also specify the level of consistency that is to be associated with the modification. Specifically, the provider can subject the proposed change to a voting protocol, or request that the change be approved without putting it to a vote. The voting protocol unifies the multi-phase commit and barrier synchronization abstractions.

A GS client that asks to become a provider of a group must have the correct authorization and must be admitted to the group by the current providers of the group. This is accomplished by the same voting protocol as the one used to mediate state changes.

A provider may leave a group in a number of ways. It may:

All changes to a group, in state value or in membership, appear to the providers and subscribers of the group to be logically serialized. This means that one change completes before another begins. The Group Services subsystem processes all outstanding membership changes before it accepts any proposals to change the state value. If one or more providers fail during an ongoing protocol invocation, Group Services runs a failure leave protocol to remove the failed providers from the group when the protocol completes.

Subscribers are notified of the approved results of a state value change or a provider membership change. However, they do not participate in approving a state value change, admitting a new provider, or removing a leaving provider. Thus, a subscriber cannot affect the group state. In addition, subscribers do not appear in any group membership lists; they are known only to the Group Services subsystem. The providers of a group and the other subscribers of the group are unaware of any of the subscribers to the group.

Group Services domains

The Group Services subsystem provides services within the boundaries of what it calls a domain. A Group Services domain includes an SP system control workstation and the set of nodes within an SP system partition. This means that an SP system control workstation can be within multiple Group Services domains. An application wishing to become a GS client on the control workstation or on an SP system node must set one or more environment variables to ensure that it is able to connect to the proper Group Services domain.

Group Services PSSP domain

An SP system is divided into one or more Group Services domains, based on the number of SP system partitions defined. Each of these domains is referred to as a "Group Services PSSP domain". To connect to the Group Services PSSP domain, a GS client must ensure that the environment variable HA_DOMAIN_NAME is set to the name of the SP system partition in which the GS client is running prior to the GS client invoking the ha_gs_init subroutine. This must be done on a node (which can only be in one Group Services PSSP domain) as well as on the control workstation (which will be in multiple Group Services PSSP domains, if there are multiple defined SP system partitions).

Refer to ha_gs_init subroutine for more information about connecting to the Group Services subsystem and the meaning of error codes that are returned.

Group Services HACMP/ES domain

In addition to the Group Services PSSP domain, if HACMP/ES is installed on a node, that node will also be part of a Group Services HACMP/ES domain, which is separate from that node's Group Services PSSP domain. The Group Services HACMP/ES domain consists of all nodes that are part of the HACMP/ES cluster. This may include SP system nodes, non-SP AIX workstations, and SP system nodes from a physically separate SP system.

Connecting to domains

A GS client may connect to only one domain, regardless of whether a node (or the control workstation) is part of multiple Group Services domains. All services provided by the Group Services subsystem are within a single domain only. A GS client gets information only about the nodes and groups that are in the Group Services domain to which it is connected.

If a GS client wishes to connect to the Group Services HACMP/ES domain on its node, it must set the HA_DOMAIN_NAME and HA_GS_SUBSYS environment variables before the GS client invokes the ha_gs_init subroutine. HA_DOMAIN_NAME must be set to the name of the HACMP/ES cluster. HA_GS_SUBSYS must be set to grpsvcs.
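The environment settings for the two kinds of domain can be summarized in a short sketch. The variable names HA_DOMAIN_NAME and HA_GS_SUBSYS and the value grpsvcs come from the text above; the function domain_environment, the domain_kind labels, and the names "partition_1" and "cluster_a" are hypothetical. The sketch only builds the environment that a launcher would hand to a GS client process before that process invokes ha_gs_init.

```python
import os


def domain_environment(domain_kind, name):
    """Return the variables a GS client needs for the chosen domain."""
    if domain_kind == "pssp":
        # name is the SP system partition in which the GS client runs.
        return {"HA_DOMAIN_NAME": name}
    if domain_kind == "hacmp_es":
        # name is the HACMP/ES cluster name.
        return {"HA_DOMAIN_NAME": name, "HA_GS_SUBSYS": "grpsvcs"}
    raise ValueError(f"unknown domain kind: {domain_kind!r}")


# Build a child-process environment for a client of the HACMP/ES domain.
env = dict(os.environ)
env.update(domain_environment("hacmp_es", "cluster_a"))
```

Because a GS client may connect to only one domain, the sketch returns settings for exactly one domain per call.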

All of the Group Services interfaces and semantics work identically and are supported within both Group Services PSSP domains and Group Services HACMP/ES domains.

Group creation

Typically, an application defines one or more group names that are known to all of the processes that are part of the application.

During initialization, each process in the application asks to join the group as it starts up. The first join request creates the group and defines its attributes. The subsequent join requests result in new providers joining the group. Each subsequent join request also includes group attribute information, which must match the group's established attributes. Otherwise, the join request is rejected.
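The create-on-first-join rule can be modeled as follows. This is a simplified sketch: it ignores authorization and the voting that admits a provider, and the GroupRegistry class, the group name, and the attribute values are hypothetical. Only the attribute field names are taken from this chapter.

```python
class GroupRegistry:
    """Illustrative model: the first join creates a group and fixes its attributes."""

    def __init__(self):
        self._groups = {}  # group name -> {"attributes": dict, "providers": list}

    def join(self, group_name, provider_id, attributes):
        group = self._groups.get(group_name)
        if group is None:
            # The first join request creates the group and defines its attributes.
            self._groups[group_name] = {
                "attributes": dict(attributes),
                "providers": [provider_id],
            }
            return "created"
        if group["attributes"] != attributes:
            # Later join requests must match the established attributes.
            return "rejected"
        group["providers"].append(provider_id)
        return "joined"

    def providers(self, group_name):
        return list(self._groups[group_name]["providers"])


registry = GroupRegistry()
attrs = {"gs_num_phases": "n-phase", "gs_group_default_vote": "REJECT"}
first = registry.join("app_group", "p1", attrs)
second = registry.join("app_group", "p2", dict(attrs))
third = registry.join("app_group", "p3", {**attrs, "gs_group_default_vote": "APPROVE"})
```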

The attributes of a group are:

All of the topics related to these attributes are discussed in greater detail later in this chapter.

Mutability of group attributes

Certain group attributes are mutable. This means that they can be dynamically changed by the group's providers, using the ha_gs_change_attributes asynchronous interface. See ha_gs_change_attributes subroutine.

The following group attribute fields are mutable (can be dynamically changed):

gs_client_version
Client-specified version number

gs_batch_control
Batch control setting for membership (join and failure protocols)

gs_num_phases
Phase control setting for membership (join and failure protocols)

gs_source_reflection_num_phases
Phase control setting for source-state reflection protocols

gs_group_default_vote
Group's base default vote for all N-phase protocols

gs_merge_control
Behavior of group in a merge situation

gs_time_limit
Voting time limit for N-phase join and failure protocols

gs_source_reflection_time_limit
Voting time limit for N-phase source-state reflection protocols

The following group attribute fields are not mutable (cannot be dynamically changed). To change these group attribute fields, the providers must all leave the group, then rejoin the group with the desired new attribute field.

gs_group_name
The name of the provider's group

gs_source_group_name
The name of the source group for the provider's group (see Source-target group relationships).

Responsiveness checks

Responsiveness checks allow the Group Services subsystem to periodically inspect the state of the GS client when there are no ongoing group activities. Group Services always monitors the GS client for exit. A responsiveness check allows Group Services to query the actual responsiveness of the GS client. When the group is active, that is, when a protocol is running, Group Services can determine the responsiveness of the GS client by the client's response to the running protocol. Accordingly, Group Services suspends responsiveness checking during ongoing protocols.

When the GS client initializes itself with Group Services, it must specify information about the protocol, if any, to be used to perform responsiveness checks for the GS client. It must also specify the path name of a callback routine to invoke if the GS client fails its responsiveness check.

Responsiveness protocols

The GS client can specify one of the following responsiveness protocols.

No protocol
In this case, Group Services acts only if the GS client process exits.

A ping-like protocol
In this case, Group Services periodically sends a responsiveness notification to the GS client and expects a response. The notification calls the responsiveness callback routine specified by the GS client. Group Services expects the responsiveness callback routine to return a code that indicates whether the GS client is operational or has detected an internal problem that prevents its correct operation.

This protocol is available to both single-threaded and multi-threaded GS clients.

A counter-checking protocol
In this case, Group Services periodically checks an arithmetic counter that is specified by a multi-threaded client. If the counter is changing, Group Services assumes that the GS client is responsive. If the counter does not change within a time limit specified by the GS client, Group Services calls the responsiveness callback routine before assuming that the GS client is nonresponsive.

This protocol works properly only for multi-threaded GS clients, and a thread must be dedicated to performing Group Services operations. The dedicated thread should handle responsiveness checking as well as the receipt of notifications from Group Services.
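The counter-checking idea can be sketched as a small model. Everything here is an assumption made for illustration: the CounterCheck class, the use of plain numbers for time, and the convention that the callback returns True when the client reports itself operational. It is not the GSAPI interface.

```python
class CounterCheck:
    """Illustrative model of a counter-checking responsiveness check."""

    def __init__(self, read_counter, time_limit, callback, now):
        self._read = read_counter      # returns the client's current counter value
        self._limit = time_limit       # longest time the counter may stay unchanged
        self._callback = callback      # responsiveness callback, returns True if healthy
        self._last_value = read_counter()
        self._last_change = now

    def poll(self, now):
        value = self._read()
        if value != self._last_value:
            # The counter moved, so the client is assumed to be responsive.
            self._last_value = value
            self._last_change = now
            return "responsive"
        if now - self._last_change < self._limit:
            # Unchanged, but the time limit has not expired yet.
            return "responsive"
        # The counter stalled: call the callback before assuming nonresponsiveness.
        return "responsive" if self._callback() else "nonresponsive"


counter = {"n": 0}
calls = []


def stalled_callback():
    calls.append("called")
    return False  # the client reports that it cannot operate correctly


check = CounterCheck(lambda: counter["n"], time_limit=5,
                     callback=stalled_callback, now=0)
counter["n"] += 1                 # the dedicated thread bumps the counter
first = check.poll(now=2)         # counter changed
second = check.poll(now=4)        # unchanged, still within the limit
third = check.poll(now=8)         # unchanged for 6 time units, limit is 5
```

In this model the callback is invoked only after the counter has been stalled for the full time limit, which mirrors the order of events described above.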

Responsiveness callback routines

The responsiveness callback routine is intended to provide the Group Services subsystem with a means of quiescing a provider that fails a responsiveness check. The routine should perform any cleanup actions that are required by the GS client. It also allows the GS client to perform periodic validity checks on its own operation or its environment.

Handling nonresponsive providers

Group Services performs responsiveness checks once the GS client has initialized. If a responsiveness check fails and the GS client is a provider, Group Services places it in a list of nonresponsive providers. Then, Group Services sends an announcement notification that contains the list to all of the group's providers. Group Services takes no other direct action.

On receipt of the announcement notification, a provider could initiate an expel protocol to remove the nonresponsive providers from the group, if appropriate. For more information, see The expel protocol. Group Services tries to contact nonresponsive providers. If a previously nonresponsive provider responds, Group Services places it in a list of "rejuvenated" providers. Then, Group Services sends an announcement notification that contains the list to all of the group's providers.

Note that because Group Services continues to perform responsiveness checks for nonresponsive providers, the group can determine how quickly it should respond to announcement notifications. A group can expel a nonresponsive provider after receiving the first announcement notification, or it can wait to see if the provider becomes responsive again.

Protocols and voting

The Group Services subsystem uses a variety of protocols. A protocol is the mechanism that coordinates membership and state value changes within a group.

The GSAPI provides a flexible n-phase voting protocol to mediate provider joins and departures, and state value changes. Different applications have different synchronization and coordination requirements for membership and state changes. Programmers can customize their applications to meet these requirements by choosing the appropriate number of voting phases, as follows:

The votes of the providers cause a proposed change to be approved or rejected. If it is approved, Group Services sends a final notification that describes the change to all of the group's providers and subscribers. If it is rejected, Group Services sends the final notification only to the providers. The group state reverts to its value at the beginning of the protocol. When a protocol is proposed, the proposal indicates whether it is one-phase or n-phase.

Protocol categories

Protocols are grouped into four categories:

Membership change protocols

These protocols are used when a provider joins or leaves a group. If approved, the membership of the group changes. In addition, the group state value may also be changed during all phases of n-phase membership change protocols, as discussed in Submitting changes with voting responses.

Membership change protocols include:

The state value change protocol

A provider uses this protocol to change the state value of the group, but leave the membership unchanged. n-phase state value change protocols may also change the group state value during the voting phases. State value change protocols do not affect the group membership.

The provider-broadcast message protocol

If this protocol is one-phase, it allows a provider to broadcast a message to all other providers in the group, with no voting.

If this protocol is n-phase, it allows a provider to broadcast the message to the other providers in the group, and also initiates the standard voting phases. The group state value may be changed during each voting phase. Provider-broadcast message protocols do not affect the group membership.

The change-attributes protocol

If this protocol is one-phase, it allows a provider to change a group's attributes with no voting.

If this protocol is n-phase, it allows a provider to propose to change the group attributes and initiates the standard voting phases. Change-attributes protocols do not affect the group membership.

Voting on an n-phase protocol

When a provider receives an n-phase protocol proposal notification, it is asked to vote. At the start of every phase of voting, the Group Services subsystem informs all of the providers in the group of the proposed state value change, and the current phase number. Each provider then votes either to approve the proposed change (APPROVE), to request another round of voting (CONTINUE), or to reject it and end the protocol (REJECT). Voting can occupy any number of phases, based on the wishes of the providers.

For each phase, each provider must provide one of the following vote values:

APPROVE
The provider approves the proposed change. If all providers vote to APPROVE the proposal in the same voting phase, the change is approved and the group state is changed accordingly. If the vote tally indicates that the protocol should continue, the provider must continue to vote in each subsequent phase.

CONTINUE
The provider conditionally approves the proposed change, but wants to continue to another phase of voting. If at least one provider votes to CONTINUE, the protocol continues to another voting phase.

REJECT
The provider rejects the proposed change. Like CONTINUE, only one provider needs to vote to REJECT to reject the proposed change. A REJECT vote on a failure leave protocol requires special consideration, as described in Rejection of the Group Services subsystem-initiated protocols.

Voting can have one of the following outcomes:

Normally, providers vote explicitly by responding to the Group Services subsystem by calling the ha_gs_vote subroutine. However, if a provider fails before it submits a vote, or if it fails to vote within the group's voting time limit, the Group Services subsystem enters a default vote on behalf of that provider. The default vote is also called an implicit vote.

By default, the default vote is REJECT. However, the provider can set the default vote to APPROVE when it joins the group. The Group Services subsystem does not permit an implicit vote to CONTINUE, because it could lead to a non-terminating protocol.
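The tally rules for a single voting phase, including the default (implicit) vote, can be expressed as a short model. The function tally_phase and its return values are hypothetical and are meant only to restate the rules above; a missing entry in explicit_votes stands for a provider that failed or did not vote in time.

```python
APPROVE, CONTINUE, REJECT = "APPROVE", "CONTINUE", "REJECT"


def tally_phase(providers, explicit_votes, default_vote=REJECT):
    """Tally one phase. Returns (result, used_default)."""
    if default_vote not in (APPROVE, REJECT):
        # An implicit vote to CONTINUE is not permitted.
        raise ValueError("the default vote must be APPROVE or REJECT")
    votes = []
    used_default = False
    for provider in providers:
        vote = explicit_votes.get(provider)
        if vote is None:
            vote = default_vote   # implicit vote entered on the provider's behalf
            used_default = True
        elif vote not in (APPROVE, CONTINUE, REJECT):
            raise ValueError(f"invalid vote from {provider}: {vote!r}")
        votes.append(vote)
    if REJECT in votes:
        return "rejected", used_default    # a single REJECT ends the protocol
    if CONTINUE in votes:
        return "continue", used_default    # a single CONTINUE forces another phase
    return "approved", used_default        # every provider voted APPROVE


unanimous = tally_phase(["a", "b"], {"a": APPROVE, "b": APPROVE})
another_round = tally_phase(["a", "b"], {"a": APPROVE, "b": CONTINUE})
silent_provider = tally_phase(["a", "b"], {"a": APPROVE})
```

With the default vote left at REJECT, a provider that does not vote causes the proposal to be rejected; a group that joined with a default of APPROVE would instead see the proposal approved.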

After the proposal is approved or rejected, the Group Services subsystem notifies all of the providers of the outcome. The providers do not vote in this last phase of the protocol. Thus, unlike the other phases, in the last phase no information flows from the providers to the Group Services subsystem. Finally, and only if the proposal was approved, the Group Services subsystem informs the subscribers of the outcome of the vote.

In certain cases, an approved proposal also generates notifications related to source-target handling. For more information, see Source-target group relationships.

Specifying the provider's default vote value

By default, the Group Services subsystem assigns REJECT as the default value for each group. As part of its request to join a group, a provider may specify a default vote value as part of the group attributes. It may specify either REJECT or APPROVE. All providers must specify the same value. During each voting phase, any provider may specify a new default vote to be used for the group if any provider fails during this voting phase. The provider may specify either REJECT or APPROVE.

If no new default vote is specified, the current default vote carries over to the next phase. At the end of the protocol, the default vote reverts to the original value that was specified during the join of the providers.

If more than one provider specifies an updated default vote value with its vote, the Group Services subsystem arbitrarily chooses one of them. If different values are specified by different providers, it is not possible to predict which one the Group Services subsystem will choose.

As discussed in Submitting changes with voting responses, to ensure consistency, the group should ensure one of the following:

Approving and rejecting protocols

Every protocol must be either approved or rejected as described, based on the desires of the providers in the group. A protocol is approved when the providers vote to approve it. A protocol is rejected when the protocol is voted down or is ended for some reason.

In summary, a provider or the Group Services subsystem proposes the protocol. If necessary, voting proceeds for the desired number of phases.

If the protocol is approved, the updated information is broadcast to all providers and subscribers, as well as any appropriate target-groups. If the protocol is rejected, a notice of the rejection is broadcast to all providers. Subscribers receive no notification of a rejected protocol.

A one-phase protocol proposal is automatically approved. It cannot be rejected.

If a voluntary leave is rejected, whether by an explicit or implicit vote to REJECT, the protocol ends.

Proposing, voting, and phases for protocols

A protocol starts with a proposal. Either a provider or the Group Services subsystem itself can initiate the proposal.

Every protocol takes place in some number of phases. A one-phase protocol is an atomic multicast to the group members. An n-phase protocol is a mechanism that allows barrier synchronization. All providers in the group involved in the protocol proposal must arrive at the barrier (that is, submit a vote) before the protocol can proceed to the next phase. This guarantees that the group remains synchronized during the protocol. A provider's arrival at a barrier is signalled by its submission of a vote to approve, continue, or reject the proposal.

Each protocol proposal indicates whether it is a one-phase or an n-phase protocol. A one-phase protocol is a nonvoting protocol and is completed in a single phase, as described below.

A protocol that requires one or more voting phases is defined as an n-phase protocol. The exact number of phases is not defined in advance. Instead, the providers determine by their votes the exact number of voting phases.

The following sections describe the two types of protocols in more detail.

One-phase protocols

A one-phase protocol is a notification that the change proposed by the protocol is automatically approved.

For a membership change proposal, all providers and subscribers are notified of the updated membership. This is the join of a new provider or the leave of an old provider. The list of providers that are notified includes the providers that just joined, but does not include any providers that just left. Joins and leaves are not batched together in any one membership change proposal.

For a state value change proposal, all providers and subscribers are notified of the updated state.

For a provider-broadcast message proposal, all providers are notified of the message.
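The recipients of a one-phase notification, as described in the preceding paragraphs, can be summarized in a small model. The function one_phase_recipients and the kind labels are hypothetical names chosen for this sketch.

```python
def one_phase_recipients(kind, providers, subscribers, joined=(), left=()):
    """Return who is notified of an automatically approved one-phase protocol."""
    if kind == "membership":
        # Providers that just left are excluded; providers that just joined
        # are included. Subscribers also receive the updated membership.
        current = [p for p in providers if p not in left]
        current += [p for p in joined if p not in current]
        return current + list(subscribers)
    if kind == "state":
        # Providers and subscribers are notified of the updated state.
        return list(providers) + list(subscribers)
    if kind == "broadcast":
        # A provider-broadcast message goes to providers only.
        return list(providers)
    raise ValueError(f"unknown protocol kind: {kind!r}")


after_join = one_phase_recipients("membership", ["p1", "p2"], ["s1"], joined=["p3"])
after_leave = one_phase_recipients("membership", ["p1", "p2"], ["s1"], left=["p1"])
message = one_phase_recipients("broadcast", ["p1", "p2"], ["s1"])
```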

If a provider fails during the protocol, all remaining members receive the protocol notification. The Group Services subsystem immediately proposes a membership change protocol to handle the failed provider.

All providers and subscribers see a series of one-phase protocol notifications in the same order. However, any individual recipient may see a second or subsequent notification before all recipients have seen the first.

N-phase protocols

An n-phase protocol establishes a series of barrier-synchronization voting phases for the providers in the group. Any protocol may be proposed as an n-phase protocol.

The proposal indicates that the protocol requires one or more voting phases, but it does not specify the exact number of phases to use. The voting results determine the actual number of voting phases.

The provider that proposes the protocol may specify a time limit for each voting phase. Each provider must register its vote within the given time limit. If a provider fails to register its vote in time, the Group Services subsystem does the following:

The Group Services subsystem starts the protocol by broadcasting the proposal to all providers, which starts the first phase, and ends each phase by tallying the votes. In response to the initial notification, each provider must vote. Each vote response contains:

If at least one provider votes to REJECT the proposal, the Group Services subsystem broadcasts to all providers a notification that the proposal was rejected. If no provider votes to reject the proposal, but at least one provider votes to CONTINUE the voting, Group Services broadcasts a notification that another vote is expected on the proposal. Once again, each provider must respond by voting.

Once all providers vote in the same phase to approve the proposal, the Group Services subsystem broadcasts to all providers and subscribers a notification of the approved change. This final broadcast is equivalent to the one-phase notification.

Failure of a provider during any phase of voting is handled by using the group's default vote for the provider. The Group Services subsystem automatically includes the default vote (REJECT or APPROVE) in the vote tally. Once the protocol completes (that is, is either approved or rejected), the Group Services subsystem immediately proposes a membership change to handle the failed provider.

The final notification of the protocol's rejection or approval also indicates whether any default votes were used during the protocol.

The voting phase allows each provider to take any action desired, such as running scripts, issuing commands to manipulate resources, or displaying graphics on the screen. The provider then submits its vote. If a provider fails during a voting phase, the Group Services subsystem enters the default vote into the tally on behalf of the failed provider.

Once each provider has submitted its vote, the Group Services subsystem tallies the votes. If all of the providers voted to APPROVE the protocol in the same voting phase, the voting ends and the proposal is approved. During the protocol, the providers determine the number of voting phases that are used by voting to CONTINUE the protocol. This mechanism allows the providers to adapt to unexpected occurrences during each protocol, rather than having to know in advance the exact number of phases that will be required.
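The way the providers' votes determine the number of phases can be shown with a small simulation. This is a sketch under stated assumptions: run_n_phase, the vote_fn callback, and the max_phases guard are hypothetical, and the guard exists only to keep the model from looping forever. A vote_fn result of None stands for a provider that failed or missed the time limit.

```python
def run_n_phase(providers, vote_fn, default_vote="REJECT", max_phases=100):
    """Run voting phases until the proposal is approved or rejected."""
    used_default = False
    for phase in range(1, max_phases + 1):
        votes = []
        for provider in providers:
            # Each provider arrives at the barrier by submitting a vote.
            vote = vote_fn(provider, phase)
            if vote is None:
                vote = default_vote   # default vote entered for the provider
                used_default = True
            votes.append(vote)
        if "REJECT" in votes:
            return {"outcome": "rejected", "phases": phase,
                    "used_default": used_default}
        if "CONTINUE" not in votes:
            # All providers voted APPROVE in the same phase.
            return {"outcome": "approved", "phases": phase,
                    "used_default": used_default}
        # At least one CONTINUE and no REJECT: go to another voting phase.
    raise RuntimeError("model guard: too many voting phases")


def example_votes(provider, phase):
    # Provider "b" asks for one extra round, then approves.
    if provider == "b" and phase == 1:
        return "CONTINUE"
    return "APPROVE"


result = run_n_phase(["a", "b", "c"], example_votes)
print(result)
```

In this run the proposal is approved in the second phase, because that is the first phase in which every provider votes APPROVE.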

Initiating the protocols

Either a provider or the Group Services subsystem itself initiates proposals.

The Group Services subsystem proposes protocols to handle a join, a failure leave or a cast-out of a provider, or a source-group reflection protocol when a source-group state value change needs to be reflected to its target-groups.

It is up to the providers to initiate proposals to voluntarily leave a group, to expel a provider, to change the state value of a group, or to initiate a provider-broadcast message.

Please note carefully the difference here between initiating the protocol proposal and notifying the providers that a protocol has been proposed. The Group Services subsystem always issues the notification. However, the Group Services subsystem actually initiates a proposal only for the cases listed above.

For a provider to initiate a protocol proposal, it is simply a matter of calling the proper GSAPI subroutine. The Group Services subsystem notifies the other providers in the group that a proposal has been made and proceeds as described in Proposing, voting, and phases for protocols, based on the number of phases and the nature of the proposal.

Group Services subsystem-initiated protocols

The protocols that the Group Services subsystem initiates are join, failure leave, and source-target proposal protocols. Once it initiates the protocols, the Group Services subsystem notifies the providers, and the protocols proceed in a manner quite similar to other proposals.

The protocols that the Group Services subsystem initiates cover the following situations:

The number of phases for these protocols is determined as follows:

Time limits for voting phases are determined as follows:

The notification procedure varies slightly, depending on the proposal that is made, as follows:

Membership changes may be batched, which means that multiple providers can be handled in a single join or leave protocol.

Once the Group Services subsystem has initiated the protocol, the providers invoke it in the usual manner based on the number of phases. One-phase protocols are a single notification. n-phase protocols proceed to the first voting phase.

Once voting has completed, the providers are notified of the result.

Approval of the Group Services subsystem-initiated protocols

In all cases, if the protocol is approved, Group Services performs the following actions:

Rejection of the Group Services subsystem-initiated protocols

If a join is rejected for any reason, the rejected GS clients receive a notification that their application to join the group has been rejected. The existing providers also receive the notification. The subscribers receive no notification.

The failure leave and cast-out proposals require some special handling for rejecting the protocols, because the failing providers must be removed in any case.

If any provider explicitly votes to REJECT a failure leave or cast-out proposal, the protocol stops. The membership list is updated to show the removal of the targeted provider. The providers and subscribers receive this updated list. The state value reverts to its value at the beginning of the protocol.

If any provider implicitly votes to REJECT a failure leave or cast-out proposal, the protocol stops. If failure leave requests are allowed to be batched, the Group Services subsystem immediately proposes another failure leave protocol, adding the newly-failed provider into the list of leaving providers.

If failure leave requests are not allowed to be batched, the Group Services subsystem handles this as an explicit vote to REJECT. To handle the newly-failed provider, Group Services initiates a failure leave protocol.
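The outcome of an explicit vote on a failure leave or cast-out proposal can be illustrated with a small model. The function finish_failure_leave and its parameters are hypothetical; the sketch covers only the final membership and state value, not the batching behavior.

```python
def finish_failure_leave(members, failed, state_at_start, proposed_state, approved):
    """Return (membership, state value) once the protocol ends."""
    # The failed providers are removed whether the protocol is approved or rejected.
    remaining = [m for m in members if m not in failed]
    if approved and proposed_state is not None:
        state = proposed_state
    else:
        # On rejection, the state value reverts to its value at the start.
        state = state_at_start
    return remaining, state


rejected = finish_failure_leave(["p1", "p2", "p3"], {"p2"}, b"old", b"new",
                                approved=False)
accepted = finish_failure_leave(["p1", "p2", "p3"], {"p2"}, b"old", b"new",
                                approved=True)
```

The point of the model is that a REJECT vote can block a state value change, but it cannot keep a failed provider in the membership list.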

A rejection of a source-state reflection protocol simply ends the protocol, and the providers are notified that the protocol is rejected.

Provider-initiated protocols

Provider-initiated proposals include proposals made by a provider to perform the following:

A provider calls a GSAPI subroutine to propose one of these protocols, specifying either a one-phase or an n-phase protocol. If an n-phase protocol is specified, the provider must also specify a voting phase time limit. If no time limit is desired, a time limit of 0 may be specified.

The Group Services subsystem checks the proposal for errors. If the proposal is syntactically invalid, the provider receives a synchronous syntax error code. If the group currently has a running protocol, the provider receives a synchronous error code that indicates a collision between competing protocols. (Only one protocol may run at a time.)

If the synchronous checks pass, the Group Services subsystem tentatively accepts the proposal and the provider receives a synchronous successful return code. However, if collision errors are detected asynchronously (because other providers or the Group Services subsystem itself submits a proposal at the same time), the Group Services subsystem returns an error code asynchronously.

If multiple providers submit proposals at the same time, only one proposal is accepted by the Group Services subsystem. The other proposals are returned to the providers that made them, asynchronously returning a collision error code.
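The rule that only one protocol may run at a time can be modeled as a single slot per group. The ProtocolSlot class and its return strings are hypothetical; the sketch covers only the synchronous collision check and does not model the asynchronous case, in which the subsystem chooses among simultaneous proposals.

```python
class ProtocolSlot:
    """Illustrative model: a group runs at most one protocol at a time."""

    def __init__(self):
        self.running = None  # (provider, proposal) of the running protocol

    def propose(self, provider, proposal):
        if self.running is not None:
            # Another protocol is already running for this group.
            return "collision"
        self.running = (provider, proposal)
        return "accepted"

    def complete(self):
        # The running protocol was approved or rejected, so the slot is free.
        self.running = None


slot = ProtocolSlot()
first = slot.propose("p1", "state value change")
second = slot.propose("p2", "provider-broadcast message")
slot.complete()
retry = slot.propose("p2", "provider-broadcast message")
```

A provider whose proposal collides can submit it again after the running protocol completes, as the retry in the example shows.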

Whichever proposal the Group Services subsystem chooses, the providers are notified. If the protocol is a one-phase protocol, the proposal is automatically approved. If the protocol is an n-phase protocol, the proposal notification requests a vote from the providers.

At the end of the protocol, the Group Services subsystem notifies the providers of the protocol's result, that is, whether the protocol was approved or rejected.
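
As a rough illustration of how a result is reached, the following Python sketch tallies the votes of each phase. The tally rule is an assumption made for this sketch: any REJECT ends the protocol as rejected, any CONTINUE (with no REJECT) asks for another phase, and otherwise the protocol is approved. The function names are hypothetical and are not part of the GSAPI.

```python
APPROVE, REJECT, CONTINUE = "APPROVE", "REJECT", "CONTINUE"


def tally_phase(votes):
    """Return the outcome of one voting phase.

    votes maps a provider name to its vote for this phase.
    An empty dict models a one-phase protocol, which needs no votes.
    """
    values = list(votes.values())
    if REJECT in values:
        return "REJECTED"
    if CONTINUE in values:
        return "NEXT_PHASE"
    return "APPROVED"


def run_protocol(phases):
    """Tally a list of phases in order; stop at the first final outcome.

    Returns (outcome, number_of_phases_used).
    """
    for number, votes in enumerate(phases, start=1):
        outcome = tally_phase(votes)
        if outcome != "NEXT_PHASE":
            return outcome, number
    # The supplied phases all asked to continue; no final result yet.
    return "INCOMPLETE", len(phases)
```

For example, a phase in which P1 votes CONTINUE and P2 votes APPROVE leads to another voting phase in this model.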

For voluntary leave protocols:

For expel protocols, see The expel protocol.

For state value change protocols:

For provider-broadcast message protocols:

For change-attributes protocols:

Note that any n-phase protocol can propose a change to the group state value. If the group state value change is accepted:

Submitting changes with voting responses

The voting response to each phase of an n-phase protocol may contain a proposed new group state value, a provider-broadcast message, and a proposed new default vote for the group.

These choices give providers quite a bit of flexibility in managing their actions during an n-phase protocol. When one or more of these items is submitted during a voting response, the Group Services subsystem broadcasts it to all providers as part of the notification for the next phase of the protocol.

Changing the state value during the voting phases of a protocol can be very useful. As an example, it would allow a group to update the state value during membership change protocols, which may be important in determining group quorum or active/inactive status.

Similarly, by submitting a provider-broadcast message with their voting response, instead of or along with an updated state value, the providers can pass data among themselves during the protocol, without having to actually manipulate the state value field.

Because each provider must issue a vote response, each provider could submit with its vote a proposed updated state value, a provider-broadcast message, or a new default vote. In case of multiple submissions, the Group Services subsystem chooses only one of the values to propagate to the providers for the next phase notification. The Group Services subsystem considers only the providers who do not specify a null pointer to a state value or message. For these, the Group Services subsystem arbitrarily chooses one of the responses it receives from the group. Because the providers cannot control which response is chosen, they should guarantee that one of the following is true:

If these rules are not followed, it is not possible to determine which response will be chosen to be propagated for the next phase.
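
The arbitrary choice can be modeled as follows. This Python sketch is illustrative only: pick_submission stands in for the choice made by the Group Services subsystem, and outcome_is_predictable is a hypothetical check that a group might apply in testing to confirm that every possible choice yields the same value. Neither name comes from the GSAPI.

```python
import random


def pick_submission(responses, rng=random):
    """Model the arbitrary choice among submitted values.

    responses maps a provider name to the value it submitted with its
    vote, or None when it submitted nothing (a null pointer in the API).
    """
    candidates = [v for v in responses.values() if v is not None]
    if not candidates:
        return None
    return rng.choice(candidates)


def outcome_is_predictable(responses):
    """True when every possible choice propagates the same value."""
    candidates = {v for v in responses.values() if v is not None}
    return len(candidates) <= 1
```

If two providers submit different values, outcome_is_predictable returns False, which corresponds to the situation in which the propagated response cannot be determined in advance.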

The voting phase time limit

The voting phase time limit allows the providers to determine if their peers are not responding quickly enough during voting protocols.

Once the Group Services subsystem has delivered its notification for each voting phase, it sets a timer. If it has not received a voting response from the provider within that time, the Group Services subsystem assumes that the provider is not going to respond, and applies the group's default vote for this provider. Note that the default vote applies only to the currently running protocol. If the provider votes later, the vote is ignored, and the provider is given an error code that indicates that the time limit was exceeded.
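
The effect of the timer on one voting phase can be sketched as follows. This is a simplified Python model, not GSAPI code; the function collect_votes and its parameters are hypothetical. A time limit of 0 is taken to mean that no limit applies, as stated for provider-initiated proposals.

```python
def collect_votes(providers, arrivals, time_limit, default_vote):
    """Return (votes, late) for one voting phase.

    arrivals maps a provider to (seconds_after_notification, vote).
    A provider without an entry never responds.
    """
    votes, late = {}, []
    for provider in providers:
        arrival = arrivals.get(provider)
        if arrival is None and time_limit == 0:
            # With no time limit, the phase cannot end without this vote.
            raise ValueError("no time limit: the phase waits for every vote")
        if arrival is not None and (time_limit == 0 or arrival[0] <= time_limit):
            votes[provider] = arrival[1]
        else:
            # Too late or no response: the group's default vote is applied,
            # and any later vote from this provider would be ignored.
            votes[provider] = default_vote
            late.append(provider)
    return votes, late
```

The late list corresponds to the providers that exceeded the time limit in this phase.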

The Group Services subsystem specifies that a default vote was applied because the time limit was exceeded, but does not specify, at this time, the providers that were slow. If the application of the default vote causes the protocol to be rejected, or the time limit is exceeded in the last voting phase of an approved protocol, Group Services sends a notification to the providers that lists the providers that exceeded the time limit. The Group Services subsystem takes no further action. However, a provider may initiate an expel protocol to remove any providers that exceeded the time limit, if appropriate.

The voting phase time limit is also used to time the invocation of deactivate scripts during expel protocols. For more details, see The expel protocol.

Simultaneous protocols

Because there may be multiple providers in a group, more than one provider may submit a proposal at the same time. However, the Group Services subsystem does not invoke more than one protocol at a time within a group. (Of course, multiple protocols may be running simultaneously in a domain, one for each of the groups in the domain.)

What are simultaneous proposals? There is always a delay between the call of a GSAPI subroutine by a provider to initiate a protocol and the broadcast of any resultant notification for that subroutine. This delay allows the Group Services subsystem to batch multiple join requests, because the Group Services subsystem may receive multiple such requests before it has actually broadcast a notification. In this case, the Group Services subsystem collects all of the joins and issues a single notification. Similarly, the Group Services subsystem batches together multiple failure leaves or cast-outs into a single protocol. In all other cases, it deals with proposals one at a time.
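
Batching can be pictured with the following Python sketch. It is a model only; the request kinds "join" and "failure_leave" and the function name are labels chosen for this illustration, and cast-outs are left out for brevity.

```python
def batch_membership_requests(pending):
    """Group the requests that arrived before any notification was sent.

    pending is a list of (kind, provider) pairs in arrival order.
    Returns (batches, singles): batches maps each batchable kind to the
    providers collected into one protocol, and singles lists the
    remaining proposals, which are dealt with one at a time.
    """
    batches = {"join": [], "failure_leave": []}
    singles = []
    for kind, provider in pending:
        if kind in batches:
            # Joins are batched only with joins, and failure leaves
            # only with failure leaves.
            batches[kind].append(provider)
        else:
            singles.append((kind, provider))
    return batches, singles
```

Three join requests that arrive during the delay therefore end up in a single join protocol with one notification.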

In general, the first proposal to be made after a running protocol completes is the one that is chosen to be invoked next. If multiple providers attempt to submit proposals at the same time, the Group Services subsystem chooses one arbitrarily.

For provider-initiated proposals, all proposals that are not chosen to be invoked immediately are returned to the providers, with an asynchronous collision error code. The notification of the collision may arrive before or after the protocol that was chosen begins. The provider may resubmit the proposal at a later time, if appropriate.

All the Group Services subsystem-initiated proposals remain pending until they have been invoked within the group. No provider-initiated proposals are accepted until all of the pending Group Services subsystem-initiated proposals have been invoked. A provider that attempts to submit a proposal receives a synchronous or asynchronous collision error code.

When choosing among multiple proposals, the Group Services subsystem chooses a proposal based on the following priority order:

  1. Failure leaves and cast-outs
  2. Source-state reflection
  3. Joins
  4. Leaves and expels
  5. State value change, provider-broadcast message, or change-attributes protocols.

Within these categories, if there are multiple simultaneous proposals of the chosen type, the Group Services subsystem arbitrarily chooses one of them, excluding those that may be batched together.
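
The priority order above can be expressed as a small selection routine. This Python sketch is illustrative; the proposal kind names and the function choose_next are hypothetical, and the sketch resolves ties by taking the earliest proposal, whereas the Group Services subsystem chooses arbitrarily.

```python
# Lower number means higher priority, following the list above.
PRIORITY = {
    "failure_leave": 1, "cast_out": 1,
    "source_state_reflection": 2,
    "join": 3,
    "leave": 4, "expel": 4,
    "state_change": 5, "broadcast": 5, "change_attributes": 5,
}


def choose_next(proposals):
    """Pick the proposal to invoke next.

    proposals is a list of (kind, proposer) pairs.
    Returns (chosen, not_chosen); chosen is None for an empty list.
    """
    if not proposals:
        return None, []
    # min() returns the first index with the best priority, so ties go
    # to the earliest proposal in this sketch.
    index = min(range(len(proposals)),
                key=lambda i: PRIORITY[proposals[i][0]])
    chosen = proposals[index]
    not_chosen = proposals[:index] + proposals[index + 1:]
    return chosen, not_chosen
```

In this model, a failure leave is chosen ahead of a join, and a join ahead of a state value change, regardless of arrival order.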

If batching is allowed, membership changes are batched. Joins are batched only with joins and failure leaves are batched only with failure leaves.

No provider is allowed to cycle invisibly. If a provider should fail and then restart and try to join the group, the Group Services subsystem ensures that the leave of that provider is proposed before the subsequent join of that provider.

A running protocol is always completed. The protocol could complete successfully or unsuccessfully. An unsuccessful completion could be caused by a provider voting to REJECT the protocol, by an explicit or implicit vote. The protocol might also end unsuccessfully if one or more providers fail to submit their votes within the specified time limit.

A rejected provider-initiated protocol is not automatically resubmitted. The providers must resubmit the protocol, if it is required.

Ending a protocol

The end of the protocol is signaled by the end of the voting phases for the protocol. In the case of a one-phase protocol, the end phase is the only phase.

In the case of an n-phase protocol, the voting phases can end in one of the following ways:

In all cases, the end phase consists of a broadcast of the results of the protocol just processed.

For approved proposals, a notification is sent to all providers and subscribers. The notification contains a flag that specifies whether any default votes were used to approve the protocol. Providers receive this information, but subscribers do not. The contents of the notification are also determined as follows:

For rejected proposals, a notification is sent only to providers. The notification contains the following information:

The expel protocol

The expel protocol allows a provider to propose the removal from the group of one or more providers. Some situations in which this could be useful include:

During the invocation of the expel protocol, Group Services runs a deactivate script against each provider that is being expelled. The deactivate script, which is specified by each GS client on initialization, is used to perform any cleanup actions that may be required.

The deactivate script does not need to be a shell script but can be any kind of executable file. For each provider that is targeted for expulsion, the Group Services daemon forks a child process that attempts to invoke the deactivate script on the provider's node. For details about the environment in which the deactivate script runs and the input and output specifications to which it must conform, see Deactivate scripts and ha_gs_expel subroutine.

The expel protocol is a provider-initiated protocol. Therefore, if it collides with another already-running protocol, Group Services returns it to the proposer. The proposer must resubmit the protocol; the protocol is not automatically queued.

A provider uses the ha_gs_expel subroutine to request an expel protocol. On input, the provider specifies the following information:

For each provider that is targeted for expulsion, Group Services runs the deactivate script that was specified by that provider when it initialized itself with Group Services. The deactivate script runs on the node on which the provider that is targeted for expulsion is running. It runs during the phase and uses the flag that was specified on the expel protocol. To be successful, the deactivate script must complete within the voting time limit for the phase. To invoke the deactivate script, Group Services acts as a substitute for each provider that is being expelled.

During the expel protocol, providers that are not being expelled treat this as a normal protocol and take any action they deem appropriate. If it is an n-phase protocol, their voting responses are tallied as if it were any other n-phase protocol.

If the value of the deactivate phase specifier is 0, no deactivate script is invoked during the protocol. If the protocol is approved, the providers that are targeted for expulsion are removed from the group. Because one-phase protocols are always approved, a one-phase expel protocol with a deactivate phase specifier of 0 simply removes the targeted providers from the group. If the protocol is rejected, the targeted providers are not removed from the group.

At the start of the voting phase given by a non-zero deactivate phase specifier, Group Services runs the deactivate script against each targeted provider. If at least one provider votes to reject the protocol before this phase, the targeted providers are not removed from the group and no deactivate scripts are invoked.

If the expel protocol is a one-phase protocol, and the value of the deactivate phase specifier is 1, the deactivate script is run immediately after the protocol begins running. Providers that are not targeted for expulsion receive the usual protocol approval notification, informing them that the targeted providers are now out of the group. Providers that are targeted for expulsion receive the protocol approval notification after the Group Services daemon has forked a child process to run the deactivate script. The Group Services daemon does not wait for the script to complete before it sends the notification. Therefore, it is difficult to determine whether the provider will receive the notification before or after the script runs.

The exit code of the deactivate script is not inspected, and the result is not returned to the providers that remain in the group.

If a provider that is targeted for expulsion by a one-phase expel protocol fails after the protocol has begun, no failure protocol is initiated in the group for that provider.

When a deactivate script runs successfully, it is expected to exit with an exit code of 0. Group Services treats the successful completion of the deactivate script as a vote to approve the protocol. If the protocol requires more voting phases, Group Services continues to vote APPROVE for each subsequent voting phase.

When a deactivate script does not exit with a code of 0, Group Services enters the group's current default vote value as the provider's vote for the phase. If the protocol requires more voting phases, Group Services continues to enter the current default vote value as the provider's vote for each subsequent voting phase.

If the deactivate script is to be run in a future voting phase, Group Services enters a vote of CONTINUE as the provider's vote for each interim voting phase.

If one or more providers that are targeted for expulsion did not specify a deactivate script, or specified a script that could not be run, but a non-zero deactivate phase specifier was given, then for those providers, the group's default vote value is entered for this and each subsequent voting phase. However, for providers that did specify a valid deactivate script, the script is run and its result is used to drive the voting, as previously described.
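
The votes that Group Services enters on behalf of one targeted provider in an n-phase expel protocol can be summarized in a short Python sketch. The function substitute_votes is hypothetical, and the sketch simplifies in two ways: it uses one fixed default vote, although the group's current default vote may change during a protocol, and it lists a vote for every phase even though a rejected protocol ends early.

```python
APPROVE, CONTINUE = "APPROVE", "CONTINUE"


def substitute_votes(num_phases, deactivate_phase, exit_code, default_vote):
    """List the votes entered for one targeted provider, phase by phase.

    deactivate_phase is the non-zero deactivate phase specifier.
    exit_code is the script's exit code, or None when no script was
    specified or the script could not be run.
    """
    if deactivate_phase < 1:
        raise ValueError("this sketch covers non-zero phase specifiers only")
    votes = []
    for phase in range(1, num_phases + 1):
        if phase < deactivate_phase:
            votes.append(CONTINUE)      # script is due in a later phase
        elif exit_code == 0:
            votes.append(APPROVE)       # success, and APPROVE from then on
        else:
            votes.append(default_vote)  # failure or no usable script
    return votes
```

For a three-phase protocol with the script run in phase 2, a successful script yields CONTINUE, APPROVE, APPROVE in this model.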

When a provider fails after the expel protocol begins but before the Group Services daemon has forked a child process to run the deactivate script, Group Services passes a process ID of 0 to the deactivate script. The deactivate script is still run and the exit code is used to determine the vote for this provider, as previously described.

Group Services tallies the votes for voting phases in the normal manner. If the expel protocol is approved, the providers that are targeted for expulsion are removed from the group. Remaining providers and subscribers are notified.

Group Services sends the protocol approval notification to expelled providers that did not exit in the course of running the deactivate script. However, Group Services does not verify that such providers receive or process the notification. Because they are no longer in the group, expelled providers cannot submit protocols and do not receive notifications related to the group.

In the event that the protocol is rejected for any reason, the providers that are targeted for expulsion are not removed from the group. However, if the deactivate script causes a provider to exit, Group Services initiates a failure leave protocol for that provider.

When a single process is joined as providers to multiple groups, and one of those provider instances has been expelled from a group, the effect on the other instances is as follows:

If a single process is joined as providers to multiple groups, and more than one of the groups are simultaneously running expel protocols that target those providers (because the process is unresponsive, for example), the order in which deactivate scripts are run against the process is not defined by Group Services. Because each group's expel protocol proceeds independently, Group Services does not coordinate the invocation of the deactivate script for each group's protocol. If all groups approve their expel protocols and the process is killed, no failure leave protocols are invoked. If one or more groups reject their expel protocols, but the process is killed in the course of running the deactivate script, those groups initiate failure leave protocols to remove the failed provider.

Deactivate-on-failure handling

The same deactivate script is run for a local provider's process failure as for the expel protocol. When a provider fails, its group is forced into a failure leave protocol. Deactivate-on-failure handling allows recovery and clean-up actions on the failed provider's node, although the failed provider's process no longer exists.

In the case of an n-phase protocol, the results of the script's invocation will be used in subsequent voting for the protocol. For a one-phase protocol, the results will not be relayed to the remaining group members. Unlike expel, the group cannot specify the voting phase in which the script will be invoked. It is always run in the first phase. The failure leave protocol, with deactivate-on-failure handling, operates as follows:

Deactivate-on-failure handling with one-phase protocol

If the failure protocol is a one-phase protocol, the deactivate script is invoked immediately after the protocol begins running and the Group Services daemon does not wait for the script to complete. Non-failed providers receive the usual protocol approval notification, informing them that the failed providers are now out of the group.

The exit code of the deactivate script is not inspected, and the result is not returned to the providers that remain in the group.

Deactivate-on-failure handling with n-phase protocol

If the failure protocol is an n-phase protocol, the results of the deactivate script will be used to guide the vote submitted for the failed providers. Note that the deactivate script is always run during the first phase of the failure protocol.

If a deactivate script runs successfully, it is expected to exit with an exit code of 0. Group Services treats the successful invocation of the deactivate script as a vote to approve the protocol. If the protocol requires more voting phases, Group Services continues to vote APPROVE for each subsequent voting phase.

If a deactivate script does not exit with a code of 0, Group Services enters the group's current default vote value as the failed provider's vote for the phase. If the protocol requires more voting phases, Group Services continues to enter the current default vote value as the failed provider's vote for each subsequent voting phase.

If the group has specified a time limit for failure protocols, and the script does not complete within the time specified, the Group Services daemon treats this as a normal voting time out and applies the group's current default vote. If the voting phases continue in the protocol, the Group Services daemon will continue to apply the group's current default vote value for each subsequent voting phase.

Group Services tallies the votes for voting phases in the normal manner. If the failure protocol is approved, the failed providers are removed from the group. Remaining providers and subscribers are notified. If the failure protocol is rejected, there are special conditions that apply to rejection of any failure protocol.

Deactivate scripts

This section provides information about the invocation environment, input parameters, and exit codes of deactivate scripts.

Invocation environment for deactivate scripts

To handle a situation in which a provider must be expelled, or in which it is failing, a provider can specify a deactivate script on the ha_gs_init subroutine when it first registers with Group Services. The script may be a shell script or any kind of executable file that conforms to the input and output rules that are specified later in this section.

Group Services does not verify that a deactivate script actually exists on a node or that it can be run, until an expel protocol is invoked. If the specified deactivate script is not found or cannot be run, Group Services applies the group's default vote value for the phase in which the deactivate script should have been invoked, and for each subsequent voting phase, if there are any.

A valid deactivate script is run as follows. For each provider targeted by the expel protocol, the Group Services daemon on the provider's node forks a child process that tries to run the deactivate script, using the following environment:

Effective uid and gid

The forked process runs with the effective uid and gid that the targeted provider had when it registered with Group Services through its call to the ha_gs_init subroutine. If the provider changed its uid or gid after calling ha_gs_init, the deactivate script still uses the effective uid and gid from the time when ha_gs_init was called. A deactivate script with the set uid bit on in its file permissions runs with those values.

Working directory

The forked process begins running in the current working directory that the targeted provider had when it registered with Group Services through its call to the ha_gs_init subroutine. If the provider changed its current working directory after calling ha_gs_init, the deactivate script still uses the current working directory that existed when ha_gs_init was called. A deactivate script that wants to run in another directory must change to that directory.

Environment variables

The forked process inherits the environment variables from the Group Services daemon's environment. Therefore, the deactivate script must not make any assumptions about the environment variables (for example, the path) or access to specific directories or file systems except for those that are normally accessible to the provider's effective uid and gid.

STDIN, STDOUT, and STDERR file descriptors

On input, the STDIN, STDOUT, and STDERR file descriptors are closed (not associated with any files). To perform input or output, the deactivate script must explicitly open any input or output file that it wants to use.
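
The following Python sketch shows one way a deactivate script might respect this environment: it opens its own log file instead of writing to standard output, changes to the directory it needs, and reports the result through its return value. The routine name, the log path, and the working directory are all hypothetical, and the input parameters that Group Services supplies are not used here.

```python
import os
import time


def deactivate(log_path, work_dir):
    """Hypothetical cleanup routine; returns the exit code to use."""
    # The standard descriptors are closed on entry, so the log file is
    # opened explicitly rather than relying on print().
    try:
        fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    except OSError:
        return 1
    try:
        # Do not assume the inherited working directory is suitable.
        os.chdir(work_dir)
        line = f"{time.time():.0f} cleanup done in {os.getcwd()}\n"
        os.write(fd, line.encode())
        return 0
    except OSError:
        return 1
    finally:
        os.close(fd)

# A script built around this routine would end with, for example:
#     sys.exit(deactivate("/tmp/gs_deactivate.log", "/tmp"))
```

Absolute paths are used throughout because the script cannot rely on environment variables such as the path.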

Input parameters for deactivate scripts

On input, Group Services supplies the following parameters to a deactivate script:

Exit codes for deactivate scripts

On output, the deactivate script must supply an exit code of 0 for a successful completion. Any other exit code indicates an unsuccessful completion. It is up to the deactivate script to decide what constitutes a successful completion.

On receipt of an exit code indicating a successful completion before the time limit expires, Group Services votes APPROVE for this voting phase of the protocol. On receipt of an exit code indicating an unsuccessful completion before the time limit expires, Group Services applies the group's default vote for this voting phase of the protocol, and each subsequent voting phase of the protocol.

If the deactivate script does not exit before the time limit expires, Group Services applies the group's default vote for this voting phase of the protocol, and each subsequent voting phase of the protocol.
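
The mapping from script result to vote can be tried out with the following Python sketch, which runs a command under a time limit. It is an approximation under stated assumptions: vote_from_script is a hypothetical helper, a time limit of 0 is treated as no limit, and the standard descriptors are redirected to the null device rather than closed.

```python
import subprocess
import sys


def vote_from_script(argv, time_limit, default_vote):
    """Run argv and return the vote entered for this voting phase."""
    try:
        done = subprocess.run(
            argv,
            timeout=time_limit or None,   # 0 means no limit in this sketch
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return default_vote               # did not exit before the limit
    except OSError:
        return default_vote               # not found or cannot be run
    # Only exit code 0 counts as a successful completion.
    return "APPROVE" if done.returncode == 0 else default_vote


if __name__ == "__main__":
    ok = [sys.executable, "-c", "import sys; sys.exit(0)"]
    print(vote_from_script(ok, 30, "REJECT"))
```

The same default vote would then be applied for each subsequent voting phase after an unsuccessful completion or a timeout.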

Notifications

A GS client can receive several types of messages, called notifications, from Group Services. These include notifications for:

All messages are sent in a fault-tolerant manner. That is, providers and subscribers are guaranteed to receive notifications despite failures.

Protocol proposal and ongoing protocol notifications

These notifications are sent to the providers of a group to indicate that an n-phase protocol has been proposed or is in progress. As a response to these notifications, the Group Services subsystem typically expects a vote.

Protocol proposal notifications are not sent for one-phase membership or state value changes because these proposals are automatically approved.

There are four types of proposals for which notifications are sent:

Membership change proposals
A membership change proposal notification is sent when a provider has requested to voluntarily join or leave a group, when a provider has requested the expulsion of one or more providers from a group, or when a provider has left the group involuntarily, either because the process itself failed or because the node on which it was running failed. An involuntary leave is called a failure leave and is initiated by Group Services.

State value change proposals
A state value change proposal notification is sent when a provider has requested a change to the group's state value.

Provider-broadcast message proposals
A provider-broadcast message proposal notification is sent when a provider has issued a request to broadcast a message. The request may also initiate voting.

Attribute change proposals
An attribute change proposal notification is sent when a provider has issued a request to change the group's attributes.

The only GS clients that are concerned with protocol proposal and ongoing protocol notifications are providers. Subscribers do not participate in proposing, approving, or rejecting membership or state value changes for the group. Also, when subscribers join or leave the group, no notification is sent to any GS client.

Protocol approvals

Protocol approvals are sent to the providers of a group to indicate that a proposal has been approved. They are also sent to the subscribers of the group for membership changes and state value changes.

Note that a protocol approval notification is sent as the first and only notification for a one-phase protocol.

Protocol rejections

Protocol rejections are sent to the providers of a group to indicate that a proposed membership or state value change has been rejected. Subscribers are not notified when proposals are rejected.
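
The delivery rules for proposals, approvals, and rejections can be condensed into a lookup. This Python sketch is a model only; the notification and change labels and the function recipients are hypothetical, and it ignores which kinds of change each subscriber asked to monitor.

```python
PROVIDER_ONLY = {"proposal", "ongoing", "rejection"}
SUBSCRIBER_VISIBLE = {"membership", "state_value"}


def recipients(notification, change, providers, subscribers):
    """Return the GS clients that receive a notification.

    notification is "proposal", "ongoing", "approval", or "rejection";
    change names the kind of change that the protocol carries.
    """
    if notification in PROVIDER_ONLY:
        return sorted(providers)
    if notification == "approval":
        # Subscribers see approved membership and state value changes only.
        extra = subscribers if change in SUBSCRIBER_VISIBLE else []
        return sorted(providers) + sorted(extra)
    raise ValueError(f"unknown notification type: {notification}")
```

With providers P1 and P2 and subscribers S1 and S2, an approved state value change reaches all four clients, while a rejection reaches only P1 and P2.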

Announcement notifications

Announcement notifications are sent to the providers of a group to announce an item of interest within the group. They include warnings that individual providers have not voted in time or not responded to a responsiveness check.

Responsiveness notifications

Responsiveness notifications are sent to each of a group's providers to determine whether the provider is active. If a provider does not respond to this responsiveness check within the time limit it specified previously, Group Services sends an announcement notification to all providers.

An illustration of a multi-phase protocol

The figures that follow illustrate a state change protocol for a group with two providers, P1 and P2, and two subscribers, S1 and S2. P2 proposes a change to the group's state value, and specifies whether the change requires voting phases or is handled as a single broadcast. Figure 1 shows the sequence of events for a one-phase protocol. Figure 2 shows the sequence of events for a two-phase commit protocol. Figure 3 shows the sequence of events for a three-phase protocol.

Upon receipt of the state change proposal from P2, the Group Services subsystem sends a notification to all of the providers in the group, namely P1 and P2. If P2 requested a one-phase protocol, the change is approved, S1 and S2 are notified, and the protocol terminates. If P2 requested a multi-phase protocol, P1 and P2 are instructed to vote on the outcome of the protocol.

Figure 2 shows the invocation of a multi-phase protocol in which the providers vote to approve the change after one round of voting. Figure 3 shows the providers extending the voting to three rounds. When the change is approved, all of the providers and the subscribers, that is, P1, P2, S1, and S2, are informed of the change.

Note that if both P1 and P2 submit state change requests concurrently, the Group Services subsystem chooses one of the requests for invocation, and returns the other to its proposer.

The n-phase agreement protocol that the GSAPI provides is flexible and powerful enough to handle a variety of synchronization and coordination requirements:

If any provider is notified that the state change is approved, the GSAPI guarantees that all (non-failed) providers and subscribers are notified of the approved state change without regard to failures within the system.

One-phase and n-phase changes

It is the responsibility of the providers in a group to determine the level of consistency that is required for managing changes to the group membership and state value. As described previously, providers may use either one-phase or n-phase protocols. In all cases, all providers see all protocols in the same order. However, the level of consistency differs in an important way, as follows.

Assume that two proposals occur rapidly one after the other.

Subscribers have no choice but to receive the notifications of approved group changes in a loosely synchronous manner. The GSAPI guarantees that all subscribers to a group see the approved changes in the same order as do the group's providers. However, one subscriber may see multiple notifications before another subscriber has seen any.

Figure 1. A One-Phase Protocol


Figure 2. A Two-Phase Commit Protocol


Figure 3. Barrier Synchronization in a Multi-Phase Protocol


Active protocol proposals

The GSAPI guarantees that only one protocol that affects the group's membership or state value is run at any time. If more than one proposal is submitted within the group simultaneously, the Group Services subsystem chooses one for invocation and returns the others to the providers that submitted them. It is the responsibility of a provider that receives a returned proposal to resubmit it for invocation, if appropriate.

When providers join or involuntarily leave a group, this processing is modified. In these cases, the membership protocol to deal with the join or involuntary leave request is held until the currently running protocol has been approved or rejected. The membership change protocol is then started immediately. Figure 4 shows how a new provider join request is delayed until the completion of an ongoing three-phase state change protocol.

Figure 4. The Serialization of a Pending Join Request


Failures

When a node fails, Group Services assumes that all providers on that node have also failed. The GSAPI supports process failure detection by detecting the loss of a socket connection.

When a provider leaves due to either a node failure or the failure of the process itself, Group Services proposes a failure leave protocol for that provider. If the group had been using a one-phase protocol to handle joins, the failure leave is specified as one-phase. If the join had been n-phase, the failure leave is specified as n-phase.

As long as one provider is active, Group Services continues to keep the group going.

The Group Services subsystem itself has been designed to survive failures. These can be node failures, which lead to the loss of one or more Group Services processes, or network failures and communications adapter failures, which hinder the communication between Group Services processes.

If the Group Services subsystem fails, any surviving GS client receives an announcement notification that the Group Services daemon has terminated suddenly and unexpectedly. The GS client can get the FFDC ID (First Failure Data Capture identifier) that is related to the cause of the Group Services subsystem failure when the FFDC ID exists. In addition, if a protocol is running, it is terminated. See Deactivate-on-failure handling and the FFDC Programming Guide and Reference for further information.

Provider actions during voting

As shown in Figure 5, a provider can perform any sequence of actions that it chooses between the time that it receives an ongoing protocol notification (that is, a request for a vote) and the time that it votes. However:

Figure 5. Actions during a Membership Change Protocol


Subscribing to a group

When it is desirable for a process to monitor a group without playing a part in the control of the group's state, the GSAPI allows the process to subscribe to the group. A subscriber may subscribe to receive approved membership changes, approved state changes, or both.

Subscribers do not participate in any of the voting protocols. In fact, they are not notified that any such activity is taking place. If a state or membership change is not approved, no notification is sent to the subscribers.

Notifications to all subscribers to any single group are serialized, so all subscribers receive all notifications in the same order. However, it is not guaranteed that all subscribers will receive any one notification before any other subscribers receive subsequent notifications. No notifications or protocol proposals are made when subscribers join or leave a group.

Subscription allows one group to maintain a loose synchronization with one or more other groups. For example, a subscriber could be used to monitor and display information about the state of a number of groups and the members of those groups. As the subscribed-to groups change state or membership, the monitor can collect the changes and display or log the updated information.

Provider and subscriber tokens

The GSAPI uses integers that are called provider and subscriber tokens to identify providers and subscribers. These tokens are assigned, invalidated, and reassigned in much the same way that file descriptors are assigned, invalidated, and reassigned as files are opened and closed.

For example, a GS client joins group foo and receives provider token 0. When the same client leaves group foo, Group Services invalidates provider token 0 and makes it available for reassignment. When the next GS client (which could be the same GS client or a different GS client) joins the next group (which could be the same group or a different group), Group Services assigns provider token 0 to that client.

As another example, a GS client subscribes to group bar and receives subscriber token 2. When the same client unsubscribes from group bar or becomes unsubscribed because group bar is dissolved, Group Services invalidates subscriber token 2 and makes it available for reassignment. When the next GS client (which could be the same GS client or a different GS client) subscribes to the next group (which could be the same group or a different group), Group Services assigns subscriber token 2 to that client.
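The token life cycle in these examples can be modeled with a small allocator. The Python sketch below is illustrative only: TokenTable is a hypothetical name, and the rule "hand out the lowest integer that is not in use" is an assumption borrowed from the way file descriptors usually behave. The text above says only that tokens are handled in a similar way.

```python
class TokenTable:
    """Toy token allocator (hypothetical; not the Group Services implementation)."""

    def __init__(self):
        self.in_use = set()

    def assign(self):
        # Assumption: the lowest free integer is assigned, as with file descriptors.
        token = 0
        while token in self.in_use:
            token += 1
        self.in_use.add(token)
        return token

    def invalidate(self, token):
        # The token becomes available for reassignment.
        self.in_use.remove(token)


providers = TokenTable()
first = providers.assign()        # a GS client joins group foo
providers.invalidate(first)       # the client leaves group foo
second = providers.assign()       # the next join receives the same token

subscribers = TokenTable()
held = [subscribers.assign() for _ in range(3)]  # tokens 0, 1, and 2 are in use
subscribers.invalidate(2)         # unsubscribed from group bar
reused = subscribers.assign()     # the next subscription receives token 2
```

A practical consequence is that a GS client should not keep using a token after it has been invalidated, because the same integer may later identify a different provider or subscriber.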

Source-target group relationships

It is sometimes convenient to associate several groups with a single application, and to allow a process to be a member of multiple groups. Such relationships are not normally tracked by Group Services, except when the source-target facility is used. To understand this facility, consider the following scenario.

If a node crashes, all of the groups with providers on that node receive a membership change proposal notification simultaneously. The notification causes each group to begin reacting independently to the membership change. However, it may be better for some applications to wait until another group has completed processing this change. Such a relationship might exist, for example, between a disk recovery subsystem and a distributed database application. If the database is on a disk on the failed node, the database application must wait for the disk recovery subsystem to recover from the node failure before it can begin its recovery.

Although it is possible to deal with such relationships using subscriptions, they are loosely synchronized and may not provide the degree of timing control that is required. Instead, the source-target facility can be used. The source-target facility allows a target group to tie itself to a source group as follows. If a failure leads to the failure of a provider in both the source and the target groups, the source group completes its membership change protocol before the target group begins its membership change protocol. Thus, the providers in the target group can run with the knowledge that the providers in the source group have already handled the failure. This knowledge is particularly useful when the recovery of the target group depends on the completion of recovery by the source group.

In the recovery scenario just described, the disk recovery subsystem is defined as the source group, and the database application is defined as the target group.
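The ordering that the source-target facility enforces can be sketched as follows. This Python fragment is a simplified model, not GSAPI code: protocol_order and the group names are hypothetical, each target is assumed to name at most one source, and the relationships are assumed to contain no cycles.

```python
def protocol_order(affected_groups, source_of):
    """Order the affected groups so that each source precedes its target.

    affected_groups: groups that lost a provider in the same failure.
    source_of: mapping from a target group to its source group.
    """
    ordered = []

    def visit(group):
        if group in ordered:
            return
        source = source_of.get(group)
        # Wait for the source only if the failure also affected the source.
        if source in affected_groups:
            visit(source)
        ordered.append(group)

    for group in affected_groups:
        visit(group)
    return ordered


relationships = {"database": "disk_recovery"}
both_hit = protocol_order(["database", "disk_recovery"], relationships)
target_only = protocol_order(["database"], relationships)
```

When the failure affects both groups, the disk recovery group runs its membership change protocol first. When only the database group is affected, there is nothing to wait for.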

With source-target groups, join and leave protocols work a little differently than with other groups. Here are some key differences.

Host and adapter membership groups

The Group Services subsystem provides several system-defined groups to which GS clients can subscribe for keeping track of hardware status. Refer to Description for a complete listing of the different system-defined groups that are available for subscription.

The host membership group

The Group Services subsystem keeps track of node status to determine when nodes are no longer reachable. A node that is fully isolated due to network or communications adapter failures is not distinguishable from a node that has failed. Accordingly, a fully isolated or failed node triggers such actions as notifications to groups that have one or more providers on the failed node or nodes.

The state of the nodes is reflected by the Group Services subsystem in a special system-defined group called the host membership group. The host membership group is represented by HA_GS_HOST_MEMBERSHIP. By subscribing to this group, a GS client can obtain information about the nodes that are currently active and any transitions that occur as nodes become active or fail.

A node appears active in the host membership group when the Group Services subsystem is active on that node. All such active nodes that can communicate with each other appear in this group.
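A subscriber that keeps the previous membership list can derive transitions by comparing it with the new one. The following Python sketch shows that comparison only. The function name and the use of plain node numbers are assumptions for illustration, and it does not show how the lists are obtained from a notification.

```python
def membership_transitions(previous, current):
    """Return (newly_active, no_longer_reachable) node lists."""
    previous, current = set(previous), set(current)
    newly_active = sorted(current - previous)
    # A node missing from the new view may have failed or may be fully
    # isolated; the two cases cannot be told apart.
    no_longer_reachable = sorted(previous - current)
    return newly_active, no_longer_reachable


joined, lost = membership_transitions({1, 2, 3}, {2, 3, 5})
```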

The adapter membership groups

The Group Services subsystem also keeps track of the status of Ethernet and SP Switch adapters. The state of the adapters is reflected by the Group Services subsystem in two system-defined groups called the Ethernet adapter membership group and the SP Switch adapter membership group. By subscribing to these groups, a GS client can obtain adapter membership information, which it can use, for example, to determine communication paths to nodes.

The view of adapter membership implies that all of the nodes in the membership are able to communicate with each other over the IP network to which the adapters are connected. On the SP, this means that for Ethernet membership (HA_GS_ENET_MEMBERSHIP), the view from any one node is the set of all other nodes reachable from that node via the SP Ethernet. If the SP Ethernet is sundered (broken such that only subsets of nodes can communicate with each other), a node's view of Ethernet membership will be only the subset of nodes with which it can communicate.

Within the Group Services PSSP domain, adapter membership information is available for only a single Ethernet adapter on the SP control workstation, even if multiple Ethernet adapters are connected to the SP nodes.

For the SP Switch (HA_GS_CSS_MEMBERSHIP), the hardware does not allow sundering; if a node is on the switch, it can communicate with any other node on the switch. Thus, a local node's view of cssMembership is the global view of all nodes that are on the switch. A node must look at the given membership to determine if it is listed.

If HACMP/ES is installed on a node, and a GS client is connected to the Group Services HACMP/ES domain, there may be more adapter membership groups than the two described above. This is because heartbeating may take place on additional networks for HACMP/ES if the networks are installed and are defined to HACMP/ES. In this case, the semantics of these adapter membership groups match those described for the Ethernet membership group. When a GS client receives a subscription notification for an adapter membership group, the notification indicates the set of nodes with which the client's node can communicate across the given adapter type.

In summary, the information presented to a GS client for adapter membership subscriptions is not globally consistent (except for the HA_GS_CSS_MEMBERSHIP group). Because each node is presented with the set of nodes with which it can communicate for the given adapter type, different nodes may see different views. This can occur if the different networks are not fully connected to all nodes in the domain, or if there are failures of various routers between sections of the networks.

For more information, see ha_gs_subscribe subroutine.
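The way a sundered network produces different views on different nodes can be illustrated with a reachability calculation. The Python sketch below is a model only: adapter_view and the link list are hypothetical, links are treated as undirected, and the local node is included in its own view as a simplifying assumption.

```python
from collections import deque


def adapter_view(node, links):
    """Return the set of nodes reachable from `node` over one network."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    seen = {node}
    queue = deque([node])
    while queue:
        # Breadth-first walk over the working links.
        for nxt in neighbors.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# A network sundered into two portions: {1, 2} and {3, 4}.
sundered_links = [(1, 2), (3, 4)]
view_from_1 = adapter_view(1, sundered_links)
view_from_3 = adapter_view(3, sundered_links)
```

Nodes 1 and 3 compute different, equally valid views, which is why a GS client should treat adapter membership as local information rather than as a global truth.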

Quorum

Many applications require a form of quorum to ensure that the proper resources are available before the application begins operation. For example, one application may require a certain percentage of nodes to be up and running before it begins, while another requires particular nodes.

Because groups have significantly different requirements for quorum, the GSAPI does not provide a predefined quorum as part of its support. It is the responsibility of the application that is using the GSAPI to form groups that define and implement required quorum mechanisms. By manipulating the state information of the group, an application can build the required quorum mechanism.
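The two kinds of requirement mentioned above can each be expressed as a small check that an application applies to the current membership. The Python sketch below is one possible shape for such checks, not a GSAPI facility: the function names, the default threshold of one half, and the node numbers are all assumptions.

```python
def percentage_quorum(active, total, fraction=0.5):
    """True when strictly more than `fraction` of `total` nodes are active."""
    return len(set(active)) > total * fraction


def required_nodes_quorum(active, required):
    """True when every required node is among the active nodes."""
    return set(required) <= set(active)


three_of_four = percentage_quorum([1, 2, 3], total=4)
two_of_four = percentage_quorum([1, 2], total=4)
has_required = required_nodes_quorum([1, 2, 3], required=[1, 3])
missing_required = required_nodes_quorum([2, 3], required=[1])
```

With a strict majority rule, exactly half of the nodes is not enough, so two halves of a split group cannot both claim quorum.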

Sundered networks

The Group Services subsystem provides a single group namespace within each system partition. Given the right set of multiple network failures, a system partition with multiple networks can become split. In the case of a sundered namespace, the nodes become split in such a way that they can no longer communicate with any nodes on the other side of the split. However, it is possible for each sundered portion to maintain enough information to reconstruct the groups that previously existed, at least those groups that still have members within any particular portion.

When a namespace is sundered, it is possible to get two instances of what should be one group. For example, in a sundered network, two nodes that own the two tails of a twin-tailed disk could end up on separate sides of the split. Because the process that coordinates the disk on each node would believe that the process on the other node had disappeared, each process might want to activate its tail, which could lead to data corruption. As this example shows, it is important that each group determine whether it needs a form of quorum, and use it to decide when the group is ready to perform its services.

Although the Group Services subsystem does not provide a quorum mechanism, it does provide some assistance to groups when a network is sundered. When a system partition is sundered, the providers receive membership protocol proposals from Group Services indicating that all of the providers on the other side of the split have failed. The providers can then run those protocols as they normally would, taking into account such factors as quorum to protect resources as necessary.

If a sundered network is healed and Group Services discovers separate domains, it dissolves the smaller domain, which is defined as the domain with the smaller number of nodes. Group Services sends the clients on the smaller domain an announcement notification that it has terminated abnormally. Upon receipt of the notification, the clients on the smaller domain can join the larger domain or perform any other appropriate recovery action.
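The rule for choosing which domain to dissolve can be written as a simple comparison. The Python sketch below is a model of that rule only. The function name is hypothetical, and because the behavior for two domains of equal size is not described here, the sketch returns None in that case rather than guessing.

```python
def domain_to_dissolve(domain_a, domain_b):
    """Return the domain with fewer nodes, or None when the sizes are equal."""
    if len(domain_a) == len(domain_b):
        return None  # tie: behavior is not specified in this section
    return domain_a if len(domain_a) < len(domain_b) else domain_b


smaller = domain_to_dissolve({1, 2}, {3, 4, 5})
tie = domain_to_dissolve({1, 2}, {3, 4})
```

Clients in the returned domain are the ones that would receive the announcement notification and then need to rejoin or recover.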

