IBM Books

Administration Guide


Primary node takeover

Error recovery on an SP switch is done locally. Detection and fault isolation are done at the link level while normal traffic flows across the rest of the switch fabric. Detected faults are forwarded to the active primary node for analysis and handling. When the primary node completes assessing the fault, the remaining nodes on the switch fabric are informed of status changes without disruption.

The primary node initializes the switch fabric and processes switch operations and errors reported from the switch fabric. Because a primary backup node can take over this role, the primary node is no longer a single point of failure on an SP system containing an SP switch.

The primary backup node passively listens for activity from the primary node. When the primary backup node detects that it has not been contacted by the primary node for approximately 7 minutes, it assumes the role of the primary node. This takeover involves reinitializing the switch fabric, selecting another primary backup, and updating the System Data Repository (SDR) without disruption.
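The timeout check described above can be sketched as follows. This is an illustration only, not the SP switch software: the `BackupWatchdog` class, its method names, and the exact 420-second threshold are assumptions (the text says only "approximately 7 minutes"). Time is passed in explicitly so the logic can be exercised without waiting.

```python
from dataclasses import dataclass

# Assumed value: the guide says "approximately 7 minutes".
TAKEOVER_TIMEOUT_SECONDS = 7 * 60

# Takeover actions named in the text, in the order they are listed there.
TAKEOVER_STEPS = (
    "reinitialize the switch fabric",
    "select another primary backup",
    "update the System Data Repository (SDR)",
)


@dataclass
class BackupWatchdog:
    """Hypothetical model of the primary backup node's passive listener."""

    last_contact: float                       # time of last contact from the primary, in seconds
    timeout: float = TAKEOVER_TIMEOUT_SECONDS

    def record_contact(self, now: float) -> None:
        """Note that the primary node contacted the backup at time `now`."""
        self.last_contact = now

    def should_take_over(self, now: float) -> bool:
        """Return True once the primary has been silent for the full timeout."""
        return now - self.last_contact >= self.timeout


watchdog = BackupWatchdog(last_contact=0.0)
if watchdog.should_take_over(now=450.0):
    for step in TAKEOVER_STEPS:
        print("takeover:", step)
```

Any contact from the primary node resets the timer, so only an uninterrupted silence of the full timeout triggers the takeover steps.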

The primary node also watches over the primary backup node. If the primary node detects that the primary backup node can no longer be contacted on the switch fabric, it selects a new primary backup node. The criteria for selecting a new primary backup node are as follows:

  1. First, select a node on a switch assembly other than the switch assembly to which the primary node is attached. A p690 node can be a switch primary or primary backup node, like any node in the system. If possible, do not select, as both primary and primary backup, two nodes that are LPARs within the same p690 server.
  2. Second, if no other switch assembly exists, select a node attached to a switch chip other than the one to which the primary node is attached.
  3. If no other switch chip exists, select any available node on the switch chip to which the primary node is attached.
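The selection criteria above can be sketched as a ranking over the available nodes. This is a minimal illustration under stated assumptions, not the actual selection code: the `Node` fields and the `select_primary_backup` function are hypothetical, and treating the p690 LPAR guideline as a tie-breaker within each tier is one possible reading of "if possible".

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    """Hypothetical description of a node attached to the switch fabric."""

    name: str
    assembly: int                  # switch assembly the node is attached to
    chip: int                      # switch chip within that assembly
    server: Optional[str] = None   # p690 server hosting this LPAR, if any


def select_primary_backup(primary: Node, available: Sequence[Node]) -> Optional[Node]:
    """Pick a primary backup for `primary`, or None if no other node is available."""
    candidates = [n for n in available if n != primary]
    if not candidates:
        return None

    def tier(n: Node) -> int:
        if n.assembly != primary.assembly:
            return 0   # criterion 1: different switch assembly
        if n.chip != primary.chip:
            return 1   # criterion 2: same assembly, different switch chip
        return 2       # criterion 3: same switch chip as the primary

    def shares_server(n: Node) -> bool:
        # Assumed tie-breaker: avoid an LPAR on the primary's own p690 server.
        return primary.server is not None and n.server == primary.server

    # Lowest tier wins; within a tier, prefer a node on a different server.
    return min(candidates, key=lambda n: (tier(n), shares_server(n)))


primary = Node("n1", assembly=1, chip=0, server="p690A")
nodes = [
    primary,
    Node("n2", assembly=1, chip=0),
    Node("n3", assembly=1, chip=1),
    Node("n4", assembly=2, chip=0, server="p690A"),
    Node("n5", assembly=2, chip=1),
]
backup = select_primary_backup(primary, nodes)
print(backup.name if backup else "no backup available")
```

In this example, n4 and n5 both satisfy the first criterion, and n5 is chosen because n4 is an LPAR on the same p690 server as the primary node.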

During the period of time between the failure or loss of the primary node and the takeover by the primary backup node, the switch fabric continues to function. Secondary failures are detected during the reinitialization of the switch fabric.

