ITEM: V9475L

HOWTO Delay Deadman Switch in HACMP 2.1/3.1


Question:

Env: HACMP 3.1 two-node cluster on AIX 3.2.5.

Desc:  The customer wants to know how to extend the time before a
deadman time-out occurs.  The switch to which the customer is
referring is the cycles_to_fail switch (-f).

The following excerpt from the README2.1.UPDATE.U435385 explains the
various parameters of the Cluster Manager:

 HACMP/6000 Cluster Manager Startup Parameters (Switches)
 -----------------------------------------------------------------
 These switches are used to change the following variables and all
 require an integer as an argument.

 -n normal_ka_cycles       This is the number of microseconds
                           between keepalives.  The default value
                           is 500000, which is 1/2 second.
                           Whenever this parameter is changed,
                           speed_ka_cycles should also be changed.

 -f cycles_to_fail         This value is used to define the number
                           of missed keepalives that determine a
                           minor failure.  It is also used as the
                           number of cycles each node will wait for
                           synchronization prior to continuing. The
                           default value is 4.

 -a ifs_fails_til_netfail  The default value is 4.  This value is
                           used for determining the number of minor
                           failures permitted prior to failing a
                           remote interface.

 Here's how this relates to the failover time, and the limit imposed
 by the deadman switch.

 The timeout for the deadman switch in prior releases was a constant
 set to 5 seconds.  It has been changed to the following:

 zombie_timeout=(((normal_ka_cycle * (cycles_to_fail - 1) *
                 ifs_fails_til_netfail) - uS_PER_SECOND)/1000);

 So if you set your cycles_to_fail to 4 (the default), the deadman
 will occur at:

 zombie_timeout=(((500000 * (4 - 1) * 4) - 1000000)/1000);

 OR:

 zombie_timeout = (500,000 microseconds * (4 - 1) * 4) - 1 second
                = (0.5 seconds * 3 * 4) - 1 second
                = 6 seconds - 1 second = 5 seconds
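
The arithmetic above can be checked with a short Python sketch. It is an illustration only, not part of the README or of HACMP: the function name zombie_timeout_ms is hypothetical, and reading the result as milliseconds is an assumption based on the formula dividing a microsecond value by 1000.

```python
# Illustrative sketch of the README formula; not HACMP source code.
US_PER_SECOND = 1_000_000  # microseconds per second (uS_PER_SECOND in the formula)


def zombie_timeout_ms(normal_ka_cycle=500_000, cycles_to_fail=4,
                      ifs_fails_til_netfail=4):
    """Return the deadman timeout, assumed to be in milliseconds.

    normal_ka_cycle is in microseconds; the defaults are the ones
    quoted in the README excerpt.
    """
    # Time at which a network failure is declared, in microseconds.
    failure_us = normal_ka_cycle * (cycles_to_fail - 1) * ifs_fails_til_netfail
    # The deadman fires one second earlier; convert microseconds to milliseconds.
    return (failure_us - US_PER_SECOND) // 1000


print(zombie_timeout_ms())                   # defaults: 5000 (5 seconds)
print(zombie_timeout_ms(cycles_to_fail=24))  # 45000 (45 seconds)
```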

 A network failure would occur at 6 seconds:

 normal_ka_cycle * (cycles_to_fail - 1) * ifs_fails_til_netfail
 = 0.5 seconds * (4 - 1) * 4
 = 6 seconds

 This would also be a node failure if the keepalive packets were
 missed on all networks.

 If you set cycles_to_fail to 24, the deadman will occur at:

 zombie_timeout = (500000 microseconds * (24 - 1) * 4) - 1 second
                = 46 seconds - 1 second = 45 seconds

 A node failure would occur at:

 normal_ka_cycle * (cycles_to_fail - 1) * ifs_fails_til_netfail
 = 500000 microseconds * 23 * 4
 = 46 seconds
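
To go the other way, from a desired deadman delay to a value for -f, the formula can be inverted. The Python sketch below is a hedged illustration, not an HACMP tool: the function name min_cycles_to_fail and the target_ms parameter are hypothetical, and it assumes the other two parameters stay at the values passed in (the README defaults unless overridden).

```python
# Illustrative sketch: invert the README formula to size cycles_to_fail.
US_PER_SECOND = 1_000_000  # microseconds per second


def min_cycles_to_fail(target_ms, normal_ka_cycle=500_000,
                       ifs_fails_til_netfail=4):
    """Smallest cycles_to_fail whose deadman timeout is at least target_ms.

    target_ms is the desired deadman timeout in milliseconds (assumed unit);
    normal_ka_cycle is in microseconds.
    """
    # The formula requires:
    #   normal_ka_cycle * (cycles_to_fail - 1) * ifs_fails_til_netfail
    #       >= target_us + US_PER_SECOND
    needed_us = target_ms * 1000 + US_PER_SECOND
    per_cycle_us = normal_ka_cycle * ifs_fails_til_netfail
    # Integer ceiling division, avoiding floating point.
    cycles_minus_one = -(-needed_us // per_cycle_us)
    return cycles_minus_one + 1


for seconds in (5, 30, 45):
    print(seconds, "sec ->", min_cycles_to_fail(seconds * 1000))
```

With the default keepalive interval and ifs_fails_til_netfail, each additional cycle adds 2 seconds to both the deadman timeout and the node failure time, so the one-second gap between them is preserved.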


Dated: June 1995 Category: N/A