ITEM: V9475L
HOWTO Delay Deadman Switch in HACMP 2.1/3.1
Question:
Env: HACMP 3.1 2 node cluster at AIX 3.2.5.
Desc: The customer wants to know how to extend the time before a
deadman time-out occurs. The switch the customer is referring to
is the cycles_to_fail switch (-f).
The following excerpt from the README2.1.UPDATE.U435385 explains the
various parameters of the Cluster Manager:
HACMP/6000 Cluster Manager Startup Parameters (Switches)
-----------------------------------------------------------------
These switches are used to change the following variables and all
require an integer as an argument.
-n normal_ka_cycles        This is the number of microseconds
                           between keepalives. The default value
                           is 500000, which is 1/2 second.
                           Whenever this parameter is changed,
                           speed_ka_cycles should also be changed.
-f cycles_to_fail          This value is used to define the number
                           of missed keepalives that determine a
                           minor failure. It is also used as the
                           number of cycles each node will wait for
                           synchronization prior to continuing. The
                           default value is 4.
-a ifs_fails_til_netfail   The default value is 4. This value is
                           used for determining the number of minor
                           failures permitted prior to failing a
                           remote interface.
Here's how this relates to the failover time, and the limit imposed
by the deadman switch.
The timeout for the deadman switch in prior releases was a constant
set to 5 seconds. It has been changed to the following:
zombie_timeout=(((normal_ka_cycle * (cycles_to_fail - 1) *
ifs_fails_til_netfail) - uS_PER_SECOND)/1000);
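As a rough illustration, the formula above can be transcribed into
Python. The function and constant names are this sketch's own and are
not part of HACMP; only the arithmetic and the default values come
from the README excerpt. Because of the final division by 1000, the
result is in milliseconds.

```python
US_PER_SECOND = 1_000_000  # uS_PER_SECOND in the formula: microseconds per second


def zombie_timeout_ms(normal_ka_cycle=500_000, cycles_to_fail=4,
                      ifs_fails_til_netfail=4):
    """Deadman (zombie) timeout in milliseconds, following the README formula.

    The defaults are the documented defaults for -n, -f and -a.
    """
    # Time, in microseconds, until a failure is declared.
    failure_us = normal_ka_cycle * (cycles_to_fail - 1) * ifs_fails_til_netfail
    # The deadman timeout is one second shorter, converted to milliseconds.
    return (failure_us - US_PER_SECOND) // 1000


print(zombie_timeout_ms())  # 5000, i.e. 5 seconds with all defaults
```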
So if you leave cycles_to_fail at 4 (the default), with the other
parameters also at their defaults, the deadman timeout is:
zombie_timeout = ((500000 * (4 - 1) * 4) - 1000000) / 1000 = 5000 ms
OR:
zombie_timeout = (500,000 microseconds * (4 - 1) * 4) - 1 second
               = (0.5 seconds * 3 * 4) - 1 second
               = 6 seconds - 1 second = 5 seconds
A network failure would occur at 6 seconds:
normal_ka_cycle * (cycles_to_fail - 1) * ifs_fails_til_netfail
0.5 sec * (4 - 1) * 4
= 6 seconds
This would also be a node failure if the keepalive packets were
missed on all networks.
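The same arithmetic, written out as a small Python sketch. The
function name is hypothetical; the expression and the defaults are the
ones given in the text. It returns the time at which a network failure
(or a node failure, if keepalives are missed on all networks) would be
declared, in seconds.

```python
def failure_time_seconds(normal_ka_cycle=500_000, cycles_to_fail=4,
                         ifs_fails_til_netfail=4):
    """Seconds until a network or node failure is declared.

    normal_ka_cycle is in microseconds, as for the -n switch.
    """
    failure_us = normal_ka_cycle * (cycles_to_fail - 1) * ifs_fails_til_netfail
    return failure_us / 1_000_000  # microseconds to seconds


print(failure_time_seconds())  # 6.0 with all defaults
```

With the defaults this is one second later than the deadman timeout
of 5 seconds.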
If you set cycles_to_fail to 24, the deadman timeout will occur at:
zombie_timeout = (500000 microseconds * (24 - 1) * 4) - 1 second
               = 46 seconds - 1 second = 45 seconds
A node failure would occur at:
normal_ka_cycle * (cycles_to_fail - 1) * ifs_fails_til_netfail
500000 microsec. * (24 - 1) * 4
= 46 seconds
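To go the other way and pick a cycles_to_fail value for a desired
deadman timeout, the formula can be solved for cycles_to_fail. The
following Python sketch does that, assuming the formula holds as
stated for the installed level; the function name and the
whole-seconds target are this example's own conventions, not anything
HACMP provides.

```python
def min_cycles_to_fail(target_deadman_s, normal_ka_cycle=500_000,
                       ifs_fails_til_netfail=4):
    """Smallest cycles_to_fail giving a deadman timeout of at least
    target_deadman_s whole seconds (hypothetical helper).
    """
    # The deadman timeout is the failure time minus one second,
    # so add that second back. All values are in microseconds.
    target_us = (target_deadman_s + 1) * 1_000_000
    per_cycle_us = normal_ka_cycle * ifs_fails_til_netfail
    # Integer ceiling division, then undo the "- 1" in (cycles_to_fail - 1).
    return 1 + -(-target_us // per_cycle_us)


print(min_cycles_to_fail(45))  # 24, matching the example above
```

Note that raising cycles_to_fail delays network and node failure
detection by the same amount, as the 46-second figure above shows.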
Dated: June 1995 Category: N/A