HACMP v3.1

ITEM: RTA000065101



According to the announcement of HACMP V3.1, there is a new mechanism
that lets you control the trigger time at which a heart-beat
connection is considered dead (Fast Fallover). It is settable via SMIT.
How does this work?
I know that in HACMP V2.1, a node is considered dead after 12 missed
Keep-Alive packets, and then a further 9 Keep-Alive packets are sent.
Does this mechanism change those numbers, and in what way?
                                                                                
                                                                                
ANSWER                                                                          
With HACMP 3.1, you will be able, through SMIT, to set a Failure                
Detection Rate of Slow, Normal, or Fast independently for each network          
type that you have configured in your cluster. HACMP 3.1 will distinguish       
between "network types", such as Ethernet/IP, Token Ring, Target Mode           
SCSI or RS232, FDDI, etc. Each of these network types that you have in         
your cluster has its own Network Interface Module (or NIM) in each node,        
responsible for sending and receiving KAs, and detecting network                
failures. The effects of the Slow, Normal and Fast settings for the
various network types are shown in the table below:
                                                                                
                                    Slow       Normal        Fast               
                                    ----       ------        ----               
Ethernet/IP       KA Rate (secs)     1.0         0.5          0.5               
                   Missed KAs        12          12            9                
Token Ring        KA Rate (secs)     1.0         0.5          0.5               
                   Missed KAs        24          24           12                
TMSCSI/RS232      KA Rate (secs)     3.0         1.5          0.5               
                   Missed KAs         8           6            5                
SOCC              KA Rate (secs)     1.0         0.5          0.5               
                   Missed KAs        12          12            9               
FDDI              KA Rate (secs)     1.0         0.5          0.5               
                   Missed KAs        12          12            9                
SLIP              KA Rate (secs)     2.5         1.0          0.5               
                   Missed KAs        18          12           12                
                                                                                
Normal and Slow are the recommended settings. Slow is useful in
configurations where temporary network or CPU load situations might
otherwise cause false takeovers to occur. A Fast setting exposes you
more to the danger of false takeovers. Node failure processing starts
with whichever network detects the error fastest.
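
As a rough illustration of how the table translates into detection
times, here is a minimal Python sketch. It assumes that detection time
is simply the KA interval multiplied by the number of missed KAs, and
that "fastest network error detection" means the smallest such product
among the networks configured. The function names are made up for this
example and are not part of HACMP.

```python
# Values copied from the Failure Detection Rate table above:
# network type -> setting -> (KA interval in seconds, missed KAs)
FAILURE_DETECTION = {
    "Ethernet/IP":  {"Slow": (1.0, 12), "Normal": (0.5, 12), "Fast": (0.5, 9)},
    "Token Ring":   {"Slow": (1.0, 24), "Normal": (0.5, 24), "Fast": (0.5, 12)},
    "TMSCSI/RS232": {"Slow": (3.0, 8),  "Normal": (1.5, 6),  "Fast": (0.5, 5)},
    "SOCC":         {"Slow": (1.0, 12), "Normal": (0.5, 12), "Fast": (0.5, 9)},
    "FDDI":         {"Slow": (1.0, 12), "Normal": (0.5, 12), "Fast": (0.5, 9)},
    "SLIP":         {"Slow": (2.5, 18), "Normal": (1.0, 12), "Fast": (0.5, 12)},
}


def detection_time(network_type, setting):
    """Seconds until a failure is declared (assumed: interval * missed KAs)."""
    interval, missed = FAILURE_DETECTION[network_type][setting]
    return interval * missed


def fastest_detection(configured):
    """Given {network type: setting}, return (network type, seconds) for
    the network expected to detect a failure first (assumed: the minimum)."""
    times = {net: detection_time(net, s) for net, s in configured.items()}
    net = min(times, key=times.get)
    return net, times[net]


if __name__ == "__main__":
    cluster = {"Ethernet/IP": "Normal", "TMSCSI/RS232": "Normal"}
    for net, setting in cluster.items():
        print(net, setting, detection_time(net, setting), "secs")
    print("first to detect:", fastest_detection(cluster))
```

Under these assumptions, Ethernet/IP at Fast works out to 4.5 seconds
and at Normal to 6 seconds.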
                                                                                
In HACMP 3.1, after a node detects an error, the nodes will send Message
packets to each other to agree on the correct event to process next,
much like the "stabilization" period of HACMP 2.1. I do not know exactly
at this point how long this stage takes, but I would expect it to be
comparable.
                                                                                
Question........:                                                               
Thank you for your detailed answer.                                             
I would like to ask some more questions, please.                                
First, the environment consists of two RS/6000 - 380 machines, without
any shared disks. Only IP, and maybe HW address takeover.
The most important consideration is the time it takes to switch
over.
So, the first question would be:
1. Taking into account that there are only two machines in the
cluster, is there any way to avoid sending the 9 KAs for stabilization,
or at least to change the number to a smaller one?
2. In HACMP V2.1 there are some "command line parameters" that one
may change (see lpp.doc from PTF U431606), like:
 - normal_ka_cycles  dealing with the time between normal KAs
 - speed_ka_cycles   dealing with the time between stabilizing KAs
It seems that those two should be set to the same value.
Is this true, or may I set speed_ka_cycles so that the
stabilizing KAs are sent faster than the normal KAs?
Are those parameters still available in V3.1?
Thanks and Regards... Liviu                                                     
                                                                                
ANSWER                                                                          
Actually the behavior of the cluster manager in HACMP 3.1 is different          
from that of HACMP 2.1 in how it detects and processes events. In HACMP         
2.1, you have two major stages of event detection and reaction:                 
  1. Event detection time - composed of the KA cycle time multiplied by
      the "cycles to fail" parameter. These two parameters (cycle time
      and cycles to fail) became configurable as of PTF U432018.
  2. Synchronization time - 9 KA packets, which is not adjustable. This         
      is time to allow all cluster managers to agree before any cluster         
      events are generated.                                                     
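
To make the arithmetic of the two HACMP 2.1 stages concrete, here is a
small Python sketch. It assumes the 9 synchronization packets are sent
at the same KA cycle as the detection packets. The 0.5 second cycle in
the example is purely hypothetical (no 2.1 cycle time is quoted here);
the 12 cycles to fail come from the original question.

```python
SYNC_PACKETS = 9  # fixed, not adjustable in HACMP 2.1


def hacmp21_reaction_time(ka_cycle_secs, cycles_to_fail):
    """Return (detection, synchronization, total) in seconds.

    Assumes both stages run at the same KA cycle time.
    """
    detection = ka_cycle_secs * cycles_to_fail
    synchronization = ka_cycle_secs * SYNC_PACKETS
    return detection, synchronization, detection + synchronization


if __name__ == "__main__":
    # Hypothetical 0.5 s cycle with 12 cycles to fail
    print(hacmp21_reaction_time(0.5, 12))
```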
                                                                                
In HACMP 3.1, you also have 2 stages of event detection and reaction, but       
they are slightly different.                                                    
  1. Event detection time - Also defined by KA cycle time multiplied by         
      cycles to fail. This is now adjustable by network type as described       
      in my previous response.                                                  
  2. Message time - I don't think there is really an official word for          
      this time, so I will call it "message time" for lack of a better          
      name. Instead of sending synchronization packets, each cluster            
      manager now sends "message packets" to every other node on all            
      network connections when it detects an event. This works as
      follows:
        a) New event message sent to all other nodes.                           
        b) Wait 2 seconds or until all nodes acknowledge.                       
            - Node down event is added for all nodes that do not                
              acknowledge.                                                      
        c) Each node votes on event to process. If there is disagreement,       
           the highest priority event is triggered, as follows:                 
             1. node failure                                                    
             2. network failure                                                 
             3. network join                                                    
             4. node join                                                       
        d) All nodes run their event scripts, with message packets sent         
           to synchronize execution (for instance in node joins).               
      This procedure is not adjustable by the user. The time taken will
      vary depending on how many nodes there are and how many events are
      being detected. If no other node acknowledges, the remaining node
      waits the full 2 seconds and then begins to process the highest
      priority event in its queue. Otherwise, the time is governed by
      how long it takes every node to vote on the next event. Typically
      there will only be one event happening at a time, so this should
      not be too long. In the case of a 2 node cluster, as in your
      example, this time would really be 2 seconds at the most in a node
      down: if the other node does not acknowledge, the remaining node
      starts processing the event after 2 seconds. This new design
      should speed up event processing in 2 node clusters.
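
Steps b) and c) above can be sketched in Python as follows. This is
only an illustration of the described logic, not the cluster manager's
actual code. The event tuples, function names, and the node and network
names (nodeB, ether1) are hypothetical, and the "node down" event is
treated as the "node failure" entry of the priority list.

```python
# Priority order from step c): earlier in the list means higher priority.
EVENT_PRIORITY = ["node failure", "network failure", "network join", "node join"]
ACK_TIMEOUT_SECS = 2.0  # step b): wait 2 seconds or until all nodes acknowledge


def events_after_ack_wait(detected_event, other_nodes, acked_nodes):
    """Step b): after the wait, queue a node failure event for every
    node that did not acknowledge the new event message."""
    events = [detected_event]
    for node in other_nodes:
        if node not in acked_nodes:
            events.append(("node failure", node))
    return events


def choose_event(votes):
    """Step c): on disagreement, the highest priority event wins."""
    return min(votes, key=lambda event: EVENT_PRIORITY.index(event[0]))


if __name__ == "__main__":
    # 2 node cluster: nodeB never acknowledges within ACK_TIMEOUT_SECS
    queue = events_after_ack_wait(("network failure", "ether1"), ["nodeB"], set())
    print(queue)
    print("process next:", choose_event(queue))
```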
                                                                                
There is another enhancement in 3.1's event processing that should speed        
up processing of local failures. Each cluster manager will check for and        
process local failures at 1/2 the error detection time. For instance, if        
a local network adapter fails, the local cluster manager will detect and        
begin to process that event in 1/2 the "cycles to fail". In this way, a        
local adapter swap can be processed and done before the other nodes             
officially detect the problem.                                                  
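
As a quick worked example of the "1/2 the error detection time" rule,
the Python sketch below halves the product of KA interval and cycles to
fail. How the cluster manager rounds an odd number of cycles is not
stated here, so the plain division is an assumption.

```python
def remote_detection_time(ka_interval_secs, cycles_to_fail):
    """Seconds until other nodes declare the failure."""
    return ka_interval_secs * cycles_to_fail


def local_detection_time(ka_interval_secs, cycles_to_fail):
    """Seconds until the local cluster manager acts (assumed: exactly half)."""
    return remote_detection_time(ka_interval_secs, cycles_to_fail) / 2


if __name__ == "__main__":
    # Ethernet/IP at Normal: 0.5 s interval, 12 missed KAs
    print("remote:", remote_detection_time(0.5, 12), "secs")
    print("local: ", local_detection_time(0.5, 12), "secs")
```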
                                                                                
I hope this description helps to answer your two questions. The cluster         
manager event processing mechanism has changed with HACMP 3.1, so we            
really can't use exactly the same terms to describe it as in 2.1. I             
should just clarify something else from your second question though. As         
of HACMP 2.1, there was no longer a "speed_ka_cycles" parameter. If you         
read the U431606 memo_to_users more carefully, you should not see any           
mention of this. As of HACMP 2.1, there is only a normal KA cycle, used         
for both event detection and synchronization or stabilization. As for           
whether these parameters are still available in V3.1, the preceding will        
tell you that the KA cycle is adjustable by network type to either normal       
(recommended), slow (recommended), or fast (use at your own risk, and
test!!).
                                                                                
Thanks again. Very helpful!!
Still another question:                                                         
- As I understand it, the Failure Detection Rate parameter settable
  via SMIT determines the number of missed KAs and ALSO the KA rate.
  Can the KA rate (normal_ka_cycles in V2.1) be adjusted by a
  "command line option" of clstrmgr?
  What I mean is:
   Say I set the Failure Detection Rate to Fast (for Ethernet).
   That means I have 9 KAs at 0.5 second intervals.
   Can I now further adjust this 0.5 sec interval, say to 0.3 sec?
                                                                                
Thanks and Regards... Liviu                                                     
Thanks again!!!
One more question.                                                             
It appears that Target Mode SCSI is NOT supported over SCSI2 F/W
in AIX 3.2, only over SCSI2 Fast.
In AIX 4.1.1, Target Mode SCSI IS supported over SCSI2 F/W,
but HACMP is not there yet.
For my project it is important to have TMSCSI over F/W.
Do you know if AIX 3.2.5 will have this capability?
Thanks and Regards... Liviu                                                     
                                                                                
ANSWER                                                                          
I have not yet received HACMP 3.1 to try this myself, but apparently the
KA interval can also be adjusted further, by using the command:
   chssys -a'-n ######'                                                         
where ###### represents the number of microseconds desired in each KA           
cycle. A stop and restart of the cluster manager activates the change.          
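
Since the value is given in microseconds, a small Python helper can
avoid conversion slips. This sketch only builds the argument string in
the form shown above. It does not run chssys, and the function name is
made up for this example. Any further chssys options needed for a
complete invocation are not covered here.

```python
def ka_cycle_argument(seconds):
    """Return the '-n <microseconds>' string for a KA cycle given in seconds."""
    microseconds = round(seconds * 1_000_000)
    if microseconds <= 0:
        raise ValueError("KA cycle must be a positive duration")
    return f"-n {microseconds}"


if __name__ == "__main__":
    # The 0.3 sec interval asked about in the question
    print(ka_cycle_argument(0.3))
```

For the 0.3 second interval in the question, the value to substitute
for ###### would be 300000.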
                                                                               
Again, I must warn you, though. Speeding up the cycle exposes you much          
more to the possibility of false takeover, and undesired timeouts of the        
deadman switch causing a node to crash. It is provided by development as        
requested by some customers, but not recommended for most users. To read        
about the relationship of KA cycle and "cycles to fail" on the timeout          
of the deadman switch, read the "memo to users" of HACMP 2.1 PTF U432018        
or U431606.                                                                     
                                                                                
Yes, target mode support for SCSI-2 differential Fast/Wide has just been        
made available for AIX 3.2.5 in PTF U434679. You must be aware that it          
has just been made available for AIX, but has not yet been tested and           
certified with HACMP 2.1 by HACMP development as of December 1994. You          
can expect to get news of this when it is done through an update to the         
HAMATRIX PACKAGE on the MKTTOOLS disk, if you subscribe to it.                  
                                                                               
Meanwhile, the base support is there, and it should work with HACMP, but        
it hasn't officially been tested.                                               
                                                                                
Search-keywords:
HACMP/6000 3.1 KA KEEPALIVE TAKEOVER TIME EVENT DETECTION ERROR                 
                                                                                
                                                                                
WWQA: ITEM: RTA000065101
Dated: 08/1995 Category: ITSAIHA6000
This HTML file was generated 99/06/24~12:43:24