Problems with HACMP Error Notification Function

ITEM: RTA000072490



Our customer tested the Error Notification Function with HACMP                  
and ran into the following problems.  Would you please give us                  
some advice?                                                                    
Testing configuration is as follows:                                            
                                                                                
 ==o=o=========================o=o=============  Ethernet_1                     
 ==|=|==o=o====================|=|==o=o==========  Ethernet_2                   
   | |  | |                    | |  | |                                         
  +O-O--O-O+                  +O-O--O-O+                                        
  |a b  c d|       |-----|    |e f  g h|                                        
  | Node 1 |====|-----|  |====| Node 2 |                                        
  |  (570) |====|9334 |=======|  (570) |                                        
  +--------+    |     |  |    +--------+                                        
                |     |--|                                                      
                |-----|                                                        
Where 'a' through 'h' stand for Ethernet adapters, and the IP
addresses are assigned as follows:
                                                       
           a :      ent0      150.14.25.141(150.14.25.140 as boot)              
           b :      ent1      150.14.26.141                                     
           c :      ent2      150.14.30.141                                     
           d :      ent3      150.14.31.141                                     
           e :      ent0      150.14.25.142                                     
           f :      ent1      150.14.26.142                                     
           g :      ent2      150.14.30.142                                     
           h :      ent3      150.14.31.142                                     
                                                                                
The AIX version is 3.2.4 and the HACMP version is 1.2. The cluster
is configured as Hot-Standby with dual Ethernet networks.
                                                                               
The problems are:                                                               
 1. We defined the error notification object for the standby
    Ethernet adapter (ent1) as follows:
       Notification Object Name : ENT_ERR2_en1                                  
       Persistence across system restart? : Yes                                 
       Process ID for use by Notify Method : 0                                  
       Select Error Class :Hardware                                             
       Select Error Type :TEMP                                                  
       Match ALERTable errors? :None                                            
       Select Error ID Label :ENT_ERR2                                          
       Resource Name :ent1                                                      
       Resource Class :adapter                                                  
       Resource Type :ethernet                                                  
       Notify Method :/usr/sbin/cluster/stdbyerr1.rc $6                         
    However, the Notify Method was not called when we pulled the
    Ethernet cable out of adapter 'ent1'.  (The ENT_ERR2 error
    definitely occurred, because the Error Log recorded the error
    for the 'ent1' adapter.)  When we blanked out the Resource Name
    'ent1', the Notify Method was called successfully for the same
    failure.
    (The Notify Method contains only statements that log a message
    to the cluster.log file.)
    We tried other Resource Names such as ent0 and ent3, but the
    result was the same as for ent1 (the Notify Method was never
    called).
    We also tried Error Notification for a SCSI adapter, and
    the Notify Method was called successfully with the Resource
    Name (scsi1) defined as follows:
       Notification Object Name : SCSI_ERR2_s1                                  
       Persistence across system restart? : Yes                                 
       Process ID for use by Notify Method : 0                                 
       Select Error Class :Hardware                                             
       Select Error Type :TEMP                                                  
       Match ALERTable errors? :None                                            
       Select Error ID Label :SCSI_ERR2                                         
       Resource Name :scsi1                                                     
       Resource Class :adapter                                                  
       Resource Type :hscsi                                                     
       Notify Method :/usr/sbin/cluster/diskerr1.rc $6                          
    Could you clarify why the Notify Method was not called for
    the Ethernet adapter failure when the Resource Name was set?
    Did we miss any settings for Ethernet adapter error
    notification?  Or did we make a mistake?
 2. Each time we set up Error Notification from the HACMP RAS menu,
    a file named /tmp/HACMP_AAnnnnnn is created (where AA seems to
    be the first 2 characters of the Object Name created, and
    nnnnnn is a 6-digit number).
    What are these files for?  Is there any option to delete
    them automatically?  Or should we remove them?  If so,
    would that cause any problem?
                                                                                
                                                                                
                                                                                
ANSWER                                                                          
                                                                                
1. I was able to reproduce your problem here with an HACMP 2.1 cluster          
   running AIX 3.2.5. My system had the same behaviour as yours, meaning        
   the error notification method was executed if I left the resource            
   name blank, but was not executed when I filled in a resource name.           
   Even when you leave the resource name blank in the definition screen,        
   the resource name passed to the error notification method script in         
   the $6 argument is in the form that you used it, namely "ent#". This         
   looks like a defect in the AIX error notification facility,
   specifically with Ethernet resource errors. The same problem does not
   occur with SCSI errors (as you described) or token-ring errors (as I
   found in my testing). You are not doing anything wrong. Since you will
   want to be kept informed of the resolution of this problem, you or
   your customer should report it to the defect support channel.
   In the meantime, as a workaround, since you have the resource name           
   correctly passed to your error notification method script, you can put       
   some logic in the script to test for a particular resource name (for         
   example, ent1) if you want to take action only for the failure of one        
   adapter.                                                                     
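
   The workaround in answer 1 could look roughly like the following
   sketch. It assumes the notification object is defined with a blank
   Resource Name and a Notify Method of the form "<script> $6", so that
   the resource name arrives as the script's first argument. The function
   names, the TARGET_RESOURCE variable, and the log path are illustrative
   assumptions; they are not part of HACMP or of the stdbyerr1.rc script
   described in the question.

```shell
#!/bin/sh
# Sketch of a notify method that filters on the resource name itself.
# All names below are hypothetical and chosen only for illustration.

TARGET_RESOURCE="ent1"                     # adapter to act on (assumption)
LOGFILE="${LOGFILE:-/tmp/stdbyerr1.log}"   # illustrative path, not the real cluster.log

# Succeed only when the failing resource is the adapter we care about.
is_target_resource() {
    [ "$1" = "$TARGET_RESOURCE" ]
}

# Log a message for the target adapter; silently ignore every other resource.
notify() {
    if is_target_resource "$1"; then
        echo "ENT_ERR2 reported for resource $1" >> "$LOGFILE"
    fi
    return 0
}

# When run as a notify method, the first argument is the value of $6
# (the resource name) passed by the error notification facility.
if [ "$#" -gt 0 ]; then
    notify "$1"
fi
```

   With this in place, an error logged against ent0 or ent3 still starts
   the script, but only an error against ent1 produces a log entry.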
2. These files that you are seeing in the /tmp directory are just               
   temporary files that are left by the /usr/sbin/cluster/clacdNM script        
   which sets up error notification methods for you. You may erase them        
   without any problem. The HACMP 1.2 version of the script does not            
   properly remove this temporary file when it has finished the setup of        
   the error notification method. This is fixed in the HACMP 2.1 version        
   of the script. You may report it as a defect against HACMP 1.2 if you        
   wish.                                                                        
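
   If you prefer to remove the leftover files with a script rather than
   by hand, something like the following sketch may help. It assumes the
   files follow the HACMP_AAnnnnnn pattern described in the question (two
   characters followed by six digits); the function name is hypothetical.
   Run it only when no error notification setup is in progress.

```shell
#!/bin/sh
# Sketch: remove leftover HACMP_AAnnnnnn temporary files from a directory.
# The pattern is taken from the question; the function name is an assumption.

cleanup_hacmp_tmp() {
    dir="${1:-/tmp}"    # directory to clean, /tmp by default
    for f in "$dir"/HACMP_??[0-9][0-9][0-9][0-9][0-9][0-9]; do
        # Skip the unexpanded pattern (no match) and anything that is
        # not a regular file.
        [ -f "$f" ] || continue
        rm -f "$f" && echo "removed $f"
    done
    return 0
}
```

   For example, "cleanup_hacmp_tmp /tmp" removes the matching files in
   /tmp and prints the name of each file it deletes.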
                                                                                
Search keywords:
HACMP ERROR NOTIFICATION ENT0 ENT1 METHOD LOG ETHERNET                          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                               


WWQA: ITEM: RTA000072490
Dated: 01/1995 Category: ITSAIHA6000