Problems with HACMP Error Notification Function
ITEM: RTA000072490
Our customer tested the Error Notification Function with HACMP
and ran into the following problems. Would you please give us
some advice?
The test configuration is as follows:
==o=o=========================o=o============= Ethernet_1
==|=|==o=o====================|=|==o=o========== Ethernet_2
| | | | | | | |
+O-O--O-O+ +O-O--O-O+
|a b c d| |-----| |e f g h|
| Node 1 |====|-----| |====| Node 2 |
| (570) |====|9334 |=======| (570) |
+--------+ | | | +--------+
| |--|
|-----|
Where, 'a' through 'h' stand for ethernet adapters and the IP
address is assigned as follows:
a : ent0 150.14.25.141(150.14.25.140 as boot)
b : ent1 150.14.26.141
c : ent2 150.14.30.141
d : ent3 150.14.31.141
e : ent0 150.14.25.142
f : ent1 150.14.26.142
g : ent2 150.14.30.142
h : ent3 150.14.31.142
AIX is Version 3.2.4 and HACMP is Version 1.2, and the cluster is
configured as Hot-Standby with dual ethernets.
The problems are:
1. We defined the error notification object for the standby
ethernet adapter (ent1) as follows:
Notification Object Name            : ENT_ERR2_en1
Persistence across system restart?  : Yes
Process ID for use by Notify Method : 0
Select Error Class                  : Hardware
Select Error Type                   : TEMP
Match ALERTable errors?             : None
Select Error ID Label               : ENT_ERR2
Resource Name                       : ent1
Resource Class                      : adapter
Resource Type                       : ethernet
Notify Method                       : /usr/sbin/cluster/stdbyerr1.rc $6
However, the Notify Method was not called when we pulled the
ethernet cable out of adapter 'ent1'. (The ENT_ERR2 error
certainly occurred, because the Error Log recorded the error for
the 'ent1' adapter.) But when we blanked out the Resource Name
'ent1', the Notify Method was called successfully when the same
failure occurred.
(The Notify Method contains only statements that log a message
to the cluster.log file.)
We tried other Resource Names such as ent0 and ent3, but the
result was the same as for ent1 (the Notify Method was never
called).
We also tried Error Notification for the SCSI adapter, and
the Notify Method was successfully called with the Resource
Name (scsi1) defined as follows.
Notification Object Name            : SCSI_ERR2_s1
Persistence across system restart?  : Yes
Process ID for use by Notify Method : 0
Select Error Class                  : Hardware
Select Error Type                   : TEMP
Match ALERTable errors?             : None
Select Error ID Label               : SCSI_ERR2
Resource Name                       : scsi1
Resource Class                      : adapter
Resource Type                       : hscsi
Notify Method                       : /usr/sbin/cluster/diskerr1.rc $6
Could you clarify why the Notify Method was not called for an
ethernet adapter failure when the Resource Name was set?
Did we miss any settings for ethernet adapter error
notification? Or did we make a mistake?
2. Each time we set Error Notification from the HACMP RAS menu,
a file /tmp/HACMP_AAnnnnnn is created (where AA seems to be the
first 2 characters of the Object name created, and nnnnnn is a
6-digit number).
What are these files for? Is there any option to delete
them automatically? Or should we remove them? If so,
is there any problem?
ANSWER
1. I was able to reproduce your problem here with an HACMP 2.1 cluster
running AIX 3.2.5. My system had the same behaviour as yours, meaning
the error notification method was executed if I left the resource
name blank, but was not executed when I filled in a resource name.
Even when you leave the resource name blank in the definition screen,
the resource name passed to the error notification method script in
the $6 argument is in the form that you used it, namely "ent#". This
looks like a defect in the AIX error notification facility,
specifically with ethernet resource errors. The same problem does not
occur with SCSI errors (as you described) or token-ring errors (as I
found in my testing). You are not doing anything wrong. Since you will
want to be kept informed of the resolution of this problem, you or your
customer should report this problem to the defect support channel.
In the meantime, as a workaround, since you have the resource name
correctly passed to your error notification method script, you can put
some logic in the script to test for a particular resource name (for
example, ent1) if you want to take action only for the failure of one
adapter.
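The workaround above might look something like the following sketch.
It assumes the notification object is defined with a blank Resource
Name and the same Notify Method line as in the question
(/usr/sbin/cluster/stdbyerr1.rc $6), so the resource name arrives in
the script as its first argument. The handle_error function, the
TARGET variable, and the LOGFILE default path are illustrative names
only and are not part of HACMP; substitute the location of your own
cluster.log file.

```shell
#!/bin/sh
# Sketch of a notify method such as /usr/sbin/cluster/stdbyerr1.rc.
# Assumption: the object has a blank Resource Name, and the Notify
# Method is "<script> $6", so the failing resource name (for example
# ent1) is passed to this script as its first argument.

TARGET="ent1"                            # the only adapter to act on
LOGFILE="${LOGFILE:-/tmp/cluster.log}"   # hypothetical path; adjust as needed

handle_error() {
    resource="$1"
    case "$resource" in
        "$TARGET")
            # The error is for the adapter of interest: log it.
            echo "`date`: ENT_ERR2 reported for $resource" >> "$LOGFILE"
            ;;
        *)
            # Error for some other resource: take no action.
            ;;
    esac
}

handle_error "${1:-}"
```

With this in place, an ENT_ERR2 error on ent0 or ent3 still invokes
the script, but only an error on ent1 produces a log entry.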
2. These files that you are seeing in the /tmp directory are just
temporary files that are left by the /usr/sbin/cluster/clacdNM script
which sets up error notification methods for you. You may erase them
without any problem. The HACMP 1.2 version of the script does not
properly remove this temporary file when it has finished the setup of
the error notification method. This is fixed in the HACMP 2.1 version
of the script. You may report it as a defect against HACMP 1.2 if you
wish.
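If you would rather clean these files up with a script than by hand,
something along these lines may do. It is only a sketch: the
cleanup_hacmp_tmp function name is made up for illustration, and the
file name pattern is taken from the description in the question
(HACMP_ followed by two characters and a 6-digit number), so check
it against the files you actually see before using it.

```shell
#!/bin/sh
# Sketch: remove temporary files left behind by the HACMP 1.2 setup
# script. Assumption: the files are named HACMP_AAnnnnnn, where AA is
# any two characters and nnnnnn is six digits, as described above.

cleanup_hacmp_tmp() {
    dir="${1:-/tmp}"
    for f in "$dir"/HACMP_??[0-9][0-9][0-9][0-9][0-9][0-9]; do
        # If nothing matches, the pattern stays literal and is skipped.
        if [ -f "$f" ]; then
            rm -f "$f"
        fi
    done
    return 0
}
```

Calling cleanup_hacmp_tmp /tmp after you finish defining notification
objects removes the leftover files and leaves everything else in the
directory alone.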
Search keywords:
HACMP ERROR NOTIFICATION ENT0 ENT1 METHOD LOG ETHERNET
WWQA: ITEM: RTA000072490
Dated: 01/1995 Category: ITSAIHA6000
This HTML file was generated 99/06/24~12:43:26