ITEM: AS7712L

Understanding Netview alarms from HACMP...



Question:

model:r24
HACMP alarms 

Response:
   
ABSTRACT:  HACMP failure doesn't create all SNMP MIB entries

ENVIRONMENT:   AIX 3.2.5   HACMP 3.1
               Machine model: R24

Desc:
  Q1:  According to /usr/include/cluster/clsnmp.h, are the traps for
       the 2.0 release intended to be appended to those of the 1.0
       release, or are they intended to replace the traps of the 1.0
       release?
  A1:  The traps for the 2.0 release are intended to replace the traps
       for the 1.0 release.

  Q2: Where are the integer values of the traps in Netview defined
      for each of the events?

For example, here are two lines from a /usr/OV/log/trapd.log netview
log file for the "SubState" event id with integer values of 16 and 32:

816906333  3  Mon Nov 20 16:25:33 1995 tamp_ndi                  u
Trap: generic 6 specific 11 args (1):  [1]
risc6000clsmuxpd.cluster.clusterSubState.0 (Integer): 16

816906396  3  Mon Nov 20 16:26:36 1995 tamp_ndi                  u
Trap: generic 6 specific 11 args (1):  [1]
risc6000clsmuxpd.cluster.clusterSubState.0 (Integer): 32

I believe that the "SubState" event ids are defined in the
/usr/include/cluster/clsnmpd.h file as follows:

 /*
  * SMUX Cluster (Sub)states
  */
 #define SMUX_UNSTABLE   0x10
 #define SMUX_STABLE     0x20
 #define SMUX_ERROR      0x40

The decimal equivalents of the hex values given above are as
follows:
  Hex   Dec
  0x10  16
  0x20  32
  0x40  64
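
As a rough illustration of the mapping described above, the following C
sketch turns a clusterSubState integer from trapd.log into the matching
macro name. It assumes the mapping is as stated in the question; the
three values are copied from the header excerpt quoted above, and the
function name substate_name is hypothetical, not part of HACMP.

```c
#include <stddef.h>

/* Values copied from the header excerpt quoted above. */
#define SMUX_UNSTABLE   0x10    /* 16 decimal */
#define SMUX_STABLE     0x20    /* 32 decimal */
#define SMUX_ERROR      0x40    /* 64 decimal */

/*
 * substate_name is a hypothetical helper, not an HACMP function.
 * It returns the macro name for a clusterSubState trap value,
 * or NULL when the value is not one of the three listed above.
 */
const char *substate_name(int value)
{
    switch (value) {
    case SMUX_UNSTABLE: return "SMUX_UNSTABLE";
    case SMUX_STABLE:   return "SMUX_STABLE";
    case SMUX_ERROR:    return "SMUX_ERROR";
    default:            return NULL;
    }
}
```

Under that assumption, the two log values 16 and 32 shown earlier would
map to SMUX_UNSTABLE and SMUX_STABLE respectively.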

Here is another example for the "State" event ids:

 815771741  3  Tue Nov 07 13:15:41 1995 ATL_NDS                   u
Trap: generic 6 specific 10 args (1):  [1]
risc6000clsmuxpd.cluster.clusterState.0 (Integer): 2

 815773319  3  Tue Nov 07 13:41:59 1995 ATL_NDS_2                 u
Trap: generic 6 specific 10 args (1):  [1]
risc6000clsmuxpd.cluster.clusterState.0 (Integer): 4

 815773319  3  Tue Nov 07 13:41:59 1995 ATL_NDS                   u
Trap: generic 6 specific 10 args (1):  [1]
risc6000clsmuxpd.cluster.clusterState.0 (Integer): 8

I believe that the "State" event ids are defined in the
/usr/include/cluster/clsnmpd.h file as follows:

 /*
  * SMUX Common States
  */
 #define SMUX_INVALID    0x00
 #define SMUX_VALID      0x01
 #define SMUX_UP         0x02
 #define SMUX_DOWN       0x04
 #define SMUX_UNKNOWN    0x08

The decimal equivalents of the hex values given above are as
follows:
  Hex   Dec
  0x01  1
  0x02  2
  0x04  4
  0x08  8
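
The variable lines quoted from trapd.log all share the shape
"<object>.0 (Integer): <value>". The C sketch below pulls the variable
name and the integer out of such a line so the value can be compared
against the header definitions. The function name parse_trap_value and
its signature are hypothetical, and the line shape is assumed only from
the samples shown here, not from any NetView format specification.

```c
#include <stdio.h>
#include <string.h>

/*
 * parse_trap_value is a hypothetical helper. Given one variable line
 * from trapd.log, such as
 *   risc6000clsmuxpd.cluster.clusterState.0 (Integer): 2
 * it copies the last name component before the instance number (here
 * "clusterState") into name and stores the integer in *value.
 * Returns 1 on success, 0 if the line does not have this shape.
 */
int parse_trap_value(const char *line, char *name, size_t name_len, int *value)
{
    static const char marker[] = " (Integer): ";
    const char *tag = strstr(line, marker);
    const char *end, *start;
    size_t len;

    if (tag == NULL || name_len == 0)
        return 0;

    /* Step back over the trailing instance number (the "0" in ".0"). */
    end = tag;
    while (end > line && end[-1] != '.')
        end--;
    if (end == line)
        return 0;       /* no '.' before the marker */
    end--;              /* end now points at the '.' before the instance */

    /* Step back to the '.' or space (or line start) before the name. */
    start = end;
    while (start > line && start[-1] != '.' && start[-1] != ' ')
        start--;

    len = (size_t)(end - start);
    if (len == 0 || len >= name_len)
        return 0;
    memcpy(name, start, len);
    name[len] = '\0';

    /* The integer follows immediately after the marker text. */
    return sscanf(tag + sizeof marker - 1, "%d", value) == 1;
}
```

Lines that do not contain the " (Integer): " marker, such as the
"Trap: generic 6 specific 10 ..." lines, are rejected with a return
value of 0.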

One event id that I cannot find a definition for is the "Primary"
event id.  Here are three lines from a /usr/OV/log/trapd.log netview
log file for a "Primary" event id with integer values of -1, 1, and 2:

 815788132  3      Tue Nov 07 17:48:52  ATL_NDS_2                 ?
[1] risc6000clsmuxpd.cluster.clusterPrimary.0 (Integer): 1

 815788132  3      Tue Nov 07 17:48:52  ATL_NDS_2                 ?
[1] risc6000clsmuxpd.cluster.clusterPrimary.0 (Integer): -1

 815788134  3      Tue Nov 07 17:48:54  ATL_NDS_2                 ?
[1] risc6000clsmuxpd.cluster.clusterPrimary.0 (Integer): 2

I was unable to determine where the integer values of the "Primary"
event id are defined.

Hello,
    I am trying to confirm this with development, but I believe the
reason there are no #defines in clsnmp.h for the clusterPrimary
event is that its values reflect the node id, with -1 indicating a
transitional state. The trapd.log file entry
 risc6000clsmuxpd.cluster.clusterPrimary.0 (Integer): 1
would thus indicate that node 1 is becoming the primary node.
The subsequent:
 risc6000clsmuxpd.cluster.clusterPrimary.0 (Integer): -1
followed by:
 risc6000clsmuxpd.cluster.clusterPrimary.0 (Integer): 2
would indicate that the primary is changing from node 1 to node 2.
Does that make sense, given the sequence of events that was happening
on your cluster at the time these log entries were generated?
If I find out anything different or any additional details from
development, I will make another append to this PMR. In the meantime,
I thought this might be useful info for you.

Hello again,
     I heard back from development, and what I said in my previous
append is accurate: the integer is the id of the cluster node
that is the "primary" node, and it is equal to -1 when the cluster
is not yet up/stable. I should point out, however, that starting with
HACMP 3.1, which has a distributed lock manager, the notion of
a "primary" cluster manager is no longer relevant. There are still
"remnants" of this notion in HACMP, however, as you have found.
I am curious as to whether or not you are using this information
or just trying to understand the information being generated by
clsmuxpd that Netview is capturing.
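
For anyone post-processing these log entries, the interpretation given
above (the value is the node id, or -1 while the cluster is not yet
up/stable) could be sketched in C roughly as follows. The names
PRIMARY_TRANSITIONAL and describe_primary are hypothetical and do not
come from clsnmp.h.

```c
#include <stdio.h>

/* -1 is the value described above for a cluster that is not yet
 * up/stable. The macro name is hypothetical, not from clsnmp.h. */
#define PRIMARY_TRANSITIONAL (-1)

/*
 * describe_primary is a hypothetical helper. It formats a short
 * description of a clusterPrimary trap value into buf and returns buf.
 */
char *describe_primary(int value, char *buf, size_t buf_len)
{
    if (value == PRIMARY_TRANSITIONAL)
        snprintf(buf, buf_len, "no primary (cluster not yet up/stable)");
    else
        snprintf(buf, buf_len, "node %d is primary", value);
    return buf;
}
```

Applied to the sequence 1, -1, 2 from the log, this would report node 1
as primary, then a transitional state, then node 2 as primary.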


Support Line: Understanding Netview alarms from HACMP... ITEM: AS7712L
Dated: March 1996 Category: N/A