IBM Books

Administration Guide


Event Management components

The Event Management subsystem consists of the following components:

Event Manager daemon
The central component of the Event Management subsystem.

Event Management API (EMAPI)
The application programming interface that EM clients use to obtain the services of the Event Management subsystem.

Resource Monitor API (RMAPI)
The application programming interface that resource monitors use to supply resource variables to the Event Manager daemon.

Resource monitors
Programs that monitor the state of system resources. Several resource monitors are shipped with the RSCT software.

Port numbers
TCP/IP port numbers that the Event Management subsystem uses for communications. The Event Management subsystem also uses Unix domain sockets.

Configuration data
Information that defines resource variables, resource monitors, and related data to the Event Management subsystem. This information is stored in the SDR.

Configuration utility
A program that translates the Event Management configuration data that is stored in the SDR to a binary format that is used by the Event Manager daemon and the RMAPI.

Configuration database
The Event Management Configuration Database (EMCDB) is the output of the configuration utility.

Control script
A shell script that is used to add, start, stop, and delete the Event Management subsystem, which operates under System Resource Controller (SRC) control.

Configuration load utility
A program that loads the default configuration data, which is shipped with the RSCT software, into the SDR.

Default configuration data
A file that is shipped with the RSCT software that contains the default configuration data for the Event Management subsystem.

Files and directories
Various files and directories that are used by the Event Management subsystem to maintain run-time data.

The sections that follow contain more details about each of these components.

The Event Manager daemon (haemd)

The Event Manager daemon is contained in the executable file /usr/sbin/rsct/bin/haemd. This daemon runs on each node of a domain. If a node is a member of more than one domain, there is one executing copy of the daemon per domain.

Note that on the SP system, the domain of the Event Management subsystem is an SP system partition. The domain name is the same as the system partition name. The control workstation is regarded as node 0 in each system partition and, therefore, in each domain. Unless otherwise stated, a reference to the "Event Management subsystem" is a reference to the Event Management subsystem in a single domain.

EM clients communicate with a single Event Manager daemon in the domain that contains the resources in which they are interested, as follows:

An EM client that is running on a node that is outside of the domain (SP system partition) that contains the resources in which it is interested uses a TCP socket to communicate with the Event Manager daemon on the control workstation that is in that domain. An EM client that is running on a pSeries or RS/6000 server not connected to the SP Ethernet admin LAN is considered to be on a node outside of any domain. In both of these cases, the Event Management client must specify, explicitly or implicitly, the name of a domain that contains the resources of interest. For an Event Management client to run on any pSeries or RS/6000 server, the RSCT file sets ...clients.rte and ...clients.sp and the PSSP file set ssp.clients must be installed.

Once an EM client has established communications with an Event Manager daemon, the EM client has access to all of the resources in that domain.

Resource monitors communicate with the Event Manager daemon, through the RMAPI, using Unix domain sockets. Resource monitors are always on the same node as the Event Manager daemon. Note that if a node is a member of multiple domains, just as there is one executing copy of the daemon per domain, there is also one executing copy of each resource monitor per domain.

The Event Management API (EMAPI)

The Event Management Application Programming Interface (EMAPI) is a shared library that an EM client uses to obtain the services of the Event Management subsystem. This shared library is supplied in two versions: one for non-thread-safe programs and one for thread-safe programs. These libraries are referenced by the following path names:

These path names are actually symbolic links to /usr/sbin/rsct/lib/libha_em.a and /usr/sbin/rsct/lib/libha_em_r.a, respectively. The symbolic links are placed in /usr/lib for ease of use. For serviceability, the actual libraries are placed in the /usr/sbin/rsct/lib directory. These libraries are supplied as shared libraries, also for serviceability.

For details on the EMAPI, see Event Management Programming Guide and Reference.

The Resource Monitor API (RMAPI)

The Resource Monitor Application Programming Interface (RMAPI) is a shared library that a resource monitor uses to supply resource variables to the Event Manager daemon. This shared library is supplied only in a non-thread-safe version. This library is referenced through the path name

which is a symbolic link to /usr/sbin/rsct/lib/libha_rr.a. The symbolic link is placed in /usr/lib for ease of use. For serviceability, the actual library is placed in the /usr/sbin/rsct/lib directory. This library is supplied as a shared library, also for serviceability.

For details on the RMAPI, see Event Management Programming Guide and Reference.

Resource monitors

A resource monitor conforms to one of the following programming models:

Note that a server type resource monitor may have multiple executing copies or instances. Each resource monitor instance may supply different instances of the same named resource variable or may supply different resource variables.

In addition, the Event Manager daemon itself performs some resource monitoring function. This function is considered to be a resource monitor with a connection type of internal.

The RSCT software supplies the following resource monitors:

IBM.PSSP.harmld
Supplies resource variables for the CSS, VSD, and LoadLeveler subsystems. This is a daemon with a connection type of server.

IBM.PSSP.harmpd
Supplies resource variables that represent the number of processes executing a particular program. These variables can be used to determine whether a particular system daemon is running. This is a daemon with a connection type of server.

IBM.PSSP.hmrmd
Supplies resource variables that represent the hardware state of the SP system. The resource information is obtained from the PSSP hardware monitoring subsystem (hardmon). This is a daemon with a connection type of server.

IBM.PSSP.pmanrmd
Supplies resource variables provided by the PSSP Problem Management subsystem. This is a command-based resource monitor with a connection type of client.

aixos
Supplies resource variables that represent AIX operating system resources. This is a daemon with a connection type of server.

IBM.PSSP.CSSLogMon
Supplies a resource variable that represents the state of CSS error log entries. This is a command-based resource monitor with a connection type of client.

IBM.PSSP.SDR
Supplies a resource variable that represents the modification state of SDR classes. This is a command-based resource monitor with a connection type of client.

There are also several internal resource monitors incorporated in the Event Manager daemon itself:

Membership
Supplies resource variables that represent the Host Membership and Adapter Membership states. The Event Manager daemon obtains this information directly from the Group Services subsystem by subscribing to the HostMembership, enMembership, and cssMembership system groups.

Response
Supplies resource variables that represent the information in the host_responds and switch_responds SDR classes. These resource variables are provided for compatibility with earlier releases of PSSP.

Port numbers and sockets

The Event Management subsystem uses several types of communications:

Intra-domain port numbers

For communication between Event Manager daemons within a domain, the Event Management subsystem uses a single UDP port number. This port number is recorded in the Syspar_ports SDR class. The class contains two attributes: subsystem and port. The class object whose subsystem attribute has a value of haem contains the port number for the Event Management subsystem. Note that the Syspar_ports class is a partitioned SDR class, that is, there is a set of objects for each system partition. When you configure the Event Management subsystem for a particular domain, you use the SDR data that corresponds to that domain, that is, system partition.

The Event Management port number is stored in the SDR so that, when the Event Management subsystem is configured on each node, the port number is fetched from the SDR. This ensures that the same port number is used by all Event Manager daemons in the same domain. Note that because an Event Manager daemon from each domain runs on the control workstation, the Event Management subsystem in each domain must have a unique port number.
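The lookup and the uniqueness requirement described above can be sketched in a few lines. This is only an illustration under stated assumptions: the Syspar_ports objects are modeled as plain Python dictionaries that carry the subsystem and port attributes named in the text, and the partition names and port numbers are hypothetical. The sketch does not read the SDR itself.

```python
def haem_port(syspar_ports_objects):
    """Return the port of the object whose subsystem attribute is "haem", or None."""
    for obj in syspar_ports_objects:
        if obj["subsystem"] == "haem":
            return int(obj["port"])
    return None


def shared_ports(objects_by_domain):
    """Return {port: [domain, ...]} for every haem port used by more than one domain."""
    domains_by_port = {}
    for domain, objects in objects_by_domain.items():
        port = haem_port(objects)
        if port is not None:
            domains_by_port.setdefault(port, []).append(domain)
    return {port: names for port, names in domains_by_port.items() if len(names) > 1}


# Hypothetical data: partition names, the "other" subsystem, and all ports are made up.
SAMPLE = {
    "sp1part1": [{"subsystem": "other", "port": "10100"},
                 {"subsystem": "haem", "port": "10000"}],
    "sp1part2": [{"subsystem": "haem", "port": "10001"}],
}

print(haem_port(SAMPLE["sp1part1"]))
print(shared_ports(SAMPLE))
```

An empty result from shared_ports means that each domain has its own port number, which is the condition required on the control workstation.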

This intra-domain port number is also set in the /etc/services file, using the service name haem.domain_name, where domain_name is the name of the domain. The /etc/services file is updated on all nodes in the domain and on the control workstation. The Event Manager daemon obtains the port number from the /etc/services file during initialization.
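As an illustration of the service naming convention, the following sketch builds the haem.domain_name service name and looks it up in text that follows the usual /etc/services layout (name, then port/protocol). The helper names, the partition names, and the port numbers are assumptions made for the example. The sketch parses a sample string rather than the real file and ignores service aliases.

```python
def haem_service_name(domain_name):
    """Return the intra-domain service name, haem.domain_name."""
    return f"haem.{domain_name}"


def find_port(services_text, service_name, protocol):
    """Return the port for service_name/protocol in /etc/services-style text, or None."""
    for line in services_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) < 2:
            continue
        port, _, proto = fields[1].partition("/")
        if fields[0] == service_name and proto == protocol:
            return int(port)
    return None


# Hypothetical entries: the partition names and port numbers are made up.
SAMPLE_SERVICES = """\
haem.sp1part1   10000/udp   # intra-domain port for partition sp1part1
haem.sp1part2   10001/udp
haemd           10002/tcp   # remote port on the control workstation
"""

print(find_port(SAMPLE_SERVICES, haem_service_name("sp1part1"), "udp"))
```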

Remote port numbers

The Event Management subsystem also uses a single TCP port number to accept communications from clients outside its domain. All Event Manager daemons that run on the control workstation use this port number to accept connections from EM clients outside their respective domains. Each daemon binds to a distinct IP address that is the IP address of its system partition.

The TCP port number is stored in the SP_ports SDR system class. The class contains the daemon, hostname, and port attributes. The class object whose daemon attribute has a value of haemd contains the port number for the Event Management subsystem(s). The hostname attribute is not used in this object.

This remote port number is also set in the /etc/services file on the control workstation, using the service name haemd.

Unix domain sockets

Unix domain sockets are used for communication:

These are connection-oriented sockets. The following socket names are used (domain_name is the name of the domain):

/var/ha/soc/em.clsrv.domain_name
Used by the EMAPI to connect to the Event Manager daemon.

/var/ha/soc/em.rmsrv.domain_name
Used by resource monitors to connect to the Event Manager daemon.

/var/ha/soc/haem/em.RMrmname.rminst.domain_name
Used by the Event Manager daemon to connect to the resource monitor that is specified by rmname and rminst, where rmname is the resource monitor name and rminst is the resource monitor instance number. This resource monitor has a connection type of server.
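The socket naming scheme can be expressed as three small helper functions. This is a sketch, assuming that the rmname, rminst, and domain_name placeholders are substituted literally with no separators other than those shown in the names above. The function names, the domain name sp1part1, and the instance number 0 are hypothetical.

```python
SOC_DIR = "/var/ha/soc"


def client_socket(domain_name):
    """Socket used by the EMAPI to connect to the Event Manager daemon."""
    return f"{SOC_DIR}/em.clsrv.{domain_name}"


def monitor_socket(domain_name):
    """Socket used by resource monitors to connect to the Event Manager daemon."""
    return f"{SOC_DIR}/em.rmsrv.{domain_name}"


def server_monitor_socket(rmname, rminst, domain_name):
    """Socket used by the daemon to connect to a server type resource monitor instance."""
    return f"{SOC_DIR}/haem/em.RM{rmname}.{rminst}.{domain_name}"


# Hypothetical domain name and instance number, for illustration only.
print(client_socket("sp1part1"))
print(server_monitor_socket("IBM.PSSP.harmld", 0, "sp1part1"))
```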

Configuration data

Configuration information for the Event Management subsystem is stored in the SDR as objects in the following classes:

EM_Resource_Variable
Contains the definitions of all resource variables. The resources with which these variables are associated are implied in the definitions.

EM_Resource_ID
Contains the definitions of all of the resource IDs that are associated with the resources that are specified in the EM_Resource_Variable class.

EM_Structured_Byte_String
Contains the definitions of the structured fields for all resource variables that have a data type of structured byte string (SBS).

EM_Resource_Class
Contains the definitions for the resource classes with which each resource variable is associated.

EM_Resource_Monitor
Contains the definitions for all resource monitors.

For details about the Event Management SDR classes and attributes, see Event Management Programming Guide and Reference.

These SDR classes are partitioned classes. Therefore, each Event Management subsystem domain has its own configuration data. This permits a different set of resources to be monitored in each domain, as appropriate for the workload within the domain.

The Event Management SDR objects are loaded from the default configuration data when the Event Management subsystem is added to a domain using the control script.

There are several attributes that you, as the system administrator, may wish to change. They are:

For more information, see Changing configuration data in the SDR.

If resource monitors supplied by other IBM products or by third parties are installed on the SP system, you must load the configuration data supplied with the resource monitors into the SDR. For more information, see Loading non-PSSP configuration data into the SDR.

The configuration utility (haemcfg)

The Event Management configuration utility is contained in the executable file /usr/sbin/rsct/bin/haemcfg. This utility is normally run on the control workstation in the environment of a system partition, that is, the value of the SP_NAME environment variable determines the partitioned SDR classes that are referenced. If SP_NAME is not set, the default system partition is assumed. Recall that a domain is equivalent to a system partition.

The haemcfg command converts the data stored in the Event Management SDR classes to a binary format called the Event Management Configuration Database (EMCDB). The EMCDB is placed in the staging file called /spdata/sys1/ha/cfg/em.domain_name.cdb, where domain_name is the name of the domain in which the utility is run.

A version string that uniquely identifies the EMCDB is placed both in the EMCDB itself and in the Syspar partitioned SDR class. The version string is set as the value of the haem_cdb_version attribute in the object whose syspar_name attribute value matches the current system partition. The version string is used to ensure that all of the Event Manager daemons and resource monitors that are running within the domain are using the same copy of the EMCDB.

If a previous copy of the em.domain_name.cdb file exists, it is renamed to /spdata/sys1/ha/cfg/em.domain_name.cdb.ssss,nnnn,v, where ssss,nnnn,v is the version string of the previous copy.
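The renaming of a previous staging file can be pictured as follows. This is a minimal sketch of the naming rule only, not of haemcfg itself. The version string is treated as an opaque value, the function names are hypothetical, and the example runs in a temporary directory with a made-up domain name and version string instead of /spdata/sys1/ha/cfg.

```python
import os
import tempfile


def staging_path(domain_name, cfg_dir="/spdata/sys1/ha/cfg"):
    """Return the staging file name, em.domain_name.cdb, in cfg_dir."""
    return os.path.join(cfg_dir, f"em.{domain_name}.cdb")


def preserve_previous(domain_name, previous_version, cfg_dir="/spdata/sys1/ha/cfg"):
    """Rename an existing staging file by appending its version string.

    Returns the new path, or None if there was no previous copy.
    """
    current = staging_path(domain_name, cfg_dir)
    if not os.path.exists(current):
        return None
    saved = f"{current}.{previous_version}"
    os.rename(current, saved)
    return saved


# Demonstration in a scratch directory; "sp1part1" and "1000,2000,0" are made up.
with tempfile.TemporaryDirectory() as demo_dir:
    open(staging_path("sp1part1", demo_dir), "wb").close()
    saved_path = preserve_previous("sp1part1", "1000,2000,0", demo_dir)
    print(os.path.basename(saved_path))
```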

The haemcfg command is automatically invoked by the control script when the Event Management subsystem is added to a domain.

For details on the haemcfg command, see PSSP Command and Technical Reference.

The configuration database (EMCDB)

The Event Management Configuration Database (EMCDB) is produced by the haemcfg command from the information in the SDR. The format of the EMCDB is designed to permit quick loading of the database by the Event Manager daemon and the RMAPI. It also contains configuration data in an optimized format to minimize the amount of data that must be sent between Event Manager daemons and between an Event Manager daemon and its resource monitors.

When the SDR data is compiled, the EMCDB is placed in a staging file. When the Event Manager daemon on a node or the control workstation initializes, it automatically copies the EMCDB from the staging file to a run-time file on the node or the control workstation. The run-time file is called /etc/ha/cfg/em.domain_name.cdb, where domain_name is the name of the domain.

For more information, see Reading the EMCDB.

The control script (haemctrl)

The Event Management control script is contained in the executable file /usr/sbin/rsct/bin/haemctrl. This script is normally invoked by the syspar_ctrl script, which provides an interface to all of the system partition-sensitive subsystems.

If necessary, you can invoke haemctrl directly from the command line. Note that before you invoke haemctrl, you must ensure that the SP_NAME environment variable is set to the appropriate system partition name.

For more information about haemctrl and syspar_ctrl, see PSSP Command and Technical Reference.

The purpose of the haemctrl command is to add (configure) the Event Management subsystem to a domain. It can also be used to remove the subsystem from a domain, start the subsystem, stop it, and clean the subsystem from all domains.

For more information, see Configuring Event Management.

The configuration load utility (haemloadcfg)

The configuration load utility is contained in the executable file /usr/sbin/rsct/install/bin/haemloadcfg. The purpose of this program is to load the default configuration data into the SDR. Normally, it is invoked by the haemctrl script, but you can also use it to load additional configuration data into the SDR. Note that before you invoke haemloadcfg, you must ensure that the SP_NAME environment variable is set to the appropriate system partition name.

By default, this utility does not replace existing objects in the SDR. Input data is matched with existing objects based on key attributes. The key attributes for each of the Event Management SDR classes are:

SDR Class                    Key Attributes

EM_Resource_Variable         rvName
EM_Resource_ID               riResource_name, riElement_name
EM_Structured_Byte_String    sbsVariable_name, sbsField_name
EM_Resource_Class            rcClass
EM_Resource_Monitor          rmName

Note that the way in which the haemloadcfg command handles existing SDR objects is different from the way in which the SDRCreateObjects command handles them. The SDRCreateObjects command creates a new object as long as the attributes, taken as a group, are unique.
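The difference between the two matching rules can be sketched as follows. The code models SDR objects as Python dictionaries and is not the implementation of either command. The key attribute names come from the table above, while the function names and the example_attr attribute with its values are hypothetical.

```python
KEY_ATTRIBUTES = {
    "EM_Resource_Variable": ("rvName",),
    "EM_Resource_ID": ("riResource_name", "riElement_name"),
    "EM_Structured_Byte_String": ("sbsVariable_name", "sbsField_name"),
    "EM_Resource_Class": ("rcClass",),
    "EM_Resource_Monitor": ("rmName",),
}


def load_by_key(sdr_class, existing, incoming):
    """Default haemloadcfg-style rule: skip input whose key attributes match an existing object."""
    keys = KEY_ATTRIBUTES[sdr_class]
    seen = {tuple(obj[k] for k in keys) for obj in existing}
    result = list(existing)
    for obj in incoming:
        key = tuple(obj[k] for k in keys)
        if key not in seen:
            seen.add(key)
            result.append(obj)
    return result


def load_by_all_attributes(existing, incoming):
    """SDRCreateObjects-style rule: add an object when its attributes, as a group, are unique."""
    seen = {tuple(sorted(obj.items())) for obj in existing}
    result = list(existing)
    for obj in incoming:
        signature = tuple(sorted(obj.items()))
        if signature not in seen:
            seen.add(signature)
            result.append(obj)
    return result


# example_attr and its values are hypothetical; rmName "aixos" appears in this chapter.
existing = [{"rmName": "aixos", "example_attr": "old"}]
incoming = [{"rmName": "aixos", "example_attr": "new"}]

print(len(load_by_key("EM_Resource_Monitor", existing, incoming)))
print(len(load_by_all_attributes(existing, incoming)))
```

With the key-based rule the input object is skipped because an object named aixos already exists. With the whole-object rule a second object is created because one attribute value differs.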

The haemloadcfg command is actually a shell script that invokes the /usr/sbin/rsct/bin/loadsdr program. The purpose of this program is to batch load any number of objects into SDR classes. However, the loadsdr command cannot load arbitrary SDR classes. It can load only the Event Management SDR classes.

In this release of the Event Management subsystem, the EM_Resource_ID SDR class replaces the EM_Instance_Vector class from prior releases. In the EM_Resource_Variable SDR class, the rvExpression and rvIndex_element attributes replace the rvPredicate and rvIndex_vector attributes, respectively. The loadsdr program automatically migrates SDR data that may be present from prior releases to the new forms.
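The attribute renaming in the EM_Resource_Variable class can be illustrated with a simple mapping. This sketch covers only the two attribute name changes stated above. It does not represent how loadsdr performs the migration, and any conversion of attribute values or of the EM_Instance_Vector class is outside its scope. The function name and the sample object are hypothetical.

```python
RENAMED_ATTRIBUTES = {
    "rvPredicate": "rvExpression",
    "rvIndex_vector": "rvIndex_element",
}


def migrate_resource_variable(obj):
    """Return a copy of obj with prior-release attribute names replaced by the new names."""
    return {RENAMED_ATTRIBUTES.get(name, name): value for name, value in obj.items()}


# Hypothetical object: the attribute values are made up for illustration.
old_object = {
    "rvName": "Example.Variable",
    "rvPredicate": "X == 0",
    "rvIndex_vector": "ExampleElement",
}

print(migrate_resource_variable(old_object))
```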

The default configuration data (haemloadlist)

The default Event Management configuration data for PSSP is supplied in the /usr/sbin/rsct/install/config/haemloadlist file. The data in this file is loaded into each system partition to which the Event Management subsystem is added by the haemctrl script, using the haemloadcfg command.

Files and directories

The Event Management subsystem uses the following directories:

The /var/ha/lck directory (lock files)

In the /var/ha/lck/haem directory, the em.RMrmname.domain_name file is used to manage one or more running instances of a resource monitor. In this file name, rmname is the name of the resource monitor and domain_name is the name of the domain.

The em.RMrmname.rminstSHM.domain_name file in the /var/ha/lck directory is used by the Event Manager daemon to manage a shared memory segment used by the daemon and the resource monitor instance specified by rmname and rminst, where rmname is the resource monitor name and rminst is the resource monitor instance number. The file contains the shared memory segment ID.

The em.haemd.domain_name file in the /var/ha/lck directory is used to ensure a single running instance of an Event Manager daemon in a domain.

The em.RMrmnameSPMI file in the /var/ha/lck directory is used by the RMAPI to create a shared memory key when the resource monitor attaches to the shared memory segment managed by the SPMI library.

The /var/ha/log directory (log files)

In the /var/ha/log directory, the em.trace.domain_name file contains trace output from the Event Manager daemon. In the file name, domain_name is the name of the domain.

The em.msgtrace.domain_name file contains message trace output from the Event Manager daemon.

The em.default.domain_name file contains any error messages from the Event Manager daemon that cannot be written to the AIX Error Log. Normally, all daemon error messages are written to the AIX Error Log.

In addition to error messages that cannot be written to the AIX Error Log, the /var/ha/log/em.default.domain_name file also contains error messages that result from repetitive operational errors. Therefore, both the AIX Error Log and the /var/ha/log/em.default.domain_name file must be examined when performing problem determination on the Event Management subsystem.

The size of the /var/ha/log/em.default.domain_name file is examined every two minutes. If the size of the file exceeds 256K, the file is renamed to /var/ha/log/em.default.domain_name.last and a new default file is created. No more than two copies of this file are kept: the "current" em.default.domain_name file and the "last" file.
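The size check and rename described above follow a common two-file rotation pattern, which can be sketched as follows. This is an illustration of the pattern, not the daemon's code. It assumes that 256K means 256 x 1024 bytes, the function name is hypothetical, and the example runs against a temporary directory rather than /var/ha/log.

```python
import os
import tempfile

MAX_BYTES = 256 * 1024  # assumption: "256K" is read as 256 x 1024 bytes


def rotate_if_large(log_path, max_bytes=MAX_BYTES):
    """Rename log_path to log_path.last if it exceeds max_bytes, then create a new empty file.

    Returns True if the file was rotated.
    """
    try:
        size = os.path.getsize(log_path)
    except FileNotFoundError:
        return False
    if size <= max_bytes:
        return False
    os.replace(log_path, log_path + ".last")  # overwrites any earlier ".last" copy
    open(log_path, "w").close()
    return True


# Demonstration in a scratch directory with a made-up domain name.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_log = os.path.join(demo_dir, "em.default.sp1part1")
    with open(demo_log, "wb") as f:
        f.write(b"x" * (MAX_BYTES + 1))
    print(rotate_if_large(demo_log))
    print(sorted(os.listdir(demo_dir)))
```

A caller that mirrors the behavior in the text would invoke rotate_if_large once every two minutes. Because the rename overwrites any existing .last file, no more than two copies exist at any time.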

The /var/ha/log/em.default.domain_name.n file is used to record additional error information if the Event Manager daemon cannot start a resource monitor. The error information includes the name of the resource monitor that could not be started.

The /var/ha/run directory (daemon working files)

In the /var/ha/run directory, a directory called haem.domain_name is created, where domain_name is the domain name. This directory is the current working directory for the Event Manager daemon. If the Event Manager daemon abnormally terminates, the core dump file is placed in this directory. Whenever the Event Manager daemon starts, it renames any core file to core.last.

If the Event Manager daemon detects an error in the shared memory segment used by the daemon and a resource monitor instance, it creates a dump file containing the first 4096 bytes of the shared memory segment in this directory. The dump file is named rzdump.RMrmname.rminst.time where rmname is the resource monitor name, rminst is the resource monitor instance, and time is the time stamp.

This directory also contains the working directories of any resource monitors that are started by the Event Manager daemon. Each directory has the name of its resource monitor.

Finally, this directory contains the Rcache_local and Rcache_remote directories. These directories contain the registration cache for local and remote client registration requests, respectively.

For each EM client that establishes communications with the Event Manager daemon, a cache subdirectory is created in the appropriate registration cache directory. This subdirectory has a name of the form p,s,n,q,i, where:

Within each subdirectory are several files with numeric names. Each file contains a registration request. The file name is the event command group ID of the group of events within the request.

The /etc/ha/cfg directory (run-time EMCDB and related files)

The /etc/ha/cfg directory contains the run-time EMCDB file for each system partition. The file is called /etc/ha/cfg/em.domain_name.cdb, where domain_name is the name of the domain.

In addition, there is a file called em.domain_name.cdb_vers, where domain_name is the name of the domain. This file is created by the Event Manager daemon and contains the version string of the EMCDB file used by the daemon. This file is also used by the RMAPI to ensure that it is using the same EMCDB as the Event Manager daemon.

