The Event Management subsystem consists of the following components:
The sections that follow contain more details about each of these components.
The Event Manager daemon is contained in the executable file /usr/sbin/rsct/bin/haemd. This daemon runs on each node of a domain. If a node is a member of more than one domain, there is one executing copy of the daemon per domain.
Note that on the SP system, the domain of the Event Management subsystem is an SP system partition. The domain name is the same as the system partition name. The control workstation is regarded as node 0 in each system partition and, therefore, in each domain. Unless otherwise stated, a reference to the "Event Management subsystem" is a reference to the Event Management subsystem in a single domain.
EM clients communicate with a single Event Manager daemon in the domain that contains the resources in which they are interested, as follows:
An EM client that is running on a node outside the domain (SP system partition) that contains the resources in which it is interested uses a TCP socket to communicate with that domain's Event Manager daemon on the control workstation. An EM client that is running on a pSeries or RS/6000 server not connected to the SP Ethernet admin LAN is considered to be on a node outside of any domain. In both of these cases, the Event Management client must specify, explicitly or implicitly, the name of a domain that contains the resources of interest. For an Event Management client to run on any pSeries or RS/6000 server, the RSCT file sets ...clients.rte and ...clients.sp and the PSSP file set ssp.clients must be installed.
Once an EM client has established communications with an Event Manager daemon, the EM client has access to all of the resources in that domain.
Resource monitors communicate with the Event Manager daemon, through the RMAPI, using Unix domain sockets. Resource monitors are always on the same node as the Event Manager daemon. Note that if a node is a member of multiple domains, just as there is one executing copy of the daemon per domain, there is also one executing copy of each resource monitor per domain.
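The RMAPI message format is not described here, so the following is only a minimal Python sketch of the transport the paragraph names: a connection-oriented (SOCK_STREAM) Unix domain socket between a daemon-like listener and a monitor-like client on the same node. The socket path and the one-message exchange are hypothetical stand-ins, not the RMAPI protocol or its real socket names.

```python
import os
import socket
import tempfile
import threading


def serve_once(path: str, ready: threading.Event) -> None:
    """Daemon-side stand-in: accept one connection on a Unix stream socket and reply once."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(path)
        srv.listen(1)
        ready.set()  # tell the caller the socket is ready to accept
        conn, _ = srv.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"ack:" + data)


def monitor_send(path: str, payload: bytes) -> bytes:
    """Resource-monitor-side stand-in: connect, send one message, return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.connect(path)
        cli.sendall(payload)
        return cli.recv(1024)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        sock_path = os.path.join(d, "em.example.sock")  # hypothetical socket name
        ready = threading.Event()
        t = threading.Thread(target=serve_once, args=(sock_path, ready))
        t.start()
        ready.wait()
        print(monitor_send(sock_path, b"rv-update"))
        t.join()
```

Because both ends must share a file system path, this kind of socket only works between processes on the same node, which matches the requirement that resource monitors run on the same node as their daemon.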
The Event Management Application Programming Interface (EMAPI) is a shared library that an EM client uses to obtain the services of the Event Management subsystem. This shared library is supplied in two versions: one for programs that are not thread-safe and one for thread-safe programs. These libraries are referenced by the following path names:
These path names are actually symbolic links to /usr/sbin/rsct/lib/libha_em.a and /usr/sbin/rsct/lib/libha_em_r.a, respectively. The symbolic links are placed in /usr/lib for ease of use. For serviceability, the actual libraries are placed in the /usr/sbin/rsct/lib directory. These libraries are supplied as shared libraries, also for serviceability.
For details on the EMAPI, see Event Management Programming Guide and Reference.
The Resource Monitor Application Programming Interface (RMAPI) is a shared library that a resource monitor uses to supply resource variables to the Event Manager daemon. This shared library is supplied only in a version that is not thread-safe. This library is referenced through the path name
which is a symbolic link to /usr/sbin/rsct/lib/libha_rr.a. The symbolic link is placed in /usr/lib for ease of use. For serviceability, the actual library is placed in the /usr/sbin/rsct/lib directory. This library is supplied as a shared library, also for serviceability.
For details on the RMAPI, see Event Management Programming Guide and Reference.
A resource monitor conforms to one of the following programming models:
Note that a server type resource monitor may have multiple executing copies or instances. Each resource monitor instance may supply different instances of the same named resource variable or may supply different resource variables.
In addition, the Event Manager daemon itself performs some resource monitoring function. This function is considered to be a resource monitor with a connection type of internal.
The RSCT software supplies the following resource monitors:
There are also several internal resource monitors incorporated in the Event Manager daemon itself:
the Host Membership and Adapter Membership states. The Event Manager daemon obtains this information directly from the Group Services subsystem by subscribing to the HostMembership, enMembership, and cssMembership system groups.
the information in the host_responds and switch_responds SDR classes. These resource variables are provided for compatibility with earlier releases of PSSP.
The Event Management subsystem uses several types of communications:
For communication between Event Manager daemons within a domain, the Event Management subsystem uses a single UDP port number. This port number is recorded in the Syspar_ports SDR class. The class contains two attributes: subsystem and port. The class object whose subsystem attribute has a value of haem contains the port number for the Event Management subsystem. Note that the Syspar_ports class is a partitioned SDR class, that is, there is a set of objects for each system partition. When you configure the Event Management subsystem for a particular domain, you use the SDR data that corresponds to that domain, that is, system partition.
The Event Management port number is stored in the SDR so that each node fetches it from the SDR when the Event Management subsystem is configured on that node. This ensures that all Event Manager daemons in the same domain use the same port number. Because an Event Manager daemon from each domain runs on the control workstation, the Event Management subsystem in each domain must have a unique port number.
This intra-domain port number is also set in the /etc/services file, using the service name haem.domain_name, where domain_name is the name of the domain. The /etc/services file is updated on all nodes in the domain and on the control workstation. The Event Manager daemon obtains the port number from the /etc/services file during initialization.
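To make the naming and uniqueness rules concrete, here is a small Python sketch that reads /etc/services-style text, collects every haem.domain_name entry, and checks that no two domains share a port. It assumes the entries are registered with protocol udp (the port is described above as a UDP port), and the sample port numbers are hypothetical. A real program would more likely ask the system resolver (for example the C library's getservbyname()); parsing text here just keeps the sketch runnable without those entries installed.

```python
def parse_services(text: str) -> dict[tuple[str, str], int]:
    """Map (service_name, protocol) -> port for /etc/services-style text."""
    entries: dict[tuple[str, str], int] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue  # blank, comment-only, or malformed line
        port, proto = fields[1].split("/", 1)
        entries[(fields[0], proto)] = int(port)
    return entries


def haem_udp_ports(text: str) -> dict[str, int]:
    """Return {domain_name: port} for every 'haem.domain_name' UDP entry."""
    prefix = "haem."
    return {
        name[len(prefix):]: port
        for (name, proto), port in parse_services(text).items()
        if proto == "udp" and name.startswith(prefix)
    }


def ports_are_unique(ports: dict[str, int]) -> bool:
    """True if no two domains share a port (required on the control workstation)."""
    return len(set(ports.values())) == len(ports)


if __name__ == "__main__":
    # Hypothetical entries; real port numbers come from the Syspar_ports SDR class.
    sample = """\
haem.sp1    10003/udp
haem.sp2    10004/udp   # second partition
haemd       10005/tcp
"""
    ports = haem_udp_ports(sample)
    print(ports)                    # {'sp1': 10003, 'sp2': 10004}
    print(ports_are_unique(ports))  # True
```

Note that the haemd entry is ignored: it has no "haem." prefix and is a TCP service.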
The Event Management subsystem also uses a single TCP port number to accept communications from clients outside its domain. All Event Manager daemons that run on the control workstation use this port number to accept connections from EM clients outside their respective domains. Each daemon binds to a distinct IP address that is the IP address of its system partition.
The TCP port number is stored in the SP_ports SDR system class. The class contains the daemon, hostname, and port attributes. The class object whose daemon attribute has a value of haemd contains the port number for the Event Management subsystem(s). The hostname attribute is not used in this object.
This remote port number is also set in the /etc/services file on the control workstation, using the service name haemd.
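EM clients normally reach the daemon through the EMAPI, which handles this step for them. As a minimal Python sketch of only the underlying TCP connection: look up the port registered under the service name haemd, then connect to the IP address of the target system partition on the control workstation. The fallback port parameter and the function names are hypothetical, and nothing exchanged after the connection is shown.

```python
import socket
from typing import Optional


def haemd_port(default: Optional[int] = None) -> int:
    """Look up the remote-client TCP port registered under the service name 'haemd'."""
    try:
        return socket.getservbyname("haemd", "tcp")
    except OSError:
        if default is None:
            raise  # no /etc/services entry and no fallback supplied
        return default


def connect_to_domain(partition_addr: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection to the daemon bound to a system partition's IP address."""
    return socket.create_connection((partition_addr, port), timeout=timeout)
```

Connecting to the partition's own IP address, rather than to any address on the control workstation, is what selects the correct daemon when several daemons share the same port.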
Unix domain sockets are used for communication:
These are connection-oriented sockets. The following socket names are used (domain_name is the name of the domain):
Configuration information for the Event Management subsystem is stored in the SDR as objects in the following classes:
For details about the Event Management SDR classes and attributes, see Event Management Programming Guide and Reference.
These SDR classes are partitioned classes. Therefore, each Event Management subsystem domain has its own configuration data. This permits a different set of resources to be monitored in each domain, as appropriate for the workload within the domain.
The Event Management SDR objects are loaded from the default configuration data when the Event Management subsystem is added to a domain using the control script.
There are several attributes that you, as the system administrator, may wish to change. They are:
Change this attribute to add or replace a default expression for a resource variable.
Change this attribute to increase or decrease the time between observations of a resource variable value by the Event Manager daemon.
For more information, see Changing configuration data in the SDR.
If resource monitors supplied by other IBM products or by third parties are installed on the SP system, you must load the configuration data supplied with the resource monitors into the SDR. For more information, see Loading non-PSSP configuration data into the SDR.
The Event Management configuration utility is contained in the executable file /usr/sbin/rsct/bin/haemcfg. This utility is normally run on the control workstation in the environment of a system partition, that is, the value of the SP_NAME environment variable determines the partitioned SDR classes that are referenced. If SP_NAME is not set, the default system partition is assumed. Recall that a domain is equivalent to a system partition.
The haemcfg command converts the data stored in the Event Management SDR classes to a binary format called the Event Management Configuration Database (EMCDB). The EMCDB is placed in the staging file called /spdata/sys1/ha/cfg/em.domain_name.cdb, where domain_name is the name of the domain in which the utility is run.
A version string that uniquely identifies the EMCDB is placed both in the EMCDB itself and in the Syspar partitioned SDR class. The version string is set as the value of the haem_cdb_version attribute in the object whose syspar_name attribute value matches the current system partition. The version string is used to ensure that all of the Event Manager daemons and resource monitors that are running within the domain are using the same copy of the EMCDB.
If a previous copy of the em.domain_name.cdb file exists, it is renamed to /spdata/sys1/ha/cfg/em.domain_name.cdb.ssss,nnnn,v, where ssss,nnnn,v is the version string of the previous copy.
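The staging-file handling can be sketched in Python as follows. This is not haemcfg itself: the caller is assumed to supply the previous copy's version string (haemcfg presumably reads it from the old EMCDB, whose format is not described here), and the EMCDB contents are treated as opaque bytes.

```python
import os
from typing import Optional

STAGING_DIR = "/spdata/sys1/ha/cfg"


def stage_emcdb(domain: str, new_cdb: bytes, prev_version: Optional[str],
                cfg_dir: str = STAGING_DIR) -> str:
    """Write a new staging EMCDB, first renaming any previous copy by its version string."""
    staging = os.path.join(cfg_dir, f"em.{domain}.cdb")
    if os.path.exists(staging):
        if prev_version is None:
            raise ValueError("a previous EMCDB exists but its version string was not given")
        # em.domain_name.cdb -> em.domain_name.cdb.<version string of the previous copy>
        os.rename(staging, f"{staging}.{prev_version}")
    with open(staging, "wb") as f:
        f.write(new_cdb)
    return staging
```

Naming the saved copy by its version string keeps every older EMCDB distinguishable, so an earlier database can be identified by the same string that was recorded in the Syspar class.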
The haemcfg command is automatically invoked by the control script when the Event Management subsystem is added to a domain.
For details on the haemcfg command, see PSSP Command and Technical Reference.
The Event Management Configuration Database (EMCDB) is produced by the haemcfg command from the information in the SDR. The format of the EMCDB is designed to permit quick loading of the database by the Event Manager daemon and the RMAPI. It also contains configuration data in an optimized format to minimize the amount of data that must be sent between Event Manager daemons and between an Event Manager daemon and its resource monitors.
When the SDR data is compiled, the EMCDB is placed in a staging file. When the Event Manager daemon on a node or the control workstation initializes, it automatically copies the EMCDB from the staging file to a run-time file on the node or the control workstation. The run-time file is called /etc/ha/cfg/em.domain_name.cdb, where domain_name is the name of the domain.
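The copy step at daemon initialization might look like the Python sketch below. It assumes the staging file is readable at its staging path from the node, which this section does not describe. Writing to a temporary file and then swapping it into place is a design choice of this sketch, so that a reader never sees a partially copied run-time file; the manual does not say the daemon does this.

```python
import os
import shutil

STAGING_DIR = "/spdata/sys1/ha/cfg"
RUNTIME_DIR = "/etc/ha/cfg"


def install_runtime_cdb(domain: str, staging_dir: str = STAGING_DIR,
                        runtime_dir: str = RUNTIME_DIR) -> str:
    """Copy em.<domain>.cdb from the staging directory into the run-time directory."""
    name = f"em.{domain}.cdb"
    src = os.path.join(staging_dir, name)
    dst = os.path.join(runtime_dir, name)
    tmp = dst + ".tmp"
    shutil.copyfile(src, tmp)   # copy next to the target first...
    os.replace(tmp, dst)        # ...then swap it in, so readers never see a partial file
    return dst
```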
For more information, see Reading the EMCDB.
The Event Management control script is contained in the executable file /usr/sbin/rsct/bin/haemctrl. This script is normally invoked by the syspar_ctrl script, which provides an interface to all of the system partition-sensitive subsystems.
If necessary, you can invoke haemctrl directly from the command line. Note that before you invoke haemctrl, you must ensure that the SP_NAME environment variable is set to the appropriate system partition name.
For more information about haemctrl and syspar_ctrl, see PSSP Command and Technical Reference.
The purpose of the haemctrl command is to add (configure) the Event Management subsystem to a domain. It can also be used to remove the subsystem from a domain, start the subsystem, stop it, and clean the subsystem from all domains.
For more information, see Configuring Event Management.
The configuration load utility is contained in the executable file /usr/sbin/rsct/install/bin/haemloadcfg. The purpose of this program is to load the default configuration data into the SDR. Normally, it is invoked by the haemctrl script, but you can also use it to load additional configuration data into the SDR. Note that before you invoke haemloadcfg, you must ensure that the SP_NAME environment variable is set to the appropriate system partition name.
By default, this utility does not replace existing objects in the SDR. Input data is matched with existing objects based on key attributes. The key attributes for each of the Event Management SDR classes are:
Note that the way in which the haemloadcfg command handles existing SDR objects is different from the way in which the SDRCreateObjects command handles them. The SDRCreateObjects command creates a new object as long as the attributes, taken as a group, are unique.
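The difference between the two matching rules can be modeled in a few lines of Python. Each SDR object is represented as a dictionary of attribute values; the attribute names in the usage example ("name" as the key attribute, "interval" as a non-key attribute) are hypothetical stand-ins, since the real key attributes for each class are defined by the Event Management SDR classes.

```python
from typing import Iterable

SdrObject = dict[str, str]


def load_by_key(existing: list[SdrObject], incoming: Iterable[SdrObject],
                key_attrs: tuple[str, ...]) -> list[SdrObject]:
    """haemloadcfg-style: skip an incoming object whose key attributes match an existing one."""
    seen = {tuple(o.get(a) for a in key_attrs) for o in existing}
    result = list(existing)
    for obj in incoming:
        key = tuple(obj.get(a) for a in key_attrs)
        if key not in seen:
            result.append(obj)
            seen.add(key)
    return result


def load_by_all_attrs(existing: list[SdrObject],
                      incoming: Iterable[SdrObject]) -> list[SdrObject]:
    """SDRCreateObjects-style: add an object unless all its attributes match an existing one."""
    seen = {tuple(sorted(o.items())) for o in existing}
    result = list(existing)
    for obj in incoming:
        sig = tuple(sorted(obj.items()))
        if sig not in seen:
            result.append(obj)
            seen.add(sig)
    return result


if __name__ == "__main__":
    # Hypothetical class data: "name" stands in for a class's key attribute.
    current = [{"name": "rv1", "interval": "15"}]
    update = [{"name": "rv1", "interval": "30"}]
    print(len(load_by_key(current, update, ("name",))))  # 1: existing object kept
    print(len(load_by_all_attrs(current, update)))       # 2: a second object is created
```

Under the key-attribute rule, changing a non-key value in the input file does not alter the SDR; under the all-attributes rule, the same input produces a near-duplicate object.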
The haemloadcfg command is actually a shell script that invokes the /usr/sbin/rsct/bin/loadsdr program. The purpose of this program is to batch load any number of objects into SDR classes. However, the loadsdr command cannot load arbitrary SDR classes. It can load only the Event Management SDR classes.
In this release of Event Management subsystem, the EM_Resource_ID SDR class replaces the EM_Instance_Vector class from prior releases. In the EM_Resource_Variable SDR class, the rvExpression and rvIndex_element attributes replace the rvPredicate and rvIndex_vector attributes, respectively. The loadsdr program automatically migrates SDR data that may be present from prior releases to the new forms.
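The name changes listed above amount to a simple mapping. The Python sketch below renames only the class and attribute names; whether loadsdr also converts attribute values into a new form is not described here, so this sketch copies values unchanged and should not be read as the actual migration logic.

```python
CLASS_RENAMES = {"EM_Instance_Vector": "EM_Resource_ID"}
RV_ATTR_RENAMES = {
    "rvPredicate": "rvExpression",
    "rvIndex_vector": "rvIndex_element",
}


def migrate_class_name(name: str) -> str:
    """Map a prior-release SDR class name to its current name (unchanged if not renamed)."""
    return CLASS_RENAMES.get(name, name)


def migrate_rv_object(obj: dict[str, str]) -> dict[str, str]:
    """Rename prior-release EM_Resource_Variable attributes; values are copied unchanged."""
    return {RV_ATTR_RENAMES.get(attr, attr): value for attr, value in obj.items()}
```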
The default Event Management configuration data for PSSP is supplied in the /usr/sbin/rsct/install/config/haemloadlist file. The data in this file is loaded into each system partition to which the Event Management subsystem is added by the haemctrl script, using the haemloadcfg command.
The Event Management subsystem uses the following directories:
In the /var/ha/lck/haem directory, the em.RMrmname.domain_name file is used to manage one or more running instances of a resource monitor. In this file name, rmname is the name of the resource monitor and domain_name is the name of the domain.
The em.RMrmname.rminstSHM.domain_name file in the /var/ha/lck directory is used by the Event Manager daemon to manage the shared memory segment that it shares with the resource monitor instance identified by rmname (the resource monitor name) and rminst (the resource monitor instance number). The file contains the shared memory segment ID.
The em.haemd.domain_name file in the /var/ha/lck directory is used to ensure a single running instance of an Event Manager daemon in a domain.
The em.RMrmnameSPMI file in the /var/ha/lck directory is used by the RMAPI to create a shared memory key when the resource monitor attaches to the shared memory segment managed by the SPMI library.
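The lock file names above follow fixed patterns, and one common Unix way to use a file to guarantee a single running instance is an exclusive, non-blocking advisory lock. The Python sketch below builds the names and shows that technique. The manual does not say how the daemon actually uses em.haemd.domain_name, so the flock approach is an assumption, and the rminst placement in the shared memory file name is a literal reading of the pattern as printed.

```python
import fcntl
import os
from typing import Optional

LCK_DIR = "/var/ha/lck"


def daemon_lock_file(domain: str) -> str:
    return os.path.join(LCK_DIR, f"em.haemd.{domain}")


def rm_lock_file(rmname: str, domain: str) -> str:
    return os.path.join(LCK_DIR, "haem", f"em.RM{rmname}.{domain}")


def rm_shm_file(rmname: str, rminst: int, domain: str) -> str:
    # Literal reading of the pattern em.RMrmname.rminstSHM.domain_name (assumption).
    return os.path.join(LCK_DIR, f"em.RM{rmname}.{rminst}SHM.{domain}")


def rm_spmi_file(rmname: str) -> str:
    return os.path.join(LCK_DIR, f"em.RM{rmname}SPMI")


def acquire_single_instance(lock_path: str) -> Optional[int]:
    """Take an exclusive, non-blocking advisory lock; return the fd, or None if already held."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None  # another instance for this domain holds the lock
    return fd  # keep this fd open for the process lifetime to keep the lock
```

A second daemon for the same domain gets None and can exit; the lock is released automatically if the holder terminates, so a stale file does not block a restart.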
In the /var/ha/log directory, the em.trace.domain_name file contains trace output from the Event Manager daemon. In the file name, domain_name is the name of the domain.
The em.msgtrace.domain_name file contains message trace output from the Event Manager daemon.
The em.default.domain_name file contains any error messages from the Event Manager daemon that cannot be written to the AIX Error Log. Normally, all daemon error messages are written to the AIX Error Log.
In addition to error messages that cannot be written to the AIX Error Log, the /var/ha/log/em.default.domain_name file also contains error messages that result from repetitive operational errors. Therefore, both the AIX Error Log and the /var/ha/log/em.default.domain_name file must be examined when performing problem determination on the Event Management subsystem.
The size of the /var/ha/log/em.default.domain_name file is examined every two minutes. If the size of the file exceeds 256K, the file is renamed to /var/ha/log/em.default.domain_name.last and a new default file is created. No more than two copies of this file are kept: the "current" em.default.domain_name file and the "last" file.
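The size check and rename can be sketched in Python as shown here. "256K" is read as 256 KiB (262,144 bytes), which is an assumption. The sketch only performs one check; a process that keeps the log open for writing would also have to reopen it after the rename, which is not shown.

```python
import os

LIMIT_BYTES = 256 * 1024  # "256K", read here as 256 KiB (assumption)


def rotate_default_log(path: str, limit: int = LIMIT_BYTES) -> bool:
    """If path is larger than limit, move it to path + '.last' and start an empty file."""
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return False
    if size <= limit:
        return False
    os.replace(path, path + ".last")  # overwrites the old .last, so at most two copies exist
    open(path, "w").close()           # fresh, empty em.default.domain_name
    return True

# A caller would run this on a timer, for example:
#   while True: rotate_default_log(p); time.sleep(120)
```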
The /var/ha/log/em.default.domain_name.n file is used to record additional error information if the Event Manager daemon cannot start a resource monitor. The error information includes the name of the resource monitor that could not be started.
In the /var/ha/run directory, a directory called haem.domain_name is created, where domain_name is the domain name. This directory is the current working directory for the Event Manager daemon. If the Event Manager daemon abnormally terminates, the core dump file is placed in this directory. Whenever the Event Manager daemon starts, it renames any core file to core.last.
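As a minimal Python sketch of the start-up step described above, the following builds the working directory name and renames a leftover core file. Replacing an older core.last is an assumption; the text only says that any core file is renamed.

```python
import os

RUN_BASE = "/var/ha/run"


def run_dir(domain: str) -> str:
    """Working directory of the Event Manager daemon for a domain."""
    return os.path.join(RUN_BASE, f"haem.{domain}")


def preserve_core(directory: str) -> bool:
    """At startup, rename a leftover 'core' to 'core.last' (replacing any older core.last)."""
    core = os.path.join(directory, "core")
    if not os.path.exists(core):
        return False
    os.replace(core, os.path.join(directory, "core.last"))
    return True
```

Renaming at start-up keeps the most recent dump available for problem determination even if the restarted daemon later fails and writes a new core file.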
If the Event Manager daemon detects an error in the shared memory segment used by the daemon and a resource monitor instance, it creates a dump file containing the first 4096 bytes of the shared memory segment in this directory. The dump file is named rzdump.RMrmname.rminst.time where rmname is the resource monitor name, rminst is the resource monitor instance, and time is the time stamp.
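The dump naming and truncation rule can be sketched in Python as follows. The segment is passed in as bytes already copied out of shared memory (attaching to the segment is not shown), and the time stamp format is an assumption because the text does not specify it.

```python
import os
import time
from typing import Optional

DUMP_BYTES = 4096  # only the first 4096 bytes of the segment are dumped


def write_rzdump(directory: str, rmname: str, rminst: int, segment: bytes,
                 stamp: Optional[str] = None) -> str:
    """Write rzdump.RM<rmname>.<rminst>.<time> holding the segment's first 4096 bytes."""
    if stamp is None:
        stamp = time.strftime("%Y%m%d%H%M%S")  # time stamp format is an assumption
    path = os.path.join(directory, f"rzdump.RM{rmname}.{rminst}.{stamp}")
    with open(path, "wb") as f:
        f.write(segment[:DUMP_BYTES])
    return path
```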
This directory also contains the working directories of any resource monitors that are started by the Event Manager daemon. Each directory has the name of its resource monitor.
Finally, this directory contains the Rcache_local and Rcache_remote directories. These directories contain the registration cache for local and remote client registration requests, respectively.
For each EM client that establishes communications with the Event Manager daemon, a cache subdirectory is created in the appropriate registration cache directory. This subdirectory has a name of the form p,s,n,q,i, where:
Within each subdirectory are several files with numeric names. Each file contains a registration request. The file name is the event command group ID of the group of events within the request.
The /etc/ha/cfg directory contains the run-time EMCDB file for each system partition. The file is called /etc/ha/cfg/em.domain_name.cdb, where domain_name is the name of the domain.
In addition, there is a file called em.domain_name.cdb_vers, where domain_name is the name of the domain. This file is created by the Event Manager daemon and contains the version string of the EMCDB file used by the daemon. This file is also used by the RMAPI to ensure that it is using the same EMCDB as the Event Manager daemon.
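A version check of this kind can be sketched in Python as follows. It assumes the em.domain_name.cdb_vers file holds only the version string as text, which the section does not state, and the function names are hypothetical rather than part of the RMAPI.

```python
import os

CFG_DIR = "/etc/ha/cfg"


def vers_file(domain: str, cfg_dir: str = CFG_DIR) -> str:
    return os.path.join(cfg_dir, f"em.{domain}.cdb_vers")


def same_emcdb_as_daemon(domain: str, my_version: str, cfg_dir: str = CFG_DIR) -> bool:
    """Compare the version string of the EMCDB this process loaded with the daemon's."""
    try:
        with open(vers_file(domain, cfg_dir), encoding="ascii") as f:
            daemon_version = f.read().strip()  # assumes the file holds only the string
    except FileNotFoundError:
        return False  # daemon has not written the file yet
    return daemon_version == my_version
```

A mismatch means the resource monitor and the daemon would interpret the optimized EMCDB data differently, so treating a missing file as "not matching" errs on the safe side.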