This chapter provides information about monitoring exceptions with the exmon program.
The Exception Monitoring program, exmon, works with the filtd daemon described in "Data Reduction and Alarms with filtd". The filtd daemon can generate exception packets based upon alarm conditions defined in the filtd configuration file. The xmservd daemon, on the host where filtd runs, forwards the exception packets from filtd to any remote Data Consumer program that subscribes to exception packets.
The exmon program is designed to provide a convenient facility for monitoring exceptions as they are detected on remote hosts. It does so by allowing its user to register subscriptions for exception packets from all or selected hosts in the network and to monitor the exception status in a graphical window. Like any other remote Data-Consumer program, exmon uses the Remote Statistics Interface (RSi) to communicate with remote hosts. Unlike traditional Data-Consumer programs, however, exmon consumes exception packets rather than data feed packets containing metrics values.
The exmon program is started from the command line and accepts no command line arguments. When started, it first broadcasts an invitation to all xmservd daemons in the network, as specified in the file $HOME/Rsi.hosts. For details about this file, refer to "How Data-Suppliers are Identified". From the responses to the invitation, exmon displays a list of host names from which the user selects the hosts of interest. After the user selects the hosts, the exmon main window is displayed.
Note: With the Local Performance Toolbox option, available with Version 2.2 or later, only the local host is used. If the Local Performance Toolbox is installed, any discussion of remote hosts applies only to the local host.
At the top of the main window is a menu bar. Below the menu bar is a graphical window that contains the monitoring window.
The layout of the exmon monitoring window is that of a matrix with eleven column headings and a variable number of rows, each with an identifier. Row identifiers are host names that are displayed in a column along the left side of the matrix. Only hosts that have reported exceptions are shown. Initially, you see no host names.
The time stamp carried by the last received exception for each host is displayed beside the host name. Along the top of the matrix the 11 column headings each represent one of the 11 possible "exception severity codes" that can be associated with an exception when it is defined to filtd. Even though the filtd daemon refers to the codes as describing severity, in reality they are no more than identifiers. Because of this, the exmon program refers to the code of an exception as the exception identifier (or exception ID).
As exception packets arrive from remote systems, the matrix starts to become populated. When an exception arrives, the first action is to determine the host name of the host that generated the exception. If the host name is not already displayed in the matrix, the existing rows are moved down and the new host name line is added as the top row. If the host name is already displayed, it is moved to the top of the matrix, pushing all others down.
Next, exmon determines the matrix cell at the intersection of the row for the host that generated the exception (now the top row) and the column for the exception's ID. This cell contains the count of exceptions with this exception ID from that host. The count is incremented by one, so that it always contains the total number of such exceptions received during the lifetime of this execution of exmon.
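The row and cell bookkeeping described above can be sketched in Python. This is a minimal model under stated assumptions, not exmon's actual implementation: the class name ExceptionMatrix, its methods, and the string time stamps are hypothetical, and exception IDs are assumed to be the integers 0 through 10 that correspond to the 11 columns.

```python
from collections import OrderedDict
from typing import List, Tuple

NUM_IDS = 11  # exception IDs 0 through 10, one matrix column each


class ExceptionMatrix:
    """Hypothetical model of the exmon matrix; the first row is the most recent host."""

    def __init__(self) -> None:
        # host name -> (time stamp of last exception, per-ID exception counts)
        self.rows: "OrderedDict[str, Tuple[str, List[int]]]" = OrderedDict()

    def record(self, host: str, exc_id: int, timestamp: str) -> None:
        if not 0 <= exc_id < NUM_IDS:
            raise ValueError(f"exception ID out of range: {exc_id}")
        _, counts = self.rows.get(host, (timestamp, [0] * NUM_IDS))
        counts[exc_id] += 1  # total for this host and ID since exmon started
        self.rows[host] = (timestamp, counts)
        self.rows.move_to_end(host, last=False)  # new or existing host moves to the top

    def hosts(self) -> List[str]:
        return list(self.rows)
```

Recording exceptions from snook, then banana, then snook again leaves snook in the top row, with its latest time stamp and a count of 2 in the column for that exception ID.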
Matrix cells are colored according to the coloring scheme specified in the exmon X resource file and the exception count in each matrix cell. This is described in "Coloring Scheme".
The menu bar in the main window defines the following pulldown menus:
Note: With the Performance Toolbox Local feature, available with Version 2.2 or later, only the local host is available for selection.
As exmon receives exception packets from hosts, it adds a line to the host exception log files for those hosts. An exception log is created for each host that sent exception packets to exmon.
Exception logs are named after the host that sent the exception packets and have the extension ".log". Exception logs are kept in the directory $HOME/FiltLogs for each exmon user. For example, the exception log file of user donnau that contains exceptions from host snook would be /home/donnau/FiltLogs/snook.log.
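The naming rule for exception logs amounts to a simple path construction. The sketch below assumes POSIX path separators; the function name exception_log_path is hypothetical.

```python
import os


def exception_log_path(home: str, host: str) -> str:
    """Return the exception log file for `host` under the user's home directory.

    Follows the rule in the text: $HOME/FiltLogs/<host>.log
    """
    return os.path.join(home, "FiltLogs", host + ".log")
```

For the example in the text, exception_log_path("/home/donnau", "snook") gives /home/donnau/FiltLogs/snook.log.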
If exmon is active more than once for the same user at the same time, each active exmon adds exceptions to the log files of hosts as exceptions are received. As a result, the log files contain multiple copies of the same exception packets. To avoid this, either never run multiple copies of exmon at the same time, or make sure each copy monitors different hosts.
To view an exception log file, select the Read Log entry from the File pulldown menu of the main window. A file selection window is displayed with a list of all available exception log files; select the exception log to view from this list. The Show File window is then displayed. This window shows the entries in the exception log file in five columns:
The exception log entries are shown in a scrollable list in which one or more lines can be selected. Selected lines can be marked either as read or for deletion, using the menu bar at the top of the Show File window. This menu bar has two pulldown menus:
To delete an exception log file, select the Delete Log entry from the File pulldown menu of the main window. A file selection window is displayed with a list of all available exception log files; select the exception log to delete from this list.
After selecting the exception log, click the OK button to delete the exception log file and refresh the selection window. To close the file selection window, click the Cancel button.
Selection of hosts to monitor for exceptions is done from a selection window. This window has a list of hosts to choose from and, at the top, a button with the text Click here when selection complete. Similar-looking selection lists are used to add hosts to those already being monitored and to remove hosts that should no longer be monitored.
Note: With the Performance Toolbox Local feature, available with Version 2.2 or later, only the local host is available for selection.
In either case, select the hosts from the displayed list by moving the mouse pointer to the first host you want, then press mouse button 1 and move the mouse while holding the button down. When all hosts you want are selected, release the mouse button. If you want to select hosts that are not adjacent in the list, press and hold the Ctrl key on the keyboard while you select. When all hosts are selected, release the key. After all selections have been made, click with mouse button 1 on the button at the top of the window.
The host selection window is first displayed when you start exmon. From this first window you select the initial list of hosts to monitor. When you later want to add or delete hosts, use the Hosts pulldown menu in the main window and select Add Hosts or Delete Hosts as required.
The user is presented with a selection list of all hosts on the network that have responded to invitations from exmon. If a host you want to monitor does not show up in the initial selection window, the host might be down, its xmservd daemon might not be able to be activated, or its xmservd daemon might not respond quickly enough. In any of these cases, the Add Hosts selection window can be used later to refresh the list of hosts, which might then include the host you want.
To stop monitoring one or more hosts, select them in the Delete Hosts window. This window shows a list of the hosts that are currently being monitored by exmon. When you select a host from the list and click the Click here when selection complete button, no more exceptions are received from that host. The selected host is also deleted from the matrix in the main window, and its exception log file is closed but retained.
Every five minutes, exmon attempts to reconnect to any host that has dropped out for any reason.
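The periodic reconnection can be pictured as a loop that sleeps for five minutes and then retries every dropped host. In this sketch, try_connect is a hypothetical stand-in for whatever RSi call re-establishes a connection; the real exmon code is not described in this chapter. The sleep function is a parameter so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Set

RECONNECT_INTERVAL = 5 * 60  # seconds, the five-minute interval from the text


def reconnect_pass(dropped: Set[str], try_connect: Callable[[str], bool]) -> None:
    """Try every dropped host once; remove the ones that reconnect."""
    for host in sorted(dropped):  # sorted() makes a copy, so discarding is safe
        if try_connect(host):
            dropped.discard(host)


def reconnect_loop(dropped: Set[str], try_connect: Callable[[str], bool],
                   rounds: int, interval: float = RECONNECT_INTERVAL,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Run `rounds` reconnect passes, waiting `interval` seconds before each."""
    for _ in range(rounds):
        sleep(interval)
        reconnect_pass(dropped, try_connect)
```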
When exmon generates the list of hostnames available for monitoring, it uses the RSi API to solicit responses from hosts on the network. Every response to this solicitation carries the uname (simple, unqualified hostname) of the responding host and the IP address of the host. Because hosts identify themselves by their uname, a host with multiple network adapters returns the same hostname once for each adapter that received the solicitation. This requires special action in exmon to prevent exceptions from a host with multiple adapters from being counted multiple times.
In addition to hosts with multiple adapters, the use of the uname can also cause confusion when two or more hosts have the same uname. This can happen because the hosts exist in different IP domains, or because a host had its simple hostname changed but not its uname.
The exmon program provides ways to handle these conflicts. The method used depends on how nameservice is implemented in your network. The term "nameservice" refers to the mechanism used to find the IP address for a hostname, or the hostname for an IP address, in your network; you can query it with the host command. Nameservice is normally provided by a domain name server, by Network Information Services (NIS, formerly Yellow Pages), by the /etc/hosts file, or by a combination of these.
The following example shows how different situations are handled. It is based on a network with the following hosts, some of which have multiple network adapters:
Host uname | IP address | Hostname from nameservice |
banana | 9.11.22.33 | banana.west.com |
banana | 130.4.5.6 | undefined |
banana | 192.10.10.10 | banana.west.com |
banana | 7.7.7.7 | banana.east.com |
orange | 9.11.22.44 | lemon.west.com |
orange | 9.11.22.55 | orange.west.com |
lime | 7.8.8.8 | lime.east.com |
lime | 9.11.22.66 | lime.west.com |
Assuming all these hosts and all their network interfaces receive and respond to solicitation, the list of hosts presented by exmon will look like this:
130.4.5.6 (banana)   130.4.5.6
banana.east.com      7.7.7.7
banana.west.com      9.11.22.33
lemon.west.com       9.11.22.44
lime.east.com        7.8.8.8
lime.west.com        9.11.22.66
orange.west.com      9.11.22.55
For the host with uname banana, the adapter with IP address 192.10.10.10 does not show up, because the nameservice returns exactly the same hostname for it as for the adapter with IP address 9.11.22.33. This generally means that both network adapters are on the same machine. The IP address shown for such a hostname is the one the nameservice returns when you look up the hostname. The adapter whose IP address is not defined in the nameservice shows up as its IP address followed by the uname of the host. Finally, the adapter that has a different host and domain name assigned to it (banana.east.com) shows up on a line by itself.
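The following Python sketch reproduces the host list for the example network from the (uname, IP address) responses. It is an illustration that matches the displayed list, not exmon's actual algorithm. The dictionaries standing in for nameservice lookups are assumptions, including that a forward lookup of banana.west.com returns 9.11.22.33; entries are sorted alphabetically to mirror the example listing.

```python
from typing import Dict, List, Optional, Tuple


def build_host_list(responses: List[Tuple[str, str]],
                    reverse_ns: Dict[str, str],
                    forward_ns: Dict[str, str]) -> List[Tuple[str, str]]:
    """Collapse (uname, ip) responses into (display name, ip) entries.

    reverse_ns: IP address -> hostname from the nameservice (missing = undefined)
    forward_ns: hostname -> IP address the nameservice returns for that name
    """
    entries: Dict[str, str] = {}
    for uname, ip in responses:
        name: Optional[str] = reverse_ns.get(ip)
        if name is None:
            # Address unknown to the nameservice: show the IP plus the uname.
            entries[f"{ip} ({uname})"] = ip
        else:
            # Adapters that share a hostname collapse into one entry, listed
            # with the address the nameservice returns for that hostname.
            entries[name] = forward_ns.get(name, ip)
    return sorted(entries.items())


# Data from the example network in the text.
RESPONSES = [
    ("banana", "9.11.22.33"), ("banana", "130.4.5.6"),
    ("banana", "192.10.10.10"), ("banana", "7.7.7.7"),
    ("orange", "9.11.22.44"), ("orange", "9.11.22.55"),
    ("lime", "7.8.8.8"), ("lime", "9.11.22.66"),
]
REVERSE_NS = {
    "9.11.22.33": "banana.west.com", "192.10.10.10": "banana.west.com",
    "7.7.7.7": "banana.east.com", "9.11.22.44": "lemon.west.com",
    "9.11.22.55": "orange.west.com", "7.8.8.8": "lime.east.com",
    "9.11.22.66": "lime.west.com",
}
# Assumption: a forward lookup of banana.west.com returns 9.11.22.33.
FORWARD_NS = {"banana.west.com": "9.11.22.33"}
```

Calling build_host_list(RESPONSES, REVERSE_NS, FORWARD_NS) yields the seven entries shown in the list above, with 192.10.10.10 folded into banana.west.com.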
Because all of banana's adapters received the solicitation and responded to it, exception packets are sent once on each interface. Consequently, exmon receives four exception packets for each exception that occurs on banana. The one received from IP address 192.10.10.10 is discarded because exmon has determined that this host is the same as the host with IP address 9.11.22.33. All other exception packets are processed by exmon, provided the user has elected to monitor the corresponding host entry.
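Filtering of duplicate exception packets can be sketched the same way. Assuming a hypothetical table that maps each responding IP address to its host list entry, and another that records the address chosen for each entry, a packet is counted only if it arrives from the chosen address of an entry the user monitors. The function and table names are illustrative only.

```python
from typing import Dict, Optional, Set


def accept_packet(src_ip: str,
                  ip_to_entry: Dict[str, str],
                  entry_ip: Dict[str, str],
                  monitored: Set[str]) -> Optional[str]:
    """Return the host entry an exception packet counts toward, or None to discard it."""
    entry = ip_to_entry.get(src_ip)
    if entry is None:
        return None  # sender is not in the host list
    if entry_ip[entry] != src_ip:
        return None  # duplicate copy from a secondary adapter of the same host
    if entry not in monitored:
        return None  # the user did not elect to monitor this entry
    return entry
```

With the example network, a packet from 192.10.10.10 maps to banana.west.com but is discarded, because that entry's chosen address is 9.11.22.33.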
In all the other cases, exmon detects that the seemingly identical hosts (judging from their unames) are in fact separate entities and treats them as such. In general, unless there are compelling reasons to assign different hostnames to multiple interfaces in one host, it is better to explicitly assign the same hostname to all interfaces. Whether or not you select different hostnames for each adapter, you should register each IP address with the nameservice.
As an exception monitoring program, exmon alerts its user when an exception is reported from one of the monitored hosts. When this happens, the user might want to investigate the cause of the exception.
To facilitate this, the exmon program lets the user execute commands for any of the hosts that reported exceptions. Commands are defined in a configuration file, exmon.cf, as explained below. To execute a command for a host, move the pointer to one of the host names displayed in the exmon monitoring window and click the left mouse button. A dialog box opens and displays the commands the user can execute for that host. To execute a command, select one of the lines and then click the OK button.
The list of commands displayed in the dialog box is kept in the user-definable exmon.cf file. The exmon program first looks for this file in the user's home directory. If the file is not found there, exmon looks in the /etc/perf directory, and then in the /usr/lpp/perfmgr directory. If the file cannot be found in any of these directories, the program lacks important information and might terminate or provide reduced function. A sample configuration file is provided in the /usr/lpp/perfmgr directory.
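The lookup order for exmon.cf can be expressed as a short search. This is a sketch; the function names are hypothetical, and the exists parameter (defaulting to os.path.isfile) lets the search be tried against an imaginary file system.

```python
import os
from typing import Callable, List, Optional


def exmon_cf_search_path(home: str) -> List[str]:
    """Directories searched for exmon.cf, in the order the text gives."""
    return [home, "/etc/perf", "/usr/lpp/perfmgr"]


def find_exmon_cf(dirs: List[str],
                  exists: Callable[[str], bool] = os.path.isfile) -> Optional[str]:
    """Return the first exmon.cf found in `dirs`, or None if there is none."""
    for d in dirs:
        candidate = os.path.join(d, "exmon.cf")
        if exists(candidate):
            return candidate
    return None
```

A None result corresponds to the case where exmon lacks its command list and may run with reduced function.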
The configuration file may contain comment lines that begin with the character # (number sign). All other lines are assumed to define commands and have the general format:
command_name:command_line
The configuration file may contain a maximum of 50 commands. Each command line should end with an ampersand (&) character, which runs the command in the background. If a command is not run in the background, exmon is suspended until the command terminates.
The following is an example of an exmon configuration file:
#Sample Exmon Config File
#Display processes for host with 3dmon
3dmon Processes:3dmon -h%s -cProc &
#chmon for selected host
chmon:aixterm -n chmon -T chmon -e ksh -c "(sleep 1; chmon %s)" &
#Start xmperf for host
xmperf:xmperf -h%s &
#xmpeek statistics
xmpeek:aixterm -n xmpeek -T xmpeek -e ksh -c "(xmpeek %s; read)" &
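A reader for this file format might look like the following Python sketch. It skips comment lines and, as an added assumption, blank lines; splits each remaining line at the first colon, so colons inside the command line are preserved; and enforces the 50-command limit. The text does not explain the %s placeholder; the expand helper assumes, as the sample suggests, that it is replaced by the selected host name. All function names are hypothetical.

```python
from typing import List, Tuple

MAX_COMMANDS = 50


def parse_exmon_cf(text: str) -> List[Tuple[str, str]]:
    """Parse exmon.cf text into (command_name, command_line) pairs."""
    commands: List[Tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line (assumed allowed) or comment
        name, sep, cmd = line.partition(":")  # split at the first colon only
        if not sep:
            raise ValueError(f"missing ':' in line: {raw!r}")
        commands.append((name.strip(), cmd.strip()))
    if len(commands) > MAX_COMMANDS:
        raise ValueError(f"{len(commands)} commands; at most {MAX_COMMANDS} allowed")
    return commands


def expand(command_line: str, host: str) -> str:
    # Assumption: %s stands for the selected host name, as the sample suggests.
    return command_line.replace("%s", host)


def runs_in_background(command_line: str) -> bool:
    # Without a trailing &, exmon would be suspended until the command ends.
    return command_line.rstrip().endswith("&")
```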
The X Windows resource file for exmon defines resources you can use to enhance the appearance and behavior of exmon and is installed as /usr/lib/X11/app-defaults/EXmon . The following behavior can be modified with the exmon resource file.
The user decides whether each exception ID is assigned its own color, or whether the cells of all exception IDs follow a common coloring scheme that assigns colors according to the number of exceptions received in each cell. To select the first scheme, set the X resource RangeDisplay to false; to select the second, set it to true. If the resource is set to true, the coloring scheme described in "Value Ranges" is in effect; otherwise the coloring scheme described in "Exception Colors" is used.
Each exception identifier has a different color associated with it. Colors are defined through the X resources ValueColor0 through ValueColor10.
The user can configure exmon to change the color of a cell when the number of exceptions for an individual exception ID falls within a certain range. The color ranges apply across all exception IDs. Two limits, defined with the X resources ValueRange1 and ValueRange2, divide the exception counts into three ranges. The X resources RangeColor1 through RangeColor3 define the colors used when the exception count in a grid cell falls within each range. To illustrate, consider this example where a user defines the following resources:
*RangeDisplay: true
*ValueRange1: 5
*ValueRange2: 10
*RangeColor1: green
*RangeColor2: yellow
*RangeColor3: red
With this resource setting, if the number of exceptions for a particular exception ID is less than or equal to 5, then the color displayed is green. When the number of exceptions is within the range 6-10, then the color displayed is yellow. When the number of exceptions has increased beyond 10, then the color displayed is red.
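The two coloring schemes can be summarized as a single color-selection function. The sketch below uses a simplified reader for lines of the form *Name: value, which is not a full X resource manager parser, and assumes that RangeDisplay defaults to false when it is not set. The function names are hypothetical.

```python
from typing import Dict


def parse_resources(text: str) -> Dict[str, str]:
    """Simplified reader for lines like '*Name: value' (not a full Xrm parser)."""
    res: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # '!' starts a comment in X resource files
        name, sep, value = line.partition(":")
        if sep:
            res[name.strip().lstrip("*")] = value.strip()
    return res


def cell_color(exc_id: int, count: int, res: Dict[str, str]) -> str:
    """Pick the color of the matrix cell for exception ID `exc_id` holding `count`."""
    if res.get("RangeDisplay", "false").strip().lower() == "true":
        # Value Ranges scheme: same limits for every exception ID.
        low, high = int(res["ValueRange1"]), int(res["ValueRange2"])
        if count <= low:
            return res["RangeColor1"]
        if count <= high:
            return res["RangeColor2"]
        return res["RangeColor3"]
    # Exception Colors scheme: one color per exception ID.
    return res[f"ValueColor{exc_id}"]
```

With the example resources, counts of 5, 6, 10, and 11 yield green, yellow, yellow, and red, matching the ranges described above.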
The filtd daemon allows you to define a severity code for each alarm. These codes are sent as part of any generated exception packets. Because the codes are only identifiers, they can be defined to represent anything you want. By setting the ExceptionText0 through ExceptionText10 resources, you can define the column headings displayed in the exmon main window for the exception IDs. A maximum of 7 characters is displayed for each column heading.
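The column headings can be derived from the ExceptionText resources as in this sketch, which truncates each heading to the 7 displayed characters. What exmon shows for an ID whose ExceptionText resource is unset is not stated here; the sketch assumes the ID number, and the function name is hypothetical.

```python
from typing import Dict, List

NUM_IDS = 11      # exception IDs 0 through 10
MAX_HEADING = 7   # only the first 7 characters of a heading are displayed


def column_headings(res: Dict[str, str]) -> List[str]:
    """Headings for exception IDs 0-10 from ExceptionText0..ExceptionText10.

    Assumption: an ID without an ExceptionText resource is labeled with its number.
    """
    return [res.get(f"ExceptionText{i}", str(i))[:MAX_HEADING] for i in range(NUM_IDS)]
```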