This chapter describes planning for monitoring your system, tracking system events, and using and modifying the predefined scripts, expressions, commands, and responses packaged with this application. These predefined elements and how to use them are described in detail in Components Provided for Monitoring.
First, select for monitoring the conditions that would have a severe impact on your system if they occurred. These conditions might include:
When you have determined the resource problems you want to monitor, review the predefined conditions and identify the conditions you want to use. They can be displayed in the Monitoring application in the Conditions plug-in. See Getting Started with the Monitoring Application and Components Provided for Monitoring for more information. Use the lscondition command to view all conditions from the command line.
If a predefined condition deviates from your requirements in some way, you can edit it, use it as a template to create your own customized condition, or create your own condition.
Once you have selected conditions for monitoring, you need to plan one or more responses to be taken for the event and the optional rearm event.
A set of predefined responses comes installed with your system (see Predefined Responses). Each response has one or more actions associated with it. Each action can be activated or deactivated to fit your particular work environment and schedule.
The Responses plug-in's Response Properties dialog has an Add Action option where you can choose predefined actions for responding to an event or rearm event. You can also specify a command or a script to be run as an action.
The predefined actions are:
You can also write your own commands to correct or mitigate conditions using the Run program option.
You might specify different actions based on when the monitored condition occurs. For example, you could have one set of actions to respond to a condition during working hours and another set to respond to a condition on nights and weekends. To be notified of events when you are away from your terminal, your actions must include e-mail, broadcasting, or logging. You can also view events in the Events plug-in.
This section describes how to start using the Monitoring application. You can use the Web-based System Manager or the command line to do the following:
The following scenario demonstrates how to start monitoring your system using the Web-based System Manager user interface. Once you are familiar with the procedure, you can create customized conditions and responses, and take advantage of the more advanced Monitoring features.
To associate a response with a condition using the Web-based System Manager, do the following:
To associate a response with a condition from the command line, use the following commands (see the man pages or the AIX Commands Reference for detailed usage information):
To view events in the Web-based System Manager, select the Events plug-in. You can also view and sort events using the Audit Log.
To view events from the command line, use the lsaudrec command to view the audit log. You can use the logevent predefined script to log events to a file.
To stop monitoring from Web-based System Manager, select the condition and click the Stop Monitoring toolbar icon.
To stop monitoring from the command line, use the stopcondresp command.
The following scenarios demonstrate the most frequently performed monitoring tasks from the command line interface. See the AIX Commands Reference at http://www.ibm.com/servers/aix/library or the command man pages for detailed usage information.
Name | Monitoring Status |
---|---|
"/tmp space used" | "Not monitored" |
"var space used" | "Monitored" |
(more conditions listed...)
Name |
---|
"Critical notification" |
"Warning notification" |
"Informational notification" |
"Remove unwanted files" |
(more responses listed...)
Condition | Response | State |
---|---|---|
"/tmp space used" | "Broadcast event on-shift" | Active |
"/tmp space used" | "E-mail root anytime" | Not Active |
startcondresp "/tmp space used" "critical notification" "remove unwanted files"
stopcondresp "/tmp space used"
To stop monitoring the condition "/tmp space used" with a specific response, "critical notification," enter:
stopcondresp "/tmp space used" "critical notification"
mkcondition -c "/tmp space used" "my test condition"
lsaudrec
For a complete list of predefined commands, scripts, and utilities, see Predefined Commands, Scripts, Utilities, and Files.
For viewing information about monitoring events, rearm events, actions, and errors that have occurred, see the following table:
Available Information | How to Find It |
---|---|
The Monitoring application's Events plug-in lets you view a list of all the events, rearm events, and error events that have occurred during the current Web-based System Manager session. | To view events for your current session: |
You can view a list of current events, which are the most recent events that are currently true for their respective monitored conditions. Note: In the current events view, you see only the latest event that is true for each monitored condition. | To view current events for your current session: |
If you have specified a notification mechanism such as logging as an action for an event, the log file receives an entry each time that action is taken. Entries are logged during the entire period in which the condition is monitored, whether or not Web-based System Manager is running. You can browse the log file without running Web-based System Manager. | To view your log file, use the alog command. |
The Web-based System Manager Session Log contains all Monitoring messages issued during the current Web-based System Manager session that did not require a user response. | To view the Session Log: |
The audit log is a system-wide facility for recording information about the operation of the system. It can include information about normal operation as well as errors. Logging activity occurs independently of Web-based System Manager and continues whether or not a session is active. | To view the Audit Log in Web-based System Manager, use the Audit Log toolbar icon in the Events plug-in. To view the Audit Log from a command line, issue lsaudrec. |
Audit log records include the following:
Administrators can use the audit log to track activity that might otherwise go unnoticed because it is performed by subsystems running in the background. The audit log is accessible from Web-based System Manager or the command line.
To list audit log records, use the Audit Log toolbar icon in the Events plug-in or the lsaudrec command. To remove records, use the Audit Log toolbar icon in the Events plug-in or the rmaudrec command. For details, see the Monitoring application online help or the command man pages. Commands are also documented in the AIX Commands Reference at http://www.ibm.com/servers/aix/library.
You can write your own scripts to use as actions for responses. The AIX Commands Reference contains information about the predefined scripts that are provided with the Event Response resource manager: logevent, notifyevent, and wallevent. (You can also use existing operating system commands and user-written scripts in the definition of an action.)
The logevent, notifyevent, and wallevent scripts are examples of the types of actions that system administrators can use to respond to events. The logevent script appends a formatted string containing the specifics of an event to a user-specified file. Only the latest 65536 bytes are kept in the file. When the file size reaches its maximum, the oldest logged event is overwritten by the newest event. The alog command is used to read the user-specified log file. The notifyevent script captures the event information and sends it via UNIX mail to a specified user ID. The wallevent script broadcasts a message to all users who are logged in.
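The retention rule for the logevent file (keep only the newest 65536 bytes, dropping the oldest events first) can be modeled in a few lines. The following Python sketch is hypothetical: BoundedEventLog is not part of the product, and it models only the retention behavior, not the alog file format.

```python
from collections import deque


class BoundedEventLog:
    """Hypothetical model of the logevent retention rule (not the alog format):
    keep only the newest entries whose total size fits in max_bytes."""

    def __init__(self, max_bytes=65536):
        self.max_bytes = max_bytes
        self.entries = deque()  # oldest entry on the left
        self.size = 0           # total bytes currently retained

    def append(self, entry):
        data = entry.encode("utf-8")
        if len(data) > self.max_bytes:
            raise ValueError("entry is larger than the whole log")
        self.entries.append(data)
        self.size += len(data)
        # When the log is full, the oldest events are dropped to make room.
        while self.size > self.max_bytes:
            self.size -= len(self.entries.popleft())

    def read(self):
        """Return the retained entries, oldest first."""
        return [data.decode("utf-8") for data in self.entries]
```

For example, after logging 20000 short events with the default size, only the most recent ones remain and the retained size never exceeds 65536 bytes.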
For a full description of these scripts, see the man pages or the AIX Commands Reference.
You can use these scripts as-is or treat them as templates by copying and modifying them to create new scripts that suit your needs. For example, to use the wallevent script as a template for a page event command, do the following:
For a command to run in response to an event or a rearm event defined by a condition, the command must be included as an action in an Event Response resource. When you define an Event Response resource, specify the full path name of any script used in an action.
This is set up implicitly for you when you use the Monitoring application as follows:
Test any scripts or commands that you have created or modified before you use them as actions in production.
Once the Event Response resource manager (ERRM) has subscribed to RMC to monitor a condition and that condition occurs, the ERRM executes commands in the user's operating system environment. The Event Response resource contains a list of commands to be executed. Before each command is run, the following environment variables are established for the command to use (see Event Response Resource Manager for a detailed description of the ERRM):
The following data types are represented with this environment variable as a decimal string: CT_INT32, CT_UINT32, CT_INT64, CT_UINT64, CT_FLOAT32, and CT_FLOAT64.
CT_CHAR_PTR is represented as a string for this environment variable.
CT_BINARY_PTR is represented as a hexadecimal string separated by spaces.
CT_SD_PTR is enclosed in square brackets and has individual entries within the SD that are separated by commas. Arrays within an SD are enclosed within braces {}. For example, ["My Resource Name",{1,5,7},{0,9000,20000},{7000,11000,25000}] See the definition of ERRM_SD_DATA_TYPES for an explanation of the data types that these values represent.
(See "Resource Handle" for a definition and an example of a resource handle.)
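A user-written action script can apply the value encoding rules above to turn the attribute value back into typed data. The following Python sketch is hypothetical and rests on assumptions: it assumes the ERRM sets variables named ERRM_TYPE, ERRM_COND_NAME, ERRM_RSRC_NAME, ERRM_DATA_TYPE, and ERRM_VALUE (check the names against Event Response Resource Manager), that SD strings contain only double-quoted strings without embedded quotes and decimal numbers, and it leaves binary values as strings. The helper names parse_sd, decode_value, and format_event are illustrative only.

```python
import os

INTEGER_TYPES = {"CT_INT32", "CT_UINT32", "CT_INT64", "CT_UINT64"}
FLOAT_TYPES = {"CT_FLOAT32", "CT_FLOAT64"}


def parse_sd(text):
    """Parse '["name",{1,5,7},{0,9000}]' into ['name', [1, 5, 7], [0, 9000]].

    Assumption: strings are double-quoted without embedded quotes and
    numbers are decimal; binary elements are not handled.
    """
    pos = 0

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def value():
        nonlocal pos
        skip_spaces()
        if pos >= len(text):
            raise ValueError("unexpected end of SD string")
        ch = text[pos]
        if ch == '"':
            start = pos + 1
            end = text.index('"', start)
            pos = end + 1
            return text[start:end]
        if ch in "[{":  # [ ] encloses the SD, { } encloses an array element
            closer = "]" if ch == "[" else "}"
            pos += 1
            items = []
            skip_spaces()
            if pos < len(text) and text[pos] == closer:
                pos += 1
                return items
            while True:
                items.append(value())
                skip_spaces()
                if pos < len(text) and text[pos] == ",":
                    pos += 1
                elif pos < len(text) and text[pos] == closer:
                    pos += 1
                    return items
                else:
                    raise ValueError(f"expected ',' or {closer!r} at position {pos}")
        # Numeric element: read up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in ",]}" and not text[pos].isspace():
            pos += 1
        token = text[start:pos]
        return float(token) if any(c in token for c in ".eE") else int(token)

    result = value()
    skip_spaces()
    if pos != len(text):
        raise ValueError("trailing characters after SD value")
    return result


def decode_value(data_type, text):
    """Convert a value string according to the encoding rules described above."""
    if data_type in INTEGER_TYPES:
        return int(text)
    if data_type in FLOAT_TYPES:
        return float(text)
    if data_type == "CT_SD_PTR":
        return parse_sd(text)
    # CT_CHAR_PTR, CT_BINARY_PTR (space-separated hex) and others stay as strings.
    return text


def format_event(env):
    """Build a one-line message from the (assumed) ERRM environment variables."""
    value = decode_value(env.get("ERRM_DATA_TYPE", "CT_CHAR_PTR"), env.get("ERRM_VALUE", ""))
    return (f"{env.get('ERRM_TYPE', 'Event')}: {env.get('ERRM_COND_NAME', 'unknown condition')} "
            f"on {env.get('ERRM_RSRC_NAME', 'unknown resource')} value={value!r}")


if __name__ == "__main__":
    print(format_event(os.environ))
```

Passing the environment in as a mapping keeps format_event testable outside the ERRM; when the script runs as an action, it reads os.environ.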
The information in this section is for advanced users who want to:
This section describes the permissible data types and operators, and gives the order of precedence of the operators. RMC uses this expression support both to match a selection string against the persistent attributes of a resource and to evaluate event and rearm expressions.
An expression is similar to a C language statement or the WHERE clause of an SQL query. It is composed of variables, operators, and constants. The C and SQL syntax styles may be intermixed within a single expression. The following table relates RMC terminology to SQL terminology:
RMC | SQL |
---|---|
attribute name | column name |
select string | WHERE clause |
operators | predicates, logical connectives |
resource class | table |
For SQL syntax, the following restrictions apply:
The term variable is used in this context to mean the column name or attribute name in an expression. Variables and constants in an expression may be one of the following data types that are supported by the RMC subsystem:
Symbolic Name | Description |
---|---|
CT_INT32 | Signed 32-bit integer |
CT_UINT32 | Unsigned 32-bit integer |
CT_INT64 | Signed 64-bit integer |
CT_UINT64 | Unsigned 64-bit integer |
CT_FLOAT32 | 32-bit floating point |
CT_FLOAT64 | 64-bit floating point |
CT_CHAR_PTR | Null-terminated string |
CT_BINARY_PTR | Binary data - arbitrary-length block of data |
CT_RSRC_HANDLE_PTR | Resource handle - an identifier for a resource that is unique over space and time (20 bytes) |
In addition to the base data types, aggregates of the base data types may be used as well. The first aggregate data type is similar to a structure in C in that it can contain multiple fields of different data types. This aggregate data type is referred to as structured data (SD). The individual fields in the structured data are referred to as structured data elements or simply elements. Each element of a structured data type may have a different data type, which can be one of the base types in the preceding table or any of the array types described in the following paragraph, except the structured data array.
The second aggregate data type is an array. An array contains zero or more values of the same data type, such as an array of CT_INT32 values. Each array type has an associated enumeration value (for example, CT_INT32_ARRAY or CT_UINT32_ARRAY). Structured data may also be defined as an array, but every entry of the array must then have the same elements.
Literal values can be specified for each of the base data types as follows:
"0xabcd 0x01020304050607090a0b0c0d0e0f1011121314"
"0x4018 0x0001 0x00000000 0x0069684c 0x00519686 0xaf7060fc"
Variable names refer to values that are not part of the expression but are accessed during the execution of the expression. For example, when RMC processes an expression, the variable names are replaced by the corresponding persistent or dynamic attributes of each resource.
Entries of an array may be accessed by specifying a subscript, as in C. The index corresponding to the first element of the array is always 0 (for example, List[2] refers to the third element of the array named List). Only one subscript is allowed. It may be a variable, a constant, or an expression that produces an integer result. A subscripted value may be used wherever the base data type of the array is used. For example, if List is an integer array, then "List[2]+4" produces the sum of 4 and the current value of the third entry of the array.
The elements of a structured data value can be accessed by using the following syntax:
<variable name>.<element name>
For example, a.b
The variable name is the name of the table column or resource attribute, and the element name is the name of the element within the structured data value. Either or both names may be followed by a subscript if the name is an array. For example, a[10].b refers to the element named b of the 11th entry of the structured data array called a. Similarly, a[10].b[3] refers to the fourth element of the array that is an element called b within the same structured data array entry a[10].
Variable names refer to values that are not part of an expression but are accessed during the execution of the expression. When used to select a resource, the variable name is a persistent attribute. When used to generate an event, the variable name is a dynamic attribute. When used to select audit records, the variable name is the name of a field within the audit record.
A variable name is restricted to include only 7-bit ASCII characters that are alphanumeric (a-z, A-Z, 0-9) or the underscore character (_). The name must begin with an alphabetic character. When the expression is used by the RMC subsystem for an event or a rearm event, the name can have a suffix that is the '@' character followed by 'P', which refers to the previous observation.
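The naming rule above translates directly into a regular expression. This Python sketch is a hypothetical checker, not an RMC interface; parse_variable_name and its in_event_expression parameter are illustrative names, and PercentTotUsed is used only as an example attribute name.

```python
import re

# 7-bit ASCII letters, digits, and underscore; the first character must be a letter.
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def parse_variable_name(name, in_event_expression=True):
    """Return (attribute_name, uses_previous_value) or raise ValueError.

    The "@P" suffix (previous observation) is accepted only when the
    expression is an event or rearm expression.
    """
    base, previous = name, False
    if name.endswith("@P"):
        if not in_event_expression:
            raise ValueError(f"{name!r}: '@P' is only valid in event or rearm expressions")
        base, previous = name[:-2], True
    if _NAME.fullmatch(base) is None:
        raise ValueError(f"{name!r} is not a valid variable name")
    return base, previous
```

For example, parse_variable_name("PercentTotUsed@P") returns ("PercentTotUsed", True), while "_tmp" and "9lives" are rejected because they do not begin with a letter.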
Constants and variables may be combined by an operator to produce a result that in turn may be used with another operator. The resulting data type of the expression must be a scalar integer or floating-point value. If the result is zero, the expression is considered to be FALSE; otherwise, it is TRUE.
The operators that can be used in expressions are summarized in the following table:
Operator | Description | Left Data Types | Right Data Types | Example | Notes |
---|---|---|---|---|---|
+ | Addition | Integer, float | Integer, float | "1+2" results in 3 | None |
- | Subtraction | Integer, float | Integer, float | "1.0-2.0" results in -1.0 | None |
* | Multiplication | Integer, float | Integer, float | "2*3" results in 6 | None |
/ | Division | Integer, float | Integer, float | "6/3" results in 2 | None |
- | Unary minus | None | Integer, float | "-abc" | None |
+ | Unary plus | None | Integer, float | "+abc" | None |
.. | Range | Integers | Integers | "1..3" results in 1,2,3 | Shorthand for all integers between and including the two values |
% | Modulo | Integers | Integers | "10%2" results in 0 | None |
\| | Bitwise OR | Integers | Integers | "2\|4" results in 6 | None |
& | Bitwise AND | Integers | Integers | "3&2" results in 2 | None |
~ | Bitwise complement | None | Integers | "~0x0000ffff" results in 0xffff0000 | None |
^ | Exclusive OR | Integers | Integers | "0x0000aaaa^0x0000ffff" results in 0x00005555 | None |
>> | Right shift | Integers | Integers | "0x0fff>>4" results in 0x00ff | None |
<< | Left shift | Integers | Integers | "0x0ffff<<4" results in 0xffff0 | None |
== | Equality | All but SDs | All but SDs | "2==2" results in 1 | Result is true (1) or false (0) |
!= | Inequality | All but SDs | All but SDs | "2!=2" results in 0 | Result is true (1) or false (0) |
> | Greater than | Integer, float | Integer, float | "2>3" results in 0 | Result is true (1) or false (0) |
>= | Greater than or equal | Integer, float | Integer, float | "4>=3" results in 1 | Result is true (1) or false (0) |
< | Less than | Integer, float | Integer, float | "4<3" results in 0 | Result is true (1) or false (0) |
<= | Less than or equal | Integer, float | Integer, float | "2<=3" results in 1 | Result is true (1) or false (0) |
=~ | Pattern match | Strings | Strings | "abc"=~"a.*" results in 1 | Right operand is interpreted as an extended regular expression |
!~ | Not pattern match | Strings | Strings | "abc"!~"a.*" results in 0 | Right operand is interpreted as an extended regular expression |
=? | SQL pattern match | Strings | Strings | "abc"=? "a%" results in 1 | Right operand is interpreted as a SQL pattern |
!? | Not SQL pattern match | Strings | Strings | "abc"!? "a%" results in 0 | Right operand is interpreted as a SQL pattern |
\|< | Contains any | All but SDs | All but SDs | "{1..5}\|<{2,10}" results in 1 | Result is true (1) if left operand contains any value from right operand |
>< | Contains none | All but SDs | All but SDs | "{1..5}><{2,10}" results in 0 | Result is true (1) if left operand contains no value from right operand |
&< | Contains all | All but SDs | All but SDs | "{1..5}&<{2,10}" results in 0 | Result is true (1) if left operand contains all values from right operand |
\|\| | Logical OR | Integers | Integers | "(1<2)\|\|(2>4)" results in 1 | Result is true (1) or false (0) |
&& | Logical AND | Integers | Integers | "(1<2)&&(2>4)" results in 0 | Result is true (1) or false (0) |
! | Logical NOT | None | Integers | "!(2==4)" results in 1 | Result is true (1) or false (0) |
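The range and containment operators behave like set tests on integer lists. The following Python sketch models them under stated assumptions: expand_list, contains_any, contains_none, and contains_all are hypothetical helpers, not RMC interfaces, and list literals are assumed to use the brace or parenthesis forms shown in the examples in this section, with ".." including both end points.

```python
def expand_list(literal):
    """Expand an integer list literal such as "{1,3,5..7}" into a Python list."""
    body = literal.strip()
    if body[:1] in ("{", "(") and body[-1:] in ("}", ")"):
        body = body[1:-1]
    values = []
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue
        if ".." in item:
            # Range: all integers between and including the two values.
            low, high = (int(part) for part in item.split(".."))
            values.extend(range(low, high + 1))
        else:
            values.append(int(item))
    return values


def contains_any(left, right):   # |<  (IN)
    members = set(left)
    return int(any(v in members for v in right))


def contains_none(left, right):  # ><  (NOT IN)
    return 1 - contains_any(left, right)


def contains_all(left, right):   # &<
    members = set(left)
    return int(all(v in members for v in right))
```

With left = expand_list("{1..5}") and right = expand_list("{2,10}"), contains_any returns 1 because 2 is present, while contains_none and contains_all both return 0, matching the examples in the table.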
When integers of different signs or sizes are operands of an operator, standard C-style casting is performed implicitly. When an expression with multiple operators is evaluated, the operations are performed in the order defined by the precedence of the operators. The default precedence can be overridden by enclosing the portion or portions of the expression to be evaluated first in parentheses (). For example, in the expression "1+2*3", multiplication is normally performed before addition to produce a result of 7. To evaluate the addition operator first, use parentheses as follows: "(1+2)*3". This produces a result of 9. The default precedence rules are shown in the following table, from highest to lowest precedence. All operators in the same table cell have equal precedence.
Operators | Description |
---|---|
. | Structured data element separator |
~ | Bitwise complement |
! NOT not | Logical not |
- | Unary minus |
+ | Unary plus |
* | Multiplication |
/ | Division |
% | Modulo |
+ | Addition |
- | Subtraction |
<< | Left shift |
>> | Right shift |
< | Less than |
<= | Less than or equal |
> | Greater than |
>= | Greater than or equal |
== | Equality |
!= | Inequality |
=? LIKE like | SQL match |
!? | SQL not match |
=~ | Reg expr match |
!~ | Reg expr not match |
?= | Reg expr match (compat) |
\|< IN in | Contains any |
>< NOT IN not in | Contains none |
&< | Contains all |
& | Bitwise AND |
^ | Bitwise exclusive OR |
\| | Bitwise inclusive OR |
&& | Logical AND |
\|\| | Logical OR |
, | List separator |
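The effect of precedence on evaluation order can be illustrated with a small precedence-climbing evaluator. This Python sketch is hypothetical (evaluate, PRECEDENCE, and the other names are not RMC interfaces). It covers only integer literals, unary + and -, parentheses, and the arithmetic, shift, and bitwise operators, and it assumes C-style grouping wherever the table above does not show which operators share a level (for example, that + and - have equal precedence and group left to right). Division and modulo truncate toward zero, as in C.

```python
import re

# Higher numbers bind more tightly. Grouping of equal levels is assumed from C.
PRECEDENCE = {
    "*": 5, "/": 5, "%": 5,
    "+": 4, "-": 4,
    "<<": 3, ">>": 3,
    "&": 2,
    "^": 1,
    "|": 0,
}

TOKEN = re.compile(r"\s*(0[xX][0-9a-fA-F]+|\d+|<<|>>|[-+*/%&^|()])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def c_div(a, b):
    """Integer division truncating toward zero, as in C."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


APPLY = {
    "*": lambda a, b: a * b,
    "/": c_div,
    "%": lambda a, b: a - b * c_div(a, b),  # remainder takes the dividend's sign
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "<<": lambda a, b: a << b,
    ">>": lambda a, b: a >> b,
    "&": lambda a, b: a & b,
    "^": lambda a, b: a ^ b,
    "|": lambda a, b: a | b,
}


def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def primary():
        tok = next_token()
        if tok == "(":
            value = binary(0)
            if next_token() != ")":
                raise ValueError("expected ')'")
            return value
        if tok == "-":  # unary minus binds tighter than any binary operator
            return -primary()
        if tok == "+":
            return primary()
        if tok[:2].lower() == "0x":
            return int(tok, 16)
        if tok.isdigit():
            return int(tok)
        raise ValueError(f"unexpected token {tok!r}")

    def binary(min_prec):
        left = primary()
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], -1) >= min_prec:
            op = next_token()
            # Requiring a strictly higher level on the right makes equal
            # precedence operators group left to right.
            right = binary(PRECEDENCE[op] + 1)
            left = APPLY[op](left, right)
        return left

    result = binary(0)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

As in the text, evaluate("1+2*3") returns 7 and evaluate("(1+2)*3") returns 9.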
Two types of pattern matching are supported: extended regular expressions, and patterns compatible with the standard SQL LIKE predicate. A SQL LIKE pattern may include the following special characters:
Some examples of the types of expressions that can be constructed follow:
Name =~'tr.*0'
Name LIKE 'tr%0'
IntList|<{1,3,5..7}
IntList in (1,3,5..7)
(Name LIKE "tr%0")&&(IntList|<(1,3,5..7))
(Name=~'tr.*0') AND (IntList IN {1,3,5..7})
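The two pattern styles in these examples can be compared with a short Python sketch. It is hypothetical (like and regex_match are not RMC interfaces) and rests on two assumptions: that LIKE patterns use the standard SQL wildcards, % for any run of characters and _ for exactly one character, and must match the whole value; and that =~ succeeds when the regular expression matches anywhere in the value, as POSIX regexec() does. Python's re module only approximates POSIX extended regular expressions.

```python
import re


def sql_like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a Python regular expression.

    % matches any run of characters (including none), _ matches exactly
    one character, and every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def like(value, pattern):
    """=? / LIKE: the whole value must match the pattern."""
    return int(re.fullmatch(sql_like_to_regex(pattern), value, re.DOTALL) is not None)


def regex_match(value, pattern):
    """=~: assumed to succeed if the pattern matches anywhere in the value."""
    return int(re.search(pattern, value) is not None)
```

Both like("tr10", "tr%0") and regex_match("tr10", "tr.*0") return 1. Under the unanchored assumption for =~, the two example styles are not exactly equivalent: regex_match("xtr10", "tr.*0") returns 1, while like("xtr10", "tr%0") returns 0.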