Error Logging Overview

AIX Version 4.3 Problem Solving Guide and Reference

The error logging process begins when an operating system module detects an error. The error-detecting segment of code then sends error information to the errsave or errlast kernel service, or to the errlog application subroutine, and the information is in turn written to the /dev/error special file. A time stamp is added to the collected data as part of this process. The errdemon daemon constantly checks the /dev/error file for new entries, and when new data is written, the daemon conducts a series of operations.

Before an entry is written to the error log, the errdemon daemon compares the label sent by the kernel or application code to the contents of the Error Record Template Repository. If the label matches an item in the repository, the daemon collects additional data from other parts of the system.
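The label check described above can be pictured as a dictionary lookup. The following is a minimal Python sketch, not the errdemon implementation: the repository contents, the labels, the error IDs, the field names, and the `lookup_template` helper are all hypothetical. The text does not say what happens when a label has no match, so the sketch simply returns `None` in that case.

```python
from typing import Optional

# Hypothetical stand-in for the Error Record Template Repository.
# Labels, IDs, and field names are invented for illustration only.
TEMPLATE_REPOSITORY = {
    "EXAMPLE_DISK_ERR": {"error_id": 0x1A2B3C4D, "err_class": "hardware"},
    "EXAMPLE_APP_MSG": {"error_id": 0x0BADCAFE, "err_class": "software"},
}


def lookup_template(label: str) -> Optional[dict]:
    """Return the template registered for a label, or None if there is no match."""
    return TEMPLATE_REPOSITORY.get(label)


if __name__ == "__main__":
    template = lookup_template("EXAMPLE_DISK_ERR")
    if template is not None:
        # Error IDs are 32-bit values, so eight hexadecimal digits are enough.
        print(f"matched, error ID {template['error_id']:08X}")
```

Keying the repository on the label mirrors the text, where the label sent by the kernel or application code is what the daemon compares against the repository.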

To create an entry in the error log, the errdemon daemon retrieves the appropriate template from the repository, the resource name of the unit that caused the error, and the detail data. In addition, if the error signifies a hardware-related problem and hardware vital product data (VPD) exists, the daemon retrieves the VPD from the Object Data Manager. When you access the error log, either through SMIT or with the errpt command, the error log is formatted according to the error template in the error template repository and presented in either a summary or detailed report. Most entries in the error log are attributable to hardware and software problems, but informational messages can also be logged.
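As a rough illustration of the paragraph above, the sketch below assembles an entry from a template, a resource name, and detail data, and attaches VPD only when the error is hardware-related and VPD exists. It is a simplified model under stated assumptions: the `LogEntry` fields, the `build_entry` and `format_entry` helpers, the dictionary standing in for the Object Data Manager, and the report layouts are all hypothetical and do not reproduce errdemon or errpt behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEntry:
    label: str
    err_class: str              # "hardware" or "software" in this sketch
    resource_name: str
    detail_data: bytes
    vpd: Optional[str] = None   # filled in only for hardware errors with VPD


def build_entry(template: dict, resource_name: str, detail_data: bytes,
                vpd_by_resource: dict) -> LogEntry:
    """Combine template, resource name, and detail data into one entry."""
    entry = LogEntry(template["label"], template["err_class"],
                     resource_name, detail_data)
    # VPD is looked up only for hardware errors; .get() yields None if none exists.
    if entry.err_class == "hardware":
        entry.vpd = vpd_by_resource.get(resource_name)
    return entry


def format_entry(entry: LogEntry, detailed: bool = False) -> str:
    """Render a one-line summary, or a multi-line report when detailed=True."""
    summary = f"{entry.label} {entry.err_class} {entry.resource_name}"
    if not detailed:
        return summary
    lines = [summary, f"detail data: {entry.detail_data.hex()}"]
    if entry.vpd is not None:
        lines.append(f"VPD: {entry.vpd}")
    return "\n".join(lines)
```

For example, `build_entry({"label": "EXAMPLE_DISK_ERR", "err_class": "hardware"}, "hdisk0", b"\x01", {"hdisk0": "example VPD"})` yields an entry whose `vpd` field is set, while the same call with `"err_class": "software"` leaves it as `None`. The resource name `hdisk0` is used purely as an illustrative value.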

The diag command uses the error log in part to diagnose hardware problems. To correctly diagnose new system problems, the system deletes hardware-related entries older than 90 days from the error log. The system deletes software-related entries 30 days after they are logged.
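The two retention periods can be expressed as a small pruning rule. The sketch below is illustrative only and is not how the system performs the cleanup: the `prune` helper and the tuple layout are hypothetical, and treating an entry as expired only once its age strictly exceeds the limit is an assumption about the boundary case. Classes with no stated retention period are kept, because the text does not cover them.

```python
from datetime import datetime, timedelta

# Retention periods taken from the text: 90 days for hardware, 30 for software.
RETENTION = {
    "hardware": timedelta(days=90),
    "software": timedelta(days=30),
}


def prune(entries, now):
    """Return the (err_class, logged_at) tuples still within their retention period."""
    kept = []
    for err_class, logged_at in entries:
        limit = RETENTION.get(err_class)
        # Assumption: an entry expires only when its age strictly exceeds the limit.
        if limit is None or now - logged_at <= limit:
            kept.append((err_class, logged_at))
    return kept


if __name__ == "__main__":
    now = datetime(2000, 1, 1)
    log = [
        ("hardware", now - timedelta(days=91)),  # past 90 days, dropped
        ("software", now - timedelta(days=29)),  # within 30 days, kept
    ]
    print(prune(log, now))
```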

Terms to help you use the error logging facility include the following:

error ID	A 32-bit CRC hexadecimal code used to identify a particular failure. Each error record template has a unique error ID.
error label	The mnemonic name for an error ID.
error log	The file that stores instances of errors and failures encountered by the system.
error log entry	A record in the system error log that describes a hardware failure, a software failure, or an operator message. An error log entry contains captured failure data.
error record template	A description of what is displayed when the error log is formatted for a report, including information on the type and class of the error, probable causes, and recommended actions. Collectively, the templates make up the Error Record Template Repository.