The error facility allows a device driver to have entries recorded in the system error log. These error log entries record any software or hardware failures that need to be available either for informational purposes or for fault detection and corrective action. The device driver, using the errsave kernel service, adds error records to the special file /dev/error.
The errdemon daemon then picks up the error record and creates an error log entry. When you access the error log either through SMIT (System Management Interface Tool) or with the errpt command, the error record is formatted according to the error template in the error template repository and presented in either a summary or detailed report. See the Flow of the Error Logging Facility figure for an illustration of this.
Follow three precoding steps before initiating the error logging process. It is beneficial to understand what services are available to developers, and what the customer, service personnel, and defect personnel see.
The first precoding step is to review the error-logging documentation and determine whether a particular error should be logged. Do not use system resources for logging information that is unimportant or confusing to the intended audience.
It is, however, a worse mistake not to log an error that merits logging. You should work in concert with the hardware developer, if possible, to identify detectable errors and the information that should be relayed concerning those errors.
The next step is to determine the text of the message. Use the errmsg command with the -w flag to browse the system error messages file for a list of available messages. If you are developing a product for widespread general distribution and do not find a suitable system error message, you can submit a request to your supplier for a new message or follow the procedures that your organization uses to request new error messages. If your product is an in-house application, you can use the errmsg command to define a new message that meets your requirements.
Finally, determine the correct level of thresholding. Each error to be logged, regardless of whether it is a software or hardware error, can be limited by thresholding to avoid filling the error log with duplicate information.
Side effects of runaway error logging include overwriting existing error log entries and unduly alarming the end user. The error log is not unlimited in size. When its size limit is reached, the log wraps. If a particular error is repeated needlessly, existing information is overwritten, possibly causing inaccurate diagnostic analyses. The end user or service person can perceive a situation as more serious or pervasive than it is if they see hundreds of identical or nearly identical error entries.
You are responsible for implementing the proper level of thresholding in the device driver code.
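One common way to implement thresholding is to allow only a fixed number of entries per error within a time window and to count the rest as suppressed. The following is a minimal sketch of that idea, not a prescribed method. The err_threshold structure, the threshold_allow function, and the window and limit values are all hypothetical names introduced for illustration, and the caller is assumed to supply the current time in seconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-error threshold state; not part of any system header. */
struct err_threshold {
    uint32_t window_secs;     /* length of one counting window, in seconds */
    uint32_t max_per_window;  /* entries allowed to be logged per window   */
    uint64_t window_start;    /* time at which the current window began    */
    uint32_t logged;          /* entries logged in the current window      */
    uint32_t suppressed;      /* entries dropped in the current window     */
};

/*
 * Decide whether one occurrence of an error should be logged.
 * Returns true if the caller should go on to log the error, or false
 * if the occurrence exceeds the threshold and should be dropped.
 */
static bool threshold_allow(struct err_threshold *t, uint64_t now_secs)
{
    /* Start a new window once the current one has expired. */
    if (now_secs - t->window_start >= t->window_secs) {
        t->window_start = now_secs;
        t->logged = 0;
        t->suppressed = 0;
    }

    if (t->logged < t->max_per_window) {
        t->logged++;
        return true;
    }

    /* Over the limit: count the occurrence but do not log it. */
    t->suppressed++;
    return false;
}
```

A driver would keep one such structure per error identifier (or per device and error pair) and call the errsave kernel service only when threshold_allow returns true. The suppressed count could be reported as detail data in the next entry that is logged, so that the dropped occurrences are still visible.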
The error log is currently limited to 1 MB. As shipped, entries older than 30 days are removed automatically. To ensure that your error log entries are informative, noticed, and remain intact, test your driver thoroughly.
The first task is to select the error text. After browsing the contents of the system message file, three possible paths exist for selecting the error text. Either all of the desired messages for the new errors exist in the message file, none of the messages exist, or only some of the messages exist.
SET E E859 "The wagon wheel is broken."
The second step is to construct your error record templates. An error record template defines the text that appears in the error report. Each error record template has the following general form:
Error Record Template

    +LABEL:
        Comment =
        Class =
        Log =
        Report =
        Alert =
        Err_Type =
        Err_Desc =
        Prob_Causes =
        User_Causes =
        User_Actions =
        Inst_Causes =
        Inst_Actions =
        Fail_Causes =
        Fail_Actions =
        Detail_Data = <data_len>, <data_id>, <data_encoding>
Each field in this stanza has well-defined criteria for input values. See the errupdate command for more information. The fields are:
An example of an error record template is:
    + MISC_ERR:
        Comment = "Interrupt: I/O bus timeout or channel check"
        Class = H
        Log = TRUE
        Report = TRUE
        Alert = FALSE
        Err_Type = UNKN
        Err_Desc = E856
        Prob_Causes = 3300, 6300
        User_Causes =
        User_Actions =
        Inst_Causes =
        Inst_Actions =
        Fail_Causes = 3300, 6300
        Fail_Actions = 0000
        Detail_Data = 4, 8119, HEX    *IOCC bus number
        Detail_Data = 4, 811A, HEX    *Bus Status Register
        Detail_Data = 4, 811B, HEX    *Misc. Interrupt Register
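The Detail_Data lines in a template have to agree with the data the driver actually places in its error record. The following sketch shows one way to keep the two in step for the MISC_ERR example, assuming that detail data is taken in template order from the bytes that follow the record header. The err_rec0 definition here is a stand-in written for illustration (the real one comes from the sys/err_rec.h header file, and the ERR_NAMESIZE value of 16 is an assumption), and misc_err_rec and MISC_ERR_DETAIL_LEN are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define ERR_NAMESIZE 16  /* assumed value; the real one is in sys/err_rec.h */

/* Stand-in for the system's err_rec0 structure, for illustration only. */
struct err_rec0 {
    unsigned int error_id;
    char         resource_name[ERR_NAMESIZE];
};

/* Hypothetical record for MISC_ERR: one field per Detail_Data line. */
struct misc_err_rec {
    struct err_rec0 err;
    uint32_t iocc_bus_number;     /* Detail_Data = 4, 8119, HEX */
    uint32_t bus_status_reg;      /* Detail_Data = 4, 811A, HEX */
    uint32_t misc_interrupt_reg;  /* Detail_Data = 4, 811B, HEX */
};

/* Sum of the data_len values from the three Detail_Data lines. */
#define MISC_ERR_DETAIL_LEN (4 + 4 + 4)

/* Fail the build if the record and the template drift apart. */
_Static_assert(sizeof(struct misc_err_rec) - sizeof(struct err_rec0)
                   == MISC_ERR_DETAIL_LEN,
               "detail data size must match the template");
```

A compile-time check of this kind catches the case where a Detail_Data line is added to the template but the record structure is not updated, which would otherwise show up only as missing or misaligned values in the detailed report.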
Construct the error templates for all new errors to be added in a file suitable for entry with the errupdate command. Run the errupdate command with the -h flag and the input file. The new errors are now part of the error record template repository. A new header file is also created (file.h) in the same directory in which the errupdate command was run. This header file must be included in the device driver code at compile time. Note that the errupdate command has a built-in syntax checker for the new stanza that can be called with the -c flag.
The third step in coding error logging is to put the error logging calls into the device driver code. The errsave kernel service allows the kernel and kernel extensions to write to the error log. Typically, you define a routine in the device driver that can be called by other device driver routines when a loggable error is encountered. This function takes the data passed to it, puts it into the proper structure and calls the errsave kernel service. The syntax for the errsave kernel service is:
    #include <sys/errids.h>

    void errsave(buf, cnt)
    char *buf;
    unsigned int cnt;

The parameters are:

    buf    Specifies a pointer to a buffer that contains an error record as described in the sys/errids.h header file.
    cnt    Specifies the number of bytes in the error record contained in the buffer pointed to by the buf parameter.
The following sample code is an example of a device driver error logging routine. This routine takes data passed to it from some part of the main body of the device driver. This code simply fills in the structure with the pertinent information, then passes it on using the errsave kernel service.
    void
    errsv_ex(int err_id, unsigned int port_num, int line,
             char *file, uint data1, uint data2)
    {
        dderr    log;
        char     errbuf[255];
        ddex_dds *p_dds;

        p_dds = dds_dir[port_num];
        log.err.error_id = err_id;

        /* Build the resource name from the adapter or the device name. */
        if (port_num == BAD_STATE) {
            sprintf(log.err.resource_name, "%s :%d",
                    p_dds->dds_vpd.adpt_name, data1);
            data1 = 0;
        } else
            sprintf(log.err.resource_name, "%s", p_dds->dds_vpd.devname);

        /* Record where the error was detected. This assumes that  */
        /* dderr also declares a character array member named file. */
        sprintf(errbuf, "line: %d file: %s", line, file);
        strncpy(log.file, errbuf, sizeof(log.file) - 1);
        log.file[sizeof(log.file) - 1] = '\0';

        log.data1 = data1;
        log.data2 = data2;

        errsave((char *)&log, (uint)sizeof(dderr));  /* run actual logging */
    }   /* end errsv_ex */
The data to be passed to the errsave kernel service is defined in the dderr structure which is defined in a local header file, dderr.h. The definition for dderr is:
    typedef struct dderr {
        struct err_rec0 err;
        int data1;    /* use data1 and data2 to show detail  */
        int data2;    /* data in the errlog report. Define   */
                      /* these fields in the errlog template */
                      /* These fields may not be used in all */
                      /* cases.                              */
    } dderr;

The first field of the dderr structure is the err_rec0 structure, which is defined in the sys/err_rec.h header file. This structure contains the ID (or label) and a field for the resource name. The two data fields hold the detail data for the error log report. As an alternative, you could simply list the fields within the function.
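It can be useful to exercise the record-building logic outside the kernel before testing the driver itself. The following is a minimal user-space sketch of that approach. It rests on several assumptions: errsave_stub is a hypothetical stand-in for the errsave kernel service that only captures the last record passed to it, the err_rec0 definition is a local stand-in for the one in the sys/err_rec.h header file with an assumed ERR_NAMESIZE of 16, and log_dd_error is an illustrative name. The resource name is copied with a bounded call so that a long name cannot overrun the field.

```c
#include <stdio.h>
#include <string.h>

#define ERR_NAMESIZE 16  /* assumed value; the real one is in sys/err_rec.h */

/* Stand-in for the system's err_rec0 structure, for illustration only. */
struct err_rec0 {
    unsigned int error_id;
    char         resource_name[ERR_NAMESIZE];
};

typedef struct dderr {
    struct err_rec0 err;
    int data1;
    int data2;
} dderr;

/* Hypothetical stub: captures the last record instead of logging it. */
static char         last_record[sizeof(dderr)];
static unsigned int last_count;

static void errsave_stub(char *buf, unsigned int cnt)
{
    if (cnt > sizeof(last_record))
        cnt = sizeof(last_record);
    memcpy(last_record, buf, cnt);
    last_count = cnt;
}

/* Fill in a dderr record and hand it to the (stubbed) logging service. */
static void log_dd_error(unsigned int err_id, const char *resource,
                         int data1, int data2)
{
    dderr log;

    memset(&log, 0, sizeof(log));
    log.err.error_id = err_id;

    /* snprintf truncates and always null-terminates the resource name. */
    snprintf(log.err.resource_name, sizeof(log.err.resource_name),
             "%s", resource);

    log.data1 = data1;
    log.data2 = data2;

    errsave_stub((char *)&log, (unsigned int)sizeof(log));
}
```

In the driver itself, the stub would be replaced by the real errsave call and the stand-in structure by the system header. The captured record lets a test confirm the error ID, the truncated resource name, and the detail data before any entry reaches the error log.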
You can also log a message into the error log from the command line. To do this, use the errlogger command.
After you add the templates using the errupdate command, compile the device driver code along with the new header file. Simulate the error and verify that it was written to the error log correctly. Some details to check for include:
The error logging process begins when a loggable error is encountered and the device driver error logging subroutine sends the error information to the errsave kernel service. The error entry is written to the /dev/error special file. Once the information arrives at this file, it is time-stamped by the errdemon daemon and put in a buffer. The errdemon daemon constantly checks the /dev/error special file for new entries, and when new data is written, the daemon collects other information pertaining to the resource reporting the error. The errdemon daemon then creates an entry in the /var/adm/ras/errlog error logging file.