Error logging is automatically started during system initialization by the rc.boot script and is automatically stopped during system shutdown by the shutdown script. The error log analysis performed by the diagnostics (diag command) analyzes hardware error entries up to 90 days old. If you remove hardware error entries less than 90 days old, you can limit the effectiveness of this error log analysis.
The following sections describe how to manage error logging efficiently.
The errclear, errdead, errlogger, errmsg, and errpt commands are part of the optionally installable Software Service Aids package (bos.sysmgt.serv_aid). You need the Software Service Aids package to generate reports from the error log or delete entries from the error log. You can install the Software Service Aids package on your system or you can transfer your system's error log file to a system that has the Software Service Aids package installed.
Determine the path to your system's error log file by running the following command:
/usr/lib/errdemon -l
There are a number of ways to transfer the file to another system. For example, you can copy the file to a remotely mounted file system using the cp command; you can copy the file across the network connection using the rcp, ftp, or tftp commands; or you can copy the file to removable media using the tar or backup command and restore the file onto another system.
You can format reports for an error log copied to your system from another system by using the -i flag of the errpt command. The -i flag allows you to specify the path name of an error log file other than the default. Likewise, you can delete entries from an error log file copied to your system from another system by using the -i flag of the errclear command.
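As a rough sketch of this workflow, the following helper formats reports from a copied log file. The function name report_copied_log is hypothetical; the only errpt flags used are -i (described above) and -a (detailed report).

```shell
# Hypothetical helper: report on an error log file copied from another system.
report_copied_log() {
    log_file=$1
    # errpt ships with bos.sysmgt.serv_aid; stop if it is not installed.
    if ! command -v errpt >/dev/null 2>&1; then
        echo "errpt not found; install the Software Service Aids package" >&2
        return 1
    fi
    if [ ! -f "$log_file" ]; then
        echo "no such error log file: $log_file" >&2
        return 1
    fi
    errpt -i "$log_file"        # summary report, one line per entry
    errpt -a -i "$log_file"     # detailed report
}
```

For example, report_copied_log /tmp/errlog.hostA would report on a log copied to that (hypothetical) path.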
You can customize the name and location of the error log file and the size of the internal error buffer to suit your needs.
You can also control the logging of duplicate errors.
To list the current settings, run /usr/lib/errdemon -l. The values for the error log file name, error log file size, buffer size, and duplicate handling that are currently stored in the error log configuration database are displayed on your screen.
To change the file name used for error logging, run the /usr/lib/errdemon -i FileName command. The specified file name is saved in the error log configuration database and the error daemon is immediately restarted.
To change the maximum size of the error log file, enter:
/usr/lib/errdemon -s LogSize
The specified log file size limit is saved in the error log configuration database and the error daemon is immediately restarted. If the log file size limit is smaller than the size of the log file currently in use, the current log file is renamed by appending .old to the file name and a new log file is created with the specified size limit. The amount of space specified is reserved for the error log file and is not available for use by other files, so be careful not to make the log excessively large. However, if you make the log too small, important information may be overwritten prematurely. When the log file size limit is reached, the file wraps; that is, the oldest entries are overwritten by new entries.
To change the size of the error log device driver's internal buffer, enter:
/usr/lib/errdemon -B BufferSize
The specified buffer size is saved in the error log configuration database and, if it is larger than the buffer size currently in use, the in-memory buffer is immediately increased. If it is smaller than the buffer size currently in use, the new size is put into effect the next time the error daemon is started after the system is rebooted. The buffer cannot be made smaller than the hard-coded default of 8 KB. The size you specify is rounded up to the next integral multiple of the memory page size (4 KB). The memory used for the error log device driver's in-memory buffer is not available for use by other processes (the buffer is pinned).
Be careful not to degrade your system's performance by making the buffer excessively large. However, if you make the buffer too small, it may fill up when error entries arrive faster than they are read from the buffer and written to the log file. When the buffer is full, new entries are discarded until space becomes available in the buffer. When this situation occurs, an error log entry is created to inform you of the problem, and you should correct the problem by enlarging the buffer.
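The rounding rule for the buffer size can be checked with a little arithmetic. The following sketch computes the size that would take effect for a requested value, using the 4 KB page size and 8 KB minimum stated above. The function name effective_buffer_size is hypothetical, and raising too-small requests to the minimum is an assumption made for illustration; the errdemon command may instead reject such a request.

```shell
# Hypothetical helper: size (in bytes) that a requested buffer size maps to.
effective_buffer_size() {
    requested=$1
    page=4096      # memory page size stated in the text
    min=8192       # hard-coded minimum stated in the text
    # Round up to the next integral multiple of the page size.
    rounded=$(( (requested + page - 1) / page * page ))
    # Assumption: values below the minimum are raised to it.
    if [ "$rounded" -lt "$min" ]; then
        rounded=$min
    fi
    echo "$rounded"
}
```

For example, a request of 10000 bytes rounds up to 12288 bytes (three pages), so the pinned memory cost is somewhat higher than the number passed to the -B flag.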
By default, starting with AIX 5.1, the error daemon eliminates duplicate errors. It does this by looking at each error that is logged. An error is a duplicate if it is identical to the previous error and occurs within the approximate time interval specified with /usr/lib/errdemon -t time-interval. The value is in milliseconds; the default is 100 (0.1 seconds).
The -m maxdups flag controls how many duplicates can build up before a duplicate entry is logged. The default value is 1000. If an error is logged and is followed by 1000 occurrences of the same error, a duplicate error is logged at that point rather than waiting for the time interval to expire or for a unique error to occur.
Thus if, for example, a device handler starts logging many identical errors rapidly, most will not appear in the log. The first occurrence is logged as usual. Subsequent occurrences are not logged immediately, just counted. When the time interval expires, the maxdups value is reached, or another error is logged, an alternate form of the error is logged giving the times of the first and last duplicate and the number of duplicates.
Note: The time interval refers to the time since the last error, not the time since the first occurrence of this error; that is, it is reset each time an error is logged. Also note that to be a duplicate, an error must exactly match the previous error. If, for example, anything about the detail data is different from the previous error, then that error is considered unique and logged as a separate error.
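The duplicate-handling rules can be made concrete with a small simulation. The sketch below is not the error daemon's implementation; it is a simplified model under stated assumptions. The function name dedup_sim, the input format (one "time_ms key" pair per line, sorted by time), and the LOG/DUP output lines are all hypothetical. A duplicate summary is printed when the next event arrives or the input ends, which stands in for the interval timer expiring.

```shell
# Illustrative simulation only; this is NOT the errdemon implementation.
# stdin:  lines of "<time_ms> <error_key>", sorted by time
# stdout: "LOG <time_ms> <key>" for entries logged normally, and
#         "DUP <key> first=<ms> last=<ms> count=<n>" for duplicate summaries
dedup_sim() {
    interval=${1:-100}    # like -t, in milliseconds (default from the text)
    maxdups=${2:-1000}    # like -m (default from the text)
    prev_key="" last_time=0 dup_count=0 dup_first=0 dup_last=0

    while read -r t key; do
        if [ -n "$prev_key" ] && [ "$key" = "$prev_key" ] &&
           [ $((t - last_time)) -le "$interval" ]; then
            # Identical to the previous error and inside the interval: count it.
            if [ "$dup_count" -eq 0 ]; then dup_first=$t; fi
            dup_count=$((dup_count + 1))
            dup_last=$t
            if [ "$dup_count" -ge "$maxdups" ]; then
                echo "DUP $prev_key first=$dup_first last=$dup_last count=$dup_count"
                dup_count=0
            fi
        else
            # A different error, or the interval expired: flush, then log normally.
            if [ "$dup_count" -gt 0 ]; then
                echo "DUP $prev_key first=$dup_first last=$dup_last count=$dup_count"
                dup_count=0
            fi
            echo "LOG $t $key"
        fi
        prev_key=$key
        last_time=$t      # the interval restarts at every error
    done

    # End of input stands in for the interval expiring.
    if [ "$dup_count" -gt 0 ]; then
        echo "DUP $prev_key first=$dup_first last=$dup_last count=$dup_count"
    fi
}
```

Feeding the events "0 A", "50 A", "90 A", "300 A", and "310 B" through dedup_sim 100 1000 logs the first A normally, prints one summary covering the two duplicates at 50 and 90, and then logs the A at 300 and the B at 310 as separate entries, because the A at 300 arrives more than 100 milliseconds after the previous error.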
Entries are removed from the error log when the root user runs the errclear command, when the errclear command is automatically invoked by a daily cron job, and when the error log file wraps as a result of reaching its maximum size. When the error log file reaches the maximum size specified in the error log configuration database, the oldest entries are overwritten by the newest entries.
The system is shipped with a crontab file to delete hardware errors older than 90 days and other errors older than 30 days. To display the crontab entries for your system, enter:
crontab -l
To change these entries, enter:
crontab -e
The errclear command can be used to selectively remove entries from the error log. The selection criteria you may specify include the error id number, sequence number, error label, resource name, resource class, error class, and error type. You must also specify the age of entries to be removed. The entries that match the selection criteria you specified and are older than the number of days you specified will be removed.
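A few selection criteria are sketched below. Because errclear permanently deletes entries, the commands are passed through a dry-run wrapper that only prints them unless DRY_RUN is set to 0. The run wrapper and the resource name hdisk0 are hypothetical; the flags shown (-d for error class, -N for resource name, -j for error ID) and the trailing number of days are standard errclear arguments, but check the errclear documentation on your system before running them for real.

```shell
# Dry-run wrapper (hypothetical helper): prints each command unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run errclear -d H 90          # hardware-class entries older than 90 days
run errclear -d S,O 30        # software and errlogger (operator) entries older than 30 days
run errclear -N hdisk0 7      # entries for resource hdisk0 older than 7 days
run errclear -j 192AC071 0    # every entry with this error ID, whatever its age
```

The first two commands mirror the retention periods of the shipped cron job described above (90 days for hardware errors, 30 days for other errors).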
You can disable logging or reporting of a particular event by modifying the Log or the Report field of the error template for the event. You can use the errupdate command to change the current settings for an event.
To list all events for which logging is currently disabled, enter:
errpt -t -F Log=0
Events for which logging is disabled are not saved in the error log file.
To list all events for which reporting is currently disabled, enter:
errpt -t -F Report=0
Events for which reporting is disabled are saved in the error log file when they occur, but they are not displayed by the errpt command.
The input to the errupdate command can come from a file or from standard input.
In the following example, standard input is used. To disable the reporting of the ERRLOG_OFF event (error id number 192AC071), enter the following lines to run the errupdate command:
errupdate <Enter>
=192AC071: <Enter>
Report=False <Enter>
<Ctrl-D>
<Ctrl-D>
For information about error notification, refer to Chapter 4, Error Notification, in AIX 5L Version 5.1 General Programming Concepts: Writing and Debugging Programs.
The errlogger command allows the system administrator to record messages in the error log. Whenever you perform a maintenance activity, such as clearing entries from the error log, replacing hardware, or applying a software fix, it is a good idea to record this activity in the system error log.
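One way to make this a habit is a small wrapper around errlogger, sketched below. The function name log_maintenance and the "maintenance:" prefix are hypothetical conventions, not part of the errlogger command; the sketch only assumes that errlogger takes the message text as its argument.

```shell
# Hypothetical helper: record a maintenance action in the system error log.
log_maintenance() {
    msg="maintenance: $*"
    if command -v errlogger >/dev/null 2>&1; then
        errlogger "$msg"
    else
        # errlogger is part of bos.sysmgt.serv_aid and may not be installed.
        echo "errlogger not available; would have logged: $msg" >&2
        return 1
    fi
}
```

For example, log_maintenance "cleared hardware entries older than 90 days" leaves a note that later explains why older entries are missing from a report.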
Some applications use syslog for logging errors and other events. Some administrators find it desirable to be able to list error log messages and syslog messages in a single report. This can be accomplished by redirecting the syslog messages to the error log. You can do this by specifying errlog as the destination in the syslog configuration file (/etc/syslog.conf). See the syslogd daemon for more information.
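As a minimal sketch of this change, the following lines stage a syslog configuration entry that sends messages to the error log. The scratch path /tmp/syslog.conf.new and the *.debug selector (all facilities, priority debug and higher) are assumptions for illustration; a narrower selector may be preferable so that routine messages do not crowd the error log.

```shell
# Stage the change in a scratch copy first (the path is a hypothetical choice).
conf_copy=/tmp/syslog.conf.new
if [ -f /etc/syslog.conf ]; then
    cp /etc/syslog.conf "$conf_copy"
else
    : > "$conf_copy"
fi

# "errlog" is the destination named in the text; the selector is an assumption.
printf '%s\t%s\n' '*.debug' 'errlog' >> "$conf_copy"

# Show the staged entry for review.
grep 'errlog$' "$conf_copy"

# After review, install the copy as /etc/syslog.conf and have syslogd reread
# its configuration, for example with: refresh -s syslogd
```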
You can log error log events in the syslog file by using the logger command with the concurrent error notification capabilities of the error log. For example, to log error log entries as system messages (syslog), add an errnotify object with the following contents:
errnotify:
    en_name = "syslog1"
    en_persistenceflg = 1
    en_method = "logger Msg from Error Log: `errpt -l $1 | grep -v 'ERROR_ID TIMESTAMP'`"
To add the object, create a file called /tmp/syslog.add with these contents, then run the command odmadd /tmp/syslog.add (you must be logged in as root to do this).
For more information about concurrent error notification, see Chapter 4, Error Notification, in AIX 5L Version 5.1 General Programming Concepts: Writing and Debugging Programs.