This section describes error logging tasks and provides information to assist you in using the error logging facility.
To obtain a report of all errors logged in the 24 hours prior to the failure, enter:
errpt -a -s mmddhhmmyy | pg
where mmddhhmmyy represents the month, day, hour, minute, and year 24 hours prior to the failure.
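The mmddhhmmyy value can be computed rather than worked out by hand. The following is a minimal Python sketch, assuming you know the failure time; the function name `errpt_start_time` and the failure time used are hypothetical and serve only as illustration.

```python
from datetime import datetime, timedelta


def errpt_start_time(failure: datetime) -> str:
    """Return the mmddhhmmyy string for the moment 24 hours before `failure`."""
    start = failure - timedelta(hours=24)
    # %m = month, %d = day, %H = hour (24-hour clock), %M = minute,
    # %y = two-digit year, in the order the -s flag expects.
    return start.strftime("%m%d%H%M%y")


if __name__ == "__main__":
    # Hypothetical failure time: 20 June 1998 at 11:28.
    stamp = errpt_start_time(datetime(1998, 6, 20, 11, 28))
    print(f"errpt -a -s {stamp} | pg")
```

The printed line is the command to enter on the affected system.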
An error log report contains the following information:
Note: Not all errors will generate information for each of the following categories.
LABEL | Predefined name for the event. |
ID | Numerical identifier for the event. |
Date/Time | Date and time of the event. |
Sequence Number | Unique number for the event. |
Machine ID | Identification number of your system processor unit. |
Node ID | Mnemonic name of your system. |
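Class | General source of the error. The error classes that appear in the sample reports in this section are H (hardware), S (software), and O (informational messages). |
Type | Severity of the error that has occurred. Five types of errors are possible. The types described with the sample reports in this section include PERM (the system could not recover from the problem), PEND (a resource may become unavailable soon), and TEMP (the system recovered after several attempts). |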
Resource Name | Name of the failing resource (for example, a device name of hdisk0). |
Resource Class | General class of the failing resource (for example, a device class of disk). |
Resource Type | Type of the failing resource (for example, a device type of 355mb). |
Location Code | Path to the device. There may be up to four fields, which refer to drawer, slot, connector, and port, respectively. |
VPD | Vital product data. The contents of this field, if any, vary. Error log entries for devices typically return information concerning the device manufacturer, serial number, Engineering Change levels, and Read Only Storage levels. |
Description | Summary of the error. |
Probable Cause | Listing of some of the possible sources of the error. |
User Causes | List of possible reasons for errors due to user mistakes. An improperly inserted disk and external devices (such as modems and printers) that are not turned on are examples of user-caused errors. |
Recommended Actions | Description of actions for correcting a user-caused error. |
Install Causes | List of possible reasons for errors due to incorrect installation or configuration procedures. Examples of this type of error include hardware and software mismatches, incorrect installation of cables or cable connections becoming loose, and improperly configured systems. |
Recommended Actions | Description of actions for correcting an installation-caused error. |
Failure Causes | List of possible defects in hardware or software.
Note: A failure causes section in a software error log usually indicates a software defect. Logs that list user or install causes or both, but not failure causes, usually indicate that the problem is not a software defect. If you suspect a software defect, or are unable to correct user or install causes, report the problem to your software service department. |
Recommended Actions | Description of actions for correcting the failure. For hardware errors, PERFORM PROBLEM DETERMINATION PROCEDURES is one of the recommended actions listed; following it leads to running the diagnostic programs. |
Detailed Data | Failure data that is unique for each error log entry, such as device sense data. See "Interpreting Device Sense Data". |
Reporting may be turned off for some errors. To show which errors have reporting turned off, enter:
errpt -t -F report=0 | pg
If reporting is turned off for any errors, enable reporting of all errors using the errupdate command.
Logging may also have been turned off for some errors. To show which errors have logging turned off, enter:
errpt -t -F log=0 | pg
If logging is turned off for any errors, enable logging for all errors using the errupdate command. Logging all errors is useful if it becomes necessary to recreate a system error.
The following are sample error report entries that are generated by issuing the errpt -a command.
An error-class value of H and an error-type value of PERM indicate that the system encountered a problem with a piece of hardware (the SCSI adapter device driver) and could not recover from it.
LABEL:            SCSI_ERR1
ID:               0502F666

Date/Time:        Jun 19 22:29:51
Sequence Number:  95
Machine ID:       123456789012
Node ID:          host1
Class:            H
Type:             PERM
Resource Name:    scsi0
Resource Class:   adapter
Resource Type:    hscsi
Location:         00-08
VPD:
    Device Driver Level.........00
    Diagnostic Level............00
    Displayable Message.........SCSI
    EC Level....................C25928
    FRU Number..................30F8834
    Manufacturer................IBM97F
    Part Number.................59F4566
    Serial Number...............00002849
    ROS Level and ID............24
    Read/Write Register Ptr.....0120

Description
ADAPTER ERROR

Probable Causes
ADAPTER HARDWARE
CABLE
CABLE TERMINATOR
DEVICE

Failure Causes
ADAPTER
CABLE LOOSE OR DEFECTIVE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLE AND ITS CONNECTIONS

Detail Data
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
An error-class value of H and an error-type value of PEND indicate that a piece of hardware (the Token Ring) may become unavailable soon due to numerous errors detected by the system.
LABEL:            TOK_ESERR
ID:               AF1621E8

Date/Time:        Jun 20 11:28:11
Sequence Number:  17262
Machine Id:       123456789012
Node Id:          host1
Class:            H
Type:             PEND
Resource Name:    TokenRing
Resource Class:   tok0
Resource Type:    Adapter
Location:         TokenRing

Description
EXCESSIVE TOKEN-RING ERRORS

Probable Causes
TOKEN-RING FAULT DOMAIN

Failure Causes
TOKEN-RING FAULT DOMAIN

Recommended Actions
REVIEW LINK CONFIGURATION DETAIL DATA
CONTACT TOKEN-RING ADMINISTRATOR RESPONSIBLE FOR THIS LAN

Detail Data
SENSE DATA
0ACA 0032 A440 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 2080 0000 0000 0010 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 78CC 0000 0000 0005 C88F 0304 F4E0 0000 1000 5A4F 5685 1000 5A4F 5685 3030 3030 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
An error-class value of S and an error-type value of PERM indicate that the system encountered a problem with software and could not recover from it.
LABEL:            DSI_PROC
ID:               20FAED7F

Date/Time:        Jun 28 23:40:14
Sequence Number:  20136
Machine Id:       123456789012
Node Id:          123456789012
Class:            S
Type:             PERM
Resource Name:    SYSVMM

Description
Data Storage Interrupt, Processor

Probable Causes
SOFTWARE PROGRAM

Failure Causes
SOFTWARE PROGRAM

Recommended Actions
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
Data Storage Interrupt Status Register
4000 0000
Data Storage Interrupt Address Register
0000 9112
Segment Register, SEGREG
D000 1018
EXVAL
0000 0005
An error-class value of S and an error-type value of TEMP indicate that the system encountered a problem with software. After several attempts, the system was able to recover from the problem.
LABEL:            SCSI_ERR6
ID:               52DB7218

Date/Time:        Jun 28 23:21:11
Sequence Number:  20114
Machine Id:       123456789012
Node Id:          host1
Class:            S
Type:             TEMP
Resource Name:    scsi0

Description
SOFTWARE PROGRAM ERROR

Probable Causes
SOFTWARE PROGRAM

Failure Causes
SOFTWARE PROGRAM

Recommended Actions
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SENSE DATA
0000 0000 0000 0000 0000 0011 0000 0008 000E 0900 0000 0000 FFFF FFFE 4000 1C1F 01A9 09C4 0000 000F 0000 0000 0000 0000 FFFF FFFF 0325 0018 0040 1500 0000 0000 0000 0000 0000 0000 0000 0000 0800 0000 0100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
An error class value of O indicates that an informational message has been logged.
LABEL:            OPMSG
ID:               AA8AB241

Date/Time:        Jul 16 03:02:02
Sequence Number:  26042
Machine Id:       123456789012
Node Id:          host1
Class:            O
Type:             INFO
Resource Name:    OPERATOR

Description
OPERATOR NOTIFICATION

User Causes
errlogger COMMAND

Recommended Actions
REVIEW DETAILED DATA

Detail Data
MESSAGE FROM errlogger COMMAND
hdisk1 : Error log analysis indicates a hardware failure.
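When many detailed entries have to be reviewed, the header fields of each entry can be collected into a dictionary. The following Python sketch assumes the report text has one "Name: value" field per line, as errpt -a prints it on a terminal; the function name `parse_header` and the set of field names are illustrative assumptions drawn from the samples above, not part of any AIX interface.

```python
# Header field names seen in the sample reports, compared case-insensitively
# because the samples use both "Machine ID" and "Machine Id".
HEADER_FIELDS = {
    "label", "id", "date/time", "sequence number", "machine id", "node id",
    "class", "type", "resource name", "resource class", "resource type",
    "location",
}

SAMPLE = """\
LABEL:            OPMSG
ID:               AA8AB241

Date/Time:        Jul 16 03:02:02
Sequence Number:  26042
Machine Id:       123456789012
Node Id:          host1
Class:            O
Type:             INFO
Resource Name:    OPERATOR

Description
OPERATOR NOTIFICATION

Detail Data
MESSAGE FROM errlogger COMMAND
hdisk1 : Error log analysis indicates a hardware failure.
"""


def parse_header(entry: str) -> dict:
    """Collect the 'Name: value' header fields of one detailed entry."""
    fields = {}
    for line in entry.splitlines():
        # Split at the first colon only, so times such as 03:02:02 stay whole.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() in HEADER_FIELDS:
            fields[name.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    for name, value in parse_header(SAMPLE).items():
        print(f"{name} = {value}")
```

Lines that contain a colon but do not start with a known field name, such as the errlogger message in the detail data, are skipped.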
The following is an example of a summary error report generated using the errpt command. One line of information is returned for each error entry.
Note: "Error Identifiers for the Error Log" lists the possible error log entries by error label.
ERROR_IDENTIFIER TIMESTAMP  T CL RESOURCE_NAME ERROR_DESCRIPTION
192AC071         0101000070 I O  errdemon      Error logging turned off
0E017ED1         0405131090 P H  mem2          Memory failure
9DBCFDEE         0101000070 I O  errdemon      Error logging turned on
038F2580         0405131090 U H  scdisk0       UNDETERMINED ERROR
AA8AB241         0405130990 I O  OPERATOR      OPERATOR NOTIFICATION
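Each summary line can be split into its six columns, and the TIMESTAMP column decoded from its mmddhhmmyy form. The following Python sketch assumes whitespace-separated columns, as in the example above; the names `SummaryEntry`, `parse_summary_line`, and `CLASS_NAMES` are hypothetical, and the class names cover only the classes described with the sample reports in this section.

```python
from datetime import datetime
from typing import NamedTuple

# Class codes as described with the sample reports in this section.
CLASS_NAMES = {"H": "hardware", "S": "software", "O": "informational"}


class SummaryEntry(NamedTuple):
    identifier: str
    timestamp: datetime
    type_code: str
    class_code: str
    resource: str
    description: str


def parse_summary_line(line: str) -> SummaryEntry:
    """Split one summary line into columns and decode its timestamp."""
    # At most five splits, so the description keeps its internal spaces.
    ident, stamp, type_code, class_code, resource, desc = line.split(None, 5)
    # TIMESTAMP is mmddhhmmyy: month, day, hour, minute, two-digit year.
    when = datetime.strptime(stamp, "%m%d%H%M%y")
    return SummaryEntry(ident, when, type_code, class_code, resource,
                        desc.strip())


if __name__ == "__main__":
    entry = parse_summary_line("0E017ED1 0405131090 P H mem2 Memory failure")
    kind = CLASS_NAMES.get(entry.class_code, "unrecognized class")
    print(entry.timestamp.isoformat(), kind, entry.resource, entry.description)
```

Note that Python resolves the two-digit year by its own convention (69 through 99 map to the 1900s), which may not match the convention of the system that produced the log.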
To learn more about interpreting device sense data, refer to "Hardware Error Report Service Aid". The Hardware Error Report Service Aid prints a field-by-field description of the sense data for most error log entries.
Use the following procedure to create an error report of software or hardware problems.
Determine whether error logging is on by checking whether the error log contains entries. Enter:
errpt -a
The errpt command generates an error report from entries in the system error log.
If the error log does not contain entries, error logging has been turned off. Activate the facility by entering:
/usr/lib/errdemon
Note: You must have root user access to run this command.
The errdemon daemon starts error logging and writes error log entries in the system error log. If the daemon is not running, errors are not logged.
To generate an error report for a specific resource, such as the hdisk1 device, enter:
errpt -N hdisk1
To generate the report using SMIT instead, enter:
smit errpt
Select 1 to send the error report to standard output or 2 to send the report to the printer.
Select yes to display or print error log entries as they occur; otherwise, select no.
Specify the appropriate device name in the Select resource names option (such as hdisk1).
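If reports are generated from a script, the errpt argument list can be assembled from the flags used in this section (-a, -s, and -N). The following Python sketch only builds the argument list; the function name `build_errpt_args` is hypothetical, and joining several resource names with commas reflects an assumption about how the -N flag accepts a list.

```python
from typing import List, Optional, Sequence


def build_errpt_args(detailed: bool = False,
                     start: Optional[str] = None,
                     resources: Sequence[str] = ()) -> List[str]:
    """Assemble an errpt command line from the flags shown in this section."""
    args = ["errpt"]
    if detailed:
        args.append("-a")                      # detailed report
    if start is not None:
        # The -s flag takes a start time in mmddhhmmyy form (ten digits).
        if len(start) != 10 or not start.isdigit():
            raise ValueError("start must be ten digits in mmddhhmmyy form")
        args += ["-s", start]
    if resources:
        # Assumption: multiple resource names are passed as one
        # comma-separated list.
        args += ["-N", ",".join(resources)]
    return args


if __name__ == "__main__":
    print(" ".join(build_errpt_args(resources=["hdisk1"])))
```

On an AIX system the resulting list could be passed to `subprocess.run`; on any other system the errpt command is not available.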
This procedure describes how to stop the error logging facility. Ordinarily, you would not want to turn off the error logging facility. Instead, you should clean the error log of old or unnecessary entries. For instructions about cleaning the error log, refer to "Cleaning an Error Log".
The exception is when you are installing or experimenting with new software or hardware. Turning off the error logging facility at such times keeps the error logging daemon from using CPU time to log problems you know you are causing.
Note: You must have root user authority to use the command in this procedure.
Enter the errstop command to turn off error logging:
errstop
The errstop command stops the error logging daemon from logging entries.
This procedure describes how to strip old or unnecessary entries from your error log. Cleaning is normally done for you by a daily cron job.
If it is not done automatically, you should probably clean the error log yourself every couple of days after you have examined the contents to make sure there are no significant errors.
You can also clean up specific errors. For example, if you get a new disk and you do not want the old disk's errors in the log to confuse you, you can clean just the disk errors.
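Delete entries from your error log by doing either of the following:
Use the errclear command. For example, to delete all entries in the software (S) error class, enter:
errclear -d S 0
The errclear command deletes entries from the error log that are older than a certain number of days. The 0 in the previous example means that you want to delete entries for all days, and the -d S flag limits the deletion to the software error class. To delete all entries regardless of class, omit the -d flag.
Use SMIT by entering:
smit errclear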
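The selection rule that errclear applies (an age in days, optionally limited to certain error classes) can be modeled in a few lines. The following Python sketch is a model of the rule as described in this section, not the errclear implementation; the names `Entry` and `select_for_clearing`, and the years in the sample entries, are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Optional, Set


class Entry(NamedTuple):
    label: str
    error_class: str      # H, S, or O, as in the sample reports
    logged_at: datetime


def select_for_clearing(entries: Iterable[Entry], days: int,
                        classes: Optional[Set[str]] = None,
                        now: Optional[datetime] = None) -> List[Entry]:
    """Return the entries an age-and-class rule would delete."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)   # days=0 puts the cutoff at "now"
    return [
        e for e in entries
        if e.logged_at <= cutoff
        and (classes is None or e.error_class in classes)
    ]


if __name__ == "__main__":
    log = [
        Entry("SCSI_ERR1", "H", datetime(1998, 6, 19, 22, 29)),
        Entry("DSI_PROC", "S", datetime(1998, 6, 28, 23, 40)),
    ]
    # Models "errclear -d S 0": every software entry, whatever its age.
    chosen = select_for_clearing(log, 0, {"S"}, now=datetime(1998, 7, 1))
    print([e.label for e in chosen])
```

Previewing the selection this way before running errclear can help confirm that entries you still need fall outside the rule.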
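Copy the error log to backup media by doing one of the following:
To copy the error log to diskette, enter:
ls /var/adm/ras/errlog | backup -ivp
To copy the error log to tape, enter:
ls /var/adm/ras/errlog | backup -ivpf/dev/rmt0
OR
Use the snap command to gather system configuration information in a tar file and copy it to diskette:
Note: You need root user authority to use the snap command.
snap -a -o /dev/rfd0
The snap command in this example uses the -a flag to gather all information about your system configuration. The -o flag copies the compressed tar file to the device you name. /dev/rfd0 names your diskette drive.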
Enter the following command to gather all configuration information in a tar file and copy it to tape:
snap -a -o /dev/rmt0
/dev/rmt0 names your tape drive.
See the snap command in AIX Version 4.3 Commands Reference for more information.