About Filesystem Serviceability Enhancements in AIX V4
Contents
About This Document
Related Documentation
Serviceability Enhancements
New Filesystem Error Log Entries
About This Document
This document discusses serviceability enhancements made to the base filesystem
services in AIX since its initial release.
Information in this document is applicable to AIX Versions 4.x.
Related Documentation
For more in-depth coverage of this subject, the following IBM publications
are recommended:
- AIX Version 4.3 System Management Guide: Operating System and Devices (SC23-2525)
- AIX Version 4.3 Commands Reference, Volume 1 (reference for fsck)
The AIX and RS/6000 product documentation library is also available:
http://www.rs6000.ibm.com/resource/
Serviceability Enhancements
In order to troubleshoot filesystem-related problems more effectively,
several enhancements have been made to the base AIX operating system through
the following APARs.
AIX RELEASE | APAR
4.1.5       | IX83878
4.2.1       | IX82819 and IX84402
4.3.2       | IX86362
The enhancements are evident in three areas.
First, there is a new error code, ECORRUPT (return code 89). This value
is returned to an application from a filesystem system call when non-fatal
(for example, directory) filesystem corruption is detected.
Second, new filesystem error log entries have been added. There are
two classes of new entries.
- Informational: These do not of themselves constitute an error condition.
They are meant to alert the administrator to possible problems; they may
require attention to prevent a problem from occurring.
- Diagnostic: These indicate that a problem has occurred. They are intended to
pinpoint the resource on which a problem has occurred and what kind of problem
has occurred. A service representative may need to be contacted to pursue
these when they occur.
In the case of the ECORRUPT return code, examine the system error log
to gather more information about what may have caused the corruption--such
as a hardware problem or, possibly, downlevel microcode or drivers for
the underlying storage subsystem. Unmount the filesystem that produced this
return code, and run fsck against it to try to resolve
corruption problems.
In AIX 4.1.5 and 4.3.2, the message returned by perror() for ECORRUPT
is "Reserved errno was encountered".
In AIX 4.2.1 and all releases after AIX 4.3.2, the message
returned by perror() for ECORRUPT is "Invalid filesystem control data
detected".
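As a hedged illustration of the first enhancement, the following Python sketch shows one way an application might recognize ECORRUPT and point the operator at the recovery steps described above. The value 89 comes from this document; the function names are hypothetical, and Python's errno module is not assumed to define an ECORRUPT name, so the sketch falls back to the numeric value.

```python
import errno
import os

# 89 is the ECORRUPT value given in this document. The errno module may not
# define the name on every platform, so fall back to the documented number.
ECORRUPT = getattr(errno, "ECORRUPT", 89)


def classify_fs_error(exc):
    """Return a short action hint for an OSError raised by a filesystem call."""
    if exc.errno == ECORRUPT:
        return "corruption: check the error log, unmount, and run fsck"
    return "other: " + os.strerror(exc.errno)


def list_directory(path):
    """List a directory; return (entries, None) or (None, hint) on failure."""
    try:
        return os.listdir(path), None
    except OSError as exc:
        return None, classify_fs_error(exc)
```

A caller would inspect the second element of the returned pair and, when the hint starts with "corruption", follow the error log and fsck steps given above rather than retrying the operation.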
Third, the system dump facility has been enhanced to include extensive
filesystem dump information only when a filesystem-related crash has
been detected. In the case of a filesystem-related dump, more information
will be present in the dump to help diagnose the cause. If the crash
is not filesystem related, the amount of filesystem-related information
present in the dump will be reduced. This will generally result in smaller
dumps.
New Filesystem Error Log Entries
When you receive one of the new error log entries in your
error report, refer to the following list for an explanation
of each new error type and possible actions to take.
Error Description: SPECFS_DDINTPRI | Error ID: AE26DD07
This error is logged when a call to a device driver returns with system
interrupts disabled. A system crash will accompany this. The owner of the
device driver should investigate the cause.
Error Description: JFS_USER_HARDLINK | Error ID: 5ECE4A58
This error is logged when a hard link is created to or removed from a directory.
This error is intended as a warning since such operations can result in
filesystem corruption or strange filesystem behavior.
Only an entity with superuser privileges can perform these kinds of hard links.
Error Description: JFS_USER_WRITEMOUNT | Error ID: D73189F6
This error is logged when a process writes directly to a logical volume
while a filesystem on that logical volume is mounted.
Unmount the filesystem and run fsck to ensure filesystem
integrity. fsck can also be run with the -d option, specifying the block
reported in the full error entry, to determine which file, if any, may
have been affected.
Error Description: JFS_FS_FULL | Error ID: 369D049B
This error is logged when no free space exists within a filesystem. This
error is logged only once per filesystem mount.
Error Description: JFS_FS_FRAGMENTED | Error ID: 5DFED6F1
This error is logged when insufficient contiguous free space exists to
fulfill an allocation request within a filesystem. This error should apply
only to filesystems created with a fragment size smaller than 4 KB and to
large-file-enabled filesystems.
Run the defragfs command on the affected filesystem to defragment
the filesystem free space.
Error Description: JFS_FS_NOINODES | Error ID: 8988389F
This error is logged when all of the inodes allocated to a filesystem are
exhausted. To resolve the problem, remove inactive files from the filesystem
or relocate them to another filesystem. Alternatively, increase the filesystem
size to supply more inodes. If more inodes are needed
without increasing the size of the filesystem, the filesystem may need
to be recreated with a smaller NBPI value to increase the number of inodes
for this filesystem.
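The effect of the NBPI (number of bytes per inode) value can be illustrated with a little arithmetic. The sketch below assumes that the inode count is approximately the filesystem size in bytes divided by NBPI; the function name and the sample sizes are illustrative only and are not taken from a real system.

```python
MB = 1024 * 1024


def approx_inode_count(fs_size_bytes, nbpi):
    """Estimate the number of inodes for a filesystem of the given size.

    Assumption: one inode is allocated per NBPI bytes of filesystem space.
    """
    if nbpi <= 0:
        raise ValueError("nbpi must be positive")
    return fs_size_bytes // nbpi


# A 512 MB filesystem at two different NBPI values.
print(approx_inode_count(512 * MB, 4096))  # 131072
print(approx_inode_count(512 * MB, 1024))  # 524288
```

Under this assumption, recreating the same 512 MB filesystem with NBPI 1024 instead of 4096 yields four times as many inodes without changing its size.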
Error Description: JFS_KERNHEAP_LOW | Error ID: 83F4B3CB
This error is logged if a kernel memory request is unsuccessful. This error
will be logged no more than once per 24-hour period of system uptime. Contact an
appropriate service representative.
Error Description: JFS_KERNHEAP_DELAY | Error ID: 7975092C
This error is logged if a kernel memory request failed initially
but succeeded on a retry. This error will be logged no more than once per
24-hour period of system uptime. This error is only a warning, but a
JFS_KERNHEAP_LOW error may follow if the condition continues. Contact an
appropriate service representative.
Error Description: JFS_LOG_WRAP | Error ID: 061675CF
This error is logged if transactions written to a JFS log device caused
the beginning of the log to be overwritten. The system will have crashed
as a result. The solution is either to move filesystems that use that
log to other JFS log devices to attempt to distribute the amount of JFS
log activity, or to increase the size of the affected JFS log. Use the
major/minor numbers in the error report to identify the log device in the
/dev directory.
NOTE: To increase the size of a JFS log device, all filesystems that use
the log must be unmounted. The logical volume on which the log device resides
can then be extended with extendlv to the appropriate size (up to
a maximum of 256 MB). Afterwards, run logform on the log device to format
it to use all of the space in the logical volume.
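The sequence in the note above can be sketched as a dry-run planner. This Python sketch only builds the command lines and never executes them. The function name, its parameters, and the sample names /home and loglv00 are hypothetical, the final remount step is an assumed follow-up that the note does not mention, and the physical partition size must be supplied by the caller.

```python
MAX_JFS_LOG_MB = 256  # upper limit quoted in the note above


def plan_log_resize(mount_points, log_lv, current_mb, add_partitions, pp_size_mb):
    """Return the resize steps as a list of argv lists, without running them."""
    if add_partitions < 1:
        raise ValueError("add_partitions must be at least 1")
    new_mb = current_mb + add_partitions * pp_size_mb
    if new_mb > MAX_JFS_LOG_MB:
        raise ValueError("resulting log size exceeds %d MB" % MAX_JFS_LOG_MB)

    # 1. Unmount every filesystem that uses this log.
    plan = [["umount", mp] for mp in mount_points]
    # 2. Extend the log's logical volume by the given number of partitions.
    plan.append(["extendlv", log_lv, str(add_partitions)])
    # 3. Reformat the log so that it uses all of the new space.
    plan.append(["logform", "/dev/" + log_lv])
    # 4. Assumed follow-up step (not in the note): remount the filesystems.
    plan += [["mount", mp] for mp in mount_points]
    return plan


for argv in plan_log_resize(["/home"], "loglv00", 4, 1, 4):
    print(" ".join(argv))
```

Reviewing the printed plan before running any of it by hand keeps the destructive step (logform) under the administrator's control.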
Error Description: JFS_LOG_WAIT | Error ID: CF71B5B3
This error is logged if transactions written to a JFS log device are reaching
a threshold that may result in a JFS_LOG_WRAP condition. This entry, which is
intended as a warning, is
logged if this condition occurs more than 10 times in a one-hour period.
Use the solution given for JFS_LOG_WRAP
to alleviate or eliminate the condition.
Error Description: JFS_LOG_WRITE_ERR | Error ID: 902CE5A8
This error is logged when an input/output error occurs on the underlying
device on which the JFS log exists. The error could indicate a hardware problem, possibly
a missing disk, or backlevel microcode or drivers. Use the major/minor number
given in the error report entry to identify the log device as listed in
the /dev directory. Also, examine the error report for any disk or LVM
errors.
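Matching a major/minor pair from an error report entry against the entries in /dev can be sketched as follows. This is a minimal Python sketch that assumes a Unix-like system where os.major() and os.minor() are available; the function name is hypothetical, and the numbers passed in would come from the error report entry.

```python
import os
import stat


def find_devices(major, minor, dev_dir="/dev"):
    """Return paths of device special files in dev_dir with this major/minor."""
    matches = []
    for name in sorted(os.listdir(dev_dir)):
        path = os.path.join(dev_dir, name)
        try:
            st = os.lstat(path)  # lstat: do not follow symbolic links
        except OSError:
            continue  # entry vanished or is not accessible; skip it
        # Only block and character special files carry a device number.
        if stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode):
            if os.major(st.st_rdev) == major and os.minor(st.st_rdev) == minor:
                matches.append(path)
    return matches
```

Both the block and the character (raw) device entries are reported when they share a device number, so the result may contain more than one path for the same logical volume.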
Error Description: JFS_LOG_EXCEPTION | Error ID: 00A1E866
This error is essentially the same as the JFS_LOG_WRITE_ERR entry.
Follow the same solution steps.
Error Description: JFS_COMP_CORRUPTION | Error ID: 4B6DA1F5
This error is logged when the decompression code for a compressed filesystem
fails due to corrupted file data. This error is logged once per filesystem
mount. It is likely to be due to a hardware problem. Ensure that the latest drivers
and microcode are installed for the underlying storage device, and check
the error report for any errors related to the storage subsystem.
Error Description: JFS_META_CORRUPTION | Error ID: 684A365B
This error is logged when filesystem corruption is detected. Examine the system
error report for any errors
on the storage subsystem. Ensure that the latest drivers and microcode are installed
for the storage subsystem components. Unmount the affected filesystem and
run fsck against it. If a system crash occurred when this
error was logged, the dump should be examined to help determine the cause.
Contact your appropriate service representative if you need further assistance.
Error Description: JFS_META_EXCEPTION | Error ID: 0EC00096
This error is logged when filesystem corruption is detected. Examine the system
error report for any errors
on the storage subsystem. Ensure that the latest drivers and microcode are installed
for the storage subsystem components. Unmount the affected filesystem and
run fsck against it. If a system crash occurred when this
error was logged, the dump should be examined to help determine the cause.
Contact your appropriate service representative if you need further assistance.
Error Description: JFS_META_WRITE_ERR | Error ID: D2A1B43E
This error is logged when filesystem corruption is detected. Examine the system
error report for any errors
on the storage subsystem. Ensure that the latest drivers and microcode are installed
for the storage subsystem components. Unmount the affected filesystem and
run fsck against it. If a system crash occurred when this
error was logged, the dump should be examined to help determine the cause.
Contact your appropriate service representative if you need further assistance.
Error Description: JFS_FSCK_REQUIRED | Error ID: CD546B25
This error is logged when filesystem corruption is detected. This will
normally accompany another error entry such as JFS_KERNHEAP_LOW, JFS_LOG_WRITE_ERR,
JFS_META_CORRUPTION, JFS_META_EXCEPTION, or JFS_META_WRITE_ERR. Unmount the
affected filesystem and
run fsck against it.
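For scripting against the error report, the entries listed in this section can be collected into a lookup table. The Python sketch below uses only the error IDs and names given in this document; the helper names are hypothetical, and the set of entries flagged for fsck reflects only those whose text above says to run fsck.

```python
# Error ID -> error name, exactly as listed in this document.
FS_ERROR_IDS = {
    "AE26DD07": "SPECFS_DDINTPRI",
    "5ECE4A58": "JFS_USER_HARDLINK",
    "D73189F6": "JFS_USER_WRITEMOUNT",
    "369D049B": "JFS_FS_FULL",
    "5DFED6F1": "JFS_FS_FRAGMENTED",
    "8988389F": "JFS_FS_NOINODES",
    "83F4B3CB": "JFS_KERNHEAP_LOW",
    "7975092C": "JFS_KERNHEAP_DELAY",
    "061675CF": "JFS_LOG_WRAP",
    "CF71B5B3": "JFS_LOG_WAIT",
    "902CE5A8": "JFS_LOG_WRITE_ERR",
    "00A1E866": "JFS_LOG_EXCEPTION",
    "4B6DA1F5": "JFS_COMP_CORRUPTION",
    "684A365B": "JFS_META_CORRUPTION",
    "0EC00096": "JFS_META_EXCEPTION",
    "D2A1B43E": "JFS_META_WRITE_ERR",
    "CD546B25": "JFS_FSCK_REQUIRED",
}

# Entries whose description above tells the administrator to run fsck.
FSCK_ADVISED = {
    "JFS_USER_WRITEMOUNT",
    "JFS_META_CORRUPTION",
    "JFS_META_EXCEPTION",
    "JFS_META_WRITE_ERR",
    "JFS_FSCK_REQUIRED",
}


def lookup(error_id):
    """Return the error name for an ID (case-insensitive), or None if unknown."""
    return FS_ERROR_IDS.get(error_id.strip().upper())


def fsck_advised(error_id):
    """True if this document advises running fsck for the given error ID."""
    return lookup(error_id) in FSCK_ADVISED
```

A script could feed the identifier column of the error report through lookup() to pick out the filesystem entries described here and ignore everything else.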