This document discusses how to resolve issues with local JFS filesystems that have run out of available space.
Information in this document applies to AIX Versions 4.x.
For more in-depth coverage of this subject, the following IBM publications are recommended:
http://www.rs6000.ibm.com/resource/
To resolve an out-of-filesystem-space issue, complete these steps:
Once the above steps have been completed, the situation should be resolved, or the reason for the problem should be understood.
If these steps do not resolve the issue, filesystem corruption MAY be involved. Unmount the filesystem and run a full fsck against it to verify that no corruption problems exist.
The df command is used to get filesystem status information. The relevant field to consider is %Used.
%Used = percentage of total filesystem space currently allocated
Example:

    Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
    /dev/hd4            12288        68   99%     1823    23% /
    /dev/hd2           409600     20436   96%    16181    16% /usr
    /dev/hd9var          8192      6088   26%      163     8% /var
    /dev/hd3            12288     11340    8%       87     3% /tmp
    /dev/hd1            57344     13872   76%     1459    11% /home

In this example, most of the available free space in the root filesystem / is allocated.
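The %Used check can be scripted. The sketch below is a hypothetical helper (full_filesystems is not an AIX command). It reads output in the AIX df -k layout shown in the example (seven columns, %Used in the fourth, mount point in the seventh) and prints every filesystem at or above a threshold. It assumes that layout; other df variants order their columns differently.

```shell
# full_filesystems THRESHOLD
# Hypothetical helper: reads AIX "df -k" output on stdin and prints the mount
# point and %Used of every filesystem whose %Used is at or above THRESHOLD.
# Assumes the column layout shown in the example above:
#   Filesystem 1024-blocks Free %Used Iused %Iused Mounted on
full_filesystems() {
    awk -v limit="$1" '
        NR == 1 { next }              # skip the header line
        {
            pct = $4
            sub(/%/, "", pct)         # "99%" -> "99"
            if (pct + 0 >= limit + 0) print $7, $4
        }'
}

# Usage on AIX:
#   df -k | full_filesystems 90
```

Against the example output above, a threshold of 90 reports / at 99% and /usr at 96%.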
At this point, we have identified a free-space problem on the / filesystem. The next step is to determine what kind of space problem exists.
Keep in mind that mounts are hierarchical and take effect in the order in which they are made: a filesystem mounted over a directory hides everything that was previously visible beneath that directory, including other filesystems mounted there earlier. For example, if a mount entry for /myfilesystem/mydata is immediately followed by a mount entry for /myfilesystem, then /myfilesystem is mounted over the /myfilesystem/mydata mount, and neither the /myfilesystem/mydata filesystem nor any data that resides in it can be accessed through /myfilesystem.
There are two commands generally used to determine how and where filesystem allocation is placed: df and du.
df determines the space used in a filesystem from the space that is currently unallocated. For instance, if a filesystem consists of 8192 512-byte blocks and 4096 of those blocks are not currently allocated to anything, then the space in use is 8192 - 4096 = 4096 512-byte blocks.
Allocated Storage = Total Storage - Unallocated Storage
df is inherently the most reliable command to report filesystem usage, because df reports information based on the filesystem as a whole.
du is a file-oriented command. It reports the space allocated to the files and directories given as arguments (the current directory if none are given), and it is not confined to one filesystem. For instance, du / reports allocation for all files under /. This includes the files in the / filesystem and those in any other filesystem mounted under /, such as /tmp, /var, and /usr. The -x option of du keeps the operation within one filesystem, but in some cases the results of using this option may be incomplete.
du reports only the space taken by files. It does not report the space taken by filesystem metadata, such as inodes, inode maps, or disk maps. The inode and disk maps and other areas reserved for filesystem use take up a negligible portion of the filesystem, but the area reserved for inodes can be substantial; its size depends on the NBPI (Number of Bytes Per Inode) chosen when the filesystem was created. Each inode uses 128 bytes of filesystem space, so the percentage of the filesystem taken by inodes is:
(128 / NBPI) * 100
By default, a filesystem uses an NBPI of 4096, so the inode overhead is (128 / 4096) * 100, or about 3%.
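The overhead formula can be evaluated from the shell. This is a minimal sketch: inode_overhead_pct is a hypothetical helper name, awk is used because shell arithmetic is integer-only, and the 128-byte inode size is the figure given above.

```shell
# inode_overhead_pct NBPI
# Hypothetical helper: percentage of a JFS filesystem reserved for inodes,
# computed as (128 / NBPI) * 100 with the 128-byte inode size given above.
inode_overhead_pct() {
    # awk is used because POSIX shell arithmetic is integer-only
    awk -v nbpi="$1" 'BEGIN { printf "%.3f\n", 128 / nbpi * 100 }'
}

inode_overhead_pct 4096    # default NBPI: prints 3.125
```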
To determine what the NBPI is for the filesystem in question, issue the lsfs command with the -q option on the mount point of the filesystem.
Example:

    # lsfs -q /
    Name            Nodename   Mount Pt               VFS   Size    Options    Auto Accounting
    /dev/hd4        --         /                      jfs   81920   --         yes  no
      (lv size: 24576, fs size: 24576, frag size: 4096, nbpi: 4096, compress: no, bf : false, ag: 8)

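When checking several filesystems, the NBPI can be pulled out of the lsfs -q output rather than read by eye. This sketch assumes the "nbpi: N" field format shown in the example; extract_nbpi is a hypothetical helper name.

```shell
# extract_nbpi
# Hypothetical helper: reads "lsfs -q" output on stdin and prints the value of
# the "nbpi:" field from the parenthesized detail line shown in the example.
extract_nbpi() {
    sed -n 's/.*nbpi: *\([0-9][0-9]*\).*/\1/p'
}

# Usage on AIX:
#   lsfs -q / | extract_nbpi
```

The pattern anchors on the literal "nbpi:" label, so the neighbouring "frag size:" value (also 4096 in the example) is not picked up by mistake.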
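du only shows allocation for files it can reference. There are two cases where du may not show allocated storage:
1. Files hidden beneath a mount point. Files that were written into a directory before another filesystem was mounted over it still occupy space in the underlying filesystem, but they cannot be reached through the normal directory tree.
2. Files that have been deleted but are still held open by a process. The directory entry is gone, but the space is not released until the file is closed.
To address case 1, mount the primary mount point of the filesystem on a secondary mount point. Filesystems mounted beneath the primary mount point are not visible at the secondary mount point, so only the contents of the one filesystem can be seen there.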
Example:
mount / /mnt
In the above example, we mounted the / filesystem over /mnt. If we go into /mnt, we see everything in the / filesystem and nothing from the other filesystems mounted under /. If we run cd /mnt/tmp, we are in the /tmp directory of the / filesystem, not in the /tmp filesystem. Also, the output of du -sk /mnt, plus the filesystem overhead described earlier, should closely match the space that df reports as used for /. If it does not, case 2 may be occurring (see the next section). For now, we will proceed with case 1.
We can now investigate disk usage accurately. First, go into the root directory of this filesystem.
cd /mnt
Run the du command to get an accurate accounting of the space that can be seen for all accessible files in this filesystem.

    # du -sk /mnt
    11778   /mnt

This reports, in kilobytes, the space accounted for by files. Adding the filesystem overhead described above should give a figure close to the one reported by the df command.
Example:

    # df -vk /
    Filesystem    1024-blocks      Used      Free %Used    Iused    Ifree %Iused Mounted on
    /dev/hd4            12288     12220        68   99%     1823     1249    23% /

The overhead in this case is (128 / 4096) * 12288K = 384K.
The total space that can be accounted for is 11778K + 384K = 12162K, compared with 12220K used according to df. The 58K difference is small, which indicates that the used space is accounted for by files somewhere in the filesystem. A large difference would indicate that case 2 is more likely to be occurring; the next section, "Resolving Space Taken by Open Files That Have Been Deleted", addresses that situation. Otherwise, continue with the following steps.
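The comparison above can be repeated for any filesystem with a little shell arithmetic. This is a sketch under the assumptions of this section: check_accounting is a hypothetical helper name, all sizes are in 1024-byte blocks, and the inode overhead uses the 128-byte inode size, truncated to whole kilobytes.

```shell
# check_accounting DU_KB FS_KB NBPI DF_USED_KB
# Hypothetical helper: DU_KB is from "du -sk", FS_KB is the filesystem size
# (1024-blocks) and DF_USED_KB the Used column from "df -vk", NBPI is from
# "lsfs -q". The body runs in a subshell so its variables stay local.
check_accounting() (
    du_kb=$1 fs_kb=$2 nbpi=$3 df_used=$4
    # Inode overhead: (128 / NBPI) of the filesystem size, in whole KB
    overhead=$(( 128 * $fs_kb / $nbpi ))
    accounted=$(( $du_kb + $overhead ))
    echo "overhead=${overhead}K accounted=${accounted}K" \
         "df_used=${df_used}K unaccounted=$(( $df_used - $accounted ))K"
)

check_accounting 11778 12288 4096 12220
# prints: overhead=384K accounted=12162K df_used=12220K unaccounted=58K
```

Multiplying by 128 before dividing by NBPI keeps the integer result exact whenever the filesystem size is a multiple of NBPI / 128.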
Run the following command on the new mount point of this filesystem to get a sorted disk usage of the filesystem's root directory.
ls -A | while IFS= read -r name; do du -sk "$name"; done | sort -nr
Example:

    # ls -A . | while read name; do du -sk $name; done | sort -nr
    2168    etc
    192     lpp
    168     sbin
    40      dev
    28      export
    12      smit.log
    4       var
    4       usr
    4       tmp
    4       tftpboot
    4       src
    4       smit.script
    4       mnt
    4       .sh_history
    4       .profile
    0       unix
    0       u
    0       lib
    0       bootrec
    0       bin

This command sorts disk usage for all files in the current directory by size, in decreasing order. If the file we suspect happens to be a directory, we can then change into that directory, and re-run the preceding command to determine what is taking up space within that directory. Continue these steps until you find the desired file or files, at which point you can take appropriate actions.
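The change-directory-and-repeat procedure can be automated. The sketch below uses a hypothetical helper, descend_largest: at each level it runs the same ls/du/sort pipeline, prints the largest entry, and follows it if it is a directory. It stops at a regular file, a symbolic link, or an empty directory. Run it against the secondary mount point (for example /mnt) so that it stays within the one filesystem.

```shell
# descend_largest [DIR]
# Hypothetical helper: prints "SIZE_KB PATH" for the largest entry at each
# level, following the largest entry down while it is a real directory.
# The body runs in a subshell so its variables stay local.
descend_largest() (
    dir=${1:-.}
    while :; do
        # Largest entry in $dir as "SIZE<whitespace>NAME"; ls -A includes dot files
        top=$(cd "$dir" && ls -A | while IFS= read -r name; do
                  du -sk "$name"
              done | sort -nr | head -n 1)
        # Stop if the directory is empty or cannot be entered
        [ -n "$top" ] || break
        size=$(printf '%s\n' "$top" | awk '{ print $1 }')
        name=$(printf '%s\n' "$top" | sed 's/^[0-9]*[[:space:]]*//')
        path=$dir/$name
        echo "$size $path"
        # Follow real directories only; stop at files and symbolic links
        if [ -d "$path" ] && [ ! -L "$path" ]; then
            dir=$path
        else
            break
        fi
    done
)

# Usage, after mount / /mnt:
#   descend_largest /mnt
```

It follows only the single largest entry at each level, so a second directory of nearly the same size still needs to be checked by hand.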
In case 2, files in the filesystem are held open by applications but have been removed from the directory tree. This behavior is described in the documentation for the unlink() system call:

    When all links to a file are removed and no process has the file open, all
    resources associated with the file are reclaimed, and the file is no longer
    accessible. If one or more processes have the file open when the last link
    is removed, the directory entry disappears. However, the removal of the file
    contents is postponed until all references to the file are closed.

You can run the fuser command with the -dV flags on the full path to the device on which the filesystem resides. This displays files that have been removed but are still open, along with the inode number and size of each. Using the process ID reported for these files, you can have the application close them, or you can stop the application. Once this has happened and fuser no longer shows the deleted file, the space is returned to the filesystem for general use.
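A thin wrapper can guard against the common mistake of passing the mount point instead of the device. This is a sketch: show_deleted_open is a hypothetical name, it assumes the fuser -d and -V flags described above are available (see the note about required APARs), and /dev/hd4 in the usage line is the root logical volume from the earlier examples.

```shell
# show_deleted_open DEVICE
# Hypothetical wrapper: checks that DEVICE is a device file (for example
# /dev/hd4, not the mount point /), then runs fuser -dV on it to list files
# that have been removed but are still open.
# The body runs in a subshell, so "exit" only leaves the function.
show_deleted_open() (
    dev=${1-}
    if [ -z "$dev" ] || { [ ! -b "$dev" ] && [ ! -c "$dev" ]; }; then
        echo "usage: show_deleted_open /dev/LVNAME" >&2
        exit 2
    fi
    fuser -dV "$dev"
)

# Usage on AIX (root filesystem from the earlier examples):
#   show_deleted_open /dev/hd4
```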
NOTE: The -d and -V flags of fuser require the fuser enhancements for your release of AIX, listed in the following table. You may need other fixes as well to perform the operations in this document reliably; see the list at the end of this document.

    APAR      Description              AIX Level
    -------   ---------------------    ---------
    IX78943   ENHANCEMENTS TO FUSER    4.1
    IX78941   ENHANCEMENTS TO FUSER    4.2
    IX78523   ENHANCEMENTS TO FUSER    4.3

If a shared library in the filesystem was deleted and the process that used it is no longer active, the library may still be open on the loader list. fuser does not detect this situation, but it can be remedied by running the slibclean command. slibclean removes shared libraries that are no longer in use from the loader list; if they were deleted, their space is then reclaimed.
Additional fixes referred to in the NOTE above:

    APAR      Description                                        AIX Level
    -------   ------------------------------------------------   ---------
    IX78066   FSCK SHOULD PATCH UP ALLOCATIONS IN WMAP           4.3
    IX78873   MALLOC FAILED ERRORS FROM FUSER                    4.3
    IX76061   DEFRAGFS AND FSCK INCORRECTLY REPORT BAD BIT MAP   4.2
    IX77541   FSCK SHOULD PATCH UP ALLOCATIONS IN WMAP           4.2