This section discusses file system tuning.
To review basic information about file systems, see AIX 5L Version 5.1 System Management Concepts: Operating Systems and Devices.
AIX 5.1 supports multiple file system types, which are described below.
Journaled File System (JFS) is the default file system for AIX 4.3.3 and earlier, as well as for AIX 5.1 running under a 32-bit kernel.
A journaling file system allows for quick file system recovery after a crash has occurred by logging the metadata of files. By enabling file-system logging, the system records every change in the metadata of the file into a reserved area of the file system. The actual write operations are performed after the logging of changes to the metadata has been completed.
Enhanced JFS (also known as JFS2) is another native AIX journaling file system that was introduced in AIX 5.1. Enhanced JFS is the default file system for 64-bit kernel environments. Due to address space limitations of the 32-bit kernel, Enhanced JFS is not recommended for use in 32-bit kernel environments.
The General Parallel File System (GPFS) is a high-performance, shared-disk file system that can provide fast data access to all nodes in a server cluster. Parallel and serial applications access files using standard UNIX file system interfaces, such as those in AIX.
GPFS provides high performance by striping I/O across multiple disks, high availability through logging, replication, and both server and disk failover.
For more information, see the IBM Redbook entitled GPFS on AIX Clusters: High Performance File System Administration Simplified at http://www.redbooks.ibm.com/redbooks/SG246035.html.
The Network File System (NFS) is a distributed file system that allows users to access files and directories located on remote computers and treat those files and directories as if they were local. For example, users can use operating system commands to create, remove, read, write, and set file attributes for remote files and directories.
Performance tuning and other issues regarding NFS are found in the chapter Monitoring and Tuning NFS Use.
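For example, a remote export can be mounted and then accessed like a local directory (the host name and paths here are hypothetical):

    mount byron:/export/home /mnt/home
    ls /mnt/home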
A RAM disk is a simulated disk drive that resides in memory. RAM disks are designed to provide significantly higher I/O performance than physical drives, and are typically used to overcome I/O bottlenecks with nonpersistent files. Do not use RAM disks for persistent data, as all data is lost if the system crashes or reboots.
To set up a RAM disk, use the mkramdisk command. The following example illustrates how to set up a RAM disk that is approximately 20 MB in size and how to create a file system on that RAM disk:
    mkramdisk 40000                        # size is given in 512-byte blocks, so 40000 blocks is about 20 MB
    ls -l /dev | grep ram                  # note the device name that was assigned, for example /dev/ramdisk0
    mkfs -V jfs /dev/ramdiskx              # create a JFS file system on the RAM disk
    mkdir /ramdiskx                        # create a mount point
    mount -V jfs -o nointegrity /dev/ramdiskx /ramdiskx
where x is the logical RAM disk number. To remove a RAM disk, use the rmramdisk command. RAM disks are also removed when the machine is rebooted.
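For example, to remove the RAM disk created above (assuming it was assigned device number 0):

    rmramdisk ramdisk0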
Beginning with AIX 5.2, the previous maximum size limit of 2 GB has been removed, and the maximum size of a RAM disk is now limited only by the amount of available system memory.
This section discusses some of the main differences between JFS and Enhanced JFS.
AIX 5.1 offers two different types of kernels, a 32-bit kernel and a 64-bit kernel. The 32-bit and 64-bit kernels have common libraries, commands and utilities, and header files. However, the 64-bit kernel offers a degree of scaling for 64-bit hardware that the 32-bit kernel cannot.
JFS is optimized for the 32-bit kernel and runs on 32-bit and 64-bit hardware. Enhanced JFS is optimized for the 64-bit kernel, allowing it to take advantage of 64-bit functionality.
For a description of kernel address space issues and differences, see POWER-based-Architecture-Unique Timer Access.
Before writing actual data, a journaling file system logs the metadata, thus incurring an overhead penalty that slows write throughput. One way of improving performance under JFS is to disable metadata logging by using the nointegrity mount option. Note that the enhanced performance is achieved at the expense of metadata integrity. Therefore, use this option with extreme caution because a system crash can make a file system mounted with this option unrecoverable.
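For example, an existing JFS file system (the mount point /scratch here is hypothetical) can be remounted without metadata logging as follows:

    umount /scratch
    mount -o nointegrity /scratch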
In contrast to JFS, Enhanced JFS does not allow you to disable metadata logging. However, the implementation of journaling on Enhanced JFS makes it more suitable to handle metadata-intensive applications. Thus, the performance penalty is not as high under Enhanced JFS as it is under JFS.
An index node, or i-node, is a data structure that stores all file and directory properties. When a program looks up a file, it searches for the appropriate i-node by looking up a file name in a directory. Because these operations are performed very often, the mechanism used for searching is of particular importance.
JFS employs a linear organization for its directories, thus making searches linear as well. In contrast, Enhanced JFS employs a binary tree representation for directories, thus greatly accelerating access to files.
The main advantage of using Enhanced JFS over JFS is scaling. Enhanced JFS provides the capability to store much larger files than the existing JFS. The maximum size of a file under JFS is 64 gigabytes. Under Enhanced JFS, AIX currently supports files up to 16 terabytes in size, although the file system architecture is set up to eventually handle file sizes of up to 4 petabytes.
Another scaling issue relates to accessing a large number of files. The following illustration shows how Enhanced JFS can improve performance for this type of access.
For this example, we are creating, deleting, and searching directories with unique, 10-byte file names. The results show that creating and deleting files is much faster under Enhanced JFS than under JFS. Performance for searches was approximately the same for both file system types.
A second example shows that create, delete, and search operations are generally much faster on Enhanced JFS than on JFS when using non-unique file names. In this second example, file names were chosen to share the same first 64 bytes, followed by 10-byte unique suffixes. The following illustration shows the results for this test.
On a related note, beginning with AIX 5.2, caching of long (greater than 32 characters) file names is supported in both the JFS and Enhanced JFS name caches. This improves the performance of directory operations, such as ls and find on directories with numerous long file name entries.
Cloning with a system backup (mksysb) from a 64-bit enabled JFS2 system to a 32-bit system will not be successful.
Unlike the JFS file system, the JFS2 file system will not allow the link() API to be used on its binary type directory. This limitation may cause some applications that operate correctly on a JFS file system to fail on a JFS2 file system.
The following table summarizes the differences between JFS and Enhanced JFS.
Function | JFS | Enhanced JFS |
---|---|---|
Optimization | 32-bit kernel | 64-bit kernel |
Maximum file system size | 1 terabyte | 4 petabytes¹ |
Maximum file size | 64 gigabytes | 4 petabytes¹ |
Number of i-nodes | Fixed at file system creation | Dynamic, limited by disk space |
Large file support | As mount option | Default |
Online defragmentation | Yes | Yes |
namefs | Yes | Yes |
DMAPI | Yes | No |
Compression | Yes | No |
Quotas | Yes | No |
Deferred update | Yes | No |
Direct I/O support | Yes | Yes |

¹ This is an architectural limit. AIX currently supports sizes of up to 16 terabytes.
This section discusses situations that can potentially inhibit JFS and Enhanced JFS performance.
Because write operations are performed after logging of metadata has been completed, write throughput can be affected. For a description of how to avoid performance penalties associated with logging, see Monitoring and Tuning Disk I/O.
The Journaled File System supports fragmented and compressed file systems as a means of saving disk space. On average, data compression saves disk space by about a factor of two. However, fragmentation and compression might incur a performance penalty associated with increased allocation activity. For a description of how compression and fragmentation might affect performance, see Monitoring and Tuning Disk I/O.
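As an illustrative sketch, a fragmented, compressed JFS file system could be created with the crfs command; the volume group, mount point, and attribute values below are examples only:

    crfs -v jfs -g rootvg -m /cmpfs -a size=262144 -a frag=2048 -a compress=LZ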
To enhance performance, both JFS and Enhanced JFS allow for online defragmentation. The file system can be mounted and is accessible while the defragmentation process is underway.
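For example, the defragfs command can first report the state of a mounted file system and then defragment it (the mount point is illustrative):

    defragfs -q /home      # query the current fragmentation state
    defragfs /home         # perform online defragmentation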
This section discusses policies and mechanisms that you can use to enhance file system performance under AIX.
Release-behind is a mechanism under which pages are freed as soon as they are either committed to permanent storage (by writes) or delivered to an application (by reads). This solution addresses a scaling problem when performing sequential I/O on very large files whose pages may not need to be re-accessed in the near future.
When writing a large file without using release-behind, writes will go very fast whenever there are available pages on the free list. When the number of pages drops to minfree, VMM uses its Least Recently Used (LRU) algorithm to find candidate pages for eviction. As part of this process, VMM needs to acquire a lock that is also being used for writing. This lock contention can cause a sharp performance degradation.
You enable this mechanism by specifying either the rbr flag (release-behind, sequential-read), the rbw flag (release-behind, sequential-write), or the rbrw flag (release-behind, sequential-read and -write) when issuing the mount command.
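For example, to mount a file system (the mount point /bigfiles here is hypothetical) with release-behind enabled for both sequential reads and writes:

    mount -o rbrw /bigfiles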
A side effect of using the release-behind mechanism is an increase in CPU utilization for the same read or write throughput rate, compared with not using release-behind. This is due to the work of freeing pages, which would normally be handled at a later time by the LRU daemon. Also note that all file page accesses result in disk I/O, since file data is not cached by VMM.
JFS allows you to defer updates of data into permanent storage. Delayed write operations save extra disk operations for files that are often rewritten. You can enable this feature by opening the file with the deferred update flag, O_DEFER. This feature caches the data, allowing faster read and write operations from other processes.
When writing to files that have been opened with this flag, data is not committed to permanent storage until a process calls the fsync subroutine, forcing all updated data to be committed to disk. Also, if a process issues a synchronous write operation on a file, that is, the process has opened the file with the O_SYNC flag, the operation is not deferred even if the file was opened with the O_DEFER flag.
Both JFS and Enhanced JFS offer support for Direct I/O access to files. This access method bypasses the file cache and transfers data directly from disk into the user space buffer, as opposed to using the normal cache policy of placing pages in kernel memory. For a description on how to tune Direct I/O, see Monitoring and Tuning Disk I/O.
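For example, one way to request Direct I/O for every file in a file system is the dio mount option (the mount point /udata here is hypothetical; applications can also request Direct I/O on a per-file basis at open time):

    mount -o dio /udata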
The following table summarizes tunable parameters for JFS and Enhanced JFS file systems.
Function | JFS Tuning Parameter | Enhanced JFS Tuning Parameter |
---|---|---|
Sets the maximum amount of memory for caching files | vmtune -P maxperm | vmtune -t maxclient (less than or equal to maxperm) |
Sets the minimum amount of memory for caching files | vmtune -p minperm | No equivalent |
Sets a hard limit on memory for caching files | vmtune -h strict_maxperm | vmtune -t maxclient (always a hard limit) |
Sets the maximum pages used for sequential read ahead | vmtune -R maxpgahead | vmtune -Q j2_maxPageReadAhead |
Sets the minimum pages used for sequential read ahead | vmtune -r minpgahead | vmtune -q j2_minPageReadAhead |
Sets the maximum number of pending I/Os to a file | chdev -l sys0 -a maxpout=value | chdev -l sys0 -a maxpout=value |
Sets the minimum number of pending I/Os to a file at which programs blocked by maxpout may proceed | chdev -l sys0 -a minpout=value | chdev -l sys0 -a minpout=value |
Sets the amount of modified data cache for a file with random writes | vmtune -W maxrandwrt | vmtune -J j2_maxRandomWrite vmtune -z j2_nRandomCluster |
Controls the gathering of I/Os for sequential write behind | vmtune -C numclust | vmtune -j j2_nPagesPerWriteBehindCluster |
Sets the number of file system bufstructs | vmtune -b numfsbufs | vmtune -Z j2_nBufferPerPagerDevice |
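As a closing sketch, the following commands show how a few of these parameters might be set; the values are illustrative, not recommendations. On AIX 5.1, vmtune is shipped in the bos.adt.samples fileset:

    /usr/samples/kernel/vmtune -R 128            # maxpgahead: maximum JFS sequential read-ahead, in pages
    /usr/samples/kernel/vmtune -Q 128            # j2_maxPageReadAhead: the Enhanced JFS equivalent
    chdev -l sys0 -a maxpout=33 -a minpout=24    # I/O pacing high- and low-water marks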