When you are processing normal I/O to JFS files, the I/O goes from the application buffer to the VMM and from there to the JFS. The contents of the buffer can be cached in RAM through the VMM's use of real memory as a file buffer cache. If the file cache hit rate is high, then this type of cached I/O is very effective in improving the performance of JFS I/O. But applications that have poor cache hit rates or applications that do very large I/Os may not get much benefit from the use of normal cached I/O. In operating system version 4.3, direct I/O was introduced as an alternative I/O method for JFS files.
Direct I/O is only supported for program working storage (local persistent files). The main benefit of direct I/O is to reduce CPU utilization for file reads and writes by eliminating the copy from the VMM file cache to the user buffer. If the cache hit rate is low, then most read requests have to go to the disk. Writes are faster with normal cached I/O in most cases. But if a file is opened with O_SYNC or O_DSYNC (see Using sync/fsync Calls), then the writes have to go to disk. In these cases, direct I/O can benefit applications because the data copy is eliminated.
Another benefit of direct I/O is that it allows applications to avoid diluting the effectiveness of caching for other files. When a file is read or written, that file competes for space in the file cache, which could cause other file data to be pushed out of the cache. If an application developer knows that certain files have poor cache-utilization characteristics, then only those files could be opened with O_DIRECT, as in the sketch below.
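The following minimal sketch illustrates that idea under assumed conditions: the file names are hypothetical, the large scan file is presumed to have a poor cache hit rate, and the index file is presumed to be reused often enough to benefit from normal cached I/O.

```c
/* Sketch: open only the poorly-cached file with O_DIRECT, leaving other
 * files in normal cached I/O mode.  Paths are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Large scan file with a poor cache hit rate: bypass the VMM file cache. */
    int dio_fd = open("/data/scan.dat", O_RDONLY | O_DIRECT);
    if (dio_fd == -1) {
        perror("open O_DIRECT");
        return 1;
    }

    /* Frequently reused index file: keep normal cached I/O. */
    int cached_fd = open("/data/index.dat", O_RDONLY);
    if (cached_fd == -1) {
        perror("open");
        close(dio_fd);
        return 1;
    }

    /* ... I/O on dio_fd must respect the file system's direct I/O
     * offset, length, and address alignment requirements ... */

    close(cached_fd);
    close(dio_fd);
    return 0;
}
```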
For direct I/O to work efficiently, the I/O request should be appropriate for the type of file system being used. The finfo() and ffinfo() subroutines can be used to query the offset, length, and address alignment requirements for fixed block size file systems, fragmented file systems, and bigfile file systems (direct I/O is not supported on compressed file systems). The queried information is contained in the diocapbuf structure, as described in /usr/include/sys/finfo.h.
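As a rough sketch, an application might query these requirements before issuing direct I/O. The FI_DIOCAP command and the diocapbuf field names used below (dio_offset, dio_min, dio_max, dio_align) are assumed to match the definitions in /usr/include/sys/finfo.h on the target system, and the file path is hypothetical.

```c
/* Sketch: query direct I/O alignment requirements with finfo(), then
 * issue a read that satisfies them. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/finfo.h>

int main(void)
{
    struct diocapbuf cap;

    /* Query the direct I/O capabilities of the file system holding the file. */
    if (finfo("/data/scan.dat", FI_DIOCAP, &cap, sizeof(cap)) == -1) {
        perror("finfo");            /* for example, a compressed file system */
        return 1;
    }

    /* dio_min and dio_max bound the transfer size; dio_offset and dio_align
     * give the required file offset and buffer address alignment. */
    size_t len   = cap.dio_min;
    size_t align = cap.dio_align ? cap.dio_align : 1;

    /* Over-allocate and round up so the buffer address satisfies dio_align. */
    char *raw = malloc(len + align);
    if (raw == NULL)
        return 1;
    char *buf = raw + (align - ((unsigned long)raw % align)) % align;

    int fd = open("/data/scan.dat", O_RDONLY | O_DIRECT);
    if (fd == -1) {
        perror("open O_DIRECT");
        free(raw);
        return 1;
    }

    /* Offset 0 and a length of dio_min meet the stated requirements. */
    if (read(fd, buf, len) == -1)
        perror("read");

    close(fd);
    free(raw);
    return 0;
}
```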
To avoid consistency issues, if there are multiple calls to open a file and one or more of them did not specify O_DIRECT while another open did, the file stays in normal cached I/O mode. Similarly, if the file is mapped into memory through the shmat() or mmap() system calls, it stays in normal cached mode. When the last conflicting, non-direct access is eliminated (through the close(), munmap(), or shmdt() subroutines), the JFS moves the file into direct I/O mode. Changing from normal mode to direct I/O mode can be expensive, because all modified pages in memory have to be flushed to disk at that point. These transitions are sketched below.
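The following single-process sketch (hypothetical path, error checking omitted) only annotates where those transitions occur; the behavior described in the comments is the JFS mode switching explained above.

```c
/* Sketch of direct I/O mode transitions caused by conflicting accesses. */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* File enters direct I/O mode: no conflicting, non-direct accesses yet. */
    int dio_fd = open("/data/scan.dat", O_RDWR | O_DIRECT);

    /* A second open without O_DIRECT is a conflicting access: the file
     * reverts to normal cached I/O mode for all openers, including dio_fd. */
    int cached_fd = open("/data/scan.dat", O_RDONLY);

    /* ... I/O through either descriptor is cached at this point ... */

    /* Closing the last conflicting access lets the JFS move the file back
     * into direct I/O mode; modified cached pages are flushed to disk as
     * part of the switch, which can be expensive. */
    close(cached_fd);

    close(dio_fd);
    return 0;
}
```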
Direct I/O does not match raw I/O performance: raw I/O is still slightly faster than direct I/O. But if you want the benefits of a JFS along with enhanced performance, direct I/O is highly recommended.
Even though the use of direct I/O can reduce CPU usage, it typically results in longer elapsed times, especially for small I/O requests, because the requests are not cached in memory.
Direct I/O reads cause synchronous reads from the disk, whereas with the normal cached policy the reads may be satisfied from the cache. This can result in poor performance if the data would have been in memory under the normal caching policy. Direct I/O also bypasses the VMM read-ahead algorithm, because the I/Os do not go through the VMM. The read-ahead algorithm is very useful for sequential access to files, because the VMM can initiate disk requests and have the pages already resident in memory before the application has requested them. Applications can compensate for the loss of this read-ahead by issuing larger read requests or by overlapping their own read-ahead with processing, for example by using multiple threads or asynchronous I/O subroutines such as aio_read() and lio_listio(); a sketch follows.
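The sketch below uses POSIX-style AIO calls (aio_read(), aio_suspend(), aio_return()) to keep one chunk in flight while the previous one is consumed; AIX's legacy asynchronous I/O interface differs in some declarations, and the path, 128 KB chunk size, and 4 KB alignment are illustrative assumptions.

```c
/* Sketch: application-level read-ahead for a file opened with O_DIRECT. */
#include <aio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (128 * 1024)   /* large requests recover part of the lost read-ahead */
#define ALIGN 4096           /* assumed buffer alignment; verify with finfo() */

int main(void)
{
    int fd = open("/data/scan.dat", O_RDONLY | O_DIRECT);
    if (fd == -1) {
        perror("open O_DIRECT");
        return 1;
    }

    /* Two aligned buffers: process one chunk while the next is being read. */
    char *raw = malloc(2 * CHUNK + ALIGN);
    if (raw == NULL)
        return 1;
    char *base = raw + (ALIGN - ((unsigned long)raw % ALIGN)) % ALIGN;
    char *buf[2] = { base, base + CHUNK };

    struct aiocb cb[2];
    memset(cb, 0, sizeof(cb));
    for (int i = 0; i < 2; i++) {
        cb[i].aio_fildes = fd;
        cb[i].aio_buf    = buf[i];
        cb[i].aio_nbytes = CHUNK;
    }

    /* Prime the pipeline with the first read. */
    off_t offset = 0;
    cb[0].aio_offset = offset;
    aio_read(&cb[0]);

    for (int cur = 0; ; cur ^= 1) {
        /* Start the next chunk before waiting on the current one. */
        int nxt = cur ^ 1;
        cb[nxt].aio_offset = offset + CHUNK;
        aio_read(&cb[nxt]);

        /* Wait for the current chunk and consume it. */
        const struct aiocb *wait_list[1] = { &cb[cur] };
        aio_suspend(wait_list, 1, NULL);
        ssize_t got = aio_return(&cb[cur]);
        if (got <= 0)
            break;
        /* ... process got bytes from buf[cur] ... */
        if (got < CHUNK)                 /* short read: end of file */
            break;
        offset += CHUNK;
    }

    aio_cancel(fd, NULL);                /* drop the read still in flight */
    close(fd);
    free(raw);
    return 0;
}
```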
Direct I/O writes bypass the VMM and go directly to the disk, so there can be a significant performance penalty compared with normal cached I/O, where writes can go to memory and then be flushed to disk later (write-behind). Because direct I/O writes do not get copied into memory, a subsequent sync operation does not have to flush these pages to disk, which reduces the amount of work the syncd daemon has to perform.
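As a small illustration of that difference, the sketch below writes one block both ways; the file names are hypothetical, error checking is omitted for brevity, and a 4 KB block is assumed to satisfy the file system's direct I/O alignment and length requirements.

```c
/* Sketch: cached write (flushed later or by fsync) versus direct write. */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLK 4096   /* assumed to meet the direct I/O requirements */

int main(void)
{
    /* Aligned, block-sized buffer for the direct write. */
    char *raw = malloc(BLK + BLK);
    char *buf = raw + (BLK - ((unsigned long)raw % BLK)) % BLK;
    memset(buf, 'x', BLK);

    /* Normal cached I/O: write() copies into the VMM file cache and returns;
     * the dirty pages reach disk later through write-behind, syncd, or an
     * explicit fsync(). */
    int cfd = open("/data/cached.out", O_WRONLY | O_CREAT, 0644);
    write(cfd, buf, BLK);
    fsync(cfd);            /* forces the flush of the cached pages */
    close(cfd);

    /* Direct I/O: write() goes straight to disk, so there are no dirty file
     * pages left for syncd to flush afterwards. */
    int dfd = open("/data/direct.out", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    write(dfd, buf, BLK);  /* returns after the data is on disk */
    close(dfd);

    free(raw);
    return 0;
}
```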
In the following example, performance is measured on an RS/6000 E30, 233 MHz, with AIX 4.3.1 (your performance will vary, depending on system configuration). KBPS is the throughput in kilobytes per second, and %CPU is CPU usage in percent.
| # of 2.2 GB SSA Disks | 1 | 2 | 4 | 6 | 8 |
|---|---|---|---|---|---|
| # of PCI SSA Adapters | 1 | 1 | 1 | 1 | 1 |
| Sequential read throughput, using normal I/O | | | | | |
| KBPS | 7108 | 14170 | 18725 | 18519 | 17892 |
| %CPU | 23.9 | 56.1 | 92.1 | 97.0 | 98.3 |
| Sequential read throughput, using direct I/O | | | | | |
| KBPS | 7098 | 14150 | 22035 | 27588 | 30062 |
| %CPU | 4.4 | 9.1 | 22.0 | 39.2 | 54.4 |
| Sequential read throughput, using raw I/O | | | | | |
| KBPS | 7258 | 14499 | 28504 | 30946 | 32165 |
| %CPU | 1.6 | 3.2 | 10.0 | 20.9 | 24.5 |
Direct I/O requires substantially fewer CPU cycles than regular I/O. I/O-intensive applications that do not get much benefit from the caching provided by regular I/O can enhance performance by using direct I/O. The benefits of direct I/O will grow in the future as increases in CPU speeds continue to outpace increases in memory speeds.
What types of programs are good candidates for direct I/O? Programs that are typically CPU-limited and perform large amounts of disk I/O. "Technical" codes that have large sequential I/Os are good candidates. Applications that do numerous small I/Os will typically see less performance benefit, because direct I/O cannot do read-ahead or write-behind. Applications that have benefited from striping are also good candidates.