Kernel Extensions and Device Support Programming Concepts

Asynchronous I/O Subsystem

Synchronous I/O occurs while you wait. Application processing cannot continue until the I/O operation is complete.

In contrast, asynchronous I/O operations run in the background and do not block user applications. This improves performance because I/O operations and application processing can run simultaneously.

Using asynchronous I/O usually improves I/O throughput, especially when you are storing data in raw logical volumes (as opposed to journaled file systems). The actual performance, however, depends on how many server processes are running to handle the I/O requests.

Many applications, such as databases and file servers, take advantage of the ability to overlap processing and I/O. These asynchronous I/O operations use various kinds of devices and files. Additionally, multiple asynchronous I/O operations can run at the same time on one or more devices or files.

Each asynchronous I/O request has a corresponding control block in the application's address space. When an asynchronous I/O request is made, a handle is established in the control block. This handle is used to retrieve the status and the return values of the request.

Applications use the aio_read and aio_write subroutines to perform the I/O. Control returns to the application from the subroutine as soon as the request has been queued. The application can then continue processing while the disk operation is being performed.
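The flow described above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the POSIX AIO definitions and the standard <aio.h> header (on AIX the definitions are in sys/aio.h), and the helper name read_while_working and its work counter are hypothetical.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: queue an asynchronous read of up to len bytes from
 * the start of the file at path, keep "working" while the request is in
 * flight, then collect the result. Returns the byte count or -1. */
ssize_t read_while_working(const char *path, void *buf, size_t len,
                           unsigned long *work_done)
{
    struct aiocb cb;   /* control block in the application's address space */
    ssize_t got;
    int fd;

    assert(path != NULL && buf != NULL && work_done != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0) {   /* returns once the request is queued */
        close(fd);
        return -1;
    }

    *work_done = 0;
    while (aio_error(&cb) == EINPROGRESS)
        (*work_done)++;         /* stand-in for real application processing */

    got = aio_return(&cb);      /* return value of the completed request */
    close(fd);
    return got;
}
```

The counter only stands in for useful work. A real application would do its own processing between status checks instead of spinning.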

A kernel process (kproc), called a server, is in charge of each request from the time it is taken off the queue until it completes. The number of servers limits the number of disk I/O operations that can be in progress in the system simultaneously.

The default values are minservers=1 and maxservers=10. In systems that seldom run applications that use asynchronous I/O, this is usually adequate. For environments with many disk drives and key applications that use asynchronous I/O, the default is far too low. The result of a deficiency of servers is that disk I/O seems much slower than it should be. Not only do requests spend inordinate lengths of time in the queue, but the low ratio of servers to disk drives means that the seek-optimization algorithms have too few requests to work with for each drive.

Note

Asynchronous I/O will not work if the control block or buffer is created using mmap (mapping segments).

In AIX 5.2 there are two asynchronous I/O subsystems. The original AIX AIO, now called LEGACY AIO, has the same function names as the POSIX-compliant POSIX AIO. The major difference between the two is how parameters are passed. Both subsystems are defined in the /usr/include/sys/aio.h file. The _AIO_AIX_SOURCE macro is used to distinguish between the two versions.

Note

The _AIO_AIX_SOURCE macro used in the /usr/include/sys/aio.h file must be defined when using this file to compile an AIO application with the LEGACY AIO function definitions. By default, compiling with the aio.h file selects the new POSIX AIO definitions. To use the LEGACY AIO function definitions, do the following in the source file:

#define _AIO_AIX_SOURCE 
#include <sys/aio.h>

or when compiling on the command line, type the following:

xlc ... -D_AIO_AIX_SOURCE ... classic_aio_program.c

For each AIO function there is a legacy and a POSIX definition. LEGACY AIO has an additional aio_nwait function, which, although not part of the POSIX definitions, has been included in POSIX AIO to help those who want to port from LEGACY to POSIX definitions. POSIX AIO has an additional aio_fsync function, which is not included in LEGACY AIO. For a list of these functions, see Asynchronous I/O Subroutines.

How Do I Know if I Need to Use AIO?

Using the vmstat command with an interval and count value, you can determine if the CPU is idle waiting for disk I/O. The wa column details the percentage of time the CPU was idle with pending local disk I/O.

If there is at least one outstanding I/O to a local disk when the wait process is running, the time is classified as waiting for I/O. Unless asynchronous I/O is being used by the process, an I/O request to disk causes the calling process to block (or sleep) until the request has been completed. Once a process's I/O request completes, it is placed on the run queue.

A wa value consistently over 25 percent may indicate that the disk subsystem is not balanced properly, or it may be the result of a disk-intensive workload.

Note

AIO will not relieve an overly busy disk drive. Using the iostat command with an interval and count value, you can determine if any disks are overly busy. Monitor the %tm_act column for each disk drive on the system. On some systems, a %tm_act of 35.0 or higher for one disk can cause noticeably slower performance. The relief for this case could be to move data from more busy to less busy disks, but simply having AIO will not relieve an overly busy disk problem.

SMP Systems

For SMP systems, the us, sy, id and wa columns are only averages over all processors. But keep in mind that the I/O wait statistic per processor is not really a processor-specific statistic; it is a global statistic. An I/O wait is distinguished from idle time only by the state of a pending I/O. If there is any pending disk I/O, and the processor is not busy, then it is an I/O wait time. Disk I/O is not tracked by processors, so when there is any I/O wait, all processors get charged (assuming they are all equally idle).

How Many AIO Servers Am I Currently Using?

The following command will tell you how many AIO Servers (aios) are currently running (you must run this command as the "root" user):

 pstat -a | grep aios | wc -l

If the disk drives that are being accessed asynchronously are using either the Journaled File System (JFS) or the Enhanced Journaled File System (JFS2), all I/O will be routed through the aios kprocs.

If the disk drives that are being accessed asynchronously are using a form of raw logical volume management, then the disk I/O is not routed through the aios kprocs. In that case the number of servers running is not relevant.

However, if you want to confirm that an application that uses raw logical volumes is taking advantage of AIO, you can disable the fast path option via SMIT. When this option is disabled, even raw I/O is forced through the aios kprocs. At that point, the pstat command shown earlier in this section will work. You would not want to run the system with this option disabled for any length of time. This is simply a way to confirm that the application is working with AIO and raw logical volumes.

In releases earlier than AIX 4.3, the fast path is enabled by default and cannot be disabled.

How Many AIO Servers Do I Need?

Here are some suggested rules of thumb for choosing the maximum number of servers:

The first rule of thumb suggests that you limit the maximum number of servers to a number equal to ten times the number of disks that are to be used concurrently, but not more than 80. The minimum number of servers should be set to half of this maximum number.
Another rule of thumb is to set the maximum number of servers to 80 and leave the minimum number of servers set to the default of 1 and reboot. Monitor the number of additional servers started throughout the course of normal workload. After a 24-hour period of normal activity, set the maximum number of servers to the number of currently running aios + 10, and set the minimum number of servers to the number of currently running aios - 10.
In some environments you may see more than 80 aios kprocs running. If so, consider the third rule of thumb.
A third suggestion is to take statistics using vmstat -s before any high I/O activity begins, and again at the end. Check the field iodone. From this you can determine how many physical I/Os are being handled in a given wall clock period. Then increase the maximum number of servers and see if you can get more iodones in the same time period.
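The arithmetic behind the first two rules of thumb can be sketched as follows. The structure and function names are hypothetical, and flooring the observed minimum at 1 (the default) is an assumption, not something the rules state.

```c
#include <assert.h>

#define AIO_SERVER_CAP 80   /* upper bound from the first rule of thumb */

struct aio_server_limits {
    int minservers;
    int maxservers;
};

/* First rule: maximum = 10 x concurrently used disks, capped at 80;
 * minimum = half of that maximum. */
struct aio_server_limits limits_from_disks(int concurrent_disks)
{
    struct aio_server_limits l;

    assert(concurrent_disks > 0);
    if (concurrent_disks >= AIO_SERVER_CAP / 10)
        l.maxservers = AIO_SERVER_CAP;
    else
        l.maxservers = concurrent_disks * 10;
    l.minservers = l.maxservers / 2;
    return l;
}

/* Second rule: after observing a normal 24-hour workload,
 * maximum = running aios + 10 and minimum = running aios - 10. */
struct aio_server_limits limits_from_observed(int running_aios)
{
    struct aio_server_limits l;

    assert(running_aios >= 0);
    l.maxservers = running_aios + 10;
    l.minservers = running_aios - 10;
    if (l.minservers < 1)
        l.minservers = 1;   /* assumption: never go below the default of 1 */
    return l;
}
```

For example, three concurrently used disks give a maximum of 30 and a minimum of 15 under the first rule, and 25 observed aios give a maximum of 35 and a minimum of 15 under the second.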

Prerequisites

To make use of asynchronous I/O the following fileset must be installed:

 bos.rte.aio

To determine if this fileset is installed, use:

 lslpp -l bos.rte.aio

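You must also make the aio0 or posix_aio0 device available using SMIT. To have the device configured at each system restart, enter one of the following:

smit chgaio
smit chgposixaio

and set STATE to be configured at system restart to available.

To configure the device immediately, enter one of the following:

smit aio
smit posixaio

and select Configure aio now.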

Functions of Asynchronous I/O

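Functions provided by the asynchronous I/O facilities are:

Large File-Enabled Asynchronous I/O
Nonblocking I/O
Notification of I/O Completion
Cancellation of I/O Requests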

Large File-Enabled Asynchronous I/O

The fundamental data structure associated with all asynchronous I/O operations is struct aiocb. Within this structure is the aio_offset field which is used to specify the offset for an I/O operation.

Due to the signed 32-bit definition of aio_offset, the default asynchronous I/O interfaces are limited to an offset of 2 GB minus 1. To overcome this limitation, a new AIO control block with a signed 64-bit offset field and a new set of asynchronous I/O interfaces have been defined. The names of these 64-bit definitions end with "64".
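To make the limit concrete, the following sketch computes the largest offset a signed 32-bit field can hold and checks whether a given offset exceeds it. The macro and function names are hypothetical and are not part of the AIO interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of a signed 32-bit offset: 2^31 - 1 = 2147483647,
 * that is, 2 GB minus 1. */
#define AIO_OFFSET32_MAX ((int64_t)INT32_MAX)

/* Returns 1 if the offset cannot be expressed in a signed 32-bit field,
 * so the request needs the 64-bit control block and interfaces. */
int needs_64bit_interfaces(int64_t offset)
{
    assert(offset >= 0);
    return offset > AIO_OFFSET32_MAX;
}
```

An offset of 2147483647 still fits the default interfaces. An offset of 2147483648 (exactly 2 GB) does not.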

The large offset-enabled asynchronous I/O interfaces are available under the _LARGE_FILES compilation environment and under the _LARGE_FILE_API programming environment. For further information, see Writing Programs That Access Large Files in AIX 5L Version 5.2 General Programming Concepts: Writing and Debugging Programs.

Under the _LARGE_FILES compilation environment, asynchronous I/O applications written to the default interfaces see the following redefinitions:

Item	Redefined To Be	Header File
struct aiocb	struct aiocb64	sys/aio.h
aio_read()	aio_read64()	sys/aio.h
aio_write()	aio_write64()	sys/aio.h
aio_cancel()	aio_cancel64()	sys/aio.h
aio_suspend()	aio_suspend64()	sys/aio.h
lio_listio()	lio_listio64()	sys/aio.h
aio_return()	aio_return64()	sys/aio.h
aio_error()	aio_error64()	sys/aio.h

For information on using the _LARGE_FILES environment, see Porting Applications to the Large File Environment in AIX 5L Version 5.2 General Programming Concepts: Writing and Debugging Programs.

In the _LARGE_FILE_API environment, the 64-bit API interfaces are visible. This environment requires recoding of applications to the new 64-bit API name. For further information on using the _LARGE_FILE_API environment, see Using the 64-Bit File System Subroutines in AIX 5L Version 5.2 General Programming Concepts: Writing and Debugging Programs.

Nonblocking I/O

After issuing an I/O request, the user application can proceed without being blocked while the I/O operation is in progress. The I/O operation occurs while the application is running. Specifically, when the application issues an I/O request, the request is queued. The application can then resume running before the I/O operation is initiated.

To manage asynchronous I/O, each asynchronous I/O request has a corresponding control block in the application's address space. This control block contains the control and status information for the request. It can be used again when the I/O operation is completed.

Notification of I/O Completion

After issuing an asynchronous I/O request, the user application can determine when and how the I/O operation is completed. This information is provided in three ways:

The application can poll the status of the I/O operation.
The system can asynchronously notify the application when the I/O operation is done.
The application can block until the I/O operation is complete.

Polling the Status of the I/O Operation

The application can periodically poll the status of the I/O operation. The status of each I/O operation is provided in the application's address space in the control block associated with each request. Portable applications can retrieve the status by using the aio_error subroutine. The aio_suspend subroutine suspends the calling process until one or more asynchronous I/O requests are completed.

Asynchronously Notifying the Application When the I/O Operation Completes

Asynchronous notification of I/O completion is done by signals. Specifically, an application may request that a SIGIO signal be delivered when the I/O operation is complete. To do this, the application sets a flag in the control block at the time it issues the I/O request. If several requests have been issued, the application can poll the status of the requests to determine which have actually completed.

Blocking the Application until the I/O Operation Is Complete

The third way to determine whether an I/O operation is complete is to let the calling process become blocked and wait until at least one of the I/O requests it is waiting for is complete. This is similar to synchronous-style I/O. It is useful for applications that, after performing some processing, need to wait for I/O completion before proceeding.
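A blocking wait of this kind might look like the following sketch. It assumes the POSIX AIO definitions and the standard <aio.h> header. The helper name write_and_block is hypothetical, and the wait list holds a single request for brevity.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: queue an asynchronous write of len bytes to the
 * start of the file at path, then block until the request has completed.
 * Returns the number of bytes written or -1. */
ssize_t write_and_block(const char *path, const void *data, size_t len)
{
    struct aiocb cb;
    const struct aiocb *wait_list[1];
    ssize_t written;
    int fd;

    assert(path != NULL && data != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = (void *)data;   /* aio_buf is not const-qualified */
    cb.aio_nbytes = len;
    cb.aio_offset = 0;

    if (aio_write(&cb) != 0) {
        close(fd);
        return -1;
    }

    /* Sleep until the listed request completes. The status is re-checked
     * after every wakeup because a signal can end the wait early. */
    wait_list[0] = &cb;
    while (aio_error(&cb) == EINPROGRESS)
        (void)aio_suspend(wait_list, 1, NULL);

    written = aio_return(&cb);
    close(fd);
    return written;
}
```

With several entries in the wait list, aio_suspend returns when at least one of them has completed, so the caller would check each control block with aio_error to find out which.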

Cancellation of I/O Requests

I/O requests can be canceled if they are cancelable. Cancellation is not guaranteed and may or may not succeed, depending on the state of the individual request. If a request is in the queue and the I/O operation has not yet started, the request is cancelable. Typically, a request is no longer cancelable once the actual I/O operation has begun.

Asynchronous I/O Subroutines

The following subroutines are provided for performing asynchronous I/O. Where a 64-bit API exists, its name is listed after the default name.

Subroutine	Purpose
aio_cancel or aio_cancel64	Cancels one or more outstanding asynchronous I/O requests.
aio_error or aio_error64	Retrieves the error status of an asynchronous I/O request.
aio_fsync	Synchronizes asynchronous files.
lio_listio or lio_listio64	Initiates a list of asynchronous I/O requests with a single call.
aio_nwait	Suspends the calling process until n asynchronous I/O requests are completed.
aio_read or aio_read64	Reads asynchronously from a file.
aio_return or aio_return64	Retrieves the return status of an asynchronous I/O request.
aio_suspend or aio_suspend64	Suspends the calling process until one or more asynchronous I/O requests are completed.
aio_write or aio_write64	Writes asynchronously to a file.
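As an illustration of how several of these subroutines combine, the sketch below submits two reads with one lio_listio call and collects each result with aio_error and aio_return. It assumes the POSIX AIO definitions and the standard <aio.h> header. The helper name read_two_chunks is hypothetical.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read two consecutive chunks of chunk bytes each
 * from the start of the file at path, using a single lio_listio call.
 * Returns 0 if both chunks were read in full, -1 otherwise. */
int read_two_chunks(const char *path, char *first, char *second, size_t chunk)
{
    struct aiocb cbs[2];
    struct aiocb *list[2];
    int fd, i, ok = 1;

    assert(path != NULL && first != NULL && second != NULL && chunk > 0);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    memset(cbs, 0, sizeof cbs);
    for (i = 0; i < 2; i++) {
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf = (i == 0) ? first : second;
        cbs[i].aio_nbytes = chunk;
        cbs[i].aio_offset = (off_t)((size_t)i * chunk);
        cbs[i].aio_lio_opcode = LIO_READ;
        list[i] = &cbs[i];
    }

    /* LIO_WAIT: return only after every request in the list has finished. */
    if (lio_listio(LIO_WAIT, list, 2, NULL) != 0)
        ok = 0;

    for (i = 0; i < 2; i++) {
        while (aio_error(&cbs[i]) == EINPROGRESS)
            ;   /* only reached if the wait above ended early */
        if (aio_return(&cbs[i]) != (ssize_t)chunk)
            ok = 0;
    }

    close(fd);
    return ok ? 0 : -1;
}
```

This sketch treats any short read as a failure, which is a simplification. An application could instead inspect each return value individually.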

Order and Priority of Asynchronous I/O Calls

An application may issue several asynchronous I/O requests on the same file or device. However, because the I/O operations are performed asynchronously, the order in which they are handled may not be the order in which the I/O calls were made. The application must enforce ordering of its own I/O requests if ordering is required.

Priority among the I/O requests is not currently implemented. The aio_reqprio field in the control block is currently ignored.

For files that support seek operations, seeking is allowed as part of the asynchronous read or write operations. The whence and offset fields are provided in the control block of the request to set the seek parameters. The seek pointer is updated when the asynchronous read or write call returns.

Subroutines Affected by Asynchronous I/O

The following existing subroutines are affected by asynchronous I/O:

The close subroutine
The exit subroutine
The exec subroutine
The fork subroutine

If the application closes a file, or calls the _exit or exec subroutines while it has some outstanding I/O requests, the requests are canceled. If they cannot be canceled, the application is blocked until the requests have completed. When a process calls the fork subroutine, its asynchronous I/O is not inherited by the child process.

One fundamental limitation in asynchronous I/O is page hiding. When an unbuffered (raw) asynchronous I/O is issued, the page that contains the user buffer is hidden during the actual I/O operation. This ensures cache consistency. However, the application may access memory locations that fall within the same page as the user buffer. This may cause the application to block as a result of a page fault. To alleviate this, allocate page-aligned buffers and do not touch a buffer until the I/O request using it has completed.
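One way to obtain such a buffer is sketched below. It uses the POSIX posix_memalign and sysconf calls and rounds the size up to whole pages so that no unrelated data shares a page with the buffer. The helper name alloc_io_buffer is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* request the posix_memalign declaration */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: allocate a page-aligned buffer of at least len
 * bytes, rounded up to a whole number of pages. The rounded size is
 * stored in *alloc_len when alloc_len is not NULL. Release the buffer
 * with free(). Returns NULL on failure. */
void *alloc_io_buffer(size_t len, size_t *alloc_len)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    size_t psize, rounded;

    assert(len > 0);
    if (page <= 0)
        return NULL;
    psize = (size_t)page;
    if (len > SIZE_MAX - psize)
        return NULL;             /* rounding up would overflow */

    rounded = (len + psize - 1) / psize * psize;
    if (posix_memalign(&buf, psize, rounded) != 0)
        return NULL;
    if (alloc_len != NULL)
        *alloc_len = rounded;
    return buf;
}
```

Rounding the size up to whole pages is a design choice of this sketch, so that the allocator cannot place other data in the hidden page. Alignment alone would only guarantee where the buffer starts.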

Changing Attributes for Asynchronous I/O

You can change attributes relating to asynchronous I/O using the chdev command or SMIT. Likewise, you can use SMIT to configure and remove (unconfigure) asynchronous I/O. (Alternatively, you can use the mkdev and rmdev commands to configure and remove asynchronous I/O.) To start SMIT at the main menu for asynchronous I/O, enter smit aio or smit posixaio.

MINIMUM number of servers

indicates the minimum number of kernel processes dedicated to asynchronous I/O processing. Because each kernel process uses memory, this number should not be large when the amount of asynchronous I/O expected is small.

MAXIMUM number of servers

indicates the maximum number of kernel processes dedicated to asynchronous I/O processing. There can never be more than this many asynchronous I/O requests in progress at one time, so this number limits the possible I/O concurrency.

Maximum number of REQUESTS

indicates the maximum number of asynchronous I/O requests that can be outstanding at one time. This includes requests that are in progress as well as those that are waiting to be started. The maximum number of asynchronous I/O requests cannot be less than the value of AIO_MAX, as defined in the /usr/include/sys/limits.h file, but it can be greater. It would be appropriate for a system with a high volume of asynchronous I/O to have a maximum number of asynchronous I/O requests larger than AIO_MAX.

Server PRIORITY

indicates the priority level of kernel processes dedicated to asynchronous I/O. The lower the priority number is, the more favored the process is in scheduling. Concurrency is enhanced by making this number slightly less than the value of PUSER, the priority of a normal user process. It cannot be made lower than the value of PRI_SCHED.

Because the default priority is (40+nice), these daemons will be slightly favored with this value of (39+nice). If you want to favor them more, make changes slowly. A very low priority number can interfere with the system processes that require a low priority number.

Attention: Raising the server PRIORITY (decreasing this numeric value) is not recommended because system hangs or crashes could occur if the priority of the AIO servers is favored too much. There is little to be gained by making big priority changes.

PUSER and PRI_SCHED are defined in the /usr/include/sys/pri.h file.

STATE to be configured at system restart

indicates the state to which asynchronous I/O is to be configured during system initialization. The possible values are 1.) defined, which indicates that the asynchronous I/O will be left in the defined state and not available for use, and 2.) available, indicating that asynchronous I/O will be configured and available for use.

STATE of FastPath

Disabling this option forces all I/O activity through the aios kprocs, including I/O activity involving raw logical volumes. In earlier releases, the fast path is enabled by default and cannot be disabled.

64-bit Enhancements

Asynchronous I/O (AIO) has been enhanced to support 64-bit enabled applications. On 64-bit platforms, both 32-bit and 64-bit AIO can occur simultaneously.

The struct aiocb, the fundamental data structure associated with all asynchronous I/O operations, has changed. The aio_return element of this structure is now defined as ssize_t. Previously, it was defined as an int. AIO supports large files by default. An application compiled in 64-bit mode can do AIO to a large file without any additional #define or special opening of those files.