AIX Version 4.3 Kernel Extensions and Device Support Programming Concepts

Understanding System Call Execution

The system call handler gains control when a user program starts a system call. The system call handler changes the protection domain from the caller protection domain, user, to the system call protection domain, kernel, and switches to a protected stack.

The system call handler then calls the function supporting the system call. The loader maintains a table of the currently defined system calls for this purpose.

The system call runs within the calling process, but with more privilege than the calling process. This is because the protection domain has changed from user to kernel.

The system call function returns to the system call handler when it has performed its operation. The system call handler then restores the state of the process and returns to the user program.

There are two major protection domains in the operating system: the user mode protection domain and the kernel mode protection domain.

User Protection Domain

Programs that run in the user protection domain include those running within user processes and those within real-time processes. This protection domain implies that code runs in user execution mode and has:

Programs running in the user protection domain do not have access to the kernel or kernel data segments except indirectly through the use of system calls. A program in this protection domain can only affect its own execution environment and runs in the processor unprivileged state.

Kernel Protection Domain

Programs that run in the kernel protection domain include interrupt handlers, kernel processes, the base kernel, and kernel extensions (device drivers, system calls, and file systems). This protection domain implies that code runs in kernel execution mode and has the following access:

User data within the process address space must be accessed using kernel services. Programs running in this protection domain can affect the execution environments of all programs because they:

All kernel extensions run in the kernel protection domain as described above. The use of a system call by a user-mode process allows a kernel function to be called from user mode. Access to functions that directly or indirectly invoke system calls is typically provided by programming libraries providing access to operating system functions.

Actions of the System Call Handler

When a call is made in user mode that starts a system call, the system call handler is invoked. This system call handler switches the protection domain from user to kernel and performs the following steps:

  1. Sets privileged access to the process private address region.
  2. Sets privileged access to the kernel address regions.
  3. Sets the ut_error field in the uthread structure to 0.
  4. Switches to the kernel stack.
  5. Starts the specified kernel function (the target of the system call).

On return from the specified kernel function, the system call handler performs the following steps before returning to the caller:

  1. Switches back to the user stack.
  2. Updates the thread-specific errno variable if the ut_error field is not equal to 0.
  3. Clears the privileged access to the kernel address regions.
  4. Clears the privileged access to the process private region.
  5. Performs signal processing if a signal is pending.

The system call (and associated kernel function) runs within the context of the calling process, but with more privilege than the user-mode caller. This is because the system call handler has changed the protection domain from user state to kernel state. When the kernel function that was the target of the system call has performed the requested operation (or encountered an error), it returns to the system call handler. When this happens, the system call handler restores the state and protection domain back to user mode and returns control to the user program.
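The entry and exit steps above can be sketched as a small user-space simulation. This is not kernel code. The sim_uthread structure, the sim_syscall_table array, and every other sim_ name are hypothetical stand-ins introduced only for illustration, and the domain switch is reduced to a flag.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-thread uthread structure. */
struct sim_uthread {
    int ut_error;      /* error value set by the system call function */
    int in_kernel;     /* 1 while in the kernel protection domain */
    int sig_pending;   /* a signal arrived during the system call */
    int sig_delivered; /* set once the pending signal is processed */
};

typedef int (*sim_syscall_fn)(struct sim_uthread *ut, int arg);

/* Example target function: rejects negative arguments. */
static int sim_double(struct sim_uthread *ut, int arg)
{
    if (arg < 0) {
        ut->ut_error = EINVAL;
        return -1;
    }
    return arg * 2;
}

/* Hypothetical table of the currently defined system calls. */
static const sim_syscall_fn sim_syscall_table[] = { sim_double };

static int sim_syscall_handler(struct sim_uthread *ut, size_t index, int arg)
{
    int rc;

    if (index >= sizeof sim_syscall_table / sizeof sim_syscall_table[0]) {
        errno = ENOSYS;
        return -1;
    }
    ut->in_kernel = 1;      /* entry: user domain to kernel domain */
    ut->ut_error = 0;       /* entry: clear the error field */
    rc = sim_syscall_table[index](ut, arg);
    if (ut->ut_error != 0)  /* exit: publish the error through errno */
        errno = ut->ut_error;
    ut->in_kernel = 0;      /* exit: drop privileged access */
    if (ut->sig_pending) {  /* exit: process a pending signal last */
        ut->sig_pending = 0;
        ut->sig_delivered = 1;
    }
    return rc;
}
```

In this sketch a signal that is pending during the call is processed only after the handler has left the simulated kernel domain, which mirrors the ordering of the exit steps.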

Accessing Kernel Data While in a System Call

The following information is provided to assist you in learning about accessing kernel data while in a system call:

Kinds of Kernel Data

Attention: Incorrectly modifying fields in kernel or user block structures can cause unpredictable results or system crashes while running in the kernel protection domain.

A system call can access data that the caller cannot because the system call is running in a more privileged protection domain. This applies to all kernel data, of which there are three general categories:

Passing Parameters to System Calls

Because a system call does not run on the same stack as its caller, the number of parameters it can use is limited.

The operating system linkage convention passes some parameters in registers and the rest on the stack. The system call handler ensures that the first 8 words of the parameter list are accessible to the system call. All other parameters are not accessible.

For some languages, various types of parameters can take more than one word in the parameter list. The writer of a system call should be familiar with the way parameters are passed by the compiler and conform to this 8-word limit.
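The 8-word limit can be checked with simple arithmetic. The following sketch assumes a 4-byte parameter word and ignores any alignment padding that a particular compiler might add. The sim_ names are hypothetical helpers, not part of any kernel interface.

```c
#include <stddef.h>

#define SIM_WORD_BYTES      4u  /* assumption: 32-bit parameter words */
#define SIM_MAX_PARAM_WORDS 8u  /* limit stated in the text */

/* Words occupied by one parameter of the given size, rounded up. */
static size_t sim_param_words(size_t bytes)
{
    return (bytes + SIM_WORD_BYTES - 1u) / SIM_WORD_BYTES;
}

/* Returns 1 if the whole parameter list fits in the first 8 words. */
static int sim_params_fit(const size_t *sizes, size_t count)
{
    size_t words = 0;
    size_t i;

    for (i = 0; i < count; i++)
        words += sim_param_words(sizes[i]);
    return words <= SIM_MAX_PARAM_WORDS;
}
```

Under these assumptions, a list of two 4-byte integers, two 8-byte values, and one more 4-byte integer uses 7 words and fits, while four 8-byte values followed by a 4-byte integer needs 9 words and does not.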

Preempting a System Call

The kernel allows a thread to be preempted by a more favored thread even when starting a system call. This is not typical of most operating systems. The kernel makes this change to enhance support for real-time processes and large multiuser systems.

System calls should use the locking kernel services, such as lockl and unlockl, to serialize access to any global data that they access. Remember that all of the system call static data is located in global memory and therefore must be accessed serially.

The locking kernel services ensure that the owner of a lock runs with the most favored priority of any of the waiters of that lock. They do this by assigning to the lock owner the thread priority of the most favored waiter for the lock. This mechanism is similar to the standard operating system sleep priority. However, the thread priority must be assigned when the resource is allocated since the system call can be inactivated by preemption, as well as by calling sleep. Unlocking the lock restores the thread priority.
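The priority assignment described above can be modeled in a few lines. This sketch assumes that a numerically smaller value means a more favored priority. The sim_lock structure and its functions are hypothetical and do not reflect how the kernel stores lock state.

```c
/* Hypothetical model of a lock that tracks its owner's priority. */
struct sim_lock {
    int owner_base_pri; /* priority the owner had before locking */
    int owner_pri;      /* priority the owner currently runs at */
};

/* A waiter arrives: boost the owner if the waiter is more favored
 * (assumption: a smaller number is a more favored priority). */
static void sim_lock_add_waiter(struct sim_lock *lk, int waiter_pri)
{
    if (waiter_pri < lk->owner_pri)
        lk->owner_pri = waiter_pri;
}

/* Unlocking restores the owner's original priority. */
static void sim_lock_release(struct sim_lock *lk)
{
    lk->owner_pri = lk->owner_base_pri;
}
```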

Note that a thread can be preempted even when it owns a lock. The lock only ensures that another thread that tries to lock the resource waits until the owner of the resource unlocks it. A system call must never return with a lock locked. By convention, a locking hierarchy is followed to prevent deadlocks. "Understanding Locking" provides more information on locking.
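One common way to guarantee that a system call never returns with a lock held is to route every return through a single exit path. The sketch below illustrates that pattern in user-space C. The sim_klock type and the sim_lock and sim_unlock functions are hypothetical stand-ins for a real lock and the locking kernel services.

```c
#include <errno.h>

/* Hypothetical stand-in for a kernel lock word. */
struct sim_klock { int held; };

static void sim_lock(struct sim_klock *lk)   { lk->held = 1; }
static void sim_unlock(struct sim_klock *lk) { lk->held = 0; }

static struct sim_klock sim_counter_lock;
static int sim_counter; /* global data shared by all callers */

/* Adds delta to the shared counter; returns 0 or an error value. */
static int sim_counter_add(int delta, int *result)
{
    int rc = 0;

    sim_lock(&sim_counter_lock);
    if (delta < 0) {
        rc = EINVAL;
        goto out;            /* error path still releases the lock */
    }
    sim_counter += delta;
    *result = sim_counter;
out:
    sim_unlock(&sim_counter_lock);
    return rc;
}
```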

Handling Signals While in a System Call

Signals can be generated asynchronously or synchronously with respect to the process that receives the signal. An asynchronously generated signal is one that results from some action external to a process. It is not directly related to the current instruction stream of that process. Generally these are generated by other processes for interprocess communication or by device drivers.

A synchronously generated signal is one that results from the current instruction stream of the process. These signals cause interrupts. Examples of such cases are the execution of an illegal instruction, or an attempted data access to nonexistent address space. These are often referred to as exceptions.

Delivery of Signals to a System Call

The kernel delays the delivery of all signals, including SIGKILL, when starting a system call, device driver, or other kernel extension. The signal takes effect upon leaving the kernel and returning from the system call. This happens when the process returns to the user protection domain, just before running the first instruction at the caller return address. Signal delivery for kernel processes is described in "Using Kernel Processes".

Asynchronous Signals and Wait Termination

An asynchronous signal can alter the operation of a system call or kernel extension by terminating a long wait. Kernel services such as e_block_thread, e_sleep_thread, and et_wait all support terminating a wait by a signal. These services provide three options:

The sleep kernel service, provided for compatibility, also supports the PCATCH and SWAKEONSIG options to control the response to a signal during the sleep function.

Previously, the kernel automatically saved context on entry to the system call handler. As a result, any long (interruptible) sleep not specifying the PCATCH option returned control to the saved context when a signal interrupted the wait. The system call handler then set the errno global variable to EINTR and returned a return code of -1 from the system call.

The kernel, however, requires each system call that can directly or indirectly issue a sleep call without the PCATCH option to set up a saved context using the setjmpx kernel service. This is done to avoid overhead for system calls that handle waits terminated by signals. Using the setjmpx service, the system call can set up a saved context that sets the system call return code to -1 and the ut_error field to EINTR if a signal interrupts a long wait not specifying return-from-signal.

It is probably faster and more robust to specify return-from-signal on all long waits and use the return code to control the system call return.
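The return-code approach recommended above might look like the following sketch. The wait itself is simulated: sim_long_wait and its result codes are hypothetical and stand in for a wait service called with a return-from-signal option. Only the control flow is the point.

```c
#include <errno.h>

/* Hypothetical result codes for a simulated interruptible wait. */
enum sim_wait_result { SIM_WAIT_AWAKENED, SIM_WAIT_SIGNALED };

struct sim_thread {
    int ut_error;       /* stand-in for the ut_error field */
    int signal_arrives; /* test knob: 1 if a signal ends the wait */
};

/* Simulated long wait that returns instead of resuming a context. */
static enum sim_wait_result sim_long_wait(const struct sim_thread *t)
{
    return t->signal_arrives ? SIM_WAIT_SIGNALED : SIM_WAIT_AWAKENED;
}

/* System call body: checks the wait result and reports EINTR. */
static int sim_read_device(struct sim_thread *t)
{
    if (sim_long_wait(t) == SIM_WAIT_SIGNALED) {
        /* Release locks and undo partial work here, then report. */
        t->ut_error = EINTR;
        return -1;
    }
    return 0;
}
```

Because the wait returns to its caller, cleanup stays in the normal control flow of the system call and no saved context is needed for this case.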

Stacking Saved Contexts for Nested setjmpx Calls

The kernel supports nested calls to the setjmpx kernel service. It implements the stack of saved contexts by maintaining a linked list of context information anchored in the machine state save area. This area is in the user block structure for a process. Interrupt handlers have special machine state save areas.

An initial context is set up for each process by the initp kernel service for kernel processes and by the fork subroutine for user processes. The process terminates if that context is resumed.
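The idea of a stack of saved contexts kept as a linked list can be illustrated with the standard C setjmp and longjmp functions. This is a user-space analogy only and is not how the setjmpx kernel service is implemented. The sim_ctx structure and all sim_ functions are hypothetical.

```c
#include <setjmp.h>
#include <stddef.h>

/* One saved context, linked to the context saved before it. */
struct sim_ctx {
    jmp_buf buf;
    struct sim_ctx *prev;
};

static struct sim_ctx *sim_ctx_top;  /* anchor of the linked list */
static volatile int sim_resume_code; /* value passed to the resumer */

static void sim_ctx_push(struct sim_ctx *c)
{
    c->prev = sim_ctx_top;
    sim_ctx_top = c;
}

static void sim_ctx_pop(void)
{
    sim_ctx_top = sim_ctx_top->prev;
}

/* Resume the most recently saved context, removing it first. */
static void sim_ctx_resume(int code)
{
    struct sim_ctx *c = sim_ctx_top;

    sim_ctx_top = c->prev;
    sim_resume_code = code;
    longjmp(c->buf, 1);
}

static int sim_inner(int fail)
{
    struct sim_ctx c;

    sim_ctx_push(&c);
    if (setjmp(c.buf) != 0)
        return 100 + sim_resume_code; /* resumed: already popped */
    if (fail)
        sim_ctx_resume(fail);
    sim_ctx_pop();                    /* normal path: pop explicitly */
    return 0;
}

static int sim_outer(int fail_inner, int fail_outer)
{
    struct sim_ctx c;
    int rc;

    sim_ctx_push(&c);
    if (setjmp(c.buf) != 0)
        return 200 + sim_resume_code;
    rc = sim_inner(fail_inner);       /* nested save on top of ours */
    if (fail_outer)
        sim_ctx_resume(fail_outer);
    sim_ctx_pop();
    return rc;
}
```

A resume raised inside sim_inner lands in the innermost saved context, while a resume raised after sim_inner has returned lands in the context saved by sim_outer.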

Handling Exceptions While in a System Call

Exceptions are interrupts detected by the processor as a result of the current instruction stream. They therefore take effect synchronously with respect to the current process.

The default exception handling normally generates a signal if the process is in a state where signals are delivered without delay. If delivery of a signal can be delayed, however, default exception handling causes a dump.

Alternative Exception Handling Using the setjmpx Kernel Service

For certain types of exceptions, a system call can specify unique exception-handler routines through calls to the setjmpx service. The exception handler routine is saved as part of the stacked saved context. Each exception handler is passed the exception type as a parameter.

The exception handler returns a value that can specify any of the following:

In that case, the next exception handler in the stack of contexts is called. If none of the stacked exception handlers handle the exception, the kernel performs default exception handling. The setjmpx and longjmpx kernel services help implement exception handlers.
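The search through stacked exception handlers can be sketched as a loop from the most recent handler to the oldest. The two result values used here, SIM_EXC_HANDLED and SIM_EXC_NOT_HANDLED, are assumptions made for the example and are not the values defined by the setjmpx interface. All sim_ names are hypothetical.

```c
#include <stddef.h>

/* Assumed handler results for this sketch only. */
enum sim_exc_result { SIM_EXC_HANDLED, SIM_EXC_NOT_HANDLED };

typedef enum sim_exc_result (*sim_exc_handler)(int exc_type);

#define SIM_MAX_HANDLERS 8
static sim_exc_handler sim_handlers[SIM_MAX_HANDLERS];
static size_t sim_handler_count;
static int sim_default_taken; /* set when no handler accepts */
static int sim_io_hits;       /* times the I/O handler accepted */

/* Stack a handler on top of the existing ones. */
static int sim_push_handler(sim_exc_handler h)
{
    if (sim_handler_count == SIM_MAX_HANDLERS)
        return -1;
    sim_handlers[sim_handler_count++] = h;
    return 0;
}

/* Offer the exception to each handler, newest first. */
static void sim_raise(int exc_type)
{
    size_t i = sim_handler_count;

    while (i > 0) {
        if (sim_handlers[--i](exc_type) == SIM_EXC_HANDLED)
            return;
    }
    sim_default_taken = 1; /* default exception handling */
}

/* Example handlers: each accepts exactly one exception type. */
static enum sim_exc_result sim_handle_io(int exc_type)
{
    if (exc_type != 1)
        return SIM_EXC_NOT_HANDLED;
    sim_io_hits++;
    return SIM_EXC_HANDLED;
}

static enum sim_exc_result sim_handle_prot(int exc_type)
{
    return exc_type == 2 ? SIM_EXC_HANDLED : SIM_EXC_NOT_HANDLED;
}
```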

Understanding Nesting and Kernel-Mode Use of System Calls

The operating system supports nested system calls with some restrictions. System calls (and any other kernel-mode routines running under the process environment of a user-mode process) can use system calls that pass all parameters by value. System calls and other kernel-mode routines must not start system calls that have one or more parameters passed by reference. Doing so can result in a system crash. This is because system calls with reference parameters assume that the referenced data area is in the user protection domain. As a result, these system calls must use special kernel services to access the data. However, these services are unsuccessful if the data area they are trying to access is not in the user protection domain.
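The reason a by-reference system call fails when started from kernel mode can be shown with a simulated user-data access service. In this sketch the user protection domain is modeled as one static buffer, and sim_copy_from_user is a hypothetical stand-in that refuses any address outside that buffer. The simulation reports EFAULT where a real system might crash.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the user protection domain. */
static char sim_user_region[64];

/* Returns 1 if [p, p+n) lies entirely inside the user region. */
static int sim_in_user_region(const void *p, size_t n)
{
    uintptr_t start = (uintptr_t)sim_user_region;
    uintptr_t addr = (uintptr_t)p;

    return n <= sizeof sim_user_region &&
           addr >= start &&
           addr - start <= sizeof sim_user_region - n;
}

/* Stand-in for a service that reads user data from kernel mode. */
static int sim_copy_from_user(void *dst, const void *usrc, size_t n)
{
    if (!sim_in_user_region(usrc, n))
        return EFAULT;
    memcpy(dst, usrc, n);
    return 0;
}

/* By-reference system call: sums two ints found at uaddr. */
static int sim_sys_sum2(const void *uaddr, int *ut_error)
{
    int vals[2];
    int rc = sim_copy_from_user(vals, uaddr, sizeof vals);

    if (rc != 0) {
        *ut_error = rc;
        return -1;
    }
    return vals[0] + vals[1];
}
```

When the referenced data is inside the simulated user region the call succeeds. When a kernel-mode caller passes the address of its own data, the access service rejects it and the call fails.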

This restriction does not apply to kernel processes. User-mode data access services can distinguish between kernel processes and user-mode processes in kernel mode. As a result, these services can correctly access the referenced data areas when the caller is a kernel process.

Kernel processes cannot call the fork or exec system calls, among others. A list of the base operating system calls available to system calls or other routines in kernel mode is provided in "System Calls Available to Kernel Extensions".

Page Faulting within System Calls

Attention: A page fault that occurs while external interrupts are disabled results in a system crash. Therefore, a system call should be programmed to ensure that its code, data, and stack are pinned before it disables external interrupts.
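The required ordering (pin first, then disable interrupts, then touch the data) can be sketched with simulated services. The sim_pin, sim_unpin, and sim_touch functions and the interrupt flag are hypothetical stand-ins. A real kernel extension would use the actual pinning and interrupt-management kernel services.

```c
#include <stddef.h>

static int sim_pinned;               /* 1 while the data is pinned */
static int sim_ints_disabled;        /* 1 while interrupts are off */
static int sim_fault_while_disabled; /* models the fatal page fault */

static int sim_pin(void *addr, size_t len)
{
    (void)addr;
    (void)len;
    sim_pinned = 1;
    return 0;
}

static void sim_unpin(void *addr, size_t len)
{
    (void)addr;
    (void)len;
    sim_pinned = 0;
}

/* Touching unpinned data with interrupts off would be fatal. */
static void sim_touch(void)
{
    if (sim_ints_disabled && !sim_pinned)
        sim_fault_while_disabled = 1;
}

static int sim_update_shared(int *shared, int value)
{
    int rc = sim_pin(shared, sizeof *shared); /* 1: pin first */

    if (rc != 0)
        return rc;
    sim_ints_disabled = 1;                    /* 2: disable */
    sim_touch();
    *shared = value;                          /* 3: critical work */
    sim_ints_disabled = 0;                    /* 4: enable */
    sim_unpin(shared, sizeof *shared);        /* 5: unpin last */
    return 0;
}
```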

Most data accessed by system calls is pageable by default. This includes the system call code, static data, dynamically allocated data, and stack. As a result, a system call can be preempted in two ways:

In the latter case, even less-favored processes can run while the system call is waiting for the paging I/O to complete.

Returning Error Information from System Calls

Error information returned by system calls differs from that returned by kernel services that are not system calls. System calls typically provide a return code of 0 if no error has occurred, or -1 if an error has occurred. In the latter case, the error value is placed in the ut_error field of the thread's uthread structure. In some cases, when data is returned by the return code, a data value of -1 indicates an error. Alternatively, a value of NULL can indicate an error, depending on the interface and function definition of the system call.

In any case, when an error condition is to be returned, the ut_error field should be updated by the system call prior to returning from the system call function. The ut_error field is accessed by using the getuerror and setuerror kernel services.

Before actually calling the system call function, the system call handler sets the ut_error field to 0. Upon return from the system call function, the system call handler copies the value found in ut_error into the thread-specific errno variable if ut_error was nonzero. After setting the errno variable, the system call handler returns to user mode with the return code provided by the system call function.

Kernel-mode callers of system calls must be aware of this return code convention and use the getuerror kernel service to obtain the error value when an error indication is returned by the system call. When system calls are nested, the system call function called by the system call handler can return the error value provided by the nested system call function or can replace this value with a new one by using the setuerror kernel service.
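The nested error convention can be sketched as follows. The sim_getuerror and sim_setuerror functions are hypothetical user-space stand-ins modeled on the getuerror and setuerror kernel services, and the two system call functions are invented for the example.

```c
#include <errno.h>

static int sim_ut_error; /* stand-in for the ut_error field */

static int sim_getuerror(void)    { return sim_ut_error; }
static void sim_setuerror(int e)  { sim_ut_error = e; }

/* Nested system call: fails with ENOENT for negative keys. */
static int sim_sys_lookup(int key)
{
    if (key < 0) {
        sim_setuerror(ENOENT);
        return -1;
    }
    return key + 1;
}

/* Outer system call: calls the nested one from kernel mode. */
static int sim_sys_open_by_key(int key)
{
    int rc = sim_sys_lookup(key);

    if (rc == -1) {
        /* Inspect the nested error and replace it with our own. */
        if (sim_getuerror() == ENOENT)
            sim_setuerror(EINVAL);
        return -1;
    }
    return rc;
}
```

The outer function could equally leave the nested error value in place. Replacing it is shown only to illustrate that the last value set before returning is the one the caller sees.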

Related Information

System Calls Kernel Extension Overview.

System Calls Available to Kernel Extensions.

Understanding Kernel Threads.

Using Kernel Processes.

Using Libraries.

Understanding Locking.

Locking Kernel Services.

Understanding Interrupts.

The getuerror kernel service, setuerror kernel service. The initp kernel service, lockl kernel service, e_sleep kernel service, e_sleepl kernel service, et_wait kernel service, setjmpx kernel service, longjmpx kernel service, unlockl kernel service.

The fork subroutine.

