Understanding Exception Handling


AIX Version 4.3 Kernel Extensions and Device Support Programming Concepts


Exception handling involves a basic distinction between interrupts and exceptions:

An interrupt is an asynchronous event and is not associated with the instruction that is executing when the interrupt occurs.
An exception is a synchronous event and is directly caused by the instruction that is executing when the exception occurs.

The computer hardware generally uses the same mechanism to report both interrupts and exceptions. The machine saves and modifies some of its state and forces a branch to a particular location. When decoding the reason for the machine interrupt, the interrupt handler determines whether the event is an interrupt or an exception, then processes the event accordingly.

Note: Ordinary page faults are treated more like interrupts than exceptions. The only difference between a page-fault interrupt and other interrupts is that the interrupted program is not dispatchable until the page fault is resolved.

Exception Processing

When an exception occurs, the current instruction stream cannot continue. If you ignore the exception, the results of executing the instruction may become undefined. Further execution of the program may cause unpredictable results. The kernel provides a default exception-handling mechanism by which an instruction stream (a process- or interrupt-level program) can specify what action is to be taken when an exception occurs. Exceptions are handled differently depending on whether they occurred while executing in kernel mode or user mode.

Default Exception-Handling Mechanism

If no exception handler is currently defined when an exception occurs, typically one of two things happens:

If the exception occurs while a process is executing in user mode, the process is sent a signal relevant to the type of exception.
If the exception occurs while in kernel mode, the system halts.

Kernel-Mode Exception Handling

Exception handling in kernel mode extends the context-save-and-restore mechanism of the setjmp and longjmp subroutines by providing the setjmpx and longjmpx kernel services to handle exceptions. The traditional system mechanism is extended by allowing these exception handlers (or context-save checkpoints) to be stacked on a per-process or per-interrupt-handler basis.

This stacking mechanism allows the execution point and context of a process or interrupt handler to be restored to the point of return from the setjmpx kernel service. When execution returns to this point, the return code from the setjmpx kernel service indicates the type of exception that occurred so that the process or interrupt handler state can be fully restored. Appropriate retry or recovery operations are then invoked by the software performing the operation.

When an exception occurs, the kernel first-level exception handler gets control. The first-level exception handler determines what type of exception has occurred and saves the information necessary for handling that type of exception. For an I/O exception, the first-level handler also re-enables programmed I/O operations.

The first-level exception handler then modifies the saved context of the interrupted process or interrupt handler so that the longjmpx kernel service is executed when the first-level exception handler returns to the interrupted process or interrupt handler.

The longjmpx kernel service executes in the environment of the code that caused the exception and restores the current context from the topmost jump buffer on the stack of saved contexts. As a result, the state of the process or interrupt handler that caused the exception is restored to the point of the return from the setjmpx kernel service. (The return code, nevertheless, indicates that an exception has occurred.)

The process or interrupt handler software should then check the return code and invoke exception-handling code to fully restore the state of the process or interrupt handler. Additional information about the exception can be obtained by using the getexcept kernel service.
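The control flow described above can be sketched in user space. The following is only a model, assuming standard C setjmp and longjmp as stand-ins: the names model_label, model_push, model_clrjmpx, model_longjmpx, and MODEL_EXC_IO are hypothetical and are not the kernel interfaces. The sketch shows the checkpoint returning 0 when first established and a nonzero code when an exception resumes it.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for a kernel jump buffer (label_t). */
typedef struct model_label {
    jmp_buf env;               /* context saved by setjmp */
    struct model_label *prev;  /* next older entry on the handler stack */
} model_label;

static model_label *model_top = NULL;  /* top of the handler stack */
static int model_code = 0;             /* code delivered to the resumed handler */

/* First half of the modelled setjmpx: push the buffer. The caller then
 * invokes setjmp(buf->env) itself, because setjmp must run in the frame
 * that stays live. */
static void model_push(model_label *buf) {
    buf->prev = model_top;
    model_top = buf;
}

/* Modelled clrjmpx: remove the top entry after checking it is the expected one. */
static void model_clrjmpx(model_label *buf) {
    assert(model_top == buf);
    model_top = buf->prev;
}

/* Modelled longjmpx: pop the top entry and resume it with a nonzero result. */
static void model_longjmpx(int code) {
    model_label *buf = model_top;
    if (buf == NULL)
        abort();               /* stands in for the system default action */
    model_top = buf->prev;
    model_code = code;
    longjmp(buf->env, 1);
}

enum { MODEL_EXC_IO = 1 };     /* hypothetical code, not a value from except.h */

/* Returns 0 on normal completion, or the exception code if one was raised. */
int guarded_operation(int raise_code) {
    model_label jb;

    model_push(&jb);
    if (setjmp(jb.env) != 0) {
        /* Resumed by model_longjmpx: the entry was already popped,
         * so model_clrjmpx must not be called on this path. */
        return model_code;
    }

    if (raise_code != 0)
        model_longjmpx(raise_code);  /* stands in for the first-level handler's redirect */

    model_clrjmpx(&jb);              /* normal completion: remove our entry */
    return 0;
}
```

Calling guarded_operation(0) takes the normal path and returns 0; calling it with MODEL_EXC_IO resumes at the checkpoint and returns that code.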

User-Defined Exception Handling

A typical exception handler should do the following:

Perform any necessary clean-up such as freeing storage or segment registers and releasing other resources.
If the exception is recognized by the current handler and can be handled entirely within the routine, the handler should establish itself again by calling the setjmpx kernel service. This allows normal processing to continue.
If the exception is not recognized by the current handler, it must be passed to the previously stacked exception handler. The exception is passed by calling the longjmpx kernel service, which either invokes the previous handler (if any) or falls back to the system default exception-handling mechanism.
If the exception is recognized by the current handler but cannot be handled, it is treated as though it were unrecognized. The longjmpx kernel service is called, which either passes the exception along to the previous handler (if any) or falls back to the system default exception-handling mechanism.

When a kernel routine that has established an exception handler completes normally, it must remove its exception handler from the stack (by using the clrjmpx kernel service) before returning to its caller.

Note: When the longjmpx kernel service invokes an exception handler, that handler's entry is automatically removed from the stack.
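The handler responsibilities listed above can be illustrated with a user-space model. This is a sketch under stated assumptions: it uses standard C setjmp and longjmp, and every model_ name, the MODEL_EXC_ codes, and the resources_held flag are hypothetical. The inner routine cleans up, re-establishes itself and retries when it recognizes the code, and otherwise passes the code to the previously stacked handler.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for a kernel jump buffer (label_t). */
typedef struct model_label {
    jmp_buf env;
    struct model_label *prev;
} model_label;

static model_label *model_top = NULL;
static int model_code = 0;

static void model_push(model_label *buf) { buf->prev = model_top; model_top = buf; }

static void model_clrjmpx(model_label *buf) {
    assert(model_top == buf);
    model_top = buf->prev;
}

static void model_longjmpx(int code) {
    model_label *buf = model_top;
    if (buf == NULL)
        abort();                       /* stands in for the system default action */
    model_top = buf->prev;
    model_code = code;
    longjmp(buf->env, 1);
}

enum { MODEL_EXC_RETRYABLE = 1, MODEL_EXC_OTHER = 2 };  /* hypothetical codes */

static int resources_held = 0;         /* stands in for storage, segment registers, ... */
static int remaining_failures = 0;     /* how many more times risky_step raises */

static void risky_step(int code) {
    if (remaining_failures > 0) {
        remaining_failures--;
        model_longjmpx(code);
    }
}

static void inner_routine(int code) {
    model_label jb;

    for (;;) {
        model_push(&jb);                      /* establish (or re-establish) the handler */
        if (setjmp(jb.env) != 0) {
            resources_held = 0;               /* 1. clean up */
            if (model_code == MODEL_EXC_RETRYABLE)
                continue;                     /* 2. recognized: re-establish and retry */
            model_longjmpx(model_code);       /* 3. not ours: pass it on (does not return) */
        }
        resources_held = 1;
        risky_step(code);
        resources_held = 0;
        model_clrjmpx(&jb);                   /* normal completion: remove our entry */
        return;
    }
}

/* Returns 0 if inner_routine completed, else the code that reached this handler. */
int outer_routine(int code, int failures) {
    model_label jb;

    remaining_failures = failures;
    model_push(&jb);
    if (setjmp(jb.env) != 0)
        return model_code;                    /* entry already popped by model_longjmpx */
    inner_routine(code);
    model_clrjmpx(&jb);
    return 0;
}
```

With a retryable code and two failures, outer_routine returns 0 after the inner routine retries twice; with any other code, the exception reaches the outer handler and its code is returned.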

Implementing Kernel Exception Handlers

The following information is provided to assist you in implementing kernel exception handlers:

The setjmpx, longjmpx, and clrjmpx Kernel Services

Exception Handler Environment

Restrictions on Using the setjmpx Kernel Service

Exception Codes

Hardware Detection of Exceptions

setjmpx, longjmpx, and clrjmpx Kernel Services

The setjmpx kernel service provides a way to save the following portions of the program state at the point of a call:

Nonvolatile general registers
Stack pointer
TOC pointer
Interrupt priority number (intpri)
Ownership of kernel-mode lock

This state can be restored later by calling the longjmpx kernel service, which accomplishes the following tasks:

Reloads the registers (including TOC and stack pointers).
Restores the saved interrupt priority, enabling or disabling interrupts as required.
Conditionally acquires or releases the kernel-mode lock.
Forces a branch back to the point of original return from the setjmpx kernel service.

The setjmpx kernel service takes the address of a jump buffer (a label_t structure) as an explicit parameter. This structure can be defined anywhere including on the stack (as an automatic variable). After noting the state data in the jump buffer, the setjmpx kernel service pushes the buffer onto the top of a stack that is maintained in the machine-state save structure.

The longjmpx kernel service is used to return to the point in the code at which the setjmpx kernel service was called. Specifically, the longjmpx kernel service returns to the most recently created jump buffer, as indicated by the top of the stack anchored in the machine-state save structure.

The parameter to the longjmpx kernel service is an exception code that is passed to the resumed program as the return code from the setjmpx kernel service. The resumed program tests this code to determine the conditions under which the setjmpx kernel service is returning. If the setjmpx kernel service has just saved its jump buffer, the return code is 0. If an exception has occurred, the program is entered by a call to the longjmpx kernel service, which passes along a return code that is not equal to 0.

Note: Only the resources listed here are saved by the setjmpx kernel service and restored by the longjmpx kernel service. Other resources, in particular segment registers, are not restored. A call to the longjmpx kernel service, by definition, returns to an earlier point in the program. The program code must free any resources that are allocated between the call to the setjmpx kernel service and the call to the longjmpx kernel service.

If the exception handler stack is empty when the longjmpx kernel service is issued, there is no place to jump to and the system default exception-handling mechanism is used. If the stack is not empty, the context that is defined by the topmost jump buffer is reloaded and resumed. The topmost buffer is then removed from the stack.

The clrjmpx kernel service removes the top element from the stack as placed there by the setjmpx kernel service. The caller to the clrjmpx kernel service is expected to know exactly which jump buffer is being removed. This should have been established earlier in the code by a call to the setjmpx kernel service. Accordingly, the address of the buffer is required as a parameter to the clrjmpx kernel service. It can then perform consistency checking by asserting that the address passed is indeed the address of the top stack element.
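The stack discipline described in this section can be modelled in user space. The sketch below assumes standard C setjmp and longjmp as stand-ins; all model_ names are hypothetical and only imitate the behavior described here. It shows that a jump resumes the most recently pushed buffer and removes it, and that buffers are cleared in the reverse order of their creation, with a consistency check on the address.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for a kernel jump buffer (label_t). */
typedef struct model_label {
    jmp_buf env;
    struct model_label *prev;
} model_label;

static model_label *model_top = NULL;  /* models the anchor in the machine-state save area */
static int model_code = 0;

static void model_push(model_label *buf) { buf->prev = model_top; model_top = buf; }

/* Consistency check: the caller must name the buffer that is on top. */
static void model_clrjmpx(model_label *buf) {
    assert(model_top == buf);
    model_top = buf->prev;
}

static void model_longjmpx(int code) {
    model_label *buf = model_top;
    if (buf == NULL)
        abort();                       /* empty stack: system default action */
    model_top = buf->prev;             /* the resumed buffer leaves the stack */
    model_code = code;
    longjmp(buf->env, 1);
}

static int model_depth(void) {
    int n = 0;
    for (model_label *b = model_top; b != NULL; b = b->prev)
        n++;
    return n;
}

/* Returns 0 on the normal path, or depth_after * 100 + code after a jump. */
int nested_checkpoints(int raise_code) {
    model_label outer, inner;

    model_push(&outer);
    if (setjmp(outer.env) != 0)
        return -1;                     /* not reached in this sketch */

    model_push(&inner);
    if (setjmp(inner.env) != 0) {
        /* The most recent buffer was resumed and popped; outer remains. */
        int depth_after = model_depth();
        model_clrjmpx(&outer);
        return depth_after * 100 + model_code;
    }

    if (raise_code != 0)
        model_longjmpx(raise_code);

    model_clrjmpx(&inner);             /* reverse order of the pushes */
    model_clrjmpx(&outer);
    return 0;
}
```

In this model, nested_checkpoints(42) resumes the inner checkpoint with one entry left on the stack, so it returns 142.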

Exception Handler Environment

The stacked exception handlers run in the environment in which the exception occurs. That is, an exception occurring in a process environment causes the next dispatch of the process to run the exception handler on the top of the stack of exception handlers for that process. An exception occurring in an interrupt handler causes the interrupt handler to return to the context saved by the last call to the setjmpx kernel service made by the interrupt handler.

Note: An interrupt handler context is newly created each time the interrupt handler is invoked. As a result, exception handlers for interrupt handlers must be registered (by calling the setjmpx kernel service) each time the interrupt handler is invoked. Otherwise, an exception detected during execution of the interrupt handler will be handled by the default handler.

Restrictions on Using the setjmpx Kernel Service

Process and interrupt handler routines registering exception handlers with the setjmpx kernel service must not return to their caller before removing the saved jump buffer or buffers from the list of jump buffers. A saved jump buffer can be removed by invoking the clrjmpx kernel service in the reverse order of the setjmpx calls. The saved jump buffer must be removed before return because the routine's context no longer exists once the routine has returned to its caller.

If, on the other hand, an exception does occur (that is, the return code from setjmpx kernel service is nonzero), the jump buffer is automatically removed from the list of jump buffers. In this case, a call to the clrjmpx kernel service for the jump buffer must not be performed.

Care must also be taken in defining variables that are used after the context save (the call to the setjmpx service), and then again by the exception handler. Sensitive variables of this nature must be restored to their correct value by the exception handler when an exception occurs.

Note: If the last value of the variable is desired at exception time, the variable data type must be declared as "volatile."
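Standard C has an analogous rule for setjmp and longjmp: a non-volatile automatic variable that is modified after the context save has an indeterminate value after the jump. The following user-space sketch, in which every model_ name is hypothetical, shows a volatile variable keeping its last stored value when the handler is resumed.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for a kernel jump buffer (label_t). */
typedef struct model_label {
    jmp_buf env;
    struct model_label *prev;
} model_label;

static model_label *model_top = NULL;
static int model_code = 0;

static void model_push(model_label *buf) { buf->prev = model_top; model_top = buf; }

static void model_clrjmpx(model_label *buf) {
    assert(model_top == buf);
    model_top = buf->prev;
}

static void model_longjmpx(int code) {
    model_label *buf = model_top;
    if (buf == NULL)
        abort();
    model_top = buf->prev;
    model_code = code;
    longjmp(buf->env, 1);
}

/* Returns how many steps had completed when the routine finished or was interrupted. */
int count_progress(int raise_code) {
    model_label jb;
    volatile int steps_done = 0;   /* volatile: the last stored value survives the jump */

    model_push(&jb);
    if (setjmp(jb.env) != 0)
        return steps_done;         /* sees every store made before the exception */

    steps_done = 1;
    steps_done = 2;
    if (raise_code != 0)
        model_longjmpx(raise_code);
    steps_done = 3;

    model_clrjmpx(&jb);
    return steps_done;
}
```

Without the volatile qualifier, the value read in the handler branch would not be guaranteed, because the compiler may keep steps_done in a register that the jump rolls back.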

Exception handling is concluded in one of two ways. Either a registered exception handler handles the exception and continues from the saved context, or the default exception handler is reached by exhausting the stack of jump buffers.

Exception Codes

The /usr/include/sys/except.h file contains a list of code numbers corresponding to the various types of hardware exceptions. When an exception handler is invoked (the return from the setjmpx kernel service is not equal to 0), it is the responsibility of the handler to test the code to ensure that the exception is one the routine can handle. If it is not an expected code, the exception handler must:

Release any resources that would not otherwise be freed (buffers, segment registers, storage acquired using the xmalloc routines).
Call the longjmpx kernel service, passing it the exception code as a parameter.

Thus, when an exception handler does not recognize the exception for which it has been invoked, it passes the exception on to the next most recent exception handler. This continues until an exception handler is reached that recognizes the code and can handle it. Eventually, if no exception handler can handle the exception, the stack is exhausted and the system default action is taken.

In this manner, a component can allocate resources (after calling the setjmpx kernel service to establish an exception handler) and be assured that the resources will later be released. This ensures the exception handler gets a chance to release those resources regardless of what events occur before the instruction stream (a process- or interrupt-level code) is terminated.

By coding the exception handler to recognize which exception codes it can process (rather than encoding this knowledge in the stack entries), a powerful and simple-to-use mechanism is created. Each handler need only examine the exception code that it receives, rather than assuming that it was invoked because a particular exception has occurred. For this scheme to work, the set of exception codes in use cannot contain duplicates.

Exceptions generated by hardware use one of the codes in the /usr/include/sys/except.h file. However, the longjmpx kernel service can be invoked by any kernel component, and any integer can serve as the exception code. A mechanism similar to the old-style setjmp and longjmp subroutines can be implemented on top of the setjmpx/longjmpx stack by using exception codes outside the range of those used for hardware exceptions.

To implement this old-style mechanism, a unique set of exception codes is needed. These codes must not conflict with either the pre-assigned hardware codes or codes used by any other component. A simple way to get such codes is to use the addresses of unique objects as code values.

For example, a program that establishes an exception handler might compare the exception code to the address of its own entry point (that is, by using its function descriptor). Later on in the calling sequence, after any number of intervening calls to the setjmpx kernel service by other programs, a program can issue a call to the longjmpx kernel service and pass the address of the agreed-on function descriptor as the code. This code is only recognized by a single exception handler. All the intervening ones just clean up their resources and pass the code to the longjmpx kernel service again.

Addresses of function descriptors are not the only possibilities for unique code numbers. For example, addresses of external variables can also be used. By using addresses that are resolved to unique values by the binder and loader, the problem of code-space collision is transformed into a problem of external-name collision. This problem is easier to solve, and is routinely solved whenever the system is built. By comparison, pre-assigning exception numbers by using #define statements in a header file is a much more cumbersome and error-prone method.
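The address-as-code idea can be sketched in user space. This model assumes standard C setjmp and longjmp, carries the code as an intptr_t so that an address fits, and uses the address of a static variable as the unique code. All names (the model_ helpers, model_abort_tag, MODEL_ABORT_CODE) are hypothetical. The intervening handler only cleans up and passes the code along; only the top-level handler recognizes it.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for a kernel jump buffer (label_t). */
typedef struct model_label {
    jmp_buf env;
    struct model_label *prev;
} model_label;

static model_label *model_top = NULL;
static intptr_t model_code = 0;        /* wide enough to carry an address */

static void model_push(model_label *buf) { buf->prev = model_top; model_top = buf; }

static void model_clrjmpx(model_label *buf) {
    assert(model_top == buf);
    model_top = buf->prev;
}

static void model_longjmpx(intptr_t code) {
    model_label *buf = model_top;
    if (buf == NULL)
        abort();                       /* stands in for the system default action */
    model_top = buf->prev;
    model_code = code;
    longjmp(buf->env, 1);
}

/* The object's address is unique in the program, so it serves as the code. */
static const char model_abort_tag = 0;
#define MODEL_ABORT_CODE ((intptr_t)&model_abort_tag)

static int middle_cleanups = 0;

static void deep(void) {
    model_longjmpx(MODEL_ABORT_CODE);  /* unwind to whoever recognizes this code */
}

static void middle(void) {
    model_label jb;

    model_push(&jb);
    if (setjmp(jb.env) != 0) {
        middle_cleanups++;             /* release this routine's own resources */
        model_longjmpx(model_code);    /* not recognized here: pass it along */
    }
    deep();
    model_clrjmpx(&jb);
}

/* Returns 1 if the agreed-on code arrived, 0 if middle() completed normally. */
int top_level(void) {
    model_label jb;

    model_push(&jb);
    if (setjmp(jb.env) != 0) {
        if (model_code != MODEL_ABORT_CODE)
            model_longjmpx(model_code);
        return 1;
    }
    middle();
    model_clrjmpx(&jb);
    return 0;
}
```

Because distinct objects have distinct addresses, a second component that defines its own tag variable obtains a code that cannot collide with this one.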

Hardware Detection of Exceptions

Each of the exception types results in a hardware interrupt. For each such interrupt, a first-level interrupt handler (FLIH) saves the state of the executing program and calls a second-level handler (SLIH). The SLIH is passed a pointer to the machine-state save structure and a code indicating the cause of the interrupt.

When a SLIH determines that a hardware interrupt should actually be considered a synchronous exception, it sets up the machine-state save to invoke the longjmpx kernel service, and then returns. The FLIH then resumes the instruction stream at the entry to the longjmpx service.

The longjmpx service then invokes the top exception handler on the stack or takes the system default action as previously described.

User-Mode Exception Handling

Exceptions that occur in a user-mode process and are not automatically handled by the kernel cause the user-mode process to be signaled. If the process is in a state in which it cannot take the signal, it is terminated and the information logged. Kernel routines can install user-mode exception handlers that catch exceptions before they are signaled to the user-mode process.

The uexadd and uexdel kernel services allow system-wide user-mode exception handlers to be added and removed.

The most recently registered exception handler is the first called. If it cannot handle the exception, the next most recent handler on the list is called, and this second handler attempts to handle the exception. If this attempt fails, successive handlers are tried, until the default handler is called, which generates the signal.
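The calling order described above can be modelled as a linked list searched from the most recent registration. This is a sketch only: model_uex, model_uexadd, model_uexdel, model_dispatch, the handler signature, and the MODEL_ codes are hypothetical and do not reproduce the real uexadd and uexdel interfaces.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MODEL_NOT_HANDLED = 0, MODEL_HANDLED = 1 } model_status;

/* Hypothetical registration record for one user-mode exception handler. */
typedef struct model_uex {
    struct model_uex *next;
    model_status (*handler)(int code);
} model_uex;

static model_uex *model_uex_list = NULL;  /* most recently added handler first */
static int model_signals_sent = 0;        /* counts default-handler invocations */
static char model_call_log[8];            /* order in which handlers were called */
static size_t model_call_count = 0;

static void model_note(char tag) {
    assert(model_call_count < sizeof model_call_log);
    model_call_log[model_call_count++] = tag;
}

static void model_uexadd(model_uex *h) {
    h->next = model_uex_list;
    model_uex_list = h;
}

static void model_uexdel(model_uex *h) {
    model_uex **link = &model_uex_list;
    while (*link != NULL && *link != h)
        link = &(*link)->next;
    if (*link != NULL)
        *link = h->next;
}

/* Try handlers from newest to oldest; fall back to signalling the process. */
static void model_dispatch(int code) {
    for (model_uex *h = model_uex_list; h != NULL; h = h->next)
        if (h->handler(code) == MODEL_HANDLED)
            return;
    model_signals_sent++;
}

enum { MODEL_EXC_FLOAT = 1, MODEL_EXC_ALIGN = 2, MODEL_EXC_OTHER = 3 };

static model_status on_float(int code) {
    model_note('F');
    return code == MODEL_EXC_FLOAT ? MODEL_HANDLED : MODEL_NOT_HANDLED;
}

static model_status on_align(int code) {
    model_note('A');
    return code == MODEL_EXC_ALIGN ? MODEL_HANDLED : MODEL_NOT_HANDLED;
}

/* Returns the number of exceptions that fell through to the default handler. */
int run_chain_demo(void) {
    model_uex first = { NULL, on_float };
    model_uex second = { NULL, on_align };

    model_uexadd(&first);
    model_uexadd(&second);             /* registered last, so called first */

    model_dispatch(MODEL_EXC_FLOAT);   /* on_align declines, on_float handles */
    model_dispatch(MODEL_EXC_OTHER);   /* both decline: default handler signals */

    model_uexdel(&second);
    model_uexdel(&first);
    return model_signals_sent;
}
```

In this run, each dispatch calls on_align before on_float, and only the unrecognized exception reaches the default handler.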

Additional information about the exception can be obtained by using the getexcept kernel service.

Related Information

Handling Signals While in a System Call.

The setjmpx kernel service, longjmpx kernel service, clrjmpx kernel service, getexcept kernel service, xmalloc kernel service, uexadd kernel service, and uexdel kernel service.
