
General Programming Concepts: Writing and Debugging Programs


Dynamic Processor Deallocation

Starting with machine type 7044 model 270, the hardware of all systems with more than two processors can detect correctable errors, which are gathered by the firmware. These errors are not fatal and, as long as they remain rare, can be safely ignored. However, when a pattern of failures seems to be developing on a specific processor, the pattern may indicate that this component is likely to fail fatally in the near future. The firmware makes this prediction based on failure rates and threshold analysis.

AIX, on these systems, implements continuous hardware surveillance and regularly polls the firmware for hardware errors. When the number of processor errors hits a threshold and the firmware recognizes that there is a distinct probability that this system component will fail, the firmware returns an error report to AIX. In all cases, AIX logs the error in the system error log. In addition, on multiprocessor systems, depending on the type of failure, AIX attempts to stop using the untrustworthy processor and deallocate it. This feature is called Dynamic Processor Deallocation.

At this point, the firmware also flags the processor for persistent deallocation on subsequent reboots, until maintenance personnel replace the processor.

Potential Impact to Applications

This processor deallocation is transparent to the vast majority of applications, including drivers and kernel extensions. However, you can use AIX published interfaces to determine whether an application or kernel extension is running on a multiprocessor machine, find out how many processors there are, and bind threads to specific processors.

The interface for binding processes or threads to processors uses logical CPU numbers. The logical CPU numbers are in the range [0..N-1], where N is the total number of CPUs. To avoid breaking applications or kernel extensions that assume no "holes" in the CPU numbering, AIX always makes it appear to applications as if the "last" (highest-numbered) logical CPU is the one being deallocated. For instance, on an 8-way SMP, the logical CPU numbers are [0..7]. If one processor is deallocated, the total number of available CPUs becomes 7, and they are numbered [0..6]. Externally, it looks like CPU 7 has disappeared, regardless of which physical processor failed. In the rest of this description, the term CPU is used for the logical entity and the term processor for the physical entity.
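The visible effect of this renumbering can be pictured with a small model. The sketch below is not AIX code and makes no claim about how the kernel achieves the effect internally; the `cpu_map` structure and `model_deallocate` function are hypothetical names used only for illustration.

```c
#define MAX_CPUS 8

/* Hypothetical model: physical[i] is the processor behind logical CPU i. */
struct cpu_map {
    int ncpus;               /* number of active logical CPUs */
    int physical[MAX_CPUS];  /* physical processor for each logical CPU */
};

/*
 * Remove a failing physical processor from the model. The logical slot it
 * occupied is taken over by the processor that backed the last logical CPU,
 * so the remaining logical numbers stay contiguous and only the highest
 * logical number disappears.
 * Returns 0 on success, -1 if the processor is not in the map.
 */
int model_deallocate(struct cpu_map *m, int failing_physical)
{
    for (int i = 0; i < m->ncpus; i++) {
        if (m->physical[i] == failing_physical) {
            m->physical[i] = m->physical[m->ncpus - 1];
            m->ncpus--;
            return 0;
        }
    }
    return -1;
}
```

In this model, removing physical processor 2 from an 8-way map leaves logical CPUs [0..6], and logical CPU 7 is the one that vanishes.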

Applications or kernel extensions that bind processes or threads could potentially be broken if AIX silently terminated their bound threads or forcefully moved them to another CPU when one of the processors needs to be deallocated. Dynamic Processor Deallocation provides programming interfaces so that those applications and kernel extensions can be notified that a processor deallocation is about to happen. When these applications and kernel extensions get this notification, they are responsible for moving their bound threads and associated resources (such as timer request blocks) away from the last logical CPU and for adapting themselves to the new CPU configuration.

If, after notification of applications and kernel extensions, some of the threads are still bound to the last logical CPU, the deallocation is aborted. In this case AIX logs the fact that the deallocation has been aborted in the error log and continues using the ailing processor. When the processor ultimately fails, it creates a total system failure. Thus, it is important for applications or kernel extensions binding threads to CPUs to get the notification of an impending processor deallocation, and act on this notice.

Even in the rare cases where the deallocation cannot go through, Dynamic Processor Deallocation still gives advance warning to system administrators. By recording the error in the error log, it gives them a chance to schedule a maintenance operation on the system to replace the ailing component before a global system failure occurs.

Processor Deallocation: Flow of Events

The typical flow of events for processor deallocation is as follows:

  1. The firmware detects that a recoverable error threshold has been reached by one of the processors.
  2. AIX logs the firmware error report in the system error log and, when executing on a machine supporting processor deallocation, starts the deallocation process.
  3. AIX notifies non-kernel processes and threads bound to the last logical CPU.
  4. AIX waits for all the bound threads to move away from the last logical CPU. If threads remain bound, AIX eventually times out (after ten minutes) and aborts the deallocation.
  5. Otherwise, AIX invokes the previously registered High Availability Event Handlers (HAEHs). An HAEH may return an error that will abort the deallocation.
  6. Otherwise, AIX goes on with the deallocation process and ultimately stops the failing processor.

In case of failure at any point of the deallocation, AIX logs the failure with the reason why the deallocation was aborted. The system administrator can look at the error log, take corrective action (when possible) and restart the deallocation. For instance, if the deallocation was aborted because at least one application did not unbind its bound threads, the system administrator could stop the application(s), restart the deallocation (which should go through this time) and restart the application.

Programming Interfaces

Existing AIX Interfaces Dealing with Individual Processors

The following is a list of existing interfaces:

Interfaces to Determine the Number of CPUs on a System

sysconf Subroutine

The sysconf subroutine returns the number of processors when called with one of the following names:

_SC_NPROCESSORS_CONF: Number of processors configured.
_SC_NPROCESSORS_ONLN: Number of processors online.

For more information, see sysconf Subroutine in AIX 5L Version 5.1 Technical Reference: Base Operating System and Extensions Volume 2.

The value returned by sysconf for _SC_NPROCESSORS_CONF remains constant between reboots. The value returned for _SC_NPROCESSORS_ONLN is the count of active CPUs and is decremented every time a processor is deallocated.
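As a minimal sketch of how a program might compare the two values, the function below queries both names and reports any difference. The function name `report_cpu_counts` is hypothetical; only `sysconf` and the two `_SC_NPROCESSORS_*` names come from the text above.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Print the configured and online processor counts.
 * Returns the online count, or -1 if either query fails.
 */
long report_cpu_counts(void)
{
    long configured = sysconf(_SC_NPROCESSORS_CONF);
    long online = sysconf(_SC_NPROCESSORS_ONLN);

    if (configured == -1 || online == -1) {
        fprintf(stderr, "sysconf: processor count not available\n");
        return -1;
    }
    printf("configured: %ld, online: %ld\n", configured, online);
    if (online < configured)
        printf("%ld processor(s) are not online\n", configured - online);
    return online;
}
```

Because the online count can drop while the program runs, code that caches the result should be prepared to query it again.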

_system_configuration.ncpus

_system_configuration.ncpus identifies the number of CPUs active on a machine. Uniprocessor (UP) machines are identified by a 1. Values greater than 1 indicate multiprocessor (MP) machines. For more information, see systemcfg.h File in AIX 5L Version 5.1 Files Reference.

Because of processor deallocations, the ncpus value may now change over time. Code that depends upon this value being a constant will probably fail. To prevent such code from suddenly changing between uniprocessor (UP) and multiprocessor (MP) behavior, a processor is not deallocated if only two processors are currently active. Thus, if ncpus starts with a value greater than 1, it is guaranteed to remain greater than 1 until the next reboot.

For code that must know how many processors were originally available at boot time, a new ncpus_cfg field is added to the _system_configuration table. This field remains constant between reboots.

The CPUs are identified by a logical CPU number in the range [0..(ncpus-1)]. The processors also have a physical CPU number which depends on which CPU board they are on, in which order, and so on. The commands and subroutines dealing with CPU numbers always use logical CPU numbers. To ease the transition to varying numbers of CPUs, the logical CPU numbers are contiguous numbers in the range [0..(ncpus-1)] in AIX 4.3.3. The effect of this is that from a user point of view, when a processor deallocation takes place, it always looks like the highest-numbered ("last") logical CPU is going away, regardless of which physical processor failed.

Note: To avoid problems, use the ncpus_cfg variable to determine what the highest possible logical CPU number is for a particular system.
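One place where this matters is in sizing per-CPU data. The sketch below allocates one counter per possible logical CPU, using ncpus_cfg on AIX so the table never needs to shrink or be reindexed when ncpus drops. The function names are hypothetical, and the non-AIX branch is only a stand-in (an assumption) so the example can be built on other systems.

```c
#include <stdlib.h>
#include <unistd.h>
#ifdef _AIX
#include <sys/systemcfg.h>
#endif

/* Largest number of logical CPUs that can exist until the next reboot. */
int max_logical_cpus(void)
{
#ifdef _AIX
    return _system_configuration.ncpus_cfg;
#else
    /* Stand-in for other systems: use the configured processor count. */
    long n = sysconf(_SC_NPROCESSORS_CONF);
    return n > 0 ? (int)n : 1;
#endif
}

/*
 * Allocate one zeroed counter per possible logical CPU.
 * On success, stores the element count in *count_out and returns the array;
 * returns NULL if the allocation fails. The caller frees the array.
 */
long *alloc_per_cpu_counters(int *count_out)
{
    int n = max_logical_cpus();
    long *counters = calloc((size_t)n, sizeof *counters);

    if (counters != NULL)
        *count_out = n;
    return counters;
}
```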

Interfaces to Bind Threads to a Specific Processor

The bindprocessor command and the bindprocessor() programming interface allow you to bind a thread or a process to a specific CPU, designated by its logical CPU number. Both interfaces allow you to bind threads or processes only to active CPUs. They are mentioned here because programs that directly use the bindprocessor() programming interface, or that are bound externally by a bindprocessor command, must be able to handle the processor deallocation.

The primary problem seen by programs that bind to a processor when a CPU has been deallocated is that requests to bind to a deallocated processor will fail. Code that issues bindprocessor requests should always check the return value from those requests.

For more information on these interfaces, see bindprocessor Command in AIX 5L Version 5.1 Commands Reference, Volume 1 or bindprocessor Subroutine in AIX 5L Version 5.1 Technical Reference: Base Operating System and Extensions Volume 1.
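A minimal sketch of checking the return value is shown below. It tries the preferred logical CPU first and falls back to lower-numbered CPUs if the bind fails. The function `bind_with_fallback` is a hypothetical name. On AIX the sketch uses bindprocessor() with BINDTHREAD and thread_self(), on the understanding that these are declared in <sys/processor.h> and <sys/thread.h>; the non-AIX branch contains stand-ins with arbitrary values that pretend four CPUs are active, purely so the example can be built and tried elsewhere.

```c
#include <stdio.h>
#ifdef _AIX
#include <sys/processor.h>
#include <sys/thread.h>
#else
/* Stand-ins for building off AIX; values are arbitrary, 4 CPUs assumed. */
typedef short cpu_t;
#define BINDTHREAD 2
#define PROCESSOR_CLASS_ANY ((cpu_t)-1)
static int thread_self(void) { return 1; }
static int bindprocessor(int what, int who, cpu_t where)
{
    (void)what;
    (void)who;
    if (where == PROCESSOR_CLASS_ANY || (where >= 0 && where < 4))
        return 0;
    return -1;
}
#endif

/*
 * Bind the calling thread to `preferred`; if that request fails (for
 * example because the CPU has been deallocated), try each lower-numbered
 * logical CPU in turn. Returns the CPU bound to, or -1 if every bind failed.
 */
int bind_with_fallback(int preferred)
{
    for (int cpu = preferred; cpu >= 0; cpu--) {
        if (bindprocessor(BINDTHREAD, thread_self(), (cpu_t)cpu) == 0)
            return cpu;
        fprintf(stderr, "bind to CPU %d failed\n", cpu);
    }
    return -1;
}
```

Falling back toward CPU 0 follows from the numbering rule described earlier: after a deallocation it is always the highest logical number that goes away.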

Interfaces for Processor Deallocation Notification

The notification mechanism is different for user mode applications having threads bound to the last logical CPU and for kernel extensions.

Notification in User Mode

Each thread of a user mode application that is bound to the last logical CPU will be sent a new signal SIGCPUFAIL, which is ignored by default. These applications need to be modified to catch this new signal and dispose of the threads bound to the last logical CPU (either by unbinding them or by binding them to a different CPU).

Notification in Kernel Mode

The drivers and kernel extensions which need to be notified of an impending processor deallocation have to register a High-Availability Event Handler (HAEH) routine with the kernel. This routine will be called when a processor deallocation is imminent. An interface is also provided to unregister the HAEH before the kernel extension is unconfigured or unloaded.

Registering a High-Availability Event Handler

The kernel exports a new function that lets kernel extensions be notified of events that affect the availability of the system.

The system call is:

int register_HA_handler(ha_handler_ext_t *)

For more information on this system call, see register_HA_handler in AIX 5L Version 5.1 Technical Reference: Kernel and Subsystems Volume 1.

The return value is 0 in case of success. A non-zero value indicates a failure.

The argument is a pointer to a structure describing the kernel extension's high-availability event handler. This structure is defined in a new header file, named <sys/high_avail.h> as follows:

typedef struct _ha_handler_ext_ { 
    int (*_fun)();        /* Function to be invoked */ 
    long long _data;      /* Private data for (*_fun)() */ 
    char        _name[sizeof(long long) + 1]; 
} ha_handler_ext_t;

The private data field _data is provided for the use of the kernel extension if it is needed. Whatever value is given in this field at the time of registration is passed as a parameter to the registered function when it is called due to a CPU predictive failure event.

The _name field is a null terminated string with a maximum length of 8 characters (not including the null character terminator) which is used to uniquely identify the kernel extension with the kernel. This name has to be unique among all the registered kernel extensions. This name appears in the detailed data area of the CPU_DEALLOC_ABORTED error log entry if the kernel extension returns an error when the HAEH routine is called by the kernel.

Kernel extensions should register their HAEH only once.
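Because _name holds at most 8 characters plus the terminator, it is worth validating the name before registering. The sketch below uses a local copy of the structure quoted above so that it can be built outside the kernel; a real kernel extension would include <sys/high_avail.h> instead. The helper `set_handler_name` is a hypothetical name, not part of AIX.

```c
#include <string.h>

/* Local copy of the structure quoted above, for illustration only. */
typedef struct _ha_handler_ext_ {
    int (*_fun)();        /* Function to be invoked */
    long long _data;      /* Private data for (*_fun)() */
    char _name[sizeof(long long) + 1];
} ha_handler_ext_t;

/*
 * Copy `name` into h->_name, including the null terminator.
 * Returns 0 on success, or -1 (leaving h unchanged) if the name does not
 * fit in the 8 characters the field allows.
 */
int set_handler_name(ha_handler_ext_t *h, const char *name)
{
    size_t len = strlen(name);

    if (len > sizeof h->_name - 1)
        return -1;
    memcpy(h->_name, name, len + 1);
    return 0;
}
```

Rejecting an over-long name up front is a design choice in this sketch; it avoids silently truncating to a string that might collide with another extension's name.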

Invocation of the High-Availability Event Handler

The HAEH routine is called with two parameters. The first one is whatever is in the _data field of the ha_handler_ext_t structure passed to register_HA_handler. The second parameter is a pointer to a haeh_event_t structure defined in <sys/high_avail.h> as:

typedef struct {                    /* High-availability related event */ 
    uint _magic;                    /* Identifies the kind of the event */ 
#define HA_CPU_FAIL 0x40505546      /* "CPUF" */ 
    union { 
        struct {                   /* Predictive processor failure */ 
            cpu_t dealloc_cpu;     /* Logical ID of failing processor */ 
            ushort domain;         /* future extension */ 
            ushort nodeid;         /* future extension */ 
            ushort reserved3;      /* future extension */ 
            uint reserved[4];      /* future extension */ 
        } _cpu; 
        /* ... */                  /* Additional kind of events -- */ 
        /* future extension */ 
    } _u; 
} haeh_event_t;

The function should return one of the following codes, also defined in <sys/high_avail.h>.

#define HA_ACCEPTED 0     /* Positive acknowledgement */ 
#define HA_REFUSED -1     /* Negative acknowledgement */

If any of the registered extensions does not return HA_ACCEPTED, the deallocation is aborted. The HAEH routines are called in the process environment and do not need to be pinned.

If a kernel extension depends on the CPU configuration, its HAEH routine must react to the upcoming CPU deallocation. How it reacts is highly application dependent. To allow AIX to proceed with the deconfiguration, the kernel extension only needs to move any threads it has bound to the last logical CPU away from that CPU. If it has been using timers started from bound threads, those timers are moved to another CPU as part of the CPU deallocation. If it depends on these timers being delivered to a specific CPU, it must take action, for instance stopping the timers and restarting the timer requests once the threads are bound to a new CPU. Again, this is very much application dependent.
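The general shape of such a handler might look like the sketch below. It is written so that it can be built and exercised outside the kernel: the type and constant definitions are trimmed local stand-ins for what <sys/high_avail.h> provides, the width of cpu_t is an assumption, and `ext_state`, `move_threads_off`, and `example_haeh` are hypothetical names. Accepting events whose _magic value is not HA_CPU_FAIL is also an assumption of this sketch, not documented behavior.

```c
#include <stdint.h>

/* Trimmed stand-ins for declarations from <sys/high_avail.h>. */
typedef short cpu_t;                 /* assumed width, illustration only */
#define HA_CPU_FAIL 0x40505546       /* value as quoted in the text */
#define HA_ACCEPTED 0
#define HA_REFUSED  -1
typedef struct {
    unsigned int _magic;
    union {
        struct { cpu_t dealloc_cpu; } _cpu;
    } _u;
} haeh_event_t;

/* Hypothetical private state, handed over through the _data field. */
struct ext_state {
    int threads_on_last_cpu;  /* threads still bound to the last CPU */
    int can_rebind;           /* nonzero if they may be moved right now */
};

/* Hypothetical helper; a real one would rebind or unbind each thread. */
static int move_threads_off(struct ext_state *st, cpu_t cpu)
{
    (void)cpu;
    if (!st->can_rebind)
        return -1;
    st->threads_on_last_cpu = 0;
    return 0;
}

/* Handler: first argument is _data, second describes the event. */
int example_haeh(long long data, haeh_event_t *event)
{
    struct ext_state *st = (struct ext_state *)(uintptr_t)data;

    if (event->_magic != HA_CPU_FAIL)
        return HA_ACCEPTED;   /* assumption: do not veto unknown events */
    if (st->threads_on_last_cpu > 0 &&
        move_threads_off(st, event->_u._cpu.dealloc_cpu) != 0)
        return HA_REFUSED;    /* threads remain bound: abort deallocation */
    return HA_ACCEPTED;
}
```

Returning HA_REFUSED only when threads genuinely cannot be moved keeps the extension from blocking a deallocation unnecessarily, which matters because an aborted deallocation leaves the ailing processor in use.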

Canceling the Registration of a High-Availability Event Handler

To keep the system coherent, and prevent system crashes, the kernel extensions which register an HAEH must cancel the registration when they are unconfigured and are going to be unloaded. The interface is:

int unregister_HA_handler(ha_handler_ext_t *)

For more information on the system call, see unregister_HA_handler in AIX 5L Version 5.1 Technical Reference: Kernel and Subsystems Volume 1.

The return value is 0 in case of success. Any non-zero return value indicates an error.

Test Environment

Hardware problems triggering a processor deallocation are, hopefully, very rare events. In order to test any of the modifications made in applications or kernel extensions to support this processor deallocation, a command is provided to trigger the deallocation of a CPU designated by its logical CPU number. The syntax is:

cpu_deallocate cpunum

where:

cpunum is a valid logical CPU number.

You must reboot the system to get the target processor back online. Hence, this command is provided for test purposes only and is not intended as a system administration tool.

