
General Programming Concepts:
Writing and Debugging Programs

Dynamic Logical Partitioning

Partitioning your system is similar to partitioning a hard drive. When you partition a hard drive, you divide a single physical hard drive so that the operating system recognizes it as a number of separate logical hard drives. On each of these divisions, called partitions, you can install an operating system and use each partition as you would a separate physical system.

A logical partition (LPAR) is the division of a computer's processors, memory, and hardware resources into multiple environments so that each environment can be operated independently with its own operating system and applications. The number of logical partitions that can be created depends on the system. Typically, partitions are used for different purposes, such as database operation, client/server operations, Web server operations, test environments, and production environments. Each partition can communicate with the other partitions as if each partition is a separate machine.

Dynamic logical partitioning (DLPAR) provides the ability to logically attach and detach a managed system's resources to and from a logical partition's operating system without rebooting.

DLPAR requests are built from simple add and remove requests that are directed to logical partitions. The user can execute these commands as move requests at the Hardware Management Console (HMC), which manages all DLPAR operations. DLPAR operations are enabled by pSeries firmware and AIX.

DLPAR-Safe and -Aware Programs

A DLPAR-safe program is one that does not fail as a result of DLPAR operations. Its performance might suffer when resources are removed, and it might not scale with the addition of new resources, but the program still works as expected. In fact, a DLPAR-safe program can prevent a DLPAR operation from succeeding because it has a dependency that the operating system is obligated to honor.

A DLPAR-aware program is one that has DLPAR code that is designed to adjust its use of system resources as the actual capacity of the system varies over time.

DLPAR-aware programs must be designed, at minimum, to avoid introducing conditions that might cause DLPAR operations to fail. At maximum, DLPAR-aware programs are concerned with performance and scalability. This is a much more complicated task because buffers might need to be drained and resized to maintain expected levels of performance when memory is added or removed. In addition, the number of threads must be dynamically adjusted to account for changes in the number of online processors. These thread-based adjustments are not necessarily limited to processor-based decisions. For example, the best way to reduce memory consumption in Java programs might be to reduce the number of threads, because this reduces the number of active objects that need to be processed by the Java Virtual Machine's garbage collector.

Most applications are DLPAR-safe by default.

Making Programs DLPAR-Safe

DLPAR can introduce errors that represent binary compatibility exposures; these errors are a result of processor addition.

Making Programs DLPAR-Aware

A DLPAR-aware program is one that is designed to recognize and dynamically adapt to changes in the system configuration. This code need not subscribe to the DLPAR model of awareness, but can be structured more generally in the form of a system resource monitor that regularly polls the system to discover changes in the system configuration. This approach can be used to achieve some limited performance-related goals, but because it is not tightly integrated with DLPAR, it cannot be effectively used to manage large-scale changes to the system configuration. For example, the polling model might not be suitable for a system that supports processor hot plug, because the hot-pluggable unit might be composed of several processor and memory boards. Nor can it be used to manage application-specific dependencies, such as processor bindings, that need to be resolved before the DLPAR processor remove event is started.

Several types of applications can exploit DLPAR technology.

Dynamic logical partitioning of large memory pages is not supported. The amount of memory that is preallocated to the large page pool can have a material impact on the DLPAR capabilities of the partition regarding memory. A memory region that contains a large page cannot be removed. Therefore, application developers might want to provide an option to not use large pages.

Making Programs DLPAR-Aware Using DLPAR APIs

Application interfaces are provided to make programs DLPAR-aware. The SIGRECONFIG signal is sent to the applications at the various phases of dynamic logical partitioning. The DLPAR subsystem defines check, pre and post phases for a typical operation. Applications can watch for this signal and use the DLPAR-supported system calls to learn more about the operation in progress and to take any necessary actions.

When using signals, the application might inadvertently block the signal, or the load on the system might prevent the thread from running in a timely fashion. In these cases, the system waits a short period of time, which is a function of the user-specified timeout, and then proceeds to the next phase. It is not appropriate to wait indefinitely, because a non-privileged rogue thread could prevent all DLPAR operations from occurring.

The issue of timely signal delivery can be managed by the application by controlling the signal mask and scheduling priority. The DLPAR-aware code can be directly incorporated into the algorithm. Also, the signal handler can be cascaded across multiple shared libraries so that notification can be incorporated in a more modular way.

To integrate the DLPAR event using APIs, do the following:

  1. Catch the SIGRECONFIG signal by using the sigaction system call. The default action is to ignore the signal.
  2. Control the signal mask in at least one of the threads so that the signal can be delivered in real time.
  3. Ensure that the scheduling priority for the thread that is to receive the signal is sufficient so that it will run quickly after the signal has been sent.
  4. Run the dr_reconfig system call to obtain the type of resource, type of action, phase of the event, as well as other information that is relevant to the current request.
    The dr_reconfig system call is used inside the signal handler to determine the nature of the DLPAR request.

Managing an Application's DLPAR Dependencies

A DLPAR remove request can fail for a variety of reasons. The most common of these is that the resource is busy, or that there are not enough system resources currently available to complete the request. In these cases, the resource is left in a normal state as if the DLPAR event never happened.

The primary cause of processor removal failure is processor bindings. The operating system cannot ignore processor bindings when carrying out DLPAR operations, or applications might not continue to operate properly. To ensure that this failure does not occur, release the binding, establish a new one, or terminate the application. Which processes or threads are impacted is a function of the type of binding that is used. For more information, see Processor Bindings.

The primary cause of memory removal failure is that there is not enough pinned memory available in the system to complete the request. This is a system-level issue and is not necessarily the result of a specific application. If the memory region to be removed contains a pinned page, its contents must be migrated to another pinned page, while atomically maintaining its virtual-to-physical mappings. The failure occurs when there is not enough pinnable memory in the system to accommodate the migration of the pinned data in the region that is being removed. To ensure that this does not occur, lower the level of pinned memory in the system. This can be accomplished by destroying pinned shared memory segments, terminating programs that invoke the plock system call, or removing the plock from the program.

The primary cause of PCI slot removal failure is that the adapters in the slot are busy. Note that device dependencies are not tracked. For example, the device dependency might extend from a slot to one of the following: an adapter, a device, a volume group, a logical volume, a file system, or a file. In this case, resolve the dependencies manually by stopping the relevant applications, unmounting file systems, and varying off volume groups.

Processor Bindings

Applications can bind to a processor by using the bindprocessor system call. This system call assumes a processor-numbering scheme starting with zero (0) and ending with N-1, where N is the number of online CPUs. N is determined programmatically by reading the _system_configuration.ncpus system variable. As processors are added and removed by dynamic logical partitioning, this variable is incremented and decremented.

Note that the numbering scheme does not include holes. Processors are always added at position N and removed from position N-1. The numbering scheme used by the bindprocessor system call cannot be used to bind to a specific logical processor, because any processor can be removed, yet this is not reflected in the numbering scheme: the CPU at position N-1 is always the one deallocated. For this reason, the identifiers used by the bindprocessor system call are called bind CPU IDs.

Changes to the _system_configuration.ncpus system variable have implications for programs that read it.

Applications can also bind to a set of processors using a feature of Workload Manager (WLM) called Software Partitioning. It assumes a numbering scheme that is based on logical CPU IDs, which also start with zero (0) and end with N-1. However, N in this case is the maximum number of processors that can be supported architecturally by the partition. The numbering scheme reflects both online and offline processors.

Therefore, it is important to note the type of binding that is used so that the correct remedy can be applied when removing a processor. The bindprocessor command can be used to determine the number of online processors. The ps command can be used to identify the processes and threads that are bound to the last online processor. After the targets have been identified, the bindprocessor command can be used again to define new attachments.

WLM-related dependencies can be resolved by identifying the particular software partitions that are causing problems. The system schedules bound jobs around offline or pending-offline processors, so no change is required if the affected software partition has another online CPU. Otherwise, do the following:

  1. Use the lsrset command to view the set of software partitions that are used by WLM.
  2. Use the lsclass command to identify the set of classes that use these software partitions.
  3. Use the chclass command to redefine those classes.
  4. Reclassify the system using the wlmctrl command.

At this point, the new class definitions take effect and the system automatically migrates bound jobs away from the logical processor that is being removed.

Integrating the DLPAR Operation into the Application

The DLPAR operation can be integrated into the application in the following ways:

Both of these methods follow the same high level structure. Either method can be used to provide support for DLPAR, although only the script-based mechanism can be used to manage DLPAR dependencies related to Workload Manager software partitions (processor sets). No APIs are associated with Workload Manager, so the use of a signal handler is not a suitable vehicle for dealing with the Workload Manager-imposed scheduling constraints. The applications themselves are not Workload Manager-aware. In this case, the system administrator might want to provide a script that invokes Workload Manager commands to manage DLPAR interactions with Workload Manager.

The decision of which method to use should be based on how the system or resource-specific logic was introduced into the application. If the application was externally directed to use a specific number of threads or to size its buffers, use the script-based approach. If the application is directly aware of the system configuration and uses this information accordingly, use the signal-based approach.

The DLPAR operation itself is divided into check, pre, and post phases.

The system attempts to ensure that all of the check code across the different mediums is executed in its entirety at the system level before the DLPAR event advances to the next phase.

Actions Taken by DLPAR Scripts

Application scripts are invoked for both add and remove operations. When removing resources, scripts are provided to resolve conditions imposed by the application that prevent the resource from being removed. The presence of particular processor bindings and the lack of pinnable memory might cause a remove request to fail. A set of commands is provided to identify these situations, so that scripts can be written to resolve them.


Scripts can also be used for scalability and general performance issues. When resources are removed, you can reduce the number of threads that are used or the size of application buffers. When resources are added, you can increase these parameters. You can provide commands that dynamically make these adjustments, and install scripts that invoke those commands within the context of the DLPAR operation.

High-Level Structure of DLPAR Scripts

This section provides an overview of the scripts, which can be Perl scripts, shell scripts, or commands. Application scripts are required to provide the commands described in DLPAR Script Commands.

Installing Application Scripts Using the drmgr Command

The drmgr command maintains an internal database of installed-script information. This information is collected when the system is booted and is refreshed when new scripts are installed or uninstalled. The information is derived from the scriptinfo, register, and usage commands. The installation of scripts is supported through options to the drmgr command, which copies the named script to the /usr/lib/dr/scripts/all/ directory where it can be later accessed. You can specify an alternate location for this repository. To determine the machine upon which a script is used, specify the target host name when installing the script.

To specify the location of the base repository, use the following command:

drmgr -R base_directory_path

To install a script, use the following command:

drmgr -i script_name [-f] [-w mins] [-D hostname]


To uninstall a script, use the following command:

drmgr -u script_name [-D hostname]


To display information about scripts that have already been installed, use the following command:

drmgr -l

Naming Conventions for Scripts

It is suggested that the script names be built from the vendor name and the subsystem that is being controlled. System administrators should name their scripts with the sysadmin prefix. For example, a system administrator who wanted to provide a script to control Workload Manager assignments might name the script sysadmin_wlm.

Script Execution Environment and Input Parameters

Scripts are invoked with a defined execution environment.

Scripts receive input parameters through command arguments and environment variables, and provide output by writing name=value pairs to standard output, where name=value pairs are delimited by new lines. The name is defined to be the name of the return data item that is expected, and value is the value associated with the data item. Text strings must be enclosed in double quotation marks; for example, DR_ERROR="text". All environment variables and name=value pairs must begin with DR_, which is reserved for communicating with application scripts.

Scripts use the DR_ERROR name=value pair to provide error descriptions.

You can examine the command arguments to the script to determine the phase of the DLPAR operation, the type of action, and the type of resource that is the subject of the pending DLPAR request. For example, if the script command arguments are checkrelease mem, then the phase is check, the action is remove, and the type of resource is memory. The specific resource that is involved can be identified by examining environment variables.

Environment variables that identify the affected memory are set for memory add and remove requests. In their descriptions, one frame is equal to 4 KB.

Similar environment variables are set for processor add and remove requests.

The operator can set a detail level at the HMC to observe events as they occur during the DLPAR request. This level is passed to the script through the DR_DETAIL_LEVEL=N environment variable, where N can range from 0 to 5. The default value of zero (0) signifies no information. A value of one (1) is reserved for the operating system and is used to present the high-level flow. The remaining levels (2-5) can be used by the scripts to provide information, with the assumption that larger numbers provide greater detail.

Scripts provide detailed data by writing the following name=value pairs to standard output:

name=value pair           Description
DR_LOG_ERR=message        Logs the message with the syslog level LOG_ERR.
DR_LOG_WARNING=message    Logs the message with the syslog level LOG_WARNING.
DR_LOG_INFO=message       Logs the message with the syslog level LOG_INFO.
DR_LOG_EMERG=message      Logs the message with the syslog level LOG_EMERG.
DR_LOG_DEBUG=message      Logs the message with the syslog level LOG_DEBUG.

The operator can also set up a log of information that is preserved by using the syslog facility, in which case the above information is routed to that facility as well. The syslog facility must be configured in this case.

DLPAR Script Commands

This section describes the script commands for DLPAR:

scriptinfo Provides information about the installed scripts, such as their creation date and resources.
register Invoked to collect the list of resources that are managed by the script. The drmgr command uses these lists to invoke scripts based on the type of resource that is being reconfigured.
usage Provides human-readable strings describing the service provided by the named resource. The context of the message should help the user decide the implications for the application, and the services that it provides, when the named resource is reconfigured. This command is invoked when the script is installed, and the information it provides is maintained in an internal database that is used by the drmgr command. Display the information using the -l list option of the drmgr command.
checkrelease When removing resources, the drmgr command assesses the impacts of the removal of the resource. This includes execution of DLPAR scripts that implement the checkrelease command. Each DLPAR script in turn evaluates the particulars of its application and indicates to the drmgr command, by means of the script's return code, whether the resource removal will affect the associated application. If the script finds that the resource can be removed safely, it returns an exit status of success. If the application is in a state in which the resource is critical to its execution and cannot be reconfigured without interrupting the execution of the application, the script indicates that the resource should not be removed by returning an error. When the FORCE option is specified by the user, which applies to the entire DLPAR operation including its phases, the drmgr command skips the checkrelease commands and begins with the prerelease commands.
prerelease Before a resource is released, the DLPAR scripts are directed to assist in the release of the named resource by reducing or eliminating its use by the application. However, if the script detects that the resource cannot be released from the application, it should indicate this by returning an error. Returning an error does not prevent the system from attempting to remove the resource in either the forced or non-forced mode of execution, and the script will be called in the post phase regardless of the actions or inactions taken by the prerelease command. The actions taken by the operating system are safe: if a resource cannot be cleanly removed, the operation fails.

The DLPAR script is expected to internally record the actions that were taken by the prerelease command, so that they can be restored in the post phase should an error occur. This can also be managed in post phase if rediscovery is implemented. The application might need to take severe measures if the force option is specified.

postrelease After a resource is successfully released, the postrelease command for each installed DLPAR script is invoked. Each DLPAR script performs any post processing that is required during this step. Applications that were halted should be restarted.

The calling program ignores any errors reported by the postrelease commands, and the operation is considered a success, although any errors that occurred are also reported to the user. The DR_ERROR name=value pair is provided for this purpose, so the message should identify the application that was not properly reconfigured.

undoprerelease After a prerelease command is issued by the drmgr command to the DLPAR script, if the drmgr command fails to remove or release the resource, it will try to revert to the old state. As part of this process, the drmgr command will issue the undoprerelease command to the DLPAR script. The undoprerelease command will only be invoked if the script was previously called to release the resource in the current DLPAR request. In this case, the script should undo any actions that were taken by the prerelease command of the script. To this end, the script might need to document its actions, or otherwise provide the capability of rediscovering the state of the system and reconfiguring the application, so that in effect, the DLPAR event never occurred.
checkacquire This command is the first DLPAR script-based command that is called in the acquire-new-resource sequence. It is called for each installed script that previously indicated that it supported the particular type of resource that is being added. One of the primary purposes of the checkacquire phase is to enable processor-based license managers, which might want to fail the addition of a processor. The checkacquire command is always invoked, regardless of the value of the FORCE environment variable, and the calling program honors the return code of the script. The user cannot force the addition of a new processor if a script or DLPAR-aware program fails the DLPAR operation in the check phase.

In short, the FORCE environment variable does not really apply to the checkacquire command, although it does apply to the other phases. In the preacquire phase, it dictates how far the script should go when reconfiguring the application. The force option can be used by the scripts to control the policy by which applications are stopped and restarted similar to when a resource is released, which is mostly a DLPAR-safe issue.

preacquire Assuming that no errors were reported in the checkacquire phase, the system advances to the preacquire phase, in which the same set of scripts is invoked to prepare for the acquisition of a new resource, which is supported through the preacquire command. Each of these scripts is called before the system actually attempts to integrate the resource, unless an error was reported and the FORCE environment variable was not specified by the user. If the FORCE environment variable was specified, the system proceeds to the integrate stage regardless of the script's return code. No errors are detected when the FORCE environment variable is specified, because all errors are avoidable by unconfiguring the application, which is an accepted practice in that case. If an error is encountered and the FORCE environment variable is not specified, the system proceeds to the undopreacquire phase, but only the previously executed scripts in the current phase are rerun. During this latter phase, the scripts are directed to perform recovery actions.
undopreacquire The undopreacquire phase is provided so that the scripts can perform recovery operations. If a script is called in the undopreacquire phase, it can assume that it successfully completed the preacquire command.
postacquire The postacquire command is executed after the resource has been successfully integrated by the system. Each DLPAR script that was previously called in the check and pre phases is called again. This command is used to incorporate the new resource into the application. For example, the application might want to create new threads or expand its buffers, or the application might need to be restarted if it was previously halted.

Making Kernel Extensions DLPAR-Aware

Like applications, most kernel extensions are DLPAR-safe by default. However, some are sensitive to the system configuration and might need to be registered with the DLPAR subsystem. Some kernel extensions partition their data along processor lines, create threads based on the number of online processors, or provide large pinned memory buffer pools. These kernel extensions must be notified when the system topology changes. The mechanism and the actions that need to be taken parallel those of DLPAR-aware applications.

Registering Reconfiguration Handlers

The following kernel services are provided to register and unregister reconfiguration handlers:

#include <sys/dr.h>

int reconfig_register(int (*handler)(void *, void *, int, dr_info_t *),
                      int actions, void * h_arg, ulong *h_token, char *name);

void reconfig_unregister(ulong h_token);

The parameters for the reconfig_register subroutine are as follows:

The reconfig_register function returns 0 for success and the appropriate errno value otherwise.

The reconfig_unregister function is called to remove a previously installed handler.

Both the reconfig_register and reconfig_unregister functions can be called only in the process environment.

If a kernel extension registers for the pre phase, it is advisable that it register for the check phase to avoid partial unconfiguration of the system when removing resources.

Reconfiguration Handlers

The interface to the reconfiguration handler is as follows:

struct dri_cpu {
        cpu_t           lcpu;           /* Logical CPU ID of target CPU */
        cpu_t           bcpu;           /* Bind ID of target CPU        */
};

struct dri_mem {
        size64_t        req_memsz_change;   /* user-requested mem size  */
        size64_t        sys_memsz;          /* system mem size at start */
        size64_t        act_memsz_change;   /* mem added/removed so far */
        rpn64_t         sys_free_frames;    /* number of free frames */
        rpn64_t         sys_pinnable_frames;/* number of pinnable frames */
        rpn64_t         sys_total_frames;   /* total number of frames */
        unsigned long long lmb_addr;        /* start addr of logical memory block */
        size64_t        lmb_size;           /* size of logical memory block being added */
};

int (*handler)(void *event, void *h_arg, int req, void *resource_info);

The parameters to the reconfiguration handler are as follows:

Reconfiguration handlers are invoked in the process environment.

Kernel extensions can make the following assumptions about how the phases are applied:

The check phase provides the ability for DLPAR-aware applications and kernel extensions to react to the user's request before it has been applied. Therefore, the check-phase kernel extension handler is invoked once, even though the request might devolve to multiple logical memory blocks. Unlike the check phase, the pre phase, post phase, and post-error phase are applied at the logical memory block level. This is different for application notification, where the pre phase, post phase, or alternatively the post-error phase are invoked once for each user request, regardless of the number of underlying logical memory blocks. Another difference is that the post-error phase for kernel extensions is used when a specific logical memory block operation fails, while the post-error phase for applications is used when the operation, which in this case is the entire user request, fails.

In general, during the check phase, the kernel extension examines its state to determine whether it can comply with the impending DLPAR request. If the operation cannot be managed, or if it would adversely affect the proper execution of the extension, the handler returns DR_FAIL. Otherwise, the handler returns DR_SUCCESS.

During the pre-remove phase, each kernel extension attempts to remove any dependencies that it might have on the designated resource. An example is a driver that maintains per-processor buffer pools. The driver might mark the associated buffer pool as pending-delete, so that new requests are not allocated from it. In time, the pool will be drained and can be freed. Other items that must be considered in the pre-remove phase are timers and bound threads, which need to be stopped and terminated, respectively. Alternatively, bound threads can be unbound.

During the post-remove phase, each kernel extension attempts to free any resources through garbage collection, assuming that the resource was actually removed. If it was not, timers and threads must be re-established. The DR_resource_POST_ERROR request is used to signify that an error occurred.

During the pre-add phase, kernel extensions should pre-initialize any data paths that are dependent on the new resource, so that when the new resource is configured, it is ready to be used. The system does not guarantee that the resource will not be used prior to the handler being called again in the post phase.

During the post-add phase, kernel extensions can assume that the resource has been properly added and can be used. This phase is a convenient place to start bound threads, schedule timers, and increase the size of buffers.

If possible, reconfiguration handlers should return within a few seconds: DR_SUCCESS to indicate successful reconfiguration, or DR_FAIL to indicate failure. If more time is required, the handler returns DR_WAIT.

Extended DR Handlers

If a kernel extension expects that an operation is likely to take a long time, that is, several seconds, the handler returns DR_WAIT to the caller but proceeds with the request asynchronously. In this case, the handler indicates that it has completed the request by invoking the reconfig_handler_complete routine.

void reconfig_handler_complete(void *event, int rc);

The event parameter is the same parameter that was passed to the handler when it was invoked by the kernel. The rc parameter must be set to either DR_SUCCESS or DR_FAIL to indicate the completion status of the handler.

The reconfig_handler_complete kernel service can be invoked in the process or interrupt environments.

Related Information

drmgr, drslot command in AIX 5L Version 5.2 Commands Reference, Volume 2.

dr_reconfig System Call and reconfig_register and reconfig_unregister Kernel Services in AIX 5L Version 5.2 Technical Reference: Kernel and Subsystems Volume 1

Hardware Management Console Installation and Operations Guide

Planning for Partitioned-System Operations
