Partitioning your system is similar to partitioning a hard drive. When you partition a hard drive, you divide a single physical hard drive so that the operating system recognizes it as a number of separate logical hard drives. On each of these divisions, called partitions, you can install an operating system and use each partition as you would a separate physical system.
A logical partition (LPAR) is the division of a computer's processors, memory, and hardware resources into multiple environments so that each environment can be operated independently with its own operating system and applications. The number of logical partitions that can be created depends on the system. Typically, partitions are used for different purposes, such as database operation, client/server operations, Web server operations, test environments, and production environments. Each partition can communicate with the other partitions as if each partition is a separate machine.
Dynamic logical partitioning (DLPAR) provides the ability to logically attach and detach a managed system's resources to and from a logical partition's operating system without rebooting. Some of the features of DLPAR include:
DLPAR requests are built from simple add and remove requests that are directed to logical partitions. The user can also combine these into move requests at the Hardware Management Console (HMC), which manages all DLPAR operations; a move is a remove from one partition followed by an add to another. DLPAR operations are enabled by pSeries firmware and AIX.
A DLPAR-safe program is one that does not fail as a result of DLPAR operations. Its performance might suffer when resources are removed, and it might not scale with the addition of new resources, but the program still works as expected. Note that a DLPAR-safe program can nevertheless prevent a DLPAR operation from succeeding if it has a dependency that the operating system is obligated to honor.
A DLPAR-aware program is one that has DLPAR code that is designed to adjust its use of system resources as the actual capacity of the system varies over time. This can be accomplished in the following ways:
DLPAR-aware programs must be designed, at minimum, to avoid introducing conditions that might cause DLPAR operations to fail. At maximum, DLPAR-aware programs are concerned with performance and scalability. This is a much more complicated task because buffers might need to be drained and resized to maintain expected levels of performance when memory is added or removed. In addition, the number of threads must be dynamically adjusted to account for changes in the number of online processors. These thread-based adjustments are not necessarily limited to processor-based decisions. For example, the best way to reduce memory consumption in Java programs might be to reduce the number of threads, because this reduces the number of active objects that need to be processed by the Java Virtual Machine's garbage collector.
Most applications are DLPAR-safe by default.
The following types of errors, which represent binary compatibility exposures, can be introduced by DLPAR:
Programs can determine the number of online processors by:
Avoid this problem by preallocating processor-based data for the maximum number of processors that can be brought online, rather than for the number currently online. The operating system is configured to support a maximum of N processors; this does not imply that N processors are online at any given time. The maximum number of processors is constant, while the number of online processors is incremented and decremented as processors are brought online and taken offline. When a partition is created, the minimum, desired, and maximum numbers of processors are specified. The maximum value is reflected in the following variables:
The _system_configuration.original_ncpus and var.v_ncpus_cfg variables are preexisting variables. On DLPAR-enabled systems they represent a potential maximum value. On systems not enabled for DLPAR, the value is dictated by the number of processors that are configured at boot time. Both represent the conceptual maximum value that can be supported, even though a processor might have been taken offline by Dynamic Processor Deallocation. The use of these preexisting fields is recommended for applications that are built on AIX 4.3, because this facilitates the use of the same binary on AIX 4.3 and later. If the application requires runtime initialization of its processor-based data, it can register a DLPAR handler that is called before a processor is added.
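The same distinction between online and configured processor counts matters to scripts and tools outside the program itself. The following is a small portable sketch; the _NPROCESSORS_ONLN name is supported by many getconf implementations but is not guaranteed by POSIX, so treat it as an assumption:

```shell
#!/bin/sh
# Query the number of online processors from a script. On a DLPAR-enabled
# system this value can change over time, so it should not be used to size
# long-lived per-processor data; use the configured maximum for that.
online=$(getconf _NPROCESSORS_ONLN)
echo "online processors: $online"
```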
A DLPAR-aware program is one that is designed to recognize and dynamically adapt to changes in the system configuration. This code need not subscribe to the DLPAR model of awareness, but can be structured more generally in the form of a system resource monitor that regularly polls the system to discover changes in the system configuration. This approach can be used to achieve some limited performance-related goals, but because it is not tightly integrated with DLPAR, it cannot be effectively used to manage large-scale changes to the system configuration. For example, the polling model might not be suitable for a system that supports processor hot plug, because the hot-pluggable unit might be composed of several processor and memory boards. Nor can it be used to manage application-specific dependencies, such as processor bindings, that need to be resolved before the DLPAR processor remove event is started.
The following types of applications can exploit DLPAR technology:
Dynamic logical partitioning of large memory pages is not supported. The amount of memory that is preallocated to the large page pool can have a material impact on the DLPAR capabilities of the partition regarding memory. A memory region that contains a large page cannot be removed. Therefore, application developers might want to provide an option to not use large pages.
Application interfaces are provided to make programs DLPAR-aware. The SIGRECONFIG signal is sent to the applications at the various phases of dynamic logical partitioning. The DLPAR subsystem defines check, pre, and post phases for a typical operation. Applications can catch this signal and use the DLPAR-supported system calls to learn more about the operation in progress and to take any necessary actions.
The issue of timely signal delivery can be managed by the application by controlling the signal mask and scheduling priority. The DLPAR-aware code can be directly incorporated into the algorithm. Also, the signal handler can be cascaded across multiple shared libraries so that notification can be incorporated in a more modular way.
To integrate the DLPAR event using APIs, do the following:
A DLPAR remove request can fail for a variety of reasons. The most common of these is that the resource is busy, or that there are not enough system resources currently available to complete the request. In these cases, the resource is left in a normal state as if the DLPAR event never happened.
The primary cause of processor removal failure is processor bindings. The operating system cannot ignore processor bindings when carrying out DLPAR operations, because applications might not continue to operate properly otherwise. To ensure that this does not occur, release the binding, establish a new one, or terminate the application. The specific processes or threads that are impacted is a function of the type of binding that is used. For more information, see Processor Bindings.
The primary cause of memory removal failure is that there is not enough pinned memory available in the system to complete the request. This is a system-level issue and is not necessarily the result of a specific application. If the memory region being removed contains a pinned page, its contents must be migrated to another pinned page while its virtual-to-physical mappings are atomically maintained. The failure occurs when there is not enough pinnable memory in the system to accommodate the migration of the pinned data in the region that is being removed. To ensure that this does not occur, lower the level of pinned memory in the system. This can be accomplished by destroying pinned shared memory segments, terminating programs that call plock, or removing the plock on the program.
The primary cause of PCI slot removal failure is that the adapters in the slot are busy. Note that device dependencies are not tracked. For example, the device dependency might extend from a slot to one of the following: an adapter, a device, a volume group, a logical volume, a file system, or a file. In this case, resolve the dependencies manually by stopping the relevant applications, unmounting file systems, and varying off volume groups.
Applications can bind to a processor by using the bindprocessor system call. This system call assumes a processor-numbering scheme starting with zero (0) and ending with N-1, where N is the number of online CPUs. N is determined programmatically by reading the _system_configuration.ncpus system variable. As processors are added and removed, this variable is incremented and decremented by dynamic logical partitioning.
Note that the numbering scheme does not include holes. Processors are always added at bind ID N and removed from bind ID N-1. The numbering scheme used by the bindprocessor system call therefore cannot be used to bind to a specific logical processor: any logical processor can be removed, but it is always the last position (N-1) that disappears from the numbering scheme. For this reason, the identifiers used by the bindprocessor system call are called bind CPU IDs.
Changes to the _system_configuration.ncpus system variable have the following implications:
Applications can also bind to a set of processors using a feature of Workload Manager (WLM) called Software Partitioning. It assumes a numbering scheme that is based on logical CPU IDs, which also start with zero (0) and end with N-1. However, N in this case is the maximum number of processors that can be supported architecturally by the partition. The numbering scheme reflects both online and offline processors.
Therefore, it is important to note the type of binding that is used so that the correct remedy can be applied when removing a processor. The bindprocessor command can be used to determine the number of online processors. The ps command can be used to identify the processes and threads that are bound to the last online processor. After the targets have been identified, the bindprocessor command can be used again to define new attachments.
WLM-related dependencies can be resolved by identifying the particular software partitions that are causing problems. To resolve these dependencies, do the following:
At this point, the new class definitions take effect and the system automatically migrates bound jobs away from the logical processor that is being removed.
The DLPAR operation can be integrated into the application in the following ways:
Both of these methods follow the same high level structure. Either method can be used to provide support for DLPAR, although only the script-based mechanism can be used to manage DLPAR dependencies related to Workload Manager software partitions (processor sets). No APIs are associated with Workload Manager, so the use of a signal handler is not a suitable vehicle for dealing with the Workload Manager-imposed scheduling constraints. The applications themselves are not Workload Manager-aware. In this case, the system administrator might want to provide a script that invokes Workload Manager commands to manage DLPAR interactions with Workload Manager.
The decision of which method to use should be based on how the system or resource-specific logic was introduced into the application. If the application was externally directed to use a specific number of threads or to size its buffers, use the script-based approach. If the application is directly aware of the system configuration and uses this information accordingly, use the signal-based approach.
The DLPAR operation itself is divided into the following phases:
The check phase is invoked first and it enables applications to fail the current DLPAR request before any state in the system is changed. For example, the check phase could be used by a CPU-based license manager to fail the integration of a new processor if that CPU addition makes the number of processors in the system exceed the number of licensed processors. It could also be used to preserve the DLPAR safeness of a program that is not DLPAR-safe. In the latter case, consideration must be given to services provided by the application, because it might be better to stop the program, complete the request, and then restart the program.
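The license-manager veto described above can be sketched as follows. This is a minimal illustration of the check-phase contract (returning nonzero fails the request before any state changes); the function name, variable names, and the way the counts are obtained are illustrative, and in a real DLPAR script this logic would run under the check command for a cpu add:

```shell
#!/bin/sh
# Check-phase veto sketch: a CPU license manager refuses a CPU add that
# would push the partition over its licensed processor count.

check_cpu_add() {
    licensed=$1    # number of licensed processors (illustrative input)
    online=$2      # processors currently online (illustrative input)

    if [ $((online + 1)) -gt "$licensed" ]; then
        # A nonzero return from the check phase fails the request
        # before any system state has changed.
        echo "DR_ERROR=CPU add would exceed $licensed licensed processors"
        return 1
    fi
    return 0
}

check_cpu_add 8 4   # add allowed: succeeds silently
check_cpu_add 4 4   # add vetoed: emits DR_ERROR and returns nonzero
```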
The pre phase and post phase can be used for this purpose: the program is stopped in the pre phase and restarted in the post phase, after the request has been completed.
The system attempts to ensure that all of the check code across the different notification mechanisms is executed in its entirety at the system level before the DLPAR event advances to the next phase.
Application scripts are invoked for both add and remove operations. When removing resources, scripts are provided to resolve conditions imposed by the application that prevent the resource from being removed. The presence of particular processor bindings and the lack of pinnable memory might cause a remove request to fail. A set of commands is provided to identify these situations, so that scripts can be written to resolve them.
To identify and resolve the DLPAR dependencies, the following commands can be used:
Scripts can also be used for scalability and general performance issues. When resources are removed, you can reduce the number of threads that are used or the size of application buffers. When the resources are added, you can increase these parameters. You can provide commands that can be used to dynamically make these adjustments, which can be triggered by these scripts. Install the scripts to invoke these commands within the context of the DLPAR operation.
This section provides an overview of the scripts, which can be Perl scripts, shell scripts, or commands. Application scripts are required to provide the following commands:
scriptinfo
Identifies the version, date, and vendor of the script. It is called when the script is installed.

register
Identifies the resources managed by the script. If the script returns the resource name cpu or mem, the script will be automatically invoked when DLPAR attempts to reconfigure processors and memory, respectively. The register command is called when the script is installed with the DLPAR subsystem.

usage
Returns information describing how the resource is being used by the application. The description should be relevant so that the user can determine whether to install or uninstall the script. It should identify the software capabilities of the application that are impacted. The usage command is called for each resource that was identified by the register command.

checkrelease
Indicates whether the DLPAR subsystem should continue with the removal of the named resource. A script might indicate that the resource should not be removed if the application is not DLPAR-aware and the application is considered critical to the operation of the system.

prerelease
Reconfigures, suspends, or terminates the application so that its hold on the named resource is released.

postrelease
Resumes or restarts the application.

undoprerelease
Invoked if an error is encountered and the resource is not released.

checkacquire
Indicates whether the DLPAR subsystem should proceed with the resource addition. It might be used by a license manager to prevent the addition of a new resource, for example cpu, until the resource is licensed.

preacquire
Used to prepare for a resource addition.

undopreacquire
Invoked if an error is encountered in the preacquire phase or when the event is acted upon.

postacquire
Resumes or starts the application.
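The overall shape of such a script can be sketched as follows. The dispatch on the command name and the name=value output convention follow the description above; the version, date, vendor, and usage strings are placeholders, and in a real script the case statement would run directly on "$1" rather than inside a function:

```shell
#!/bin/sh
# Illustrative skeleton of a DLPAR application script. The DLPAR subsystem
# invokes the script with the command name as the first argument and, for
# the phase commands, the resource type (cpu or mem) as the second.

dlpar_script() {
    cmd=$1
    resource=$2    # cpu or mem for the phase commands

    case "$cmd" in
    scriptinfo)
        # Called when the script is installed: identify the script.
        echo "DR_VERSION=1"
        echo "DR_DATE=01102003"
        echo "DR_VENDOR=ExampleCo"
        ;;
    register)
        # Declare the resources this script manages.
        echo "DR_RESOURCE=cpu"
        ;;
    usage)
        # Describe how the application uses the named resource.
        echo "DR_USAGE=worker threads are bound to processors"
        ;;
    checkrelease|prerelease|postrelease|undoprerelease)
        # Remove-side phases; a nonzero return (with DR_ERROR) fails the phase.
        ;;
    checkacquire|preacquire|postacquire|undopreacquire)
        # Add-side phases.
        ;;
    *)
        echo "DR_ERROR=unknown command $cmd"
        return 1
        ;;
    esac
    return 0
}

dlpar_script scriptinfo
dlpar_script register
```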
The drmgr command maintains an internal database of installed-script information. This information is collected when the system is booted and is refreshed when new scripts are installed or uninstalled. The information is derived from the scriptinfo, register, and usage commands. The installation of scripts is supported through options to the drmgr command, which copies the named script to the /usr/lib/dr/scripts/all/ directory where it can be later accessed. You can specify an alternate location for this repository. To determine the machine upon which a script is used, specify the target host name when installing the script.
To specify the location of the base repository, use the following command:
drmgr -R base_directory_path
To install a script, use the following command:
drmgr -i script_name [-f] [-w mins] [-D hostname]
The following flags are defined:

-f
Forces the installation, replacing an existing script of the same name.

-w mins
Specifies the number of minutes that the DLPAR operation waits for the script to complete before considering it failed.

-D hostname
Installs the script for use on the named host.
To uninstall a script, use the following command:
drmgr -u script_name [-D hostname]
The following flags are defined:

-D hostname
Uninstalls the script from the named host.
To display information about scripts that have already been installed, use the following command:
drmgr -l
It is suggested that the script names be built from the vendor name and the subsystem that is being controlled. System administrators should name their scripts with the sysadmin prefix. For example, a system administrator who wanted to provide a script to control Workload Manager assignments might name the script sysadmin_wlm.
Scripts are invoked with the following execution environment:
Scripts receive input parameters through command arguments and environment variables, and provide output by writing name=value pairs to standard output, where name=value pairs are delimited by new lines. The name is defined to be the name of the return data item that is expected, and value is the value associated with the data item. Text strings must be enclosed in double quotation marks; for example, DR_ERROR="text". All environment variables and name=value pairs must begin with DR_, which is reserved for communicating with application scripts.
Scripts use the DR_ERROR name=value pair to provide error descriptions.
You can examine the command arguments to the script to determine the phase of the DLPAR operation, the type of action, and the type of resource that is the subject of the pending DLPAR request. For example, if the script command arguments are checkrelease mem, then the phase is check, the action is remove, and the type of resource is memory. The specific resource that is involved can be identified by examining environment variables.
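The decomposition of a command name into phase, action, and resource can be sketched as a small dispatch, assuming the command names listed earlier in this section:

```shell
#!/bin/sh
# Split a DLPAR script command name (for example, "checkrelease mem") into
# the phase, the action, and the resource type it applies to.

parse_dlpar_command() {
    phase= action=
    case "$1" in
    undopre*) phase=undo ;;
    check*)   phase=check ;;
    pre*)     phase=pre ;;
    post*)    phase=post ;;
    esac
    case "$1" in
    *release) action=remove ;;
    *acquire) action=add ;;
    esac
    echo "$phase $action $2"
}

parse_dlpar_command checkrelease mem   # -> check remove mem
```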
The following environment variables are set for memory add and remove:
DR_FREE_FRAMES
The number of free frames currently in the system, in hexadecimal format.

DR_MEM_SIZE_COMPLETED
The number of megabytes that were successfully added or removed, in decimal format.

DR_MEM_SIZE_REQUEST
The size of the memory request in megabytes, in decimal format.

DR_PINNABLE_FRAMES
The total number of pinnable frames currently in the system, in hexadecimal format. This parameter provides valuable information when removing memory in that it can be used to determine when the system is approaching the limit of pinnable memory, which is the primary cause of failure for memory remove requests.

DR_TOTAL_FRAMES
The total number of frames currently in the system, in hexadecimal format.
The following environment variables are set for processor add and remove:
DR_BCPUID
The bind CPU ID of the processor that is being added or removed, in decimal format. A bindprocessor attachment to this processor does not necessarily mean that the attachment has to be undone; that is required only if it is the last bind ID (N-1) in the system, because that is the position that is always removed in a CPU remove operation. Bind IDs are consecutive, ranging from 0 to N-1, and identify only online processors. Use the bindprocessor command to determine the number of online CPUs.

DR_LCPUID
The logical CPU ID of the processor that is being added or removed, in decimal format.
The operator can set a detail level at the HMC to observe events as they occur during the current DLPAR request. The detail level is passed to the script in the DR_DETAIL_LEVEL=N environment variable, where N ranges from 0 to 5. The default value of zero (0) signifies no information. A value of one (1) is reserved for the operating system and is used to present the high-level flow. The remaining levels (2-5) can be used by the scripts to provide information, with the assumption that larger numbers provide greater detail.
Scripts provide detailed data by writing the following name=value pairs to standard output:
In addition, the operator can also set up a log of information that is preserved by using the syslog facility, in which case, the above information is routed to that facility as well. You must configure the syslog facility in this case.
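A script can honor the operator's detail level with a small helper like the following. The DR_LOG_INFO pair name is an assumption here, chosen to match the syslog routing described above:

```shell
#!/bin/sh
# Emit informational output only when the operator requested that level.
# DR_DETAIL_LEVEL is set by the DLPAR subsystem (0 through 5); levels 2-5
# are available to scripts.

detail() {
    level=$1
    shift
    if [ "${DR_DETAIL_LEVEL:-0}" -ge "$level" ]; then
        echo "DR_LOG_INFO=$*"
    fi
}

DR_DETAIL_LEVEL=3
detail 2 "draining per-processor buffer pool"   # emitted at level 3
detail 5 "verbose trace data"                   # suppressed at level 3
```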
This section describes the script commands for DLPAR:
Like applications, most kernel extensions are DLPAR-safe by default. However, some are sensitive to the system configuration and might need to be registered with the DLPAR subsystem. Some kernel extensions partition their data along processor lines, create threads based on the number of online processors, or provide large pinned memory buffer pools. These kernel extensions must be notified when the system topology changes. The mechanism and the actions that need to be taken parallel those of DLPAR-aware applications.
The following kernel services are provided to register and unregister reconfiguration handlers:
#include <sys/dr.h>

int  reconfig_register(int (*handler)(void *, void *, int, dr_info_t *),
                       int actions, void *h_arg, ulong *h_token, char *name);
void reconfig_unregister(ulong h_token);
The parameters for the reconfig_register subroutine are as follows:
The reconfig_register function returns 0 for success and the appropriate errno value otherwise.
The reconfig_unregister function is called to remove a previously installed handler.
Both the reconfig_register and reconfig_unregister functions can be called only in the process environment.
If a kernel extension registers for the pre phase, it is advisable that it register for the check phase to avoid partial unconfiguration of the system when removing resources.
The interface to the reconfiguration handler is as follows:
struct dri_cpu {
        cpu_t    lcpu;                  /* Logical CPU Id of target CPU */
        cpu_t    bcpu;                  /* Bind Id of target CPU */
};

struct dri_mem {
        size64_t req_memsz_change;      /* user requested mem size */
        size64_t sys_memsz;             /* system mem size at start */
        size64_t act_memsz_change;      /* mem added/removed so far */
        rpn64_t  sys_free_frames;       /* Number of free frames */
        rpn64_t  sys_pinnable_frames;   /* Number of pinnable frames */
        rpn64_t  sys_total_frames;      /* Total number of frames */
        unsigned long long lmb_addr;    /* start addr of logical memory block */
        size64_t lmb_size;              /* Size of logical memory block being added */
};

int (*handler)(void *event, void *h_arg, int req, void *resource_info);
The parameters to the reconfiguration handler are as follows:
Reconfiguration handlers are invoked in the process environment.
Kernel extensions can assume the following:
The check phase provides the ability for DLPAR-aware applications and kernel extensions to react to the user's request before it has been applied. Therefore, the check-phase kernel extension handler is invoked once, even though the request might devolve to multiple logical memory blocks. Unlike the check phase, the pre phase, post phase, and post-error phase are applied at the logical memory block level. This is different for application notification, where the pre phase, post phase, or alternatively the post-error phase are invoked once for each user request, regardless of the number of underlying logical memory blocks. Another difference is that the post-error phase for kernel extensions is used when a specific logical memory block operation fails, while the post-error phase for applications is used when the operation, which in this case is the entire user request, fails.
In general, during the check phase, the kernel extension examines its state to determine whether it can comply with the impending DLPAR request. If the operation cannot be accommodated, or if it would adversely affect the proper execution of the extension, the handler returns DR_FAIL. Otherwise, the handler returns DR_SUCCESS.
During the pre-remove phase, a kernel extension attempts to remove any dependencies that it might have on the designated resource. An example is a driver that maintains per-processor buffer pools: the driver might mark the associated buffer pool as pending delete so that new requests are not allocated from it; in time, the pool is drained and can be freed. Other items that must be considered in the pre-remove phase are timers and bound threads, which need to be stopped and terminated, respectively. Alternatively, bound threads can be unbound.
During the post-remove phase, a kernel extension attempts to free any associated resources through garbage collection, assuming that the resource was actually removed. If it was not, timers and threads must be re-established. The DR_resource_POST_ERROR request signifies that an error occurred.
During the pre-add phase, kernel extensions should pre-initialize any data paths that are dependent on the new resource, so that when the new resource is configured, it is ready to be used. The system does not guarantee that the resource will not be used prior to the handler being called again in the post phase.
During the post-add phase, kernel extensions can assume that the resource has been properly added and can be used. This phase is a convenient place to start bound threads, schedule timers, and increase the size of buffers.
Reconfiguration handlers should, if possible, return within a few seconds, returning DR_SUCCESS to indicate successful reconfiguration or DR_FAIL to indicate failure. If more time is required, the handler returns DR_WAIT.

If a kernel extension expects that the operation is likely to take a long time, that is, several seconds or more, the handler returns DR_WAIT to the caller and proceeds with the request asynchronously. In that case, the handler indicates that it has completed the request by invoking the reconfig_handler_complete routine.
void reconfig_handler_complete(void *event, int rc);
The event parameter is the same parameter that was passed to the handler when it was invoked by the kernel. The rc parameter must be set to either DR_SUCCESS or DR_FAIL to indicate the completion status of the handler.
The reconfig_handler_complete kernel service can be invoked in the process or interrupt environments.
The drmgr and drslot commands in AIX 5L Version 5.2 Commands Reference, Volume 2.
dr_reconfig System Call and reconfig_register and reconfig_unregister Kernel Services in AIX 5L Version 5.2 Technical Reference: Kernel and Subsystems Volume 1
Hardware Management Console Installation and Operations Guide
Planning for Partitioned-System Operations