IBM Books

Planning Volume 2, Control Workstation and Software Environment


Planning for LoadLeveler

LoadLeveler is an IBM software product that provides workload management of both interactive and batch processing on your RS/6000 SP system or RS/6000 workstations. The LoadLeveler software lets you build, submit, and process both serial and parallel jobs. It allows multiple user space tasks per node, so applications that use the Message Passing Interface (MPI) can exploit SMP nodes or servers and realize significant performance improvements.

Applications using the LAPI User Space API or Parallel Environment MPI 2.4 and LoadLeveler 2.1, or later versions of the software, can start up to four user space tasks per node with the SP Switch and up to 16 with the SP Switch2. These tasks can be from the same or from different parallel jobs. This allows parallel applications to exploit the symmetric multiprocessors on the node or SP-attached server without restructuring or recompiling. A user space MPI or LAPI job can consist of up to 1024 tasks on SP Switch systems and up to 4096 tasks on SP Switch2 systems (up to 2048 tasks for the MPI/IP library).
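
The limits quoted above can be checked before a job is submitted. The following Python fragment is a minimal planning sketch, not part of LoadLeveler: the table layout and the function name check_user_space_job are hypothetical, and the numbers are simply those stated in the preceding paragraph. It looks at one job in isolation, so it does not account for tasks from other parallel jobs that share a node, and it does not model the separate 2048-task limit for the MPI/IP library.

```python
# Limits as quoted in the text; the structure and names here are illustrative only.
USER_SPACE_LIMITS = {
    "SP Switch": {"tasks_per_node": 4, "max_tasks": 1024},
    "SP Switch2": {"tasks_per_node": 16, "max_tasks": 4096},
}


def check_user_space_job(switch, nodes, tasks_per_node):
    """Return a list of problems; an empty list means the job fits the quoted limits."""
    limits = USER_SPACE_LIMITS[switch]
    problems = []

    # Per-node limit on user space tasks for this switch type.
    if tasks_per_node > limits["tasks_per_node"]:
        problems.append(
            f"{tasks_per_node} tasks per node exceeds the {switch} "
            f"limit of {limits['tasks_per_node']}"
        )

    # Overall limit on the number of tasks in one user space job.
    total_tasks = nodes * tasks_per_node
    if total_tasks > limits["max_tasks"]:
        problems.append(
            f"{total_tasks} total tasks exceeds the {switch} "
            f"limit of {limits['max_tasks']}"
        )

    return problems


if __name__ == "__main__":
    # A 64-node job with 16 tasks per node on an SP Switch2 system.
    for line in check_user_space_job("SP Switch2", 64, 16) or ["within limits"]:
        print(line)
```

A check of this kind is useful mainly when sizing partitions: for example, a user space job that wants 16 tasks per node needs the SP Switch2, and on the SP Switch a job of 1024 tasks needs at least 256 nodes.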

LoadLeveler is an integral piece of the total System Management solution on the RS/6000 SP system. The latest version of LoadLeveler is included with your new SP order; you choose whether to use it. LoadLeveler can take advantage of features provided in PSSP, such as event management, performance monitoring, and SP Switch management. LoadLeveler can also interoperate with other schedulers to support batch job processing on other hardware platforms. These schedulers include the Network Queueing System (NQS) and the IBM Network Queuing System/MVS (NQS/MVS).

Compatibility

The latest release is LoadLeveler 3.1. It is available for AIX 4.3.3. For compatibility information, see LoadLeveler in Chapter 11, Planning for migration.

For the most up-to-date migration instructions, see the README file distributed with LoadLeveler 3.1.

Planning for a highly available LoadLeveler cluster

LoadLeveler provides features within the product for automatic recovery in the event of failure of the central manager in the batch configuration and of the domain name server running Interactive Network Dispatcher in the interactive configuration. Additionally, the availability of individual compute nodes and file systems in the LoadLeveler cluster can be enhanced by using the High Availability Cluster Multi-Processing (HACMP) product as well as the High Availability Control Workstation (HACWS) optional feature of PSSP. For details on how to configure LoadLeveler for high availability, see the ITSO Redbook Implementing High Availability on the RS/6000 SP.

Planning your LoadLeveler configuration

In general, planning the LoadLeveler installation for workload management requires making the following configuration decisions. You must decide what is suitable to your environment. Be sure to address the following:

Other planning considerations:

  1. LoadLeveler requires a common name space for the entire LoadLeveler cluster. To run jobs on any machine in the LoadLeveler cluster, you must have the same uid (system ID number for a user) and gid (system ID number for a group) on every machine in the cluster. If you do not have a user ID on one machine, your jobs will not run on that machine.
  2. LoadLeveler works in conjunction with the NFS or AFS file systems. Letting users share file systems so that they see a single, network-wide image is one way to make managing LoadLeveler easier.
  3. Some nodes in the LoadLeveler cluster might have special software installed that you might need to run your jobs successfully. You should configure LoadLeveler to distinguish those nodes from other nodes using, for example, job classes.
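
The common name space requirement in item 1 can be verified before LoadLeveler is installed. The following Python fragment is a minimal sketch under stated assumptions: the function name find_id_mismatches, the host names, and the user names are all hypothetical, and the per-host account data is assumed to have been collected already (for example, from each machine's user database). It only compares the data; it does not contact any machine.

```python
def find_id_mismatches(accounts_by_host):
    """Report users whose uid/gid differ between hosts or who lack an account.

    accounts_by_host maps a host name to {user name: (uid, gid)}.
    Returns a list of human-readable problem descriptions.
    """
    problems = []
    hosts = sorted(accounts_by_host)
    users = sorted({user for accounts in accounts_by_host.values() for user in accounts})

    for user in users:
        hosts_by_ids = {}   # (uid, gid) -> hosts where the user has those IDs
        missing = []        # hosts where the user has no account at all
        for host in hosts:
            ids = accounts_by_host[host].get(user)
            if ids is None:
                missing.append(host)
            else:
                hosts_by_ids.setdefault(ids, []).append(host)

        if missing:
            problems.append(f"{user}: no account on {', '.join(missing)}")
        if len(hosts_by_ids) > 1:
            detail = "; ".join(
                f"uid={uid} gid={gid} on {', '.join(where)}"
                for (uid, gid), where in sorted(hosts_by_ids.items())
            )
            problems.append(f"{user}: inconsistent ids ({detail})")

    return problems


if __name__ == "__main__":
    # Hypothetical sample data for three machines in the cluster.
    sample = {
        "node01": {"alice": (501, 100), "bob": (502, 100)},
        "node02": {"alice": (501, 100), "bob": (602, 100)},
        "node03": {"alice": (501, 100)},
    }
    for line in find_id_mismatches(sample):
        print(line)
```

In this sample, alice can run jobs anywhere, while bob is reported twice: once for the missing account on node03 and once for the differing uid on node02. Either condition would keep jobs from running correctly on the affected machine.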

