IBM Books

Administration Guide


Specific LAPI hints

The LAPI library installs its own SIGBUS signal handler. If an application that uses LAPI also handles SIGBUS, the application's SIGBUS handler should be registered before the call is made to the LAPI_Init function.
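
A minimal sketch of this ordering, assuming the application registers its handler with POSIX sigaction. The names app_sigbus_handler and install_app_sigbus_handler are hypothetical, and the LAPI_Init call appears only in a comment because its arguments depend on the application.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical application handler; it only records that SIGBUS arrived. */
static volatile sig_atomic_t app_sigbus_seen = 0;

static void app_sigbus_handler(int signo)
{
    (void)signo;
    app_sigbus_seen = 1;
}

/* Register the application's SIGBUS handler.
 * Returns 0 on success and -1 on failure, as sigaction does. */
int install_app_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = app_sigbus_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/*
 * Intended call order in the application:
 *
 *   install_app_sigbus_handler();   first, the application's handler
 *   LAPI_Init(...);                 then LAPI initialization
 */
```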

Use the LAPI_Senv function to enable error checking (set ERROR_CHK on) during application development, and to disable it (set ERROR_CHK off) for normal application use.

When the order of execution between any two LAPI functions within one task of a parallel program needs to be guaranteed, using LAPI_Waitcntr between the two LAPI functions will usually be more efficient than LAPI_Fence. LAPI_Fence requires that all LAPI operations initiated on the current thread before the LAPI_Fence be completed before any LAPI operation after the fence is allowed to start. LAPI_Waitcntr can be used to indicate the completion of a single LAPI function which had been initiated on the current thread before the LAPI_Waitcntr.

The scope of LAPI_Fence is per thread. For example, a LAPI_Fence which is issued from the completion handler thread will only guarantee that no LAPI operations initiated after the fence (on the completion handler thread) will start until all LAPI operations initiated before the fence have completed. In this case there are no guarantees about the order of LAPI operations initiated from the main application thread.

LAPI_Waitcntr can be used to indicate the completion of a single LAPI function which might have been initiated from an alternate thread (completion handler) within the same task. Therefore the possibility exists to use LAPI_Waitcntr to wait for the completion of another LAPI function which is initiated after the call to LAPI_Waitcntr.

LAPI_Waitcntr can be used to guarantee the order of execution of LAPI_Amsend operations which are initiated from a single origin task. When LAPI_Amsend operations use the cmpl_cntr counter, this counter is incremented after the completion handler (or the header handler, if a completion handler is not specified) has executed at the target task. LAPI_Fence and LAPI_Gfence do not provide an indication that LAPI_Amsend operations have completed execution at the target.

LAPI_Waitcntr is a blocking call. A user who prefers to avoid this blocking operation can use a program loop consisting of the following sequence:

  1. A call to LAPI_Getcntr
  2. A check of the value returned from LAPI_Getcntr
  3. A call to LAPI_Probe

This loop provides an equivalent logical operation and gives the user added flexibility.
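
The loop can be sketched as follows. So that the example stands alone, demo_getcntr and demo_probe are hypothetical stand-ins for LAPI_Getcntr and LAPI_Probe that operate on a plain integer counter. They are not the LAPI interfaces, and the real calls take different arguments.

```c
#include <stdbool.h>

/* Simulated counter; in a real program this would be a LAPI counter. */
typedef struct {
    int value;
} demo_cntr_t;

/* Hypothetical stand-in for LAPI_Getcntr: read the counter without blocking. */
static int demo_getcntr(const demo_cntr_t *cntr)
{
    return cntr->value;
}

/* Hypothetical stand-in for LAPI_Probe: let communication make progress.
 * Here each probe simulates one operation completing. */
static void demo_probe(demo_cntr_t *cntr)
{
    cntr->value++;
}

/*
 * Poll until the counter reaches `target` or `max_polls` probes have been
 * made. Returns true if the target was reached. The caller can do other
 * work inside the loop, which a blocking wait does not allow.
 */
bool poll_for_counter(demo_cntr_t *cntr, int target, int max_polls)
{
    for (int polls = 0; ; polls++) {
        if (demo_getcntr(cntr) >= target)   /* check the value returned */
            return true;
        if (polls >= max_polls)
            return false;
        /* ... other useful application work could go here ... */
        demo_probe(cntr);                   /* drive progress, then re-check */
    }
}
```

The max_polls bound is a design choice of this sketch: it lets the caller give up or report a problem instead of spinning forever.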

Before calling the LAPI_Init function, any unused fields of the lapi_info_t structure, the second parameter to LAPI_Init, must be set to zero.
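
One way to satisfy this requirement is to clear the whole structure before setting the fields that are used. In this sketch, demo_info_t and its fields are hypothetical stand-ins for lapi_info_t, whose real layout comes from the LAPI header file.

```c
#include <string.h>

/* Hypothetical stand-in for lapi_info_t; the field names are illustrative. */
typedef struct {
    int   used_option;      /* a field the application sets */
    int   unused_option;    /* fields the application does not use */
    void *unused_pointer;
} demo_info_t;

/* Clear every field first, then fill in only what is needed. */
void prepare_info(demo_info_t *info, int option)
{
    memset(info, 0, sizeof(*info));
    info->used_option = option;
}
```

Clearing with memset before assigning means that fields added to the structure later are also zeroed without any change to the application code.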

LAPI_Init must be called before any thread (the main thread or the completion handler thread) can make a LAPI call. In addition to this, LAPI_Address_init or LAPI_Gfence should be the second LAPI call. These two functions provide a barrier which guarantees that all other LAPI tasks have initialized their LAPI subsystems and are ready to receive requests from remote tasks. Failure to provide these barrier functions might result in dropped switch packets, low performance at start-up and unnecessary switch congestion.

The instance of the LAPI subsystem should be quiesced before LAPI_Term is called to terminate the LAPI instance. This can be done by calling LAPI_Gfence before LAPI_Term.

When one task of a parallel job (using POE) determines a condition which requires that all tasks of the parallel job be terminated, the task needs to send a process terminating signal (for example, SIGTERM or SIGQUIT) to itself. The following sequence then takes place:

  1. The POE daemon (PMD) which spawned that task catches SIGCHLD
  2. That PMD sends a message to the POE "home node"
  3. The POE "home node" broadcasts a message to all the PMDs to terminate the job
  4. Each of the PMDs sends SIGTERM to its child process and waits for SIGCHLD
  5. Each task termination is reported back to the "home node"
  6. The complete POE job ends
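
The first step of this sequence can be imitated on a single machine. In the sketch below, a child process sends SIGTERM to itself, and the parent, playing the role that the PMD plays for a task, waits for it and reports the terminating signal. The function name is hypothetical and no POE interfaces are involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that terminates itself with SIGTERM, then wait for it.
 * Returns the number of the signal that ended the child, 0 if the child
 * exited normally, or -1 on error.
 */
int run_self_terminating_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the "task" that decides the whole job must stop. */
        signal(SIGTERM, SIG_DFL);   /* make sure SIGTERM is not ignored */
        kill(getpid(), SIGTERM);
        _exit(0);                   /* reached only if SIGTERM is blocked */
    }

    /* Parent: the "daemon" that is notified when its child ends. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```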

General LAPI hints

Programs using the LAPI library are compiled with the POE commands mpcc_r, mpCC_r and mpxlf_r. The user's program can (but is not required to) create threads. The POE commands also link a POE version of the standard re-entrant C library, libc_r.a, which allows synchronization of the tasks at exit. All the PE and CSS libraries are dynamic shared libraries, enhancing the portability of the user's application between operating system levels and machine types. mpcc_r invokes the re-entrant C compiler with the compiler option to include the threaded (re-entrant) libraries and to generate thread-aware code. mpcc_r links in the threaded LAPI library, including the threaded versions of the POE utility libraries.

A threaded program has more than one independent instruction stream, but all threads share the same address space, files, and environment variables.

Core dumps

If a task produces a core file, it is written to an appropriately named subdirectory of the user's current directory. The partial dump produced by default does not contain the stack and status information for all threads; thus it is of limited usefulness in trying to diagnose hang conditions. It is possible to request AIX to produce a full core file, but such files are generally larger than permitted by AIX user limits. The communication subsystem alone generates more than 64MB of core information. Thus, if possible, use the attach capability of dbx, xldb or pdbx to examine the task while it is still running.

Hangs

Coordinating the threads in a task requires careful locking and signaling. Deadlocks caused by waiting on locks that have not been released are common, in addition to the deadlocks that can result from improper use of the LAPI calls.

Use with thread-safe libraries

A threaded LAPI program must meet the same criteria as any other threaded program. It must avoid using non-thread-safe functions (for example, strtok) in more than one thread. In addition, it must use only thread-safe libraries if library functions are called on more than one thread. AIX provides thread-safe versions of some libraries, such as the libc_r.a library. However, not all libraries have a thread-safe version. It is your responsibility to determine whether the libraries you use can be safely called by more than one thread.
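
As an illustration of the strtok case, strtok keeps its position in hidden static storage, so two threads tokenizing at the same time can corrupt each other's state. POSIX strtok_r keeps that position in a caller-supplied pointer instead. A minimal sketch follows; the function name count_words and the buffer size are hypothetical choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/*
 * Count whitespace-separated words in `text` using strtok_r.
 * Returns -1 if `text` does not fit in the local buffer.
 */
int count_words(const char *text)
{
    char buf[256];
    char *save = NULL;   /* per-call tokenizer state, not shared between threads */
    int count = 0;

    if (strlen(text) >= sizeof(buf))
        return -1;
    strcpy(buf, text);   /* strtok_r modifies its input, so work on a copy */

    for (char *tok = strtok_r(buf, " \t\n", &save);
         tok != NULL;
         tok = strtok_r(NULL, " \t\n", &save)) {
        count++;
    }
    return count;
}
```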

AIX requires that threaded programs that include <pthread.h> put this include first, before includes for <stdio.h> or other system includes. This is because <pthread.h> defines some conditional compile variables that modify the code generation of subsequent includes, particularly <stdio.h>. Note that <pthread.h> is not required unless the file uses thread-related calls or data.
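
The include order described above looks like this in a source file. The counter and function names are hypothetical; the point is only that <pthread.h> precedes every other system header.

```c
#include <pthread.h>   /* first, before any other system header */
#include <stdio.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

/* Increment a shared counter under a mutex and return the new value. */
int next_ticket(void)
{
    pthread_mutex_lock(&counter_lock);
    int value = ++counter;
    pthread_mutex_unlock(&counter_lock);
    return value;
}

/* Print a ticket number; shows <stdio.h> being used after <pthread.h>. */
void print_ticket(void)
{
    printf("ticket %d\n", next_ticket());
}
```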

POE catches asynchronous signals that normally terminate a program (SIGINT, SIGQUIT, SIGTERM, SIGDANGER) and terminates the entire parallel job. In addition, POE catches synchronous signals (SIGSEGV, SIGBUS) that are specific to a thread, terminates the entire parallel job, and causes a core file to be written at the point of interrupt on that thread. Since it is possible for several threads to generate synchronous signals simultaneously, the POE signal handler blocks subsequent occurrences.

The stacksize for the main thread is set as a characteristic of the user (in /etc/security/limits which is maintained by SMIT). The user can set the stacksize of any additional threads created; the default is 96K. Thread stacks are allocated in the process heap.

The main program should call exit, not just pthread_exit. pthread_exit might not terminate all threads, whereas exit results in all threads being terminated.

If the user's application is also using the MPI protocol, the threaded MPI library (libmpi_r.a) must be used. Use of the MPI signal library (libmpi.a) along with the LAPI library is not supported.

The AIX compilers support a flag -qarch that allows the user to target code generation to a particular processor architecture. While this option can provide performance enhancements on specific platforms, it inhibits portability, particularly between the Power and PowerPC machines. The LAPI library is not targeted to a particular architecture, and is the same on PowerPC and Power nodes.

Once the LAPI library is initialized, if the process forks, only the forking thread exists in the child process. LAPI communication from the child process is not supported.

IBM Parallel Environment for AIX: Hitchhiker's Guide provides additional information which has applicability to running LAPI parallel jobs using POE.

Linking with libraries built with libc.a

Compiling a threaded LAPI program will cause the libc_r.a library to be used to resolve all the calls to the standard C library. If your program links with a library that has been built using the standard C library, it is still usable (assuming that it provides the necessary logical thread safety) under the following conditions:

To explain further, the run-time library path for an executable is composed of the AIX LIBPATH environment variable concatenated with the library path string contained in the a.out file (and put there when the program was linked). All the shared libraries used by the executable must be found in the directories occurring in this string. Normally the AIX LIBPATH environment variable is empty, and the cc (xlc) compiler creates a string in which /usr/lib is the first entry. Thus, the executable looks for /usr/lib/libc.a. However, the cc_r (xlc_r) compiler creates a string starting with /usr/lib/threads:/usr/lib. Thus, the executable will look for libc.a in /usr/lib/threads first. And, in fact, there is exactly one member of /usr/lib/threads: /usr/lib/threads/libc.a, which is, however, a symbolic link to /usr/lib/libc_r.a! Thus, the executable looking for libc.a actually loads libc_r.a. Calls to symbols in libc.a are resolved by the second entry, /usr/lib/libc_r.a. Thus, all calls to the standard C library are resolved by the same library, providing a consistent internal state.

mpcc_r does the same sort of thing. mpcc_r puts /usr/lpp/ppe.poe/lib/threads:/usr/lpp/ppe.poe/lib in the executable's libpath string and expects /usr/lpp/ppe.poe/lib/threads/libc.a to be a symbolic link to /usr/lpp/ppe.poe/lib/libc_r.a.

The LIBPATH environment variable can be set by the user directly. The current version of POE (Version 2.3) sets the default LIBPATH as follows: LIBPATH=$MP_EUILIBPATH/$MP_EUILIB, and expects that the mpcc_r command has caused /usr/lpp/ppe.poe/lib to be included in the LIBPATH search string.

Use of segment registers (-bmaxdata restrictions)

The User Space LAPI library uses two segment registers of the 10 that are unassigned in the user's AIX process space. Thus, the user can use a maximum of 8 segments (-bmaxdata=0x80000000) for extended heap for large data arrays. The MPI library uses 3 segment registers. A program calling both the LAPI and MPI interfaces has a maximum of 5 segments available for shared memory or extended heap.
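
The arithmetic behind these limits can be checked with a short calculation. It assumes that each segment covers 256 MB (0x10000000 bytes), which is what the 8-segment value 0x80000000 above implies. The function and constant names are hypothetical, and the sketch covers only the cases described in the text.

```c
#include <stdint.h>

#define SEGMENT_BYTES  UINT64_C(0x10000000)  /* 256 MB per segment (assumed) */
#define FREE_SEGMENTS  10                    /* unassigned segment registers */
#define LAPI_SEGMENTS  2                     /* used by User Space LAPI */
#define MPI_SEGMENTS   3                     /* used by the MPI library */

/* Segments left for the application's extended heap or shared memory. */
int segments_available(int uses_lapi, int uses_mpi)
{
    int n = FREE_SEGMENTS;
    if (uses_lapi) n -= LAPI_SEGMENTS;
    if (uses_mpi)  n -= MPI_SEGMENTS;
    return n;
}

/* Size in bytes covered by a given number of segments. */
uint64_t maxdata_bytes(int segments)
{
    return (uint64_t)segments * SEGMENT_BYTES;
}
```

With LAPI only, segments_available(1, 0) is 8 and maxdata_bytes(8) is 0x80000000. With both LAPI and MPI, 5 segments remain, which corresponds to 0x50000000 bytes.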

IBM suggests that you don't write threaded message passing programs until you are quite familiar with writing and debugging threaded single-task programs.

