Administration Guide


Understanding the LAPI

To help you achieve a fuller understanding of the LAPI, this section presents further details on the active message infrastructure and the defined set of functions. In addition, concepts important to understanding the LAPI are explained.

The active message infrastructure

The underlying infrastructure that was selected for the LAPI is referred to as the active message. It has the following characteristics:

Writing handlers

The ability for programmers to write their own handlers provides a generalized, yet efficient, mechanism for customizing the interface to one's specific requirements. The user is responsible for protecting shared structures and buffers where necessary by using the locking structures available in the AIX p-threads library.

The LAPI supports messages that can be larger than the size supported by the underlying SP Switch subsystem. Therefore, the data sent with the active message may arrive at the target in multiple packets and, further, these packets can arrive out of order. This situation places some requirements on how the handler is written.

When the active message brings with it data from the originating process, the architecture requires that the handler be written as two separate routines:

  1. A header handler function. This function is specified in the active message call. It is called when the message first arrives at the target process and provides the LAPI dispatcher (the part of the LAPI that deals with the arrival of messages and invocation of handlers) with the address of where to copy the arriving data, the address of the optional completion handler, and a pointer to the parameter that is to be passed to the completion handler.
  2. A completion handler that is called after the whole message has been received.

An example of a LAPI active message function

In this example, a programmer writes a handler for the LAPI active message interface. See the book PSSP: Command and Technical Reference for more information on the LAPI_Amsend subroutine.

  1. The desired function (accumulate) is to add a vector (S) to another (D) on the target node and put the results in the vector at the target:
    D[0..N-1] = D[0..N-1] + S[0..N-1]
    

    where:
    S[N] is a vector of length N in the address space of the origin process (origin_process)
    D[N] is a vector of length N in the address space of the target process (target_process)

  2. The generic active message call is defined as LAPI_Amsend (hndl, tgt, hdr_hdl, uhdr, uhdr_len, udata, udata_len, tgt_cntr, org_cntr, cmpl_cntr)
  3. Before making the active message call, you must obtain the address of the target counter (target_cntr_addr) and the address of the header handler to be executed on the target process (accumulate_addr). The address of the header handler is obtained by the LAPI_Address function.
  4. Initialize the uhdr based on the header expected by accumulate. For example, the structure of uhdr could be:
    typedef struct {
       void *target_addr;
       uint length;
    } put_add_hdr_t;
     
    put_add_hdr_t uhdr;
     
    uhdr.target_addr = D;
    uhdr.length = N;
    
  5. Make the specific call
    LAPI_Amsend (hndl, target_process, accumulate_addr,
    &uhdr, sizeof(put_add_hdr_t), &S[0],
    N*sizeof(S[0]), target_cntr_addr, &origin_cntr, &completion_cntr)
    
  6. When this message is received at the target (assuming that the entire origin data is contained within a packet), the accumulate handler you specified is invoked by the dispatcher. The structure of the header handler is:
    void *header_handler (lapi_handle_t hndl, void *uhdr,
       uint uhdr_len, uint msg_len,
       completion_handler_t **completion_handler, void **user_info)
    

    The structure of the completion handler is:

    void completion_handler (lapi_handle_t hndl,
       void *user_info)
    
  7. If any state information about the message is required by the completion handler, the information required must be saved in a user buffer by the header handler. The header handler passes the address of this buffer to the dispatcher through the parameter user_info. The dispatcher uses this pointer as a parameter (user_info) for the completion handler.
  8. For this example, the operations performed at the target process are:

The accumulate handler is the header handler and is called by the LAPI layer when the message first arrives at the target process. The header handler saves the information required by complete_accumulate (target_addr, length, and buf) in a structure and passes its address back through the user_info parameter; it also passes back the address of the complete_accumulate routine through the completion_handler parameter. In addition, the header handler returns the address of a buffer, buf, into which the LAPI layer copies the arriving data.

Large active messages are generally transferred as multiple packets. In this case, the LAPI layer copies the incoming data into buf as the packets arrive. When all the data has been received, it calls the complete_accumulate function, which uses user_info to access the two vectors, adds them, and stores the result at the desired location. After the return from the complete_accumulate routine, the LAPI layer increments tgt_cntr. The origin_cntr increments when it is safe to return the origin buffer back to the user.

The cmpl_cntr increments after the completion handler has completed execution. The cmpl_cntr, therefore, is a reflection, at the origin, of the tgt_cntr.

The defined set of functions

Fundamentally, the defined set of functions for the LAPI provides a Remote Memory Copy (RMC) interface. The primary characteristics of the defined set of functions provided by LAPI are:

Important LAPI concepts

To use the LAPI, it is important to understand the following concepts:

Origin and target

Origin denotes the task (or process or processor) that initiates a LAPI operation (PUT, GET, or active message). Target denotes the other task whose address space is accessed. Although multiple tasks may run on a single node, it is convenient to think of each task as running on a different node. Therefore, the origin task may also be referred to as the origin node, and the target task as the target node. The origin and target can be the same for any of the calls, but if the origin and target data areas overlap, the results are undefined.

Blocking and non-blocking calls

A blocking procedure is one that returns only after the operation is complete. There are no restrictions on the reuse of user resources.

A non-blocking procedure is one that may return before the operation is complete and before the user is allowed to reuse all the resources specified in the call. A non-blocking operation is considered complete only after a completion testing function, such as LAPI_Waitcntr or LAPI_Getcntr, indicates that the operation is complete.

Completion of communication operation

A communication operation is considered to be complete, with respect to the buffer, when the buffer is reusable.

A PUT is complete with respect to the origin buffer when the data has been copied out of the buffer at the origin and may be overwritten. A GET is complete with respect to the origin buffer when that origin buffer holds the new data that was obtained by GET.

A PUT is complete with respect to the target buffer when the new data is available at the target buffer. A GET is complete with respect to the target buffer when the data has been copied out of the buffer at target and the target task may overwrite that buffer.

Communication behaviors

Two communication behaviors support two different definitions of "completion":

The LAPI defines both standard and synchronous behaviors for PUT operations. The LAPI defines only synchronous behavior for GET operations.

Message ordering and atomicity

Two LAPI operations that have the same origin task are considered to be ordered with respect to the origin if one of the operations starts after the other has completed at the origin task. Similarly, two LAPI operations that have the same target task are considered to be ordered with respect to the target if one of the operations starts after the other has completed at the target task. If two operations are not ordered, they are considered concurrent. The LAPI provides no guarantees of ordering for concurrent communication operations. The LAPI does provide mechanisms which an application can use to guarantee order.

As an example, consider the case where a node issues two standard behavior PUT operations to the same target node, where the targets overlap. These two operations may complete in any order, including the possibility of the first PUT overlapping the second in time. The contents of the overlapping region will be undefined, even after both PUTs complete. Using synchronous behavior for both PUT operations (waiting for the first to complete before starting the second) ensures that the overlapping region contains the result of the second PUT after both have completed.

Error handling

If an error occurs during a communication operation, the error may be signaled at the origin of the operation, at the target, or at both. Some errors may be caught before the communication operation begins, and these are signaled at the origin. However, some errors do not occur until the communication is in progress (a segmentation violation at the target, for example); these may be signaled at either or both ends of the communication.

Progress

All LAPI operations are unilateral by default and can complete successfully or fail, independent of the actions of other tasks. Specifically, a LAPI operation to a particular target should complete even if the target is stuck in an infinite loop, provided the target process is running in interrupt mode.
