Administration Guide


Understanding the LAPI

To help you achieve a fuller understanding of the LAPI, this section presents further details on the active message infrastructure and the defined set of functions. In addition, concepts important to understanding the LAPI are explained.

The active message infrastructure

The underlying infrastructure that was selected for the LAPI is referred to as the active message. It has the following characteristics:

Writing handlers

The ability for programmers to write their own handlers provides a generalized, yet efficient, mechanism for customizing the interface to one's specific requirements. The user is responsible for protecting shared structures and buffers where necessary by using the locking structures available in the AIX p-threads library.

The LAPI supports messages that can be larger than the size supported by the underlying SP Switch subsystem. Therefore, the data sent with the active message may arrive at the target in multiple packets and, further, these packets can arrive out of order. This situation places some requirements on how the handler is written.

When the active message brings with it data from the originating process, the architecture requires that the handler be written as two separate routines:

  1. A header handler function. This function is specified in the active message call. It is called when the message first arrives at the target process and provides the LAPI dispatcher (the part of the LAPI that deals with the arrival of messages and invocation of handlers) with the address of where to copy the arriving data, the address of the optional completion handler, and a pointer to the parameter that is to be passed to the completion handler.
  2. A completion handler that is called after the whole message has been received.

An example of a LAPI active message function

In this example, a programmer writes a handler for the LAPI active message interface. See the book PSSP: Command and Technical Reference for more information on the LAPI_Amsend subroutine.

  1. The desired function (accumulate) is to add a vector (S) to another (D) on the target node and put the results in the vector at the target:
    D[0..N-1] = D[0..N-1] + S[0..N-1]
    

    where:
    S[N] is a vector of length N in the address space of the origin process (origin_process)
    D[N] is a vector of length N in the address space of the target process (target_process)

  2. The generic active message call is defined as LAPI_Amsend (hndl, tgt, hdr_hdl, uhdr, uhdr_len, udata, udata_len, tgt_cntr, org_cntr, cmpl_cntr)
  3. Before making the active message call, you must obtain the address of the target counter (target_cntr_addr) and the address of the header handler to be executed on the target process (accumulate_addr). The address of the header handler is obtained by the LAPI_Address function.
  4. Initialize the uhdr based on the header expected by accumulate. For example, the structure of uhdr could be:
    typedef struct {
       void *target_addr;
       uint length;
    } put_add_hdr_t;
     
    put_add_hdr_t uhdr;
     
    uhdr.target_addr = D;
    uhdr.length = N;
    
  5. Make the specific call
    LAPI_Amsend (hndl, target_process, accumulate_addr,
    &uhdr, sizeof(put_add_hdr_t), &S[0],
    N*sizeof(S[0]), target_cntr_addr, &origin_cntr, &completion_cntr)
    
  6. When this message is received at the target (assuming that the entire origin data is contained within a packet), the accumulate handler you specified is invoked by the dispatcher. The structure of the header handler is:
    void *header_handler (lapi_handle_t hndl, void *uhdr,
       uint uhdr_len, uint msg_len,
       completion_handler_t **completion_handler, void **user_info)
    

    The structure of the completion handler is:

    void completion_handler (lapi_handle_t hndl,
       void *user_info)
    
  7. If any state information about the message is required by the completion handler, the information required must be saved in a user buffer by the header handler. The header handler passes the address of this buffer to the dispatcher through the parameter user_info. The dispatcher uses this pointer as a parameter (user_info) for the completion handler.
  8. For this example, the operations performed at the target process are:

The accumulate handler is the header handler and is called by the LAPI layer when the message first arrives at the target process. The header handler saves the information required by complete_accumulate (target_addr, length, and buf) in a structure and passes its address back through the user_info parameter; it also passes back the address of the complete_accumulate routine through the completion_handler parameter. In addition, the header handler returns the address of a buffer, buf, into which the LAPI layer copies the arriving data.

Large active messages are generally transferred as multiple packets. In this case, the LAPI layer copies the incoming data into buf as the packets arrive. When all the data has been received, it calls the complete_accumulate function, which uses user_info to access the two vectors, adds them, and stores the result at the desired location. After the return from the complete_accumulate routine, the LAPI layer increments tgt_cntr. The origin_cntr increments when it is safe to return the origin buffer back to the user.

The cmpl_cntr increments after the completion handler has completed execution. The cmpl_cntr, therefore, is a reflection, at the origin, of the tgt_cntr.

The defined set of functions

Fundamentally, the defined set of functions for the LAPI provides a Remote Memory Copy (RMC) interface. The primary characteristics of the defined set of functions provided by LAPI are:

Important LAPI concepts

To use the LAPI, it is important to understand the following concepts:

Origin and target

Origin denotes the task (or process or processor) that initiates a LAPI operation (PUT, GET, or active message). Target denotes the other task whose address space is accessed. Although multiple tasks may run on a single node, it is convenient to think of each task as running on a different node. Therefore, the origin task may also be referred to as the origin node, and the target task as the target node. The origin and target can be the same for any of the calls, but if the origin and target data areas overlap, the results are undefined.

Blocking and non-blocking calls

A blocking procedure is one that returns only after the operation is complete. There are no restrictions on the reuse of user resources.

A non-blocking procedure is one that may return before the operation is complete and before the user is allowed to reuse all the resources specified in the call. A non-blocking operation is considered complete only after a completion testing function, such as LAPI_Waitcntr or LAPI_Getcntr, indicates that the operation is complete.

Completion of communication operation

A communication operation is considered to be complete, with respect to the buffer, when the buffer is reusable.

A PUT is complete with respect to the origin buffer when the data has been copied out of the buffer at the origin and may be overwritten. A GET is complete with respect to the origin buffer when that origin buffer holds the new data that was obtained by GET.

A PUT is complete with respect to the target buffer when the new data is available at the target buffer. A GET is complete with respect to the target buffer when the data has been copied out of the buffer at target and the target task may overwrite that buffer.

Communication behaviors

Two communication behaviors support two different definitions of "completion":

The LAPI defines both standard and synchronous behaviors for PUT operations. The LAPI defines only synchronous behavior for GET operations.

Message ordering and atomicity

Two LAPI operations that have the same origin task are considered to be ordered with respect to the origin if one of the operations starts after the other has completed at the origin task. Similarly, two LAPI operations that have the same target task are considered to be ordered with respect to the target if one of the operations starts after the other has completed at the target task. If two operations are not ordered, they are considered concurrent. The LAPI provides no guarantees of ordering for concurrent communication operations. The LAPI does provide mechanisms which an application can use to guarantee order.

As an example, consider the case where a node issues two standard behavior PUT operations to the same target node, where the targets overlap. These two operations may complete in any order, including the possibility of the first PUT overlapping the second in time. The contents of the overlapping region will be undefined, even after both PUTs complete. Using synchronous behavior for both PUT operations (waiting for the first to complete before starting the second) ensures that the overlapping region contains the result of the second PUT after both have completed.

Error handling

If an error occurs during a communication operation, the error may be signaled at the origin of the operation, at the target, or at both. Some errors may be caught before the communication operation begins, and these are signaled at the origin. However, some errors do not occur until the communication is in progress (a segmentation violation at the target, for example); these may be signaled at either or both ends of the communication.

Progress

All LAPI operations are unilateral by default and can complete successfully or fail, independent of the actions of other tasks. Specifically, a LAPI operation to a particular target should complete even if the target is stuck in an infinite loop, provided the target process is running in interrupt mode.
