IBM Books

Managing Shared Disks


Application programming considerations for Virtual Shared Disks

Application programmers who work with virtual shared disks need to be aware of data integrity considerations, the IBM Virtual Shared Disk transmission protocol, and how to get information about the configured virtual shared disks by using C language interfaces.

Data alignment

For optimum performance, align data buffers on 4 KB boundaries. This allows the device driver to use Kernel LAPI (KLAPI) transport services that avoid a data copy.
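The following is a minimal sketch of one way to obtain a buffer on a 4 KB boundary, assuming a POSIX system that provides posix_memalign. The names VSD_ALIGNMENT, alloc_aligned_buffer, and is_4k_aligned are illustrative and are not part of any IBM Virtual Shared Disk interface.

```c
#include <stdint.h>
#include <stdlib.h>

#define VSD_ALIGNMENT 4096  /* hypothetical name for the 4 KB boundary */

/* Allocate `size` bytes starting on a 4 KB boundary.
 * Returns NULL on failure; release the buffer with free(). */
void *alloc_aligned_buffer(size_t size)
{
    void *buf = NULL;
    if (posix_memalign(&buf, VSD_ALIGNMENT, size) != 0)
        return NULL;
    return buf;
}

/* Return 1 if `p` lies on a 4 KB boundary, 0 otherwise. */
int is_4k_aligned(const void *p)
{
    return ((uintptr_t)p % VSD_ALIGNMENT) == 0;
}
```

A caller would pass the pointer returned by alloc_aligned_buffer to its read or write calls and free it when the I/O is complete.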

Data integrity

There are data integrity considerations involved in accessing data, synchronizing I/O, and checksum processing.

Accessing data on a Virtual Shared Disk

Use the raw device (also called device special file) to access a virtual shared disk, because the block device can give you stale data from the operating system cache.
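As a sketch of how an application might guard against opening the block device by mistake, the check below uses stat to confirm that a path names a character (raw) device before any I/O is issued. The function name is_raw_device is illustrative, and the device path an application passes to it depends on how the virtual shared disk was configured.

```c
#include <sys/stat.h>

/* Return 1 if `path` is a character (raw) device, 0 if it is any other
 * kind of file (including a block device), and -1 if stat() fails. */
int is_raw_device(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return S_ISCHR(st.st_mode) ? 1 : 0;
}
```

An application could refuse to proceed when this function returns anything other than 1.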

Synchronizing I/O

A virtual shared disk is a device, and the device driver does not serialize access to it. On a single node, you must provide and use your own synchronization mechanisms for your reads and writes to ensure data integrity. Because IBM Virtual Shared Disk allows multiple nodes to have simultaneous access to devices, you must also provide your own distributed synchronization mechanisms. Neither IBM Virtual Shared Disk nor the SP system software provides a distributed synchronization mechanism.
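For the single-node case only, one possible approach is for cooperating processes to take POSIX advisory locks on an ordinary lock file before they touch the matching byte range of the device. The sketch below assumes such a convention; the function names and the use of a separate lock file are illustrative choices, not something IBM Virtual Shared Disk prescribes. These locks coordinate processes on one node, not threads within a process and not other nodes.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Apply a lock of the given type (F_WRLCK or F_UNLCK) to bytes
 * [start, start + len) of lock_fd, blocking until it is granted. */
static int set_range_lock(int lock_fd, short type, off_t start, off_t len)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(lock_fd, F_SETLKW, &fl);
}

/* Take an exclusive advisory lock; lock_fd must be open for writing. */
int lock_range(int lock_fd, off_t start, off_t len)
{
    return set_range_lock(lock_fd, F_WRLCK, start, len);
}

/* Release a lock previously taken with lock_range(). */
int unlock_range(int lock_fd, off_t start, off_t len)
{
    return set_range_lock(lock_fd, F_UNLCK, start, len);
}
```

Each process would call lock_range before its read or write and unlock_range afterward. Access from other nodes still requires a distributed mechanism that the application supplies.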

Checksum processing on unreliable networks

The switch is a reliable communication medium. The switch adapter performs checksum processing in hardware to provide a reliable network. Some networks, such as Ethernet, do not provide checksum processing. For use on such unreliable networks, IBM Virtual Shared Disk provides a checksum processing option. You can use the cksumvsd command to turn checksum processing on and off in the IBM Virtual Shared Disk device driver. Because the switch provides checksum processing in hardware, do not use cksumvsd if you use css0 as your virtual shared disk communication adapter. For more information, refer to PSSP: Command and Technical Reference.

Note:
Checksum processing must be set the same, either on or off, on all virtual shared disk nodes in a system partition.

IBM Virtual Shared Disk transmission protocol

IBM Virtual Shared Disk uses the Kernel LAPI protocol and its own UDP-like internet protocol. The requesting node implements an exponential back-off retransmission strategy. If the remote node does not service the request after about 15 minutes of retransmission, IBM Virtual Shared Disk fails the application's I/O request. Suspending and resuming a virtual shared disk (done by IBM Recoverable Virtual Shared Disk) gives all requests a fresh start with the full 15 minutes of retransmission time. The time needed for recovery with IBM Recoverable Virtual Shared Disk does not cause application requests to fail.
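To give a feel for how an exponential back-off schedule fills a fixed retransmission budget, the sketch below counts how many retransmissions fit in a given time when the wait doubles after each attempt up to a cap. The starting interval, the cap, and the doubling factor are assumptions for illustration; the actual values used by the device driver are not stated here.

```c
/* Count the retransmissions that fit in budget_ms when the wait before
 * each one starts at first_ms and doubles every time, capped at max_ms.
 * All three parameters are hypothetical tuning values. */
unsigned count_retransmissions(unsigned long first_ms,
                               unsigned long max_ms,
                               unsigned long budget_ms)
{
    unsigned long elapsed = 0;
    unsigned long wait;
    unsigned n = 0;

    if (first_ms == 0 || max_ms == 0)
        return 0;
    wait = (first_ms > max_ms) ? max_ms : first_ms;

    while (elapsed + wait <= budget_ms) {
        elapsed += wait;                                /* wait, then resend */
        n++;
        wait = (wait * 2 > max_ms) ? max_ms : wait * 2; /* back off */
    }
    return n;
}
```

With an assumed first wait of 1 second, a 64-second cap, and a 15-minute (900000 ms) budget, this schedule allows 19 retransmissions before the request would be failed.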

Sequence numbers

Each node keeps an expected sequence number and an outgoing sequence number for every other node. Each time a node sends a request message to another node, it increments its outgoing sequence number for that node. If a node receives a message whose sequence number is not within the valid range, based on the expected number for the sending node, the message is discarded. The valid range starts at the current value of the expected number for the node and extends by a fairly large window size. When a node receives a valid request message from another node, the receiving node resets its expected sequence number for that node to the sequence number in the message.
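The rules above can be sketched as follows. The structure layout, the function names, the window value, and the choice to treat both ends of the range as inclusive are assumptions made for illustration; the sketch also ignores counter wraparound.

```c
#include <stdint.h>

#define SEQ_WINDOW 1000000u  /* hypothetical "fairly large" window */

/* Per-peer state that a node keeps for one other node. */
struct peer_seq {
    uint32_t expected;  /* lowest sequence number accepted from the peer */
    uint32_t outgoing;  /* last sequence number sent to the peer */
};

/* Sender side: increment the outgoing number for every request sent. */
uint32_t next_outgoing(struct peer_seq *p)
{
    return ++p->outgoing;
}

/* Receiver side: accept the request (return 1) if seq lies in
 * [expected, expected + SEQ_WINDOW] and record it as the new expected
 * value; otherwise discard it (return 0) and leave the state unchanged. */
int accept_request(struct peer_seq *p, uint32_t seq)
{
    if (seq < p->expected || seq - p->expected > SEQ_WINDOW)
        return 0;
    p->expected = seq;
    return 1;
}
```

For example, a receiver that expects 100 accepts a request numbered 150 and then discards a later request numbered 120.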

IBM Recoverable Virtual Shared Disk resets sequence numbers for its users.

Sequence numbers become significant if a node in the virtual shared disk cluster is rebooted after it has performed virtual shared disk I/O. When the node comes back up, all its sequence numbers are reset to 0, but all other nodes in the system partition expect numbers greater than the last sequence number they received from the node before it rebooted, that is, nonzero. Therefore, none of the nodes that received messages from the node before it rebooted will accept its requests, because their sequence numbers are below the expected value. The rebooted node continues to retransmit, increasing the outgoing sequence number each time, until it eventually reaches the expected range on all the other nodes. Some requests, however, may exhaust their retry limits and fail. The rebooted node will probably accept requests from the existing nodes, because their sequence numbers fall within the range that starts at its expected count (0) and extends by the (large) window size.
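As a rough illustration of the reboot scenario, the sketch below counts how many requests from a freshly rebooted node a peer discards before one falls into the peer's valid range. The function name, the inclusive range test, and the requirement that the window be at least 1 are assumptions, not details of the actual implementation.

```c
#include <stdint.h>

/* Simulate a rebooted node whose outgoing number restarts at 0 and is
 * incremented for every request sent. Return how many requests the peer
 * discards before one lies in [peer_expected, peer_expected + window].
 * Assumes window >= 1 and ignores counter wraparound. */
uint32_t discarded_after_reboot(uint32_t peer_expected, uint32_t window)
{
    uint32_t outgoing = 0;
    uint32_t discarded = 0;

    for (;;) {
        outgoing++;  /* next request or retransmission */
        if (outgoing >= peer_expected && outgoing - peer_expected <= window)
            return discarded;
        discarded++;
    }
}
```

If a peer still expects 500 when the node comes back, the first 499 requests are discarded under these assumptions, which is why requests can exhaust their retry limits unless the sequence numbers are reset.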

When one node's sequence numbers are not synchronized with the rest of the virtual shared disk nodes, all virtual shared disk I/O from the out-of-sync node to remote servers appears to hang. After about 15 minutes of retransmissions, the I/O starts to fail unless the system administrator intervenes and resets the sequence numbers.

Checking sequence numbers

To check sequence numbers from the IBM Virtual Shared Disk Perspective:

Alternatively, you can run the statvsd command on client and server nodes. (See the statvsd command reference pages for more details.)

Resetting sequence numbers

To reset sequence numbers from the IBM Virtual Shared Disk Perspective:

To reset sequence numbers with line commands, use the -R and -r options of the ctlvsd command to reset the sequence numbers (both the expected and the outgoing) of all or some of the nodes to 0. Before issuing ctlvsd, put the virtual shared disk in the suspended state. (See the ctlvsd command reference pages for more details.)

Getting Virtual Shared Disk information with C interfaces

Note:
The header file /usr/include/vsd_ioctl.h is installed with the vsd.vsdd option. Structures defined in vsd_ioctl.h might change from release to release. If an ioctl call fails after you move to a new release, you might have to recompile your application against the new header file.

The IBM Virtual Shared Disk base code has a sample directory: /usr/lpp/csd/samples.

This directory contains the following files:

vsdinfo.c
When compiled, produces a binary that shows information for the vsd_name given on the command line.

vsdsinfo.c
When compiled, produces a binary that lists information for all configured virtual shared disks.

