Managing Shared Disks


Tuning Virtual Shared Disk performance

The IBM Virtual Shared Disk device driver passes all its requests to the underlying Logical Volume Manager subsystem. Before you tune the virtual shared disk, check that the I/O subsystem is not the bottleneck. See AIX Performance and Tuning Guide for information on I/O subsystem performance and tuning. If an overloaded I/O subsystem is degrading your system's performance, tuning the virtual shared disk will not help. In the case of I/O subsystem overload, consider spreading the I/O load over more disks or nodes.
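For example, you might watch disk utilization while the workload runs. This is a minimal sketch; the 5-second interval and 12 samples are illustrative:

# Sample disk and CPU utilization during the workload
iostat 5 12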

For best performance, do the following:

  1. Use the defaults when defining virtual shared disks (refer to Designating nodes as IBM Virtual Shared Disk nodes).
  2. Turn IBM Virtual Shared Disk caching off if you are not using your system for online transaction processing.
  3. Unload the device driver from the kernel. See Unloading the device driver from the kernel.
  4. Do a performance run to collect statistics on the virtual shared disks, your I/O subsystem, and the CPU on all nodes. Issue statvsd several times during the performance run and compare the values for the various statistics. Use iostat to check your disk utilization (a sampling sketch follows this list). If you notice increasing numbers of queued requests, do the following:
     a. Reset the statistics counter by running the ctlvsd command (you can use the Run Command... action of the IBM Virtual Shared Disk Perspective graphical user interface).
     b. Do another performance run.
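The following sketch shows one way to take the snapshots described in step 4. The 60-second interval and the fixed number of snapshots are illustrative:

# Take several statvsd and iostat snapshots during the performance run;
# steadily increasing "queued" counts between snapshots suggest a shortage
i=1
while [ $i -le 5 ]
do
    statvsd
    iostat 5 1
    sleep 60
    i=$(( i + 1 ))
done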

You should generally operate with IBM Virtual Shared Disk caching off. Memory is better allocated to the operating system itself, for paging, and to the cache belonging to the application using the virtual shared disk. To turn IBM Virtual Shared Disk caching off, do the following (a command-level sketch follows this list):

  1. Shut down your applications that use virtual shared disks and stop activity (see Stopping Virtual Shared Disk activity).
  2. If you do not use the IBM Recoverable Virtual Shared Disk subsystem, unconfigure the virtual shared disks (see Unconfiguring Virtual Shared Disks and Hashed Shared Disks).
  3. Select one or more nodes.
  4. Use the Run Command... action and run the updatevsdtab command to change the cache/nocache option to nocache.
  5. If you do not use the IBM Recoverable Virtual Shared Disk subsystem, configure the virtual shared disks (see Configuring Virtual Shared Disks or Hashed Shared Disks).
  6. If you do use the IBM Recoverable Virtual Shared Disk subsystem, refresh the virtual shared disk configuration (see Refreshing the IBM Recoverable Virtual Shared Disk subsystem).
  7. Restart your applications.
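For nodes that do not run the IBM Recoverable Virtual Shared Disk subsystem, the command-line equivalent looks roughly like the following sketch. The -a ("all") flags and the updatevsdtab invocation are assumptions; verify the exact syntax in the PSSP: Command and Technical Reference before use:

# Sketch only -- option names below are assumptions, not confirmed syntax
stopvsd -a            # stop virtual shared disk activity (assumed flag)
ucfgvsd -a            # unconfigure the virtual shared disks (assumed flag)
updatevsdtab -v vsd1  # hypothetical invocation: switch vsd1 to nocache
cfgvsd -a             # configure the virtual shared disks again (assumed flag)
startvsd -a           # restart activity (assumed flag)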

If you do use caching, remember that the IBM Virtual Shared Disk component only caches 4KB requests aligned on 4KB boundaries.

See the PSSP: Command and Technical Reference for command options and syntax.

Tunable parameters related to Virtual Shared Disks

The IBM Virtual Shared Disk device driver has been modified so that it is mostly self-tuning; several parameters that required manual tuning in earlier releases are now managed automatically and are no longer tunable.

The main tunable parameters are:

  - the maximum IP message size
  - the switch send and receive pool sizes
  - the buddy buffer sizes and the total buddy buffer space
  - the cache buffer size

These are discussed with relevant tuning considerations in the following sections. You should also consider the tunable characteristics of the applications that use virtual shared disks, especially the use of buffers.

Logical Volume Manager (LVM) tuning considerations

There is always an associated logical volume for every virtual shared disk defined and configured in a system. Every virtual shared disk I/O request eventually becomes an I/O request to the associated logical volume (unless you get a cache hit at the server). This mapping of virtual shared disk I/O requests to the associated logical volume I/O requests is handled transparently by the IBM Virtual Shared Disk subsystem. All the performance tuning considerations that apply to a logical volume also apply to a virtual shared disk. Refer to AIX System Management Guide: Operating Systems and Devices and AIX Performance and Tuning Guide for information on the performance tuning of logical volumes.
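To see which logical volume backs a given virtual shared disk before tuning it, you can list the mapping and then inspect that logical volume. The logical volume name below is hypothetical:

# List configured virtual shared disks with their underlying logical volumes
lsvsd -l
# Inspect the attributes of one underlying logical volume (name is illustrative)
lslv lvvsd01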

SP Switch considerations when using the IP protocol for data transmissions

If you configure the virtual shared disk nodes to use the SP Switch (css0), set the maximum IP message size (used by the virtual shared disk driver) to 61440 bytes (60KB). Note that the value you assign to maximum_buddy_buffer_size in the SDR also limits the maximum size of the request that the IBM Virtual Shared Disk subsystem sends across the nodes. For example, if you have:

  - a maximum IP message size of 61440 bytes (60KB)
  - a maximum buddy buffer size of 65536 bytes (64KB)
  - a 256KB write request from a client node

the following transmission sequence occurs:

  1. The IBM Virtual Shared Disk subsystem divides the 256KB of data into four 64KB requests in four buddy buffers.
  2. Each 64KB block of data becomes one 60KB packet and one 4KB packet for transmission to the server via IP.
  3. At the server, the eight packets are reassembled into four 64KB blocks of data, each in a 64KB buddy buffer.
  4. The server then has to perform four 64KB write operations and return four acknowledgements to the client.

A better scenario for the same write operation would use the maximum buddy buffer size:

  - a maximum IP message size of 61440 bytes (60KB)
  - a maximum buddy buffer size of 262144 bytes (256KB)
  - the same 256KB write request from the client node

producing the following transmission sequence:

  1. The 256KB request becomes four 60KB packets and one 16KB packet for transmission to the server via IP.
  2. At the server, the five packets are reassembled into one 256KB block of data in a single buddy buffer.
  3. The server then performs one 256KB write operation and returns an acknowledgement to the client.

The second scenario is preferable to the first because the I/O operations at the server are minimized. A perfect scenario would be one where the IBM Virtual Shared Disk component does not use buddy buffers at all -- when the client request is less than or equal to the maximum IP message size. For example, a single 60KB client request travels as one IP packet and is handled at the server without a buddy buffer.

When you use the switch, send pool clusters are used instead of buddy buffers as long as the request size is less than or equal to the maximum IP message size, as in the example just cited. Buddy buffers are used only when a shortage occurs in the switch buffer pool or when the size of the request is greater than the maximum IP message size. If you see buddy buffer shortages, increase your switch send pool size rather than the number of buddy buffers. See mbufs and the switch pool.
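As a rough way to compare scenarios like the two above, the following sketch computes the buddy buffer, packet, and server I/O counts for a given request size. It ignores partially filled trailing buffers, which is exact for the examples in this section; all values are in bytes:

ip_msg=61440                   # maximum IP message size (60KB)
bb_max=262144                  # maximum buddy buffer size (256KB)
request=262144                 # 256KB client write request
chunk=$request                 # amount carried per buddy buffer
if [ $chunk -gt $bb_max ]
then
    chunk=$bb_max
fi
buddies=$(( (request + bb_max - 1) / bb_max ))
packets=$(( buddies * ( (chunk + ip_msg - 1) / ip_msg ) ))
echo "buddy buffers: $buddies  IP packets: $packets  server I/Os: $buddies"

With bb_max set to 65536, this prints four buddy buffers and eight packets (the first scenario); with 262144, one buddy buffer and five packets (the second).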

mbufs and the switch pool

mbufs are used for data transfer between the client and the server nodes by the IBM Virtual Shared Disk subsystem's own UDP-like internet protocol. If you are using the switch (css0) as your communications adapter, the IBM Virtual Shared Disk component uses mbuf clusters to do I/O directly from the switch's send and receive pools for data requests that are less than or equal to 60 KB.

If you notice that the indirect I/O statistic (from the IBM Virtual Shared Disk Perspective Statistics notebook page or from the output of the statvsd command) is incremented consistently, run errpt to check the error log. If you see the line:

IFIOCTL_MGET(): send pool shortage

you should consider increasing the size of the send and receive pools.
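For example, a quick scan of the error log (the grep pattern is illustrative; error log formats vary by PSSP level):

errpt -a | grep -i "send pool shortage"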

To check the current sizes of the send and receive pools, type:

lsattr -E -l css0

The default size for each pool is 524288 bytes (512KB). IBM suggests setting each pool to 16MB.

To change the sizes of the pools to 16MB, type:

/usr/lpp/ssp/css/chgcss -l css0 -a spoolsize=16777216
/usr/lpp/ssp/css/chgcss -l css0 -a rpoolsize=16777216
Note:
You must reboot the node for the new sizes to take effect.

System performance considerations regarding mbufs and mbuf clusters also apply to virtual shared disk environments. See AIX Performance and Tuning Guide for more information.
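For example, to check for system-wide mbuf shortages (interpretation of the counters is covered in the AIX Performance and Tuning Guide):

# Report mbuf and cluster usage; look for failed or denied requests
netstat -m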

Buddy buffers

The virtual shared disk server node uses the buddy buffer to temporarily store data for I/O operations originating at a client node and to handle requests that are greater than the maximum IP message size. In contrast to the data in the cache buffer, the data in a buddy buffer is purged immediately after the I/O operation completes.

The values associated with the buddy buffer are:

  - the minimum buddy buffer size allocated to a single request
  - the maximum buddy buffer size allocated to a single request
  - the total size of the buddy buffer

These values can be set using the IBM Virtual Shared Disk Perspective graphical user interface or the vsdnode command.

Buddy buffer space is allocated in powers of two. If an I/O request size is not a power of two, the smallest power of two that is larger than the request is allocated. For example, for a request size of 24KB, 32KB are allocated on the server.
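The following sketch shows the rounding rule, assuming the suggested 4KB minimum buddy buffer size; sizes are in bytes:

request=$(( 24 * 1024 ))       # a 24KB request
alloc=4096                     # start at the 4KB minimum buddy buffer size
while [ $alloc -lt $request ]
do
    alloc=$(( alloc * 2 ))
done
echo "request: $request bytes  allocated: $alloc bytes"   # prints 32768 (32KB)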

If you are using the switch as your adapter for virtual shared disks, IBM suggests setting the minimum and maximum buddy buffer sizes allocated to a single request to 4096 bytes (4KB) and 262144 bytes (256KB), respectively.

To define the total size of the buddy buffer, consider the remote I/O throughput for the server and specify the number of maximum-sized buddy buffers in the buffer. For example, if you expect the server to serve 10MB per second on behalf of remote clients and a request spends an average of 60 milliseconds on the server, multiply 10MB per second by 0.06 seconds and, for safety, double or triple the result. Tripling gives a total buddy buffer size of 1.8MB, or eight 256KB maximum-sized buddy buffers. IBM suggests a range of 32-96 buddy buffers (where the maximum buddy buffer size has been set to 256KB).
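The worked example above, as a sketch (all sizes in bytes; the factor of three is the upper end of the "double or triple" safety margin):

throughput=$(( 10 * 1024 * 1024 ))   # 10MB per second served for remote clients
service_ms=60                        # average time a request spends on the server
safety=3                             # double or triple the result for safety
bb_max=262144                        # 256KB maximum buddy buffer size
total=$(( throughput * service_ms / 1000 * safety ))
count=$(( (total + bb_max - 1) / bb_max ))
echo "total buddy buffer space: $total bytes ($count maximum-sized buffers)"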

If the virtual shared disk statistics consistently show requests queued waiting for buddy buffers, check the switch send and receive pool sizes before trying to add more buddy buffers (see mbufs and the switch pool), or spread the data over disks attached to other nodes to prevent a bottleneck.

Note:
If your application uses the fastpath option of asynchronous I/O, the maximum buddy buffer size must be greater than or equal to 128KB. Otherwise, you will get EMSGSIZE "Message too long" errors.

SP Switch considerations when using the KLAPI protocol for data transmissions

The maximum IP message size is not used to break up data requests when the IBM Virtual Shared Disk subsystem transmits data using the KLAPI protocol. The maximum IP message size should still be set to 61440 (60KB) because the IP protocol will still be used at times.

Buddy buffers

The KLAPI protocol always uses buddy buffers. See Buddy buffers for more information.

Buffer allocation

Your application should align all newly allocated buffers on page boundaries. If your I/O buffer is not aligned on a page boundary, the IBM Virtual Shared Disk device driver will not parallelize I/O requests to the underlying virtual shared disks, and performance will be degraded. In addition, the KLAPI transport service requires the data to be page aligned to avoid a data copy.

The cache buffer

Each IBM Virtual Shared Disk device driver, that is, each node, has a single cache buffer, shared by all cacheable virtual shared disks configured on and served by the node. The cache buffer is used to store the most recently accessed data from the cached virtual shared disks (associated logical volumes) on the server node. The objective is to minimize physical disk I/O activity. If the requested data is found in the cache, it is read from the cache, rather than the corresponding logical volume.

Data in the cache is stored in 4KB blocks. The content of the cache is a replica of the corresponding data blocks on the physical disks. Write-through cache semantics apply; that is, the write operation is not complete until the data is on the disk.

When you create virtual shared disks with the IBM Virtual Shared Disk Perspective graphical user interface or the createvsd command, you can specify the cache option or the nocache option. IBM suggests that you specify nocache (or make the cache buffer small) in most instances, especially for read-only applications or applications whose requests are not 4KB-aligned, for the following reasons:

  - Only 4KB requests aligned on 4KB boundaries are cached; all other requests bypass the cache.
  - Because write-through semantics apply, write operations never complete faster with the cache.
  - Memory used for the cache buffer is generally better left to the operating system for paging or given to the cache of the application using the virtual shared disk.

If you are running an application that involves heavy writing followed immediately by reading, it might be advantageous to turn the cache buffer on for some virtual shared disks on a particular node. Choose the appropriate size for the cache based on the expected throughput and the expected time lag between writes and reads. For example, if the expected throughput is 100 4KB-aligned I/O operations per second and reads lag writes by 0.5 seconds, calculate the cache buffer size by multiplying 100 by 0.5 and, as a safety factor, doubling it, for a total of 100 cache blocks.
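The same calculation as a sketch (the rate, lag, and safety factor are the example's assumptions):

iops=100            # expected 4KB-aligned I/O operations per second
lag_ms=500          # reads lag writes by half a second
safety=2            # double as a safety factor
blocks=$(( iops * lag_ms / 1000 * safety ))
echo "suggested cache size: $blocks 4KB blocks"    # prints 100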

The lsvsd -s command gives detailed statistics on virtual shared disk cache hits and I/O activities. This will tell you which virtual shared disks are heavily used. See Monitoring Virtual Shared Disks for information on how to see statistics using the IBM Virtual Shared Disk Perspective and the Event Perspective graphical user interfaces.

Maximum I/O request size

The following factors limit the block size that the IBM Virtual Shared Disk subsystem uses to process each I/O request:

  - the maximum buddy buffer size
  - the maximum IP message size (when the IP protocol is used for data transmissions)

The atomicity of an I/O operation is gated by the size of the virtual shared disk request, rather than by the size of the application request; if the virtual shared disk request is smaller than the application request, the application request is split into pieces no larger than the virtual shared disk request. The atomicity of an I/O operation is also gated by the maximum buddy buffer size.

