AIX Version 4.3 System Management Guide: Operating System and Devices

Developing a Logical Volume Strategy

The policies described in this section help you set a strategy for logical volume use that is oriented toward a combination of availability, performance, and cost that is appropriate for your site.

Availability is the ability to recover data that is lost because of disk, adapter, or other hardware problems. The recovery is made from copies of the data that are made and maintained on separate disks and adapters during normal system operation.

Performance is the average speed at which data is accessed. Policies such as write-verify and mirroring enhance availability but add to the system processing load, and thus degrade performance. Mirroring doubles or triples the disk space that the logical volume occupies. In general, increasing availability degrades performance. Disk striping can increase performance but does not allow mirroring.

By controlling the allocation of data on the disk and between disks, you can tune the storage system for the highest possible performance. See "Monitoring and Tuning Memory Use" and "Monitoring and Tuning Disk I/O" in AIX Versions 3.2 and 4 Performance Tuning Guide for detailed information on how to maximize storage-system performance.

The sections that follow should help you evaluate the tradeoffs among performance, availability, and cost. Remember that increased availability often decreases performance, and vice versa. Mirroring may increase performance, however, if the LVM chooses the copy on the least busy disk for Reads.

Note: Mirroring does not protect against the loss of individual files that are accidentally deleted or lost because of software problems. These files can only be restored from conventional tape or diskette backups.

This section discusses:

  - Prerequisites
  - Analyze Needs for Performance and Availability
  - Determine Scheduling Policy for Mirrored Writes to Disk
  - Determine Mirror Write Consistency (MWC) Policy for a Logical Volume
  - Choose an Inter-Disk Allocation Policy for Your System
  - Choose an Intra-Disk Allocation Policy for Each Logical Volume
  - Combining Allocation Policies
  - Using Map Files for Precise Allocation
  - Developing a Striped Logical Volume Strategy
  - Determine a Write-Verify Policy
  - Implement Volume Group Policies

Prerequisites

It is important that you understand the material contained in the "Logical Volume Storage Overview".

Analyze Needs for Performance and Availability

Determine whether the data that will be stored in the logical volume is valuable enough to warrant the processing and disk-space costs of mirroring.

Performance and mirroring are not always opposed. If the different instances (copies) of the logical partitions are on different physical volumes, preferably attached to different adapters, the LVM can improve Read performance by reading the copy on the least busy disk. Writes, unless the disks are attached to different adapters, always cost the same, because all copies must be updated; only one copy needs to be read.
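For example, assuming an existing logical volume named lv01 and a second disk hdisk2 (both names are illustrative), you could place a second copy of each logical partition on a separate physical volume with the mklvcopy command:

mklvcopy lv01 2 hdisk2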

If you have a large sequential-access file system that is performance-sensitive, you may want to consider disk striping.

Normally, whenever data on a logical partition is updated, all the physical partitions containing that logical partition are automatically updated. However, physical partitions can become stale (no longer containing the most current data) because of system malfunctions or because the physical volume was unavailable at the time of an update. The LVM can refresh stale partitions to a consistent state by copying the current data from an up-to-date physical partition to the stale partition. This process is called mirror synchronization. The refresh can take place when the system is restarted, when the physical volume comes back online, or when you issue the syncvg command.

Note: The syncvg command should always be run in the foreground. Running syncvg in the background could prevent you from mounting a file system.
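For example, to synchronize all stale partitions in a volume group named myvg (an illustrative name), you could run the following command in the foreground:

syncvg -v myvg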

While mirroring improves storage system availability, it is not intended as a substitute for conventional tape backup arrangements.

Attention: Some logical volumes, such as the boot logical volume (hd5) and the dump device, should not be mirrored. Any dumps attempted to a mirrored logical volume will fail, and no error message will be displayed. In Version 4.1, the dump device may also be the paging device (hd6). In this case, either paging must not be mirrored or a new nonmirrored dump device must be created and used in place of hd6.

Determine Scheduling Policy for Mirrored Writes to Disk

For data that has only one physical copy, the logical volume device driver (LVDD) translates a logical Read or Write request address into a physical address and calls the appropriate physical device driver to service the request. This single-copy or nonmirrored policy handles bad block relocation for Write requests and returns all Read errors to the calling process.

If you use mirrored logical volumes, two different scheduling policies for writing to disk can be set for a logical volume with multiple copies: sequential and parallel.

The sequential-scheduling policy performs Writes to multiple copies or mirrors in order. The multiple physical partitions representing the mirrored copies of a single logical partition are designated primary, secondary, and tertiary. In sequential scheduling, the physical partitions are written to in sequence; the system waits for the Write operation for one physical partition to complete before starting the Write operation for the next one.

The parallel-scheduling policy starts the Write operation for all the physical partitions in a logical partition at the same time. When the Write operation to the physical partition that takes the longest to complete finishes, the Write operation is completed.

For Read operations on mirrored logical volumes with a sequential-scheduling policy, the primary copy is read. If that Read operation is unsuccessful, the next copy is read. During the Read retry operation on the next copy, the failed primary copy is corrected by the LVM with a hardware relocation. Thus the bad block that prevented the first Read from completing is patched for future access.

Specifying mirrored logical volumes with a parallel-scheduling policy may improve I/O read-operation performance, because multiple copies allow the system to direct the Read operation to the copy that can be most quickly accessed.
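For example, to create a two-copy logical volume named lvseq in rootvg that uses the sequential-scheduling policy (the logical volume name is illustrative), you could use the -d flag of the mklv command; specifying -d p instead requests the parallel-scheduling policy:

mklv -y lvseq -c 2 -d s rootvg 10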

Determine Mirror Write Consistency (MWC) Policy for a Logical Volume

Mirror Write Consistency (MWC) identifies which logical partitions may be inconsistent if the system or the volume group is not shut down properly. When the volume group is varied back online for use, this information is used to make logical partitions consistent again.

If a logical volume is using MWC, then requests for this logical volume are held within the scheduling layer until the MWC cache blocks can be updated on the target physical volumes. When the MWC cache blocks have been updated, the request proceeds with the physical data Write operations.

When MWC is being used, system performance can be adversely affected. This is caused by the overhead of logging or journalling that a Write request is active in a Logical Track Group (LTG) (32 4K-byte pages or 128K bytes). This overhead is for mirrored Writes only. It is necessary to guarantee data consistency between mirrors only if the system or volume group crashes before the Write to all mirrors has been completed. When MWC is not used, the mirrors of a mirrored logical volume can be left in an inconsistent state in the event of a system or volume group crash.

After a crash, any mirrored logical volume that has MWC turned off should undergo a forced synchronization (syncvg -f -l LVname) before the data within the logical volume is used. With MWC turned off, Writes outstanding at the time of the crash can leave mirrors with inconsistent data the next time the volume group is varied on. An exception to this is logical volumes whose content is only valid while the logical volume is open, such as paging spaces.

With respect to Writes, a mirrored logical volume is really no different from a nonmirrored logical volume. When the LVM completely finishes with a Write request, the data has been written to the drive(s) below the LVM. Until the LVM issues an iodone on a Write, the outcome of the Write is unknown. Any blocks being written that have not been completed (iodone) when a machine crashes should be rewritten, whether they are mirrored or not and regardless of the MWC setting.

MWC only makes mirrors consistent when the volume group is varied back online after a crash, by picking one mirror and propagating that data to the other mirrors. MWC does not keep track of the latest data; it only keeps track of LTGs currently being written. Therefore, MWC does not guarantee that the latest data will be propagated to all the mirrors. It is the application above the LVM that must determine the validity of the data after a crash. From the LVM perspective, if the application reissues all Writes that were outstanding at the time of the crash, the possibly inconsistent mirrors will be consistent when these Writes finish (as long as the same blocks are written after the crash as were outstanding at the time of the crash).
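As an illustrative sketch (the logical volume name lv05 is assumed), you could turn MWC off with the chlv command and, after a crash, perform the forced synchronization described above before using the data:

chlv -w n lv05
syncvg -f -l lv05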

Choose an Inter-Disk Allocation Policy for Your System

The inter-disk allocation policy specifies the number of disks on which a logical volume's physical partitions are located. The physical partitions for a logical volume might be located on a single disk or spread across all the disks in a volume group. Two options used with the mklv and chlv commands, Range and Strict, determine the inter-disk policy; they are discussed in the following sections.

Inter-Disk Settings for a Single Copy of the Logical Volume

If you select the minimum inter-disk setting (Range = minimum), the physical partitions assigned to the logical volume are located on a single disk to enhance availability. If you select the maximum inter-disk setting (Range = maximum), the physical partitions are located on multiple disks to enhance performance. The allocation of mirrored copies of the original partitions is discussed in the following section.

For nonmirrored logical volumes, use the minimum setting to provide the greatest availability (access to data in case of hardware failure). The minimum setting indicates that one physical volume should contain all the original physical partitions of this logical volume if possible. If the allocation program must use two or more physical volumes, it uses the minimum number, while remaining consistent with other parameters.

By using the minimum number of physical volumes, you reduce the risk of losing data because of a disk failure. Each additional physical volume used for a single physical copy increases that risk. A nonmirrored logical volume spread across four physical volumes is four times as likely to lose data because of a single physical volume failure as a logical volume contained on one physical volume.

The Minimum Inter-Disk Allocation Policy figure illustrates a minimum inter-disk allocation policy.

The maximum setting, considering other constraints, spreads the physical partitions of the logical volume as evenly as possible over as many physical volumes as possible. This is a performance-oriented option, because spreading the physical partitions over several disks tends to decrease the average access time for the logical volume. To improve availability, the maximum setting should only be used with mirrored logical volumes.

The Maximum Inter-Disk Allocation Policy figure illustrates a maximum inter-disk allocation policy.

These definitions are also applicable when extending or copying an existing logical volume. The allocation of new physical partitions is determined by your current allocation policy and where the existing used physical partitions are located.
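For example, the Range setting corresponds to the -e flag of the mklv command. Assuming an illustrative logical volume name, the following creates a logical volume whose physical partitions are spread over as many disks as possible (Range = maximum):

mklv -y lvperf -e x rootvg 16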

Inter-Disk Settings for Logical Volume Copies

The allocation of a single copy of a logical volume on disk is fairly straightforward. When you create mirrored copies, however, the resulting allocation is somewhat complex. The figures that follow show minimum and maximum inter-disk (Range) settings for the first instance of a logical volume along with the available Strict settings for the mirrored logical volume copies.

For example, if there are mirrored copies of the logical volume, the minimum setting causes the physical partitions containing the first instance of the logical volume to be allocated on a single physical volume, if possible. Then, depending on the setting of the Strict option, the additional copy or copies are allocated on the same or on separate physical volumes. In other words, the algorithm uses the minimum number of physical volumes possible, within the constraints imposed by other parameters such as the Strict option, to hold all the physical partitions.

The setting Strict = y means that each copy of the logical partition will be placed on a different physical volume. The setting Strict = n means that the copies are not restricted to different physical volumes.

Note: If there are fewer physical volumes in the volume group than the number of copies per logical partition you have chosen, you should set Strict to n. If Strict is set to y, an error message is returned when you try to create the logical volume.

The Minimum Inter-Disk Policy/Strict figure illustrates a minimum inter-disk allocation policy with differing Strict settings.

The Maximum Inter-Disk Policy/Strict figure illustrates a maximum inter-disk allocation policy with differing Strict settings.
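For example, to create a mirrored logical volume with two copies, a minimum inter-disk setting, and Strict = y (the logical volume name is illustrative), you could enter:

mklv -y lvmirr -c 2 -e m -s y rootvg 8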

Choose an Intra-Disk Allocation Policy for Each Logical Volume

The closer a given physical partition is to the center of a physical volume, the lower the average seek time because the center has the shortest average seek distance from any other part of the disk.

The file system log is a good candidate for allocation at the center of a physical volume because it is used by the operating system so often. At the other extreme, the boot logical volume is used infrequently and therefore should be allocated at the edge or middle of the physical volume.

The general rule, then, is that the more I/Os, either absolutely or during the running of an important application, the closer to the center of the physical volumes the physical partitions of the logical volume should be allocated. This rule has two important exceptions:

  1. Logical volumes on 200MB, 540MB, or 1GB disks that contain large, sequential files should be at the edge because sequential performance is better there (there are more blocks per track at the edge than farther in).
  2. Mirrored logical volumes with Mirror Write Consistency (MWC) set to ON should be at the edge because that is where the system writes MWC data. If mirroring is not in effect, MWC does not apply and does not affect performance. Otherwise, see "Performance Implications of Disk Mirroring" in the section on performance-related installation guidelines in the AIX Versions 3.2 and 4 Performance Tuning Guide.

The intra-disk allocation policy choices are based on the five regions of a disk where physical partitions can be located. The five regions are: outer edge, inner edge, outer middle, inner middle, and center. The edge partitions have the slowest average seek times, which generally result in longer response times for any application that uses them. The center partitions have the fastest average seek times, which generally result in the best response time for any application that uses them. There are, however, fewer partitions on a physical volume at the center than at the other regions.

The Five Regions of a Disk illustration shows the regions that can be used for allocating physical partitions in a physical volume.
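For example, to request that a file system log logical volume be allocated at the center of a physical volume (the logical volume name is illustrative), you could use the -a flag of the mklv command:

mklv -y lvlog -t jfslog -a c rootvg 1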

Combining Allocation Policies

If you select inter-disk and intra-disk policies that are not compatible, you may get unpredictable results. The system will assign physical partitions by allowing one policy to take precedence over the other. For example, if you choose an intra-disk policy of center and an inter-disk policy of minimum, the inter-disk policy will take precedence. The system will place all of the partitions for the logical volume on one disk if possible, even if the partitions will not all fit into the center region. Make sure you understand the interaction of the policies you choose before implementing them.

Using Map Files for Precise Allocation

If the default options provided by the inter- and intra-disk policies are not sufficient for your needs, consider creating map files to specify the exact order and location of the physical partitions for a logical volume.

You can use Web-based System Manager, SMIT, or the -m option for the mklv command to create map files.

Note: The -m option is not permitted with disk striping.

For example, to create a ten-partition logical volume called lv06 in the rootvg in partitions 1 through 3, 41 through 45, and 50 through 60 of hdisk1, you could use the following procedure from the command line.

  1. Use the command:
    lspv -p hdisk1
    to verify that the physical partitions you plan to use are free to be allocated.
  2. Create a file, such as /tmp/mymap1, containing:
    hdisk1:1-3
    hdisk1:41-45
    hdisk1:50-60
    The mklv command will allocate the physical partitions in the order that they appear in the map file. Be sure that there are sufficient physical partitions in the map file to allocate the entire logical volume that you specify with the mklv command. (You can list more than you need.)
  3. Use the command:
    mklv -t jfs -y lv06 -m /tmp/mymap1 rootvg 10
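To confirm how the physical partitions were actually allocated, you can list the logical-to-physical partition map of the new logical volume:

lslv -m lv06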

Developing a Striped Logical Volume Strategy

Striped logical volumes are used for large sequential file systems that are frequently accessed and performance-sensitive. Striping is intended to improve performance. Since mirroring is not supported when striping is applied, the availability of the striped logical volume is low compared with a mirrored logical volume.

Attention: You may import a Version 3.2 created volume group into a Version 4.1 system, and you may import a Version 4.1 volume group into a Version 3.2 system, provided striping has not been applied. Once striping is put onto a disk, its importation into Version 3.2 is prevented. The current implementation of mksysb will not restore any striped logical volume after the mksysb image is restored.

To create a 12-partition striped logical volume called lv07 in VGName with a stripe size of 16KB across hdisk1, hdisk2, and hdisk3, you would enter the following:

mklv -y lv07 -S 16K VGName 12 hdisk1 hdisk2 hdisk3

To create a 12-partition striped logical volume called lv08 in VGName with a stripe size of 8KB across any three disks within VGName, you would enter the following:

mklv -y lv08 -S 8K -u 3 VGName 12

For more information on how to improve performance by using disk striping, see AIX Versions 3.2 and 4 Performance Tuning Guide.

Determine a Write-Verify Policy

Using the write-verify option causes all Write operations to be verified by an immediate follow-up Read operation to check the success of the Write. If the Write operation is not successful, you will get an error message. This policy enhances availability but degrades performance because of the extra time needed for the Read. You can specify the use of a write-verify policy on a logical volume either when you create it (mklv) or later by changing it (chlv).
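For example, to turn write-verify on for an existing logical volume named lv01 (an illustrative name), you could enter the following; the same policy can be set at creation time with the -v flag of the mklv command:

chlv -v y lv01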

Implement Volume Group Policies

  1. Use the lspv command to check your allocated and free physical volumes. In a standard configuration, the more disks that make up a quorum volume group, the better the chance of the quorum remaining when a disk failure occurs. In a nonquorum group, a minimum of two disks must make up the volume group.
  2. To ensure a quorum, add one or more physical volumes (see "Add fixed disk without data to existing volume group" or "Add fixed disk without data to new volume group"). To change a volume group to nonquorum status, see "Change a User-Defined Volume Group to Nonquorum Status".
  3. The standard configuration provides a single volume group that includes multiple physical volumes attached to the same disk adapter and other supporting hardware. Reconfiguring the hardware is an elaborate step. Separate hardware is only necessary if your site requires high availability.
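For example, you could list the physical volumes on the system and then examine a volume group (here rootvg) to check its state; the lsvg output includes a QUORUM field:

lspv
lsvg rootvg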
