The memory management algorithm, discussed in Real-Memory Management, tries to keep the size of the free list and the percentage of real memory occupied by persistent segment pages within specified bounds. These bounds can be altered with the vmtune command, which can only be run by the root user. Changes made by this tool remain in effect until the next reboot of the system. To determine whether the vmtune command is installed and available, run the following command:
# lslpp -lI bos.adt.samples
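If the fileset is not listed as installed, it can be added with the installp command; for example, assuming the installation media is mounted as /dev/cd0:

# installp -agXd /dev/cd0 bos.adt.samples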
Executing the vmtune command on AIX 4.3.3 with no options results in the following output:
# /usr/samples/kernel/vmtune
vmtune:  current values:
  -p       -P        -r          -R         -f       -F       -N         -W
minperm  maxperm  minpgahead  maxpgahead  minfree  maxfree  pd_npages  maxrandwrt
  52190   208760      2           8         120      128     524288        0

  -M      -w      -k      -c        -b         -B          -u         -l      -d
maxpin  npswarn npskill numclust numfsbufs hd_pbuf_cnt lvm_bufcnt  lrubucket  defps
209581   4096    1024      1        93        96           9        131072      1

        -s              -n         -S          -h
sync_release_ilock  nokillroot  v_pinshm  strict_maxperm
        0               0           0           0

number of valid memory pages = 261976    maxperm=79.7% of real memory
maximum pinable=80.0% of real memory     minperm=19.9% of real memory
number of file memory pages = 19772      numperm=7.5% of real memory
The output shows the current settings for all the parameters.
Executing the vmtune command on AIX 5.2 results in the following output:
# /usr/samples/kernel/vmtune
vmtune:  current values:
  -M         -p         -P          -t         -f        -F        -l
maxpin%   minperm%   maxperm%  maxclient%   minfree   maxfree  lrubucket
  80.1      20.0       80.0       80.0        120       128     131072

   -r          -R          -N          -W        -w       -k       -c
minpgahead  maxpgahead  pd_npages  maxrandwrt  npswarn  npskill  numclust
    2           8         65536        0        4096     1024       1

   -b          -B           -u       -d           -s             -S
numfsbufs  hd_pbuf_cnt  lvm_bufcnt  defps  sync_release_ilock  v_pinshm
   186        448           9        1            0               0

     -L           -g            -h           -n                -j
lgpg_regions  lgpg_size  strict_maxperm  nokilluid  j2_nPagesPerWriteBehindCluster
     0            0             0            0                  8

        -J                 -z                  -Z
j2_maxRandomWrite  j2_nRandomCluster  j2_nBufferPerPagerDevice
        0                  0                  512

        -q                   -Q                 -V                -i
j2_minPageReadAhead  j2_maxPageReadAhead  num_spec_dataseg  spec_dataseg_int
        2                    8                   0                512

      -y
mem_affinity_on
      0

maxpin pages = 838861          minperm pages = 199609
memory size pages = 1048576    maxperm pages = 798436
remote pageouts scheduled 0    maxclient pages = 798436
file pages 7690                numperm = 0.7% of real memory
compressed pages = 0           numcompress = 0.0% of real memory
client pages = 0               numclient = 0.0% of real memory
The purpose of the free list is to keep track of real-memory page frames released by terminating processes and to supply page frames to requestors immediately, without forcing them to wait for page steals and the accompanying I/O to complete. The minfree limit specifies the free-list size below which page stealing to replenish the free list is to be started. The maxfree parameter is the size above which stealing will end.
The objectives in tuning these limits are to ensure that any activity with critical response-time objectives can always get the page frames it needs from the free list, and that the system does not experience unnecessarily high levels of I/O because of premature stealing of pages to expand the free list.
The default values of minfree and maxfree depend on the memory size of the machine. The default value of maxfree is determined by this formula:
maxfree = minimum (# of memory pages/128, 128)
By default, the minfree value is maxfree minus 8. However, the difference between maxfree and minfree should always be equal to or greater than maxpgahead; in other words, the value of maxfree should always be greater than or equal to minfree plus maxpgahead. The minfree and maxfree values will differ if there is more than one memory pool. Memory pools were introduced in AIX 4.3.3 for MP systems with large amounts of RAM. Each memory pool has its own minfree and maxfree, determined by the previous formulas, but the minfree and maxfree values shown by the vmtune command are the sums of the minfree and maxfree values for all memory pools.
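For example, on the 32 MB system used in the vmstat example below, there are 8192 page frames of 4 KB each, so the defaults work out as follows:

maxfree = minimum (8192/128, 128) = 64
minfree = 64 - 8 = 56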
Remember that minfree pages are, in some sense, wasted, because they are available but not in use. If you have a short list of the programs you want to run fast, you can investigate their memory requirements with the svmon command (see Determining How Much Memory Is Being Used) and set minfree to the size of the largest. This technique risks being too conservative, because not all of the pages that a process uses are acquired in one burst. At the same time, it might overlook dynamic demands from programs not on your list that lower the average size of the free list when your critical programs run.
A less precise but more comprehensive tool for investigating an appropriate size for minfree is the vmstat command. The following is a portion of vmstat command output obtained while running a C compilation on an otherwise idle system.
# vmstat 1
kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  0  3085   118   0   0   0   0    0   0 115    2  19  0  0 99  0
 0  0  3086   117   0   0   0   0    0   0 119  134  24  1  3 96  0
 2  0  3141    55   2   0   6  24   98   0 175  223  60  3  9 54 34
 0  1  3254    57   0   0   6 176  814   0 205  219 110 22 14  0 64
 0  1  3342    59   0   0  42 104  249   0 163  314  57 43 16  0 42
 1  0  3411    78   0   0  49 104  169   0 176  306  51 30 15  0 55
 1  0  3528   160   1   0  10 216  487   0 143  387  54 50 22  0 27
 1  0  3627    94   0   0   0  72  160   0 148  292  79 57  9  0 34
 1  0  3444   327   0   0   0  64  102   0 132  150  41 82  8  0 11
 1  0  3505   251   0   0   0   0    0   0 128  189  50 79 11  0 11
 1  0  3550   206   0   0   0   0    0   0 124  150  22 94  6  0  0
 1  0  3576   180   0   0   0   0    0   0 121  145  30 96  4  0  0
 0  1  3654   100   0   0   0   0    0   0 124  145  28 91  8  0  1
 1  0  3586   208   0   0   0  40   68   0 123  139  24 91  9  0  0
Because the compiler has not been run recently, the code of the compiler itself must be read in. All told, the compiler acquires about 2 MB in about 6 seconds. On this 32 MB system, maxfree is 64 and minfree is 56. The compiler almost instantly drives the free list size below minfree, and several seconds of rapid page-stealing activity take place. Some of the steals require that dirty working segment pages be written to paging space, which shows up in the po column. If the steals cause the writing of dirty permanent segment pages, that I/O does not appear in the vmstat report (unless you have directed the vmstat command to report on the I/O activity of the physical volumes to which the permanent pages are being written).
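For example, the following invocation (assuming the permanent pages are being written to hdisk0) adds per-volume transfer statistics to the vmstat report:

# vmstat hdisk0 1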
This example describes a fork() and exec() environment (not an environment where a process is long lived, such as in a database) and is not intended to suggest that you set minfree to 500 to accommodate large compiles. It suggests how to use the vmstat command to identify situations in which the free list has to be replenished while a program is waiting for space. In this case, about 2 seconds were added to the compiler execution time because there were not enough page frames immediately available. If you observe the page frame consumption of your program, either during initialization or during normal processing, you will soon have an idea of the number of page frames that need to be in the free list to keep the program from waiting for memory.
If we concluded from the example above that minfree needed to be 128, and we had set maxpgahead to 16 to improve sequential performance, we would use the following vmtune command:
# /usr/samples/kernel/vmtune -f 128 -F 144
In operating system versions later than AIX 4.3.3, the vmtune -m number_of_memory_pools command allows you to change the number of memory pools that are configured at system boot time. Because the new value takes effect only at the next boot, the -m flag is not a dynamic change. The change is written to the kernel file if it is an MP kernel (the change is not allowed on a UP kernel). A value of 0 restores the default number of memory pools.
By default, the vmtune -m command writes to the /usr/lib/boot/unix_mp file (/usr/lib/boot/unix_64 if running on a 64-bit kernel), but this can be changed with the command vmtune -U path_to_unix_file. Before changing the kernel file, the vmtune command saves the original file as name_of_original_file.sav.
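For example, the following commands (using 2 as an illustrative pool count) would configure two memory pools at the next boot, and then restore the default:

# /usr/samples/kernel/vmtune -m 2
# /usr/samples/kernel/vmtune -m 0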
Tuning lrubucket can reduce scanning overhead on large memory systems. In AIX 4.3, a new parameter lrubucket was added. The page-replacement algorithm scans memory frames looking for a free frame. During this scan, reference bits of pages are reset, and if a free frame has not been found, a second scan is done. In the second scan, if the reference bit is still off, the frame will be used for a new page (page replacement).
On large memory systems, there may be too many frames to scan, so now memory is divided up into buckets of frames. The page-replacement algorithm will scan the frames in the bucket and then start over on that bucket for the second scan before moving on to the next bucket. The default number of frames in this bucket is 131072 or 512 MB of RAM. The number of frames is tunable with the command vmtune -l, and the value is in 4 K frames.
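For example, to enlarge each bucket from the default 512 MB to 1 GB of RAM (262144 frames of 4 KB each):

# /usr/samples/kernel/vmtune -l 262144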
The operating system takes advantage of the varying requirements for real memory by leaving in memory pages of files that have been read or written. If the file pages are requested again before their page frames are reassigned, this technique saves an I/O operation. These file pages may be from local or remote (for example, NFS) file systems.
The ratio of page frames used for files versus those used for computational (working or program text) segments is loosely controlled by the minperm and maxperm values. If the percentage of real memory occupied by file pages rises above maxperm, page replacement steals only file pages. If that percentage falls below minperm, page replacement steals both file and computational pages. If the percentage is between minperm and maxperm, page replacement steals only file pages unless the number of file repages is higher than the number of computational repages.
In a particular workload, it might be worthwhile to emphasize the avoidance of file I/O. In another workload, keeping computational segment pages in memory might be more important. To understand what the ratio is in the untuned state, we use the vmtune command with no arguments.
# /usr/samples/kernel/vmtune
vmtune:  current values:
  -p       -P        -r          -R         -f       -F       -N         -W
minperm  maxperm  minpgahead  maxpgahead  minfree  maxfree  pd_npages  maxrandwrt
 209508   838032      2           8         120      128     524288        0

  -M      -w      -k      -c        -b         -B          -u         -l      -d
maxpin  npswarn npskill numclust numfsbufs hd_pbuf_cnt lvm_bufcnt  lrubucket  defps
838852   4096    1024      1       186        160          9        131072      1

        -s             -n         -S           -L          -g            -h
sync_release_ilock  nokilluid  v_pinshm  lgpg_regions  lgpg_size  strict_maxperm
        0               0          0           0            0             0

   -t
maxclient
 838032

number of valid memory pages = 1048565      maxperm=79.9% of real memory
maximum pinable = 80.0% of real memory      minperm=20.0% of real memory
number of file memory pages = 6668          numperm=0.6% of real memory
number of compressed memory pages = 0       compressed=0.0% of real memory
number of client memory pages = 0           numclient=0.0% of real memory
# of remote pgs sched-pageout = 0           maxclient=79.9% of real memory
The default values are calculated by the following algorithm:
minperm (in pages) = ((number of memory frames) - 1024) * .2
maxperm (in pages) = ((number of memory frames) - 1024) * .8
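For example, with the 1048565 valid memory pages reported in the preceding output, these formulas reproduce the values shown there:

minperm = (1048565 - 1024) * .2 = 209508 pages (20.0% of real memory)
maxperm = (1048565 - 1024) * .8 = 838032 pages (79.9% of real memory)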
The numperm value gives the share of real memory occupied by file pages. In the output above, there are 6668 file memory pages, which is 0.6 percent of real memory.
If we know that our workload makes little use of recently read or written files, we may want to constrain the amount of memory used for that purpose. The following command:
# /usr/samples/kernel/vmtune -p 15 -P 50
sets minperm to 15 percent and maxperm to 50 percent of real memory. This ensures that the VMM steals page frames only from file pages once the ratio of file pages to total memory pages exceeds 50 percent. This should reduce paging to paging space with no detrimental effect on persistent storage. The maxperm value is not a strict limit; it is only considered when the VMM needs to perform page replacement. Because of this, it is usually safe to reduce the maxperm value on most systems.
On the other hand, if our application frequently references a small set of existing files (especially if those files are in an NFS-mounted file system), we might want to allow more space for local caching of the file pages by using the following command:
# /usr/samples/kernel/vmtune -p 30 -P 90
NFS servers with large amounts of RAM that are used mostly for reads can benefit from an increased maxperm value. This allows more pages to reside in RAM so that NFS clients can access them without forcing the NFS server to retrieve the pages from disk again.
Another example would be a program that reads 1.5 GB of sequential file data into the working storage of a system with 2 GB of real memory. You may want to set maxperm to 50 percent or less, because you do not need to keep the file data in memory.
Starting with AIX 4.3.3, a new vmtune option (-h), called strict_maxperm, has been added. When set to 1, this option places a hard limit on how much memory is used for the persistent file cache by making the maxperm value the upper limit for this cache. When the upper limit is reached, least recently used (LRU) page replacement is performed on persistent pages.
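For example, the following command (using 50 percent as an illustrative value) caps the persistent file cache at 50 percent of real memory and makes that value a hard limit:

# /usr/samples/kernel/vmtune -P 50 -h 1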
The enhanced JFS file system uses client pages for its buffer cache, which are not affected by the maxperm and minperm threshold values. To establish hard limits on enhanced JFS file system cache, you can tune the maxclient parameter. This parameter represents the maximum number of client pages that can be used for buffer cache. To change this value, you can use the vmtune command with the -t option. The value for maxclient will be shown as a percentage of real memory.
After the maxclient threshold is reached, LRU will begin to steal client pages that have not been referenced recently. If not enough client pages can be stolen, the LRU might replace other types of pages. By reducing the value for maxclient, you help prevent Enhanced JFS file-page accesses from causing LRU to replace working storage pages, minimizing paging from paging space. The maxclient parameter also affects NFS clients and compressed pages. Also note that maxclient should generally be set to a value that is less than or equal to maxperm, particularly in the case where strict_maxperm is enabled.
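For example, to limit the enhanced JFS buffer cache to 50 percent of real memory (an illustrative value, kept below the maxperm percentage):

# /usr/samples/kernel/vmtune -t 50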