The memory management algorithm, discussed in Real-Memory Management, tries to keep the size of the free list and the percentage of real memory occupied by persistent segment pages within specified bounds. These bounds can be altered with the vmtune command, which can only be run by the root user. Changes made by this tool remain in effect until the next reboot of the system. To determine whether the vmtune command is installed and available, run the following command:
# lslpp -lI bos.adt.samples
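If the fileset is not listed as installed, it can be added with the installp command; for example, assuming the installation media is mounted as /dev/cd0:

# installp -agXd /dev/cd0 bos.adt.samples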
Executing the vmtune command on AIX 4.3.3 with no options results in the following output:
# /usr/samples/kernel/vmtune
vmtune:  current values:
  -p       -P        -r          -R         -f       -F       -N         -W
minperm  maxperm  minpgahead  maxpgahead  minfree  maxfree  pd_npages  maxrandwrt
  52190   208760      2           8         120      128     524288        0

  -M      -w      -k      -c        -b         -B          -u         -l      -d
maxpin  npswarn npskill numclust numfsbufs hd_pbuf_cnt lvm_bufcnt  lrubucket  defps
209581   4096    1024      1        93        96           9        131072      1

        -s              -n         -S          -h
sync_release_ilock  nokillroot  v_pinshm  strict_maxperm
        0               0           0           0

number of valid memory pages = 261976    maxperm=79.7% of real memory
maximum pinable=80.0% of real memory     minperm=19.9% of real memory
number of file memory pages = 19772      numperm=7.5% of real memory
The output shows the current settings for all the parameters.
Executing the vmtune command on AIX 5.2 results in the following output:
# /usr/samples/kernel/vmtune
vmtune:  current values:
  -M         -p         -P          -t         -f        -F        -l
maxpin%   minperm%   maxperm%  maxclient%   minfree   maxfree  lrubucket
  80.1      20.0       80.0       80.0        120       128     131072

   -r          -R          -N          -W        -w       -k       -c
minpgahead  maxpgahead  pd_npages  maxrandwrt  npswarn  npskill  numclust
    2           8         65536        0        4096     1024       1

   -b          -B           -u       -d           -s             -S
numfsbufs  hd_pbuf_cnt  lvm_bufcnt  defps  sync_release_ilock  v_pinshm
   186        448           9        1            0               0

     -L           -g            -h           -n                -j
lgpg_regions  lgpg_size  strict_maxperm  nokilluid  j2_nPagesPerWriteBehindCluster
     0            0             0            0                  8

        -J                 -z                  -Z
j2_maxRandomWrite  j2_nRandomCluster  j2_nBufferPerPagerDevice
        0                  0                  512

        -q                   -Q                 -V                -i
j2_minPageReadAhead  j2_maxPageReadAhead  num_spec_dataseg  spec_dataseg_int
        2                    8                   0                512

      -y
mem_affinity_on
      0

maxpin pages = 838861          minperm pages = 199609
memory size pages = 1048576    maxperm pages = 798436
remote pageouts scheduled 0    maxclient pages = 798436
file pages 7690                numperm = 0.7% of real memory
compressed pages = 0           numcompress = 0.0% of real memory
client pages = 0               numclient = 0.0% of real memory
The purpose of the free list is to keep track of real-memory page frames released by terminating processes and to supply page frames to requestors immediately, without forcing them to wait for page steals and the accompanying I/O to complete. The minfree limit specifies the free-list size below which page stealing to replenish the free list is to be started. The maxfree parameter is the size above which stealing will end.
The objectives in tuning these limits are to ensure that any activity with critical response-time objectives can always get the page frames it needs from the free list, and that the system does not experience unnecessarily high levels of I/O because of premature stealing of pages to expand the free list.
The default values of minfree and maxfree depend on the memory size of the machine. The default value of maxfree is determined by this formula:
maxfree = minimum (# of memory pages/128, 128)
By default, the minfree value is maxfree minus 8. However, the difference between maxfree and minfree should always be equal to or greater than maxpgahead; in other words, the value of maxfree should always be greater than or equal to minfree plus maxpgahead. The minfree and maxfree values will differ if there is more than one memory pool. Memory pools were introduced in AIX 4.3.3 for MP systems with large amounts of RAM. Each memory pool has its own minfree and maxfree, determined by the previous formulas, but the minfree and maxfree values shown by the vmtune command are the sums of the minfree and maxfree values for all memory pools.
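For example, on the 32 MB system used in the vmstat example below, there are 8192 page frames of 4 KB each, so the defaults work out as follows:

maxfree = minimum (8192/128, 128) = 64
minfree = 64 - 8 = 56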
Remember that minfree pages are, in some sense, wasted, because they are available but not in use. If you have a short list of the programs you want to run fast, you can investigate their memory requirements with the svmon command (see Determining How Much Memory Is Being Used) and set minfree to the size of the largest. This technique risks being too conservative, because not all of the pages that a process uses are acquired in one burst. At the same time, it might overlook dynamic demands from programs not on your list that lower the average size of the free list when your critical programs run.
A less precise but more comprehensive tool for investigating an appropriate size for minfree is the vmstat command. The following is a portion of vmstat command output obtained while running a C compilation on an otherwise idle system.
# vmstat 1
kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  0  3085   118   0   0   0   0    0   0 115    2  19  0  0 99  0
 0  0  3086   117   0   0   0   0    0   0 119  134  24  1  3 96  0
 2  0  3141    55   2   0   6  24   98   0 175  223  60  3  9 54 34
 0  1  3254    57   0   0   6 176  814   0 205  219 110 22 14  0 64
 0  1  3342    59   0   0  42 104  249   0 163  314  57 43 16  0 42
 1  0  3411    78   0   0  49 104  169   0 176  306  51 30 15  0 55
 1  0  3528   160   1   0  10 216  487   0 143  387  54 50 22  0 27
 1  0  3627    94   0   0   0  72  160   0 148  292  79 57  9  0 34
 1  0  3444   327   0   0   0  64  102   0 132  150  41 82  8  0 11
 1  0  3505   251   0   0   0   0    0   0 128  189  50 79 11  0 11
 1  0  3550   206   0   0   0   0    0   0 124  150  22 94  6  0  0
 1  0  3576   180   0   0   0   0    0   0 121  145  30 96  4  0  0
 0  1  3654   100   0   0   0   0    0   0 124  145  28 91  8  0  1
 1  0  3586   208   0   0   0  40   68   0 123  139  24 91  9  0  0
Because the compiler has not been run recently, the code of the compiler itself must be read in. All told, the compiler acquires about 2 MB in about 6 seconds. On this 32 MB system, maxfree is 64 and minfree is 56. The compiler almost instantly drives the free list size below minfree, and several seconds of rapid page-stealing activity take place. Some of the steals require that dirty working segment pages be written to paging space, which shows up in the po column. If the steals cause the writing of dirty permanent segment pages, that I/O does not appear in the vmstat report (unless you have directed the vmstat command to report on the I/O activity of the physical volumes to which the permanent pages are being written).
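For example, the following invocation (assuming the permanent pages are being written to hdisk0) adds per-volume transfer statistics to the vmstat report:

# vmstat hdisk0 1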
This example describes a fork() and exec() environment (not an environment where a process is long lived, such as in a database) and is not intended to suggest that you set minfree to 500 to accommodate large compiles. It suggests how to use the vmstat command to identify situations in which the free list has to be replenished while a program is waiting for space. In this case, about 2 seconds were added to the compiler execution time because there were not enough page frames immediately available. If you observe the page frame consumption of your program, either during initialization or during normal processing, you will soon have an idea of the number of page frames that need to be in the free list to keep the program from waiting for memory.
If we concluded from the example above that minfree needed to be 128, and we had set maxpgahead to 16 to improve sequential performance, we would use the following vmtune command:
# /usr/samples/kernel/vmtune -f 128 -F 144
In operating system versions later than AIX 4.3.3, the vmtune -m number_of_memory_pools command allows you to change the number of memory pools that are configured at system boot time. Because the new value takes effect only at the next boot, the -m flag is not a dynamic change. The change is written to the kernel file if it is an MP kernel (the change is not allowed on a UP kernel). A value of 0 restores the default number of memory pools.
By default, the vmtune -m command writes to the /usr/lib/boot/unix_mp file (/usr/lib/boot/unix_64 if running on a 64-bit kernel), but this can be changed with the command vmtune -U path_to_unix_file. Before changing the kernel file, the vmtune command saves the original file as name_of_original_file.sav.
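For example, the following commands (using 2 as an illustrative pool count) would configure two memory pools at the next boot, and then restore the default:

# /usr/samples/kernel/vmtune -m 2
# /usr/samples/kernel/vmtune -m 0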
Tuning lrubucket can reduce scanning overhead on large memory systems. In AIX 4.3, a new parameter lrubucket was added. The page-replacement algorithm scans memory frames looking for a free frame. During this scan, reference bits of pages are reset, and if a free frame has not been found, a second scan is done. In the second scan, if the reference bit is still off, the frame will be used for a new page (page replacement).
On large memory systems, there may be too many frames to scan, so now memory is divided up into buckets of frames. The page-replacement algorithm will scan the frames in the bucket and then start over on that bucket for the second scan before moving on to the next bucket. The default number of frames in this bucket is 131072 or 512 MB of RAM. The number of frames is tunable with the command vmtune -l, and the value is in 4 K frames.
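For example, to enlarge each bucket from the default 512 MB to 1 GB of RAM (262144 frames of 4 KB each):

# /usr/samples/kernel/vmtune -l 262144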
The operating system takes advantage of the varying requirements for real memory by leaving in memory pages of files that have been read or written. If the file pages are requested again before their page frames are reassigned, this technique saves an I/O operation. These file pages may be from local or remote (for example, NFS) file systems.
The ratio of page frames used for files versus those used for computational (working or program text) segments is loosely controlled by the minperm and maxperm values. If the percentage of real memory occupied by file pages rises above maxperm, page replacement steals only file pages. If that percentage falls below minperm, page replacement steals both file and computational pages. If the percentage is between minperm and maxperm, page replacement steals only file pages unless the number of file repages is higher than the number of computational repages.
In a particular workload, it might be worthwhile to emphasize the avoidance of file I/O. In another workload, keeping computational segment pages in memory might be more important. To understand what the ratio is in the untuned state, we use the vmtune command with no arguments.
# /usr/samples/kernel/vmtune
vmtune:  current values:
  -p       -P        -r          -R         -f       -F       -N         -W
minperm  maxperm  minpgahead  maxpgahead  minfree  maxfree  pd_npages  maxrandwrt
 209508   838032      2           8         120      128     524288        0

  -M      -w      -k      -c        -b         -B          -u         -l      -d
maxpin  npswarn npskill numclust numfsbufs hd_pbuf_cnt lvm_bufcnt  lrubucket  defps
838852   4096    1024      1       186        160          9        131072      1

        -s             -n         -S           -L          -g            -h
sync_release_ilock  nokilluid  v_pinshm  lgpg_regions  lgpg_size  strict_maxperm
        0               0          0           0            0             0

   -t
maxclient
 838032

number of valid memory pages = 1048565      maxperm=79.9% of real memory
maximum pinable = 80.0% of real memory      minperm=20.0% of real memory
number of file memory pages = 6668          numperm=0.6% of real memory
number of compressed memory pages = 0       compressed=0.0% of real memory
number of client memory pages = 0           numclient=0.0% of real memory
# of remote pgs sched-pageout = 0           maxclient=79.9% of real memory
The default values are calculated by the following algorithm:
minperm (in pages) = ((number of memory frames) - 1024) * .2
maxperm (in pages) = ((number of memory frames) - 1024) * .8
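For example, with the 1048565 valid memory pages reported in the preceding output, these formulas reproduce the values shown there:

minperm = (1048565 - 1024) * .2 = 209508 pages (20.0% of real memory)
maxperm = (1048565 - 1024) * .8 = 838032 pages (79.9% of real memory)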
The numperm value gives the share of real memory occupied by file pages. In the output above, there are 6668 file memory pages, which is 0.6 percent of real memory.
If we know that our workload makes little use of recently read or written files, we may want to constrain the amount of memory used for that purpose. The following command:
# /usr/samples/kernel/vmtune -p 15 -P 50
sets minperm to 15 percent and maxperm to 50 percent of real memory. This ensures that the VMM steals page frames only from file pages once the ratio of file pages to total memory pages exceeds 50 percent. This should reduce paging to paging space with no detrimental effect on persistent storage. The maxperm value is not a strict limit; it is only considered when the VMM needs to perform page replacement. Because of this, it is usually safe to reduce the maxperm value on most systems.
On the other hand, if our application frequently references a small set of existing files (especially if those files are in an NFS-mounted file system), we might want to allow more space for local caching of the file pages by using the following command:
# /usr/samples/kernel/vmtune -p 30 -P 90
NFS servers with large amounts of RAM that are used mostly for reads can benefit from an increased maxperm value. This allows more pages to reside in RAM so that NFS clients can access them without forcing the NFS server to retrieve the pages from disk again.
Another example would be a program that reads 1.5 GB of sequential file data into the working storage of a system with 2 GB of real memory. You may want to set maxperm to 50 percent or less, because you do not need to keep the file data in memory.
Starting with AIX 4.3.3, a new vmtune option (-h), called strict_maxperm, has been added. When set to 1, this option places a hard limit on how much memory is used for the persistent file cache by making the maxperm value the upper limit for this cache. When the upper limit is reached, least recently used (LRU) page replacement is performed on persistent pages.
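For example, the following command (using 50 percent as an illustrative value) caps the persistent file cache at 50 percent of real memory and makes that value a hard limit:

# /usr/samples/kernel/vmtune -P 50 -h 1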
The enhanced JFS file system uses client pages for its buffer cache, which are not affected by the maxperm and minperm threshold values. To establish hard limits on enhanced JFS file system cache, you can tune the maxclient parameter. This parameter represents the maximum number of client pages that can be used for buffer cache. To change this value, you can use the vmtune command with the -t option. The value for maxclient will be shown as a percentage of real memory.
After the maxclient threshold is reached, LRU will begin to steal client pages that have not been referenced recently. If not enough client pages can be stolen, the LRU might replace other types of pages. By reducing the value for maxclient, you help prevent Enhanced JFS file-page accesses from causing LRU to replace working storage pages, minimizing paging from paging space. The maxclient parameter also affects NFS clients and compressed pages. Also note that maxclient should generally be set to a value that is less than or equal to maxperm, particularly in the case where strict_maxperm is enabled.
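For example, to limit the enhanced JFS buffer cache to 50 percent of real memory (an illustrative value, kept below the maxperm percentage):

# /usr/samples/kernel/vmtune -t 50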