Performance Management Guide

Tuning VMM Memory Load Control with the schedtune Command

The VMM memory load control facility, described in VMM Memory Load Control Facility, protects an overloaded system from thrashing.

For early versions of the operating system, if a large number of processes hit the system at the same time, memory became overcommitted and thrashing occurred, causing performance to degrade rapidly. A memory-load control mechanism was developed that could detect thrashing. Certain parameters affect the function of the load control mechanism.

With the schedtune command, the root user can change the criteria used to determine thrashing, the criteria used to determine which processes to suspend, the length of time to wait after thrashing ends before reactivating processes, and the minimum number of processes exempt from suspension, or can reset these values to their defaults. To determine whether the schedtune command is installed and available, run the following command:

# lslpp -lI bos.adt.samples

Memory Load Control Tuning

Memory load control is intended to smooth out infrequent peaks in load that might otherwise cause the system to thrash. It trades multiprogramming for throughput and is not intended to act continuously in a configuration that has too little RAM to handle its normal workload. It was designed for batch jobs and is not very discriminating about which processes it suspends. The AIX Workload Manager provides a better way to protect critical tasks.

The correct solution to a fundamental, persistent RAM shortage is to add RAM, not to experiment with memory load control in an attempt to trade off response time for memory. The situations in which the memory-load-control facility may really need to be tuned are those in which there is more RAM, not less, than the defaults were chosen for; for example, configurations in which the defaults are too conservative.

You should not change the memory load control parameter settings unless your workload is consistent and you believe the default parameters are ill-suited to your workload.

The default parameter settings shipped with the system are always in force unless changed. The default values of these parameters have been chosen to "fail safe" across a wide range of workloads. Changed parameters last only until the next system boot. All memory load control tuning activities must be done by the root user. The system administrator can use the schedtune command to change the parameters to tune the algorithm to a particular workload or to disable it entirely. The source and object code of the schedtune command are in /usr/samples/kernel.

Note: The schedtune command is in the samples directory because it is very VMM-implementation dependent. The schedtune code that accompanies each release of the operating system is tailored specifically to the VMM in that release. Running the schedtune command from one release on a different release might result in an operating-system failure. The functions of the schedtune command may also change from release to release. Do not propagate shell scripts or /etc/inittab entries that include the schedtune command to a new release without checking the schedtune documentation for the new release to make sure that the scripts will still have the desired effect.

The schedtune -? command provides a terse description of the flags and options. A schedtune invocation with no flags displays the current parameter settings, as follows:

# /usr/samples/kernel/schedtune

THRASH                        SUSP                FORK      SCHED
-h        -p        -m        -w        -e        -f        -d        -r        -t        -s
SYS       PROC      MULTI     WAIT      GRACE     TICKS     SCHED_D   SCHED_R   TIMESLICE MAXSPIN
0         4         2         1         2         10        16        16        1         16384

CLOCK           SCHED_FIFO2     IDLE MIGRATION  FIXED_PRI
-c              -a              -b              -F
%usDELTA        AFFINITY_LIM    BARRIER/16      GLOBAL(1)
100             7               4               0

The first five parameters (-h, -p, -m, -w, and -e) set the rates and thresholds for the memory load control algorithm. The SYS (-h) value determines whether RAM is overcommitted. Only if the algorithm finds that RAM is overcommitted are the PROC (-p), MULTI (-m), WAIT (-w), and GRACE (-e) values used; otherwise, they are ignored. If memory load control is disabled, these latter values are not used at all.

After a tuning experiment, memory load control can be reset to its default characteristics by executing the command schedtune -D.

When you have determined the number of processes that ought to be able to run in your system during periods of peak activity, you can add a schedtune command at the end of the /etc/inittab file, which ensures that it will be run each time the system is booted, overriding the defaults that would otherwise take effect with a reboot. For example, an appropriate /etc/inittab line for raising the minimum level of multiprogramming to 4 on an AIX Version 4 system would be as follows:

schedtune:2:wait:/usr/samples/kernel/schedtune -m 4

Remember, do not propagate this line to a new release of the operating system without checking the operating system documentation.

While it is possible to vary other parameters that control the suspension rate of processes and the criteria by which individual processes are selected for suspension, it is impossible to predict with any confidence the effect of such changes on a particular configuration and workload. Great caution should be exercised if memory load control parameter adjustments other than those discussed here are considered.

The h Parameter

The h parameter controls the threshold defining memory overcommitment. Memory load control attempts to suspend processes when this threshold is exceeded during any one-second period. The threshold is a relationship between two direct measures: the number of pages written to paging space in the last second (po) and the number of page steals occurring in the last second (fr). You can see both these values in the vmstat output. The number of page writes is usually much less than the number of page steals. Memory is overcommitted when the following is true:

po/fr > 1/h or po*h > fr

The command schedtune -h 0 effectively disables memory load control. If a system has at least 128 MB of memory, the default value is 0; otherwise, the default value is 6. With at least 128 MB of RAM, the normal VMM algorithms usually correct thrashing conditions more efficiently, on average, than memory load control does.
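The overcommitment test above can be sketched as a small Python function. This is only an illustration of the inequality, not the kernel's implementation; the function name and the sample counts are hypothetical. Using the multiplied form po*h > fr avoids dividing by fr when no pages were stolen, and it shows why h = 0 disables the check: the left side is then always 0, which never exceeds a non-negative fr.

```python
def memory_overcommitted(po: int, fr: int, h: int) -> bool:
    """Return True if one second of paging counts meets the overcommitment test.

    po: pages written to paging space in the last second (vmstat po column)
    fr: pages stolen in the last second (vmstat fr column)
    h:  the schedtune -h value; 0 disables memory load control
    Hypothetical helper illustrating po/fr > 1/h, written as po*h > fr.
    """
    return po * h > fr


# With h = 6 (the default on systems with less than 128 MB), 20 page-outs
# against 100 steals (po/fr = 0.2, above 1/6) counts as overcommitted.
print(memory_overcommitted(po=20, fr=100, h=6))   # True
# With h = 0 the product is always 0, so the test can never fire.
print(memory_overcommitted(po=20, fr=100, h=0))   # False
```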

In some specialized situations, it might be appropriate to disable memory load control from the outset. For example, if you are using a terminal emulator with a time-out feature to simulate a multiuser workload, memory load control intervention may delay some responses long enough for the time-out feature to kill the process. Similarly, if you are using the rmss command to investigate the effects of reduced memory sizes, disable memory load control so that it does not interfere with your measurements.

If disabling memory load control results in more, rather than fewer, thrashing situations (with correspondingly poorer responsiveness), then memory load control is playing an active and supportive role in your system. In that case, tuning the memory load control parameters may improve performance, or you may need to add RAM.

A lower value of h raises the thrashing detection threshold; that is, the system is allowed to come closer to thrashing before processes are suspended. Regardless of the system configuration, when the above po/fr fraction is low, thrashing is unlikely.

For example, to set the h threshold to 4, enter the following:

# /usr/samples/kernel/schedtune -h 4

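Compared with the default value of 6 used on systems with less than 128 MB of memory, this setting permits the system to come closer to thrashing before the algorithm starts suspending processes. On systems where the default is 0, this setting instead enables memory load control.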
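To see why lowering h raises the threshold, the following sketch (hypothetical helper names and a hypothetical one-second sample) computes the po/fr fraction that must be exceeded for h = 6 and h = 4 and applies the test to the same sample. With po/fr = 0.2, the sample exceeds the threshold of about 0.167 at h = 6, but stays below the threshold of 0.25 at h = 4.

```python
from fractions import Fraction


def overcommit_threshold(h: int) -> Fraction:
    """po/fr fraction that must be exceeded before memory counts as overcommitted."""
    if h <= 0:
        raise ValueError("h = 0 disables memory load control; there is no threshold")
    return Fraction(1, h)


def memory_overcommitted(po: int, fr: int, h: int) -> bool:
    # Same test as po/fr > 1/h, multiplied out to avoid dividing by fr.
    return po * h > fr


po, fr = 20, 100  # hypothetical one-second vmstat sample: po/fr = 0.2
for h in (6, 4):
    print(f"h={h}: threshold={float(overcommit_threshold(h)):.3f}, "
          f"overcommitted={memory_overcommitted(po, fr, h)}")
```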

The p Parameter

The p parameter determines whether a process is eligible for suspension and is used to set a threshold for the ratio of two measures that are maintained for every process: the number of repages (r) and the number of page faults that the process has accumulated in the last second (f). A high ratio of repages to page faults means the individual process is thrashing. A process is considered eligible for suspension (it is thrashing or contributing to overall thrashing) when the following is true:

r/f > 1/p or r*p > f

The default value of p is 4, meaning that a process is considered to be thrashing (and a candidate for suspension) when the fraction of repages to page faults over the last second is greater than 25 percent. A low value of p results in a higher degree of individual process thrashing being allowed before a process is eligible for suspension.

To prevent memory load control from suspending any process, enter the following:

# /usr/samples/kernel/schedtune -p 0

Note that fixed-priority processes and kernel processes are exempt from being suspended.
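The per-process test, together with the exemptions just noted, can be sketched as follows. The Proc record and its field names are hypothetical stand-ins for the per-process counters the VMM keeps; the sketch only illustrates the inequality r*p > f and the fact that p = 0 makes it impossible to satisfy.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    """Hypothetical snapshot of one process over the last second."""
    repages: int                  # r: repages in the last second
    faults: int                   # f: page faults in the last second
    fixed_priority: bool = False  # fixed-priority processes are exempt
    kernel_process: bool = False  # kernel processes are exempt


def eligible_for_suspension(proc: Proc, p: int) -> bool:
    """Apply r/f > 1/p (written r*p > f); exempt processes never qualify."""
    if proc.fixed_priority or proc.kernel_process:
        return False
    return proc.repages * p > proc.faults


# 30 repages out of 100 faults is 30 percent, above the 25 percent
# implied by the default p = 4.
busy = Proc(repages=30, faults=100)
print(eligible_for_suspension(busy, p=4))   # True
print(eligible_for_suspension(busy, p=0))   # False: p = 0 disables suspension
```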

The m Parameter

The m parameter determines a lower limit for the degree of multiprogramming, which is defined as the number of active processes. Active processes are those that are either runnable or waiting for page I/O. Processes that are waiting for events, suspended processes, and the wait process are not considered active.

Setting the minimum multiprogramming level, m, effectively keeps m processes from being suspended. Suppose a system administrator knows that at least ten processes must always be resident and active in RAM for successful performance, and suspects that memory load control was suspending processes too aggressively. If the command schedtune -m 10 were issued, the system would never suspend so many processes that fewer than ten were competing for memory. The m parameter does not count the following:

The kernel processes
Processes that have been pinned in RAM with the plock() system call
Fixed-priority processes with priority values less than 60
Processes awaiting events

The system default of m=2 ensures that the kernel, all pinned processes, and two user processes will always be in the set of processes competing for RAM.
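The following sketch shows how the m floor limits suspension. It is a simplification under stated assumptions: the Proc fields and helper names are hypothetical, the exclusion list above is applied literally, and the code only computes how many counted processes the floor allows to be suspended; it does not model how the VMM chooses among candidates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proc:
    """Hypothetical per-process flags relevant to the m parameter."""
    kernel_process: bool = False
    pinned: bool = False                   # pinned in RAM with plock()
    fixed_priority: Optional[int] = None   # None means not fixed-priority
    awaiting_event: bool = False


def counts_toward_m(proc: Proc) -> bool:
    """True unless the process is in the list of processes m does not count."""
    if proc.kernel_process or proc.pinned or proc.awaiting_event:
        return False
    if proc.fixed_priority is not None and proc.fixed_priority < 60:
        return False
    return True


def max_suspendable(procs: List[Proc], m: int) -> int:
    """Largest number of counted processes that can be suspended without
    leaving fewer than m of them competing for memory."""
    counted = sum(1 for proc in procs if counts_toward_m(proc))
    return max(0, counted - m)


# Twelve ordinary processes plus one kernel process and one pinned process.
procs = [Proc() for _ in range(12)] + [Proc(kernel_process=True), Proc(pinned=True)]
print(max_suspendable(procs, m=10))  # 2
print(max_suspendable(procs, m=2))   # 10
```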

While m=2 is appropriate for a desktop, single-user configuration, it is frequently too small for larger, multiuser, or server configurations with large amounts of RAM.

If the system you are installing has more than 32 MB but less than 128 MB of RAM, and is expected to support more than five active users at one time, consider raising the minimum level of multiprogramming of the VMM memory-load-control mechanism.

As an example, if your conservative estimate is that four of your most memory-intensive applications should be able to run simultaneously, leaving at least 16 MB for the operating system and 25 percent of real memory for file pages, you could increase the minimum multiprogramming level from the default of 2 to 4 with the following command:

# /usr/samples/kernel/schedtune -m 4

On these systems, setting m to 4 or 6 may result in the best performance. Lower values of m, while allowed, mean that at times as few as one user process may be active.
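The sizing estimate in the example above is simple arithmetic. The sketch below, with a hypothetical function name and hypothetical sizes, reserves 16 MB for the operating system and 25 percent of real memory for file pages, then counts how many copies of an application of a given size fit in what remains.

```python
def apps_that_fit(real_mb: int, app_mb: int, os_mb: int = 16,
                  file_page_share: float = 0.25) -> int:
    """Estimate how many instances of an app of app_mb fit in RAM after
    reserving os_mb for the operating system and a share for file pages."""
    if app_mb <= 0:
        raise ValueError("application size must be positive")
    available = real_mb - os_mb - real_mb * file_page_share
    return max(0, int(available // app_mb))


# A hypothetical 64 MB system: 64 - 16 - 16 = 32 MB for applications,
# enough for four 8 MB applications, which would support schedtune -m 4.
print(apps_that_fit(real_mb=64, app_mb=8))   # 4
```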

When the memory requirements of the thrashing application are known, the m value can be suitably chosen. Suppose thrashing is caused by numerous instances of one application of size M. Given the system memory size N, the m parameter should be set to a value close to N/M. Setting m too low would unnecessarily limit the number of processes that could be active at the same time.
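The N/M rule of thumb can be written out directly. Rounding down is a conservative choice made in this sketch, not something the guide specifies, and the function name and example sizes are hypothetical. The result is only a starting point for a schedtune -m experiment.

```python
def suggested_m(system_mb: int, app_mb: int) -> int:
    """Starting value for schedtune -m: N/M, rounded down (an assumption)."""
    if app_mb <= 0:
        raise ValueError("application size must be positive")
    return system_mb // app_mb


# Hypothetical sizes: a 256 MB system running many copies of a 48 MB application.
print(suggested_m(256, 48))  # 5
```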

The w Parameter

The w parameter controls the number of one-second intervals during which the po/fr fraction (explained in The h Parameter) must remain below 1/h before suspended processes are reactivated. The default value of one second is close to the minimum value allowed, which is zero. A value of one second aggressively attempts to reactivate processes as soon as a one-second safe period has occurred. Large values of w run the risk of unnecessarily poor response times for suspended processes while the processor is idle for lack of active processes to run.
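The reactivation rule can be sketched as a count of consecutive safe seconds. In this hypothetical helper, a second is treated as safe whenever it fails the overcommitment test po*h > fr (an assumption for the boundary case), any overcommitted second restarts the count, and only w >= 1 is modeled. The samples are hypothetical vmstat readings.

```python
from typing import Iterable, Optional, Tuple


def reactivation_second(samples: Iterable[Tuple[int, int]], h: int,
                        w: int) -> Optional[int]:
    """Return the index of the second at which suspended processes would be
    reactivated, or None if the run of safe seconds never reaches w.

    samples: per-second (po, fr) counts, starting after suspension began.
    """
    if w < 1:
        raise ValueError("this sketch models w >= 1 only")
    safe_run = 0
    for second, (po, fr) in enumerate(samples):
        if po * h > fr:
            safe_run = 0          # still overcommitted: restart the count
        else:
            safe_run += 1         # one more safe second
            if safe_run >= w:
                return second
    return None


# Overcommitted, safe, overcommitted, then two safe seconds (h = 6).
samples = [(30, 100), (5, 100), (25, 100), (5, 100), (4, 100)]
print(reactivation_second(samples, h=6, w=1))  # 1
print(reactivation_second(samples, h=6, w=2))  # 4
```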

For example, to require two safe seconds before suspended processes are reactivated, enter the following:

# /usr/samples/kernel/schedtune -w 2

The e Parameter

Each time a suspended process is reactivated, it is exempt from suspension for a period of e elapsed seconds. This ensures that the high cost (in disk I/O) of paging in the pages of a suspended process results in a reasonable opportunity for progress. The default value of e is 2 seconds.
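The exemption window can be expressed as a single comparison. This hypothetical helper assumes a half-open window, so a process becomes a suspension candidate again exactly e seconds after reactivation; the guide does not specify the boundary.

```python
def in_grace_period(now: float, reactivated_at: float, e: float) -> bool:
    """Return True while a process reactivated at reactivated_at is still
    exempt from suspension, that is, for e elapsed seconds."""
    return 0 <= now - reactivated_at < e


# With the default e = 2, a process reactivated at t = 10 s is protected
# at t = 11 s but becomes a suspension candidate again from t = 12 s.
print(in_grace_period(now=11, reactivated_at=10, e=2))  # True
print(in_grace_period(now=12, reactivated_at=10, e=2))  # False
```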

For example, to shorten the exemption period to 1 second, enter the following:

# /usr/samples/kernel/schedtune -e 1

Suppose thrashing is occasionally caused by an application that uses a lot of memory and runs for about T seconds. On a busy system, the default setting of e (2 seconds) would probably cause this application to be swapped in and out about T/2 times. In this case, increasing e would help the application make progress, and system performance would improve once the offending application has been pushed through quickly.
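The T/e estimate can be checked with a short sketch. It is a rough upper bound under the assumption that, on a busy system, the application becomes eligible for suspension again as soon as each grace period ends; the function name and the 60-second run time are hypothetical.

```python
import math


def worst_case_suspensions(run_seconds: float, e: float) -> int:
    """Rough upper bound on how many times a process running for run_seconds
    could be suspended if it becomes eligible again as soon as each
    e-second grace period ends."""
    if e <= 0:
        raise ValueError("e must be positive for this estimate")
    return math.floor(run_seconds / e)


# A hypothetical 60-second job: about 30 suspensions with the default e = 2,
# but only about 6 if e is raised to 10.
print(worst_case_suspensions(60, 2))   # 30
print(worst_case_suspensions(60, 10))  # 6
```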