AIX Versions 3.2 and 4 Performance Tuning Guide

Identifying the Performance-Limiting Resource

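This topic includes the following major sections: "Starting with an Overview of System Performance", "Determining the Limiting Factor for a Single Program", and "Disk or Memory?".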

Starting with an Overview of System Performance

Perhaps the best tool for an overall look at resource utilization while running a multiuser workload is the vmstat command. The vmstat command reports CPU and disk-I/O activity as well as memory utilization data. The command

$ vmstat 5

causes the vmstat command to begin writing a one-line summary report of system activity every 5 seconds. Since no count was specified following the interval, reporting continues until the command is cancelled.

The following vmstat report was made on a system running AIXwindows and several synthetic applications (some low-activity intervals have been removed):

procs    memory             page              faults        cpu     
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa 
 0  0  8793    81   0   0   0   1    7   0 125   42  30  1  2 95  2
 0  0  8793    80   0   0   0   0    0   0 155  113  79 14  8 78  0
 0  0  8793    57   0   3   0   0    0   0 178   28  69  1 12 81  6
 0  0  9192    66   0   0  16  81  167   0 151   32  34  1  6 77 16
 0  0  9193    65   0   0   0   0    0   0 117   29  26  1  3 96  0
 0  0  9193    65   0   0   0   0    0   0 120   30  31  1  3 95  0
 0  0  9693    69   0   0  53 100  216   0 168   27  57  1  4 63 33
 0  0  9693    69   0   0   0   0    0   0 134   96  60 12  4 84  0
 0  0 10193    57   0   0   0   0    0   0 124   29  32  1  3 94  2
 0  0 11194    64   0   0  38 201 1080   0 168   29  57  2  8 62 29
 0  0 11194    63   0   0   0   0    0   0 141  111  65 12  7 81  0
 0  0  5480   755   3   1   0   0    0   0 154  107  71 13  8 78  2
 0  0  5467  5747   0   3   0   0    0   0 167   39  68  1 16 79  5
 0  1  4797  5821   0  21   0   0    0   0 191  192 125 20  5 42 33
 0  1  3778  6119   0  24   0   0    0   0 188  170  98  5  8 41 46
 0  0  3751  6139   0   0   0   0    0   0 145   24  54  1 10 89  0

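The columns of interest for this initial assessment are pi and po in the page category (pages paged in from, and out to, paging space) and the four columns in the cpu category (us, sy, id, and wa: the percentages of time spent in user mode, in system mode, idle, and idle while waiting for disk I/O).

When we say that a workload is "approaching its limits" on a resource, we mean that some parts of the workload are already experiencing a slowdown due to that critical resource. The longer response times may not be subjectively significant yet, but an increase in that element of the workload will cause a rapid deterioration of performance.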
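As a rough aid for scanning a long vmstat report, the sketch below marks the intervals that deserve a closer look. It assumes the 17-column layout of the report above; the function name flag_vmstat and both default thresholds (80 for us + sy, 10 for wa) are illustrative assumptions rather than values taken from this guide, and should be adjusted to the system being studied.

```shell
# Hypothetical helper: annotate vmstat interval lines laid out like the
# report above (r b avm fre re pi po fr sr cy in sy cs us sy id wa).
# Both thresholds are illustrative assumptions, passed as optional arguments.
flag_vmstat() {
    awk -v cpu_max="${1:-80}" -v wait_max="${2:-10}" '
        # Header lines are skipped: data lines have 17 numeric-leading fields.
        NF == 17 && $1 ~ /^[0-9]+$/ {
            note = ""
            if ($6 + $7 > 0)         note = note " paging"   # pi or po nonzero
            if ($14 + $15 > cpu_max) note = note " cpu"      # us + sy
            if ($17 > wait_max)      note = note " iowait"   # wa
            if (note != "") print $0 " <-" note
        }'
}
```

It could be used as a filter, for example: vmstat 5 12 | flag_vmstat. Intervals that trip none of the checks are not printed.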

If vmstat indicates a significant amount of I/O wait time, the iostat command will give more detailed information. The command

$ iostat 5 3

causes iostat to begin writing summary reports of I/O activity and CPU utilization every 5 seconds. Since a count of 3 was specified following the interval, reporting will stop after the third report.

The following iostat report was made on a system running the same workload as the vmstat reports above, but at a different time. The first report is for the cumulative activity since the preceding boot, while subsequent reports are for activity during the preceding 5-second interval:

tty:    tin      tout     cpu:   % user    % sys    % idle   %iowait
        0.0       4.3            0.2       0.6       98.8      0.4
     
Disks:        % tm_act     Kbps      tps    msps   Kb_read   Kb_wrtn
hdisk0           0.0       0.2       0.0              7993      4408
hdisk1           0.0       0.0       0.0              2179      1692
hdisk2           0.4       1.5       0.3             67548     59151
cd0              0.0       0.0       0.0                 0         0
     
tty:    tin      tout     cpu:   % user    % sys     % idle  %iowait
        0.0      30.3               8.8      7.2       83.9     0.2 
     
Disks:        % tm_act     Kbps      tps    msps   Kb_read   Kb_wrtn
hdisk0           0.2       0.8       0.2                 4         0
hdisk1           0.0       0.0       0.0                 0         0
hdisk2           0.0       0.0       0.0                 0         0
cd0              0.0       0.0       0.0                 0         0
     
tty:   tin      tout      cpu:   % user    % sys     % idle  %iowait
       0.0       8.4               0.2      5.8        0.0      93.8 
     
Disks:        % tm_act     Kbps      tps    msps   Kb_read   Kb_wrtn
hdisk0           0.0       0.0       0.0                 0         0
hdisk1           0.0       0.0       0.0                 0         0
hdisk2          98.4     575.6      61.9               396      2488
cd0              0.0       0.0       0.0                 0         0

The first report, which displays cumulative activity since the last boot, shows that the I/O on this system is unbalanced. Most of the I/O (86.9% of kilobytes read and 90.7% of kilobytes written) is to hdisk2, which contains both the operating system and the paging space. The cumulative CPU utilization figure is usually meaningless unless the system is used consistently 24 hours a day.
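The percentages quoted for hdisk2 can be reproduced from the Kb_read and Kb_wrtn columns. The following sketch is one way to do the arithmetic; the function name disk_share is hypothetical, and the code assumes that it is given the Disks: section of a single report and that Kb_read and Kb_wrtn are the last two columns of each disk line, as they are above.

```shell
# Hypothetical helper: per-disk share of cumulative Kb_read and Kb_wrtn.
# Reads the "Disks:" section of ONE iostat report on standard input.
disk_share() {
    awk '
        /^Disks:/ { next }                       # skip the column header
        NF >= 5 && $NF ~ /^[0-9]+$/ {            # disk lines end in integers
            name[++n] = $1
            rd[n] = $(NF - 1); wr[n] = $NF
            total_rd += rd[n]; total_wr += wr[n]
        }
        END {
            for (i = 1; i <= n; i++) {
                pr = (total_rd > 0) ? 100 * rd[i] / total_rd : 0
                pw = (total_wr > 0) ? 100 * wr[i] / total_wr : 0
                printf "%s %.1f %.1f\n", name[i], pr, pw
            }
        }'
}
```

Fed the four disk lines of the first report, it prints one line per disk with the read share followed by the write share; the hdisk2 line reads "hdisk2 86.9 90.7".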

The second report shows a small amount of disk activity reading from hdisk0, which contains a separate file system for the system's primary user. The CPU activity arises from two application programs and iostat itself. Although iostat's output is redirected to a file, the output is not voluminous, and the system is not sufficiently memory-constrained to force any output during this interval.

In the third report, we have artificially created a near-thrashing condition by running a program that allocates, and stores into, a large amount of memory (about 26MB in this example). hdisk2 is active 98.4% of the time, which results in 93.8% I/O wait. The fact that a single program that uses more than three-fourths of the system's memory (32MB) can cause the system to thrash reminds us of the limits of VMM memory load control. Even with a more homogeneous workload, we need to understand the memory requirements of the components.

If vmstat indicates that there is a significant amount of CPU idle time when the system seems subjectively to be running slowly, you may be experiencing delays due to kernel lock contention. In AIX Version 4, this possibility can be investigated with the lockstat command if the Performance Toolbox is installed on your system.

Determining the Limiting Factor for a Single Program

If you are the sole user of a system, you can get a general idea of whether a program is I/O or CPU dependent by using the time command as follows:

$ time cp foo.in foo.out
real    0m0.13s
user    0m0.01s
sys     0m0.02s

Note: Examples of the time command here and elsewhere in this guide use the version that is built into the Korn shell. The official time command (/usr/bin/time) reports with a lower precision and has other disadvantages.

In this example, the fact that the real, elapsed time for the execution of the cp command (0.13 seconds) is significantly greater than the sum (0.03 seconds) of the user and system CPU times indicates that the program is I/O bound. This occurs primarily because foo.in has not been read recently. Running the same command a few seconds later against the same file gives:

real    0m0.06s
user    0m0.01s
sys     0m0.03s

Most or all of the pages of foo.in are still in memory because there has been no intervening process to cause them to be reclaimed and because the file is small compared with the amount of RAM on the system. A small foo.out would also be buffered in memory, and a program using it as input would show little disk dependency.
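The comparison of elapsed time with CPU time can be reduced to a single figure. The sketch below, with the hypothetical names to_seconds and cpu_share, converts values in the Korn shell's time format and prints user + sys as a percentage of real. It assumes a single-processor system, where that percentage cannot meaningfully exceed 100.

```shell
# Hypothetical helpers for interpreting Korn shell "time" output.

# Convert a value such as 0m0.13s (minutes, then seconds) to seconds.
to_seconds() {
    echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); print $1 * 60 + $2 }'
}

# Print CPU time (user + sys) as a whole percentage of elapsed (real) time.
# Arguments: real user sys, each in the 0m0.00s format.
cpu_share() {
    awk -v r="$(to_seconds "$1")" \
        -v u="$(to_seconds "$2")" \
        -v s="$(to_seconds "$3")" \
        'BEGIN { if (r > 0) printf "%.0f\n", 100 * (u + s) / r; else print "n/a" }'
}
```

For the two cp runs above, cpu_share 0m0.13s 0m0.01s 0m0.02s prints 23 and cpu_share 0m0.06s 0m0.01s 0m0.03s prints 67. With times this short the resolution of the timer is a large part of the result, so the figures are only a coarse indication.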

If you are trying to determine the disk dependency of a program, you have to be sure that its input is in an authentic state. That is, if the program will normally be run against a file that has not been accessed recently, you must make sure that the file used in measuring the program is not in memory. If, on the other hand, a program is usually run as part of a standard sequence in which it gets its input from the output of the preceding program, you should prime memory to ensure that the measurement is authentic. For example,

$ cp foo.in /dev/null

would have the effect of priming memory with the pages of foo.in.

The situation is more complex if the file is large compared to RAM. If the output of one program is the input of the next and the entire file won't fit in RAM, the second program will end up reading pages at the head of the file, which displace pages at the end. Although this situation is very hard to simulate authentically, it is nearly equivalent to one in which no disk caching takes place.

The case of a file that is (perhaps just slightly) larger than RAM is a special case of the RAM versus disk analysis discussed in the next section.

Disk or Memory?

Just as a large fraction of real memory is available for buffering files, the system's paging space is available as temporary storage for program working data that has been forced out of RAM. Suppose that you have a program that reads little or no data and yet shows the symptoms of being I/O dependent. Worse, the ratio of real time to user + system time does not improve with successive runs. The program is probably memory-limited, and its I/O is to, and possibly from, the paging space. A way to check on this possibility is shown in the following vmstatit shell script. The vmstatit script summarizes the voluminous vmstat -s report, which gives cumulative counts for a number of system activities since the system was started:

vmstat -s >temp.file   # cumulative counts before the command
time $1                # command under test
vmstat -s >>temp.file  # cumulative counts after execution
grep "pagi.*ins" temp.file >>results   # extract only the data
grep "pagi.*outs" temp.file >>results  # of interest

If the shell script is run as follows:

$ vmstatit "cp file1 file2"  2>results

the result in results is:

real    0m0.03s
user    0m0.01s
sys     0m0.02s
     2323 paging space page ins
     2323 paging space page ins
     4850 paging space page outs
     4850 paging space page outs

The fact that the before-and-after paging statistics are identical confirms our belief that the cp command is not paging bound. An extended variant of the vmstatit shell script can be used to show the true situation:

vmstat -s >temp.file
time $1             
vmstat -s >>temp.file
echo "Ordinary Input:"               >>results
grep "^[ 0-9]*page ins"    temp.file >>results
echo "Ordinary Output:"              >>results
grep "^[ 0-9]*page outs"   temp.file >>results
echo "True Paging Output:"           >>results
grep "pagi.*outs"          temp.file >>results
echo "True Paging Input:"            >>results
grep "pagi.*ins"           temp.file >>results

Because all ordinary I/O in the AIX operating system is processed via the VMM, the vmstat -s command reports ordinary program I/O as page ins and page outs. When the above version of the vmstatit shell script was run against the cp command of a large file that had not been read recently, the result was:

real    0m2.09s
user    0m0.03s
sys     0m0.74s
Ordinary Input:
    46416 page ins
    47132 page ins
Ordinary Output:
   146483 page outs
   147012 page outs
True Paging Output:
     4854 paging space page outs
     4854 paging space page outs
True Paging Input:
     2527 paging space page ins
     2527 paging space page ins

The time command output confirms the existence of an I/O dependency. The increase in page ins shows the I/O necessary to satisfy the cp command. The increase in page outs indicates that the file is large enough to force the writing of dirty pages (not necessarily its own) from memory. The fact that there is no change in the cumulative paging-space-I/O counts confirms that the cp command does not build data structures large enough to overload the memory of the test machine.
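Because the script reports cumulative counters, what matters is the difference within each before-and-after pair. The following sketch computes those differences from the results file. The function name results_deltas is hypothetical, and the code assumes the layout shown above: each label line ends with a colon and is followed by exactly two counter lines, before and then after.

```shell
# Hypothetical helper: turn the labeled before/after pairs written by the
# extended vmstatit script into per-counter differences.
# Argument: the results file (defaults to "results").
results_deltas() {
    awk '
        /:$/ { label = $0; seen = 0; next }      # e.g. "Ordinary Input:"
        label != "" && $1 ~ /^[0-9]+$/ {
            if (++seen == 1) before = $1         # first line: count before
            else if (seen == 2) print label, $1 - before
        }' "${1:-results}"
}
```

Applied to the results above, it reports 716 for Ordinary Input, 529 for Ordinary Output, and 0 for both paging-space counters. The time lines at the top of the file are ignored because they precede the first label.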

The order in which this version of the vmstatit script reports I/O is intentional. Typical programs read file input and then write file output. Paging activity, on the other hand, typically begins with the writing out of a working-segment page that does not fit. The page is read back in only if the program tries to access it. The fact that the test system has experienced almost twice as many paging space page outs as paging space page ins since it was booted indicates that at least some of the programs that have been run on this system have stored data in memory that was not accessed again before the end of the program. "Memory-Limited Programs" provides more information. See also "Monitoring and Tuning Memory Use".

To show the effects of memory limitation on these statistics, the following example observes a given command in an environment of adequate memory (32MB) and then artificially shrinks the system using the rmss command (see "Assessing Memory Requirements via the rmss Command"). The command sequence

$ cc -c ed.c
$ vmstatit "cc -c ed.c" 2>results

first primes memory with the 7944-line source file and the executable file of the C compiler, then measures the I/O activity of the second execution:

real    0m7.76s
user    0m7.44s
sys     0m0.15s
Ordinary Input:
    57192 page ins
    57192 page ins
Ordinary Output:
   165516 page outs
   165553 page outs
True Paging Output:
    10846 paging space page outs
    10846 paging space page outs
True Paging Input:
     6409 paging space page ins
     6409 paging space page ins

Clearly, this compilation is not I/O limited: the elapsed time (7.76 seconds) is almost entirely CPU time (7.59 seconds). There is not even any I/O necessary to read the source code. If we then issue the command:

# rmss -c 8

to change the effective size of the machine to 8MB, and perform the same sequence of commands, we get:

real    0m9.87s
user    0m7.70s
sys     0m0.18s
Ordinary Input:
    57625 page ins
    57809 page ins
Ordinary Output:
   165811 page outs
   165882 page outs
True Paging Output:
    11010 paging space page outs
    11061 paging space page outs
True Paging Input:
     6623 paging space page ins
     6701 paging space page ins

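The symptoms of I/O dependency are present: the elapsed time (9.87 seconds) is noticeably longer than the total CPU time (7.88 seconds), and there is a significant amount of ordinary I/O (184 page ins) even though this is a repeat execution of the command.

The fact that the elapsed time is longer than in the memory-unconstrained situation, and the existence of significant amounts of paging-space I/O (51 page outs and 78 page ins), make it clear that the compiler is being hampered by insufficient memory.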
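One way to put a number on the effect is to look at the part of the elapsed time that is not accounted for by CPU time in each run. The helper below is a minimal sketch with a hypothetical name, non_cpu_seconds; it takes the real, user, and sys times in seconds.

```shell
# Hypothetical helper: seconds of elapsed time not accounted for by CPU time.
# Arguments: real user sys, each already expressed in seconds.
non_cpu_seconds() {
    awk -v r="$1" -v u="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", r - (u + s) }'
}
```

For the 32MB run, non_cpu_seconds 7.76 7.44 0.15 prints 0.17; for the 8MB run, non_cpu_seconds 9.87 7.70 0.18 prints 1.99. On this otherwise idle test system, nearly all of the additional 2.11 seconds of elapsed time is therefore time spent waiting rather than computing.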

Note: This example illustrates the effects of memory constraint. No effort was made to minimize the use of memory by other processes, so the absolute size at which the compiler was forced to page in this environment does not constitute a meaningful measurement.

To avoid working with an artificially shrunken machine until the next restart, run

# rmss -r

to release back to the operating system the memory that the rmss command had sequestered, thus restoring the system to its normal capacity.
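Because it is easy to forget the restoring step, the two rmss invocations can be wrapped around the command being measured. The following is only a sketch under stated assumptions: the function name with_memory_limit and the RMSS override variable are hypothetical, only the rmss -c and rmss -r forms shown above are used, and the wrapper must be run by a user with the authority to run rmss. It does not protect against the measurement being interrupted, in which case rmss -r still has to be entered by hand.

```shell
# Hypothetical wrapper: run one command with the simulated memory size set
# to $1 megabytes, then restore the real size.
# RMSS may be overridden (an assumption made here to allow dry runs).
RMSS=${RMSS:-rmss}

with_memory_limit() {
    mb=$1; shift
    "$RMSS" -c "$mb" || return 1      # shrink; give up if that fails
    "$@"                              # run the command under test
    status=$?
    "$RMSS" -r                        # restore normal capacity afterward
    return "$status"                  # report the command's own exit status
}
```

For example, with_memory_limit 8 vmstatit "cc -c ed.c" would repeat the constrained measurement above, assuming vmstatit is in the search path.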

