09/16/96, 4FAX# 6220

Performance Tuning -- The vmstat Tool

SPECIAL NOTICES

Information in this document is correct to the best of our knowledge at the time of this writing. Please send feedback by fax to "AIXServ Information" at (512) 823-4009. Please use this information with care. IBM will not be responsible for damages of any kind resulting from its use. The use of this information is the sole responsibility of the customer and depends on the customer's ability to evaluate and integrate this information into the customer's operational environment.

ABOUT THIS DOCUMENT

This document provides an overview of the output of the vmstat command and is applicable to AIX versions 3.2 through 4.2.

INTRODUCTION

Although a system may have sufficient real resources, it may perform below expectations if logical resources are not allocated properly. Use vmstat to determine real and logical resource utilization. It samples kernel tables and counters, normalizes the results, and presents them in an appropriate format. By default, vmstat sends its report to standard output, but the output can be redirected.

vmstat is normally invoked with an interval and a count specified. The interval is the length of time, in seconds, over which vmstat gathers and reports data. The count is the number of intervals to run. If no parameters are specified, vmstat reports a single record of statistics since the system was booted. Because there may have been periods of inactivity or fluctuations in the workload since boot, this record may not represent current activity. Be aware that the first record in the output always presents statistics since boot (except when vmstat is invoked with the -f or -s option). In many instances, this record can be ignored.

vmstat reports statistics about processes, virtual memory, paging activity, faults, CPU activity, and disk transfers.
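Because the first record of an interval report covers the whole time since boot, scripts that post-process vmstat output usually discard it. The following is a minimal sketch, not an IBM-supplied tool: it assumes the seventeen-column layout of the AIX 3.2.5 sample in Figure 1 (vmstat run without a Drives argument), renames the two "sy" columns to the hypothetical keys sy_calls and sy_cpu so they stay distinct, and uses a hypothetical function name, parse_vmstat.

```python
# Hypothetical column names for the 17 numeric fields in Figure 1.
# The "faults" sy (system calls) and "cpu" sy (system-mode time) are renamed
# so that both survive as separate dictionary keys.
COLUMNS = [
    "r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
    "in", "sy_calls", "cs", "us", "sy_cpu", "id", "wa",
]

# Sample text copied from Figure 1 ("vmstat 1 3").
SAMPLE = """\
 procs   memory             page               faults          cpu
 r  b   avm   fre  re  pi  po  fr  sr  cy   in   sy   cs  us  sy  id  wa
 0  0  6747  1253   0   0   0   0   0   0  114   10   22   0   1  26   0
 1  0  6747  1253   0   0   0   0   0   0  113  118   43  17   4  79   0
 0  0  6747  1253   0   0   0   0   0   0  118   99   33   8   3  89   0
"""


def parse_vmstat(text: str, skip_since_boot: bool = True) -> list[dict[str, int]]:
    """Turn interval-mode vmstat output into one dict per data record.

    Lines that do not consist of exactly len(COLUMNS) integers (group
    headings, column headings, dashes, blank lines) are ignored. When
    skip_since_boot is True, the first data record, which holds statistics
    since boot rather than for an interval, is dropped.
    """
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) or not all(f.isdigit() for f in fields):
            continue
        records.append(dict(zip(COLUMNS, map(int, fields))))
    return records[1:] if skip_since_boot else records


if __name__ == "__main__":
    for rec in parse_vmstat(SAMPLE):
        # Run queue length and total busy CPU (user + system) per interval.
        print(rec["r"], rec["us"] + rec["sy_cpu"])
    # prints:
    # 1 21
    # 0 11
```

Filtering on "all fields are integers" is a simple way to skip headings without depending on their exact text; output produced with a Drives argument has additional columns and would need a longer COLUMNS list.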
Options and parameters recognized by this tool are shown in the usage prompt:

    vmstat [-fs] [Drives] [Interval] [Count]

The figure below shows output from AIX 3.2.5, where the smallest unit of work is called a process (PROC). In AIX 4.1 the smallest unit of work is a kernel thread (KTHR). The "r" and "b" columns exist in both versions; in the 4.1 output, however, they represent the number of threads, not processes, placed on these queues.

    procs   memory             page               faults          cpu
    ----- ----------- ----------------------- -------------- ---------------
     r  b   avm   fre  re  pi  po  fr  sr  cy   in   sy   cs  us  sy  id  wa
     0  0  6747  1253   0   0   0   0   0   0  114   10   22   0   1  26   0
     1  0  6747  1253   0   0   0   0   0   0  113  118   43  17   4  79   0
     0  0  6747  1253   0   0   0   0   0   0  118   99   33   8   3  89   0

    Figure 1. Sample Output from "vmstat 1 3"

PROCS

The columns under the "procs" heading provide information about the average number of processes on various queues.

r
    The "r" column indicates the average number of processes on the run queue, sampled at one-second intervals. In 4.1, the "r" column indicates the number of kernel threads placed on the run queue.

    This field indicates the number of runnable processes. Once per second, the system counts the number of ready-to-run processes and adds that number to an internal counter. vmstat then subtracts the initial value of this counter from the end value and divides the result by the number of seconds in the measurement interval. This value is typically less than five with a stable workload. If it increases rapidly, you should probably look for an application problem. A consistently high number may indicate that the CPU is pacing the system.
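The counter arithmetic described above can be modeled in a few lines. This is an illustrative sketch under stated assumptions, not the AIX kernel code: RunQueueCounter and interval_average are hypothetical names, and the per-second queue lengths are made-up inputs. The same arithmetic applies to the wait-queue counter behind the "b" column.

```python
class RunQueueCounter:
    """Hypothetical model of the kernel's cumulative run-queue counter."""

    def __init__(self) -> None:
        self.total = 0  # running sum of the once-per-second samples

    def tick(self, runnable: int) -> None:
        # Once per second: add the current number of ready-to-run processes.
        self.total += runnable


def interval_average(start_total: int, end_total: int, seconds: int) -> float:
    """(end value - initial value) / number of seconds in the interval."""
    if seconds <= 0:
        raise ValueError("interval must be at least one second")
    return (end_total - start_total) / seconds


if __name__ == "__main__":
    counter = RunQueueCounter()
    for n in [3, 0, 2]:  # illustrative activity before monitoring starts
        counter.tick(n)
    start = counter.total  # value read at the start of the interval
    for n in [2, 3, 1, 0, 4]:  # five one-second samples during the interval
        counter.tick(n)
    print(interval_average(start, counter.total, 5))  # prints 2.0
```

Because only the counter's start and end values are read, activity before the interval (the first three samples) does not affect the result. vmstat itself prints its fields as integers; the sketch keeps the fractional value.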
    If many processes (especially CPU-intensive ones) are competing for the CPU resource, they are likely to be scheduled in round-robin fashion. If each one executes for a complete or partial time slice, the number of runnable processes could easily exceed 100.

b
    The "b" column shows the average number of processes on the wait queue, sampled at one-second intervals. In 4.1, the "b" column indicates the number of kernel threads placed on the wait queue (awaiting a resource or awaiting input/output).

    Processes are placed on the wait queue when they have been scheduled for execution but are waiting for one of their pages to be paged in. Once per second, the system counts the waiting processes and adds that number to an internal counter. vmstat then subtracts the initial value from the end value and divides the result by the number of seconds in the measurement interval. This value is usually near zero. Do not confuse this column with "wa" (waiting on input/output, or I/O).

MEMORY

The information under the "memory" heading describes real and virtual memory.

avm
    The "avm" column gives the average number of pages allocated to paging space. (In AIX, a page contains 4,096 bytes of data.)

    When a process executes, space for working storage is allocated on the paging device(s) (backing store). This field can therefore be used to calculate the amount of paging space assigned to executing processes: because 256 pages x 4,096 bytes = 1 MB, the number in the avm field divided by 256 yields the number of megabytes (MB) allocated to paging space, systemwide. For example, the avm value of 6747 in Figure 1 corresponds to roughly 26 MB. The "lsps -a" command also provides information on individual paging spaces. It is recommended that enough paging space be configured on the system so that the paging space used does not approach 100 percent. When fewer than 128 unallocated pages remain on the paging device(s), the system begins to kill processes to free some paging space.

fre
    The "fre" column shows the average number of free memory frames.
    A frame is a 4,096-byte area of real memory. The system maintains a buffer of memory frames, called the free list, that is readily accessible when the Virtual Memory Manager (VMM) needs space. The nominal size of the free list varies with the amount of real memory installed. On systems with 64 MB of memory or more, the minimum value (MINFREE) is 120 frames. For systems with less than 64 MB, the value is two times the number of MB of real memory, less eight. For example, a system with 32 MB would have a MINFREE value of 56 free frames (2 x 32 - 8).

    If the fre value is substantially above the MAXFREE value (defined as MINFREE plus eight), it is unlikely that the system is thrashing (continuously paging in and out). However, if the system is thrashing, you can be assured that the fre value will be small. Most UNIX operating systems, including AIX, use nearly all available memory for disk caching, so you need not be alarmed if the fre value oscillates between MINFREE and MAXFREE.

PAGE

The information under the "page" heading describes page faults and paging activity.

re
    The "re" column shows the number (rate) of pages reclaimed.

    Reclaimed pages can satisfy an address translation fault without initiating a new I/O request (the page is still in memory). This includes pages that have been put on the free list but are accessed again before they are reassigned. It also includes pages previously requested by the VMM for which I/O has not yet completed, and pages prefetched by the VMM's read-ahead mechanism but hidden from the faulting segment.

pi
    The "pi" column details the number (rate) of pages paged in from paging space.

    Paging space is the part of virtual memory that resides on disk. It is used as an "overflow" when memory is overcommitted.
    Paging space consists of paging logical volumes dedicated to the storage of working-set pages that have been stolen from real memory. When a stolen page is referenced by the process, a page fault occurs and the page must be read into memory from paging space.

    There is no "good" number for this field because of the variety of hardware, software, and application configurations. Some analysts have said that five page-ins per second should be the upper limit. Treat this figure as a reference point rather than a rigid maximum.

    This field is a key indicator of paging space activity. Look at it this way: if a page-in occurs, there must have been a previous page-out for that page. It is also likely, in a memory-constrained environment, that each page-in will force a different page to be stolen and, therefore, paged out.

po
    The "po" column shows the number (rate) of pages paged out to paging space.

    Whenever a page of working storage is stolen, it is written to paging space. If it is not referenced again, it remains on the paging device until the process terminates or disclaims the space. Subsequent references to addresses contained within the paged-out pages result in page faults, and the pages are paged in individually by the system. When a process terminates normally, any paging space allocated to that process is freed. If the system is reading in a significant number of persistent pages, you may see an increase in po without corresponding increases in pi. This does not necessarily indicate thrashing, but it may warrant investigation into the data access patterns of the application(s).

fr
    The "fr" column details the number (rate) of pages freed.

    As the VMM page-replacement routine scans the Page Frame Table (PFT), it uses criteria to select which pages are to be stolen to replenish the free list of available memory frames.
    The total number of pages stolen by the VMM, both working (computational) and file (persistent) pages, is reported as a rate per second. The fact that a page has been freed does not mean that any I/O has taken place. For example, if a persistent storage (file) page has not been modified, it is not written back to the disk. If I/O is not necessary, minimal system resources are required to free a page.

sr
    The "sr" column details the number (rate) of pages scanned by the page-replacement algorithm.

    The VMM page-replacement code scans the PFT and steals pages until the number of frames on the free list is at least the MAXFREE value. The page-replacement code may have to scan many entries in the PFT before it can steal enough pages to satisfy the free list requirements. With stable, unfragmented memory, the scan rate and free rate may be nearly equal. On systems with multiple processes using many different pages, the pages are more volatile and disjointed; in this scenario, the scan rate may greatly exceed the free rate.

cy
    The "cy" column provides the rate of complete scans of the Page Frame Table.

    cy shows how many times per second the page-replacement code has scanned the entire PFT. Because the free list can be replenished without a complete scan of the PFT, and because all of the vmstat fields are reported as integers, this field is usually zero.

FAULTS

The information under the "faults" heading provides information about process control.

in
    The "in" column shows the number (rate) of device interrupts.

    This column shows the number of hardware or device interrupts per second observed over the measurement interval. Examples of interrupts are disk request completions and the 10-millisecond clock interrupt. Since the clock interrupt alone occurs 100 times per second, the in field is always at least 100.

sy
    The "sy" column details the number (rate) of system calls.
    Resources are available to user processes through well-defined system calls. These calls instruct the kernel to perform operations for the calling process and exchange data between the kernel and the process. Because workloads and applications vary, and different calls perform different functions, it is impossible to say how many system calls per second are too many.

cs
    The "cs" column shows the number (rate) of context switches.

    The physical CPU resource is subdivided into logical time slices of 10 milliseconds each. Once a process is scheduled for execution, it runs until its time slice expires, it is preempted, or it voluntarily gives up control of the CPU. When another process is given control of the CPU, the context, or working environment, of the previous process must be saved and the context of the new process must be loaded. AIX has a very efficient context-switching procedure, so each switch is inexpensive in terms of resources. Nevertheless, any significant increase in context switches should be cause for further investigation.

CPU

The information under the "cpu" heading provides a breakdown of CPU usage.

us
    The "us" column shows the percentage of CPU time spent in user mode.

    Processes execute in user mode or system (kernel) mode. When in user mode, a process executes within its own code and does not require kernel resources to perform computations, manage memory, set variables, and so on.

sy
    The "sy" column details the percentage of CPU time spent in system mode.

    If a process needs kernel resources, it must execute a system call and go into system mode to make that resource available. I/O to a drive, for example, requires calls to open the device, seek, and read or write data. This field shows the percentage of time the CPU was in system mode.

    Optimum use would have the CPU working 100 percent of the time. This holds true in the case of a single-user system with no need to share the CPU.
    Generally, if us+sy time is below 90 percent, a single-user system is not considered CPU constrained. However, if us+sy time on a multi-user system exceeds 80 percent, processes may spend time waiting in the run queue, and response time and throughput might suffer.

id
    The "id" column shows the percentage of time the CPU is idle with no pending disk I/O.

    If there are no processes available for execution (the run queue is empty), the system dispatches a process called wait. The ps report (with the -k or g option) identifies this as kproc with a process ID (PID) of 514; in 4.1, the PID of the wait process was changed from 514 to 516. Don't worry if your ps report shows a high aggregate time for this process; it means you have had significant periods of time when no other processes could run. If there are no I/Os pending to a local disk, all time charged to wait is classified as idle time.

wa
    The "wa" column details CPU idle time (percentage) with pending local disk I/O.

    If there is at least one outstanding I/O to a local disk when wait is running, the time is classified as "waiting on I/O." A wa value over 40 percent could indicate that the disk subsystem is not balanced properly, or it may be the result of a disk-intensive workload. If there is only one process available for execution, which is often the case on a technical workstation, there may be no way to avoid waiting on I/O.

SUMMARY STATISTICS

vmstat with the -s option reports absolute counts of various events since the system was booted. There are 23 separate events reported in the "vmstat -s" output; the four described here have proven most helpful. The 19 remaining fields cover a variety of activities, from address translation faults to lock misses to system calls. The information in those 19 fields is also valuable, although less frequently used.

page ins
    The "page ins" field shows the number of systemwide page-ins.
    When a page is read from disk into memory, this count is incremented. It is a count of VMM-initiated read operations and, together with the page outs field, represents the real I/O (disk reads and writes) initiated by the VMM.

page outs
    The "page outs" field shows the number of systemwide page-outs.

    When a page is written to disk, this count is incremented. It is a total count of VMM-initiated write operations and, together with the page ins field, represents the total amount of real I/O initiated by the VMM.

paging space page ins
    The "paging space page ins" field is the count of ONLY those pages read from paging space.

paging space page outs
    The "paging space page outs" field is the count of ONLY those pages written to paging space.

Using the Summary Statistics

The four fields above can be used to indicate how much of the system's I/O is for persistent storage. If the value for paging space page ins is subtracted from the (systemwide) value for page ins, the result is the number of pages that were read from persistent storage (files). Likewise, if the value for paging space page outs is subtracted from the (systemwide) value for page outs, the result is the number of persistent pages (files) that were written to disk.

Remember, these are counts since system initialization. If you need counts for a given time interval, run "vmstat -s" at the time you want to start monitoring and again at the end of the interval. The deltas between like fields of the two reports are the counts for the interval. It is easier to redirect the output of the reports to files and then perform the math.

The remaining fields produced by the -s, -f, and Drives options of vmstat are fully documented in InfoExplorer and in the AIX tuning guide, AIX Version 3.2 for RISC System/6000 -- Performance Monitoring/Tuning Guide, publication number SC23-2365.
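The interval and persistent-storage arithmetic described above is easy to script once the two reports have been saved to files. This is a minimal sketch, assuming each line of a saved "vmstat -s" report is an integer count followed by the event name (for example, "230929 page ins"), which is how AIX lays out that report; the function names parse_summary, interval_deltas, and persistent_io are hypothetical, and the sample counts are made-up values showing only the four fields discussed here.

```python
def parse_summary(text: str) -> dict[str, int]:
    """Parse a saved "vmstat -s" report into {event name: count}.

    Assumes each line is an integer count followed by the event name;
    lines that do not start with an integer are skipped.
    """
    counts = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            counts[parts[1].strip()] = int(parts[0])
    return counts


def interval_deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Counts for the interval: differences between like fields of two reports."""
    return {name: after[name] - before[name] for name in before if name in after}


def persistent_io(counts: dict[str, int]) -> tuple[int, int]:
    """Pages read from and written to persistent storage (files).

    page ins  - paging space page ins  = file pages read
    page outs - paging space page outs = file pages written
    """
    reads = counts["page ins"] - counts["paging space page ins"]
    writes = counts["page outs"] - counts["paging space page outs"]
    return reads, writes


# Hypothetical excerpts of two saved reports (illustrative numbers only).
BEFORE = """\
    230929 page ins
    285735 page outs
     13587 paging space page ins
     17802 paging space page outs
"""
AFTER = """\
    231929 page ins
    286135 page outs
     13687 paging space page ins
     17852 paging space page outs
"""

if __name__ == "__main__":
    delta = interval_deltas(parse_summary(BEFORE), parse_summary(AFTER))
    print(persistent_io(delta))  # prints (900, 350)
```

Applying persistent_io to a single report instead of to the deltas gives the same split for the whole period since boot.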
READER'S COMMENTS

Please fax this form to (512) 823-4009, attention "AIXServ Information". You may also e-mail comments to: elizabet@austin.ibm.com. These comments should include the same customer information requested below.

Use this form to tell us what you think about this document. If you have found errors in it, or if you want to express your opinion about it (such as organization, subject matter, or appearance) or make suggestions for improvement, this is the form to use.

If you need technical assistance, contact your local branch office, point of sale, or 1-800-CALL-AIX (for information about support offerings). These services may be billable. Faxes on a variety of subjects may be ordered free of charge from 1-800-IBM-4FAX. Outside the U.S., call 415-855-4329 using a fax machine phone.

When you send comments to IBM, you grant IBM a nonexclusive right to use or distribute your comments in any way it believes appropriate without incurring any obligation to you.

NOTE: If you have a problem report or item number, supplying that number may help us determine why a procedure did or did not work in your specific situation.
Problem Report or Item #:

Branch Office or Customer #:

Be sure to print your name and fax number below if you would like a reply:

Name:

Fax Number:

END OF DOCUMENT (vmstat.perf.tune.lxp, 4FAX# 6220)