When you are monitoring disk I/O, use the following to determine your course of action:
Before you make significant changes in your disk configuration or tuning parameters, it is a good idea to build a baseline of measurements that record the current configuration and performance.
AIX 4.3.3 and later contain enhancements to the method used to compute the percentage of CPU time spent waiting on disk I/O (wio time). The method used in AIX 4.3.2 and earlier versions of the operating system can, under certain circumstances, give an inflated view of wio time on SMPs. The wio time is reported by the commands sar (%wio), vmstat (wa) and iostat (% iowait).
Another change is that the wa column details the percentage of time the CPU was idle with pending disk I/O to not only local, but also NFS-mounted disks.
At each clock interrupt on each processor (100 times a second per processor), a determination is made as to which of the four categories (usr/sys/wio/idle) to place the last 10 ms of time. If the CPU was busy in usr mode at the time of the clock interrupt, then usr gets the clock tick added into its category. If the CPU was busy in kernel mode at the time of the clock interrupt, then the sys category gets the tick. If the CPU was not busy, a check is made to see if any I/O to disk is in progress. If any disk I/O is in progress, the wio category is incremented. If no disk I/O is in progress and the CPU is not busy, the idle category gets the tick.
The inflated view of wio time results from all idle CPUs being categorized as wio regardless of the number of threads waiting on I/O. For example, systems with just one thread doing I/O could report over 90 percent wio time regardless of the number of CPUs it has.
The change in AIX 4.3.3 is to only mark an idle CPU as wio if an outstanding I/O was started on that CPU. This method can report much lower wio times when just a few threads are doing I/O and the system is otherwise idle. For example, a system with four CPUs and one thread doing I/O will report a maximum of 25 percent wio time. A system with 12 CPUs and one thread doing I/O will report a maximum of 8.3 percent wio time.
Also, NFS now goes through the buffer cache, and waits in those routines are accounted for in the wa statistics.
Begin the assessment by running the iostat command with an interval parameter during your system's peak workload period or while running a critical application for which you need to minimize I/O delays. The following shell script runs the iostat command in the background while a copy of a large file runs in the foreground so that there is some I/O to measure:
# iostat 5 3 >io.out & # cp big1 /dev/null
This example leaves the following three reports in the io.out file:
tty: tin tout avg-cpu: % user % sys % idle % iowait 0.0 1.3 0.2 0.6 98.9 0.3 Disks: % tm_act Kbps tps Kb_read Kb_wrtn hdisk0 0.0 0.3 0.0 29753 48076 hdisk1 0.1 0.1 0.0 11971 26460 hdisk2 0.2 0.8 0.1 91200 108355 cd0 0.0 0.0 0.0 0 0 tty: tin tout avg-cpu: % user % sys % idle % iowait 0.8 0.8 0.6 9.7 50.2 39.5 Disks: % tm_act Kbps tps Kb_read Kb_wrtn hdisk0 47.0 674.6 21.8 3376 24 hdisk1 1.2 2.4 0.6 0 12 hdisk2 4.0 7.9 1.8 8 32 cd0 0.0 0.0 0.0 0 0 tty: tin tout avg-cpu: % user % sys % idle % iowait 2.0 2.0 0.2 1.8 93.4 4.6 Disks: % tm_act Kbps tps Kb_read Kb_wrtn hdisk0 0.0 0.0 0.0 0 0 hdisk1 0.0 0.0 0.0 0 0 hdisk2 4.8 12.8 3.2 64 0 cd0 0.0 0.0 0.0 0 0
The first report is the summary since the last reboot and shows the overall balance (or, in this case, imbalance) in the I/O to each of the hard disks. hdisk1 is almost idle and hdisk2 receives about 63 percent of the total I/O (from Kb_read and Kb_wrtn).
Note: The system maintains a history of disk activity. If the history is disabled (smitty chgsys -> Continuously maintain DISK I/O history [false]), the following message displays when you run the iostat command:
Disk history since boot not available.The interval disk I/O statistics are unaffected by this.
The second report shows the 5-second interval during which cp ran. Examine this information carefully. The elapsed time for this cp was about 2.6 seconds. Thus, 2.5 seconds of high I/O dependency are being averaged with 2.5 seconds of idle time to yield the 39.5 percent % iowait reported. A shorter interval would have given a more accurate characterization of the command itself, but this example demonstrates what you must consider when you are looking at reports that show average activity across intervals.
The two columns of TTY information (tin and tout) in the iostat output show the number of characters read and written by all TTY devices. This includes both real and pseudo TTY devices. Real TTY devices are those connected to an asynchronous port. Some pseudo TTY devices are shells, telnet sessions, and aixterm windows.
Because the processing of input and output characters consumes CPU resources, look for a correlation between increased TTY activity and CPU utilization. If such a relationship exists, evaluate ways to improve the performance of the TTY subsystem. Steps that could be taken include changing the application program, modifying TTY port parameters during file transfer, or perhaps upgrading to a faster or more efficient asynchronous communications adapter.
In Shell Script fastport.sh for Fast File Transfers, you can find the fastport.sh script, which is intended to condition a TTY port for fast file transfers in raw mode; for example, when a FAX machine is to be connected. Using the script might improve CPU performance by a factor of 3 at 38400 baud.
The CPU statistics columns (% user, % sys, % idle, and % iowait) provide a breakdown of CPU usage. This information is also reported in the vmstat command output in the columns labeled us, sy, id, and wa. For a detailed explanation for the values, see The vmstat Command. Also note the change made to % iowait described in Wait I/O Time Reporting.
On systems running one application, high I/O wait percentage might be related to the workload. On systems with many processes, some will be running while others wait for I/O. In this case, the % iowait can be small or zero because running processes "hide" some wait time. Although % iowait is low, a bottleneck can still limit application performance.
If the iostat command indicates that a CPU-bound situation does not exist, and % iowait time is greater than 20 percent, you might have an I/O or disk-bound situation. This situation could be caused by excessive paging due to a lack of real memory. It could also be due to unbalanced disk load, fragmented data or usage patterns. For an unbalanced disk load, the same iostat report provides the necessary information. But for information about file systems or logical volumes, which are logical resources, you must use tools such as the filemon or fileplace commands.
When you suspect a disk I/O performance problem, use the iostat command. To avoid the information about the TTY and CPU statistics, use the -d option. In addition, the disk statistics can be limited to the important disks by specifying the disk names.
Remember that the first set of data represents all activity since system startup.
Taken alone, there is no unacceptable value for any of the above fields because statistics are too closely related to application characteristics, system configuration, and type of physical disk drives and adapters. Therefore, when you are evaluating data, look for patterns and relationships. The most common relationship is between disk utilization (%tm_act) and data transfer rate (tps).
To draw any valid conclusions from this data, you have to understand the application's disk data access patterns such as sequential, random, or combination, as well as the type of physical disk drives and adapters on the system. For example, if an application reads/writes sequentially, you should expect a high disk transfer rate (Kbps) when you have a high disk busy rate (%tm_act). Columns Kb_read and Kb_wrtn can confirm an understanding of an application's read/write behavior. However, these columns provide no information on the data access patterns.
Generally you do not need to be concerned about a high disk busy rate (%tm_act) as long as the disk transfer rate (Kbps) is also high. However, if you get a high disk busy rate and a low disk transfer rate, you may have a fragmented logical volume, file system, or individual file.
Discussions of disk, logical volume and file system performance sometimes lead to the conclusion that the more drives you have on your system, the better the disk I/O performance. This is not always true because there is a limit to the amount of data that can be handled by a disk adapter. The disk adapter can also become a bottleneck. If all your disk drives are on one disk adapter, and your hot file systems are on separate physical volumes, you might benefit from using multiple disk adapters. Performance improvement will depend on the type of access.
To see if a particular adapter is saturated, use the iostat command and add up all the Kbps amounts for the disks attached to a particular disk adapter. For maximum aggregate performance, the total of the transfer rates (Kbps) must be below the disk adapter throughput rating. In most cases, use 70 percent of the throughput rate. In operating system versions later than 4.3.3 the -a or -A option will display this information.
To prove that the system is I/O bound, it is better to use the iostat command. However, the vmstat command could point to that direction by looking at the wa column, as discussed in The vmstat Command. Other indicators for I/O bound are:
To display a statistic about the logical disks (a maximum of four disks is allowed), use the following command:
# vmstat hdisk0 hdisk1 1 8 kthr memory page faults cpu disk xfer ---- ---------- ----------------------- ------------ ----------- ------ r b avm fre re pi po fr sr cy in sy cs us sy id wa 1 2 3 4 0 0 3456 27743 0 0 0 0 0 0 131 149 28 0 1 99 0 0 0 0 0 3456 27743 0 0 0 0 0 0 131 77 30 0 1 99 0 0 0 1 0 3498 27152 0 0 0 0 0 0 153 1088 35 1 10 87 2 0 11 0 1 3499 26543 0 0 0 0 0 0 199 1530 38 1 19 0 80 0 59 0 1 3499 25406 0 0 0 0 0 0 187 2472 38 2 26 0 72 0 53 0 0 3456 24329 0 0 0 0 0 0 178 1301 37 2 12 20 66 0 42 0 0 3456 24329 0 0 0 0 0 0 124 58 19 0 0 99 0 0 0 0 0 3456 24329 0 0 0 0 0 0 123 58 23 0 0 99 0 0 0
The disk xfer part provides the number of transfers per second to the specified physical volumes that occurred in the sample interval. One to four physical volume names can be specified. Transfer statistics are given for each specified drive in the order specified. This count represents requests to the physical device. It does not imply an amount of data that was read or written. Several logical requests can be combined into one physical request.
This column shows the number of hardware or device interrupts (per second) observed over the measurement interval. Examples of interrupts are disk request completions and the 10 millisecond clock interrupt. Since the latter occurs 100 times per second, the in field is always greater than 100. But the vmstat command also provides a more detailed output about the system interrupts.
The -i parameter displays the number of interrupts taken by each device since system startup. But, by adding the interval and, optionally, the count parameter, the statistic since startup is only displayed in the first stanza; every trailing stanza is a statistic about the scanned interval.
# vmstat -i 1 2 priority level type count module(handler) 0 0 hardware 0 i_misc_pwr(a868c) 0 1 hardware 0 i_scu(a8680) 0 2 hardware 0 i_epow(954e0) 0 2 hardware 0 /etc/drivers/ascsiddpin(189acd4) 1 2 hardware 194 /etc/drivers/rsdd(1941354) 3 10 hardware 10589024 /etc/drivers/mpsdd(1977a88) 3 14 hardware 101947 /etc/drivers/ascsiddpin(189ab8c) 5 62 hardware 61336129 clock(952c4) 10 63 hardware 13769 i_softoff(9527c) priority level type count module(handler) 0 0 hardware 0 i_misc_pwr(a868c) 0 1 hardware 0 i_scu(a8680) 0 2 hardware 0 i_epow(954e0) 0 2 hardware 0 /etc/drivers/ascsiddpin(189acd4) 1 2 hardware 0 /etc/drivers/rsdd(1941354) 3 10 hardware 25 /etc/drivers/mpsdd(1977a88) 3 14 hardware 0 /etc/drivers/ascsiddpin(189ab8c) 5 62 hardware 105 clock(952c4) 10 63 hardware 0 i_softoff(9527c)
Note: The output will differ from system to system, depending on hardware and software configurations (for example, the clock interrupts may not be displayed in the vmstat -i output although they will be accounted for under the in column in the normal vmstat output). Check for high numbers in the count column and investigate why this module has to execute so many interrupts.
The sar command is a standard UNIX command used to gather statistical data about the system. With its numerous options, the sar command provides queuing, paging, TTY, and many other statistics. With AIX 4.3.3, the sar -d option generates real-time disk I/O statistics.
# sar -d 3 3 AIX konark 3 4 0002506F4C00 08/26/99 12:09:50 device %busy avque r+w/s blks/s avwait avserv 12:09:53 hdisk0 1 0.0 0 5 0.0 0.0 hdisk1 0 0.0 0 1 0.0 0.0 cd0 0 0.0 0 0 0.0 0.0 12:09:56 hdisk0 0 0.0 0 0 0.0 0.0 hdisk1 0 0.0 0 1 0.0 0.0 cd0 0 0.0 0 0 0.0 0.0 12:09:59 hdisk0 1 0.0 1 4 0.0 0.0 hdisk1 0 0.0 0 1 0.0 0.0 cd0 0 0.0 0 0 0.0 0.0 Average hdisk0 0 0.0 0 3 0.0 0.0 hdisk1 0 0.0 0 1 0.0 0.0 cd0 0 0.0 0 0 0.0 0.0
The fields listed by the sar -d command are as follows:
The lslv command shows, among other information, the logical volume fragmentation. To check logical volume fragmentation, use the command lslv -l lvname, as follows:
# lslv -l hd2 hd2:/usr PV COPIES IN BAND DISTRIBUTION hdisk0 114:000:000 22% 000:042:026:000:046
The output of COPIES shows the logical volume hd2 has only one copy. The IN BAND shows how well the intrapolicy, an attribute of logical volumes, is followed. The higher the percentage, the better the allocation efficiency. Each logical volume has its own intrapolicy. If the operating system cannot meet this requirement, it chooses the best way to meet the requirements. In our example, there are a total of 114 logical partitions (LP); 42 LPs are located on middle, 26 LPs on center, and 46 LPs on inner-edge. Since the logical volume intrapolicy is center, the in-band is 22 percent (26 / (42+26+46). The DISTRIBUTION shows how the physical partitions are placed in each part of the intrapolicy; that is:
edge : middle : center : inner-middle : inner-edge
See Position on Physical Volume for additional information about physical partitions placement.
If the workload shows a significant degree of I/O dependency, you can investigate the physical placement of the files on the disk to determine if reorganization at some level would yield an improvement. To see the placement of the partitions of logical volume hd11 within physical volume hdisk0, use the following:
# lslv -p hdisk0 hd11 hdisk0:hd11:/home/op USED USED USED USED USED USED USED USED USED USED 1-10 USED USED USED USED USED USED USED 11-17 USED USED USED USED USED USED USED USED USED USED 18-27 USED USED USED USED USED USED USED 28-34 USED USED USED USED USED USED USED USED USED USED 35-44 USED USED USED USED USED USED 45-50 USED USED USED USED USED USED USED USED USED USED 51-60 0052 0053 0054 0055 0056 0057 0058 61-67 0059 0060 0061 0062 0063 0064 0065 0066 0067 0068 68-77 0069 0070 0071 0072 0073 0074 0075 78-84
Look for the rest of hd11 on hdisk1 with the following:
# lslv -p hdisk1 hd11 hdisk1:hd11:/home/op 0035 0036 0037 0038 0039 0040 0041 0042 0043 0044 1-10 0045 0046 0047 0048 0049 0050 0051 11-17 USED USED USED USED USED USED USED USED USED USED 18-27 USED USED USED USED USED USED USED 28-34 USED USED USED USED USED USED USED USED USED USED 35-44 USED USED USED USED USED USED 45-50 0001 0002 0003 0004 0005 0006 0007 0008 0009 0010 51-60 0011 0012 0013 0014 0015 0016 0017 61-67 0018 0019 0020 0021 0022 0023 0024 0025 0026 0027 68-77 0028 0029 0030 0031 0032 0033 0034 78-84
From top to bottom, five blocks represent edge, middle, center, inner-middle, and inner-edge, respectively.
In the previous example, logical volume hd11 is fragmented within physical volume hdisk1, with its first logical partitions in the inner-middle and inner regions of hdisk1, while logical partitions 35-51 are in the outer region. A workload that accessed hd11 randomly would experience unnecessary I/O wait time as longer seeks might be needed on logical volume hd11. These reports also indicate that there are no free physical partitions in either hdisk0 or hdisk1.
To see how the file copied earlier, big1, is stored on the disk, we can use the fileplace command. The fileplace command displays the placement of a file's blocks within a logical volume or within one or more physical volumes.
To determine whether the fileplace command is installed and available, run the following command:
# lslpp -lI perfagent.tools
Use the following command:
# fileplace -pv big1 File: big1 Size: 3554273 bytes Vol: /dev/hd10 Blk Size: 4096 Frag Size: 4096 Nfrags: 868 Compress: no Inode: 19 Mode: -rwxr-xr-x Owner: hoetzel Group: system Physical Addresses (mirror copy 1) Logical Fragment ---------------------------------- ---------------- 0001584-0001591 hdisk0 8 frags 32768 Bytes, 0.9% 0001040-0001047 0001624-0001671 hdisk0 48 frags 196608 Bytes, 5.5% 0001080-0001127 0001728-0002539 hdisk0 812 frags 3325952 Bytes, 93.5% 0001184-0001995 868 frags over space of 956 frags: space efficiency = 90.8% 3 fragments out of 868 possible: sequentiality = 99.8%
This example shows that there is very little fragmentation within the file, and those are small gaps. We can therefore infer that the disk arrangement of big1 is not significantly affecting its sequential read-time. Further, given that a (recently created) 3.5 MB file encounters this little fragmentation, it appears that the file system in general has not become particularly fragmented.
Occasionally, portions of a file may not be mapped to any blocks in the volume. These areas are implicitly filled with zeroes by the file system. These areas show as unallocated logical blocks. A file that has these holes will show the file size to be a larger number of bytes than it actually occupies (that is, the ls -l command will show a large size, whereas the du command will show a smaller size or the number of blocks the file really occupies on disk).
The fileplace command reads the file's list of blocks from the logical volume. If the file is new, the information may not be on disk yet. Use the sync command to flush the information. Also, the fileplace command will not display NFS remote files (unless the command runs on the server).
Note: If a file has been created by seeking to various locations and writing widely dispersed records, only the pages that contain records will take up space on disk and appear on a fileplace report. The file system does not fill in the intervening pages automatically when the file is created. However, if such a file is read sequentially (by the cp or tar commands, for example) the space between records is read as binary zeroes. Thus, the output of such a cp command can be much larger than the input file, although the data is the same.
Higher space efficiency means files are less fragmented and probably provide better sequential file access. A higher sequentiality indicates that the files are more contiguously allocated, and this will probably be better for sequential file access.
If you find that your sequentiality or space efficiency values become low, you can use the reorgvg command to improve logical volume utilization and efficiency (see Reorganizing Logical Volumes). To improve file system utilization and efficiency, see Reorganizing File Systems.
In this example, the Largest fragment physical address - Smallest fragment physical address + 1 is: 0002539 - 0001584 + 1 = 956 fragments; total used fragments is: 8 + 48 + 812 = 868; the space efficiency is 868 / 956 (90.8 percent); the sequentiality is (868 - 3 + 1) / 868 = 99.8 percent.
Because the total number of fragments used for file storage does not include the indirect blocks location, but the physical address does, the space efficiency can never be 100 percent for files larger than 32 KB, even if the file is located on contiguous fragments.
I/O to and from paging spaces is random, mostly one page at a time. The vmstat reports indicate the amount of paging-space I/O taking place. Both of the following examples show the paging activity that occurs during a C compilation in a machine that has been artificially shrunk using the rmss command. The pi and po (paging-space page-ins and paging-space page-outs) columns show the amount of paging-space I/O (in terms of 4096-byte pages) during each 5-second interval. The first report (summary since system reboot) has been removed. Notice that the paging activity occurs in bursts.
# vmstat 5 8 kthr memory page faults cpu ----- ----------- ------------------------ ------------ ----------- r b avm fre re pi po fr sr cy in sy cs us sy id wa 0 1 72379 434 0 0 0 0 2 0 376 192 478 9 3 87 1 0 1 72379 391 0 8 0 0 0 0 631 2967 775 10 1 83 6 0 1 72379 391 0 0 0 0 0 0 625 2672 790 5 3 92 0 0 1 72379 175 0 7 0 0 0 0 721 3215 868 8 4 72 16 2 1 71384 877 0 12 13 44 150 0 662 3049 853 7 12 40 41 0 2 71929 127 0 35 30 182 666 0 709 2838 977 15 13 0 71 0 1 71938 122 0 0 8 32 122 0 608 3332 787 10 4 75 11 0 1 71938 122 0 0 0 3 12 0 611 2834 733 5 3 75 17
The following "before and after" vmstat -s reports show the
accumulation of paging activity. Remember that it is the paging space
page ins and paging space page outs that represent true paging-space
I/O. The (unqualified) page ins and page outs report total I/O, that is
both paging-space I/O and the ordinary file I/O, performed by the paging
mechanism. The reports have been edited to remove lines that are
irrelevant to this discussion.
# vmstat -s # before | # vmstat -s # after |
6602 page ins
3948 page outs 544 paging space page ins 1923 paging space page outs 0 total reclaims | 7022 page ins
4146 page outs 689 paging space page ins 2032 paging space page outs 0 total reclaims |
The fact that more paging-space page-ins than page-outs occurred during the compilation suggests that we had shrunk the system to the point that thrashing begins. Some pages were being repaged because their frames were stolen before their use was complete.
The technique just discussed can also be used to assess the disk I/O load generated by a program. If the system is otherwise idle, the following sequence:
# vmstat -s >statout # testpgm # sync # vmstat -s >> statout # egrep "ins|outs" statout
yields a before and after picture of the cumulative disk activity counts, such as:
5698 page ins 5012 page outs 0 paging space page ins 32 paging space page outs 6671 page ins 5268 page outs 8 paging space page ins 225 paging space page outs
During the period when this command (a large C compile) was running, the system read a total of 981 pages (8 from paging space) and wrote a total of 449 pages (193 to paging space).
The filemon command uses the trace facility to obtain a detailed picture of I/O activity during a time interval on the various layers of file system utilization, including the logical file system, virtual memory segments, LVM, and physical disk layers. Data can be collected on all the layers, or layers can be specified with the -O layer option. The default is to collect data on the VM, LVM, and physical layers. Both summary and detailed reports are generated. Since it uses the trace facility, the filemon command can be run only by the root user or by a member of the system group.
To determine whether the filemon command is installed and available, run the following command:
# lslpp -lI perfagent.tools
Tracing is started by the filemon command, optionally suspended with the trcoff subcommand and resumed with the trcon subcomand, and terminated with the trcstop subcommand (you may want to issue the command nice -n -20 trcstop to stop the filemon command since the filemon command is currently running at priority 40). As soon as tracing is terminated, the filemon command writes its report to stdout.
Note: Only data for those files opened after the filemon command was started will be collected, unless you specify the -u flag.
The filemon command can read the I/O trace data from a specified file, instead of from the real-time trace process. In this case, the filemon report summarizes the I/O activity for the system and period represented by the trace file. This offline processing method is useful when it is necessary to postprocess a trace file from a remote machine or perform the trace data collection at one time and postprocess it at another time.
The trcrpt -r command must be executed on the trace logfile and redirected to another file, as follows:
# gennames > gennames.out # trcrpt -r trace.out > trace.rpt
At this point an adjusted trace logfile is fed into the filemon command to report on I/O activity captured by a previously recorded trace session as follows:
# filemon -i trace.rpt -n gennames.out | pg
In this example, the filemon command reads file system trace events from the input file trace.rpt. Because the trace data is already captured on a file, the filemon command does not put itself in the background to allow application programs to be run. After the entire file is read, an I/O activity report for the virtual memory, logical volume, and physical volume levels is displayed on standard output (which, in this example, is piped to the pg command).
If the trace command was run with the -C all flag, then run the trcrpt command also with the -C all flag (see Formatting a Report from trace -C Output).
The following sequence of commands gives an example of the filemon command usage:
# filemon -o fm.out -O all; cp /smit.log /dev/null ; trcstop
The report produced by this sequence, in an otherwise-idle system, is as follows:
Thu Aug 19 11:30:49 1999 System: AIX texmex Node: 4 Machine: 000691854C00 0.369 secs in measured interval Cpu utilization: 9.0% Most Active Files ------------------------------------------------------------------------ #MBs #opns #rds #wrs file volume:inode ------------------------------------------------------------------------ 0.1 1 14 0 smit.log /dev/hd4:858 0.0 1 0 13 null 0.0 2 4 0 ksh.cat /dev/hd2:16872 0.0 1 2 0 cmdtrace.cat /dev/hd2:16739 Most Active Segments ------------------------------------------------------------------------ #MBs #rpgs #wpgs segid segtype volume:inode ------------------------------------------------------------------------ 0.1 13 0 5e93 ??? 0.0 2 0 22ed ??? 0.0 1 0 5c77 persistent Most Active Logical Volumes ------------------------------------------------------------------------ util #rblk #wblk KB/s volume description ------------------------------------------------------------------------ 0.06 112 0 151.9 /dev/hd4 / 0.04 16 0 21.7 /dev/hd2 /usr Most Active Physical Volumes ------------------------------------------------------------------------ util #rblk #wblk KB/s volume description ------------------------------------------------------------------------ 0.10 128 0 173.6 /dev/hdisk0 N/A ------------------------------------------------------------------------ Detailed File Stats ------------------------------------------------------------------------ FILE: /smit.log volume: /dev/hd4 (/) inode: 858 opens: 1 total bytes xfrd: 57344 reads: 14 (0 errs) read sizes (bytes): avg 4096.0 min 4096 max 4096 sdev 0.0 read times (msec): avg 1.709 min 0.002 max 19.996 sdev 5.092 FILE: /dev/null opens: 1 total bytes xfrd: 50600 writes: 13 (0 errs) write sizes (bytes): avg 3892.3 min 1448 max 4096 sdev 705.6 write times (msec): avg 0.007 min 0.003 max 0.022 sdev 0.006 FILE: /usr/lib/nls/msg/en_US/ksh.cat volume: /dev/hd2 (/usr) inode: 16872 opens: 2 total bytes xfrd: 16384 reads: 4 (0 errs) read sizes (bytes): avg 4096.0 min 4096 max 4096 sdev 0.0 read times (msec): avg 0.042 min 0.015 max 0.070 sdev 0.025 lseeks: 10 FILE: /usr/lib/nls/msg/en_US/cmdtrace.cat volume: /dev/hd2 (/usr) inode: 16739 opens: 1 total bytes xfrd: 8192 reads: 2 (0 errs) read sizes (bytes): avg 4096.0 min 4096 max 4096 sdev 0.0 read times (msec): avg 0.062 min 0.049 max 0.075 sdev 0.013 lseeks: 8 ------------------------------------------------------------------------ Detailed VM Segment Stats (4096 byte pages) ------------------------------------------------------------------------ SEGMENT: 5e93 segtype: ??? segment flags: reads: 13 (0 errs) read times (msec): avg 1.979 min 0.957 max 5.970 sdev 1.310 read sequences: 1 read seq. lengths: avg 13.0 min 13 max 13 sdev 0.0 SEGMENT: 22ed segtype: ??? segment flags: inode reads: 2 (0 errs) read times (msec): avg 8.102 min 7.786 max 8.418 sdev 0.316 read sequences: 2 read seq. lengths: avg 1.0 min 1 max 1 sdev 0.0 SEGMENT: 5c77 segtype: persistent segment flags: pers defer reads: 1 (0 errs) read times (msec): avg 13.810 min 13.810 max 13.810 sdev 0.000 read sequences: 1 read seq. lengths: avg 1.0 min 1 max 1 sdev 0.0 ------------------------------------------------------------------------ Detailed Logical Volume Stats (512 byte blocks) ------------------------------------------------------------------------ VOLUME: /dev/hd4 description: / reads: 5 (0 errs) read sizes (blks): avg 22.4 min 8 max 40 sdev 12.8 read times (msec): avg 4.847 min 0.938 max 13.792 sdev 4.819 read sequences: 3 read seq. lengths: avg 37.3 min 8 max 64 sdev 22.9 seeks: 3 (60.0%) seek dist (blks): init 6344, avg 40.0 min 8 max 72 sdev 32.0 time to next req(msec): avg 70.473 min 0.224 max 331.020 sdev 130.364 throughput: 151.9 KB/sec utilization: 0.06 VOLUME: /dev/hd2 description: /usr reads: 2 (0 errs) read sizes (blks): avg 8.0 min 8 max 8 sdev 0.0 read times (msec): avg 8.078 min 7.769 max 8.387 sdev 0.309 read sequences: 2 read seq. lengths: avg 8.0 min 8 max 8 sdev 0.0 seeks: 2 (100.0%) seek dist (blks): init 608672, avg 16.0 min 16 max 16 sdev 0.0 time to next req(msec): avg 162.160 min 8.497 max 315.823 sdev 153.663 throughput: 21.7 KB/sec utilization: 0.04 ------------------------------------------------------------------------ Detailed Physical Volume Stats (512 byte blocks) ------------------------------------------------------------------------ VOLUME: /dev/hdisk0 description: N/A reads: 7 (0 errs) read sizes (blks): avg 18.3 min 8 max 40 sdev 12.6 read times (msec): avg 5.723 min 0.905 max 20.448 sdev 6.567 read sequences: 5 read seq. lengths: avg 25.6 min 8 max 64 sdev 22.9 seeks: 5 (71.4%) seek dist (blks): init 4233888, avg 171086.0 min 8 max 684248 sdev 296274.2 seek dist (%tot blks):init 48.03665, avg 1.94110 min 0.00009 max 7.76331 sdev 3.36145 time to next req(msec): avg 50.340 min 0.226 max 315.865 sdev 108.483 throughput: 173.6 KB/sec utilization: 0.10
Using the filemon command in systems with real workloads would result in much larger reports and might require more trace buffer space. Space and CPU time consumption for the filemon command can degrade system performance to some extent. Use a nonproduction system to experiment with the filemon command before starting it in a production environment. Also, use offline processing and on systems with many CPUs use the -C all flag with the trace command.
Note: Although the filemon command reports average, minimum, maximum, and standard deviation in its detailed-statistics sections, the results should not be used to develop confidence intervals or other formal statistical inferences. In general, the distribution of data points is neither random nor symmetrical.
The global reports list the most active files, segments, logical volumes, and physical volumes during the measured interval. They are shown at the beginning of the filemon report. By default, the logical file and virtual memory reports are limited to the 20 most active files and segments, respectively, as measured by the total amount of data transferred. If the -v flag has been specified, activity for all files and segments is reported. All information in the reports is listed from top to bottom as most active to least active.
The most active files are smit.log on logical volume hd4 and file null. The application utilizes the terminfo database for screen management; so the ksh.cat and cmdtrace.cat are also busy. Anytime the shell needs to post a message to the screen, it uses the catalogs for the source of the data.
To identify unknown files, you can translate the logical volume name, /dev/hd1, to the mount point of the file system, /home, and use the find or the ncheck command:
# find / -inum 858 -print /smit.log
or
# ncheck -i 858 / /: 858 /smit.log
If the command is still active, the virtual memory analysis tool svmon can be used to display more information about a segment, given its segment ID (segid), as follows: svmon -D segid. See The svmon Command for a detailed discussion.
In our example, the segtype ??? means that the system cannot identify the segment type, and you must use the svmon command to get more information.
The utilization is presented in percentage, 0.06 indicates 6 percent busy during measured interval.
Note: Logical volume I/O requests start before and end after physical volume I/O requests. Total logical volume utilization will appear therefore to be higher than total physical volume utilization.
The utilization is presented in percentage, 0.10 indicates 10 percent busy during measured interval.
The detailed reports give additional information for the global reports. There is one entry for each reported file, segment, or volume in the detailed reports. The fields in each entry are described below for the four detailed reports. Some of the fields report a single value; others report statistics that characterize a distribution of many values. For example, response-time statistics are kept for all read or write requests that were monitored. The average, minimum, and maximum response times are reported, as well as the standard deviation of the response times. The standard deviation is used to show how much the individual response times deviated from the average. Approximately two-thirds of the sampled response times are between average minus standard deviation (avg - sdev) and average plus standard deviation (avg + sdev). If the distribution of response times is scattered over a large range, the standard deviation will be large compared to the average response time.
Detailed file statistics are provided for each file listed in the Most Active Files report. These stanzas can be used to determine what access has been made to the file. In addition to the number of total bytes transferred, opens, reads, writes, and lseeks, the user can also determine the read/write size and times.
The read sizes and write sizes will give you an idea of how efficiently your application is reading and writing information. Use a multiple of 4 KB pages for best results.
Each element listed in the Most Active Segments report has a corresponding stanza that shows detailed information about real I/O to and from memory.
By examining the reads and read-sequence counts, you can determine if the access is sequential or random. For example, if the read-sequence count approaches the reads count, the file access is more random. On the other hand, if the read-sequence count is significantly smaller than the read count and the read-sequence length is a high value, the file access is more sequential. The same logic applies for the writes and write sequence.
Each element listed in the Most Active Logical Volumes / Most Active Physical Volumes reports will have a corresponding stanza that shows detailed information about the logical/physical volume. In addition to the number of reads and writes, the user can also determine read and write times and sizes, as well as the initial and average seek distances for the logical / physical volume.
A long seek time can increase I/O response time and result in decreased application performance. By examining the reads and read sequence counts, you can determine if the access is sequential or random. The same logic applies to the writes and write sequence.
Following are some guidelines for using the filemon command:
Because the filemon command can potentially consume some CPU power, use this tool with discretion, and analyze the system performance while taking into consideration the overhead involved in running the tool. Tests have shown that in a CPU-saturated environment:
In general, a high % iowait indicates that the system has an application problem, a memory shortage, or an inefficient I/O subsystem configuration. For example, the application problem might be due to requesting a lot of I/O, but not doing much with the data. Understanding the I/O bottleneck and improving the efficiency of the I/O subsystem is the key in solving this bottleneck. Disk sensitivity can come in a number of forms, with different resolutions. Some typical solutions might include:
Each technique is discussed later in this chapter.