Date: July 24, 2006
One way to determine whether there is an I/O bottleneck in AIX's Logical Volume Manager (LVM) layer is to run the "filemon" command. The filemon output shows I/O response times for both the Logical Volume (LV) and Physical Volume (PV) layers. If the LV response time exceeds the PV response time by more than 2-3 ms, there could be an I/O bottleneck in LVM.
If so, check the LVM buffer statistics at peak I/O loads. Run "vmstat -v" several times and look for increasing "I/Os blocked" counts in any of the buffers. (The rate of change is what matters, not the absolute values.) Use the "ioo" command to increase constrained LVM buffers. (Consult the documentation or IBM Support for more information.)
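Comparing "vmstat -v" snapshots by eye is error-prone, so here is a minimal sketch of one way to automate it. The function name blocked_delta and the snapshot file names are hypothetical, and the script assumes each counter line has the form "<count> <description>" with "I/Os blocked" in the description.

```sh
#!/bin/sh
# Hypothetical helper: print every "I/Os blocked" counter that grew between
# two saved "vmstat -v" snapshots. Assumed usage on AIX:
#   vmstat -v > before.txt; sleep 60; vmstat -v > after.txt
#   blocked_delta before.txt after.txt
blocked_delta() {
    awk '
        /I\/Os blocked/ {
            count = $1              # first field is the counter value
            $1 = ""                 # drop it, leaving only the description
            sub(/^ +/, "")          # trim the space left behind
            if (NR == FNR) {        # first file: remember the baseline
                before[$0] = count
                next
            }
            delta = count - before[$0]
            if (delta > 0) printf "%s: +%d\n", $0, delta
        }
    ' "$1" "$2"
}
```

Counters that did not change are not printed, which keeps the focus on the rate of change rather than the absolute values.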
The filemon.pl.gz Perl script examines filemon output for differences between the LV and PV layers (note that you'll have to unzip the file first using "gunzip filemon.pl.gz"). Here's a quick example of how to use it.
# filemon -T 320000 -o filemon.out; sleep 20; trcstop
# filemon.pl filemon.out
The filemon.pl output from a "healthy" system would look like the following. Both Read(ms) and Write(ms) differences are small.
(....skipped lines...)
*** LV - PV Differences ***
Name        Reads   Read(ms)  Writes  Write(ms)  Thruput(KBs)
Difference  485     1.2       0       -0.0       -8.6
The output from a system with a serious I/O bottleneck would look like the following. The Read(ms) and Write(ms) differences are large.
(...skipped lines....)
*** LV - PV Differences ***
Name        Reads   Read(ms)  Writes  Write(ms)  Thruput(KBs)
Difference  -11575  6.0       16087   4780.8     12800.8
The differences in I/O response times for reads and writes are 6.0 ms and 4780.8 ms, respectively.
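The comparison against the 2-3 ms rule of thumb can also be scripted. The following is only a sketch: the function name lvpv_check is hypothetical, the default threshold of 3 ms is an assumption taken from the upper end of that range, and the parsing assumes the "Difference" row appears on its own line with the columns Name, Reads, Read(ms), Writes, Write(ms), Thruput(KBs).

```sh
#!/bin/sh
# Hypothetical helper: flag an LV - PV response-time gap above a threshold.
#   $1 = file holding saved filemon.pl output
#   $2 = threshold in ms (assumed default: 3)
lvpv_check() {
    awk -v limit="${2:-3}" '
        $1 == "Difference" {
            # $3 is Read(ms) and $5 is Write(ms) in the assumed column layout
            status = ($3 > limit || $5 > limit) ? "SUSPECT" : "OK"
            printf "%s read=%sms write=%sms\n", status, $3, $5
        }
    ' "$1"
}
```

For instance, after saving the report with "filemon.pl filemon.out > report.txt", running "lvpv_check report.txt" would print one status line for the Difference row.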
The number of reads and writes at the LV and PV layers can be significantly different. Some of the reasons include insufficient filesystem buffers, inode locking, queuing, syncd, etc.
Thanks to Dan Braden for help with this tip!
Bruce Spencer,
baspence@us.ibm.com