Relevance of High %iowait in Server Performance ..Update

Subject: Relevance of High %iowait in Server Performance ..Update

Audience: Administrators

Date: April 19, 2006

First posting: February 10, 2004

In the past, high wait I/O was a good predictor of an I/O bottleneck. Today it's a bit more complicated: high wait I/O does not necessarily imply an I/O bottleneck. You'll have to consider other factors, which I'll list below.

High wait I/O may not be a good predictor because CPU performance has significantly outpaced disk performance. Here's a situation I ran into that illustrates the point. A customer replaced an older server with a new pSeries server that had 6x the CPU performance but used the same SAN for disk storage. After the upgrade, wait I/O increased from 33% to 75% during batch jobs, and the customer was concerned there was a problem. The following table shows what happened.

Metric                        Old Server    New Server
Relative CPU Performance      1             6
Wait I/O                      33%           75%
Batch Time = CPU + I/O:
  CPU Time (minutes)          40            7
  I/O Time (minutes)          20            20
                              --            --
  Total Time (minutes)        60            27

The server upgrade improved CPU performance, but not I/O performance. The faster CPUs finished their work sooner, so they spent a higher percentage of their time waiting for disk I/O. Since batch time decreased by more than half (from 60 to 27 minutes) and they were well within their SLA, the high wait I/O wasn't a real problem.
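The arithmetic behind the table can be checked with a short sketch. It assumes, as the table does, that batch time is simply CPU time plus I/O time with no overlap; the function name wait_io_percent is made up for this illustration and is not an AIX tool.

```python
def wait_io_percent(cpu_minutes, io_minutes):
    """Percentage of elapsed batch time spent waiting on I/O.

    Assumes CPU time and I/O time do not overlap, so total = CPU + I/O.
    """
    total_minutes = cpu_minutes + io_minutes
    return 100.0 * io_minutes / total_minutes


old_cpu_minutes = 40.0   # CPU time on the old server (from the table)
io_minutes = 20.0        # I/O time, unchanged because the SAN is the same
cpu_speedup = 6.0        # relative CPU performance of the new server

# The table rounds 40 / 6 = 6.67 minutes up to 7.
new_cpu_minutes = old_cpu_minutes / cpu_speedup

print(round(wait_io_percent(old_cpu_minutes, io_minutes)))  # 33
print(round(wait_io_percent(new_cpu_minutes, io_minutes)))  # 75
```

The I/O time is identical in both cases. Only the denominator shrank, which is why the percentage went up while the batch finished sooner.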

So how do you know if there is a serious I/O bottleneck when wait I/O is high? Here's my checklist:

1. Are we currently meeting service levels? Batch times, user response times, ...
2. Track performance. Is the current "wait I/O" abnormal for your system? Does it coincide with a problem?
3. Are there enough physical disks to handle the load? http://www.aixtips.com/AIXtip/sizing_disk_for_io.htm
4. What is the disk response time per "filemon"? Reads should be under 10 ms, and writes under 5 ms.
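Two of the checklist questions lend themselves to a simple automated check. The sketch below is only one way to do it. The "two standard deviations above the historical mean" rule for deciding what is abnormal is my assumption, not an AIX convention, and the sample numbers and disk names are hypothetical. It also assumes you have already extracted average read and write times per disk from your filemon report; it does not parse filemon output.

```python
from statistics import mean, stdev

READ_LIMIT_MS = 10.0   # reads should be under 10 ms
WRITE_LIMIT_MS = 5.0   # writes should be under 5 ms


def is_abnormal(current, history, n_sigma=2.0):
    """True if current wait I/O is well above this system's own history.

    Assumption: "abnormal" means more than n_sigma standard deviations
    above the mean of earlier samples taken under a similar workload.
    """
    if len(history) < 2:
        return False  # not enough data to judge
    return current > mean(history) + n_sigma * stdev(history)


def slow_disks(response_ms):
    """Return the disks whose average response times miss the limits.

    response_ms maps a disk name to (avg_read_ms, avg_write_ms).
    """
    return sorted(
        disk
        for disk, (read_ms, write_ms) in response_ms.items()
        if read_ms >= READ_LIMIT_MS or write_ms >= WRITE_LIMIT_MS
    )


# Hypothetical wait I/O samples (%) from earlier batch runs.
history = [30.0, 35.0, 33.0, 31.0, 36.0]
print(is_abnormal(75.0, history))  # True
print(is_abnormal(36.0, history))  # False

# Hypothetical per-disk averages taken from a filemon report.
measured = {"hdisk0": (4.2, 1.1), "hdisk1": (12.5, 3.0), "hdisk2": (6.0, 7.4)}
print(slow_disks(measured))  # ['hdisk1', 'hdisk2']
```

An abnormal reading only matters if it coincides with a missed service level, so treat the result as a prompt to look further, not as proof of a bottleneck.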

If you really have a bottleneck, consider the following:

1. Remove any AIX bottlenecks (vmo, ioo settings). If needed, open an IBM "perfpmr".
2. Spread the data over more disk drives (see the disk-sizing question in the checklist above) and I/O adapters.
3. If running a database, can old data be archived? Databases fill up over time. Indexes become longer, and it takes more I/Os to retrieve or store the same data. I've seen archiving reduce wait I/O by over half.

Bruce Spencer,
baspence@us.ibm.com

April 19, 2006