Date: July 15, 2001
Although AIX requires relatively minimal tuning, there are several settings that can be modified to enhance performance. Here's a list of AIX 4.3.3 settings I used for a recent "I/O intensive" database benchmark. The benchmark measured the throughput and response time of an Oracle 8.1 (64-bit) database server with over 20,000 simulated Siebel 6 users on a p680 server with 24 CPUs, 64 GB memory, 4 Fibre Channel adapters and Shark RAID 5 disk.
Standard Disclaimer: these settings are provided only as a tuning guideline. Every system is different, so "your mileage may vary."
Disclaimers aside, these settings resulted in excellent benchmark performance. The p680 server delivered 2-5 times the throughput at less than half the response time of the competitor's top-end server with 54 CPUs and 56 GB memory.
Benchmark Overview
Siebel 6 is a Customer Relationship Management application. It uses a two-tier architecture: "fat" PC clients that run the application, and a backend database server. The database server workload is mostly "read" queries.
The benchmark measured database performance in terms of throughput and query response time. The load was generated by up to 800 simulated users running queries. The queries ranged in complexity from simple lookups to long-running reports requiring non-indexed table scans and up to 16-way table joins.
The server ran AIX 4.3.3 ML7 and Oracle 8.1 (64-bit). The Oracle tables resided on a JFS filesystem. The client connections were JDBC (Java 1.1.8). The server was a p680 with 24 CPUs, 64 GB memory, 100 Mbit Ethernet, and 4 Fibre Channel adapters connected to Shark "RAID 5" disk. The Shark was configured as eight "6+P" RAID 5 LUNs with 32 GB of cache.
AIX Settings
The AIX tuning for this benchmark focused on I/O settings for multiuser database access. We were concerned with two types of database workloads: simple lookup queries and long running reporting queries.
The following tables list the AIX settings used in the benchmark. The tables are organized by CPU, memory, I/O and network settings. In a few cases a setting affects two categories (e.g., memory and I/O); in those cases, my assignment to a category was arbitrary.
CPU Settings
Description | Benchmark Setting | Command Used to Change Setting | Command Used to View Setting
Increase the maximum number of processes allowed per user (for the 20,000+ connections) | 100,000 | chdev -l sys0 -a maxuproc='100000' | lsattr -El sys0
Memory Settings
Description | Benchmark Setting | Command Used to Change Setting | Command Used to View Setting
numfsbufs - increases the number of filesystem buffers to sustain a high I/O rate. Changed in response to a large "fsbufwaitcnt" count (vmtune -a); arbitrarily set to 10x the default. | 930 (10x default) | vmtune -b 930 | vmtune; vmtune -a
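For reference, the workflow behind this change was roughly: watch the blocked-I/O counter, raise numfsbufs, then remount. The counter name and the default of 93 follow from the table above; note that a new numfsbufs value applies only to filesystems mounted after the change.
vmtune -a | grep fsbufwaitcnt     # a large or steadily growing count suggests too few filesystem buffers
vmtune -b 930                     # raise numfsbufs to 10x the default of 93
# remount the affected filesystems so the new value takes effect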
Memory Comments
The benchmark system had sufficient memory to avoid paging. If paging had occurred, we would have tuned memory to favor paging out permanent storage (executables, filesystem metadata) over paging out working storage (Oracle's SGA). To do so, see the "vmtune" command and the "minperm" and "maxperm" settings.
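A sketch of that tuning, with illustrative percentages only (we did not need them in the benchmark, and the right values depend on how much memory the SGA and connections require):
vmtune                  # view the current minperm% and maxperm% (defaults are roughly 20% and 80%)
vmtune -p 5 -P 20       # lower minperm/maxperm so file pages are repaged before working storage such as the SGA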
I/O Settings
Description | Benchmark Setting | Command Used to Change Setting | Command Used to View Setting
Minimize potential I/O bottlenecks by randomizing data layout and sequence (see comments). | (see comments) | smit mklv | lslv -l <lvname>
Improve sequential read performance by increasing "read ahead" (see comments). | maxpgahead = 256 | vmtune -R 256 | vmtune
Allocate sufficient free memory to complement the "read ahead" setting. | maxfree = minfree + maxpgahead = 736 | vmtune -F 736 | vmtune
Resize the JFSLOG to support large filesystems; stripe it to minimize JFSLOG bottlenecks (see comments). | Size = 256 MB, striped across 8 disks (stripe size = 128 KB) | See comments | lslv -l <jfslogname>; I/O activity: filemon
Configure and enable Asynchronous I/O (AIO) servers for Oracle (see comments). | Min = 150, Max = 300, enabled | smit chgaio (requires reboot) | smit chgaio; active AIO servers: pstat -a | grep -c aios
I/O Comments
Randomize data layout: reduce possible I/O contention by maximizing data spread across disks (mklv -e x). In addition, each LV should be laid out in unique sequence across the disks. This minimizes I/O bottlenecks that can occur when multiple serial reads (table scans, data loads) "tailgate" each other.
To specify a layout sequence, first create the LV, specifying the layout sequence; second, create a filesystem on the previously defined LV (a sketch follows the mklv examples below). Here's an example of how to create two LVs across the same disks, but with different sequencing.
mklv -y lv01 -e x datavg hdisk1 hdisk2 hdisk3
mklv -y lv02 -e x datavg hdisk3 hdisk2 hdisk1
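The filesystem step would then look something like this (the mount point and auto-mount choice are illustrative, not the values used in the benchmark):
crfs -v jfs -d lv01 -m /oradata01 -A yes     # create a JFS filesystem on the previously defined LV
mount /oradata01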
Improve sequential read by increasing "read ahead": When AIX detects a sequential read, it attempts to improve I/O performance by increasing the number of blocks it reads on the next I/O. The vmtune "maxpgahead" parameter determines the maximum number of blocks AIX can read ahead, and must be a power of 2 (2, 4, 8, 16, ...).
In our benchmark, we sized "maxpgahead" so that the maximum read ahead would roughly match the size of the RAID 5 stripe across the disks (6+P). We used the following formula to calculate "maxpgahead", then rounded up to the nearest power of 2:
maxpgahead ≈ (RAID 5 stripe size) x (number of RAID 5 data disks) / (4 KB block size)
           ≈ (128 KB x 6 disks) / 4 KB = 192 blocks, rounded up to 256 (the nearest power of 2)
JFSLOG is a logical volume that is used by AIX to journal file changes. There's at least one JFSLOG in each Volume Group containing a filesystem. The JFSLOG helps maintain data integrity, but can become a bottleneck during heavy write traffic.
The JFSLOG should be tuned for filesystem size and I/O rate. The default JFSLOG is undersized for large filesystems. The sizing rule of thumb is 2 MB of JFSLOG per GB of filesystem; to support 150 GB of filesystems, we needed a 300 MB JFSLOG. To prevent the JFSLOG from becoming an I/O bottleneck, we "RAID 0" striped it across multiple disks.
The procedure for recreating a striped JFSLOG is below. In this example, the volume group name is "datavg", the JFSLOG is named jfslog00, and we want to "Raid 0" stripe it with 128k blocks over hdisks 6, 7, 8 and 9.
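A sketch of the steps, assuming a 16 MB physical partition size (so 16 logical partitions is roughly 256 MB) and a single filesystem named /oradata01; adjust the partition count, names and mount points for your system:
umount /oradata01                                                          # unmount every filesystem that uses the log
mklv -y jfslog00 -t jfslog -S 128K datavg 16 hdisk6 hdisk7 hdisk8 hdisk9   # striped log LV (remove any existing LV of the same name first with rmlv)
logform /dev/jfslog00                                                      # format the new JFS log (answer "y" when prompted)
chfs -a log=/dev/jfslog00 /oradata01                                       # point the filesystem at the new log
mount /oradata01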
Asynchronous I/O (AIO) improves write performance by grouping acknowledgments. Applications must be written to take advantage of AIO, as is the case with Oracle. AIO tuning involves specifying the number of AIO servers and enabling them (smit aio). We set the minimum number of AIO servers to 150, based on a SupportLine recommendation; the maximum setting of 300 is double the minimum.
During the benchmark, we noted the minimum setting of 150 was probably too high. The number of active AIO servers should fall between the minimum and maximum settings (pstat -a | grep -c aios). Being overconfigured doesn't hurt, so we didn't change the setting.
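For reference, the command-line equivalent of the smit panels (attribute names as on AIX 4.3; verify with lsattr before using) would look roughly like this:
lsattr -El aio0                                                             # view the current AIO configuration
chdev -l aio0 -a minservers=150 -a maxservers=300 -a autoconfig=available   # takes effect after reboot
pstat -a | grep -c aios                                                     # count the AIO servers currently active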
Network Settings
Description | Benchmark Setting | Command Used to Change Setting | Command Used to View Setting
All settings were made in advance based on experience; no network bottlenecks were observed. | sb_max = 1310720, tcp_sendspace = 221184, tcp_recvspace = 221184 | no -o sb_max=1310720; no -o tcp_sendspace=221184; no -o tcp_recvspace=221184 | no -a
All of the network settings were changed before the start of the benchmark. The changes were based on prior experience, not on observed bottlenecks.
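Because "no" changes do not survive a reboot on AIX 4.3, the usual practice is to append them to a startup script such as /etc/rc.net (shown here as an illustration, not as part of the benchmark procedure):
no -o sb_max=1310720
no -o tcp_sendspace=221184
no -o tcp_recvspace=221184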
Comments/Observations
System performance: the CPU averaged less than 30% utilization. Wait I/O averaged below 15%. Disk activity was equally distributed across all disks. No system bottlenecks were observed during the benchmark.
Read-only databases can have significant write content. This benchmark exhibited periods of high "write" activity even though the queries were read-only; the sources of write I/O included temp file writes for the reporting queries and the paging of excess JDBC connections (both described below).
Memory requirements: The primary users of memory were Oracle's SGA and the JDBC connections. The Oracle SGA used 4-5 GB. (It could have been larger, but was constrained to match the size of the competitive server.) Each JDBC connection between the client and the database required roughly 3 MB of memory on the p680, so 20,000 connections needed on the order of 60 GB. The p680 ran out of memory at about 20,000 connections, and the excess connections were paged to disk. The large number of connections did not significantly affect database performance once the excess connections had been paged out.
Data Caching: AIX and Oracle did a good job of caching data in memory. There was almost no disk I/O with 400 active users running simple queries. Most of the remaining I/O was associated with writing temp files.
Because of the low disk activity, I would recommend the default 8 GB Shark cache over the 32 GB cache used in the benchmark.
Mixing simple and complex queries required more resources than if run separately. There appeared to be an interaction between simple lookup and long running queries that reduced throughput. The effect is difficult to quantify, but I'd estimate it to be on the order of 10-15% added overhead.
Bruce Spencer,
baspence@us.ibm.com