This chapter discusses performance issues related to POWER4-based servers.
The POWER4 microprocessor provides many performance improvements over previous microprocessor architectures.
The following table compares key aspects of different microprocessors used on the IBM server line.
Feature | POWER3 | RS64 | POWER4 |
---|---|---|---|
Frequency | 450 MHz | 750 MHz | > 1 GHz |
Fixed Point Units | 3 | 2 | 2 |
Floating Point Units | 2 | 1 | 2 |
Load/Store Units | 2 | 1 | 2 |
Branch/Other Units | 1 | 1 | 2 |
Dispatch Width | 4 | 4 | 5 |
Branch Prediction | Dynamic | Static | Dynamic |
I-cache size | 32 KB | 128 KB | 64 KB |
D-cache size | 128 KB | 128 KB | 32 KB |
L2-cache size | 1, 4, 8 MB | 2, 4, 8, 16 MB | 1.44 MB |
L3-cache size | N/A | N/A | Scales with number of processors |
Data Prefetch | Yes | No | Yes |
Beginning with AIX 5.1 running on POWER4-based systems, the operating system provides several scalability advantages over previous systems, both in terms of workload and performance. Workload scalability refers to the ability to handle an increasing application workload. Performance scalability refers to maintaining an acceptable level of performance as software resources increase to meet the demands of larger workloads.
The following are some of the most important scalability changes introduced in AIX 5.1.
AIX 4.3.3 and AIX 5.1 enable memory pages to be kept in real memory at all times. This mechanism is called pinning memory. Pinning a memory region prevents the pager from stealing the pages that back the region.
For more information on pinned memory, see Resource Management Overview.
The maximum real-memory size supported by the 64-bit kernel is 256 GB. This size is based upon the boot-time real memory requirements of hardware systems and possible I/O configurations that the 64-bit kernel supports. No minimum paging-space size requirement exists for the 64-bit kernel. This is because deferred paging-space allocation support was introduced into the kernel base in AIX 4.3.3.
Beginning with AIX 5.1, the POWER4 processor in the IBM eServer pSeries systems supports two virtual page sizes. It supports the traditional POWER architecture 4 KB page size, as well as the 16 MB page size, which is referred to as large page. AIX supports large page usage by both 32- and 64-bit applications and both the 32- and 64-bit versions of the AIX kernel support large pages.
Large page usage is primarily intended to provide performance improvements to high performance computing (HPC) applications. Memory-access intensive applications that use large amounts of virtual memory may obtain performance improvements by using large pages. The large page performance improvements are attributable to reduced translation lookaside buffer (TLB) misses due to the TLB being able to map a larger virtual memory range. Large pages also improve memory prefetching by eliminating the need to restart prefetch operations on 4 KB boundaries.
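As a rough illustration of why larger pages reduce TLB misses, the following sketch computes how much virtual memory a TLB can map at one time (its "reach") for each page size, and how many pages are needed to back a region. The entry count of 1024 is an assumed, hypothetical figure chosen only to make the arithmetic concrete; it is not a documented POWER4 value. The function names are illustrative as well.

```python
KB = 1024
MB = 1024 * KB

SMALL_PAGE = 4 * KB    # traditional POWER page size (from the text)
LARGE_PAGE = 16 * MB   # large page size (from the text)

# ASSUMPTION: hypothetical TLB size, used only for illustration.
ASSUMED_TLB_ENTRIES = 1024


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of virtual memory that `entries` TLB entries can map at once."""
    return entries * page_size


def pages_needed(region_bytes: int, page_size: int) -> int:
    """Number of pages needed to back a region, rounded up."""
    return -(-region_bytes // page_size)


if __name__ == "__main__":
    for name, size in (("4 KB", SMALL_PAGE), ("16 MB", LARGE_PAGE)):
        reach_mb = tlb_reach(ASSUMED_TLB_ENTRIES, size) // MB
        print(f"{name} pages: TLB reach = {reach_mb} MB")
```

Whatever the real entry count is, each large page entry covers 4096 times as much memory as a 4 KB entry, so a memory-access intensive application touches far fewer distinct translations.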
AIX must be configured to use large pages. Large pages are pinned (locked into physical memory). They are not pageable and they cannot be used interchangeably with memory for standard programs. By default, no memory is allocated to the large page physical memory pool, so the amount of physical memory to be used to back large pages must be specified. The vmtune command is used to configure the size of the large page physical memory pool. The following command allocates 4 GB to the large page physical memory pool:
# vmtune -g 16777216 -L 256
The -g flag specifies the large page size in bytes. The allowable values are 16777216 (16 MB) or 268435456 (256 MB). The -L flag is the number of the -g sized blocks that are allocated to the large page physical memory pool. While the 268435456 (256 MB) size is supported by the vmtune command, on POWER4 architecture machines, the storage is managed in 16 MB size pages.
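The relationship between the pool size and the two flags is simple arithmetic: the pool size is the -g value multiplied by the -L value. The following sketch derives the -L value for a desired pool size and formats the corresponding command line. It uses only the -g and -L flags described above; the function names are hypothetical, and the requirement that the pool be a whole multiple of the page size is an assumption made for this example.

```python
LARGE_PAGE_BYTES = 16777216  # 16 MB, the -g value used in the text


def large_page_blocks(pool_bytes: int, page_bytes: int = LARGE_PAGE_BYTES) -> int:
    """Return the -L block count for a pool of `pool_bytes`.

    ASSUMPTION: the pool must be a positive whole multiple of the page size.
    """
    if pool_bytes <= 0 or pool_bytes % page_bytes != 0:
        raise ValueError("pool size must be a positive multiple of the page size")
    return pool_bytes // page_bytes


def vmtune_command(pool_bytes: int, page_bytes: int = LARGE_PAGE_BYTES) -> str:
    """Format the vmtune invocation for the requested pool size."""
    blocks = large_page_blocks(pool_bytes, page_bytes)
    return f"vmtune -g {page_bytes} -L {blocks}"


if __name__ == "__main__":
    four_gb = 4 * 1024**3
    print(vmtune_command(four_gb))
```

For a 4 GB pool this reproduces the command shown earlier, because 4 GB divided by 16 MB is 256 blocks.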
Before the new large page memory pool size can take effect, run the bosboot command and then reboot the system.
To use large pages for shared memory, the SHM_PIN parameter for the shmget() subroutine must be enabled, and it must be enabled again at each system boot. It might be beneficial to include the following vmtune command in the /etc/inittab file so that the SHM_PIN parameter is enabled automatically during system boot:
# vmtune -S 1
For more detailed information, see AIX Support for Large Page.
Beginning with AIX 5.1, the operating system provides a 64-bit kernel that addresses bottlenecks which could have limited throughput on 32-way systems. POWER4 systems are optimized for the 64-bit kernel, which is intended to increase scalability of RS/6000 IBM eServer pSeries systems. It is optimized for running 64-bit applications on POWER4 systems. The code base for the 64-bit kernel is almost identical to that for the 32-bit kernel. However, 64-bit code is built using a more advanced compiler.
The 64-bit kernel also improves scalability by allowing you to use larger sizes of physical memory. The 32-bit kernel is limited to 96 GB of physical memory.
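The memory limits quoted in this chapter (96 GB for the 32-bit kernel, 256 GB for the 64-bit kernel) can be turned into a quick check when planning a configuration. The following is a minimal sketch; the table and function name are illustrative, and the limits apply only to the AIX 5.1 figures stated here.

```python
# Maximum real memory per kernel, in GB, as stated in this chapter for AIX 5.1.
MAX_REAL_MEMORY_GB = {"32-bit": 96, "64-bit": 256}


def kernels_supporting(memory_gb: int) -> list:
    """Return the kernels whose stated limit covers `memory_gb` of real memory."""
    if memory_gb <= 0:
        raise ValueError("memory size must be positive")
    return [name for name, limit in MAX_REAL_MEMORY_GB.items() if memory_gb <= limit]


if __name__ == "__main__":
    for size in (64, 128, 512):
        print(size, "GB:", kernels_supporting(size) or "exceeds both stated limits")
```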
The performance of 64-bit applications running on the 64-bit kernel on POWER4-based systems should be greater than, or equal to, that of the same application running on the same hardware with the 32-bit kernel. The 64-bit kernel allows 64-bit applications to be supported without requiring system call parameters to be remapped or reshaped. The 64-bit kernel applications use a more advanced compiler that is optimized specifically for the POWER4 system.
In most instances, 32-bit applications can run on the 64-bit kernel without performance degradation. However, 32-bit applications on the 64-bit kernel typically have slightly lower performance than on the 32-bit kernel because of parameter reshaping. This performance degradation is typically not greater than 5%. For example, calling the fork() subroutine might result in significantly more overhead.
The performance of 64-bit applications under the 64-bit kernel on non-POWER4 systems may be lower than that of the same applications on the same hardware under the 32-bit kernel. The non-POWER4 systems are intended as a bridge to POWER4 systems and lack some of the support that is needed for optimal 64-bit kernel performance.
The performance of 64-bit kernel extensions on POWER4 systems should be the same as or better than that of their 32-bit counterparts on the same hardware. However, performance of 64-bit kernel extensions on non-POWER4 machines may be lower than that of 32-bit kernel extensions on the same hardware because of the lack of optimization for 64-bit kernel performance on non-POWER4 systems.
Enhanced JFS (also known as JFS2) provides better scalability than JFS. Additionally, JFS2 is the default file system for the 64-bit kernel. You can choose to use either JFS, which is the recommended file system for 32-bit environments, or Enhanced JFS, which is recommended for the 64-bit kernel. For more information on Enhanced JFS, see Monitoring and Tuning File Systems.
Monitoring and Tuning File Systems
IBM Redbook The POWER4 Processor Introduction and Tuning Guide