Date: September 23, 2001
Do you need to consolidate applications on fewer servers, but are concerned about meeting Service Levels?
Do you have a poorly behaving application that overloads the system to the point you can't log in?
Would you like a simple tool to view CPU, memory and I/O utilization by application?
If so, try AIX's Work Load Manager. WLM lets you specify the amount of CPU, memory, and I/O bandwidth available to each user, group, or application during peak periods, which addresses all of the issues above.
To illustrate, I've attached two files containing a 10-minute WLM demonstration that you can run on your server. The demonstration includes two sets of WLM profiles and a load generation program to simulate CPU, memory, or I/O activity. See below for instructions.
AIX Tip of the Week: Work Load Manager Starter Kit
Bruce Spencer, baspence@us.ibm.com
September 22, 2001
WLM Demo Download: wlm_demo.tar.Z
The instructions below are also available in PDF format: wlm_demo.pdf
Background
UNIX servers typically run a single application or database because of the difficulty of maintaining service levels with mixed workloads. This can result in inefficient server utilization and higher operating costs. One solution is AIX's Work Load Manager (WLM), which helps maintain service levels by arbitrating contention for CPU, memory, and I/O bandwidth.
I've found the best way to learn WLM is simply to use it. Therefore this tip is a "hands-on" demonstration that you can run on your system. It includes the WLM profiles and a load generation program that illustrate two basic ways WLM can be used to manage workloads.
Demonstration
This demonstration illustrates two ways to configure WLM to resolve CPU contention among three users. In the first case, we simulate three users sharing the same server. When the system is under load, we want to "time slice" the CPU unequally so that user1 gets 50% of the CPU, user2 gets 30% and user3 gets 20%.
In the second case, we simulate a server where we have one primary user and two secondary users. The primary user is granted full access to the system so that it takes precedence over all other applications. Secondary users get whatever is left over. If the primary user needs resources, the secondary users must wait.
Although this demonstration only controls user access to the CPU, it is straightforward to extend it to control access by groups and applications to CPU, memory, and I/O.
Setup
Test 1: Concurrent Users with Different Time Slices
Objective: provide concurrent access to multiple groups. However, each group has different priority when the system is under load.
Method: Use "shares" to prioritize CPU access.
WLM's Formula for determining CPU Access Under Load
CPU(i) = shares(i) / (total active shares)
Predicted CPU Utilization

  Who's Active   Active Shares   User1 CPU     User2 CPU     User3 CPU
                                 (50 shares)   (30 shares)   (20 shares)
  All users      100             50%           30%           20%
  Users 2, 3     50              -             60%           40%
  Users 1, 3     70              71%           -             29%
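The predicted figures follow directly from the share formula. As a quick check (this helper is not part of the demo tarball), a small awk sketch reproduces each row of the table from the active shares:

```shell
# Predict each active class's CPU% from its WLM shares.
# Usage: predict <shares...>  (one value per active class in the tier)
predict() {
    echo "$@" | awk '{
        total = 0
        for (i = 1; i <= NF; i++) total += $i
        s = ""
        for (i = 1; i <= NF; i++) {
            s = s sprintf("%.0f%%", 100 * $i / total)
            if (i < NF) s = s " "
        }
        print s
    }'
}

predict 50 30 20   # all users active  -> 50% 30% 20%
predict 30 20      # users 2 and 3     -> 60% 40%
predict 50 20      # users 1 and 3     -> 71% 29%
```

Note that the third row only works out if the active users are 1 and 3: 50 + 20 = 70 active shares, giving 50/70 = 71% and 20/70 = 29%.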
The following results are from an RS/6000 C10 (uniprocessor) running AIX 4.3.3 ML7.
System:
    CPUshares = 10
    memorymin = 1
    memorymax = 100

Default:

User1:
    description = "High priority jobs"
    tier = 1
    CPUshares = 5

User2:
    description = "Medium priority"
    tier = 1
    CPUshares = 3

User3:
    description = "Low priority"
    tier = 1
    CPUshares = 2
Comment: in the output above, User1, User2 and User3 are "classes" (groups that access resources), not user ids. In this demo the class and user id share the same name: user1 is in class User1, user2 in class User2, and so on. In practice the names will usually not match, since a class may contain multiple user ids, group ids, and executable names.
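The assignment of user ids to classes is made in the rules file of the active WLM configuration (for example /etc/wlm/standard/rules; the directory name here is an assumption). A sketch of what the demo's mapping might look like, with the usual class/reserved/user/group/application columns; check the exact layout against the WLM documentation for your AIX level:

```
* class    resvd   user    group   application
User1      -       user1   -       -
User2      -       user2   -       -
User3      -       user3   -       -
Default    -       -       -       -
```

Rules are matched in order, so the catch-all Default line goes last.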
For the moment, ignore the System class and its 10 CPU shares. The System class includes all root processes and daemons. System is in "tier 0", which means it gets any resources it needs before "tier 1" users. The WLM formula for CPU access therefore only considers shares within the same tier.
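The demo tarball's load generation program is not reproduced here. As a stand-in, a minimal CPU burner run under each test user might look like the following (the iteration count and the su invocation are illustrative assumptions, not the tarball's actual program):

```shell
# Burn CPU for a fixed number of loop iterations, then exit.
# In the demo you would start one copy per test user, e.g.:
#   su - user1 -c '/tmp/burn.sh &'
burn() {
    i=0
    n=${1:-100000}
    while [ "$i" -lt "$n" ]; do
        i=$((i + 1))
    done
}

burn 10000
echo "load complete"
```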
With a load running only as user3, wlmstat reports:

Name           CPU  MEM
Unclassified     0   20
System           3   41
Default          0    4
User1            0    0
User2            0    0
User3           92    7
Comment: this illustrates a WLM advantage over physical partitioning. When there is no contention, any process can have full access to the system resources. In contrast, an application in a partitioned system can only access resources within its partition, even if all other partitions are idle. Jobs in a WLM environment have more resources and generally perform better than if partitioning were used on the same system.
With loads running as user2 and user3:

Name           CPU  MEM
Unclassified     0   20
System           3   41
Default          0    4
User1            0    0
User2           52    9
User3           40   10
Comment: not quite the 60/40 User2/User3 split we expected, but close. Over the long term it should average out.
With all three users running loads:

Name           CPU  MEM
Unclassified     0   20
System           3   43
Default          0    4
User1           43    7
User2           28   10
User3           20   10
Comment: the observed 43/28/20 ratio is close to the expected 50/30/20 ratio.
kill %1
kill %2
kill %3
Test 1 Conclusions
Test 2: Primary User Gets All Resources, Secondary Users Get Unused Cycles
Objective: the primary application has full access to the server. Lower priority applications may only access whatever is left over. When the primary application needs resources, lower priority applications wait.
Method: use "tiers" to control access. Tiers range from 0 to 9, where 0 is the highest priority and 9 the lowest. Users in tier 0 get all the resources they need. Users in tier 1 can access any resources left over after tier 0, tier 2 users get whatever is left after tier 1, and so on. (Shares have no effect between tiers. For example, a tier 0 user with 1 share has priority over a tier 1 user with 1000 shares.) In the demonstration, the users were assigned the following tiers:
Root = tier 0
User1 = tier 1
User2 = tier 2
User3 = tier 3
System:
    CPUshares = 10
    memorymin = 1
    memorymax = 100

Default:

User3:
    description = "Low priority"
    tier = 3
    CPUshares = 5

User2:
    description = "Medium priority"
    tier = 2
    CPUshares = 3

User1:
    description = "High priority jobs"
    tier = 1
    CPUshares = 2
Comment: we're using tiers to control access here, with each user in a different tier. CPUshares have no effect across tiers. Tier 0 has the highest priority and tier 9 the lowest: all users in tier n get resources before users in tier n+1, regardless of CPUshares.
With a load running only as user3:

Name           CPU  MEM
Unclassified     0   20
System           4   40
Default          0    4
User1            0    0
User2            0    0
User3           87    4
Comment: user3 (the lowest priority user) can access the entire system when there is no contention for resources.
With loads running as user2 and user3:

Name           CPU  MEM
Unclassified     0   20
System           3   41
Default          0    8
User1            0    0
User2           89    9
User3            2    8
Comment: working as expected. User2 is in tier 2, which has priority over user3 in tier 3. Notice that shares have no effect.
With all three users running loads:

Name           CPU  MEM
Unclassified     0   20
System           3   42
Default          0    4
User1           88   10
User2            2   10
User3            1    8
Comment: as expected. The highest priority user gets full access if needed. Lower priority tiers must wait.
kill %1
kill %2
kill %3
Test 2 Conclusions
Discussion
Here are a couple of suggestions for follow-on tests:
There are many ways to configure WLM. Configurations will differ by applications, Service Level Agreements, and administrator preferences. As an administrator, my preference is to add classes starting at "tier 1" and above, leaving "tier 0" for "root" processes only. This means root processes have precedence over all applications (assuming the applications are started by non-root ids), which preserves root access when an application problem might otherwise "hang" the system. This configuration has minimal impact on performance: as we saw in the tests, System overhead was only about 3%.
Another useful WLM function is the wlmstat command. As we saw in the tests, wlmstat summarizes CPU and memory utilization by class. This output can be used for performance monitoring, problem determination, and capacity planning. You can use wlmstat even if you are not using WLM to control workloads: define the "classes" according to your needs, and start WLM in "passive" (monitor-only) mode with wlmcntrl -p.
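For instance, periodic wlmstat samples can be reduced to per-class CPU figures for trending. A sketch, assuming the "Name CPU MEM" layout shown in the tests above (column positions may differ across AIX levels, so verify against your system's output):

```shell
# Extract the CPU column for a given class from wlmstat-style output.
# Assumes the "Name CPU MEM" layout shown in the tests above.
cpu_for_class() {
    awk -v c="$1" '$1 == c { print $2 }'
}

# Sample output captured during Test 1 (all three users loaded):
sample='Name CPU MEM
Unclassified 0 20
System 3 43
User1 43 7
User2 28 10
User3 20 10'

echo "$sample" | cpu_for_class User1   # -> 43
```

On a live system you would pipe `wlmstat <interval> <count>` through the same filter instead of the captured sample.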
Summary
Bruce Spencer
IBM
baspence@us.ibm.com
Appendix
WLM Command Summary
Function                          Command
--------                          -------
SMIT fastpath                     smit wlm
Start WLM                         wlmcntrl
Stop WLM                          wlmcntrl -o
Check whether WLM is running      wlmcntrl -q
View system performance           wlmstat [interval] [repetitions]
List configuration                lsclass -f
WLM Documentation
IBM Redbooks: http://www.redbooks.ibm.com (Search => WLM)
AIX Documentation: http://www.rs6000.ibm.com/cgi-bin/ds_form (Search => WLM)