AIX Tip of the Week

Work Load Manager Demonstration

Audience: System Administrators

Date: September 23, 2001

Do you need to consolidate applications on fewer servers, but are concerned about meeting Service Levels?

Do you have a poorly behaving application that overloads the system to the point you can't log in?

Would you like a simple tool to view CPU, memory and I/O utilization by application?

If so, try AIX's Work Load Manager (WLM). WLM lets you specify how much CPU, memory, and I/O bandwidth is available to each user, group or application during peak periods, which addresses all of the above issues.

To illustrate, I've attached two files containing a 10-minute WLM demonstration that can be run on your server. The demonstration includes two sets of WLM profiles and a load generation program to simulate CPU, memory or I/O activity. See below for instructions.

WLM Demo Download: wlm_demo.tar.Z
Instructions from below in PDF format: wlm_demo.pdf

Bruce Spencer,
baspence@us.ibm.com


AIX Tip of the Week: Work Load Manager Starter Kit

September 22, 2001

Background

UNIX servers typically run a single application or database due to the difficulty in maintaining service levels with mixed workloads. This can result in inefficient server utilization and higher operating costs. One solution is AIX's Work Load Manager, which helps maintain service levels by arbitrating contention for CPU, memory and I/O bandwidth.

I've found the best way to get started is to use WLM. Therefore this tip is a "hands on" demonstration that you can run on your system. It includes the WLM profiles and a load generation program that illustrates two basic ways WLM can be used to manage workloads.

Demonstration

This demonstration illustrates two ways to configure WLM to resolve CPU contention among three users. In the first case, we simulate three users sharing the same server. When the system is under load, we want to "time slice" the CPU unequally so that user1 gets 50% of the CPU, user2 gets 30% and user3 gets 20%.

In the second case, we simulate a server where we have one primary user and two secondary users. The primary user is granted full access to the system so that it takes precedence over all other applications. Secondary users get whatever is left over. If the primary user needs resources, the secondary users must wait.

Although this demonstration only controls user access to the CPU, it is trivial to extend it to control groups and applications accessing CPU, memory and I/O.

Setup

  1. If necessary, install the bos.rte.control fileset from the base AIX installation CDROM.
  2. Create three user ids: user1, user2, user3
  3. Unpack the wlm_demo.tar.Z file

 

 

Test 1: Concurrent Users with Different Time Slices

Objective: provide concurrent access to multiple groups, where each group has a different priority when the system is under load.

Method: Use "shares" to prioritize CPU access.

WLM's Formula for determining CPU Access Under Load

CPU-User(I) = shares(I)/(total active shares)

Predicted CPU Utilization

Who's Active   Active Shares   User1 CPU     User2 CPU     User3 CPU
                               (50 Shares)   (30 Shares)   (20 Shares)
All users      100             50%           30%           20%
User 2, 3      50              -             60%           40%
User 1, 3      70              71%           -             29%
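The predicted figures follow directly from the shares formula. As a minimal sketch (the function name predicted_cpu is illustrative and not part of WLM), the arithmetic can be checked in the shell by passing the shares of whichever users are active:

```shell
#!/bin/sh
# predicted_cpu SHARES...: print each active class's predicted CPU percentage,
# computed as shares(i) / (total active shares), rounded to the nearest integer.
# Hypothetical helper for checking the arithmetic; it does not call WLM.
predicted_cpu() {
    total=0
    for s in "$@"; do
        total=$((total + s))
    done
    # Nothing active: avoid dividing by zero.
    [ "$total" -gt 0 ] || return 1
    for s in "$@"; do
        # Integer rounding: add half the divisor before dividing.
        echo $(( (s * 100 + total / 2) / total ))
    done
}

# All three users active (50, 30 and 20 shares): prints 50, 30, 20
predicted_cpu 50 30 20
# Only user2 and user3 active: prints 60, 40
predicted_cpu 30 20
```

Only the ratio between shares matters, so 5, 3 and 2 shares give the same percentages as 50, 30 and 20.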

The following results are from an RS/6000 C10 (uniprocessor) at AIX 4.3.3 ML7.

  1. Log on as root
  2. Check the WLM configuration: lsclass -f

    System:
        CPUshares = 10
        memorymin = 1
        memorymax = 100

    Default:

    User1:
        description = "High priority jobs"
        tier   = 1
        CPUshares = 5

    User2:
        description = "Medium priority"
        tier   = 1
        CPUshares = 3

    User3:
        description = "Low priority"
        tier   = 1
        CPUshares = 2

    Comment: in the output above, User1, User2 and User3 are "classes" (groups that access resources), not user ids. In this demo, the class and user id share a name because I put user1 in class User1, user2 in class User2, and so on. In practice the names will usually differ, since a class may contain multiple user ids, group ids, and executable names. The shares of 5, 3 and 2 are in the same ratio as the 50, 30 and 20 used for the predictions, so the predicted percentages are unchanged.

    For the moment, ignore the System class and its 10 CPU shares. The System class includes all root processes and daemons. System is in "tier 0", which means it gets any resources it needs before "tier 1" users. The WLM formula for CPU access only considers shares in the same tier.

  3. Start WLM using "test1" profiles: wlmcntrl -d /etc/wlm/test1
  4. Start user3's workload: su - user3 -c "/etc/wlm/loadgen -t 500" >/dev/null &
  5. View steady state utilization by "class" with only user3 active: wlmstat 20 3

    Name            CPU MEM
    Unclassified    0   20
    System          3   41
    Default         0   4
    User1           0   0
    User2           0   0
    User3           92  7

    Comment: this illustrates a WLM advantage over physical partitioning. When there is no contention, any process can have full access to the system resources. In contrast, an application in a partitioned system can only access resources within its partition, even if all other partitions are idle. Jobs in a WLM environment have more resources and generally perform better than if partitioning were used on the same system.

  6. Start user2's workload: su - user2 -c "/etc/wlm/loadgen -t 500" >/dev/null &
  7. View the steady state utilization by "class" with user2 and user3 active: wlmstat 20 3

    Name            CPU MEM
    Unclassified    0   20
    System          3   41
    Default         0   4
    User1           0   0
    User2           52  9
    User3           40  10

    Comment: not quite the 60/40 user2/user3 split we expected, but close enough. Over the long term it should average out.

  8. Start user1's workload: su - user1 -c "/etc/wlm/loadgen -t 500" >/dev/null &
  9. View the steady state utilization by "class" with all three users: wlmstat 20 3

    Name            CPU MEM
    Unclassified    0   20
    System          3   43
    Default         0   4
    User1           43  7
    User2           28  10
    User3           20  10

    Comment: the observed 43/28/20 ratio is close to the expected 50/30/20 ratio.

  10. Stop the loadgen programs:

    kill %1
    kill %2
    kill %3

  11. Stop WLM: wlmcntrl -o
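The observed figures do not sum to 100 because the System class and idle time take a few percent. One way to compare them with the share ratios is to rescale the three user classes so that they sum to 100. The sketch below does this with awk. It assumes the "Name CPU MEM" column layout shown in the output above, and user_split is an illustrative name, not a WLM command.

```shell
#!/bin/sh
# user_split: read wlmstat-style "Name CPU MEM" lines on stdin and print each
# UserN class's CPU as a percentage of the CPU used by all UserN classes.
# Hypothetical helper; the column layout is assumed from the demo output.
user_split() {
    awk '
        $1 ~ /^User[0-9]+$/ { name[++n] = $1; cpu[n] = $2; total += $2 }
        END {
            if (total == 0) exit 1
            for (i = 1; i <= n; i++)
                printf "%s %.0f\n", name[i], cpu[i] * 100 / total
        }'
}

# Sample input: the three-user wlmstat figures from Test 1.
# Prints: User1 47, User2 31, User3 22
user_split <<'EOF'
Name            CPU MEM
Unclassified    0   20
System          3   43
Default         0   4
User1           43  7
User2           28  10
User3           20  10
EOF
```

Rescaled this way, the measured 47/31/22 split is within a few points of the configured 50/30/20.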

Test 1 Conclusions:

 

 

Test 2: Primary User Gets All Resources, Secondary Users Get Unused Cycles

Objective: Primary application has full access to the server. Lower priority applications may only access whatever is left over. When the primary application needs resources, lower priority applications wait.

Method: Use "tiers" to control access. Tiers range from 0-9, where 0 is the highest priority and 9 the lowest. Users in tier 0 get all the resources they need. Users in tier 1 can access any left over resources after tier 0 users. Tier 2 users get any left over resources after tier 1, and so on. (Shares have no effect between tiers. For example, a tier 0 user with 1 share has priority over a tier 1 user with 1000 shares.) In the demonstration, the users were assigned the following tiers.

Root = tier 0

User1 = tier 1

User2 = tier 2

User3 = tier 3
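The "left over" rule can be pictured with a deliberately simplified model: each tier, taken in priority order, receives what it asks for out of whatever the tiers above it did not use. The sketch below is only that model, not WLM's actual scheduler, and tier_alloc is an illustrative name. In the measured runs the lower tiers still receive a percent or two rather than exactly zero.

```shell
#!/bin/sh
# tier_alloc DEMAND...: demands are the CPU percentages requested by successive
# tiers, highest priority (tier 0) first. Each tier takes what it wants from
# whatever the tiers before it left over.
# Simplified illustration only; this is not how WLM is implemented.
tier_alloc() {
    left=100
    for want in "$@"; do
        if [ "$want" -gt "$left" ]; then
            got=$left
        else
            got=$want
        fi
        echo "$got"
        left=$((left - got))
    done
}

# Tier 0 (root) wants 3%; tiers 1, 2 and 3 each try to use the whole CPU.
# Prints: 3, 97, 0, 0
tier_alloc 3 100 100 100
```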

  1. Log on as root, "cd /etc/wlm"
  2. Check the WLM configuration: lsclass -f

    System:
        CPUshares = 10
        memorymin = 1
        memorymax = 100

    Default:

    User3:
        description = "High priority jobs"
        tier   = 3
        CPUshares = 5

    User2:
        description = "Medium priority"
        tier   = 2
        CPUshares = 3

    User1:
        description = "Low priority"
        tier   = 1
        CPUshares = 2

    Comment: we're using tiers to control access. In this example, each user is in a different tier. CPUshares have no effect between tiers. Tier 0 has the highest priority. Tier 9 has the lowest. All users in tier "n" get resources before users in tier "n+1", regardless of CPUshares.

  3. Start WLM using "test2" profiles: wlmcntrl -d /etc/wlm/test2
  4. Start user3's workload: su - user3 -c "/etc/wlm/loadgen -t 500" >/dev/null &
  5. View steady state system utilization by "class" with user3: wlmstat 20 3

    Name            CPU MEM
    Unclassified    0   20
    System          4   40
    Default         0   4
    User1           0   0
    User2           0   0
    User3           87  4

    Comment: user3 (lowest priority user) can access the entire system if there is no contention for resources.

  6. Start user2's workload: su - user2 -c "/etc/wlm/loadgen -t 500" >/dev/null &
  7. View steady state utilization by "class" with two users: wlmstat 20 3

    Name            CPU MEM
    Unclassified    0   20
    System          3   41
    Default         0   8
    User1           0   0
    User2           89  9
    User3           2   8

    Comment: working as expected. User2 is in tier 2, which has priority over user3 in tier 3. Notice that shares have no effect.

  8. Start user1's workload: su - user1 -c "/etc/wlm/loadgen -t 500" >/dev/null &
  9. View steady state utilization by "class" with three users: wlmstat 20 3

    Name            CPU MEM
    Unclassified    0   20
    System          3   42
    Default         0   4
    User1           88  10
    User2           2   10
    User3           1   8

    Comment: as expected. The highest priority user gets full access if needed. Lower priority tiers must wait.

  10. Stop the jobs:

    kill %1
    kill %2
    kill %3

  11. Stop WLM: wlmcntrl -o
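Both tests follow the same pattern: load a profile, add one user's workload at a time, sample with wlmstat after each addition, then clean up. As a rough sketch, the helper below prints that command sequence for a given profile directory and user order so it can be reviewed before being typed in. It only echoes text and runs nothing. The name demo_plan is illustrative, and the loadgen arguments are quoted on the assumption that su should receive them as a single command string.

```shell
#!/bin/sh
# demo_plan PROFILE_DIR USER...: print (do not run) the demo's command
# sequence for one profile directory and a list of users, in start order.
# Hypothetical helper; it only echoes the commands used in the steps above.
demo_plan() {
    profile=$1
    shift
    echo "wlmcntrl -d $profile"
    for u in "$@"; do
        echo "su - $u -c \"/etc/wlm/loadgen -t 500\" >/dev/null &"
        echo "wlmstat 20 3"
    done
    # One background job per user, numbered in the order they were started.
    n=1
    for u in "$@"; do
        echo "kill %$n"
        n=$((n + 1))
    done
    echo "wlmcntrl -o"
}

# Test 2 order: lowest priority user first, primary user last.
demo_plan /etc/wlm/test2 user3 user2 user1
```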

 

Test 2 Conclusions

 

Discussion

Here are a couple suggestions for follow on tests:

There are many ways to configure WLM. Configurations will differ by applications, Service Level Agreements, and administrator preferences. As an administrator, my preference is to add classes starting at "tier 1" and above. I leave "tier 0" for "root" processes only. This means root processes have precedence over all applications (assuming the application is started by a non-root id). This allows root access when there are application problems that might otherwise "hang" the system. This configuration has minimal impact on performance. As we saw in the test, System overhead was only 3%.

Another useful WLM function is the wlmstat command. As we saw in the tests, wlmstat summarizes CPU/Memory utilization by class. This output can be used for performance monitoring, problem determination, and capacity planning. You can use wlmstat, even if you are not using WLM to control workloads. To do so, define the "classes" according to your needs, and start WLM in the "passive" or "monitor only" mode (wlmcntrl -p).
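For monitoring, the per-class figures are easy to post-process. The sketch below picks out the classes at or above a CPU threshold. It assumes the "Name CPU MEM" column layout shown in the demo output, reads its data from standard input, and uses an illustrative function name (busy_classes). The sample input is the two-user snapshot from Test 2.

```shell
#!/bin/sh
# busy_classes LIMIT: read "Name CPU MEM" lines on stdin and print every class
# whose CPU column is at or above LIMIT percent, skipping the header line.
# Hypothetical helper; the column layout is assumed from the demo output.
busy_classes() {
    awk -v limit="$1" '
        $1 == "Name" { next }
        $2 + 0 >= limit + 0 { print $1, $2 }'
}

# Prints: User2 89
busy_classes 50 <<'EOF'
Name            CPU MEM
Unclassified    0   20
System          3   41
Default         0   8
User1           0   0
User2           89  9
User3           2   8
EOF
```

On a live system the same function could be fed from wlmstat instead of the sample text, provided the columns match.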

Summary

  1. WLM controls access only when there is contention for resources
  2. Use "shares" to control concurrent access
  3. Use "tiers" to run non-essential jobs in the background

 

Bruce Spencer
IBM
baspence@us.ibm.com

 

Appendix

 

WLM Command Summary

Function                          Command
Smit Fastpath                     smit wlm
Start WLM                         wlmcntrl
Stop WLM                          wlmcntrl -o
Check to See if WLM is Running    wlmcntrl -q
View System Performance           wlmstat [interval] [repetitions]
List Configuration                lsclass -f

   

 

WLM Documentation

IBM Redbooks: http://www.redbooks.ibm.com (Search => WLM)

AIX Documentation: http://www.rs6000.ibm.com/cgi-bin/ds_form (Search => WLM)