Date: September 23, 2001
Do you need to consolidate applications on fewer servers, but are concerned about meeting Service Levels?
Do you have a poorly behaving application that overloads the system to the point you can't log in?
Would you like a simple tool to view CPU, memory and I/O utilization by application?
If so, try AIX's Work Load Manager. WLM lets you specify the amount of CPU, memory, and I/O bandwidth available to each user, group, or application during peak periods, which addresses all of the issues above.
To illustrate, I've attached two files containing a 10-minute WLM demonstration that you can run on your server. The demonstration includes two sets of WLM profiles and a load generation program to simulate CPU, memory, or I/O activity. See below for instructions.
AIX Tip of the Week: Work Load Manager Starter Kit
Bruce Spencer, baspence@us.ibm.com
September 22, 2001
WLM Demo Download: wlm_demo.tar.Z
The instructions below are also available in PDF format: wlm_demo.pdf
Background
UNIX servers typically run a single application or database because of the difficulty of maintaining service levels with mixed workloads. This can result in inefficient server utilization and higher operating costs. One solution is AIX's Work Load Manager (WLM), which helps maintain service levels by arbitrating contention for CPU, memory, and I/O bandwidth.
I've found the best way to learn WLM is simply to use it. Therefore this tip is a "hands-on" demonstration that you can run on your system. It includes the WLM profiles and a load generation program that illustrate two basic ways WLM can be used to manage workloads.
Demonstration
This demonstration illustrates two ways to configure WLM to resolve CPU contention among three users. In the first case, we simulate three users sharing the same server. When the system is under load, we want to "time slice" the CPU unequally so that user1 gets 50% of the CPU, user2 gets 30% and user3 gets 20%.
In the second case, we simulate a server where we have one primary user and two secondary users. The primary user is granted full access to the system so that it takes precedence over all other applications. Secondary users get whatever is left over. If the primary user needs resources, the secondary users must wait.
Although this demonstration only controls user access to the CPU, it is straightforward to extend it to control access by groups and applications to CPU, memory, and I/O.
Setup
Test 1: Concurrent Users with Different Time Slices
Objective: provide concurrent access to multiple groups. However, each group has different priority when the system is under load.
Method: Use "shares" to prioritize CPU access.
WLM's Formula for determining CPU Access Under Load
CPU(i) = shares(i) / (total active shares)
Predicted CPU Utilization

  Who's Active   Active Shares   User1 CPU     User2 CPU     User3 CPU
                                 (50 shares)   (30 shares)   (20 shares)
  All users      100             50%           30%           20%
  Users 2, 3     50              -             60%           40%
  Users 1, 3     70              71%           -             29%
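The predicted figures follow directly from the share formula. As a quick check (this helper is not part of the demo tarball), a small awk sketch reproduces each row of the table from the active shares:

```shell
# Predict each active class's CPU% from its WLM shares.
# Usage: predict <shares...>  (one value per active class in the tier)
predict() {
    echo "$@" | awk '{
        total = 0
        for (i = 1; i <= NF; i++) total += $i
        s = ""
        for (i = 1; i <= NF; i++) {
            s = s sprintf("%.0f%%", 100 * $i / total)
            if (i < NF) s = s " "
        }
        print s
    }'
}

predict 50 30 20   # all users active  -> 50% 30% 20%
predict 30 20      # users 2 and 3     -> 60% 40%
predict 50 20      # users 1 and 3     -> 71% 29%
```

Note that the third row only works out if the active users are 1 and 3: 50 + 20 = 70 active shares, giving 50/70 = 71% and 20/70 = 29%.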
The following results are from an RS/6000 C10 (uniprocessor) running AIX 4.3.3 ML7.
System:
    CPUshares = 10
    memorymin = 1
    memorymax = 100

Default:

User1:
    description = "High priority jobs"
    tier = 1
    CPUshares = 5

User2:
    description = "Medium priority"
    tier = 1
    CPUshares = 3

User3:
    description = "Low priority"
    tier = 1
    CPUshares = 2
Comment: in the output above, User1, User2 and User3 are "classes" (groups that access resources), not user ids. In this demo the class and user id share the same name: user1 is in class User1, user2 in class User2, and so on. In practice the names will usually not match, since a class may contain multiple user ids, group ids, and executable names.
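The assignment of user ids to classes is made in the rules file of the active WLM configuration (for example /etc/wlm/standard/rules; the directory name here is an assumption). A sketch of what the demo's mapping might look like, with the usual class/reserved/user/group/application columns; check the exact layout against the WLM documentation for your AIX level:

```
* class    resvd   user    group   application
User1      -       user1   -       -
User2      -       user2   -       -
User3      -       user3   -       -
Default    -       -       -       -
```

Rules are matched in order, so the catch-all Default line goes last.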
For the moment, ignore the System class and its 10 CPU shares. The System class includes all root processes and daemons. System is in "tier 0", which means it gets any resources it needs before "tier 1" users. The WLM formula for CPU access therefore only considers shares within the same tier.
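The demo tarball's load generation program is not reproduced here. As a stand-in, a minimal CPU burner run under each test user might look like the following (the iteration count and the su invocation are illustrative assumptions, not the tarball's actual program):

```shell
# Burn CPU for a fixed number of loop iterations, then exit.
# In the demo you would start one copy per test user, e.g.:
#   su - user1 -c '/tmp/burn.sh &'
burn() {
    i=0
    n=${1:-100000}
    while [ "$i" -lt "$n" ]; do
        i=$((i + 1))
    done
}

burn 10000
echo "load complete"
```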
With a load running only as user3, wlmstat reports:

Name           CPU  MEM
Unclassified     0   20
System           3   41
Default          0    4
User1            0    0
User2            0    0
User3           92    7
Comment: this illustrates a WLM advantage over physical partitioning. When there is no contention, any process can have full access to the system resources. In contrast, an application in a partitioned system can only access resources within its partition, even if all other partitions are idle. Jobs in a WLM environment have more resources and generally perform better than if partitioning were used on the same system.
With loads running as user2 and user3:

Name           CPU  MEM
Unclassified     0   20
System           3   41
Default          0    4
User1            0    0
User2           52    9
User3           40   10
Comment: not quite the 60/40 User2/User3 split we expected, but close. Over the long term it should average out.
With all three users running loads:

Name           CPU  MEM
Unclassified     0   20
System           3   43
Default          0    4
User1           43    7
User2           28   10
User3           20   10
Comment: the observed 43/28/20 ratio is close to the expected 50/30/20 ratio.
kill %1
kill %2
kill %3
Test 1 Conclusions
Test 2: Primary User Gets All Resources, Secondary Users Get Unused Cycles
Objective: the primary application has full access to the server. Lower priority applications may only access whatever is left over. When the primary application needs resources, lower priority applications wait.
Method: use "tiers" to control access. Tiers range from 0 to 9, where 0 is the highest priority and 9 the lowest. Users in tier 0 get all the resources they need. Users in tier 1 can access any resources left over after tier 0, tier 2 users get whatever is left after tier 1, and so on. (Shares have no effect between tiers. For example, a tier 0 user with 1 share has priority over a tier 1 user with 1000 shares.) In the demonstration, the users were assigned the following tiers:
Root = tier 0
User1 = tier 1
User2 = tier 2
User3 = tier 3
System:
    CPUshares = 10
    memorymin = 1
    memorymax = 100

Default:

User3:
    description = "Low priority"
    tier = 3
    CPUshares = 5

User2:
    description = "Medium priority"
    tier = 2
    CPUshares = 3

User1:
    description = "High priority jobs"
    tier = 1
    CPUshares = 2
Comment: we're using tiers to control access here, with each user in a different tier. CPUshares have no effect across tiers. Tier 0 has the highest priority and tier 9 the lowest: all users in tier n get resources before users in tier n+1, regardless of CPUshares.
With a load running only as user3:

Name           CPU  MEM
Unclassified     0   20
System           4   40
Default          0    4
User1            0    0
User2            0    0
User3           87    4
Comment: user3 (the lowest priority user) can access the entire system when there is no contention for resources.
With loads running as user2 and user3:

Name           CPU  MEM
Unclassified     0   20
System           3   41
Default          0    8
User1            0    0
User2           89    9
User3            2    8
Comment: working as expected. User2 is in tier 2, which has priority over user3 in tier 3. Notice that shares have no effect.
With all three users running loads:

Name           CPU  MEM
Unclassified     0   20
System           3   42
Default          0    4
User1           88   10
User2            2   10
User3            1    8
Comment: as expected. The highest priority user gets full access if needed. Lower priority tiers must wait.
kill %1
kill %2
kill %3
Test 2 Conclusions
Discussion
Here are a couple of suggestions for follow-on tests:
There are many ways to configure WLM. Configurations will differ by applications, Service Level Agreements, and administrator preferences. As an administrator, my preference is to add classes starting at "tier 1" and above, leaving "tier 0" for "root" processes only. This means root processes have precedence over all applications (assuming the applications are started by non-root ids), which preserves root access when an application problem might otherwise "hang" the system. This configuration has minimal impact on performance: as we saw in the tests, System overhead was only about 3%.
Another useful WLM function is the wlmstat command. As we saw in the tests, wlmstat summarizes CPU and memory utilization by class. This output can be used for performance monitoring, problem determination, and capacity planning. You can use wlmstat even if you are not using WLM to control workloads: define the "classes" according to your needs, and start WLM in "passive" (monitor-only) mode with wlmcntrl -p.
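For instance, periodic wlmstat samples can be reduced to per-class CPU figures for trending. A sketch, assuming the "Name CPU MEM" layout shown in the tests above (column positions may differ across AIX levels, so verify against your system's output):

```shell
# Extract the CPU column for a given class from wlmstat-style output.
# Assumes the "Name CPU MEM" layout shown in the tests above.
cpu_for_class() {
    awk -v c="$1" '$1 == c { print $2 }'
}

# Sample output captured during Test 1 (all three users loaded):
sample='Name CPU MEM
Unclassified 0 20
System 3 43
User1 43 7
User2 28 10
User3 20 10'

echo "$sample" | cpu_for_class User1   # -> 43
```

On a live system you would pipe `wlmstat <interval> <count>` through the same filter instead of the captured sample.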
Summary
Bruce Spencer
IBM
baspence@us.ibm.com
Appendix
WLM Command Summary
Function                          Command
--------                          -------
SMIT fastpath                     smit wlm
Start WLM                         wlmcntrl
Stop WLM                          wlmcntrl -o
Check whether WLM is running      wlmcntrl -q
View system performance           wlmstat [interval] [repetitions]
List configuration                lsclass -f
WLM Documentation
IBM Redbooks: http://www.redbooks.ibm.com (Search => WLM)
AIX Documentation: http://www.rs6000.ibm.com/cgi-bin/ds_form (Search => WLM)