Performance Management Guide


Controlling Contention for the CPU

AIX Version 4 introduced the use of threads to control processor time consumption, but most of the system management tools still refer to the process in which a thread is running, rather than the thread itself.

Controlling the Priority of User Processes

User-process priorities can be manipulated using the nice or renice command or the setpri() subroutine, and displayed with the ps command. An overview of priority is provided in Process and Thread Priority.

Priority calculation is employed to accomplish the following:

The degree to which the user priorities can be manipulated is release-dependent. The algorithm for calculating priority value in AIX 4.3.1 and prior releases is more limiting than the current algorithm. The algorithm was changed in AIX 4.3.2 so that user priorities can be manipulated more than in previous releases. There is improved distinction between foreground and background processes. Using a given nice value will have a greater effect on CPU utilization. See Priority Calculation.

Running a Command with the nice Command

Any user can run a command at a less-favorable-than-normal priority by using the nice command. Only the root user can use the nice command to run commands at a more-favorable-than-normal priority. In this case, the nice command values range between -20 and 19.

With the nice command, the user specifies a value to be added to or subtracted from the standard nice value. The modified nice value is used for the process that runs the specified command. The priority of the process is still non-fixed; that is, the priority value is still recalculated periodically based on the CPU usage, nice value, and minimum user-process-priority value.

The standard nice value of a foreground process is 20 (24 for a ksh background process). The following command would cause the vmstat command to be run in the foreground with a nice value of 25 (instead of the standard 20), resulting in a less favorable priority.

# nice -n 5 vmstat 10 3 > vmstat.out

If you use the root login, the vmstat command can be run at a more favorable priority with the following:

# nice -n -5 vmstat 10 3 > vmstat.out

If you were not logged in as root and issued the preceding nice command, the vmstat command would still run, but at the standard nice value of 20, and the nice command would not issue an error message.
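The rules described above can be summarized in a short C sketch. The function name resulting_nice_value and its parameters are hypothetical and are introduced only for illustration; the function models the behavior described in this section and does not call the nice command or change the priority of any process.

```c
#include <assert.h>

/* Hypothetical helper: models the nice value that results from
   "nice -n increment" as described in the text. It does not change
   the priority of any real process. */
int resulting_nice_value(int standard, int increment, int is_root)
{
    /* The text gives -20 through 19 as the range of nice command values. */
    assert(increment >= -20 && increment <= 19);

    /* A non-root user cannot request a more favorable priority; the
       command still runs, but at the standard nice value. */
    if (increment < 0 && !is_root)
        return standard;

    return standard + increment;
}
```

With the standard foreground nice value of 20, resulting_nice_value(20, 5, 0) evaluates to 25, and resulting_nice_value(20, -5, 0) evaluates to 20 because the negative increment is ignored for a non-root user.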

Setting a Fixed Priority with the setpri Subroutine

An application that runs under the root user ID can use the setpri() subroutine to set its own priority or that of another process. For example:

retcode = setpri(0,59);

would give the current process a fixed priority of 59. If the setpri() subroutine fails, it returns -1.

The following program accepts a priority value and a list of process IDs and sets the priority of all of the processes to the specified value.

/*
   fixprocpri.c
   Usage: fixprocpri priority PID . . .
*/
 
#include <sys/sched.h>
#include <stdio.h>
#include <stdlib.h>      /* atoi(), exit() */
#include <sys/errno.h>
 
int main(int argc,char **argv)
{
   pid_t ProcessID;
   int Priority,ReturnP;
 
   if( argc < 3 ) {
      printf(" usage - fixprocpri priority pid(s) \n");
      exit(1);
   }
 
   argv++;
   Priority=atoi(*argv++);
   if ( Priority < 50 ) {
      printf(" Priority must be >= 50 \n");
      exit(1);
   }
 
   while (*argv) {
      ProcessID=atoi(*argv++);
      /* setpri() returns the former priority, or -1 on failure */
      ReturnP = setpri(ProcessID, Priority);
      if ( ReturnP > 0 )
          printf("pid=%d new pri=%d  old pri=%d\n",
            (int)ProcessID,Priority,ReturnP);
      else {
          perror(" setpri failed ");
          exit(1);
      }
   }
   return 0;
}

Displaying Process Priority with the ps Command

The -l (lowercase L) flag of the ps command displays the nice values and current priority values of the specified processes. For example, you can display the priorities of all of the processes owned by a given user with the following:

# ps -lu hoetzel
       F S UID  PID PPID   C PRI NI ADDR    SZ    WCHAN    TTY  TIME CMD
  241801 S 200 7032 7286   0  60 20 1b4c   108           pts/2  0:00 ksh
  200801 S 200 7568 7032   0  70 25 2310    88  5910a58  pts/2  0:00 vmstat
  241801 S 200 8544 6494   0  60 20 154b   108           pts/0  0:00 ksh

The output shows the result of the nice -n 5 command described previously. Process 7568 has an inferior priority of 70. (The ps command was run by a separate session in superuser mode, hence the presence of two TTYs.)

If one of the processes had used the setpri(10758, 59) subroutine to give itself a fixed priority, a sample ps -l output would be as follows:

       F S UID   PID  PPID   C PRI NI ADDR    SZ    WCHAN    TTY  TIME CMD
  200903 S   0 10758 10500   0  59 -- 3438    40  4f91f98  pts/0  0:00 fixpri

Modifying the Priority with the renice Command

The renice command alters the nice value, and thus the priority, of one or more processes that are already running. The processes are identified either by process ID, process group ID, or the name of the user who owns the processes.

For AIX Version 4, the syntax of the renice command has been changed to complement the alternative syntax of the nice command, which uses the -n flag to identify the nice-value increment.

The renice command cannot be used on fixed-priority processes. A non-root user can specify a value to be added to, but not subtracted from, the nice value of one or more running processes. The change applies to the nice values of the processes; their priority remains non-fixed. Only the root user can use the renice command to alter the priority value within the range of -20 to 20, or to subtract from the nice value of one or more running processes.

To continue the example, use the renice command to alter the nice value of the vmstat process that you started with nice.

# renice -n -5 7568
# ps -lu hoetzel
       F S UID  PID PPID   C PRI NI ADDR    SZ    WCHAN    TTY  TIME CMD
  241801 S 200 7032 7286   0  60 20 1b4c   108           pts/2  0:00 ksh
  200801 S 200 7568 7032   0  60 20 2310    92  5910a58  pts/2  0:00 vmstat
  241801 S 200 8544 6494   0  60 20 154b   108           pts/0  0:00 ksh

Now the process is running at a more favorable priority that is equal to the other foreground processes. To undo the effects of this, you could issue the following:

# renice -n 5 7568
# ps -lu hoetzel
       F S UID  PID PPID   C PRI NI ADDR    SZ    WCHAN    TTY  TIME CMD
  241801 S 200 7032 7286   0  60 20 1b4c   108           pts/2  0:00 ksh
  200801 S 200 7568 7032   1  70 25 2310    92  5910a58  pts/2  0:00 vmstat
  241801 S 200 8544 6494   0  60 20 154b   108           pts/0  0:00 ksh

In these examples, the renice command was run by the root user. When run by an ordinary user ID, there are two major limitations to the use of the renice command:

Clarification of the nice and renice Command Syntax

The nice and renice commands have different ways of specifying the amount that is to be added to the standard nice value of 20.

nice Command   renice Command   Resulting    Best Priority Value   Best Priority Value in AIX 4.3.2
                                nice Value   in AIX 4.3.1          and Subsequent Releases
nice -n 5      renice -n 5      25           65                    70
nice -n +5     renice -n +5     25           65                    70
nice -n -5     renice -n -5     15           55                    55
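The best priority values in the table can be reproduced from the formulas given in Priority Calculation. The following C sketch assumes a base priority of 40 (the value used in the worked example later in this section) and no recent CPU usage; the function names are hypothetical.

```c
#include <assert.h>

#define BASE_PRIORITY 40   /* base priority assumed from the worked example */

/* AIX 4.3.1 and earlier: with no CPU penalty, the best priority value
   is simply the base priority plus the nice value. */
int best_priority_431(int nice_value)
{
    assert(nice_value >= 0);
    return BASE_PRIORITY + nice_value;
}

/* AIX 4.3.2 and later: the portion of (base + nice) above 60 counts double. */
int best_priority_432(int nice_value)
{
    int p_nice = BASE_PRIORITY + nice_value;

    assert(nice_value >= 0);
    return (p_nice > 60) ? (p_nice * 2) - 60 : p_nice;
}
```

For a nice value of 25, the two functions return 65 and 70; for a nice value of 15, both return 55, matching the table.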

Tuning the Thread-Priority-Value Calculation

This section discusses tuning using the following:

The schedtune command and several enhancements to the CPU scheduler permit changes to the parameters used to calculate the priority value for each thread. See Process and Thread Priority for background information on priority.

To determine whether the schedtune program is installed and available, run the following command:

# lslpp -lI bos.adt.samples

Priority Calculation

The formula for calculating the priority value is:

priority value = base priority + nice penalty + (CPU penalty based on recent CPU usage)

The recent CPU usage value of a given thread is incremented by 1 each time that thread is in control of the CPU when the timer interrupt occurs (every 10 milliseconds). The recent CPU usage value is displayed as the C column in the ps command output. The maximum value of recent CPU usage is 120.
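The accounting rule can be sketched in C as follows. This is an illustration of the rule as stated above, not kernel code; the function name charge_tick is hypothetical.

```c
#include <assert.h>

#define MAX_RECENT_CPU 120   /* maximum value of recent CPU usage (the C column) */

/* Conceptually applied once per 10 ms timer interrupt to the thread that
   is in control of the CPU; returns the updated recent CPU usage value. */
int charge_tick(int recent_cpu)
{
    assert(recent_cpu >= 0 && recent_cpu <= MAX_RECENT_CPU);
    return (recent_cpu < MAX_RECENT_CPU) ? recent_cpu + 1 : MAX_RECENT_CPU;
}
```

A thread that already has a recent CPU usage of 120 stays at 120 no matter how many additional ticks it consumes.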

The default algorithm calculates the CPU penalty by dividing recent CPU usage by 2. The CPU-penalty-to-recent-CPU-usage ratio is therefore 0.5. This ratio is controlled by a value called R (the default is 16). The formula is as follows:

CPU_penalty = C * R/32
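As a minimal sketch, the formula can be written in C with integer arithmetic, so that fractional results are truncated. The function name cpu_penalty is hypothetical, and the range check on r reflects the range of 0 through 32 that the schedtune command accepts for R.

```c
#include <assert.h>

/* CPU penalty for a thread with recent CPU usage c, given the R value r.
   Integer division truncates the result. */
int cpu_penalty(int c, int r)
{
    assert(c >= 0 && c <= 120);   /* recent CPU usage is capped at 120 */
    assert(r >= 0 && r <= 32);    /* range accepted by schedtune -r */
    return c * r / 32;
}
```

With the default R of 16, cpu_penalty(100, 16) evaluates to 50, which is half of the recent CPU usage.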

Once a second, the default algorithm divides the recent CPU usage value of every thread by 2. The recent-CPU-usage-decay factor is therefore 0.5. This factor is controlled by a value called D (the default is 16). The formula is as follows:

C = C * D/32
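The once-a-second decay can be sketched the same way. The function name decay_recent_cpu is hypothetical, and integer division is assumed, so fractional results are truncated.

```c
#include <assert.h>

/* Recent CPU usage after the once-a-second decay, given the D value d. */
int decay_recent_cpu(int c, int d)
{
    assert(c >= 0 && c <= 120);   /* recent CPU usage is capped at 120 */
    assert(d >= 0 && d <= 32);    /* range accepted by schedtune -d */
    return c * d / 32;
}
```

With the default D of 16, a recent CPU usage of 100 decays to 50. With D set to 31, it decays only to 96, so the thread's CPU history is remembered much longer.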

For some users, the algorithm in AIX 4.3.1 does not allow enough distinction between foreground and background processes. The algorithm for calculating priority value was changed in AIX 4.3.2 to increase the impact on priorities when the nice value is changed. As the units of CPU time increase, the priority decreases with the nice effect. Using schedtune -r -d can give additional control over the priority calculation by setting new values for R and D. See Tuning with the schedtune Command for further information.

The algorithm guarantees that threads whose nice values are the default of 20 will behave exactly as they did in AIX 4.3.1 and prior releases. When the nice value is not equal to the default, the priority will be affected more due to the use of two formulas.

Begin with the following equation:

p_nice = base priority + nice value

Now use the following formula:

If p_nice > 60,
   then x_nice = (p_nice * 2) - 60,
   else x_nice = p_nice.

If the nice value is greater than 20, it has double the impact on the priority value that it would have if it were less than or equal to 20. The new priority calculation (ignoring integer truncation) is as follows:

priority value = x_nice + [(x_nice + 4)/64 * C*(R/32)]
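The two formulas can be combined in a short C sketch. It assumes a base priority of 40 (the value used in the worked example later in this section) and evaluates the CPU term with a single integer division at the end; that evaluation order is an assumption made for illustration, because the text gives the formula without integer truncation. The function name priority_value_432 is hypothetical.

```c
#include <assert.h>

#define BASE_PRIORITY 40   /* base priority assumed from the worked example */

/* Priority value under the AIX 4.3.2 algorithm for a thread with the given
   nice value, recent CPU usage c, and R value r. */
int priority_value_432(int nice_value, int c, int r)
{
    int p_nice = BASE_PRIORITY + nice_value;
    int x_nice = (p_nice > 60) ? (p_nice * 2) - 60 : p_nice;

    assert(c >= 0 && c <= 120);
    assert(r >= 0 && r <= 32);

    /* x_nice + (x_nice + 4)/64 * C * (R/32), truncated once at the end */
    return x_nice + (c * r * (x_nice + 4)) / (64 * 32);
}
```

For the default nice value of 20, x_nice is 60 and the factor (x_nice + 4)/64 is exactly 1, so the result reduces to 60 + C * R/32, which is the earlier formula.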

Note: If the nice value is the default, the formulas for AIX 4.3.1 and AIX 4.3.2 yield the same results.

Tuning with the schedtune Command

Tuning is accomplished through two options of the schedtune command: -r and -d. Each option specifies a parameter that is an integer from 0 through 32. The parameters are applied by multiplying by the parameter's value and then dividing by 32. The default r and d values are 16, which yields the same behavior as the original algorithm [(D=R=16)/32=0.5]. The new range of values permits a far wider spectrum of behaviors. For example:

# /usr/samples/kernel/schedtune -r 0

[(R=0)/32=0, (D=16)/32=0.5] would mean that the CPU penalty was always 0, making priority a function of the nice value only. No background process would get any CPU time unless there were no dispatchable foreground processes at all. The priority values of the threads would effectively be constant, although they would not technically be fixed-priority threads.

# /usr/samples/kernel/schedtune -r 5

[(R=5)/32=0.15625, (D=16)/32=0.5] would mean that a foreground process would never have to compete with a background process started with the command nice -n 10. The limit of 120 CPU time slices accumulated would mean that the maximum CPU penalty for the foreground process would be 18.

# /usr/samples/kernel/schedtune -r 6 -d 16

[(R=6)/32=0.1875, (D=16)/32=0.5] would mean that, if the background process were started with the command nice -n 10, it would be at least one second before the background process began to receive any CPU time. Foreground processes, however, would still be distinguishable on the basis of CPU usage. Long-running foreground processes that should probably be in the background would ultimately accumulate enough CPU usage to keep them from interfering with the true foreground.

# /usr/samples/kernel/schedtune -r 32 -d 32

[(R=32)/32=1, (D=32)/32=1] would mean that long-running threads would reach a C value of 120 and remain there, contending on the basis of their nice values. New threads would have priority, regardless of their nice value, until they had accumulated enough time slices to bring them within the priority value range of the existing threads.
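The effect of D on a long-running thread can be illustrated with a small C sketch. It assumes that the thread is in control of the CPU at every timer interrupt (100 ticks per second) and that the decay is applied at the end of each second; the function name recent_cpu_after is hypothetical.

```c
#include <assert.h>

/* Recent CPU usage of a continuously running thread after the given
   number of whole seconds, measured just after each once-a-second decay. */
int recent_cpu_after(int seconds, int d)
{
    int c = 0;
    int s;

    assert(seconds >= 0);
    assert(d >= 0 && d <= 32);

    for (s = 0; s < seconds; s++) {
        c += 100;          /* 100 timer ticks of 10 ms in one second */
        if (c > 120)
            c = 120;       /* recent CPU usage is capped at 120 */
        c = c * d / 32;    /* once-a-second decay */
    }
    return c;
}
```

With D set to 32, the value reaches 120 after two seconds and remains there. With the default D of 16, it settles at 60 after each decay.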

Here are some guidelines for R and D:

If you conclude that one or both parameters need to be modified to accommodate your workload, you can enter the schedtune command while logged on as root user. The changed values will remain until the next schedtune command modifies them or until the next system boot. Values can be reset to their defaults with the command schedtune -D, but remember that all schedtune parameters are reset by that command, including VMM memory load control parameters. To make a change to the parameters that will persist across boots, add an appropriate line at the end of the /etc/inittab file.

Example of a Priority Calculation

The example shows R=4 and D=31 and assumes no other runnable threads:

current_effective_priority
                  | base process priority
                  |    | nice value
                  |    |    | count (time slices consumed)
                  |    |    |    | (schedtune -r)
                  |    |    |    |     |
  time 0          p = 40 + 20 + (0   * 4/32) =   60
  time 10 ms      p = 40 + 20 + (1   * 4/32) =   60
  time 20 ms      p = 40 + 20 + (2   * 4/32) =   60
  time 30 ms      p = 40 + 20 + (3   * 4/32) =   60
  time 40 ms      p = 40 + 20 + (4   * 4/32) =   60
  time 50 ms      p = 40 + 20 + (5   * 4/32) =   60
  time 60 ms      p = 40 + 20 + (6   * 4/32) =   60
  time 70 ms      p = 40 + 20 + (7   * 4/32) =   60
  time 80 ms      p = 40 + 20 + (8   * 4/32) =   61
  time 90 ms      p = 40 + 20 + (9   * 4/32) =   61
  time 100ms      p = 40 + 20 + (10  * 4/32) =   61
        .
  (skipping forward to 1000msec or 1 second)
        .
  time 1000ms     p = 40 + 20 + (100 * 4/32) =   72
  time 1000ms     swapper recalculates the accumulated CPU usage counts of
                  all processes. For the above process:
                            new_CPU_usage = 100 * 31/32 = 96  (if d=31)
                  after decaying by the swapper: p = 40 + 20 + ( 96 * 4/32) = 72
                  (if d=16, then p = 40 + 20 + (100/2 * 4/32) = 66)
  time 1010ms     p = 40 + 20 + ( 97 * 4/32) =   72
  time 1020ms     p = 40 + 20 + ( 98 * 4/32) =   72
  time 1030ms     p = 40 + 20 + ( 99 * 4/32) =   72
                 ..
  time 1230ms     p = 40 + 20 + (119 * 4/32) =   74
  time 1240ms     p = 40 + 20 + (120 * 4/32) =   75     count <= 120
  time 1250ms     p = 40 + 20 + (120 * 4/32) =   75
  time 1260ms     p = 40 + 20 + (120 * 4/32) =   75
                 ..
  time 2000ms     p = 40 + 20 + (120 * 4/32) =   75
  time 2000ms     swapper recalculates the counts of all processes. For the above
                  process: 120 * 31/32 = 116
  time 2010ms     p = 40 + 20 + (117 * 4/32) =   74
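The trace above can be reproduced with a short C simulation. This is a sketch under the same assumptions as the example (base priority 40, nice value 20, the thread runs at every 10 ms tick, and the decay is applied at each whole second); the function name priority_at is hypothetical.

```c
#include <assert.h>

/* Priority value of the example thread at time ms (in milliseconds),
   after any decay that falls on that instant has been applied. */
int priority_at(int ms, int r, int d)
{
    int c = 0;   /* recent CPU usage (time slices consumed) */
    int t;

    assert(ms >= 0);
    assert(r >= 0 && r <= 32);
    assert(d >= 0 && d <= 32);

    for (t = 10; t <= ms; t += 10) {
        if (c < 120)
            c++;                /* one more tick charged, capped at 120 */
        if (t % 1000 == 0)
            c = c * d / 32;     /* once-a-second decay by the swapper */
    }
    return 40 + 20 + c * r / 32;
}
```

For example, priority_at(80, 4, 31) evaluates to 61 and priority_at(1240, 4, 31) evaluates to 75, matching the corresponding lines of the trace. With D set to 16, priority_at(1000, 4, 16) evaluates to 66.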

Modifying the Scheduler Time Slice with the schedtune Command

The length of the scheduler time slice can be modified with the schedtune command. To change the time slice, use the schedtune -t option.

In AIX Version 4, the value of -t is the number of ticks for the time slice and only SCHED_RR threads will use the nondefault time slice value (see Scheduling Policy for Threads for a description of fixed priority threads).

Changing the time slice takes effect instantly and does not require a reboot.

A thread running with the SCHED_OTHER or SCHED_RR scheduling policy can use the CPU for up to a full time slice. The default time slice is 1 clock tick, and a clock tick is 10 ms.

In some situations, too much context switching is occurring and the overhead of dispatching threads can be more costly than allowing these threads to run for a longer time slice. In these cases, increasing the time slice might have a positive impact on the performance of fixed-priority threads. Use the vmstat and sar commands for determining the number of context switches per second.

In an environment in which the length of the time slice has been increased, some applications might not need or should not have the full time slice. These applications can give up the processor explicitly with the yield() system call (as can programs in an unmodified environment). After a yield() call, the calling thread is moved to the end of the dispatch queue for its priority level.
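The following C sketch shows one way an application might give up the processor between units of work. It uses the POSIX sched_yield() function as a stand-in for the yield() call named above, which is a substitution made here for portability. The function name process_with_yields and the chunking scheme are hypothetical.

```c
#include <assert.h>
#include <sched.h>

/* Hypothetical worker: handles `items` units of work and gives up the
   processor after every `chunk` units. Returns the number of yields. */
int process_with_yields(int items, int chunk)
{
    int i;
    int yields = 0;

    assert(items >= 0 && chunk > 0);

    for (i = 1; i <= items; i++) {
        /* ... one unit of application work would go here ... */
        if (i % chunk == 0) {
            sched_yield();   /* POSIX call, used here in place of yield() */
            yields++;
        }
    }
    return yields;
}
```

Choosing a larger chunk reduces the number of voluntary yields and lets the thread use more of each time slice.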

