
System Management Guide:
Operating System and Devices

Monitoring and Managing Processes

This section describes procedures that you, as the system administrator, can use to manage processes. For background, see Monitoring and Managing Processes in the AIX 5L Version 5.2 System Management Concepts: Operating System and Devices. For basic information on managing your own processes (for example, restarting or stopping a process that you started, or scheduling a process for a later time), see the AIX 5L Version 5.2 System User's Guide: Operating System and Devices.

Process Monitoring

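The ps command is the primary tool for observing the processes in the system. Most of the flags of the ps command fall into one of two categories: flags that specify which processes to include in the output, and flags that specify which attributes (columns) of those processes are displayed.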

The most widely useful variants of ps for system-management purposes are:

ps -ef Lists all nonkernel processes, with the userid, process ID, recent CPU usage, total CPU usage, and the command that started the process (including its parameters).
ps -fu UserID Lists all of the processes owned by UserID, with the process ID, recent CPU usage, total CPU usage, and the command that started the process (including its parameters).

To identify the current heaviest users of CPU time, you could enter:

ps -ef | egrep -v "STIME|$LOGNAME" | sort +3 -r | head -n 15

This command lists, in descending order, the 15 most CPU-intensive processes other than those owned by you.
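
The pipeline above relies on the historical sort +3 field syntax and compares the C column as text, so a value of 9 sorts ahead of 12. Where sort accepts only the -k form, or where a numeric ordering is wanted, something like the following sketch might be used instead. The function name top_cpu and its two arguments are hypothetical. The sketch assumes ps -ef output on standard input, with the owner in field 1 and the C column in field 4. Unlike the egrep filter, it excludes a user only by the owner column.

```shell
# top_cpu USER COUNT   (hypothetical helper, not part of the ps command)
# Reads "ps -ef" output on standard input, drops the header line and
# every process owned by USER, then prints the COUNT processes with the
# highest value in the C (recent CPU usage) column.
top_cpu() {
    awk -v skip="$1" 'NR > 1 && $1 != skip' |
        sort -k4,4nr |
        head -n "$2"
}
```

A typical invocation would be ps -ef | top_cpu "$LOGNAME" 15.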

For more specialized uses, the following two tables are intended to simplify the task of choosing ps flags by summarizing the effects of the flags.

Process-Specifying Flags
                                                              -A  -a  -d  -e  -G/-g  -k  -p  -t          -U/-u  a  g  t          x
All processes                                                 Y   -   -   -   -      -   -   -           -      -  Y  -          -
Not process group leaders and not associated with a terminal  -   Y   -   -   -      -   -   -           -      -  -  -          -
Not process group leaders                                     -   -   Y   -   -      -   -   -           -      -  -  -          -
Not kernel processes                                          -   -   -   Y   -      -   -   -           -      -  -  -          -
Members of specified process groups                           -   -   -   -   Y      -   -   -           -      -  -  -          -
Kernel processes                                              -   -   -   -   -      Y   -   -           -      -  -  -          -
Those specified in process number list                        -   -   -   -   -      -   Y   -           -      -  -  -          -
Those associated with tty(s) in the list                      -   -   -   -   -      -   -   Y (n ttys)  -      -  -  Y (1 tty)  -
Specified user processes                                      -   -   -   -   -      -   -   -           Y      -  -  -          -
Processes with terminals                                      -   -   -   -   -      -   -   -           -      Y  -  -          -
Not associated with a tty                                     -   -   -   -   -      -   -   -           -      -  -  -          Y

 

Column-Selecting Flags
                                     Default1  -f  -l  -U/-u  Default2  e  l  s  u  v
PID                                  Y         Y   Y   Y      Y         Y  Y  Y  Y  Y
TTY                                  Y         Y   Y   Y      Y         Y  Y  Y  Y  Y
TIME                                 Y         Y   Y   Y      Y         Y  Y  Y  Y  Y
CMD                                  Y         Y   Y   Y      Y         Y  Y  Y  Y  Y
USER                                 -         Y   -   -      -         -  -  -  Y  -
UID                                  -         -   Y   Y      -         -  Y  -  -  -
PPID                                 -         Y   Y   -      -         -  Y  -  -  -
C                                    -         Y   Y   -      -         -  Y  -  -  -
STIME                                -         Y   -   -      -         -  -  -  Y  -
F                                    -         -   Y   -      -         -  -  -  -  -
S/STAT                               -         -   Y   -      Y         Y  Y  Y  Y  Y
PRI                                  -         -   Y   -      -         -  Y  -  -  -
NI/NICE                              -         -   Y   -      -         -  Y  -  -  -
ADDR                                 -         -   Y   -      -         -  Y  -  -  -
SIZE                                 -         -   -   -      -         -  -  -  Y  -
SZ                                   -         Y   -   -      -         Y  -  Y  -  -
WCHAN                                -         -   Y   -      -         -  Y  -  -  -
RSS                                  -         -   -   -      -         -  Y  -  Y  Y
SSIZ                                 -         -   -   -      -         -  -  Y  -  -
%CPU                                 -         -   -   -      -         -  -  -  Y  Y
%MEM                                 -         -   -   -      -         -  -  -  Y  Y
PGIN                                 -         -   -   -      -         -  -  -  -  Y
LIM                                  -         -   -   -      -         -  -  -  -  Y
TSIZ                                 -         -   -   -      -         -  -  -  -  Y
TRS                                  -         -   -   -      -         -  -  -  -  Y
Environment (following the command)  -         -   -   -      -         Y  -  -  -  -

If ps is given with no flags or with a process-specifying flag that begins with a minus sign, the columns displayed are those shown for Default1. If the command is given with a process-specifying flag that does not begin with minus, Default2 columns are displayed. The -u or -U flag is both a process-specifying and column-selecting flag.

The following are brief descriptions of the contents of the columns:

PID Process ID
TTY Terminal or pseudo-terminal associated with the process
TIME Cumulative CPU time consumed, in minutes and seconds
CMD Command the process is running
USER Login name of the user to whom the process belongs
UID Numeric user ID of the user to whom the process belongs
PPID ID of the parent process of this process
C Recently used CPU time
STIME Time the process started, if it started less than 24 hours ago; otherwise, the date the process started
F Eight-character hexadecimal value describing the flags associated with the process (see the detailed description of the ps command)
S/STAT Status of the process (see the detailed description of the ps command)
PRI Current priority value of the process
NI/NICE Nice value for the process
ADDR Segment number of the process stack
SIZE (-v flag) The virtual size of the data section of the process (in kilobytes)
SZ (-l and l flags) The size in kilobytes of the core image of the process.
WCHAN Event on which the process is waiting
RSS Sum of the numbers of working-segment and code-segment pages in memory times 4
SSIZ Size of the kernel stack
%CPU Percentage of time since the process started that it was using the CPU
%MEM Nominally, the percentage of real memory being used by the process. This measure does not correlate with any other memory statistics
PGIN Number of page-ins caused by page faults. Because all I/O is classified as page faults, this is basically a measure of I/O volume
LIM Always xx
TSIZ Size of the text section of the executable file
TRS Number of code-segment pages times 4
Environment Value of all the environment variables for the process

Altering Process Priority

If you have identified a process that is using too much CPU time, you can reduce its effective priority by increasing its nice value with the renice command. For example:

renice +5 ProcID

The nice value of process ProcID would increase from the normal 20 of a foreground process to 25. You must have root authority to reset the nice value of process ProcID to 20. Type:

renice -5 ProcID

Terminating a Process

Normally, you use the kill command to end a process. The kill command sends a signal to the designated process. Depending on the type of signal and the nature of the program that is running in the process, the process might end or might keep running. The two signals most often used for this purpose are:

SIGTERM (signal 15) is a request to the program to terminate. If the program has a signal handler for SIGTERM that does not actually terminate the application, this kill may have no effect. This is the default signal sent by kill.
SIGKILL (signal 9) is a directive to kill the process immediately. This signal cannot be caught or ignored.

It is typically better to issue SIGTERM rather than SIGKILL. If the program has a handler for SIGTERM, it can clean up and terminate in an orderly fashion. Type:

kill -term ProcessID

(The -term could be omitted.) If the process does not respond to the SIGTERM, type:

kill -kill ProcessID
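
The two-step approach above (SIGTERM first, SIGKILL only if needed) can be sketched as a small shell function. This is an illustration rather than a system utility: the name stop_process, the default grace period of 5 seconds, and the one-second polling interval are all assumptions. It also assumes a shell whose kill accepts -s SignalName and signal number 0 as an existence test.

```shell
# stop_process PID [GRACE_SECONDS]   (hypothetical helper)
# Sends SIGTERM, waits up to GRACE_SECONDS (default 5, an assumed value)
# for the process to disappear, and sends SIGKILL only if it is still
# present after that.
stop_process() {
    pid=$1
    grace=${2:-5}

    # If SIGTERM cannot be delivered, the process is already gone
    # (or is not ours to signal), so there is nothing more to do.
    kill -s TERM "$pid" 2>/dev/null || return 0

    waited=0
    while [ "$waited" -lt "$grace" ]; do
        # Signal 0 delivers nothing; it only tests that the PID exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done

    # Still present after the grace period: SIGKILL cannot be caught.
    kill -s KILL "$pid" 2>/dev/null
    return 0
}
```

For example, stop_process ProcessID 10 would give the program ten seconds to run its SIGTERM handler before it is killed outright.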

You might notice occasional defunct processes, also called zombies, in your process table. These processes are no longer executing and have no system space allocated, but they still retain their PID. You can recognize a zombie process in the process table because it displays <defunct> in the CMD column. For example:

 UID   PID  PPID   C    STIME    TTY  TIME CMD
                             .
                             .
                             .
 lee 22392 20682   0   Jul 10      -  0:05 xclock
 lee 22536 21188   0   Jul 10  pts/0  0:00 /bin/ksh
 lee 22918 24334   0   Jul 10  pts/1  0:00 /bin/ksh
 lee 23526 22536  22                  0:00 <defunct>
 lee 24334 20682   0   Jul 10      ?  0:00 aixterm
 lee 24700     1   0   Jul 16      ?  0:00 aixterm
root 25394 26792   2   Jul 16  pts/2  0:00 ksh
 lee 26070 24700   0   Jul 16  pts/3  0:00 /bin/ksh
 lee 26792 20082   0   Jul 10  pts/2  0:00 /bin/ksh
root 27024 25394   2 17:10:44  pts/2  0:00 ps -ef

Zombie processes continue to exist in the process table until the parent process dies or the system is shut down and restarted. In the example shown above, the parent process (PPID) is the ksh command. When the Korn shell is exited, the defunct process is removed from the process table.

Sometimes a number of these defunct processes collect in your process table because an application has forked several child processes and has not exited. If this becomes a problem, the simplest solution is to modify the application so that it uses the sigaction subroutine to ignore the SIGCHLD signal. For more information, see the sigaction subroutine in AIX 5L Version 5.2 Technical Reference: Base Operating System and Extensions Volume 2.
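
Before changing an application, it can help to see which parent is accumulating defunct children. The following sketch filters ps -ef output for that purpose. The function name list_defunct is hypothetical, and the sketch assumes the PID and PPID are the second and third fields, as in the listing above.

```shell
# list_defunct   (hypothetical helper)
# Reads "ps -ef" output on standard input and prints the PID and PPID
# of every process whose last field is <defunct>. Matching on the last
# field keeps the awk process itself out of the result.
list_defunct() {
    awk '$NF == "<defunct>" { print $2, $3 }'
}
```

Run as ps -ef | list_defunct. Applied to the example listing above, it would print 23526 22536, identifying the Korn shell with PID 22536 as the parent.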

Binding or Unbinding a Process

On multiprocessor systems, you can bind a process to a processor or unbind a previously bound process by using SMIT or the command line.

Prerequisites

You must have root user authority to bind or unbind a process you do not own.

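Binding or Unbinding a Process Tasks
Task                 SMIT Fast Path  Command or File
Binding a Process    smit bindproc   bindprocessor -q
Unbinding a Process  smit ubindproc  bindprocessor -u

The bindprocessor -q command lists the available processor numbers. To bind a process, specify its process ID and one of those processor numbers (bindprocessor ProcessID ProcessorNum). The -u flag takes the ID of the process to unbind.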

Fixing Stalled or Unwanted Processes

Stalled or unwanted processes can cause problems with your terminal. Some problems produce messages on your screen that give information about possible causes.

To perform the following procedures, you must have either a second terminal, a modem, or a network login. If you do not have any of these, fix the terminal problem by rebooting your machine.

Choose the appropriate procedure for fixing your terminal problem:

Free a Terminal Taken Over by Processes

Identify and stop stalled or unwanted processes by doing the following:

  1. Determine the active processes running on the screen by typing the following ps command:

    ps -ef | pg

    The ps command shows the process status. The -e flag writes information about all processes (except kernel processes), and the -f flag generates a full listing of processes, including the command name and parameters that were in effect when the process was created. The pg command limits output to a single page at a time, so information does not quickly scroll off the screen.

    Suspicious processes include system or user processes that use up excessive amounts of a system resource such as CPU or disk space. System processes such as sendmail, routed, and lpd frequently become runaways. Use the ps u command (the u flag without a minus sign, which displays the %CPU column) to check CPU usage.

  2. Determine who is running processes on this machine by using the who command:

    who

    The who command displays information about all users currently on this system, such as login name, workstation name, date, and time of login.

  3. Determine if you need to stop, suspend, or change the priority of a user process.
    Note
    You must have root authority to stop processes other than your own. If you terminate or change the priority of a user process, contact the process owner and explain what you have done.
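
As a complement to steps 1 and 2, a per-user process count can show quickly whose processes dominate the table. The sketch below is one possible way to produce it. The function name procs_per_user is hypothetical, and the sketch assumes ps -ef output on standard input with the owner in the first field.

```shell
# procs_per_user   (hypothetical helper)
# Reads "ps -ef" output on standard input, skips the header line, and
# prints one "COUNT USER" line per user, largest counts first.
procs_per_user() {
    awk 'NR > 1 { count[$1]++ }
         END { for (user in count) print count[user], user }' |
        sort -k1,1nr -k2,2
}
```

Run as ps -ef | procs_per_user. A user with an unusually high count is a reasonable place to start looking for runaway processes.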

Respond to Screen Messages

Respond to and recover from screen messages by doing the following:

  1. Make sure the DISPLAY environment variable is set correctly. Use either of the following methods to check the DISPLAY environment:
  2. Reset the terminal to its defaults using the following stty command:

    stty sane

    The stty sane command restores the "sanity" of the terminal drivers. The command outputs an appropriate terminal resetting code from the /etc/termcap file (or /usr/share/lib/terminfo if available).

  3. If the Return key does not work correctly, reset it by entering:

    ^J stty sane ^J

    The ^J represents the Ctrl-J key sequence.
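
For step 1, one way to inspect the DISPLAY variable from a shell is sketched below. The function name check_display is hypothetical, and the sketch assumes a Bourne-compatible shell such as ksh.

```shell
# check_display   (hypothetical helper)
# Reports the current value of DISPLAY and returns nonzero when the
# variable is unset or empty.
check_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "DISPLAY is set to: $DISPLAY"
    else
        echo "DISPLAY is not set"
        return 1
    fi
}
```

If the reported value is wrong, it could be reset with an assignment such as DISPLAY=HostName:0; export DISPLAY, where HostName is a placeholder for the machine running the X server.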

RT_MPC and RT_GRQ

The use of multiple run queues increases the processor affinity of threads, but there is a special situation where you might want to counteract this effect. When there is only one run queue, a thread that has been awakened (the waking thread) by another running thread (the waker thread) would normally be able to immediately use the CPU on which the waker thread was running. With multiple run queues, the waking thread may be on the run queue of another CPU, which cannot notice the waking thread until its next scheduling decision. This may result in a delay of up to 10 ms.

This is similar to scenarios in earlier releases of this operating system that might have occurred when using the bindprocessor option. If all CPUs are constantly busy, and there are a number of interdependent threads waking up, there are two options available.
