Two standard tools locate the processes that dominate CPU usage: the ps command and the acctcom command. A third option is the topas monitor, which is described in Using the topas Monitor.
The ps command is a flexible tool for identifying the programs that are running on the system and the resources they are using. It displays statistics and status information about processes on the system, such as process or thread ID, I/O activity, CPU and memory utilization. In this chapter, we discuss only the options and output fields that are relevant for CPU.
Three of the possible ps output columns report CPU use, each in a different way: C (recent CPU use), TIME (total CPU time used), and %CPU (the ratio of CPU time to elapsed time).
The following command line:
# ps -ef | egrep -v "STIME|$LOGNAME" | sort +3 -r | head -n 15
lists the processes with the highest recent CPU usage, excluding those owned by the current user (the header line has been reinserted in the output for clarity):
UID        PID   PPID   C    STIME    TTY   TIME CMD
mary     45742  54702 120 15:19:05 pts/29   0:02 ./looper
root     52122      1  11 15:32:33 pts/31  58:39 xhogger
root      4250      1   3 15:32:33 pts/31  26:03 xmconsole allcon
root     38812   4250   1 15:32:34 pts/31   8:58 xmconstats 0 3 30
root     27036   6864   1 15:18:35      -   0:00 rlogind
root     47418  25926   0 17:04:26      -   0:00 coelogin <d29dbms:0>
bick     37652  43538   0 16:58:40  pts/4   0:00 /bin/ksh
bick     43538      1   0 16:58:38      -   0:07 aixterm
luc      60062  27036   0 15:18:35 pts/18   0:00 -ksh
Recent CPU use is shown in the fourth column (C). The looping program's process easily heads the list. Note that the C value may understate the looping process's CPU usage, because the scheduler stops counting at 120.
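The sort +3 form used in the pipeline is the older positional key syntax, which some current sort implementations no longer accept. The following is a minimal sketch of the same ranking using the -k key form with an explicit numeric comparison. The function name top_by_c is hypothetical, and the sketch assumes ps -ef output in which C is the fourth field, as in the listing above.

```shell
# Rank "ps -ef" style lines by the C column (field 4), highest first.
# Reads standard input, so it can also be tried on saved output.
# top_by_c is an illustrative name, not an AIX command.
top_by_c() {
    # $1: number of lines to keep (default 15)
    # grep -v drops the header line; -k4,4nr sorts field 4 numerically, descending.
    grep -v 'STIME' | sort -k4,4nr | head -n "${1:-15}"
}

# Typical use on a live system, excluding your own processes as before:
#   ps -ef | grep -v "$LOGNAME" | top_by_c 15
```

Sorting numerically avoids relying on the right-aligned column layout to order values such as 3, 11, and 120 correctly.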
The ps command, run periodically, displays the CPU time under the TIME column and the ratio of CPU time to real time under the %CPU column. Look for the processes that dominate usage. The au and v options give similar information on user processes. The options aux and vg display both user and system processes.
The following example is taken from a four-way SMP system:
# ps au
USER       PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
root     19048 24.6  0.0   28   44  pts/1 A    13:53:00  2:16 /tmp/cpubound
root     19388  0.0  0.0  372  460  pts/1 A      Feb 20  0:02 -ksh
root     15348  0.0  0.0  372  460  pts/4 A      Feb 20  0:01 -ksh
root     20418  0.0  0.0  368  452  pts/3 A      Feb 20  0:01 -ksh
root     16178  0.0  0.0  292  364      0 A      Feb 19  0:00 /usr/sbin/getty
root     16780  0.0  0.0  364  392  pts/2 A      Feb 19  0:00 -ksh
root     18516  0.0  0.0  360  412  pts/0 A      Feb 20  0:00 -ksh
root     15746  0.0  0.0  212  268  pts/1 A    13:55:18  0:00 ps au
The %CPU is the percentage of CPU time that has been allocated to that process since the process was started. It is calculated as follows:
(process CPU time / process duration) * 100
Imagine two processes: The first starts and runs five seconds, but does not finish; then the second starts and runs five seconds but does not finish. The ps command would now show 50 percent %CPU for the first process (five seconds CPU for 10 seconds of elapsed time) and 100 percent for the second (five seconds CPU for five seconds of elapsed time).
On an SMP system, this value is divided by the number of available CPUs. That is why, in the previous example, the %CPU value for the cpubound process never exceeds 25: the example was run on a four-way system, so a process that uses 100 percent of one processor is reported as 100 / 4 = 25 percent.
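The calculation can be checked by hand with a small sketch. The function names time_to_secs and pct_cpu are hypothetical helpers, not AIX commands. The elapsed time of 138 seconds for cpubound is an assumption: it takes 13:55:18, the start time of the ps au process itself, as the sampling time, against a process start time of 13:53:00.

```shell
# Convert a ps TIME value (mm:ss or hh:mm:ss) to seconds.
# Illustrative helper; assumes colon-separated numeric fields only.
time_to_secs() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# %CPU = (process CPU time / process duration) * 100 / number of CPUs
pct_cpu() {
    # $1: CPU seconds, $2: elapsed seconds, $3: number of CPUs (default 1)
    awk -v cpu="$1" -v real="$2" -v n="${3:-1}" \
        'BEGIN { printf "%.1f\n", cpu / real * 100 / n }'
}

pct_cpu 5 10                           # first process in the text: 50.0
pct_cpu 5 5                            # second process in the text: 100.0
pct_cpu "$(time_to_secs 2:16)" 138 4   # cpubound on the four-way system: 24.6
```

The last call reproduces the 24.6 shown in the ps au listing: 136 seconds of CPU time over roughly 138 seconds of elapsed time is about 98.6 percent of one processor, which is then divided by four.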
The ps command can display threads and the CPUs that threads or processes are bound to by using the ps -mo THREAD command. The following is an example:
# ps -mo THREAD
USER    PID   PPID    TID ST  CP PRI SC    WCHAN        F     TT BND COMMAND
root  20918  20660      - A    0  60  1        -   240001  pts/1   - -ksh
   -      -      -  20005 S    0  60  1        -      400      -   - -
The TID column shows the thread ID; the BND column shows the processor to which a process or thread is bound.
It is normal to see a process named kproc (PID of 516 in operating system version 4) using CPU time. When there are no threads that can be run during a time slice, the scheduler assigns the CPU time for that time slice to this kernel process (kproc), which is known as the idle or wait kproc. SMP systems will have an idle kproc for each processor. With operating system versions later than AIX 4.3.3 the name shown in the output is wait.
For complete details about the ps command, see the AIX 5L Version 5.1 Commands Reference.
The acctcom command displays historical data on CPU usage if the accounting system is activated. Starting the accounting system puts a measurable overhead on the system, so activate accounting only if absolutely needed. To collect and display accounting data, do the following:
Create an empty accounting file:
# touch acctfile
Turn on accounting, with records written to that file:
# /usr/sbin/acct/accton acctfile
After the period of interest, turn off accounting by running accton without a file name:
# /usr/sbin/acct/accton
Display the collected data:
# /usr/sbin/acct/acctcom acctfile
COMMAND                      START    END          REAL     CPU    MEAN
NAME       USER     TTYNAME  TIME     TIME       (SECS)  (SECS) SIZE(K)
#accton    root     pts/2    19:57:18 19:57:18     0.02    0.02  184.00
#ps        root     pts/2    19:57:19 19:57:19     0.19    0.17   35.00
#ls        root     pts/2    19:57:20 19:57:20     0.09    0.03  109.00
#ps        root     pts/2    19:57:22 19:57:22     0.19    0.17   34.00
#accton    root     pts/2    20:04:17 20:04:17     0.00    0.00    0.00
#who       root     pts/2    20:04:19 20:04:19     0.02    0.02    0.00
If you reuse the same file, you can see when the newer processes were started by looking for the accton process (this was the process used to turn off accounting the first time).
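Because acctcom reports one line per completed process, a command that runs many times can dominate CPU usage without any single line standing out. The following is a minimal sketch that totals the CPU (SECS) column per command name. The function name cpu_by_command is hypothetical, and the sketch assumes the default column layout shown above, in which the command name is field 1 and CPU seconds is field 7.

```shell
# Sum the CPU (SECS) column of acctcom output per command name,
# largest total first. cpu_by_command is an illustrative name.
# Assumes the default layout: field 1 = command, field 7 = CPU seconds.
cpu_by_command() {
    # Header lines are skipped because their field 7 is not a decimal number.
    awk '$7 ~ /^[0-9]+\.[0-9]+$/ { cpu[$1] += $7 }
         END { for (c in cpu) printf "%.2f %s\n", cpu[c], c }' |
        sort -k1,1nr
}

# Typical use:
#   /usr/sbin/acct/acctcom acctfile | cpu_by_command
```

If acctcom is run with options that add or remove columns, the field numbers in the sketch would need to be adjusted.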