This document explains how defunct processes may occur, how to determine their cause, and what information to provide to software vendors if further problem determination is necessary. This document applies to all versions of AIX 3.1 through 4.x.
The AIX Operating System implements a hierarchy of processes in that each process has a parent process. A process which terminates may notify the parent process and the parent process may collect status information for that child. When the parent process fails to collect the status information in a timely manner, a "defunct" process may remain.
Because a "defunct" process has already terminated, killing the process will not have any effect. A defunct process uses no CPU or disk resources, and only the minimal amount of memory needed to store its exit status and resource usage information.
The parent of a child process is responsible for reaping (removing) the child process when it terminates. To find the parent process, run the ps -ef command. The "PPID" column gives the process ID of the parent.
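The lookup described above can be sketched as a small filter. This is an illustrative helper, not part of AIX: the name list_defunct is hypothetical, and it assumes that defunct entries in the "ps -ef" output contain the string "<defunct>" and that the second and third fields are the PID and PPID.

```shell
# list_defunct (hypothetical helper): read `ps -ef` output on stdin and
# print "PID PPID" for every defunct entry.
# Assumption: fields 2 and 3 of each line are PID and PPID.
list_defunct() {
    awk '/<defunct>/ { print $2, $3 }'
}
```

Typical use would be "ps -ef | list_defunct"; the second number on each output line is the process that is expected to reap the defunct entry.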
If the PPID for the defunct process is "1", the parent is the "init" process. The "init" process is the ancestor of all processes on the system. Under normal conditions the "init" process reaps all defunct processes with a PPID of 1.
If the parent process ID is "1", determine if the defunct process has been terminated for more than a few minutes. (To do this, wait a few minutes and then see if the process is still there.) It is acceptable for defunct processes owned by 'init' to exist for one or two minutes, particularly on heavily loaded systems. Defunct processes are frequently created by complex shell scripts as a result of the way the command shells are designed. This continual stream of defunct processes is normal and does not indicate a problem.
A problem may occur when 'init' has not completed processing /etc/inittab and is waiting on a specific entry (most commonly an /etc/rc script) to terminate. During this time, 'init' ignores all other terminated child processes and waits for the specific child process to end. The most common indication that a hung /etc/inittab task is causing the problem is that the number of defunct processes owned by init grows without bound. In AIX 4.1.3 and greater, 'init' has been enhanced to effectively handle defunct processes, even while processing inittab wait entries. This function is available in AIX 3.2.5 as well via the installation of APAR IX50480.
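One way to see whether the number of defunct processes owned by init is growing is to count them at intervals. The following is a minimal sketch under the same assumptions as the "ps -ef" layout described earlier (PPID in the third field, "<defunct>" in the command column); the function name count_init_defunct is hypothetical.

```shell
# count_init_defunct (hypothetical helper): read `ps -ef` output on stdin
# and print how many defunct entries have a PPID of 1.
count_init_defunct() {
    awk '/<defunct>/ && $3 == 1 { n++ } END { print n + 0 }'
}
```

Running "ps -ef | count_init_defunct" a few minutes apart gives a rough trend. A count that keeps rising suggests that init is blocked on an inittab entry; a count that fluctuates around a small number is consistent with normal operation.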
To determine if 'init' has finished processing /etc/inittab, find the entries in /etc/inittab where the third field is "wait" or "once". For example:
qdaemon:2:wait:/bin/startsrc -sqdaemon
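The search for "wait" and "once" entries can be sketched with awk. The function name wait_once_entries is hypothetical. The sketch assumes colon-separated fields as in the example above, and it skips lines whose first field is empty, on the assumption that commented-out entries begin with a colon.

```shell
# wait_once_entries (hypothetical helper): print the inittab entries whose
# third field (the action) is "wait" or "once".
# Argument 1: path to the inittab file (defaults to /etc/inittab).
wait_once_entries() {
    awk -F: '$1 != "" && ($3 == "wait" || $3 == "once")' "${1:-/etc/inittab}"
}
```

The first field of each line printed is the identifier to look for in the "who" output.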
For 3.1 - 3.2 Entries:
Match the first field ("qdaemon" in the above example) with the "id=" field in the "who -a" output to see if those processes have exited. The "who -a" output has entries of the form:
Name ST Line Time Activity PID Hostname/Exit
. . May 21 18:19 old 9354 id=qdaemon term=0 exit=0
If the process has exited, "exit=" will be shown in the "who -a" output. If it has not exited, there will be no "exit=" field for that entry in "who -a". In the above example, the qdaemon process (which has "wait" for the third field in /etc/inittab) has exited with a status of "0".
If an entry with "wait" as the action does not have an "exit=" value, that entry has become hung. Unless APAR IX50480 has been installed, it must exit or be stopped before 'init' begins reaping child processes that have terminated.
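Checking a single entry for an "exit=" value can be sketched as follows. The function name entry_exit_status is hypothetical, and the sketch assumes the "who -a" line contains the tokens "id=<name>" and "exit=<value>" in the form shown in the example above.

```shell
# entry_exit_status ID (hypothetical helper): read `who -a` output on stdin
# and print the exit= value recorded for the inittab entry ID.
# Returns non-zero when no exit status is recorded for that entry.
entry_exit_status() {
    awk -v id="id=$1" '
        {
            found = 0; status = ""
            for (i = 1; i <= NF; i++) {
                if ($i == id) found = 1
                if ($i ~ /^exit=/) status = substr($i, 6)
            }
            if (found && status != "") { print status; ok = 1 }
        }
        END { exit (ok ? 0 : 1) }
    '
}
```

For example, "who -a | entry_exit_status qdaemon" would print the exit status if the qdaemon entry has exited, and return a non-zero status with no output otherwise.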
For 4.x Entries:
Match the first field ("qdaemon" in the above example) with the 6th field of the 'who -d' output. Each "wait" entry in the inittab file should have an entry in the 'who -d' output. If there is no entry in the output, the process did not finish. In this example, qdaemon has finished, so it is not the problem process.
. . Sep 20 10:01 old 6536 id=qdaemon term=0 exit=0
Look for a process that started from inittab but never made it to the 'who -d' output. If an entry with "wait" as the action does not have an entry in that output, the process has become hung and must exit or be stopped before 'init' begins reaping child processes that have terminated. On AIX 4.1.3 or later, however, 'init' continues to reap terminated child processes even while it waits.
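The comparison between inittab and the 'who -d' output can be sketched as a loop over the "wait" entries. The function name hung_wait_entries is hypothetical. The sketch assumes the 'who -d' output has been saved to a file and that finished entries appear with an "id=<name>" token as in the example above.

```shell
# hung_wait_entries INITTAB WHO_OUTPUT (hypothetical helper): print the
# identifier of every "wait" entry in INITTAB that has no matching
# "id=<identifier>" token in the saved `who -d` output.
hung_wait_entries() {
    inittab=$1
    who_out=$2
    awk -F: '$1 != "" && $3 == "wait" { print $1 }' "$inittab" |
    while read -r id; do
        # Succeeds only if some field of some line is exactly id=<identifier>.
        awk -v want="id=$id" '
            { for (i = 1; i <= NF; i++) if ($i == want) f = 1 }
            END { exit (f ? 0 : 1) }
        ' "$who_out" || echo "$id"
    done
}
```

A possible use is "who -d > /tmp/who.out" followed by "hung_wait_entries /etc/inittab /tmp/who.out" (the file name is only an example). Any identifier printed is a candidate for the hung entry and should be checked by hand.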
Note: if the /etc/inittab file has an entry like the following, remove it:
install_assist:2:wait:/usr/lib/lpd/pio/etc/pioinit > /dev/null 2>&1
This entry should have been removed by Installation Assistant after the initial installation. If it remains, the process it starts will hang and cause defunct processes to accumulate.
If the parent process ID is other than "1", the process referenced by that PID is responsible for reaping child processes that have terminated.
One cause of unreaped child processes is shell pipelines. The shell is implemented such that a pipeline causes commands to have a parent-child relationship. Many of the commands on the system do not expect to have child processes because they produce no child processes of their own. These processes are then unprepared to handle a child process that has terminated. This is particularly visible when a shell pipeline includes one or more short-lived processes and one or more long-lived processes. The short-lived process will end and be inherited by its potentially long-lived parent. This defunct process will be present as long as the parent process is still running.
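The effect can be reproduced on purpose with a contrived sketch. Rather than relying on how a particular shell arranges a pipeline, it uses "exec" to give a long-lived "sleep" a short-lived child that it never waits for. The function name show_defunct_demo is hypothetical, and the sketch assumes "ps -ef" marks the terminated child with "<defunct>" and shows the PPID in the third field.

```shell
# show_defunct_demo (hypothetical helper): create a defunct process on
# purpose and print its `ps -ef` entry.
show_defunct_demo() {
    # The inner shell starts a short-lived child, then replaces itself with
    # a long-lived sleep, which never waits for the child it now owns.
    sh -c 'sleep 1 & exec sleep 10' >/dev/null 2>&1 &
    parent=$!
    sleep 2    # give the short-lived child time to terminate
    # Print defunct entries whose PPID (third field) is the long-lived process.
    ps -ef | awk -v p="$parent" '/<defunct>/ && $3 == p'
    # End the demonstration; the defunct child then passes to init.
    kill "$parent" 2>/dev/null
}
```

While the long-lived process is running, the entry stays in the process table; once the parent ends, the entry passes to init and should disappear shortly afterwards.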
Any program that creates child processes of its own should accept responsibility for the child processes when they end. When 'init' must process "orphaned" defunct processes, the load on the system is increased, because 'init' must perform additional work to determine if the terminated process was a child of init. These programs should be coded to handle child processes that have terminated. If programs do not do so, this should be reported to their respective software vendors as a potential defect.
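For shell scripts, accepting responsibility for child processes usually means calling "wait" for every background job. The following is a minimal sketch with a hypothetical function name, run_and_reap; compiled programs would do the equivalent with the wait() or waitpid() system calls.

```shell
# run_and_reap CMD... (hypothetical helper): run each command string in the
# background, then collect every child's exit status so that no defunct
# entry is left behind.
run_and_reap() {
    pids=""
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"        # remember the PID of each child
    done
    for pid in $pids; do
        wait "$pid"            # reaps the child and returns its exit status
        echo "pid $pid exited with status $?"
    done
}
```

For example, "run_and_reap 'exit 0' 'exit 3'" prints one line per child, ending in "status 0" and "status 3" respectively.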
Should you determine a process is defunct and the process should reasonably have been reaped by the parent, the following information will probably be required by the software vendor of the program in order to analyze the cause of the problem:
SLOT=`expr <PPID> / 256`
(echo u $SLOT ; echo trace -k $SLOT) | crash > <filename>
<filename> is the file to which the output should be written.