[ Previous | Next | Table of Contents | Index | Library Home | Legal | Search ]

System Management Concepts: Operating System and Devices


System Hang Detection

All processes (also known as threads) run at a priority. This priority is numerically inverted in the range 0-126. Zero is highest priority and 126 is the lowest priority. The default priority for all threads is 60. The priority of a process can be lowered by any user with the nice command. Anyone with root authority can also raise a process's priority.

The kernel scheduler always picks the highest priority runnable thread to put on a CPU. It is therefore possible for a sufficient number of high priority threads to completely tie up the machine such that low priority threads can never run. If the running threads are at a priority higher than the default of 60, this can lock out all normal shells and logins to the point where the system appears hung.

The System Hang Detection (SHD) feature provides a mechanism to detect this situation and allow the system administrator a means to recover. This feature is implemented as a daemon (shdaemon) that runs at the highest process priority. This daemon queries the kernel for the lowest priority thread run over a specified interval. If the priority is above a configured threshold, the daemon can take one of several actions. Each of these actions can be independently enabled, and each can be configured to trigger at any priority and over any time interval. The actions and their defaults are:

    Action           Default    Default     Default   Default
                     Enabled    Priority    Timeout   Device
 
1)  Log an error        no         60          2
2)  Console message     no         60          2      /dev/console
3)  High priority       yes        60          2      /dev/tty0
    login shell
4)  Run a command at    no         60          2
    high priority
5)  Crash and reboot    no         39          5

For more information on system hang detection, see System Hang Management in AIX 5L Version 5.1 System Management Guide: Operating System and Devices.


[ Previous | Next | Table of Contents | Index | Library Home | Legal | Search ]