Diagnostic Controller

AIX Version 4.3 Understanding the Diagnostic Subsystem for AIX

The Diagnostic Controller function is started when the root user enters the diag command. Various flags that allow operations to be performed directly may be specified as input. For example, a flag may specify that the system or a particular resource is to be tested or that the system is to be run unattended. If no flags are specified, then the Diagnostic Controller presents menus to determine what the user wants to do.

Diagnostic object classes define the resources and tasks available for the Diagnostic Controller to work with. Predefined data in these object classes specify various attributes about the resources and tasks that may be available on the system.

The Customized Device object class (CuDv) contains information describing the resource instances actually defined to the system. A defined resource instance may or may not have a corresponding device driver that is used to control it. A resource may be a rack, drawer, adapter, disk, memory card, floating point chip, planar, bus, and so on.

The Diagnostic Controller is a data-driven program. It uses information found in both the CuDv and the Predefined Diagnostic Resources object class (PDiagRes) to generate a list of supported resources. This list of supported resources is used to build the Resource Selection menu.
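The data-driven lookup described above can be pictured as a join between the two object classes. The following is a minimal sketch only: the dictionaries stand in for CuDv and PDiagRes, and the field names (`name`, `kind`, `da_program`), the sample devices, and the function `supported_resources` are all hypothetical. Real ODM objects carry more fields and are not accessed this way.

```python
# Hypothetical, simplified stand-ins for the two object classes.
# Real CuDv and PDiagRes objects have different and additional fields.
cudv = [
    {"name": "hdisk0", "kind": "disk"},
    {"name": "ent0", "kind": "ethernet"},
    {"name": "foo0", "kind": "unknown"},   # no diagnostic support defined
]

pdiagres = {
    "disk": {"da_program": "da_disk"},          # hypothetical DA names
    "ethernet": {"da_program": "da_ethernet"},
}


def supported_resources(customized, predefined):
    """Return (resource name, DA program) pairs for every customized
    resource whose kind has a predefined diagnostic entry."""
    return [
        (dev["name"], predefined[dev["kind"]]["da_program"])
        for dev in customized
        if dev["kind"] in predefined
    ]


for name, da in supported_resources(cudv, pdiagres):
    print(name, da)
```

A resource that is defined to the system but has no predefined diagnostic entry (here `foo0`) simply does not appear in the resulting list.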

Given the user's selection from the Resource Selection menu, the Diagnostic Controller uses the PDiagRes object class to determine the appropriate Diagnostic Application (DA) to start. The Diagnostic Controller then waits for the DA to complete. The DA returns its status to the Diagnostic Controller through the exit system call.
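The start-and-wait pattern described above can be sketched as follows. This is an illustration only: a child Python process stands in for a DA, the helper name `run_da` is hypothetical, and the actual controller is not implemented this way.

```python
import subprocess
import sys


def run_da(argv):
    """Start a child process, block until it finishes, and return the
    status it passed to exit."""
    completed = subprocess.run(argv)
    return completed.returncode


# Stand-in for a DA: a child process that exits with status 1.
fake_da = [sys.executable, "-c", "import sys; sys.exit(1)"]
status = run_da(fake_da)
print("DA exit status:", status)
```

Because the status travels through the exit call, the parent needs no other channel to learn whether the child's run ended normally.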

The Diagnostic Controller has a system-wide view of the configuration, which lets it walk through the configuration database and test resources. For example, if a resource fails its tests, the Diagnostic Controller may attempt to test other resources until the problem has been isolated. The Diagnostic Controller understands the dependencies between the resources. The term "resource" is used in a generic sense and includes adapters as well as terminal devices.
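The source does not describe the isolation algorithm itself, so the following is only one plausible sketch of walking resource dependencies: starting from a failing resource, keep testing its ancestors until one passes. The dependency map, the resource names, and the function `isolate` are all hypothetical.

```python
# Hypothetical dependency map: child resource -> parent resource.
PARENT = {"hdisk0": "scsi0", "scsi0": "pci0", "pci0": None}


def isolate(resource, test, parent=PARENT):
    """Walk up the dependency chain from `resource`, testing each
    resource in turn, and stop at the first one that passes.
    Returns the resources that failed, in the order tested."""
    failed = []
    current = resource
    while current is not None and not test(current):
        failed.append(current)
        current = parent.get(current)
    return failed


# Example test function: the disk and its adapter fail, the bus passes.
bad = {"hdisk0", "scsi0"}
print(isolate("hdisk0", lambda name: name not in bad))
```

In this sketch, the last entry in the returned list is the highest resource in the chain that still fails, which narrows down where the fault may lie.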

The Diagnostic Controller analyzes the conclusions made by the Diagnostic Applications and generates a Problem Report. The Problem Report lists the field replaceable units (FRUs) that should be replaced, the probability of failure associated with each FRU, and the reason why the diagnosis was made.
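As an illustration of the three pieces of information a Problem Report carries (FRUs, a failure probability for each, and a reason), here is a small sketch. The class `FruEntry`, the function `format_problem_report`, the layout of the output, and the sample values are assumptions made for this example and do not reproduce the real report format.

```python
from dataclasses import dataclass


@dataclass
class FruEntry:
    name: str          # hypothetical FRU name
    probability: int   # failure probability, in percent


def format_problem_report(reason, frus):
    """Build a plain-text report listing FRUs from most to least likely."""
    lines = [f"Reason: {reason}"]
    for fru in sorted(frus, key=lambda f: f.probability, reverse=True):
        lines.append(f"  {fru.probability:3d}%  {fru.name}")
    return "\n".join(lines)


report = format_problem_report(
    "Disk read failures",
    [FruEntry("cable", 20), FruEntry("hdisk0", 80)],
)
print(report)
```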

The Diagnostic Controller writes its analysis to the directory /etc/lpp/diagnostics/data, and the diagrpt command, or "Display Previous Diagnostic Results" task, can be used at a later date to retrieve these results.

Control Flow of the Diagnostic Controller

Invoking the diag command without any flags starts the Diagnostic Controller, which performs the following:

Displays the Operating Instructions menu. The version number shown reflects the version of the Diagnostic code installed.
Displays the Function Selection menu, and starts the command associated with the user's selection.

Invoking the diag command with flags starts the Diagnostic Controller and passes the flags on to the Controller.

The Diagnostic Controller performs the following tasks:

Initializes the user interface. Initialization is assumed to fail if there is no display and keyboard.
- If the -a flag is specified, performs configuration management.
- If the -s flag is specified, performs system checkout once.
- If the -S# flag is specified, runs diagnostics on the resources indicated by the Test Suite ID.
- If no flag is specified, prompts the user.
From the Function Selection Menu, allows the user to select one of the following:
- Select Diagnostics
- Select Advanced Diagnostics
- Select Task Selection Menu
- Select Resource Selection Menu
If Diagnostics or Advanced Diagnostics is selected, then the following happens:
- The Diagnostic Mode Selection menu is displayed, to determine if System Verification or Problem Determination should be run.
- If Problem Determination is chosen, then the Diagnostic Controller automatically scans the error log for any PERMANENT HARDWARE errors that have been logged within the last 7 days to determine if any devices should be automatically tested. A problem report may be generated.
- Walks the configuration database to determine which resources in the current configuration can be tested. This information is presented in the Resource Selection Menu.
- If Advanced Diagnostics Routines is chosen, and the system is in Online Service mode of operation, the Diagnostic Controller will display the Test Method menu to determine if the tests should be repeated.
- Initializes the input parameters to the Diagnostic Application (DA), which are contained in the Test Mode Input (TMInput) object class.
- Runs the Diagnostic Application (DA) of the resource to be tested.
- Waits for the DA to complete.
- The Diagnostic Controller then:
  - Performs the isolation process.
  - Presents its conclusions on the screen.
  - Exits with a return value of 0 if no trouble is found, or with a value of 1 if the hardware tested bad.
If Task Selection Menu is selected, then the following happens:
- The Diagnostic Controller displays a list of Tasks that are available for the system.
- After a task has been selected, a Resource Selection Menu will appear if the selected task supports a resource selection. After selection of a Resource, the task is called with the selected resource name as a command-line argument.
- If the selected task does not support resource selection, then the task is invoked.
If Resource Selection Menu is selected, then the following happens:
- The Diagnostic Controller displays a list of Resources available on the system.
- After a Resource has been selected, a Task Selection Menu will appear containing the commonly supported tasks for each selected Resource. After selection of a task, the task is invoked.
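The error-log scan performed in the Problem Determination step above (permanent hardware errors logged within the last 7 days) can be sketched as a simple filter. The record layout, the field names, and the function `devices_to_test` are hypothetical; the `"H"` and `"PERM"` values are loosely modeled on the class and type codes that the errpt command prints, and should be treated as assumptions here.

```python
from datetime import datetime, timedelta


def devices_to_test(entries, now, window_days=7):
    """Return the resources that logged a permanent hardware error
    within the last `window_days` days, without duplicates."""
    cutoff = now - timedelta(days=window_days)
    names = []
    for entry in entries:
        recent = entry["time"] >= cutoff
        permanent_hw = entry["class"] == "H" and entry["type"] == "PERM"
        if recent and permanent_hw and entry["resource"] not in names:
            names.append(entry["resource"])
    return names


# Hypothetical error-log records.
now = datetime(1999, 1, 10)
log = [
    {"resource": "hdisk0", "class": "H", "type": "PERM", "time": datetime(1999, 1, 8)},
    {"resource": "ent0", "class": "H", "type": "TEMP", "time": datetime(1999, 1, 9)},
    {"resource": "hdisk1", "class": "H", "type": "PERM", "time": datetime(1999, 1, 1)},
    {"resource": "hdisk0", "class": "H", "type": "PERM", "time": datetime(1999, 1, 9)},
]
print(devices_to_test(log, now))
```

In this sample, only `hdisk0` qualifies: `ent0` logged a temporary error, and the `hdisk1` entry is older than the 7-day window.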

Return Status

The Diagnostic Controller returns the following values:

Diagnostic Controller Return Values
Name	Value	Description
DIAG_EXIT_GOOD	0	No problems found
DIAG_EXIT_DEVICE_ERROR	1	Error running diagnostics
DIAG_EXIT_INTERRUPT	2	Received an interrupt while running diagnostics
DIAG_EXIT_NO_DEVICE	3	Device to test was not found in system configuration
DIAG_EXIT_BUSY	4	Another Dctrl program is running
DIAG_EXIT_LOCK_ERROR	5	Cannot create lock file for diagnostic controller
DIAG_EXIT_OBJCLASS_ERROR	6	Error accessing ODM database
DIAG_EXIT_USAGE	7	Usage error
DIAG_EXIT_SCREEN	8	Screen size incorrect
DIAG_EXIT_NoPDiagDev	9	Device not supported by diagnostics
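A script that wraps the diag command might translate these return values into messages. The sketch below mirrors the table above; the names and numbers come from the table, while the enum class `DiagExit`, the `DESCRIPTIONS` mapping, and the helper `describe` are hypothetical names introduced for this example.

```python
from enum import IntEnum


class DiagExit(IntEnum):
    DIAG_EXIT_GOOD = 0
    DIAG_EXIT_DEVICE_ERROR = 1
    DIAG_EXIT_INTERRUPT = 2
    DIAG_EXIT_NO_DEVICE = 3
    DIAG_EXIT_BUSY = 4
    DIAG_EXIT_LOCK_ERROR = 5
    DIAG_EXIT_OBJCLASS_ERROR = 6
    DIAG_EXIT_USAGE = 7
    DIAG_EXIT_SCREEN = 8
    DIAG_EXIT_NoPDiagDev = 9


DESCRIPTIONS = {
    DiagExit.DIAG_EXIT_GOOD: "No problems found",
    DiagExit.DIAG_EXIT_DEVICE_ERROR: "Error running diagnostics",
    DiagExit.DIAG_EXIT_INTERRUPT: "Received an interrupt while running diagnostics",
    DiagExit.DIAG_EXIT_NO_DEVICE: "Device to test was not found in system configuration",
    DiagExit.DIAG_EXIT_BUSY: "Another Dctrl program is running",
    DiagExit.DIAG_EXIT_LOCK_ERROR: "Cannot create lock file for diagnostic controller",
    DiagExit.DIAG_EXIT_OBJCLASS_ERROR: "Error accessing ODM database",
    DiagExit.DIAG_EXIT_USAGE: "Usage error",
    DiagExit.DIAG_EXIT_SCREEN: "Screen size incorrect",
    DiagExit.DIAG_EXIT_NoPDiagDev: "Device not supported by diagnostics",
}


def describe(status):
    """Map a numeric exit status to its description from the table."""
    try:
        return DESCRIPTIONS[DiagExit(status)]
    except ValueError:
        # Status is not one of the values listed in the table.
        return f"Unknown exit status {status}"


print(describe(4))
```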
