
Understanding the Diagnostic Subsystem for AIX

Diagnostic Controller

The Diagnostic Controller function is started when the root user enters the diag command. Various flags that allow operations to be performed directly may be specified as input. For example, a flag may specify that the system or a particular resource is to be tested or that the system is to be run unattended. If no flags are specified, then the Diagnostic Controller presents menus to determine what the user wants to do.

Diagnostic object classes define the resources and tasks available for the Diagnostic Controller to work with. Predefined data in these object classes specify various attributes about the resources and tasks that may be available on the system.

The Customized Device object class (CuDv) contains information describing the resource instances actually defined to the system. A defined resource instance may or may not have a corresponding device driver that is used to control it. A resource may be a rack, drawer, adapter, disk, memory card, floating point chip, planar, bus, and so on.

The Diagnostic Controller is a data-driven program. It uses information found in both the CuDv and the Predefined Diagnostic Resources object class (PDiagRes) to generate a list of supported resources. This list of supported resources is used to build the Resource Selection menu.
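
The data-driven selection described above can be pictured as a lookup of each customized resource against the predefined diagnostic data. The following is only a minimal sketch: the dictionaries stand in for the CuDv and PDiagRes object classes, and the keys ("name", "kind", "da"), the device kinds, and the DA paths are all hypothetical and do not reflect the real ODM schema.

```python
# Hypothetical, simplified stand-ins for the CuDv and PDiagRes object classes.
# The keys, device kinds, and paths are illustrative only, not the real ODM schema.
cudv = [
    {"name": "hdisk0", "kind": "disk"},
    {"name": "ent0", "kind": "ethernet"},
    {"name": "foo0", "kind": "unknown-adapter"},
]
pdiagres = {
    "disk": {"da": "/hypothetical/da/disk"},
    "ethernet": {"da": "/hypothetical/da/ethernet"},
}


def supported_resources(customized, predefined):
    """Return (resource name, DA path) pairs for resources that have predefined diagnostic data."""
    return [
        (dev["name"], predefined[dev["kind"]]["da"])
        for dev in customized
        if dev["kind"] in predefined
    ]


# Only resources with a matching predefined entry would appear in the menu.
menu = supported_resources(cudv, pdiagres)
for name, da in menu:
    print(name, da)
```

In this sketch, foo0 is defined to the system but has no predefined diagnostic entry, so it is left out of the list.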

The Diagnostic Controller supports dynamic reconfiguration of processors by updating the Resource Selection menu if a reconfiguration operation occurs while the Diagnostic Controller is running.

Given the user's selection from the Resource Selection menu, the Diagnostic Controller uses the PDiagRes object class to determine the appropriate Diagnostic Application (DA) to start. The Diagnostic Controller then waits for the DA to complete. The DA returns its status to the Diagnostic Controller through the exit system call.
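
The start, wait, and collect-status pattern described above is the ordinary parent and child process relationship. As a rough illustration only (this is not the Diagnostic Controller's code), the sketch below starts a child program, blocks until it finishes, and reads the status the child passed to exit. The function name is hypothetical, and a small Python child process stands in for a real DA.

```python
import subprocess
import sys


def run_diagnostic_application(argv):
    """Start a child program, block until it finishes, and return its exit status."""
    completed = subprocess.run(argv, check=False)
    return completed.returncode


# Stand-in for a DA: a child Python process that exits with status 3.
status = run_diagnostic_application(
    [sys.executable, "-c", "import sys; sys.exit(3)"]
)
print(status)
```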

The Diagnostic Controller employs a system-wide view of the configuration, which enables it to walk through the configuration database and test resources. For example, if a resource fails its tests, the Diagnostic Controller may attempt to test other resources until the problem has been isolated. The Diagnostic Controller understands the dependencies between the resources. The term "resource" is used in a generic sense and includes adapters as well as terminal devices.

The Diagnostic Controller analyzes the conclusions made by the Diagnostic Applications and generates a Problem Report. The Problem Report lists the field replaceable units (FRUs) that should be replaced, the probability of failure associated with each FRU, and the reason why the diagnosis was made.

The Diagnostic Controller writes its analysis to the directory /etc/lpp/diagnostics/data, and the diagrpt command, or "Display Previous Diagnostic Results" task, can be used at a later date to retrieve these results.

In addition, notification of problems can be sent to external programs registered with the Diagnostic Controller. Programs are registered through ODM objects in the PDiagAtt class. There are two possible registrations:

For systems attached to a Hardware Management Console:

PDiagAtt:
        DType = <fileset nickname>
        DSClass = ""
        attribute = "notify_service"
        value = ""
        rep = "s"
        DClass = ""
        DApp = <complete path to external notification program>

The program specified in DApp of the notify_service attribute is invoked when the system is managed by a Hardware Management Console (HMC). The program is invoked with the diagnostic event log sequence number of the diagnostic conclusion. The diagnostic event log API can be used to extract the specific data of the diagnostic analysis and perform any customized notifications.

The <fileset nickname> is any string of 15 characters or fewer that identifies the fileset that ships this stanza. Diagnostics does not use the nickname, but DType must hold a value unique to each fileset so that the attribute can be installed and updated correctly, because other filesets can ship the same attribute name. For example, fileset devices.chrp.base.diag would ship a stanza like:

PDiagAtt:
        DType = "DevChrBasDia"
        DSClass = ""
        attribute = "notify_service"
        value = ""
        rep = "s"
        DClass = ""
        DApp = "/usr/lpp/diagnostics/bin/diagServiceEvent"
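
A packaging script could generate such a stanza and check the nickname length before the fileset ships. The sketch below is one possible way to do that, not an IBM tool: the function name render_pdiagatt is hypothetical, treating an empty nickname as invalid is an assumption, and every string value is written in double quotes, following the other fields in the stanza.

```python
def render_pdiagatt(nickname, attribute, dapp):
    """Build the text of a PDiagAtt notification stanza (hypothetical helper)."""
    # The nickname may be at most 15 characters; rejecting an empty one is an assumption.
    if not 1 <= len(nickname) <= 15:
        raise ValueError("fileset nickname must be 1 to 15 characters")
    if attribute not in ("notify_service", "notify_extern"):
        raise ValueError("attribute must be notify_service or notify_extern")
    fields = [
        ("DType", nickname),
        ("DSClass", ""),
        ("attribute", attribute),
        ("value", ""),
        ("rep", "s"),
        ("DClass", ""),
        ("DApp", dapp),
    ]
    # Each field is indented under the class name, one per line, with a quoted value.
    body = "\n".join(f'        {key} = "{val}"' for key, val in fields)
    return "PDiagAtt:\n" + body + "\n"


stanza = render_pdiagatt(
    "DevChrBasDia",
    "notify_service",
    "/usr/lpp/diagnostics/bin/diagServiceEvent",
)
print(stanza)
```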

For systems not attached to a Hardware Management Console:

PDiagAtt:
        DType = <fileset nickname>
        DSClass = ""
        attribute = "notify_extern"
        value = ""
        rep = "s"
        DClass = ""
        DApp = <complete path to external notification program>

The program specified in DApp of the notify_extern attribute is invoked when the system is not managed by a Hardware Management Console (HMC). The program is invoked with the diagnostic event log sequence number of the diagnostic conclusion. The diagnostic event log API can be used to extract the specific data of the diagnostic analysis and perform any customized notifications.

The <fileset nickname> is any string of 15 characters or fewer that identifies the fileset that ships this stanza. Diagnostics does not use the nickname, but DType must hold a value unique to each fileset so that the attribute can be installed and updated correctly, because other filesets can ship the same attribute name. For example, fileset devices.chrp.base.diag would ship a stanza like:

PDiagAtt:
        DType = "DevChrBasDia"
        DSClass = ""
        attribute = "notify_extern"
        value = ""
        rep = "s"
        DClass = ""
        DApp = "/usr/lpp/diagnostics/bin/diagServiceEvent"
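
An external notification program registered this way receives the diagnostic event log sequence number when it is invoked. The sketch below shows only the outer shape of such a program and rests on two assumptions: that the sequence number arrives as the first command-line argument, and that it is a non-negative decimal integer. The function names are hypothetical, and the call into the diagnostic event log API is left as a comment because its interface is not described here.

```python
import sys


def parse_sequence_number(argv):
    """Return the sequence number from argv[1], or None if it is missing or invalid."""
    if len(argv) < 2:
        return None
    try:
        seq = int(argv[1])
    except ValueError:
        return None
    return seq if seq >= 0 else None


def main(argv):
    """Entry point of the hypothetical notification program; returns an exit status."""
    seq = parse_sequence_number(argv)
    if seq is None:
        print("usage: notify <sequence number>", file=sys.stderr)
        return 1
    # A real program would use the diagnostic event log API here to look up
    # the entry with this sequence number and send its own notification.
    print(f"diagnostic conclusion logged with sequence number {seq}")
    return 0


# Simulated invocation with sequence number 42.
exit_code = main(["notify", "42"])
```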

Control Flow of the Diagnostic Controller

Invoking the diag command without any flags starts the Diagnostic Controller, which performs the following:

  1. Displays the Operating Instructions menu. The version number shown reflects the version of the diagnostic code installed.
  2. Displays the Function Selection menu, and starts the command associated with the user's selection.

Invoking the diag command with flags starts the Diagnostic Controller and passes the flags on to the Controller.

The Diagnostic Controller performs the following tasks:

  1. Initializes the user interface. Initialization is assumed to fail if there is no display and keyboard.
  2. From the Function Selection Menu, allows the user to select one of the following:
  3. If Diagnostics or Advanced Diagnostics is selected, then the following happens:
  4. If Task Selection Menu is selected, then the following happens:
  5. If Resource Selection Menu is selected, then the following happens:

Return Status

The Diagnostic Controller returns the following values:

Return Status                 Value   Meaning
DIAG_EXIT_GOOD                0       No problems found
DIAG_EXIT_DEVICE_ERROR        1       Error running diagnostics
DIAG_EXIT_INTERRUPT           2       Received an interrupt while running diagnostics
DIAG_EXIT_NO_DEVICE           3       Device to test was not found in system configuration
DIAG_EXIT_BUSY                4       Another Dctrl program is running
DIAG_EXIT_LOCK_ERROR          5       Cannot create lock file for diagnostic controller
DIAG_EXIT_OBJCLASS_ERROR      6       Error accessing ODM database
DIAG_EXIT_USAGE               7       Usage error
DIAG_EXIT_SCREEN              8       Screen size incorrect
DIAG_EXIT_NoPDiagDev          9       Device not supported by diagnostics
DIAG_EXIT_NO_DIAGSUPPORT      10      Diagnostics is not supported
DIAG_EXIT_NOT_MISSING         11      Device is not missing
DIAG_EXIT_NO_AUTHORIZATION    12      User is not authorized to run diagnostics
DIAG_EXIT_KERNSUPPORT         13      Device is not supported on the 64-bit kernel
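
A script that runs the Diagnostic Controller unattended might translate these values into messages. The sketch below simply restates the table as an enumeration; the class name DiagExit and the helper describe are hypothetical, and the sketch decodes a status value that the caller already holds rather than invoking diag itself.

```python
from enum import IntEnum


class DiagExit(IntEnum):
    """Return values of the Diagnostic Controller, copied from the table above."""
    DIAG_EXIT_GOOD = 0
    DIAG_EXIT_DEVICE_ERROR = 1
    DIAG_EXIT_INTERRUPT = 2
    DIAG_EXIT_NO_DEVICE = 3
    DIAG_EXIT_BUSY = 4
    DIAG_EXIT_LOCK_ERROR = 5
    DIAG_EXIT_OBJCLASS_ERROR = 6
    DIAG_EXIT_USAGE = 7
    DIAG_EXIT_SCREEN = 8
    DIAG_EXIT_NoPDiagDev = 9
    DIAG_EXIT_NO_DIAGSUPPORT = 10
    DIAG_EXIT_NOT_MISSING = 11
    DIAG_EXIT_NO_AUTHORIZATION = 12
    DIAG_EXIT_KERNSUPPORT = 13


MEANINGS = {
    DiagExit.DIAG_EXIT_GOOD: "No problems found",
    DiagExit.DIAG_EXIT_DEVICE_ERROR: "Error running diagnostics",
    DiagExit.DIAG_EXIT_INTERRUPT: "Received an interrupt while running diagnostics",
    DiagExit.DIAG_EXIT_NO_DEVICE: "Device to test was not found in system configuration",
    DiagExit.DIAG_EXIT_BUSY: "Another Dctrl program is running",
    DiagExit.DIAG_EXIT_LOCK_ERROR: "Cannot create lock file for diagnostic controller",
    DiagExit.DIAG_EXIT_OBJCLASS_ERROR: "Error accessing ODM database",
    DiagExit.DIAG_EXIT_USAGE: "Usage error",
    DiagExit.DIAG_EXIT_SCREEN: "Screen size incorrect",
    DiagExit.DIAG_EXIT_NoPDiagDev: "Device not supported by diagnostics",
    DiagExit.DIAG_EXIT_NO_DIAGSUPPORT: "Diagnostics is not supported",
    DiagExit.DIAG_EXIT_NOT_MISSING: "Device is not missing",
    DiagExit.DIAG_EXIT_NO_AUTHORIZATION: "User is not authorized to run diagnostics",
    DiagExit.DIAG_EXIT_KERNSUPPORT: "Device is not supported on the 64-bit kernel",
}


def describe(status):
    """Return the documented meaning of a status value, or a fallback for unknown values."""
    try:
        return MEANINGS[DiagExit(status)]
    except ValueError:
        return f"Unknown status {status}"


print(describe(4))
```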
