The Diagnostic Controller function is started when the root user enters the diag command. Various flags that allow operations to be performed directly may be specified as input. For example, a flag may specify that the system or a particular resource is to be tested or that the system is to be run unattended. If no flags are specified, then the Diagnostic Controller presents menus to determine what the user wants to do.
Diagnostic object classes define the resources and tasks available for the Diagnostic Controller to work with. Predefined data in these object classes specify various attributes about the resources and tasks that may be available on the system.
The Customized Device object class (CuDv) contains information describing the resource instances actually defined to the system. A defined resource instance may or may not have a corresponding device driver that is used to control it. A resource may be a rack, drawer, adapter, disk, memory card, floating point chip, planar, bus, and so on.
The Diagnostic Controller is a data-driven program. It uses information found in both the CuDv and the Predefined Diagnostic Resources object class (PDiagRes) to generate a list of supported resources. This list of supported resources is used to build the Resource Selection menu.
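The data-driven lookup described above can be pictured with a minimal Python sketch. It assumes, purely for illustration, that each CuDv entry carries a PdDvLn link of the form class/subclass/type and that PDiagRes entries can be matched on DClass, DSClass, and DType. The sample rows and the function name supported_resources are hypothetical, and this is not the controller's actual code.

```python
# Illustrative in-memory stand-ins for the two ODM object classes.
# All rows below are made-up sample data, not real system contents.
cudv = [
    {"name": "hdisk0", "PdDvLn": "disk/scsi/scsd"},
    {"name": "ent0", "PdDvLn": "adapter/pci/ethernet"},
    {"name": "foo0", "PdDvLn": "widget/none/unknown"},
]
pdiagres = [
    {"DClass": "disk", "DSClass": "scsi", "DType": "scsd"},
    {"DClass": "adapter", "DSClass": "pci", "DType": "ethernet"},
]


def supported_resources(customized, predefined):
    """Return names of customized resources that have a diagnostic entry."""
    # Assumed matching key: (class, subclass, type).
    known = {(p["DClass"], p["DSClass"], p["DType"]) for p in predefined}
    result = []
    for dev in customized:
        key = tuple(dev["PdDvLn"].split("/"))
        if key in known:
            result.append(dev["name"])
    return result


# The resulting list is what a selection menu could be built from.
menu = supported_resources(cudv, pdiagres)
```

In this sketch, foo0 is defined to the system but has no predefined diagnostic entry, so it does not appear in the list.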
The Diagnostic Controller supports dynamic reconfiguration of processors: if a reconfiguration operation occurs while the Diagnostic Controller is running, it updates the Resource Selection menu.
Given the user's selection from the Resource Selection menu, the Diagnostic Controller uses the PDiagRes object class to determine the appropriate Diagnostic Application (DA) to start. The Diagnostic Controller then waits for the DA to complete. The DA returns its status through the exit system call.
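The start, wait, and read-status pattern described here is the ordinary parent and child process model. The following Python sketch illustrates only that pattern: the helper name run_and_wait is hypothetical, the child is a stand-in rather than a real DA, and the value 3 is arbitrary because this section does not define DA status values.

```python
import subprocess
import sys


def run_and_wait(argv):
    """Start a child process, block until it ends, return its exit status."""
    completed = subprocess.run(argv)
    return completed.returncode


# Stand-in for a Diagnostic Application: a child that calls exit(3).
status = run_and_wait([sys.executable, "-c", "import sys; sys.exit(3)"])
```

The parent learns nothing from the child except the small integer passed to exit, which is why the status values must be agreed on in advance.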
The Diagnostic Controller employs a system-wide view of the configuration, which lets it walk through the configuration database and test resources. For example, if a resource fails its tests, the Diagnostic Controller may attempt to test other resources until the problem has been isolated. The Diagnostic Controller understands the dependencies between the resources. The term "resource" is used in a generic sense and includes adapters as well as terminal devices.
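One way to picture isolation by dependency is to walk from a failing resource toward its parents. The sketch below is an assumption about how such a walk could look, not the controller's documented algorithm; the parent links, resource names, and the isolate function are all hypothetical.

```python
# Hypothetical parent links: child resource -> parent resource.
parent_of = {"hdisk0": "scsi0", "scsi0": "pci0", "pci0": "sysplanar0"}


def isolate(resource, test, parents):
    """Test a resource; while tests fail, move to the parent and test it too.

    Returns the resources that failed, in the order they were tested.
    """
    failed = []
    current = resource
    while current is not None:
        if test(current):
            break  # this level passes, so stop walking upward
        failed.append(current)
        current = parents.get(current)
    return failed


# Stand-in test function: pretend the disk and its adapter both fail.
broken = {"hdisk0", "scsi0"}
failed = isolate("hdisk0", lambda name: name not in broken, parent_of)
```

Here the walk stops at pci0, the first ancestor that passes, which narrows the suspect set to the disk and its adapter.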
The Diagnostic Controller analyzes the conclusions made by the Diagnostic Applications and generates a Problem Report. The Problem Report lists the field replaceable units (FRUs) that should be replaced, the probability of failure associated with each FRU, and the reason why the diagnosis was made.
The Diagnostic Controller writes its analysis to the directory /etc/lpp/diagnostics/data. The diagrpt command or the "Display Previous Diagnostic Results" task can be used later to retrieve these results.
In addition, notification of problems can be sent to external programs registered with the Diagnostic Controller. Registration is done through ODM objects in the PDiagAtt class. There are two possible registrations:
For Systems attached to a Hardware Management Console:
PDiagAtt:
        DType = <fileset nickname>
        DSClass = ""
        attribute = "notify_service"
        value = ""
        rep = "s"
        DClass = ""
        DApp = <complete path to external notification program>
The program specified in DApp of the notify_service attribute is invoked when the system is managed by a Hardware Management Console (HMC). The program is invoked with the diagnostic event log sequence number of the diagnostic conclusion. The diagnostic event log API can be used to extract the specific data of the diagnostic analysis and perform any customized notifications.
The <fileset nickname> is any string of 15 characters or fewer that identifies the fileset that ships this stanza. Diagnostics does not use the nickname, but DType requires a value that is unique per fileset to facilitate installing and updating the attribute, because the same attribute name can be shipped in other filesets. For example, fileset devices.chrp.base.diag would ship a stanza like:
PDiagAtt:
        DType = "DevChrBasDia"
        DSClass = ""
        attribute = "notify_service"
        value = ""
        rep = "s"
        DClass = ""
        DApp = /usr/lpp/diagnostics/bin/diagServiceEvent
For Systems not attached to a Hardware Management Console:
PDiagAtt:
        DType = <fileset nickname>
        DSClass = ""
        attribute = "notify_extern"
        value = ""
        rep = "s"
        DClass = ""
        DApp = <complete path to external notification program>
The program specified in DApp of the notify_extern attribute is invoked when the system is not managed by a Hardware Management Console (HMC). The program is invoked with the diagnostic event log sequence number of the diagnostic conclusion. The diagnostic event log API can be used to extract the specific data of the diagnostic analysis and perform any customized notifications.
The <fileset nickname> is any string of 15 characters or fewer that identifies the fileset that ships this stanza. Diagnostics does not use the nickname, but DType requires a value that is unique per fileset to facilitate installing and updating the attribute, because the same attribute name can be shipped in other filesets. For example, fileset devices.chrp.base.diag would ship a stanza like:
PDiagAtt:
        DType = "DevChrBasDia"
        DSClass = ""
        attribute = "notify_extern"
        value = ""
        rep = "s"
        DClass = ""
        DApp = /usr/lpp/diagnostics/bin/diagServiceEvent
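The choice between the two registrations can be summarized in a short Python sketch. It models PDiagAtt objects as dictionaries and assumes the sequence number is passed to the program as a single command-line argument, which this section does not state explicitly. The function names, the nickname "ExampleFileset", and the path /opt/example/notify are hypothetical.

```python
# Registrations modeled as dictionaries holding the PDiagAtt fields
# that matter here. The second entry is made-up sample data.
registrations = [
    {"DType": "DevChrBasDia", "attribute": "notify_service",
     "DApp": "/usr/lpp/diagnostics/bin/diagServiceEvent"},
    {"DType": "ExampleFileset", "attribute": "notify_extern",
     "DApp": "/opt/example/notify"},
]


def pick_notifiers(regs, hmc_managed):
    """Return DApp paths for the attribute that matches the system type."""
    wanted = "notify_service" if hmc_managed else "notify_extern"
    return [r["DApp"] for r in regs if r["attribute"] == wanted]


def notify_command(dapp, sequence_number):
    """Build an argv list: program path plus the event log sequence number.

    Passing the number as one argument is an assumption of this sketch.
    """
    return [dapp, str(sequence_number)]


with_hmc = pick_notifiers(registrations, hmc_managed=True)
without_hmc = pick_notifiers(registrations, hmc_managed=False)
```

Because several filesets may each ship a stanza with the same attribute name, the selection returns a list rather than a single program.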
Invoking the diag command without any flags starts the Diagnostic Controller which performs the following:
Invoking the diag command with flags starts the Diagnostic Controller and passes the flags on to the Controller.
The Diagnostic Controller performs the following tasks:
The Diagnostic Controller returns the following values:
Return Code | Value | Meaning |
---|---|---|
DIAG_EXIT_GOOD | 0 | No problems found |
DIAG_EXIT_DEVICE_ERROR | 1 | Error running diagnostics |
DIAG_EXIT_INTERRUPT | 2 | Received an interrupt while running diagnostics |
DIAG_EXIT_NO_DEVICE | 3 | Device to test was not found in system configuration |
DIAG_EXIT_BUSY | 4 | Another Dctrl program is running |
DIAG_EXIT_LOCK_ERROR | 5 | Cannot create lock file for diagnostic controller |
DIAG_EXIT_OBJCLASS_ERROR | 6 | Error accessing ODM database |
DIAG_EXIT_USAGE | 7 | Usage error |
DIAG_EXIT_SCREEN | 8 | Screen size incorrect |
DIAG_EXIT_NoPDiagDev | 9 | Device not supported by diagnostics |
DIAG_EXIT_NO_DIAGSUPPORT | 10 | Diagnostics is not supported |
DIAG_EXIT_NOT_MISSING | 11 | Device is not missing |
DIAG_EXIT_NO_AUTHORIZATION | 12 | User is not authorized to run diagnostics |
DIAG_EXIT_KERNSUPPORT | 13 | Device is not supported on the 64-bit kernel |
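
A script that wraps the diag command might translate these values into readable messages. The Python sketch below copies the names, values, and meanings from the table above; the class name DiagExit, the MEANINGS dictionary, and the describe helper are hypothetical conveniences and not part of any shipped interface.

```python
from enum import IntEnum


class DiagExit(IntEnum):
    """Exit values copied from the return value table."""
    DIAG_EXIT_GOOD = 0
    DIAG_EXIT_DEVICE_ERROR = 1
    DIAG_EXIT_INTERRUPT = 2
    DIAG_EXIT_NO_DEVICE = 3
    DIAG_EXIT_BUSY = 4
    DIAG_EXIT_LOCK_ERROR = 5
    DIAG_EXIT_OBJCLASS_ERROR = 6
    DIAG_EXIT_USAGE = 7
    DIAG_EXIT_SCREEN = 8
    DIAG_EXIT_NoPDiagDev = 9
    DIAG_EXIT_NO_DIAGSUPPORT = 10
    DIAG_EXIT_NOT_MISSING = 11
    DIAG_EXIT_NO_AUTHORIZATION = 12
    DIAG_EXIT_KERNSUPPORT = 13


MEANINGS = {
    DiagExit.DIAG_EXIT_GOOD: "No problems found",
    DiagExit.DIAG_EXIT_DEVICE_ERROR: "Error running diagnostics",
    DiagExit.DIAG_EXIT_INTERRUPT:
        "Received an interrupt while running diagnostics",
    DiagExit.DIAG_EXIT_NO_DEVICE:
        "Device to test was not found in system configuration",
    DiagExit.DIAG_EXIT_BUSY: "Another Dctrl program is running",
    DiagExit.DIAG_EXIT_LOCK_ERROR:
        "Cannot create lock file for diagnostic controller",
    DiagExit.DIAG_EXIT_OBJCLASS_ERROR: "Error accessing ODM database",
    DiagExit.DIAG_EXIT_USAGE: "Usage error",
    DiagExit.DIAG_EXIT_SCREEN: "Screen size incorrect",
    DiagExit.DIAG_EXIT_NoPDiagDev: "Device not supported by diagnostics",
    DiagExit.DIAG_EXIT_NO_DIAGSUPPORT: "Diagnostics is not supported",
    DiagExit.DIAG_EXIT_NOT_MISSING: "Device is not missing",
    DiagExit.DIAG_EXIT_NO_AUTHORIZATION:
        "User is not authorized to run diagnostics",
    DiagExit.DIAG_EXIT_KERNSUPPORT:
        "Device is not supported on the 64-bit kernel",
}


def describe(status):
    """Turn a numeric exit status into 'NAME: meaning' text."""
    try:
        code = DiagExit(status)
    except ValueError:
        return f"Unknown exit status {status}"
    return f"{code.name}: {MEANINGS[code]}"
```

For example, describe(4) yields "DIAG_EXIT_BUSY: Another Dctrl program is running", and any value outside the table is reported as unknown rather than raising an error.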