Understanding the Diagnostic Subsystem for AIX

Tasks and Service Aids

The Diagnostic Package contains programs that are called Tasks. Tasks can be thought of as performing a specific function on a resource; for example, running diagnostics, or performing a Service Aid on a resource.

Creating a Task

Note

The diagnostic subsystem only supports 32-bit Tasks and Service Aids.

Tasks are represented by an entry in the Predefined Diagnostic Task object class (PDiagTask). To create a new task, a PDiagTask object is needed plus the binary executable of the task itself, as specified by the PDiagTask->Action class member.

Some Task IDs are reserved for use by the Diagnostic Controller:

Task ID 0: Built-in Controller Task
Task ID 1000+: Reserved for Third-Party Use. Any number above 999 may be used. If two third-party tasks use the same task ID, the IDs clash, and the user may see a resource listed as supported by a task that does not actually support it. If PDiagTask->ResourceFlag is set, each third-party task should therefore be able to handle a nonsupported resource given as a command-line argument.
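
A task ID clash of the kind described above can be detected before a task is registered. The following Python sketch is illustrative only: the Task record, its fields, and the example paths are hypothetical stand-ins for PDiagTask entries and do not reflect the actual ODM layout.

```python
from collections import defaultdict
from dataclasses import dataclass

THIRD_PARTY_MIN_ID = 1000  # any number above 999 may be used by third parties


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a PDiagTask entry (not the real ODM layout)."""
    task_id: int
    action: str  # path of the task executable (illustrative)


def find_id_clashes(tasks):
    """Return {task_id: [actions]} for third-party IDs used more than once."""
    by_id = defaultdict(list)
    for task in tasks:
        if task.task_id >= THIRD_PARTY_MIN_ID:
            by_id[task.task_id].append(task.action)
    return {tid: actions for tid, actions in by_id.items() if len(actions) > 1}


# Illustrative entries; the paths are invented for this example.
tasks = [
    Task(1000, "/opt/vendor_a/bin/task"),
    Task(1001, "/opt/vendor_b/bin/task"),
    Task(1000, "/opt/vendor_c/bin/task"),
]
clashes = find_id_clashes(tasks)
print(clashes)
```

A check like this only reports duplicates. Each task still needs to reject resources it does not support, because a clash may involve a task installed later.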

Performing a Task

Many of these tasks work on all system model architectures. (The Diagnostic Task Matrix shows all currently supported tasks and their supported platforms.) Some tasks are accessible only from Online Diagnostics in Service or Concurrent mode; others may be accessible only from Standalone Diagnostics. Still other tasks may be supported only on a particular system architecture, such as CHRP (Common Hardware Reference Platform) or RSPC (PowerPC Reference Platform).

Fastpath with Unknown Resource

A fastpath method is also available to perform a task by using the -T flag with the diag command. This means that the user does not have to go through most of the introductory menus just to get to a particular task. Instead the user is presented with a list of resources available that support the task specified.

The current fastpath tasks are:

format	Format Media
certify	Certify Media
download	Download Microcode
disp_mcode	Display Microcode Level
chkspares	Spare Sector Availability
identify	PCI RAID Physical Disk Identify
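
A script that launches fastpath tasks might validate the task name before calling diag. The following Python sketch assumes only the -T flag and the task names listed above; the helper name fastpath_command is hypothetical, and the sketch builds the command without running it.

```python
import shlex

# Fastpath task names and descriptions, copied from the list above.
FASTPATH_TASKS = {
    "format": "Format Media",
    "certify": "Certify Media",
    "download": "Download Microcode",
    "disp_mcode": "Display Microcode Level",
    "chkspares": "Spare Sector Availability",
    "identify": "PCI RAID Physical Disk Identify",
}


def fastpath_command(task):
    """Return the diag argument list for a fastpath task, or raise ValueError."""
    if task not in FASTPATH_TASKS:
        raise ValueError(f"unknown fastpath task: {task!r}")
    return ["diag", "-T", task]


cmd = fastpath_command("certify")
print(shlex.join(cmd))
```

Returning an argument list rather than a single string lets the caller pass it to a process launcher without shell quoting problems.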

Fastpath with Known Resource

Each of these tasks can also be invoked directly from the command line specifying the resource and other task unique flags. This implies that the user already knows the resource to perform the task operation on. See publications Diagnostic Information for Micro Channel Bus Systems or Diagnostic Information for Multiple Bus Systems for more specific information on the tasks and flags.

Task List

The following is a list of all known supported tasks on the latest level of diagnostics. Tasks have been separated into one of six groups.

Add or Delete Drawer Configuration

Attention: This diagnostic task has been removed in AIX 5.2. The information has been retained for reference only.

Note

Not applicable to RSPC or CHRP systems.

This task invokes SMIT to provide the following options:

List all Drawers
Add a Drawer
Remove a Drawer

The supported drawer types are:

Media SCSI Device Drawer
DASD SCSI DASD Drawer

Add Resource to Resource List

Use this task to add resources back to the resource list.

Note

Only resources that were previously detected by the diagnostics and deleted from the Diagnostic Test List are listed. If no resources are available to be added, then none are listed.

Shell Prompt

Note

Online Service Mode only.

This Service Aid allows access to the command line. To use this Service Aid the user must know the root password (when a root password has been established).

Do not use this task to install code, or change the configuration of the system. It is intended to be used to look at files, configuration, data, etc. Changing the system configuration, or installing code may produce problems after exiting back to the Diagnostic Controller.

Analyze Adapter Internal Log (Device Specific)

The PCI RAID adapter has an internal log that logs information about the adapter and the disk drives attached to the adapter. Whenever data is logged in the internal log, the device driver copies the entries to the system error log and clears the internal log.

The Analyze Adapter Internal Log Service Aid analyzes these entries in the system error log. The Service Aid displays the errors and the associated service actions. Entries that do not require any service actions are ignored.

Backup and Restore Media

This Service Aid allows verification of backup media and devices. It presents a menu of tape and diskette devices available for testing and prompts for selection of the desired device. It then presents a menu of available backup formats and prompts for selection of the desired format. The supported formats are tar, backup, and cpio. After the device and format are selected, the Service Aid backs up a known file to the selected device, restores that file to /tmp, and compares the original file to the restored file. The restored file is also left in /tmp to allow for visual comparison. All errors are reported.
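
The back up, restore, and compare cycle can be sketched in Python for the tar format. This is not the Service Aid's implementation: an ordinary archive file stands in for the tape or diskette device, a temporary directory stands in for /tmp, and the function name is hypothetical.

```python
import filecmp
import tarfile
import tempfile
from pathlib import Path


def verify_tar_round_trip(workdir):
    """Back up a known file with tar, restore it, and compare the two copies."""
    workdir = Path(workdir)
    original = workdir / "known_file"
    original.write_bytes(bytes(range(256)) * 16)  # known test pattern

    archive = workdir / "backup.tar"  # stands in for the backup device
    with tarfile.open(archive, "w") as tar:
        tar.add(original, arcname=original.name)

    restore_dir = workdir / "restored"
    restore_dir.mkdir()
    with tarfile.open(archive, "r") as tar:
        tar.extract(original.name, path=restore_dir)

    # shallow=False forces a byte-by-byte comparison of the file contents.
    return filecmp.cmp(original, restore_dir / original.name, shallow=False)


with tempfile.TemporaryDirectory() as tmp:
    ok = verify_tar_round_trip(tmp)
print("backup verified" if ok else "backup mismatch")
```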

Certify Media

This task allows the selection of diskettes or hardfiles to be certified. Hardfiles can be connected either to a SCSI adapter (non-RAID) or to a PCI SCSI RAID adapter. The usage and certify criteria for a hardfile connected to a non-RAID SCSI adapter are different from those for a hardfile connected to a PCI SCSI RAID adapter.

Note

The certify function for devices attached to a PCI SCSI RAID adapter is supported for certain PCI SCSI RAID adapters only.

This task may be run directly from the command line. The following usage statement describes the syntax of the fastpath command:

Usage:
diag -T "certify"

Change Hardware Vital Product Data

Use this Service Aid to display the Display/Alter VPD Selection Menu. The menu lists all resources installed on the system. When a resource is selected, a menu displays all the VPD that are recognized by the operating system for that resource.

Note

The user can alter the VPD for a specific resource only if that VPD is not machine readable.

Configure Dials and LPFKeys

This Service Aid provides a tool for configuring Dials and LPFKs on, and removing them from, the asynchronous serial ports.

Since version 4.1.3 a tty must be defined on the async port before the Dials and LPFKs can be configured on the port. Before version 4.2 the Dials and LPFKs could only be configured on the standard serial ports. At version 4.2 the Dials and LPFKs can be configured on any async port.

This selection invokes the SMIT utility to allow Dials and LPFKs configuration. A tty must be in the available state on the async port before the Dials and LPFKs can be configured on the port. The task allows an async adapter to be configured, then a tty port defined on the adapter, and then Dials and LPFKs can be defined on the port.

Configure ISA Adapter

Attention: This diagnostic task has been removed in AIX 5.2. The information has been retained for reference only.

This task invokes SMIT to allow the identification and configuration of ISA adapters on systems that have an ISA bus and adapters.

Diagnostic support for ISA adapters not shown in the list may be supported from a Supplemental Diskette. ISA adapter support can be added from a Supplemental Diskette with the Process Supplemental Media task.

Whenever an ISA adapter is installed, this Service Aid must be run and the adapter configured before the adapter can be tested. This Service Aid must also be run (and the adapter removed) whenever an ISA adapter is physically removed from the system.

If diagnostics are run on an ISA adapter that has been removed from the system, the diagnostics fail.

ISA adapters cannot be detected by the system.

Note

When using this Service Aid choose the option that places the adapter in the "Defined State". Do not select the option that places the device in the "Available State".

Configure Reboot Policy (CHRP)

This Service Aid controls how the system tries to recover from a system crash.

Use this Service Aid to display and change the following settings for the Reboot Policy.

Notes:

Runs on CHRP system units only.
Because of system capability, some of the following settings may not be displayed by this Service Aid.

Maximum Number of Reboot Attempts
Enter a number that is 0 or greater.
Note

A value of 0 means 'do not attempt to reboot' a crashed system.

This number is the maximum number of consecutive attempts to reboot the system. The term "reboot", in the context of this Service Aid, is used to describe bringing system hardware back up from scratch, for example from a system reset or power on.

When the reboot process completes successfully, the reboot attempts count is reset to 0, and a "restart" begins. The term "restart", in the context of this Service Aid, is used to describe the operating system activation process. Restart always follows a successful reboot.

When a restart fails, and a restart policy is enabled, the system attempts to reboot for the maximum number of attempts.
Use the O/S Defined Restart Policy (1=Yes, 0=No)
When 'Use the O/S Defined Restart Policy' is set to Yes, the system attempts to reboot from a crash if the operating system has an enabled Defined Restart or Reboot Policy.

When 'Use the O/S Defined Restart Policy' is set to No, or the operating system restart policy is undefined, then the restart policy is determined by the 'Supplemental Restart Policy'.
Enable Supplemental Restart Policy (1=Yes, 0=No)
The 'Supplemental Restart Policy', if enabled, is used when the O/S Defined Restart Policy is undefined, or is set to False.

When surveillance detects operating system inactivity during restart, an enabled 'Supplemental Restart Policy' causes a system reset and the reboot process begins.
Call-Out Before Restart (on/off)
When enabled, Call-Out Before Restart allows the system to call out (on a serial port that is enabled for call out) when an operating system restart is initiated. Such calls can be valuable if the number of these events becomes excessive, thus signaling bigger problems.
Enable Unattended Start Mode (1=Yes, 0=No)
When enabled, 'Unattended Start Mode' allows the system to recover from the loss of AC power.

If the system was powered-on when the AC loss occurred, the system reboots when power is restored. If the system was powered-off when the AC loss occurred, the system remains off when power is restored.

This Service Aid may be accessed directly from the command line, by entering:

/usr/lpp/diagnostics/bin/uspchrp -b
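
The way the Reboot Policy settings interact can be sketched as a small decision function. This Python sketch is one reading of the descriptions above, not firmware logic: the class and function names are hypothetical, and it assumes that the Supplemental Restart Policy applies whenever the O/S defined policy is not used or is undefined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RebootPolicy:
    max_reboot_attempts: int           # 0 means never attempt a reboot
    use_os_defined_policy: bool        # 'Use the O/S Defined Restart Policy'
    supplemental_policy_enabled: bool  # 'Enable Supplemental Restart Policy'


def should_reboot(policy: RebootPolicy, attempts_so_far: int,
                  os_policy: Optional[bool]) -> bool:
    """Decide whether another reboot attempt is made after a crash.

    os_policy is the operating system's own restart policy:
    True (enabled), False (disabled), or None (undefined).
    """
    if attempts_so_far >= policy.max_reboot_attempts:
        return False  # attempt budget used up, or attempts disabled with 0
    if policy.use_os_defined_policy and os_policy is not None:
        return os_policy
    # O/S policy not used or undefined: the supplemental policy decides.
    return policy.supplemental_policy_enabled


policy = RebootPolicy(max_reboot_attempts=3,
                      use_os_defined_policy=True,
                      supplemental_policy_enabled=True)
print(should_reboot(policy, attempts_so_far=0, os_policy=True))
print(should_reboot(policy, attempts_so_far=3, os_policy=True))
print(should_reboot(policy, attempts_so_far=0, os_policy=None))
```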

Configure Remote Maintenance Policy (CHRP)

The Remote Maintenance Policy includes modem configurations and phone numbers to use for remote maintenance support.

Use this Service Aid to display and change the following settings for the Remote Maintenance Policy.

Notes:

Runs on CHRP system units only.
Because of system capability, some of the following settings may not be displayed by this Service Aid.

Configuration File for Modem on S1
Configuration File for Modem on S2
Enter the name of a modem configuration file to load on either serial port 1 (S1) or serial port 2 (S2). The modem configuration files are located in the directory /usr/share/modems. If a modem file is already loaded, it is shown by Modem file currently loaded.
Modem file currently loaded on S1
Modem file currently loaded on S2
This is the name of the file that is currently loaded on serial port 1 or serial port 2.
Note

These settings are only shown when a modem file is loaded for a serial port.
Call In Authorized on S1 (on/off)
Call In Authorized on S2 (on/off)
Call In allows the Service Processor to receive a call from a remote terminal.
Call Out Authorized on S1 (on/off)
Call Out Authorized on S2 (on/off)
Call Out allows the Service Processor to place calls for maintenance.
S1 Line Speed
S2 Line Speed
A list of line speeds is available by using 'List' on the screen.
Service Center Phone Number
This is the number of the service center computer. The service center usually includes a computer that takes calls from systems with call-out capability. This computer is referred to as "the catcher". The catcher expects messages in a specific format to which the Service Processor conforms. For more information about the format and catcher computers, refer to the README file in the /usr/samples/syscatch directory. Contact the service provider for the correct telephone number to enter here.
Customer Administration Center Phone Number
This is the number of the System Administration Center computer (catcher) that receives problem calls from systems. Contact the system administrator for the correct telephone number to enter here.
Digital Pager Phone Number In Event of Emergency
This is the number for a pager carried by someone who responds to problem calls from your system.
Customer Voice Phone Number
This is the number for a telephone near the system, or answered by someone responsible for the system. This is the telephone number left on the pager for callback.
Customer System Phone Number
This is the number to which your system's modem is connected. The service or administration center representatives need this number to make direct contact with your system for problem investigation. This is also referred to as the Call In phone number.
Customer Account Number
This number could be used by a service provider for record keeping and billing.
Call Out Policy Numbers to call if failure
This is set to either 'first' or 'all'. If the call out policy is set to 'first', call out stops at the first successful call to one of the following numbers in the order listed:
1. Service Center
2. Customer Admin Center
3. Pager
If Call Out Policy is set to 'all', call out attempts to call all of the following numbers in the order listed:
1. Service Center
2. Customer Admin Center
3. Pager
Customer RETAIN Login ID
Customer RETAIN Login Password
These settings apply to the RETAIN service function.
Remote Timeout, in seconds
Remote Latency, in seconds
These settings are functions of the service provider's catcher computer.
Number of Retries While Busy
This is the number of times the system should retry calls that resulted in busy signals.
System Name (System Administrator Aid)
This is the name given to the system and is used when reporting problem messages.
Note

Knowing the system name helps the support team quickly identify the location, configuration, history, etc. of your system.

This Service Aid may be accessed directly from the command line, by entering:

/usr/lpp/diagnostics/bin/uspchrp -m
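
The difference between the 'first' and 'all' values of the Call Out Policy setting can be sketched in Python. The place_call callable, the function name, and the telephone numbers are hypothetical; a real Service Processor dials through a modem, which this sketch only simulates.

```python
# Call order taken from the Call Out Policy description above.
CALL_ORDER = ("Service Center", "Customer Admin Center", "Pager")


def call_out(policy, numbers, place_call):
    """Place calls according to policy; return the destinations attempted.

    place_call(number) is a hypothetical callable that returns True when
    the call succeeds.
    """
    if policy not in ("first", "all"):
        raise ValueError(f"call out policy must be 'first' or 'all': {policy!r}")
    attempted = []
    for destination in CALL_ORDER:
        number = numbers.get(destination)
        if not number:
            continue  # no number configured for this destination
        attempted.append(destination)
        if place_call(number) and policy == "first":
            break  # 'first' stops at the first successful call
    return attempted


# Invented numbers for illustration only.
numbers = {
    "Service Center": "555-0100",
    "Customer Admin Center": "555-0101",
    "Pager": "555-0102",
}


def place_call(number):
    """Simulated line: the Service Center call fails, all others succeed."""
    return number != "555-0100"


first_result = call_out("first", numbers, place_call)
all_result = call_out("all", numbers, place_call)
print(first_result)
print(all_result)
```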

Configure Ring Indicate Power On (RSPC)

Attention: This diagnostic task has been removed in AIX 5.2. The information has been retained for reference only.

This Service Aid allows the user to display and change the NVRAM settings for the Ring Indicate Power On capability of the service processor.

Note

Runs on RSPC system units only.

The settings allow the user to:

Enable/Disable power on from Ring Indicate
Read/Set the number of rings before power on

Configure Ring Indicate Power On Policy (CHRP)

This Service Aid allows the user to power on a system by telephone from a remote location. If the system is powered off, and Ring Indicate Power On is enabled, the system powers on at a predetermined number of rings. If the system is already on, no action is taken. In either case, the telephone call is not answered and the caller receives no feedback that the system has powered on.

Use this Service Aid to display and change the following settings for the Ring Indicate Power On Policy.

Notes:

Runs on CHRP system units only.
Because of system capability, some of the following settings may not be displayed by this Service Aid.

Power On Via Ring Indicate (on/off)
Number of Rings Before Power On

This Service Aid may be accessed directly from the command line, by entering:

/usr/lpp/diagnostics/bin/uspchrp -r

Configure Service Processor (RSPC)

Attention: This diagnostic task has been removed in AIX 5.2. The information has been retained for reference only.

This Service Aid allows you to display and change the NVRAM settings for the service processor.

This Service Aid supports the following functions:

Note

Runs on RSPC system units only.

Surveillance Setup
Modem Configuration
Call In/Call Out Setup
Site Specific Call In/Call Out Setup

Surveillance Setup

This selection allows you to display and change the NVRAM settings for the surveillance capability of the service processor.

The settings allow you to:

Enable/disable surveillance
Set the surveillance time interval, in minutes
Set the surveillance delay, in minutes

The current settings are read from NVRAM and displayed on the screen. Any changes made to the data shown are written to NVRAM.

Modem Configuration

Use this selection when setting the NVRAM for a modem attached to any of the Service Processor's serial ports. The user inputs the file name of a modem configuration file and the serial port number. The formatted modem configuration file is read, converted for NVRAM, and then loaded into NVRAM. Refer to the Service Processor Installation and User's Guide for more information.

Call In/Out Setup

This selection allows the user to display and change the NVRAM settings for the Call In/Call Out capability of the service processor.

The settings allow the user to:

Enable/Disable call in on either serial port.
Enable/Disable call out on either serial port.
Set the line speed on either serial port.

Site Specific Call In/Out Setup

This selection allows you to display and change the NVRAM settings that are site specific for the call in/call out capability of the service processor.

The site specific NVRAM settings allow you to:

Set the phone number for the service center
Set the phone number for the customer administration center
Set the phone number for a digital pager
Set the phone number for the customer system to call in
Set the phone number for the customer voice phone
Set the customer account number
Set the call out policy
Set the customer RETAIN id
Set the customer RETAIN password
Set the remote timeout value
Set the remote latency value
Set the number of retries while busy
Set the system name

The current settings are read from NVRAM and displayed on the screen. Any changes made to the data shown are written to NVRAM.

Configure Surveillance Policy (CHRP)

This Service Aid monitors the system for hang conditions, that is, hardware or software failures that cause operating system inactivity. When enabled, and surveillance detects operating system inactivity, a call is placed to report the failure.

Use this Service Aid to display and change the following settings for the Surveillance Policy.

Notes:

Runs on CHRP system units only.
Because of system capability, some of the following settings may not be displayed by this Service Aid.

Surveillance (on/off)
Surveillance Time Interval
This is the maximum time between heartbeats from the operating system.
Surveillance Time Delay
This is the time to delay between when the operating system is in control and when to begin operating system surveillance.
Changes are to take effect immediately
Set this to Yes if the changes made to the settings in this menu are to take place immediately. Otherwise the changes take place beginning with the next system boot.

This Service Aid may be accessed directly from the command line, by entering:

/usr/lpp/diagnostics/bin/uspchrp -s
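
The relationship between the Surveillance Time Interval and the Surveillance Time Delay can be sketched in Python. This is an assumed model, not the Service Processor's algorithm: the function name is hypothetical, all times share one arbitrary unit, and the interval is measured from the later of the last heartbeat and the end of the delay.

```python
def hang_detected(heartbeat_times, os_control_time, delay, interval, now):
    """Return True if surveillance would report operating system inactivity.

    heartbeat_times: times at which heartbeats were received
    os_control_time: time at which the operating system took control
    delay:           Surveillance Time Delay
    interval:        Surveillance Time Interval (maximum time between heartbeats)
    """
    surveillance_start = os_control_time + delay
    if now < surveillance_start:
        return False  # still inside the start-up delay, nothing is monitored
    seen = [t for t in heartbeat_times if t <= now]
    # Count silence from the last heartbeat, but never from before the
    # point at which surveillance actually began.
    last_activity = max(seen + [surveillance_start])
    return now - last_activity > interval


print(hang_detected([6, 8], os_control_time=0, delay=5, interval=3, now=10))
print(hang_detected([6, 8], os_control_time=0, delay=5, interval=3, now=12))
```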

Create Customized Configuration Diskette

This selection invokes the Diagnostic Package Utility Service Aid, which allows the user to create a Standalone Diagnostic Package Configuration Diskette.

The Standalone Diagnostic Package Configuration Diskette allows the following to be changed when running diagnostics from removable media:

High-Function Terminals 60/77-Hz Refresh Rate
The refresh rate used by the standalone diagnostic package is 60Hz. If the display's refresh rate is 77Hz, then set the refresh rate to 77.
Different async terminal console
A console configuration file that allows a terminal attached to any RS232 or RS422 adapter to be selected as a console device can be created using this Service Aid. The default device is a RS232 tty attached to the first standard serial port (S1).

Delete Resource from Resource List

Use this task to delete resources from the resource list.

Note

Only resources that were previously detected by the diagnostics and have not been deleted from the Diagnostic Test List are listed. If no resources are available to be deleted, then none are listed.

Disk Maintenance (SCSI Disks)

Disk to Disk Copy
Display/Alter Sector

Disk to Disk Copy

This selection allows you to recover data from an old drive when replacing it with a new drive. The Service Aid only supports copying from a drive to another drive of similar size. This Service Aid cannot be used to update to a different size drive. The migratepv command should be used when updating drives. The Service Aid recovers all LVM software reassigned blocks. To prevent corrupted data from being copied to the new drive, the Service Aid aborts if an unrecoverable read error is detected. To help prevent possible problems with the new drive, the Service Aid aborts if the number of bad blocks being reassigned reaches a threshold.
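
The two abort conditions described above can be sketched as a copy loop. This Python sketch is an assumed model rather than the Service Aid's code: read_block, write_block, the CopyAborted exception, and the convention that write_block returns True when a block had to be reassigned are all hypothetical.

```python
class CopyAborted(Exception):
    """Raised when the copy stops to protect the data or the new drive."""


def copy_disk(read_block, write_block, block_count, max_reassigned):
    """Copy block_count blocks; return how many target blocks were reassigned.

    read_block(i) returns the data of block i or raises OSError on an
    unrecoverable read error. write_block(i, data) returns True if the
    target block had to be reassigned.
    """
    reassigned = 0
    for i in range(block_count):
        try:
            data = read_block(i)
        except OSError as exc:
            # Do not copy corrupted data to the new drive.
            raise CopyAborted(f"unrecoverable read error at block {i}") from exc
        if write_block(i, data):
            reassigned += 1
            if reassigned >= max_reassigned:
                raise CopyAborted(f"{reassigned} bad blocks reassigned on new drive")
    return reassigned


# Simulated drives: a list as the old drive, a dict as the new drive.
source = [b"a", b"b", b"c", b"d"]
target = {}
bad_target_blocks = {1}


def read_block(i):
    return source[i]


def write_block(i, data):
    target[i] = data
    return i in bad_target_blocks


reassigned = copy_disk(read_block, write_block, len(source), max_reassigned=3)
print(reassigned, target)
```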

Note

Use the migratepv command when copying the contents to other disk drive types. This command also works when copying SCSI disk drives or when copying to a different size SCSI disk drive. Refer to AIX 5L Version 5.2 System Management Guide: Operating System and Devices for a procedure on Migrating the Contents of a Physical Volume.

The procedure for using this Service Aid requires that both the old and new disks be installed in or attached to the system with unique SCSI addresses. This requires that the new disk drive SCSI address must be set to an address that is not currently in use and the drive be installed in an empty location. If there are no empty locations, then one of the other drives must be removed. Once the copy is complete, only one drive may remain installed. Either remove the target drive to return to the original configuration, or perform the following procedure to complete the replacement of the old drive with the new drive.

Remove both drives.
Set the SCSI address of the new drive to the SCSI address of the old drive.
Install the new drive in the old drive's location.
Install any other drives that were removed into their original location.

To prevent problems that may occur when running this Service Aid from disk, it is suggested that this Service Aid be run from the diagnostics that are loaded from removable media when possible.

Display/Alter Sector

This selection allows the user to display and alter information on a disk sector. Care must be used when using this Service Aid because inappropriate modification to some disk sectors may result in total loss of all data on the disk. Sectors are addressed by their decimal sector number. Data is displayed both in hex and in ASCII. To prevent corrupted data from being incorrectly corrected, the Service Aid does not display information that cannot be read correctly.
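
The display half of this function, addressing a sector by decimal number and showing it in hex and in ASCII, can be sketched in Python. The 512-byte sector size, the function names, and the in-memory device are assumptions for illustration; the sketch does not alter any data.

```python
import io

SECTOR_SIZE = 512  # assumed sector size in bytes


def read_sector(device, sector_number):
    """Read one sector by decimal sector number; fail on a short read."""
    device.seek(sector_number * SECTOR_SIZE)
    data = device.read(SECTOR_SIZE)
    if len(data) != SECTOR_SIZE:
        # Mirror the rule above: unreadable data is not displayed.
        raise OSError(f"sector {sector_number} could not be read completely")
    return data


def format_sector(data, width=16):
    """Return display lines with offset, hex bytes, and printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:04x}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return lines


# In-memory stand-in for a disk: sector 0 is empty, sector 1 holds text.
device = io.BytesIO(bytes(SECTOR_SIZE) + b"AIX disk sector".ljust(SECTOR_SIZE, b"\0"))
lines = format_sector(read_sector(device, 1))
print(lines[0])
```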

Display Checkstop Analysis Results

Attention: This diagnostic task has been removed in AIX 5.2. The information has been retained for reference only.

This Service Aid analyzes checkstop files and displays the results. During a system reboot, following a checkstop, a data file is written to /usr/lib/ras that contains the state of the system at the time of the checkstop. The files have names that begin with checkstop and end with either .A or .B.

The analysis of the file(s) produces a description of the problem and provides an action plan with repair instructions or recommendations. Following the action plans, a detailed dump of the data that was saved for the checkstop is displayed.

The following options are provided:

Analyze Checkstop Files Created Within the Last 7 Days
Analyze and display the results of any checkstop file that was created in the last 7 days. These are the same files that the system planar diagnostics analyzed, but this option provides more detail.
Analyze All of the Checkstop Files
Analyze and display the results of all of the checkstop files.

For either option, carefully read the results of the analysis and perform any recommended actions.
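
Selecting the files for each of the two options can be sketched in Python using the naming rule given above (names begin with checkstop and end with .A or .B). The function names and the example file names are hypothetical, and the file modification time is used as a stand-in for the creation time.

```python
import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def is_checkstop_file(name):
    """Names begin with 'checkstop' and end with '.A' or '.B'."""
    return name.startswith("checkstop") and name.endswith((".A", ".B"))


def select_checkstop_files(directory, max_age_days=None, now=None):
    """Return sorted checkstop file names, optionally limited by age in days."""
    now = time.time() if now is None else now
    selected = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or not is_checkstop_file(path.name):
            continue
        if max_age_days is not None:
            # Modification time approximates creation time in this sketch.
            if now - path.stat().st_mtime > max_age_days * SECONDS_PER_DAY:
                continue
        selected.append(path.name)
    return selected


# A temporary directory stands in for /usr/lib/ras.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("checkstop.A", "checkstop.B", "other.log"):
        Path(tmp, name).write_bytes(b"")
    ten_days_ago = time.time() - 10 * SECONDS_PER_DAY
    os.utime(Path(tmp, "checkstop.B"), (ten_days_ago, ten_days_ago))
    recent = select_checkstop_files(tmp, max_age_days=7)
    everything = select_checkstop_files(tmp)
print(recent)
print(everything)
```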

Display Configuration and Resource List

This Service Aid displays the item header only for all installed resources. Use this Service Aid when there is no need to see the VPD. (No VPD is displayed.)

Display Firmware Device Node Information (CHRP)

This task displays the firmware device node information that appears on CHRP platforms. The format of the output data does not necessarily have to be the same between different levels of the operating system. It is intended to be used to gather more information about individual or particular devices on the system.

Note

Runs on CHRP system units only.

Display Hardware Error Report

This Service Aid provides a tool for viewing the hardware error log. It uses the errpt command.

The Display Error Summary and Display Error Detail selections provide the same type of report as the errpt command. The Display Error Analysis Summary and Display Error Analysis Detail selections provide additional analysis.

Display Hardware Vital Product Data

This Service Aid displays all installed resources along with any VPD that is recognized by the operating system for those resources. Use this Service Aid when you want to look at the VPD for a specific resource.

Display Machine Check Error Log

When a machine check occurs, information is collected and logged in an NVRAM error log before the system unit shuts down. This information is logged in the error log and cleared from NVRAM when the system is rebooted from either hard disk or LAN. The information is not cleared when booting from Standalone Diagnostics. When booting from Standalone Diagnostics, this Service Aid can take the logged information and turn it into a readable format that can be used to isolate the problem. When booting from the hard disk or LAN, the information can be viewed from the error log using the Hardware Error Report Service Aid. In either case the information is analyzed when running the sysplanar0 diagnostics in Problem Determination Mode.

Note

The Machine Check Error Log Service Aid is available only on Standalone Diagnostics.

Display Microcode Level

This selection provides a way to display the microcode level of a device or adapter. Once invoked, it presents a list of resources that support this function. Once a resource is selected, a specific application that supports that function on the resource is invoked. See the description on PDiagAtt for the stanza that is needed to achieve this.

This task may be run directly from the command line. The following usage statement describes the syntax of the fastpath command:

Usage:

diag -T "disp_mcode"

Display or Change Bootlist

This Service Aid allows the bootlist to be displayed, altered, or erased.

The system attempts to perform an IPL from the first device in the list. If the device is not a valid IPL device or if the IPL fails, the system proceeds in turn to the other devices in the list to attempt an IPL.
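
The fallback behavior can be sketched in a few lines of Python. The try_ipl callable and the device names are hypothetical; the sketch only models the order in which devices are tried.

```python
def first_bootable(bootlist, try_ipl):
    """Return the first device from which the IPL succeeds, or None.

    try_ipl(device) is a hypothetical callable that returns False when the
    device is not a valid IPL device or when the IPL fails.
    """
    for device in bootlist:
        if try_ipl(device):
            return device
    return None


# Illustrative bootlist in which only the last device can be booted.
bootlist = ["fd0", "cd0", "hdisk0"]
tried = []


def try_ipl(device):
    tried.append(device)
    return device == "hdisk0"


booted_from = first_bootable(bootlist, try_ipl)
print(booted_from, tried)
```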

Display or Change BUMP Configuration

Attention: This diagnostic task has been removed in AIX 5.2. The information has been retained for reference only.

This Service Aid is unique to the POWER-based SMP system units and provides the following functions:

Display or Change Remote Support Phone Number
This function allows the remote support phone number to be displayed or altered.
Display or Change Diagnostics Modes
This function displays a dialog screen that lists the states of all the BUMP (Bringup Micro-Processor) Diagnostic Flags. The states can be changed via the dialog screen.
Save or Restore Diagnostics Modes and Remote Support Phone Number
This function allows the diagnostics modes and remote support phone number to be saved or restored. The location of the save area is to be defined.
Flash EPROM Download
This function updates the Flash EPROM.

Display or Change Diagnostic Run Time Options

The Display or Change Diagnostic Run Time Options task allows the diagnostic run time options to be set.

The run time options are:

Display Diagnostic Mode Selection Menus
This option allows the user to turn on or turn off displaying the DIAGNOSTIC MODE SELECTION MENU. The default value is on.
Include Advanced Diagnostics
This option allows the user to turn on or off including the Advanced Diagnostics. The default value is off.
Run Tests Multiple Times
This option allows the user to turn on or off running the diagnostic in Loop Mode. The default value is off.
Note

This option is only displayed when running Online Diagnostics in Service Mode.
Include Error Log Analysis
This option allows the user to turn on or off including the Error Log Analysis (ELA). The default value is off.
Number of days used to search error log
This option allows the user to select the number of days to search the error log for errors when running Error Log Analysis. The default value is 7 days, but can be changed from 1 to 60 days.
Display Progress Indicators
This option allows the user to turn on or off the progress indicators shown when running Diagnostic Applications. The progress indicators are a popup box at the bottom of the screen indicating the test being run. The default value is on.
Diagnostic Event Logging
This option allows the user to turn on or off logging information to the Diagnostics Event Log. The default value is on.
Diagnostic Event Log file size
This option allows the user to select the maximum size of the Diagnostic Event Log. The default value is 100K, but can be changed from 100K to 1000K.
Save changes to the database
This option allows the user to save any changes made to the run time options. Without saving the changes, any changes made are only applicable to that session of diagnostics. The default value is no.
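
The defaults and ranges listed above can be collected in one structure for checking. The following Python sketch uses hypothetical field names; only the default values and the permitted ranges come from the option descriptions.

```python
from dataclasses import dataclass


@dataclass
class RunTimeOptions:
    """Run time options with the default values given above (names assumed)."""
    display_mode_selection_menus: bool = True
    include_advanced_diagnostics: bool = False
    run_tests_multiple_times: bool = False
    include_error_log_analysis: bool = False
    error_log_search_days: int = 7        # allowed range: 1 to 60 days
    display_progress_indicators: bool = True
    diagnostic_event_logging: bool = True
    event_log_size_kb: int = 100          # allowed range: 100K to 1000K
    save_changes: bool = False

    def validate(self):
        """Return a list of problems; an empty list means the values are valid."""
        problems = []
        if not 1 <= self.error_log_search_days <= 60:
            problems.append("error_log_search_days must be between 1 and 60")
        if not 100 <= self.event_log_size_kb <= 1000:
            problems.append("event_log_size_kb must be between 100 and 1000")
        return problems


defaults = RunTimeOptions()
bad = RunTimeOptions(error_log_search_days=90, event_log_size_kb=50)
print(defaults.validate())
print(bad.validate())
```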

Display or Change Electronic Mode Switch

Attention: This diagnostic task has been removed in AIX 5.2. The information has been retained for reference only.

This Service Aid is unique to the POWER-based SMP system units and displays the states of the Physical and Electronic Keys. It also allows the electronic keys to be set.

Display or Change Multiprocessor Configuration (Multiprocessor Service)

Attention: This diagnostic task has been removed in AIX 5.2. The information has been retained for reference only.

This Service Aid is unique to the POWER-based SMP system units and provides the following functions:

Display or Change Processor States
This function displays or changes the state of available processors.
Bind Process
This function provides a tool for binding a process and all its threads to a specified processor.

Display Previous Diagnostic Results

This service aid allows a service representative to display results from a previous diagnostic session. When the Display Previous Results option is selected, the user will be able to view up to 25 no trouble found (NTF) and service request number (SRN) results.

This service aid also displays diagnostic log information. The diagnostic log can be displayed in a short version or a long version. The diagnostic log contains information about events logged by a diagnostic session.

This service aid displays the information in reverse chronological order. If more information is available than what can be displayed on the screen, the Page Down and Page Up keys can be used to scroll through the information.

Note

This Service Aid is available only when you load the diagnostics from a disk drive or from a network.

This information is not from the error log maintained by the operating system. This information is stored in the /var/adm/ras directory.

Display Resource Attributes

This task displays the Customized Device Attributes associated with a selected resource. This task is similar to running the lsattr -E -l resource command.

Display Service Hints

This Service Aid reads and displays the information in the CEREADME file from the diagnostics media. This file contains information that is not in the publications for this version of the diagnostics. It also contains information about using this particular version of diagnostics.

This Service Aid presents a menu if multiple CEREADME files are present in the /usr/lpp/diagnostics/ directory. This allows other non-related CEREADME files to be displayed containing information about unrelated functions.

Use the arrow keys to scroll through the information in the file.

Display Software Product Data

This task invokes SMIT to display information about the installed software and provides the following functions:

List Installed Software
List Applied but Not Committed Software Updates
Show Software Installation History
Show Fix (APAR) Installation Status
List Fileset Requisites
List Fileset Dependents
List Files Included in a Fileset
List File Owner by Fileset

Display System Environmental Sensors (CHRP)

This Service Aid displays the environmental sensors implemented on a CHRP system. The information displayed is the sensor name, physical location code, literal value of the sensor status, and the literal value of the sensor reading.

Note

Runs on CHRP system units only.

The sensor status can be any one of the following:

Normal
The sensor reading is within the normal operating range.
Critical High
The sensor reading indicates a serious problem with the device. Run diagnostics on sysplanar0 to determine what repair action is needed.
Critical Low
The sensor reading indicates a serious problem with the device. Run diagnostics on sysplanar0 to determine what repair action is needed.
Warning High
The sensor reading indicates a problem with the device. This could become a critical problem if action is not taken. Run diagnostics on sysplanar0 to determine what repair action is needed.
Warning Low
The sensor reading indicates a problem with the device. This could become a critical problem if action is not taken. Run diagnostics on sysplanar0 to determine what repair action is needed.
Hardware Error
The sensor could not be read because of a hardware error. Run diagnostics on sysplanar0 in problem determination mode to determine what repair action is needed.
Hardware Busy
The system has repeatedly returned a busy indication, and a reading is not available. Try the Service Aid again. If the problem continues, run diagnostics on sysplanar0 in problem determination mode to determine what repair action is needed.

This Service Aid can also be run as a command. The command can list the sensors and their values in a text format, list them in a numerical format, or query a specific sensor for either its status or its value.

The command can be run by entering one of the following:

/usr/lpp/diagnostics/bin/uesensor -l | -a
/usr/lpp/diagnostics/bin/uesensor -t token -i index [-v]

Flags

-l	List the sensors and their values in a text format.
-a	List the sensors and their values in a numerical format. For each sensor, the numerical values are displayed in the following order: <token> <index> <status> <measured value> <location code>
-t token	Specifies the sensor token to query.
-i index	Specifies the sensor index to query.
-v	Returns the sensor measured value. By default, the sensor status is returned.

Examples

Display a list of the environmental sensors:

/usr/lpp/diagnostics/bin/uesensor -l

Sensor Token = Fan Speed
Status = Normal
Value = 2436 RPM
Location Code = F1

Sensor Token = Power Supply
Status = Normal
Value = Present and operational
Location Code = V1

Sensor Token = Power Supply
*Status = Critical low
Value = Present and not operational
Location Code = V2

Display a list of the environmental sensors in a numerical list:

/usr/lpp/diagnostics/bin/uesensor -a

3 0 11 87 P1
9001 0 11 2345 F1
9004 0 11 2 V1
9004 1 9 2 V2

Return the status of sensor 9004, index 1:

/usr/lpp/diagnostics/bin/uesensor -t 9004 -i 1

9

Return the value of sensor 9004, index 1:

/usr/lpp/diagnostics/bin/uesensor -t 9004 -i 1 -v

2
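
The numerical format is convenient for scripts. The following is a minimal sketch, not part of the diagnostics package, that filters uesensor -a output for sensors that are not in the normal state. It assumes that a status of 11 means Normal, which is inferred only from the examples above (the rows with status 11 correspond to the sensors listed as Normal). The function name report_abnormal is hypothetical.

```shell
#!/bin/sh
# Assumption: status 11 means Normal, as the examples above suggest.
NORMAL_STATUS=11

# Reads lines of the form
#   <token> <index> <status> <measured value> <location code>
# on standard input and prints one line for every sensor whose
# status differs from NORMAL_STATUS.
report_abnormal() {
    awk -v ok="$NORMAL_STATUS" '
        NF >= 5 && $3 != ok {
            printf "token=%s index=%s status=%s value=%s location=%s\n", $1, $2, $3, $4, $5
        }'
}

# Typical use on a CHRP system unit:
#   /usr/lpp/diagnostics/bin/uesensor -a | report_abnormal
```

With the sample output shown above, only the sensor with token 9004 and index 1 would be reported.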

Display Test Patterns

This Service Aid provides a means of adjusting system display units by providing displayable test patterns. Through a series of menus the user selects the display type and test pattern. After the selections are made the test pattern is displayed.

Download Microcode

This selection provides a way to update microcode on a device or adapter. Once invoked, a list of resources that support this function is presented for selection. Once a resource is selected, the specific application that supports the function on that resource is invoked. See the description of PDiagAtt for the stanza that is needed to achieve this.

This task may be run directly from the command line. The following usage statement describes the syntax of the fastpath command:

Usage:

diag -T "download"

ESCON Bit Error Rate

Attention: This diagnostic task has been removed in AIX 5.2. The information has been retained for reference only.

This Service Aid is used to check the bit error rate for an ESCON adapter to assure that the link to the host system is functioning properly. To run the ESCON Bit Error Rate Service Aid, the adapter must be connected, configured, and on-line. If the adapter is not configured properly, the Service Aid is not able to check the bit error rate.

Fibre Channel RAID (Device Specific)

The Fibre Channel RAID Service Aids contain the following functions:

Certify LUN
This selection reads and checks each block of data in the LUN. If excessive errors are encountered the user is notified.

This task may be run directly from the command line. The following usage statement describes the syntax of the fastpath command:

Usage:
```
diag -T "certify"
```
Certify Spare Physical Disk
This selection allows the user to certify (check the integrity of the data on) drives designated as spares.

This task may be run directly from the command line. The following usage statement describes the syntax of the fastpath command:

Usage:
diag -T "certify"
Format Physical Disk
This selection is used to format a selected disk drive.

This task may be run directly from the command line. The following usage statement describes the syntax of the fastpath command:

Usage:
diag -T "format"
Array Controller Microcode Download
This selection allows the microcode on the Fibre Channel RAID controller to be updated when required.

This task may be run directly from the command line. The following usage statement describes the syntax of the fastpath command:

Usage:
diag -T "download"
Physical Disk Microcode Download
This selection is used to update the microcode on any of the disk drives in the array.

This task may be run directly from the command line. The following usage statement describes the syntax of the fastpath command:

Usage:
diag -T "download"
Update EEPROM
This selection is used to update the contents of the EEPROM on a selected controller.
Replace Controller
Use this selection when it is necessary to replace a controller in the array.

Flash SK-NET FDDI Firmware

This task allows the Flash firmware on the SysKonnect SK-NET FDDI adapter to be updated.

Format Media

The Format Media task supports the selection of diskettes, SCSI hardfiles, or SCSI optical media.

This task may be run directly from the command line. The following usage statement describes the syntax of the fastpath command:

Usage:
diag -T "format"

Generic Microcode Download

This Service Aid provides a means of executing a "generic" script from a diskette. The intended purpose for this "generic" script is to load microcode to a supported resource. This script is responsible for executing whatever program is required in order to download the microcode onto the adapter or device.

This Service Aid is supported in both concurrent and standalone modes from disk, LAN, or removable media.

On entry, the Service Aid displays information about what it does. It then asks for a "Genucode" diskette to be inserted into the diskette drive. The diskette must be in tar format. The Service Aid restores the script file, "genucode", to the /tmp directory and then executes it. The script must then extract any other files it needs from the diskette and exec whatever program is necessary to perform its function. On completion, a status code is returned, and the user is returned to the Service Aid.

The genucode script should have a #!/usr/bin/ksh line at the beginning of the file. The script should return a status of 0 if the program was successful; otherwise, it should return a non-zero status.
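
The requirements above can be sketched as a script skeleton. This is only an illustration under stated assumptions: the diskette device /dev/rfd0, the work directory, and the names download_tool and microcode.bin are placeholders, not files shipped with the diagnostics or with any particular adapter.

```shell
#!/usr/bin/ksh
# Hypothetical genucode skeleton. The diskette device, the work directory,
# and the names download_tool and microcode.bin are placeholders chosen
# for illustration; substitute whatever the device vendor supplies.

DISKETTE=${DISKETTE:-/dev/rfd0}
WORKDIR=${WORKDIR:-/tmp/genucode.work}

# Pull the remaining files off the tar-format diskette into WORKDIR.
restore_files() {
    mkdir -p "$WORKDIR" && cd "$WORKDIR" && tar -xf "$DISKETTE"
}

# Run the program that performs the actual microcode download.
run_download() {
    "$WORKDIR/download_tool" "$WORKDIR/microcode.bin"
}

# Return 0 only if every step succeeded, non-zero otherwise.
main() {
    restore_files || return 1
    run_download || return 1
    return 0
}

# Run only when the file is executed under the name "genucode".
case ${0##*/} in
    genucode) main; exit $? ;;
esac
```

Keeping the steps in separate functions makes it easy to see where a non-zero status originates before it is handed back to the Service Aid.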

Hot Plug Task

This Service Aid allows the user to choose a SCSI device or location from a menu and to identify a device located in a 7027 system unit.

The Service Aid also does the following:

Generates a menu displaying all SCSI devices.
Lists the device and all of its sibling devices.
Lists all SCSI adapters and their ports.
Lists all SCSI devices on a port.

Local Area Network Analyzer

This selection is used to exercise the LAN communications adapters (Token-Ring, Ethernet, and Fiber Distributed Data Interface (FDDI)). The following services are available:

Connectivity testing between two network stations
Data is transferred between the two stations. This requires the user to input the Internet Addresses of both stations.
Monitoring ring (Token-Ring only)
The ring is monitored for a period of time. Soft and hard errors are analyzed.

PCI RAID Physical Disk Identify

This selection identifies physical disks connected to a PCI SCSI-2 F/W RAID adapter.

This task may be run directly from the command line. The following usage statement describes the syntax of the fastpath command:

Usage:

diag -T "identify"

Periodic Diagnostics

This selection provides a tool for configuring periodic diagnostics and automatic error log analysis. A hardware resource can be chosen to be tested once a day, at a user-specified time. If the resource cannot be tested because it is busy, error log analysis is performed instead. Hardware errors logged against a resource can also be monitored by enabling Automatic Error Log Analysis. This allows error log analysis to be performed every time a hardware error is put into the error log. If a problem is detected, a message is posted to the system console and a mail message is sent to the users belonging to the system group with information about the failure, such as the Service Request Number.

The Service Aid provides the following functions:

Add or delete a resource to the periodic test list
Modify the time to test a resource
Display the periodic test list
Modify the error notification mailing list
Disable or Enable Automatic Error Log Analysis

Process Supplemental Media

Diagnostic Supplemental Media contains all the necessary diagnostic programs and files required to test a particular resource. The supplemental media is normally released and shipped with the resource, as indicated on the diskette label. Diagnostic Supplemental Media must be used when the device support has not been incorporated into the latest Diagnostic CDROM.

This task processes the Diagnostic Supplemental Media. Insert the Supplemental Media when prompted, then press Enter. After processing has occurred, go to the Resource Selection list to find the resource to test.

Notes:

This task is supported in Standalone Diagnostics only.

Always process and test one resource at a time.

Do not process multiple supplementals at a time.

More information on Diagnostic Supplemental Media can be found at the following link: Diagnostic Supplemental Media.

Run Diagnostics

The Run Diagnostics task invokes the Resource Selection List menu. When the commit key is pressed, Diagnostics are run on all selected resources.

The procedures for running the diagnostics depend on the state of the Diagnostics Run Time Options. See the Display or Change Diagnostic Run Time Options section.

Run Error Log Analysis

The Run Error Log Analysis task invokes the Resource Selection List menu. When the commit key is pressed, Error Log Analysis is run on all selected resources.

Save or Restore Hardware Management Policies (CHRP)

Use this Service Aid to save or restore the settings from Ring Indicate Power On Policy, Surveillance Policy, Remote Maintenance Policy and Reboot Policy.

Note

Runs on CHRP system units only.

Save Hardware Management Policies
This selection writes all of the settings for the hardware management policies to the file:
```
/etc/lpp/diagnostics/data/hmpolicies
```
Restore Hardware Management Policies
This selection restores all of the settings for the hardware management policies from the contents of the file:
```
/etc/lpp/diagnostics/data/hmpolicies
```

This Service Aid may be accessed directly from the command line, by entering:

/usr/lpp/diagnostics/bin/uspchrp -a

Save or Restore Service Processor Configuration (RSPC)

Attention: This diagnostic task has been removed in AIX 5.2. The information has been retained for reference only.

Use this Service Aid to save or restore the Service Processor Configuration to or from a file. The Service Processor Configuration includes the Ring Indicate Power On Configuration.

Note

Supported on RSPC system units only.

Save Service Processor Configuration
This selection will write all of the settings for the Ring Indicate Power On and the Service Processor to the file:
```
/etc/lpp/diagnostics/data/spconfig
```
Restore Service Processor Configuration
This selection will restore all of the settings for the Ring Indicate Power On and the Service Processor from the file:
```
/etc/lpp/diagnostics/data/spconfig
```

SCSD Tape Drive Service Aid

This Service Aid provides a means to obtain the status or maintenance information from a SCSD tape drive. Only some models of SCSI tape drive are supported.

The Service Aid provides the following options:

Display time since a tape drive was last cleaned.
The time since the drive was last cleaned is displayed on the screen, along with a message indicating whether cleaning the drive is recommended.
Copy a tape drive's trace table.
The trace table of the tape drive is written to diskettes.
The required diskettes must be formatted for DOS. Writing the trace table may require several diskettes. The actual number of required diskettes is determined by the Service Aid based on the size of the trace table. The names of the data files are of the following format:

TRACE[X].DAT

where X is the sequential diskette number. The complete trace table consists of the sequential concatenation of all the diskette data files.
Display or copy a tape drive's log sense information.
The Service Aid provides options to display the log sense information on the screen, to copy it to a DOS-formatted diskette, or to copy it to a file. The file name LOGSENSE.DAT is used when the log sense data is written to the diskette. The Service Aid prompts for a file name when the log sense data is to be copied to a file.
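
Because the complete trace table is the sequential concatenation of the diskette data files, it can be rebuilt with a short script once the files have been copied from the diskettes into one directory. The following sketch assumes that X in TRACE[X].DAT is a plain decimal number starting at 1 (TRACE1.DAT, TRACE2.DAT, and so on), which the text above does not state explicitly. The function name join_trace is hypothetical.

```shell
#!/bin/sh
# Assumption: the diskette files are named TRACE1.DAT, TRACE2.DAT, ...
# and have already been copied into a single directory.

# join_trace SRC_DIR OUT_FILE
# Concatenates the trace files in diskette order into OUT_FILE.
# Returns non-zero if no trace file is found or a write fails.
join_trace() {
    src_dir=$1
    out_file=$2
    : > "$out_file" || return 1
    n=1
    # Counting upward avoids the lexical ordering problem of a glob,
    # where TRACE10.DAT would sort before TRACE2.DAT.
    while [ -f "$src_dir/TRACE$n.DAT" ]; do
        cat "$src_dir/TRACE$n.DAT" >> "$out_file" || return 1
        n=$((n + 1))
    done
    # n is still 1 when not even TRACE1.DAT was present.
    [ "$n" -gt 1 ]
}

# Example:
#   join_trace /tmp/trace /tmp/trace/tracetable.bin
```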

SCSI Bus Analyzer

This Service Aid provides a means to diagnose a SCSI Bus problem in a free-lance mode.

To use this Service Aid, the user should have an understanding of how a SCSI Bus works. This Service Aid should be used when the diagnostics cannot communicate with anything on the SCSI Bus and cannot isolate the problem. The usual procedure for finding a problem on the SCSI Bus with this Service Aid is to start with a single device attached, ensure that it is working, then add further devices and cables to the bus, ensuring that each one works. This Service Aid works with any valid SCSI Bus configuration.

The SCSI Bus Service Aid transmits a SCSI Inquiry command to a selectable SCSI Address. The Service Aid then waits for a response. If no response is received within a defined amount of time, the Service Aid displays a timeout message. If an error occurs or a response is received, the Service Aid then displays one of the following messages:

The Service Aid transmitted a SCSI Inquiry Command and received a valid response back without any errors being detected.
The Service Aid transmitted a SCSI Inquiry Command and did not receive any response or error status back.
The Service Aid transmitted a SCSI Inquiry Command and the adapter indicated a SCSI bus error.
The Service Aid transmitted a SCSI Inquiry Command and an adapter error occurred.
The Service Aid transmitted a SCSI Inquiry Command and a check condition occurred.

When the SCSI Bus Service Aid is entered a description of the Service Aid is displayed.

Pressing the Enter key displays the Adapter Selection menu. This menu allows the user to select the adapter from which to transmit the SCSI Inquiry Command.

When the adapter is selected the SCSI Bus Address Selection menu is displayed. This menu allows the user to enter which address to transmit the SCSI Inquiry Command.

Once the address is selected, the SCSI Bus Test Run menu is displayed. This menu allows the user to transmit the SCSI Inquiry Command by pressing the Enter key. The Service Aid then indicates the status of the transmission. When the transmission is completed, the results of the transmission are displayed.

Notes:

A Check Condition can be returned when there is nothing wrong with the bus or device.
The operating system does not allow the command to be sent if the device is in use by another process.

Service Aids for use with Ethernet

Attention: This diagnostic task has been removed in AIX 5.2. The information has been retained for reference only.

This selection provides a tool for diagnosing Ethernet problems. This Service Aid is used to exercise the Ethernet adapter and parts of the Ethernet network. The Service Aid works by transmitting a data block to itself. This Service Aid works with a wrap plug or with any valid Ethernet network and can be used as a tool to diagnose Ethernet network problems.

When the Ethernet Service Aid is executed, one of the following messages is returned:

No errors occurred.
An adapter error occurred.
A transmit time-out occurred.
A transmit error occurred.
A receive time-out occurred.
A receive error occurred.
A system error occurred.
Receive and transmit data did not match.
An error occurred that could not be identified.
The configuration indicates that there are no Ethernet adapters in this system unit.
Another application is currently using the adapter.
The resource could not be configured.

Spare Sector Availability

This selection checks the number of spare sectors available on the optical disk. Spare sectors are used for reassignment when defective sectors are encountered during normal usage or during a format and certify operation. Low availability of spare sectors indicates that the disk needs to be backed up and replaced. Formatting the disk does not improve the availability of spare sectors.

This task may be run directly from the command line. The following usage statement describes the syntax of the fastpath command:

Usage:
diag -T "chkspares"

SSA Service Aids

This Service Aid provides tools for diagnosing and resolving problems on SSA attached devices. The following tools are provided:

Set Service Mode
Link Verification
Configuration Verification
Format and Certify Disk

Update Disk Based Diagnostics

This Service Aid allows fixes (APARs) to be applied.

This task invokes the SMIT Update Software by Fix (APAR) task. The task allows the input device and APARs to be selected. Any APAR can be installed using this task.

Update System Flash (RSPC)

This selection updates the system flash for RSPC systems.

The user provides a valid binary image either on diskette or by a qualified path name. The diskettes can be in DOS or a backup format.

The flash update image is copied to the /var file system. If there is not enough space in the file system for the flash update image file, an error is reported. If this occurs, increase the size of the /var file system. The current flash image is not saved. The command automatically removes the /var/update_flash_image file.

After user confirmation, the command will reboot the system twice to complete the flash update.

Note

Supported on RSPC system units only.

Update System or Service Processor Flash (CHRP)

This selection updates the system or service processor flash for CHRP system units.

Further update and recovery instructions may be provided with the update. It is necessary to know the fully qualified path and file name of the flash update image file that was provided. If the flash update image file is on a diskette, the Service Aid can list the files on the diskette for selection.

Refer to the update instructions or the system unit's service guide to determine the level of the system unit or service processor flash.

Note

Runs on CHRP system units only.

When run from online diagnostics, the flash update image file is copied to the /var file system. If there is not enough space in the /var file system for the flash update image file, an error is reported. If this occurs, exit the Service Aid, increase the size of the /var file system and retry the Service Aid. After the file is copied, a warning screen asks for confirmation to continue the update flash. Continuing the update flash reboots the system. The system does not return to diagnostics. The current flash image is not saved. After the reboot, the /var/update_flash_image can be removed.
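
The space requirement can be checked before the Service Aid is started. The following sketch compares the size of an image file with the free space reported for a file system. It relies on the portable output format of df -P, where the fourth column of the second line is the available space in 1024-byte blocks when -k is given. The function name has_room and the example image path are hypothetical.

```shell
#!/bin/sh
# has_room IMAGE [DIR]
# Exit status 0 if the file system holding DIR (default /var) has at
# least as much free space as the size of IMAGE; non-zero otherwise.
has_room() {
    image=$1
    target=${2:-/var}
    # Size of the image in bytes; fails if the file cannot be read.
    bytes=$(wc -c < "$image") || return 2
    # Round up to whole kilobytes.
    need_kb=$(( (bytes + 1023) / 1024 ))
    # Portable df format: line 2, column 4 is the available space in KB.
    free_kb=$(df -Pk "$target" | awk 'NR == 2 { print $4 }')
    [ -n "$free_kb" ] && [ "$free_kb" -ge "$need_kb" ]
}

# Example (the image path is a placeholder):
#   has_room /tmp/flash_update.img /var || echo "Increase the size of /var first"
```

The check is advisory only; the Service Aid still reports its own error if the copy does not fit.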

When running from standalone diagnostics, the flash update image file is copied to the file system from diskette. The user needs to provide the image on a diskette since the user does not have access to remote file systems or any other files that are on the system. If enough space is not available, an error is reported stating additional system memory is needed. After the file is copied, a warning screen asks for confirmation to continue the update flash. Continuing the update flash reboots the system. The current flash image is not saved.

The update_flash command can be used in place of this Service Aid. It is located in the /usr/lpp/diagnostics/bin directory.

Attention: The update_flash command reboots the entire system. Do not use this command if more than one user is signed onto the system.

7135 RAIDiant Array Service Aid

The 7135 RAIDiant Array Service Aids contain the following functions:

Certify LUN
This selection reads and checks each block of data in the LUN. If excessive errors are encountered the user is notified.
Certify Spare Physical Disk
This selection allows the user to certify (check the integrity of the data on) drives designated as spares.
Format Physical Disk
This selection is used to format a selected disk drive.
Array Controller Microcode Download
This selection allows the microcode on the 7135 controller to be updated when required.
Physical Disk Microcode Download
This selection is used to update the microcode on any of the disk drives in the array.
Update EEPROM
This selection is used to update the contents of the EEPROM on a selected controller.
Replace Controller
Use this selection when it is necessary to replace a controller in the array.