Diagnosis Guide
This section explains how to obtain and record information about your SP
system that you will need when your first problem occurs. You may not
have the time or the means to obtain this information after a failure has
occurred. The best strategy is to prepare this information before a
failure occurs, and to have it handy before investigating possible
problems.
Problem investigation efforts are streamlined considerably by knowing the
characteristics of the SP system at the time that a problem occurs.
This includes what node types are being used, what software is installed on
these nodes, what level of software is installed, what software service is
installed, and so forth.
PSSP: Planning Volume 1 provides guidance in planning
your physical site and selecting your hardware. PSSP:
Planning Volume 2 provides guidance for logically laying out the SP
system structure and the SP administrative network, and selecting your
software.
Examine your SP system, its structure and its software setup, and record
this information in a log. Keep this log in a place where you will
always have access to it, regardless of whatever failure occurs on your
system. To avoid the possibility of losing this log to an online
failure, it is best to keep this log in hardcopy format.
At a minimum, record the following information in the log:
- Your customer information:
  - Your access code, which is your customer number
  - Names and phone numbers of people whom the IBM Support Center should
    contact to assist you with problem resolution
- Control workstation information:
  - The level of AIX installed
  - The PTF numbers for all fixes installed on AIX
  - The product number of PSSP on the control workstation. You need
    this number to report a problem.
  - The PTF numbers for all fixes installed on PSSP
- System partition or cluster information: which nodes are in the system
  partition or cluster, and what software is installed and active on it
- Node information:
  - The node's number, its frame number, and its associated slot number, which
    can be found by issuing the /usr/lpp/ssp/bin/splstdata -n command
    (note: this command is system partition sensitive)
  - The node's hostname and IP address, which can be found by issuing the
    /usr/lpp/ssp/bin/splstdata -a command (note: this command is system
    partition sensitive)
  - The level of AIX installed on the node
  - The PTF numbers for all fixes installed on AIX
  - The product number of PSSP installed, which may be different from that
    installed on the control workstation
  - The PTF numbers for all fixes installed on PSSP
  - Optional software installed, including version numbers and fixes
  - Special hardware characteristics, such as wide node, network
    attached node, or twin-tailed DASD
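The output of the two splstdata commands named in the list above can be captured into a dated text snapshot for printing. The following is a minimal Python sketch, not a PSSP tool. It assumes that a Python 3 interpreter is available on the control workstation, and the names COMMANDS, run_command, and build_snapshot are hypothetical. It uses only the command paths and flags given in this section. Because both commands are system partition sensitive, the snapshot reflects only the system partition that is current when it is taken.

```python
import subprocess
from datetime import datetime

# Commands named in this section; both are system partition sensitive.
COMMANDS = [
    ("Node numbers, frames, and slots", ["/usr/lpp/ssp/bin/splstdata", "-n"]),
    ("Node hostnames and IP addresses", ["/usr/lpp/ssp/bin/splstdata", "-a"]),
]


def run_command(argv):
    """Return the command's output, or a short note if it cannot be run."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"(not available: {exc})"
    if result.returncode != 0:
        return f"(exit code {result.returncode}) {result.stderr}"
    return result.stdout


def build_snapshot(commands=COMMANDS, runner=run_command):
    """Collect the output of each command under a dated heading."""
    lines = [f"SP system snapshot taken {datetime.now():%Y-%m-%d %H:%M}"]
    for title, argv in commands:
        lines.append("")
        lines.append(f"== {title}: {' '.join(argv)}")
        lines.append(runner(argv).rstrip())
    return "\n".join(lines) + "\n"
```

Calling print(build_snapshot()) writes the snapshot to standard output, from where it can be redirected to a file, printed, and filed with the hardcopy log. The runner parameter exists only so that the formatting can be tried on a machine where the commands are not installed.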
Whenever an actual or suspected failure occurs on the SP system, update
this log. Record the symptoms noticed at the time of the failure, and
system conditions such as:
- The date and time that the problem was discovered
- The nature of the problem, such as a node halt, an abnormal program
termination, request hang, poor response time, or hardware failure
- What nodes the problem was experienced on
- What software was running on the nodes at the time the problem was
encountered
- What users were using the system at the time that the problem was
encountered
- What actions you or others took to repair or bypass the problem, and
whether these actions were successful
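If failure entries are drafted electronically before being printed for the log, a fixed record layout helps ensure that none of the items above is left out. The following Python sketch is one possible layout under that assumption. The class name FailureEntry and its field names are hypothetical, and every value in the example entry is made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class FailureEntry:
    """One log entry for an actual or suspected failure (hypothetical layout)."""
    discovered: datetime   # date and time the problem was discovered
    nature: str            # for example: node halt, request hang, hardware failure
    nodes: List[str]       # nodes on which the problem was experienced
    software: List[str]    # software running on those nodes at the time
    users: List[str]       # users on the system at the time
    actions: List[str] = field(default_factory=list)  # action taken and its outcome

    def to_text(self) -> str:
        """Format the entry as plain text suitable for printing."""
        rows = [
            ("Discovered", self.discovered.strftime("%Y-%m-%d %H:%M")),
            ("Nature", self.nature),
            ("Nodes", ", ".join(self.nodes)),
            ("Software running", ", ".join(self.software)),
            ("Users", ", ".join(self.users)),
        ]
        lines = [f"{label:<18}{value}" for label, value in rows]
        lines += [f"{'Action':<18}{action}" for action in self.actions]
        return "\n".join(lines)


# Example entry; every value below is made up for illustration.
entry = FailureEntry(
    discovered=datetime(2001, 3, 14, 9, 30),
    nature="request hang",
    nodes=["node 5"],
    software=["example application"],
    users=["user1"],
    actions=["restarted the application: successful"],
)
print(entry.to_text())
```

Each action is recorded together with its outcome, so that a later reader can tell which repair or bypass attempts worked.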
Recording this information serves several purposes:
- It allows you to recognize recurring problems and to quickly find the
steps needed to resolve or bypass the problem.
- It records details about the conditions that existed in the SP system at
the time of the failure. This information is essential when contacting
the IBM Support Center to report problems.
- It allows you to detect patterns in the occurrence of problems.
Perhaps these problems occur on a regular basis, or whenever a specific
program runs, or when a specific system resource is unavailable or reaching
maximum limits. These patterns are difficult to detect unless
historical data on past failures is available. Having this information
available can assist you and the IBM Support Center in detecting patterns in
these failure conditions.
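Once several failures have been logged, recurring factors can be found by tallying how often each node or program appears across entries. The following Python sketch illustrates the idea under the assumption that past entries have been transcribed into simple dictionaries. The function name recurring and the keys "nodes" and "software" are hypothetical, and the history data is made up for illustration.

```python
from collections import Counter


def recurring(entries, key, minimum=2):
    """Return (value, count) pairs for values of `key` that appear in at
    least `minimum` entries, most frequent first."""
    counts = Counter()
    for entry in entries:
        # Count each value once per entry, even if it is listed twice.
        counts.update(set(entry[key]))
    return [(value, n) for value, n in counts.most_common() if n >= minimum]


# Past failures transcribed from the log; all values are made up.
history = [
    {"nodes": ["node 5"], "software": ["job A"]},
    {"nodes": ["node 5", "node 7"], "software": ["job B"]},
    {"nodes": ["node 5"], "software": ["job A"]},
]

print(recurring(history, "nodes"))     # [('node 5', 3)]
print(recurring(history, "software"))  # [('job A', 2)]
```

A tally of this kind only points to candidates for investigation. Whether a frequently appearing node or program is actually the cause still has to be established from the recorded symptoms.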
Using outdated or incomplete information when investigating a failure leads
to wasted time. The wrong information is obtained and analyzed, the
wrong diagnostic procedures are performed, and in some cases an incorrect
solution is applied. This causes problem conditions to remain the same
or become worse. It also may introduce additional problems. To
avoid this wasted effort, be sure to update this log whenever the SP system
structure or setup changes.
Update the log whenever any of the following occurs:
- New software is installed
- Upgrades are made to new levels of AIX or PSSP
- AIX or PSSP PTFs are installed
- Node hostnames or IP addresses change
- Hardware changes are made, including switch adapter changes
- Problems occur