# @(#)53 1.4  src/ssa/usr/lib/methods/ssareadme/README.ssadiags, ssacmds, ssa43D 9/5/97 11:23:44
#
# COMPONENT_NAME: mayaix
#
# FUNCTIONS: SSA Commandline Diagnostics Readme
#
# ORIGINS: 27
#
# (C) COPYRIGHT International Business Machines Corp. 1995
# All Rights Reserved
# Licensed Materials - Property of IBM
#
# US Government Users Restricted Rights - Use, duplication or
# disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
#
###################################################################

README for the SSA command line diagnostics.

There are several command line programs in this directory. These
commands allow some of the functions that are available in the SSA
service aids to be accessed from the command line.

The commands are very simple and are intended to be used primarily from
within shell scripts. They do not have extensive error checking or error
messages. If these are required, the SSA service aids should be used.

In general, a command prints a usage string if the syntax is incorrect
but does not print a message if the command fails. If the command
executes without error, the return code is 0. If there is an error, the
return code is a value other than 0.

------------------------------------------------------------------------

ssa_certify command

Purpose

    Certifies the device. This means that data can be read from or
    written to the device without any problem.

Syntax

    ssa_certify -l pdisk

Description

    This command certifies the disk using the
    ISAL_Read/Write/Characteristics commands. It returns 0 unless there
    is a problem that is not media related, in which case it prints a
    message to stderr.

    In the case of a media related problem, it attempts to re-assign
    soft error blocks. If this fails, or if the block has a hard media
    error, it still returns 0 but prints to stdout the LBA of the
    failing block in hex, followed by the word "Failed". For instance:

        >ssa_certify -l pdisk4
        0x436587676 Failed.

    If the certify operation is successful, ssa_certify produces no
    output.

Flags

    -l Pdisk
            Specifies the pdisk which the user wants to certify.

------------------------------------------------------------------------

ssa_diag command

Purpose

    Runs diagnostic-style tests on a specified device.

[Syntax, Description]

    ssa_diag is found in /usr/lpp/diagnostics/bin and is invoked as:

        ssa_diag -l pdiskX
    or
        ssa_diag -l ssaX

    Additional parameters are:

    [-a]    Causes the adapter to be reset, if the device being tested
            is an adapter. (This has no effect on a disk test.)

    [-u]    Forces a disk reservation to be broken, if the device being
            tested is a disk. (This has no effect on an adapter test.)

    [-s]    Can only be used with a disk device, and requests the
            output of the power status. (This flag cannot be used with
            the -a or -u flag.)

Output

    If an error is detected, a message such as:

        ssa0 SRN 42500

    is sent to stdout. If there is no problem, no message is sent to
    stdout. A non-zero return code indicates an error, for which output
    is sent to stderr.

    Power status output (to stdout) is as follows:

        pdisk0 0    which means pdisk0 Power good
        pdisk0 1    which means pdisk0 Lost Redundancy
        pdisk0 2    which means pdisk0 Failed

------------------------------------------------------------------------

ssa_ela command

Purpose

    Scans the error log for the most significant error.

[Syntax, Description]

    ssa_ela -l Device [-h timeperiod]

    This will scan the error log for all SSA errors and return the SRN
    for the most significant one.

    ssa_ela -l pdisk
            Scans the error log for errors logged against the given
            pdisk. It returns the most significant SRN for that pdisk.

    ssa_ela -l hdisk
            Scans the error log for errors logged against any hardware
            which supports this hdisk (pdisks and adapters). It reports
            the most significant.

    ssa_ela -l adapter
            Scans the error log for errors logged against the specified
            adapter.

Flags

    -l Device
            Specifies the device whose error log entries are to be
            analysed for the most significant error.

    -h timeperiod
            Instructs the program to start searching the error log from
            a point in time equivalent to going back timeperiod * 24
            hours. The default is 1, which means search over the last
            24 hours. Setting -h 2 would cause the search to start from
            a point 48 hours ago.

Output

    If an error is detected, a message such as:

        ssa0 SRN 42500

    is sent to stdout. If there is no problem, no message is sent to
    stdout. A non-zero return code indicates an error, for which output
    is sent to stderr.

------------------------------------------------------------------------

ssa_format command

Purpose

    Formats the device.

Syntax

    ssa_format -l pdisk
    or
    ssa_format -l SSA_Adapter

Description

    The pdisk special file is opened and the ISAL format command is
    used to format the device. The disk can be closed whilst the format
    is still in progress. If the command cannot format the device, it
    prints an appropriate error message.

    If the device specified is an adapter, the program attempts to
    format the Fast-Write Cache on the card. This function zeroes all
    data in the cache card (for security purposes), provided that it
    has been destaged to disk. The possible failure conditions for this
    operation include:

    1) There is no Fast-Write Cache on the card or the Fast-Write Cache
       is not empty (data waiting to be written out).
       Message is "Cannot be formatted because it is not empty"

    2) The card does not support Fast-Write Cache.
       Message is "This Adapter cannot be formatted"

Flags

    -l Pdisk
            Specifies the pdisk which the user wants to format.

    -l SSA_Adapter
            Specifies the adapter which the user wants to format.

Output

    All error messages are output to stderr.

------------------------------------------------------------------------

ssa_getdump command

Purpose

    Displays SSA adapter dump locations and saves a dump to a specified
    location.

Syntax

    The list version of the command is:

        ssa_getdump -l [-h] [-d pdiskxx]
                    [-a AdapterName | -n AdapterUID | -s SlotNumber ]

    The copy version of the command is:

        ssa_getdump -c [-h] -d pdiskxx
                    { -a AdapterName | -n AdapterUID | -s SlotNumber }
                    [-x] -o OutputFile

Description

    The ssa_getdump command has 2 modes of operation: List and Copy.

    In the list [-l] mode the ssa_getdump command searches for adapter
    dumps on unused SSA disks. The disks are systematically searched
    and information about the dumps found is output. Example output is
    below:

                            ADAPTER DUMPS
    DATE   TIME         ADAPTER UID      DISK    SLOT SIZE STATUS SEQ   ADAP
    961031 10:31:12.123 1234567890ABCDEF pdisk22 12   1.5  4      12345 ssa0
    ???_xx 10:32:12.456 234567890ABCDEF1 pdisk22 3    13.5 3      12346 ssa1
    961120 10:50:12.123 1234567890ABCDE7 pdisk22 7    1.5  4      12345

    The header lines are NLS supported and can be turned off with the
    [-h] flag. Where possible the adapter UID is translated into the
    adapter name (ssa0 etc). If this is not possible the last field is
    blank.

    By adding additional optional arguments to the command line, the
    search can be restricted to specific disks or adapters.

    Note: The program uses space in the /tmp directory when copying out
    a file. If there is not enough space available it will fail. Be
    aware that some dumps could be quite large.

Flags

  Required Flags

    One of the following flags is required:

    -l      This specifies that the program is to operate in List mode,
            and causes the program to search for dumps.

    -c      This specifies that the program is to operate in Copy mode.
            The program will copy the dump (if one is found) from the
            location specified to the output point given.

  Required Flags (Copy Mode)

    -d pdiskxx
            This specifies the disk to be copied from (e.g. pdisk2).

    One of these flags is to be used:

    -a AdapterName
            This specifies which adapter name to search for. (e.g.
            ssa1) The adapter must be known to the searching machine.

    -n AdapterUID
            This specifies which adapter UID to search for. The adapter
            does not need to be known to the searching machine.

    -s SlotNumber
            This is the disk slot location, as given in the list
            output.

    Finally the output destination must be given:

    -o OutputFile
            This is where the "tar" command will write its output.

  Optional Flags (Copy Mode)

    -h      This flag suppresses the output of the progress messages
            from the program.

    -x      This flag suppresses the actions of "compress" and "tar",
            copying the dump directly to the output point specified
            (-o). Note that the user must ensure that the specified
            destination has enough free space to hold the dump.

  Optional Flags (List Mode)

    In List mode the following can be used:

    -h      This prevents the header lines being displayed (useful for
            scripts).

    -d pdiskxx
            This allows the specification of the disk to be searched,
            and so cuts down the scope of the search.

    Only one of these flags can be used:

    -a AdapterName
            This specifies which adapter name to search for (e.g.
            ssa1). The adapter must be known to the searching machine.

    -n AdapterUID
            This specifies which adapter UID to search for. The adapter
            does not need to be known to the searching machine.

    -s SlotNumber
            This is the disk slot location, as given in the list
            output.

Notes

    In the copy [-c] mode the user must specify the disk and location
    to be copied from, and an output location to copy to.

    The output file is first copied to a temporary location
    (/tmp/IBMssadump/) and then compressed. A "system" call to the
    "compress" utility is used. The compressed file is then copied to
    the output file using the "tar" utility.

    The temporary location is in /tmp. The program creates a
    subdirectory IBMssadump (mkdir /tmp/IBMssadump). The data is read
    from disk and written to a file in /tmp/IBMssadump.

    The temporary file name is made up from the pdisk name, the slot
    number and the time stamp in milliseconds.
    The extension .dmp is added.

    After the "tar" has completed, the temporary file and the directory
    IBMssadump are deleted.

    The file saved by the tar command is from a relative location, and
    includes a subdirectory header of "IBMssadump", so the tar output
    looks like this (machine1 is the archive name):

        tar -tvf machine1
        -rw-r--r-- 0 0 773505 Nov 19 14:48:49 1996 ./IBMssadump/pdisk2_22_643.dmp.Z

Output

    Error messages are sent to stderr.
    Header messages, list mode output, and copy progress messages are
    sent to stdout.

    Return codes generated, and their meanings:

        0   Successful completion.
        1   Incorrect parameters.
        2   Invalid disk name or no pdisk.
        3   Incorrect/invalid ssa adapter name.
        4   Incorrect ssa adapter UID/SlotNumber.
        5   Unable to open file/directory in /tmp.
        6   Insufficient disk space or error writing temporary file.
        7   Insufficient memory.
        8   Internal / ODM error.
        9   An error was encountered whilst reading in copy mode.

    Note: When in copy mode the program reads from disk in blocks of
    around 256Kbytes.

------------------------------------------------------------------------

ssa_progress command

Purpose

    Prints out the percentage complete of the format and the status.
    The status is either "Complete", "Formatting" or "Failed".

Syntax

    ssa_progress -l pdisk

Description

    The pdisk special file is opened and the ISAL progress command is
    used to determine the percentage complete.

Examples

    1] A disk which is 30% formatted:

        > ssa_progress -l pdisk
        Formatting 30

    2] A disk which is not formatting and is not format degraded:

        > ssa_progress -l pdisk
        Complete 100

    3] A disk which is format degraded:

        > ssa_progress -l pdisk
        Failed 0

Flags

    -l Pdisk
            Specifies the pdisk for which the user wants to determine
            the percentage complete of the format and the status.

Output

    Error messages are sent to stderr.
    Progress messages are sent to stdout.

------------------------------------------------------------------------

ssa_rescheck command

Purpose

    Reports on the reservation status of an hdisk.

Syntax

    ssa_rescheck -l hdisk [-h]

Description

    This command tests the access paths to the given hdisk, and reports
    on what it finds. Specifically, it looks to see if there is a
    reservation on the disk and, if there is, it tries to determine who
    the disk is reserved to.

Usage Examples

    command line : ssa_rescheck -l hdisk1

    produces output like:

    Disk    Primary  Secondary  Adapter  Primary  Secondary  Reserved
            Adapter  Adapter    In Use   Access   Access     to
    hdisk1  ssa0     ====       ssa0     OK       ====       none

    command line : ssa_rescheck -l hdisk1 -h

    produces output like:

    hdisk1  ssa0     ====       ssa0     OK       ====       none

    ssa_rescheck first interrogates the ODM to find out the name(s) of
    the adapter(s) the disk is attached to. Access to the disk via the
    primary (and secondary) adapter is then attempted. The result of
    this test can be "OK", "Open", "Fail" or "Busy":

    OK      means the disk can be accessed.
    Open    means that another program has the disk handle open.
    Fail    means that the disk could not be accessed.
    Busy    means that the disk is reserved to another adapter or
            system.

            For "SSA Enhanced Adapter", Busy means that another adapter
            has reserved the disk. This adapter may be in the same
            system, in which case the other adapter will show either OK
            or Open.

            For "SSA Enhanced RAID Adapters", Busy means that the disk
            is reserved. The "Reserved To" field will provide further
            information.

    The example below shows the disk Open by ssa1, so the disk is
    reserved to ssa1 and ssa0 gets a Busy status. As the 2 adapters are
    in the same machine, this must imply that the Node number is not
    set.

    Disk    Primary  Secondary  Adapter  Primary  Secondary  Reserved
            Adapter  Adapter    In Use   Access   Access     to
    hdisk2  ssa1     ssa0       ssa1     Open     Busy       ssa1

    The example below shows that the disk is reserved to a node, as the
    secondary access is OK (not Busy) and the Reserved To field shows
    the machine name, not an adapter name.

    Disk    Primary  Secondary  Adapter  Primary  Secondary  Reserved
            Adapter  Adapter    In Use   Access   Access     to
    hdisk2  ssa1     ssa0       ssa1     Open     OK         fred.somewhere.com

    The next step is to investigate the reservation status (Reserved
    To). This can return "N/A", "none", Adapter name or Adapter UID,
    Node Number or System Name:

    N/A     is returned if the adapter is not able to return
            reservation data. (This happens on non-"SSA Enhanced RAID
            Adapters".) The exit code from the program will be 1 in
            this case.

    none    means the disk is not reserved.

    Adapter name or UID
            is displayed if the disk is reserved to a specific adapter.

    Node Number or System Name
            is shown if the disk is reserved to a specific node.

    The Adapter In Use field shows which adapter path the system is
    currently using to access the device.

Flags

    -l hdisk
            This is the name of the hdisk to test.

    -h      This turns off the header output (useful for scripts).

Output

    Error messages are sent to stderr.
    Header information and status output are sent to stdout.

    Return Codes:

        0   Program completed successfully.
        1   Program had a system error (usually when the adapter is a
            non-"SSA Enhanced RAID Adapter").

        Any other value indicates a more serious error.

------------------------------------------------------------------------

ssa_servicemode command

Purpose

    Places the device (pdisk) into or out of service mode.

Syntax

    ssa_servicemode -l pdisk [-a AdapterName] (-y|-n)

Description

    This command opens the adapter special file, sends the appropriate
    IACL command to place the device into or out of service mode, and
    then closes the device special file. If the service mode cannot be
    achieved for any reason, the command prints the appropriate error
    message.

Flags

    -l Pdisk
            Specifies the pdisk which the user wants to put into or out
            of service mode.

    -a AdapterName
            Specifies the adapter which the disk is connected to.

    -y      Put the pdisk into service mode.

    -n      Take the pdisk out of service mode.

Output

    All error messages are sent to stderr.

------------------------------------------------------------------------

ssavfynn command

Purpose

    Checks for duplicated node numbers.

[Syntax, Description]

    ssavfynn is found in /usr/lpp/diagnostics/bin and is invoked with
    no parameters, e.g.:

        ssavfynn

Output

    Error messages are sent to stderr.
    Configuration problem messages are sent to stdout.

    If ssavfynn runs with no message then there is no conflict between
    the host machine node number and the other node numbers that can be
    seen on the ssa network. If there is a conflict, a message such as
    the following is printed:

        SSA User Configuration Error: Node Number 1 is set on both
        Local Host `fred.somewhere.ibm.com' and Remote Host `bloggs'

    This tells us that there is a problem between our machine (fred)
    and another machine (bloggs) which is connected via the ssa
    network. The names shown are the DNS names of the machines.

    Note that ssavfynn is only reliable if the other adapters on the
    network are SSA Network RAID Adapters.

------------------------------------------------------------------------
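
Because these commands are meant to be driven from shell scripts, and
some of them report problems on stdout rather than through the return
code, a wrapper usually has to check both. The following is a minimal
sketch of that idea, not part of the SSA command set. The function
names certify_disk and wait_for_format, their return values, and the
60 second polling interval are assumptions made for illustration. The
only behaviour relied on is what this README describes: ssa_certify
prints the failing LBA to stdout while still returning 0, and
ssa_progress prints a status word followed by a percentage.

```sh
#!/bin/sh
# Illustrative sketch only: the function names, return values and the
# polling interval are assumptions, not part of the SSA command set.

# certify_disk PDISK
# Returns 0 if the certify was clean, 1 if a failing block was reported,
# 2 if ssa_certify itself returned a non-zero code.
certify_disk() {
    # ssa_certify returns 0 even for a media error, so the return code
    # alone is not enough: any text on stdout names a failing block.
    cd_out=$(ssa_certify -l "$1") || return 2
    if [ -n "$cd_out" ]; then
        echo "$1: certify reported: $cd_out" >&2
        return 1
    fi
    return 0
}

# wait_for_format PDISK [SECONDS]
# Polls ssa_progress until the format ends. Returns 0 on "Complete",
# 1 on "Failed", 2 if ssa_progress fails or prints something unexpected.
wait_for_format() {
    wf_disk=$1
    wf_interval=${2:-60}    # assumed default: poll once a minute
    while :; do
        # Documented stdout format: "<Status> <Percent>", e.g. "Formatting 30"
        wf_out=$(ssa_progress -l "$wf_disk") || return 2
        set -- $wf_out      # split into status ($1) and percent ($2)
        case $1 in
            Complete)   return 0 ;;
            Failed)     return 1 ;;
            Formatting) echo "$wf_disk: $2% formatted"
                        sleep "$wf_interval" ;;
            *)          return 2 ;;
        esac
    done
}
```

With these functions loaded, a format followed by a certify could be
chained as shown below, assuming that ssa_format returns while the
format is still in progress:

    ssa_format -l pdisk4 && wait_for_format pdisk4 && certify_disk pdisk4

------------------------------------------------------------------------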