IBM Books

Diagnosis Guide


Information to collect before contacting the IBM Support Center

The following items are used to isolate problems in the SP Switch2 component of PSSP. More detailed information about each item appears in Error information. Before collecting any other information, check for the existence of the /etc/plane.info file, and make a copy of the file if it exists. See plane.info file.
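
The check-and-copy step above can be sketched as a small shell function. This is only an illustration: the function name save_plane_info, the BACKUP_DIR variable, and the /tmp default are assumptions, not part of PSSP. The time stamp in the copy's name simply mirrors the yymmddhhmmss convention used for snap packages.

```shell
# Sketch: preserve /etc/plane.info before collecting anything else.
# PLANE_INFO and BACKUP_DIR are illustrative variables, not PSSP settings.
PLANE_INFO=${PLANE_INFO:-/etc/plane.info}
BACKUP_DIR=${BACKUP_DIR:-/tmp}

save_plane_info() {
    if [ -f "$PLANE_INFO" ]; then
        # -p keeps the original modification time and permissions
        cp -p "$PLANE_INFO" "$BACKUP_DIR/plane.info.$(date +%y%m%d%H%M%S)" &&
            echo "saved copy of $PLANE_INFO in $BACKUP_DIR"
    else
        echo "$PLANE_INFO not found; nothing to copy"
    fi
}
```

Calling save_plane_info with no arguments on the affected node is enough. Set BACKUP_DIR first if the copy should go somewhere other than /tmp.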

css.snap package

The /usr/lpp/ssp/css/css.snap script collects log, trace, and dump information created by SP Switch2 support code (device driver, worm, fault-service daemon, diagnostics) into a single compressed package.

The complete package output file is in the directory: /var/adm/SPlogs/css. The file name varies according to the options:

where hostname is the hostname of the node where the css.snap command was issued, and yymmddhhmmss is the date and time that the css.snap information was collected.

The css.snap script creates a log file, /var/adm/SPlogs/css/css.snap.log where all the files gathered in the package are listed.

This script is called whenever a serious error is detected by the switch support code. To create such a snapshot directly, log in to the desired node and manually issue the command:

/usr/lpp/ssp/css/css.snap [-c | -n] [-s] -a [css0 | css1] [-p p0]

where

Note:
For the SP Switch2 PCI Adapter, css.snap invokes the css.snap.corsair script. In this case, the output file has the name css.snap.corsair substituted where css.snap appears in the file naming conventions above.

Table 40 shows the error log entries that automatically take a snapshot, as well as the type of snap performed. A soft snap allows work with the switch to continue. A full snap might corrupt the adapter, forcing an adapter reset and causing the node to be fenced off from the switch.

Table 40. AIX Error Log entries that invoke css.snap - SP Switch2

| Error Log entry | Snap type (full/soft) |
|---|---|
| CRS_OFFLINE_ER | full |
| CRS_RESTART_HWSW_ER | full |
| CS_SW_FIFOOVRFLW_RE | soft |
| CS_SW_RECV_STATE_RE | soft |
| CS_RCVY_START_RE | full |
| CS_SW_INVALID_RTE_RE | soft |
| CS_SW_PE_ON_DATA_RE | soft |
| CS_SW_CQ_PE_NCL_RE | soft |
| CS_SW_PE_RTE_TBL_RE | soft |
| CSS_DD_CFG_ER | full |
| CS_SW_SVC_Q_FULL_RE | full |
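
When reviewing an error log, it can help to have the mapping in Table 40 available as a lookup. The following is a minimal sketch that encodes the table as a shell case statement. The function name snap_type_for and the "none" result for labels outside the table are assumptions made for illustration.

```shell
# Sketch: map an AIX error log label to the snap type listed in Table 40.
# Prints "full", "soft", or "none" (label not listed in the table).
snap_type_for() {
    case "$1" in
        CRS_OFFLINE_ER|CRS_RESTART_HWSW_ER|CS_RCVY_START_RE|CSS_DD_CFG_ER|CS_SW_SVC_Q_FULL_RE)
            echo full ;;
        CS_SW_FIFOOVRFLW_RE|CS_SW_RECV_STATE_RE|CS_SW_INVALID_RTE_RE|CS_SW_PE_ON_DATA_RE|CS_SW_CQ_PE_NCL_RE|CS_SW_PE_RTE_TBL_RE)
            echo soft ;;
        *)
            echo none ;;
    esac
}
```

For example, snap_type_for CS_RCVY_START_RE prints full, which indicates that the adapter may need to be reset after the automatic snapshot.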

Collect the css.snap information from both the primary node and all nodes that are experiencing SP Switch2 problems. Do not reboot the nodes before running css.snap, because rebooting causes the loss of valuable diagnostic information.

The css.snap script collects all the files which reside in the /var/adm/SPlogs/css, /var/adm/SPlogs/css0, /var/adm/SPlogs/css1, /var/adm/SPlogs/css1/p0, and /var/adm/SPlogs/css0/p0 directories, and some additional files from the /tmp directory. Some of the files reside on each node, while others reside only on the primary node or on the control workstation.

Table 41 lists important files gathered by css.snap and their location at css.snap time. Some of the files are created by css.snap in order to gather concurrent information on the switch status. For the SP Switch2 PCI Adapter, additional files are collected by css.snap.corsair and they are noted after the table.

Log files within css.snap package

Table 41 contains the list of files that are collected by css.snap and their location in the log directory hierarchy.

Table 41. SP Switch2 log files

| Number | Log file name | Hierarchy | Contents | Location |
|---|---|---|---|---|
| 1 | adapter.log | adapter | Fault service daemon adapter status information. For more information, see adapter.log. | nodes |
| 2 | cable_miswire | port | Node-to-switch or switch-to-switch plane miswired connection information. For more information, see cable_miswire. | primary node |
| 3 | cadd_dump.out | adapter | Most recent css.snap's cadd_dump command dump file. SP Switch2 adapter device driver trace buffer dump file. For more information, see cadd_dump.out. | nodes |
| 4 | chgcss.log | node | Log file of chgcss, which changes the adapter device driver's attributes. For more information, see chgcss.log. | nodes |
| 5 | col_dump.out | adapter | The most recent css.snap's col_dump command dump file. Microcode dump information. For more information, see col_dump.out. | nodes |
| 6 | colad.trace | adapter | SP Switch2 adapter diagnostics messages. For more information, see colad.trace. | nodes |
| 7 | core | node | Fault service daemon core dump file. | nodes |
| 8 | cssadm2.debug | node | Trace of cssadm2 daemon. For more information, see cssadm2.debug. | control workstation |
| 9 | cssadm2.stderr | node | Unexpected error messages received by the cssadm2 daemon. For more information, see cssadm2.stderr. | control workstation |
| 10 | cssadm2.stdout | node | Unexpected informational messages received by the cssadm2 daemon. For more information, see cssadm2.stdout. | control workstation |
| 11 | css.snap.log | node | css.snap snapshot command log information - list of all files gathered in the last snapshot. For more information, see css.snap.log. | nodes |
| 12 | CSS_test.log | node | Present if the CSS_test command was run on the node. For more information, see CSS_test.log, Verify software installation and the CSS_test command entry in PSSP: Command and Technical Reference. | nodes |
| 13 | daemon.log | node | Fault service daemon output file. For more information, see daemon.log. | nodes |
| 14 | DeviceDB.dump | port | Latest dump of the device data base from the fault service daemon. See DeviceDB.dump. | nodes |
| 15 | Ecommands.log | node | Log entries of all Ecommands. For more information, see Ecommands.log. | control workstation |
| 16 | emasterd.log | node | TOD Management emasterd daemon - errors and notifications. For more information, see emasterd.log. | control workstation |
| 17 | emasterd.stdout | node | TOD Management emasterd daemon - more detailed trace file. For more information, see emasterd.stdout. | control workstation |
| 18 | errpt.out | node | Most recent errpt -a and errpt results. For more information, see errpt.out and the errpt command entry in AIX Command and Technical Reference. | nodes |
| 19 | flt | port | Hardware error conditions found on the SP Switch2, recovery action taken by the fault-service daemon, and general operations that alter the SP Switch2 configuration. For more information, see flt. | nodes |
| 20 | fs_daemon_print.file | port | Fault service daemon port status information. For more information, see fs_daemon_print.file. | nodes |
| 21 | ifcl_dump.out | adapter | Most recent css.snap's ifcl_dump command dump file. IP dump information. For more information, see ifcl_dump.out. | nodes |
| 22 | logevnt.out | node | Log error log events monitored by ha. For more information, see logevnt.out. | nodes |
| 23 | netstat.out | adapter | Most recent css.snap's netstat command dump file. Network status information. For more information, see netstat.out, and the entry for the netstat command in AIX Command and Technical Reference. | nodes |
| 24 | odm.out | adapter | The node's adapter_status configuration as saved in the ODM. For more information, see odm.out. | nodes |
| 25 | out.top | port | SP Switch2 plane link information. For more information, see out.top. | nodes |
| 26 | rc.switch.log | node | Fault service daemon initialization information. For more information, see rc.switch.log. | nodes |
| 27 | rc.switch.log.previous | node | Node's previous fault service daemon initialization information. For more information, see rc.switch.log. | nodes |
| 28 | regs.out | adapter | Most recent css.snap's read_regs command dump file. SP Switch2 adapter's registers dump file. For more information, see regs.out. | nodes |
| 29 | router.log | port | SP Switch2 routing information. For more information, see router.log. | nodes |
| 30 | scan_out.log | adapter | TBIC scan ring binary information. For more information, see scan_out.log and scan_save.log. | nodes |
| 31 | scan_save.log | adapter | Previous TBIC scan ring binary information. For more information, see scan_out.log and scan_save.log. | nodes |
| 32 | spd.trace | port | Tracing of advanced switch diagnostics. See spd.trace. | control workstation |
| 33 | spdata.out | port | Most recent css.snap's splstdata command dump file. SP Switch2 data requests. For more information, see spdata.out. | primary node |
| 34 | summlog.out | node | Error information from the css.summlog daemon. For more information, see summlog.out. | control workstation |
| 35 | topology.data | port | System error messages from the distribution of the topology file to the secondary nodes. For more information, see topology.data. | primary node |

If css.snap is invoked for an SP Switch2 PCI Adapter error, the css.snap.corsair script is invoked. The following log files are also included:

  1. sdr.out, which is the result of issuing these commands:
  2. khal_dump.out, which is the result of issuing the khal_dump -Z command.
  3. ifcr_dump.out, which is the result of issuing the ifcr_dump -Z command.
  4. var/adm/ffdc/stacks, which contains FFDC information.
  5. css[0 | 1]/lsattr.out, which is the result of issuing the lsattr -El css[0 | 1] command.
  6. css[0 | 1]/cordd_dump.ini and cordd_dump.fin, which are the results of issuing the cordd_dump -Z -l css[0 | 1] command. There is one file for the start and one for the end of the command.
  7. css[0 | 1]/ucode.out, which is the result of issuing the cordd_dump -Z -a N command.
  8. css[0 | 1]/read_regs.out, which is the result of issuing the read_regs -l css[0 | 1] command.
Note:
The files ending in .out are produced by running the appropriate command to dump internal (in memory) trace information or dump data to a file.

Disk space handling policy

The css.snap command avoids filling up the /var directory by following these rules:

  1. If less than 10% of /var is free, css.snap exits.
  2. If the css portion of /var is more than 30% of the total space in /var, css.snap erases old snap files until the css portion becomes less than 30%. If it is successful, the snap proceeds. If not, it exits.
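
The two rules can be read as a simple decision procedure. The sketch below expresses that reading as a shell function. The name snap_space_action and its two percentage arguments are hypothetical. How css.snap itself measures free space or selects old snap files to erase is not described here, so the function only reports which branch of the policy applies.

```shell
# Sketch of the disk space policy as a decision function.
#   $1  free_pct: percentage of /var that is currently free
#   $2  css_pct:  percentage of the total space in /var used by the css portion
# Prints one of:
#   exit     rule 1 applies, less than 10% of /var is free
#   prune    rule 2 applies, old snap files must be erased first
#   proceed  neither rule applies, the snap can be taken
snap_space_action() {
    free_pct=$1
    css_pct=$2
    if [ "$free_pct" -lt 10 ]; then
        echo exit
    elif [ "$css_pct" -gt 30 ]; then
        echo prune
    else
        echo proceed
    fi
}
```

In the prune case, the snap proceeds only if erasing old snap files brings the css portion below 30%. Otherwise css.snap exits, as rule 2 states.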

The css.snap command is called automatically from the fault service daemon when certain serious errors are detected. The css.snap command can also be issued from the command line when a switch or adapter related problem is suspected. See css.snap package.

Note:
css.snap uses a number of undocumented utilities to collect information. Some of these can be destructive when used on a running system. After using css.snap with snap type full (see Table 40), it is advisable to run /usr/lpp/ssp/css/rc.switch. This command resets and reloads the switch adapter, and eliminates the residual effects of css.snap.
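
As an illustration of the advice in this note, the following sketch runs rc.switch only after a full snap. The function name after_snap and the overridable RC_SWITCH variable are assumptions introduced for the example. The rc.switch path is the one given above.

```shell
# Sketch: reload the switch adapter after a full snap, do nothing after a soft snap.
# RC_SWITCH can be overridden, for example to try the function on a test system.
RC_SWITCH=${RC_SWITCH:-/usr/lpp/ssp/css/rc.switch}

after_snap() {
    case "$1" in
        full)
            # A full snap may leave residual effects on the adapter.
            "$RC_SWITCH"
            ;;
        soft)
            # A soft snap allows work with the switch to continue.
            :
            ;;
        *)
            echo "usage: after_snap full|soft" >&2
            return 2
            ;;
    esac
}
```

Because rc.switch resets and reloads the switch adapter, call after_snap full only when the node can tolerate an interruption of switch traffic.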

