IBM Books

Diagnosis Guide


Information to collect before contacting the IBM Support Center

The following items are used to isolate problems in the SP Switch component of PSSP. More detailed information about each item appears in Error information.

css.snap package

The /usr/lpp/ssp/css/css.snap script collects the log, trace, and dump information created by the SP Switch support code (device driver, worm, fault-service daemon, diagnostics) into a single compressed package.

The complete package output file is /var/adm/SPlogs/css/hostname.yymmddhhmmss.css.snap.tar.Z, where hostname is the host name of the node where css.snap was issued, and yymmddhhmmss is the date and time that the css.snap information was collected.
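As an illustration of the naming scheme above, the following Python sketch splits a package file name into its host name and collection time. The function name parse_snap_name and the sample file name are hypothetical. Interpreting yy with Python's two-digit-year rule (%y) is an assumption, not something css.snap defines.

```python
import re
from datetime import datetime

# Pattern for hostname.yymmddhhmmss.css.snap.tar.Z as described in the text.
_SNAP_NAME = re.compile(r"^(?P<host>.+)\.(?P<stamp>\d{12})\.css\.snap\.tar\.Z$")


def parse_snap_name(file_name):
    """Return (hostname, datetime) for a css.snap package name, or None."""
    match = _SNAP_NAME.match(file_name)
    if match is None:
        return None
    try:
        # Assumption: yy is a two-digit year handled by strptime's %y rule.
        taken = datetime.strptime(match.group("stamp"), "%y%m%d%H%M%S")
    except ValueError:
        # Twelve digits that do not form a valid date and time.
        return None
    return match.group("host"), taken


if __name__ == "__main__":
    # Hypothetical file name, used only to exercise the function.
    print(parse_snap_name("node05.990131123456.css.snap.tar.Z"))
```

A name that does not match the pattern, such as css.snap.log, yields None, so the helper can be used to filter a directory listing of /var/adm/SPlogs/css.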

The css.snap script creates a log file, /var/adm/SPlogs/css/css.snap.log, which lists all the files gathered in the package.

This script is called whenever a serious error is detected by the switch support code. To create such a snapshot directly, log in to the desired node and manually issue the command:

/usr/lpp/ssp/css/css.snap [-c | -n | -s ]

where

The following table shows the error log entries that automatically take a snapshot, as well as the type of snap performed. A soft snap allows work with the switch to continue. A full snap might corrupt the adapter, forcing an adapter reset and causing the node to be fenced off the switch.

Table 24. AIX Error Log entries that invoke css.snap - SP Switch

Error Log entry Snap type (full/soft)
SP_SW_FIFOOVRFLW_RE soft
SP_SW_RECV_STATE_RE soft
SP_SW_INVALID_RTE_RE soft
SP_SW_NCLL_UNINT_RE soft
SP_SW_PE_INBFIFO_RE soft
SP_SW_PE_ON_DATA_RE soft
SP_SW_PE_ON_NCLL_RE soft
SP_SW_PE_RTE_TBL_RE soft
SP_SW_SNDLOSTEOP_RE soft
TB3_CONFIG1_ER full
TB3_LINK_ER full
TB3_PIO_ER soft
TB3_SVC_QUE_FULL_ER full
TB3_THRESHOLD_RE full
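
When scanning error log output, it can help to look up the snap type that a given entry triggers. The following Python sketch encodes Table 24 as a dictionary. The names SNAP_TYPE_BY_LABEL and snap_type are hypothetical, and the sketch does not read the AIX error log itself.

```python
# Table 24 as data: error log label -> snap type taken automatically.
SNAP_TYPE_BY_LABEL = {
    "SP_SW_FIFOOVRFLW_RE": "soft",
    "SP_SW_RECV_STATE_RE": "soft",
    "SP_SW_INVALID_RTE_RE": "soft",
    "SP_SW_NCLL_UNINT_RE": "soft",
    "SP_SW_PE_INBFIFO_RE": "soft",
    "SP_SW_PE_ON_DATA_RE": "soft",
    "SP_SW_PE_ON_NCLL_RE": "soft",
    "SP_SW_PE_RTE_TBL_RE": "soft",
    "SP_SW_SNDLOSTEOP_RE": "soft",
    "TB3_CONFIG1_ER": "full",
    "TB3_LINK_ER": "full",
    "TB3_PIO_ER": "soft",
    "TB3_SVC_QUE_FULL_ER": "full",
    "TB3_THRESHOLD_RE": "full",
}


def snap_type(label):
    """Return 'soft' or 'full' for a label in Table 24, else None."""
    return SNAP_TYPE_BY_LABEL.get(label)


def full_snap_labels():
    """Return the labels whose snap may force an adapter reset."""
    return sorted(k for k, v in SNAP_TYPE_BY_LABEL.items() if v == "full")
```

A result of None means only that the label is not listed in Table 24.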

Collect the css.snap information from both the primary node and all nodes that are experiencing SP Switch problems. Do not reboot the nodes before running css.snap, because rebooting causes the loss of valuable diagnostic information.

The css.snap script collects all the files that reside in the /var/adm/SPlogs/css directory, some additional files from the /tmp directory, and the last available stack file from the /var/adm/ffdc/stacks directory. Some of the files reside on each node, while others reside only on the primary node or on the control workstation. Table 25 lists important files gathered by css.snap, and their location at css.snap time. Some of the files are created by css.snap itself in order to gather concurrent information on the switch status.

Log files within css.snap package

The list of files collected by css.snap and their location is given in this table:

Table 25. SP Switch log files and their location

Number | Log file name | Contents | Location
1 | cable_miswire | Node-to-switch or switch-to-switch miswired connection information. For more information, see cable_miswire. | primary node
2 | cable_miswire.old | Previous miswire information. For more information, see cable_miswire. | primary node
3 | core | Fault service daemon dump file. | nodes
4 | cssadm2.debug | Trace of cssadm daemon. For more information, see cssadm.debug. | control workstation
5 | cssadm2.stderr | Unexpected error messages received by the cssadm daemon. For more information, see cssadm.stderr. | control workstation
6 | cssadm2.stdout | Unexpected informational messages received by the cssadm daemon. For more information, see cssadm.stdout. | control workstation
7 | css_dump.out | Most recent css_dump -r. SP Switch adapter device driver trace buffer dump file. For more information, see css_dump.out. | nodes
8 | css.snap.log | css.snap snapshot script information. A list of all files gathered in the last snapshot. | nodes
9 | CSS_test.log | Present if CSS_test was run on the node. See CSS_test.log and Verify software installation. For more information about the CSS_test command, see PSSP: Command and Technical Reference. | nodes
10 | daemon.stderr | Fault service daemon output file (stderr). For more information, see daemon.stderr. | nodes
11 | daemon.stdout | Fault service daemon output file (stdout). For more information, see daemon.stdout. | nodes
12 | dist_topology.log | System error messages occurring during the distribution of the topology file to the nodes. For more information, see dist_topology.log. | primary node
13 | dtbx.failed.trace | SP Switch adapter diagnostics last failed run information. For more information, see dtbx.failed.trace. | nodes
14 | dtbx.trace | SP Switch adapter diagnostics messages. For more information, see dtbx.trace. | nodes
15 | Eclock.log | Eclock related information. | control workstation
16 | Ecommands.log | Log entries of all Ecommands. For more information, see Ecommands.log. | control workstation
17 | errpt.out | Most recent errpt -a and errpt results. See errpt.out. For more information about the errpt command, see AIX Command and Technical Reference. | nodes
18 | flt | Hardware error conditions found on the SP Switch, recovery action taken by the fault service daemon, and general operations that alter the SP Switch configuration. For more information, see flt. | nodes
19 | fs_daemon_print.file | Fault service daemon status information. For more information, see fs_daemon_print.file. | nodes
20 | fs_dump.out | Most recent fs_dump -r. Fault service kernel extension trace buffer dump file. For more information, see fs_dump.out. | nodes
21 | logevnt.out | Log error log events monitored by ha. For more information, see logevnt.out. | nodes
22 | netstat.out | Current netstat -I css0 and netstat -m information. Network status information. See netstat.out. For more information about the netstat command, see AIX Command and Technical Reference. | nodes
23 | out.top | SP Switch link information. For more information, see out.top. | nodes
24 | rc.switch.log | Fault service daemon initialization information. For more information, see rc.switch.log. | nodes
25 | regs.out | Most recent read_regs. SP Switch adapter registers dump file. For more information, see regs.out. | nodes
26 | router.log | SP Switch routing information. For more information, see router.log. | nodes
27 | scan_out.log | TBIC scan ring binary information. For more information, see scan_out.log and scan_save.log. | nodes
28 | scan_save.log | Previous TBIC scan ring binary information. For more information, see scan_out.log and scan_save.log. | nodes
29 | stack file: fault_service_Worm_RTG_SP.PID.date.time | Customer Service related detailed error information with links between flow related errors. This is a binary format file. | nodes
30 | spd.trace | Tracing of advanced switch diagnostics. See spd.trace. | control workstation
31 | spdata.out | Most recent css.snap's splstdata dump file. SP data requests. For more information, see spdata.out. |
32 | summlog.out | Error information from the css.summlog daemon. For more information, see summlog.out. | control workstation
33 | tb_dump.out | Most recent tb3dump, tb3mxdump, or tb3pcidump according to the adapter card type: TB3, TB3MX or TB3PCI. Adapter memory dump file. For more information, see css_dump.out. | nodes
34 | vdidl.out | Most recent vdidl3, vdidl3mx, or vdidl3pci according to the adapter card type: TB3, TB3MX or TB3PCI. Fault service kernel extension IP services dump file. For more information, see vdidl.out. | nodes
35 | worm.trace | Switch initialization information. For more information, see worm.trace. | nodes

The files ending in .out are produced by running the appropriate command to dump internal (in memory) trace information or dump data to a file.
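
To make the relationship between the .out files and their source commands easier to consult, the following Python sketch records only the commands that Table 25 names. The identifiers FIXED_COMMANDS, ADAPTER_COMMANDS, and commands_for are hypothetical. Pairing each adapter-specific command with a card type in the order listed (TB3, TB3MX, TB3PCI) is an assumption based on the table wording. Files for which the table names no command return an empty list.

```python
# .out files whose source commands Table 25 states directly.
FIXED_COMMANDS = {
    "css_dump.out": ["css_dump -r"],
    "errpt.out": ["errpt -a", "errpt"],
    "fs_dump.out": ["fs_dump -r"],
    "netstat.out": ["netstat -I css0", "netstat -m"],
    "regs.out": ["read_regs"],
    "spdata.out": ["splstdata"],
}

# .out files whose command depends on the adapter card type.
# Assumption: commands and card types pair up in the order listed.
ADAPTER_COMMANDS = {
    "tb_dump.out": {"TB3": "tb3dump", "TB3MX": "tb3mxdump", "TB3PCI": "tb3pcidump"},
    "vdidl.out": {"TB3": "vdidl3", "TB3MX": "vdidl3mx", "TB3PCI": "vdidl3pci"},
}


def commands_for(out_file, adapter_type=None):
    """Return the commands Table 25 associates with an .out file."""
    if out_file in FIXED_COMMANDS:
        return list(FIXED_COMMANDS[out_file])
    if out_file in ADAPTER_COMMANDS:
        command = ADAPTER_COMMANDS[out_file].get(adapter_type)
        return [command] if command else []
    # The table names no command for this file.
    return []
```

For example, commands_for("tb_dump.out", "TB3MX") returns the single adapter-specific dump command for that card type. These utilities are invoked by css.snap itself, and the note at the end of this section applies to running them by hand.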

Disk space handling policy

The css.snap command avoids filling up the /var directory by following these rules:

  1. If less than 10% of /var is free, css.snap exits.
  2. If the css portion of /var is more than 30% of the total space in /var, css.snap erases old snap files until the css portion becomes less than 30%. If it is successful, the snap proceeds. If not, it exits.
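
The two rules above can be sketched as a small planning function. This is only an illustration of the stated policy, not the logic of the css.snap script itself. The function name plan_snap and its parameters are hypothetical, all sizes are assumed to be in the same unit, and erasing the oldest snap files first is an assumption (the text says only "old snap files").

```python
def plan_snap(var_total, var_free, css_used, old_snaps):
    """Decide whether a snap may proceed under the disk space policy.

    old_snaps is a list of (file_name, size) pairs, oldest first.
    Returns the list of file names to erase before proceeding
    (possibly empty), or None if the snap must exit.
    """
    # Rule 1: exit if less than 10% of /var is free.
    if var_free * 10 < var_total:
        return None

    to_erase = []
    # Rule 2: if the css portion exceeds 30% of /var, erase old snaps
    # until the css portion drops below 30%.
    if css_used * 10 > var_total * 3:
        for file_name, size in old_snaps:
            to_erase.append(file_name)
            css_used -= size
            if css_used * 10 < var_total * 3:
                break
        else:
            # Ran out of old snap files while still at or above 30%.
            return None
    return to_erase


if __name__ == "__main__":
    # Hypothetical sizes: 1000 units total, 500 free, css uses 400.
    print(plan_snap(1000, 500, 400, [("a", 60), ("b", 60), ("c", 60)]))
```

Integer comparisons are used instead of floating-point percentages so that the 10% and 30% boundaries are evaluated exactly.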

The css.snap command is called automatically from the fault service daemon when certain serious errors are detected. It can also be issued from the command line when a switch or adapter related problem is suspected. See css.snap package.

Note:
css.snap uses a number of undocumented utilities to collect information. Some of these can be destructive when used on a running system. After using css.snap to collect diagnostic information, it is advisable to run /usr/lpp/ssp/css/rc.switch in order to reset and reload the switch adapter and eliminate the residual effects of these utilities.

