The following items are used to isolate problems in the SP Switch component of PSSP. More detailed information about each item appears in Error information.
The /usr/lpp/ssp/css/css.snap script collects log, trace, and dump information that is created by the SP Switch support code (device driver, worm, fault service daemon, diagnostics) into a single compressed package.
The complete package output file is: /var/adm/SPlogs/css/hostname.yymmddhhmmss.css.snap.tar.Z where hostname is the host name of the node where css.snap was issued, and yymmddhhmmss is the date and time that the css.snap information was collected.
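The naming pattern above can be taken apart mechanically, for example when sorting packages gathered from several nodes. The following is a minimal sketch that assumes only the documented pattern hostname.yymmddhhmmss.css.snap.tar.Z; the function name parse_snap_name and the sample host name node05 are hypothetical.

```shell
# Split a css.snap package name into host name and timestamp.
# Assumes the documented pattern hostname.yymmddhhmmss.css.snap.tar.Z.
# parse_snap_name is a hypothetical helper, not part of PSSP.
parse_snap_name() {
    base=${1##*/}                        # strip any directory part
    stem=${base%.css.snap.tar.Z}         # drop the fixed suffix
    [ "$stem" != "$base" ] || return 1   # suffix missing: not a snap package
    stamp=${stem##*.}                    # last dot-separated field: yymmddhhmmss
    host=${stem%.*}                      # everything before it (may contain dots)
    case $stamp in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) return 1 ;;                   # timestamp is not 12 digits
    esac
    printf '%s %s\n' "$host" "$stamp"
}
```

For example, `parse_snap_name /var/adm/SPlogs/css/node05.010203040506.css.snap.tar.Z` prints `node05 010203040506`, and a name that does not match the pattern makes the function return a nonzero status.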
The css.snap script creates a log file, /var/adm/SPlogs/css/css.snap.log where all the files gathered in the package are listed.
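The contents of a package can also be listed directly, without unpacking it. The sketch below assumes that zcat and tar are available on the node and that the package is an ordinary compressed tar archive, as the .tar.Z suffix suggests; the function name list_snap is hypothetical.

```shell
# List the files stored in a css.snap package without extracting them.
# list_snap is a hypothetical helper; zcat and tar are assumed to be installed.
list_snap() {
    if [ ! -r "$1" ]; then
        echo "cannot read package: $1" >&2
        return 1
    fi
    # Decompress to stdout and let tar print the table of contents.
    zcat "$1" | tar -tf -
}
```

A typical call would be `list_snap /var/adm/SPlogs/css/hostname.yymmddhhmmss.css.snap.tar.Z`, with the actual host name and timestamp substituted.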
This script is called whenever a serious error is detected by the switch support code. To create such a snapshot directly, log in to the desired node and issue the command:
/usr/lpp/ssp/css/css.snap [-c | -n | -s ]
where
This table shows the error log entries that automatically take a snapshot, as well as the type of snap performed. A soft snap allows work with the switch to continue. A full snap might corrupt the adapter, forcing an adapter reset and causing the node to be fenced off the switch.
Table 24. AIX Error Log entries that invoke css.snap - SP Switch
Collect the css.snap information from both the primary node and all nodes that are experiencing SP Switch problems. Do not reboot the nodes before running css.snap, because rebooting causes the loss of valuable diagnostic information.
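When several nodes are involved, the snapshot can be started on each of them in turn from one place. The following is a rough sketch only: it assumes that rsh access to the nodes is already set up, the function name snap_nodes and the REMOTE_SHELL override are hypothetical, and the node names passed to it are whatever names the site uses.

```shell
# Run css.snap on each node named on the command line, one node at a time.
# snap_nodes and REMOTE_SHELL are hypothetical; rsh access is an assumption.
snap_nodes() {
    remote=${REMOTE_SHELL:-rsh}
    failed=0
    for node in "$@"; do
        # stdin is redirected so the remote shell cannot consume it.
        if ! $remote "$node" /usr/lpp/ssp/css/css.snap </dev/null; then
            echo "css.snap could not be started on $node" >&2
            failed=1
        fi
    done
    return $failed
}
```

Note that a classic rsh generally reports only connection failures, not the exit status of the remote command, so a zero return here does not prove that every snapshot completed. Each package is still written to /var/adm/SPlogs/css on the node where css.snap ran.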
The css.snap script collects all the files that reside in the /var/adm/SPlogs/css directory, some additional files from the /tmp directory, and the last available stack file from the /var/adm/ffdc/stacks directory. Some of the files reside on each node, while others reside only on the primary node or on the control workstation. Table 25 lists important files gathered by css.snap, and their location at css.snap time. Some of the files are created by css.snap itself in order to capture the switch status at the time of the snapshot.
The files collected by css.snap and their locations are given in this table:
Table 25. SP Switch log files and their location
Number | Log file name | Contents | Location |
---|---|---|---|
1 | cable_miswire | node-to-switch or switch-to-switch miswired connection information. For more information, see cable_miswire. | primary node |
2 | cable_miswire.old | previous miswire information. For more information, see cable_miswire. | primary node |
3 | core | fault service daemon dump file. | nodes |
4 | cssadm2.debug | Trace of cssadm daemon. For more information, see cssadm.debug. | control workstation |
5 | cssadm2.stderr | Unexpected error messages received by the cssadm daemon. For more information, see cssadm.stderr. | control workstation |
6 | cssadm2.stdout | Unexpected informational messages received by the cssadm daemon. For more information, see cssadm.stdout. | control workstation |
7 | css_dump.out | most recent css_dump -r. SP Switch adapter device driver trace buffer dump file. For more information, see css_dump.out. | nodes |
8 | css.snap.log | css.snap snapshot script information. A list of all files gathered in the last snapshot. | nodes |
9 | CSS_test.log | Present if CSS_test was run on the node. See CSS_test.log and Verify software installation. For more information about the CSS_test command, see PSSP: Command and Technical Reference. | nodes |
10 | daemon.stderr | fault service daemon output file (stderr). For more information, see daemon.stderr. | nodes |
11 | daemon.stdout | fault service daemon output file (stdout). For more information, see daemon.stdout. | nodes |
12 | dist_topology.log | system error messages occurring during the distribution of the topology file to the nodes. For more information, see dist_topology.log. | primary node |
13 | dtbx.failed.trace | SP Switch adapter diagnostics last failed run information. For more information, see dtbx.failed.trace. | nodes |
14 | dtbx.trace | SP Switch adapter diagnostics messages. For more information, see dtbx.trace. | nodes |
15 | Eclock.log | Eclock related information. | control workstation |
16 | Ecommands.log | log entries of all Ecommands. For more information, see Ecommands.log. | control workstation |
17 | errpt.out | most recent errpt -a and errpt results. See errpt.out. For more information about the errpt command, see AIX Command and Technical Reference. | nodes |
18 | flt | hardware error conditions found on the SP Switch, recovery action taken by the fault service daemon, and general operations that alter the SP Switch configuration. For more information, see flt. | nodes |
19 | fs_daemon_print.file | fault service daemon status information. For more information, see fs_daemon_print.file. | nodes |
20 | fs_dump.out | most recent fs_dump -r. Fault service kernel extension trace buffer dump file. For more information, see fs_dump.out. | nodes |
21 | logevnt.out | Log error log events monitored by ha. For more information, see logevnt.out. | nodes |
22 | netstat.out | current netstat -I css0 and netstat -m information. Network status information. See netstat.out. For more information about the netstat command, see AIX Command and Technical Reference. | nodes |
23 | out.top | SP Switch link information. For more information, see out.top. | nodes |
24 | rc.switch.log | fault service daemon initialization information. For more information, see rc.switch.log. | nodes |
25 | regs.out | most recent read_regs. SP Switch adapter registers dump file. For more information, see regs.out. | nodes |
26 | router.log | SP Switch routing information. For more information, see router.log. | nodes |
27 | scan_out.log | TBIC scan ring binary information. For more information, see scan_out.log and scan_save.log. | nodes |
28 | scan_save.log | previous TBIC scan ring binary information. For more information, see scan_out.log and scan_save.log. | nodes |
29 | stack file: fault_service_Worm_RTG_SP.PID.date.time | Customer Service related detailed error information with links between flow related errors. This is a binary format file. | nodes |
30 | spd.trace | Tracing of advanced switch diagnostics. See spd.trace. | control workstation |
31 | spdata.out | most recent css.snap's splstdata dump file. SP data requests. For more information, see spdata.out. | |
32 | summlog.out | Error information from the css.summlog daemon. For more information, see summlog.out. | control workstation |
33 | tb_dump.out | most recent tb3dump, tb3mxdump, or tb3pcidump according to the adapter card type: TB3, TB3MX or TB3PCI. Adapter memory dump file. For more information, see css_dump.out. | nodes |
34 | vdidl.out | most recent vdidl3, vdidl3mx, or vdidl3pci according to the adapter card type: TB3, TB3MX or TB3PCI. Fault service kernel extension IP services dump file. For more information, see vdidl.out. | nodes |
35 | worm.trace | switch initialization information. For more information, see worm.trace. | nodes |
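Before sending a snapshot on for analysis, it can be useful to check whether the logs of interest exist on a node at all. The following sketch reports which of a given set of files is missing from a directory. The function name missing_logs is hypothetical, and the example call assumes that the named per-node files from Table 25 reside in /var/adm/SPlogs/css.

```shell
# Print the names of the listed log files that are missing from a directory.
# Usage: missing_logs DIR FILE...
# missing_logs is a hypothetical helper, not part of PSSP.
missing_logs() {
    dir=$1
    shift
    for f in "$@"; do
        [ -e "$dir/$f" ] || printf '%s\n' "$f"
    done
}

# Example call with a few per-node files from Table 25 (directory assumed):
# missing_logs /var/adm/SPlogs/css flt rc.switch.log out.top worm.trace
```

The function prints nothing when every named file is present, so its output can be tested for emptiness in a larger script.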
The files ending in .out are produced by running the appropriate command to dump internal (in-memory) trace information or other data to a file.
The css.snap command avoids filling up the /var directory by following these rules:
The css.snap command is called automatically from the fault service daemon when certain serious errors are detected. It can also be issued from the command line when a switch or adapter related problem is suspected. See css.snap package.