
Diagnosis Guide


Dump information

Topology Services provides two dumps: a core dump, which is created automatically when certain errors occur, and a phoenix.snap dump, which is created manually.

Core dump

The Topology Services daemon generates a core dump. It contains the information normally saved by AIX in a core dump: the user-space data segments of the Topology Services daemon. The dump refers to a particular instance of the daemon on the local node; other nodes may have similar core dumps. On PSSP systems, the dump is located in /var/ha/run/hats.partition_name/core. On HACMP systems, the dump is located in /var/ha/run/topsvcs.cluster_name/core. The core dump file is approximately 7 to 10 MB.
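
For example, the presence and size of any core files can be checked with commands similar to the following (the wildcards stand in for the partition or cluster name; shown as an illustration, not a procedure from this guide):

ls -l /var/ha/run/hats.*/core*       # PSSP systems
ls -l /var/ha/run/topsvcs.*/core*    # HACMP systems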

The dump is created automatically when the daemon invokes an assert() statement, or when the daemon receives a segmentation violation signal for accessing its data incorrectly. It may also be necessary to force hatsd to generate a dump, especially if the daemon is believed to be in a hung state. In that case, the dump is created manually by issuing the command:

kill -6 pid_of_daemon

The pid_of_daemon is obtained by issuing:
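
One way to obtain this process ID on AIX is through the SRC or the process table; the subsystem names hats (PSSP) and the daemon name hatsd below are assumptions used for illustration:

lssrc -s hats            # the PID column shows the daemon process ID
ps -ef | grep hatsd      # alternative: search the process table directly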

The dump remains valid as long as the executable file /usr/sbin/rsct/bin/hatsd is not replaced. Only the last three core file instances are kept. The core dumps and the executable should be copied to a safe place. To analyze the dump, issue the command:

dbx /usr/sbin/rsct/bin/hatsd core_file

Good results are similar to the following:

Type 'help' for help.
reading symbolic information ...
[using memory image in core]
 
IOT/Abort trap in evt._pthread_ksleep [/usr/lib/libpthreads.a] at 0xd02323e0 ($t6)
0xd02323e0 (_pthread_ksleep+0x9c) 80410014      lwz   r2,0x14(r1)
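
Once dbx has loaded the core successfully, a stack trace of the failing thread can usually be obtained with the where subcommand. This is a general dbx usage note, not a step prescribed by this guide:

(dbx) where
(dbx) quit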

Some of the error results are:

  1. This means that the current executable file was not the one that created the core dump.
    Type 'help' for help.
    Core file program (hatsd) does not match current program (core ignored)
    reading symbolic information ...
    (dbx)
     
     
    
  2. This means that the core file is incomplete due to lack of disk space.
    Type 'help' for help.
    warning: The core file is truncated.  You may need to increase the 
    ulimit for file and coredump, or free some space on the filesystem.
    reading symbolic information ...
    [using memory image in core]
     
    IOT/Abort trap in evt._pthread_ksleep [/usr/lib/libpthreads.a] at 0xd02323e0
    0xd02323e0 (_pthread_ksleep+0x9c) 80410014      lwz   r2,0x14(r1)
    (dbx)
    

phoenix.snap dump

This dump contains diagnostic data used for RSCT problem determination. It is a collection of log files and other trace information used to obtain a global picture of the state of RSCT. The dump is specific to each node. It is located in the /tmp/phoenix.snapOut directory. The dump is created by this command:

/usr/sbin/rsct/bin/phoenix.snap
Note:
The phoenix.snap tool is a service tool and not a PSSP command. The tool is shipped with PSSP 3.2 as is, without documentation. For assistance in using phoenix.snap in a manner other than what is described in this section, contact the IBM Support Center.

The phoenix.snap tool may be run from a node, where it collects data from that node only, or from the control workstation, where it collects data from a list of nodes if the -l flag is used. For small systems, run the command from the control workstation with the -l flag to collect data from all of the nodes.

For larger systems, see Information to collect before contacting the IBM Support Center for a list of nodes to collect data from. If the -l option is not used, data is collected only from the local node. When this command is run on the control workstation, data is also collected for the Topology Services Group Leader and Group Services Nameserver. See Contents of the phoenix.snap file.
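
For illustration, a control workstation invocation that collects data from several nodes might look like the following; the exact node-list syntax accepted by the -l flag is an assumption (a comma-separated list is shown):

/usr/sbin/rsct/bin/phoenix.snap -l node1,node2,node3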

Data about the state of the subsystems, such as the output of the lssrc -l command, becomes partially obsolete as soon as the state of the subsystem changes, for example after an adapter or node event. Data in the log files collected by phoenix.snap remains valid even after the state of the subsystem changes, but the collected log files contain data only up to the time when phoenix.snap was run.
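
For instance, the current subsystem state can be re-queried at any time with the long form of lssrc; the subsystem name hats is assumed here for a PSSP node (topsvcs on HACMP):

lssrc -ls hats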

Data in /tmp/phoenix.snapOut is not overwritten when another instance of phoenix.snap is run on the same node, because the name of the output file, phoenix.snap.timestamp.tar.Z, includes a timestamp.

The phoenix.snap flags referenced in this section are:

  -l  specifies the list of nodes from which data is collected.
  -d  specifies the directory in which the output file is stored.

Good results from the phoenix.snap command are indicated by an output similar to the following:

phoenix.snap version 1.4, args: 
 
Determining if there is enough space in /tmp/phoenix.snapOut
 
compressed file will be about 982303 bytes
phoenix.snap requires about 5893820 bytes
Think we have enough space.
######################################################################
Send file /tmp/phoenix.snapOut/phoenix.snap.node3.12211215.out.tar.Z
 to the RS6000/SP service team

Error results are indicated by messages such as:

There is not enough space in /tmp/phoenix.snapOut
which indicates that there is no space to store the resulting data file. In this case, rerun the command using the -d flag to specify another directory, or create additional space in the /tmp directory.
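
For example, assuming a filesystem with sufficient free space is mounted at /bigfs (a hypothetical path), the output can be written there instead:

/usr/sbin/rsct/bin/phoenix.snap -d /bigfs/phoenix.snapOut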

Contents of the phoenix.snap file

The dump is a collection of files archived with the tar command and compressed with the compress command. Some of the files are copies of daemon log files, and some are the output of certain commands.
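
The archive can therefore be unpacked with standard AIX tools; the file name below is taken from the sample output earlier in this section and will differ on other systems:

cd /tmp/phoenix.snapOut
zcat phoenix.snap.node3.12211215.out.tar.Z | tar -xvf -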

This is a partial list of the data items collected:

The output of the phoenix.snap command is in English. Many of the logs collected by the program are also in English, but some will be in the node's language.

