When your nodes do not respond or when your system crashes, a system dump may help you determine the cause of the problem. A system dump contains a copy of the kernel data on the system at the time of the crash. This section explains how to produce a dump, verify it, copy the dump to tape, and send the tape to IBM.
In some cases the system produces a dump automatically. If the system senses a fatal condition, it usually dumps automatically to the primary dump device and shows a flashing 888 in the node's three-digit display.
Attention: Do not initiate a system dump if the node's three-digit display is 888. If you initiate a dump, you will overwrite the dump that was taken at the time of the problem. Instead, proceed to Action 2. Verify the system dump.
There are several ways you can produce a system dump. Some of the methods work with all configurations, and others do not. Each method explained here includes this configuration information.
Notes:
A node can be reset using the command:
/usr/lpp/ssp/bin/spmon -reset
On systems that do not have a key mode switch, the spmon -reset command produces a dump.
A node's key mode switch can be altered using the command:
/usr/lpp/ssp/bin/spmon -key state
where state is either normal, secure, or service.
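As a minimal sketch, a wrapper script could check the key mode state before calling spmon. The function name valid_key_state is hypothetical and is not part of PSSP. The spmon command is only echoed here, in the same form as shown in the note above, so the sketch does not change any node.

```shell
# Hypothetical helper: accept only the three key mode states named above.
valid_key_state() {
    case "$1" in
        normal|secure|service) return 0 ;;
        *) return 1 ;;
    esac
}

state=service
if valid_key_state "$state"; then
    # Echoed rather than executed, so the sketch is safe to run anywhere.
    echo "/usr/lpp/ssp/bin/spmon -key $state"
else
    echo "invalid key state: $state" >&2
fi
```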
Choose one of these methods to produce a dump on the primary dump device.
This method works for all systems that have a key switch.
Set the key mode switch to the Service position and press the Reset button once.
This method can only be done from a directly-attached keyboard. It cannot be done from a tty connection. This method works only on the control workstation.
Set the key mode switch to the Service position and, while holding the <Ctrl> and <Alt> keys, press the 1 on the numeric key pad.
This method works for all system configurations, if the system is responding to commands.
sysdumpstart -p
On the SP system, when sysdumpstart is issued on a PCI node, the SPLED display in SP Perspectives indicates stby. When stby appears, power the node off and then back on. DO NOT RESET THE NODE. A reset destroys the dump that was taken.
This method works for nodes with virtual keys. It produces a dump to the default dump device, as defined by AIX. Issue the following commands from the control workstation, specifying the node's frame and slot.
Choose one of these methods to produce a dump on the secondary dump device.
Note: If the secondary dump device is a removable media device, such as a tape or diskette drive, make sure that the medium is in the device.
This method can only be done from a directly-attached keyboard. It cannot be done from a tty connection. This method works only on the control workstation.
Set the key mode switch to the Service position and, while holding down the <Ctrl> and <Alt> keys, press the 2 on the numeric key pad.
This method works for all system configurations, if the system is responding to commands.
Log in as root and enter:
sysdumpstart -s
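The two command-line methods differ only in the flag passed to sysdumpstart: -p for the primary dump device and -s for the secondary. The following sketch, which assumes a hypothetical helper named dump_command, maps the target name to the matching command. It prints the command instead of running it, because running sysdumpstart starts a dump.

```shell
# Hypothetical helper: print the sysdumpstart command for the chosen
# dump device. Nothing is executed; the command is only printed.
dump_command() {
    case "$1" in
        primary)   echo "sysdumpstart -p" ;;
        secondary) echo "sysdumpstart -s" ;;
        *)
            echo "usage: dump_command primary|secondary" >&2
            return 1
            ;;
    esac
}

dump_command secondary
```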
You may have a system dump because you initiated it yourself or because the system produced one automatically. In either case, follow these steps to verify that the system dump was successful and that the information it contains is usable.
Table 6. System dump status codes
sysdumpdev
This should return something like:
primary /dev/hd7
secondary /dev/sysdumpnull
Note the primary dump device name, and substitute it for /dev/hd# in the following steps.
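If you script these steps, the primary dump device name can be picked out of the sysdumpdev output instead of being copied by hand. The sketch below assumes the output has the form shown above, with the word primary followed by the device name. The function name primary_dump_device is hypothetical.

```shell
# Hypothetical helper: print the primary dump device named in
# sysdumpdev output read from standard input.
primary_dump_device() {
    awk '$1 == "primary" { print $2; exit }'
}

# On the node itself (assumes the AIX sysdumpdev command is on the PATH):
#   dumpdev=$(sysdumpdev | primary_dump_device)
# Here the sample output from this section is used instead.
sample='primary /dev/hd7
secondary /dev/sysdumpnull'
dumpdev=$(printf '%s\n' "$sample" | primary_dump_device)
echo "primary dump device: $dumpdev"
```

The value in dumpdev is the name to substitute for /dev/hd# in the later steps.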
sysdumpdev -L
Output is similar to:
0453-039
Device name: /dev/lv00
Major device number: 10
Minor device number: 10
Size: 67108352 bytes
Date/Time: Wed Apr 5 14:52:35 EDT 2000
Dump status: -2
dump device too small
In this case, a 0c4 LCD was in the crash codes, and the dump device was too small. There may still be enough dump information, since 67108352 bytes were written to the dump device /dev/lv00. Continue to step 6. If no bytes were written, the system was hung and no dump exists. DO NOT continue with these steps.
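The check described above, whether any bytes were written and what the dump status is, can be sketched as a small filter over the sysdumpdev -L output. This assumes the field labels Size: and Dump status: appear as in the sample. The function names dump_bytes_written and dump_status are hypothetical.

```shell
# Hypothetical helpers: read sysdumpdev -L output on standard input.

# Print the number of bytes written, or 0 if no Size: field is found.
dump_bytes_written() {
    awk '{ for (i = 1; i < NF; i++)
               if ($i == "Size:") { print $(i + 1); found = 1; exit } }
         END { if (!found) print 0 }'
}

# Print the numeric value that follows "Dump status:".
dump_status() {
    awk '{ for (i = 2; i < NF; i++)
               if ($(i - 1) == "Dump" && $i == "status:") { print $(i + 1); exit } }'
}

# Sample taken from this section. On the node, use: report=$(sysdumpdev -L)
report='0453-039
Device name: /dev/lv00
Major device number: 10
Minor device number: 10
Size: 67108352 bytes
Date/Time: Wed Apr 5 14:52:35 EDT 2000
Dump status: -2
dump device too small'

bytes=$(printf '%s\n' "$report" | dump_bytes_written)
status=$(printf '%s\n' "$report" | dump_status)

if [ "$bytes" -gt 0 ]; then
    echo "dump present: $bytes bytes, status $status"
else
    echo "no dump data written; do not continue"
fi
```

A nonzero byte count with a nonzero status, as in this sample, means the dump may be partial but can still be worth examining.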
crash /dev/hd#
This should return:
Using /unix as the default namelist file.
Reading in symbols........................
The dump file is not useful. Enter q to quit the crash command. DO NOT continue with these steps and do not send the dump to IBM.
/usr/lib/errdead /dev/hd#
crash /dev/lvxx
where /dev/lvxx is the dump device.
When you see the > prompt, enter:
stat
Output is similar to:
sysname: AIX
nodename: journey
release: 2
version: 3
machine: 000052643100
time of crash: Sun Jan 24 19:18:53 1993
age of system: 18 day, 1 hr., 29 min.
Now put the dump symptom string information into the error log. Issue the command:
symptom -e
This copies the symptom string into the error log, where it can be used to search problem databases for duplicate problems.
trace
Look for a trace report similar to this sample:
STACK TRACE:
.m_freem ()
.soreceive
._recv
.recv
Enter q to quit the crash command.
Gather the dump and other snap or log information for the IBM Support Center. Contact your local service representative or call the IBM Support Center to open a Problem Management Record as explained in How to contact the IBM Support Center.