System Crashes in AIX 4.1 and 4.2


Contents

About this document
Collecting the system dump
Submitting the testcase
Forcing a system dump

About this document

This document describes how to collect and send system dumps in AIX Versions 4.1 and 4.2. The section "Collecting the system dump" covers dump collection, and "Submitting the testcase" describes the ways to send a testcase. The section "Forcing a system dump" explains what to do if the initial collection procedure does not produce a valid dump.


Collecting the system dump

For best results, have a tape drive connected to your system before proceeding with the following steps.

  1. Determine if your system has an LED display and proceed accordingly.

    System with LED display

    1. Turn the key to service mode.
    2. Wait for 888 to flash.
    3. Press Reset repeatedly and write down the LED code displayed after each press, until 888 displays again. AIX support personnel will need these codes when you call.
    4. Power off the system and move to step 2.

    System with an LED or non-LED display but no key switch or reset button (AIX 4.1.4 and later)

    1. If there is no disk activity, press Ctrl Alt 1 (on the number pad).
    2. Wait for disk activity to stop.
    3. Power off the system and move to step 2.

  2. If you have not already done so, connect a tape drive to the system. Power the system on.

    Unless the default dump configuration has been modified with the sysdumpdev command, the dump will be copied to /var/adm/ras/vmcore.x when the system is powered on.

    NOTE: If /var is too small to hold the dump, the system will prompt the user to copy the dump to external media such as tape or pre-formatted diskettes. If a tape drive is not connected, the system will prompt for diskettes. Using diskettes is NOT a recommended method of collecting a dump. If you are unable to save the dump, IBM will not be able to determine what caused your system to crash.

  3. If the dump to /var/adm/ras fails, answer Yes when prompted to collect the dump to external media. Log in as root and then issue the command:
        sysdumpdev -L 
    

    Check the sysdumpdev -L output for a valid time stamp. If the time stamp approximately matches the time of the system crash, go to the next section. If it does not match, or if sysdumpdev -L produces no output, no valid dump was captured; see "Forcing a system dump."
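The check in step 3 can be partly scripted. The following POSIX shell function is a minimal sketch, not an IBM-supplied tool: the name last_dump_reported is hypothetical, and it assumes, as step 3 describes, that sysdumpdev -L prints nothing when no dump was captured. It does not evaluate the time stamp; compare that with the approximate crash time yourself.

```shell
# Hypothetical helper, not part of AIX: source this file, then run the
# function as root after the system has restarted.
# Assumption (from step 3): sysdumpdev -L prints nothing when no dump was
# captured. The time stamp in its output must still be checked by hand.
last_dump_reported() {
    out=$(sysdumpdev -L 2>/dev/null)
    # Delete all whitespace; any characters left mean sysdumpdev printed
    # information about a dump.
    if [ -n "$(printf '%s' "$out" | tr -d '[:space:]')" ]; then
        printf '%s\n' "$out"
        return 0
    fi
    echo "sysdumpdev -L produced no output: no valid dump was captured." >&2
    return 1
}
```

For example, as root: last_dump_reported || echo "No dump; see Forcing a system dump." Leaving the time-stamp comparison to a person avoids depending on the exact layout of the sysdumpdev -L report, which this document does not show.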


Submitting the testcase

  1. Decide which method to use to send the testcase.

    NOTE: Overnight and internet submissions are recommended. VM is not a preferred shipment method.

  2. To send a testcase via overnight mail, do the following:

    1. Place a blank tape in the tape drive. This step applies whether the dump was copied to tape, to diskettes, or to a file in /var/adm/ras.

      Acceptable tape media are 8mm, 4mm, and 1/4-inch QIC.

    2. Run:
          /usr/sbin/snap -gfkD -o /dev/rmt# 
      

      This copies the dump, the unix file image, and the error log needed for a complete dump analysis.

      NOTE: In the command above, replace /dev/rmt# with the device name of your tape drive (usually /dev/rmt0).

    3. Verify that the tape contains at least the files needed to analyze a system crash or hang:
          tar -tvf /dev/rmt# 
      

      In the output, there should be three lines that look similar to the following:

      -rw-r--r--   1 user    group      36243456 Feb 21 10:05 dump.Z 
      -rw-r--r--   1 user    group           176 Feb 20 15:13 dump.snap 
      -rw-r--r--   1 user    group        933536 Feb 20 15:13 unix.Z 
      

      NOTE: The files may appear uncompressed as dump and unix instead of dump.Z and unix.Z. Each file must be larger than 0 bytes; in this example, dump.Z is 36243456 bytes and unix.Z is 933536 bytes.

    4. Ship the testcase as follows.

      1. Label the media.
            <customer name> 
            <pmr #, branch #> 
             tape block_size = <xxx> 
            <the command used to copy the information to tape> 
        

        NOTE: The tape block size can be obtained with the command:

            lsattr -El rmt# 
        

        The copy command is generally snap.

        IMPORTANT: If the person sending the testcase is NOT the person who reported the problem, include the name of the person who reported it. Missing information on the package will delay processing.

      2. Send the media to:

        IBM Corp. / Zip 2900 / Bldg 42
        Attn: AIX Testcase Dept. J66S
        11400 Burnet Road
        Austin, TX 78758-3493
        Extension 3-4050

        NOTE: If you specify Saturday delivery, you must first make special arrangements with an AIX Support specialist. Otherwise, there could be a delay of several days.

  3. If you are sending the testcase by ftp, proceed according to how you collected the dump: if the dump is on tape, go to step 4; if it is already on disk in /var/adm/ras, go to step 5.
  4. Copy the dump information from the tape to the hard disk.

    1. Find the size of the dump with the command:
          sysdumpdev -L 
      
    2. To see the amount of free space available in /tmp, run the command:
          df /tmp 
      

      If /tmp does not have enough free space to hold the dump, increase the size of /tmp.

    3. Collect relevant system dump information, such as the unix file image and error logs, by running:
          snap -gfkD 
      
    4. Copy the dump from the tape to the hard disk:
          cd /tmp/ibmsupt/dump 
          tar -xf /dev/rmt# 
      
    5. Create a compressed tar image of the dump and the snap output:
          snap -c 
      
    6. Verify that the tar file contains at least the files needed to analyze a system crash or hang:
          zcat /tmp/ibmsupt/snap.tar.Z | tar -tvf - 
      

      In the output, there should be three lines that look similar to the following:

      -rw-r--r--   1 user    group      36243456 Feb 21 10:05 dump.Z 
      -rw-r--r--   1 user    group           176 Feb 20 15:13 dump.snap
      -rw-r--r--   1 user    group        933536 Feb 20 15:13 unix.Z 
      

      If so, go to step 6.

    NOTE: The files may appear uncompressed as dump and unix instead of dump.Z and unix.Z. Each file must be larger than 0 bytes; in this example, dump.Z is 36243456 bytes and unix.Z is 933536 bytes.

  5. Verify that the compressed file contains at least the files needed to analyze a system crash or hang:
        zcat /tmp/ibmsupt/snap.tar.Z | tar -tvf - 
    

    In the output, there should be three lines that look similar to the following:

    -rw-r--r--   1 user    group      36243456 Feb 20 10:05 dump.Z 
    -rw-r--r--   1 user    group           176 Feb 20 15:13 dump.snap 
    -rw-r--r--   1 user    group        933536 Feb 20 15:13 unix.Z 
    

    Go to step 6.

    NOTE: The files may appear uncompressed as dump and unix instead of dump.Z and unix.Z. Each file must be larger than 0 bytes; in this example, dump.Z is 36243456 bytes and unix.Z is 933536 bytes.

  6. Sample file names for the tarred and compressed file follow:

    ad1000.tar.Z             (item_number.tar.Z)
    1x234.001.tar.Z          (problem_report_#.branch_office_#.tar.Z)
    1x234.1234567.tar.Z      (problem_report_#.customer_#.tar.Z)

    Follow these steps to ftp the file to the testcase repository:

    1. Enter ftp 198.17.57.67 (or ftp testcase.boulder.ibm.com).
    2. At the login prompt, enter anonymous.
    3. At the password prompt, enter your complete email address in the form customer@company.com.
    4. Change to binary transfer mode. Enter bin.
    5. Change to the aix directory. Enter cd aix.
    6. Use the put command to upload the file. For example, put 1x234.001.tar.Z.
    7. Enter ls -l to confirm that the file appears in the directory listing.
    8. Enter quit.

  7. Sending a testcase via VM (IBM Internal):

    REMINDER: Use VNET only if you have no other electronic method of sending a testcase. This method works only for IBM internal customers and is not automated, which can delay testcase processing. FTP IS THE PREFERRED METHOD FOR SENDING TESTCASES.

    The testcase should be combined into a single compressed tar archive and uploaded in BINARY format.

    Please name the file PPPPPBBB TARZBIN, where PPPPP is the PMR number and BBB is the branch office number (without the b).

    node = AUSVMR
    userid = V3DEFECT
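The archive checks in steps 2, 4, and 5 above and the ftp session in step 6 can be partly scripted. The POSIX shell sketch below is not an IBM tool, and the function names check_snap_listing and ftp_commands are hypothetical. check_snap_listing reads a verbose tar listing and assumes the column layout shown in this document (size in the fifth field, file name last); other tar implementations may print a different layout. ftp_commands only prints the step 6 commands so they can be piped to an ftp client.

```shell
# Hypothetical helpers (not part of AIX or snap).

# Read a verbose tar listing on stdin and confirm that dump (or dump.Z),
# dump.snap, and unix (or unix.Z) are present with a size greater than 0.
# Assumes the listing layout shown in this document, for example:
#   -rw-r--r--   1 user  group  36243456 Feb 21 10:05 dump.Z
# that is, the size in field 5 and the file name in the last field.
check_snap_listing() {
    awk '
        {
            name = $NF
            sub(/.*\//, "", name)          # drop any leading directory
            if ($5 + 0 > 0) ok[name] = 1   # remember non-empty files only
        }
        END {
            missing = ""
            if (!ok["dump"] && !ok["dump.Z"]) missing = missing " dump"
            if (!ok["dump.snap"])             missing = missing " dump.snap"
            if (!ok["unix"] && !ok["unix.Z"]) missing = missing " unix"
            if (missing != "") {
                print "missing or empty:" missing
                exit 1
            }
        }'
}

# Print the ftp commands from step 6 for an anonymous upload.
# $1 = file to send (for example 1x234.001.tar.Z), $2 = your email address.
ftp_commands() {
    printf 'user anonymous %s\n' "$2"
    printf 'bin\n'
    printf 'cd aix\n'
    printf 'put %s\n' "$1"
    printf 'ls -l\n'
    printf 'quit\n'
}
```

For example: zcat /tmp/ibmsupt/snap.tar.Z | tar -tvf - | check_snap_listing, or tar -tvf /dev/rmt0 | check_snap_listing for a tape. To upload, ftp_commands 1x234.001.tar.Z customer@company.com | ftp -n testcase.boulder.ibm.com; this assumes a BSD-style ftp client in which -n suppresses the automatic login so that the user command supplies the anonymous login and password.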


Forcing a system dump

If the system does not respond to mouse or keyboard input, it is in a hung state.

If you cannot telnet to, rlogin to, or ping the system, it is hung. A system that responds to ping but is otherwise unavailable is also hung.

The system is likely to hang again. The following steps prepare it so that a dump can be forced when the hang recurs.

Preparing for a forced dump

NOTE: In AIX Version 4.1.4 and later, a system dump can be forced on a system without a key switch. The system must first be configured to allow this, which you can do with the SMIT dump fast path.

Run the following command:

    smit dump 

Change the attribute Always Allow System Dump to TRUE.

When the system hangs again, proceed according to the type of system:

System with a key switch and LED display

  1. Turn the key to service mode. Wait for a moment, then press Reset.
  2. The LED sequence will be 0c9-0c4 or 0c9-0c0.

System with an LED or non-LED display but no key switch or reset button (AIX 4.1.4 and later)

  1. If there is no disk activity, press Ctrl Alt 1 (on the number pad).
  2. Wait for disk activity to stop.
When either sequence has completed and the system remains hung, power off the system and go to step 2 of "Collecting the system dump" at the beginning of this document.
System Crashes in AIX 4.1 and 4.2: howto.dump.41-42.cmd ITEM: FAX
Dated: 99/06/21~00:00 Category: cmd
This HTML file was generated 99/06/24~12:41:57