Managing System Dump Devices


Contents

About This Document
Managing System Dump Devices
Determining Proper Size for Dump Device
Setting a Tape Drive as a Dump Device
Extended Options in AIX 4.x
Dumping to a Mirrored Logical Volume
Remote Dumps Over a Network

About This Document

This document discusses how to manage storage devices used by AIX to store a system dump in the event of a catastrophic operating system software failure.

Its intent is to help the system administrator ensure that a system dump will be complete and usable for troubleshooting purposes.

For more in-depth coverage of this subject, the following IBM documents are recommended:

This document applies to AIX versions 3.2 and 4.x.


Managing System Dump Devices

When an unexpected system halt occurs, the system dump facility automatically copies selected areas of kernel data to the primary dump device. These areas include kernel segment 0 as well as other areas registered in the Master Dump Table by kernel modules or kernel extensions.

There are two dump devices: a primary and a secondary. To view information about the current dump devices, use this command:

 
sysdumpdev -l 

Example:

 
# sysdumpdev -l 
primary              /dev/hd7 
secondary            /dev/sysdumpnull 

In this example, the primary dump device is the logical volume "hd7".
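
When the primary device name is needed in a script (for example, to pass it on to lslv), it can be pulled out of the sysdumpdev -l output. The following is a minimal sketch only. The function name primary_dump_lv is made up for illustration, and it assumes the output layout shown in the example above.

```shell
# primary_dump_lv: read `sysdumpdev -l` output on stdin and print the
# primary dump device without its /dev/ prefix (for example "hd7").
# The function name is hypothetical; only the output layout shown in
# this document is assumed.
primary_dump_lv() {
    awk '$1 == "primary" { sub("^/dev/", "", $2); print $2 }'
}

# On an AIX system this would be used as:
#   sysdumpdev -l | primary_dump_lv
```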

When you install the operating system, the primary dump device is automatically configured for you.

In AIX 3.2, the default primary dump device is /dev/hd7. This is a Logical Volume dedicated for system dumps.

In AIX 4.x, the default dump device is /dev/hd6. This is the primary paging space Logical Volume.

In both AIX 3.2 and 4.x the default secondary dump device is /dev/sysdumpnull. This is a null device and any dump written to this device is lost.


Determining Proper Size for Dump Device

The default dump device created for system use may NOT be large enough for a complete dump. To find out how large your dump device is, first identify your primary dump device with sysdumpdev -l, as described in the previous section. Unless you have set the dump device to a tape drive, this device should be a logical volume. Use this command to retrieve information about this logical volume:

 
lslv <LOGICAL VOLUME NAME> 

Example:

 
lslv hd7 

This command will return a screen of information. You will need the values for "LPs:" and "PP SIZE:". Multiply these 2 values together to get the size of the dump device in megabytes.
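
The multiplication can also be scripted. This is a rough sketch, assuming the lslv output layout shown elsewhere in this document (an "LPs:" field and a "PP SIZE:" field given in megabytes). The function name lv_size_mb is invented for the example.

```shell
# lv_size_mb: read `lslv` output on stdin and print LPs x PP SIZE,
# i.e. the size of the logical volume in megabytes.
# Hypothetical helper; assumes "PP SIZE:" is reported in megabytes.
lv_size_mb() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                # Take "LPs:" only when it is not part of "MAX LPs:"
                if ($i == "LPs:" && (i == 1 || $(i - 1) != "MAX")) lps = $(i + 1)
                # Take "SIZE:" only when it is part of "PP SIZE:"
                if ($i == "SIZE:" && i > 1 && $(i - 1) == "PP") pp = $(i + 1)
            }
        }
        END {
            if (lps == "" || pp == "") exit 1
            print lps * pp
        }
    '
}

# On an AIX system this would be used as:
#   lslv hd7 | lv_size_mb
```

For a logical volume with 4 LPs and a PP SIZE of 4 megabytes, this prints 16.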

Next, you need to determine how large the dump device for your machine needs to be.

To view an estimate of how large the dump device should be, use this command:

 
sysdumpdev -e 

Example:

 
# sysdumpdev -e 
Estimated dump size in bytes: 4526080 

NOTE: This value reflects what the currently running system would require, and it can change with the activity of the machine. It is best to run this command when the machine is under its heaviest workload.

The command returns a value in bytes. The primary dump device must be at least as large as the value returned. In this case, the dump space needs to hold about 4.5 million bytes (roughly 4.3 megabytes). A typical system has a Physical Partition size of 4 megabytes for rootvg, and the dump device can only be sized in multiples of the Physical Partition size. A dump space of 4 megabytes would not be large enough to hold this dump, so the next size up is 8 megabytes.
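
The rounding described above can be sketched as a small shell function. The name required_dump_mb is hypothetical, and the sketch assumes that one megabyte is 1048576 bytes and that the shell supports $(( )) arithmetic.

```shell
# required_dump_mb ESTIMATE_BYTES PP_SIZE_MB
# Round the estimated dump size up to a whole number of physical
# partitions and print the resulting size in megabytes.
# Hypothetical helper; assumes 1 megabyte = 1048576 bytes.
required_dump_mb() {
    pp_bytes=$(( $2 * 1024 * 1024 ))
    partitions=$(( ($1 + pp_bytes - 1) / pp_bytes ))
    echo $(( partitions * $2 ))
}

# Values from the example in this section:
#   required_dump_mb 4526080 4
```

With the estimate of 4526080 bytes and a 4 megabyte partition size, two partitions are needed, so the function prints 8.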

If you are at a level of AIX prior to 3.2.4, you may not have this command option. In that case, a general rule of thumb is to make the dump device 1/4 the size of your total RAM. To get the size of your total RAM, run this command:

 
bootinfo -r 
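
As a rough illustration of the rule of thumb, the sketch below turns a memory size into a dump device size in megabytes. It assumes that bootinfo -r reports real memory in kilobytes; verify this on your level of AIX before relying on it. The function name quarter_ram_mb is made up for the example.

```shell
# quarter_ram_mb RAM_KB
# Print one quarter of the given memory size, in megabytes, rounded up.
# Hypothetical helper; assumes the input is in kilobytes, which is
# what `bootinfo -r` is assumed to report.
quarter_ram_mb() {
    echo $(( ($1 + 4095) / 4096 ))
}

# On an AIX system this might be used as:
#   quarter_ram_mb "$(bootinfo -r)"
```

For a machine with 65536 kilobytes (64 megabytes) of RAM, this prints 16. The result still has to be rounded up to a multiple of the Physical Partition size.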

If the dump device is a standard dump logical volume, such as hd7, use the extendlv command to increase its size. If it is the primary paging space, hd6, use the chps command instead.
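
The choice between the two commands can be sketched as follows. This helper only prints the command it would suggest and does not run anything. The name grow_dump_cmd is hypothetical, and the sketch assumes the usual forms "extendlv LogicalVolume Partitions" and "chps -s LogicalPartitions PagingSpace", where the number is how many logical partitions to add.

```shell
# grow_dump_cmd LV_NAME PARTITIONS_TO_ADD
# Print (do not run) the command that would grow the dump device by
# the given number of logical partitions.
# Hypothetical helper; only hd6 is treated as paging space, as in
# this document.
grow_dump_cmd() {
    case "$1" in
        hd6) echo "chps -s $2 $1" ;;
        *)   echo "extendlv $1 $2" ;;
    esac
}

# Examples:
#   grow_dump_cmd hd7 1
#   grow_dump_cmd hd6 1
```

Printing the command first gives you a chance to review it before changing the size of a logical volume or paging space.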


Setting a Tape Drive as a Dump Device

If you do not have sufficient space on the system to store a dump, you can use a tape drive as your dump device. To accomplish this, put a blank tape in the desired tape drive and run this command:

 
sysdumpdev -Pp /dev/rmt# 

Here, rmt# stands for the specific tape drive you want to use (for example, rmt0, rmt1, or rmt2).

Be aware that the tape drive will not be usable by any other application until you reassign the dump device to another location, for example by running sysdumpdev -Pp again with the original dump device (such as /dev/hd7).


Extended Options in AIX 4.x

If you are using AIX 4.x, there are 3 extra attributes that are not available in AIX 3.2. sysdumpdev -l will show these extra options.

Example:

 
# sysdumpdev -l 
primary              /dev/hd6 
secondary            /dev/sysdumpnull 
copy directory       /var/adm/ras 
forced copy flag     TRUE 
always allow dump    TRUE 

The copy directory entry specifies a directory, in a filesystem in the "rootvg" volume group, where the dump will be copied upon reboot after a system dump. This only applies if the primary dump device is the primary paging space (hd6).

The forced copy flag entry specifies whether the system will prompt you to copy the dump to external media if there is not enough space in the specified filesystem. If it is set to "FALSE" and the system cannot copy the dump to the filesystem, the contents of the dump are discarded.

The always allow dump flag is a security measure. If it is set to "FALSE", the only way to force a system dump is to turn the service key to "Service" and then press reset. This setting also prevents forcing a dump of any kind on machines with no service key, such as all PCI-based machines.

If the primary dump device is the primary paging device, the only way it can copy the dump to the filesystem save area is if there is enough free space in that filesystem. The free space in the filesystem can be determined with the df command. If the free space in that filesystem is not at least as large as the space required for the dump (sysdumpdev -e), then either increase the size of that filesystem to have enough free space, remove files in that filesystem until enough free space is available, or move the save area to another filesystem with the required space. The latter can be accomplished with the sysdumpdev command. This filesystem must be in the "rootvg" volume group.
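
The comparison between free space and the estimated dump size can be sketched in shell. Both function names below are invented for illustration. The sketch assumes that the free space is supplied in kilobytes (as read from the Free column of df -k) and that the estimate line has the format shown earlier in this document.

```shell
# estimate_bytes: read `sysdumpdev -e` output on stdin and print the
# number of bytes (the last field of the line).
estimate_bytes() {
    awk '/Estimated dump size/ { print $NF }'
}

# copy_dir_has_room FREE_KB ESTIMATE_BYTES
# Succeed (exit status 0) when the free space is at least as large as
# the estimated dump size.
# Hypothetical helper; FREE_KB is assumed to be in kilobytes.
copy_dir_has_room() {
    free_bytes=$(( $1 * 1024 ))
    [ "$free_bytes" -ge "$2" ]
}

# On an AIX system this might be used as:
#   est=$(sysdumpdev -e | estimate_bytes)
#   copy_dir_has_room <free KB from df -k> "$est" || echo "not enough space"
```

With the example estimate of 4526080 bytes, 8192 kilobytes of free space is sufficient and 4096 kilobytes is not.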


Dumping to a Mirrored Logical Volume

AIX does not support dumping to a mirrored logical volume, because the dump is written to only one copy of the logical volume. In other words, only one of the mirrors will contain the dump. Since the LV is not being handled as a mirrored LV during the dump, the newly written data (the dump) is not synchronized with the other mirrors. When crash then tries to read the dump, it can get data from both mirrors, only one of which actually contains the dump. As a result, crash sees good dump data mixed with garbage data and will not read the dump.

If we split the logical volume, creating one logical volume per copy of the original, one of them will contain a good dump. We can do this with the splitlvcopy command. The procedure for splitting the LV is:

 
Run lslv to get the LV IDENTIFIER:
# lslv hd7 
LOGICAL VOLUME: hd7                 VOLUME GROUP: rootvg 
LV IDENTIFIER:  0000335216021417.12 PERMISSION:   read/write 
VG STATE:       active/complete     LV STATE:     opened/syncd 
TYPE:           dump                WRITE VERIFY: off 
MAX LPs:        128                 PP SIZE:      4 megabyte(s) 
COPIES:         2                   SCHED POLICY: parallel 
LPs:            4                   PPs:          8 
STALE PPs:      0                   BB POLICY:    relocatable 
INTER-POLICY:   minimum             RELOCATABLE:  yes 
INTRA-POLICY:   middle              UPPER BOUND:  32 
MOUNT POINT:    N/A                 LABEL:        None 
MIRROR WRITE CONSISTENCY: on 
EACH LP COPY ON A SEPARATE PV ?: no 

Notice that there are 2 COPIES. This means that there is one mirror. Use the splitlvcopy command to split the LV, hd7 in this case, into 2 LVs.

 
# splitlvcopy 0000335216021417.12 1 

You may get a message like:

 
splitlvcopy: WARNING! The logical volume being split, hd7, 
        is open. Splitting an open logical volume may cause 
        data loss or corruption and is not supported by IBM. 
        IBM will not be held responsible for data loss or 
        corruption caused by splitting an open logical 
        volume. Do you wish to continue? y(es) n(o)? lv02 

Enter y. The command will complete and show the name of the new LV it created, e.g., lv02. At this point, hd7 contains one copy of the original hd7, and lv02 contains the other. This is exactly what we need.

If there had been 3 COPIES, shown by "lslv", then lv02 would contain 2 copies of the original hd7.

Run crash on /dev/hd7 first to see whether it is the right copy. If crash does not give error messages, you have found the right one. If the dump is unusable and lv02 has only one copy (that is, the original hd7 had 2 copies), run crash on /dev/lv02. If lv02 now has 2 copies because the original hd7 had 3, run lslv lv02 to get its LV IDENTIFIER, then run splitlvcopy <LVID> 1 to split lv02, so that each of its mirrors ends up in its own logical volume.
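
The values needed for this procedure can be read from the lslv output with a few small helpers. This is only a sketch, with invented function names, and it assumes the lslv layout shown above. It does not run splitlvcopy itself.

```shell
# lv_id: read `lslv` output on stdin and print the LV IDENTIFIER.
lv_id() {
    awk '$1 == "LV" && $2 == "IDENTIFIER:" { print $3 }'
}

# lv_copies: read `lslv` output on stdin and print the COPIES value.
lv_copies() {
    awk '$1 == "COPIES:" { print $2 }'
}

# splits_needed COPIES
# Print how many splitlvcopy runs it takes until every copy sits in
# its own logical volume (each run peels off one single-copy LV).
splits_needed() {
    echo $(( $1 - 1 ))
}

# On an AIX system this might be used as:
#   lslv hd7 | lv_id
#   splits_needed "$(lslv hd7 | lv_copies)"
```

For the hd7 example above, with 2 COPIES, one split is needed. With 3 COPIES, two splits are needed.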

This may not work for dumps taken to mirrored paging space, because the pager may have already overwritten the dump.


Remote Dumps Over a Network

Currently, the system dump does not handle ARP requests received from the server, or the gateway used, during the dump. If an ARP request is received while taking a dump, this causes the dump to hang. If your system takes a system dump and hangs on 0c7, this is likely the problem. You will have to power the system off and reboot at this point.

To avoid this problem, you can create a permanent ARP entry for the client (the dumping machine) on the server or gateway. The machine that needs the permanent ARP entry is the one on the same local network or ring as the client. This machine can be thought of as the logical server: if it is not the real server, the dump data must pass through it to get to the real server.

Note: "real server" refers to the machine designated in the remote dump specification on the client.

Run the following steps on the machine that needs the entry (the real server, or the gateway if the client reaches the server through one) to establish the permanent ARP entry.

  1. Ensure an arp entry exists by pinging the client.

     Example:
     # ping myclient.xyz.com

  2. Use arp -a to see the arp table.

     Example:
     # arp -a
     myclient.xyz.com (128.3.56.9) at 10:0:5a:9:e:7d [token ring]
     myserver.xyz.com (128.3.56.20) at 10:0:5a:8f:12:bf [token ring]

  3. Now use the arp command to make the dumping client's entry permanent.

     Example:
     # arp -s 802.5 myclient.xyz.com 10:0:5a:9:e:7d


The 802.5 refers to a token-ring network. Valid network types are listed in the arp documentation in InfoExplorer, and are currently ether(802.3), fddi, and 802.5.
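
Steps 2 and 3 can be combined in a small helper that builds the arp -s command from the arp -a output. This is a sketch only: the function name arp_pin_cmd is hypothetical, it assumes each arp -a entry occupies a single line of the form "host (address) at hardware-address [type]", and it prints the command instead of running it.

```shell
# arp_pin_cmd HOST NETWORK_TYPE
# Read `arp -a` output on stdin and print (do not run) the arp -s
# command that would make HOST's entry permanent.
# Hypothetical helper; assumes one entry per line.
arp_pin_cmd() {
    awk -v host="$1" -v nettype="$2" '
        {
            name = $1
            sub(/\(.*/, "", name)   # drop an "(address)" glued to the name
            if (name != host) next
            for (i = 1; i < NF; i++)
                if ($i == "at") print "arp -s", nettype, host, $(i + 1)
        }
    '
}

# On the server or gateway this might be used as:
#   arp -a | arp_pin_cmd myclient.xyz.com 802.5
```

Review the printed command before running it, since a wrong hardware address in a permanent entry would prevent the dump data from reaching the client's peer.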

Note that if the dump hangs and the client must be rebooted, the partial dump on the server may still be useful.


Managing System Dump Devices: manage.dump.32-42.cmd ITEM: FAX
Dated: 99/04/15~00:00 Category: cmd