10/09/96,4FAX# 6221

Managing System Dump Devices

SPECIAL NOTICES

Information in this document is correct to the best of our knowledge at the time of this writing. Please send feedback by fax to "AIXServ Information" at (512) 823-4009.

Please use this information with care. IBM will not be responsible for damages of any kind resulting from its use. The use of this information is the sole responsibility of the customer and depends on the customer's ability to evaluate and integrate this information into the customer's operational environment.

ABOUT THIS DOCUMENT

This document discusses how to manage the storage devices that AIX uses to store a system dump in the event of a catastrophic operating system software failure. Its intent is to help the system administrator ensure that a system dump will be complete and usable for troubleshooting purposes.

For more in-depth coverage of this subject, the following IBM documents are recommended:

o AIX Version 4.1 Software Problem Debugging and Reporting for the RISC System/6000 (GG24-2513)
o Common Diagnostics and Service Guide (SA23-2687)
o Diagnostic Information for Micro Channel Bus Systems (SA23-2765)
o Diagnostic Information for Multiple Bus Systems (SA38-0509)
o Problem Solving Guide and Reference (SA23-2204), (SA23-2606)
o System Management Guide, V3.2 (SC23-2457)
o System Management Guide, V4 (SC23-2525)
o How to Send a System Dump (FAX# 1770 from 1-800-IBM-4FAX)

This document applies to AIX versions 3.2, 4.1, and 4.2.

MANAGING SYSTEM DUMP DEVICES

When an unexpected system halt occurs, the system dump facility automatically copies selected areas of kernel data to the primary dump device. These areas include kernel segment 0 as well as other areas registered in the Master Dump Table by kernel modules or kernel extensions. There are two dump devices: a primary and a secondary.

To view information about the current dump devices, use this command:

    sysdumpdev -l

Example:

    # sysdumpdev -l
    primary              /dev/hd7
    secondary            /dev/sysdumpnull

In this example, the primary dump device is the logical volume "hd7".

When you install the operating system, the primary dump device is automatically configured for you. In AIX 3.2, the default primary dump device is /dev/hd7, a logical volume dedicated to system dumps. In AIX 4.1 and 4.2, the default primary dump device is /dev/hd6, the primary paging space logical volume.

In AIX 3.2, 4.1, and 4.2, the default secondary dump device is /dev/sysdumpnull. This is a null device, and any dump written to it is lost.

DETERMINING PROPER SIZE FOR DUMP DEVICE

The default dump device created for system use may NOT be large enough for a complete dump. To find out how large your dump device is, first identify your primary dump device with "sysdumpdev -l", as described above. Unless the dump device is currently set to a tape drive, this device should be a logical volume. Use the "lslv" command with the name of the logical volume to retrieve information about it.

Example:

    lslv hd7

This command returns a screen of information. You will need the values for "LPs:" and "PP SIZE:". Multiply these two values together to get the size of the dump device in megabytes. For example, 4 LPs with a PP SIZE of 4 megabytes give a dump device of 16 megabytes.

Next, you need to determine how large the dump device for your machine needs to be. To view an estimate of the required size, use this command:

    sysdumpdev -e

Example:

    # sysdumpdev -e
    Estimated dump size in bytes: 4526080

NOTE: This value is what the machine would require as it is running right now. The value can change with the activity of the machine, so it is best to run this command when the machine is under its heaviest workload.

The command returns a value in bytes. The primary dump device must be at least as large as the value returned.

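The size comparison described above can be sketched with shell arithmetic. This is only an illustration, not part of AIX: the function names dump_dev_bytes, lps_needed, and dump_dev_ok are hypothetical, the inputs are copied by hand from the "lslv" and "sysdumpdev -e" output, and a megabyte is assumed to be 1048576 bytes.

```shell
#!/bin/sh
# Hypothetical helpers (not AIX commands). They only do arithmetic on
# values read by hand from "lslv" and "sysdumpdev -e" output.
# Assumption: 1 megabyte = 1024 * 1024 bytes.

# dump_dev_bytes LPS PP_SIZE_MB
# Size of the dump logical volume in bytes: LPs x PP SIZE.
dump_dev_bytes() {
    echo $(( $1 * $2 * 1024 * 1024 ))
}

# lps_needed ESTIMATE_BYTES PP_SIZE_MB
# Smallest whole number of partitions that can hold the estimate.
lps_needed() {
    pp_bytes=$(( $2 * 1024 * 1024 ))
    echo $(( ($1 + pp_bytes - 1) / pp_bytes ))
}

# dump_dev_ok LPS PP_SIZE_MB ESTIMATE_BYTES
# Exit status 0 when the device is at least as large as the estimate.
dump_dev_ok() {
    [ "$(dump_dev_bytes "$1" "$2")" -ge "$3" ]
}

# Example: the estimate shown above with a 4-megabyte partition size.
estimate=4526080
pp_mb=4
lps=$(lps_needed "$estimate" "$pp_mb")
echo "partitions needed: $lps ($(( lps * pp_mb )) megabytes)"
```

With these sample values the script reports that 2 partitions (8 megabytes) are needed. On a real system, substitute the numbers your own machine reports.
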
In the example above (an estimate of 4526080 bytes), the dump space needs to hold about 4.5 megabytes. A typical system has a Physical Partition size of 4 megabytes for rootvg, and the dump device can only be increased in multiples of this size. A dump space of 4 megabytes would not be large enough to hold this dump, so the next size up, 8 megabytes, is required.

If you are at a level of AIX prior to 3.2.4, you may not have the "-e" option of "sysdumpdev". If this is the case, a general rule of thumb is to make the dump device 1/4 of the size of your total RAM. To get the size of your total RAM, run this command:

    bootinfo -r

If the dump device is a standard dump logical volume, such as hd7, use the "extendlv" command to increase its size. If it is the primary paging space, hd6, use the "chps" command.

SETTING A TAPE DRIVE AS A DUMP DEVICE

If you do not have sufficient space on the system to store a dump, you can use a tape drive as your dump device. To do this, put a blank tape in the desired tape drive and run this command:

    sysdumpdev -Pp /dev/rmt#

Here, rmt# refers to the specific tape drive you want to use (e.g., rmt0, rmt1, or rmt2). Be aware that the tape drive will not be usable by any other application until you reassign the dump device to another location.

EXTENDED OPTIONS IN AIX 4.1 AND 4.2

If you are using AIX 4.1 or 4.2, there are three extra attributes that are not available in AIX 3.2. "sysdumpdev -l" will show these extra options.

Example:

    # sysdumpdev -l
    primary              /dev/hd6
    secondary            /dev/sysdumpnull
    copy directory       /var/adm/ras
    forced copy flag     TRUE
    always allow dump    TRUE

The "copy directory" entry specifies a filesystem in the "rootvg" volume group to which the dump will be copied upon reboot after a system dump. This only applies if the primary dump device is the primary paging space (hd6).

The "forced copy flag" entry specifies whether the system will prompt you to copy the dump to external media if there is not enough space in the specified filesystem. If this is set to "FALSE" and the system cannot copy the dump to the filesystem, it will discard the contents of the dump.

The "always allow dump" flag is a security measure. If this is set to "FALSE", the only way to force a system dump is to turn the service key to "Service" and then press reset. Setting it to "FALSE" also prevents forcing a dump of any kind on machines with no service key, such as all PCI-based machines.

If the primary dump device is the primary paging device, the system can copy the dump to the filesystem save area only if there is enough free space in that filesystem. The free space in the filesystem can be determined with the "df" command. If the free space is not at least as large as the space required for the dump (sysdumpdev -e), do one of the following:

o Increase the size of that filesystem until it has enough free space.
o Remove files in that filesystem until enough free space is available.
o Move the save area to another filesystem with the required space. This can be accomplished with the "sysdumpdev" command. The filesystem must be in the "rootvg" volume group.

DUMPING TO A MIRRORED LOGICAL VOLUME

AIX does not support dumping to a mirrored logical volume, because the dump is written to only one copy of the logical volume. In other words, only one of the mirrors will contain the dump. Since the LV is not handled like a mirrored LV during the dump, the newly written data (the dump) is not synchronized with the other mirrors. When the "crash" command then tries to read the dump, it can get data from both mirrors, only one of which actually contains the dump. As a result, crash sees good dump data mixed with garbage data and will not read the dump.

So, if we could split up the logical volume, creating one logical volume per copy of the original, one of them would contain a good dump. The "splitlvcopy" command does exactly this.

The procedure for splitting the LV is as follows. First, run "lslv" to get the LV IDENTIFIER:

    # lslv hd7
    LOGICAL VOLUME:     hd7                    VOLUME GROUP:   rootvg
    LV IDENTIFIER:      0000335216021417.12    PERMISSION:     read/write
    VG STATE:           active/complete        LV STATE:       opened/syncd
    TYPE:               dump                   WRITE VERIFY:   off
    MAX LPs:            128                    PP SIZE:        4 megabyte(s)
    COPIES:             2                      SCHED POLICY:   parallel
    LPs:                4                      PPs:            8
    STALE PPs:          0                      BB POLICY:      relocatable
    INTER-POLICY:       minimum                RELOCATABLE:    yes
    INTRA-POLICY:       middle                 UPPER BOUND:    32
    MOUNT POINT:        N/A                    LABEL:          None
    MIRROR WRITE CONSISTENCY: on
    EACH LP COPY ON A SEPARATE PV ?: no

Notice that there are 2 COPIES. This means that there is one mirror. Use the "splitlvcopy" command to split the LV, hd7 in this case, into two LVs:

    # splitlvcopy 0000335216021417.12 1

You may get a message like this:

    splitlvcopy: WARNING! The logical volume being split, hd7, is open.
    Splitting an open logical volume may cause data loss or corruption
    and is not supported by IBM. IBM will not be held responsible for
    data loss or corruption caused by splitting an open logical volume.
    Do you wish to continue? y(es) n(o)?
    lv02

Enter "y". The command will complete and show the name of the new LV it created, e.g., lv02. At this point, hd7 contains one copy of the original hd7, and lv02 contains the other. This is exactly what we need. If "lslv" had shown 3 COPIES, lv02 would contain two copies of the original hd7.

Run crash on /dev/hd7 first to see whether that is the right copy. If crash does not give error messages, you have found the right one. If the dump is unusable, and lv02 has only one copy (that is, the original hd7 contained 2 copies), run crash on /dev/lv02.

If lv02 now has 2 copies, because the original hd7 had 3, you will have to run "lslv /dev/lv02" to get its LV IDENTIFIER, then run "splitlvcopy" with that identifier and a copy count of 1 to split lv02, so that you have one LV for each of its mirrors.

This procedure may not work for dumps taken to mirrored paging space, because the pager may have already overwritten the dump.

READER'S COMMENTS

Please fax this form to (512) 823-4009, attention "AIXServ Information". You may also e-mail comments to: elizabet@austin.ibm.com. These comments should include the same customer information requested below.

Use this form to tell us what you think about this document. If you have found errors in it, or if you want to express your opinion about it (such as organization, subject matter, appearance) or make suggestions for improvement, this is the form to use.

If you need technical assistance, contact your local branch office, point of sale, or 1-800-CALL-AIX (for information about support offerings). These services may be billable. Faxes on a variety of subjects may be ordered free of charge from 1-800-IBM-4FAX. Outside the U.S., call 415-855-4329 using a fax machine phone.

When you send comments to IBM, you grant IBM a nonexclusive right to use or distribute your comments in any way it believes appropriate without incurring any obligation to you.

NOTE: If you have a problem report or item number, supplying that number may help us determine why a procedure did or did not work in your specific situation.

Problem Report or Item #:

Branch Office or Customer #:

Be sure to print your name and fax number below if you would like a reply:

Name:

Fax Number:

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

END OF DOCUMENT (manage.dump.krn,4FAX# 6221)