09/04/96

Diagnosing CHECKSTOPS


SPECIAL NOTICES

Information in this document is correct to the best of our knowledge
at the time of this writing. Please send feedback by fax to "AIXServ
Information" at (512) 823-4009.

Please use this information with care. IBM will not be responsible
for damages of any kind resulting from its use. The use of this
information is the sole responsibility of the customer and depends on
the customer's ability to evaluate and integrate this information
into the customer's operational environment.


ABOUT THIS DOCUMENT

This document discusses checkstops. A checkstop is a machine check
that occurs during another machine check. This document is applicable
to AIX versions 3.2 and 4.1.


WHAT IS A MACHINE CHECK

A machine check is an error logged by the machine check handler.
Causes of a machine check include:

o  an uncorrectable (double-bit) memory error
o  an access to a memory address that does not exist

When such an error occurs, a non-maskable interrupt (NMI) is
generated. The operating system logs the machine check, including
the contents of the error logging registers that report its cause,
and initiates a system dump.


WHAT IS A CHECKSTOP

A checkstop is a machine check that occurs during another machine
check. A checkstop will also occur when the machine (usually a
processor, but sometimes a cache, memory, or I/O bus controller)
determines that something is in an "impossible" state. This happens
when an error cannot be isolated to a particular bus transfer in
progress, or when a processor detects that no progress is being made,
that is, it has been unable to complete any instruction for some
period of time.

When a system checkstops, the clocks in the machine are frozen within
a few cycles after the error, and the service processor saves part of
the state of the CPUs in NVRAM. It then attempts to do a full
hardware reset and restart the system some number of times. When the
system reboots, the data is copied to a file in the /usr/lib/ras
directory (ras stands for reliability and service).
Two file names are used (checkstop.A and checkstop.B) in a rotating
manner. The total number of checkstops during the reboot attempts
before the system came up successfully is logged in the error log
entry along with the file name.

If a second machine check occurs before the operating system
completes logging the error (to NVRAM) and initiates a complete
hardware reset and/or halts, the processor will checkstop.


HOW TO PROCEED

| An IBM Hardware Service contract must be in place for this
| issue to be addressed.

| A service call should be placed to your local IBM Hardware
| Support organization to have these files examined. They can
| be contacted through this telephone number:

|    1-800-IBM-SERV (1-800-426-7378)

| Alternatively, the following instructions can be used to
| package these files to be sent in for Hardware Service to
| examine.

Gather system information by performing the following steps:

1. Run the command "snap -g". This will put general information
   about the machine in the directory "/tmp/ibmsupt".

2. Copy the checkstop files in /usr/lib/ras to the directory
   /tmp/ibmsupt/testcase.

   Example:

      cp /usr/lib/ras/checkstop* /tmp/ibmsupt/testcase

3. Make a file called "customer_info" in the directory
   /tmp/ibmsupt/other. In this file, include the following
   information:

      Main contact
      Telephone number of main contact
      Machine type having the problem (Example: 7011, 7012, 7015,
      etc.)
      Serial number of machine
      Location of machine (physical location of machine including
      address)

   The above information is the same information you would provide
   if you were calling IBM Hardware Support. After it, also include
   a history of the problem, such as the LED code the machine
   displayed and what symptoms occurred.

4. Use the following steps to put the testcase on tape:

   a.
      With the following command, determine the block size used by
      the tape drive (where # is the tape drive number):

         lsattr -E -l rmt# | grep block_size

      Sample output:

         block_size 512 BLOCK size (0=variable length) True

      Note the number where "512" is in this example. That number
      is the block size.

   b. Run the following command to put the testcase on tape:

         tar -cvf /dev/ /tmp/ibmsupt

      Example:

         tar -cvf /dev/rmt0 /tmp/ibmsupt

5. Use the following command to put the testcase on diskette:

      tar -cvf /dev/ /tmp/ibmsupt

   Example:

      tar -cvf /dev/fd0 /tmp/ibmsupt

6. Label the tape/diskette.
   (supply this only if using a tape drive)

VERY IMPORTANT: If the person sending in this testcase is not the
person who reported the problem, be sure to include the name of the
person who reported it. If the proper information is not on the
package, the testcase takes valuable time to process, which delays
solving your problem.


SENDING TESTCASES

Sending Via Standard Mail or Overnight Delivery

NOTE: Look at the "File Naming Convention" section of this document
and name testcases accordingly.

   IBM Corp. / Zip 2900 / Bldg 42
   Attn: AIX Testcase Dept. J66S
   11400 Burnet Road
   Austin, TX 78758-3493
   Extension 3-4050

Note: No overnight delivery service delivers directly to Bldg. 42;
instead, packages are delivered to IBM Receiving. Please keep this
in mind when choosing a delivery time. If you specify Saturday
delivery, you must first make special arrangements with an AIX
Support specialist (1-800-225-5249 or 1-800-CALL-AIX); otherwise
there could be a delay of several days.

Sending Via FTP or E-MAIL

+--- SERVICE USE AGREEMENT --------------------------------+
|                                                          |
|       International Business Machines Corporation        |
|            Internet Testcase Delivery Service            |
|                  Service Use Agreement                   |
|                                                          |
| THIS IS A LEGAL AGREEMENT TO WHICH YOU ARE CONSENTING TO |
| BE BOUND. IF YOU DO NOT AGREE TO ALL OF THE TERMS OF     |
| THIS LICENSE, DO NOT USE THE SERVICE.                    |
|                                                          |
| o  International Business Machines Corporation (IBM)     |
|    grants you a non-exclusive access to the Testcase     |
|    Delivery Service via the Internet.                    |
|                                                          |
| o  IBM makes no representations about the suitability    |
|    of this service or about any content or information   |
|    made accessible by the service, for any purpose. The  |
|    service is provided "as is" without express or        |
|    implied warranties, including warranties of           |
|    merchantability and fitness for a particular purpose  |
|    or noninfringement. This service is provided          |
|    gratuitously and, accordingly, IBM shall not be       |
|    liable for any damages suffered by you or any user    |
|    of the service.                                       |
|                                                          |
| o  While IBM intends to maintain this Service, IBM       |
|    reserves the right at any time to alter access,       |
|    features, capabilities and functions of this          |
|    Service. IBM may terminate this Service at any time   |
|    without notice to you.                                |
|                                                          |
| o  You may not download or upload or otherwise export    |
|    or reexport images or files from or to systems        |
|    providing this Service except in full compliance      |
|    with all United States and other applicable laws and  |
|    regulations. In particular, but without limitation,   |
|    none of the images or files provided or received by   |
|    this Service may be downloaded or uploaded or         |
|    otherwise exported or reexported (i) into or          |
|    received from (or to a national or resident of)       |
|    Cuba, Haiti, Iraq, Libya, Yugoslavia, North Korea,    |
|    Iran, or Syria or (ii) to anyone on the US Treasury   |
|    Department's list of Specially Designated Nationals   |
|    or the US Commerce Department's Table of Deny Orders. |
|    By downloading or uploading images or files from or   |
|    to systems providing this Service, you are agreeing   |
|    to the foregoing and you are representing and         |
|    warranting that you are not located in, under         |
|    control of, or a national or resident of any such     |
|    country on any such list.                             |
|                                                          |
+----------------------------------------------------------+

Email Via Internet

Relatively small testcases can be sent via e-mail to
aasc@austin.ibm.com. The allowable size of the testcase depends
mostly on intermediate mail gateways. It is generally not advisable
to send testcases larger than approximately 4MB via e-mail.
Testcases sent via e-mail in the correct format are automatically
processed, making them almost immediately available for analysis.
See the "File Naming Convention" section of this document for more
details.

FTP Via Internet

Sample file names for the tarred and compressed files:

   ad1000.tar.Z          (item number.tar.Z)
   1x234.001.tar.Z       (problem_report_#.branch_office_#.tar.Z)
   1x234.1234567.tar.Z   (problem_report_#.customer_#.tar.Z)

FTP the file to our testcase repository:

   ftp 198.17.57.67       (or "ftp testcase.boulder.ibm.com")
   login: anonymous
   password:              (e.g., "customer@wallyworld.com")
   bin                    (change to binary transfer mode)
   cd aix
   put 1x234.001.tar.Z    (for example)
   ls -l
   quit


FILE NAMING CONVENTION:

NOTE: Testcases should be archived into a compressed tar archive
using relative path names. If you are using FTP, do not create a
separate directory on the testcase server.

Use of the following naming convention will allow our tools to
automatically move the testcase to the proper directories and update
the PMR to indicate that the testcase is available. Failure to use
this naming convention could cause delays in processing the testcase.

If the testcase files cannot be associated with an existing PMR or
XMENU item, they will be tagged as lost and eventually deleted.
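As a rough sketch of the PMR form of this convention, the following
shell function assembles a testcase file name from a 5 character PMR
number, a 3 character branch office number, and an optional 3
character country code. The function name testcase_name and its
argument order are assumptions made for illustration only; they are
not part of AIX or of any IBM tool.

```shell
#!/bin/sh
# Hypothetical helper (not an AIX or IBM command): build a PMR testcase
# archive name of the form ppppp.bbb.ccc.tar.Z.
# Usage: testcase_name PMR BRANCH [COUNTRY] [EXT]
#   COUNTRY defaults to 000, EXT defaults to Z (use gz for gzip).
testcase_name() {
    pmr=$1
    branch=$2
    country=${3:-000}
    ext=${4:-Z}

    # Check the field lengths given by the naming convention.
    if [ "${#pmr}" -ne 5 ] || [ "${#branch}" -ne 3 ] || [ "${#country}" -ne 3 ]; then
        echo "testcase_name: expected 5 char PMR, 3 char branch, 3 char country" >&2
        return 1
    fi

    # The country code may be omitted when it is 000.
    if [ "$country" = "000" ]; then
        echo "$pmr.$branch.tar.$ext"
    else
        echo "$pmr.$branch.$country.tar.$ext"
    fi
}

testcase_name 1x234 001             # prints 1x234.001.tar.Z
testcase_name 9x999 999 999 gz      # prints 9x999.999.999.tar.gz
```

The resulting name can then be used as the output file of the tar
and compress (or gzip) pipeline when the archive is created.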

Testcases associated with PMRs

   ppppp.bbb.ccc.tar.Z
   |     |   |   |   |
   |     |   |   |   Indicates that the archive is compressed.
   |     |   |   |   If you have the gzip command available,
   |     |   |   |   this will be .gz rather than .Z.
   |     |   |   |
   |     |   |   Indicates a tar archive.
   |     |   |
   |     |   The 3 character country code.
   |     |   (May be omitted if the country code is 000.)
   |     |
   |     The 3 character branch office number.
   |
   The 5 character PMR number.

Testcases associated with XMENU items

   xxxxxx.tar.Z
   |      |   |
   |      |   Indicates that the archive is compressed.
   |      |   If you have the gzip command available,
   |      |   this will be .gz rather than .Z.
   |      |
   |      Indicates a tar archive.
   |
   The XMENU item number.


CREATING A COMPRESSED TAR ARCHIVE

Testcases should be archived into a compressed tar archive using
relative path names. The following is an example of how to create a
compressed tar archive.

o  Place the testcase files into a separate directory using commands
   similar to those in the following example. In this example, we'll
   use a directory named /tmp/testcase.

      cd /tmp
      mkdir testcase
      cp file1 file2 file3 testcase

o  Execute one of the following commands to create the compressed
   tar archive. Using the gzip command (if available) will generally
   produce a significantly smaller file.

   With the gzip command:

      tar -cf- testcase | gzip -9 >9x999.999.999.tar.gz

   Without the gzip command:

      tar -cf- testcase | compress >9x999.999.999.tar.Z

The following example will uuencode and mail the compressed tar
archive created in the previous step. You should substitute the
correct file name when sending your testcase.

   uuencode 9x999.999.999.tar.gz 9x999.999.999.tar.gz | \
      mail -s "AIX_Testcase:9x999.999.999.tar.gz" aasc@austin.ibm.com

Sending Via Fax

   512-823-7634 (793-7634 within IBM)

Sending Via IBM Internal (VM)

REMINDER: If you have absolutely no other available electronic
method of sending a testcase, you can send it through VNET.
This method only works for IBM internal customers. It is not
automated, which could cause delays in testcase processing. FTP IS
THE PREFERRED METHOD FOR SENDING TESTCASES.

The testcase should be combined into a single compressed tar archive
and uploaded in BINARY format. Please name the file
PPPPPBBB TARZBIN, where PPPPP is the PMR number and BBB is the branch
office number (without the "b").

   node   = "AUSVMR"
   userid = "V3DEFECT"


READER'S COMMENTS

Please fax this form to (512) 823-4009, attention "AIXServ
Information". You may also e-mail comments to:
elizabet@austin.ibm.com. These comments should include the same
customer information requested below.

Use this form to tell us what you think about this document. If you
have found errors in it, or if you want to express your opinion about
it (such as organization, subject matter, appearance) or make
suggestions for improvement, this is the form to use.

If you need technical assistance, contact your local branch office,
point of sale, or 1-800-CALL-AIX (for information about support
offerings). These services may be billable. Faxes on a variety of
subjects may be ordered free of charge from 1-800-IBM-4FAX. Outside
the U.S. call 415-855-4329 using a fax machine phone.

When you send comments to IBM, you grant IBM a nonexclusive right to
use or distribute your comments in any way it believes appropriate
without incurring any obligation to you.

NOTE: If you have a problem report or item number, supplying that
number may help us determine why a procedure did or did not work in
your specific situation.

Problem Report or Item #:

Branch Office or Customer #:

Be sure to print your name and fax number below if you would like a
reply:

Name:

Fax Number:

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________


END OF DOCUMENT (checkstops.krn)