ITEM: BC9105L
system crashed with 888, it is back up now
Env:
AIX 3.2.5
RISC 990
Desc:
Customer's system crashed with 888-102-300-0c4
The dump was 28 MB, but the dump space is only 20 MB.
Action:
Had the customer run the following:
crash /dev/hd7
stat
t -m
The customer is sending in the output from these commands.
Next Action:
Waiting on email from the customer
Response:
Script started on Wed Dec 13 10:50:26 1995
netdb:/tmp/ibmsupt # crash /dev/hd7
Using /unix as the default namelist file.
Reading in Symbols .........................
Symbol panicstr has null value.
Symbol t_maint has null value.
> stat
Not a valid dump data area:0x41495800
0452-338: Cannot read uname structure from address 0x41495800
time of crash: Tue Dec 12 18:46:38 1995
Not a valid dump data area:0x383affe3
age of system:
> t -m
MST STACK TRACE:
00219db0 (excpt=00000000:00000000:00000000:00000000:00000000) (intpri=0)
IAR: 0000559c .i_enable + 9c: bcr cr20,0
*LR: 00021b14 .i_offlevel + 7c
*00058e68: 00000000 \
2ff98000 (excpt=60a60018:40000000:007fffff:60a60018:00000106) (intpri=b)
IAR: 0149bfc4 [TRL.ext:trace_cmd] + 7854: lsi r5,r3,0x4
*LR: 01499484 [TRL.ext:llcheck] + 4d14
*2ff7fd90: 00000000 \
2ff7fde0: 014a6314 [TRL.ext:test_grpaddr] + 11ba4
2ff7fe40: 014a616c [TRL.ext:rcv_completion] + 119fc
2ff7fef0: 014a5aac [TRL.ext:link_manager] + 1133c
2ff7ff80: 014a4df4 [TRL.ext:init_proc] + 10684
2ff7ffc0: 000330f0 .procentry + 14
00000000: 00000000 \
Output from errpt:
ERROR LABEL: DSI_PROC
ERROR ID: 20FAED7F
Date/Time: Tue Dec 12 18:46:38
Sequence Number: 245053
Machine Id: 000003328000
Node Id: netdb
Class: S
Type: PERM
Resource Name: SYSVMM
Error Description
Data Storage Interrupt, Processor
Probable Causes
SOFTWARE PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
Data Storage Interrupt Status Register
4000 0000
Data Storage Interrupt Address Register
007F FFFF
Segment Register, SEGREG
60A6 0018
EXVAL
0000 000E
Response:
Found an exact hit on this. It is a known bug; the APAR that fixes it is IX41199.
System crashes when packet is received on sap 0xfc and user has
not opened sap.
PROBLEM CONCLUSION:
test_grpaddr() is no longer called when a discovery packet is
received, because a discovery packet is a special case and does not
require the sap to be explicitly enabled. The group address is already
handled within the discovery sap logic. Also added a check in
test_grpaddr() to verify that the sap exists before handling the packet.
This covers the exposure where the adapter places a packet on the dlc ring
queue and the sap is disabled before the packet is processed.
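The second part of that fix, verifying that the sap exists before the packet is handled, can be pictured with a minimal C sketch. This is not the TRL.ext source. The table layout and the names `sap_entry`, `sap_table`, `sap_lookup`, and `handle_group_packet` are hypothetical and exist only for illustration; the one idea taken from the APAR text is "look the sap up first, and drop the packet if the sap is not open".

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SAPS 128  /* hypothetical table size: one slot per even sap value */

/* Hypothetical per-sap state; the real DLC structures are not shown in this item. */
struct sap_entry {
    int      open;        /* non-zero once the user has opened the sap */
    uint32_t group_addr;  /* group address registered for this sap */
};

/* Hypothetical sap table; a NULL slot means the sap was never opened. */
static struct sap_entry *sap_table[MAX_SAPS];

/* Return the entry for a sap, or NULL if the sap does not exist or is closed. */
static struct sap_entry *sap_lookup(uint8_t sap)
{
    /* The low-order bit of a DSAP is the individual/group flag, so drop it. */
    struct sap_entry *e = sap_table[sap >> 1];
    return (e != NULL && e->open) ? e : NULL;
}

/* Return 1 if the packet matches an open sap's group address, 0 if dropped. */
static int handle_group_packet(uint8_t dsap, uint32_t group_addr)
{
    struct sap_entry *e = sap_lookup(dsap);

    /* The guard: without it, a packet for an unopened sap would
     * dereference a NULL or stale pointer. */
    if (e == NULL)
        return 0;

    return e->group_addr == group_addr;
}
```

In this sketch a packet for sap 0xFC is simply dropped until something has opened that sap, and it is dropped again if the sap is closed while the packet is still queued.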
I explained the above to the client. He would like a more
detailed explanation of why his system crashed, why it
is happening now, and what in the source is being fixed.
DLCs support two kinds of resolution: name discovery and address
resolution. The application defines which of these is to be used
to find the remote. Name discovery uses a special protocol over the
FC sap. A find packet and a found packet are exchanged to determine
the address of the remote system.
Address resolution uses the TEST command sent on the NULL sap as a
broadcast message.
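As a rough illustration of how the two resolution methods differ on the wire, the following C sketch sorts an incoming LLC frame by its destination sap and control field. The function and enum names are hypothetical. The 0xFC sap and the NULL sap come from the text above; the TEST control value 0xE3 and the poll/final bit 0x10 are taken from IEEE 802.2 and are not stated in this item.

```c
#include <stdint.h>

#define SAP_DISCOVERY 0xFC  /* name discovery sap, from the text */
#define SAP_NULL      0x00  /* NULL sap used by address resolution */
#define LLC_TEST      0xE3  /* IEEE 802.2 TEST control field, poll/final bit clear */
#define LLC_PF_BIT    0x10  /* poll/final bit in an unnumbered control field */

enum resolution_kind {
    RES_NONE,               /* not a resolution frame */
    RES_NAME_DISCOVERY,     /* find/found protocol on the FC sap */
    RES_ADDRESS_RESOLUTION  /* TEST command on the NULL sap */
};

/* Classify a received frame from its DSAP and LLC control byte. */
static enum resolution_kind classify_frame(uint8_t dsap, uint8_t control)
{
    uint8_t sap = dsap & 0xFE;  /* ignore the individual/group bit */

    if (sap == SAP_DISCOVERY)
        return RES_NAME_DISCOVERY;

    /* Mask the poll/final bit so both TEST variants match. */
    if (sap == SAP_NULL && (control & ~LLC_PF_BIT) == LLC_TEST)
        return RES_ADDRESS_RESOLUTION;

    return RES_NONE;
}
```

A system that only uses address resolution would expect to see the second case and never the first, which is the situation the APAR addresses.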
APAR IX41199 fixes a problem that occurs when a discovery packet is sent
to the FC sap but is unexpected (the system is not using name discovery).
The probable cause is some other system erroneously using name
discovery instead of address resolution.
There apparently is no longer any external documentation that
describes the 0xFC sap being used for name discovery. This was a
limited functionality for SNA which was later removed. Because
the RS/6000 had to interoperate with the AIX RT, the function was
retained -- although it is not the default and will be removed in the
future.
New functionality was added to support OS/2 RIPL which also uses the
0xFC sap. This also is not documented externally (non-AIX pubs).
This assumes that the Functional Address will be the LS/X RIPL
functional address = 0xC00040000000.
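For anyone inspecting a sniffer trace, a minimal C sketch of matching a frame's destination address against that functional address might look like this. The names `RIPL_FUNCTIONAL_ADDR` and `is_ripl_functional_addr` are hypothetical, and the sketch assumes the six bytes are compared in the order they are written above, most significant byte first.

```c
#include <stdint.h>
#include <string.h>

/* LS/X RIPL functional address quoted in the text: 0xC00040000000. */
static const uint8_t RIPL_FUNCTIONAL_ADDR[6] = {
    0xC0, 0x00, 0x40, 0x00, 0x00, 0x00
};

/* Return 1 if the 6-byte destination address equals the RIPL functional address. */
static int is_ripl_functional_addr(const uint8_t dest[6])
{
    return memcmp(dest, RIPL_FUNCTIONAL_ADDR, sizeof RIPL_FUNCTIONAL_ADDR) == 0;
}
```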
Without an attachment or link trace (or a network sniffer trace), I
cannot tell what packet was received.
Response:
Provided this information to the customer
Closing with customer approval
Dated: January 1996 Category: N/A