The Diagnostic Controller loads the kernel extension for each device that requires one. The extension is specified by the PDiagRes->KernExt ODM stanza for the device. If you are using DIAGEX or PDIAGEX, a trace hook is built in for debugging purposes.
To use this trace hook, first make sure that the trace command is installed. This command is part of the bos.sysmgt.trace fileset.
To run trace, perform the following:
trace -j 355                  // Invoke trace
> trcon                       // Start trace
> !diag -d "device_name"      // Run diagnostics against the device
> trcoff                      // Stop trace
> quit                        // Quit
To generate a trace file, perform the following:
trcrpt -o /tmp/diagex.trc
This trace file will contain all the steps performed by the diagnostic kernel extension. To understand the tags, you must use the source code.
To run trace in the background, enter:
trace -a -j 355 -L < length of file > -o < filename >
The -L flag overrides the default trace log file size of 1 MB with the value stated. Specifying a file size of zero sets the trace log file size to the default size. The -o flag outputs trace data to a specific trace log file.
To generate a trace file, enter:
trcrpt < filename > < output filename >
To stop a trace, enter:
trcstop
From the Kernel Debugger, you can examine a control structure that holds the address of the trace buffer and of the first device handle. For DIAGEX, this structure is diag_cntl. For PDIAGEX, it is pdiag_cntl. Use the map command to get the address of the structure.
For instance, for PDIAGEX:
>0> map pdiag_cntl
pdiag_cntl:0x0123F220, type:CSECT Definition
>0> d 123F220 100
0123F220 FFFFFFFF FFFFFFFF 05C8A400 00000764 |...............d|
0123F230 64677874 72616365 544F5021 21212100 |dgxtraceTOP!!!!.|
0123F240 72775F61 00000004 00000000 00000000 |rw_a............|
0123F250 67697042 00000018 00000004 2FF3B270 |gipB......../..p|
0123F260 67697064 000000C0 00000000 3D7FF018 |gipd........=...|
0123F270 67697045 00000000 3D7FF018 00000000 |gipE....=.......|
0123F280 72775F62 00000001 00000001 00000001 |rw_b............|
0123F290 72775F45 00000000 00000000 00000000 |rw_E............|
0123F2A0 52656445 00000000 20001111 00000000 |RedE.... .......|
0123F2B0 57727442 05C8A200 00000004 00000014 |WrtB............|
0123F2C0 5772742B 14000000 00000001 00000001 |Wrt+............|
0123F2D0 5772742B 00000001 0000007B 00000000 |Wrt+.......{....|
0123F2E0 72775F42 05C8A200 00000001 00000001 |rw_B............|
0123F2F0 72775F2B 00000004 00000014 00000001 |rw_+............|
0123F300 72775F2B 0000007B 14000000 00000001 |rw_+...{........|
0123F310 66685F42 05C8A200 05C8A200 00000000 |fh_B............|
>0> find dgxtraceCUR 123F220
01240FC0 64677874 72616365 43555221 21212121 |dgxtraceCUR!!!!!|
The dgxtraceCUR marker shows the current position in the trace buffer. Work backwards from this address to see which events have taken place up to this point.
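The dump above suggests that each trace entry occupies one 16-byte row: a four-character ASCII tag followed by three data words. Under that assumption (it is inferred from the dump layout, not from the DIAGEX source), the following Python sketch decodes copied dump lines into tag and data, and lists them newest first. The function names parse_dump and decode_entries are hypothetical helpers, not part of any AIX tool.

```python
import re

# Matches one debugger dump row: an address followed by four 32-bit words.
ROW = re.compile(r"\s*([0-9A-Fa-f]{8})((?:\s+[0-9A-Fa-f]{8}){4})")

def parse_dump(text):
    """Return a list of (address, [w0, w1, w2, w3]) for each dump row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            words = [int(w, 16) for w in m.group(2).split()]
            rows.append((int(m.group(1), 16), words))
    return rows

def decode_entries(rows):
    """Assumption: word 0 of each row is an ASCII tag, words 1-3 are data."""
    entries = []
    for addr, words in rows:
        tag = words[0].to_bytes(4, "big").decode("ascii", errors="replace")
        entries.append((addr, tag, words[1:]))
    return entries

# A few rows copied from the pdiag_cntl dump shown earlier.
SAMPLE = """\
0123F240 72775F61 00000004 00000000 00000000 |rw_a............|
0123F250 67697042 00000018 00000004 2FF3B270 |gipB......../..p|
0123F2B0 57727442 05C8A200 00000004 00000014 |WrtB............|
"""

if __name__ == "__main__":
    # Print the newest entry first, mirroring "work backwards".
    for addr, tag, data in reversed(decode_entries(parse_dump(SAMPLE))):
        print(f"{addr:08X} {tag} " + " ".join(f"{w:08X}" for w in data))
```

The meaning of each tag and of its data words still has to be looked up in the source code; the sketch only makes the buffer easier to read.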
>0> d 05C8A400 100
05C8A400 00000000 012438B8 00040040 0000000D |.....$8....@....|
05C8A410 00000003 000000C0 0000002C 00000000 |...........,....|
05C8A420 011759FC 05F1D000 00000000 60054335 |..Y.........`.C5|
05C8A430 00000000 00000000 00000070 000000C0    70 is slot#, C0 is bus id#
05C8A440 00000004 007FF800 00000100 00000000    4 is bus type, 7ff800 is io
05C8A450 00000100 00000000 00000000 00000000    address of the bus
The 8th word is a pointer to the next device in the linked list. In this case the 8th word is 00000000, indicating this is the only device.
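As a rough illustration of reading a device handle, the Python sketch below picks out the fields called out in the annotated dump and follows the next pointer in the 8th word. The word positions for slot, bus id, bus type, and bus I/O address are assumptions taken from the annotations in this one dump, not from a structure definition, and read_words is a hypothetical callback standing in for a manual d command in the debugger.

```python
# Word positions (0-based) within a device handle.
NEXT_INDEX = 7       # 8th word: pointer to the next handle (stated in the text)
SLOT_INDEX = 14      # assumption: taken from the annotated dump
BUS_ID_INDEX = 15    # assumption: taken from the annotated dump
BUS_TYPE_INDEX = 16  # assumption: taken from the annotated dump
BUS_IO_INDEX = 17    # assumption: taken from the annotated dump

def summarize_handle(words):
    """Return the annotated fields of one device handle as a dict."""
    return {
        "next": words[NEXT_INDEX],
        "slot": words[SLOT_INDEX],
        "bus_id": words[BUS_ID_INDEX],
        "bus_type": words[BUS_TYPE_INDEX],
        "bus_io": words[BUS_IO_INDEX],
    }

def walk_handles(read_words, first_addr, limit=64):
    """Yield (address, summary) for each handle until the next pointer is 0.

    read_words(addr) is a hypothetical callback returning the handle's words.
    limit guards against a corrupted list that loops back on itself.
    """
    addr = first_addr
    for _ in range(limit):
        if addr == 0:
            return
        summary = summarize_handle(read_words(addr))
        yield addr, summary
        addr = summary["next"]

# The 24 words of the handle dumped at 05C8A400 above.
HANDLE_WORDS = [
    0x00000000, 0x012438B8, 0x00040040, 0x0000000D,
    0x00000003, 0x000000C0, 0x0000002C, 0x00000000,
    0x011759FC, 0x05F1D000, 0x00000000, 0x60054335,
    0x00000000, 0x00000000, 0x00000070, 0x000000C0,
    0x00000004, 0x007FF800, 0x00000100, 0x00000000,
    0x00000100, 0x00000000, 0x00000000, 0x00000000,
]

if __name__ == "__main__":
    for addr, info in walk_handles(lambda a: HANDLE_WORDS, 0x05C8A400):
        print(f"{addr:08X}", {k: hex(v) for k, v in info.items()})
```

With the sample data the walk stops after one handle, because the 8th word is zero.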
In some instances, an Illegal Trap Instruction may occur if an application unloads its SLIH or kernel extension without first unpinning its memory. This can also happen if the Diagnostic Kernel Extension close routine is not called on exit.
If this happens when the debugger is enabled, a screen similar to the following may appear. The appearance of ff_free in the dump is the indicator that an application did not unpin some code before unloading.
The address passed to ff_free is in r29 or r30. Use the screen (s) command on that address and page forward until you see a familiar function name. In the following example, the SLIH mps_interrupt was indicated.
GPR0  00000000 2FF3B188 00192DF0 00000016 007FFFFF C0000000 00009030 2FF3B400
GPR8  00000000 00000000 00000000 00000010 0014032C DEADBEEF DEADBEEF DEADBEEF
GPR16 DEADBEEF DEADBEEF 200004B0 DEADBEEF DEADBEEF DEADBEEF 2FF3B2C0 00000000
GPR24 00000000 00161BF8 C0000420 03762428 0015FF40 01A1C5A0 01A1C5A8 0015FF40
MSR 00029030 CR 44224828 LR 0014032C CTR 000908A8 MQ 00000000
XER 00000000 SRR0 00140334 SRR1 00029030 DSISR 40000000 DAR 00000000
IAR 00140334 (ORG+00140334) ORG=00000000 Mode: VIRTUAL
00140330 5400D97E 0C800000 387F0000 4BECADC5 |T..~....8...K...| | tweqi r0,0x0
00140340 81810058 30210050 7D8803A6 BBA1FFF4 |...X0!.P}.......| |
00140330 5400D97E 0C800000 387F0000 4BECADC5 |T..~....8...K...|
00140340 81810058 30210050 7D8803A6 BBA1FFF4 |...X0!.P}.......|
00140350 4E800020 00000000 00002041 80030100 |N.. ...... A....|
00140360 00000000 00000174 00076666 5F667265 |.......t..ff_fre|
00140370 65000000 80E20328 BF81FFF0 7C0802A6 |e......(....|...|
00140380 2C070000 90010008 9421FFB0 3B830000 |,........!..;...|
00140390 41820050 80E201E8 38640000 83810040 |A..P....8d.....@|
Illegal Trap Instruction Interrupt in Kernel
>0>
>0> s 1A1C5a0 100
GPR0  00000000 2FF3B188 00192DF0 00000016 007FFFFF C0000000 00009030 2FF3B400
GPR8  00000000 00000000 00000000 00000010 0014032C DEADBEEF DEADBEEF DEADBEEF
GPR16 DEADBEEF DEADBEEF 200004B0 DEADBEEF DEADBEEF DEADBEEF 2FF3B2C0 00000000
GPR24 00000000 00161BF8 C0000420 03762428 0015FF40 01A1C5A0 01A1C5A8 0015FF40
MSR 00029030 CR 44224828 LR 0014032C CTR 000908A8 MQ 00000000
XER 00000000 SRR0 00140334 SRR1 00029030 DSISR 40000000 DAR 00000000
IAR 00140334 (ORG+00140334) ORG=00000000 Mode: VIRTUAL
00140330 5400D97E 0C800000 387F0000 4BECADC5 |T..~....8...K...| | tweqi r0,0x0
00140340 81810058 30210050 7D8803A6 BBA1FFF4 |...X0!.P}.......| |
01A1C5A0 01A29850 0000A518 01DF0004 325E9F94 |...P........2^..|
01A1C5B0 00000000 00000000 00481007 010B0001 |.........H......|
01A1C5C0 00000BF0 0000010C 00000000 000000E4 |................|
01A1C5D0 00000000 00000000 000000F0 00020001 |................|
01A1C5E0 00020002 00040003 00020003 314C0000 |............1L..|
01A1C5F0 00000000 00000000 00000000 00000000 |................|
01A1C600 00000000 2E746578 74000000 00000000 |.....text.......|
>0> (press Enter several times)
GPR0  00000000 2FF3B188 00192DF0 00000016 007FFFFF C0000000 00009030 2FF3B400
GPR8  00000000 00000000 00000000 00000010 0014032C DEADBEEF DEADBEEF DEADBEEF
GPR16 DEADBEEF DEADBEEF 200004B0 DEADBEEF DEADBEEF DEADBEEF 2FF3B2C0 00000000
GPR24 00000000 00161BF8 C0000420 03762428 0015FF40 01A1C5A0 01A1C5A8 0015FF40
MSR 00029030 CR 44224828 LR 0014032C CTR 000908A8 MQ 00000000
XER 00000000 SRR0 00140334 SRR1 00029030 DSISR 40000000 DAR 00000000
IAR 00140334 (ORG+00140334) ORG=00000000 Mode: VIRTUAL
00140330 5400D97E 0C800000 387F0000 4BECADC5 |T..~....8...K...| | tweqi r0,0x0
00140340 81810058 30210050 7D8803A6 BBA1FFF4 |...X0!.P}.......| |
01A1CDF0 41820010 306300CC 48000479 80410014 |A...0c..H..y.A..|
01A1CE00 38600000 4800000C 3860FFFF 48000004 |8`..H...8`..H...|
01A1CE10 80010088 7C0803A6 30210080 BBC1FFF8 |....|...0!......|
01A1CE20 4E800020 00000000 00002041 80020201 |N.. ...... A....|
01A1CE30 00000000 00000780 000D6D70 735F696E |..........mps_in|
01A1CE40 74657272 75707400 00000000 BDA1FFB4 |terrupt.........|
01A1CE50 80A20004 39C30000 80650060 7C0802A6 |....9....e.`|...|
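Function names such as ff_free and mps_interrupt often straddle two dump rows, which makes them easy to miss in the ASCII column. If the screen output has been captured to a file, a small script can join the rows and list identifier-like strings. The Python sketch below is one way to do that; dump_bytes and symbol_candidates are hypothetical helper names, and the minimum length of four characters is an arbitrary choice to filter out noise.

```python
import re

# One dump row: an address followed by four 32-bit words in hex.
ROW = re.compile(r"\s*[0-9A-Fa-f]{8}((?:\s+[0-9A-Fa-f]{8}){4})")

# Identifier-like run of at least four characters.
NAME = re.compile(rb"[A-Za-z_][A-Za-z0-9_]{3,}")

def dump_bytes(text):
    """Concatenate the data bytes of consecutive dump rows."""
    data = bytearray()
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            for word in m.group(1).split():
                data += bytes.fromhex(word)
    return bytes(data)

def symbol_candidates(data):
    """Return identifier-like ASCII strings found in the bytes."""
    return [m.group().decode("ascii") for m in NAME.finditer(data)]

# Rows copied from the last screen above.
SAMPLE = """\
01A1CE20 4E800020 00000000 00002041 80020201 |N.. ...... A....|
01A1CE30 00000000 00000780 000D6D70 735F696E |..........mps_in|
01A1CE40 74657272 75707400 00000000 BDA1FFB4 |terrupt.........|
"""

if __name__ == "__main__":
    print(symbol_candidates(dump_bytes(SAMPLE)))
```

The rows must be contiguous in memory for a name that spans two rows to be joined correctly.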