
Understanding the Diagnostic Subsystem for AIX

Debugging Hints for Diagnostic Kernel Extension


Starting Trace for Diagnostic Kernel Extension

The Diagnostic Controller loads the kernel extension for each device that requires one, as specified by the PDiagRes->KernExt ODM stanza for the device. DIAGEX and PDIAGEX each have a built-in trace hook for debugging purposes.

To use this trace hook, first make sure that the trace command is installed. This command is part of the bos.sysmgt.trace fileset.

To run trace, perform the following:

trace -j 355            // Invoke trace
> trcon                 // Start trace
> !diag -d "device_name" // Run diagnostics against the device
> trcoff                // Stop trace
> quit                  // Quit

To generate a trace report file, perform the following:

trcrpt -o /tmp/diagex.trc 

This report file contains all the steps performed by the diagnostic kernel extension. To interpret the tags, you must refer to the kernel extension source code.

Running Trace for Diagnostic Kernel Extension in the Background

The Diagnostic Controller loads the kernel extension for each device that requires one, as specified by the PDiagRes->KernExt ODM stanza for the device. DIAGEX and PDIAGEX each have a built-in trace hook for debugging purposes.

To use this trace hook, first make sure that the trace command is installed. This command is part of the bos.sysmgt.trace fileset.

To run trace in the background, enter:

trace -a -j 355 -L < length of file > -o < filename >

The -L flag overrides the default trace log file size of 1 MB with the value stated. Specifying a file size of zero sets the trace log file size to the default size. The -o flag outputs trace data to a specific trace log file.

To generate a trace report file from the trace log file, enter:

trcrpt -o < output filename > < filename >

This report file contains all the steps performed by the diagnostic kernel extension. To interpret the tags, you must refer to the kernel extension source code.

Note
You can only have one trace running at a time.

To stop a trace, enter:

trcstop
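
The background trace sequence above can also be scripted. The following is a minimal sketch in Python and is not part of the diagnostic package. The function names, default file paths, and default log size are hypothetical. It uses only the commands and flags shown in this section, and it assumes that trcrpt accepts the trace log file as its final argument and that the -L value is given in bytes.

```python
import subprocess

DIAGEX_HOOK = "355"  # trace hook ID used in this section


def build_trace_commands(device, log_file, report_file, log_size):
    """Return the command lines, in order, for one background trace run."""
    return [
        # Start the trace in the background, limited to the DIAGEX hook.
        ["trace", "-a", "-j", DIAGEX_HOOK, "-L", str(log_size), "-o", log_file],
        # Run diagnostics against the device while the trace is active.
        ["diag", "-d", device],
        # Stop the trace.
        ["trcstop"],
        # Format the trace log into a report file (argument order is an assumption).
        ["trcrpt", "-o", report_file, log_file],
    ]


def run_trace(device, log_file="/tmp/diagex.log",
              report_file="/tmp/diagex.trc", log_size=2097152):
    """Run the sequence on an AIX system; the trace is stopped even if diag fails."""
    start, diag, stop, report = build_trace_commands(
        device, log_file, report_file, log_size)
    subprocess.run(start, check=True)
    try:
        # diag may be interactive, so it inherits the terminal.
        subprocess.run(diag, check=False)
    finally:
        subprocess.run(stop, check=True)
    subprocess.run(report, check=True)
```

Because only one trace can run at a time, the sketch stops the trace in a finally clause so that a failed diagnostic run does not leave a trace active.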

Finding the Right Address

Note
The following examples are based on a particular debugger. The concepts shown can be applied using the debugger available to you.

In the Kernel Debugger, you can look up a control structure that gives the address of the trace buffer and of the first device handle. For DIAGEX, this structure is diag_cntl. For PDIAGEX, it is pdiag_cntl. Use the map command to get the address of the structure.

For instance, for PDIAGEX:

  1. Use the map command to get the address of the structure:
    >0> map pdiag_cntl
    pdiag_cntl:0x0123F220, type:CSECT Definition
  2. Use that address and display 100 words:
    >0> d 123F220 100
    0123F220   FFFFFFFF FFFFFFFF 05C8A400 00000764   |...............d|
    0123F230   64677874 72616365 544F5021 21212100   |dgxtraceTOP!!!!.|
    0123F240   72775F61 00000004 00000000 00000000   |rw_a............|
    0123F250   67697042 00000018 00000004 2FF3B270   |gipB......../..p|
    0123F260   67697064 000000C0 00000000 3D7FF018   |gipd........=...|
    0123F270   67697045 00000000 3D7FF018 00000000   |gipE....=.......|
    0123F280   72775F62 00000001 00000001 00000001   |rw_b............|
    0123F290   72775F45 00000000 00000000 00000000   |rw_E............|
    0123F2A0   52656445 00000000 20001111 00000000   |RedE.... .......|
    0123F2B0   57727442 05C8A200 00000004 00000014   |WrtB............|
    0123F2C0   5772742B 14000000 00000001 00000001   |Wrt+............|
    0123F2D0   5772742B 00000001 0000007B 00000000   |Wrt+.......{....|
    0123F2E0   72775F42 05C8A200 00000001 00000001   |rw_B............|
    0123F2F0   72775F2B 00000004 00000014 00000001   |rw_+............|
    0123F300   72775F2B 0000007B 14000000 00000001   |rw_+...{........|
    0123F310   66685F42 05C8A200 05C8A200 00000000   |fh_B............|
  3. The current pointer can be found by searching from this point for dgxtraceCUR!:
    >0> find dgxtraceCUR 123F220
    01240FC0   64677874 72616365 43555221 21212121   |dgxtraceCUR!!!!!|

    Work backward from this address to see exactly which events have taken place so far.

  4. To examine a device handle, display 100 words at its address (the third word of the first row displayed in step 2) to see the data associated with the device:
    >0> d 05C8A400 100
    05C8A400   00000000 012438B8 00040040 0000000D   |.....$8....@....|
    05C8A410   00000003 000000C0 0000002C 00000000   |...........,....|
    05C8A420   011759FC 05F1D000 00000000 60054335   |..Y.........`.C5|
    05C8A430   00000000 00000000 00000070 000000C0   70 is the slot number, C0 is the bus ID
    05C8A440   00000004 007FF800 00000100 00000000   4 is the bus type, 7FF800 is the I/O
    05C8A450   00000100 00000000 00000000 00000000   address of the bus

    The 8th word is a pointer to the next device in the linked list. In this case the 8th word is 00000000, indicating this is the only device.
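
If the debugger output is captured to a file, the rows shown in the steps above can be decoded offline. The following is a minimal sketch in Python. The helper names are hypothetical, and the layout it assumes (each 16-byte trace row holds a 4-byte ASCII tag followed by three data words, stored big-endian) is inferred from the example dump, not taken from the DIAGEX source.

```python
import re

# One dump row: an 8-digit hex address followed by four 8-digit hex words.
ROW = re.compile(r"\s*([0-9A-Fa-f]{8})((?:\s+[0-9A-Fa-f]{8}){4})")


def parse_row(line):
    """Return (address, [four words]) for a dump row, or None for other lines."""
    m = ROW.match(line)
    if m is None:
        return None
    return int(m.group(1), 16), [int(w, 16) for w in m.group(2).split()]


def decode_entry(words):
    """Split a row into its ASCII tag and the three words that follow (assumed layout)."""
    tag = words[0].to_bytes(4, "big").decode("ascii", errors="replace")
    return tag, words[1:]


def nth_word(rows, n):
    """Return the n-th word (1-based) across consecutive parsed rows."""
    flat = [w for _, words in rows for w in words]
    return flat[n - 1]


line = "0123F240   72775F61 00000004 00000000 00000000   |rw_a............|"
address, words = parse_row(line)
print(hex(address), decode_entry(words))
```

Applied to the first row of the structure display, nth_word with n set to 3 yields the device handle address. Applied to the first two rows of the device handle display with n set to 8, it yields the pointer to the next device.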

Looking at an Illegal Trap

In some instances, an Illegal Trap Instruction may occur if an application unloads its SLIH or kernel extension without first unpinning its memory. This can also happen if the Diagnostic Kernel Extension close routine is not called on exit.

If this happens when the debugger is enabled, a screen similar to the following may appear. The appearance of ff_free in the dump is the indicator that an application did not unpin some code before unloading.

The address passed to ff_free is in r29 or r30. Use the (s)creen command to page through memory from that address until you see a familiar function name. In the following example, the SLIH mps_interrupt is identified.

  1. Trap Occurs:
    GPR0  00000000 2FF3B188 00192DF0 00000016 007FFFFF C0000000 00009030 2FF3B400
    GPR8  00000000 00000000 00000000 00000010 0014032C DEADBEEF DEADBEEF DEADBEEF
    GPR16 DEADBEEF DEADBEEF 200004B0 DEADBEEF DEADBEEF DEADBEEF 2FF3B2C0 00000000
    GPR24 00000000 00161BF8 C0000420 03762428 0015FF40 01A1C5A0 01A1C5A8 0015FF40
    
    MSR 00029030  CR   44224828  LR   0014032C  CTR   000908A8  MQ   00000000
    XER 00000000  SRR0 00140334  SRR1 00029030  DSISR 40000000  DAR  00000000
    
    IAR 00140334  (ORG+00140334)  ORG=00000000   Mode: VIRTUAL
    00140330   5400D97E 0C800000 387F0000 4BECADC5   |T..~....8...K...|
                        |    tweqi   r0,0x0
    00140340   81810058 30210050 7D8803A6 BBA1FFF4   |...X0!.P}.......|
    
                        |
    00140330   5400D97E 0C800000 387F0000 4BECADC5   |T..~....8...K...|
    00140340   81810058 30210050 7D8803A6 BBA1FFF4   |...X0!.P}.......|
    00140350   4E800020 00000000 00002041 80030100   |N.. ...... A....|
    00140360   00000000 00000174 00076666 5F667265   |.......t..ff_fre|
    00140370   65000000 80E20328 BF81FFF0 7C0802A6   |e......(....|...|
    00140380   2C070000 90010008 9421FFB0 3B830000   |,........!..;...|
    00140390   41820050 80E201E8 38640000 83810040   |A..P....8d.....@|
    
    Illegal Trap Instruction Interrupt in Kernel
    
    >0>
  2. Use (s)creen to display the memory at the address held in R29:
    >0> s 1A1C5a0 100
    GPR0  00000000 2FF3B188 00192DF0 00000016 007FFFFF C0000000 00009030 2FF3B400
    GPR8  00000000 00000000 00000000 00000010 0014032C DEADBEEF DEADBEEF DEADBEEF
    GPR16 DEADBEEF DEADBEEF 200004B0 DEADBEEF DEADBEEF DEADBEEF 2FF3B2C0 00000000
    GPR24 00000000 00161BF8 C0000420 03762428 0015FF40 01A1C5A0 01A1C5A8 0015FF40
    
    MSR 00029030  CR   44224828  LR   0014032C  CTR   000908A8  MQ   00000000
    XER 00000000  SRR0 00140334  SRR1 00029030  DSISR 40000000  DAR  00000000
    
    IAR 00140334  (ORG+00140334)  ORG=00000000   Mode: VIRTUAL
    00140330   5400D97E 0C800000 387F0000 4BECADC5   |T..~....8...K...|
                        |    tweqi   r0,0x0
    00140340   81810058 30210050 7D8803A6 BBA1FFF4   |...X0!.P}.......|
    
               |
    01A1C5A0   01A29850 0000A518 01DF0004 325E9F94   |...P........2^..|
    01A1C5B0   00000000 00000000 00481007 010B0001   |.........H......|
    01A1C5C0   00000BF0 0000010C 00000000 000000E4   |................|
    01A1C5D0   00000000 00000000 000000F0 00020001   |................|
    01A1C5E0   00020002 00040003 00020003 314C0000   |............1L..|
    01A1C5F0   00000000 00000000 00000000 00000000   |................|
    01A1C600   00000000 2E746578 74000000 00000000   |.....text.......|
  3. Press enter until you find a function name:
    >0> (press Enter several times)
    GPR0  00000000 2FF3B188 00192DF0 00000016 007FFFFF C0000000 00009030 2FF3B400
    GPR8  00000000 00000000 00000000 00000010 0014032C DEADBEEF DEADBEEF DEADBEEF
    GPR16 DEADBEEF DEADBEEF 200004B0 DEADBEEF DEADBEEF DEADBEEF 2FF3B2C0 00000000
    GPR24 00000000 00161BF8 C0000420 03762428 0015FF40 01A1C5A0 01A1C5A8 0015FF40
    
    MSR 00029030  CR   44224828  LR   0014032C  CTR   000908A8  MQ   00000000
    XER 00000000  SRR0 00140334  SRR1 00029030  DSISR 40000000  DAR  00000000
    
    IAR 00140334  (ORG+00140334)  ORG=00000000   Mode: VIRTUAL
    00140330   5400D97E 0C800000 387F0000 4BECADC5   |T..~....8...K...|
                        |    tweqi   r0,0x0
    00140340   81810058 30210050 7D8803A6 BBA1FFF4   |...X0!.P}.......|
    
               |
    01A1CDF0   41820010 306300CC 48000479 80410014   |A...0c..H..y.A..|
    01A1CE00   38600000 4800000C 3860FFFF 48000004   |8`..H...8`..H...|
    01A1CE10   80010088 7C0803A6 30210080 BBC1FFF8   |....|...0!......|
    01A1CE20   4E800020 00000000 00002041 80020201   |N.. ...... A....|
    01A1CE30   00000000 00000780 000D6D70 735F696E   |..........mps_in|
    01A1CE40   74657272 75707400 00000000 BDA1FFB4   |terrupt.........|
    01A1CE50   80A20004 39C30000 80650060 7C0802A6   |....9....e.`|...|
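
When the screen output has been saved, the search for a function name can be automated. The following is a minimal sketch in Python. The helper names and the minimum name length of four characters are hypothetical choices, and the sketch assumes the dump rows passed in are contiguous in memory. It rebuilds the bytes from the hex words, not from the ASCII column, so that a name split across two rows is recovered whole.

```python
import re

# One dump row: an 8-digit hex address followed by four 8-digit hex words.
ROW = re.compile(r"\s*[0-9A-Fa-f]{8}((?:\s+[0-9A-Fa-f]{8}){4})")
# An identifier-like run of at least four ASCII characters.
NAME = re.compile(rb"[A-Za-z_][A-Za-z0-9_]{3,}")


def dump_bytes(lines):
    """Concatenate the data bytes of every dump row; other lines are skipped."""
    data = bytearray()
    for line in lines:
        m = ROW.match(line)
        if m:
            for word in m.group(1).split():
                data += bytes.fromhex(word)
    return bytes(data)


def find_names(lines):
    """Return identifier-like strings embedded in the dumped memory."""
    return [m.group().decode("ascii") for m in NAME.finditer(dump_bytes(lines))]


rows = [
    "01A1CE30   00000000 00000780 000D6D70 735F696E   |..........mps_in|",
    "01A1CE40   74657272 75707400 00000000 BDA1FFB4   |terrupt.........|",
]
print(find_names(rows))
```

Register lines such as those beginning with GPR0 or MSR do not match the row pattern and are ignored, so a whole captured screen can be passed in unchanged.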
