Undetermined problems - (NF 8500R - Type 8681)



You are here because you have encountered a condition which could not be corrected using other parts of the service package.
Before proceeding, you should have already dealt with any beep codes, POST error codes, diagnostic error codes, error messages in the system error log, error messages on the front panel or video display, or any other symptoms addressed by other sections of this publication.

This section is structured so that, if a step does not apply to the service situation at hand, you go on to the next step unless otherwise directed.

Notes:

  1.  If you suspect a software mismatch is causing failures (solid or intermittent), be sure to see 'Resolving configuration conflicts'.
  2.  A corrupt CMOS can cause undetermined problems.


Note: If the problem goes away when you remove an adapter from the system, and replacing that adapter does not correct the problem, suspect the I/O board and then the processor board(s).

  1.  Look at the front of the system and see if any amber component fault indicator LEDs are lit.
     The component fault indicator LEDs correspond to memory DIMMs and processors installed in the system.
     A lit LED indicates the failing memory DIMM or processor.
  2.  If no component fault indicator LEDs are lit, proceed to step 6.
  3.  If one or more component fault indicator LEDs are lit, replace the FRU corresponding to one of the lit component fault indicator LEDs.
  4.  Retest the system.
     If it boots properly and all component fault indicator LEDs are unlit, go to step 32.
     If any component fault indicator LEDs are still lit, continue replacing the corresponding FRUs one at a time until all component fault indicator LEDs remain unlit.
  5.  If replacing the FRU(s) identified by the component fault indicator LEDs did not correct the problem, reinstall the original FRU(s) and proceed with step 6.
  6.  Check the LEDs on all the power supplies (see 'Power supply LED errors').
     If the power supply LEDs indicate the power supplies are not working correctly, follow the guidance in 'Power supply LED errors' and correct that problem before proceeding.
     If the LEDs indicate the power supplies are working correctly, do the following:

    1.  If a component has been added, reseat the added component and components around it, and retest.
    2.  If the system has been moved recently, reseat all the components and retest.
    3.  If a PCI adapter or a device attached to a PCI adapter that was previously configured is now missing, suspect FRUs in the following order:

        1) PCI adapter
        2) Device attached to the PCI adapter
        3) I/O board

    4.  If a non-PCI media device that was previously configured is now missing, suspect FRUs in the following order:

        1) Media device
        2) I/O function card
        3) Media signal cable
        4) Media power cable

  7.  Check the system error log by booting to the System Configuration/Setup Utility (if possible) or by accessing it remotely through the System Management Adapter.
     Locate any error messages generated just prior to the system error.
     Investigate the errors in the order they were generated, since the root cause may generate multiple errors.

    Note: A single problem might cause several error messages.
     When this occurs, work to correct the cause of the first error message.
     After the cause of the first error message is corrected, the other error messages usually will not occur the next time you run the test.
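The ordering rule in step 7 (work on the earliest error first) can be illustrated with a short Python sketch. It assumes the log entries have already been copied out as (timestamp, message) pairs; that entry format, the `errors_to_investigate` name, and the sample messages are hypothetical and do not reflect the actual layout of the system error log.

```python
from datetime import datetime

def errors_to_investigate(entries, failure_time):
    """Return log messages recorded at or before the failure, oldest first.

    entries: iterable of (datetime, str) pairs (hypothetical format).
    The first message returned is the one to work on first, because a
    single root cause can produce several later messages.
    """
    prior = [(ts, msg) for ts, msg in entries if ts <= failure_time]
    prior.sort(key=lambda pair: pair[0])
    return [msg for _, msg in prior]

log = [
    (datetime(2000, 1, 1, 10, 5), "second error (hypothetical)"),
    (datetime(2000, 1, 1, 10, 1), "first error (hypothetical)"),
    (datetime(2000, 1, 1, 11, 0), "after the failure (hypothetical)"),
]
print(errors_to_investigate(log, datetime(2000, 1, 1, 10, 30)))
# ['first error (hypothetical)', 'second error (hypothetical)']
```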

  8.  Check the LEDs on the I/O board (see 'I/O board component locations').

    1.  Check Power Good LEDs.
       If they are on, go to step 8.2. If they are not on, reseat the I/O board in the midplane.
       If the Power Good LEDs are still not lit, replace the power control card.
    2.  Check the PCI slot power LEDs. If an adapter is present and its PCI power LED is on, proceed to step 8.3.
       If an adapter is present but its power LED is not on, ensure the hot plug switch is closed, then suspect the following FRUs in this order:

        1) PCI adapter
        2) PCI sense card
        3) I/O board

    3.  If no PCI adapters are installed, proceed to step 8.4. If any PCI adapters are installed, check their PCI slot attention lights (the lights of other slots may be flashing, but focus on the slots with adapters installed).
       If a PCI adapter is present and its attention light is flashing, suspect the following FRUs in this order:

        1) PCI adapter
        2) I/O board
        3) I/O function card

    4.  If no PCI adapters are installed and one or more PCI attention lights are flashing, suspect FRUs in this order:

        1) I/O board
        2) I/O function card

    5.  Power off the computer, wait 30 seconds, and then power on the computer.
    6.  If the system boots at least as far as the IBM logo screen, go to step 16.
       If the system will not boot to the IBM logo screen, consider the following note and continue with step 8.7.

      Note: Minimum partial-boot requirements are:

      •  I/O Board
      •  I/O Function Card
      •  Midplane
      •  Power Control Card
      •  Power Supply (1)
      •  LED card
      •  Processor Daughter Board (1)
      •  Processor Controller Board
      •  Processor (1)
      •  Memory Card (1)
      •  DIMM (Minimum requirement = 1 DIMM of 128 MB or larger)
      •  Video Monitor (Display)

    7.  Reduce the server to the minimum partial-boot configuration (see note).
    8.  Retest the system. Use jumper J11 on the I/O board to force power on since the front panel is disconnected.
    9.  If the system does not boot as far as the IBM logo screen, go to step 10 below (not substep 8.10).
    10.  If the system boots to the IBM logo screen, power off the system, remove jumper J11 from the I/O board, and add components back into the system one at a time, rebooting after each, until the system fails to boot as far as the IBM logo screen.
       This narrows the possibilities of the failing FRU to the last FRU added, or the board it plugs into.

      Note: In the case of multiple FRUs, such as power supplies, processors, processor terminators, processor daughter boards, memory cards, and DIMMs, it is prudent to verify the function of each FRU in the same position or slot before installing multiples of the FRU.
       This way, you are working with known-good FRUs.
       A failure when multiples are installed indicates a failure of the card the multiple FRUs plug into, or of another related FRU, rather than of the FRU you just installed.
       Consider these possibilities:

        1. If all processors work in position A-1 but the system fails with processors in A-1 plus any other slot on processor daughter board A, suspect the processor daughter board followed by the processor controller board.
        2. If all power supplies work in position 1 but fail in slots 2 or 3, suspect the midplane or power control card.
        3. If all memory DIMMs work in memory board A slot J1 but fail in any other slot, suspect the memory board, followed by the processor controller board and the midplane.
        4. If a populated memory card works in position A but fails in position B, suspect the processor controller board followed by the midplane.
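The four possibilities above amount to a lookup from the FRU type that fails only when multiples are installed to the boards to suspect next. A minimal Python sketch, assuming each FRU was first verified alone in its first position; the dictionary keys and the `suspects_when_multiples_fail` name are hypothetical labels, not IBM terminology.

```python
# Boards to suspect, in order, when every FRU of a type works alone in the
# first position but the system fails once another position is populated.
# Keys are illustrative labels for the four cases listed in the note.
SUSPECTS_WHEN_MULTIPLES_FAIL = {
    "processor": ["processor daughter board", "processor controller board"],
    # The note gives no order for these two ("midplane or power control card").
    "power supply": ["midplane", "power control card"],
    "dimm": ["memory board", "processor controller board", "midplane"],
    "memory card": ["processor controller board", "midplane"],
}

def suspects_when_multiples_fail(fru_type):
    """Return the ordered suspect list for a FRU type, or [] if not covered."""
    return SUSPECTS_WHEN_MULTIPLES_FAIL.get(fru_type.lower(), [])

print(suspects_when_multiples_fail("DIMM"))
# ['memory board', 'processor controller board', 'midplane']
```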

       Components should be added in the following order:

        1) Front panel (after which you no longer need to use jumper J11 on the I/O board to force power on)
        2) Power supplies (one at a time)
        3) Processors (one at a time)

      Note: Addition of the 5th processor automatically requires installation of the second processor daughter card and the cache coherency filter DIMMs as well.
       Any of these may be the failing FRU if the system fails at this point.
       If failure occurs when this combination is installed, do the following:

        1. Remove the cache coherency filter DIMMs.
        2. Remove the first processor daughter board with all its processors.
        3. Reboot the system.

       If the system boots to the IBM logo screen, then one or more of the cache coherency filter DIMMs were the failing FRUs. Replace the pair of DIMMs.
       If the system does not boot with this combination of processor and processor daughter card, do the following:

        1. Swap the processor with one from the first processor daughter card.
        2. Reboot the system.

       If the system boots to the IBM logo screen, the processor just removed is the failing FRU. Replace it.
       If the system does not boot to the IBM logo screen, the processor daughter board is the failing FRU. Replace it.

        4) Processor terminator cards (one at a time)
        5) Memory card (maintaining DIMM pairs across the memory cards)
        6) Memory DIMM (maintaining DIMM pairs across the memory cards)
        7) System management adapter
        8) SCSI backplane
        9) Hardfiles (one at a time)
        10) CD-ROM drive
        11) Diskette drive
        12) Fans (one at a time)
        13) PCI card (one at a time)
        14) External device (one at a time)
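The add-back procedure in step 8 is a linear search: install components one at a time in the listed order and stop at the first boot failure. The Python sketch below models it under stated assumptions: `boots_to_logo` is a hypothetical stand-in for the technician powering on and watching for the IBM logo screen, and component names are plain strings chosen for illustration.

```python
def find_first_failure(base_config, add_order, boots_to_logo):
    """Add components one at a time; return the first one after which boot fails.

    base_config:   components already installed (the minimum configuration).
    add_order:     components to add back, in the order the procedure lists.
    boots_to_logo: callable taking the installed list and returning True if
                   the system reaches the IBM logo screen (hypothetical hook).
    Returns None if every component can be added without a failure.
    """
    installed = list(base_config)
    for component in add_order:
        installed.append(component)
        if not boots_to_logo(installed):
            # Suspect this component first, then the board it plugs into.
            return component
    return None

# Simulated example: pretend the second power supply is defective.
def simulated_boot(installed):
    return "power supply 2" not in installed

order = ["front panel", "power supply 2", "processor 2"]
print(find_first_failure(["I/O board", "processor 1"], order, simulated_boot))
# power supply 2
```

Stopping at the first failure keeps the number of reboots proportional to the position of the faulty part in the list, which is why the procedure orders the add-back sequence from the core boards outward.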

  9.  Once the failing FRU is identified, power off the system, replace the failing FRU and go to step 12.
  10.  If the server does not boot as far as the IBM logo screen with the minimum partial-boot configuration, one of the FRUs in the current configuration is faulty.
     Do the following:

    1.  If the system had multiple processors, processor daughter boards, memory cards, DIMMs, or power supplies in its original configuration, swap the removed FRUs with those currently installed in the minimum partial-boot configuration and reboot to see if there is any change in system behavior.
       If the system boots to the IBM logo screen after swapping a FRU, the last FRU removed is the failing FRU.
       Swap FRUs one at a time in this order:

        1) Power supplies
        2) Processors
        3) Processor terminator cards
        4) Processor daughter boards
        5) Memory boards
        6) DIMMs

    2.  Continue with this step until the failing FRU has been identified or all multiple FRUs have been cycled through the minimum partial-boot configuration.
    3.  If the failing FRU has been identified, replace it and go to step 12.
    4.  If all multiple FRUs have been cycled through the minimum partial-boot configuration and the failing FRU has not been identified, go to step 11.
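Step 10 swaps each FRU in the minimum configuration for a removed FRU of the same type from the original configuration, one at a time. A hedged Python sketch of that loop, assuming `installed` and `spares` are simple type-to-list mappings and that `boots_with` is a hypothetical callable standing in for a reboot to the IBM logo screen:

```python
def find_failing_by_swap(swap_order, installed, spares, boots_with):
    """Swap installed FRUs for spares one at a time until the system boots.

    swap_order: FRU types in the order the procedure gives (power supplies first).
    installed:  dict of FRU type -> list of FRUs in the minimum configuration.
    spares:     dict of FRU type -> list of removed FRUs of that type.
    boots_with: callable(dict) -> bool; True if the system reaches the logo.
    Returns the FRU that was swapped out when the boot succeeded, or None.
    """
    for fru_type in swap_order:
        for position, current in enumerate(installed.get(fru_type, [])):
            for spare in spares.get(fru_type, []):
                # Copy the configuration so only one FRU differs per trial.
                trial = {t: list(frus) for t, frus in installed.items()}
                trial[fru_type][position] = spare
                if boots_with(trial):
                    return current  # the FRU just removed is the failing FRU
    return None

# Hypothetical example: processor CPU-1 is the bad part.
installed = {"power supply": ["PS-A"], "processor": ["CPU-1"]}
spares = {"power supply": ["PS-B"], "processor": ["CPU-2"]}
print(find_failing_by_swap(["power supply", "processor"], installed, spares,
                           lambda cfg: "CPU-1" not in cfg["processor"]))
# CPU-1
```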

  11.  One of the FRUs remaining in the configuration is the failing FRU.
     Replace FRUs in the order listed below.
     The FRU which allows the system to boot as far as the IBM logo screen is the failing FRU:

    1.  Memory board (if the original configuration had only one)
    2.  DIMM (if the original configuration had only one)
    3.  Processor (if the original configuration had only one)
    4.  Processor terminator card (if the original configuration had only one)
    5.  Processor daughter board (if the original configuration had only one)
    6.  Processor controller board
    7.  I/O board
    8.  I/O function card
    9.  Power control card
    10.  Midplane
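Step 11 (like the ordered list in step 27) replaces one FRU at a time with a new part until the system boots. As a sketch only, assuming `boots_after_replacing` is a hypothetical callable meaning "install a new part of this FRU and retest", and that the first five entries are skipped when the original configuration had more than one of them:

```python
# (FRU, replaced only if the original configuration had just one of it)
STEP_11_ORDER = [
    ("memory board", True),
    ("DIMM", True),
    ("processor", True),
    ("processor terminator card", True),
    ("processor daughter board", True),
    ("processor controller board", False),
    ("I/O board", False),
    ("I/O function card", False),
    ("power control card", False),
    ("midplane", False),
]

def replace_until_boot(order, had_multiples, boots_after_replacing):
    """Replace FRUs one at a time in order; return the one that fixes boot.

    had_multiples: set of FRU names the original configuration had more than
                   one of (those were already exercised by the swap test).
    boots_after_replacing: callable(fru) -> bool, a stand-in for installing a
                   new part and retesting to the IBM logo screen.
    """
    for fru, only_if_single in order:
        if only_if_single and fru in had_multiples:
            continue
        if boots_after_replacing(fru):
            return fru
    return None

print(replace_until_boot(STEP_11_ORDER, {"DIMM"}, lambda fru: fru == "I/O board"))
# I/O board
```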

  12.  Restore the server to its original configuration.
  13.  Retest the system without external devices attached.
  14.  Attach external devices.
  15.  Verify the system repair and return the system to the customer.
  16.  If the system will boot at least as far as the IBM logo screen, consider the following note and continue with step 17.

    Note: Minimum full-boot requirements are:

  17.  Reduce the system to the minimum full-boot configuration (see note).
  18.  Retest the system.
     Use jumper J11 on the I/O board to force power on since the front panel is disconnected.
  19.  If the system does not boot to the operating system, go to step 21.
     If the system boots to the operating system, shut down the server and add FRUs back into the configuration one at a time until the system will no longer boot to the operating system.
     This narrows the possibilities of the failing FRU to the last FRU added, or the board it plugs into.

    Note: In the case of multiple FRUs, such as power supplies, processors, processor terminators, memory cards, and DIMMs, it is prudent to verify the function of each FRU in the same position or slot before installing multiples of the FRU.
     This way, you are working with known-good FRUs.
     A failure when multiples are installed indicates a failure of the card the multiple FRUs plug into, or of another related FRU, rather than of the FRU you just installed.
     Consider these possibilities:

     1. If all processors work in position A-1 but the system fails with processors in A-1 plus any other slot on processor daughter board A, suspect the processor daughter board followed by the processor controller board.

     2. If all power supplies work in position 1 but fail in slots 2 or 3, suspect the midplane or power control card.

     3. If all memory DIMMs work in memory board A slot J1 but fail in any other slot, suspect the memory board, followed by the processor controller board and the midplane.

     4. If a populated memory card works in position A but fails in position B, suspect the processor controller board followed by the midplane.

     Components should be added in the following order:

    1.  Front panel (after which you no longer need to use jumper J11 on the I/O board to force power on)
    2.  Power supplies (one at a time)
    3.  Processors (one at a time)

      Note: Addition of the 5th processor automatically requires installation of the second processor daughter card and the cache coherency filter DIMMs as well.
       Any of these may be the failing FRU if the system fails at this point.
       If failure occurs when this combination is installed, do the following:

        1. Remove the cache coherency filter DIMMs.
        2. Remove the first processor daughter board with all its processors.
        3. Reboot the system.

       If the system boots to the IBM logo screen, then one or more of the cache coherency filter DIMMs were the failing FRUs. Replace the pair of DIMMs.
       If the system does not boot with this combination of processor and processor daughter card, do the following:

        1. Swap the processor with one from the first processor daughter card.
        2. Reboot the system.

       If the system boots to the IBM logo screen, the processor just removed is the failing FRU. Replace it.
       If the system does not boot to the IBM logo screen, the processor daughter board is the failing FRU. Replace it.

    4.  Processor terminator cards (one at a time)
    5.  Memory card (maintaining DIMM pairs across the memory cards)
    6.  Memory DIMM (maintaining DIMM pairs across the memory cards)
    7.  System management adapter
    8.  PCI card (one at a time)
    9.  External device (one at a time)

  20.  Once the failing FRU is identified, power off the system, replace the failing FRU, and go to step 29.
  21.  If the server boots past the IBM logo, but will not boot to the operating system, do the following:

    1.  Reboot to the configuration/setup utility.
    2.  Use the default configuration.
    3.  Exit the configuration/setup utility and reboot the system.
    4.  After the IBM logo screen disappears, watch the video display to see if the boot device is listed during the SCSI polling sequence.
       A line of text that reads as follows is displayed:

        Press <<Ctrl-A for SCSIselect(TM) Utility>>

       Immediately after you see that line of text, the SCSI ports and devices will be posted to the screen as they are identified.

    5.  Watch the video display to see if the boot hardfile is listed.

  22.  If the boot device is listed, but the system will not boot to the operating system, replace FRUs in the following order:

    1.  Boot device
    2.  I/O function card
    3.  I/O Planar (if your operating system is NT or SCO Unix)

  23.  If the boot device is not listed on the SCSI device screen and the boot device is an internal SCSI drive, move the boot drive to the other internal SCSI bay and reboot.
     If the drive is now listed on the SCSI screen, replace the SCSI backplane and go to step 29.
  24.  Move the SCSI cable from the internal SCSI port to the external SCSI port and reboot the system. If the drive is now listed on the SCSI screen, replace the I/O function card and go to step 29.
  25.  If you still do not see the boot device listed on the SCSI screen, suspect FRUs in the following order:

    1.  Boot hardfile
    2.  SCSI cable
    3.  I/O function card
    4.  SCSI backplane
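Steps 23 through 25 form a small decision sequence for a boot device that is missing from the SCSI device screen. A Python sketch of that sequence, assuming each observation has already been recorded as a boolean; the function and parameter names are hypothetical:

```python
def scsi_not_listed_action(internal_drive, listed_after_bay_move,
                           listed_after_external_port):
    """Map the observations from steps 23-25 to the next service action.

    internal_drive:             boot device is an internal SCSI drive.
    listed_after_bay_move:      drive appears after moving it to the other bay
                                (only meaningful when internal_drive is True).
    listed_after_external_port: drive appears after moving the SCSI cable to
                                the external SCSI port.
    """
    if internal_drive and listed_after_bay_move:
        return ["replace SCSI backplane"]
    if listed_after_external_port:
        return ["replace I/O function card"]
    # Step 25: still not listed, so suspect FRUs in this order.
    return ["boot hardfile", "SCSI cable", "I/O function card", "SCSI backplane"]

print(scsi_not_listed_action(True, False, True))
# ['replace I/O function card']
```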

  26.  If the boot device is displayed on the SCSI screen but you are unable to boot to the operating system, suspect the following:

    1.  Corrupt boot code
    2.  Boot hardfile
    3.  SCSI cable
    4.  I/O function card
    5.  SCSI backplane

  27.  If the operating system begins to load and the system hangs, suspect FRUs in the order listed.
     The FRU replaced which allows the system to boot to the operating system identifies the failing FRU.

    1.  Memory Board
    2.  I/O Board
    3.  I/O function card
    4.  Processor controller board
    5.  Processor daughter card
    6.  Processor
    7.  Processor terminator card
    8.  DIMM
    9.  Midplane
    10.  Power control card
    11.  Power supply

  28.  Power off the server.
  29.  Restore the server to its original configuration.
  30.  Retest the system without external devices attached.
  31.  Attach external devices.
  32.  If memory or processor component fault indicators were lit and you replaced any of those FRUs, boot the system to the Configuration/Setup utility and enable the slots that were automatically disabled when the component fault indicators were activated.
  33.  Verify the system repair and return the system to the customer.

