IBM Books

Diagnosis Guide


Error information

AIX Error Log information

To isolate an adapter or SP Switch error, first view the AIX error log.

The Resource Name (Res Name) in the error log gives you an indication of what resource detected the failure.

Table 16. Resource Name failure indications - SP Switch

Resource name   Indication
Worm            The information was extracted from the SP Switch worm subsystem for SP Switch initialization.
css             Incorrect status was detected by the adapter (css device driver) for SP Switch recovery.
css0            css failed adapter diagnostics for SP Switch adapter recovery.

For a more detailed description, issue the AIX command:

errpt -a [-N resource_name] | more

where the optional resource_name is one of the entries in Table 16.
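When the same report is needed for several resource names, the command line can be assembled programmatically. The following Python sketch only builds the argument list and does not run errpt. The helper name errpt_command and the SWITCH_RESOURCE_NAMES tuple are illustrative assumptions, not part of AIX or PSSP, and the pager (| more) is left to the caller.

```python
import shlex

# Resource names from Table 16. Restricting to this list is a choice made
# for this sketch; errpt itself is not limited to these names.
SWITCH_RESOURCE_NAMES = ("Worm", "css", "css0")


def errpt_command(resource_name=None):
    """Return the argument list for a detailed error report (errpt -a)."""
    args = ["errpt", "-a"]
    if resource_name is not None:
        if resource_name not in SWITCH_RESOURCE_NAMES:
            raise ValueError(f"not a Table 16 resource name: {resource_name!r}")
        args += ["-N", resource_name]
    return args


if __name__ == "__main__":
    # Print one command line for the unfiltered report and one per resource name.
    for name in (None,) + SWITCH_RESOURCE_NAMES:
        print(shlex.join(errpt_command(name)))
```

On an AIX node, the returned list could be passed to subprocess.run to capture the report text.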

An error reported by the fault service daemon may be related to a more detailed message, or to a previous related error message saved in the stack file. To view these linked errors, issue the FFDC fcreport command. The other trace files, in particular the flt file, may give more information about the problem, as may neighboring error log entries. For more information on FFDC services and tools, see First Failure Data Capture Programming Guide and Reference.

There are several subcomponents that write entries in the AIX error log:

  1. TB3, TB3MX, or TB3PCI adapter errors, which are recognized by labels with a prefix of: TB3_.

    Table 17. Possible causes of adapter failures - SP Switch

    Label and Error ID Error description and analysis
    TB3_SLIH_ER

    BD3BFA74

    Explanation: SP Switch adapter interrupt handler error.

    Cause: SP Switch adapter or SP Switch failure.

    Action:

    • Search for more logged information.
    • Run adapter diagnostics.
    • Call IBM Hardware Service if the problem persists.

    TB3_SVC_QUE_FULL_ER

    9311420B

    Explanation: SP Switch adapter service interface overrun.

    Cause: SP Switch adapter or SP Switch failure.

    Action: Call IBM Hardware Service if the problem persists.

    TB3_CONFIG1_ER

    A4BFBE61

    Explanation: Failed to update the ODM during CSS configuration.

    Cause: Software error

    Action: Run configuration method with the verbose option. See Adapter configuration error information.

    TB3_PIO_ER

    4E93C6F0

    Explanation: An I/O error was received on the SP Switch adapter device driver.

    Cause: SP Switch adapter hardware failure.

    Action: Run adapter diagnostics. Call IBM Hardware Service if the problem persists.

    TB3_HARDWARE_ER

    17B8B70C

    Explanation: SP Switch adapter hardware or microcode error.

    Cause: SP Switch adapter hardware failure or microcode error.

    Action: Run adapter diagnostics. Call IBM Hardware Service if the problem persists.

    TB3_MICROCODE_ER

    0A67508C

    Explanation: SP Switch adapter hardware or microcode error.

    Cause: SP Switch adapter or microcode failure.

    Action: Run adapter diagnostics. Call IBM Hardware Service if the problem persists.

    TB3_LINK_RE

    3A1947F1

    Explanation: SP Switch adapter link outage occurred.

    Cause: The node is fenced.

    Action: Run Eunfence to unfence the node.

    Cause: A cable is loose, disconnected, or faulty.

    Action: Run Cable diagnostics.

    TB3_BAD_PACKET_RE

    1102E5B2

    Explanation: Bad packet received. This entry always appears together with TB3_BAD_PKT_CONT_RE.

    Cause: An SP Switch cable failure.

    Action: Run Cable diagnostics.

    Cause: SP Switch adapter or SP Switch failure.

    Action:

    • Run adapter diagnostics.
    • Call IBM Hardware Service if the problem persists.

    TB3_BAD_PKT_CONT_RE

    448DCF0D

    Explanation: More detailed information about a bad packet received error. This entry completes the bad packet detailed information started in the previous TB3_BAD_PACKET_RE entry.

    TB3_TRANSIENT_RE

    E94651FA

    Explanation: SP Switch adapter transient error.

    Cause: A loose, disconnected, or faulty cable.

    Action: Run Cable diagnostics.

    TB3_FAILURE_CONT_RE

    EBF464AF

    Explanation: More detailed information about an SP Switch adapter transient, link, hardware or microcode error. This entry completes the detailed information started in the previous TB3_TRANSIENT_RE, TB3_LINK_RE, TB3_MICROCODE_ER or TB3_HARDWARE_ER entries.

    TB3_THRESHOLD_ER

    6E31CB46

    Explanation: SP Switch adapter error threshold exceeded. The number of bad packets exceeded the threshold level.

    Cause: A loose, disconnected, or faulty cable.

    Action: Run Cable diagnostics.

  2. TB3MX adapter errors, which are recognized by labels with a prefix of: TB3MX_.

    Table 18. Possible causes of TB3MX specific adapter failures

    Label and Error ID Error Description and Analysis
    TB3MX_ACCESS_ER

    417A4BCB

    Explanation: SP Switch adapter TB3MX user access error. A user process attempted to read or write a no-access region of a switch adapter.

    Cause: User software error.

    Action: Review any user applications that have access to an SP Switch adapter, such as those that issue LAPI, MPI or switch clock API calls.

  3. Switch fault service daemon errors, which are recognized by labels with a prefix of: SP_SW_, TBS_, SP_CSS_, SP_CLCK, or SP_MCLCK.

    Table 19. Possible causes of fault service daemon failures - SP Switch

    Label and Error ID Error description and analysis
    SP_SW_RECV_STATE_RE

    2A095362

    Explanation: SP Switch receiver state machine error.

    Cause: SP Switch adapter or SP Switch failure.

    Action: Call IBM Hardware Service if the problem persists.

    SP_SW_PE_ON_DATA_RE

    EBC5B66E

    Explanation: SP Switch sender parity error on data.

    Cause: An SP Switch board failure.

    Action: Call IBM Hardware Service.

    SP_SW_INVALD_RTE_RE

    BCCF8CCD

    Explanation: SP Switch sender invalid route error.

    Cause: SP Switch adapter microcode or a switch daemon software error.

    Action: Call the IBM Support Center.

    SP_SW_EDC_ERROR_RE

    1A508382

    Explanation: Receiver EDC class error.

    Cause: A transient error in data occurred during transmission over switch links. The EDC error may be one of the following:

    • A receiver EDC error.
    • A parity error on route.
    • An undefined control character was received.
    • Unsolicited data was received.
    • A receiver lost end-of-packet.
    • A token count mismatch.
    • A token sequence error.
    • A token count overflow.

    Cause: A loose, disconnected, or faulty cable.

    Action:

    Cause: A node was shut down, reset, powered off, or disconnected.

    Action: See Verify SP Switch node operation.

    Cause: SP Switch adapter hardware failure.

    Action:

    SP_SW_SNDLOSTEOP_RE

    F9453ADD

    Explanation: SP Switch sender lost EOP (end-of-packet) condition.

    Cause: A loose, disconnected, or faulty cable.

    Action:

    Cause: A node was shut down, reset, powered off, or disconnected.

    Action: See Verify SP Switch node operation.

    Cause: SP Switch adapter hardware failure.

    Action:

    SP_SW_RCVLNKSYNC_RE

    B4C98286

    Explanation: SP Switch receiver link synchronization error or a switch chip lost clock synchronization on one of its receive ports.

    Cause: A loose, disconnected, or faulty cable.

    Action:

    • Check, reconnect, or replace the cable.
    • See /var/adm/SPlogs/css/out.top for cable information.
    • See Cable diagnostics.

    Cause: A node was shut down, reset, powered off, or disconnected.

    Action:

    Cause: SP Switch adapter hardware failure.

    Action:

    • Run adapter diagnostics.
    • See /var/adm/SPlogs/css/flt for more information.

    Cause: Remote switch adapter hardware failure.

    Action:

    • Run adapter diagnostics on remote node.
    • See /var/adm/SPlogs/css/flt to identify remote node.

    SP_SW_EDCTHRSHLD_RE

    E940D28E

    Explanation: SP Switch receiver EDC errors exceeded the threshold level.

    Cause: A loose, disconnected, or faulty cable.

    Action:

    • See Cable diagnostics.
    • See /var/adm/SPlogs/css/flt for more information.
    • Call IBM Hardware Service if the problem persists.

    SP_SW_SND_STATE_RE

    6116CB9D

    Explanation: SP Switch sender state machine error.

    Cause: SP Switch adapter or SP Switch failure.

    Action: Call IBM Hardware Service if the problem persists.

    SP_SW_FIFOOVRFLW_RE

    D06B4832

    Explanation: SP Switch receiver FIFO overflow error.

    Cause: A loose, disconnected, or faulty cable.

    Action:

    Cause: A node was shut down, reset, powered off, or disconnected.

    Action:

    Cause: SP Switch adapter hardware failure.

    Action:

    SP_SW_SNDTKNTHRS_RE

    C948155E

    Explanation: SP Switch sender token errors exceeded the threshold level.

    Cause: A loose, disconnected, or faulty cable.

    Action:

    • See Cable diagnostics.
    • Call IBM Hardware Service if the problem persists.
    • See /var/adm/SPlogs/css/flt for more information.

    SP_SW_PE_ON_NMLL_RE

    9B7A16D3

    Explanation: NMLL Switch central queue parity error.

    Cause: A switch board failure.

    Action: Call IBM Hardware Service.

    SP_SW_PE_ON_NCLL_RE

    2E4C02D8

    Explanation: NCLL Switch central queue parity error.

    Cause: A switch board failure.

    Action: Call IBM Hardware Service.

    SP_SW_NCLL_UNINT_RE

    840FBC5A

    Explanation: NCLL switch central queue was not initialized.

    Cause: A switch board failure.

    Action: Call IBM Hardware Service.

    SP_SW_SNDLNKSYNC_RE

    33164DD2

    Explanation: SP Switch sender link synchronization error. Switch chip lost synchronization clock on one of its send ports.

    Cause: A loose, disconnected, or faulty cable.

    Action:

    • Check, reconnect, or replace the cable.
    • See /var/adm/SPlogs/css/out.top for cable information.
    • See Cable diagnostics.

    Cause: A node was shut down, reset, powered off, or disconnected.

    Action:

    • Replace the switch cable from the powered-off node with a wrap plug.
    • See /var/adm/SPlogs/css/flt for more information.
    • See Verify SP Switch node operation.

    Cause: A switch adapter hardware failure.

    Action:

    Cause: A remote switch adapter hardware failure.

    Action:

    SP_SW_SVC_PKTLEN_RE

    BE19105C

    Explanation: SP Switch service logic encountered an incorrect packet length.

    Cause: SP Switch adapter microcode error or a switch daemon software error.

    Action:

    • See /var/adm/SPlogs/css/flt for more information.
    • Call the IBM Support Center if the problem persists.

    SP_SW_PE_INBFIFO_RE

    8E1C91D6

    Explanation: SP Switch service logic detected a bad parity in FIFO.

    Cause: A switch board failure.

    Action: Call IBM Hardware Service.

    SP_SW_CRC_SVCPKT_RE

    642E0890

    Explanation: SP Switch service logic detected an incorrect CRC on a service packet.

    Cause: A transient error in the data occurred during transmission over the switch links.

    Action: See /var/adm/SPlogs/css/flt for more information.

    Cause: A loose, disconnected, or faulty cable.

    Action:

    Cause: A node was shut down, reset, powered off, or disconnected.

    Action:

    Cause: A switch adapter hardware failure.

    Action:

    SP_SW_PE_RTE_TBL_RE

    5FEAC531

    Explanation: SP Switch service logic determined bad parity in the route table.

    Cause: A switch board failure.

    Action: Call IBM Hardware Service.

    SP_SW_LNK_ENABLE_RE

    732582E8

    Explanation: SP Switch service logic detected an invalid link enable value.

    Cause: A switch daemon software error.

    Action: Call the IBM Support Center.

    SP_SW_SEND_TOD_RE

    28F6B80F

    Explanation: SP Switch service logic send TOD error.

    Cause: A switch daemon software error.

    Action: Call the IBM Support Center.

    SP_SW_SVC_STATE_RE

    E51188C7

    Explanation: SP Switch service logic state machine error.

    Cause: A switch board failure.

    Action: Call IBM Hardware Service if the problem persists.

    SP_SW_OFFLINE_RE

    DC85B51D

    Explanation: Node fence request received.

    Cause: The operator ran the Efence command.

    Action: Run the Eunfence command to bring the node onto the switch.

    SP_SW_PRI_TAKOVR_RE

    80E1023C

    Explanation: SP Switch primary node takeover.

    Cause: The switch primary node became inaccessible.

    Action: See the error log on the previous switch primary node.

    SP_SW_BCKUP_TOVR_RE

    CE1A52F8

    Explanation: SP Switch primary-backup node takeover.

    Cause: The primary backup node became inaccessible.

    Action:

    • See /var/adm/SPlogs/css/out.top for cable information.
    • See the error log on the previous switch primary backup node.

    SP_SW_LST_BUP_CT_RE

    A5D521DE

    Explanation: Primary-backup node not responding.

    Cause: The primary backup node became inaccessible.

    Action: See the error log on the current switch primary backup node.

    SP_SW_UNINI_NODE_RE

    A4653F22

    Explanation: SP Switch nodes that are listed were not initialized during Estart command processing.

    Cause: These nodes were shut down, reset, powered off, or disconnected.

    Action:

    Cause: Nodes fenced.

    Action: Run the Eunfence command.

    Cause: SP Switch adapter problem.

    Action: Run adapter diagnostics.

    SP_SW_UNINI_LINK_RE

    DB5B6D5D

    Explanation: SP Switch links were not initialized during Estart command processing.

    Cause: The switch cable is not wired correctly.

    Action: See /var/adm/SPlogs/css/cable_miswire to determine if cables were not wired correctly.

    Cause: Nodes fenced.

    Action: Run the Eunfence command.

    Cause: A loose, disconnected, or faulty cable.

    Action: See Cable diagnostics.

    SP_SW_SIGTERM_ER

    6E0AA114

    Explanation: SP Switch fault service daemon received SIGTERM.

    Cause: Another process sent a SIGTERM.

    Action: Run the rc.switch command to restart the switch daemon.

    SP_SW_GET_SVCREQ_ER

    1861B4F6

    Explanation: Switch daemon could not get a service request.

    Cause: A switch kernel extension error.

    Action: Call the IBM Support Center.

    SP_PROCESS_KILLD_RE

    DD37D0E5

    Explanation: User process was killed due to a link outage.

    Cause: A switch adapter or switch failure.

    Action: See neighboring error log entries to determine the cause of the outage.

    Cause: The operator fenced this node.

    Action: Eunfence the node.

    SP_SW_MISWIRE_ER

    C6A41256

    Explanation: Switch cable miswired (not connected to the correct switch jack).

    Cause: The switch cable was not wired correctly.

    Action: See /var/adm/SPlogs/css/cable_miswire to determine if cables were not wired correctly.

    SP_SW_LOGFAILURE_RE

    D35D236E

    Explanation: Error writing to switch log files.

    Cause: The /var file system is full.

    Action: Obtain free space in the file system or expand the file system.

    Cause: There are too many files open in the system.

    Action: Reduce the number of open files in the system.

    SP_SW_INIT_FAIL_ER

    071B5E7A

    Explanation: Switch fault service daemon initialization failed.

    Cause: The operating environment could not be established.

    Action:

    • See Detail Data in this error log entry, for the specific failure.
    • Correct the problem and restart the daemon, by issuing the rc.switch command.
    • Call the IBM Support Center if the problem persists.

    SP_SW_SVC_Q_FULL_RE

    42B75697

    Explanation: Switch service send queue is full.

    Cause: There is a traffic backlog on the switch adapter.

    Action: Call the IBM Support Center if the problem persists.

    SP_SW_RSGN_PRIM_RE

    66B1169C

    Explanation: Resigning switch primary node responsibilities.

    Cause: Could not communicate over the switch.

    Action: See neighboring error log entries to determine the cause of the outage.

    Cause: Another node was selected as primary node.

    Action: None.

    SP_SW_RSGN_BKUP_RE

    FDC35FFD

    Explanation: Resigning as the switch primary-backup node.

    Cause: Could not communicate over the switch.

    Action: See neighboring error log entries to determine the cause of the outage.

    Cause: Another node was selected as the primary backup node.

    Action: None.

    SP_SW_ACK_FAILED_RE

    A6524825

    Explanation: Switch fault service daemon acknowledge of a service command failed.

    Cause: A switch communications failure.

    Action: Call the IBM Support Center if the problem persists.

    Cause: A traffic backlog on the switch adapter.

    Action: Call the IBM Support Center if the problem persists.

    SP_SW_SDR_FAIL_RE

    CD9CE5B3

    Explanation: Switch fault service daemon failed to communicate with the SDR.

    Cause: An Ethernet overload.

    Action: Call the IBM Support Center if the problem persists.

    Cause: Excessive SDR traffic.

    Action: Call the IBM Support Center if the problem persists.

    Cause: The SDR daemon or the control workstation is down.

    Action: Check to see if the SDR daemon is up.

    Cause: A software error.

    Action: Call the IBM Support Center if the problem persists.

    SP_SW_SCAN_FAIL_ER

    90AB4ED5

    Explanation: Switch scan failed.

    Cause: Could not communicate over the switch.

    Action: Issue the Estart command if primary takeover does not occur.

    Cause: A switch adapter or switch failure.

    Action: Issue the Estart command if primary takeover does not occur.

    SP_SW_NODEMISW_RE

    F402BA0E

    Explanation: SP Switch node miswired.

    Cause: A switch cable was not plugged into the correct node.

    Action: See /var/adm/SPlogs/css/cable_miswire to determine if cables were not wired correctly.

    SP_SW_RTE_GEN_RE

    7F223C8F

    Explanation: Switch fault service daemon failed to generate routes.

    Cause: A software error.

    Action: Call the IBM Support Center.

    SP_SW_FENCE_FAIL_RE

    5515EE64

    Explanation: Fence of a node failed.

    Cause: Could not communicate over the switch.

    Action:

    • See /var/adm/SPlogs/css/flt for more information.
    • See the error log on the failing node.
    • Issue the Estart command to initialize the switch network.

    SP_SW_REOP_WIN_ER

    5988A5BF

    Explanation: Switch fault service daemon reopen adapter windows failed.

    Cause: A switch kernel extension error.

    Action: Call the IBM Support Center if the problem persists.

    SP_SW_ESTRT_FAIL_RE

    F9A6917F

    Explanation: Estart failed - switch network could not be initialized.

    Cause: Could not initialize switch chips or nodes.

    Action:

    • See Detail Data in this error log entry, for specific failure.
    • Issue Eclock -d to reset the switch network and reestablish switch clocking.
    • Call the IBM Support Center if the problem persists.

    SP_SW_CBCST_FAIL_RE

    275E6902

    Explanation: Switch fault service daemon command broadcast failed.

    Cause: Could not communicate over the switch.

    Action: Call the IBM Support Center if the problem persists.

    Cause: A traffic backlog on the switch adapter.

    Action: Call the IBM Support Center if the problem persists.

    SP_SW_UBCST_FAIL_RE

    04839F9E

    Explanation: Switch fault service daemon DBupdate broadcast failed.

    Cause: A switch communications failure.

    Action: Call the IBM Support Center if the problem persists.

    Cause: A traffic backlog on the switch adapter.

    Action: Call the IBM Support Center if the problem persists.

    SP_SW_DNODE_FAIL_RE

    EAC0A694

    Explanation: Switch fault service daemon failed to communicate with dependent nodes.

    Cause: A failure to communicate over the switch.

    Action: Call the IBM Support Center if the problem persists.

    Cause: A traffic backlog on the switch adapter.

    Action: Call the IBM Support Center if the problem persists.

    SP_SW_IP_RESET_ER

    F323DBDB

    Explanation: Switch fault service daemon could not reset IP.

    Cause: A switch kernel extension error.

    Action: Call the IBM Support Center if the problem persists.

    SP_SW_PORT_STUCK_RE

    A45FB6AB

    Explanation: Switch port cannot be disabled. Eunfence failed.

    Cause:

    • switch-chip or adapter hardware error
    • cable failure

    Action:

    • See /var/adm/SPlogs/css/worm.trace on the primary node for more information.
    • Run adapter diagnostics on the node that failed to unfence.

    SP_SW_CATALOG_OPEN_ST

    7962AB9A

    Explanation: Failed to open the switch catalog file.

    Cause: Too many files are open.

    Action: Rerun the rc.switch script.

    Cause: Catalog file not found.

    Action: Check installation.

    SP_SW_DIST_TOP_RE

    4FFEADB6

    Explanation: Switch topology file distribution failed to the listed nodes during autojoin.

    Action: See more information in file /var/adm/SPlogs/css/dist_topology.log.

    Cause: Ethernet connection lost.

    Action: Check Ethernet cable and configuration.

    Cause: Kerberos security lost.

    Action: Get Kerberos key. See kerberos command.

    Cause: Not enough space left in the node's /etc/SP file system.

    Action: Free more space.

    SP_CLCK_MISS_RE

    0A137C5D

    Explanation: Switch (non-master) lost clock.

    Cause: Switch clock signal is missing.

    Action:

    SP_SW_FSD_TERM_ER

    E8817142

    Explanation: Switch fault service daemon process was terminated.

    Cause: Faulty switch adapter or missing external clock source.

    Action:

    Cause: Faulty system planar.

    Action: Run complete diagnostics on the node. If diagnostics fail to isolate the problem, contact IBM Hardware Service.

    SP_MCLCK_MISS_RE

    7A3D0758

    Explanation: Switch (master oscillator) lost clock.

    Cause:

    • A node or switch board lost power.
    • A switch board failure.
    • A user incorrectly clocked the system.

    Action:

    • See Clock diagnostics.
    • Run the Estart command to initialize the switch network.
    • Call IBM Hardware Service if the problem persists.

    SP_SW_ECLOCK_RE

    9EEBC3C4

    Explanation: Eclock command issued by user.

    Cause: Eclock command run by the administrator.

    Action: Run the Estart command to initialize the switch network.

    SP_SW_ADPT_BUS_ER

    2ABBD2C0

    Explanation: Switch adapter bus signal received.

    Cause: Bus error occurred during adapter access. Adapter configuration or hardware failure.

    Action: Call the IBM Support Center.

    TBS_HARDWARE_ER

    F3448CD2

    Explanation: Switch board hardware error.

    Cause: Switch board failure.

    Action: Call IBM Hardware Service.

    SP_CSS_IF_FAIL_ER

    FCACBBD3

    Explanation: Switch adapter interface system call failed.

    Cause: Could not communicate with the switch adapter.

    Action:

    • Check the switch adapter configuration.
    • Run adapter diagnostics. See SP Switch adapter diagnostics.
    • Call IBM Hardware Service if the problem persists.

  4. Adapter Diagnostic errors, which are recognized by labels with a prefix of: SWT_DIAG_.

    Table 20. Possible causes of adapter diagnostic failures - SP Switch

    Label and Error ID Error description and analysis
    SWT_DIAG_ERROR1_ER

    8998B96D

    Explanation: SP Switch adapter failed post-diagnostics. See the diag command.

    Cause: Faulty switch adapter or missing external clock source.

    Action: See SP Switch adapter diagnostics.

    SWT_DIAG_ERROR2_ER

    2FFF253A

    Explanation: Switch adapter failed pre-diagnostics, failed power-on-self-test diagnostics.

    Cause: Faulty switch adapter or missing external clock source.

    Action: See SP Switch adapter diagnostics.
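The label prefixes named in list items 1 to 4 above can be used to decide which table to consult for a given error log entry. The following Python sketch shows one way to do that lookup. The function name classify_label and the description strings are assumptions made for illustration; only the prefixes and table numbers come from this section.

```python
# Shorthand description for every prefix that list item 3 assigns to the
# switch fault service daemon.
FSD = "switch fault service daemon error (Table 19)"

# Label prefixes taken from list items 1 to 4 above. The descriptions are
# shorthand chosen for this sketch, not official subcomponent names.
PREFIXES = (
    ("TB3MX_", "TB3MX-specific adapter error (Table 18)"),
    ("TB3_", "TB3, TB3MX, or TB3PCI adapter error (Table 17)"),
    ("SP_SW_", FSD),
    ("TBS_", FSD),
    ("SP_CSS_", FSD),
    ("SP_CLCK", FSD),
    ("SP_MCLCK", FSD),
    ("SWT_DIAG_", "adapter diagnostic error (Table 20)"),
)


def classify_label(label):
    """Return a description of the subcomponent that logged label, or None."""
    for prefix, description in PREFIXES:
        if label.startswith(prefix):
            return description
    return None


if __name__ == "__main__":
    for label in ("TB3_LINK_RE", "TB3MX_ACCESS_ER", "SP_MCLCK_MISS_RE",
                  "SWT_DIAG_ERROR1_ER"):
        print(label, "->", classify_label(label))
```

A result of None means the label matches none of the listed prefixes. This is the case for SP_PROCESS_KILLD_RE, which appears in Table 19 but does not begin with any of the prefixes given in item 3, so such labels still need to be looked up in the tables directly.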

Adapter configuration error information

The following table lists the possible values of the adapter_config_status attribute of the switch_responds object of the SDR. Use the following command to determine its value:

SDRGetObjects switch_responds

Use the value of the adapter_config_status attribute for the node in question to look up the explanation and recovery action in Table 21.

Note: The adapter_config_status table that follows uses the phrase "adapter configuration command". This refers to the SP Switch adapter (CSS adapter) configuration method. Use the following syntax to invoke it:

/usr/lpp/ssp/css/cfgtb3 -v -l css0 > output_file_name

Table 21. adapter_config_status values - SP Switch

adapter_config_status Explanation and recovery
css_ready Correctly configured CSS adapter.
odm_fail

genmajor_fail

genminor_fail

getslot_fail

build_dds_fail

Explanation: An ODM failure has occurred while configuring the CSS adapter.

Action: Rerun the adapter configuration command. If the problem persists, contact the IBM Support Center and supply the command output.

lname_error Explanation: The device logical name specified on the CSS adapter configuration command was incorrect.

Action: Rerun the adapter configuration command. If the problem persists, contact the IBM Support Center and supply the command output.

undefine_system_fail

define_system_fail

xilinx_system_fail

Explanation: The system subroutine of the Standard C Library failed during CSS adapter configuration.

Action: Rerun the adapter configuration command. If the problem persists, contact the IBM Support Center and supply the command output.

define_fail Explanation: The current instance of the CSS logical device could not be redefined.

Action: Rerun the adapter configuration command. If the problem persists, contact the IBM Support Center and supply the command output.

chkslot_fail Action: Verify that the CSS adapter is properly seated, then rerun the adapter configuration command. If the problem persists, contact the IBM Support Center and supply the command output.

busresolve_fail Explanation: There are insufficient bus resources to configure the CSS adapter.

Action: Contact the IBM Support Center.

xilinx_load_fail

dd_load_fail

fs_load_fail

Action: See Verify software installation. If software installation verification is successful and the problem persists, contact the IBM Support Center.

make_special_fail Explanation: The CSS device special file could not be created during adapter configuration.

Action: Rerun the adapter configuration command. If the problem persists, contact the IBM Support Center and supply the command output.

dd_config_fail

fs_init_fail

Explanation: An internal device driver error occurred during CSS adapter configuration.

Action: See Information to collect before contacting the IBM Support Center.

diag_fail Explanation: SP Switch diagnostics failed.

Action: See SP Switch adapter diagnostics.

not_configured Explanation: The CSS adapter is missing or not configured.
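Because several status values in Table 21 share the same recovery action, the table condenses naturally into a lookup. The following Python sketch covers an abridged selection of the values. The constant names and the function recovery_action are assumptions made for illustration, and the action strings are shortened from the table text.

```python
# Recovery actions condensed from Table 21. The constant names are
# shorthand for this sketch, not terms used by PSSP.
RERUN = ("Rerun the adapter configuration command. If the problem persists, "
         "contact the IBM Support Center and supply the command output.")
RESEAT = "Verify that the CSS adapter is properly seated, then: " + RERUN
SUPPORT = "Contact the IBM Support Center."
VERIFY_SW = "See Verify software installation."
COLLECT = "See Information to collect before contacting the IBM Support Center."
DIAG = "See SP Switch adapter diagnostics."

# css_ready means the adapter is correctly configured, so no action applies.
ACTIONS = {"css_ready": None}
ACTIONS.update(dict.fromkeys(
    ("odm_fail", "genmajor_fail", "genminor_fail", "getslot_fail",
     "build_dds_fail", "undefine_system_fail", "define_system_fail",
     "xilinx_system_fail", "define_fail", "make_special_fail"), RERUN))
ACTIONS.update(dict.fromkeys(
    ("xilinx_load_fail", "dd_load_fail", "fs_load_fail"), VERIFY_SW))
ACTIONS.update(dict.fromkeys(("dd_config_fail", "fs_init_fail"), COLLECT))
ACTIONS["chkslot_fail"] = RESEAT
ACTIONS["busresolve_fail"] = SUPPORT
ACTIONS["diag_fail"] = DIAG


def recovery_action(status):
    """Return the Table 21 action for status; raises KeyError if not covered."""
    return ACTIONS[status]


if __name__ == "__main__":
    for status in ("css_ready", "odm_fail", "fs_load_fail", "diag_fail"):
        print(status, "->", recovery_action(status))
```

The not_configured value is left out because Table 21 lists no action for it; a KeyError from recovery_action signals that the table itself should be consulted.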

SP Switch device and link error information

The current status of each device and link is gathered in the annotated switch topology file, out.top, which is created on the primary node. The file looks like the switch topology file, except that an additional comment is made for each device or link whose status differs from the operational default.

These additional comments are appended to the file by the fault service daemon and reflect the current connectivity status of the link or device. No comment on a link or device line means that the link or device exists and is operational.

Not all the comments reflect an error. Some may be a result of the system configuration or current administration.

An example of a failing entry and description is in out.top. If the listed recovery actions fail to resolve your problem, contact the IBM Support Center.

The possible device status values for SP Switch systems, with their recovery actions, are listed in Table 22. The possible link status values for SP Switch systems, with their recovery actions, are listed in Table 23.

Table 22. SP Switch device status and recovery actions

Device status number Device status text Explanation and recovery actions
-4 Device has been removed from the network - faulty. Explanation: The device has been removed from the switch network, because of a fault on the device.

Cause: A fault on the device.

Action: If the device in question is a node, see Verify SP Switch node operation. Otherwise contact IBM Hardware Service.

-5 Device has been removed from the network by the system administrator. Explanation: The device was placed offline by the system administrator (Efence).

Cause: The switch administrator ran the Efence command.

Action: Eunfence the device.

-6 Device has been removed from network - no AUTOJOIN. Explanation: The device was removed and isolated from the switch network.

Cause: The node was fenced with the Efence command without AUTOJOIN, the node was rebooted or powered off, or the node faulted.

Action: First attempt to Eunfence the device. If the node fails to rejoin the switch network, see AIX Error Log information. If the problem persists, contact the IBM Support Center.

-7 Device has been removed from network for not responding. Explanation: The device was removed from the switch network.

Cause: An attempt was made to contact the device, but the device did not respond.

Action: If the device in question is a node, see Verify SP Switch node operation. Otherwise contact the IBM Support Center.

-8 Device has been removed from network because of a miswire. Explanation: The device is not cabled properly.

Cause: Either the switch network is miswired, or the frame supervisor tty is not cabled properly.

Action: First view the /var/adm/SPlogs/css/cable_miswire file. Verify and correct all links listed in the file. Then issue the Eclock -d command and rerun the Estart command. If the problem persists, contact IBM Hardware Service.

-9 Destination not reachable. Explanation: The device was not reachable through the switch network.

Cause: This is generally due to other errors in the switch network fabric.

Action: Investigate and correct the other problem, then run the Estart command.


Table 23. SP Switch link status and recovery actions

Link status number Link status text Explanation and recovery actions
0 Link has been removed from network - no AUTOJOIN. Explanation: The device was removed and isolated from the switch network.

Cause: The node was fenced with the Efence command without AUTOJOIN, the node was rebooted or powered off, or the node faulted.

Action: First attempt to Eunfence the device. If the node fails to rejoin the switch network, see AIX Error Log information. If the problem persists, contact the IBM Support Center.

-2 Wrap plug is installed. Explanation: This link is connected to a wrap plug.

Cause: The wrap plug is connected to the port in order to test the port. This is not normally a problem.

Action: None.

-4 Link has been removed from network or miswired - faulty. Explanation: The link is not operational and was removed from the network.

Cause: Either the link is miswired or the link has failed.

Action: First check the /var/adm/SPlogs/css directory for the existence of a cable_miswire file. If the file exists, verify and correct all links listed in the file. Then issue the Eclock -d command and rerun the Estart command.

If the cable_miswire file does not exist, examine the /var/adm/SPlogs/css/flt file for entries relating to this link. If entries are found, verify that the cable is seated at both ends, then run rc.switch on the primary node and rerun the Estart command.

If the problem persists, contact the IBM Support Center.

-7 Link has been removed from network - fenced. Explanation: The device was placed offline by the system administrator (Efence).

Action: Eunfence the associated node.

-8 Link has been removed from network - probable miswire. Explanation: The link is not cabled properly.

Action: View the /var/adm/SPlogs/css/cable_miswire file. Verify and correct all links listed in the file, then run rc.switch on the primary node and rerun the Estart command.

-9 Link has been removed from network - not connected. Explanation: The link cannot be reached by the primary node, therefore initialization of the link is not possible.

Cause: This is generally caused by other problems in the switch network, such as a switch chip being disabled.

Action: Investigate and correct the underlying problem, then run the Estart command.
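The same status number can mean different things for a device and for a link (for example, -7 in Table 22 and Table 23), so a lookup needs to know which kind of line the number came from. The following Python sketch illustrates this with the status text from both tables. It starts from a status number that has already been extracted; how that number appears in out.top is not covered here, and the function name status_text is an assumption made for illustration.

```python
# Status text copied from Table 22 (devices) and Table 23 (links).
DEVICE_STATUS = {
    -4: "Device has been removed from the network - faulty.",
    -5: "Device has been removed from the network by the system administrator.",
    -6: "Device has been removed from network - no AUTOJOIN.",
    -7: "Device has been removed from network for not responding.",
    -8: "Device has been removed from network because of a miswire.",
    -9: "Destination not reachable.",
}

LINK_STATUS = {
    0: "Link has been removed from network - no AUTOJOIN.",
    -2: "Wrap plug is installed.",
    -4: "Link has been removed from network or miswired - faulty.",
    -7: "Link has been removed from network - fenced.",
    -8: "Link has been removed from network - probable miswire.",
    -9: "Link has been removed from network - not connected.",
}

TABLES = {"device": DEVICE_STATUS, "link": LINK_STATUS}


def status_text(kind, number):
    """Return the status text for kind ('device' or 'link') and number."""
    if kind not in TABLES:
        raise ValueError(f"kind must be 'device' or 'link', got {kind!r}")
    # Numbers missing from the tables are reported as such, not guessed.
    return TABLES[kind].get(number, "not listed in Table 22 or Table 23")


if __name__ == "__main__":
    for kind in ("device", "link"):
        print(kind, -7, "->", status_text(kind, -7))
```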

