AIX Version 4.3 Kernel Extensions and Device Support Programming Concepts

SCSI Error Recovery

The SCSI error-recovery process handles different issues depending on whether the SCSI device is in initiator mode or target mode. If the device is in initiator mode, the error-recovery process varies depending on whether the device supports command tag queuing.

SCSI Initiator-Mode Recovery When Not Command Tag Queuing

If an error such as a check condition or hardware failure occurs, transactions queued within the SCSI adapter device driver are terminated abnormally with iodone calls. The transaction active during the error is returned with the sc_buf.bufstruct.b_error field set to EIO. Other transactions in the queue are returned with the sc_buf.bufstruct.b_error field set to ENXIO. The SCSI device driver should process or recover from the condition, rerunning any mode selects or device reservations needed to restore the device state. After this recovery, it should reschedule the transaction that had the error. In many cases, the SCSI device driver only needs to retry the unsuccessful operation.
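
The EIO/ENXIO distinction described above can be sketched as a small classifier that a SCSI device driver might run on each returned transaction. This is a minimal illustration only: mock_buf, mock_sc_buf, the returned_kind enumeration, and classify_returned are hypothetical stand-ins that model just the fields named in the text, and the B_ERROR value is a placeholder rather than the value from the system headers.

```c
#include <errno.h>   /* EIO, ENXIO */

/* Placeholder value for illustration; the real definition lives in the
 * system headers. */
#define B_ERROR 0x0001

/* Simplified stand-ins for the kernel structures (hypothetical). */
struct mock_buf {
    int b_flags;
    int b_error;
};

struct mock_sc_buf {
    struct mock_buf bufstruct;
};

/* Hypothetical classification used only in this sketch. */
enum returned_kind {
    RET_COMPLETED,    /* no error reported */
    RET_FAILED,       /* active during the error: b_error == EIO */
    RET_FLUSHED,      /* queued behind the failure: b_error == ENXIO */
    RET_OTHER_ERROR   /* any other errno value */
};

enum returned_kind classify_returned(const struct mock_sc_buf *scb)
{
    if (!(scb->bufstruct.b_flags & B_ERROR))
        return RET_COMPLETED;

    switch (scb->bufstruct.b_error) {
    case EIO:
        return RET_FAILED;    /* recover device state, then reschedule */
    case ENXIO:
        return RET_FLUSHED;   /* reissue after recovery has started */
    default:
        return RET_OTHER_ERROR;
    }
}
```

A driver following this pattern would typically recover the device state for the RET_FAILED transaction first and only then reissue the RET_FLUSHED ones.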

The SCSI adapter device driver should never retry a SCSI command on error after the command has successfully been given to the adapter. The consequences for retrying a SCSI command at this point range from minimal to catastrophic, depending upon the type of device. Commands for certain devices cannot be retried immediately after a failure (for example, tapes and other sequential access devices). If such an error occurs, the failed command returns an appropriate error status with an iodone call to the SCSI device driver for error recovery. Only the SCSI device driver that originally issued the command knows if the command can be retried on the device. The SCSI adapter device driver must only retry commands that were never successfully transferred to the adapter. In this case, if retries are successful, the sc_buf status should not reflect an error. However, the SCSI adapter device driver should perform error logging on the retried condition.

The first transaction passed to the SCSI adapter device driver during error recovery must include a special flag. This SC_RESUME flag in the sc_buf.flags field must be set to inform the SCSI adapter device driver that the SCSI device driver has recognized the fatal error and is beginning recovery operations. Any transactions passed to the SCSI adapter device driver, after the fatal error occurs and before the SC_RESUME transaction is issued, should be flushed; that is, returned with an error type of ENXIO through an iodone call.

Note: If a SCSI device driver continues to pass transactions to the SCSI adapter device driver after the SCSI adapter device driver has flushed the queue, these transactions are also flushed with an error return of ENXIO through the iodone service. This gives the SCSI device driver a positive indication of all transactions flushed.
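
The SC_RESUME handshake can be modeled as a simple gate on the adapter driver side. The sketch below is an assumption-laden model, not the adapter driver's actual code: mock_dev_queue, its awaiting_resume field, and adapter_accept are hypothetical names, the structures carry only the fields named in the text, and the flag values are placeholders.

```c
#include <errno.h>   /* ENXIO */

/* Placeholder values for illustration; the real definitions live in the
 * system headers. */
#define B_ERROR   0x0001
#define SC_RESUME 0x01

/* Simplified stand-ins for the kernel structures (hypothetical). */
struct mock_buf {
    int b_flags;
    int b_error;
};

struct mock_sc_buf {
    struct mock_buf bufstruct;
    unsigned char flags;
};

/* Hypothetical per-device state kept by the adapter driver model. */
struct mock_dev_queue {
    int awaiting_resume;   /* nonzero after a fatal error was reported */
};

/* Returns 1 if the transaction is accepted for processing, or 0 if it is
 * flushed back to the caller with ENXIO. */
int adapter_accept(struct mock_dev_queue *q, struct mock_sc_buf *scb)
{
    if (q->awaiting_resume) {
        if (!(scb->flags & SC_RESUME)) {
            /* Issued after the error but before recovery began: flush. */
            scb->bufstruct.b_flags |= B_ERROR;
            scb->bufstruct.b_error = ENXIO;
            return 0;
        }
        /* The device driver has acknowledged the error. */
        q->awaiting_resume = 0;
    }
    return 1;
}
```

In this model, every transaction that arrives between the error and the SC_RESUME transaction comes back with ENXIO, which is what gives the device driver a positive indication of everything that was flushed.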

If the SCSI device driver is executing a gathered write operation, the error-recovery information mentioned previously is still valid, but the caller must restore the contents of the sc_buf.resvdw1 field and the uio struct that the field pointed to before attempting the retry. The retry must occur from the SCSI device driver's process level; it cannot be performed from the caller's iodone subroutine. Also, additional return codes of EFAULT and ENOMEM are possible in the sc_buf.bufstruct.b_error field for a gathered write operation.

SCSI Initiator-Mode Recovery During Command Tag Queuing

If the SCSI device driver is queuing multiple transactions to the device and either a check condition error or a command terminated error occurs, the SCSI adapter driver does not clear all transactions in its queues for the device. It returns the failed transaction to the SCSI device driver with an indication that the queue for this device is not cleared by setting the SC_DID_NOT_CLEAR_Q flag in the sc_buf.adap_q_status field. The SCSI adapter driver halts the queue for this device awaiting error recovery notification from the SCSI device driver. The SCSI device driver then has three options to recover from this error:

When the SCSI adapter driver's queue is halted, the SCSI device driver can get sense data from a device by setting the SC_RESUME flag in the sc_buf.flags field and the SC_NO_Q flag in the sc_buf.q_tag_msg field of the request-sense sc_buf. This action notifies the SCSI adapter driver that this is an error-recovery transaction and should be sent to the device while the remainder of the queue for the device remains halted. When the request sense completes, the SCSI device driver needs to either clear or resume the SCSI adapter driver's queue for this device.

The SCSI device driver can notify the SCSI adapter driver to clear its halted queue by sending a transaction with the SC_Q_CLR flag in the sc_buf.flags field. This transaction must not contain a SCSI command because it is cleared from the SCSI adapter driver's queue without being sent to the adapter. However, this transaction must have the SCSI ID field (sc_buf.scsi_command.scsi_id) and the LUN fields (sc_buf.scsi_command.scsi_cmd.lun and sc_buf.lun) filled in with the device's SCSI ID and logical unit number (LUN). If addressing LUNs 8 through 31, the sc_buf.lun field should be set to the logical unit number and the sc_buf.scsi_command.scsi_cmd.lun field should be zeroed out. See the descriptions of these fields for further explanation. Upon receiving an SC_Q_CLR transaction, the SCSI adapter driver flushes all transactions for this device and sets their sc_buf.bufstruct.b_error fields to ENXIO. The SCSI device driver must wait until the sc_buf with the SC_Q_CLR flag set is returned before it resumes issuing transactions. The first transaction sent by the SCSI device driver after it receives the returned SC_Q_CLR transaction must have the SC_RESUME flag set in the sc_buf.flags field.

If the SCSI device driver wants the SCSI adapter driver to resume its halted queue, it must send a transaction with the SC_Q_RESUME flag set in the sc_buf.flags field. This transaction can contain an actual SCSI command, but it is not required. However, this transaction must have the sc_buf.scsi_command.scsi_id, sc_buf.scsi_command.scsi_cmd.lun, and sc_buf.lun fields filled in with the device's SCSI ID and logical unit number. See the descriptions of these fields for further details. If this is the first transaction issued by the SCSI device driver after receiving the error (indicating that the adapter driver's queue is halted), then the SC_RESUME flag must be set as well as the SC_Q_RESUME flag.
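
The three recovery transactions can be sketched as helper routines that fill in an sc_buf-like structure. Everything here is illustrative: the mock_* structures model only the fields named in the text, the prepare_* and fill_address helpers are hypothetical, and all flag values are placeholders. The encoding of the scsi_cmd.lun field for LUNs 0 through 7 is assumed to follow the SCSI-2 command descriptor block layout (LUN in the upper three bits); consult the field descriptions for the authoritative rule.

```c
#include <string.h>   /* memset */

/* Placeholder values for illustration; the real definitions live in the
 * system headers. */
#define SC_RESUME   0x01
#define SC_Q_CLR    0x02
#define SC_Q_RESUME 0x04
#define SC_NO_Q     0x00

#define REQUEST_SENSE_OPCODE 0x03   /* SCSI REQUEST SENSE operation code */

/* Simplified stand-ins for the kernel structures (hypothetical). */
struct mock_sc_cmd {
    unsigned char scsi_op_code;
    unsigned char lun;
};

struct mock_scsi {
    unsigned char scsi_id;
    struct mock_sc_cmd scsi_cmd;
};

struct mock_sc_buf {
    struct mock_scsi scsi_command;
    unsigned char lun;
    unsigned char q_tag_msg;
    unsigned char flags;
};

/* Fill in the SCSI ID and both LUN fields. For LUNs 8 through 31 the
 * command's lun field is zeroed, as the text requires. The shift for
 * LUNs 0 through 7 is an assumption (see the lead-in). */
static void fill_address(struct mock_sc_buf *scb,
                         unsigned char id, unsigned char lun)
{
    scb->scsi_command.scsi_id = id;
    scb->lun = lun;
    scb->scsi_command.scsi_cmd.lun =
        (lun < 8) ? (unsigned char)(lun << 5) : 0;
}

/* Option 1: untagged request sense sent while the queue stays halted. */
void prepare_request_sense(struct mock_sc_buf *scb,
                           unsigned char id, unsigned char lun)
{
    memset(scb, 0, sizeof *scb);
    fill_address(scb, id, lun);
    scb->scsi_command.scsi_cmd.scsi_op_code = REQUEST_SENSE_OPCODE;
    scb->q_tag_msg = SC_NO_Q;
    scb->flags = SC_RESUME;
}

/* Option 2: clear the halted queue. No SCSI command is filled in. */
void prepare_queue_clear(struct mock_sc_buf *scb,
                         unsigned char id, unsigned char lun)
{
    memset(scb, 0, sizeof *scb);
    fill_address(scb, id, lun);
    scb->flags = SC_Q_CLR;
}

/* Option 3: resume the halted queue. SC_RESUME is added when this is the
 * first transaction issued after the error was received. */
void prepare_queue_resume(struct mock_sc_buf *scb, unsigned char id,
                          unsigned char lun, int first_after_error)
{
    memset(scb, 0, sizeof *scb);
    fill_address(scb, id, lun);
    scb->flags = SC_Q_RESUME;
    if (first_after_error)
        scb->flags |= SC_RESUME;
}
```

Keeping the address setup in one helper makes it harder to forget the LUN 8 through 31 rule when building a transaction that carries no SCSI command.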

Analyzing Returned Status

The following order of precedence should be followed by SCSI device drivers when analyzing the returned status:

  1. If the sc_buf.bufstruct.b_flags field has the B_ERROR flag set, then an error has occurred and the sc_buf.bufstruct.b_error field contains a valid errno value.

    If the b_error field contains the ENXIO value, either the command needs to be restarted or it was canceled at the request of the SCSI device driver.

    If the b_error field contains the EIO value, then either one or no flag is set in the sc_buf.status_validity field. If a flag is set, an error in either the scsi_status or general_card_status field is the cause.

    If the status_validity field is 0, then the sc_buf.bufstruct.b_resid field should be examined to see if the SCSI command issued was in error. The b_resid field can have a value without an error having occurred. To decide whether an error has occurred, the SCSI device driver must evaluate this field with regard to the SCSI command being sent and the SCSI device being driven.

    If the SCSI device driver is queuing multiple transactions to the device and if either SC_CHECK_CONDITION or SC_COMMAND_TERMINATED is set in scsi_status, then the value of sc_buf.adap_q_status must be analyzed to determine if the adapter driver has cleared its queue for this device. If the SCSI adapter driver has not cleared its queue after an error, then it holds that queue in a halted state.

    If sc_buf.adap_q_status is set to 0, the SCSI adapter driver has cleared its queue for this device and any transactions outstanding are flushed back to the SCSI device driver with an error of ENXIO.

    If the SC_DID_NOT_CLEAR_Q flag is set in the sc_buf.adap_q_status field, the adapter driver has not cleared its queue for this device. When this condition occurs, the SCSI adapter driver allows the SCSI device driver to send one error recovery transaction (request sense) that has the field sc_buf.q_tag_msg set to SC_NO_Q and the field sc_buf.flags set to SC_RESUME. The SCSI device driver can then notify the SCSI adapter driver to clear or resume its queue for the device by sending an SC_Q_CLR or SC_Q_RESUME transaction.

    If the SCSI device driver does not queue multiple transactions to the device (that is, SC_NO_Q is set in sc_buf.q_tag_msg), then the SCSI adapter driver clears its queue on error and sets sc_buf.adap_q_status to 0.

  2. If the sc_buf.bufstruct.b_flags field does not have the B_ERROR flag set, then no error is being reported. However, the SCSI device driver should examine the b_resid field to check for cases where less data was transferred than expected. For some SCSI commands, this occurrence may not represent an error. The SCSI device driver must determine if an error has occurred.

    If a nonzero b_resid field does represent an error condition, then the device queue is not halted by the SCSI adapter device driver. It is possible for one or more succeeding queued commands to be sent to the adapter (and possibly the device). Recovering from this situation is the responsibility of the SCSI device driver.

  3. In any of the above cases, if the sc_buf.bufstruct.b_flags field has the B_ERROR flag set, then the queue of the device in question has been halted. The first sc_buf structure sent to recover the error (or continue operations) must have the SC_RESUME bit set in the sc_buf.flags field.
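
The order of precedence above can be sketched as a single decision function. This is a simplified model under stated assumptions: the mock_* structures, the next_step enumeration, and analyze_status are hypothetical; B_ERROR, SC_NO_Q, and SC_DID_NOT_CLEAR_Q use placeholder values; and the two status codes are the SCSI-2 status byte values for CHECK CONDITION and COMMAND TERMINATED, compared without any masking of reserved bits. A real driver would also check which status_validity flag is set before trusting scsi_status.

```c
#include <errno.h>   /* EIO, ENXIO */

/* Placeholder values for illustration; the real definitions live in the
 * system headers. */
#define B_ERROR            0x0001
#define SC_NO_Q            0x00
#define SC_DID_NOT_CLEAR_Q 0x01

/* SCSI-2 status byte codes. */
#define SC_CHECK_CONDITION    0x02
#define SC_COMMAND_TERMINATED 0x22

/* Simplified stand-ins for the kernel structures (hypothetical). */
struct mock_buf {
    int b_flags;
    int b_error;
    unsigned int b_resid;
};

struct mock_sc_buf {
    struct mock_buf bufstruct;
    unsigned char status_validity;
    unsigned char scsi_status;
    unsigned char general_card_status;
    unsigned char adap_q_status;
    unsigned char q_tag_msg;
};

/* Hypothetical outcomes used only in this sketch. */
enum next_step {
    STEP_DONE,                /* no error, nothing left over */
    STEP_EVALUATE_RESID,      /* decide whether b_resid means an error */
    STEP_RESTART_OR_CANCELED, /* ENXIO */
    STEP_QUEUE_CLEARED,       /* adapter flushed its queue for the device */
    STEP_QUEUE_HALTED,        /* send request sense, then clear or resume */
    STEP_EXAMINE_STATUS,      /* look at scsi_status/general_card_status */
    STEP_OTHER_ERRNO
};

enum next_step analyze_status(const struct mock_sc_buf *scb)
{
    int queuing = (scb->q_tag_msg != SC_NO_Q);

    /* Case 2: no error reported; only the residual count matters. */
    if (!(scb->bufstruct.b_flags & B_ERROR))
        return scb->bufstruct.b_resid ? STEP_EVALUATE_RESID : STEP_DONE;

    /* Case 1: B_ERROR is set, so b_error holds a valid errno value. */
    if (scb->bufstruct.b_error == ENXIO)
        return STEP_RESTART_OR_CANCELED;
    if (scb->bufstruct.b_error != EIO)
        return STEP_OTHER_ERRNO;

    if (scb->status_validity == 0)
        return STEP_EVALUATE_RESID;

    if (queuing &&
        (scb->scsi_status == SC_CHECK_CONDITION ||
         scb->scsi_status == SC_COMMAND_TERMINATED)) {
        return (scb->adap_q_status & SC_DID_NOT_CLEAR_Q)
                   ? STEP_QUEUE_HALTED
                   : STEP_QUEUE_CLEARED;
    }
    return STEP_EXAMINE_STATUS;
}
```

Whenever the function is reached with B_ERROR set, case 3 of the list still applies: the next transaction sent for the device needs SC_RESUME in its flags field.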

Target-Mode Error Recovery

If an error occurs during the reception of send command data, the SCSI adapter device driver sets the TM_ERROR flag in the tm_buf.user_flag field. The SCSI adapter device driver also sets the SC_ADAPTER_ERROR bit in the tm_buf.status_validity field and sets a single flag in the tm_buf.general_card_status field to indicate the error that occurred.

In the SCSI subsystem, an error during a send command does not affect future target-mode data reception. Future send commands continue to be processed by the SCSI adapter device driver and queue up, as necessary, after the data with the error. The SCSI device driver continues processing the send command data, satisfying user read requests as usual except that the error status is returned for the appropriate user request. Any error recovery or synchronization procedures the user requires for a target-mode received-data error must be implemented in user-supplied software.
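
The target-mode status fields can be checked along the following lines. This is only a sketch: mock_tm_buf models just the three fields named in the text, tm_check_error is a hypothetical helper, and the TM_ERROR and SC_ADAPTER_ERROR values are placeholders.

```c
/* Placeholder values for illustration; the real definitions live in the
 * system headers. */
#define TM_ERROR         0x01
#define SC_ADAPTER_ERROR 0x02

/* Simplified stand-in for the target-mode buffer (hypothetical). */
struct mock_tm_buf {
    unsigned short user_flag;
    unsigned char status_validity;
    unsigned char general_card_status;
};

/* Returns 1 if the buffer carries a receive error, 0 otherwise. When the
 * adapter status is marked valid, *cause receives the single
 * general_card_status flag that identifies the error; otherwise it is
 * set to 0. */
int tm_check_error(const struct mock_tm_buf *tm, unsigned char *cause)
{
    *cause = 0;
    if (!(tm->user_flag & TM_ERROR))
        return 0;
    if (tm->status_validity & SC_ADAPTER_ERROR)
        *cause = tm->general_card_status;
    return 1;
}
```

Because later send commands keep queuing behind the bad data, a check like this only marks the affected read request; any resynchronization remains up to the user-supplied software.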

Related Information

Understanding the sc_buf Structure.

Understanding the Execution of Initiator I/O Requests.

