****************************************************************

  Build ID:
  Revision: a

  (C) Copyright International Business Machines Corporation
  xxxx, 2004. All rights reserved.

  US Government Users Restricted Rights -- Use, duplication
  or disclosure restricted by GSA ADP Schedule Contract with
  IBM Corp.

  Note: Before using this information and the product it
  supports, read the general information under "NOTICES AND
  TRADEMARKS" in this document.

  Updated: 11/04/2004

****************************************************************

CONTENTS:

1.0 Change history

This release note summarizes the firmware changes made to the IBM DDS/5 DAT drive firmware from version VA060 to version VA1B0.

Reliability Improvements/Tape handling

Number Description
1 Modified firmware to allow the Mode Sense command to run in the Front End. Reduces the possibility of a host timeout condition during system booting or after a host SCSI bus reset.
2 Modified firmware to allow the Log Sense command to run in the Front End. Reduces the possibility of a host timeout condition during system booting or after a host SCSI bus reset.
3 Added more commands that run in the Front End: Prevent/Allow, Reserve, Release and Log Select.
4 If a tape jam or reel error occurs during tape insertion and initialization, the tape will be ejected. This will reduce the number of reported 03/52/00 errors. Test data has shown that re-inserting the tape will result in a successful load and initialization.
5 During a stress test, the drive timed out and hung on a space to EOD. A power cycle was required to revive the drive. Modified the interrupt masking logic to prevent MCP interrupts while searching to the EOD frames.
6 Enhanced the firmware to improve tape handling during heavy forward/reverse movements of the tape and to address some field 03/3B/00 issues.
7 Firmware testing encountered a Hardware Error condition during a format of a DDS-3 tape. Modified firmware to detect whether a tape stopped status is encountered near BOT and, if so, to continue with the format operation.

Reliability Improvements/Media Errors/Drive Errors

Number Description
1 Fixed a rare problem seen in interchange testing where the drive did not correctly detect tracking issues during pre-play, so the required recalibration of the timing tracking system did not take place.
2 Improved the re-read algorithm based on a case seen on a specific tape where re-reads carried out in Group 1 triggered timing tracking calibration that occurred at the end of Group 0. Modified the calibration position for re-reads for this case.
3 During Quantum stress testing, a read error (03/3B/00 E5 - Sequential Position Error) occurred. Modified the firmware to strengthen the append validation logic to include a check for the correct group count to address this failure.
4 During extensive stress testing a drive reported an 03/52/00 0xF2 - Tape Jam error while rewinding to BOT. Modified the firmware to add an internal reset to the motor control program while performing reel jam recovery.
5 During stress testing, a drive returned data from an incorrect data block near BOT. Corrected this by modifying the frame assembly algorithm to detect that a valid buffered frame has a different checksum than one that is encountered in a subsequent rewrite. If this condition is detected, the rewrite frame is read in.
6 During an extensive stress test the drive reported an incorrect tape block number position after a Space to EOD command. In this case the drive misdetected the current EOD and overshot to a prior EOD. Modified the firmware to add logic to monitor mode change timing in order to prevent EOD overshoot in play mode.
7 When a Bus Reset was received by the drive during writing to tape, the drive reported a Hardware Error.
Two causes were seen and fixed:
   a) The Reselection Timeout handler did not reset an internal Front End state to IDLE, which caused incorrect behavior when the RESET event was reported and acted on.
   b) The task level StreamOp (Quantum performance) mode did not put the command reference onto the internal Back End queue.
8 During an extensive stress test a drive hang occurred during a Space block operation when the tape was positioned at BOT. The cause of the hang was an unexpected BOT status seen by the read state machine. Modified code to allow the read state machine to continue in this case.
9 Updated the Cleaning Warning LED and Preventive Cleaning algorithms. Combined Preventive Clean/Cleaning Warning LED: use a 1000-sample moving average.
   a) Use the cleaning roller if the C1 error rate on either channel is > 0.10.
   b) Approximately 0.5 GB after the cleaning roller, use tape path service if the C1 error rate on either channel is > 0.10.
   c) Approximately 0.5 GB after tape path service, if the C1 error rate on either channel is > 0.15, turn on the Cleaning LED.
   d) Limit the maximum frequency of Preventive Cleaning Roller use to once every 3 GB.
10 A rare timeout case occurred during stress testing using marginal drives. The fix was to strengthen subcode validation for this case.
11 A write append error, 03/50/00 0xF6, was encountered during stress testing with marginal drives that had out-of-spec C1 error rates. In this case the last frame in a DDS group had many rewrites. The fix is to modify the append point storage logic to store the AFC corresponding to the last instance of frame 23 when this frame has been rewritten many times. To further enhance append reliability, also added logic to increase the reposition distance on successive retries to find a valid append point.
12 An 04/44/00 0x8B DC_INVALID_DESCRIPTOR occurred during extended tests with marginal drives that had high C1 error rates. This test wrote and read using 512 KByte blocks, which is unusual in field use.
The error occurred when the firmware's SCSI task queued a new DATA descriptor before the data for the last DATA descriptor had been transferred into the ring buffer. To correct this, the polling time for the current "write_burst_done" flag was extended, and an incorrect condition for clearing the write_burst_done flag was removed.
13 A timeout occurred during stress testing when a Space to End Of Data (EOD) command was issued where the EOD position was very near the physical end of tape. In this case, the cylinder was left spinning with tape engaged, causing possible head and media damage. It was discovered that during this time, critical Servo firmware (MCP) variables were vulnerable to corruption. The firmware was modified to disable interrupts around critical areas of code where the Tape Task is updating some MCP variables that control tape motion.
14 One particular drive failed with Media Error, 03/3B/00, when rewinding a tape from Early Warning. The drive failed to read subcode correctly and left the tape in the wrong position at the end of the rewind. Added more subcode validation to the rewind logic to make sure the rewind search command completes successfully.
15 During a stress test a Media Error 03/3B/08 0x99 was encountered during a Space Reverse command. Corrected by modifying the firmware to only trigger filemark/save set detection logic if all subcode validation flags are set.
16 A Media Error, 03/11/00 0x90, occurred during a stress test that wrote/read/searched over a short area of tape. The error occurred during a read of an area of tape that had been previously overwritten with rewrites, so some old, invalid frames existed in the DDS group. Modified the firmware to ensure that previously written invalid frames were not incorrectly validated.
17 Testing found one particular drive that could not read one particular tape. Enhanced the reread algorithm to safely perform an increase of the tape tension if necessary to recover data.
18 A stress test that writes/spaces/overwrites/reads returned a Media Error, 03/50/00 0xF6, during an attempt to append data. In this case, an append table for the frames was incorrect. Modified the firmware to prevent invalid append frame numbers from being entered into the append frame table.
19 A test case revealed a rare possibility that a Media Error, 03/31/00, could occur if a write 1 block command was issued, the tape was then ejected and inserted, and two blocks were read. The cause is that in this case two DDS groups were written to tape instead of one. Modified the firmware to prevent two groups from being written.
20 A particular write/read/search/append test which involved underlength and overlength reads would sometimes shift the read data by 32 bytes. Modified the firmware ring buffer flushing logic to prevent the data shift.
21 During a stress test a drive returned a Write Append error, 03/50/01/F3. At the time of the error the drive could not decode the format subcode. Modified the firmware to add a reset to the subcode decode logic.
22 During a stress test on drives with the newer Sancho 3 ASIC, a drive erroneously returned a Hardware Error. In this case, a very rare race condition occurred when, after the Sancho Host Interface was programmed for DMA transfer into the ring buffer and before the ring buffer had a chance to update a particular variable, the firmware DC task interrupted and invoked other ring buffer functions. This caused the delay in updating the variable. Changed the order of the sequence to first update the variable and then program the Sancho for transfer, ensuring this variable is properly updated in all cases.
23 Drive returns were seen where Hardware Error sense data 04/44/00 was recorded. Trace data showed that if a Write command encountered parity error retries, and a Send Diagnostic command was received, the drive would erroneously report the 04/44.
The parity error condition caused an internal Abort command to be sent to the firmware Tape task. Modified Tape Task command processing to make sure that the abort status condition is cleared before executing the diagnostic.
24 In a multi-LUN environment it is possible that a hang could occur when a Write command is issued while writing in the Early Warning Zone. Modified code to prevent checking for free space in the ring buffer if the ring buffer code has already returned that space is available.
25 A rare race condition was uncovered during a particular multi-LUN test with a particular SCSI HBA. In this case the front end queue for the pseudo command did not match the head of queue and resulted in a Hardware Error, sense data 04/44/00/51. Modified the firmware to remove a small delay between an internal disconnect done message and the front end state.
26 During a particular multi-LUN test using a particular SCSI HBA, a Hardware Error with sense data 04/44/00/3F occurred. In this case a write burst flag was set too early, before the drive actually won the SCSI bus and initiated the DMA data transfer. Modified code to set the write burst flag when the write command actually won the arbitration of the SCSI bus, just before initiating the DMA transfer.
27 During a particular multi-LUN test using a particular SCSI HBA, a Hardware Error with sense data 04/44/00/60 occurred. Due to the multi-LUN environment in this case, a previous write command DMA done interrupt was incorrectly preempting the SDTR negotiation. Modified code to correct this timing issue.
28 Quantum testing encountered an 03/3B/00 search error. Enhanced the reliability of Area ID subcode decode to address this case.
29 Added the ability to detect an invalid BAT format (Data Record of Size 0) and report a 0x0330C4 check condition.
30 Quantum testing experienced an 03/3B/00 error on a Space command. Added improved subcode validation to prevent the type of BOT misdetect that occurred in this case.
31 A review of traces from stress testing at Quantum with firmware supporting the Sancho 3 ASIC revealed that the drive was missing interrupts during Read on drives with the Sancho 2 ASIC. Changed the firmware to allow certain ring buffer registers to take longer to stabilize, accounting for the possible presence of the slower clock speed Sancho 2.
32 IBM eServer testing encountered a tape jam during space to EOD, which then led to a (43/00/02) media error at end of partition. Modified firmware to detect the EOD in the data format pack items and properly recover.
33 Stress testing (IBM eServer) encountered a search timeout error 03/3B/00/E5 during a Locate command. Implemented an improved subcode validation method for this case.
34 Enhanced the timing of when PRML is reset during read recovery situations. Helps to reduce the possibility of a media error while reading.
35 Quantum testing found a case where a write append error 03/50 occurred during the formatting of a two partition tape. In this case a preventive head clean occurred at EOD of partition 1. Modified the firmware to position correctly after the head clean in this case.
36 Stress testing found that a media error could occur when repeatedly performing a space to EOD and space block reverse sequence. The drive would move tape during the space EOD command and then again during the space block reverse commands. Modified the firmware to buffer 6 DDS groups of data during the space to EOD function so that subsequent reverse searches within the 6 groups can be performed without tape movement.
37 Testing with Sancho 3 drives, with DDS-3 media, encountered a data miscompare. Failure analysis determined that, under very rare data input pattern conditions, the Sancho 3 can simultaneously reach End of Record, End of Group and Dictionary Reset conditions.
When this occurs, due to variability of the ARM processor firmware timing, a race condition can occur that results in 2 extra bytes being clocked into the Compression Core input. Implemented and tested a firmware fix to lock out the possibility of this simultaneous timing conflict under any and all possible data patterns in any possible compression mode.
38 AIX testing encountered a Sequential Positioning Error (03/3B/00). There were rewrites near the end of the group before the target group in the failure case. Modified overshoot detection to wait until rewrites at the end of the previous group have completed to correct for this.
39 During testing of a tape cartridge with an illegal hole ID pattern, specifically 0xF, it was found that the drive would reject the tape, but at the same time cause a tape loop to be left when the cartridge was rejected. Added code to do a half-eject if an invalid hole pattern is detected to address this case. (Note that the drive correctly identifies and accepts or rejects all valid hole patterns; this issue only causes damage to an illegal tape which had been physically modified from acceptable patterns.)
40 Testing with VA1A0 firmware encountered a hang when writing DDS-4 media. (In the test program used, each Write command is variable and the transfer length is different; this is a highly unlikely sequence in actual field operation.) The problem in this case is timing dependent. Modified the firmware to ensure an "interrupt pending" flag is properly set in this case.
41 A drive running an OEM (IBM) interchange test failed with a 03/11/00 0x90 C3 ECC error. The tape was written correctly, and readable on another drive, showing the issue was an intermittent read problem. A problem in reread recovery was identified where it was possible to move the wrong frame data when reassembling fragments of the target data.
42 Improved BOT detection to prevent an intermittent 07/30 write protect check condition seen in EVT tests on some drive/tape combinations.
43 Testing reported no Filemark Detected in the early warning area, when the FM bit should have been set. The test used RSMK setting=0 (off). Corrected a bug that did not calculate tape position correctly with RSMK off.

Performance/Debug Improvements

1 Because CSO returns were being seen with no EEPROM data, FA began to investigate save trace operations. They determined that on some Load failures, the stored trace was not being correctly saved. This was duplicated with an 03/52/00 0xF2 Reel Jam problem. Multiple SCSI commands were required to trigger saving the trace to flash. Corrected this.
2 Added Page Code 81h to the Receive Diagnostics command to report Synchronous Data Transfer values, Wide Data Transfer values, and bus type (Single Ended, HVD, LVD) to the host.
3 Added code to save the Aborted Command sense key and data in the drive flash PROM.
4 Enhanced the firmware to log Emergency Eject attempts in the EEPROM with a time stamp.

Miscellaneous Improvements

1 When retrieving a trace dump after successfully reading data, it is possible for the drive to hang. Modified code to remove a use of buffered data which can become corrupted in the buffer by a prior trace dump operation.
2 After code review, a possible condition was identified where the mode motor could be left on longer than anticipated after a tape is ejected. The firmware was modified to provide additional functionality ensuring the mode motor is turned off as intended.
3 Added a half eject error recovery algorithm to help reduce 03/52 errors. This is primarily an issue seen in DDS-4 drives, and resolved with the inclusion of a Reel Cap hardware change incorporated in both SP-40 and DAT-72 on 6/28/2004. Adding this algorithm in the firmware will help address this issue for drives in the field that do not have the hardware change.
4 A timeout during reselection occurred during extended tests using marginal drives and 512 KByte/block transfers. This large block spanned three Groups, which exposed a bug where the ring buffer partial descriptor information was not updated correctly. This problem was resolved by updating the record type field correctly.
5 A rare case was observed where a command sequence of Write, Rewind, Read could result in high C1 error counts during the Read command. The C1 counts would return to normal after a Timing Tracking Calibration was performed as part of a reread recovery. Modified the firmware to suppress a timing calibration on Group 0 if a recalibration request is made while trying to read Group 1. Timing tracking calibration will be performed on Group 1, resulting in better servo tracking.
6 Testing revealed that causing an Emergency Reset by holding the eject button until all three LEDs turned on and then releasing the button would result in the drive being unable to finish the Power On Self Test sequence. This only occurred on Sancho 3 hardware. The Sancho 3 ASIC has a hardware initialization sequence that is slightly different from that of the Sancho 2. Modified the system initialization logic that is executed during the hardware reset portion of the Emergency Eject sequence.
7 A particular hardware failure on a returned drive allowed that drive to write data that could not be read, reporting a 03/31/00 0x9F (DC Detected Group out of sequence) error. The returned drive had an open trace in the PCB causing the problem. Modified firmware to verify that the Group Count field in the GIT in Kukai DRAM is correct while writing data. If not, post a 04/44/BF/00D7 error. This will prevent the read side failure.
8 IBM Request - There are certain VSC/VSCQ codes that, when set in conjunction with a 03 sense key, do not set the cleaning LED to on. IBM requests that any time the 03 sense key is set we turn on the cleaning LED.
The Log Page 3C cleaning bit would also be set.
9 Modified the fragment assembly portion of the read recovery firmware.

Reliability Improvements/SCSI

1 DVT found a case where the drive would hang if a Verify command with no disconnect privilege was received. This problem was resolved.
2 A SCSI Abort message during the Data In phase of a Read command caused 04/44/00 sense data to be returned by the drive. Added code to reinitialize the transfer in progress flag when the SCSI bus goes bus free due to bus exceptions (abort message, exhausted parity error retry) in the middle of a data transfer. In addition, made changes so that the flag remains clear at the beginning of the command phase.
3 Added new feature. Activated the Select Data Compression Algorithm field in mode page 10h according to SSC-2 rev. 09.
4 Fix for multi-initiator command timeout. When a Read command disconnected and was followed by an Inquiry command from the same initiator to a LUN other than 0, the code sent out read data in the Inquiry data. Removed the identify message locking flag after reading the CDBs out of the AIC FIFO.
5 Tape Alert flags 5, 6, 30, and 33 were not set after issuing a Mode Select for page 0x1C with the Test bit set to 1 and the Test flag set to 0x7FFF. Modified firmware so that this part of the Tape Alert implementation operates correctly.
6 Stress testing revealed a hang could occur if many Read commands that encountered End of Data were issued. Modified firmware to ensure that two different tasks could not update the same virtual descriptor.
7 A test sequence of write 10 fixed blocks followed by write one variable block followed by a Rewind returned 04/44/00 0x8B (Invalid Ring Buffer Descriptor) sense data. Modified firmware so that when there is a switch between fixed and variable modes, the firmware drops out of StreamOp mode temporarily.
8 DVT found that an IDE Message during Identify-Reselection caused the drive to return an incorrect block address.
The root cause was that the logical address had been updated for a command that was aborted by the message. Corrected this.
9 Modified the firmware so that if a Rewind Immediate command is executing and a Read Position command is received, the drive sets the BPU bit in the Read Position data.
10 DVT testing revealed that it was possible for a Log Sense Page 3h command to not return data while an Immediate type command was in progress. The firmware was modified to ensure that the Log Page 3h data would always be returned.
11 Testing revealed that if these two conditions exist:
   a) The disconnect privilege is denied by the host
   b) The Back End queue is occupied
   the following commands could return Busy Status: Mode Sense, Release, Reserve, Prevent Allow Media Removal and Log Sense. Modified the firmware to ensure that these commands are processed in the Front End if the two above conditions exist.
12 Per request from a customer, modified firmware to update the Decompression Algorithm field in Data Compression Control Mode Page 0x0F (Bytes 8-11) so that it reports the compression algorithm of the latest data transmitted to the host during a Read command.
13 If a Mode Select Page 0F is sent with DCE = 1 and the compression algorithm set to 0, the drive will not compress the data while writing. Modified firmware to return an Illegal Request sense key in this case.
14 Testing found that if a Long Erase is aborted by a host or bus reset, the drive will not respond to any tape access command. The drive will report not ready, and an emergency eject or power cycle is required to eject the media. Made a fix to properly pass the abort status between two modules in the firmware for this case.
15 Tape Alert testing revealed that when the drive is booted (e.g., via Power Cycle or Emergency Eject), the Loader Magazine Tape Alert 2dh (45 decimal) would be set. Corrected a bug in the firmware that was introduced during an overhaul of the TapeAlert code.
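
Illustration for Reliability Improvements/SCSI item 13: the following is a minimal host-side sketch in Python of the check described there, not the drive firmware itself. It assumes the usual SSC layout of the Data Compression mode page (page code 0Fh in byte 0, DCE in bit 7 of byte 2, Compression Algorithm in bytes 4-7, Decompression Algorithm in bytes 8-11). The function name check_data_compression_page is hypothetical, and the additional sense code 26h/00h (Invalid Field in Parameter List) is an assumption, since this release note only states that an Illegal Request sense key is returned.

```python
ILLEGAL_REQUEST = 0x05  # SCSI sense key for Illegal Request

# Assumption: the release note does not say which ASC/ASCQ accompanies the
# sense key; 26h/00h (Invalid Field in Parameter List) is used for illustration.
ASSUMED_ASC_ASCQ = (0x26, 0x00)


def check_data_compression_page(page: bytes):
    """Validate a Mode Select Data Compression page (0Fh).

    Returns None if the page is acceptable, or a (sense_key, asc, ascq)
    tuple if DCE = 1 is combined with a compression algorithm of 0.
    """
    if len(page) < 16 or (page[0] & 0x3F) != 0x0F:
        raise ValueError("not a complete Data Compression mode page (0Fh)")

    dce = bool(page[2] & 0x80)                        # Data Compression Enable
    compression_algorithm = int.from_bytes(page[4:8], "big")

    # Compression requested, but no algorithm selected: reject the request
    # instead of silently writing uncompressed data.
    if dce and compression_algorithm == 0:
        return (ILLEGAL_REQUEST, *ASSUMED_ASC_ASCQ)
    return None


def build_page(dce: bool, algorithm: int) -> bytes:
    """Hypothetical helper that builds a 16-byte page 0Fh for the examples."""
    page = bytearray(16)
    page[0] = 0x0F                                    # page code
    page[1] = 0x0E                                    # page length
    page[2] = 0x80 if dce else 0x00                   # DCE in bit 7
    page[4:8] = algorithm.to_bytes(4, "big")          # Compression Algorithm
    return bytes(page)


if __name__ == "__main__":
    # 0x20 is used here only as an example of a nonzero algorithm identifier.
    print(check_data_compression_page(build_page(True, 0x20)))
    print(check_data_compression_page(build_page(True, 0)))
```

With DCE = 1 and a nonzero algorithm the page is accepted; with DCE = 1 and algorithm 0 the sketch returns the Illegal Request sense key, mirroring the behavior item 13 describes. A page with DCE = 0 and algorithm 0 is accepted, because the release note restricts only the DCE = 1 case.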