PROBLEMS WITH HOT PLUGGING EXTERNAL SCSI DEVICES

ITEM: RTA000035301



QUESTION:                                                                       
Customer is having many problems with the SCSI on a 570.  If they               
1)  don't have a terminator upon boot-up or 2) try to 'hot plug' a              
SCSI device like an external 8mm tape drive or 3) accidentally unplug           
the terminator, the machine will no longer boot and they have to                
re-load the whole machine.                                                      
                                                                                
I thought that SCSI was 'hot-pluggable' -- am I wrong?  I was also told         
by the CE who covers this account that he thought there was a RETAIN            
item relating specifically to this problem.
Can you help?                                                                   
                                                                                
---------- ---------- ---------- --------- ---------- ----------                
A:  When adding a SCSI device to, or removing one from, the SCSI bus,
    all power must be removed from the system.  Do not turn on, turn
    off, or disconnect any SCSI device while power is present at the
    system unit.  Such "hot plugging" is forbidden because it might
    blow the controller fuse, trip the PTC resistor, corrupt data, or
    permanently damage SCSI controller chips in adapters or devices.
                                                                                
---------- ---------- ---------- --------- ---------- ----------                
QUESTION:                                                                       
Would it be normal, then, for the machine to need reloading if the
terminator was accidentally unplugged?
                                                                                
---------- ---------- ---------- --------- ---------- ----------                
A:  Yes.  If a terminator is removed from the SCSI bus, data can be
    corrupted, and reinstalling the operating system may be necessary.
    The following question, taken from another ASKQ item, illustrates
    this point:
                                                                                
************************************************************************        
                                                                                
(1) With AIX 3.1.5, I must first shut down and power off the system
before connecting or disconnecting any external SCSI devices.  Is this
a true statement?  I would be very, very surprised if the answer was
no.  At least twice after I added an external device to the system
without powering off the system, I had to reload the system because
the system hung with a "552" on the LED.  I have also heard from many
of my co-workers that the same thing happened to them.
                                                                                
                                                                                
************************************************************************        
                                                                                
---------- ---------- ---------- --------- ---------- ----------               
QUESTION:                                                                       
The error that originally showed up on the LED readout was 555, and it
did require a reload.  A SCSI device was again accidentally bumped
off, and now the error is 557.  In Service mode the machine cannot be
reloaded, so we are now calling a CE.  We can't find documentation on
these error codes.
                                                                                
Is there ANY known workaround, or something we might try, to 'unlock'
the SCSI channel and allow the machine to boot from the hard disk
again?  We understand that we need to address the root of the problem
(the SCSI cables are too easy to bump or unplug), but is there any way
around this problem?  Thanks in advance.
                                                                                
---------- ---------- ---------- --------- ---------- ----------                
A:  As long as no vital data was corrupted when your SCSI bus hung,
  it may be possible to recover your system.  The following document
  explains your error and gives some options you can try to get your
  system running again.
                                                                                
    RECOVERY FROM AN LED 551, 555, OR 557 IN AIX 3.1 OR 3.2                     
                                                                                
    SPECIAL NOTICES                                                             
                                                                                
      The problem for which you received this document is not
      considered a code warranty issue.  This document is provided
      as an aid by the Austin AIX Support Center.  If you need                  
      further assistance, contact your local branch office or                   
      point of sale, or call 1-800-CALL-AIX for information about               
      support offerings.  All of the above services may be                      
      billable.  Faxes on a variety of subjects may be ordered                 
      free of charge from 1-800-IBM-4FAX.                                       
                                                                                
      Comments about this document may be sent by fax to "Info                  
      Feedback" at (512) 823-7634.  IBM representatives can send                
      comments internally to ROUSHC at AUSVM8.                                  
                                                                                
      The information contained in this document is distributed                 
      "AS IS" without any warranties of any kind either expressed               
      or implied.  IBM will not be responsible for any direct,                  
      incidental, consequential, special or indirect damages.  IBM              
      EXPRESSLY DISCLAIMS ANY IMPLIED WARRANTY OF MERCHANTABILITY               
      AND ANY IMPLIED WARRANTY OF FITNESS FOR A PARTICULAR                      
      PURPOSE.                                                                  
                                                                                
      The use of this information or the implementation of any of              
      these techniques is the sole responsibility of the customer               
      and depends on the customer's ability to evaluate and
      integrate this information or implementation into the customer's
      operational environment.                                                  
                                                                                
    CAUSES OF AN LED 551, 555, OR 557                                           
                                                                                
      The known causes of an LED 551, 555, or 557 during IPL on a
      RISC System/6000 are:
                                                                                
        o   A corrupted file system.                                            
                                                                                
        o   A corrupted journaled-file-system (JFS) log device.                 
                                                                                
        o   A failing fsck (file-system check) caused by a bad
            file-system helper.
                                                                                
        o   A bad disk in the machine that is a member of the                   
            rootvg.                                                             
                                                                                
    SUMMARY OF THE RECOVERY PROCEDURE

      To diagnose and fix the problem, boot from bootable media and
      run logform on /dev/hd8 (the default JFS log for rootvg).  Then
      run fsck to repair any file systems that may be corrupted.
                                                                                
    STEPS                                                                       
                                                                                
      1.  Turn the key to the Service position.                                 
                                                                                
      2.  With bosboot diskettes or tape OF THE SAME VERSION AND               
          LEVEL AS THE SYSTEM, boot the system.  (If booting from               
          diskettes, insert the Display diskette when you see LED               
          c07.)                                                                 
                                                                                
      ------------------------------------------------------------              
                                                                                
      WARNING:  If you boot a 3.2 system with 3.1 media, or boot a              
      3.1 system with 3.2 media, then you will not be able to use               
      the standard scripts (getrootfs or /etc/continue) to bring                
      your workstation into full maintenance mode.                              
                                                                                
      Moreover, running these scripts on a 3.1 system with 3.2
      boot media may actually remove some files and prevent your
      system from booting successfully in normal mode until the
      missing files (/etc/mount and /etc/umount) are restored on
      the disk.
                                                                                
      ------------------------------------------------------------              
                                                                                
          NOTE:  If you get a 551, 555, or 557 at this step, the
          diskette or tape is bad, and the machine is trying to
          boot from the fixed disk.  Try again with new bosboot
          diskettes or tape.
                                                                                
          Follow the prompts to the installation/maintenance menu.              
                                                                                
      3.  Choose the maintenance shell (option 5 for AIX 3.1,                   
          option 4 for AIX 3.2).                                                
                                                                                
      4.  Determine the hdisk# to use with the getrootfs or                    
          /etc/continue command.  If you have only one disk, then               
          "hdisk0" is the proper hdisk# to use.  If you have more               
          than one disk, run                                                    
                                                                                
             lqueryvg -Atp hdisk# | grep hd5                                    
                                                                                
          for each hdisk# (hdisk0, hdisk1, etc.) until you get                  
          output that looks like:                                               
                                                                                
             00005264feb3631c.2  hd5 1                                          
                                                                                
          You may find that more than one disk produces this
          output.  These are all disks that belong to the rootvg
          volume group, and you may use any of them in the
          following step.
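
          The check in step 4 can be sketched as a shell loop, so
          that you do not have to type the command once per disk.
          This is a minimal sketch, assuming a Bourne-compatible
          maintenance shell; the function name find_rootvg_disks
          and the disk names you pass to it are hypothetical, and
          it uses only the lqueryvg and grep commands shown above.

```shell
# Hypothetical helper (not part of the original procedure):
# print the name of each disk whose volume-group data lists hd5.
find_rootvg_disks() {
    for d in "$@"
    do
        # Per step 4, a disk whose output lists hd5 belongs to rootvg.
        if lqueryvg -Atp "$d" 2>/dev/null | grep hd5 > /dev/null
        then
            echo "$d"
        fi
    done
}
```

          For example, "find_rootvg_disks hdisk0 hdisk1 hdisk2"
          prints each of those disks that belongs to rootvg.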
                                                                                
      5.  Now access the rootvg volume group by running                         
          /etc/continue (for AIX 3.1) or getrootfs (for AIX 3.2).               
          ('#' is the number of the fixed disk, determined in step              
          4.)                                                                   
                                                                                
          For AIX 3.1 only, run                                                 
                                                                                
             /etc/continue hdisk# sh                                            
                                                                                
          For AIX 3.2 only, run                                                 
                                                                                
             getrootfs hdisk# sh                                                
                                                                                
          If you get errors indicating that a physical volume is
          missing from the rootvg, run diagnostics on the physical
          volumes to find out if you have a bad disk.  Do not
          continue with the rest of the steps in this document.
                                                                                
          If you get other errors from getrootfs or /etc/continue,
          do not continue with the rest of the steps in this
          document.  Correct the problem causing the error; if you
          need assistance doing so, contact one of the following:

            o   local branch office
            o   your point of sale
            o   call 1-800-CALL-AIX (to register for fee-based
                services)
                                                                               
          All of the above avenues for assistance may be billable.              
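
          Step 5 can be sketched as a shell function that picks the
          script by release.  This sketch assumes, as the WARNING
          above requires, that the boot media and the installed
          system are the same release; the function name
          access_rootvg is hypothetical, and it simply rejects any
          release other than 3.1 or 3.2.

```shell
# Hypothetical helper (not part of the original procedure):
# run the rootvg access script that matches the AIX release.
# Arguments: release (3.1 or 3.2) and disk name (for example hdisk0).
access_rootvg() {
    case "$1" in
    3.1) /etc/continue "$2" sh ;;
    3.2) getrootfs "$2" sh ;;
    *)   echo "access_rootvg: unknown release '$1' (expected 3.1 or 3.2)" >&2
         return 1 ;;
    esac
}
```

          For example, "access_rootvg 3.2 hdisk0" runs
          "getrootfs hdisk0 sh".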
                                                                                
      6.  Format the default JFS log (/dev/hd8) for the rootvg
          JFS file systems.
                                                                                
          For AIX 3.1 only, run                                                 
                                                                                
             /etc/aix/logform /dev/hd8                                          
                                                                                
          For AIX 3.2 only, run                                                 
                                                                                
             logform /dev/hd8                                                   
                                                                                
          Answer YES when asked if you want to destroy the log.                 
                                                                               
      7.  Next, run the following commands to check and repair                  
          file systems.  (The "-y" option gives fsck permission to              
          repair file systems when necessary.)                                  
                                                                                
             fsck -y /dev/hd1                                                   
             fsck -y /dev/hd2                                                   
             fsck -y /dev/hd3                                                   
             fsck -y /dev/hd4                                                   
                                                                                
          For AIX 3.2 only, also run                                            
                                                                                
             fsck -y /dev/hd9var                                                
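
          The fsck commands in step 7 can also be run as a loop.
          In this minimal sketch the function name check_rootvg_fs
          is hypothetical; it adds hd9var only when the release
          given is 3.2, and it lists any logical volume for which
          fsck returned a nonzero status.  A nonzero status may
          mean either that fsck changed the file system or that it
          could not repair it, so read fsck's messages for those
          volumes and rerun it if necessary.

```shell
# Hypothetical helper (not part of the original procedure).
# Argument: release (3.1 or 3.2).  hd9var exists only on AIX 3.2.
# Note: lvs and failed are ordinary (global) shell variables.
check_rootvg_fs() {
    lvs="hd1 hd2 hd3 hd4"
    if [ "$1" = 3.2 ]
    then
        lvs="$lvs hd9var"
    fi
    failed=""
    for lv in $lvs
    do
        echo "Checking /dev/$lv"
        # -y gives fsck permission to repair the file system.
        fsck -y "/dev/$lv" || failed="$failed $lv"
    done
    if [ -n "$failed" ]
    then
        echo "fsck returned a nonzero status for:$failed" >&2
        return 1
    fi
}
```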
                                                                                
      8.  Type "exit".  The file systems will automatically mount               
          after you type "exit".                                               
                                                                                
      9.  If you are running the Andrew File System (AFS), use the              
          following commands to save the AFS file-system helper                 
          and replace it with the original file-system helper.                  
                                                                                
          In AIX 3.1,                                                           
                                                                                
             cd /etc/helpers                                                    
                                                                                
          In AIX 3.2,                                                           
                                                                                
             cd /sbin/helpers                                                   
                                                                                
          Then in both AIX 3.1 and 3.2,                                         
                                                                               
              cp v3fshelper v3fshelper.afs
              cp v3fshelper.orig v3fshelper
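
          Steps 9 and 12 can be sketched as shell functions.  The
          names fshelper_dir, save_afs_helper, and
          restore_afs_helper are hypothetical, and the sketch uses
          cp, the standard UNIX copy command.  Each copy runs in a
          subshell, so the caller's working directory does not
          change.

```shell
# Hypothetical helpers (not part of the original procedure).
# Print the file-system helper directory for a release.
fshelper_dir() {
    case "$1" in
    3.1) echo /etc/helpers ;;
    3.2) echo /sbin/helpers ;;
    *)   return 1 ;;
    esac
}

# Step 9: keep the AFS helper as v3fshelper.afs, then put the
# original helper back in place.  Argument: helper directory.
save_afs_helper() {
    ( cd "$1" && cp v3fshelper v3fshelper.afs && cp v3fshelper.orig v3fshelper )
}

# Step 12: reinstate the AFS helper after the boot image is rebuilt.
restore_afs_helper() {
    ( cd "$1" && cp v3fshelper.afs v3fshelper )
}
```

          For example, on AIX 3.2 run "save_afs_helper /sbin/helpers"
          at this step and "restore_afs_helper /sbin/helpers" at
          step 12.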
                                                                                
      10. Determine which disk is the boot disk with the lslv                   
          command.  The boot disk will be shown in the PV1 column               
          of the lslv output.                                                   
                                                                                
             lslv -m hd5                                                        
                                                                                
      11. Recreate the boot image.  (hdisk# is the boot disk                    
          determined in step 10.)                                               
                                                                                
             bosboot -a -d /dev/hdisk#                                          
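
          Steps 10 and 11 can be combined in a sketch like the one
          below.  It assumes the usual lslv -m layout: a header
          line, a column-title line beginning with LP, and then one
          line per logical partition whose third field is the PV1
          disk (for example, "0001  0001 hdisk0").  The function
          names boot_disk_from_lslv and rebuild_boot_image are
          hypothetical; check the printed disk name against the
          lslv output before relying on it.

```shell
# Hypothetical helpers (not part of the original procedure).
# Read "lslv -m hd5" output on standard input and print the PV1
# field of the first line whose first field is a partition number.
boot_disk_from_lslv() {
    awk '$1 ~ /^[0-9]+$/ { print $3; exit }'
}

# Find the boot disk with lslv -m hd5, then rebuild the boot image.
rebuild_boot_image() {
    disk=`lslv -m hd5 | boot_disk_from_lslv`
    if [ -z "$disk" ]
    then
        echo "rebuild_boot_image: no PV1 entry found for hd5" >&2
        return 1
    fi
    echo "Boot disk: $disk" >&2
    bosboot -a -d "/dev/$disk"
}
```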
                                                                                
      12. If you are running the Andrew File System (AFS), copy
          the AFS file-system helper back, working in the same
          helpers directory used in step 9:

              cp v3fshelper.afs v3fshelper
                                                                                
      13. With the key in Normal position, run                                  
                                                                                
             shutdown -Fr                                                       
                                                                                
      If you followed all of the above steps and the system still
      stops at an LED 551, 555, or 557 during a reboot in Normal
      mode, you may want to pursue further system recovery
      assistance from one of the following:
                                                                                
        o   local branch office                                                 
        o   your point of sale                                                 
        o   1-800-CALL-AIX (to register for fee-based services)                 
                                                                                
      All of the above avenues for assistance may be billable.                  
                                                                                
      For reasons of time and the integrity of your AIX operating               
      system, the best alternative at this point may be to                      
      reinstall AIX.                                                            
                                                                                
    END OF DOCUMENT                                                             
                                                                                
  If the above does not help, then you will probably need to reinstall          
  AIX.                                                                          
                                                                                
---------- ---------- ---------- --------- ---------- ----------                
                                                                               
                                                                                
This item was created from library item Q644324      CLZQH                      
                                                                                
Additional search words:                                                        
CLZQH DEVICE DEVICES EXTERN EXTERNAL HARDWARE HOT IX JAN94 OZNOTPID             
                                                                                


Dated: 04/1996 Category: RISCOHW
This HTML file was generated 99/06/24~12:43:13