RE-SYNCHRONIZING STALE COPIES AFTER A DISK FAILURE.

ITEM: RTA000022527


QUESTION:                                                                       
I have a model 550 with 3 9334's attached to my system.  Each                   
of the disks is attached to a separate SCSI controller.  On each disk           
is a copy of a very large database - in essence, triple mirroring.              
I have researched how the disks are re-synched in case of a failure of          
a physical disk.  I need to know how the sync process works - in
other words, if the system is asked to re-sync the new disk once it is          
brought back online - and a large database is involved - how much system        
overhead can I expect this process to take?  Will the process take all          
system resources to re-sync the disk or is there a way to control the           
process in such a way so as not to impact the user's operations?  Is            
there such a control in the new version?                                        
                                                                                
---------- ---------- ---------- --------- ---------- ----------                
A: If a disk fails while it holds mirror copies, writes continue to
   the remaining available copies of the data, and the volume group
   descriptor area (VGDA) on the available disks is updated to record
   that the affected physical partition on the failed disk is "stale".
   As later writes target other physical partitions on the failed
   disk, each of those partitions is marked stale in turn.
                                                                                
   When the volume group is varied on again, or when the "syncvg"
   command is run, the VGDA is always checked for stale partitions. If
   any exist, the data is read from the most recently written copy and
   written over the stale partition on the recovered disk. In this
   manner, the system needs to resync ONLY the partitions on the
   recovered disk that differ from the most recent copy.
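
   Because only stale partitions are copied, it helps to know how many
   there are before starting. The sketch below is one way to pull that
   number out of the "lsvg VGname" summary. The function name
   stale_pp_count is hypothetical, and it assumes the summary carries
   a "STALE PPs:" label followed by the count; check the output on
   your own level of AIX before relying on it.

```shell
# Read "lsvg VGname" output on stdin and print the number of stale
# physical partitions.
# ASSUMPTION: the summary contains the label "STALE PPs:" followed by
# a number. stale_pp_count is a hypothetical helper, not an AIX command.
stale_pp_count() {
    awk '{
        for (i = 1; i <= NF - 2; i++)
            if ($i == "STALE" && $(i + 1) == "PPs:") {
                print $(i + 2)
                exit
            }
    }'
}
```

   Possible usage, with VGname standing for your volume group:
   lsvg VGname | stale_pp_count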
                                                                                
   Each physical partition on the disk is 4 MB by default. This means           
   that to resync a single partition, you must read 4 contiguous MB from       
   one disk and write that same data to the stale PP on the other disk.         
   The amount of time it takes to resync is obviously this amount of            
   time multiplied by the number of stale physical partitions. I can't          
   tell you the exact time it takes to read and write 4 MB of data              
   between disks, because this depends on the type of drive, the                
   SCSI connection, other CPU or micro-channel activity, etc., but this         
   should give you some idea.                                                   
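
   The arithmetic above can be sketched as a small shell function. The
   name resync_seconds and the throughput argument are assumptions for
   illustration: you would have to measure the effective copy rate
   (read from one disk plus write to the other) on your own hardware.
   The partition size defaults to the 4 MB mentioned above.

```shell
# resync_seconds STALE_PPS MB_PER_SEC [PP_MB]
# Rough estimate: stale partitions x partition size / copy rate.
# ASSUMPTION: MB_PER_SEC is a measured, whole-number effective copy
# rate for your disks; nothing in this item states what that rate is.
# Uses integer arithmetic, so the result is truncated to whole seconds.
resync_seconds() {
    stale_pps=$1
    mb_per_sec=$2
    pp_mb=${3:-4}    # default physical partition size in MB
    echo $(( stale_pps * pp_mb / mb_per_sec ))
}
```

   For example, "resync_seconds 250 2" estimates the copy time for 250
   stale partitions at a purely hypothetical rate of 2 MB per second.
   Other I/O on the system will stretch the real figure.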
                                                                                
   The resynchronization process will not halt other system activity,           
   but will cause a great deal of I/O activity, possibly affecting the          
   performance of other processes. If you wish to postpone the sync
   process, you can vary on the volume group after returning the failed
   drive using "varyonvg -n VGname". The "-n" flag tells varyonvg not
   to run the sync process. You may then run the "syncvg -v VGname"
   command to resynchronize stale partitions at your convenience,
   possibly under a "nice" value so that it has less impact on other
   system activity.
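
   The deferred approach might be scripted roughly as follows. The
   function names and the DRY_RUN switch are hypothetical conveniences;
   only "varyonvg -n" and "syncvg -v" come from the answer above. The
   "nice -n 19" form is an assumption (POSIX syntax), and some levels
   of AIX may expect a different way of giving the increment. Note
   that nice lowers CPU priority; how much it relieves disk I/O is not
   something this item states.

```shell
# Print each command; execute it only when DRY_RUN is unset or empty.
# (run and DRY_RUN are hypothetical helpers for safe rehearsal.)
run() {
    echo "+ $*"
    if [ -z "$DRY_RUN" ]; then
        "$@"
    fi
}

# Step 1: bring the volume group online without starting the resync.
varyon_without_sync() {
    run varyonvg -n "$1"
}

# Step 2, at a quieter time: resync stale partitions at low priority.
# ASSUMPTION: "nice -n 19" is accepted by your version of nice.
resync_low_priority() {
    run nice -n 19 syncvg -v "$1"
}
```

   Setting DRY_RUN=1 first lets you confirm the exact commands before
   anything is run against the volume group.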
                                                                                
---------- ---------- ---------- --------- ---------- ----------                
                                                                                
                                                                                
This item was created from library item Q574480      BCXLP                      
                                                                                
Additional search words:                                                        
BCXLP COPICS DASD DISC DISKETTE DRUM FAILURE HARDWARE IX LVM MAR92              
RE RESYNC RISCSYSTEM RISCSYSU STALE SYNC SYNCHRONI SYNCHRONIZIN                 
SYNCVG SYS SYSTEM UNIT VARYONVG                                                 
                                                                                
                                                                                
                                                                               




Dated: 07/1996 Category: RISCMGMT

This HTML file was generated 99/06/24~12:43:08
