ITEM: W4429L

How to determine which LV a bad block is part of


ENV:

Machine Model:  3BT
AIX Level:      3.2.5 on all machines

Desc:

Customer has a 3BT where hdisk0 is a 2 GB drive and, after they loaded
AIX for the first time and brought it up, a bad block was found and
mapped out.  The messages they got from errpt indicate that the device
in question doesn't support hardware mapping, so the block was mapped 
via software, as shown (from errpt -a):

Block number
     2894645
Relocation block number
     3932783

The customer would like to know which LV this block was in.
Here is the output of "lspv -l hdisk0":

hdisk0:
LV NAME               LPs   PPs   DISTRIBUTION          MOUNT POINT
hd5                   2     2     02..00..00..00..00    /blv
hd7                   5     5     05..00..00..00..00    /mnt
hd9var                257   257   89..00..00..72..96    /var
hd6                   128   128   00..96..32..00..00    N/A
hd8                   1     1     00..00..01..00..00    N/A
hd4                   2     2     00..00..02..00..00    /
hd2                   80    80    00..00..61..19..00    /usr
hd1                   1     1     00..00..00..01..00    /home
hd3                   4     4     00..00..00..04..00    /tmp

There are no free PPs.

Action:

The Software defect map is written at block #8 of the device (the SCSI
controller translates this into the proper cyl/head/sector #).  This map
is presently 22 blocks long, which is enough space for
approximately 1400 defects.

The first 6 bytes in this map identify it as the bad block directory
with the characters "DEFECT", in ASCII.  The next 2 bytes specify the
number of entries in this bad block map (table).
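
The header layout above can be checked with a few lines of C.  This is
only a sketch: the function and macro names are hypothetical, and it
assumes the 2-byte count is stored most significant byte first, which
the text does not state.  It also works out the "approximately 1400"
figure from the 22-block map size and the 8-byte entry size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BB_SECTOR_SIZE 512  /* bytes per block, as used throughout this item */
#define BB_MAP_BLOCKS  22   /* current size of the map, per the text         */
#define BB_HEADER_LEN  8    /* "DEFECT" (6 bytes) + 2-byte entry count       */
#define BB_ENTRY_LEN   8    /* each bad block entry is two 32-bit words      */

/* Largest number of entries that fits in the map: (22*512 - 8) / 8. */
size_t bb_max_entries(void)
{
    return (BB_MAP_BLOCKS * BB_SECTOR_SIZE - BB_HEADER_LEN) / BB_ENTRY_LEN;
}

/*
 * Return the entry count from a raw copy of the map, or -1 if the buffer
 * is too short or does not begin with "DEFECT".
 * ASSUMPTION: the 2-byte count is stored most significant byte first.
 */
int bb_header_count(const unsigned char *map, size_t len)
{
    assert(map != NULL);
    if (len < BB_HEADER_LEN || memcmp(map, "DEFECT", 6) != 0)
        return -1;
    return (map[6] << 8) | map[7];
}
```

With these constants, bb_max_entries() works out to 1407 entries.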

The remaining entries in the table are the bad block entries, 8 bytes
(2 full words) each.

  Word 1:  Bits 0-3 = Defect Reason Code:
                       '0' = Defect found by PV Mfgr.
                       'A' =  -"-   found by Surface Verif. diag. test
                       'B' =  -"-   found by the System
                       'C' =  -"-   found by Mfgr. test.
           Bits 4-31= Physical Sector # containing defect

  Word 2:  Bits 0-3 = Status Code:
                       '0' = Defect has been relocated.
                       '8' = Defect has NOT been relocated.
                              (awaiting a write command)
           Bits 4-31= Physical Sector # of new block containing data
                      from defective block.

Relocations are performed only on WRITE operations.  Errors found on
READs will result in the Status Code of '8'.  After a subsequent WRITE
operation is performed, and the relocation is accomplished, the status
code will be changed to '0'.  If status = '0', any READs will be to
the NEW block.  If status = '8', any READs will be to the OLD block.
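
The READ rule in the paragraph above can be stated as a small C
function.  This is an illustrative sketch, not the driver's code: the
names are hypothetical, and it tests only the most significant bit of
the 4-bit status code, which distinguishes '0' from '8'.

```c
#include <assert.h>
#include <stdint.h>

#define BB_STAT_PENDING 0x8u  /* '8': defect has NOT been relocated yet */

/*
 * Return the sector a READ is directed to for one bad block entry.
 * status     : the 4-bit Status Code from Word 2
 * bad_sector : sector number from Word 1 (the OLD block)
 * new_sector : sector number from Word 2 (the NEW block)
 */
uint32_t bb_read_target(uint32_t status, uint32_t bad_sector,
                        uint32_t new_sector)
{
    assert(status <= 0xFu);               /* status is a 4-bit field   */
    return (status & BB_STAT_PENDING)
               ? bad_sector               /* '8': still the OLD block  */
               : new_sector;              /* '0': already relocated    */
}
```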

Note that any defect relocated by Hardware will NOT appear in the
Software defect relocation map.  These maps are totally separate from
each other.

/*
 * This structure represents an entry in the bad block directory
 */ 
struct bb_entry {
    unsigned reason     :  4;   /* reason the block was marked bad */
    unsigned bb_lsn     : 28;   /* the logical sector number of the*/
                                /* bad block                       */
    unsigned rel_stat   :  4;   /* relocation status, where a 1 in */
                                /* most significant bit indicates  */
                                /* bad block needs to be relocated */
    unsigned rel_lsn    : 28;   /* the logical sector number of the*/
                                /* relocated block                 */
};
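
Bit-field layout depends on the compiler and machine, so when a raw
copy of the map is examined on another system it may be safer to decode
each 8-byte entry with shifts and masks.  The sketch below assumes each
word is stored most significant byte first and that "bit 0" in the
layout above is the most significant bit.  The names bb_decoded and
bb_decode are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded form of one bad block entry (same fields as struct bb_entry). */
struct bb_decoded {
    unsigned reason;    /* Word 1, bits 0-3  */
    uint32_t bb_lsn;    /* Word 1, bits 4-31 */
    unsigned rel_stat;  /* Word 2, bits 0-3  */
    uint32_t rel_lsn;   /* Word 2, bits 4-31 */
};

/* ASSUMPTION: words are stored most significant byte first. */
static uint32_t bb_word(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode the 8 raw bytes of one entry without relying on bit-fields. */
struct bb_decoded bb_decode(const unsigned char *raw)
{
    struct bb_decoded d;
    uint32_t w1, w2;

    assert(raw != NULL);
    w1 = bb_word(raw);
    w2 = bb_word(raw + 4);

    d.reason   = (unsigned)(w1 >> 28);   /* top 4 bits of Word 1   */
    d.bb_lsn   = w1 & 0x0FFFFFFFu;       /* low 28 bits of Word 1  */
    d.rel_stat = (unsigned)(w2 >> 28);   /* top 4 bits of Word 2   */
    d.rel_lsn  = w2 & 0x0FFFFFFFu;       /* low 28 bits of Word 2  */
    return d;
}
```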

You can examine the Software defect relocation map using the following
command at AIX 3.2.x (hex offset 1000 is the byte address of block 8,
since 8 * 512 = 0x1000):

  # hdf /dev/hdisk# 1000

Given the block number from your error report, the following
algorithm can be used to determine which partition on the
disk was affected by the block relocation:

First determine the starting point for the first partition:

  # hdf /dev/hdisk# 13200

00013200   00000721 FE1C9325 00000000 00000000  |...!...%........|
00013210   00F80100 00000500 00020001 00000000  |................|
00013220   00020000 00000001 00010000 00000000  |................|
00013230   00000000 00000000 00000000 00000000  |................|
00013240   00050000 0000008A 00010000 00000000  |................|
00013250   00000000 00000000 00000000 00000000  |................|
00013260   00050000 0000008B 00010000 00000000  |................|
00013270   00000000 00000000 00000000 00000000  |................|
00013280   00050000 0000008C 00010000 00000000  |................|
00013290   00000000 00000000 00000000 00000000  |................|
000132A0   00050000 0000008D 00010000 00000000  |................|
000132B0   00000000 00000000 00000000 00000000  |................|
000132C0   00050000 0000008E 00010000 00000000  |................|
000132D0   00000000 00000000 00000000 00000000  |................|
000132E0   00050000 0000008F 00010000 00000000  |................|
000132F0   00000000 00000000 00000000 00000000  |................|

  The start of the first partition can be found on the second
  row, third column (counting the address column).  In this
  example, this is 00000500, a count of 512-byte sectors, which
  puts the first partition at offset A0000 (0x500 * 0x200) from
  the beginning of the disk.
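
  The same lookup can be sketched in C for a saved copy of that
  row.  The function name is hypothetical, and the code assumes the
  word is stored most significant byte first, as the dump displays it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u  /* bytes per sector, as stated in the text */

/*
 * row: the 16 bytes shown at address 00013210 (second row of the dump).
 * The second 32-bit word of that row holds the starting sector of the
 * first partition.  Returns the corresponding byte offset on the disk.
 */
uint64_t first_partition_offset(const unsigned char *row)
{
    uint32_t start_sector;

    assert(row != NULL);
    start_sector = ((uint32_t)row[4] << 24) | ((uint32_t)row[5] << 16) |
                   ((uint32_t)row[6] << 8)  |  (uint32_t)row[7];
    return (uint64_t)start_sector * SECTOR_SIZE;
}
```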

  Next translate the Block number that is recorded in the 
  Software Bad Block Table:

    bc
    obase=16            <- Change output base to hex
    2894645*512         <- Get offset from beginning of disk
    58566A00            <- Offset in hex
    ibase=16            <- Change input base to hex
    58566A00-A0000      <- Subtract out start of first partition
    584C6A00            <- Offset relative to start of first partition
    584C6A00/400000     <- Divide by partition size (4MB in this case)
    161                 <- Partition in hex
    obase=A             <- Change output base to decimal
    161                 <- Re-enter partition in hex
    353                 <- Partition in decimal
    Ctrl-D

    lspv -M hdisk# | grep hdisk#:353

  This is assuming that the offset for your disk is A0000.  Be sure
  to get the correct offset for your disk using the hdf command with
  an offset of 13200.
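
  For repeated use, the bc arithmetic above can be wrapped in a short
  C function.  This is a sketch only: the function name and parameter
  names are hypothetical, and it reproduces the quotient exactly as the
  bc session does, with no further adjustment.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u  /* bytes per block */

/*
 * block           : "Block number" from errpt -a
 * first_pp_sector : starting sector of the first partition (from hdf)
 * pp_size_bytes   : partition size in bytes (0x400000 for 4MB)
 * Returns the integer quotient that the bc session prints.
 */
uint32_t bb_partition(uint32_t block, uint32_t first_pp_sector,
                      uint32_t pp_size_bytes)
{
    uint64_t offset;

    assert(pp_size_bytes != 0);
    assert(block >= first_pp_sector);   /* block must lie past the start */

    /* Byte offset of the block, minus the start of the first partition. */
    offset = (uint64_t)block * SECTOR_SIZE
           - (uint64_t)first_pp_sector * SECTOR_SIZE;

    /* Integer division by the partition size, as in bc with scale=0. */
    return (uint32_t)(offset / pp_size_bytes);
}
```

  Calling bb_partition(2894645, 0x500, 0x400000) gives 353, the value
  passed to grep in the lspv -M command above.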


Dated: August 1995 Category: N/A