ITEM: W4429L
How to determine which LV a bad block is part of
ENV:
Machine Model: 3BT
AIX Level: 3.2.5 on all machines
Desc:
Customer has a 3BT where hdisk0 is a 2 GB drive and, after they loaded
AIX for the first time and brought it up, a bad block was found and
mapped out. The messages they got from errpt indicate that the device
in question doesn't support hardware mapping, so the block was mapped
via software, as shown (from errpt -a):
    Block number
    2894645
    Relocation block number
    3932783
The customer would like to know which LV this block was in.
Here is the output of "lspv -l hdisk0":
hdisk0:
LV NAME    LPs   PPs   DISTRIBUTION          MOUNT POINT
hd5        2     2     02..00..00..00..00    /blv
hd7        5     5     05..00..00..00..00    /mnt
hd9var     257   257   89..00..00..72..96    /var
hd6        128   128   00..96..32..00..00    N/A
hd8        1     1     00..00..01..00..00    N/A
hd4        2     2     00..00..02..00..00    /
hd2        80    80    00..00..61..19..00    /usr
hd1        1     1     00..00..00..01..00    /home
hd3        4     4     00..00..00..04..00    /tmp
There are no free PPs.
Action:
The Software defect map is written at block #8 of the device (the SCSI
controller translates this into the proper cylinder/head/sector number).
This map is presently 22 blocks long, which is enough space for
approximately 1400 defects.
The first 6 bytes in this map identify it as the bad block directory
with the characters "DEFECT", in ASCII. The next 2 bytes specify the
number of entries in this bad block map (table).
The remaining entries in the table are the bad block entries, 8 bytes
(2 full words) each.
Word 1: Bits 0-3  = Defect Reason Code:
                      '0' = Defect found by PV Mfgr.
                      'A' = Defect found by Surface Verif. diag. test
                      'B' = Defect found by the System
                      'C' = Defect found by Mfgr. test.
        Bits 4-31 = Physical Sector # containing defect
Word 2: Bits 0-3  = Status Code:
                      '0' = Defect has been relocated.
                      '8' = Defect has NOT been relocated
                            (awaiting a write command).
        Bits 4-31 = Physical Sector # of new block containing data
                    from defective block.
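The word layout above can be sketched as a small decoder. This is an illustration only: it assumes the two words are stored big-endian and that "bits 0-3" means the most significant four bits of each word. The function name `decode_entry` and the sample entry are hypothetical; the sample reuses the two block numbers from the errpt output and assumes reason 'B' and status '0'.

```python
import struct

# Reason codes as listed in this item
REASONS = {
    0x0: "found by PV manufacturer",
    0xA: "found by surface verification diagnostic test",
    0xB: "found by the system",
    0xC: "found by manufacturer test",
}

def decode_entry(raw8):
    """Split one 8-byte bad block entry into its four fields."""
    # Assumption: two big-endian 32-bit words, high nibble = bits 0-3
    word1, word2 = struct.unpack(">II", raw8)
    return {
        "reason": word1 >> 28,               # bits 0-3 of word 1
        "bad_sector": word1 & 0x0FFFFFFF,    # bits 4-31 of word 1
        "status": word2 >> 28,               # bits 0-3 of word 2
        "new_sector": word2 & 0x0FFFFFFF,    # bits 4-31 of word 2
    }

# Hypothetical entry: reason 'B', status '0', block numbers from errpt
sample = struct.pack(">II", (0xB << 28) | 2894645, (0x0 << 28) | 3932783)
entry = decode_entry(sample)
print(REASONS[entry["reason"]], entry["bad_sector"], entry["new_sector"])
```

Shifts and masks are used here instead of bit fields so the result does not depend on how a particular compiler lays out bit fields.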
Relocations are performed only on WRITE operations. Errors found on
READs will result in the Status Code of '8'. After a subsequent WRITE
operation is performed, and the relocation is accomplished, the status
code will be changed to '0'. If status = '0', any READs will be to
the NEW block. If status = '8', any READs will be to the OLD block.
Note that any defect relocated by Hardware will NOT appear in the
Software defect relocation map. These maps are totally separate from
each other.
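The READ behavior described above comes down to a test on the high bit of the status code. A minimal sketch, with the hypothetical helper name `sector_for_read` and the block numbers from the errpt output used as sample values:

```python
RELOCATION_PENDING = 0x8  # high bit of the 4-bit status code

def sector_for_read(status, old_sector, new_sector):
    """Return the sector a READ is directed to for a given map entry."""
    if status & RELOCATION_PENDING:
        return old_sector   # status '8': relocation still awaits a WRITE
    return new_sector       # status '0': data now lives in the new block

print(sector_for_read(0x0, 2894645, 3932783))
print(sector_for_read(0x8, 2894645, 3932783))
```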
/*
 * This structure represents an entry in the bad block directory
 */
struct bb_entry {
    unsigned reason   : 4;   /* reason the block was marked bad  */
    unsigned bb_lsn   : 28;  /* the logical sector number of the */
                             /* bad block                        */
    unsigned rel_stat : 4;   /* relocation status, where a 1 in  */
                             /* most significant bit indicates   */
                             /* bad block needs to be relocated  */
    unsigned rel_lsn  : 28;  /* the logical sector number of the */
                             /* relocated block                  */
};
You can examine the Software defect relocation map using the following
command at AIX 3.2.x (1000 is the hex byte offset of block #8, that is,
8 * 0x200):
    # hdf /dev/hdisk# 1000
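For reference, the map layout described in this item (signature, entry count, then 8-byte entries) can be walked with a short script. This is a sketch under assumptions: 512-byte blocks, big-endian fields, and a buffer that has already been read from byte offset 0x1000 of the disk. The name `parse_defect_map` and the synthetic test buffer are hypothetical.

```python
import struct

SECTOR_SIZE = 512
MAP_BLOCK = 8        # map starts at block #8
MAP_BLOCKS = 22      # current size of the map in blocks
HEADER_SIZE = 8      # 6-byte "DEFECT" signature + 2-byte entry count
ENTRY_SIZE = 8       # two 32-bit words per entry

MAP_OFFSET = MAP_BLOCK * SECTOR_SIZE   # 4096, i.e. hex 1000 as passed to hdf
MAX_ENTRIES = (MAP_BLOCKS * SECTOR_SIZE - HEADER_SIZE) // ENTRY_SIZE

def parse_defect_map(raw):
    """Return a list of (reason, bad_sector, status, new_sector) tuples."""
    if raw[:6] != b"DEFECT":
        raise ValueError("buffer does not start with the DEFECT signature")
    (count,) = struct.unpack(">H", raw[6:8])   # assumed big-endian count
    entries = []
    for i in range(count):
        start = HEADER_SIZE + i * ENTRY_SIZE
        w1, w2 = struct.unpack(">II", raw[start:start + ENTRY_SIZE])
        entries.append((w1 >> 28, w1 & 0x0FFFFFFF, w2 >> 28, w2 & 0x0FFFFFFF))
    return entries

# Synthetic one-entry map built from the errpt values (reason 'B' assumed)
fake_map = b"DEFECT" + struct.pack(">H", 1) + struct.pack(
    ">II", (0xB << 28) | 2894645, 3932783)
print(hex(MAP_OFFSET), MAX_ENTRIES, parse_defect_map(fake_map))
```

The computed `MAX_ENTRIES` is 1407, which matches the "approximately 1400 defects" figure given earlier.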
Given the Block number from your error report, here is the algorithm
that can be used to determine which partition on the disk was affected
by the block relocation:
First determine the starting point for the first partition:
    # hdf /dev/hdisk# 13200
00013200 00000721 FE1C9325 00000000 00000000 |...!...%........|
00013210 00F80100 00000500 00020001 00000000 |................|
00013220 00020000 00000001 00010000 00000000 |................|
00013230 00000000 00000000 00000000 00000000 |................|
00013240 00050000 0000008A 00010000 00000000 |................|
00013250 00000000 00000000 00000000 00000000 |................|
00013260 00050000 0000008B 00010000 00000000 |................|
00013270 00000000 00000000 00000000 00000000 |................|
00013280 00050000 0000008C 00010000 00000000 |................|
00013290 00000000 00000000 00000000 00000000 |................|
000132A0 00050000 0000008D 00010000 00000000 |................|
000132B0 00000000 00000000 00000000 00000000 |................|
000132C0 00050000 0000008E 00010000 00000000 |................|
000132D0 00000000 00000000 00000000 00000000 |................|
000132E0 00050000 0000008F 00010000 00000000 |................|
000132F0 00000000 00000000 00000000 00000000 |................|
The start of the first partition is the second data word on the second
row (the third column if you count the address column). In this example
it is 00000500, expressed in 512-byte sectors, which corresponds to
byte offset A0000 (0x500 * 0x200) from the beginning of the disk.
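The same lookup can be done on a saved line of hdf output. A minimal sketch, assuming the row format shown in the dump above (address column followed by four data words); the helper name `first_partition_offset` is hypothetical.

```python
SECTOR_SIZE = 512

def first_partition_offset(dump_row):
    """Byte offset of the first partition, taken from the 00013210 row."""
    fields = dump_row.split()
    # fields[0] is the address column; fields[2] is the second data word
    start_sector = int(fields[2], 16)
    return start_sector * SECTOR_SIZE

row = "00013210 00F80100 00000500 00020001 00000000 |................|"
offset = first_partition_offset(row)
print(hex(offset))   # prints 0xa0000
```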
Next translate the Block number that is recorded in the
Software Bad Block Table:
    bc
    obase=16          <- Change output base to hex
    2894645*512       <- Get offset from beginning of disk
    58566A00          <- Offset in hex
    ibase=16          <- Change input base to hex
    58566A00-A0000    <- Subtract out start of first partition
    584C6A00          <- Offset relative to start of first partition
    584C6A00/400000   <- Divide by partition size (4MB in this case)
    161               <- Partition in hex
    obase=A           <- Change output base to decimal
    161               <- Partition in hex
    353               <- Partition in decimal
    Ctrl-D
    lspv -M hdisk# | grep hdisk#:353
This is assuming that the offset for your disk is A0000. Be sure
to get the correct offset for your disk using the hdf command with
an offset of 13200.
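The bc session above can be reproduced in a few lines. This is a sketch using the values from this item (block 2894645, first-partition offset A0000, 4MB partitions); the function name `partition_index` is hypothetical, and the offset and partition size must be replaced with the values for your own disk.

```python
SECTOR_SIZE = 512

def partition_index(block, first_pp_offset, pp_size):
    """Number of whole partitions that lie before the given block."""
    byte_offset = block * SECTOR_SIZE
    if byte_offset < first_pp_offset:
        raise ValueError("block lies before the first partition")
    return (byte_offset - first_pp_offset) // pp_size

index = partition_index(2894645, 0xA0000, 4 * 1024 * 1024)
print(index, hex(index))   # prints 353 0x161
```

The quotient counts whole partitions that precede the block. If lspv -M on your system numbers partitions from 1 rather than 0 (an assumption worth verifying), also check the next entry, 354 in this example.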
Dated: August 1995 Category: N/A