ITEM: G0330L

Can't recover from boot logical volume problem


Question:

AIX 3.2.3E.

My system was displaying LED error codes while I was trying to recover
from a failed boot logical volume.  I talked with Support and was faxed
an article, "HOW TO RECOVER FROM FAILED BOOT LOGICAL VOLUME", but I still
could not get my system to work properly.

Response:

We rebooted the machine in Service Mode from the 3.2.3E bootable media.
getrootfs failed to vary on rootvg because two physical volumes were
missing from the device database.  The system was reporting two extra
PVIDs because those disks had been physically replaced without removing
their definitions from the volume group.  We needed the ldeletepv command
to remove these disks.

We went to another AIX 3.2.3E system and tar'ed the ldeletepv command
onto diskette.  We then restored the ldeletepv command from the diskette 
with the following command:

# pax -rvf /dev/rfd0 ./ldeletepv

After that we used the following commands to delete the extra physical
volumes and mount the rootvg filesystems:

# lqueryvg -p hdisk0 -vt         (to get the VGID)
# ldeletepv -g VGID -p PVID      (PVID of the missing disk to delete)
# importvg -y rootvg -f hdisk0   (importing VG by forcing)
# varyonvg -f rootvg             (varying on VG by forcing)
# mount -f /dev/hd4 /
# mount /dev/hd2 /usr
# mount /dev/hd3 /tmp
# mount /dev/hd9var /var

After mounting the filesystems, we tried to recreate the boot
logical volume with the command:

# bosboot -a -d /dev/hdisk7

This gave the error that hd5 was not on hdisk7 even though an
lquerylv said that it was...  We decided to remove hd5 and re-add
it with the following commands: 

# rmlv hd5
# mklv -y hd5 -t boot -a e rootvg 2 hdisk7

We then ran "bosboot -a -d /dev/hdisk7" again and received the
following error:

  odmget: Could not retrieve object for CuAt, odm error #5904

Error 5904 means that the search criteria passed to odmget are
incorrect.  We put debug statements in the bosboot script to isolate
the problem, and traced the error to the results of the following
statements:

                ODMDIR=/dev/objrepos pvid=`odmget -q \
                        "name=$search_str and attribute=pvid" CuAt |
                        awk '$1 ~ /value/ {gsub(/"/,"",$3);print $3}'`

This statement returned no pvid, so every later use of the pvid
variable was wrong: with an empty $pvid, the next odmget query reduces
to "value=", which is not a valid search criterion.  The statements
that actually produced the odmget error were:

                ODMDIR=$OLD_ODMDIR
                # match pvid to diskname from hard disk ODM
                ram_dev=`odmget -q "value=$pvid" CuAt |
                        awk '$3 ~ /hdisk/ {gsub(/"/,"",$3);print $3}'`
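
The two awk filters above can be tried outside of bosboot.  The
following is a minimal sketch, not part of the bosboot script: the
stanza is a hypothetical sample of odmget output with a made-up PVID,
and extract_pvid, extract_diskname, and require_pvid are illustrative
names.  It shows what each filter extracts and why an empty pvid should
be caught before it is used in a second query.

```shell
# Hypothetical odmget output for one CuAt object (the PVID is made up).
sample_stanza='CuAt:
        name = "hdisk3"
        attribute = "pvid"
        value = "00000000a1b2c3d40000000000000000"
        type = "R"
        generic = "D"
        rep = "s"
        nls_index = 2'

# First filter from the excerpt: on the "value" line, strip the quotes
# from the third field and print it.
extract_pvid() {
    awk '$1 ~ /value/ {gsub(/"/,"",$3);print $3}'
}

# Second filter from the excerpt: print any third field containing "hdisk".
extract_diskname() {
    awk '$3 ~ /hdisk/ {gsub(/"/,"",$3);print $3}'
}

# Illustrative guard: refuse to continue with an empty pvid.
require_pvid() {
    if [ -z "$1" ]; then
        echo "no pvid found; check the disk name and ODMDIR" >&2
        return 1
    fi
}

pvid=$(printf '%s\n' "$sample_stanza" | extract_pvid)
diskname=$(printf '%s\n' "$sample_stanza" | extract_diskname)
echo "pvid=$pvid diskname=$diskname"

# With no matching object the filter receives no input and yields "".
empty_pvid=$(printf '' | extract_pvid)
require_pvid "$empty_pvid" || echo "would have queried with: value=$empty_pvid"
```

A guard of this kind would have stopped the script at the first lookup
instead of letting the failure surface later as an odmget error.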

Notice that two different object repositories are referenced:
/dev/objrepos and /etc/objrepos.  On closer inspection we found that
the disk definitions in these object repositories differed: in
/dev/objrepos, the disk with hdisk7's PVID was named hdisk3.  We ran
"bosboot -a -d /dev/hdisk3" and it completed successfully.
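
A mismatch like this can be spotted by comparing the pvid attributes in
the two repositories.  The sketch below is an illustration under stated
assumptions, not the procedure used on this call: the two dumps are
hypothetical stanzas with made-up PVIDs, standing in for what
ODMDIR=/dev/objrepos odmget -q "attribute=pvid" CuAt and the same query
against /etc/objrepos would print, and pvid_map is an illustrative
helper.

```shell
# Turn odmget CuAt stanzas on stdin into sorted "pvid name" lines.
pvid_map() {
    awk '
        $1 == "name"  { gsub(/"/, "", $3); name = $3 }
        $1 == "value" { gsub(/"/, "", $3); print $3, name }
    ' | sort
}

# Hypothetical dump of the RAM filesystem repository (/dev/objrepos).
ram_dump='CuAt:
        name = "hdisk0"
        attribute = "pvid"
        value = "000000001111aaaa0000000000000000"
CuAt:
        name = "hdisk3"
        attribute = "pvid"
        value = "000000002222bbbb0000000000000000"'

# Hypothetical dump of the hard disk repository (/etc/objrepos).
disk_dump='CuAt:
        name = "hdisk0"
        attribute = "pvid"
        value = "000000001111aaaa0000000000000000"
CuAt:
        name = "hdisk7"
        attribute = "pvid"
        value = "000000002222bbbb0000000000000000"'

ram_map=$(printf '%s\n' "$ram_dump" | pvid_map)
disk_map=$(printf '%s\n' "$disk_dump" | pvid_map)

# Report every PVID that carries a different disk name in each repository.
mismatches=$(printf '%s\n' "$ram_map" | while read -r pvid ram_name; do
    disk_name=$(printf '%s\n' "$disk_map" |
        awk -v p="$pvid" '$1 == p { print $2 }')
    if [ -n "$disk_name" ] && [ "$disk_name" != "$ram_name" ]; then
        echo "$pvid: /dev/objrepos=$ram_name /etc/objrepos=$disk_name"
    fi
done)
echo "$mismatches"
```

With these sample dumps only the second disk is reported, which mirrors
the hdisk3/hdisk7 difference described above.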

We switched the key to the Normal position and rebooted the system
using the following command:

# shutdown -Fr

The system came up fine.


Dated: February 1994 Category: N/A