ITEM: J0700L

System down with LED 223/229.


Question:

I have a RISC/6000 model 53H running AIX version 3.2.4.  We were
adding software packages (ArcInfo), and after we were done, we rebooted
and the system hung, alternating between 223 and 229 on the LED.  Since
then we have rebooted in Service mode but were unable to get the system
up.  Please help.

Response:

I had the customer boot off his 3.2.4 diskette in Service mode and run
getrootfs on hdisk0.  It came back with an "invalid disk" error.  When
we ran /usr/sbin/getrootfs, it returned hdisk5 as a valid boot disk, so
we ran getrootfs on hdisk5.  We were able to import the volume group,
and an lsdev showed all the hard disks as "Available".
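The probe sequence above can be sketched as a dry run.  The AIX 3.2
maintenance-shell commands (getrootfs, lsdev) exist only on AIX, so
they are echoed rather than executed here:

```shell
# Dry-run sketch of the Service-mode probe above; the AIX commands are
# echoed rather than executed, since getrootfs/lsdev exist only on AIX.
probe_boot_disk() {
    echo "/usr/sbin/getrootfs     # with no disk argument, reports a valid boot disk"
    echo "getrootfs hdisk5        # import and vary on rootvg from that disk"
    echo "lsdev -C -c disk        # confirm the hard disks show as Available"
}
probe_boot_disk
```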

I had the customer run the bootlist command and we rebooted in Normal
mode.  Upon reboot the system hung at LED 814, which indicated that
the NVRAM was being identified or configured.  We powered down,
removed the battery for a few minutes, put it back, and rebooted.
This time it hung on LED 221, which means the NVRAM CRC miscompared.
We booted up in Service mode, re-ran the bootlist command, and booted
in Normal mode; this time the system hung on LED 814 again.  We tried
to boot in Service mode off the hard drive to run diagnostics, but the
system hung on LED 253, which tells us that there was not a valid boot
image on any of the hard drives.  Since we were still coming up with
LED 814, we removed the NVRAM battery for 30 minutes and then
attempted a reboot in Normal mode.  We should have let it drain for
about 30 minutes the first time.

After draining the battery for 30 minutes, you rebooted in Normal mode
and got stuck at LED 221.  We then had you boot up in Service mode off
your boot diskettes.  We did an lqueryvg on hdisk0 (lqueryvg -pAt
hdisk0) and looked at the logical volumes (paging00, paging01,
loglv01, etc.).  We continued to run lqueryvg on the other hdisks and
found that hdisk5 and hdisk3 were members of rootvg.  Running
"getrootfs hdisk5" failed, saying that it was an invalid disk.  That
left hdisk3 as the disk holding the boot logical volume (also known as
hd5).  We ran getrootfs on hdisk3, and rootvg imported and varied on.
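The disk-by-disk scan above can be written as a small loop.  This is a
dry run that echoes the AIX-only lqueryvg/getrootfs commands, with
hdisk0 through hdisk5 assumed as the disk names on this system:

```shell
# Dry-run sketch: scan every disk's VGDA for rootvg membership, then
# bring rootvg in from the disk that holds the boot logical volume (hd5).
# Commands are echoed, not executed, since lqueryvg/getrootfs are AIX-only.
scan_disks() {
    for d in hdisk0 hdisk1 hdisk2 hdisk3 hdisk4 hdisk5; do
        echo "lqueryvg -pAt $d    # look for rootvg LVs: hd5, paging00, loglv01, ..."
    done
    echo "getrootfs hdisk3        # hdisk3 held hd5, so import rootvg from it"
}
scan_disks
```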

Now, at this point some things were happening that can be quite
confusing.  We knew that hdisk3 and hdisk5 were part of rootvg.
We had run getrootfs, and now rootvg was imported and varied on.
Also, all the filesystems were mounted for maintenance work.  This
also meant that the ODM information from Normal mode was in our
default path.  So, if we ran any commands that dealt with the ODM,
we were getting information about the system configuration from
Normal mode, not Service mode.  This became most confusing when
running commands like lspv and lslv: lspv showed hdisk0 and hdisk1
to be part of rootvg, not hdisk3 and hdisk5.  In reality, hdisk3 and
hdisk5 were the physical volumes with rootvg on them.  This is
due to the fact that when booting up in maintenance mode (MM), the
hdisks are not necessarily numbered the same as in Normal mode.  All
the system does is go out along the SCSI bus and assign the first
disk it finds as hdisk0, the second as hdisk1, and so on.  Now assume
your system had a SCSI adapter in slot 8 with two hard drives off
that adapter; they would be hdisk0 and hdisk1.  Later you add another
adapter in slot 7, and the two drives off it become hdisk2 and
hdisk3.  This is all in Normal mode.  But if you boot up in Service
mode, or maintenance mode, the system sees the adapter in slot 7
first and names those hard drives hdisk0 and hdisk1, then goes to the
adapter in slot 8 and names those drives hdisk2 and hdisk3.  Note
that these are named backwards.  This is why it can be confusing when
you are booted in MM but using the ODM from Normal mode: not all the
devices necessarily have the same names as they did in Normal mode.
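The renumbering can be illustrated with a small simulation in plain
shell (nothing AIX-specific): disk names are handed out purely in the
order the adapters are scanned, so scanning slot 7 before slot 8
flips the names.

```shell
# Simulate sequential hdisk naming for a given adapter scan order.
# Two drives hang off each adapter (slots 7 and 8), as in the example above.
number_disks() {
    i=0
    for slot in "$@"; do
        for drive in 1 2; do
            echo "slot $slot drive $drive -> hdisk$i"
            i=$((i + 1))
        done
    done
}
echo "Normal mode (slot 8 configured first):"
number_disks 8 7
echo "Service/maintenance mode (slot 7 scanned first):"
number_disks 7 8
```

In Normal mode the ODM preserves the order in which the adapters were
originally configured; in maintenance mode only the bus scan order
matters, which is why the same physical drive can answer to two
different hdisk names.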

When we booted in MM and got to a prompt in the maintenance shell, 
we found that hdisk3 and hdisk5 were part of rootvg by issuing:

   # lqueryvg -pAt hdisk3 (and also on hdisk5)

We found that hdisk3 had the boot logical volume (blv) on it, so we did a:

   # getrootfs hdisk3

This brought up the filesystems just fine.  People were getting
confused at this point because when we brought in the root filesystem
we also brought in the ODM information used in Normal mode.  This ODM
information had hdisk0 and hdisk1 as rootvg, and so on.  But, since
we knew that we were still in MM and that rootvg was really on hdisk3,
we did:

   # bosboot -a -d /dev/hdisk3 (which is actually hdisk0 in Normal mode)
   # bootlist -m normal hdisk3
   # shutdown -Fr (with key in Normal position)

We hung again on a 221.  We did a reboot with the key in Service
position, and the system was able to boot off the hard drive
successfully into diagnostics.  We had not been able to do this
before, which showed that the bosboot was successful.  Now, to "jump
start" the system into Normal mode, we rebooted with the key in
Service position, and as soon as the LED hit 299 you flipped the key
to Normal.  This booted the machine up normally and you got a login
prompt.  You logged in as root and ran:

   # bootlist -m normal hdisk0
   # shutdown -Fr

This rebooted your system and it came up just fine now in normal mode.

You are going to kick off a mksysb very soon now that your system
is back up and running.

The customer could not get his other volume groups to vary on after
getting the system back up.  I stepped him through exporting and
re-importing all the other volume groups on his system after making
sure the VGDAs were okay.  After that, they all varied on fine and
all his filesystems mounted.
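The export/re-import sequence for a non-rootvg volume group looks
roughly like this.  "datavg" and "hdisk2" are placeholder names (use
your volume group and one of its member disks), and the AIX commands
are echoed rather than executed:

```shell
# Dry-run sketch of re-importing a user volume group after the ODM and
# the on-disk VGDA get out of sync.  VG and DISK are hypothetical names.
reimport_vg() {
    VG=$1; DISK=$2
    echo "lqueryvg -pAt $DISK   # sanity-check the VGDA first"
    echo "exportvg $VG          # drop the stale ODM entries for the VG"
    echo "importvg -y $VG $DISK # rebuild the ODM from the on-disk VGDA"
    echo "varyonvg $VG"
    echo "mount all             # mount the filesystems in /etc/filesystems"
}
reimport_vg datavg hdisk2
```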


Support Line: System down with LED 223/229. ITEM: J0700L
Dated: May 1994 Category: N/A