How To Upgrade AIX & PSSP on a Patent Server Machine.

This presumes the machine is at AIX 4.2.1 & DCE 2.1, and you're going to
AIX 4.3.2 & DCE 2.2. These notes are based on many iterations, resolving
things ahead of time before they become problems.

Some steps are different based on whether you're acting on a standalone
RS/6000 box or an SP/2 node. Unless otherwise stated, the step works as
described for either type. I'll point out the differences.

0) A good thing to do before you get going is to verify that NIM has the
correct en0 ethernet adapter's MAC address by comparing a

    netstat -in | grep en0 | grep -i link

on the machine to be upgraded, with a

    lsnim -l as0nnne0 | grep if1

on the CWS. For SP/2 nodes, insure the SDR has the same MAC address with a

    splstdata -b frame# node# 1 | grep as0

command on the CWS. There have been ethernet adapter changes, so this just
eliminates questions later should you have troubles.

Another good thing to do is to check the error log on the machine, fixing
and cleaning out any old entries.

    errpt -d H          For Hardware errors, e.g.
    errclear -N 0       To clear them out

Yet another good thing to do is to clear out root's mail, both
already-inc'd mail (using scan, show, rmm) and yet-to-be-inc'd mail
(inc it). My ksh hint to remove a lot of mail is,

    rmm $(scan | grep expression | awk '{print $1}' | tr -d '+')
    rm /Mail/inbox/,*

You might also insure the ODM accurately reflects the current hardware
configuration with

    diag -a

1) On the CWS, insure /etc/exports does not have the normal
/spdata/sys1/install exported. There's a copy of the normal, everyday
/etc/exports saved at /etc/exports.save, which has /cd and
/spdata/sys1/install exported. This is what the CWS normally has exported,
but when you run setup_server, it tries to export the
/spdata/sys1/install/aix432/lppsource directory which, since it's a
subdirectory of /spdata/sys1/install, gets an NFS error.
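A quick way to spot the conflicting entry before running setup_server is
to scan the exports file for any exported directory that is a parent of
the lppsource path. What follows is only a minimal sketch, not anything
that ships with PSSP: the function name parent_exports and its two
arguments are made up for illustration, and it assumes the usual layout of
one exported directory per line, optionally followed by options.

```sh
# parent_exports FILE TARGET
# Print every directory exported in FILE that is a parent of TARGET.
# Hypothetical helper; assumes the directory is the first field per line.
parent_exports() {
    awk -v target="$2" '
        NF == 0 || $1 ~ /^#/ { next }           # skip blank and comment lines
        {
            dir = $1
            if (dir != "/") sub(/\/+$/, "", dir)    # drop any trailing slash
            prefix = (dir == "/") ? "/" : dir "/"
            # a parent is a proper prefix that ends on a "/" boundary
            if (dir != target && index(target, prefix) == 1)
                print dir
        }
    ' "$1"
}

# Example use on the CWS (paths as in step 1):
#   parent_exports /etc/exports /spdata/sys1/install/aix432/lppsource
```

Any directory it prints is one to unexport before setup_server runs. No
output means nothing in the file sits above the lppsource directory.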
Use

    exportfs

to see what's exported now, and

    vi /etc/exports
    exportfs -u /spdata/sys1/install

to unexport whatever's now exported. You can keep /cd if you want.

2) Prepare the machine for the DCE conversion.

    mkdir /var/dce/tmp /var/dce/config /var/dce/dced/backup

vi /etc/inittab and replace the

    rccleancred:2:wait:/etc/dce/rc.cleancred -v > /dev/console 2>&1
    rcdce:2:wait:/etc/dce/rc.dcestart all > /dev/console 2>&1

lines with these two. The leading colon keeps each one commented out for
now.

    : cleanupdce:2:wait:/usr/bin/clean_up.dce > /dev/console 2>&1
    : rcdce:2:wait:/etc/rc.dce all > /dev/console 2>&1 # Start DCE/DFS Daemons

Comment out the /etc/rc.local line as well, especially for something like
the DB/2 nodes, so nothing gets started up. While you're at it, check for
and delete if they exist, the following lines in /etc/inittab -
db, i4ls, imnss, imqss, httpdlite

vi /etc/dce/rc.dce and remove unneeded parms on the dfsd startup command,
at the end of the file. Unnecessary parms are

    - callback $RPC_SUPPORTED_NETADDRS
    - blocks 20000
    - cachedir /dfscache
    - chunksize 16

Reboot the machine/node to get an IPL without DCE/DFS up. When it's back
up, you may have troubles getting to the IBM Internal network 'cause
/etc/rc.local is commented out. You may need to

    dsh -was0nnne0 route add -net 9 -netmask 255.0.0.0 192.168.56.252

from the CWS to get that routing back.

Clear out the DFS cache directory.

    for i in $(jot 10 0);do rm /dfscache/V1$i*;done
    for i in $(jot 10 0);do rm /dfscache/V$i*;done
    find /dfscache -type f -exec rm {} \;

vi /etc/inittab and un-comment-out, i.e. restore the rcdce line with

    /rcdce 0xx:x

3) For SP/2 nodes only, configure NIM on the CWS. For standalone machines,
this step is consolidated into step 6) below. For non-SP/2 nodes, skip to
step 6).

This is the command one uses with PSSP 3.1 instead of the old, spbootins
command to set the AIX & PSSP level.

    spchvgobj -i bos.obj.ssp.432 -r rootvg -v aix432 -p PSSP-3.1 frame# node# 1

See the results of the command if you want to, with

    splstdata -b frame# node# 1

You'll see that

    next_install_image = bos.obj.ssp.432
    lppsource_name = aix432
    and the pssp_ver = PSSP-3.1

4) Again, this step is for SP/2 nodes only. Set the boot "response" field
of the SDR to "migrate" and run setup_server, with a

    spbootins -r migrate frame# node# 1

command. You can see the boot response of the SDR with a

    splstdata -b frame# node# 1

command. spbootins runs the setup_server command, which takes a few
minutes to complete. After it completes, check to see what's NFS exported
with an

    exportfs

command.

For the as0204 install on 10/6/99, I had a lot of trouble here. It
appeared setup_server wasn't doing the right thing with /etc/exports.
What should be there is

    /spdata/sys1/install/aix432/spot/spot_aix432/usr
    /spdata/sys1/install/pssp/bosinst_data_migrate
    /spdata/sys1/install/pssp/pssp_script
    /export/nim/scripts/as0204e0.script

All with

    -ro,root=as0204e0.patent.ibm.com,access=as0204e0.patent.ibm.com

at the end, as well as

    /spdata/sys1/install/aix432/lppsource -ro

I was getting different things, e.g. the lppsource line not having the
extra stuff on it or missing lines altogether, which causes LED 611. I
opened up PMR 24566,49R to investigate, but a week later, everything was
working ok. If this situation occurs again, reopen the PMR or open a new
one. The quickest way to debug is to use a series of allnimres &
unallnimres commands.

    allnimres -l 2 4 1        puts the lines in /etc/exports.
    unallnimres -l 2 4 1      takes them out

presuming you've already done the spchvgobj & spbootins commands.

5) 4/3/2000 Note: I don't see any /etc/rc.sp anywhere that has this local
mod I did long ago. I wonder what happened to it? Anyway, this bullet no
longer applies since I lost my fix. If you want details, see my PMR
59782,49R dated 11/23/98, in my ~jasper/aixnotes/aix.support.center.history
file.

Beware of your local mod to /etc/rc.sp where you call /local/bin/lsof to
insure inetd is indeed listening to kshell. The lsof in /local/bin doesn't
work on AIX 4.3.2. I modified /etc/rc.sp to check the oslevel, and run
/local/bin/lsof.for.AIX.4.3 instead. Insure the target node has either an
unmodified /etc/rc.sp, or this fixed one.

6) For non-SP/2 nodes, insure the machine is up and ready to be installed,
because these next commands will configure NIM on the CWS, and start the
install!!

On the machine,

    /local/bin/rearm-nim.sh

On the CWS,

    cd $s1
    ./domigratetoaix432now.sh ar0078e0

Within 20 seconds, you should see two of the following messages on the
target machine, 4 seconds apart

    *******************************************************************
    *******************************************************************
    *******************************************************************
    NIM has initiated a bos installation operation on this machine.
    Automatic reboot and reinstallation will follow shortly...
    *******************************************************************
    *******************************************************************
    *******************************************************************

Then you're logged off a few seconds later, so unless you're watching for
them, you may miss these.

For SP/2 nodes, these too should be powered up, else the setup_server
process will hang with a dsh to the down nodes to run
/usr/lpp/ssp/install/bin/services_config (if you get into that state, you
could kill the individual dsh commands to the down nodes). Start the
install with

    nodecond frame# node#

which network boots the node. This will upgrade AIX & DCE on the node, but
not SSP. Don't have a R/W spmon window open during this step, it'll cause
it to fail. If you want to monitor things, from the CWS,

    tail -f /var/adm/SPlogs/spmon/nc/nc.2.5

(or whatever the latest file is in the /var/adm/SPlogs/spmon/nc directory).
You can also monitor things with a R/O window via a

    s1term frame# node#

command. Another handy way to monitor what's going on is to have a

    spled &

window open on the CWS.

On the 4/3/2000 as0205 conversion, this step hung after these lines in the
/var/adm/SPlogs/spmon/nc/nc.2.5 file,

    Nodecond Status: command is boot /pci@fef00000/ethernet@10
    Nodecond Status: network boot successfull
    Nodecond Status: bootp sent over ethernet
    Nodecond Status: waiting for "Welcome to AIX" message
    Nodecond Status: network boot proceeding, nodecond is exiting
    Nodecond Status: finished

I opened a R/O s1term window to watch it more closely, and saw the bootp
packets getting sent just fine, but when that image was booted, as0205
hung on LED E105. The http://rshelp.austin.ibm.com web site pointed me to
PMR 48501,7TD which was an 8-week, Sev 1 problem where they were guessing
at all kinds of things, then finally found an SP node, not involved with
the problem at all, that had full duplex turned on for its private SP/2,
BNC, 10Mbs, en0 interface, which is wrong. I checked my nodes with an
appropriate

    dsh 'lsattr -E -l ent0 -a full_duplex 2>/dev/null'

command, looking for

    full_duplex yes Full duplex True

instead of

    full_duplex no Full duplex True

I found that as0211 had full duplex turned on. I fixed as0211 by a

    chdev -l ent0 -a full_duplex=no

command and as0205 installed just fine.

Unfortunately, this was after I also wasted time updating the SDR 'cause a
splstdata -a command showed a lot of nodes' enet_rate & duplex settings to
be null. The PMR said to insure they were set to 10 & half respectively.
Using smitty, I changed this with a spethernt command, discovering 2
options not documented in my book, -f 10 & -d half, which are the
defaults. I also fixed all the en1 interfaces, which at the time (4/4/00),
were all 100/full duplex except for nodes 3-5 & 7-12 on the first frame.
Now the splstdata -a command shows the right thing.
I also tried applying service to the spot, but that didn't fix things
either.

First, restore the route to the IBM Internal network. Since /etc/rc.local
is commented out, you'll need to

    dsh -was0nnne0 route add -net 9 -netmask 255.0.0.0 192.168.56.252

to get that routing back.

On the as0101e1 install, I quickly hung on LED c48. Opening up an R/W
window to the node, I saw

    Disks specified in bosinst.data do not define a root volume group.

On the CWS,

    lsnim -l as0101e1   ---> bosinst_data = migrate
    lsnim -l migrate    ---> /spdata/sys1/install/pssp/bosinst_data_migrate

which had in it

    target_disk_data:
        HDISKNAME = hdisk0

I wonder if as0101e1's rootvg is on hdisk1, not hdisk0. Rebooting AIX 4.2.1

    spbootins -r disk 1 1 1
    nodecond 1 1

I saw

    lsvg -p rootvg
    rootvg:
    PV_NAME   PV STATE   TOTAL PPs   FREE PPs   FREE DISTRIBUTION
    hdisk0    active     537         290        104..00..00..78..108
    hdisk1    active     537         314        108..57..00..41..108

It looks right, so what's wrong? Hmmmm. The SDR, with a

    splstdata -v 1 1 1  ---> pv_list = hdisk0

Ahhh, as0101 is the only node with 2 PV's in rootvg and the SDR needs to
show that. I used smitty to change this to hdisk0,hdisk1. The command it
used was

    spchvgobj -r rootvg -h 'hdisk0,hdisk1' -c 1 -p 'PSSP-3.1' 1 1 1

Then tried installing it again with

    spbootins -r migrate 1 1 1
    nodecond 1 1

Seems to be working this time. Hmmmm.

7) After the install, check to insure DCE/DFS got upgraded ok. I used to
have a lot of troubles here, but now I think things go pretty smoothly. If
DFS isn't up already after the reboot, you'll probably see on the console
if you've been watching it, or in the /etc/dce/cfgdce.log file,

    DCE Migration cannot be performed unless no DCE daemons are running.
    0x11315b66: You can attempt to stop the daemons by running the
    command stop.dce, or you can stop them manually.

Simply rerun

    /etc/rc.dce all

and the conversion will finish and DCE & DFS will come up.
When DCE & DFS are both there ok, if a lsdce -r command doesn't show the

    Integrated Login (dceunixd) Configured Running

line, then run a

    config.dce dce_unixd

command to configure dceunixd. Restore the cleanupdce line that was
commented out in /etc/inittab.

8) Check to see what you've got. For example, run

    oslevel -l 4.3.2.0

to see which filesets still need updating to get to AIX 4.3.2. For the
201, 204 & 206 updates, I've had to run

    installp -u bos.msg.en_US.diag.rte

For the 201 & 204 updates, I also saw 3 devices.ssa.* filesets were still
at the 4.2.1 level. I'm not sure why those filesets didn't get updated,
probably because we used to have SSA drives on this system, but they got
moved when we moved the DB/2 database to as0209e0/1? I dunno. From the
node, you can

    mount cws:/spdata/sys1/install/aix432/lppsource /mnt
    smitty update_all

pointing it to the /mnt directory. Interestingly enough, this reinstalled
ssp.jm & ssp.css. But these 3 filesets,

    Fileset                     Actual Level    Maintenance Level
    ---------------------------------------------------------------
    devices.ssa.disk.rte        4.2.1.10        4.3.2.0
    devices.ssa.IBM_raid.rte    4.2.1.7         4.3.2.0
    devices.ssa.tm.rte          4.2.1.5         4.3.2.0

were a problem to install. Had to go to

    smitty installp

and select "Install and Update from LATEST Available Software" at the
bottom of the screen, pointing it to the /mnt directory, and explicitly
select the base 4.3.2.0 filesets for these 3 guys, viz

    devices.ssa.IBM_raid.rte 4.3.2.0
    devices.ssa.disk.rte 4.3.2.0
    and devices.ssa.tm.rte 4.3.2.0

When you're done, umount /mnt.

9) For non-SP/2 machines, especially those I did years ago, I used to
install more than was minimally required to get dsh (part of ssp.clients)
working. So, remove the unnecessary filesets

    installp -u ssp.ha ssp.ha_clients ssp.pman ssp.sysctl ssp.sysman
    installp -u ssp.perlpkg ssp.basic

All you need is ssp.clients.

For an SP/2 node, to upgrade the PSSP code, set the node's bootp response
to customize, copy the PSSP 3.1 pssp_script to the node and run it.

First, from the node itself,

    route add -net 9 -netmask 255.0.0.0 192.168.56.252

to restore the routing back to IBM's internal network. Then

    mkitab -i rc 'fbcheck:2:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # run /etc/firstboot'
    installp -u ssp.jm ssp.css

Then from the CWS,

    spbootins -r customize frame# node# 1

which will run setup_server, which takes a few minutes. From the CWS, type

    pcp -was0xxxe0 /spdata/sys1/install/pssp/pssp_script /tmp/pssp_script

Now before you do this last step, check /tftpboot/script.cust on the CWS.
Comment out the updating of everything except /etc/hosts, that is, at the
end of the script, insure /etc/resolv.conf, /etc/passwd, /etc/group,
/etc/security/passwd, /etc/security/group, and /etc/security/user are all
commented out. I had to use ADSM to restore these files on 206.

Then from the node, type

    /tmp/pssp_script

or if you prefer, from the CWS, you can type

    dsh -was0xxxe0 /tmp/pssp_script

If you want to watch the progress of this step, from another CWS window,

    tail -f /var/adm/SPlogs/sysman/NODE.config.log.21428

where NODE is e.g. as0202e0, and 21428 is the PID of the ksh process that
got forked off by the /tmp/pssp_script program. If nothing else, check
this file when you're done to insure things went ok. For some reason, a
lot of times lately, it complains that the fbcheck line you just added to
/etc/inittab, isn't there, and sure enough, it isn't. I've had to re-add
that line, then rerun /tmp/pssp_script. Dunno why.

When it's completed successfully,

    rm /tmp/pssp_script

10) vi /etc/motd and put the IBM message lines back in so that AIXCOPS will
run clean.

11) Run lockdown.sh to insure things are still ok. It's not uncommon for
lockdown to find ports that were previously either not defined or not
running, now enabled.
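
For the /tftpboot/script.cust check back in step 9, a small filter can
show whether any of the files that must not be updated are still
referenced on an active line. This is just a sketch under assumptions: the
function name uncommented_updates is invented here, and it assumes
script.cust comments lines out shell-style with a leading "#". It only
matches file names, so still eyeball whatever it prints.

```sh
# uncommented_updates FILE
# Print, with line numbers, every line of FILE that is not a comment and
# still mentions one of the files step 9 says to leave alone.
# Hypothetical helper; assumes "#" at the start of a line marks a comment.
uncommented_updates() {
    grep -n -E '/etc/(resolv\.conf|passwd|group|security/(passwd|group|user))' "$1" |
        grep -v -E '^[0-9]+:[[:space:]]*#'
}

# Example use on the CWS:
#   uncommented_updates /tftpboot/script.cust
```

No output means every mention of those files is commented out, and only
then is it safe to run /tmp/pssp_script on the node.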

12) Check for known errors,

    ls -l /etc | grep ' 0 '               # 8 zero-length files are ok.
    ls -l /usr/bin | grep ' 0 '           # There should be no zero-length files.
    ls -l /usr/sbin | grep ' 0 '          # There should be no zero-length files.
    ls -l /etc/services /etc/inetd.conf   # Insure these are over 4,000 bytes big.
    grep -v '^#' /etc/inetd.conf          # Should only be sshdfail, ssalld, kshell, klogin, and maybe clutild.
    grep '^start' /etc/rc.tcpip           # Should only be syslogd, portmap, inetd, xntpd, and sshd.

13) Insure any 100Mbs ethernet interface (ent1 on SP/2 nodes, else probably
ent0) is configured correctly for 100 Mb, full duplex. For unknown
reasons, the initial boot has the en1 interface working ok, but subsequent
reboots try configuring the interface to be 10 Mbs & half duplex. To see
what the setting is in the ODM, type

    odmget -q 'name=ent0 AND attribute=media_speed' CuAt | grep value

or

    odmget -q 'name=ent1 AND attribute=media_speed' CuAt | grep value

It should be

    value = "100_Full_Duplex"

not

    value = "10_Half_Duplex"

If it's wrong, you can set/fix it with

    chdev -l ent1 -a media_speed=100_Full_Duplex -P

The -P option says to just change the ODM database. Since en1 is working
for this boot, you don't have to change it. It's just the next boot you
need to fix it for, and to do that, all you need do is change the ODM.
(Note these commands are different from the ones for the E105 problem I
describe in step 6) above, due to it being a different ethernet adapter.)

14) If this system has SSA on it, upgrade the SSA code with these commands,

    mount cws:/spdata/sys1/install/aix432/lppsource/PROD/SSA_Upgrades /mnt
    smitty installp

Insure you select "Install and Update from ALL Available Software" at the
bottom of the screen, else you won't get the SSA microcode filesets
(ssamcode.* and ssadiskmcode.*). Point it to /mnt and go ahead and install
everything, i.e. type "all" in the "SOFTWARE to install" field.
Again, when you're done,

    umount /mnt

To see what levels of microcode you have now, before they get updated,
for each D40 enclosure, each SSA adapter, and each pdisk,

    lscfg -vl enclosure0 | grep ROS
    lscfg -vl ssa0 | grep ROS
    or lscfg -vl pdisk0 | grep ROS

For each, look at ROS Level and ID. See the
http://www.hursley.ibm.com/ssa/rs6k web page for the latest levels for
each type of device.

The adapter microcode will automatically get updated the next time cfgmgr
runs and the adapter & drives aren't being used. I.e. since the system is
now quiesced, you can either run cfgmgr or lscfg -vl pdisk0 now (easiest),
or shutdown -Fr.

The enclosure or pdisk microcode requires a manual step. To update the
enclosure or pdisk microcode, ensure none of the drives are in use,
including, for an enclosure, any drives being used by other systems. Then

- For each enclosure, type

      /etc/microcode/ssa_sesdld -d enclosure0 -f coral014.hex

  Even though the enclosure microcode is really updated, a
  lscfg -vl enclosure0 command will still show the downlevel code. To fix
  this,

      mkdev -l enclosure0

- To update all the drives at once,

      ssadload -u

  This takes about 30 seconds per drive.

See my notes in ~jasper/aixnotes/ssa for further information.

15) vi /etc/inittab and restore both the cleanupdce & the /etc/rc.local
lines that were commented out earlier.

16) shutdown -Fr

You're done. You can now proudly move the line for this system from the
file /dishes/aix421 on the CWS, into the /dishes/aix432 file.

==================================================================================
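
Addendum: the zero-length and size checks from step 12 can be wrapped in a
couple of small functions if you'd rather not eyeball ls -l output. This
is a rough sketch only. The names zero_length_files and too_small are my
own inventions, not existing tools, and like ls -l it skips dot files.

```sh
# zero_length_files DIR
# List regular files directly inside DIR that are empty.
zero_length_files() {
    for f in "$1"/*; do
        # -f: regular file (symlinks followed); ! -s: size is zero
        if [ -f "$f" ] && [ ! -s "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# too_small FILE BYTES
# Succeed (exit 0) if FILE is missing or smaller than BYTES bytes.
too_small() {
    [ -f "$1" ] || return 0
    size=$(($(wc -c < "$1")))      # arithmetic strips any padding from wc
    [ "$size" -lt "$2" ]
}

# Example use, with the thresholds from step 12:
#   zero_length_files /usr/bin
#   zero_length_files /usr/sbin
#   too_small /etc/services 4000   && echo "/etc/services looks truncated"
#   too_small /etc/inetd.conf 4000 && echo "/etc/inetd.conf looks truncated"
```

For /etc, remember that step 12 says 8 zero-length files are ok, so
compare the count there rather than expecting no output.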