Notes on Installing the IBM Blades Server                          2/2/2006

The blades are known collectively as "BladeCenter". The blade chassis is
known as a "BladeCenter unit".

We have a blade chassis (8677 model 3XU), 8 JS20 POWER blades (8842 model
42U), and 5 HS20 Intel blades (8843 model 05U).

4 of the JS20 blades were ordered with fiber with the intention of putting
databases on there, but the disks we got in are all NAS, not SAN, so these
fiber cards aren't doing us much good.

One of the HS20 blades was ordered with a "PCI Expansion" unit and an
"IBM Ultra320 SCSI Controller 2" with the intention of connecting up the
Aureka tape library and building up an ADSM server for the office. This is
the last blade, suil3.

-----------------------------------------------------------------------------

The naming scheme I'll use is

   su    = Sunnyvale
   p|i   = Power or Intel
   a|l|w = AIX or Linux or Windows
   1-n   = Sequence number

The plan is

   6 JS20's will run AIX 5.3 and be named supa1-6
   2 JS20's will run RHEL v4 and be named supl1-2
   2 HS20's will run Windows and be named suiw1-2 (for Stephanie)
   3 HS20's will run RHEL v4 and be named suil1-3

Slot Type/Mdl AKA     Serial# Name  IP Address    Notes
==== ======== ======= ======= ===== ============= ====================
     8677/3XU Chassis KQFWX1V
   1 8842/42U JS20    KQ00A2A supa1 10.222.211.62 Fiber, NIS, NIM, w3, contract
   2 8842/42U JS20    KQ00A2C supa2 10.222.211.63 Fiber
   3 8842/42U JS20    KQ00A32 supa3 10.222.211.64 Fiber
   4 8842/42U JS20    KQ00A19 supa4 10.222.211.65 Fiber
   5 8842/42U JS20    KQ00A25 supa5 10.222.211.66
   6 8842/42U JS20    KQ00A24 supa6 10.222.211.67
   7 8842/42U JS20    KQ00774 supl1 10.222.211.68
   8 8842/42U JS20    KQ00A29 supl2 10.222.211.69
   9 8843/05U HS20    KQKHH2C suiw1 10.222.211.70
  10 8843/05U HS20    KQKHH2L suiw2 10.222.211.71
  11 8843/05U HS20    KQKHH2A suil1 10.222.211.72
  12 8843/05U HS20    KQKHH2F suil2 10.222.211.73
  13 8843/05U HS20    KQKBZ0L suil3 10.222.211.74 TSM
  14 PCI Expansion Card & SCSI Adapter -> Tape Library
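
The naming scheme above is regular enough to check mechanically. The
following is only a sketch in Python (not something used on these
machines); the function name parse_blade_name and the dictionary layout
are made up for illustration, and the field codes come straight from the
scheme listed above.

```python
import re

# Field codes taken from the naming scheme in these notes.
SITES = {"su": "Sunnyvale"}
ARCHS = {"p": "Power", "i": "Intel"}
OSES = {"a": "AIX", "l": "Linux", "w": "Windows"}

# site (su) + architecture (p|i) + OS (a|l|w) + sequence number (1-n)
NAME_RE = re.compile(r"^(su)([pi])([alw])([1-9][0-9]*)$")


def parse_blade_name(name):
    """Split a blade hostname such as 'supa1' into its four fields.

    Hypothetical helper for illustration only.
    """
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError("not a valid blade name: %r" % name)
    site, arch, os_code, seq = m.groups()
    return {
        "site": SITES[site],
        "arch": ARCHS[arch],
        "os": OSES[os_code],
        "seq": int(seq),
    }


if __name__ == "__main__":
    for host in ("supa1", "supl2", "suiw1", "suil3"):
        print(host, parse_blade_name(host))
```

A name that breaks the scheme (a wrong site prefix, an unknown OS letter,
a missing sequence number) raises ValueError rather than being guessed at.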
-----------------------------------------------------------------------------

The back of the blade chassis has a "Management Module" in the upper right
hand corner that takes an ethernet cable for the ILO (Integrated Lights Out)
as well as a KVM (Keyboard, Video, Mouse).

The default IP address is 192.168.70.125. By connecting my laptop to an
ethernet hub along with the MM ethernet cable, and hard-wiring a 192.168.70
IP address on my laptop, I was able to connect to http://192.168.70.125 and
go to

   MM Control -> Network Interfaces

***********************************************************
*                                                         *
* I left this MMC all screwed up 'cause I don't quite     *
* understand how this should work. I have a new           *
* network 10.222.208.32/27 (32-63) defined using the      *
* ussusw01 ports FastEthernet4/39 & 4/40.                 *
*                                                         *
*    10.222.208.33 = ussusw01                             *
*    10.222.208.34 = MMC eth0                             *
*    10.222.208.35 = MMC eth1                             *
*    10.222.208.36 = Switch's MGT1, if 128, vlan 4095     *
*    10.222.208.36-45 = sol1-sol10 (which is wrong!!)     *
*                                                         *
* Right now (3/10/2006), MMC eth0 -> ussusw01 Fa4/40      *
* and from my laptop (10.222.211.151) I can ping          *
* 10.222.208.34, but I can't get an http response         *
* back from it. I can if my laptop is connected to        *
* a Dialog network on ussusw01, eg 199.221.82.173.        *
*                                                         *
* I also can telnet 10.222.208.34 to port 80 from         *
* ussusw01, but not from ussusw02 or ussusw03.            *
* This definitely is some kind of router configuration    *
* problem. I sent a note to Roger Goy.                    *
*                                                         *
* 3/20: Mike learned that 10.222.208.34 is the CPFW!      *
* I need to move my IP addresses down one.                *
*    10.222.208.33 = ussusw01 0012.4388.e37f              *
*    10.222.208.34 = Dialog's CP FW ???                   *
*    10.222.208.35 = blades1 = MMC eth0 000d.60f7.95cc    *
*    10.222.208.36 = MMC eth1 000d.60f7.95cd              *
*    10.222.208.37 = Switch's MGT1, if 128, vlan 4095     *
*    10.222.208.38-50 = sol1-sol13 (add 37)               *
*                                                         *
* 3/22: I found that from ussusw02, pinging               *
* 10.222.208.34 is talking to the Sunnyvale               *
* Checkpoint Firewall. From ussusw02, I did               *
*    ping 10.222.208.34                                   *
*    show arp | include 10.222.208.34                     *
* which showed MAC=0004.23ae.40cd. Then a                 *
*    show arp | include 40cd                              *
* showed that MAC address was also 10.222.208.17 and      *
* 10.222.208.18, which is USSUSW01's and USSUSW02's       *
* default route. I sent a note to Roger & Peter.          *
*                                                         *
***********************************************************

setting

   Hostname         to blades1
   IP address       to 10.222.208.34
   Subnet mask      to 255.255.255.224
   Gateway address  to 10.222.208.33

Then for the other "Internal Network Interface (eth1)",

   IP address       to 10.222.208.35
   Subnet mask      to 255.255.255.224
   Gateway address  to 10.222.208.33

Over in

   MM Control -> Network Protocols -> Domain Name System (DNS)

I set

   DNS                      to Enabled
   DNS server IP address 1  to 199.221.80.141
   DNS server IP address 2  to 199.222.82.28

The default userid/password is USERID/PASSW0RD (zero, not Oh).
Just for kicks, I added another userid/password, admin/admin.

You can point a browser to http://blades1 (not https) and login to poke
around.

-----------------------------------------------------------------------------

In the upper left hand corner of the back of the blade chassis is the
"Nortel Networks Layer 2/3 GbE (GbE stands for Gigabit Ethernet) Switch
Module for IBM eserver BladeCenter" (whew!).

This also had to be configured. EG,

   tn 10.222.211.101
   Enter password: admin

(For some reason, you can't use the other interface, eg tn 10.222.208.37,
to login. It gives you a userid & password prompt, unlike telnet-ing to
10.222.211.101, and admin/admin doesn't work.)

[Main Menu]
Mar 22 10:53:03  NOTICE  mgmt: admin login from host 10.222.211.18
     info     - Information Menu
     stats    - Statistics Menu
     cfg      - Configuration Menu
     oper     - Operations Command Menu
     boot     - Boot Options Menu
     maint    - Maintenance Menu
     diff     - Show pending config changes  [global command]
     apply    - Apply pending config changes [global command]
     save     - Save updated config to FLASH [global command]
     revert   - Revert pending or applied changes [global command]
     exit     - Exit [global command, always available]

>> Main# cfg
-----------------------------------------------------------------------------
[Configuration Menu]
     sys      - System-wide Parameter Menu
     port     - Port Menu
     l2       - Layer 2 Menu
     l3       - Layer 3 Menu
     qos      - QOS Menu
     acl      - Access Control List Menu
     pmirr    - Port Mirroring Menu
     setup    - Step by step configuration set up
     dump     - Dump current configuration to script file
     ptcfg    - Backup current configuration to FTP/TFTP server
     gtcfg    - Restore current configuration from FTP/TFTP server

>> Configuration# port
Enter port (INT1-14, MGT1-2, EXT1-6): MGT1
-----------------------------------------------------------------------------
[Port MGT1 Menu]
     gig      - Gig Phy Menu
     aclqos   - Acl/Qos Configuration Menu
     8021ppri - Set default 802.1p priority
     pvid     - Set default port VLAN id
     name     - Set port name
     dscpmrk  - Enable/disable DSCP remarking for port
     tag      - Enable/disable VLAN tagging for port
     fastfwd  - Enable/disable Port Fast Forwarding mode
     ena      - Enable port
     dis      - Disable port
     cur      - Display current port configuration

Nope, that was wrong. Use .. to back up one menu level.

>> Port MGT1# ..
-----------------------------------------------------------------------------
[Configuration Menu]
     sys      - System-wide Parameter Menu
     port     - Port Menu
     l2       - Layer 2 Menu
     l3       - Layer 3 Menu
     qos      - QOS Menu
     acl      - Access Control List Menu
     pmirr    - Port Mirroring Menu
     setup    - Step by step configuration set up
     dump     - Dump current configuration to script file
     ptcfg    - Backup current configuration to FTP/TFTP server
     gtcfg    - Restore current configuration from FTP/TFTP server

>> Configuration# l3
-----------------------------------------------------------------------------
[Layer 3 Menu]
     if       - Interface Menu
     gw       - Default Gateway Menu
     route    - Static Route Menu
     arp      - ARP Menu
     frwd     - Forwarding Menu
     nwf      - Network Filters Menu
     rmap     - Route Map Menu
     rip      - Routing Information Protocol Menu
     ospf     - Open Shortest Path First (OSPF) Menu
     bgp      - Border Gateway Protocol Menu
     igmp     - IGMP Menu
     dns      - Domain Name System Menu
     bootp    - Bootstrap Protocol Relay Menu
     vrrp     - Virtual Router Redundancy Protocol Menu
     rtrid    - Set router ID
     cur      - Display current IP configuration

>> Layer 3# if
Enter interface number: (1-128) 128
-----------------------------------------------------------------------------
[IP Interface 128 Menu]
     addr     - Set IP address
     mask     - Set subnet mask
     vlan     - Set VLAN number
     relay    - Enable/disable BOOTP relay
     ena      - Enable IP interface
     dis      - Disable IP interface
     del      - Delete IP interface
     cur      - Display current interface configuration

>> IP Interface 128# addr
Current IP address:     192.168.70.127
Enter new IP address:   10.222.208.37

>> IP Interface 128# mask
Current subnet mask:    255.255.255.0
Enter new subnet mask:  255.255.255.224

You were then supposed to apply, then save, but this didn't work for me
'til the next day. I dunno. You figure it out.

And by the way, the easier way to configure this interface is through the
MM web interface.

   http://blades1
   I/O Module Tasks -> Configuration

On that screen, modify the "Bay 1 (Ethernet SM)" section as appropriate,
then click the "Save" button.
I have

   IP address       10.222.208.37
   Subnet Mask      255.255.255.224
   Gateway address  10.222.208.33

-----------------------------------------------------------------------------

The blades feature Serial Over LAN (SOL) to get to the serial consoles.
I don't know much about SOL, but in one screen of the Management Module
web interface,

   Blade Tasks -> Serial Over LAN

it talks about assigning a block of consecutive IP addresses for SOL down
in the "BSMP IP address range" field.

The "BladeCenter: Serial Over LAN Setup Guide" on the DVD-based
documentation says

- BSMP = Blade System Management Processor and it's something each blade
  server has, ie, it's not the Management Module (MM). (pg 7)

- There's a section on page 8 re: RHEL config on the HS20's.

- On page 21, it says "The SOL blade system management processor (BSMP)
  address range can include any valid IP addresses that do not conflict
  with any of the IP addresses of the blade servers. The BSMP address
  range is used for internal communication between the blade server and
  the I/O module and is not exposed to the external network."

I tried using 10.222.208.38-50, calling them sol1-sol13, but this didn't
work. I went back to 192.168.0.1-13 and I removed the sol1-sol13 IP names.

To get a serial console connection to the first blade,

   ssh blades1

   (Before configuring ssh, you had to use telnet, but since telnet &
   ssh both use port 23, you can't have both active. It's one or the
   other.)

   Authenticate as USERID/PASSW0RD (or admin/admin)

   env -T system:blade[1]     (the square brackets are required)
   console

or

   console -T blade[3]

The escape sequence is ctrl-[ followed by (, aka ^[(

This had a horrible 2-minute idle timeout, which I increased to 2 days
(172800 seconds) via the http://blades1 screens.

-----------------------------------------------------------------------------

I originally tried to use NIM to install AIX on supa2-6, but ...

- Define supa2-6 as NIM clients.
  The manual says you have to define their MAC addresses, but this didn't
  use to be required. I remember just IP addresses being enough. But I
  got them from the MMC screen, so this wasn't too difficult.

- Allocate the following NIM resources.

     smitty
       Software Installation and Maintenance
         Network Installation Management
           Perform NIM Administration Tasks
             Manage Machines
               Manage Network Install Resource Allocation
                 Allocate Network Install Resources
                   Select the machine from the list
                   Select
                      > 530lpp_res            lpp_source
                      > 530spot_res           spot
                      > Extra_Software        installp_bundle
                      > Extra_AIX             installp_bundle
                      > openssh               installp_bundle
                      > blades_bosinst_data   bosinst_data

- "Perform a Network Install"

     smitty
       Software Installation and Maintenance
         Network Installation Management
           Perform NIM Administration Tasks
             Manage Machines
               Perform Operations on Machines
                 Select the machine from the list
                 Select bos_inst
                 Change "Source for BOS Runtime Files" from rte to spot
                        "Initiate Boot Operation on Client?" from yes to no
                    and "ACCEPT new license agreements?" from no to yes

- tried booting via a broadcast bootp, but this didn't work

- tried configuring a directed bootp and it had me type in this long cmd
  on a serial over LAN console,

     boot /pci@8000000f8000000/pci@0/ethernet@1,1:bootp,10.222.211.62,,10.222.211.63,10.222.211.1

  but all I got was LINK IS DOWN!

I reverted to installing AIX from the CDs. Damn! After all that work
setting up NIM on supa1. Need to update the microcode, maybe?

=============================================================================

To install AIX via CDs, see my aixnotes/installing.aix.5.3 file.

=============================================================================

While trying to install supa3, I got the following Machine Check.

MCK: log error to NVRAM
MCK: Error Log at 0x0FACE008
MCK: Length: 114
0x04444409  0x0000006A  0xC6008200  0x08304200
0x20060324  0x00000000  0x00000000  0x00000000
0x00000000  0x00000000  0x00000000  0x00000000
0x49424D00  0x55383834  0x322E3432  0x552E4B51
0x30304133  0x322D5031  0x43340000  0x000C4646
0x00000000  0x080000F5  0x000C4646  0x00000000
0x10060000  0x000C4646  0x00000000  0x00000004
0x00020000  0x00000000  0x00000000  0x00000000
sysmgt.websm.webaccess
sysmgt.websm.diag
sysmgt.help.msg.en_US.websm
sysmgt.help.en_US.websm
sysmgtlib.framework
sysmgtlib.libraries
sysmgt.sguide.rte
Java14.sdk
WARNING: Unable to stop all CPUs.
RTAS/PAL Machine Check
.pal_command+0000E0      b   <.pal_command+0005BC>     r3=000010B2
KDB(0)>

I called this in to IBM on Friday, 3-24 @ noon and they were more
interested in what the Management Module had to say in its "Event Log"
than the above. They also wanted to ensure all the microcode was
up-to-date. Sigh! There sure is a lot of it.

   Each JS20 blade has a BIOS at level 2b405_131
   and a Blade sys. mgmt. proc. at level BQ8T22A

   Each HS20 blade has a BIOS at level BWE116AUS
   and Diagnostics at level BWYT08AUS
   and a Blade sys. mgmt. proc. at level BWBT16A

Then you have the I/O Module Firmware.

   In Bay 1 is the Ethernet SM with Boot ROM at level WMZ01001
   and Main Application 1 at level WMZ01001

   In Bay 3 is the Optical PM with Boot ROM at level BROROPSM

And finally, there's the Management Module Firmware VPD

   In Bay 1 is the SunnyvaleBlade1 with the Main application at level BRET82F
   and the Boot ROM at level BRBR82A
   and the Remote Control BRRG82F

They had me take out the blade, open its cover and move a jumper so that
the blade would boot from permanent, system BIOS code, but I still got a
machine check, so they finally decided to replace the JS20 blade's
motherboard. Case # 32R7JVB.

- - - - - - - - - - - - - - - - - - - - - - - - -

Tony came in on Tuesday, March 28 and swapped the motherboard for blade 3,
but I quickly got this machine check when installing AIX.

MCK: log error to NVRAM
MCK: Error Log at 0x0FACE008
MCK: Length: 114
0x04444409  0x0000006A  0xC6008200  0x00463600
0x20060329  0x00000000  0x00000000  0x00000000
0x00000000  0x00000000  0x00000000  0x00000000
0x49424D00  0x55383834  0x322E3458  0x582E4B51
0x30304133  0x322D5031  0x43340000  0x000C4646
0x00000000  0x080000C7  0x000C4646  0x00000000
0x93FAE648  0x000C4646  0x00000000  0x00000004
0x00020000  0x00000000  0x00000000  0x00000000
WARNING: Unable to stop all CPUs.
RTAS/PAL Machine Check
.pal_command+0000E0      b   <.pal_command+0005BC>     r3=000010B2

which is the same error, so we tried swapping memory with blade 6. That
seemed to do it, so one of the 4 memory chips is bad.

=============================================================================

While trying to install suil3 (blade 13), I couldn't boot the DVD. The
graphical console said very early in the BIOS built-in self-test,

   ...
   Intel Xeon 2.8 GHz
   2 Processors Installed
   See the System Error Log.
   Re-booting due to Unexpected NMI at 222F:356E

(NMI = Non-Maskable Interrupt)

On the blade Management Module

   Monitors -> System Status -> Event Log,

there were multiple of these pairs of Error entries for BLADE_13

   (suil3) Critical Interrupt - Front panel NMI
   (suil3) SMI Hdlr: 00150400 SERR: Device Signaled SERR Slot=00
           VendID=8086 DevID=3597 Status=0000

Called IBM Repair (1-800-426-7378) for this 8843 (HS20), serial # KQKBZ0L.

BIOS name for the bios update

   www.pc.ibm.com/support

   41y7909.img -> Blade 13 BIOS
   41y7903.img -> Blade 13 BMC
   EMT4INST.EXE - downloaded, installed, created C:\Program Files\EMT4WIN
                  to write two diskettes using the above 2 image files.

Do the system BIOS first. Remove the PCI expansion unit (!!), then set the
media blade focus to blade 13, put the diskette in and reboot. It'll walk
you through this update.

Then shutdown, put the BMC diskette in, and reboot again. This time, it
updates things on its own.

Tom also suggests I update something else for the 2 Broadcom NIC adapters
in this blade.
Find 42C5634.exe, download, extract to c:\junk\bcom\fw\pkg120_16 and run
SelfExtract_v120_16.EXE you'll find there. This writes to yet another
diskette that you boot. So shutdown again, put this diskette in, and
reboot again. At the DOS prompt, enter

   update 8843

Then shut it down again, put the PCI unit back on, then reboot.

Repeated this trio of boot floppies for each HS20.

=============================================================================