AIX Tip of the Week

Subject: My Best Practices for Power5 Microcode

Audience: All

Date: December 12, 2005

System microcode and HMC versions are becoming increasingly coupled. New microcode requires the latest HMC version. And the latest HMC version may not be fully backward compatible with older microcode.

Customers are responsible for maintaining the HMC and system microcode on most pSeries models. To avoid problems, here are "my best practices" for maintaining the HMC and system microcode together. This list is intended for mission-critical production systems. Further, it is intended for system microcode rather than adapter microcode. You'll notice that some of the practices conflict. In those cases, use the "best compromise" for your environment. And a final disclaimer: these are my best practices, and not necessarily those of my employer.

My Best System Microcode/HMC Practices

  1. "If it ain't broke, don't fix it."
  2. Be on an IBM supported and compatible combination of microcode and HMC.
  3. Let new releases "age" a couple of months before applying.
  4. Check with IBM Support for known problems before updating.
  5. Practice the update on a test system before implementing in production.
  6. Back up the HMC and configuration data before updating.
  7. For 7x24 systems, use a backup server to run production while doing the update.
  8. In extremely critical situations, have a spare FSP available.
  9. Check the microcode on new servers. Update as necessary.
  10. If unsure, call IBM Support.

Valid reasons for upgrading include moving to a supported version, fixing a problem or adding a new function. Otherwise, if a production system is running smoothly, I would not upgrade. Upgrades are disruptive (IPL) and can introduce new problems.

Don't feel compelled to upgrade to every new release level. There have been four microcode releases in the last five months, and three of the four required downtime. Keeping up with that pace is neither reasonable nor necessary. All else being equal, I recommend updating once or twice per year to the "n-1" version.

Let new versions age a couple of months before applying them. Before proceeding, check with IBM Support for any known problems. (I recommend calling Support during prime shift, when skill levels tend to be higher.)
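The "n-1" and aging advice above can be turned into a simple selection rule. The following is only a minimal sketch: the function name, the 60-day default (my reading of "a couple of months"), and the release names and dates are all hypothetical, and nothing here queries IBM for real release data.

```python
from datetime import date, timedelta

def n_minus_1_candidate(releases, today, min_age_days=60):
    """Return the name of the newest release that is not the latest one
    and has aged at least min_age_days, or None if nothing qualifies.

    releases: list of (name, release_date) tuples (hypothetical data).
    min_age_days: assumed stand-in for "a couple of months".
    """
    cutoff = today - timedelta(days=min_age_days)
    # Newest first; the first entry is the latest release ("n"), which is skipped.
    ordered = sorted(releases, key=lambda item: item[1], reverse=True)
    for name, released in ordered[1:]:
        if released <= cutoff:
            return name
    return None

if __name__ == "__main__":
    # Placeholder release names and dates, for illustration only.
    sample = [
        ("release-A", date(2005, 7, 1)),
        ("release-B", date(2005, 9, 15)),
        ("release-C", date(2005, 11, 20)),
    ]
    print(n_minus_1_candidate(sample, today=date(2005, 12, 12)))
```

A rule like this only narrows the choice. It does not replace the call to IBM Support for known problems.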

Be sure the microcode and HMC levels are compatible. Unfortunately, compatible versions are not documented. This will be fixed next year. In the meantime, my advice is to use microcode and HMC releases with similar release dates. The closer the release dates, the lower the chance of having a problem.
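The release-date rule of thumb is easy to check mechanically. This is a rough sketch, not an IBM compatibility check: the function names and the 180-day threshold are my own assumptions, and the release dates have to be looked up by hand.

```python
from datetime import date

def release_gap_days(microcode_released: date, hmc_released: date) -> int:
    """Number of days between the two release dates, regardless of order."""
    return abs((microcode_released - hmc_released).days)

def gap_is_risky(microcode_released: date, hmc_released: date,
                 max_gap_days: int = 180) -> bool:
    """Flag a microcode/HMC pair whose release dates are far apart.

    max_gap_days is an arbitrary placeholder, not an IBM-published limit.
    """
    return release_gap_days(microcode_released, hmc_released) > max_gap_days

if __name__ == "__main__":
    # Hypothetical release dates, for illustration only.
    print(gap_is_risky(date(2004, 10, 1), date(2005, 11, 1)))
    print(gap_is_risky(date(2005, 6, 1), date(2005, 9, 1)))
```

Pick a threshold that suits your own tolerance for risk. A small gap lowers the odds of a problem but does not guarantee compatibility.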

New servers may arrive at different microcode levels, depending on how long each system sat in the warehouse. Be sure to check all the microcode levels and update as necessary.

Tools, Practical Considerations

The AIX "invscout" command lists current microcode levels, and can be used to produce a report that can be uploaded to an IBM web site and compared against current microcode levels. For more information, see http://publib.boulder.ibm.com/infocenter/pseries/topic/com.ibm.aix.doc/cmds/aixcmds3/invscout.htm

Updates can be downloaded from the Internet at IBM Fix Central: http://www-912.ibm.com/eserver/support/fixes/

Naming convention for microcode updates:

01SFXXX_YYY_ZZZ


XXX is the release level
YYY is the service pack level
ZZZ is the last disruptive service pack level

Upgrades between release levels (XXX) are disruptive (the system must be re-IPLed). Updates between service pack levels (YYY) may be run concurrently.
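The naming convention lends itself to a quick check before scheduling an update. Below is a minimal sketch that parses a 01SFXXX_YYY_ZZZ name and guesses whether moving from one level to another needs an IPL. The function names and the sample level names are hypothetical. The service pack rule (disruptive when the installed YYY is below the target's ZZZ) is my reading of "last disruptive service pack level", so confirm it against the update's own instructions.

```python
import re
from typing import NamedTuple

class FirmwareLevel(NamedTuple):
    release: int          # XXX
    service_pack: int     # YYY
    last_disruptive: int  # ZZZ

_LEVEL_RE = re.compile(r"^01SF(\d{3})_(\d{3})_(\d{3})$")

def parse_level(name: str) -> FirmwareLevel:
    """Split a 01SFXXX_YYY_ZZZ name into its three numeric fields."""
    match = _LEVEL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a 01SFXXX_YYY_ZZZ level: {name!r}")
    return FirmwareLevel(*(int(group) for group in match.groups()))

def is_disruptive(installed: str, target: str) -> bool:
    """Estimate whether updating from installed to target requires an IPL."""
    cur = parse_level(installed)
    new = parse_level(target)
    # A change of release level (XXX) is always disruptive.
    if cur.release != new.release:
        return True
    # Assumption: within a release, the update is disruptive only if the
    # installed service pack is older than the target's last disruptive one.
    return cur.service_pack < new.last_disruptive

if __name__ == "__main__":
    # Hypothetical level names, for illustration only.
    print(is_disruptive("01SF230_145_120", "01SF235_180_160"))
    print(is_disruptive("01SF235_165_160", "01SF235_180_160"))
```

Treat the result as a planning aid only, since the update's own documentation has the final say on whether downtime is required.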

You may need to upgrade an existing HMC to support a new server. Once you upgrade the HMC for the new server, you may also need to upgrade the microcode on the existing managed servers.

Current Issues

You can run into problems if the microcode and HMC levels are far apart (about a year). The symptoms can range from not being able to create new virtual devices, to not being able to do dynamic operations or install a new partition. If you have upgraded the HMC and experience problems on the server, check the microcode level.

I've twice experienced false failures when upgrading microcode. The failure message appears on the HMC and looks pretty serious. However, in both cases, it was a timing error on the HMC which corrupted the server's profile. The fix was to reboot the HMC. If you experience a similar problem, try rebooting the HMC before calling Support.



Bruce Spencer,
baspence@us.ibm.com

December 12, 2005