IBM Books

Planning Volume 2, Control Workstation and Software Environment


Developing your migration strategy

This stage of your migration planning focuses primarily on the scope of your migration in terms of the number of nodes, and on the methodology to be employed. You should be entering this planning stage with a basic definition of what you want to migrate (how many and which nodes), and possibly with some thoughts on how to go about it.

If you are migrating an entire SP system or an existing system partition, the next two sections on coexistence and system partitions might be unnecessary. If, on the other hand, you are interested in migrating a subset of your system and you are not familiar with the options available to you, the information in those sections on using multiple system partitions and coexistence for migration might be helpful.

Coexistence and system partitioning are options that can provide flexibility in the number of nodes that you need to migrate at any one time. Your migration goals might suggest the use of multiple system partitions, coexistence, a combination of the two, or neither. Understanding the advantages and disadvantages of coexistence and system partitioning will help you assess their suitability for your needs.

Other factors that will influence your migration strategy include:

Each of these factors is discussed later.

Using system partitions for migration

The SP system supports multiple system partitions, except in switchless clustered enterprise server configurations and SP Switch2 system configurations. SP system partitions effectively subdivide an SP system into logical systems. These logical systems have two primary features:

  1. SP Switch traffic in a system partition is isolated to nodes within that system partition.
  2. Multiple system partitions can run different levels of AIX and SP software.

These features facilitate migrating RS/6000 nodes in relative isolation from the rest of the system. Using these features, you can define a system test partition for newly migrated nodes. After the migration is complete and you have validated system performance, the nodes can be returned to production.

System partitioning is usable for migration because SP Switch traffic in a system partition is isolated to nodes within that partition. This isolation stems from the SP Switch architecture, in which the switch chip connects nodes in a specific sequence. The switch chip is therefore the basic building block of a system partition and establishes a minimum partition size that depends on the partition's node types. That minimum size sets the granularity with which an SP system can be upgraded to new software levels. Coexistence, described in the next section, can provide even finer granularity within a system partition.

See Chapter 7, Planning SP system partitions for additional information on the use of SP system partitions.

Using coexistence for migration

In traditional SP system partitions, all nodes within a single system partition generally run the same levels of operating system and system support software. However, different partitions can run different levels of operating system and system support software. Therefore, multiple release levels of licensed programs like Parallel Environment can run on an SP system without restriction, as long as each release level is in a separate system partition.

For many installations the system partition approach is not viable: those that need to migrate only a small number of nodes, switchless clustered enterprise servers, and SP Switch2 system configurations. This is true of a small system (in terms of number of nodes), or of a system whose migration requirement covers fewer nodes than a system partition can represent, possibly only one node (such as for LAN consolidation). The switch isolation function might also be unwanted. Coexistence is aimed specifically at providing additional flexibility for migration scenarios where system partitioning is not desired.

Coexistence support is provided for multiple levels of PSSP and coordinating levels of AIX in the same system partition. However, there are requirements and certain limitations that must be understood and adhered to in considering the use of coexistence. Some of the SP-related licensed programs are not supported or are restricted in a mixed system partition. For example, the SP parallel processing licensed programs (such as Parallel Environment) are generally not supported in mixed system partitions. Inter-node communication over the SP Switch using TCP/IP is supported, but user space communication is not available in a coexistence configuration. The supported coexistence configurations and the limitations that apply to these coexistence configurations are described in the remainder of this section.

Migration and coexistence limitations

PSSP 3.4 is supported on AIX 5L 5.1 or AIX 4.3.3. The PSSP software does have some 64-bit support features. See Support for applications with 64-bit addressing.

PSSP 3.4 supports multiple levels of AIX and PSSP in the same system partition. Keep in mind that an unpartitioned system is actually a single default system partition. However, only certain combinations of PSSP and AIX are supported to coexist in a system partition.

Coexistence is supported in the same system partition or a single default system partition (the entire SP system) for nodes running any combination of:

Table 48 lists the levels of PSSP and corresponding levels of AIX that are supported in a mixed system partition.
Table 48. Levels of PSSP and AIX supported in a mixed system partition

            AIX 4.2.1   AIX 4.3.3   AIX 5L 5.1
PSSP 3.4                    S           S
PSSP 3.2                    S
PSSP 3.1                    S
PSSP 2.4        S           S

Note:
PSSP 2.4 is supported for migration only.

In general, any combination of the PSSP levels listed in Table 48 can coexist in a system partition, and you can migrate to a new level of PSSP or AIX one node at a time. However, some PSSP components and related licensed programs still have limitations, including some that relate to hardware. For instance, in a system with the SP Switch2, all nodes must run PSSP 3.2 or later, and if you want optional connectivity to the SP Switch2 you must run PSSP 3.4 on all nodes. Also, many software program features have PSSP and AIX level dependencies; you must ensure that the proper release levels of these licensed programs are used on nodes running the coordinating supported PSSP and AIX levels.

The following coexistence combinations are not supported; all other combinations are allowed:

The PSSP components and related licensed programs discussed in this section have notable exceptions that might restrict your ability to migrate one node at a time or might limit your coexistence options.

Note:
Before migrating any node, the control workstation must be migrated to the highest level of PSSP and AIX that you intend to have on your system. Program Temporary Fixes (PTFs) might be required. See the Read This First document for the latest updates.

Switch management and TCP/IP over the switch

The switch support component of PSSP (CSS) provides switch management and TCP/IP over the switch between nodes in a mixed partition. Certain hardware, such as new nodes and adapters, might require a specific level of PSSP and related licensed programs. Be careful to coordinate both your current and new hardware requirements with the software requirements. For example, a p690 must run AIX 5L 5.1, and the control workstation must also run AIX 5L 5.1.

Coexistence statement

For IP communication over the switch:

Note:
If you are using Parallel Environment, see the PE coexistence statement for more limitations that might apply.

Migration statement

Switch support software does not interfere with migrating one node at a time. Converting from one type of switch to another is a hardware configuration change, not a migration change, but the sequence of activities must be considered. Convert a High Performance Switch to the SP Switch2 or SP Switch before migrating to PSSP 3.4. To convert from the SP Switch to the SP Switch2, all nodes must run PSSP 3.2 or later and the system must be configured as a single SP system partition.

Though the SP Switch2 requires PSSP 3.2 or later, to use the optional connectivity feature all nodes must run PSSP 3.4, even those not connected to the switch. You cannot add the SP Switch2 to a system in any other configuration. The SP Switch2 does not support SP system partitioning. If your SP is system partitioned and you plan to install an SP Switch2, your SP must be reconfigured to a single system partition before that switch can be installed and the PSSP software migrated to PSSP 3.4.

More switch management functions have been automated. If you have switch commands in local scripts and procedures, consider removing them and relying on the automation available in PSSP. If you prefer, you can turn off automatic switching, but you must turn it off each time you boot the control workstation.

Extension node support

Extension node support will function in a mixed system partition of nodes running any combination of the supported PSSP levels. If a node with PSSP 2.2 or earlier is allowed to become the primary node for the SP Switch, the dependent node connection to the SP Switch will be disabled automatically. In the event of a failure it is the administrator's responsibility to override the newly assigned primary or primary backup (to ensure it is a node running PSSP 2.4 or later).

Extension nodes are not supported in SP systems with the SP Switch2.

Security support

As of PSSP 3.2 you have the option of running PSSP with an enhanced level of security. The restricted root access option removes PSSP's dependency on internally issuing rsh and rcp commands as a root user from a node. Any such actions can be run only from the control workstation or from nodes configured to authorize them. PSSP does not automatically grant authorization for a root user to issue rsh and rcp commands from a node. If you enable restricted root access, some procedures might not work as documented. For example, to run HACMP an administrator must grant the authorizations for a root user to issue rsh and rcp commands that PSSP would otherwise grant automatically. See Considering restricted root access for a description of this option.

With PSSP 3.4 you can have a secure remote command process run in place of the rsh and rcp commands. See Considering a secure remote command process for a description of this option.

See Limitations when using restricted root access and Considering choosing none for AIX remote command authorization for a complete list of limitations.

As of AIX 4.3.1 the AIX remote command suite was enhanced to support DCE security services. The suite includes rsh, rcp, rlogin, telnet, and ftp. PSSP 3.1 or later uses these enhanced AIX commands. For SP migration purposes, the AIX remote commands rsh and rcp were enhanced to call an SP-supplied Kerberos V4 set of rsh and rcp subroutines. The AIX /usr/bin/rsh and /usr/bin/rcp on the SP support multiple authentication methods, including Kerberos V5, Kerberos V4, and standard AIX authentication.

As of PSSP 3.1, the /usr/lpp/ssp/rcmd/bin/rsh and /usr/lpp/ssp/rcmd/bin/rcp commands are symbolic links to the AIX /usr/bin/rsh and /usr/bin/rcp commands respectively. Be aware of the following with respect to the rsh and rcp commands:

If you already use DCE for remote command authentication with PSSP 3.1 or later (for k5 support), you can continue using this security configuration with PSSP 3.4. You will need the same procedures you might have developed for mksysb node installs with PSSP 3.1. Alternatively, you might choose to move to the full DCE implementation provided by PSSP after upgrading your nodes to PSSP 3.4. Instructions are provided in the book PSSP: Installation and Migration Guide.

DCE and HACWS restriction:

If you plan to have DCE authentication enabled, you cannot use HACWS. If you already use HACWS, do not enable DCE authentication.

Coexistence statement

For coexistence, the following criteria apply:

Migration statement

SP security does not interfere with a node-by-node migration of PSSP or AIX levels. The following criteria apply:

High Availability Cluster Multi-Processing

IBM's tool for building UNIX-based mission-critical computing platforms is the High Availability Cluster Multi-Processing (HACMP) licensed program. The HACMP program ensures that critical resources are available for processing. Currently HACMP is one licensed program with several features, one of which is the Enhanced Scalability feature called HACMP/ES.

Note:
Except in statements that are explicitly about HACMP and HACMP/ES separately, all statements about HACMP apply to HACMP/ES as well.

While PSSP has no direct requirement for HACMP, if you install hardware that requires PSSP 3.4 regardless of system partitioning, and you already use or are planning to use HACMP, you must also install and use HACMP 4.4.1. See the appropriate HACMP documentation for the latest information on which levels of HACMP you need for the hardware on your SP system.

If you have existing HACMP clusters, you can migrate to the HACMP/ES feature and re-use all of your existing configuration definitions and customized scripts.

HACMP can run on SP nodes and on servers, whether SP-attached or in a cluster system configuration. Do not run HACMP and HACMP/ES on the same node. Either HACMP or HACMP/ES can run on the control workstation; typically, HACMP is run on the control workstation only if HACWS is being used. HACWS can run on an HACMP/ES cluster or an HACMP cluster. HACMP has a dependency on the RSCT software. See The RS/6000 Cluster Technology components.

Enhanced security options restrictions:

HACMP is not automatically supported with restricted root access enabled. As of PSSP 3.2 you have the option of running PSSP with an enhanced level of security. The restricted root access option removes PSSP's dependency on internally issuing rsh and rcp commands as a root user from a node. Any such actions can be run only from the control workstation or from nodes configured to authorize them. PSSP does not automatically grant authorization for a root user to issue rsh and rcp commands from a node. If you enable this option some procedures might not work as documented. For example, to run HACMP an administrator must grant the authorizations for a root user to issue rsh and rcp commands that PSSP would otherwise grant automatically.

See Considering restricted root access, Considering a secure remote command process, and Considering choosing none for AIX remote command authorization for option descriptions and a complete list of limitations.

Coexistence statement

HACMP 4.4.1 is not compatible with any of the earlier versions. While there is a version compatibility function to allow HACMP 4.4.1 to temporarily coexist in a cluster with mixed releases, it is intended as a migration aid only. Once the migration is completed, each processor in the HACMP cluster must be at the same AIX and HACMP release levels, including all PTFs.

HACMP and HACMP/ES can coexist in a mixed system partition containing nodes running the supported combinations of PSSP with the following conditions:

Migration statement

Table 49 lists the levels of PSSP and corresponding levels of AIX in which HACMP levels can coexist during migration only. HACMP function will not work during this time.
Table 49. HACMP Levels supported during migration only

HACMP   AIX 4.3.3   AIX 5L 5.1   PSSP 3.1   PSSP 3.2   PSSP 3.4
4.4.1       S           S                       S          S
4.4.0       S                       S           S          S
4.3.1       S                       S           S
4.3.0       S                       S

Note:
HACMP 4.4.1 is not compatible with previous releases. The HACMP version compatibility function exists only to ease migration, not to provide long-term compatibility between versions of the program.

HACMP 4.4.1 on the SP requires AIX 5L or AIX 4.3.3 (or later) and PSSP 3.4. You have the following migration options:

The functions provided by HACMP 4.4.1 will not work until after it is installed on all the nodes in the cluster.

Virtual Shared Disk

The Virtual Shared Disk component of PSSP can coexist and interoperate in a mixed system partition with any combination of the supported levels of PSSP, but the level of Virtual Shared Disk function in the configuration is that of the earliest version of PSSP in the system partition. For example, if you have a mixed system partition with nodes running PSSP 3.4 and PSSP 3.1, the level of Virtual Shared Disk function available is that of PSSP 3.1.
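The "earliest level wins" rule above can be sketched as a small shell calculation; the node levels shown are hypothetical examples:

```shell
# Hypothetical PSSP levels on the nodes of a mixed partition (example values).
node_levels="3.4 3.1 3.4 3.2"

# The available VSD function level is that of the earliest PSSP level present,
# found here by a numeric major.minor sort.
vsd_level=$(printf '%s\n' $node_levels | sort -t. -k1,1n -k2,2n | head -1)
echo "Virtual Shared Disk function level: PSSP $vsd_level"
```

With the example values above, the partition delivers only PSSP 3.1-level Virtual Shared Disk function even though most nodes run PSSP 3.4.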

The Virtual Shared Disk component does not interfere with migrating one node at a time.

Note:
IBM virtual shared disks are supported only on an SP system or clustered enterprise servers with an SP switch. In an SP Switch2 system where some nodes are not on the switch, virtual shared disks can work only with those nodes that are on the switch.
Enhanced security options restrictions:

The Virtual Shared Disk component is not supported with restricted root access enabled. See Considering restricted root access, Considering a secure remote command process, and Considering choosing none for AIX remote command authorization for option descriptions and a complete list of limitations.

Recoverable Virtual Shared Disk

Planning to migrate Recoverable Virtual Shared Disk calls for careful consideration. Before PSSP 3.1, IBM Recoverable Virtual Shared Disk was a separate licensed program. Now it is an optional component which comes with PSSP but you must choose to install it along with the Virtual Shared Disk component. It has dependencies on other optional components of PSSP.

Recoverable Virtual Shared Disk functions at the earliest installed level of PSSP in the system partition. You must use the rvsdrestrict command to choose the specific level that Recoverable Virtual Shared Disk is to run in a mixed system partition. If a node has a lower level of the Recoverable Virtual Shared Disk software installed than what is set with this command, the rvsd subsystem will not start on that node.

The rvsdrestrict command does not dynamically change Recoverable Virtual Shared Disk run levels across the system. An instance of Recoverable Virtual Shared Disk only reacts to this information after being restarted. If your cluster runs at a given level, and you want to override that level, stop Recoverable Virtual Shared Disk on all nodes, run the rvsdrestrict command to change the level, then restart.
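The stop/change/restart sequence just described can be sketched as a dry-run script. The commands are printed rather than executed, and the "RVSD3.1" level string is an assumed argument format; verify both against the rvsdrestrict documentation for your PSSP level:

```shell
# Dry-run sketch of changing the RVSD run level in a mixed partition.
# Nothing here is executed against the system; each step is only printed.
rvsd_level_change_plan() {
    echo "ha.vsd stop              # stop the rvsd subsystem on every node first"
    echo "rvsdrestrict -s RVSD3.1  # set the run level (assumed level string)"
    echo "ha.vsd start             # restart so the new level takes effect"
}
rvsd_level_change_plan
```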

Enhanced security options restrictions:

Recoverable Virtual Shared Disk is not supported with restricted root access enabled. See Considering restricted root access, Considering a secure remote command process, and Considering choosing none for AIX remote command authorization for option descriptions and a complete list of limitations.

Recoverable Virtual Shared Disk coexistence

The Recoverable Virtual Shared Disk component of PSSP can coexist and interoperate in a mixed system partition with any combination of the following groupings:

  1. Recoverable Virtual Shared Disk component of PSSP 3.4
  2. Recoverable Virtual Shared Disk component of PSSP 3.2
  3. Recoverable Virtual Shared Disk component of PSSP 3.1

IBM Recoverable Virtual Shared Disk migration

The following migration criteria apply:

IBM Recoverable Virtual Shared Disk levels supported

Table 50 shows the supported levels. The rvsdrestrict command allows you to choose the specific level that Recoverable Virtual Shared Disk is to run without your having to reinstall that level.
Table 50. Supported Recoverable Virtual Shared Disk levels

RVSD    AIX 4.2.1   AIX 4.3.3   AIX 5L 5.1   PSSP 2.4   PSSP 3.1   PSSP 3.2   PSSP 3.4
3.4                     S           S                                              S
3.2                     S           S                                  S           S
3.1                     S           S                       S          S           S
2.1.1       S           S                        S


Note:
RVSD 2.1.1 with AIX 4.2.1 and PSSP 2.4 is supported for migration only.

General Parallel File System (GPFS)

A file system managed by the GPFS licensed program can be accessed only from within the GPFS nodeset to which it belongs. GPFS requires that all nodes in a GPFS nodeset run the same level of GPFS. Migrating one node at a time is not supported.

GPFS is supported in systems with the SP Switch or the SP Switch2. GPFS might not support all the new adapters for connecting nodes to the SP Switch2.

Enhanced security options restrictions:

GPFS is not supported with restricted root access enabled. See Considering restricted root access, Considering a secure remote command process, and Considering choosing none for AIX remote command authorization for option descriptions and a complete list of limitations.

GPFS coexistence in an SP environment

GPFS will coexist in a mixed system partition under certain circumstances. In order to run multiple levels of GPFS within a partition, you must create multiple GPFS nodesets as described in the GPFS documentation. All nodes within a nodeset must run the same level of GPFS. Nodes do not share access to file systems across nodesets.

GPFS 1.5 will not coexist or interoperate with earlier releases of GPFS in the same nodeset. All applications that execute on earlier releases of GPFS will execute on GPFS 1.5. All file systems created with earlier releases of GPFS can be used with GPFS 1.5 and can be upgraded to use new GPFS 1.5 file system functions.

GPFS coexistence in an HACMP cluster environment

GPFS file systems and nodesets must follow these coexistence guidelines:

Due to common components shared by GPFS, IBM Multi-Media Server, and IBM Video Charger, the kernel extensions for GPFS cannot coexist with these licensed programs on the same system.

File systems created for an SP environment cannot be used in an HACMP environment. See Planning for General Parallel File System (GPFS) for an explanation of these environments.

Migration statement

All nodes within a GPFS nodeset that use a given file system must not only be at the same level of GPFS, but must also be upgraded to that level at the same time. Nodes cannot be migrated to GPFS 1.5 one node at a time; however, the migration can be completed within a 4-hour time frame. All nodes must be rebooted to pick up the kernel extensions.

If a system is going to have some GPFS nodesets with GPFS 1.5 installed and others with lower levels installed, the control workstation (CWS) must have GPFS 1.5 installed.

New file system functions existing in GPFS 1.5 are not usable until you explicitly authorize these changes by issuing the mmchfs -V command.

If all of the nodesets are going to run GPFS 1.5, you can change the file system format to the latest supported format, thereby reducing the number of configuration files stored in the GPFS cluster data, by running the mmchconfig release=LATEST command.
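The two upgrade steps above (mmchfs -V per file system, then mmchconfig release=LATEST cluster-wide) can be sketched as a dry-run; the file system names are hypothetical and the commands are printed, not executed:

```shell
# Dry-run of the GPFS 1.5 upgrade commands; fs0 and fs1 are example names.
gpfs_upgrade_plan() {
    for fs in fs0 fs1; do
        echo "mmchfs $fs -V               # enable new GPFS 1.5 file system functions"
    done
    echo "mmchconfig release=LATEST   # only after every nodeset runs GPFS 1.5"
}
gpfs_upgrade_plan
```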

In order to use the 64-bit versions of the GPFS programming interfaces, you must recompile your code using the appropriate 64-bit options for your compiler.

GPFS 1.5 works on PSSP 3.4. GPFS 1.3 and 1.4 work on PSSP 3.2 and PSSP 3.4. This allows for node by node migration from PSSP 3.2 to PSSP 3.4 without having to change the level of GPFS that is running on the node. GPFS 1.5 is dependent on the RVSD component of PSSP 3.4. Because GPFS 1.3 and 1.4 are only supported on AIX 4.3.3, a migration to GPFS 1.5 must happen before a migration to AIX 5L 5.1.

Table 51 shows the supported levels. See Table 47 for a possible migration staging scenario.
Table 51. Supported GPFS levels

GPFS    AIX 4.3.3   AIX 5L 5.1   PSSP 3.1   PSSP 3.2   PSSP 3.4
1.5         S           S                                  S
1.4         S                                  S           S
1.3         S                       S          S           S

Parallel Environment

Parallel applications, like IBM Parallel Environment (PE), are not supported in a mixed system partition. This applies to their use for either IP or user space communication. All the nodes involved in a parallel job must be running the same level of Parallel Environment.

Parallel Environment consists of:

Note:
IBM Parallel Environment for the SP requires that all the nodes involved in a parallel job be running the same level of PE. See LoadLeveler for associations between Parallel Environment and LoadLeveler.

Coexistence in Parallel Environment

PE 3.2 runs on PSSP 3.4 with AIX 5L 5.1. A single job uses a homogeneous environment.

There are some limitations relevant to hardware:

Migration in Parallel Environment

Parallel Environment does not support node-by-node migration. Limitations are as follows:

Table 52 shows the supported levels. See Table 47 for a possible migration staging scenario.
Table 52. Supported Parallel Environment levels

PE    AIX 4.3.3   AIX 5L 5.1   PSSP 3.1   PSSP 3.2   PSSP 3.4
3.2                   S                                  S
3.1       S                                  S           S
2.4       S                       S

Parallel Tools

Parallel Tools include:

These tools are shipped with Parallel Environment, which requires that all the nodes involved in a parallel job be running the same level of Parallel Environment. They have the same coexistence limitations as stated for Parallel Environment, with the exception of Xprofiler.

Xprofiler was introduced in Parallel Environment 2.3 and has no dependency on PSSP. It does not interoperate with other instances of Xprofiler, but it does not interfere with coexistence or migration in PSSP. Every level of Xprofiler works with every level of PSSP. Use Xprofiler 1.0 with AIX 4.3, Xprofiler 1.1 with AIX 4.3.2 (or later), and Xprofiler 1.2 with AIX 5L 5.1.

Parallel ESSL

Parallel ESSL is not supported in a mixed system partition. Which level of Parallel ESSL runs on a particular level of PSSP and AIX is based on which level of PE runs on that particular level of PSSP and AIX, as follows:

Parallel ESSL coexistence and migration is the same as for Parallel Environment because it is dependent on it. It also requires the ESSL licensed program.

Communications Low-Level Application Programming Interface

The Communications Low-Level Application Programming Interface (LAPI) provides a more primitive interface to the SP Switch than either MPI or IP. Parallel Environment is required for using LAPI. LAPI supports the SP Switch2 and the SP Switch. The 64-bit support for LAPI is available only on AIX 5L 5.1 or later.

Coexistence

PSSP 3.4 LAPI can run in a mixed environment of nodes with PSSP 3.2 or later.

Pre-PSSP 3.2 LAPI can run on any combination of PSSP 2.4 and 3.1.1 nodes, but might be subject to the coexistence limitations of the PSSP components and related licensed programs with which the application using LAPI interacts. Pre-PSSP 3.2 LAPI can interoperate between PSSP 2.4 and 3.1 nodes.

You need to recompile all programs that use PSSP 3.2 LAPI or later.

Migration

You can migrate PSSP from any supported release of PSSP to PSSP 3.4 one node at a time. Applications that use LAPI cannot migrate one node at a time. LAPI users need to zero any unused fields in the lapi_info_t structure and recompile the programs to use PSSP 3.2 LAPI or later. Migrate such applications on all nodes in the same service window.

Since PSSP 3.2 LAPI or later cannot coexist with PSSP 3.1.1, in order to migrate one node at a time you can define a separate LoadLeveler pool for this set of nodes.

LoadLeveler

The LoadLeveler (LL) licensed program supports scheduling and load balancing of parallel jobs on the SP system.

LoadLeveler 3.1 supports coexistence and node-by-node migration.

Coexistence Statement

AIX 5L 5.1 is a prerequisite to installing LoadLeveler 3.1. LoadLeveler 3.1 will not coexist with AIX 4.3.3 on a node.

LoadLeveler 3.1 can coexist with LoadLeveler 2.2 within a cluster with the following restrictions:

LoadLeveler 3.1 can coexist with PSSP 3.2 and 3.4 with the following restrictions:

GPFS, LoadLeveler, and Parallel Environment with AIX 4.3.3 might not support:

Migration Statement

LoadLeveler 3.1 supports node-by-node migration from LoadLeveler 2.2. LoadLeveler provides other mechanisms for migration to LoadLeveler 2.2, including the use of separate LoadLeveler clusters.

To migrate from LoadLeveler version 2.2 to version 3.1, first migrate the Central Manager node. After the Central Manager is running the new level, all other nodes can be migrated on a node-by-node basis without taking the entire cluster down. Do not use the llacctmrg command while there are mixed levels on the system.
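The ordering rule (Central Manager first, then the remaining nodes one at a time) can be sketched as follows; the node names are hypothetical:

```shell
# Migration-order sketch for LoadLeveler 2.2 -> 3.1; names are examples only.
central_manager="node01"
other_nodes="node02 node03 node04"

ll_migration_order() {
    echo "$central_manager"   # must be running LoadLeveler 3.1 before any other node
    for n in $other_nodes; do
        echo "$n"             # remaining nodes, migrated one at a time
    done
}
ll_migration_order
```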

Table 53 shows the supported levels.
Table 53. Supported LoadLeveler levels

LL    AIX 4.3.3   AIX 5L 5.1   PSSP 3.1   PSSP 3.2   PSSP 3.4
3.1                   S                                  S
2.2       S                                  S           S
Note:
Running LL 2.2 on AIX 4.3.3 requires that LL be at the PTF level that is noted in the PSSP 3.4 Read This First document. For the most up-to-date LoadLeveler migration instructions, see the README file distributed with LoadLeveler 3.1.

Coexistence within LoadLeveler

LoadLeveler has added a versioning framework to allow the coexistence of different levels within clusters. LoadLeveler 3.1 is compatible with LoadLeveler 2.2 (with PTF).

Coexistence of LoadLeveler and Parallel Environment with PSSP

When LoadLeveler and Parallel Environment exist on the same node, they must be at one of the following combinations:

Job Switch Resource Table services

The JSRT services are available in PSSP 3.1 or later. The package consists of three categories of externally available APIs:

  1. Local functions to be run on the node: swtbl_load_table, swtbl_unload_table, swtbl_status_node, swtbl_clean_table, and swtbl_query_adapter.
  2. Remote functions to be run on a remote node: swtbl_load_job, swtbl_unload_job, and swtbl_status.
  3. Commands: st_status, and st_clean_table.

The commands and functions of the Job Switch Resource Table services support coexistence of PSSP 3.1.1 and PSSP 3.2 with one exception. The remote function swtbl_load_job was updated for PSSP 3.2 and cannot coexist in a mixed system partition with PSSP 3.1. All nodes with applications that use the swtbl_load_job function must be migrated to PSSP 3.2 or later in one service window.

IP performance tuning

This section presents some high-level considerations related to performance of TCP/IP over the switch in a coexistence environment. Note that these are simply important factors to be considered in approaching tuning, and that the SP organization has not conducted significant performance evaluation studies in this area.

In general, with all else being equal, the goal for performance achieved between nodes running different levels of PSSP should be the performance delivered by the earlier level of PSSP (each release of PSSP has included performance improvements). Traditional tuning considerations, such as those derived from the performance characteristics of different SP node types and installation/application communication patterns will still apply. For example, the switch throughput is limited to the speed of the slowest node in an IP connection. With coexistence, tuning activities might also need to reflect the levels of PSSP on the particular nodes running (communicating) in a mixed system partition.

There are two main areas where this might come into play:

  1. Tuning for AIX - tuning methodologies typically employed for different releases.
  2. Tuning for the switch - appropriate settings for the adapter device driver buffer pools.

In tuning for the switch, the values used for the switch adapter or device driver IP buffer pools are the primary considerations. The rpoolsize and spoolsize parameters are changed using the chgcss command. The aggregate pool size is a function of the size of kernel memory.
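As a rough illustration of setting the pools, the sketch below computes byte values and prints an assumed chgcss invocation. The sizes are examples, not recommendations, and the flag syntax should be verified against the chgcss documentation for your PSSP level:

```shell
# Example switch adapter IP buffer pool settings; values are illustrative only.
rpoolsize=$((16 * 1024 * 1024))   # receive pool: 16 MB, expressed in bytes
spoolsize=$((16 * 1024 * 1024))   # send pool: 16 MB, expressed in bytes

# Printed rather than executed -- an assumed chgcss invocation for adapter css0.
echo "chgcss -l css0 -a rpoolsize=$rpoolsize -a spoolsize=$spoolsize"
```

Remember that the aggregate pool size is bounded by kernel memory, so raising both pools on every node in a mixed partition should be weighed against the PSSP levels actually communicating over the switch.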

In summary, the recommended approach for factoring coexistence into your overall SP tuning strategy is to start with the above general approach to tuning for mixed levels of AIX and PSSP. Consider the other characteristics that influence performance for your specific configuration, making trade-offs if necessary. Then, as with any performance tuning strategy, make refinements based on your results or as your SP migration strategy progresses.

Note:
See performance tuning information on the web at http://www.rs6000.ibm.com/support/sp

Root volume group mirroring

If you already have root volume group mirroring on SP nodes or on servers that you now want to attach to the SP system, enter the mirroring information into the SDR before either migrating a node to PSSP 3.4 or attaching a server. Failure to enter existing root volume group mirroring information will result in the root volume group being unmirrored during migration to PSSP 3.4 or attaching a server to a PSSP 3.4 system.

Boot-install servers and other resources

Your boot-install server must be at the highest level of AIX and PSSP that it is to serve.

Your migration plan must also consider additional resources; for example, additional DASD to support multiple levels of software, particularly if you plan to use coexistence. In that case, plan on allocating 2 GB of disk for each level of AIX and PSSP being served by your control workstation or boot-install server. The space is used for additional directories, specifically:

Changes in recent levels of PSSP

Here are some recent changes in PSSP, PSSP-related licensed programs, or AIX support that might affect your migration plans.

Support for applications with 64-bit addressing

PSSP 3.4 supports applications that use 64-bit addressing. User applications might need to be recompiled with the -q64 flag, but that is application dependent.

All nodes in a job must run the same level of PE and PSSP; there is no mix and match. Applications that use LAPI must be at the same level of PSSP throughout. The 64-bit and 32-bit libraries coexist, and 32-bit and 64-bit tasks can coexist on the same node. An MPI job must consist of all 32-bit or all 64-bit tasks, while LAPI jobs can consist of a mixture of 32-bit and 64-bit tasks. Applications that use 64-bit addressing must use AIX 5L 5.1 and PSSP 3.4.
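The addressing-mode rules above can be summarized in a small check. The helper below is only an illustrative sketch (the function name and interface are invented here): it reports whether a proposed mix of 32-bit and 64-bit tasks is allowed for a given API, per the statements that MPI jobs must be uniform while LAPI jobs may mix.

```shell
# Hypothetical helper (names invented here) encoding the rules above:
#   - an MPI job must be all 32-bit or all 64-bit tasks
#   - a LAPI job may mix 32-bit and 64-bit tasks
# Usage: mix_allowed API BITS [BITS ...]   where BITS is 32 or 64
mix_allowed() {
    api=$1
    shift
    first=$1
    mixed=no
    for bits in "$@"; do
        [ "$bits" != "$first" ] && mixed=yes
    done
    if [ "$api" = "MPI" ] && [ "$mixed" = "yes" ]; then
        echo "not allowed: MPI tasks must all use the same addressing mode"
    else
        echo "allowed"
    fi
}

mix_allowed MPI 32 64    # prints "not allowed: ..."
mix_allowed LAPI 32 64   # prints "allowed"
```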

Support for checkpoint-restart

The new checkpoint-restart function can run only on a system running the latest versions of LoadLeveler, Parallel Environment, and PSSP with AIX 5L 5.1. The older checkpoint function does not work on any upgraded node.

The RS/6000 Cluster Technology components

RS/6000 Cluster Technology (RSCT) is the package of high availability support programs that began as components of PSSP:

They have been separated from PSSP as a package to be made available for broader use:

Coexistence Statement: The RSCT subsystem can coexist in a mixed system partition with any combination of supported levels of PSSP. Both PSSP and HACMP require the RSCT components.

Migration Statement: RSCT does not prevent PSSP from migrating one node at a time from PSSP 2.4, 3.1, or 3.2 to PSSP 3.4.

When PSSP 3.4 is installed on the control workstation, SP nodes, and servers, the RSCT components interoperate with any existing PSSP high availability components on nodes containing prior supported levels of the PSSP licensed program. In other words, RSCT supports coexistence in a mixed system partition running any combination of the supported levels of PSSP. RSCT also supports node by node migration from the supported levels of PSSP to PSSP 3.4. However, you need to be aware of several considerations.

Event Management

When you monitor hardware or virtual shared disks via the problem management component or the SP Perspectives graphical user interface, those tools use the Event Management services. PSSP 3.4 supports new hardware that was not supported in prior releases. In order to monitor this new hardware, after you install PSSP 3.4 on the control workstation you must perform the procedure Activating the Configuration Data in the SDR. The procedure is documented in the Event Management subsystem chapter of the book PSSP: Administration Guide. The new function is not enabled until all nodes within the SP system partition are running PSSP 3.4.

Group Services

Programmers writing to the Group Services API and also systems administrators with systems using the Group Services API need to be aware that all nodes must be at PSSP 3.2 or later in order to use the Group Services API functions that were introduced in PSSP 3.2. If a system partition contains nodes with an earlier release of PSSP, the level of Group Services functionality supported in that system partition is only that of the earliest PSSP release. For example, if the system partition contains some nodes at PSSP 2.4 and others at PSSP 3.4, then the Group Services function is that of PSSP 2.4 until all nodes get migrated to PSSP 3.4.

Performance Toolbox, Agent Component (perfagent)

The functions needed by the Event Management component of PSSP come with AIX 5L 5.1 or AIX 4.3.3 in the perfagent.tools file set, which you must be sure to install. The correct level of perfagent needs to be installed on the control workstation and copied to the lppsource directory so it can be installed on the nodes.

Perfagent must be installed to obtain events from the AIX resource monitor. The level of perfagent required depends on the level of AIX. See Table 54.

To verify whether the correct level of perfagent is installed, issue the command:

lslpp -l perfagent.server


Table 54. PAIDE levels

File set                  AIX 4.2.1   AIX 4.3.3   AIX 5L 5.1   PSSP 2.4   PSSP 3.1   PSSP 3.2   PSSP 3.4
perfagent.tools 5.1.x                             S                                             S
perfagent.server 3.0.x                S           S            S          S          S          S
perfagent 2.2.33.x                    S                        S          S          S          S
perfagent 2.2.1.2         S                                    S          S          S

(S = supported combination)
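Reading Table 54 as a mapping from AIX level to the perfagent file set to verify, a check script could start from something like the sketch below. The function name is invented here and the mapping is my reading of the table; confirm it against the table and your entitlement before relying on it.

```shell
# Sketch: which perfagent file set to check for a given AIX level,
# as read from Table 54 (an assumption -- verify against the table).
required_perfagent() {
    case "$1" in
        4.2.1) echo "perfagent 2.2.1.2" ;;
        4.3.3) echo "perfagent 2.2.33.x or perfagent.server 3.0.x" ;;
        5.1)   echo "perfagent.tools 5.1.x" ;;
        *)     echo "unknown AIX level: $1" ;;
    esac
}

required_perfagent 5.1   # prints "perfagent.tools 5.1.x"
```

The actual installed level is what lslpp -l reports on each node; a helper like this only tells you what to look for.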

Performance Toolbox Parallel Extensions

The Performance Toolbox Parallel Extensions for AIX (PTPE) software, a separate licensed program before PSSP 3.1 and an optional component of PSSP 3.1 and later, has been withdrawn from service and is not in PSSP 3.4. This does not prevent the PTPE software from running on earlier PSSP nodes in mixed SP configurations.

SP TaskGuides

The SP TaskGuides component of PSSP 3.1 and PSSP 3.2 has been withdrawn from service and is not in PSSP 3.4.

AIX support

AIX 5L 5.1 does not preserve binary compatibility for 64-bit applications because it introduces a new 64-bit ABI. Therefore PSSP 3.2 and earlier releases are not supported on AIX 5L 5.1 or later. Because of this binary incompatibility, if you want to migrate to AIX 5L 5.1, you must first migrate your control workstation to PSSP 3.4 and AIX 4.3.3. Then you can migrate it to AIX 5L 5.1. The nodes can be migrated from any supported base release to PSSP 3.4 on AIX 4.3.3 or to PSSP 3.4 on AIX 5L 5.1 with a single nodecond operation.

A direct migration from PSSP 2.4 on AIX 4.2.1, which is no longer supported, to PSSP 3.4 on AIX 5L 5.1 is not available. To get there, first migrate from PSSP 2.4 on AIX 4.2.1 to PSSP 3.4 on AIX 4.3.3, and subsequently migrate to AIX 5L 5.1 with a second nodecond operation.
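The two-step requirement above can be expressed as a small planning helper. This is only a sketch of the ordering described in the text (the function and its output format are invented here): given a starting PSSP/AIX level pair, it lists the nodecond stages needed to reach PSSP 3.4 on AIX 5L 5.1.

```shell
# Sketch of the migration ordering described above (output format invented).
# Argument: starting "PSSP/AIX" level pair, e.g. 2.4/4.2.1
migration_steps() {
    case "$1" in
        2.4/4.2.1)
            # No direct path from AIX 4.2.1: an intermediate stop is required.
            echo "step 1: migrate to PSSP 3.4 on AIX 4.3.3 (first nodecond)"
            echo "step 2: migrate to AIX 5L 5.1 (second nodecond)"
            ;;
        *)
            # Any supported base release can reach PSSP 3.4 on either
            # AIX level in a single nodecond operation.
            echo "step 1: migrate to PSSP 3.4 on AIX 4.3.3 or AIX 5L 5.1 (single nodecond)"
            ;;
    esac
}

migration_steps 2.4/4.2.1
```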

Information about AIX can be found in the relevant edition of the book Differences Guide. See the Bibliography for how to find AIX information on the Web.

TCP/IP Internet Protocol Version 6 (IPv6) extends IP addresses from 32-bit to 128-bit addressing. IPv6 is compatible with the current base of IPv4 hosts and routers. IPv6 and IPv4 hosts and routers can tunnel IPv6 datagrams over regions of IPv4 routing topology by encapsulating them within IPv4 packets. IPv6 is an evolutionary change from IPv4 and allows a mixture of the new and the old to coexist on the same network.

Restriction:
IPv6 is not supported for use by the PSSP components. It cannot be used with SP adapters and is incompatible with the RSCT components. If you are using an SP-attached server or clustered enterprise server, be sure that the server does not use IPv6. Some PSSP components tolerate IPv6 aliases for IPv4 network addresses, but not with DCE, HACMP, HACWS, or an SP switch. For more information about the SP system tolerating IPv6 aliases for IPv4 network addresses, see the appendix on the subject in the book PSSP: Administration Guide.

Resource Manager

Resource Manager functionality was merged into the LoadLeveler 2.1 licensed program for PSSP 3.1. Support for managing PSSP 2.4 nodes from the control workstation is still available. If you will have no nodes running PSSP 2.4, do not install the ssp.jm file set.

The resource manager daemons are not functional on nodes with PSSP 3.1 or later. The ssp.jm file set has the commands and library necessary to support, from the control workstation, any system partitions using the Resource Manager in earlier versions of PSSP. Any existing resource manager clients can convert to use the Job Switch Resource Table APIs or the LoadLeveler job management APIs.

Migration installs on nodes result in the Resource Manager daemon being removed from nodes with PSSP 3.1 or later. When you migrate to PSSP 3.4 and some nodes remain at PSSP 2.4, install the ssp.jm file set only on the control workstation.

A PSSP 3.4 control workstation can support system partitions with earlier levels of PSSP running Resource Manager. A pre-PSSP 3.1 Resource Manager can coexist in a mixed system partition with nodes running any combination of PSSP 2.2, 2.3, or 2.4. It cannot coexist in any system partition with PSSP 3.1 or later and LoadLeveler 2.1 or later. The pre-PSSP 3.1 Resource Manager must be stopped before LoadLeveler 2.1 or later is started.

The Resource Manager daemon must run on PSSP 2.4 nodes when running in a mixed environment partition. The Resource Manager does not allow nodes running PSSP 3.1 or later to be configured in a parallel pool. See the LoadLeveler and POE coexistence statements for further details.

The Resource Manager supports migration of the control workstation from PSSP 2.2, 2.3, 2.4, 3.1, or 3.2 to PSSP 3.4. Existing Resource Manager configuration files are preserved during migration of the control workstation. Any node being migrated to PSSP 3.4 must first be removed from a pre-PSSP 3.1 Resource Manager configuration file in the system partition. If the node being migrated to PSSP 3.4 is running the Resource Manager, then another node that is not to be migrated must be identified in the configuration file as the new Resource Manager server node.

When you no longer have any nodes with releases earlier than PSSP 3.1, uninstall the ssp.jm file set. It will not be automatically manipulated by the PSSP installation and configuration procedures.

If you have been using the Resource Manager for interactive parallel jobs and have not been using LoadLeveler, you will now need to use LoadLeveler or some comparable tool.

If you have been using the Resource Manager to manipulate the switch table, you now need to use the Job Switch Resource Table services.

HACWS migration strategy

The High Availability Control Workstation (HACWS) optional component of PSSP only runs on the SP control workstations. An HACWS configuration at the PSSP 3.4 level requires the following software on both control workstations:

Whether or not you need to upgrade all three of these at the same time depends on your software levels before migration. You can choose to upgrade your HACWS configuration a little at a time, stopping along the way to run your system long enough to become confident that it is stable before proceeding to the next phase.

For more information see the book PSSP: Installation and Migration Guide.

DCE and HACWS restriction:

If you plan to have DCE authentication enabled, you cannot use HACWS. If you already use HACWS, do not enable DCE authentication.

Enhanced security options restrictions:

Consider the following:

  • HACWS has no problems authenticating both control workstations on the nodes. However, you have to copy the authorization files, /.rhosts or /.klogin, to the backup control workstation.
  • It is important that you activate restricted root access from the active primary control workstation, since it is the Kerberos V4 Master.
  • If you use HACWS, you might also use HACMP, which has more restrictions.

See Considering restricted root access, Considering a secure remote command process, and Considering choosing none for AIX remote command authorization for option descriptions and a complete list of limitations.

AIX and PSSP migration options

There are three main ways to migrate your system, each with its own advantages:

  1. Migration install - preserves base configuration
  2. Overwrite install - provides a clean start
  3. Migration then re-install - migrate one node, then use its image to re-install the remaining nodes

After performing any needed system preparation steps, the next step in migrating your SP system is to migrate the control workstation to the appropriate level of AIX and PSSP. That is, the control workstation must be migrated to PSSP 3.4 and AIX 4.3.3 or AIX 5L 5.1 before migrating any of the nodes. The control workstation must be at the highest level of PSSP and AIX to be used on any of the nodes.

For example, to migrate to PSSP 3.4 and AIX 5L 5.1 from PSSP 2.4 and AIX 4.3.3, first migrate only PSSP. Then validate your SP system (still at the AIX 4.3.3 level). Then migrate to AIX 5L 5.1 after you are satisfied and scheduling permits.

For systems running unsupported levels of PSSP, or if an overwrite install is desired, the control workstation migration must include both AIX and PSSP upgrades before the SP system can be returned to production.

Note:
Both upgrades must be done in the same service window.

After the control workstation has been migrated to AIX 4.3.3 and PSSP 3.4, and the system has been validated, the nodes can be migrated. Start the node migration with boot-install servers if applicable. The same basic migration options exist for migrating the nodes, such as:

Also, you can optionally migrate one node, then use the mksysb from that node to install the remaining nodes to be migrated.
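The migrate-then-clone option can be outlined as a command sequence. The sketch below only echoes the commands: spbootins and nodecond are the usual PSSP node-install commands, but every flag, node number, and step shown is an illustrative assumption, so check the procedure for your PSSP level before running anything.

```shell
# Sketch only (commands are echoed, not executed): migrate one node,
# capture its mksysb image, then re-install the remaining nodes from it.
# Node numbers and flags are illustrative placeholders; spbootins and
# nodecond usage is an assumption -- verify against your PSSP level.
clone_plan() {
    echo "spbootins -r migrate -l 1      # set node 1 to migrate"
    echo "nodecond 1 1                   # network boot frame 1, slot 1"
    echo "# ... create a mksysb image of the migrated node ..."
    echo "spbootins -r install -l 2,3,4  # re-install remaining nodes from the image"
    echo "nodecond 1 2                   # repeat for each remaining node"
}

clone_plan
```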

