IBM Books

Command and Technical Reference, Volume 1

haemunlkrm

Purpose

haemunlkrm - Unlocks and starts a Resource Monitor.

Syntax

haemunlkrm -s subsys_name -a resmon_name

Flags

-s subsys_name
Specifies the name of the Event Management subsystem. On a node of a system partition, this is haem. On the control workstation, this is haem.syspar_name, where syspar_name is the name of the system partition for which you want to specify the subsystem. This argument must be specified.

-a resmon_name
Specifies the name of the Resource Monitor to unlock and start.

Operands

None.

Description

If the Event Management daemon cannot successfully start a resource monitor after three attempts within a two-hour interval, or if the daemon has successfully connected to the instances of a resource monitor N times within a two-hour interval, the resource monitor is "locked" and no further attempts are made to start it or to connect to any of its instances. N is three times the maximum number of instances a resource monitor can have, as specified by the rmNum_instances attribute in the EM_Resource_Monitor SDR class. Once the cause of the problem is determined and the problem corrected, the haemunlkrm command can be used to unlock the Resource Monitor and attempt to start it or connect to the resource monitor instances.
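
As a rough illustration of the connection threshold described above, the following sketch computes N from a given rmNum_instances value. The function name lock_threshold is hypothetical, and the value is passed in by hand; the sketch does not read the SDR.

```shell
# Sketch only: compute N, the number of successful connections within the
# interval after which a resource monitor is locked.
# $1 stands in for the rmNum_instances attribute value (supplied by the caller).
lock_threshold() {
    rm_num_instances=$1
    echo $(( 3 * rm_num_instances ))
}

lock_threshold 4   # prints 12
```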

The status of the Event Management daemon, as displayed by the lssrc command, indicates if a Resource Monitor is locked.

Security

You must have root privilege to run this command.

Implementation Specifics

This command is part of RS/6000 Cluster Technology (RSCT), which is included with the IBM Parallel System Support Programs (PSSP) Licensed Program (LP).

Prerequisite Information

"The Event Management subsystem" chapter of PSSP: Administration Guide

Location

/usr/sbin/rsct/bin/haemunlkrm

Examples

If the output of the lssrc command indicates that the hardware Resource Monitor IBM.PSSP.hmrmd is locked, then after correcting the condition that prevented the Resource Monitor from being started, enter:

haemunlkrm -s haem -a IBM.PSSP.hmrmd
Note:
This example applies to unlocking a Resource Monitor on a node.
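
On the control workstation the subsystem name carries the system partition name, so the same unlock takes a different -s value. The following sketch only prints the command line for either case and does not run it. The helper name unlock_cmd and the partition name sp_prod are illustrative assumptions.

```shell
# Sketch only: print the haemunlkrm command line for a node or for the
# control workstation.
# $1 = resource monitor name
# $2 = system partition name (optional; empty means "running on a node")
unlock_cmd() {
    resmon=$1
    syspar=${2:-}
    if [ -n "$syspar" ]; then
        subsys="haem.$syspar"    # control workstation form
    else
        subsys="haem"            # node form
    fi
    echo "haemunlkrm -s $subsys -a $resmon"
}

unlock_cmd IBM.PSSP.hmrmd           # node
unlock_cmd IBM.PSSP.hmrmd sp_prod   # control workstation, hypothetical partition
```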

hagsctrl

Purpose

hagsctrl - A control script that starts the Group Services subsystems.

Syntax

hagsctrl {-a | -s | -k | -d | -c | -u | -t | -o | -r | -h}

Flags

-a
Adds the subsystems.

-s
Starts the subsystems.

-k
Stops the subsystems.

-d
Deletes the subsystems.

-c
Cleans the subsystems, that is, deletes them from all system partitions.

-u
Unconfigures the subsystems from all system partitions.

-t
Turns tracing on for the subsystems.

-o
Turns tracing off for the subsystems.

-r
Refreshes the subsystem.

-h
Displays usage information.

Operands

None.

Description

Group Services provides distributed coordination and synchronization services for other distributed subsystems running on a set of nodes on the IBM RS/6000 SP. The hagsctrl control script controls the operation of the subsystems that are required for Group Services. These subsystems are under the control of the System Resource Controller (SRC) and belong to a subsystem group called hags. Associated with each subsystem is a daemon.

An instance of the Group Services subsystem executes on the control workstation and on every node of a system partition. Because Group Services provides its services within the scope of a system partition, its subsystems are said to be system partition-sensitive. This control script operates in a manner similar to the control scripts of other system partition-sensitive subsystems. It can be issued from either the control workstation or any of the system partition's nodes.

From an operational point of view, the Group Services subsystem group is organized as follows:

Subsystem
Group Services

Subsystem Group
hags

SRC Subsystems
hags and hagsglsm

The hags subsystem is associated with the hagsd daemon. The hagsglsm subsystem is associated with the hagsglsmd daemon.

The subsystem names on the nodes are hags and hagsglsm. There is one of each subsystem per node and it is associated with the system partition to which the node belongs.

On the control workstation, there are multiple instances of each subsystem, one for each system partition. Accordingly, the subsystem names on the control workstation have the system partition name appended to them. For example, for system partitions named sp_prod and sp_test, the subsystems on the control workstation are named hags.sp_prod, hags.sp_test, hagsglsm.sp_prod, and hagsglsm.sp_test.
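
The naming rule can be sketched as a small helper that lists the expected SRC subsystem names. The function name gs_subsys_names is hypothetical. With no arguments it prints the node names, and with partition names it prints the control workstation names.

```shell
# Sketch only: list the Group Services SRC subsystem names.
# No arguments: names as used on a node.
# One or more partition names: names as used on the control workstation.
gs_subsys_names() {
    if [ "$#" -eq 0 ]; then
        echo hags
        echo hagsglsm
        return 0
    fi
    for syspar in "$@"; do
        echo "hags.$syspar"
        echo "hagsglsm.$syspar"
    done
}

gs_subsys_names sp_prod sp_test
```

Each printed name could then be passed to lssrc -s to check the status of that subsystem.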

Daemons
hagsd and hagsglsmd

The hagsd daemon provides the majority of the Group Services functions.

The hagsglsmd daemon provides global synchronization services for the switch adapter membership group.

The hagsctrl script is not normally executed from the command line. It is normally called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.

The hagsctrl script provides the following controls for operating the Group Services subsystems: adding, starting, stopping, deleting, cleaning up, unconfiguring, and turning tracing on and off.

Before performing any of these functions, the script obtains the current system partition name (using the spget_syspar command) and the node number (using the node_number command). If the node number is zero, the control script is running on the control workstation.
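
The node number test can be pictured with the following sketch. It takes the node number as an argument rather than calling node_number, so it runs anywhere; the function name where_am_i is hypothetical.

```shell
# Sketch only: interpret a node number the way the control script does.
# $1 is assumed to hold the value that the node_number command would print.
where_am_i() {
    if [ "$1" -eq 0 ]; then
        echo "control workstation"
    else
        echo "node $1"
    fi
}

where_am_i 0
where_am_i 17
```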

Except for the clean and unconfigure functions, all functions are performed within the scope of the current system partition.

Adding the Subsystem

When the -a flag is specified, the control script uses the mkssys command to add the Group Services subsystems to the SRC. The control script operates as follows:

  1. It makes sure that both the hags and hagsglsm subsystems are stopped.
  2. It gets the port number for the hags subsystem for this system partition from the Syspar_ports class of the System Data Repository (SDR) and ensures that the port number is set in the /etc/services file. If there is no port number in the SDR and this script is running on the control workstation, the script obtains a port number. If the script is running on a node and there is no port number in the SDR, the script ends with an error. The range of valid port numbers is 10000 to 10100, inclusive.

    The service name that is entered in the /etc/services file is hags.syspar_name.

  3. It removes the hags and hagsglsm subsystems from the SRC (just in case they are still there).
  4. It adds the hags and hagsglsm subsystems to the SRC. The system partition name is configured as a daemon parameter on the mkssys command.
  5. It adds an entry for the hags group to the /etc/inittab file. The entry ensures that the group is started during boot. However, if hagsctrl is running on a High Availability Control Workstation (HACWS), no entry is made in the /etc/inittab file. Instead, HACWS manages starting and stopping the group.
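
The port range and service name rules from step 2 can be sketched as two small checks. Both function names are hypothetical, and the sketch neither reads the SDR nor edits /etc/services.

```shell
# Sketch only: validate a candidate hags port and build the service name.
# The valid range is 10000 to 10100, inclusive.
valid_hags_port() {
    port=$1
    case $port in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$port" -ge 10000 ] && [ "$port" -le 10100 ]
}

# The service name entered in /etc/services is hags.syspar_name.
hags_service_name() {
    echo "hags.$1"
}

if valid_hags_port 10050; then
    echo "$(hags_service_name sp_prod) may use port 10050"
fi
```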

Starting the Subsystem

When the -s flag is specified, the control script uses the startsrc command to start the Group Services subsystems, hags and hagsglsm.

Stopping the Subsystem

When the -k flag is specified, the control script uses the stopsrc command to stop the Group Services subsystems, hags and hagsglsm.

Deleting the Subsystem

When the -d flag is specified, the control script uses the rmssys command to remove the Group Services subsystems from the SRC. The control script operates as follows:

  1. It makes sure that both the hags and hagsglsm subsystems are stopped.
  2. It removes the hags and hagsglsm subsystems from the SRC using the rmssys command.
  3. It removes the port number from the /etc/services file.
  4. If there are no other subsystems remaining in the hags group, it removes the entry for the hags group from the /etc/inittab file.

Cleaning Up the Subsystems

When the -c flag is specified, the control script stops and removes the Group Services subsystems for all system partitions from the SRC. The control script operates as follows:

  1. It stops all instances of subsystems in the subsystem group in all partitions, using the stopsrc -g hags command.
  2. It removes the entry for the hags group from the /etc/inittab file.
  3. It removes all instances of subsystems in the subsystem group in all partitions from the SRC using the rmssys command.

Unconfiguring the Subsystems

When the -u flag is specified, the control script performs the function of the -c flag in all system partitions and then removes all port numbers from the SDR allocated by the Group Services subsystems.

Note:
The -u flag is effective only on the control workstation.

Prior to executing the hagsctrl command with the -u flag on the control workstation, the hagsctrl command with the -c flag must be executed from all of the nodes. If this subsystem is not successfully cleaned from all of the nodes, different port numbers may be used by this subsystem, leading to undefined behavior.
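
The required ordering can be written down as a plan before anything is run. The sketch below only prints the steps; the function name unconfigure_plan and the node names are hypothetical, and how the command reaches each node is left to the administrator.

```shell
# Sketch only: print the order of operations for unconfiguring Group Services.
# Arguments are the names of all nodes; nothing is executed.
unconfigure_plan() {
    for node in "$@"; do
        echo "on $node: hagsctrl -c"
    done
    # The unconfigure step comes last and only on the control workstation.
    echo "on control workstation: hagsctrl -u"
}

unconfigure_plan node01 node02
```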

Turning Tracing On

When the -t flag is specified, the control script turns tracing on for the hagsd daemon, using the traceson command. Tracing is not available for the hagsglsmd daemon.

Turning Tracing Off

When the -o flag is specified, the control script turns tracing off (returns it to its default level) for the hagsd daemon, using the tracesoff command. Tracing is not available for the hagsglsmd daemon.

Refreshing the Subsystem

The -r flag has no effect for this subsystem.

Logging

While they are running, the Group Services daemons provide information about their operation and errors by writing entries in a log file in the /var/ha/log directory.

Each daemon limits the log size to a pre-established number of lines (by default, 5,000 lines). When the limit is reached, the daemon appends the string .bak to the name of the current log file and begins a new log. If a .bak version already exists, it is removed before the current log is renamed.
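
The rotation behavior described above can be imitated with a few shell commands. This is a sketch of the described behavior, not the daemon's own code; the function name rotate_if_full is hypothetical, and starting the new log as an empty file is an assumption.

```shell
# Sketch only: imitate the described log rotation.
# $1 = log file path, $2 = line limit (defaults to 5000)
rotate_if_full() {
    log=$1
    limit=${2:-5000}
    [ -f "$log" ] || return 0
    lines=$(( $(wc -l < "$log") ))
    if [ "$lines" -ge "$limit" ]; then
        rm -f "$log.bak"        # an existing .bak version is removed first
        mv "$log" "$log.bak"    # the current log becomes the .bak version
        : > "$log"              # assumption: the new log starts out empty
    fi
}

# Demonstration on a scratch file with a tiny limit.
demo_dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$demo_dir/demo.log"
rotate_if_full "$demo_dir/demo.log" 3
ls "$demo_dir"
rm -rf "$demo_dir"
```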

Files

/var/ha/log/hags_nodenum_instnum.syspar_name
Contains the log of the hagsd daemons on the nodes.

/var/ha/log/hags.syspar_name_nodenum_instnum.syspar_name
Contains the log of each hagsd daemon on the control workstation.

/var/ha/log/hagsglsm_nodenum_instnum.syspar_name
Contains the log of the hagsglsmd daemons on the nodes.

/var/ha/log/hagsglsm.syspar_name_nodenum_instnum.syspar_name
Contains the log of each hagsglsmd daemon on the control workstation.

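The file names include the following variables: nodenum (the node number), instnum (the instance number of the daemon), and syspar_name (the name of the system partition).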
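
To locate a log, the four name patterns listed above can be assembled from their parts. The helper name gs_log_name is hypothetical, and it assumes that node number zero identifies the control workstation.

```shell
# Sketch only: build a Group Services log file name from its parts.
# $1 = hags or hagsglsm, $2 = node number, $3 = instance number,
# $4 = system partition name. Node number 0 is taken to mean the
# control workstation.
gs_log_name() {
    daemon=$1 nodenum=$2 instnum=$3 syspar=$4
    if [ "$nodenum" -eq 0 ]; then
        echo "/var/ha/log/$daemon.${syspar}_${nodenum}_$instnum.$syspar"
    else
        echo "/var/ha/log/${daemon}_${nodenum}_$instnum.$syspar"
    fi
}

gs_log_name hags 5 1 sp_prod
gs_log_name hagsglsm 0 2 sp_test
```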

Standard Error

This command writes error messages (as necessary) to standard error.

Exit Values

0
Indicates the successful completion of the command.

1
Indicates that an error occurred.

Security

You must have root privilege to run this command.

Implementation Specifics

This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program (LP).

Prerequisite Information

"The Group Services subsystem" chapter of PSSP: Administration Guide

PSSP: Group Services Programming Guide and Reference

AIX Commands Reference

Information about the System Resource Controller (SRC) in AIX General Programming Concepts: Writing and Debugging Programs

Location

/usr/sbin/rsct/bin/hagsctrl

Related Information

Commands: hagsd, hagsglsmd, lssrc, startsrc, stopsrc, syspar_ctrl

Examples

  1. To add the Group Services subsystems to the SRC in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
    hagsctrl -a
    
  2. To start the Group Services subsystems in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
    hagsctrl -s
    
  3. To stop the Group Services subsystems in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
    hagsctrl -k
    
  4. To delete the Group Services subsystems from the SRC in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
    hagsctrl -d
    
  5. To clean up the Group Services subsystems on all system partitions, enter:
    hagsctrl -c
    
  6. To unconfigure the Group Services subsystem from all system partitions, on the control workstation, enter:
    hagsctrl -u
    
  7. To turn tracing on for the Group Services daemon in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
    hagsctrl -t
    
  8. To turn tracing off for the Group Services daemon in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
    hagsctrl -o
    
  9. To display the status of all of the subsystems in the Group Services SRC group, enter:
    lssrc -g hags
    
  10. To display the status of an individual Group Services subsystem, enter:
    lssrc -s subsystem_name
    
  11. To display detailed status about an individual Group Services subsystem, enter:
    lssrc -l -s subsystem_name
    

    In response, the system returns information that includes the running status of the subsystem, the number and identity of connected GS clients, information about the Group Services domain, and the number of providers and subscribers in established groups.

  12. To display the status of all of the daemons under SRC control, enter:
    lssrc -a
    

hagsd daemon

Purpose

hagsd - A Group Services daemon that provides a general purpose facility for coordinating and monitoring changes to the state of an application that is running on a set of nodes.

Syntax

hagsd daemon_name

Flags

None.

Operands

daemon_name
Specifies the name used by the daemon to name log files and identify its messages in the error log.

Description

The hagsd daemon is part of the Group Services subsystem, which provides a general purpose facility for coordinating and monitoring changes to the state of an application that is running on a set of nodes. This daemon provides most of the services of the subsystem.

One instance of the hagsd daemon executes on the control workstation for each system partition. An instance of the hagsd daemon also executes on every node of a system partition. The hagsd daemon is under System Resource Controller (SRC) control.

Because the daemon is under SRC control, it is better not to start it directly from the command line. It is normally called by the hagsctrl command, which is in turn called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system. If you must start or stop the daemon directly, use the startsrc or stopsrc command.

For more information about the Group Services daemons, see the hagsctrl man page.

Implementation Specifics

This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program (LP).

Prerequisite Information

"The Group Services subsystem" chapter of PSSP: Administration Guide

PSSP: Group Services Programming Guide and Reference

AIX Commands Reference

Information about the System Resource Controller (SRC) in AIX General Programming Concepts: Writing and Debugging Programs

Location

/usr/sbin/rsct/bin/hagsd

Related Information

Commands: hagsctrl, hagsglsmd

Examples

See the hagsctrl command.

hagsglsmd daemon

Purpose

hagsglsmd - A Group Services daemon that provides global synchronization services for the switch adapter membership group.

Syntax

hagsglsmd daemon_name

Flags

None.

Operands

daemon_name
Specifies the name used by the daemon to name log files and identify its messages in the error log.

Description

The hagsglsmd daemon is part of the Group Services subsystem, which provides a general purpose facility for coordinating and monitoring changes to the state of an application that is running on a set of nodes.

One instance of the hagsglsmd daemon executes on the control workstation for each system partition. An instance of the hagsglsmd daemon also executes on every node of a system partition. The hagsglsmd daemon is under System Resource Controller (SRC) control.

Because the daemon is under SRC control, it is better not to start it directly from the command line. It is normally called by the hagsctrl command, which is in turn called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system. If you must start or stop the daemon directly, use the startsrc or stopsrc command.

For more information about the Group Services daemons, see the hagsctrl man page.

Implementation Specifics

This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program (LP).

Prerequisite Information

"The Group Services subsystem" chapter of PSSP: Administration Guide

PSSP: Group Services Programming Guide and Reference

AIX Commands Reference

Information about the System Resource Controller (SRC) in AIX General Programming Concepts: Writing and Debugging Programs

Location

/usr/sbin/rsct/bin/hagsglsmd

Related Information

Commands: hagsctrl, hagsd

Examples

See the hagsctrl command.

hagsns

Purpose

hagsns - Gets Group Services nameserver information.

Syntax

hagsns [-h host] [-c] -g group_name

hagsns [-h host] [-c] -s subsystem_name

hagsns [-h host] [-c] -p subsystem_pid

Flags

-c
Forces the output as "English_only." If the -c flag is not specified, the daemon's locale will be used for the output.

-g group_name
Specifies a group of subsystems to get status for. The command is unsuccessful if the group_name variable is not contained in the subsystem object class.

-h host
Specifies the host to obtain nameserver status for.

-p subsystem_pid
Specifies a particular instance of the subsystem_pid to obtain nameserver status for.

-s subsystem_name
Specifies a subsystem to get status for. The subsystem_name variable can be the actual subsystem name or the synonym name for the subsystem. The command is unsuccessful if the subsystem_name variable is not contained in the subsystem object class.

Operands

None.

Description

Use the hagsns command to query the status of the Group Services nameserver.

Standard Error

This command writes error messages (as necessary) to standard error.

Exit Values

0
Indicates the successful completion of the command.

nonzero
Indicates that an error occurred.

Security

You must have root privilege to run this command.

Location

/usr/sbin/rsct/bin/hagsns

Related Information

PSSP commands: hagsd, hagsvote, nlssrc

AIX commands: lssrc

Refer to the "System Resource Controller Overview" in AIX System User's Guide: Operating System and Devices for an explanation of subsystems, subservers, and the System Resource Controller.

Refer to PSSP: Diagnosis Guide for diagnosis information.

Examples

To get domain information from the HAGS subsystem, enter:

hagsns -c -s hags

or

hagsns -s hags

You should receive output similar to the following:

HA GS NameServer Status
NodeID=1.16, pid=14460, domainID=6.14, NS established,
CodeLevel=PSSP-3.2(DRL=5)
NS state=kCertain, protocolInProgress=kNoProtocol,
  outstandingBroadcast=KNoBcast
Process started on Jun 19 18:34:20, (10d 20:19:22) ago.
  HB connection took (19:14:9).
Initial NS certainty on Jun 20 13:48:45, (10d 1:4:57) ago,
  taking (0:0:15).
Our current epoch of Jun 23 13:05:19 started on (7d 1:48:23), ago.
Number of UP nodes: 12
List of UP nodes: 0 1 5 6 7 8 9 11 17 19 23 26

In the preceding example, domainID=6.14 means that node 6 is the nameserver (NS) node. Note that the domainID consists of a node number and an incarnation number. The incarnation number is an integer, incremented whenever the GS daemon is started. NS established means that the nameserver was established.
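
When the output is captured by a script, the nameserver node can be pulled out of the domainID field. The following sketch assumes the output has the form shown in the preceding example; the function name ns_node_from_status is hypothetical.

```shell
# Sketch only: read hagsns-style output on standard input and print the
# node number part of the first domainID=<node>.<incarnation> field.
ns_node_from_status() {
    sed -n 's/.*domainID=\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Sample line copied from the example output.
echo 'NodeID=1.16, pid=14460, domainID=6.14, NS established,' | ns_node_from_status
```

On a live system the input would come from a pipeline such as hagsns -c -s hags, where the -c flag keeps the output in English regardless of locale.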

hagsvote

Purpose

hagsvote - Gets vote information for Group Services groups.

Syntax

hagsvote [-h host] [-l] [-a argument] [-c] -g group_name

hagsvote [-h host] [-l] [-a argument] [-c] -s subsystem_name

hagsvote [-h host] [-l] [-a argument] [-c] -p subsystem_pid

Flags

-a argument
Specifies a group name. Note that this group name is different from that of the -g flag. In this case, the group was created from the client's first call to join the protocol.

-c
Requests the canonical output of the Group Services voting information. The output is displayed in English regardless of the installed language locale. If -c is not specified, the daemon's locale will be used for the output.

-g group_name
Specifies a group of subsystems to get status for. The command is unsuccessful if the group_name variable is not contained in the subsystem object class.

-h host
Specifies the host for which to obtain status.

-l
Requests detailed output in "long" form.

-p subsystem_pid
Specifies a particular instance of the subsystem_pid variable to get the vote for.

-s subsystem_name
Specifies a subsystem to vote on. The subsystem_name variable can be the actual subsystem name or the synonym name for the subsystem. The command is unsuccessful if the subsystem_name variable is not contained in the subsystem object class.

Operands

None.

Description

Use the hagsvote command to query the status of voting protocols for Group Services.

Standard Error

This command writes error messages (as necessary) to standard error.

Exit Values

0
Indicates the successful completion of the command.

nonzero
Indicates that an error occurred.

Security

You must have root privilege to run this command.

Location

/usr/sbin/rsct/bin/hagsvote

Related Information

PSSP commands: hagsd, hagsns, nlssrc

AIX commands: lssrc

Refer to the "System Resource Controller Overview" in AIX System User's Guide: Operating System and Devices for an explanation of subsystems, subservers, and the System Resource Controller.

Refer to PSSP: Diagnosis Guide for diagnosis information.

Examples

  1. The command output for the following example provides information about the status of the voting protocol for the group theSourceGroup in long form.
    hagsvote -ls hags -a theSourceGroup (locale-dependent)

    You should receive output similar to the following:

    Number of groups: 4
    Group name [theSourceGroup] GL node [26] voting data:
    GL in phase [1] of n-phase protocol of type [Join].
    Local voting data:
    Number of providers: 1
    Number of providers not yet voted: 1 (vote not submitted).
    Given vote: [No vote value] Default vote: [No vote value]
    ProviderID Voted? Failed? Conditional?
    [101/26] No No Yes
    Global voting data:
    Number providers not yet voted: 1
    Given vote: [No vote value] Default vote: [No vote value]
    Nodes that have voted: []
    Nodes that have not voted: [26]

    The first line of the output means that the total number of groups is 4. The second line provides the group name and the group leader node (in this case 26). The remaining lines give the voting data.

    The output then shows the global voting status.

  2. In the following example, the meaning of each line of output is the same as in the first example except that this node is the group leader node: 26.
    hagsvote -ls hags -a theSourceGroup -c (canonical form)

    You should receive output similar to the following:

    Number of groups: 4
    Group Name: theSourceGroup
    GL Node: 26 (I am GL)
    Current phase number of an n-phase protocol: 1
    Protocol name: [Join]
    Local voting data:
    Number of local providers: 1
    Number of local providers not yet voted: 1 (vote not submitted)
    Given vote: [No vote value] Default vote: [No vote value]
    Global voting data:
    Number of nodes in group: 1
    Number of global providers not yet voted: 1
    Given vote: [No vote value] Default vote: [No vote value]
    Nodes that have voted: []
    Nodes that have not voted: [26]
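
A script that waits on a voting protocol might extract the list of nodes that have not yet voted. The sketch below assumes the output lines look like those in the examples above; the function name nodes_not_voted is hypothetical.

```shell
# Sketch only: read hagsvote-style output on standard input and print the
# contents of the "Nodes that have not voted" list.
nodes_not_voted() {
    sed -n 's/^[[:space:]]*Nodes that have not voted: \[\(.*\)\][[:space:]]*$/\1/p'
}

# Sample lines copied from the example output.
printf '%s\n' 'Nodes that have voted: []' 'Nodes that have not voted: [26]' | nodes_not_voted
```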

hardmon daemon

Purpose

hardmon - Monitors and controls the state of the SP hardware.

Syntax

hardmon [-B] [-r poll_rate] [-d debug_flag] ...

Flags

-B
Executes the daemon in diagnostic mode.

-r poll_rate
Specifies the rate, in seconds, at which the daemon polls each frame for state information.

-d debug_flag
Specifies the daemon debug flag to be set in the daemon. Refer to the hmadm command for possible values of debug_flag. Multiple -d debug flags can be specified.

Operands

None.

Description

hardmon is the Hardware Monitor daemon. The daemon monitors and controls the state of the SP hardware contained in one or more SP frames. This command is not normally executed from the command line. Access to the Hardware Monitor is provided by the hmmon, hmcmds, spmon, s1term, and nodecond commands. Control of the Hardware Monitor daemon is provided by the hmadm command. These commands are the Hardware Monitor "client" commands.

The Hardware Monitor daemon executes on the Monitor and Control Node (MACN). The MACN is the IBM RS/6000 workstation to which the frames are connected by RS-232 lines. The MACN is one and the same as the control workstation. The daemon is managed by the System Resource Controller (SRC). When the MACN is booted, an entry in /etc/inittab invokes the startsrc command to start the daemon. The daemon is configured in the SRC to be restarted automatically if it terminates for any reason other than the stopsrc command. The SRC subsystem name for the Hardware Monitor daemon is hardmon.

hardmon obtains configuration information from the System Data Repository (SDR). The SP_ports object class specifies the port number that the daemon is to use to accept TCP/IP connections from the client commands. The port number is obtained from the object whose daemon attribute value matches hardmon and whose host_name attribute value matches the host name of the workstation on which the daemon is executing. There must be one hardmon object in SP_ports for the MACN. The Frame object class contains an object for each frame in the SP system.

The attributes of interest to the daemon are frame_number, tty, and MACN. When started, the daemon fetches all those objects in the Frame class whose MACN attribute value matches the host name of the workstation on which the daemon is executing. For each frame discovered in this manner, the daemon saves the frame number and opens the corresponding tty device. When all frames have been configured, the daemon begins to poll the frames for state information. Current state and changed state can then be obtained using the hmmon and spmon commands. The hmcmds and spmon commands can be used to control the hardware within the frames.

The daemon also reads the file /spdata/sys1/spmon/hmthresholds for values used to check boundary conditions for certain state variables. Refer to the /spdata/sys1/spmon/hmthresholds man page for more information. Finally, the /spdata/sys1/spmon/hmacls file is read for Access Control List (ACL) information. Refer to the hmadm command and the /spdata/sys1/spmon/hmacls file for more information on ACLs.

All errors detected by the Hardware Monitor daemon are written to the AIX error log.

The flags in the SRC subsystem object for the hardmon subsystem should not normally be changed. For example, if the poll rate is more than 5 seconds, the nodecond command can have unpredictable results. Upon request from IBM support for more information to aid in problem determination, debug flags can be set using the hmadm command.
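
Before changing the daemon arguments, a candidate argument string can be checked against the 5-second guideline mentioned above. Both function names below are hypothetical. The sketch only handles the form where -r and its value are separate words, and it treats a missing -r as acceptable, which is an assumption.

```shell
# Sketch only: find the -r value in a hardmon argument string.
# $1 = argument string, for example "-r 5 -d 1"
poll_rate_from_args() {
    prev=
    for word in $1; do
        if [ "$prev" = "-r" ]; then
            echo "$word"
            return 0
        fi
        prev=$word
    done
    return 1
}

# Succeeds when the poll rate is 5 seconds or less, or when no -r is given.
poll_rate_ok() {
    rate=$(poll_rate_from_args "$1") || return 0
    [ "$rate" -le 5 ]
}

if poll_rate_ok "-r 10"; then
    echo "poll rate acceptable"
else
    echo "poll rate above 5 seconds: nodecond may behave unpredictably"
fi
```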

If the High Availability Control Workstation (HACWS) Frame Supervisor (type 20) or the SEPBU HACWS Frame Supervisor (type 22) is installed in the SP frames, the -B flag is used to run the Hardware Monitor daemon in diagnostic mode. This diagnostic mode is used to validate that the frame ID written into the Supervisor matches the frame ID configured in the SDR for that frame. Normally, the frame ID is automatically written into the Supervisor during system installation. The frame ID is written into the frame to detect cabling problems in an HACWS configuration. In a non-HACWS SP configuration, the -B flag is useful whenever the RS-232 cables between the frames and MACN are changed (but only if one or more frames contain a type 20 or type 22 supervisor). The hardmon command can be executed directly from the command line with the -B flag, but only after the currently running daemon is stopped using the stopsrc command. Diagnostic messages are written to the AIX error log. The daemon exits when all frames are validated.

Frame ID validation is also performed every time the daemon is started by the System Resource Controller. Any frame that has a frame ID mismatch can be monitored, but any control commands to the frame are ignored until the condition is corrected. A frame with a mismatch is noted in the System Monitor Graphical User Interface as well as in the AIX error log. The hmcmds command can be used to set the currently configured frame ID into a type 20 or type 22 supervisor after it is verified that the frame is correctly connected to the MACN.

Additional Configuration Information: The Hardware Monitor subsystem also obtains information from the system partition and the Syspar_map object classes in the SDR. While this information is not used by the hardmon daemon itself, it is used by the hardmon client commands listed under Related Information. Each of these commands executes in the environment of one system partition. If the SP system is not partitioned, these commands execute in the environment of the entire system. In any case, the Syspar_map object class is used to determine which nodes are contained in the current environment. The attributes of interest are syspar_name and node_number.

Starting and Stopping the hardmon Daemon

The hardmon daemon is under System Resource Controller (SRC) control. It uses the signal method of communication in SRC. The hardmon daemon is a single subsystem and not associated with any SRC group. The subsystem name is hardmon. To start the hardmon daemon, use the startsrc -s hardmon command. This starts the daemon with the default arguments and SRC options. The hardmon daemon is set up to be respawnable and to be the only instance of the hardmon daemon running on a control workstation. Do not start the hardmon daemon from the command line without using the startsrc command to start it.

To stop the hardmon daemon, use the stopsrc -s hardmon command. This stops the daemon and does not allow it to respawn.

To display the status of the hardmon daemon, use the lssrc -s hardmon command.

If the default startup arguments need to be changed, use the chssys command to change the startup arguments or the SRC options. Refer to AIX Commands Reference and AIX General Programming Concepts: Writing and Debugging Programs for more information about daemons under SRC control and how to modify daemon arguments when under SRC.

To view the current SRC options and daemon arguments, use the odmget -q 'subsysname=hardmon' SRCsubsys command.

Files

/spdata/sys1/spmon/hmthresholds
Contains boundary values.

/spdata/sys1/spmon/hmacls
Contains Access Control Lists.

/spdata/sys1/spmon/hmdceacls
DCE ACL database.

Location

/usr/lpp/ssp/bin/hardmon

Related Information

Commands: hmadm, hmcmds, hmmon, nodecond, spmon, s1term

Files: /spdata/sys1/spmon/hmacls

Examples

  1. To start the hardmon daemon, enter:
    startsrc -s hardmon
    
  2. To stop the hardmon daemon, enter:
    stopsrc -s hardmon
    
  3. To display the status of the hardmon daemon, enter:
    lssrc -s hardmon
    
  4. To display the status of all the daemons under SRC control, enter:
    lssrc -a
    
  5. To display the current SRC options and daemon arguments for the hardmon daemon, enter:
    odmget -q 'subsysname=hardmon' SRCsubsys
    

hats

Purpose

hats - Starts or restarts Topology Services on a node or on the control workstation.

Syntax

hats

Flags

None.

Operands

None.

Description

Use this command to start the operation of Topology Services for a system partition (the hatsd daemon) on the control workstation or on a node within a system partition.

The hats script is not normally executed from the command line. It is normally called by the hatsctrl command, which is in turn called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.

The Topology Services subsystem provides internal services to PSSP components.

Note that the hats script issues the no -o nonlocsrcroute=1 command, which enables IP source routing. Do not change this setting, because the Topology Services subsystem requires this setting to work properly. If you change the setting, the Topology Services subsystem and a number of other subsystems that depend on it will no longer operate properly.

The hatsd daemon is initially started on the control workstation with the System Resource Controller (SRC), regardless of the level of the system partition. It is respawned automatically if the hatsd daemon encounters errors. The SP_NAME environment variable causes selection of the correct topology configuration.

Security

You must have root privilege to run this command.

Implementation Specifics

This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program (LP).

Prerequisite Information

The "Starting up and shutting down the SP system" chapter and "The System Data Repository" appendix in PSSP: Administration Guide

AIX Commands Reference

Information about the System Resource Controller (SRC) in AIX General Programming Concepts: Writing and Debugging Programs

Location

/usr/sbin/rsct/bin/hats

Related Information

Commands: hatsctrl, lssrc, startsrc, stopsrc, syspar_ctrl

Examples

See the hatsctrl command.

hatsctrl

Purpose

hatsctrl - A control script that starts the Topology Services subsystem.

Syntax

hatsctrl {-a | -s | -k | -d | -c | -u | -t | -o | -r | -h}

Flags

-a
Adds the subsystem.

-s
Starts the subsystem.

-k
Stops the subsystem.

-d
Deletes the subsystem.

-c
Cleans the subsystems, that is, deletes them from all system partitions.

-u
Unconfigures the subsystems from all system partitions.

-t
Turns tracing on for the subsystem.

-o
Turns tracing off for the subsystem.

-r
Refreshes the subsystem.

-h
Displays usage information.

Operands

None.

Description

Topology Services is a distributed subsystem of PSSP that provides information to other PSSP subsystems about the state of the nodes and adapters on the IBM RS/6000 SP.

The hatsctrl control script controls the operation of the Topology Services subsystem. The subsystem is under the control of the System Resource Controller (SRC) and belongs to a subsystem group called hats. Associated with each subsystem is a daemon and a script that configures and starts the daemon.

An instance of the Topology Services subsystem executes on the control workstation and on every node of a system partition. Because Topology Services provides its services within the scope of a system partition, its subsystem is said to be system partition-sensitive. This control script operates in a manner similar to the control scripts of other system partition-sensitive subsystems. It can be issued from either the control workstation or any of the system partition's nodes.

From an operational point of view, the Topology Services subsystem group is organized as follows:

Subsystem
Topology Services

Subsystem Group
hats

SRC Subsystem
hats

The hats subsystem is associated with the hatsd daemon and the hats script. The hats script configures and starts the hatsd daemon.

The subsystem name on the nodes is hats. There is one of each subsystem per node and it is associated with the system partition to which the node belongs.

On the control workstation, there are multiple instances of each subsystem, one for each system partition. Accordingly, the subsystem names on the control workstation have the system partition name appended to them. For example, for system partitions named sp_prod and sp_test, the subsystems on the control workstation are named hats.sp_prod and hats.sp_test.

Daemons
hatsd

The hatsd daemon provides the Topology Services. The hats script configures and starts the hatsd daemon.
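The naming rule described above can be sketched as a small shell function. This is only an illustration, not part of hatsctrl: the function name hats_subsys_name is hypothetical, and it relies on the convention stated in this section that node number zero denotes the control workstation.

```shell
# Sketch (hypothetical helper): derive the SRC subsystem name for
# Topology Services from a node number and a system partition name.
# Node number 0 is taken to mean the control workstation.
hats_subsys_name() {
    node_number=$1
    syspar_name=$2
    if [ "$node_number" -eq 0 ]; then
        # Control workstation: one subsystem per system partition.
        printf 'hats.%s\n' "$syspar_name"
    else
        # Node: a single subsystem named hats.
        printf 'hats\n'
    fi
}

hats_subsys_name 0 sp_prod   # prints hats.sp_prod
hats_subsys_name 5 sp_prod   # prints hats
```

The resulting name is the value that would be passed to commands such as lssrc -s.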

The hatsctrl script is not normally executed from the command line. It is normally called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.

The hatsctrl script provides a variety of controls for operating the Topology Services subsystem: adding, starting, stopping, and deleting the subsystem; cleaning up and unconfiguring the subsystems; turning tracing on and off; and refreshing the subsystem.

Before performing any of these functions, the script obtains the current system partition name and IP address (using the spget_syspar command) and the node number (using the node_number command). If the node number is zero, the control script is running on the control workstation.

Except for the clean and unconfigure functions, all functions are performed within the scope of the current system partition.

Adding the Subsystem

When the -a flag is specified, the control script uses the mkssys command to add the Topology Services subsystem to the SRC. The control script operates as follows:

  1. It makes sure that the hats subsystem is stopped.
  2. It gets the port number for the hats subsystem for this system partition from the Syspar_ports class of the System Data Repository (SDR) and ensures that the port number is set in the /etc/services file. If there is no port number in the SDR and this script is running on the control workstation, the script obtains a port number. If the script is running on a node and there is no port number in the SDR, the script ends with an error. The range of valid port numbers is 10000 to 10100, inclusive.

    The service name that is entered in the /etc/services file is hats.syspar_name.

  3. It checks to see if the subsystem is already configured in the SDR. If not, it creates an instance of the TS_Config class for this subsystem with default values. The default values are:
  4. It removes the hats subsystem from the SRC (just in case it is still there).
  5. It adds the hats subsystem to the SRC. On the control workstation, the IP address of the system partition is specified to be supplied as an argument to the daemon by the mkssys command.
  6. It adds an entry for the hats group to the /etc/inittab file. The entry ensures that the group is started during boot. However, if hatsctrl is running on a High Availability Control Workstation (HACWS), no entry is made in the /etc/inittab file. Instead, HACWS manages starting and stopping the group.
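The port range named in step 2 can be checked with a few lines of shell. The following is a minimal sketch for illustration only; the function name hats_port_in_range is hypothetical and is not something the control script is known to provide.

```shell
# Sketch (hypothetical helper): succeed only if the argument is a
# decimal port number within the range 10000 to 10100, inclusive.
hats_port_in_range() {
    port=$1
    # Reject empty strings and anything that is not all digits.
    case $port in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$port" -ge 10000 ] && [ "$port" -le 10100 ]
}

if hats_port_in_range 10042; then
    echo "10042 is a valid hats port"
fi
```

A check like this can be useful when inspecting the hats.syspar_name entry in the /etc/services file by hand.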

Starting the Subsystem

When the -s flag is specified, the control script uses the startsrc command to start the Topology Services subsystem, hats.

Stopping the Subsystem

When the -k flag is specified, the control script uses the stopsrc command to stop the Topology Services subsystem, hats.

Deleting the Subsystem

When the -d flag is specified, the control script uses the rmssys command to remove the Topology Services subsystem from the SRC. The control script operates as follows:

  1. It makes sure that the hats subsystem is stopped.
  2. It removes the hats subsystem from the SRC using the rmssys command.
  3. It removes the port number from the /etc/services file.
  4. If there are no other subsystems remaining in the hats group, it removes the entry for the hats group from the /etc/inittab file.
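The effect of step 3 on a services-style file can be sketched as a filter. This is an assumption-laden illustration, not the control script's implementation: the function name remove_hats_service is hypothetical, the sample lines (including the udp protocol field) are invented for the demonstration, and the filter reads standard input so that it never touches the live /etc/services file.

```shell
# Sketch (hypothetical helper): copy services-style lines from standard
# input to standard output, dropping the entry whose service name is
# hats.SYSPAR_NAME.
remove_hats_service() {
    awk -v name="hats.$1" '$1 != name'
}

# Demonstration on invented sample lines.
printf '%s\n' \
    'ftp 21/tcp' \
    'hats.sp_prod 10000/udp' \
    'hats.sp_test 10001/udp' |
    remove_hats_service sp_prod
```

In this demonstration, only the ftp and hats.sp_test lines are written to standard output.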

Cleaning Up the Subsystems

When the -c flag is specified, the control script stops and removes the Topology Services subsystems for all system partitions from the SRC. The control script operates as follows:

  1. It stops all instances of subsystems in the subsystem group in all partitions, using the stopsrc -g hats command.
  2. It removes the entry for the hats group from the /etc/inittab file.
  3. It removes all instances of subsystems in the subsystem group in all partitions from the SRC using the rmssys command.
  4. It removes all entries for the hats subsystems from the /etc/services file.

Unconfiguring the Subsystems

When the -u flag is specified, the control script performs the function of the -c flag in all system partitions and then removes all port numbers from the SDR allocated by the Topology Services subsystems.

Note:
The -u flag is effective only on the control workstation.

Prior to executing the hatsctrl command with the -u flag on the control workstation, the hatsctrl command with the -c flag must be executed from all of the nodes. If this subsystem is not successfully cleaned from all of the nodes, different port numbers may be used by this subsystem, leading to undefined behavior.

Turning Tracing On

When the -t flag is specified, the control script turns tracing on for the hatsd daemon, using the traceson command.

Turning Tracing Off

When the -o flag is specified, the control script turns tracing off (returns it to its default level) for the hatsd daemon, using the tracesoff command.

Refreshing the Subsystem

When the -r flag is specified, the control script refreshes the subsystem, using the hats refresh command and the refresh command. It rebuilds the information about the node and adapter configuration in the SDR and signals the daemon to read the rebuilt information. To refresh the subsystem across all nodes, execute hatsctrl -r from the control workstation.

Logging

While it is running, the Topology Services daemon provides information about its operation and errors by writing entries in a log file. The hatsd daemon in the system partition named syspar_name uses a log file called /var/ha/log/hats.syspar_name.
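As a small illustration, the log file name can be built from the system partition name as follows. The function name hats_log_file is hypothetical, and the path pattern is simply the one given in this section.

```shell
# Sketch (hypothetical helper): build the hatsd log file name for a
# system partition, following the /var/ha/log/hats.syspar_name pattern.
hats_log_file() {
    printf '/var/ha/log/hats.%s\n' "$1"
}

hats_log_file sp_prod   # prints /var/ha/log/hats.sp_prod
```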

Files

/var/ha/log/hats.syspar_name
Contains the log of the hatsd daemon on the system partition named syspar_name.

Standard Error

This command writes error messages (as necessary) to standard error.

Exit Values

0
Indicates the successful completion of the command.

1
Indicates that an error occurred.

Security

You must have root privilege to run this command.

Implementation Specifics

This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program (LP).

Prerequisite Information

AIX Commands Reference

Information about the System Resource Controller (SRC) in AIX General Programming Concepts: Writing and Debugging Programs

Location

/usr/sbin/rsct/bin/hatsctrl

Related Information

Commands: hats, lssrc, startsrc, stopsrc, syspar_ctrl

Examples

  1. To add the Topology Services subsystem to the SRC in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
    hatsctrl -a
    
  2. To start the Topology Services subsystem in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
    hatsctrl -s
    
  3. To stop the Topology Services subsystem in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
    hatsctrl -k
    
  4. To delete the Topology Services subsystem from the SRC in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
    hatsctrl -d
    
  5. To clean up the Topology Services subsystem on all system partitions, enter:
    hatsctrl -c
    
  6. To unconfigure the Topology Services subsystem from all system partitions, on the control workstation, enter:
    hatsctrl -u
    
  7. To turn tracing on for the Topology Services daemon in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
    hatsctrl -t
    
  8. To turn tracing off for the Topology Services daemon in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
    hatsctrl -o
    
  9. To display the status of all of the subsystems in the Topology Services SRC group, enter:
    lssrc -g hats
    
  10. To display the status of an individual Topology Services subsystem, enter:
    lssrc -s subsystem_name
    
  11. To display detailed status about an individual Topology Services subsystem, enter:
    lssrc -l -s subsystem_name
    

    In response, the system returns information that includes the running status of the subsystem, the number of defined and active nodes, the required number of active nodes for a quorum, the status of the group of nodes, and the IP addresses of the source node, the group leader, and the control workstation.

  12. To display the status of all of the daemons under SRC control, enter:
    lssrc -a
    

hatsoptions

Purpose

hatsoptions - Controls Topology Services options on a node or control workstation.

Syntax

hatsoptions [-s] [-d]

Flags

-s
Instructs the Topology Services daemon to reject messages that are apparently delayed.

-d
Instructs the Topology Services daemon not to reject messages that are apparently delayed (this is the default).

Operands

None.

Description

Before this command can be executed, environment variable HB_SERVER_SOCKET must be set to the location of the UNIX-domain socket used by the Topology Services subsystem. The statement below can be used:

export HB_SERVER_SOCKET=/var/ha/soc/hats/server_socket.partition name
 

Alternatively, variable HA_SYSPAR_NAME can be set to the partition name.

The Topology Services daemon must be running in order for this command to be successful.

hatsoptions can be used to control a number of options in Topology Services. Option -s instructs the Topology Services daemon to reject messages that are apparently delayed. This can be used in very large system configurations, where messages are sometimes delayed in the network or in the sender and receiver nodes. Use this option only if the Time-Of-Day clocks are synchronized across all the nodes and the control workstation. Otherwise messages may be incorrectly discarded when the sender's Time-Of-Day clock is behind the receiver's.

Option -d instructs the Topology Services daemon not to reject messages that are apparently delayed. This is the default.
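The reason the -s option needs synchronized Time-Of-Day clocks can be illustrated with a short sketch. This is not the daemon's actual algorithm, which this section does not describe. It assumes, for illustration only, that a message carries the sender's timestamp and is rejected when its apparent age exceeds a threshold; the function name is_apparently_delayed and the 5-second threshold are hypothetical.

```shell
# Sketch (hypothetical check): a message is "apparently delayed" when
# the receiver's clock minus the sender's timestamp exceeds MAX_AGE.
# Arguments are whole seconds: SEND_TIME RECV_TIME MAX_AGE.
is_apparently_delayed() {
    send_time=$1
    recv_time=$2
    max_age=$3
    age=$((recv_time - send_time))
    [ "$age" -gt "$max_age" ]
}

# Synchronized clocks: sent at 1000, received at 1001, age 1 second.
is_apparently_delayed 1000 1001 5 || echo "accepted"

# Sender's clock 10 seconds behind: the same message is stamped 990,
# so its apparent age is 11 seconds although the transit took 1 second.
is_apparently_delayed 990 1001 5 && echo "rejected"
```

Under these assumptions, the second message is discarded only because of clock skew, which is the failure mode the description warns about.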

Environment Variables

HB_SERVER_SOCKET
This environment variable must be set before this command can be executed. It must be set to the location of the UNIX-domain socket used by Topology Services clients to connect to the Topology Services daemon, which is /var/ha/soc/hats/server_socket.partition name.

HA_SYSPAR_NAME
If HB_SERVER_SOCKET is not set, then HA_SYSPAR_NAME must be set to the partition name.
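The relationship between the two variables can be sketched as follows. This is an illustration of the documented rule, not the command's own logic: the function name hats_server_socket is hypothetical, and the sketch assumes that the socket path can be derived from HA_SYSPAR_NAME using the path pattern shown above.

```shell
# Sketch (hypothetical helper): print the socket location, preferring
# HB_SERVER_SOCKET and falling back to a path built from HA_SYSPAR_NAME.
hats_server_socket() {
    if [ -n "${HB_SERVER_SOCKET:-}" ]; then
        printf '%s\n' "$HB_SERVER_SOCKET"
    elif [ -n "${HA_SYSPAR_NAME:-}" ]; then
        printf '/var/ha/soc/hats/server_socket.%s\n' "$HA_SYSPAR_NAME"
    else
        echo "set HB_SERVER_SOCKET or HA_SYSPAR_NAME" >&2
        return 1
    fi
}

# Example use (partition1 is an illustrative partition name):
#   HA_SYSPAR_NAME=partition1
#   HB_SERVER_SOCKET=$(hats_server_socket) && export HB_SERVER_SOCKET
```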

Files

/var/ha/soc/hats/server_socket.partition name

Implementation Specifics

This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program (LP).

Exit Values

0
Indicates the successful completion of the command.

1
Indicates the command was unsuccessful.

Security

You must have root privilege to run this command.

Prerequisite Information

AIX Commands Reference

Location

/usr/sbin/rsct/bin/hatsoptions

Related Information

Commands: hatsctrl, hats, lssrc, startsrc, stopsrc, syspar_ctrl

Examples

To instruct the Topology Services daemon on the local node to start discarding apparently delayed messages, enter:

export HA_SYSPAR_NAME=partition1
 
/usr/sbin/rsct/bin/hatsoptions -s

hatstune

Purpose

hatstune - Tunes HATS (High Availability Topology Services) parameters.

Syntax

hatstune [-f frequency] [-s sensitivity] [-p priority]
         [-l log_length] [-m pin_object] [-r]

hatstune -d [-r]

hatstune -v

hatstune -h

Flags

-f frequency
Sets the HATS heart beat frequency (the interval between heart beats). The frequency is an integer number of seconds. The valid range is [1, 30], or the special keyword "default," which selects the default frequency value.

-s sensitivity
Sets the HATS heart beat sensitivity, that is, the maximum number of missing heart beats allowed before the adapter is declared down.

The sensitivity is an integer value. The valid range is [4, 40], or the special keyword "default," which selects the default sensitivity value.

-p priority
Specifies the AIX process scheduling priority at which the HATS daemon should be run. Valid priority values are the [10, 80] range, the special value 0, and the "default" keyword. If the special keyword "default" is specified, the HATS daemon runs at the HATS default fixed priority. If the special priority value 0 is specified, the HATS daemon runs at the AIX default non-fixed priority for user programs.

-l log_length
Sets the maximum user and service log file length. The log_length is the maximum number of lines in the log files. The valid range is [2000, 1000000] or a special keyword "default." The default log length value will be used if log_length is "default."

-m pin_object
Specifies whether the Topology Services daemon should be pinned in memory. Because HATS is a real time program, it should be able to run on a timely basis. Pinning some of HATS' address space in memory reduces the likelihood that the Topology Services daemon could become blocked for paging operations.

This option specifies which HATS objects should be pinned in real memory. The valid objects are data, text, proc, none, and a special keyword "default."

data
Only the data segment will be pinned in real memory.

text
Only the code segment will be pinned in real memory.

proc
Both data and text will be pinned in real memory.

none
None of the above will be pinned in real memory.

default
The default setting will be used.

The above pin_objects are mutually exclusive. The objects are not case sensitive.

-r
Refreshes HATS after the tunable parameters are set successfully. By default, hatstune sets the tunable parameters only. The new parameters do not take effect until a HATS refresh is performed.

-d
Resets all HATS tunable parameters to their default values.

-v
Displays current settings of all HATS tunable parameters.

-h
Prints a brief usage message.

Operands

None.

Description

HATS tunable parameters can be set by changing SDR attributes of certain SDR objects. This command offers an easier mechanism to change HATS tunable parameters. Besides not requiring knowledge of the SDR objects needed to tune HATS, hatstune performs consistency checking on the tunable values.

The valid and default values of HATS tunable parameters are:

Heart beat frequency: (seconds)
1 <= Frequency <= 30 (default: 1)

Heart beat sensitivity:
4 <= Sensitivity <= 40 (default: 4)

Run HATS daemon at a fixed priority:
10 <= FixPri_Value <= 80 (default: 38)

Maximum log file length: (lines)
2000 <= Log_Length <= 1000000 (default: 5000)

Objects to be pinned in real memory:
Data, Text, Proc (Data and Text), None (default: Text)
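The ranges listed above can be checked before hatstune is called. The following is a minimal sketch, not the consistency checking that hatstune itself performs; the function name hatstune_value_ok is hypothetical, and it encodes only the limits stated in this section.

```shell
# Sketch (hypothetical helper): succeed if VALUE is acceptable for the
# given hatstune FLAG according to the ranges documented above.
hatstune_value_ok() {
    flag=$1
    value=$2
    if [ "$flag" = "-m" ]; then
        # Pin objects are not case sensitive.
        case $(printf '%s' "$value" | tr '[:upper:]' '[:lower:]') in
            data|text|proc|none|default) return 0 ;;
            *) return 1 ;;
        esac
    fi
    # Only the numeric tunables remain.
    case $flag in
        -f|-s|-p|-l) ;;
        *) return 1 ;;
    esac
    [ "$value" = "default" ] && return 0
    # Reject empty strings and anything that is not all digits.
    case $value in
        ''|*[!0-9]*) return 1 ;;
    esac
    case $flag in
        -f) [ "$value" -ge 1 ] && [ "$value" -le 30 ] ;;
        -s) [ "$value" -ge 4 ] && [ "$value" -le 40 ] ;;
        -p) [ "$value" -eq 0 ] || { [ "$value" -ge 10 ] && [ "$value" -le 80 ]; } ;;
        -l) [ "$value" -ge 2000 ] && [ "$value" -le 1000000 ] ;;
    esac
}

hatstune_value_ok -f 3 && echo "frequency 3 is in range"
hatstune_value_ok -s 3 || echo "sensitivity 3 is out of range"
```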

Multiple options can be used together. The same option cannot be specified more than once.

By default, the new HATS tunable values are not in effect until a refresh operation is done by using the -r option or until a hatsctrl -r command is issued. This allows you to change the HATS tunable parameters as many times as desired and then specify the refresh option to have the changes take effect.

Value checking (-v option) can be performed on any SP node by any user. Value assignment (-f, -s, -p, -l, -m, -d options) can only be run on the control workstation by users who have root privilege.

Environment Variables

The SP_NAME variable can be used to designate the applicable partition.

Standard Output

Usage message, current HATS tunable values, new tunable values.

Standard Error

Output consists of error messages, when the command cannot complete successfully.

Exit Values

0
Indicates successful completion of the command.

nonzero
Indicates that an error occurred.

Security

The command can be run by any user on any SP node to check the current HATS tunable values. However, only users who have root privilege can change the settings.

Restrictions

Value checking (-v option) can be performed on any SP node by any user. Value assignment (-f, -s, -p, -l, -m, -d options) can only be run on the control workstation by users who have root privilege.

Implementation Specifics

This command is part of the IBM Reliable Scalable Cluster Technology (RSCT) on the RS/6000 system.

Prerequisite Information

"The Topology Services subsystem" chapter in the PSSP: Administration Guide.

Location

/usr/sbin/rsct/bin/hatstune

Related Information

Commands: hatsctrl, SDRGetObjects, SDRChangeAttrValues, splst_syspars

Examples

  1. To change the heart beat frequency to 3 seconds and the heart beat sensitivity to 10 (this command must be run on the control workstation), enter:
    hatstune -f 3 -s 10
    
  2. To change the maximum log file lengths to 10000 lines and cause the change to take effect immediately (this command must be run on the control workstation), enter:
    hatstune -l 10000 -r
    
  3. To reset all HATS tunable parameters to default values (this command must be run on the control workstation), enter:
    hatstune -d
    
  4. To display current HATS tunable settings, enter:
    hatstune -v
    
  5. To show the usage message, enter:
    hatstune -h
    

