
Topology Services components

The Topology Services subsystem consists of the following components:

Topology Services Daemon
The central component of the Topology Services subsystem.

Pluggable Network Interface Module (NIM)
The program used by the Topology Services daemon to communicate with each local adapter.

Port numbers
TCP/IP port numbers that the Topology Services subsystem uses for daemon-to-daemon communications. The Topology Services subsystem also uses UNIX domain sockets for server-to-client communication.

Control script
A shell script that is used to add, start, stop, and delete the Topology Services subsystem, which operates under the System Resource Controller (SRC) component.

Start-up script
A shell script that is used to obtain the configuration from the System Data Repository (SDR) and start the Topology Services Daemon. This script is invoked by the SRC component.

Tuning script
A script used to change the Topology Services tunable parameters at run time.

Files and directories
Various files and directories that are used by the Topology Services subsystem to maintain run-time data.

The sections that follow contain more details about each of these components.

The Topology Services daemon (hatsd)

The Topology Services daemon is contained in the executable file /usr/sbin/rsct/bin/hatsd. This daemon runs on each node of an SP system partition and on the control workstation. If there is more than one system partition, then multiple daemons run on the control workstation, one for each system partition.

Note that the operational domain of the Topology Services subsystem is an SP system partition. The control workstation is considered to be part of each domain. Unless otherwise stated, a reference to "the Topology Services subsystem" is a reference to the Topology Services subsystem in a single system partition.

When each daemon starts, it first reads its configuration from a file set up by the start-up script (hats). This file is called the machines list file, because it lists all the machines (nodes) that are part of the configuration, together with the IP addresses of each adapter on each of those nodes. From this file, the daemon knows the IP address and node number of all the potential heartbeat ring members.

The Topology Services subsystem directive is to form as large a heartbeat ring as possible. To form this ring, the daemon on one node must alert the daemons on the other nodes of its presence by sending a proclaim message. According to a hierarchy defined by the Topology Services component, a daemon can send a proclaim message only to IP addresses that are lower than its own and can accept a proclaim message only from an IP address higher than its own. Also, a daemon proclaims only if it is the leader of a ring. When a daemon first starts up, it builds a heartbeat ring for every local adapter, containing only that local adapter. This is called a singleton group, and the daemon is the leader of each of these singleton groups.
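To make the proclaim rule concrete, the following minimal sketch (Python, with example addresses; it is not the hatsd implementation) shows the address comparison that decides who may proclaim to whom, which guarantees that two rings always merge under the leader with the higher IP address:

    # Minimal sketch of the proclaim hierarchy (illustrative only, not hatsd code):
    # a Group Leader proclaims only to lower IP addresses and accepts a proclaim
    # only from a higher one, so two rings always merge under the higher leader.
    import ipaddress

    def may_proclaim_to(leader_ip, target_ip):
        """A leader proclaims only to adapters with a lower IP address."""
        return ipaddress.ip_address(target_ip) < ipaddress.ip_address(leader_ip)

    def accepts_proclaim_from(leader_ip, sender_ip):
        """A leader replies with a join request only to a higher-addressed leader."""
        return ipaddress.ip_address(sender_ip) > ipaddress.ip_address(leader_ip)

    # Example with two singleton groups: 192.168.1.9 proclaims to 192.168.1.5,
    # and 192.168.1.5 replies with a join request; the reverse never happens.
    assert may_proclaim_to("192.168.1.9", "192.168.1.5")
    assert accepts_proclaim_from("192.168.1.5", "192.168.1.9")
    assert not may_proclaim_to("192.168.1.5", "192.168.1.9")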

To manage the changes in these groups, Topology Services defines the following roles for each group:

Group Leader
The daemon on the node with the local adapter that has the highest IP address in the group. The Group Leader proclaims, handles requests to join, handles death notifications, coordinates group membership changes, and sends connectivity information.

Crown Prince
The daemon on the node with the local adapter that has the second highest IP address in the group. This daemon can detect the death of the Group Leader and has the authority to become the leader of the group if that happens.

Mayor
A daemon on a node with a local adapter present in this group that has been picked by the Group Leader to broadcast a message to all the adapters in the group. When a daemon receives a message to broadcast, it is a mayor.

Generic
This is the daemon on any node with a local adapter in the heartbeat ring. The role of the Generic daemon is to monitor the heartbeat of the upstream neighbor and inform the Group Leader if a sufficient number of heartbeats has been missed.

Each of these roles is dynamic, which means that every time a new heartbeat ring is formed, the roles of the members are evaluated and assigned, as sketched below.
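A minimal, illustrative sketch of that role assignment, assuming the members are identified only by their adapter IP addresses (the Mayor role is not shown, because it is picked by the Group Leader when a broadcast is needed rather than by address):

    # Illustrative only: derive the documented roles from the members' adapter
    # IP addresses (Group Leader = highest, Crown Prince = second highest,
    # everyone else Generic).
    import ipaddress

    def assign_roles(member_ips):
        """Map each adapter IP address in a heartbeat ring to its role."""
        ordered = sorted(member_ips, key=ipaddress.ip_address, reverse=True)
        roles = {ip: "Generic" for ip in ordered}
        if len(ordered) > 0:
            roles[ordered[0]] = "Group Leader"
        if len(ordered) > 1:
            roles[ordered[1]] = "Crown Prince"
        return roles

    print(assign_roles(["192.168.1.5", "192.168.1.9", "192.168.1.7"]))
    # {'192.168.1.9': 'Group Leader', '192.168.1.7': 'Crown Prince', '192.168.1.5': 'Generic'}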

In summary, Group Leaders send and receive proclaim messages. If the proclaim is from a Group Leader with a higher IP address, then the Group Leader with the lower address replies with a join request. The higher address Group Leader forms a new group with all members from both groups. All members monitor their upstream neighbor for heartbeats. If a sufficient number of heartbeats are missed, a message is sent to the Group Leader and the unresponsive adapter will be dropped from the group. Whenever there is a membership change, Group Services is notified if it asked to be.

The Group Leader also accumulates node connectivity information, constructs a connectivity graph, and routes connections from its node to every other node in the SP system partition. The group connectivity information is sent to all nodes so that they can update their graphs and also compute routes from their node to any other node. It is this traversal of the graph on each node that determines which node membership notification is provided to each node. Nodes to which there is no route are considered unreachable and are marked as down. Whenever the graph changes, routes are recalculated, and a list of nodes that have connectivity is generated and made available to Group Services.
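The reachability idea can be sketched as a plain graph traversal; the adjacency-set representation below is an assumption for illustration, not the daemon's internal data structure:

    # Hedged sketch of the reachability computation: traverse the connectivity
    # graph from the local node; any node without a route is reported as down.
    from collections import deque

    def reachable_nodes(graph, local_node):
        """Breadth-first traversal; returns the nodes that have a route from local_node."""
        seen = {local_node}
        queue = deque([local_node])
        while queue:
            node = queue.popleft()
            for neighbor in graph.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen

    graph = {1: {2}, 2: {1, 3}, 3: {2}, 4: set()}   # node 4 has no connectivity
    up = reachable_nodes(graph, 1)                  # {1, 2, 3}
    down = set(graph) - up                          # {4}: reported as down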

When a network adapter on a node fails or has a problem, the daemon will, for a short time, attempt to form a singleton group, since the adapter is unable to communicate with any other adapter in the network. Topology Services then invokes self-death logic, which attempts to determine whether the adapter is still working. This involves sending messages to other adapters in the network and monitoring the number of incoming packets. If no packets are received, the local adapter is considered to be down. Group Services is then notified that all adapters in the group are down.
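A conceptual sketch of that self-death check follows; the probe calls, counts, and interval are assumptions, not the values or mechanism used by the daemon:

    # Conceptual sketch only; send_probe and packets_received are caller-supplied
    # callables standing in for the daemon's own probing and packet counters.
    import time

    def adapter_appears_down(send_probe, packets_received, probes=4, interval=1.0):
        """Probe other adapters and watch for any incoming traffic on the local adapter."""
        baseline = packets_received()
        for _ in range(probes):
            send_probe()                        # message to another adapter in the network
            time.sleep(interval)
            if packets_received() > baseline:
                return False                    # traffic arrived: adapter still working
        return True                             # nothing received: consider the adapter down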

After an adapter that was down recovers, the daemon will eventually find that the adapter is working again, by using a mechanism similar to the self-death logic, and will form a singleton group with it. This should allow the adapter to form a larger group with the other adapters in the network. An adapter up notification for the local adapter is sent to the Group Services subsystem.

The pluggable NIMs

Topology Services pluggable NIMs are processes started by the Topology Services daemon to monitor each local adapter. The NIM handles the communication between the Topology Services daemon and its local adapter.

Port numbers and sockets

The Topology Services subsystem uses several types of communications:

Intrapartition port numbers

For communication between Topology Services daemons within a system partition, the Topology Services subsystem uses a single UDP port number. This port number is recorded in the Syspar_ports SDR class. The class contains two attributes: subsystem and port. The class object whose subsystem attribute has a value of hats contains the port number for the Topology Services subsystem. Note that the Syspar_ports class is a partitioned SDR class: there is a set of objects for each SP system partition. When you configure the Topology Services subsystem for a particular system partition, you use the SDR data that corresponds to that system partition.

The Topology Services port number is stored in the SDR so that, when the Topology Services subsystem is configured on each node, the port number is retrieved from the SDR. This ensures that the same port number is used by all Topology Services daemons in the same system partition. Note that because a Topology Services daemon from each system partition runs on the control workstation, the Topology Services subsystem in each system partition must have a unique port number.

This intrapartition port number is also set in the /etc/services file, using the service name hats.syspar_name, where syspar_name is the name of the system partition. The /etc/services file is updated on all nodes in the system partition and on the control workstation. The Topology Services daemon obtains the port number from the /etc/services file during daemon initialization.
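As an illustration, a program on a node could resolve the same port through the hats.syspar_name service entry; the partition name "sp1en0" below is an example, and the lookup succeeds only if the entry exists in /etc/services:

    # Example lookup of the intrapartition UDP port through /etc/services,
    # the same source the daemon consults at initialization.
    import socket

    syspar_name = "sp1en0"
    port = socket.getservbyname("hats." + syspar_name, "udp")
    print("Topology Services daemon-to-daemon UDP port:", port)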

UNIX domain sockets

UNIX domain sockets are used for communication between Topology Services clients (Group Services and Event Management), and the local Topology Services daemon. These are connection-oriented sockets. The socket name used to connect to the Topology Services daemon is /var/ha/soc/hats/server_socket.syspar_name, where syspar_name is the name of the system partition.
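A minimal client-side sketch of connecting to that socket (the partition name is an example, and the messages exchanged over the connection are not shown):

    # Connect to the daemon's connection-oriented UNIX domain socket.
    import socket

    syspar_name = "sp1en0"
    path = "/var/ha/soc/hats/server_socket." + syspar_name

    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)        # succeeds only where the daemon is running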

The control script (hatsctrl)

The Topology Services control script is contained in the executable file /usr/sbin/rsct/bin/hatsctrl. This script or command is normally invoked by the syspar_ctrl script, which provides an interface to all of the SP partition-sensitive subsystems.

If necessary, you can invoke the hatsctrl script directly from the command line. Before you run this command, be sure that the SP_NAME environment variable is set to the appropriate system partition name and that it is exported.

The purpose of the hatsctrl command is to add the Topology Services subsystem to the operating software configuration of an SP system partition. You can also use the command to remove the subsystem from a system partition, start the subsystem, stop the subsystem, and clean the subsystem from all system partitions.
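A hedged sketch of invoking hatsctrl directly with SP_NAME set and exported; the option string is deliberately left as a placeholder, because the actual add, start, stop, delete, and clean flags are documented in PSSP: Command and Technical Reference:

    # Sketch of a direct hatsctrl invocation with SP_NAME in the environment.
    import os
    import subprocess

    def run_hatsctrl(option, syspar_name):
        """Run hatsctrl for the given system partition."""
        env = dict(os.environ, SP_NAME=syspar_name)
        subprocess.run(["/usr/sbin/rsct/bin/hatsctrl", option], env=env, check=True)

    # run_hatsctrl("<option>", "sp1en0")   # example partition name, placeholder option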

For more information about the hatsctrl and syspar_ctrl commands, see the book PSSP: Command and Technical Reference.

The start-up script (hats)

The Topology Services start-up script is contained in the /usr/sbin/rsct/bin/hats executable file. The purpose of the hats command is to get the configuration from the SDR and prepare the environment for the Topology Services daemon to run. Normally, the Topology Services start-up script is invisible to users. Use the Topology Services control script to operate the Topology Services subsystem.

The tuning script (hatstune)

The Topology Services tuning script is contained in the /usr/sbin/rsct/bin/hatstune executable file. The purpose of the hatstune command is to change the Topology Services tunable parameters at run time. Normally, you set the parameters at SP configuration time; when Topology Services starts, it gets these parameters from the SDR.

Sometimes you might want to change some parameters after Topology Services starts. The hatstune command provides an easy way to change the parameters without shutting down and restarting Topology Services.

Files and directories

The Topology Services subsystem uses the following directories:

The /var/ha/log directory (log files)

The /var/ha/log directory contains trace output from the Topology Services startup script (hats) and the daemon (hatsd).

Four different kinds of log files are created in this directory: the start-up script log, the service version of the daemon trace log, the user version of the daemon trace log, and the NIM trace files. The files, which have the same names on the control workstation and on the nodes, follow these conventions:

  1. The Topology Services log from the hats startup script is:

    hats.syspar_name[.n]

    where:

    syspar_name is the name of the system partition to which the node belongs.

    n is a number from 1 to 7 with hats.syspar_name.1 being the most recent instance of the file and hats.syspar_name.7 being the least recent instance.

    The seven most recent instances are kept and older instances are removed.

  2. The service version of the log from the hatsd daemon is:

    hats.DD.HHMMSS.syspar_name

    where:

    DD is the Day of the Month that this daemon was started.

    HHMMSS is the Hour, Minute, and Second that the daemon was started.

    syspar_name is the name of the system partition to which the node belongs.

    The contents of this log might be used by IBM Service to help diagnose a problem. The five most recent instances of this file are kept and older instances are removed.

  3. The user version of the trace log from the hatsd daemon is:

    hats.DD.HHMMSS.syspar_name.locale

    where:

    DD is the Day of the Month that this daemon was started.

    HHMMSS is the Hour, Minute, and Second that the daemon was started.

    syspar_name is the name of the system partition to which the node belongs.

    locale is the language locale in which the Topology Services daemon was started.

    This user version contains error messages that are issued by the hatsd daemon. The file provides detailed information that can be used together with the AIX error log for diagnosing problems.

  4. The NIM trace log from the pluggable NIM is:

    nim.hats.interface_name[.nnn]

    where:

    interface_name is the network interface name, for example en0.

    nnn is a number from 001 to 003, with nim.hats.interface_name.001 being the most recent instance of the backup file and nim.hats.interface_name.003 being the least recent instance. The file without the trailing .nnn is the current NIM trace log.

    The default NIM shipped with Topology Services limits the size of its trace log files to about 200 KB. When the NIM trace log file grows to this limit, the current NIM trace log file is renamed to the most recent backup file and a new NIM trace log file is created. The current file and the three most recent backup files are kept; older instances are removed.

The machines list file is also kept in this directory.

The Topology Services daemon limits the size of both the service and user log files to 5,000 lines by default. That limit can be altered during system configuration. When the limit is reached, the hatsd daemon appends the string '.bak' to the name of the current log file and begins a new log file with the same original name. A file that already exists with the '.bak' qualifier is removed before the current log is renamed.
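An illustrative sketch of that rotation behavior (not the daemon's code; the line limit shown is the documented default):

    # When the log reaches the limit, any existing .bak file is removed, the
    # current log becomes the .bak file, and a new log starts under the
    # original name.
    import os

    def rotate_log(log_path, line_limit=5000):
        with open(log_path) as f:
            line_count = sum(1 for _ in f)
        if line_count >= line_limit:
            backup = log_path + ".bak"
            if os.path.exists(backup):
                os.remove(backup)            # existing .bak is removed first
            os.rename(log_path, backup)      # current log is renamed to .bak
            open(log_path, "w").close()      # new log begins with the original name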

The /var/ha/soc directory (socket files)

The /var/ha/soc/hats.syspar_name directory contains the UNIX domain sockets used for communications between the Topology Services daemon and its clients and NIMs. The UNIX domain socket name for communications between the Topology Services daemon and its clients is server_socket. The UNIX domain socket name for communications between the Topology Services daemon and NIMs is as follows:

NIM_name.pid

where:

NIM_name is the executable name of the NIM. The name of the default NIM shipped with Topology Services is default_ip_nim.

pid is the process identification (PID) of the NIM process.

The /var/ha/run directory (daemon working files)

In the /var/ha/run directory, a directory named hats.syspar_name is created, where syspar_name is the system partition name. This directory is the current working directory for the Topology Services daemon. If the Topology Services daemon abnormally terminates, the core dump file is placed in this directory. Whenever the Topology Services daemon starts, it renames any core file to:

core.DD.HHMMSS.syspar_name

where:

DD is the Day of the Month that the daemon associated with this core file was started.

HHMMSS is the Hour, Minute, and Second that the daemon associated with this core file was started.

syspar_name is the name of the system partition to which the node belongs.
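An illustrative sketch of this renaming step ("sp1en0" is an example partition name, and the timestamp taken here stands for the daemon's start time):

    # Rename an existing core dump in the working directory at daemon start-up.
    import os
    import time

    workdir = "/var/ha/run/hats.sp1en0"
    core = os.path.join(workdir, "core")
    if os.path.exists(core):
        stamp = time.strftime("%d.%H%M%S")   # DD.HHMMSS
        os.rename(core, os.path.join(workdir, "core." + stamp + ".sp1en0"))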

