ATTENTION - READ THIS FIRST
Do not activate this trace facility until you have read this section completely and understand this material. If you are not certain how to properly use this facility, or if you are not under the guidance of IBM Service, do not activate this facility. Activating this facility may degrade the performance of your system, lengthen response times, raise processor load, and consume system disk resources. It may also obscure or modify the symptoms of timing-related problems.
Consult these logs for debugging purposes. They all refer to a particular instance of the Topology Services daemon running on the local node.
This log contains trace information about the activities performed by the daemon. When a problem occurs, logs from multiple nodes will often be needed. These log files must be collected before they wrap or are removed.
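One way to gather these logs from several nodes before they wrap is sketched below; the node names, destination directory, log name pattern, and use of rcp are all illustrative, and any remote-copy mechanism can be substituted:

for node in node1 node2 node3
do
    mkdir -p /tmp/collected/$node
    rcp "$node:/var/ha/log/hats.*" /tmp/collected/$node/
done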
The trace is located in:
where DD is the day of the month when the daemon was started, and hhmmss is the time when the daemon was started.
If obtaining logs from all nodes is not feasible, the following is a list of nodes from which logs should be collected:
The Group Leader is the node that has the highest IP address on a network.
The Downstream Neighbor is the node whose IP address is immediately lower than the address of the node where the problem was seen. The node with the lowest IP address has the node with the highest IP address as its Downstream Neighbor, so the neighbor relationships follow the numeric ordering of the addresses, as the example after this list illustrates.
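For example, assuming the adapter IP addresses on the network have been placed one per line in an illustrative file /tmp/adapter_ips, the relationships can be read off a sorted list:

# Highest address first: the first line is the Group Leader, and each node's
# Downstream Neighbor is on the line after its own; the last line (lowest
# address) wraps around to the first (highest).
sort -t . -k1,1nr -k2,2nr -k3,3nr -k4,4nr /tmp/adapter_ips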
The most detailed level of tracing is Service log long tracing. It is started with the command:
traceson -l -s subsystem_name
where subsystem_name is:
With service log long tracing, trace records are generated under the following conditions:
Data in the Service log is in English. Each Service log entry has this format:
date daemon name message
Adapters are identified by a pair:
(IP address:incarnation number)
Groups are identified by a pair:
(IP address of Group Leader:incarnation number of group)
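As an illustration only (the timestamp, addresses, incarnation numbers, and message text below are hypothetical), an entry referring to an adapter and its group might look like:

03/18 14:25:30 hatsd: Adapter (9.114.61.65:0x4f1a2b3c) joined group (9.114.61.195:0x4f1a2d00)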
Long tracing should be activated only on request from IBM Service. Activate it while the error condition is still present, and for only about one minute, to avoid overwriting other data in the log file.
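For example, the following sequence is one way to do that. It assumes the PSSP subsystem name hats; substitute the subsystem name that applies to your system:

traceson -l -s hats     # start Service log long tracing
sleep 60                # capture about one minute of trace data
tracesoff -s hats       # return to normal tracing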
Service log normal tracing is the default, and is always running. It has negligible impact if no node or adapter events occur on the system. An adapter death event may produce approximately 50 lines of log information on the Group Leader and "mayor" nodes, or up to 250 lines on those nodes in systems of approximately 400 nodes. All other nodes produce fewer than 20 lines. Log file sizes can be increased as described in Changing the service log size.
With normal tracing, trace records are generated for these conditions:
No entries are created when no adapter or node events are happening on the system.
With normal tracing, the rate at which the log fills, and therefore how quickly older entries are trimmed, depends heavily on the frequency of adapter or node events on the system. The location of the log file and the format of the information are the same as for the long tracing described previously.
If the Service log file keeps growing under normal tracing even when no events appear to be occurring on the system, this may indicate a problem. Search for related entries in the AIX error log or in the User log. See Topology Services user log.
The long trace generates approximately 10 KB of data per minute of trace activity. By default, log files have a maximum of 5000 lines, which will be filled in 30 minutes or less if long tracing is requested. To change the log file size:
hatstune -l new_max_lines -r
The full path name of this command is: /usr/sbin/rsct/bin/hatstune.
For example, hatstune -l 10000 -r changes the maximum number of lines in a log file to 10000. The -r flag causes the Topology Services subsystem to be refreshed in all the nodes.
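Because /usr/sbin/rsct/bin may not be in the search path, the command can also be invoked by its full path, and the standard SRC status command can be used afterward to inspect the subsystem (the subsystem name hats is an assumption; use the name that applies to your system):

/usr/sbin/rsct/bin/hatstune -l 10000 -r   # same example as above, full path
lssrc -ls hats                            # long status of the subsystem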
On HACMP nodes, the same change can be made through SMIT: issue smit hacmp and select Cluster Configuration > Cluster Topology > Configure Topology Services and Group Services > Change / Show Topology and Group Services Configuration. Then synchronize the change by selecting Cluster Topology > Synchronize Cluster Topology.
The Topology Services user log contains error and informational messages produced by the daemon. This trace is always running. It has negligible impact on the performance of the system, under normal circumstances.
The trace is located in: /var/ha/log/hats.DD.hhmmss.partition_name.lang for PSSP nodes, and /var/ha/log/topsvcs.DD.hhmmss.cluster_name.lang for HACMP nodes, where DD is the day of the month when the daemon was started, hhmmss is the time when the daemon was started, and lang is the language used by the daemon.
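For example, with hypothetical date, time, and names, a daemon started on the 18th of the month at 14:25:30 would write to files such as:

/var/ha/log/hats.18.142530.partition1.en_US      (PSSP node)
/var/ha/log/topsvcs.18.142530.cluster1.en_US     (HACMP node)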
Data in the user log is in the language in which the daemon runs, which is the node's administrative language. Messages in the user log have a catalog message number, which can be used to obtain a translation of the message in the desired language.
The size of the log file is changed using the same commands that change the size of the service log. Truncation of the log, saving of log files, and other considerations are the same as for the service log.
Each user log entry has this format:
date daemon name message
Adapters are identified by a pair:
(IP address:incarnation number)
Groups are identified by a pair:
(IP address of Group Leader:incarnation number of group)
The main source for diagnostics is the AIX error log. Some of the error messages produced in the user log occur under normal circumstances, but indicate an error if they occur repeatedly. Some error messages give additional detail for an entry in the error log. Therefore, this log file should be examined whenever an entry is created in the system error log, as in the example below.
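For example, the standard AIX error report command can be used to check for recent entries; the filtering shown is illustrative:

errpt | head        # most recent error log entries, newest first
errpt -a | more     # detailed view of the entries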
This is the Topology Services startup script log. It contains configuration data used to build the machines.lst configuration file. This log also contains error messages if the script was unable to produce a valid machines.lst file and start the daemon. The startup script is run at subsystem startup time and at refresh time.
This log refers to a particular instance of the Topology Services script running on the local node. In PSSP, the control workstation is responsible for building the machines.lst file from the adapter information in the SDR. Therefore, it is usually on the control workstation that the startup script encounters problems. In HACMP, the machines.lst file is built on every node.
The size of the file varies from 1 KB to 50 KB according to the size of the machine. The trace runs whenever the startup script runs. The trace is located in:
A new instance of the hats startup script log is created each time the script starts. A copy of the script log is made just before the script exits. Only the last seven instances of the log file are kept, named file.1 through file.7, so the contents of the log must be saved before the subsystem is restarted or refreshed more than seven times.
file.1 is an identical copy of the current startup script log. At each startup, file.1 is renamed to file.2, file.2 is renamed to file.3, and so on; the previous file.7 is lost.
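A minimal way to preserve them before a restart is sketched below, where script_log_name stands for the actual startup script log path on your system and the destination directory is illustrative:

mkdir -p /tmp/script_logs
cp -p script_log_name* /tmp/script_logs/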
Entries in the startup script log are kept both in English and in the node's language (if different). Trace records are created for these conditions:
There is no fixed format for the records of the log. The following information is in the file:
The following information is in the HACMP startup script log:
The main source for diagnostics is the AIX error log. The hats script log file should be used when the error log shows that the startup script was unable to complete its tasks and start the daemon.
For a PSSP system, good results are indicated when the message:
Exec /usr/sbin/rsct/bin/hatsd -n node_number
appears towards the beginning of the file.
For an HACMP system, good results are indicated by the absence of fatal error messages.
For a PSSP system, error results are indicated by one or more error messages. For example:
hats: 2523-605 Cannot find the address of the control workstation.
and the absence of the Exec message.
For an HACMP system, error results are indicated by the presence of the message:
topsvcs: 2523-600 Exit with return code: error code
and possibly other error messages.
All error messages have the message catalog number with them.
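This makes it possible to scan a log mechanically. For example, where log_file_name stands for the startup script log to be examined, all numbered Topology Services messages can be listed with:

grep "2523-" log_file_name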
This log contains trace information about the activities of the Network Interface Modules (NIMs), which are processes used by the Topology Services daemon to monitor each network interface. These logs need to be collected before they wrap or are removed.
The trace is located in:
Where 00n is a sequence number of 001, 002, or 003. These three logs are always kept in rotation: when a new instance is created, log file 003 is overwritten by 002, 002 is overwritten by 001, and 001 is replaced by the new log, so the oldest instance is lost.
Trace records are generated under the following conditions:
Data in the NIM log is in English only. The format of each message is:
time-of-day message
An instance of the NIM log file fills when it reaches approximately 200 KB. Normally, it takes around 10 minutes to fill an instance of the log file. Since three instances are kept, the NIM log files need to be saved within 30 minutes of the time the adapter-related problem occurred.
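A sketch of preserving them in time is shown below, where nim_log_name stands for the NIM log files named above and the destination directory is illustrative:

mkdir -p /tmp/nim_logs
cp -p nim_log_name* /tmp/nim_logs/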