[ Previous | Next | Table of Contents | Index | Library Home | Legal | Search ]

General Programming Concepts: Writing and Debugging Programs


Chapter 27. Trace Facility

The trace facility helps you isolate system problems by monitoring selected system events. Events that can be monitored include: entry and exit to selected subroutines, kernel routines, kernel extension routines, and interrupt handlers. When the trace facility is active, information about these events is recorded in a system trace log file. The trace facility includes commands for activating and controlling traces and generating trace reports. Applications and kernel extensions can use several subroutines to record additional events.

For more information on the trace facility, refer to the following:


The Trace Facility Overview

The trace facility is in the bos.sysmgt.trace file set. To see if this file set is installed, use

lslpp -l | grep bos.sysmgt.trace

If a line is produced which includes bos.sysmgt.trace then the file set is installed, otherwise you must install it.

The following topics are discussed:

The AIX system trace facility records trace events which can be formatted later by the trace report command. Trace events are compiled into kernel or application code, but are only traced if tracing is active.

Tracing is activated with the trace command or the trcstart subroutine. Tracing is stopped with either the trcstop command or the trcstop subroutine. While active, tracing can be suspended or resumed with the trcoff and trcon commands, or the trcoff and trcon subroutines.

Once the trace has been stopped with trcstop, a trace report can then be generated with the trcrpt command. This command uses a template file, /etc/trcfmt, to know how to format the entries. The templates are installed with the trcupdate command. For a discussion of the templates, see the trcupdate command.


Controlling the Trace

The trace command starts the tracing of system events and controls the trace buffer and log file sizes. This command is documented in the article on the trace daemon in the Command's Reference.

There are three methods of gathering trace data.

  1. The default method is to use 2 buffers to continuously gather trace data, writing one buffer while data is being put into the other buffer. The log file wraps when it becomes full.
  2. The circular method gathers trace data continuously, but only writes the data to the log file when the trace is stopped. This is particularly useful for debugging a problem where you know when the problem is happening and you just want to capture the data at that time. You can start the trace at any time, and then stop it right after the problem occurs and you'll have captured the events around the problem. This method is enabled with the -l trace daemon flag.
  3. The third option only uses one trace buffer, and quits tracing when that buffer fills, and writes the buffer to the log file. The trace is not stopped at this point, rather tracing is turned off as if a trcoff command had been issued. At this point you will usually want to stop the trace with the trcstop command. This option is most often used to gather performance data where we don't want trace to do i/o or buffer swapping until the data has been gathered. Use the -f flag to enable this option.

You will usually want to run the trace command asynchronously, in other words, you want to enter the trace command and then continue with other work. To run the trace asynchronously, use the -a flag. You must then stop the trace with the trcstop command.

It is usually desirable to limit the information that is traced. Use the -j events or -k events flags to specify a set of events to include (-j) or exclude (-k). Note, however, that to be able to display the program names associated with trace hooks, certain hooks must be enabled. These are given in the trace command article. These hooks may also be enabled by selecting the tidhk trace hook group from SMIT.

For example, if you want to trace the mbuf hook, 254, and show program names also, you need to run trace as follows:

trace -aj 106,10c,134,139,465,254

Tracing occurs.

trcstop

trcrpt -O exec=on

The hooks needed for showing the program name may change from release to release, so please check the trace command article. The -O exec=on trcrpt option shows the program names, see the trcrpt command.

It is often desirable to specify the buffer size and the maximum log file size. The trace buffers require real memory to be available so that no paging is necessary to record trace hooks. The log file will fill to the maximum size specified, and then wrap around, discarding the oldest trace data. The -T size and -L size flags specify the size of the memory buffers and the maximum size of the trace data in the log file in bytes.

Note: Because the trace facility pins the data collection buffers, making this amount of memory unavailable to the rest of the system, the trace facility can impact performance in a memory-constrained environment. If the application being monitored is not memory-constrained, or if the percentage of memory consumed by the trace routine is small compared to what is available in the system, the impact of trace "stolen" memory should be small. If you do not specify a value, trace uses the default sizes.

It should also be noted that the buffer memory is allocated from the kernel heap by default. If there isn't enough available memory, it is allocated from separate segments. On a busy system it may be desirable to allocate the buffers from separate memory segments to avoid constraining the kernel heap. This is done with the -B flag on the trace command.

Tracing may also be controlled from an application. See the trcstart, and trcstop articles.


Recording Trace Event Data

There are two types of trace data.

generic data
consists of a data word, a buffer of opaque data and the opaque data's length. This is useful for tracing items such as path names. See the Generic Trace Channels article in the Trace Facility Overview. It can be found in Chapter 27, Trace Facility.

Non-generic data
This is what is normally traced by the AIX operating system. Each entry of this type consists of a hook word and up to 5 words of trace data. For a 64-bit application these are 8-byte words. The C programmer should use the macros TRCHKL0 through TRCHKL5, and TRCHKL0T through TRCHKL5T defined in the /usr/include/sys/trcmacros.h file, to record non-generic data. If these macros can not be used, see the article on the utrchook subroutine.

Generating a Trace Report

See the trcrpt command article for a full description of trcrpt. This command is used to generate a readable trace report from the log file generated by the trace command. By default the command formats data from the default log file, /var/adm/ras/trcfile. The trcrpt output is written to standard output.

To generate a trace report from the default log file, and write it to /tmp/rptout, enter

trcrpt >/tmp/rptout

To generate a trace report from the log file /tmp/tlog to /tmp/rptout, which includes program names and system call names, use

trcrpt -O exec=on,svc=on /tmp/tlog >/tmp/rptout


Extracting trace data from a dump

If trace was active when the system takes a dump, the trace can usually be retrieved with the trcdead command. To avoid overwriting the default trace log file on the current system, use the -o output-file option.

For example

trcdead /o /tmp/tlog /var/adm/ras/vmcore.0

creates a trace log file /tmp/tlog which may then be formatted with

trcrpt /tmp/tlog


Trace Facility Commands

The following commands are part of the trace facility:

trace Starts the tracing of system events. With this command, you can control the size and manage the trace log file as well as the internal trace buffers that collect trace event data.
trcdead Extracts trace information from a system dump. If the system halts while the trace facilities are active, the contents of the internal trace buffers are captured. This command extracts the trace event data from the dump and writes it to the trace log file.
trcnm Generates a kernel name list used by the trcrpt command. A kernel name list is composed of a symbol table and a loader symbol table of an object file. The trcrpt command ues the kernel name list file to inerpret addresses when formatting a report from a trace log file.
trcrpt Formats reports of trace event data contained in the trace log file. You can specify the events to be included (or omitted) in the report, as well as determine the presentation of the output with this command. The trcrpt command uses the trace formatting templates stored in the /etc/trcfmt file to determine how to interpret the data recorded for each event.
trcstop Stops the tracing of system events.
trcupdate Updates the trace formatting templates stored in the /etc/trcfmt file. When you add applications or kernel extensions that record trace events, templates for these events must be added to the /etc/trcfmt file. The trcrpt command will use the trace formatting templates to determine how to interpret the data recorded for each event. Software products that record events usually run the trcupdate command as part of the installation process.

Trace Facility Calls and Subroutines

The following calls and subroutines are part of the trace facility:

trcgen, trcgent Records trace events of more than five words of data. The trcgen subroutine may be used to record an event as part of the system event trace (trace channel 0) or to record an event on a generic trace channel (channels 1 through 7). You specify the channel number in a subroutine parameter when you record the trace event. The trcgent subroutine appends a time stamp to the event data.
trchook, utrchook Records trace events of up to five words of data. These subroutines may be used to record an event as part of the system event trace (trace channel 0). The utrchook subroutine uses a special FAST-SVC path to improve performance and should be used by programs at the user (application) level.
trcgenk, trcgenkt Records trace events of more than five words of data. The trcgenk subroutine may be used to record an event as part of the system event trace (trace channel 0) or to record an event on a generic trace channel (channels 1 through 7). You specify the channel number in a subroutine parameter when you record the trace event. The trcgenkt subroutine appends a time stamp to the event data.
trcoff Suspends the collection of trace data on either the system event trace channel (channel 0) or a generic trace channel (1 through 7). The trace channel remains active and trace data collection can be resumed by using the trcon subroutine.
trcon Starts the collection of trace data on a trace channel. The channel may be either the system event trace channel (0) or a generic channel (1 through 7). The trace channel, however, must have been previously activated by using the trace command or the trcstart subroutine. You can suspend trace data collection by using the trcoff subroutine.
trcstart Requests a generic trace channel. This subroutine activates a generic trace channel and returns the channel number to the calling application to use in recording trace events using the trcgen, trcgent, trcgenk, and trcgenkt subroutines.
trcstop Frees and deactivates a generic trace channel.

Trace Facility Files


/etc/trcfmt Contains the trace formatting templates used by the trcrpt command to determine how to interpret the data recorded for each event.
/var/adm/ras/trcfile Contains the default trace log file. The trace command allows you to specify a different trace log file.
/usr/include/sys/trchkid.h Contains trace hook identifier definitions.
/usr/include/sys/trcmacros.h Contains commonly used macros for recording trace events.

Trace Event Data

The data recorded for each traced event consist of a word containing the trace hook identifier and the hook type followed by a variable number of words of trace data optionally followed by a time stamp. The word containing the trace hook identifier and the hook type is called the hook word. The remaining two bytes of the hook word are called hook data and are available for recording event data.

Trace Hook Identifiers

A trace hook identifier is a three-digit hexadecimal number that identifies an event being traced. You specify the trace hook identifier in the first twelve bits of the hook word. Trace hook identifiers are defined in the /usr/include/sys/trchkid.h file. The values 0x010 through 0x0FF are available for use by user applications. All other values are reserved for system use. The currently defined trace hook identifiers can be listed using the trcrpt -j command.

Hook Types

The hook type identifies the composition of the event data and is user-specified. The twelfth through the sixteenth bits of the hook word constitute the hook type. For more information on hook types, refer to the trcgen, trcgenk, and trchook subroutines.

Trace Facility Generic Trace Channels

The trace facility supports up to eight active trace sessions at a time. Each trace session uses a channel of the multiplexed trace special file, /dev/systrace. Channel 0 is used by the trace facility to record system events. The tracing of system events is started and stopped by the trace and trcstop commands. Channels 1 through 7 are referred to as generic trace channels and may be used by subsystems for other types of tracing such as data link tracing.

To implement tracing using the generic trace channels of the trace facility, a subsystem calls the trcstart subroutine to activate a trace channel and to determine the channel number. The subsystem modules can then record trace events using the trcgen, trcgent, trcgenk, or trcgenkt subroutine. The channel number returned by the trcstart subroutine is one of the parameters that must be passed to these subroutines. The subsystem can suspend and resume trace data collection using the trcoff and trcon subroutines and can deactivate a trace channel using the trcstop subroutine. The trace events for each channel must be written to a separate trace log file, which must be specified in the call to the trcstart subroutine. The subsystem must provide the user interface to activating and deactivating subsystem tracing.

The trace hook IDs, which are stored in the /usr/include/sys/trchkid.h file, and the trace formatting templates, which are stored in the /etc/trcfmt file, are shared by all the trace channels.

Related Information

The trace daemon in AIX 5L Version 5.1 Commands Reference.

The trcdead command, trcnm command, trcrpt command, trcstop command, trcupdate command in AIX 5L Version 5.1 Commands Reference.

The trchook subroutine, trcgen subroutine, trcoff subroutine, trcon subroutine, trcstart subroutine, trcstop subroutine in AIX 5L Version 5.1 Technical Reference: Base Operating System and Extensions Volume 2.


[ Previous | Next | Table of Contents | Index | Library Home | Legal | Search ]