Performance Management Guide

Understanding the Trace Facility

The trace facility is more flexible than traditional system-monitor services that access and present statistics maintained by the system. It does not presuppose which statistics will be needed; instead, trace supplies a stream of events and allows the user to decide what information to extract. With traditional monitor services, data reduction (conversion of system events to statistics) is largely coupled to the system instrumentation. For example, many systems maintain the minimum, maximum, and average elapsed time observed for executions of task A and permit this information to be extracted.

Rather than strongly coupling data reduction to instrumentation, the trace facility provides a stream of trace event records (usually called events). It is not necessary to decide in advance which statistics will be needed; data reduction is to a large degree separated from the instrumentation. From this flow of events, the user may choose to determine the minimum, maximum, and average time for task A, but it is also possible to:

Extract the average time for task A when called by process B
Extract the average time for task A when conditions XYZ are met
Calculate the standard deviation of run time for task A
Decide that some other task, recognized by a stream of events, is more meaningful to summarize
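The reductions listed above can all be computed after the fact from one event stream. The following is a minimal Python sketch, not the trace facility's own tooling: it assumes a hypothetical event record (time stamp, process ID, hook ID) and hypothetical hook IDs `HOOK_A_ENTRY` and `HOOK_A_EXIT` marking the start and end of task A. The same stream yields both the classic statistics and the average for a single caller.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical hook IDs chosen for this example; they are not real AIX hook IDs.
HOOK_A_ENTRY = 0x101
HOOK_A_EXIT = 0x102


@dataclass(frozen=True)
class Event:
    ts_us: int  # time stamp, in microseconds
    pid: int    # process that emitted the event
    hook: int   # hook ID identifying the kind of event


def task_a_durations(events, caller=None):
    """Pair entry/exit events per process and return task A run times.

    Assumes at most one outstanding call of task A per process (no recursion).
    If caller is given, keep only executions made by that process.
    """
    open_entries = {}  # pid -> time stamp of the pending entry event
    durations = []
    for ev in sorted(events, key=lambda e: e.ts_us):
        if ev.hook == HOOK_A_ENTRY:
            open_entries[ev.pid] = ev.ts_us
        elif ev.hook == HOOK_A_EXIT and ev.pid in open_entries:
            start = open_entries.pop(ev.pid)
            if caller is None or ev.pid == caller:
                durations.append(ev.ts_us - start)
    return durations


def summarize(durations):
    """Reduce a list of run times to the classic monitor statistics."""
    return {
        "min": min(durations),
        "max": max(durations),
        "mean": mean(durations),
        "stdev": pstdev(durations),  # population standard deviation
    }


# Sample stream: process 10 and process 20 each run task A twice.
EVENTS = [
    Event(0, 10, HOOK_A_ENTRY), Event(50, 20, HOOK_A_ENTRY),
    Event(100, 10, HOOK_A_EXIT), Event(250, 20, HOOK_A_EXIT),
    Event(300, 10, HOOK_A_ENTRY), Event(400, 10, HOOK_A_EXIT),
    Event(500, 20, HOOK_A_ENTRY), Event(800, 20, HOOK_A_EXIT),
]

if __name__ == "__main__":
    print(summarize(task_a_durations(EVENTS)))
    # Average time for task A when called by "process B" (here, PID 20).
    print(mean(task_a_durations(EVENTS, caller=20)))
```

Because the raw events are kept, adding a new reduction (for example, filtering on some condition XYZ) means writing another filter over the same stream rather than changing the instrumentation.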

This flexibility is invaluable for diagnosing performance or functional problems.

Besides providing detailed information about system activity, the trace facility allows application programs to be instrumented so that their trace events are collected along with system events. The trace file then contains a complete record of application and system activity, in the correct sequence and with precise time stamps.

Implementation

A trace hook is a specific event that is to be monitored. Each such event is assigned a unique number, called a hook ID. The trace command monitors these hooks.

The trace command records events from user processes and kernel subsystems. The binary information is written to two alternate buffers in memory. The trace process then transfers the information to the trace log file on disk. This file grows very rapidly. The trace program runs as a process that can be monitored with the ps command. The trace command acts as a daemon, similar to accounting.

The following figure illustrates the implementation of the trace facility.

Figure 12-1. Implementation of the Trace Facility. This illustration shows the trace process. User processes and kernel subsystems send trace hook calls to trace buffers labeled A and B. From the buffers, the data passes through the trace driver to the trace log file.
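The alternating buffers A and B in Figure 12-1 follow the general double-buffering pattern: events keep arriving in one buffer while the other, already full, is written to disk. The Python class below is a toy model of that pattern only; its names are hypothetical, the "log file" is a plain list, and the flush runs synchronously, whereas a real tracer writes the full buffer concurrently with recording into the other one.

```python
class DoubleBufferedTrace:
    """Toy model of a tracer with two alternating in-memory buffers."""

    def __init__(self, capacity):
        self.capacity = capacity          # events per buffer
        self.buffers = {"A": [], "B": []}
        self.active = "A"                 # buffer currently receiving events
        self.log = []                     # stands in for the trace log file

    def hook(self, hook_id, data):
        """Record one trace hook call in the active buffer."""
        buf = self.buffers[self.active]
        buf.append((hook_id, data))
        if len(buf) == self.capacity:
            self._swap_and_flush()

    def _swap_and_flush(self):
        # Switch new events to the other buffer first, then drain the full one.
        full = self.active
        self.active = "B" if full == "A" else "A"
        self.log.extend(self.buffers[full])
        self.buffers[full].clear()

    def stop(self):
        """Flush whatever is left in the partially filled active buffer."""
        self.log.extend(self.buffers[self.active])
        self.buffers[self.active].clear()
```

With a capacity of 3, recording 7 events flushes buffer A, then buffer B, and leaves one event in A until `stop()` is called.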

Monitoring facilities use system resources. Ideally, the overhead should be low enough not to significantly affect system execution. When the trace program is active, the CPU overhead is less than 2 percent. When the trace data fills the buffers and must be written to the log, additional CPU time is required for file I/O; usually this is less than 5 percent. Because the trace program claims and pins buffer space, this might be significant in a memory-constrained environment. Be aware that the trace log and report files can become very large.

Limiting the Amount of Trace Data Collected

The trace facility generates large volumes of data. This data cannot be captured for extended periods of time without overflowing the storage device. There are two ways to use the trace facility efficiently:

The trace facility can be turned on and off in multiple ways to capture system activity. In this way, it is practical to capture seconds to minutes of system activity for post-processing, which is enough time to characterize major application transactions or interesting sections of a long task.
The trace facility can be configured to direct the event stream to standard output. This allows a real-time process to connect to the event stream and provide data reduction as the events are recorded, thereby creating long-term monitoring capability. A logical extension for specialized instrumentation is to direct the data stream to an auxiliary device that can either store massive amounts of data or provide dynamic data reduction. This technique is used by the performance tools tprof, pprof, netpmon, and filemon.

Starting and Controlling Trace

The trace facility provides three distinct modes of use:

Subcommand Mode: Trace is started with a shell command (trace) and carries on a dialog with the user through subcommands. The workload being traced must be provided by other processes, because the original shell process is in use.
Command Mode: Trace is started with a shell command (trace -a) that includes a flag which specifies that the trace facility is to run asynchronously. The original shell process is free to run ordinary commands, interspersed with trace-control commands.
Application-Controlled Mode: Trace is started with the trcstart() subroutine and controlled by subroutine calls such as trcon() and trcoff() from an application program.
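In application-controlled mode, a program typically switches collection on just around the section it wants to study. The Python sketch below models only that on/off gating; it does not call the AIX trace API, and `ToyTraceChannel` and `traced_section` are hypothetical names that loosely mirror the role of trcon() and trcoff().

```python
from contextlib import contextmanager


class ToyTraceChannel:
    """Toy stand-in for an application-controlled trace session."""

    def __init__(self):
        self.collecting = False
        self.events = []

    def on(self):   # plays the role of trcon()
        self.collecting = True

    def off(self):  # plays the role of trcoff()
        self.collecting = False

    def hook(self, hook_id, data=None):
        # Events are recorded only while collection is switched on.
        if self.collecting:
            self.events.append((hook_id, data))


@contextmanager
def traced_section(channel):
    """Switch collection on for the enclosed block only, even if it raises."""
    channel.on()
    try:
        yield channel
    finally:
        channel.off()
```

Wrapping only the interesting code in `with traced_section(ch):` keeps the captured data small, which is the same motivation given earlier for turning trace on and off selectively.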

Formatting Trace Data

A general-purpose trace-report facility is provided by the trcrpt command. The report facility provides little data reduction, but converts the raw binary event stream to a readable ASCII listing. Data can be visually extracted by a reader, or tools can be developed to further reduce the data.

The report facility displays text and data for each event according to rules provided in the trace format file. The default trace format file is /etc/trcfmt, which contains a stanza for each event ID. The stanza for the event provides the report facility with formatting rules for that event. This technique allows users to add their own events to programs and insert corresponding event stanzas in the format file to specify how the new events should be formatted.
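Table-driven formatting of this kind can be sketched as a lookup from event ID to a formatting rule, with a fallback for IDs that have no stanza. The Python below illustrates the idea only: the real /etc/trcfmt stanza syntax is different, and the table contents, field names, and output layout here are hypothetical.

```python
# Hypothetical format table: hook ID -> template. This is NOT trcfmt syntax;
# it only shows how a per-event stanza drives the formatted output.
FORMAT_STANZAS = {
    0x101: "TASK_A entry   pid={pid}",
    0x102: "TASK_A exit    pid={pid} rc={d0}",
}


def format_event(ts_us, hook_id, pid, d0=0):
    """Render one binary event as a readable line, using its stanza if present."""
    template = FORMAT_STANZAS.get(hook_id)
    if template is None:
        body = f"UNDEFINED HOOK {hook_id:03X}"
    else:
        body = template.format(pid=pid, d0=d0)
    # Hook ID in hex, time stamp in seconds, then the event-specific text.
    return f"{hook_id:03X} {ts_us / 1e6:12.6f}  {body}"
```

Supporting a new application event then amounts to adding one entry, for example `FORMAT_STANZAS[0x200] = "MYAPP checkpoint pid={pid}"`, without changing the formatter itself.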

Viewing Trace Data

When trace data is formatted, all data for a given event is usually placed on a single line. Additional lines may contain explanatory information. Depending on the fields included, the formatted lines can easily exceed 80 characters. It is best to view the reports on an output device that supports 132 columns.