Performance Management Guide

Example of Trace Facility Use

The following takes you through an example of a typical trace.

Note

This example is more meaningful if the input file is not already cached in system memory. Choose as the source file any file that is about 50 KB in size and has not been used recently.

Obtaining a Sample Trace File

Trace data accumulates rapidly. Bracket the data collection as closely around the area of interest as possible. One technique for doing this is to issue several commands on the same command line. For example:

# trace -a -k "20e,20f" -o trc_raw ; cp ../bin/track /tmp/junk ; trcstop

captures the execution of the cp command. We have used two features of the trace command. The -k "20e,20f" option suppresses the collection of events from the lockl() and unlockl() functions. These calls are numerous on uniprocessor systems, but not on SMP systems, and add volume to the report without giving us additional information. The -o trc_raw option causes the raw trace output file to be written in our local directory.

Formatting the Sample Trace

We use the following form of the trcrpt command for our report:

# trcrpt -O "exec=on,pid=on" trc_raw > cp.rpt

This reports both the fully qualified name of the file that is run and the process ID that is assigned to it.

The report file shows us that there are numerous VMM page assign and delete events in the trace, like the following sequence:

1B1 ksh            8526          0.003109888       0.162816
         VMM page delete:      V.S=0000.150E ppage=1F7F
             working_storage delete_in_progress process_private computational

1B0 ksh            8526          0.003141376       0.031488
         VMM page assign:      V.S=0000.2F33 ppage=1F7F
             working_storage delete_in_progress process_private computational

We are not interested in this level of VMM activity detail at the moment, so we reformat the trace as follows:

# trcrpt -k "1b0,1b1" -O "exec=on,pid=on" trc_raw > cp.rpt2

The -k "1b0,1b1" option suppresses the unwanted VMM events in the formatted output. It saves us from having to retrace the workload to suppress unwanted events. We could have used the -k function of the trcrpt command instead of that of the trace command to suppress the lockl() and unlockl() events, if we had believed that we might need to look at the lock activity at some point. If we had been interested in only a small set of events, we could have specified -d "hookid1,hookid2" to produce a report with only those events. Because the hook ID is the leftmost column of the report, you can quickly compile a list of hooks to include or exclude. A comprehensive list of trace hook IDs is defined in the /usr/include/sys/trchkid.h file.

Reading a Trace Report

The header of the trace report tells you when and where the trace was taken, as well as the command that was used to produce it:

Thu Oct 28 13:34:05 1999
System: AIX texmex Node: 4
Machine: 000691854C00
Internet Protocol Address: 09359BBB 9.53.155.187
Buffering: Kernel Heap

trace -a -k 20e,20f -o trc_raw

The body of the report, if displayed in a small enough font, looks similar to the following:

ID  PROCESS NAME  PID     ELAPSED_SEC    DELTA_MSEC  APPL  SYSCALL KERNEL  INTERRUPT
101 ksh           8526    0.005833472    0.107008          kfork LR = D0040AF8
101 ksh           7214    0.012820224    0.031744          execve LR = 10015390
134 cp            7214    0.014451456    0.030464          exec:  cmd=cp ../bin/track /tmp/junk pid=7214 tid=24713

In cp.rpt2 you can see the following information:

The fork(), exec(), and page fault activities of the cp process.
The opening of the input file for reading and the creation of the /tmp/junk file
The successive read()/write() system calls to accomplish the copy.
The process cp becoming blocked while waiting for I/O completion, and the wait process being dispatched.
How logical-volume requests are translated to physical-volume requests.
The files are mapped rather than buffered in traditional kernel buffers, and the read accesses cause page faults that must be resolved by the Virtual Memory Manager.
The Virtual Memory Manager senses sequential access and begins to prefetch the file pages.
The size of the prefetch becomes larger as sequential access continues.
When possible, the disk device driver coalesces multiple file requests into one I/O request to the drive.

The trace output looks a little overwhelming at first. This is a good example to use as a learning aid. If you can discern the activities described, you are well on your way to being able to use the trace facility to diagnose system-performance problems.

Filtering of the Trace Report

The full detail of the trace data may not be required. You can choose specific events of interest to be shown. For example, it is sometimes useful to find the number of times a certain event occurred. To answer the question "How many opens occurred in the copy example?" first find the event ID for the open() system call. This can be done as follows:

# trcrpt -j | grep -i open

You should be able to see that event ID 15b is the OPEN SYSTEM CALL event. Now, process the data from the copy example as follows:

# trcrpt -d 15b -O "exec=on" trc_raw

The report is written to standard output, and you can determine the number of open() subroutines that occurred. If you want to see only the open() subroutines that were performed by the cp process, run the report command again using the following:

# trcrpt -d 15b -p cp -O "exec=on" trc_raw