tprof [ -m ] [ -v ] { -k [ -e ] | -s | -j Java Class | -p Program | -t Process_ID | [ -x Command ] | [ -i Trace_File [ -n Gennames_File ] [ -C { all | List } ] ]}
Notes:
- One of the -s, -k, -e, -j, or -p flags must be specified to get a profile.
- The -m flag must be specified to enable the microprofiling.
- The -x flag must be the last flag issued on the command line.
- The -i and -x flags may not be specified at the same time.
- The -n and -C flags can only be used in conjunction with the -i flag.
- The Java Classes will not be micro-profiled.
- The -j and -p flags may not be specified at the same time.
The tprof command reports CPU usage for individual programs and the system as a whole. This command is a useful tool for anyone with a JAVA, C, C++, or FORTRAN program that might be CPU-bound and who wants to know which sections of the program are most heavily using the CPU. The tprof command also reports the fraction of time the CPU is idle. These reports can be useful in determining CPU usage in a global sense.
The tprof command specifies the user program to be profiled, executes the user program, and then produces a set of files containing reports.
In the operating system, an interrupt occurs periodically to allow a "housekeeping" kernel routine to run. This housekeeping occurs 100 times per second. When the tprof command is invoked, the housekeeping kernel routine records the process ID and the address of the instruction executing when the interrupt occurred. With both the instruction address and process ID, the tprof analysis routines can charge CPU time to processes and threads, to subprograms, and even to source lines of programs. Charging CPU time to source program lines is called microprofiling.
Note: Microprofiling can only be performed on code compiled with the -g flag. It is also necessary for the source code to reside in the working directory, which is the directory in which the tprof command is invoked. The proftest working directory is used in the examples for this command.
For subroutine-level profiling, the tprof command can be run without modifying the executable program. You do not have to recompile with special compiler flags or linker options. This means you can obtain a subroutine profile of any executable module that has already been built. However, as previously stated, recompilation is required to get a microprofile.
The files containing the reports are left in the working directory. All files created by the tprof command are prefixed by ___ (two underscores). Prefixing with underscores allows the tprof command user to easily identify all tprof-created files. It is recommended that a working directory be created specifically for microprofiling and that copies of executables and source files be placed (or linked) in that directory. In the examples, the working directory is proftest.
In its simplest form, the tprof command is issued as follows:
tprof -x program
When using the -x Command flag, one of the -s, -k, -j, or -p flags must be specified to get the specific profiling report. If none of these flags are specified, only the summary report is produced. Using the -t flag limits the report to information about the specified process, which is named in the Process_Id parameter.
The -i Trace_File flag allows for offline processing by tprof of trace data files created by the system trace command. The -n flag allows you to specify a Gennames_File to be used when processing an offline file. These flags are useful when it is necessary to post-process a trace file from a remote machine or perform the trace data collection at one time and post-process it at another time. It is very important that the gennames file is created immediately after the trace command has completed. The flags are also useful when system loading is high and trace hooks are being missed by tprof. The offline option relieves this problem. Trace hooks relevant to tprof must be collected by the trace command and are specified by the trace -J flag. The gennames command is then executed to collect mapping information for tprof. Once trace and gennames have executed, trcrpt -r must be executed on the trace logfile and redirected to another file. At this point an adjusted trace logfile and a Gennames_File is fed into tprof.
Example:
trace -af -T 10000000 -L 10000000 -o trace.out -j 234,106,10C,134,139,00A,465
(The program is either already running or has begun.)
trcoff
gennames > gennames.out
trcstop
trcrpt -r trace.out > trace.rpt
Then invoke the tprof command with at least the -i and -n flags:
tprof -i trace.rpt -n gennames.out -s -k -e
The trace hook numbers above are defined as follows:
HKWD_TRACE_SYNC 0x000 HKWD_TRACE_TRCON 0x001 HKWD_TRACE_TRCOFF 0x002 HKWD_TRACE_HEADER 0x003 HKWD_TRACE_LWRAP 0x005 HKWD_TRACE_TWRAP 0x006 HKWD_KERN_PROF 0x234 HKWD_KERN_DISPATCH 0x106 HKWD_KERN_IDLE 0x10C HKWD_SYSC_EXECVE 0x134 HKWD_SYSC_FORK 0x139 HKWD_TRACE_UTIL 0x00A HKWD_SYSC_CRTHREAD 0x465
Note: Review the /usr/lpp/perfagent/README.perfagent.tools file for the latest information on changes to the performance analysis tools.
If multiple CPU trace files are created by the trace daemon, the base name of the file should be entered after the -i flag. The -C flag could be used to specify the CPUs to run the tprof command against and to generate per CPU tprof command reports. When the -i trc and -C all are specified together and there are 5 CPU trace files, the tprof command processes the trace files trc-0, trc-1, trc-2, trc-3, and trc-4.
Consider the following example:
tprof -i trc -n gennames_file -C all tprof -i trc -n gennames_file -C 0,1,2,3,4
In the case where multiple CPU IDs are specified, an individual report file is created for each trace file and a summary file is produced. In conjunction with the -i and -C flags, the -n flag must be used. This allows you to specify a Gennames file to be used when processing offline files. This is important because Gennames files save program loading information that may change over time.
The following is an example of a 2 CPU system:
trace -C all -af -T 10000000 -L 10000000 -o trace.out -j 234,106,10C,134,139,00A,465
(The program is either already running or has begun.)
trcoff
gennames > gennames.out
trcstop
/* A trcrpt -r must be run on each trace file created including the anchor file */ trcrpt -r trace.out > trace.rpt trcrpt -r trace.out-0 > trace.rpt-0 trcrpt -r trace.out-1 > trace.rpt-1 tprof -C all -i trace.rpt -n gennames.out -ske
A summary report with the suffix .all is always produced. If no program is specified, the report is named __prof.all. This report contains an estimate of the amount of CPU time spent in each process that was executing while the tprof command was monitoring the system. If the sample program is specified, the report is named __sample.all. This report contains an estimate of the amount of CPU time spent in the sample program that was executing while the tprof command was monitoring the system. This report also contains an estimate of the amount of CPU time spent in each subprogram of the sample program. The summary report shows the amount of time the CPU was idle as well as the amount of time spent in the kernel. The tprof command reports CPU time in ticks, where 100 ticks equals 1 second.
Example 1 Report A shows an example summary report. The first section lists the CPU time estimates for each process. The second section of the report summarizes the first by combining the reports for multiple occurrences of each process into a single line.
A third section of the report is produced only if there are User ticks associated with the program being profiled. This section of the report notes the subprogram names and the number of ticks they collected. The source file from which the subprogram was compiled is also given as well as the length, in bytes, of the subprogram.
When microprofiling occurs, two more reports are produced only if there are User ticks associated with the program being microprofiled. The two reports are left as additional files in the working directory by the tprof command:
Example 2 Reports A and B are the result of microprofiling an example program.
As previously stated, if microprofiling is desired, copies of the source files (usually suffixed with .c, .h, or .f) must be placed in the working directory. Additionally, executable files which need to be microprofiled, should be compiled with the -g flag.
When microprofiling occurs, there is one __t.routine_source file for each routine in each source file compiled with the -g flag. The source line number and number of ticks associated with that source line are to the left of the microprofile text. Count totals for each routine are at the end of the function. Counts should not be associated with a comment line, data declaration, or macro definition.
Note: tprof cannot report correct source line information if a .c file is included in another .c file.
The tprof command uses the following tools to create reports:
When you run executable modules built with shared libraries, the tprof command reports an estimate of the aggregate CPU time spent in shared code. When invoked with the -s flag, the tprof command resolves the time spent in shared libraries to the subprogram level.
It provides a report on CPU time spent in shared libraries, and resolves that time to the subprogram level for each shared library incurring ticks.
Note: The -v flag with the -m flag turns on verbose microprofiling which lists the files that cannot be microprofiled.
Using the information gathered in this report, it is possible to microprofile a single shared library subprogram to the source-statement level. If microprofiling of shared libraries is desired, a copy of the associated source file must be placed in the current working directory in which the tprof command is invoked. The object file needing to be microprofiled should be compiled with the -g flag and re-archived into the library it belongs to or compile the whole library with -g flag. The re-compiled or re-archived library should be placed in its original path.
The degree to which CPU activity can be resolved is determined by the number of samples captured and the degree to which "hot spots" dominate. While a program with a few "hot spots" can be profiled with relatively few samples, less-frequently executed sections of the program will not be visible in the profiling reports unless more samples are captured. In cases where user programs run less than a minute, there may be insufficient resolution to have a high degree of confidence in the estimates.
A simple solution is to repeatedly execute the user program or script until you achieve the degree of resolution you need. The longer a program is run, the finer the degree of resolution of the profile. If you doubt the accuracy of a profile, run the tprof command several times and compare the resulting profiles.
In order to profile a program or script that is executed multiple times using different sets of arguments, you must invoke the tprof command using a script that executes the desired program accordingly. The following is an example of a script called lin5 that executes the lin program five times:
./lin; ./lin; ./lin; ./lin; ./lin # execute program lin 5 times
The tprof command is then invoked with:
tprof -p ./lin -x ./lin5
Another example can be found in the script DOIT, which executes the lin program n times (where n is an argument to DOIT).
The most significant difference between these three commands is that the tprof command provides microprofiling, while the gprof and prof commands do not.
The prof and gprof commands are standard, supported profiling tools on many UNIX systems, including this operating system. Both the prof and gprof commands provide subprogram profiling and exact counts of the number of times every subprogram is called. The gprof command also provides a very useful "call graph" showing the number of times each subprogram was called by a specific parent and the number of times each subprogram called a child. The tprof command provides neither subprogram call counts nor call graph information.
Like the tprof command, both the prof and gprof commands obtain their CPU consumption estimates for each subprogram by sampling the address counter of the user program at the rate of 100 times per second.
If only subprogram-level profiling is desired, the prof and gprof commands require users to relink their programs using a special linker flag. In order to get subprogram call counts, both the prof and gprof commands require users to both recompile and relink their programs by using the appropriate compile or link flag (-p or -pg, respectively). This may prove to be an impediment to users who do not have the ability to recompile or relink applications that need to be profiled. The tprof command requires recompilation (with the -g flag) only if source statement profiling is required. The tprof command will profile subprograms in any C or FORTRAN executable without recompilation.
The CPU times of user programs relinked for the prof command are the same as the run times of the unprofiled programs. The CPU times of user programs recompiled and relinked for the gprof command are increased by as little as a few percent to a factor of three or more. The CPU times of user programs profiled under the tprof command are the same as the run times of the unprofiled programs.
Also, the tprof command provides a CPU usage summary of all processes active during the execution of the profiled user program, which neither the prof nor gprof command provides.
Notes: CPU activity is sampled at a rate of 100 samples per second. This means that estimates of CPU consumption for each subprogram may not be sufficiently accurate for short-running programs. However, the accuracy is sufficient for programs and processes that run several minutes or more. The system trace facility is used by the tprof command. Only one user can be using the system trace facility at a time. Thus, only one tprof command can be active in the system at a time. The tprof command is a CPU-activity profiler. It does not profile other system resources, such as disks or memory.
Note: The following C program does nothing useful. It simply wastes CPU cycles. The program is used only for illustrative purposes.
/* Array Incrementer -- Version 1 */ #include <stdlib.h> #define Asize 1024 #define RowDim InnerIndex #define ColDim OuterIndex main() { int Increment; int OuterIndex; int InnerIndex; int big [Asize][Asize]; /* Initialize every byte of the array to 0x01 */ for (OuterIndex=0; OuterIndex<Asize; OuterIndex++) { for (InnerIndex=0; InnerIndex<Asize; InnerIndex++) big [RowDim][ColDim] = 0x01010101; } Increment = rand(); /* Increment every element in the array */ for (OuterIndex=0; OuterIndex<Asize; OuterIndex++) { for (InnerIndex=0; InnerIndex<Asize; InnerIndex++) { big [RowDim][ColDim] += Increment; if (big [RowDim][ColDim] <0) printf("negative number. %d\n", big[RowDim][Coldim]); } } printf("version 1 check num: %d\n", big[rand()%Asize][rand()%Asize]); return(0); }
In this example, version1.c is the source file and version1 will be the program. The source program, version1.c, already exists in a directory called proftest.
cd proftest
xlc -o version1 version1.c
tprof -p ./version1 -x ./version1
An example of the__version1.all file is found in Example 1 Report. Of course, the __version1.all file that you create is certain to be different from the example __version1.all reported shown here.
Process PID TID Total Kernel User Shared Other ======= === === ===== ====== ==== ====== ===== ./version1 4524 5313 287 8 278 1 0 ======= === === ===== ====== ==== ====== ===== Total 287 8 278 1 0 Process FREQ Total Kernel User Shared Other ======= === ===== ====== ==== ====== ===== ./version1 1 287 8 278 1 0 ======= === ===== ====== ==== ====== ===== Total 1 287 8 278 1 0 Total Ticks For ./version1 (USER) = 278 Subroutine Ticks % Source Address Bytes ============= ===== ==== ======== ======== ====== .main 278 60.4 version1.c 10000340 218
Process names and process identification (PID) numbers can be reused during normal operation. When a process forks another process, the forked process inherits the process name of the original process. When a process execs another process, the execed process inherits the PID of the original process.
The first section of the report shows the number of ticks consumed by, or on behalf of, the specified process and its thread (if no process is specified, then this section shows all processes that is running during tprof monitoring). The columns give tick counts for Kernel Code, User Code, and Shared Code. The name of the program is version1, so the version1 process (with process ID number 4524) is the process being profiled. This process/program collected a total of 287 ticks: 8 ticks for services performed for it in the kernel's address space, 278 ticks in the user's address space, and 1 ticks in the shared library space. Other is a catchall category that normally is 0.
The second section of the report summarizes the results by program, regardless of process ID. It shows the number (FREQ) of different processes that ran each program at some point.
The third section of the report shows the number of ticks used by each function in the executable and the percentage of the total runs' CPU ticks that each function's ticks represent.
Note: If there are no ticks in the User column for the program being profiled, then the third section of the report is not produced.
Sampling occurs at the rate of 100 per second. Each tick corresponds to 1/100 sec or 10 msec. Thus, in Example 1 Report A, version1 is estimated to have consumed 2.87 seconds: 2.78 in user space, and 0.08 seconds in kernel (system) space.
The example of profiling contains summary information at the subprogram level. In this example, we provide CPU-usage profiling at the microprofiling level. Additional information must be provided to the tprof command in the working directory in order to make the correspondences between execution-time estimates and source statements:
tprof -mp ./version1 -x ./version1 # invoke the profiler, telling # it to execute and profile "version1"
The output file__version1.all should be nearly identical to that obtained from the previous steps. Of course, the report that summarizes all processes active during the execution of the version1 program will change. You will also find the __t.main_version1.c file, which contains profiled source for main in source, and the __h.version1.c file, which contains the "hottest" lines in the source.
Note: If there are no ticks in the User column in the __program.all file for the program being profiled, then the __t.routine_source and __h.source files are not produced.
Example 2 Reports A and B contain results from the microprofiling example shown above. Example 2 Report A shows the contents of the __t.main_version1.c file. Example 2 Report B shows the contents of the __h.version1.c file.
Ticks Profile for main in ./version1.c Line Ticks Source 13 1 for (OuterIndex=0; OuterIndex<Asize; OuterIndex++) 14 - { 15 17 for (InnerIndex=0; InnerIndex<Asize; InnerIndex++) 16 105 big [RowDim][ColDim] = 0x01010101; 17 - } 18 - Increment = rand(); 19 - /* Increment every element in the array */ 20 - for (OuterIndex=0; OuterIndex<Asize; OuterIndex++) 21 - { 22 21 for (InnerIndex=0; InnerIndex<Asize; InnerIndex++) 23 - { 24 30 big [RowDim][ColDim] += Increment; 25 104 if (big [RowDim][ColDim] <0) 26 - printf("negative number. %d\n", big[RowDim][ColDim ]); 27 - } 28 - } 29 - printf("version 1 check num: %d\n", big[rand()%Asize][rand( )%Asize]); 30 - return(0); 31 - } 278 Total Ticks for main in ./version1.c
Hot Line Profile for ./version1.c Line Ticks 16 105 25 104 24 30 22 21 15 17 13 1
This example uses two FORTRAN source files, linless.f and daxpy.f, which already exist in a directory called proftest.
cd proftest
xlf -o lin2 linless.f daxpy.f
tprof -p ./lin2 -x ./lin2
Sometimes you want to profile a script of sequential executions of programs and shell commands rather than a single program. In this example, the DOIT script runs a program n times, with both the program and n specified as arguments in the script.
# DOIT Shell Script #!/usr/bin/ksh i=$2 command=$1 while [ $i -gt 0 ] do let i=i-1 $command done
For example, to run the program lin three times, enter:
DOIT ./lin 3
In this example, the DOIT script and the FORTRAN source file called lin.f, already exists in a directory called proftest.
cd proftest
xlf -g -o lin lin.f
tprof -p ./lin -x DOIT ./lin 3
This provides statement-level profiling
This example uses a Java Class named myclass. This class exists in the proftest directory, which is already included in the CLASSPATH.
Example Java Program myclass.java: public class myclass { public static void main(String args[]) { int i; for(i=0;i < 100;i++) { } System.out.print("Hello World\n"); } }
cd proftest
javac myclass.java
tprof -j myclass -x java -Xrunjpa myclass
Note: The tprof java profiler agent always generates a trace file no matter whether you run the java application via tprof or independently (with -Xrunjpa). tprof uses this trace file to profile the java classes in both real time and offline mode. When starting the java application, make sure to specify the -Xrunjpa. For huge java applications, the applications should be started with a -noclassgc flag, which disables the garbage collection of classes. If this flag is not used, then tprof profiling information for java classes may not be accurate.
An example of the generated __myclass.all is shown below:
Process PID TID Total Kernel User Shared Other Java ======= === === ===== ====== ==== ====== ===== ==== java 23852 18985 343 28 0 171 0 140 java 23852 18985 1 1 0 0 0 0 ======= === === ===== ====== ==== ====== ===== ==== Total 344 29 0 171 0 140 Segment :: 2 21 Process FREQ Total Kernel User Shared Other Java ======= ==== ===== ====== ==== ====== ===== ==== java 2 344 29 0 171 0 140 ======= ==== ===== ====== ==== ====== ===== ==== Total 2 344 29 0 171 0 140 Total System Ticks: 649 (used to calculate function level CPU) Total JAVA = 144 (ticks accumulated in Java Segment) Total ticks for myclass (JAVA) = 140 Class Name Ticks % Source Class ID ========== ===== ==== ====== ======== myclass 140 21.6 myclass.java 3008a750 Profile: myclass ( myclass.java ) Method Name Ticks % Method ID Load Addr Size =========== ===== ==== ========= ========= ==== main [([Ljava/lng/String;)V] 140 21.6 34b09edc 345b2a2c 32c
Note: All the method names are displayed with their signatures in brackets ([]).
If your system displays the following message:
/dev/systrace: device busy or trcon: TRCON:no such device
This means the trace facility is already in use. Stop your program and try again after typing trcstop, stops the trace.
Note: This command is valid on both the POWER-based platform and the Itanium-based platform. And it can read both types of tracefiles, 64-bit and 32-bit, regardless of the OS (32-bit or 64-bit).
The cc command, gprof command, prof command, stripnm command, and trace daemon.