Performance Management Guide

Appendix C. Accessing the Processor Timer

Attempts to measure very small time intervals are often frustrated by the intermittent background activity that is part of the operating system and by the processing time consumed by the system time routines. One approach to solving this problem is to access the processor timer directly to determine the beginning and ending times of measurement intervals, run the measurements repeatedly, and then filter the results to remove periods when an interrupt intervened.

The POWER and POWER2 architectures implement the processor timer as a pair of special-purpose registers. The POWER-based architecture defines a 64-bit register called the TimeBase. Only assembler-language programs can access these registers directly.

Note: The time measured by the processor timer is absolute wall-clock time. If an interrupt occurs between accesses to the timer, the calculated duration includes the processing of the interrupt and possibly the execution of other processes dispatched before control returns to the code being timed. Because the processor timer reports raw time, never use a value derived from it without first subjecting it to a reasonableness test.

A pair of library subroutines makes access to the TimeBase registers architecture-independent. The subroutines are as follows:

read_real_time(): This subroutine obtains the current time from the appropriate source and stores it as two 32-bit values.
time_base_to_time(): This subroutine ensures that the time values are in seconds and nanoseconds, performing any necessary conversion from the TimeBase format.

The time-acquisition and time-conversion functions are separated in order to minimize the overhead of time acquisition.

The following example shows how these subroutines could be used to measure the elapsed time for a specific piece of code:

#include <stdio.h>
#include <sys/time.h>
 
int main(void) {
   timebasestruct_t start, finish;
   int val = 3;
   int w1, w2;
   double time;
 
   /* get the time before the operation begins */
   read_real_time(&start, TIMEBASE_SZ);
 
   /* begin code to be timed */
   printf("This is a sample line %d \n", val);
   /* end code to be timed   */
 
   /* get the time after the operation is complete */
   read_real_time(&finish, TIMEBASE_SZ);
 
   /* call the conversion routines unconditionally, to ensure    */
   /* that both values are in seconds and nanoseconds regardless */
   /* of the hardware platform.                                  */
   time_base_to_time(&start, TIMEBASE_SZ);
   time_base_to_time(&finish, TIMEBASE_SZ);
 
   /* subtract the starting time from the ending time */
   w1 = finish.tb_high - start.tb_high; /* probably zero */
   w2 = finish.tb_low - start.tb_low;
 
   /* if there was a carry from low-order to high-order during  */
   /* the measurement, we may have to undo it.                  */
   if (w2 < 0) {
      w1--;
      w2 += 1000000000;
   }
 
   /* convert the net elapsed time to floating point microseconds */
   time = ((double) w2)/1000.0;
   if (w1 > 0)
      time += ((double) w1)*1000000.0;
 
   printf("Time was %9.3f microseconds \n", time);
   return 0;
}
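
The borrow handling in the example can also be avoided by folding each converted timestamp into a single 64-bit nanosecond count before subtracting. The following is a minimal sketch of that alternative and is not part of the original example. The type sec_nsec_t and the functions to_nsec() and elapsed_usec() are hypothetical names, and the two fields are assumed to hold the seconds and nanoseconds that tb_high and tb_low contain after time_base_to_time() has been called.

```c
#include <stdint.h>

/* Hypothetical stand-in for the converted tb_high (seconds) and
   tb_low (nanoseconds) fields of a timebasestruct_t. */
typedef struct {
    uint32_t sec;
    uint32_t nsec;
} sec_nsec_t;

/* Fold seconds and nanoseconds into one signed 64-bit nanosecond count. */
static int64_t to_nsec(sec_nsec_t t)
{
    return (int64_t)t.sec * 1000000000LL + (int64_t)t.nsec;
}

/* Elapsed time from start to finish, in floating-point microseconds. */
double elapsed_usec(sec_nsec_t start, sec_nsec_t finish)
{
    return (double)(to_nsec(finish) - to_nsec(start)) / 1000.0;
}
```

With the structures from the example above, the converted tb_high and tb_low values of start and finish would be copied into two sec_nsec_t values and passed to elapsed_usec(). Because the subtraction is done on a single 64-bit count, no separate carry correction is needed.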

To minimize the overhead of calling and returning from the timer routines, the analyst can experiment with binding the benchmark nonshared (see When to Use Dynamic Linking and Static Linking).

If this were a real performance benchmark, we would run the code to be measured repeatedly. If we timed a number of consecutive repetitions collectively, we could calculate an average time for the operation, but it might include interrupt handling or other extraneous activity. If we timed a number of repetitions individually, we could inspect the individual times for reasonableness, but the overhead of the timing routines would be included in each measurement. It may be desirable to use both techniques and compare the results. In any case, the analyst will want to consider the purpose of the measurements in choosing the method.
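
A reasonableness test for individually timed repetitions might be sketched as follows. This is only one possible filter, assumed here for illustration: it treats the smallest sample as the undisturbed baseline and discards any sample more than a chosen factor above it, on the assumption that interrupts can only lengthen a measurement. The function name filtered_mean(), its parameters, and the choice of factor are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of the samples that pass a simple reasonableness test.
   Samples are assumed to be non-negative elapsed times. A sample is
   kept if it is no larger than factor times the smallest sample;
   larger values are assumed to include an interrupt or other
   extraneous activity. The number of samples kept is stored in *kept. */
double filtered_mean(const double *samples, size_t n, double factor,
                     size_t *kept)
{
    double min, sum = 0.0;
    size_t i, count = 0;

    assert(samples != NULL && kept != NULL && n > 0 && factor >= 1.0);

    /* find the smallest sample, used as the baseline */
    min = samples[0];
    for (i = 1; i < n; i++)
        if (samples[i] < min)
            min = samples[i];

    /* accumulate only the samples within the accepted range */
    for (i = 0; i < n; i++) {
        if (samples[i] <= min * factor) {
            sum += samples[i];
            count++;
        }
    }

    *kept = count;
    return sum / (double)count;
}
```

For samples of 10, 11, 10, 500, and 12 microseconds and a factor of 2.0, the sample of 500 is discarded and the mean of the remaining four is 10.75 microseconds, whereas the unfiltered mean would be 108.6 microseconds. Comparing the filtered mean with the average from a collectively timed run gives a rough indication of how much timer overhead each individual measurement carries.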