
Commands Reference, Volume 2


fdpr Command

Purpose

A performance tuning utility for improving execution time and real memory utilization of user-level application programs.

Syntax

Most Common Usage:

fdpr -p ProgramName -x Command

Use with Phases 1 and 3 Flags:

fdpr -p ProgramName [ -armember ArchiveMemberList ] [ -M Segnum ] [ -o OutputFile ] [ -nI ] [ -tb ] [ -pc ] [ -pp ] [ -toc ] [ -bt ] [ -disasm ] [ -profcount ] [ -map ] [ [ -O3 ] [ -nop ] [ -opt_fdpr_glue ] [ -inline ] [ -i_resched ] [ -killed_regs ] [ -RD ] [ -tocload | -aggressive_tocload ] [ -regs_release ] [ -ret_prologs ] ] [[ -Rn ]| [ -R0 | -R1 | -R2 | -R3 ]] [ -v ] -s [ -1 | -3 ] [ -x Command ]

Use With Phase 2 Flag:

fdpr -p ProgramName [ -armember ArchiveMemberList ] [ -M Segnum ] [ -o OutputFile ] [ -nI ] [ -tb ] [ -pc ] [ -pp ] [ -toc ] [ -bt ] [ -disasm ] [ -profcount ] [ -map ] [ [ -O3 ] [ -nop ] [ -opt_fdpr_glue ] [ -inline ] [ -i_resched ] [ -killed_regs ] [ -RD ] [ -tocload | -aggressive_tocload ] [ -regs_release ] [ -ret_prologs ] ] [[ -Rn ]| [ -R0 | -R1 | -R2 | -R3 ]] [ -v ] -s [ -2 | -12 | -23 ] -x Command

Description

The fdpr command (Feedback Directed Program Restructuring) is a performance-tuning utility that may help improve the execution time and the real memory utilization of user-level application programs. The fdpr program optimizes the executable image of a program by collecting information about the program's behavior while it runs a typical workload, and then creates a new version of the program that is optimized for that workload. The new program generated by fdpr typically runs faster and uses less real memory.

Attention: The fdpr command applies advanced optimization techniques that may result in programs that do not behave as expected. Programs reordered with this tool should be used with due caution and should be rigorously retested with, at a minimum, the same test suite used to test the original program, to verify expected functionality. The reordered program is not supported.

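The fdpr command builds an optimized executable program in 3 distinct phases:

Phase 1 Creates an instrumented version of the program.
Phase 2 Runs the instrumented program, using the command specified with the -x flag, to collect profiling data.
Phase 3 Uses the collected profiling data to create the optimized (reordered) executable.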

These phases can be run separately or in partial or full combination, but must be run in order (for example, -1, then -2, then -3; -12, then -3; or -1, then -23). The default is to run all three phases.
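
The ordering rule above can be checked before starting a long run. The following is a minimal POSIX sh sketch; check_phase_order is a hypothetical helper, not part of fdpr, and it assumes the only phase flags are the ones named on this page (-1, -2, -3, -12, -23, and the default -123). The phase digits of all planned invocations, read in sequence, must spell 123.

```shell
# Hypothetical helper, not part of fdpr: succeed only if the planned phase
# flags (one per fdpr invocation) cover phases 1, 2 and 3 once, in order.
check_phase_order() {
    digits=""
    for flag in "$@"; do
        case "$flag" in
            -1|-2|-3|-12|-23|-123)
                digits="$digits${flag#-}" ;;   # drop the leading dash
            *)
                echo "unsupported phase flag: $flag" >&2
                return 2 ;;
        esac
    done
    # -1 -2 -3, -12 -3, -1 -23 and -123 all concatenate to "123".
    [ "$digits" = "123" ]
}
```

For example, check_phase_order -12 -3 succeeds, while check_phase_order -1 -3 fails because phase 2 is missing.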

Note: The instrumented executable, created in phase 1 and run in phase 2, typically runs several times slower than the original program. Because of this increased execution time, the instrumented program should be invoked in a way that minimizes execution duration while still fully exercising the desired code areas. Where feasible, the user should also eliminate any time-dependent aspects of the program.

Flags


-1, -2, -3 Specifies the phase to run. The default is all 3 phases (-123). The -s flag must be used when running separate phases so that the succeeding phases can access the required intermediate files. The phases must be run in order (for example, -1, then -2, then -3, or -1, then -23).
-M SegNum Specifies where to map shared memory for profiling. The default is 0x30000000. Specify an alternate shared memory address if the program to be reordered, or any of the commands invoked with the -x flag, uses conflicting shared-memory addresses. Typical alternative values are 0x40000000, 0x50000000, and so on, up to 0xC0000000.
-nI Does not permit branch reversing.
-o OutputFile Specifies the name of the output file from the optimizer. The default is program.fdpr.
-p ProgramName Specifies the name of the executable program file, shared object file, or shared library (containing shared objects or executables) to optimize. The program must be unstripped.
-armember ArchiveMemberList Lists archive members to be optimized, within a shared archive file specified by the -p flag. If -armember is not specified, all members of the archive file are optimized. The entries in ArchiveMemberList should be separated by spaces.
-Rn Copies input to output instead of invoking the optimizer.

Note: The -Rn flag cannot be used with the -R0, -R1, -R2, or -R3 flags.
-R0,-R1,-R2, -R3 Specifies the level of optimization. -R3 is the most aggressive optimization. The default is -R0. See "Optimization" for more information.
-tb Forces the restructuring of traceback tables in reordered code. If -tb is omitted, traceback tables are automatically included only for C++ applications that use the try/catch exception-handling mechanism.
-pc Preserves CSECT boundaries. Effective only with -R1 or -R3.
-pp Preserves procedure boundaries. Effective only with -R1 or -R3.
-toc Enables TOC pointer modifications. Effective only with -R0 or -R2.
-bt Enables branch table modifications. Effective only with -R0 or -R2.
-O3 Switches on all optimization flags.
-inline Performs inlining of hot functions.
-nop Removes NOP instructions from reordered code.
-opt_fdpr_glue Optimizes hot basic blocks (BBs) in the FDPR glue during code reordering.
-killed_regs Avoids store instructions in callee function prologs for registers that are later killed by the calling function.
-regs_release Eliminates store/restore instructions in a function's prolog/epilog for registers that are used infrequently within the function.
-tocload Replaces an indirect load instruction via the TOC with an add immediate instruction.
-aggressive_tocload Performs the -tocload optimization and also reduces the TOC size by removing redundant TOC entries.
-RD Performs static data reordering in the .data and .bss sections.
-i_resched Performs instruction rescheduling after code reordering.
-ret_prologs Optimizes function prologs that terminate with a conditional branch instruction directly to the function's epilog.
-map Prints a map of basic blocks with their respective old and new addresses into a file with the .map suffix.
-disasm Prints the disassembled version of the input program into a file with the .dis suffix.
-profcount Prints the profiling counters into a file with the .counters suffix.
-s Specifies that temporary files created by the fdpr command are not removed. This flag must be used when running fdpr in separate phases.
-v Produces verbose output.
-x Command Specifies the command used for invoking the instrumented program. All the arguments after the -x flag are used for the invocation. The -x flag is required when the -s flag is used with the -2 flag.
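
Several of the flag descriptions above restrict which flags can be combined. The following POSIX sh sketch is a hypothetical pre-flight check (check_fdpr_flags and fdpr_warn are names introduced here, not part of fdpr) that only inspects an argument list and reports combinations those descriptions rule out or make ineffective. It assumes -R0 when no level is given, as documented, and treats -2, -12, and -23 as phase-2 runs.

```shell
# Hypothetical pre-flight check, not part of fdpr: it inspects an fdpr
# argument list and reports flag combinations that the descriptions above
# rule out or make ineffective. It never runs fdpr itself.
fdpr_warn() {
    echo "warning: $*"
    status=1
}

check_fdpr_flags() {
    status=0
    level=-R0                           # documented default level
    rn=no; rlevel=no; s=no; phase2=no; x=no
    tocload=no; aggr=no; needs_r13=""; needs_r02=""
    for arg in "$@"; do
        case "$arg" in
            -x) x=yes; break ;;         # remaining arguments form the command
            -Rn) rn=yes ;;
            -R0|-R1|-R2|-R3) rlevel=yes; level=$arg ;;
            -s) s=yes ;;
            -2|-12|-23) phase2=yes ;;
            -tocload) tocload=yes ;;
            -aggressive_tocload) aggr=yes ;;
            -pc|-pp) needs_r13="$needs_r13 $arg" ;;
            -toc|-bt) needs_r02="$needs_r02 $arg" ;;
        esac
    done
    if [ "$rn" = yes ] && [ "$rlevel" = yes ]; then
        fdpr_warn "-Rn cannot be used with -R0, -R1, -R2 or -R3"
    fi
    if [ "$tocload" = yes ] && [ "$aggr" = yes ]; then
        fdpr_warn "-tocload and -aggressive_tocload are alternatives"
    fi
    if [ "$s" = yes ] && [ "$phase2" = yes ] && [ "$x" = no ]; then
        fdpr_warn "-x Command is required when -s is used with phase 2"
    fi
    for f in $needs_r13; do
        case "$level" in
            -R1|-R3) ;;
            *) fdpr_warn "$f is effective only with -R1/-R3" ;;
        esac
    done
    for f in $needs_r02; do
        case "$level" in
            -R0|-R2) ;;
            *) fdpr_warn "$f is effective only with -R0/-R2" ;;
        esac
    done
    return "$status"
}
```

For example, check_fdpr_flags -p test1 -R2 -pc prints a warning that -pc is effective only with -R1/-R3 and returns a nonzero status.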

Optimization

The fdpr command provides four levels of optimization. The flags -R1, -R2, and -R3 provide the most aggressive optimization along with the greatest potential speedups. However, in some cases, using these optimization levels may result in an executable that does not behave as expected. Programs that contain assembler code (in particular, code that performs dynamic branch calculations) or programs derived from nonstandard compilers are prone to these types of reordering-induced anomalies. In addition, the -R1 and -R3 flags produce executables that do not include debug information and are therefore not supported by the dbx command.

Use of the -R0 flag may result in a slightly smaller performance improvement, because this flag attempts to preserve functionality and debug capability by maintaining the original program structure and by eliminating branch table and function descriptor pointer adjustments. Functional errors are much less likely, though still possible. Also, this option produces a reordered executable that is typically 20-40% larger than the original program.
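
As a rough illustration of the 20-40% figure above, the output size to expect from -R0 can be estimated from the original file size. This is plain arithmetic on that stated typical range, not a guarantee; r0_size_range is a hypothetical helper introduced here.

```shell
# Rough estimate only, based on the "typically 20-40% larger" figure for
# -R0 output: print the lower and upper expected sizes in bytes for an
# original program of $1 bytes (integer arithmetic, rounded down).
r0_size_range() {
    size=$1
    echo "$((size + size / 5)) $((size + size * 2 / 5))"
}
```

For example, r0_size_range $(wc -c < test1) estimates the range for test1; the command substitution is left unquoted so that any leading blanks printed by wc are dropped.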

Both the -R0 and -R2 flags use a program-reordering technique in which the original structure of the program, including traceback entries, is preserved. The reordered code, which represents the highly executed code paths through the program, is appended to the end of the executable. This technique provides near-optimal performance improvement by allowing global code reordering (independent of procedure boundaries and free of interleaved traceback entries) while preserving the debug capability. In addition, program functionality is maintained for a larger class of programs, because the original program structure acts as a "safety net" that catches dynamically (run-time) computed branch instructions that were not detected or not modified. The -R2 flag attempts to fix all dynamically computed branches that branch to moved code. However, for some programs (especially assembler programs), it is difficult to identify these dynamic branches correctly, and using the -R2 flag for this class of programs can result in unexpected functional errors. Also, reordering programs that use any form of self-modifying code will probably result in unexpected functional errors.

Executables built with the -qfdpr compiler flag contain information to assist fdpr in producing reordered programs with guaranteed functionality. When this compiler flag is used, the functionality advantage of the fdpr option -R0 is extended to options -R1, -R2, and -R3. However, if -qfdpr is used, only those object modules built with this flag are reordered. If the -qfdpr flag is used, it should be used for all object modules in a program. Static linking will not improve performance if the -qfdpr flag is used.

Additional performance enhancements can be realized by using static linking when building the program to be reordered. Since the fdpr program only reorders the instructions within the executable program specified, any dynamically linked shared library routines called by the program are not reordered. Statically linking these library routines to the executable allows for reordering both the instructions in the program and all library routines used by the program. There are other advantages as well as disadvantages to building a statically linked program. See the AIX 5L Version 5.1 Performance Management Guide for further information.

Output Files

All files created by the fdpr command are stored in the current directory with the exception of any files that may be created by running the command specified in the -x flag. During the optimization process, the original program is saved by renaming the program, and is only restored to the original program name upon successful completion of the final phase.

The path of the profile file created by the fdpr command explicitly includes the name of the current directory, because scripts used to run the program may change the working directory before executing the program.

The files created and/or used by the fdpr command are:

program Name of the unstripped executable to be optimized.
__program.save Saved version of the original executable program.
__program.save.histo Intermediate file.
__program.save.bt Intermediate file.
__program.prof Name of the profile file.
__program.instr Name of the instrumented version of program.
__program.save.dis Name of the default disassembly file produced by the -disasm flag.
__program.save.map Name of the default mapping file produced by the -map flag.
__program.save.counters Name of the default profile counters file produced by the -profcount flag.
program.fdpr Default name of optimized executable output file.
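
The names in this list follow a fixed pattern derived from the program name. The following POSIX sh sketch prints them for a given program; fdpr_files is a hypothetical helper, and it assumes the program is a plain file name in the current directory and that the default output name is used (no -o flag).

```shell
# Sketch: print the file names from the list above for a given program
# name, one per line. Assumes a plain file name in the current directory
# and the default output name (no -o flag).
fdpr_files() {
    p=$1
    printf '%s\n' \
        "__$p.save" "__$p.save.histo" "__$p.save.bt" "__$p.prof" \
        "__$p.instr" "__$p.save.dis" "__$p.save.map" "__$p.save.counters" \
        "$p.fdpr"
}
```

For example, ls -l $(fdpr_files test1) 2>/dev/null shows which of these files exist after a run with -s. Because __program.save holds the original program until the final phase completes successfully, files from this list should not be deleted blindly.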

Enhanced Debugging Capabilities

In order to enable a certain degree of debugging capability for optimized programs, fdpr updates the Symbol Table to reflect the changes that were made in the .text section.

Symbol Table entry fields that specify the addresses of symbols relocated during reordering are modified to point to the symbols' new addresses in the .text section.

In addition, in the case where functions or files are split during reordering, fdpr creates new entries in the Symbol Table for each new part of the split function/file. These new parts of the same function are given new symbol names in the Symbol Table according to the following naming convention:

Name[n] [fdpr|orig]

where Name is the original symbol name and n is the number of the split part.

For optimization flags -R0/-R2, which append all new reordered code to the end of the .text section, the suffix string [fdpr] indicates that the new entry refers to an address in the appended text area whereas the [orig] string refers to an address in the original text area. For optimization flags -R1/-R3 all the new entries are suffixed with the [fdpr] string.

For example, if a function is split into 3 parts during reordering, the Symbol Table contains 3 entries for it, one for each part:

  [Index] m   Value       Scn     Aux   Sclass    Type    Name
Original Entry:
   [456]  m  0x00000230    2       1     0x02    0x0000   .main
 
Restructured Entries:
   [456]  m  0x00000304    2       1     0x02    0x0000   .main
  [1447]  m  0x00003328    2       1     0x02    0x0000   .main[1] [fdpr]
  [1453]  m  0x000033b4    2       1     0x02    0x0000   .main[2] [fdpr]
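
Based on the restructured entries above (such as .main[1] [fdpr]), the name of a split part consists of the base name, a bracketed part number, and a bracketed fdpr or orig suffix. The following POSIX sh sketch splits such a name into its three pieces; parse_split_symbol is a hypothetical helper, and it assumes that parts in the original text area carry a part number in the same way.

```shell
# Sketch: split a restructured symbol name such as ".main[1] [fdpr]" into
# its base name, part number and text area ("fdpr" or "orig").
# Assumes the Name[n] [fdpr|orig] pattern; names without the suffix
# (for example ".main") are not split parts and are rejected.
parse_split_symbol() {
    result=$(printf '%s\n' "$1" | sed -n \
        -e 's/^\(.*\)\[\([0-9][0-9]*\)] \[\(fdpr\)]$/\1 \2 \3/p' \
        -e 's/^\(.*\)\[\([0-9][0-9]*\)] \[\(orig\)]$/\1 \2 \3/p')
    [ -n "$result" ] || return 1
    printf '%s\n' "$result"
}
```

For example, parse_split_symbol '.main[2] [fdpr]' prints .main 2 fdpr, which makes it easy to group all parts of one function when reading a symbol table listing.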

Debug Support

The use of the optimization flags -R0 and -R2 results in an executable that has additional information included in the program file for use by the dbx debug program. This additional information allows dbx to provide limited debug support by mapping reordered instruction addresses to their original locations and by maintaining traceback entries in the original text section. The dbx command maps most reordered instruction addresses to the corresponding addresses in the original executable as follows:

0xRRRRRRRR = fdpr[0xYYYYYYYY]

where 0xRRRRRRRR indicates the reordered address and fdpr[0xYYYYYYYY] indicates the original address. Also, dbx uses the traceback entries in the original instruction area to find associated procedure names and during stack traceback. See the "Examples" section for further details.
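
When many dbx messages have to be examined, the mapping can be extracted mechanically. The following POSIX sh sketch pulls both addresses out of a dbx line of the form 0xRRRRRRRR = fdpr[0xYYYYYYYY]; map_fdpr_address is a hypothetical helper, and it assumes the mapping appears exactly in the format shown above.

```shell
# Sketch: extract the reordered and original addresses from a dbx line that
# contains the mapping "0xRRRRRRRR = fdpr[0xYYYYYYYY]". Prints
# "reordered original", or nothing if the line has no mapping.
map_fdpr_address() {
    printf '%s\n' "$1" | sed -n \
        's/.*\(0x[0-9a-fA-F][0-9a-fA-F]*\) = fdpr\[\(0x[0-9a-fA-F][0-9a-fA-F]*\)].*/\1 \2/p'
}
```

The second field is the original address, which can then be used in a dbx session on the original program.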

Examples

The following are typical usage examples of the fdpr command.

  1. This example runs all three phases. In this example, test1 is the unstripped executable and test2 is a shell script that invokes test1. The current working directory is /tmp/fdpr.

    test2 script file:
     
    #!/bin/sh
    # code to exercise test1
    test1 -expand 100 -root "$PATH" file.jpg -quit
    # the end of test2
    

    Run the fdpr command (using the default optimization):

    fdpr -p test1 -x test2
    

    This results in the new reordered executable test1.fdpr.

  2. To run one phase at a time, execute phase one of fdpr and save the necessary temporary files.

    fdpr -s -1 -p test1
    

    This command string renames the original program to __test1.save and creates an instrumented version with the name test1.

    To execute phase two and save temporary files:

    fdpr -s -2 -p test1 -x test2
    

    This command string executes the script file test2 that runs the instrumented version of test1 to collect the profile data.

    To execute phase three, saving temporary files:

    fdpr -s -3 -p test1
    

    Again, this results in the new reordered executable test1.fdpr.

  3. To run the first two phases followed by phase three, execute phases one and two, saving temporary files.

    fdpr -s -12 -p test1 -x test2
    

    Execute phase three, while saving temporary files and using optimization level three.

    fdpr -s -3 -R3 -p test1
    
  4. If an error occurs while running an fdpr reordered program built with the -R0 or -R2 optimization flags, the dbx command can be used to determine the procedure in which the error occurred (either in the original text section or in the reordered text section) as follows:

    dbx program.fdpr
    

    which produces output similar to the following:

    Type 'help' for help.
    reading symbolic information ...warning: no source compiled with -g
     
    [using memory image in core]
     
    Segmentation fault in proc_d at 0x10000634 = fdpr[0x10000290]
    0x10000634 (???) 98640000        stb   r3,0x0(r4)
    (dbx)
    

    The address mapping information 0x10000634 = fdpr[0x10000290] indicates that the instruction at address 0x10000634 is in the reordered text section and originally resided (in the original program) at address 0x10000290 in proc_d. Running dbx on the original program and using the mapped addresses (0x10000290 in the above example) may provide additional information to aid in debugging.

    A stack traceback, which is used to determine how the program arrived at the current location, is produced as follows:

    (dbx) where
    

    which produces the following output:

    proc_d(0x0) at 0x10000634
    proc_c(0x0) at 0x10000604
    proc_b(0x0) at 0x100005d0
    proc_a(0x0) at 0x1000059c
    main(0x2, 0x2ff7fba4) at 0x1000055c
    (dbx)
    
  5. The dbx subcommand stepi may also be used to single step through the instructions of a reordered executable program as follows:

    (dbx) stepi
    

    which produces the following output:

    stopped in proc_d at 0x1000061c = fdpr[0x10000278]
    0x1000061c (???) 9421ffc0       stwu   r1,-64(r1)
    (dbx)
    

    In this example, dbx indicates that the program stopped in routine proc_d at address 0x1000061c in the reordered text section (originally located at address 0x10000278).
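
The separate-phase sequence of Example 2 can also be wrapped in a shell function that stops at the first failing phase. This is a minimal POSIX sh sketch; run_phases and the FDPR variable are hypothetical names introduced here, and only the fdpr flags used in the examples appear.

```shell
# Sketch of Example 2 as a reusable function: run the three phases
# separately with -s and stop at the first phase that fails.
# FDPR is a hypothetical override (default: fdpr) so the sequence can be
# dry-run with a stub command instead of the real fdpr.
run_phases() {
    if [ $# -lt 2 ]; then
        echo "usage: run_phases program command [args...]" >&2
        return 2
    fi
    prog=$1
    shift                               # "$@" is now the -x command
    ${FDPR:-fdpr} -s -1 -p "$prog" || return 1
    ${FDPR:-fdpr} -s -2 -p "$prog" -x "$@" || return 1
    ${FDPR:-fdpr} -s -3 -p "$prog" || return 1
}
```

Running run_phases test1 test2 issues the same three commands as Example 2 and, if all phases succeed, leaves the reordered executable test1.fdpr. Because a failed phase stops the sequence, phase three never runs against an incomplete profile.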

Files


/usr/bin/fdpr Contains the fdpr command.

Related Information

The dbx command.

Restructuring Executable Programs with the fdpr Program in AIX 5L Version 5.1 Performance Management Guide.

