A performance tuning utility for improving execution time and real memory utilization of user-level application programs.
fdpr -p ProgramFile -x WorkloadCommand
fdpr -p ProgramFile [ -M Segnum ] [ -fd Fdesc ] [ -o OutputFile ] [ -armember ArchiveMemberList ] [ OptimizationFlags ] [ -map ] [ -disasm ] [ -profcount ] [ -v ] [ -s [ -1 | -3 ] ] [ -x WorkloadCommand ]
fdpr -p ProgramFile [ -M Segnum ] [ -fd Fdesc ] [ -o OutputFile ] [ -armember ArchiveMemberList ] [ OptimizationFlags ] [ -map ] [ -disasm ] [ -profcount ] [ -v ] [ -s [ -2 | -12 | -23 ] ] [ -x WorkloadCommand ]
OptimizationFlags:
[[ -Rn ] | [ -R0 | -R1 | -R2 | -R3 ] ] [ -nI ] [ -tb ] [ -pc ] [ -pp ] [ -bt ] [ -toc ] [ -O2 ] [ -O3 ] [ -nop ] [ -opt_fdpr_glue ] [ -inline ] [ -i_resched ] [ -killed_regs ] [ -RD ] [ -full_saved_regs_calls ] [ -trunc_tb ] [ -tocload | -aggressive_tocload ] [ -regs_release ] [ -ret_prologs ] [ -volatile_regs ] [ -propagate ] [ -regs_redo ] [ -ptrgl_opt ] [ -dcbt_opt ]
The fdpr command (Feedback Directed Program Restructuring) is a performance-tuning utility that may help improve the execution time and the real memory utilization of user-level application programs. The fdpr program optimizes the executable image of a program in two steps. First, it collects information on the behavior of the program while the program runs a typical workload. Then it creates a new version of the program that is optimized for that workload. The new program generated by fdpr typically runs faster and uses less real memory.
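The fdpr command builds an optimized executable program in three distinct phases:
Phase 1 (instrumentation): The original program is saved under a new name (__test1.save in the examples), and an instrumented version of the program is created under the original program name.
Phase 2 (profiling): The workload command specified with the -x flag runs the instrumented program to collect profile data.
Phase 3 (optimization): A new, reordered executable (test1.fdpr in the examples) is generated from the original program and the collected profile data.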
These phases can be run separately or in partial or full combination, but they must be run in order. For example, you can run -1, then -2, then -3; or -12, then -3; or -1, then -23. By default, all three phases are run.
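The ordering rule can be expressed as a small checker. The following is a minimal Python sketch and is not part of fdpr. The names `PHASES` and `is_valid_phase_plan` are hypothetical. A value of `None` stands for a run with no phase flag, which runs all three phases. The only phase flags the checker accepts are those shown in the synopsis (`-1`, `-2`, `-3`, `-12`, `-23`). It checks phase order only and ignores every other flag.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical table (not part of fdpr): which phases each phase flag runs.
# None stands for an invocation with no phase flag, which runs all three phases.
PHASES: Dict[Optional[str], Tuple[int, ...]] = {
    None: (1, 2, 3),
    "-1": (1,),
    "-2": (2,),
    "-3": (3,),
    "-12": (1, 2),
    "-23": (2, 3),
}


def is_valid_phase_plan(runs: Sequence[Optional[str]]) -> bool:
    """True if the invocations, taken in order, run phases 1, 2 and 3 exactly once each."""
    covered: List[int] = []
    for flag in runs:
        if flag not in PHASES:
            return False  # not a phase flag from the synopsis
        covered.extend(PHASES[flag])
    return covered == [1, 2, 3]


if __name__ == "__main__":
    print(is_valid_phase_plan(["-12", "-3"]))       # in order
    print(is_valid_phase_plan(["-2", "-1", "-3"]))  # out of order
```

Representing each flag by the phases it covers lets a single equality test catch missing, repeated, and out-of-order phases.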
-Rn | Copies the input to the output instead of invoking the optimizer. Note: The -Rn flag cannot be used with the -R0, -R1, -R2, or -R3 flags. |
-R0, -R1, -R2, -R3 | Specifies the level of optimization. -R3 is the most aggressive optimization. The default is -R0. See "Optimization" for more information. |
-nI | Do not permit branch reversing during code reordering optimization. |
-tb | Force the restructuring of traceback tables in reordered code. If the -tb option is omitted, traceback tables are automatically included only for C++ applications that use the try/catch mechanism. |
-pc | Preserve CSECT boundaries. Effective only with -R1/-R3. |
-pp | Preserve procedures' boundaries. Effective only with -R1/-R3. |
-toc | Enable TOC pointer modifications. Effective only with -R0/-R2. |
-bt | Enable branch table modifications. Effective only with -R0/-R2. |
-O2 | Switch on the following less aggressive optimization flags: -nop, -opt_fdpr_glue, -inline, -killed_regs, -RD, -tocload, -volatile_regs, -regs_redo |
-O3 | Switch on all the following optimization flags: -nop, -opt_fdpr_glue, -inline, -i_resched, -killed_regs, -RD, -full_saved_regs_calls, -trunc_tb, -aggressive_tocload, -regs_release, -ret_prologs, -volatile_regs, -propagate, -regs_redo, -dcbt_opt, -ptrgl_opt. |
-inline | Perform inlining of frequently invoked functions. |
-nop | Remove NOP instructions from optimized code. |
-opt_fdpr_glue | Optimize frequently used basic blocks within the glue code that fdpr adds during code reordering optimization. |
-killed_regs | Avoid store instructions for registers within callee functions' prologs that are overwritten by the calling function. |
-regs_release | Eliminate store/restore instructions in the function's prolog/epilog for infrequently used registers within the function. |
-volatile_regs | Eliminate store/restore instructions in the function's prolog/epilog by using unused registers within the function. |
-propagate | Propagate store instructions for registers within frequently executed callee functions' prologs to the rarely executed prolog of the calling function, when possible. |
-regs_redo | Avoid store instructions for registers within callee functions' prologs that can be reconstructed by the calling function on return. |
-full_saved_regs_calls | Extend the -propagate, -regs_redo, and -killed_regs optimizations to also eliminate, from function epilogs, the restore instructions that correspond to the store instructions eliminated from the prologs. |
-tocload | Replace an indirect load instruction via the TOC with an add immediate instruction. |
-aggressive_tocload | Perform the tocload optimization and also reduce the TOC size by removing redundant TOC entries. |
-RD | Perform static data reordering in the .data and .bss sections. |
-i_resched | Perform instruction rescheduling. |
-ret_prologs | Optimize function prologs that terminate with a conditional branch instruction directly to the function's epilog. |
-trunc_tb | Truncate function names from Traceback Tables. |
-ptrgl_opt | Perform optimization of indirect call instructions via registers by replacing them with direct jumps. |
-dcbt_opt | Insert Data Cache Block Touch (dcbt) instructions into the code to prefetch data at run-time. |
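The -O2 and -O3 rows are shorthand for longer flag lists, and the synopsis and the -Rn note rule out some combinations. The Python sketch below expands the shorthands using the lists in the table and reports those exclusions. The helpers `expand_flags` and `conflicts` are hypothetical and are not an fdpr interface. The sketch treats `-tocload` together with `-aggressive_tocload` as an error because the synopsis lists them as alternatives. The real fdpr may resolve overlapping requests such as `-O3 -tocload` differently.

```python
from typing import Dict, List, Sequence

# Flag sets copied from the -O2 and -O3 rows of the flags table.
BUNDLES: Dict[str, List[str]] = {
    "-O2": ["-nop", "-opt_fdpr_glue", "-inline", "-killed_regs", "-RD",
            "-tocload", "-volatile_regs", "-regs_redo"],
    "-O3": ["-nop", "-opt_fdpr_glue", "-inline", "-i_resched", "-killed_regs",
            "-RD", "-full_saved_regs_calls", "-trunc_tb", "-aggressive_tocload",
            "-regs_release", "-ret_prologs", "-volatile_regs", "-propagate",
            "-regs_redo", "-dcbt_opt", "-ptrgl_opt"],
}
R_LEVELS = {"-R0", "-R1", "-R2", "-R3"}


def expand_flags(flags: Sequence[str]) -> List[str]:
    """Hypothetical helper: replace -O2/-O3 by their flags, keeping first-seen order, no duplicates."""
    out: List[str] = []
    for flag in flags:
        for f in BUNDLES.get(flag, [flag]):
            if f not in out:
                out.append(f)
    return out


def conflicts(flags: Sequence[str]) -> List[str]:
    """Hypothetical helper: combinations that the synopsis or the flag notes rule out."""
    expanded = set(expand_flags(flags))
    problems: List[str] = []
    if "-Rn" in expanded and expanded & R_LEVELS:
        problems.append("-Rn cannot be used with -R0, -R1, -R2, or -R3")
    if len(expanded & R_LEVELS) > 1:
        problems.append("only one of -R0, -R1, -R2, -R3 may be given")
    if {"-tocload", "-aggressive_tocload"} <= expanded:
        problems.append("-tocload and -aggressive_tocload are alternatives")
    return problems
```

For example, `conflicts(["-R3", "-O3"])` returns an empty list, while `conflicts(["-Rn", "-R1"])` reports the -Rn restriction.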
The fdpr command provides four levels of code reordering optimization. The flags -R1, -R2, and -R3 provide the most aggressive code reordering optimization along with the greatest potential speedups. However, in some cases, using these optimization levels may result in an executable which does not behave as expected. Programs which contain assembler code (in particular code which performs dynamic branch calculations) or programs derived from nonstandard compilers are prone to these types of reordering-induced anomalies.
Use of the -R0 or -R2 flags may result in a slightly reduced performance improvement as these flags preserve functionality and debug capability by maintaining the original program code. The reordered code, which represents the highly-executed code paths through the program, is appended to the end of the original program code. Therefore, these options will produce a reordered executable which is typically 20-40% larger than the original program. The -R2 flag also fixes the content of branch tables in the program, according to the new reordered code and, therefore, may result in better performance improvement.
The -R1 and -R3 flags reorder the entire executable code. The only exception is modules for which no appropriate compiler signature is found in the symbol table. For those modules, the entire original module is maintained, and only their frequently executed parts are added to the reordered program. The -R1 flag reorders the entire code while preserving procedure boundaries and, therefore, may result in less performance improvement than the -R3 flag.
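The differences between the levels, together with the "Effective only with" notes in the flags table, can be summarized as data. This Python sketch covers only what those paragraphs and notes state. The names `REORDER_MODE`, `EFFECTIVE_WITH`, and `ineffective_flags` are hypothetical. The default level, -R0, comes from the -R0, -R1, -R2, -R3 row.

```python
from typing import List, Sequence

# How each -R level treats the original .text code, per the paragraphs above.
REORDER_MODE = {
    "-R0": "append",  # original code kept; reordered hot paths appended at the end
    "-R2": "append",  # as -R0, and branch tables are also fixed for the new code
    "-R1": "whole",   # entire code reordered, procedure boundaries preserved
    "-R3": "whole",   # entire code reordered; the most aggressive level
}

# Flags that the flags table documents as effective only at certain levels.
EFFECTIVE_WITH = {
    "-pc": {"-R1", "-R3"},
    "-pp": {"-R1", "-R3"},
    "-toc": {"-R0", "-R2"},
    "-bt": {"-R0", "-R2"},
}


def ineffective_flags(flags: Sequence[str], level: str = "-R0") -> List[str]:
    """Hypothetical helper: flags that the table says have no effect at `level`."""
    if level not in REORDER_MODE:
        raise ValueError(f"unknown optimization level: {level}")
    return [f for f in flags if f in EFFECTIVE_WITH and level not in EFFECTIVE_WITH[f]]
```

For instance, `ineffective_flags(["-pc", "-toc"], "-R3")` flags `-toc`, because TOC pointer modifications apply only with -R0/-R2.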
Executables built with the IBM XL compiler flag -qfdpr contain information that helps fdpr produce reordered programs. Modules that are not compiled with the -qfdpr option are reordered based on the compiler signatures in the symbol table.
Additional performance enhancements may be realized by using static linking when building the program to be reordered. Since the fdpr program only reorders the instructions within the executable program specified, any dynamically linked shared library routines called by the program are not optimized. Statically linking these library routines to the executable allows for optimizing both the instructions in the program and all library routines used by the program. There are other advantages as well as disadvantages to building a statically linked program. See the AIX 5L Version 5.2 Performance Management Guide for further information.
All files created by the fdpr command are stored in the current directory with the exception of any files which may be created by running the workload command specified in the -x flag. During the optimization process, the original program is saved by renaming the program, and is only restored to the original program name upon successful completion of the final phase.
The profile file created by the fdpr command explicitly uses the name of the current directory since scripts used to run the program may change the working directory before executing the program.
The files created and/or used by the fdpr command are:
In order to enable a certain degree of debugging capability for optimized programs, fdpr updates the Symbol Table to reflect the changes that were made in the .text section.
Entry fields in the Symbol Table that specify the addresses of symbols relocated by fdpr during reordering are modified to point to the symbols' new addresses in the .text section.
In addition, in the case where functions or files are split during reordering, fdpr creates new entries in the Symbol Table for each new part of the split function/file. These new parts of the same function are given new symbol names in the Symbol Table according to the following naming convention:
<originalfunctionname>_<fdpr | nonfdpr>_<functionpartnumber>
For optimization flags -R0/-R2, which append all new reordered code to the end of the .text section, the suffix string _fdpr_ indicates that the new entry refers to an address in the appended text area, whereas the _nonfdpr_ string indicates an address in the original text area. For optimization flags -R1/-R3, all the new entries are suffixed with the _fdpr_ string.
For example, if a function is split into three parts, it has three entries in the Symbol Table, one for each part:
[Index]  m  Value       Scn  Aux  Sclass  Type    Name
Original Entry:
[456]    m  0x00000230  2    1    0x02    0x0000  .main
Restructured Entries:
[456]    m  0x00000304  2    1    0x02    0x0000  .main
[1447]   m  0x00003328  2    1    0x02    0x0000  .main_fdpr_1
[1453]   m  0x000033b4  2    1    0x02    0x0000  .main_fdpr_2
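Split parts can be recognized from their names alone. The Python sketch below assumes exactly the naming convention given above: a function name followed by _fdpr_ or _nonfdpr_ and a part number. Any leading "." is kept as part of the name, as in .main_fdpr_1. `SplitSymbol`, `parse_split_symbol`, and `text_area` are hypothetical helpers, not an fdpr or dbx interface. `text_area` raises an error for a _nonfdpr_ entry at -R1/-R3, because at those levels all new entries use _fdpr_. An unsplit function whose own name happens to end in a pattern such as _fdpr_2 would be misclassified.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical parser for <originalfunctionname>_<fdpr | nonfdpr>_<functionpartnumber>.
# A leading "." (as in ".main" in the example table) stays in the name.
_SPLIT_RE = re.compile(r"^(?P<name>.+)_(?P<kind>fdpr|nonfdpr)_(?P<part>\d+)$")


class SplitSymbol(NamedTuple):
    name: str  # original function name, e.g. ".main"
    kind: str  # "fdpr" or "nonfdpr"
    part: int  # number of this part of the split function


def parse_split_symbol(symbol: str) -> Optional[SplitSymbol]:
    """Return the components of a split-part symbol, or None for any other symbol."""
    m = _SPLIT_RE.match(symbol)
    if m is None:
        return None
    return SplitSymbol(m.group("name"), m.group("kind"), int(m.group("part")))


def text_area(sym: SplitSymbol, level: str) -> str:
    """Where a split part lives, following the suffix rules for each -R level."""
    if level in ("-R0", "-R2"):
        # Reordered code is appended after the original .text code.
        return "appended" if sym.kind == "fdpr" else "original"
    if level in ("-R1", "-R3"):
        # All of .text is reordered, and every new entry carries the _fdpr_ suffix.
        if sym.kind != "fdpr":
            raise ValueError(f"unexpected _nonfdpr_ entry at {level}: {sym}")
        return "reordered"
    raise ValueError(f"unknown optimization level: {level}")


if __name__ == "__main__":
    for s in (".main", ".main_fdpr_1", ".main_fdpr_2"):
        print(s, parse_split_symbol(s))
```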
The use of the optimization flags -R0 and -R2 results in an executable that includes additional information for use by the dbx debug program. This information allows dbx to provide limited debug support by mapping reordered instruction addresses to their original locations and by maintaining traceback entries in the original text section. The dbx command maps most reordered instruction addresses to the corresponding addresses in the original executable as follows:
0xRRRRRRRR = fdpr[0xYYYYYYYY]
where 0xRRRRRRRR indicates the reordered address and fdpr[0xYYYYYYYY] indicates the original address. Also, dbx uses the traceback entries in the original instruction area to find associated procedure names and during stack traceback. See the "Examples" section for further details.
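When dbx output is saved to a file, the address mapping can be extracted with a regular expression. This Python sketch assumes the annotation appears exactly in the `0xRRRRRRRR = fdpr[0xYYYYYYYY]` form shown above, with hexadecimal addresses. `parse_fdpr_mapping` is a hypothetical helper. Lines without the annotation yield `None`.

```python
import re
from typing import Optional, Tuple

# Matches dbx's "0xRRRRRRRR = fdpr[0xYYYYYYYY]" annotation anywhere in a line.
_MAP_RE = re.compile(r"(0x[0-9a-fA-F]+)\s*=\s*fdpr\[(0x[0-9a-fA-F]+)\]")


def parse_fdpr_mapping(line: str) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: (reordered_address, original_address) from a dbx line, or None."""
    m = _MAP_RE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


if __name__ == "__main__":
    line = "Segmentation fault in proc_d at 0x10000634 = fdpr[0x10000290]"
    reordered, original = parse_fdpr_mapping(line)
    print(hex(reordered), hex(original))
```

The original address is the one to use when running dbx against the original program.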
The following are typical usage examples of the fdpr command.
The test2 script file:
# code to exercise test1
test1 -expand 100 -root $PATH file.jpg -quit
# the end of test2
Run the fdpr command (using the default optimization):
fdpr -p test1 -x test2
This results in the new reordered executable test1.fdpr.
To execute phase one only, saving temporary files:
fdpr -s -1 -p test1
This command string renames the original program to __test1.save and creates an instrumented version with the name test1.
To execute phase two and save temporary files:
fdpr -s -2 -p test1 -x test2
This command string executes the script file test2 that runs the instrumented version of test1 to collect the profile data.
To execute phase three, saving temporary files:
fdpr -s -3 -p test1
Again, this results in the new reordered executable test1.fdpr.
To execute phases one and two in a single run, saving temporary files:
fdpr -s -12 -p test1 -x test2
To execute phase three, saving temporary files and using optimization level three:
fdpr -s -3 -R3 -p test1
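The separate-phase examples can also be driven from a script. This Python sketch only reproduces the command lines shown above. `phase_commands` and `run_phases` are hypothetical helpers. The sketch assumes that fdpr and the workload script are found through PATH, and it places -x last, as the synopsis and examples do.

```python
import subprocess
from typing import List, Optional, Sequence


def phase_commands(program: str, workload: str,
                   level: Optional[str] = None) -> List[List[str]]:
    """Hypothetical helper: argv lists for the three phases run separately with -s."""
    opt = [level] if level else []  # e.g. "-R3", applied to the optimization phase only
    return [
        ["fdpr", "-s", "-1", "-p", program],                  # phase one: instrument
        ["fdpr", "-s", "-2", "-p", program, "-x", workload],  # phase two: profile (-x last)
        ["fdpr", "-s", "-3", *opt, "-p", program],            # phase three: optimize
    ]


def run_phases(commands: Sequence[Sequence[str]]) -> None:
    """Run the phases in order; check=True raises at the first non-zero exit status."""
    for argv in commands:
        subprocess.run(list(argv), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, as a dry run.
    for argv in phase_commands("test1", "test2", level="-R3"):
        print(" ".join(argv))
```

Because `run_phases` stops at the first failing phase, a later phase never runs against a missing instrumented program or profile.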
To debug a reordered executable produced with the -R0 or -R2 flag, run dbx on it:
dbx program.fdpr
which produces output similar to the following:
Type 'help' for help.
reading symbolic information ...warning: no source compiled with -g
[using memory image in core]
Segmentation fault in proc_d at 0x10000634 = fdpr[0x10000290]
0x10000634 (???) 98640000        stb   r3,0x0(r4)
(dbx)
The address mapping information 0x10000634 = fdpr[0x10000290] indicates that the instruction at address 0x10000634 is in the reordered text section and originally resided (in the original program) at address 0x10000290 in proc_d. Running dbx on the original program and using the mapped addresses (0x10000290 in the above example) may provide additional information to aid in debugging.
A stack traceback, which is used to determine how the program arrived at the current location, is produced as follows:
(dbx) where
which produces the following output:
proc_d(0x0) at 0x10000634
proc_c(0x0) at 0x10000604
proc_b(0x0) at 0x100005d0
proc_a(0x0) at 0x1000059c
main(0x2, 0x2ff7fba4) at 0x1000055c
(dbx)
To execute a single machine instruction:
(dbx) stepi
which produces the following output:
stopped in proc_d at 0x1000061c = fdpr[0x10000278]
0x1000061c (???) 9421ffc0        stwu  r1,-64(r1)
(dbx)
In this example, dbx indicates that the program stopped in routine proc_d at address 0x1000061c in the reordered text section (originally located at address 0x10000278).
Software Product/Option: AIX Performance Aide/ Local Performance Analysis & Control Commands.
The dbx command.
Restructuring Executable Programs with the fdpr Program in AIX 5L Version 5.2 Performance Management Guide.