In profile-based optimization (PBO), the compiler
and linker work together to optimize an application based on profile data
obtained from running the application on a typical input data set. For
instance, if certain procedures call each other frequently, the linker
can place them close together in the a.out file, resulting
in fewer instruction cache misses, TLB misses, and memory page faults
when the program runs. Similar optimizations can be done at the basic
block level within a procedure. Profile data is also used by the
compiler for other general tasks, such as code scheduling and register
allocation.
General Information about PBO
Using PBO
Note: The compiler interface to PBO is currently supported only by the
C, C++, and FORTRAN compilers.
When to Use PBO
Profile-Based Optimization must be the last level of optimization you use when building
an application. As with other optimizations, it must be performed after
an application has been completely debugged.
Most applications benefit from PBO. The two types of applications
that benefit the most from PBO are:
Applications that exhibit poor instruction memory locality. These are usually
large applications in which the most common paths of execution are spread
across multiple compilation units. The loops in these applications typically
contain large numbers of statements, procedure calls, or both.
Applications that are branch-intensive. The operations performed in such applications
are highly dependent on the input data. User interface managers, database
managers, editors, and compilers are examples of such applications.
The best way to determine whether PBO improves an
application's performance is to try it.
Note: Under some conditions, PBO is incompatible with programs that explicitly
load shared libraries. Specifically, PBO does not function properly if
the shl_load routine has either the BIND_FIRST
or the BIND_NOSTART flags set. For more information about
explicit loading of shared libraries, see The
shl_load and cxxshl_load Routines .
How to Use PBO
Profile-based optimization involves these steps:
1. Instrument the application - prepare the application so that it generates profile data.
2. Profile the application - create profile data that can be used to optimize the application.
3. Optimize the application - generate optimized code based on the profile data.
A Simple Example
Suppose you want to apply PBO to an application called sample .
The application is built from a C source file sample.c . Discussed
below are the steps involved in optimizing the application.
Step 1 Instrument
First, compile the application for instrumentation and level 2 optimization:
$ cc -v -c +I -O sample.c
/opt/langtools/lbin/cpp sample.c /var/tmp/ctm123
/opt/ansic/lbin/ccom /var/tmp/ctm123 sample.o -O2 -I
$ cc -v -o sample.inst +I -O sample.o
/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main \
-o sample.inst sample.o -I -lc
At this point, you have an instrumented program called sample.inst .
Step 2 Profile
Assume you have two representative input files to use for profiling,
input.file1 and input.file2 . Now execute the
following three commands:
$ sample.inst < input.file1
$ sample.inst < input.file2
$ mv flow.data sample.data
The first invocation of sample.inst creates the flow.data
file and places an entry for that executable file in the data file. The
second invocation increments the counters for sample.inst
in the flow.data file. The third command moves the flow.data
file to a file named sample.data .
Step 3 Optimize
To perform profile-based optimizations on this application, relink
the program as follows:
$ cc -v -o sample.opt +P +pgm sample.inst \
+df sample.data sample.o
/usr/ccs/bin/ld /usr/ccs/lib/crt0.o -u main -o sample.opt \
+pgm sample.inst +df sample.data sample.o -P -lc
Note that it is not necessary to recompile the source file. The
+pgm option is used because the executable name used during
instrumentation, sample.inst , does not match the current
output file name, sample.opt . The +df option
is necessary because the profile database file for the program has been
moved from flow.data to sample.data .
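As noted above, the best way to see whether PBO helps is to measure it. A
minimal sketch, assuming you also keep a conventional -O build for comparison
(the name sample.noopt is illustrative, not part of the example above):
$ cc -o sample.noopt -O sample.c //Ordinary -O build, no PBO.
$ time sample.noopt < input.file1 //Time the conventional build.
$ time sample.opt < input.file1 //Time the PBO build on the same input.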
Instrumenting (+I/-I)
Although you can use the linker alone to perform PBO, the best optimizations
result if you use the compiler as well; this section describes this approach.
To instrument an application (with C, C++, and FORTRAN), compile
the source with the +I compiler command line option. This
causes the compiler to generate a .o file containing intermediate
code, rather than the usual object code. (Intermediate code
is a representation of your code that is lower-level than the source code,
but higher level than the object code.) A file containing such intermediate
code is referred to as an I-SOM file.
After creating an I-SOM file for each source file, the compiler
invokes the linker as follows:
In 32-bit mode, instead of using the startup
file /usr/ccs/lib/crt0.o , the compiler specifies a special
startup file named /opt/langtools/lib/icrt0.o . When building
a shared library, the compiler uses /usr/ccs/lib/scrt0.o .
In 64-bit mode, the startup file crt0.o is not changed; instead, the
linker automatically adds /usr/ccs/lib/pa20_64/fdp_init.o (or
/usr/ccs/lib/pa20_64/fdp_init_sl.o when building a shared library) to
the link when it detects the -I option.
The compiler passes the -I option
to the linker, causing it to place instrumentation code in the resulting
executable.
You can see how the compiler invokes the linker by specifying the
-v option. For example, to instrument the file sample.c ,
to name the executable sample.inst , to perform level 2 optimizations
(the compiler option -O is equivalent to +O2 ),
and to see verbose output (-v ):
$ cc -v -o sample.inst +I -O sample.c
/opt/langtools/lbin/cpp sample.c /var/tmp/ctm123
/opt/ansic/lbin/ccom /var/tmp/ctm123 sample.o -O2 -I
/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main -o \
sample.inst sample.o -I -lc
Notice in the linker command line (starting with /usr/ccs/bin/ld ),
the application is linked with /opt/langtools/lib/icrt0.o
and the -I option is given.
To save the profile data to a file other than flow.data
in the current working directory, use the FLOW_DATA environment
variable as described in Specifying a Different
flow.data with FLOW_DATA .
The Startup File icrt0.o
The icrt0.o startup file uses the atexit system call
to register the function that writes out profile data. (For 64-bit mode,
the initialization code is in /usr/ccs/lib/pa20_64/fdp_init.o .)
That function is called when the application exits.
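Conceptually, this is the same mechanism any program can use with atexit .
The following is a simplified sketch (not the actual icrt0.o source) showing
a handler registered with atexit that runs when the program calls exit :
#include <stdlib.h>
#include <stdio.h>

/* Simplified illustration of the icrt0.o mechanism. The real handler
   writes the profile counters to flow.data; this stand-in only prints
   a message when the registered function runs at exit. */
static void write_profile_data(void)
{
    fputs("profile counters would be written here\n", stderr);
}

int main(void)
{
    atexit(write_profile_data);  /* register the exit-time handler */
    return 0;                    /* returning from main calls exit() */
}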
The atexit system call allows a fixed
number of functions to be registered from a user application. Instrumented
applications (those linked with -I ) have one less atexit
call available. One or more instrumented shared libraries use a single
additional atexit call. Therefore, an instrumented application
that contains any number of instrumented shared libraries uses two of
the available atexit calls.
For details on atexit , see atexit(2).
The -I Linker Option
When invoked with the -I option, the linker instruments
all the specified object files. Note that the linker instruments regular
object files as well as I-SOM files; however, with regular object files,
only procedure call instrumentation is added. With I-SOM files, additional
instrumentation is done within procedures.
For instance, suppose you have a regular object file named foo.o
created by compiling without the +I option, and
you compile a source file bar.c with the +I
option and specify foo.o on the compile line:
$ cc -c foo.c
$ cc -v -o foobar -O +I bar.c foo.o
/opt/langtools/lbin/cpp bar.c /var/tmp/ctm456
/opt/ansic/lbin/ccom /var/tmp/ctm456 bar.o -O2 -I
/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main -o foobar \
bar.o foo.o -I -lc
In this case, the linker instruments both bar.o and
foo.o . However, since foo.o is not
an I-SOM file, only its procedure calls are instrumented; basic blocks
within procedures are not instrumented. To instrument foo.c
to the same extent, you must compile it with the +I option
- for example:
$ cc -v -c +I -O foo.c
/opt/langtools/lbin/cpp foo.c /var/tmp/ctm432
/opt/ansic/lbin/ccom /var/tmp/ctm432 foo.o -O2 -I
$ cc -v -o foobar -O +I bar.c foo.o
/opt/langtools/lbin/cpp bar.c /var/tmp/ctm456
/opt/ansic/lbin/ccom /var/tmp/ctm456 bar.o -O2 -I
/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main -o foobar \
bar.o foo.o -I -lc
A simpler approach is to compile foo.c and bar.c
with a single cc command:
$ cc -v +I -O -o foobar bar.c foo.c
/opt/langtools/lbin/cpp bar.c /var/tmp/ctm352
/opt/ansic/lbin/ccom /var/tmp/ctm352 bar.o -O2 -I
/opt/langtools/lbin/cpp foo.c /var/tmp/ctm456
/opt/ansic/lbin/ccom /var/tmp/ctm456 foo.o -O2 -I
/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main -o foobar \
bar.o foo.o -I -lc
Code Generation from I-SOMs
As discussed in Looking "inside"
a Compiler , a compiler driver invokes several phases. The last phase
before linking is code generation. When using PBO, the
compilation process stops at an intermediate code level. The PA-RISC code
generation and optimization phase is invoked by the linker. The code generator
is /opt/langtools/lbin/ucomp .
Note: Since the code generation phase is delayed until link time with
PBO, linking can take much longer than usual when using PBO. Compile times
are faster than usual, since code generation is not performed.
Building Portable Code with Linker Optimization
To build executables on a PA-RISC 2.0 system that run on both 1.1 and
2.0 systems when compiling for optimization with +O4 , +P ,
or +I , explicitly compile those components with +DAportable
or +DA1.1 . This is necessary because the linker invokes code generation
at link time for these optimizations.
+P or +I , your compiler builds an I-SOM (Intermediate
code-System Object Module) file instead of a SOM file at compile time.
(See Instrumenting (+I/-I) for more information).
At link-time, the linker invokes the code generator (ucomp) to generate
SOM files from the I-SOM files and to complete the optimization. If you
did not build the I-SOM file with +DAportable or +DA1.1 ,
ucomp generates a SOM file that contains code for the PA-RISC architecture
of the machine on which you are building.
For example, if you build an archive library on a 1.1 system with
+O4 , +P , or +I , without specifying
the architecture, the I-SOM files in the library do not contain a specific
option for 1.1 code generation. If you move the archive library to a 2.0
system and use it to build an executable, the executable is built
as a 2.0 executable because of the link-time code generation. To build a 1.1
executable, rebuild the archive library with +DAportable
or +DA1.1 .
Another approach is to combine objects that have been compiled with
+O4 , +P , or +I into a merged object
file with the linker -r option: the -r option
produces an object file (SOM) not an I-SOM file. Since code generation
occurs when the merged file is built, if this file is built on a 1.1 system,
the file is safe to ship to other systems for building 1.1 applications.
To determine if an object file is an I-SOM file, use the size(1)
command. I-SOM files have zero listed for the size of all the sections
(text, data and bss (uninitialized data)):
$ size foo.o
0 + 0 + 0 = 0
Profiling
After instrumenting a program, you can run it one or more times
to generate profile data, which is ultimately used to perform the optimizations
in the final step of PBO.
This section provides information on choosing input data, the flow.data
file, storing profile information for multiple programs, sharing the flow.data
file among multiple processes, and forking an instrumented application.
Choosing Input Data
For best results from PBO, use representative input data when running
an instrumented program. Input data that represents rare cases or error
conditions is usually not effective for profiling. Run the instrumented
program with input data that closely resembles the data in a typical user's
environment. Then, the optimizer focuses its efforts on the parts of
the program that are critical to performance in the user's environment.
You need not do a large number of profiling runs before the
optimization phase. Usually it is adequate to select a small number of
representative input data sets.
The flow.data File
When an instrumented program terminates with the exit(2)
system call, special code in the 32-bit icrt0.o startup file
or the 64-bit /usr/ccs/lib/pa20_64/fdp_init.o file writes
profile data to a file called flow.data in the current working
directory. This file contains binary data, which cannot be viewed or updated
with a text editor. The flow.data file is not updated when
a process terminates without calling exit . That happens,
for example, when a process aborts because of an unexpected signal, or when the
program calls exec(2) to replace itself with another program.
There are also certain non-terminating processes (such as servers,
daemons, and operating systems) which never call exit . For
these processes, you must programmatically write the profile data to the
flow.data file. In order to do so, a process must call a
routine called _write_counters() . This routine is defined
in the icrt0.o file. A stub routine with the same name is
present in the crt0.o file so that the source does not have
to change when instrumentation is not being done.
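For example, a long-running daemon might flush its counters at a convenient
checkpoint. A minimal sketch (the extern declaration and the void/void
signature shown here are assumptions; the routine itself is resolved from
icrt0.o , or from the stub in crt0.o , at link time):
/* Assumed declaration; _write_counters() is supplied by icrt0.o
   (a do-nothing stub exists in crt0.o for uninstrumented builds). */
extern void _write_counters(void);

void checkpoint_profile_data(void)
{
    _write_counters();  /* write the accumulated counts to flow.data */
}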
If flow.data does not exist, the program creates it.
If flow.data exists, the program updates the profile data.
As an example, suppose you have an instrumented program named prog.inst ,
and two representative input data files named input_file1
and input_file2 . Then the following lines create a
flow.data file:
$ prog.inst < input_file1
$ ls flow.data
flow.data
$ prog.inst < input_file2
The flow.data file includes profile data from both
input files.
To save the profile data to a file other than flow.data
in the current working directory, use the FLOW_DATA environment
variable as described in Specifying a Different
flow.data with FLOW_DATA .
Storing Profile Information for Multiple Programs
A single flow.data file can store information for multiple
programs. This allows an instrumented program to spawn other instrumented
programs, all of which share the same flow.data file.
To allow multiple programs to save their data in the same flow.data
file, a program's profile data is uniquely identified by the executable's
basename (see basename(1)), the executable's file size, and
the time the executable was last modified.
Instead of using the executable's basename, you can specify a basename
by setting the environment variable PBO_PGM_PATH . This is
useful when a number of programs are actually linked to the same instrumented
executable.
For example, consider profiling the ls , lsf
and lsx commands (lsx is ls with
the -x option and lsf is ls with
the -F option). Because the three commands could be linked
to the same instrumented executable, the developer may want to collect
profile data under a single basename by setting PBO_PGM_PATH=ls .
If PBO_PGM_PATH=ls is not set, the profile data is saved
under the ls , the lsf , and the lsx
basenames.
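For example, to collect the data under the single basename ls before running
the instrumented commands (Bourne and Korn shell):
$ PBO_PGM_PATH=ls
$ export PBO_PGM_PATH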
When an instrumented program begins execution, it checks whether
the basename, size, and time-stamp match those in the existing flow.data
file. If the basename matches but the size or time-stamp does not match,
that probably means that the program has been relinked since it last created
profile data. In this case, the following error message is issued:
program: Can't update counters. Profile data exists
but does not correspond to this executable. Exit.
You can fix this problem in any one of the following ways:
Remove or rename the existing flow.data
file.
Run the instrumented program in a different
working directory.
Set the FLOW_DATA environment
variable so that profile data is written to a file other than flow.data .
Rename the instrumented program.
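For instance, either of the following avoids the mismatch (the file names
are illustrative):
$ mv flow.data flow.data.old //Set aside the stale profile database.
$ FLOW_DATA=/tmp/prog.profile //Or redirect new profile data elsewhere
$ export FLOW_DATA //(Bourne and Korn shell).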
Sharing the flow.data File Among Multiple Processes
A flow.data file can potentially be accessed by several
processes at the same time. For example, this can happen when you run
more than one instrumented program at the same time in the same directory,
or when profiling one program while linking another with -P .
Such asynchronous access to the file can potentially corrupt the
data. To prevent simultaneous access to the flow.data file
in a particular directory, a lock file called flow.lock
is used. Instrumented programs that need to update the flow.data
file and linker processes that need to read it must first obtain access
to the lock file. Only one process can hold the lock at any time. As long
as the flow.data file is being actively read and written,
a process will wait for the lock to become available.
A program that terminates abnormally can leave the flow.data
file inactive but locked. A process that tries to access an inactive but
locked flow.data file gives up after a short period of time.
In such cases, you may need to remove the flow.lock file.
If an instrumented program fails to obtain the database lock, it
writes the profile data to a temporary file and displays a warning message
containing the name of the file. You could then use the +df
option along with the +P option while optimizing, to specify
the name of the temporary file instead of the flow.data file.
If the linker fails to obtain the lock, it displays an error message
and terminates. In such cases, wait until all active processes that are
reading or writing a profile database file in that directory have completed.
If no such processes exist, remove the flow.lock file.
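For example, after confirming that no instrumented program or -P link is
still running in that directory:
$ rm flow.lock //Remove the stale lock left by an abnormal termination.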
Forking an Instrumented Application
When instrumenting an application that creates a copy of itself
with the fork system call, you must ensure that the child
process calls a special function named _clear_counters() ,
which clears all internal profile data. If you don't do this, the child
process inherits the parent's profile data, updating the data as it executes,
resulting in inaccurate (exaggerated) profile data when the child terminates.
The following code segment shows a valid way to call _clear_counters :
if ((pid = fork()) == 0) /* this is the child process */
{
_clear_counters(); /* reset profile data for child */
. . . /* other code for the child */
}
The function _clear_counters is defined in icrt0.o .
It is also defined as a stub (an empty function that does nothing) in
crt0.o . This allows you to use the same source code without
modification in the instrumented and un-instrumented versions of the program.
Optimizing Based on Profile Data (+P/-P)
The final step in PBO is optimizing a program using profile data
created in the profiling phase. To do this, rebuild the program with the
+P compiler option. As with the +I option, the
+P option causes the compiler to generate an I-SOM .o
file, rather than the usual object code, for each source file.
Note that it is not really necessary to recompile the source files;
you could, instead, specify the I-SOM .o files that were
created during the instrumentation phase. For instance, suppose you have
already created an I-SOM file named foo.o from foo.c
using the +I compiler option; then the following commands
are equivalent in effect:
$ cc +P foo.c
$ cc +P foo.o
Both commands invoke the linker, but the second command doesn't
compile before invoking the linker.
The -P Linker Option
After creating an I-SOM file for each source file, the compiler
driver invokes the linker with the -P option, causing the
linker to optimize all the .o files. As with the +I
option, the driver uses /opt/langtools/lbin/ucomp to generate
code and perform various optimizations.
To see how the compiler invokes the linker, specify the -v
option when compiling. For instance, suppose you have instrumented prog.c
and gathered profile data into flow.data . The following example
shows how the compiler driver invokes the linker when +P
is specified:
$ cc -o prog -v +P prog.o
/usr/ccs/bin/ld /usr/ccs/lib/crt0.o -u main -o prog \
prog.o -P -lc
Notice how the program is now linked with /usr/ccs/lib/crt0.o
instead of /opt/langtools/lib/icrt0.o because the profiling
code is no longer needed.
Using The flow.data File
By default, the code generator and linker look for the flow.data
file in the current working directory. In other words, the flow.data
file created during the profiling phase should be located in the directory
where you relink the program.
Specifying a Different flow.data File with +df
What if you want to use a flow.data file from a different
directory than where you are linking? Or what if you have renamed the
flow.data file - for example, if you have multiple flow.data
files created for different input sets? The +df option allows
you to override the default +P behavior of using the file
flow.data in the current directory. The compiler passes this
option directly to the linker.
For example, suppose after collecting profile data, you decide to
rename flow.data to prog.prf . You could then
use the +df option as follows:
$ cc -v -o prog +P +df prog.prf prog.o
/usr/ccs/bin/ld /usr/ccs/lib/crt0.o -u main -o prog \
+df prog.prf prog.o -P -lc
The +df option overrides the effects of the FLOW_DATA
environment variable.
Specifying a Different flow.data with FLOW_DATA
The FLOW_DATA environment variable provides another
way to override the default flow.data file name and location.
If set, this variable defines an alternate file name for the profile data
file.
For example, to use the file /home/adam/projX/prog.data
instead of flow.data , set FLOW_DATA :
$ FLOW_DATA=/home/adam/projX/prog.data
$ export FLOW_DATA //Bourne and Korn shell
$ setenv FLOW_DATA /home/adam/projX/prog.data //C shell
Interaction between FLOW_DATA and +df
If an application is linked with +df and -P ,
the FLOW_DATA environment variable is ignored. In other words,
+df overrides the effects of FLOW_DATA .
Specifying a Different Program Name (+pgm)
When retrieving a program's profile data from the flow.data
file, the linker uses the program's basename as a lookup key. For instance,
if a program were compiled as follows, the linker would look for the profile
data under the name foobar :
$ cc -v -o foobar +P foo.o bar.o
/usr/ccs/bin/ld /usr/ccs/lib/crt0.o -u main -o foobar \
foo.o bar.o -P -lc
This works fine as long as the name of the program is the same during
the instrumentation and optimization phases. But what if the name of the
instrumented program is not the same as the name of the final optimized program? What does the linker do?
Let us say, for example, you want the name of the instrumented application
to be different from the optimized application. So, you use the following
compiler commands:
$ cc -O +I -o prog.inst prog.c //Instrument prog.inst.
$ prog.inst < input_file1 //Profile it, storing the data
$ prog.inst < input_file2 //under the name prog.inst.
$ cc +P -o prog.opt prog.c //Optimize it, but name it prog.opt.
The linker is unable to find the program name prog.opt
in the flow.data file and issues the error message:
No profile data found for the program prog.opt in flow.data
To get around this problem, the compilers and linker provide the
+pgm name option, which allows you to specify
a program name to look for in the flow.data file. For instance,
to make the above example work properly, you would include +pgm
prog.inst on the final compile line:
$ cc +P -o prog.opt +pgm prog.inst prog.c
Like the +df option, the +pgm option is
passed directly to the linker.
Selecting an Optimization Level with PBO
When -P is specified, the code generator and linker
perform profile-based optimizations on any I-SOM or regular object files
found on the linker command line. In addition, optimizations will be performed
according to the optimization level you specified with a compiler option
when you instrumented the application. Briefly, the compiler optimization
options are:
+O0
-
Minimal optimization. This is the default.
+O1
-
Basic block level optimization.
+O2
-
Full optimization within each procedure in a file. (Can also
be invoked as -O .)
+O3
-
Full optimization across all procedures in an object file.
Includes subprogram inlining.
+O4
-
Full optimization across entire application, performed at link
time. (Invokes ld +Ofastaccess +Oprocelim .) Includes inlining
across multiple files.
Note: The +O3 and +O4 options are incompatible
with symbolic debugging. The only compiler optimization levels that allow
for symbolic debugging are +O2 and lower.
For more detailed information on compiler optimization levels, see
your compiler documentation.
PBO has the greatest impact when it is combined with level 2 or
greater optimizations. For instance, this compile command combines level
2 optimization with PBO (note that the compiler options +O2
and -O are equivalent):
$ cc -v -O +I -c prog.c
/opt/langtools/lbin/cpp prog.c /var/tmp/ctm123
/opt/ansic/lbin/ccom /var/tmp/ctm123 prog.o -O2 -I
$ cc -v -O +I -o prog prog.o
/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main -o prog \
prog.o -I -lc
The optimizations are performed along with instrumentation. However,
profile-based optimizations are not performed until you compile later
with +P :
$ cc -v +P -o prog prog.o
/usr/ccs/bin/ld /usr/ccs/lib/crt0.o -u main \
-o prog prog.o -P -lc
Using PBO to Optimize Shared Libraries
Beginning with the HP-UX 10.0 release, the -I linker
option can be used with -b to build a shared library with
instrumented code. Also, the -P , +df , and +pgm
command-line options are compatible with the -b option.
To profile shared libraries, you must set the environment variable
SHLIB_FLOW_DATA to the file that receives profile data. Unlike
FLOW_DATA , SHLIB_FLOW_DATA has no default output
file. If SHLIB_FLOW_DATA is not set, profile data is not
collected. This allows you to activate or suspend the profiling of instrumented
shared libraries.
Note that you can set SHLIB_FLOW_DATA to flow.data
which is the same file as the default setting for FLOW_DATA .
But, again, profile data can be collected from shared libraries only if you explicitly
set SHLIB_FLOW_DATA to some output file.
The following is an example for instrumenting, profiling,
and optimizing a shared library:
$ cc +z +I -c -O libcode.c //Create I-SOM files.
$ ld -b -I libcode.o -o mylib.inst.sl //Create instrumented sl.
$ cc main.c mylib.inst.sl //Create executable a.out file.
$ export SHLIB_FLOW_DATA=./flow.data //Specify output file for profile data
$ a.out < input_file //Run instrumented executable with representative input data.
$ ld -b -P +pgm mylib.inst.sl \
     libcode.o -o mylib.sl //Perform PBO.
Note that the name used in the database is the output pathname
specified when the instrumented library is linked (mylib.inst.sl
in the example above), regardless of how the library might be moved or
renamed after it is created.
Using PBO with ld -r
Beginning with the HP-UX 10.0 release, you can take greater advantage
of PBO on merged object files created with the -r linker
option.
Briefly, ld -r combines multiple .o files
into a single .o file. It is often used in large product
builds to combine objects into more manageable units. It is also often
used in combination with the linker -h option to hide symbols
that may conflict with other subsystems in a large application. (See Hiding Symbols with -h for more information
on ld -h .)
In HP-UX 10.0, the subspaces in the merged .o file
produced by ld -r are relocatable, which allows for greater
optimization.
The following is an example of using PBO with ld -r :
$ cc +I -c file1.c file2.c //Create individual I-SOM files
$ ld -r -I -o reloc.o file1.o file2.o //Build relocatable, merged file
$ cc +I -o a.out reloc.o //Create instrumented executable file.
$ a.out < input_file //Run instrumented executable with representative input data.
$ ld -r -P +pgm a.out -o reloc.o \
     file1.o file2.o //Rebuild relocatable file for PBO.
$ cc +P -o a.out reloc.o //Perform PBO on the final executable file.
Notice, in the example above, that the +pgm option was
necessary because the output file name differs from the instrumented program
file name.
Note: If you are using -r and C++ templates, check "Known
Limitations" in the HP C++ Release Notes for possible
limitations.
Restrictions and Limitations of PBO
This section describes restrictions and limitations you must be
aware of when using Profile-Based Optimization: temporary files, source code
changes, interaction with high-level optimization (HLO), I-SOM file restrictions,
and compatibility with 9.0 PBO.
Note: PBO calls malloc() during the instrumentation (+I )
phase. If you replace libc malloc(3C) calls with
your own version of malloc() , use the same parameter list
(data types, order, number, and meaning of parameters) as the HP version.
(For information on malloc() , see malloc(3C).)
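For reference, the following sketch shows the interface a replacement
allocator must preserve (only the prototypes matter here; the related
malloc(3C) routines are listed for completeness):
#include <stddef.h>

/* A replacement must keep the standard malloc(3C) parameter lists,
   because instrumented (+I) code calls these routines directly. */
void *malloc(size_t size);
void  free(void *ptr);
void *realloc(void *ptr, size_t size);
void *calloc(size_t nelem, size_t elsize);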
Temporary Files
The linker does not modify I-SOM files. Rather, it compiles, instruments,
and optimizes the code, placing the resulting temporary object file in
a directory specified by the TMPDIR environment variable.
If PBO fails due to inadequate disk space, try freeing up space on the
disk that contains the $TMPDIR directory. Or, set TMPDIR
to a directory on a disk with more free space.
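For example (the directory shown is illustrative):
$ TMPDIR=/scratch/tmp //File system with more free space.
$ export TMPDIR //(Bourne and Korn shell).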
Source Code Changes and PBO
To avoid the potential problems described later in this section, PBO must be used only
during the final stages of application development and performance
tuning, when source code changes are least likely to be made. Whenever
possible, an application should be re-profiled after source code changes
have been made.
What happens if you attempt to optimize a program using profile
data that is older than the source files? For example, this can occur
if you change source code and recompile with +P , but don't
gather new profile data by re-instrumenting the code.
In that sequence of events, optimizations are still performed.
However, full profile-based optimizations will be performed only on those
procedures whose internal structure has not changed since the profile
data was gathered. For procedures whose structure has changed,
the following warning message is generated:
ucomp warning: Code for name changed since profile
database file flow.data built. Profile data for name
ignored. Consider rebuilding flow.data.
Note that it is possible to make a source code change that does
not affect the control flow structure of a procedure, but which does significantly
affect the profiling data generated for the program. In other words, a
very small source code change can dramatically affect the paths through
the program that are most likely to be taken. For example, changing the
value of a program constant that is used as a parameter or loop limit
value may have this effect. If the user does not re-profile the application
after making source code changes, the profile data in the database does
not reflect the effects of those changes. Consequently, the transformations
made by the optimizer can degrade the performance of the application.
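As a hypothetical illustration (the names and values are invented), changing
only a loop-limit constant leaves the control flow structure of the procedure
unchanged, so no ucomp warning is issued, yet it can move most of the execution
time into a path that the stale profile data says is rarely taken:
#define SAMPLE_LIMIT 4   /* raising this to 4000 shifts the hot path into
                            the loop without changing the procedure's
                            control flow structure */

long sum_samples(const long *samples)
{
    long total = 0;
    int  i;

    for (i = 0; i < SAMPLE_LIMIT; i++)
        total += samples[i];
    return total;
}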
Profile-Based Optimization (PBO) and High-Level Optimization
(HLO)
High-level optimization, or HLO, consists of a number of optimizations,
including inlining, that are automatically invoked with the +O3
and +O4 compiler options. (Inlining is an optimization that
replaces each call to a routine with a copy of the routine's actual code.)
+O3 performs HLO on each module while +O4 performs
HLO over the entire program and removes unnecessary ADDIL instructions.
Since HLO distorts profile data, it is suppressed during the instrumentation
phases of PBO.
When +I is specified along with +O3 or
+O4 , an I-SOM file is generated. However, HLO is not performed
during I-SOM generation. When the I-SOM file is linked, using the +P
option to do PBO, HLO is performed, taking advantage of the profile data.
Example
The following example illustrates high-level optimization with PBO:
$ cc +I +O3 -c file.c //Create I-SOM for instrumentation.
$ cc +I +O3 file.o //Link with instrumentation.
$ a.out < input_file //Run instrumented executable with representative input data.
$ cc +P +O3 file.o //Perform PBO and HLO.
Replace +O3 with +O4 in the above example
to get HLO over the entire program and ADDIL elimination. (You may see
a warning when using +O4 at instrumentation indicating that
the +O4 option is being ignored. You can ignore this warning.)
I-SOM File Restrictions
For the most part, there are not many noticeable differences between
I-SOM files and ordinary object files. Exceptions are noted below.
ld
Linking object files compiled with the +I or +P
option takes much longer than linking ordinary object files. This is because
in addition to the work that the linker already does, the code generator
must be run on the intermediate code in the I-SOM files. On the other
hand, the time to compile a file with +I or +P
is relatively fast because code generation is delayed until link time.
All options to ld work normally with I-SOM files
with the following exceptions:
-r
-
The -r option works with both -I and
-P . However, it produces an object file, not an
I-SOM file. In 64-bit mode, use -I , -P , or the
+nosectionmerge option on a -r linker command
to allow procedures to be positioned independently. Without these options,
a -r link merges procedures into a single section.
-s
-
Do not use this option with -I . However, there
is no problem using this option with -P .
-G
-
Do not use this option with -I . However, there is no problem
using this option with -P .
-A
-
Do not use this option with -I or -P .
-N
-
Do not use this option with -I or -P .
nm
The nm command works on I-SOM files. However, because
code generation has not yet been performed, some of the imported symbols
that may appear in an ordinary relocatable object file do not appear
in an I-SOM file.
ar
I-SOM files can be manipulated with ar in exactly the
same way that ordinary relocatable files can be.
size
To determine if an object file is an I-SOM file, use the size(1)
command. I-SOM files have zero listed for the size of all the sections
(text, data, and bss (uninitialized data)):
$ size foo.o
0 + 0 + 0 = 0
strip
Do not run strip on files compiled with +I
or +P . Doing so results in an object file that is essentially
empty.
Compiler Options
Except as noted below, all cc , CC , and
f77 compiler options work as expected when specified with
+I or +P :
-g
-
This option is incompatible with +I and +P .
-G
-
This option is incompatible with +I , but compatible
with +P (as long as the insertion of the gprof
library calls does not affect the control flow graph structure of the
procedures).
-p
-
This option is incompatible with +I option, but
is compatible with +P (as long as the insertion of the prof
code does not affect the control flow graph structure of the procedures).
-s
-
You must not use this option together with +I .
Doing so results in an object file that is essentially empty.
-S
-
This option is incompatible with +I and +P
options because assembly code is not generated from the compiler in these
situations. Currently, it is not possible to get assembly code listings
of code generated by +I and +P .
-y/+y
-
The same restrictions apply to these options that were mentioned
for -g above.
+o
-
This option is incompatible with +I and +P .
Currently, you cannot get code offset listings for code generated by +I
and +P .
Compatibility with 9.0 PBO
PBO is largely compatible between the 9.0 and 10.0 releases of HP-UX.
I-SOM files created under 9.0 are completely acceptable in the 10.0
environment.
However, it is advantageous to re-profile programs under 10.0 in
order to achieve improved optimization. Although you can use profile data
in flow.data files created under 9.0, the resulting optimization
will not take advantage of 10.0 enhancements. In addition, a warning is
generated stating that the profile data is from a previous release. See
the section called Profiling in this chapter for more
information.