Certain shell commands and system tuning can improve run-time performance:
# myprog > results.lis
# more results.lis
For information on system tuning and cc options related to performance, see your operating system documentation and the appropriate reference pages.
5.2 Analyzing Program Performance
This section describes how you can:
Before you analyze program performance, make sure any errors you might have encountered during the early stages of program development have been corrected.
For information about parallel profiling techniques and the pprof profiler on Tru64 UNIX systems, see the Compaq Parallel Software Environment documentation.
5.2.1 Use the time Command to Measure Performance
Use the time command to provide information about program performance.
Run program timings when other users are not active. CPU-intensive processes that run at the same time as your timings can affect the results.
Try to run the program under the same conditions each time to provide the most accurate results, especially when comparing execution times with those of a previous version of the same program. Use the same system (CPU model, amount of memory, version of the operating system, and so on) if possible.
If you do need to change systems, you should measure the time using the same version of the program on both systems, so you know each system's effect on your timings.
For programs that run for less than a few seconds, run several timings to ensure that the results are not misleading. Overhead functions like loading shared libraries might influence short timings considerably.
Using the form of the time command that specifies the name of the executable program provides the following:
In the following example timings, the sample program being timed displays the following line:
Average of all the numbers is: 4368488960.000000
Using the Bourne shell, the following program timing reports that the program uses 1.19 seconds of total actual CPU time (0.61 seconds in actual CPU time for user program use and 0.58 seconds of actual CPU time for system use) and 2.46 seconds of elapsed time:
$ time a.out
Average of all the numbers is: 4368488960.000000
real 0m2.46s
user 0m0.61s
sys 0m0.58s
Using the C shell, the following program timing reports 1.19 seconds of total actual CPU time (0.61 seconds in actual CPU time for user program use and 0.58 seconds of actual CPU time for system use), about 4 seconds (0:04) of elapsed time, the use of 28% of available CPU time, and other information:
% time a.out
Average of all the numbers is: 4368488960.000000
0.61u 0.58s 0:04 28% 78+424k 9+5io 0pf+0w
Using the bash shell (L*X ONLY), the following program timing reports that the program uses 1.19 seconds of total actual CPU time (0.61 seconds in actual CPU time for user program use and 0.58 seconds of actual CPU time for system use) and 2.46 seconds of elapsed time:
[user@system user]$ time ./a.out
Average of all the numbers is: 4368488960.000000
elapsed 0m2.46s
user 0m0.61s
sys 0m0.58s
Timings that show a large amount of system time may indicate a lot of time spent doing I/O, which might be worth investigating.
If your program displays a lot of text, you can redirect the output from the program on the time command line (see Section 5.1.3). Redirecting output from the program will change the times reported because of reduced screen I/O.
For more information, see time(1).
In addition to the time command, you might consider modifying the program to call routines within the program to measure execution time. For example:
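A minimal sketch of this approach, assuming the standard CPU_TIME (Fortran 95) and SYSTEM_CLOCK intrinsics are available in your compiler. The program name, the loop being measured, and all variable names are hypothetical:

```fortran
program time_region
  implicit none
  integer, parameter :: n = 1000000          ! hypothetical problem size
  integer :: i, count_start, count_end, count_rate
  real :: cpu_start, cpu_end
  real(kind=8) :: total

  call system_clock(count_start, count_rate) ! clock ticks and ticks per second
  call cpu_time(cpu_start)                   ! processor time in seconds

  total = 0.0d0
  do i = 1, n                                ! the code region being measured
     total = total + sqrt(real(i, kind=8))
  end do

  call cpu_time(cpu_end)
  call system_clock(count_end)

  print *, 'Result:          ', total
  print *, 'CPU seconds:     ', cpu_end - cpu_start
  if (count_rate > 0) then                   ! a rate of 0 means no clock is available
     print *, 'Elapsed seconds: ', real(count_end - count_start) / real(count_rate)
  end if
end program time_region
```

Printing the computed result keeps the optimizer from removing the measured loop. The SYSTEM_CLOCK count can wrap around, so this sketch suits short measurements only.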
To generate profiling information, use the f90 compiler and the prof, gprof, and pixie (TU*X ONLY) tools.
(TU*X ONLY) If you have installed the Parallel Software Environment (PSE) and need to profile a parallel HPF program, you can use the pprof profiler. For information about parallel profiling techniques and pprof, see the Compaq Parallel Software Environment documentation. The remainder of this section discusses nonparallel profiling.
Profiling identifies areas of code where significant program execution time is spent. Along with the f90 command, use the prof and pixie (TU*X ONLY) tools to generate the following profile information:
Once you have determined those sections of code where most of the program execution time is spent, examine these sections for coding efficiency. Suggested guidelines for improving source code efficiency are provided in Section 5.6.
You can also use the profiler facility provided by the optional DEC FUSE product, which provides an integrated development environment and windowing interface to many Compaq Tru64 UNIX program development facilities (see the DEC FUSE Handbook).
5.2.2.1 Program Counter Sampling (prof)
To obtain program counter sampling data, perform the following steps:
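1. Compile and link the program using the -p option:
% f90 -p -O3 -o profsample profsample.f90
If you compile and link in separate steps, specify the -p option when you link:
% f90 -c -O3 profsample.f90
% f90 -p -O3 -o profsample profsample.o
2. Run the profiled program, which creates the profile data file mon.out:
% profsample
3. Run prof to display the profile data:
% prof profsample mon.out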
You can limit the report created by prof by using prof command options, such as -only, -exclude, or -quit.
For example, if you only want reports on procedures calc_max and calc_min, you could use the following command line to read the profile data file named mon.out:
% prof -only calc_max -only calc_min profsample
The time spent in particular areas of code is reported by prof in the form of a percentage of the total CPU time spent by the program. To reduce the size of the report, you can either:
When you use the -only or -exclude options, the percentages are still based on all procedures of the application. To obtain percentages calculated by prof that are based on only those procedures included in the report, use the -Only and -Exclude options (use an uppercase initial letter in the option name).
You can use the -quit option to reduce the amount of information reported. For example, the following command prints information on only the five most time-consuming procedures:
% prof -quit 5 profsample
The following command limits information only to those procedures using 10% or more of the total execution time:
% prof -quit 10% profsample
For more information on prof, see prof(1) and the Compaq Tru64 UNIX Programmer's Guide.
5.2.2.2 Call Graph Sampling (gprof)
To obtain call graph information, use the gprof tool. Perform the following steps:
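1. Compile and link the program using the -pg option:
% f90 -pg -O3 -o profsample profsample.f90
If you compile and link in separate steps, specify the -pg option on both commands:
% f90 -pg -c -O3 profsample.f90
% f90 -pg -O3 -o profsample profsample.o
2. Run the profiled program, which creates the profile data file gmon.out:
% profsample
3. Run gprof to display the call graph data:
% gprof profsample gmon.out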
The output produced by gprof includes:
For more information on using gprof and its output, see the Compaq Tru64 UNIX Programmer's Guide.
5.2.2.3 Basic Block Counting (pixie and prof)
To obtain basic block counting information, perform the following steps:
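1. Compile and link the program:
% f90 -O3 -o profsample profsample.f90
2. Run the atom command with the pixie tool to create the instrumented program profsample.pixie:
% atom -tools pixie profsample
3. Run the instrumented program:
% profsample.pixie
4. Run prof with the -pixie option to display the basic block counts:
% prof -pixie profsample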
To create multiple profile data files, run the program multiple times.
For more information on prof, gprof, and pixie (TU*X ONLY), see prof(1), gprof(1), pixie(1), and the Compaq Tru64 UNIX Programmer's Guide.
5.2.2.4 Source Line CPU Cycle Use (prof and pixie)
To estimate the number of CPU cycles used to execute each source file line, use the same files that the pixie command creates for basic block counting (see Section 5.2.2.3).
To view a report of the number of CPU cycles estimated for each source file line, use the following options with the prof command:
Depending on the level of optimization chosen, certain source lines might be optimized away.
The CPU cycle use estimates are based primarily on the instruction type and its operands and do not include memory effects such as cache misses or translation buffer fills.
For example, the following command sequence uses:
% f90 -o profsample profsample.f90
% atom -tools pixie profsample
% profsample.pixie
% prof -pixie -heavy -only calc_max profsample
You can create a feedback file by using a series of commands. Once the feedback file is created, you can specify it in a subsequent compilation with the f90 command option -feedback. You can also request that cord use the feedback file to rearrange procedures by specifying the -cord option on the f90 command line.
To create the feedback file, complete these steps:
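1. Compile and link the program using the -gen_feedback option:
% f90 -o profsample -gen_feedback profsample.f90
2. Run pixie to create the instrumented program profsample.pixie:
% pixie profsample
3. Run the instrumented program:
% profsample.pixie
4. Run prof with the -pixie and -feedback options to create the feedback file profsample.feedback:
% prof -pixie -feedback profsample.feedback profsample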
You can use the feedback file as input to the f90 compiler:
% f90 -feedback profsample.feedback -o profsample profsample.f90
The feedback file provides the compiler with actual execution information, which the compiler can use to improve such optimizations as inlining function calls.
Specify the desired optimization level (-On option) for the f90 command with the -feedback name option (in this example, the default is -O4).
You can use the feedback file as input to the f90 compiler and cord, as follows:
% f90 -cord -feedback profsample.feedback -o profsample profsample.f90
The -cord option invokes cord, which reorders the procedures in an executable program to improve program execution, using the information in the specified feedback file. Specify the desired optimization level (-On option) for the f90 command with the -feedback name option (in this example -O4).
5.2.4 Atom Toolkit
(TU*X ONLY) The Atom toolkit includes a programmable instrumentation tool and several prepackaged tools. The prepackaged tools include:
To invoke atom tools, use the following general command syntax:
% atom -tool tool-name ...
For more information, see the Compaq Tru64 UNIX Programmer's Guide, atom(1), hiprof(5), pixie(5), and third(5).
Atom does not work on programs built with the -om option.
5.3 Data Alignment Considerations
For optimal performance on Alpha systems, make sure your data is aligned naturally.
A natural boundary is a memory address that is a multiple of the data item's size (data type sizes are described in Table 9-1). For example, a REAL (KIND=8) data item aligned on natural boundaries has an address that is a multiple of 8. An array is aligned on natural boundaries if all of its elements are.
All data items whose starting address is on a natural boundary are naturally aligned. Data not aligned on a natural boundary is called unaligned data.
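The definition above can be checked with simple arithmetic. The following sketch assumes three hypothetical data items stored back to back with no padding, starting at an address that is itself a multiple of 8. The item sizes and the program name are illustrative only:

```fortran
program natural_boundary
  implicit none
  ! Hypothetical sizes in bytes of three consecutive data items:
  ! INTEGER (KIND=2), REAL (KIND=8), INTEGER (KIND=4).
  integer, parameter :: nitems = 3
  integer, parameter :: sizes(nitems) = (/ 2, 8, 4 /)
  integer :: offset, i

  offset = 0                       ! byte offset from the aligned starting address
  do i = 1, nitems
     ! An item is naturally aligned when its offset is a multiple of its size.
     if (mod(offset, sizes(i)) == 0) then
        print '(a,i2,a,i3,a)', 'size ', sizes(i), ' at offset ', offset, ': aligned'
     else
        print '(a,i2,a,i3,a)', 'size ', sizes(i), ' at offset ', offset, ': unaligned'
     end if
     offset = offset + sizes(i)    ! the next item starts where this one ends
  end do
end program natural_boundary
```

With these sizes, only the first item is reported as aligned: the 8-byte item starts at offset 2 and the 4-byte item starts at offset 10.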
Although the Compaq Fortran compiler naturally aligns individual data items when it can, certain Compaq Fortran statements (such as EQUIVALENCE) can cause data items to become unaligned (see Section 5.3.1).
Although you can use the f90 command -align keyword options to ensure naturally aligned data, you should check and consider reordering the declarations of data items within common blocks and structures. Within each common block, derived type, or record structure, carefully specify the order and sizes of data declarations to ensure naturally aligned data. Place the largest numeric items first, followed by smaller numeric items, and then nonnumeric (character) data.
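As an illustration of this ordering guideline, the following sketch declares a common block with its items arranged from largest to smallest. The block name, variable names, and values are hypothetical, and the offsets in the comments assume that the block starts on an 8-byte boundary and that the compiler inserts no padding:

```fortran
program common_order
  implicit none
  ! Hypothetical data items, ordered from the largest numeric size
  ! to the smallest, with character data last.
  real(kind=8)     :: total      ! 8 bytes, starts at offset 0
  integer(kind=4)  :: nvals      ! 4 bytes, starts at offset 8
  integer(kind=2)  :: flag       ! 2 bytes, starts at offset 12
  character(len=5) :: label      ! character data, starts at offset 14
  common /stats/ total, nvals, flag, label

  total = 2.5d0
  nvals = 10
  flag  = 1
  label = 'done'
  print *, total, nvals, flag, ' ', label
end program common_order
```

Under the same assumptions, listing flag before total in the COMMON statement would place total at offset 2, which is not a multiple of 8.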