The DIGITAL Fortran 90 compiler performs code optimizations (-O4) by default, unless you specify -g (or -g2).
Debugging optimized code is recommended only under special circumstances; for example, if a problem disappears when you specify the -O0 option.
One aid to debugging optimized code is to use one of the following command-line options:
By referring to a listing of the generated code, you can see how the compiler optimizations affected your code. This lets you determine the debugging commands you need in order to isolate the problem.
For a discussion of compiler optimizations, see Section 5.8 and Section 5.7.
When you try to perform a debugger operation on a variable or language construct that has been optimized, the variable or line may not exist in the debugging environment. For example:
For more information on optimizations, see Section 5.7 and Section 5.8.
This chapter discusses the following topics related to improving run-time performance of DIGITAL Fortran 90 programs:
This chapter does not address the performance and profiling of programs
that execute in parallel using the DIGITAL Parallel Software Environment. For
information about performance and profiling of parallel HPF programs,
see the DIGITAL High Performance Fortran 90 HPF and PSE Manual.
5.1 Software Environment and Efficient Compilation
Before you attempt to analyze and improve program performance, you should:
To ensure that your software development environment can significantly improve the run-time performance of your applications, obtain and install the following optional software products:
% f90 for_cal.f90 -ldxml
% kf90 -lc=blas for_cal.f90 -ldxml
For More Information:
About system-wide tuning and suggestions for other performance
enhancements on DIGITAL UNIX systems, see the manual DIGITAL UNIX System Tuning and Performance.
5.1.2 Compile Using Multiple Source Files and Appropriate f90 Options
During the earlier stages of program development, you can use incremental compilation with minimal optimization. For example:
% f90 -c -O1 sub2.f90
% f90 -c -O1 sub3.f90
% f90 -o main.out -g -O0 main.f90 sub2.o sub3.o
During the later stages of program development, you should specify multiple source files together and use an optimization level of at least -O4 on the f90 command line to allow more interprocedure optimizations to occur. For instance, the following command compiles all three source files together using the default level of optimization (-O4):
% f90 -o main.out main.f90 sub2.f90 sub3.f90
Compiling multiple source files lets the compiler examine more code for possible optimizations, which results in:
For very large programs, compiling all source files together may not be practical. In such instances, consider compiling source files containing related routines together using multiple f90 commands, rather than compiling source files individually.
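As a rough sketch of this grouping approach, the shell function below compiles two groups of related routines into one object file each (using -c together with -o output, as described for the -c option in Table 5-2) and then links them with the main program. The function name, the source file names, and the object file names are hypothetical and serve only to illustrate the pattern.

```shell
# Sketch only: all file and object names below are hypothetical.
build_in_groups() {
    # Group 1: related solver routines compiled together into one object file
    f90 -c -o solver.o solve_a.f90 solve_b.f90 || return 1
    # Group 2: related I/O routines compiled together into one object file
    f90 -c -o io.o read_in.f90 write_out.f90 || return 1
    # Final step: compile the main program and link the grouped objects
    f90 -o main.out main.f90 solver.o io.o
}
```

Calling build_in_groups runs the three f90 commands in order and stops at the first one that fails. Because no optimization option is given, each command uses the default level (-O4).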
Table 5-1 shows f90 options that can improve performance. Most of these options do not affect the accuracy of the results; a few improve run-time performance but can change some numeric results.
DIGITAL Fortran 90 performs certain optimizations unless you specify the appropriate f90 command options. Additional optimizations can be enabled or disabled using f90 command options.
Table 5-1 lists the f90 options that can directly improve run-time performance.
Option Names | Description | For More Information |
---|---|---|
-align keyword | Controls whether padding bytes are added between data items within common blocks, derived-type data, and DIGITAL Fortran record structures to make the data items naturally aligned. | Section 5.3 |
-architecture keyword | Determines the type of Alpha architecture code instructions to be generated for the program unit being compiled. All Alpha processors implement a core set of instructions; certain processor versions include additional instruction extensions. | Section 3.4 |
-cord and -feedback file | Uses a feedback file created during a previous compilation by specifying the -gen_feedback option. These options use the feedback file to improve run-time performance, optionally using cord to rearrange procedures. | Section 5.2.3 |
-fast | Sets the following performance-related options: | See description of each option |
-fp_reorder | Allows the compiler to reorder code based on algebraic identities to improve performance, enabling certain optimizations. The numeric results can be slightly different from the default ( -no_fp_reorder ) because of the way intermediate results are rounded. This slight difference in numeric results is acceptable to most programs. | Section 5.8.9 |
-gen_feedback | Requests generated code that allows accurate feedback information for subsequent use of the -feedback file option (optionally with cord). Using -gen_feedback changes the default optimization level from -O4 to -O0. | Section 5.2.3 |
-inline all | Inlines every call that can possibly be inlined while generating correct code. Certain recursive routines are not inlined to prevent infinite loops. | Section 5.8.5 |
-inline speed | Inlines procedures that will improve run-time performance with a likely significant increase in program size. | Section 5.8.5 |
-inline size | Inlines procedures that will improve run-time performance without a significant increase in program size. This type of inlining occurs at optimization levels -O4 and -O5. | Section 5.8.5 |
-math_library fast | Requests the use of certain math library routines (used by intrinsic functions) that provide faster speed. Using this option causes a slight loss of accuracy and provides less reliable arithmetic exception checking to get significant performance improvements in those functions. | Section 3.50 |
-mp | Enables parallel processing using directed decomposition (directives inserted in source code). This can improve the performance of certain programs running on shared memory multiprocessor systems. | Section 3.52 |
-O num ( -O0 to -O5 ) | Controls the optimization level and thus the types of optimization performed. The default optimization level is -O4, unless you specify -g2, -g, or -gen_feedback, which changes the default to -O0 (no optimizations). Use -O5 to activate loop transformation optimizations and the software pipelining optimization. | Section 5.7 |
-om | Used with the -non_shared option to request certain code optimizations after linking, including nop (No Operation) removal, .lita removal, and reallocation of common symbols. This option also positions the global pointer register so the maximum addresses fall in the global-pointer window. | Section 3.59 |
-omp | Enables parallel processing using directed decomposition (directives inserted in source code). This can improve the performance of certain programs running on shared memory multiprocessor systems. | Section 3.60 |
-p , -p1 | Requests profiling information, which you can use to identify those parts of your program where improving source code efficiency would most likely improve run-time performance. After you modify the appropriate source code, recompile the program and test the run-time performance. | Section 5.2.2 |
-pg | Requests profiling information for the gprof tool, which you can use to identify those parts of your program where improving source code efficiency would most likely improve run-time performance. After you modify the appropriate source code, recompile the program and test the run-time performance. | Section 5.2.2 |
-pipeline | Activates the software pipelining optimization (a subset of -O5). | Section 3.62 |
-speculate keyword | Enables the speculative execution optimization, a form of instruction scheduling for conditional expressions. | Section 3.70 |
-transform_loops | Activates a group of loop transformation optimizations (a subset of -O5). | Section 3.75 |
-tune keyword | Specifies the target processor generation (chip) architecture on which the program will be run, allowing the optimizer to make decisions about instruction tuning optimizations needed to create the most efficient code. Keywords allow specifying one particular Alpha processor generation type, multiple processor generation types, or the processor generation type currently in use during compilation. Regardless of the setting of -tune keyword , the generated code will run correctly on all implementations of the Alpha architecture. | Section 5.8.6 |
-unroll num | Specifies the number of times a loop is unrolled (num) when specified with optimization level -O3 or higher. If you omit -unroll num, the optimizer determines how many times loops are unrolled. | Section 5.7.4.1 |
-wsf num and related options | Specifies that the code generated for this program will allow parallel execution on multiple processors using the DIGITAL Parallel Software Environment. | Section 3.87 and the DIGITAL High Performance Fortran 90 HPF and PSE Manual |
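Table 5-1 distinguishes options that leave numeric results unchanged from options that can alter them slightly. One way to act on that distinction, sketched below, is to keep two builds of the same sources: one that uses only -O5, and one that adds -fp_reorder and -math_library fast, which the table describes as possibly changing results. The function names, source file names, and output names are hypothetical.

```shell
# Sketch only: function, source, and output names are hypothetical.
build_o5() {
    # -O5 adds loop transformations and software pipelining to the -O4 default
    f90 -O5 -o main_o5.out main.f90 sub2.f90 sub3.f90
}

build_o5_fast_math() {
    # -fp_reorder and -math_library fast trade a slight change in
    # numeric results for additional run-time performance
    f90 -O5 -fp_reorder -math_library fast -o main_fast.out main.f90 sub2.f90 sub3.f90
}
```

Running both executables on representative input and comparing their output is one way to judge whether the numeric differences are acceptable for a given application.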
Table 5-2 lists options that can slow program performance. Some applications that require floating-point exception handling or rounding might need to use the -fpe n and -fprm dynamic options. Other applications might need to use the -assume dummy_aliases or -vms options for compatibility reasons. Other options listed in Table 5-2 are primarily for troubleshooting or debugging purposes.
Option Names | Description | For More Information |
---|---|---|
-assume dummy_aliases | Forces the compiler to assume that dummy (formal) arguments to procedures share memory locations with other dummy arguments or with variables shared through use association, host association, or common block use. These program semantics slow performance, so you should specify -assume dummy_aliases only for the called subprograms that depend on such aliases. The use of dummy aliases violates the FORTRAN-77 and Fortran 90 standards but occurs in some older programs. | Section 5.8.10 |
-c | If you use -c when compiling multiple source files, also specify -o output to compile many source files together into one object file. Separate compilations prevent certain interprocedure optimizations, such as when using multiple f90 commands or using -c without the -o output option. | Section 2.1.7 |
-check bounds | Generates extra code for array bounds checking at run time. | Section 3.16 |
-check omp_bindings | Provides run-time checking to enforce the binding rules for OpenMP Fortran API (parallel processing) compiler directives inserted in source code. | Section 3.20 |
-check overflow | Generates extra code to check integer calculations for arithmetic overflow at run time. Once the program is debugged, omit this option to reduce executable program size and slightly improve run-time performance. | Section 3.21 |
-fpe n values greater than -fpe0 | Using -fpe1 , -fpe2 , -fpe3 , or -fpe4 (or using the for_set_fpe routine to set equivalent exception handling) slows program execution. For programs that specify -fpe3 or -fpe4 , the impact on run-time performance can be significant. | Section 3.35 |
-fprm dynamic | Certain rounding modes and changing the rounding mode can slow program execution slightly. | Section 3.36 |
-g , -g2 , -g3 | Generates extra symbol table information in the object file. Specifying -g or -g2 also reduces the default level of optimization to -O0. | Section 3.38 |
-inline none , -inline manual | Prevents the inlining of all procedures (except statement functions). | Section 5.8.5 |
-O0 , -O1 , -O2 , or -O3 | Minimizes the optimization level (and types of optimizations). Use during the early stages of program development or when you will use the debugger. | Section 3.58 and Section 5.7 |
-synchronous_exceptions | Generates extra code to associate an arithmetic exception with the instruction that causes it, slowing efficient instruction execution. Use this option only when troubleshooting, such as when identifying the source of an exception. | Section 3.72 |
-vms | Controls certain VMS-related run-time defaults, including alignment. If you specify the -vms option, you may need to also specify the -align records option to obtain optimal run-time performance. | Section 3.82 |
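The advice in Table 5-2 to specify -assume dummy_aliases only for the called subprograms that depend on such aliases could be followed roughly as in the sketch below. It assumes a hypothetical older routine in legacy_sub.f90 that relies on aliased dummy arguments; that file is compiled separately with the option, and the remaining sources are compiled together without it. All function and file names are illustrative.

```shell
# Sketch only: legacy_sub.f90, main.f90, and sub2.f90 are hypothetical files.
build_with_alias_file() {
    # Only the routine that depends on aliased dummy arguments
    # is compiled with -assume dummy_aliases
    f90 -c -assume dummy_aliases legacy_sub.f90 || return 1
    # The remaining sources are compiled together at the default -O4
    # and linked with the separately compiled object file
    f90 -o main.out main.f90 sub2.f90 legacy_sub.o
}
```

The separate compilation of legacy_sub.f90 gives up some interprocedure optimization for that one file, but it keeps the slower aliasing assumptions out of the rest of the program.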