Compaq Fortran
User Manual for
Tru64 UNIX and Linux Alpha Systems

3.58 -nofor_main --- Allow Non-Fortran Main Program

Specify the -nofor_main option when the main program is not written in Fortran. For example, if the main program is written in C and calls a Compaq Fortran subprogram, specify -nofor_main when compiling the program with the f90 command. Specifying -nofor_main prevents linking for_main.o into programs.

If you omit -nofor_main, the main program must be a Fortran program.

3.59 -noinclude --- Omit Standard Directory Search for INCLUDE Files

Specifying the -noinclude option directs the Fortran compiler to not search for include files in the /usr/include directory. This option does not apply to the directories searched for module files or cpp files.

To request that the cpp preprocessor not search for #include files in the /usr/include directory, use the -I option (see Section 3.26.2).

3.60 -nowsf_main --- Compile HPF Global Routine for Nonparallel Main Program

Use the -nowsf_main option (TU*X ONLY) to indicate that the HPF global routine being compiled will be linked with a main program that was not compiled with the -wsf option.

3.61 -o output --- Name Output File

If you specify -o output and omit -c, the executable program file is named output instead of a.out and is created in the current working directory.

If you specify -c with -o output, the retained object file is named output.

For More Information:

On output files and their names, see Section 2.1.6.
On using multiple input files, see Section 2.1.7.

3.62 -O0, -O1, -O2, -O3, -O4 or -O, -O5 --- Specify Optimization Level

Use the -O0, -O1, -O2, -O3, -O4 (same as -O), and -O5 options to specify the level of optimization performed during compilation.

The default level of optimization is -O4 unless you specify the -g2, -g, or -gen_feedback option (in which case the default is -O0).
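
The defaulting rule described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the compiler's option parser: the function name effective_opt_level is hypothetical, and the choice to let the last explicit -O option win is an assumption that the text does not state.

```python
# Options that, per the text above, change the default level to -O0.
DEBUG_LIKE = {"-g", "-g2", "-gen_feedback"}
# Explicit optimization options; -O is documented as the same as -O4.
EXPLICIT_LEVELS = {"-O0", "-O1", "-O2", "-O3", "-O4", "-O", "-O5"}


def effective_opt_level(options):
    """Return the optimization level implied by a list of f90 options.

    Hypothetical helper: it models only the defaulting rule, and it
    assumes that the last explicit -O option on the line wins.
    """
    level = None
    for opt in options:
        if opt in EXPLICIT_LEVELS:
            level = "-O4" if opt == "-O" else opt
    if level is not None:
        return level
    if DEBUG_LIKE.intersection(options):
        return "-O0"   # -g2, -g, or -gen_feedback changes the default
    return "-O4"       # default when nothing else is specified
```

For example, effective_opt_level(["-g"]) gives "-O0", while an empty option list gives "-O4".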

At optimization levels lower than -O4, the compiler issues "uninitialized variable" warnings.

In most cases, the higher the level of optimization you specify, the faster the program will execute. However, using -O3 or higher to gain execution speed usually produces larger object files and longer compile times. The following options apply:

-O0
Specifying -O0 disables nearly all optimizations. If you specify -g2 or -g, this is the default.
-O1
Specifying -O1 enables local optimizations within the source program unit, recognition of common subexpressions, and expansion of integer multiplication and division (using shifts).
-O2
Specifying -O2 enables global optimization. This includes data-flow analysis, code motion, strength reduction and test replacement, split-lifetime analysis, and instruction scheduling. Specifying -O2 includes the optimizations performed by -O1 (implies -O1).
-O3
Specifying -O3 enables additional global optimizations that improve speed (at the cost of extra code size). These optimizations include:

Loop unrolling
Code replication to eliminate branches

Specifying -O3 implies the optimizations performed at levels -O1 and -O2.
-O4 or -O
Specifying -O4 enables interprocedure analysis and automatic inlining of small procedures (with heuristics limiting the amount of extra code). This is the default unless you also specify -g2 or -g (either of which changes the default to -O0).
Specifying -O4 or -O implies the optimizations performed at levels -O1, -O2, and -O3.
-O5
Specifying -O5 activates the loop transformation optimizations (also set by -transform_loops ) and the software pipelining optimization (also set by -pipeline ):

The loop transformation optimizations are a group of optimizations that apply to array references within loops. These optimizations can improve the performance of the memory system and can apply to multiple nested loops.
Loop transformation optimizations include loop blocking, loop distribution, loop fusion, loop interchange, loop scalar replacement, and outer loop unrolling.
To specify loop transformation optimizations without software pipelining, do one of the following:

Specify -O5 with -nopipeline (preferred method)
Specify -transform_loops with -O4, -O3, or -O2. This optimization is not performed at optimization levels below -O2.

For more information on the loop transformation optimizations, see Section 3.79.
The software pipelining optimization applies instruction scheduling to certain innermost loops, allowing instructions within a loop to "wrap around" and execute in a different iteration of the loop. This can reduce the impact of long-latency operations, resulting in faster loop execution. Software pipelining also enables the prefetching of data to reduce the impact of cache misses.
To specify software pipelining without loop transformation optimizations, do one of the following:

Specify -O5 with -notransform_loops (preferred method)
Specify -pipeline with -O4, -O3, or -O2. This optimization is not performed at optimization levels below -O2.

For more information on software pipelining, see Section 3.66.

In addition to loop transformation and software pipelining, specifying -O5 activates certain optimizations that are not activated by -transform_loops and -pipeline, including byte-vectorization and insertion of additional NOP (No Operation) instructions for alignment of multi-issue sequences.
To determine whether using -O5 benefits your particular program, you should time program execution for the same program (or subprogram) compiled at levels -O4 and -O5.
Specifying -O5 implies the optimizations performed at levels -O1, -O2, -O3, and -O4.
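
Three of the transformations named in the list above can be written out by hand to show what they do. The Python below is only a source-level illustration: the compiler applies these changes to generated code, and every function name here is hypothetical. Each pair computes the same result; only the shape of the work differs.

```python
def times_eight(n):
    # -O1 style: integer multiplication by a power of two done with a shift.
    return n << 3


def sum_rolled(values):
    # Original loop: one element and one loop test per trip.
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    # -O3 style loop unrolling: four elements per trip, so fewer loop
    # tests and branches, at the cost of more code.
    total = 0
    n = len(values)
    main = n - n % 4
    for i in range(0, main, 4):
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
    for i in range(main, n):  # remainder loop for the leftover elements
        total += values[i]
    return total


def total_original_order(a):
    # Nested loops over a rectangular table: outer loop i, inner loop j.
    total = 0
    for i in range(len(a)):
        for j in range(len(a[i])):
            total += a[i][j]
    return total


def total_interchanged(a):
    # Loop interchange (one of the loop transformations): the same work
    # with the two loops swapped, which changes the memory access order.
    total = 0
    rows = len(a)
    cols = len(a[0]) if rows else 0
    for j in range(cols):
        for i in range(rows):
            total += a[i][j]
    return total
```

Whether a transformed loop actually runs faster depends on the target processor and memory system, which is why the text recommends timing the program at each level.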

For More Information:

On the effects of the -O5 option, see Section 3.79 and Section 3.66.
On limiting loop unrolling with optimization level -O3 or higher (-unroll num), see Section 3.83.
On speculative execution optimization, see Section 3.74.
On timing program execution, see Section 5.2.1.
On the related -fp_reorder option, see Section 3.12.
On improving and measuring run-time performance, see Chapter 5.
On the optimizations performed at each level, see Section 5.7.

3.63 -om --- Request Nonshared Object Optimizations

Use the -om option (TU*X ONLY) with the -non_shared option to request certain code optimizations after linking, including nop (No Operation) removal, .lita removal, and reallocation of common symbols. This option also positions the global pointer register so the maximum addresses fall in the global-pointer window.

Pass -om options to the linker using the -Wl,arg form:

-Wl,-om_compress_lita removes unused .lita entries after optimization, and then compresses the .lita section.
-Wl,-om_dead_code removes dead code (unreachable instructions) generated after applying optimizations. The .lita section is not compressed by this option.
-Wl,-om_no_inst_sched turns off instruction scheduling.
-Wl,-om_no_align_labels turns off alignment of labels. Normally, the -om option aligns the targets of all branches on quadword boundaries to improve loop performance.
-Wl,-om_Gcommon,num sets the size threshold of common symbols. All common symbols whose size is less than or equal to num are allocated close to each other. This option can be used to improve the probability that a symbol can be accessed directly from the global pointer register. Normally, -om tries to collect all common symbols together.

For more information, see your operating system documentation.

3.64 -omp --- Enable OpenMP Parallel Processing Using Directed Decomposition

Use the -omp option (TU*X ONLY) to enable parallel processing that uses directed decomposition. Parallel processing is directed by inserting OpenMP directives in your source code. This kind of parallel processing is intended for shared memory multiprocessor systems.

Some of the OpenMP directives include:

ATOMIC
BARRIER
CRITICAL and END CRITICAL
DO and END DO
FLUSH
MASTER and END MASTER
ORDERED and END ORDERED
PARALLEL and END PARALLEL
PARALLEL SECTIONS and END PARALLEL SECTIONS
SECTIONS, SECTION, and END SECTIONS
SINGLE and END SINGLE

For more information, see Chapter 6 and Appendix D and the Compaq Fortran Language Reference Manual.

3.65 -pad_source --- Pad Short Source Records with Spaces

Specify the -pad_source option to request that source records shorter than the statement field width be padded with spaces on the right, out to the end of the statement field. This affects the interpretation of character and Hollerith literals that are continued across source records.

The default is -nopad_source. This causes a warning message to be displayed if a character or Hollerith literal that ends before the statement field ends is continued onto the next source record. To suppress this warning message, specify the -warn nousage option.

Specifying -pad_source can prevent warning messages associated with -warn usage.
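
The effect of padding on a continued character literal can be sketched as follows. This is a rough model, not the compiler's scanner. It assumes fixed-form source with the statement field in columns 7 through 72, and it assumes that an unpadded record simply ends where its text ends. The helper names are hypothetical.

```python
STATEMENT_FIELD_END = 72  # assumed: the statement field ends at column 72


def pad_record(record, width=STATEMENT_FIELD_END):
    """Pad a short source record with spaces on the right."""
    return record.rstrip("\n").ljust(width)


def statement_text(records, pad):
    """Join the statement fields (columns 7..72) of consecutive records.

    Hypothetical model: the first record starts a statement and the
    others are its continuation lines.
    """
    parts = []
    for rec in records:
        if pad:
            rec = pad_record(rec)
        parts.append(rec[6:STATEMENT_FIELD_END])
    return "".join(parts)


# A character literal that is opened on one record and closed on the next.
records = ["      MSG = 'ABC", "     &DEF'"]
unpadded = statement_text(records, pad=False)
padded = statement_text(records, pad=True)
```

Under these assumptions, the unpadded statement reads MSG = 'ABCDEF', while the padded one carries 56 spaces between ABC and DEF, because the first record is filled out to column 72 before the continuation is appended.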

3.66 -pipeline --- Activate Software Pipelining Optimization

Specifying -pipeline (or -O5) activates the software pipelining optimization. The software pipelining optimization applies instruction scheduling to certain innermost loops, allowing instructions within a loop to "wrap around" and execute in a different iteration of the loop. This can reduce the impact of long-latency operations, resulting in faster loop execution.
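
The "wrap around" idea can be shown at source level. The sketch below is not what the compiler emits, and the function names are hypothetical. It restructures a load, compute, store loop so that the load for the next iteration is issued in the same trip as the computation for the current one. A prologue starts the first iteration and an epilogue finishes the last one.

```python
def scale_plain(src, factor):
    # Each trip performs load, compute, and store for one iteration.
    out = [0] * len(src)
    for i in range(len(src)):
        x = src[i]        # load
        y = x * factor    # compute (stands in for a long-latency operation)
        out[i] = y        # store
    return out


def scale_pipelined(src, factor):
    # Same result, with the load moved one iteration ahead.
    n = len(src)
    out = [0] * n
    if n == 0:
        return out
    loaded = src[0]                  # prologue: load for iteration 0
    for i in range(n - 1):           # kernel
        next_loaded = src[i + 1]     # load for iteration i + 1
        out[i] = loaded * factor     # compute and store for iteration i
        loaded = next_loaded
    out[n - 1] = loaded * factor     # epilogue: finish the last iteration
    return out
```

On real hardware the benefit comes from overlapping the latency of the early load with the arithmetic of the previous iteration. In Python the two versions differ only in structure.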

For this version of Compaq Fortran, loops chosen for software pipelining are always innermost loops and do not contain branches, procedure calls, or COMPLEX floating-point data.

Software pipelining can be more effective when you combine -pipeline with the appropriate -tune keyword for the target Alpha processor generation (see Section 3.80).

Software pipelining also enables the prefetching of data to reduce the impact of cache misses.

Software pipelining is a subset of the optimizations activated by -O5. Instead of specifying both -pipeline and -transform_loops, you can specify -O5.

To specify software pipelining without loop transformation optimizations, do one of the following:

Specify -O5 with -notransform_loops (preferred method)
Specify -pipeline with -O4, -O3, or -O2. This optimization is not performed at optimization levels below -O2.

To determine whether using -pipeline benefits your particular program, you should time program execution for the same program (or subprogram) compiled with and without software pipelining (such as with -pipeline and -nopipeline).

For programs that contain loops that exhaust available registers, longer execution times may result with -O5, requiring use of -unroll n to limit loop unrolling (see Section 3.83).

For More Information:

About the -O5 option, see Section 3.62.
On software pipelining, see Section 5.8.2.

3.67 -p0, -p1 or -p, -pg, and -pprof --- Profiling Support

Nonparallel programs ( -wsf option omitted) and (TU*X ONLY) parallel HPF programs ( -wsf option specified) use different profiling tools, which need different profiling options. Profiling information identifies those parts of your program where improving source code efficiency would most likely improve run-time performance.

If you omit the -wsf option, you can use the prof and pixie (TU*X ONLY) tools; specify -p0 or -p1 (same as -p) to control the level of profiling support provided during compilation (the default is -p0). When you omit -wsf, the -pprof method option is ignored.

If you specify the -wsf option (TU*X ONLY), you can use the -pprof method option to prepare the program for the pprof parallel profiler. When you specify -wsf, omit the -p0, -p1, and -p options.

Options related to profiling include:

-p0
Specifying -p0 (the default) does not permit profiling. If loading occurs, the standard run-time startup routine (crt0.o) is used and profiling libraries are not searched.
-p1 or -p
Specifying -p1 or -p sets up profiling by periodically sampling the value of the program counter. This option only affects loading. When loading occurs, this option replaces the standard run-time startup routine (crt0.o) with the profiling run-time startup routine (mcrt0.o) and searches the level one profiling library (libprof1).
When profiling happens, the startup routine calls monstartup(3) and produces the file mon.out, which contains execution-profiling data for use with the postprocessor prof command.
If you specify this option, do not also specify -g0.
-pg
Allows use of the call graph profiling tool gprof.
-pprof method
(TU*X ONLY) Prepares a program for subsequent profiling with the pprof profiler. To use the -pprof method option:

Specify the -wsf option
Omit the -p1 and -p options.
Specify the -pprof option followed by the type of profiling needed (i or s). To request interval sampling, use -pprof i; to request program counter sampling, use -pprof s.
If you specify -pprof s , you can:

Specify the -non_shared option if you want profiling information for archive libraries
Specify -pprof s during linking for existing object files

For complete information about -pprof , see Section 3.92.4.

For More Information:

On the -pprof option, see Section 3.92.4.
On using profiling tools prof and pixie (TU*X ONLY), see Section 5.2.2.
On the -gen_feedback, -feedback file, and -cord options (TU*X ONLY), see Section 3.34.

3.68 -pthread --- Link Using Threaded Run-Time Library

Use the -pthread option (TU*X ONLY) to request that the linker use threaded libraries. This is usually used with the -reentrancy threaded option (see Section 3.71). The -threads option is a synonym for -pthread.

3.69 -r8 or -real_size 64, -r16 or -real_size 128, -real_size 32 --- Floating-Point Data Size

Use the -r8 or -real_size 64, -r16 or -real_size 128, and -real_size 32 options to control the size of REAL and COMPLEX declarations without a kind parameter or size specifier:

-real_size 32
Specifying -real_size 32 defines REAL declarations, constants, functions, and intrinsics as REAL*4 (single precision or KIND=4) and COMPLEX declarations, constants, functions, and intrinsics as COMPLEX*8 (COMPLEX or KIND=4). This is the default unless you specify -r8 (or -real_size 64), or -r16 (or -real_size 128).
-r8 or -real_size 64
Specifying -real_size 64 or -r8 defines:

REAL declarations, constants, functions, and intrinsics as REAL*8 (DOUBLE PRECISION or KIND=8)
COMPLEX declarations, constants, functions, and intrinsics as COMPLEX*16 (DOUBLE COMPLEX or KIND=8)

If you omit -r8 (and -real_size 64), then:

REAL declarations, constants, functions, and intrinsics are defined as REAL*4 (KIND=4).
DOUBLE PRECISION declarations, constants, functions, and intrinsics are defined as REAL*8 (KIND=8).
COMPLEX declarations, constants, functions, and intrinsics are defined as COMPLEX*8 (KIND=4).
DOUBLE COMPLEX declarations, constants, functions, and intrinsics are defined as COMPLEX*16 (KIND=8).

Specifying -r8 or -real_size 64 causes REAL and COMPLEX intrinsic functions to produce REAL*8 (KIND=8) or COMPLEX*16 (KIND=8) results unless their arguments are typed with an explicit KIND type parameter.
For instance, a reference to the CMPLX intrinsic with -real_size 64 produces a COMPLEX*16 (KIND=8) result unless the argument is explicitly typed as REAL*4 (KIND=4) or COMPLEX*8 (KIND=4), in which case the result is COMPLEX*8 (KIND=4).
-r16 or -real_size 128
Specifying -real_size 128 or -r16 defines:

REAL and DOUBLE PRECISION declarations, constants, functions, and intrinsics as REAL*16 (KIND=16)
COMPLEX and DOUBLE COMPLEX declarations, constants, functions, and intrinsics as COMPLEX*32 (KIND=16)

If you omit -r16 (and -real_size 128), then:

REAL declarations, constants, functions, and intrinsics are defined as REAL*4 (KIND=4).
DOUBLE PRECISION declarations, constants, functions, and intrinsics are defined as REAL*8 (KIND=8).
COMPLEX declarations, constants, functions, and intrinsics are defined as COMPLEX*8 (KIND=4).
DOUBLE COMPLEX declarations, constants, functions, and intrinsics are defined as COMPLEX*16 (KIND=8).

Specifying -r16 or -real_size 128 causes REAL, DOUBLE PRECISION, COMPLEX, and DOUBLE COMPLEX intrinsic functions to produce REAL*16 (KIND=16) or COMPLEX*32 (KIND=16) results unless their arguments are typed with an explicit KIND type parameter.
For instance, a reference to the CMPLX intrinsic with -real_size 128 produces a COMPLEX*32 (KIND=16) result unless the argument is explicitly typed as REAL*4 (KIND=4) or COMPLEX*8 (KIND=4), in which case the result is COMPLEX*8 (KIND=4).
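
The mappings listed above can be collected into a small lookup table. The Python below is an illustrative summary, not part of the compiler. The function name resolved_type is hypothetical, and the table covers only declarations written without a kind parameter or size specifier.

```python
# Sizes used when no -r8, -r16, or -real_size option changes them.
DEFAULTS = {
    "REAL": "REAL*4",
    "DOUBLE PRECISION": "REAL*8",
    "COMPLEX": "COMPLEX*8",
    "DOUBLE COMPLEX": "COMPLEX*16",
}

# Declarations redefined by each -real_size value, as listed in the text.
OVERRIDES = {
    32: {},
    64: {"REAL": "REAL*8", "COMPLEX": "COMPLEX*16"},
    128: {
        "REAL": "REAL*16",
        "DOUBLE PRECISION": "REAL*16",
        "COMPLEX": "COMPLEX*32",
        "DOUBLE COMPLEX": "COMPLEX*32",
    },
}


def resolved_type(declared, real_size=32):
    """Return the size a declaration gets under -real_size real_size.

    -r8 corresponds to real_size=64 and -r16 to real_size=128.
    """
    if real_size not in OVERRIDES:
        raise ValueError("real_size must be 32, 64, or 128")
    return OVERRIDES[real_size].get(declared, DEFAULTS[declared])
```

For example, resolved_type("REAL", 64) gives "REAL*8", while resolved_type("DOUBLE PRECISION", 64) stays at "REAL*8" because -r8 is not described as changing DOUBLE PRECISION declarations.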

For More Information:

On data types, see Chapter 9.
On intrinsic functions, see the Compaq Fortran Language Reference Manual.

3.70 -recursive --- Request Recursive Execution

Specify the -recursive option to:

Change the default allocation class for all local variables from STATIC to AUTOMATIC, except for variables that are data-initialized, named in a SAVE statement, or declared as STATIC.
Permit references to a routine name from inside the routine.

A subprogram declared with the RECURSIVE keyword is always recursive (whether you specify or omit the -recursive option).

Variables declared with the AUTOMATIC statement or attribute always use stack-based storage (whether you specify or omit the -recursive or -automatic options).

Specifying -recursive sets -automatic (puts local variables on the run-time stack).
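
Why recursion needs stack-based locals can be seen in a small model. The Python below is only an analogy, with hypothetical function names: a module-level variable stands in for a STATIC local that every invocation shares, and an ordinary Python local stands in for an AUTOMATIC variable that each invocation gets afresh.

```python
_saved_n = 0  # stands in for a STATIC local: one copy shared by all calls


def factorial_static(n):
    # Every invocation stores its argument in the same shared location,
    # so inner calls overwrite the value the outer calls still need.
    global _saved_n
    _saved_n = n
    if _saved_n <= 1:
        return 1
    inner = factorial_static(_saved_n - 1)
    return _saved_n * inner  # _saved_n was overwritten by the inner calls


def factorial_automatic(n):
    # Each invocation has its own copy of saved_n, as with stack storage.
    saved_n = n
    if saved_n <= 1:
        return 1
    inner = factorial_automatic(saved_n - 1)
    return saved_n * inner
```

Here factorial_automatic(4) returns 24, while factorial_static(4) returns 1 because the shared variable has been reduced to 1 by the time the outer invocations use it.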

The default is -norecursive.

3.71 -reentrancy keyword --- Control Use of Threaded Run-Time Library

The -reentrancy keyword option (TU*X ONLY) specifies whether code generated for the main program and any Fortran procedures it calls relies on threaded or asynchronous reentrancy. The default is -reentrancy none.

-noreentrancy
Same as -reentrancy none.
-reentrancy none
Specifying -reentrancy none informs the Compaq Fortran run-time library that the program will not be relying on threaded or asynchronous reentrancy. The run-time library need not guard against such interrupts inside its own critical regions. Same as -noreentrancy.
-reentrancy asynch
Specifying -reentrancy asynch informs the Compaq Fortran run-time library that the program may contain asynchronous handlers that could call the RTL. The run-time library will guard against asynchronous interrupts inside its own critical regions.
-reentrancy threaded
Specifying -reentrancy threaded informs the Compaq Fortran run-time library that the program is multithreaded, such as programs using the DECthreads library. The run-time library will use thread locking to guard its own critical regions.
To use the threaded libraries, also specify the -threads option (see Section 3.78).
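
What "thread locking to guard its own critical regions" means can be sketched with Python's threading module. This is a generic illustration, not the Compaq Fortran run-time library: the class RuntimeUnit and the function run_writers are hypothetical stand-ins for shared library state and for a multithreaded program that calls into it.

```python
import threading


class RuntimeUnit:
    """Hypothetical stand-in for shared run-time library state."""

    def __init__(self):
        self.records_written = 0
        self._lock = threading.Lock()

    def write_record(self):
        # Critical region: the read-modify-write below must not be
        # interleaved with the same update from another thread.
        with self._lock:
            current = self.records_written
            self.records_written = current + 1


def run_writers(unit, threads=4, writes_per_thread=1000):
    """Call write_record from several threads and return the final count."""

    def worker():
        for _ in range(writes_per_thread):
            unit.write_record()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return unit.records_written
```

With the lock in place, run_writers(RuntimeUnit()) always returns 4000. Without it, updates from different threads could be lost, which is the kind of failure that locking inside the library is meant to prevent.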

3.72 -S --- Create Assembler File

Specifying the -S option creates an assembler file from the compiled source. The assembler file is created with the base name of the source file with a .s file suffix. Linking does not occur.
