When you specify -non_shared to request a nonshared object file, you can specify the -om option to request code optimizations after linking, including nop (No Operation) removal, .lita removal, and reallocation of common symbols. This option also positions the global pointer register so that the maximum number of addresses falls in the global-pointer window.
For More Information:
On the -wl,arg command-line options that enable nonshared object file code optimizations, see Section 3.59.
5.8.9 Arithmetic Reordering Optimizations
If you use the -fp_reorder option (same as -assume noaccuracy_sensitive), DIGITAL Fortran 90 may reorder code (based on algebraic identities) to improve performance. For example, the following expressions are mathematically equivalent but may not compute the same value using finite precision arithmetic:
X = (A + B) + C
X = A + (B + C)
The results can be slightly different from those of the default -no_fp_reorder because of the way intermediate results are rounded. However, the -fp_reorder results are not categorically less accurate than those obtained with the default. In fact, dot product summations using -fp_reorder can produce more accurate results than those using -no_fp_reorder.
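The effect of grouping can be seen with a small experiment. The following is a minimal sketch, not part of the original manual: the module and function names are hypothetical, and the parentheses simply stand in for the two evaluation orders a compiler might pick for A + B + C.

```fortran
! Sketch only: fp_reorder_demo, sum_left, and sum_right are
! hypothetical names chosen for this illustration.
module fp_reorder_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Grouping used by X = (A + B) + C
  function sum_left(a, b, c) result(x)
    real(dp), intent(in) :: a, b, c
    real(dp) :: x
    x = (a + b) + c
  end function sum_left

  ! Grouping used by X = A + (B + C)
  function sum_right(a, b, c) result(x)
    real(dp), intent(in) :: a, b, c
    real(dp) :: x
    x = a + (b + c)
  end function sum_right
end module fp_reorder_demo
```

With A = 1.0D20, B = -1.0D20, and C = 1.0D0 in IEEE double precision, and a compiler that honors the parentheses, sum_left returns 1.0 while sum_right returns 0.0, because C is too small to survive being added to B first.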
The effect of -fp_reorder is important when DIGITAL Fortran 90 hoists divide operations out of a loop. If -fp_reorder is in effect, the unoptimized loop becomes the optimized loop:
Unoptimized Code | Optimized Code |
---|---|
 | T = 1/V |
DO I=1,N | DO I=1,N |
. | . |
. | . |
. | . |
B(I) = A(I)/V | B(I) = A(I)*T |
END DO | END DO |
The transformation in the optimized loop increases performance
significantly, and loses little or no accuracy. However, it does have
the potential for raising overflow or underflow arithmetic exceptions.
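To get a feel for how small the accuracy loss usually is, the two loop forms can be written out by hand and compared. This is an illustrative sketch under stated assumptions: the module and procedure names are hypothetical, and the second routine hoists the divide manually rather than relying on any compiler option.

```fortran
! Sketch only: all names here are hypothetical.
module divide_hoist_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Loop as written: one divide per iteration.
  subroutine scale_by_divide(a, v, b)
    real(dp), intent(in)  :: a(:), v
    real(dp), intent(out) :: b(:)
    integer :: i
    do i = 1, size(a)
      b(i) = a(i) / v
    end do
  end subroutine scale_by_divide

  ! Hand-hoisted form: one divide before the loop,
  ! then one multiply per iteration.
  subroutine scale_by_reciprocal(a, v, b)
    real(dp), intent(in)  :: a(:), v
    real(dp), intent(out) :: b(:)
    real(dp) :: t
    integer :: i
    t = 1.0_dp / v
    do i = 1, size(a)
      b(i) = a(i) * t
    end do
  end subroutine scale_by_reciprocal

  ! Largest relative difference between two result vectors.
  ! Assumes no element of x is zero.
  function max_rel_diff(x, y) result(d)
    real(dp), intent(in) :: x(:), y(:)
    real(dp) :: d
    d = maxval(abs(x - y) / abs(x))
  end function max_rel_diff
end module divide_hoist_demo
```

For ordinary values of V the two results should differ by at most a few units in the last place. The exception risk arises at the extremes: if V is so small that 1/V overflows, the hoisted form produces an overflow even where A(I)/V itself would have been representable.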
5.8.10 Dummy Aliasing Assumption
Some programs compiled with DIGITAL Fortran 90 (or DIGITAL Fortran 77) may have results that differ from the results of other Fortran compilers. Such programs may be aliasing dummy arguments to each other or to a variable in a common block or shared through use association, and at least one variable access is a store.
This program behavior is prohibited in programs conforming to the Fortran 90 standard, but not by DIGITAL Fortran 90. Other versions of Fortran allow dummy aliases and check for them to ensure correct results. However, DIGITAL Fortran 90 assumes that no dummy aliasing will occur, and it can ignore potential data dependencies from this source in favor of faster execution.
The DIGITAL Fortran 90 default is safe for programs conforming to the Fortran 90 standard. It will improve performance of these programs, because the standard prohibits such programs from passing overlapped variables or arrays as actual arguments if either is assigned in the execution of the program unit.
The -assume dummy_aliases option allows dummy aliasing. It ensures correct results by assuming the exact order of the references to dummy and common variables is required. Program units taking advantage of this behavior can produce inaccurate results if compiled with -assume nodummy_aliases.
Example 5-1 is taken from the DAXPY routine in the Fortran-77 version of the Basic Linear Algebra Subroutines (BLAS).
Example 5-1 Using the -assume dummy_aliases Option

      SUBROUTINE DAXPY(N,DA,DX,INCX,DY,INCY)

C     Constant times a vector plus a vector.
C     uses unrolled loops for increments equal to 1.

      DOUBLE PRECISION DX(1), DY(1), DA
      INTEGER I,INCX,INCY,IX,IY,M,MP1,N
C
      IF (N.LE.0) RETURN
      IF (DA.EQ.0.0) RETURN
      IF (INCX.EQ.1.AND.INCY.EQ.1) GOTO 20

C     Code for unequal increments or equal increments
C     not equal to 1.
      .
      .
      .
      RETURN

C     Code for both increments equal to 1.
C     Clean-up loop

 20   M = MOD(N,4)
      IF (M.EQ.0) GOTO 40
      DO I=1,M
          DY(I) = DY(I) + DA*DX(I)
      END DO

      IF (N.LT.4) RETURN
 40   MP1 = M + 1
      DO I = MP1, N, 4
          DY(I) = DY(I) + DA*DX(I)
          DY(I + 1) = DY(I + 1) + DA*DX(I + 1)
          DY(I + 2) = DY(I + 2) + DA*DX(I + 2)
          DY(I + 3) = DY(I + 3) + DA*DX(I + 3)
      END DO
      RETURN
      END SUBROUTINE
The second DO loop contains assignments to DY. If DY is overlapped with DA, any of the assignments to DY might give DA a new value, and this overlap would affect the results. If this overlap is desired, then DA must be fetched from memory each time it is referenced. The repetitious fetching of DA degrades performance.
Linking Routines with Opposite Settings
You can link routines compiled with the -assume dummy_aliases option to routines compiled with -assume nodummy_aliases. For example, if only one routine is called with dummy aliases, you can use -assume dummy_aliases when compiling that routine, and compile all the other routines with -assume nodummy_aliases to gain the performance value of that option.
Programs calling DAXPY with DA overlapping DY do not conform to the FORTRAN-77 and Fortran 90 standards. However, they are supported if -assume dummy_aliases was used to compile the DAXPY routine.
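When the overlap is not actually needed, a caller can stay within the standard by copying the scalar into a temporary before the call. The sketch below illustrates that idea only; axpy_unit is a simplified, hypothetical stand-in for DAXPY with unit increments, and the other names are likewise invented for this example.

```fortran
! Sketch only: axpy_unit and axpy_with_first_element are
! hypothetical names, not routines from BLAS or this manual.
module alias_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Simplified stand-in for DAXPY with unit increments: y = y + a*x
  subroutine axpy_unit(n, a, x, y)
    integer,  intent(in)    :: n
    real(dp), intent(in)    :: a
    real(dp), intent(in)    :: x(n)
    real(dp), intent(inout) :: y(n)
    integer :: i
    do i = 1, n
      y(i) = y(i) + a*x(i)
    end do
  end subroutine axpy_unit

  ! Uses y(1) as the multiplier without passing y(1) itself,
  ! so the scalar argument never overlaps the array being stored to.
  subroutine axpy_with_first_element(n, x, y)
    integer,  intent(in)    :: n
    real(dp), intent(in)    :: x(n)
    real(dp), intent(inout) :: y(n)
    real(dp) :: a_copy
    a_copy = y(1)                 ! snapshot taken before y is modified
    call axpy_unit(n, a_copy, x, y)
  end subroutine axpy_with_first_element
end module alias_demo
```

Note that the copy fixes the multiplier at its value on entry. A program that genuinely depends on the multiplier changing as the array is updated still needs the -assume dummy_aliases treatment described above.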
This chapter describes how to use two sets of parallel compiler directives: the OpenMP Fortran API compiler directives and the DIGITAL Fortran parallel compiler directives.
You use these compiler directives in programs to generate code that executes in parallel on a multiprocessor, multithreaded, shared-memory DIGITAL UNIX system on an Alpha processor.
The compiler can recognize one set of parallel compiler directives or the other, but not both in the same program.
In addition, the following topics apply to both the OpenMP Fortran API and the DIGITAL Fortran parallel compiler directives:
For reference material on both sets of parallel compiler directives, see Appendix D.
6.1 OpenMP Fortran API Compiler Directives
To enable the use of OpenMP Fortran API compiler directives in your program, you must include the -omp compiler option on your f90 command:
% f90 -omp prog.f -o prog
Directives are structured so that they appear to be DIGITAL Fortran comments. The format of an OpenMP Fortran API compiler directive is:
prefix directive_name [clause[[,] clause]...]
All OpenMP Fortran API compiler directives must begin with a directive prefix. Directives are not case-sensitive. Clauses can appear in any order after the directive name and can be repeated as needed, subject to the restrictions of individual clauses.
Directives cannot be embedded within continued statements, and
statements cannot be embedded within directives. Comments cannot appear
on the same line as a directive.
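As an illustration of the format in context, the following minimal sketch places a directive pair around an ordinary loop in free source form. The module and subroutine names are hypothetical. Because the directives look like comments, the same source is expected to compile and give the same answer with or without the -omp option.

```fortran
! Sketch only: omp_format_demo and vector_add are hypothetical names.
module omp_format_demo
  implicit none
contains
  ! c = a + b. When OpenMP directives are enabled, the loop
  ! iterations are divided among the threads of the team.
  subroutine vector_add(a, b, c)
    real, intent(in)  :: a(:), b(:)
    real, intent(out) :: c(:)
    integer :: i
    ! prefix = !$OMP, directive_name = PARALLEL DO,
    ! clauses = SHARED(...) and PRIVATE(...)
    !$OMP PARALLEL DO SHARED(a, b, c) PRIVATE(i)
    do i = 1, size(a)
      c(i) = a(i) + b(i)
    end do
    !$OMP END PARALLEL DO
  end subroutine vector_add
end module omp_format_demo
```

The explanatory comment sits on its own lines above the directive, since comments cannot share a line with a directive.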
6.1.2.1 Directive Prefixes
The directive prefix you use depends on the source form you use in your program. Use the !$OMP prefix when compiling either fixed source form or free source form programs. Use the C$OMP and the *$OMP prefixes only when compiling fixed source form programs.
Fixed Source Form
For fixed source form programs, the prefix is one of the following: !$OMP, C$OMP, or *$OMP.
Prefixes must start in column one and appear as a single string with no intervening white space. Fixed-form source rules apply to the directive line.
Initial directive lines must have a space or zero in column six, and continuation directive lines must have a character other than a space or a zero in column six. For example, the following formats for specifying directives are equivalent:

c23456789
!$OMP PARALLEL DO SHARED(A,B,C)

!Is the same as...

c$OMP PARALLEL DO
c$OMP+SHARED(A,B,C)

!Which is the same as...

c$OMP PARALLEL DO SHARED(A,B,C)
Free Source Form
For free source form programs, use the prefix !$OMP. The prefix can appear in any column as long as it is preceded only by white space. It must appear as a single string with no intervening white space. Free-form source rules apply to the directive line.
Initial directive lines must have a space after the prefix. Continued directive lines must have an ampersand as the last nonblank character on the line. Continuation directive lines can have an ampersand after the directive prefix with optional white space before and after the ampersand. For example, the following formats for specifying directives are equivalent:
!$OMP PARALLEL DO &
!$OMP SHARED(A,B,C)

!The same as...

!$OMP PARALLEL &
!$OMP&DO SHARED(A,B,C)

!Which is the same as...

!$OMP PARALLEL DO SHARED(A,B,C)
OpenMP Fortran API allows you to conditionally compile DIGITAL Fortran statements. The directive prefix you use for conditional compilation statements depends on the source form you use in your program:
The prefix must be followed by a legal DIGITAL Fortran statement on the same line. If you have used the -omp compiler option, the prefix is replaced by two spaces and the rest of the line is treated as a normal DIGITAL Fortran statement during compilations. You can also use the C preprocessor macro _OPENMP for conditional compilation.
Fixed Source Form
For fixed source form programs, the conditional compilation prefix is one of the following: !$, C$ (or c$), or *$.
The prefix must start in column one and appear as a single string with no intervening white space. Fixed-form source rules apply to the directive line.
Initial lines must have a space or zero in column six, and continuation lines must have a character other than a space or zero in column six. For example, the following forms for specifying conditional compilation are equivalent:
c23456789
!$    IAM = OMP_GET_THREAD_NUM() +
!$   *       INDEX

#ifdef _OPENMP
      IAM = OMP_GET_THREAD_NUM() +
     *       INDEX
#endif
Free Source Form
The free source form conditional compilation prefix is !$. This prefix can appear in any column as long as it is preceded only by white space. It must appear as a single word with no intervening white space. Free-form source rules apply to the directive line.
Initial lines must have a space after the prefix. Continued lines must
have an ampersand as the last nonblank character on the line.
Continuation lines can have an ampersand after the prefix with optional
white space before and after the ampersand.
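A free source form counterpart to the fixed-form example might look like the sketch below. The module, function, and argument names are hypothetical. The external declaration of OMP_GET_THREAD_NUM is itself placed behind the !$ prefix, which is an assumption made here so that the source also compiles when the prefix lines are treated as comments.

```fortran
! Sketch only: omp_cond_demo, thread_slot, and offset are
! hypothetical names.
module omp_cond_demo
  implicit none
contains
  ! Returns offset plus the calling thread's number when compiled
  ! with OpenMP enabled; returns offset alone otherwise.
  function thread_slot(offset) result(iam)
    integer, intent(in) :: offset
    integer :: iam
    !$ integer, external :: omp_get_thread_num
    iam = offset
    ! Initial line: a space follows the prefix and the line ends in &.
    ! Continuation line: an ampersand follows the prefix.
    !$ iam = omp_get_thread_num() + &
    !$&      offset
  end function thread_slot
end module omp_cond_demo
```

Called outside a parallel region, thread_slot returns its argument either way, because the calling thread's number is 0 there.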
6.1.3 Directive Summary Descriptions
Table 6-1 provides summary descriptions of the OpenMP Fortran API compiler directives. For complete information about the OpenMP Fortran API compiler directives, see Appendix D.
Directive Format | Description |
---|---|
prefix ATOMIC | This directive defines a synchronization construct that ensures that a specific memory location is updated atomically. This directive applies only to the immediately following statement. |
prefix BARRIER | This directive defines a synchronization construct that synchronizes all the threads in a team. When encountered, each thread waits until all of the threads in the team have reached the barrier. |
prefix CRITICAL [(name)] block prefix END CRITICAL [(name)] | These directives define a synchronization construct that restricts access to the contained code to only one thread at a time. The optional name argument identifies the critical section. A thread waits at the beginning of a critical section until no other thread in the team is executing a critical section having the same name. All unnamed CRITICAL directives map to the same name. Critical section names are global to the program. |
prefix DO [clause[[,] clause] ...] do_loop [prefix END DO [NOWAIT]] | These directives define a worksharing construct that specifies that the iterations of the DO loop are executed in parallel. The iterations of the do_loop are dispatched across the team of threads. The DO directive takes an optional comma-separated list of clauses; see Appendix D for the clauses it accepts. In addition, the ORDERED clause must be specified if the ORDERED directive appears in the dynamic extent of the DO directive. If the END DO directive is not specified, it is assumed to be present at the end of the DO loop, and threads synchronize at that point. If NOWAIT is specified, threads do not synchronize at the end of the DO loop. |
prefix FLUSH [(var[,var]...)] | This directive defines a synchronization construct that identifies the precise point at which a consistent view of memory is provided. The FLUSH directive takes an optional comma-separated list of named variables to be flushed. |
prefix MASTER block prefix END MASTER | These directives define a synchronization construct that specifies that the contained block of code is to be executed only by the master thread of the team. The other threads of the team skip the code and continue execution. There is no implied barrier at the END MASTER directive. |
prefix ORDERED block prefix END ORDERED | These directives define a synchronization construct that specifies that the contained block of code is executed in the order in which iterations would be executed during a sequential execution of the loop. Only one thread at a time is allowed in an ordered section, and threads enter in the order of the loop iterations. |
prefix PARALLEL [clause[[,] clause] ...] block prefix END PARALLEL | These directives define a parallel construct that is a region of a program that must be executed by a team of threads until the END PARALLEL directive is encountered. Use the worksharing directives such as DO, SECTIONS, and SINGLE to divide the statements in the parallel region into units of work and to distribute those units so that each unit is executed by one thread. The PARALLEL directive takes an optional comma-separated list of clauses; see Appendix D for the clauses it accepts. |
prefix PARALLEL DO [clause[[,] clause] ...] do_loop prefix END PARALLEL DO | These directives define a combined parallel/worksharing construct that is an abbreviated form of specifying a parallel region that contains a single DO directive. The PARALLEL DO directive takes an optional comma-separated list of clauses that can be one or more of the clauses specified for the PARALLEL and DO directives. |
prefix PARALLEL SECTIONS [clause[[,] clause] ...] block prefix END PARALLEL SECTIONS | These directives define a combined parallel/worksharing construct that is an abbreviated form of specifying a parallel region that contains a single SECTIONS directive. The semantics are identical to explicitly specifying the PARALLEL directive immediately followed by a SECTIONS directive. The PARALLEL SECTIONS directive takes an optional comma-separated list of clauses that can be one or more of the clauses specified for the PARALLEL and SECTIONS directives. |
prefix SECTIONS [clause[[,] clause] ...] [prefix SECTION] block [prefix SECTION block] . . . prefix END SECTIONS [NOWAIT] | These directives define a worksharing construct that specifies that the enclosed sections of code are to be divided among threads in the team. Each section is executed once by some thread in the team. The SECTIONS directive takes an optional comma-separated list of clauses that specifies which variables are PRIVATE, FIRSTPRIVATE, LASTPRIVATE, or REDUCTION. When the END SECTIONS directive is encountered, threads synchronize at that point unless NOWAIT is specified. |
prefix SINGLE [clause[[,] clause] ...] block prefix END SINGLE [NOWAIT] | These directives define a worksharing construct that specifies that the enclosed code is to be executed by only one thread in the team. Those threads not executing the code wait at the END SINGLE directive unless NOWAIT is specified. The SINGLE directive takes an optional comma-separated list of clauses that specifies which variables are PRIVATE or FIRSTPRIVATE. |
prefix THREADPRIVATE(/cb/[,/cb/] ...) | This data environment directive makes named common blocks private to a thread, but global within the thread. |
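A few of the directives summarized above can be seen working together in the following sketch. It is illustrative only: the module, function, and critical-section names are hypothetical, and the code is written in free source form with the !$OMP prefix. Each function accumulates into a local variable that is then copied to the function result.

```fortran
! Sketch only: all names, including the critical section name
! update_big, are hypothetical.
module omp_summary_demo
  implicit none
contains
  ! PARALLEL DO with a REDUCTION clause: sum of i*i for i = 1..n.
  function sum_of_squares(n) result(total)
    integer, intent(in) :: n
    integer :: total, acc, i
    acc = 0
    !$OMP PARALLEL DO REDUCTION(+:acc) PRIVATE(i)
    do i = 1, n
      acc = acc + i*i
    end do
    !$OMP END PARALLEL DO
    total = acc
  end function sum_of_squares

  ! PARALLEL region containing a worksharing DO; ATOMIC protects
  ! the update of the shared counter.
  function count_even(values) result(n_even)
    integer, intent(in) :: values(:)
    integer :: n_even, counter, i
    counter = 0
    !$OMP PARALLEL SHARED(values, counter) PRIVATE(i)
    !$OMP DO
    do i = 1, size(values)
      if (mod(values(i), 2) == 0) then
        !$OMP ATOMIC
        counter = counter + 1
      end if
    end do
    !$OMP END DO
    !$OMP END PARALLEL
    n_even = counter
  end function count_even

  ! Named CRITICAL section: only one thread at a time compares
  ! against and updates the shared maximum. Assumes size(values) >= 1.
  function largest(values) result(big)
    integer, intent(in) :: values(:)
    integer :: big, current, i
    current = values(1)
    !$OMP PARALLEL DO SHARED(values, current) PRIVATE(i)
    do i = 2, size(values)
      !$OMP CRITICAL (update_big)
      if (values(i) > current) current = values(i)
      !$OMP END CRITICAL (update_big)
    end do
    !$OMP END PARALLEL DO
    big = current
  end function largest
end module omp_summary_demo
```

The results should not depend on the number of threads: sum_of_squares(10) is 385 however the iterations are distributed, and the same source gives the same answers when the directives are treated as comments.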