The final entries on the compiler listing are the compiler options and compiler statistics.
The options shown include the ones specified on the f90 command line and the ones in effect as defaults during the compilation. The compiler statistics are the machine resources used by the compiler.
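For example, a command line such as the following (a sketch that assumes a source file named listing.f90) compiles without linking and uses the -V option to request the listing file; it would produce the listing.l and listing.o files named in Example C-3:

% f90 -V -c listing.f90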
Example C-3 shows how the compiler options and compilation statistics appear on the listing.
Example C-3 Sample Compilation Summary
COMPILER OPTIONS BEING USED

  no -align commons        no -align dcommons       -align records               no -align rec1byte
  no -align rec2byte       no -align rec4byte       no -align rec8byte           -altparam
  -arch generic            -assume accuracy_sensitive   no -assume bigarrays     no -assume byterecl
  no -assume dummy_aliases no -assume minus0        -assume underscore           -assume source_include
  -assume zsize            no -automatic            -call_shared                 no -check bounds
  no -check format         no -check omp_bindings   no -check output_conversion  no -check overflow
  -check power             no -check underflow      -convert native              -double_size 64
  no -d_lines              -error_limit 30          no -extend_source            no -f66
  no -fpconstant           -fpe0                    -fprm nearest                -free
  -g1                      -granularity quadword    no -hpf_matmul               no -intconstant
  -integer_size 32         no -ladebug              -machine_code                -math_library accurate
  no -module               -names lowercase         -nearest_neighbor            no -nowsf_main
  no -non_shared           no -noinclude            -numnodes 0                  -O4
  -inline speed            no -transform_loops      no -pipeline                 -speculate none
  -tune generic            -unroll 0                no -pad_source               -parallel manual
  no -pg                   -real_size 32            no -recursive                -reentrancy none
  -shadow_width 0          no -shared               no -show include             -show map
  no -show wsfinfo         no -show hpf_all         no -show hpf_punt            no -show hpf_nearest
  no -show hpf_comm        no -show hpf_temps       no -show hpf_indep           no -show hpf_dev
  no -show hpf_default     no -std                  no -synchronous_exceptions   no -syntax_only
  no -vms                  -warn alignments         no -warn argument_checking   no -warn declarations
  -warn general            -warn granularity        no -warn truncated_source    -warn uncalled
  -warn uninitialized      -warn usage              -warning_severity warning    no -wsf
  no -fuse_xref
  -I path : /usr/lib/cmplrs/hpfrtl/,/usr/include/
  -V filename : listing.l
  -o filename : listing.o

COMPILER: DIGITAL Fortran 90 V5.x-xxx-xxxx
A summary of compilation statistics appears at the end of the listing file.
This appendix provides reference material for:
The set of OpenMP Fortran Directives allows you to specify the actions taken by the compiler and run-time system when executing a DIGITAL Fortran program in parallel.
For information about the directive format, refer to Chapter 6.
D.1.1 ATOMIC Directive
The ATOMIC directive ensures that a specific memory location is updated atomically, rather than exposing it to the possibility of multiple, simultaneous writing threads.
The ATOMIC directive takes the following form:
c$OMP ATOMIC
c
Is one of the following: C (or c), !, or * (see Chapter 6).
Rules and Restrictions for the ATOMIC Directive
This directive applies only to the immediately following statement, which must have one of the following forms:
x = x operator expr
x = expr operator x
x = intrinsic (x, expr)
x = intrinsic (expr, x)
In the preceding statements:
- x is a scalar variable of intrinsic type.
- expr is a scalar expression that does not reference x.
- operator is one of the following: +, *, -, /, .AND., .OR., .EQV., or .NEQV.
- intrinsic is one of the following: MAX, MIN, IAND, IOR, or IEOR.
This directive permits optimization beyond that of the critical section around the assignment. An implementation can replace all ATOMIC directives by enclosing the statement in a critical section. All of these critical sections must use the same unique name.
Only the load and store of x are atomic; the evaluation of expr is not atomic. To avoid race conditions, all updates of the location in parallel must be protected using the ATOMIC directive, except those that are known to be free of race conditions. The function intrinsic, the operator operator, and the assignment must be the Fortran intrinsic function, intrinsic operator, and intrinsic assignment (not user-defined versions).
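As an illustration of this equivalence (a sketch only, not the required implementation; the names X, XLOCAL, and ATOMIC_LOCK are hypothetical), an atomic update behaves as though the statement were enclosed in a critical section whose name is reserved by the implementation:

c$OMP ATOMIC
      X = X + XLOCAL

c     Behaves at most as restrictively as the equivalent critical section,
c     where ATOMIC_LOCK stands for the single name chosen by the implementation:
c$OMP CRITICAL (ATOMIC_LOCK)
      X = X + XLOCAL
c$OMP END CRITICAL (ATOMIC_LOCK)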
The following restriction applies to the ATOMIC directive:
Example
The following program avoids race conditions by protecting all simultaneous updates of the location, by multiple threads, with the ATOMIC directive. The ATOMIC directive applies only to the statement immediately following it. As a result, Y is not updated atomically:
c$OMP PARALLEL DO DEFAULT(PRIVATE) SHARED(X,Y,INDEX,N)
      DO I=1,N
         CALL WORK(XLOCAL, YLOCAL)
c$OMP ATOMIC
         X(INDEX(I)) = X(INDEX(I)) + XLOCAL
         Y(I) = Y(I) + YLOCAL
      END DO
D.1.2 BARRIER Directive
The BARRIER directive synchronizes all the threads in a team. When encountered, each thread waits until all of the other threads in the team have reached the barrier.
The BARRIER directive takes the following form:
c$OMP BARRIER
c
Is one of the following: C (or c), !, or * (see Chapter 6).
Rules and Restrictions for the BARRIER Directive
The following restrictions apply to BARRIER directives:
Examples
The directive binding rules call for a BARRIER directive to bind to the closest enclosing PARALLEL directive. For more information about directive binding, see Section D.1.4.
In the following example, the BARRIER directive ensures that all threads have executed the first loop and that it is safe to execute the second loop:
c$OMP PARALLEL
c$OMP DO PRIVATE(i)
      DO i = 1, 100
         b(i) = i
      END DO
c$OMP BARRIER
c$OMP DO PRIVATE(i)
      DO i = 1, 100
         a(i) = b(101-i)
      END DO
c$OMP END PARALLEL
D.1.3 CRITICAL Directive
The CRITICAL directive restricts access to the enclosed code to only one thread at a time.
The CRITICAL directive takes the following form:
c$OMP CRITICAL [(name)]
   block
c$OMP END CRITICAL [(name)]
c
Is one of the following: C (or c), !, or * (see Chapter 6).
Rules and Restrictions for CRITICAL and END CRITICAL Directives
The optional name argument identifies the critical section.
A thread waits at the beginning of a critical section until no other thread in the team is executing a critical section having the same name. All unnamed CRITICAL directives map to the same name. Critical section names are global entities of the program. If the name conflicts with any other entity, the behavior of the program is undefined.
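For example, in the following sketch (the routines UPDATE_A and UPDATE_B are hypothetical placeholders), both critical sections are unnamed and therefore map to the same name, so a thread executing either section excludes all other threads from both:

c$OMP PARALLEL
c$OMP CRITICAL
      CALL UPDATE_A
c$OMP END CRITICAL
c$OMP CRITICAL
      CALL UPDATE_B
c$OMP END CRITICAL
c$OMP END PARALLEL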
The following restrictions apply to the CRITICAL directive:
Examples
The following example includes several CRITICAL directives, and illustrates a queuing model in which a task is dequeued and worked on. To guard against multiple threads dequeuing the same task, the dequeuing operation must be in a critical section. Because there are two independent queues in this example, each queue is protected by CRITICAL directives having different names, XAXIS and YAXIS, respectively:
c$OMP PARALLEL DEFAULT(PRIVATE) SHARED(X,Y)
c$OMP CRITICAL(XAXIS)
      CALL DEQUEUE(IX_NEXT, X)
c$OMP END CRITICAL(XAXIS)
      CALL WORK(IX_NEXT, X)
c$OMP CRITICAL(YAXIS)
      CALL DEQUEUE(IY_NEXT, Y)
c$OMP END CRITICAL(YAXIS)
      CALL WORK(IY_NEXT, Y)
c$OMP END PARALLEL
D.1.4 Directive Binding
The rules that apply to the dynamic binding of directives are:
D.1.5 Directive Nesting
The rules that apply to the dynamic nesting of directives are:
Examples
The following program containing nested PARALLEL regions is legal, because the inner and outer DO directives bind to different PARALLEL regions:
c$OMP PARALLEL DEFAULT(SHARED)
c$OMP DO
      DO I = 1, N
c$OMP PARALLEL SHARED(I,N)
c$OMP DO
         DO J = 1, N
            CALL WORK(I,J)
         END DO
c$OMP END PARALLEL
      END DO
c$OMP END PARALLEL
The following variation of the preceding example is also legal:
c$OMP PARALLEL DEFAULT(SHARED)
c$OMP DO
      DO I = 1, N
         CALL SOME_WORK(I,N)
      END DO
c$OMP END PARALLEL
      . . .
      SUBROUTINE SOME_WORK(I,N)
c$OMP PARALLEL DEFAULT(SHARED)
c$OMP DO
      DO J = 1, N
         CALL WORK(I,J)
      END DO
c$OMP END PARALLEL
      RETURN
      END
D.1.6 DO Directive
The DO directive specifies that the iterations of the immediately following DO loop must be executed in parallel. The loop that follows a DO directive cannot be a DO WHILE loop or a DO loop without loop control. The iterations of the DO loop are distributed across the existing team of threads.
A DO directive takes the following form:
c$OMP DO [clause[[,] clause]...]
   do_loop
[c$OMP END DO [NOWAIT]]
c
Is one of the following: C (or c), !, or * (see Chapter 6).
clause
Is one of the following:
- FIRSTPRIVATE(list)
- LASTPRIVATE(list)
- ORDERED
- PRIVATE(list)
- REDUCTION({operator|intrinsic}: list)
- SCHEDULE(type[,chunk])
- FIRSTPRIVATE(list)
The FIRSTPRIVATE clause provides a superset of the functionality provided by the PRIVATE clause. Variables that appear in the list are subject to PRIVATE clause semantics. In addition, private copies of the variables are initialized from the original object existing before the construct.
- LASTPRIVATE(list)
This clause provides a superset of the functionality provided by the PRIVATE clause. Variables that appear in list are subject to the PRIVATE clause semantics.
When the LASTPRIVATE clause appears on a DO directive, the thread that executes the sequentially last iteration updates the version of the object it had before the construct. When the LASTPRIVATE clause appears in a SECTIONS directive, the thread that executes the lexically last SECTION updates the version of the object it had before the construct.
Subobjects that are not assigned a value by the last iteration of the DO or the lexically last SECTION of the SECTIONS directive are undefined after the construct.
- ORDERED
If ordered sections are contained in the dynamic extent of the DO directive, the ORDERED clause must be present. For more information about ordered sections, see the ORDERED directive described in Section D.1.9.
- PRIVATE(list)
The PRIVATE clause declares the variables in list to be private to each thread in a team. The behavior of a variable declared in a PRIVATE clause is as follows:
- A new object of the same type is declared once for each thread in the team. The new object is no longer storage associated with the original object.
- All references to the original object in the lexical extent of the directive construct are replaced with references to the private object.
- Variables defined as PRIVATE are undefined for each thread on entering the construct and the corresponding shared variable is undefined on exit from a parallel construct.
- Contents, allocation state, and association status of variables defined as PRIVATE are undefined when they are referenced outside the lexical extent (but inside the dynamic extent) of the construct, unless they are passed as actual arguments to called routines.
- REDUCTION({operator|intrinsic}: list)
This clause performs a reduction on the variables that appear in list, with operator or intrinsic.
- operator
Is one of the following: +, *, -, .AND., .OR., .EQV., or .NEQV.
- intrinsic
Is one of the following: MAX, MIN, IAND, IOR, or IEOR.
- list
The list of variables on which the reduction is performed.
Variables in list must be named scalar variables of intrinsic type. Variables that appear in a REDUCTION clause must be SHARED in the enclosing context. A private copy of each variable in list is created for each thread as if the PRIVATE clause had been used. The private copy is initialized according to the operator. See Table D-1 for more information.
At the end of the REDUCTION, the shared variable is updated to reflect the result of combining the original value of the (shared) reduction variable with the final value of each of the private copies using the operator specified. The reduction operators are all associative (except for subtraction), and the compiler can freely reassociate the computation of the final value (the partial results of a subtraction reduction are added to form the final value).
The value of the shared variable becomes undefined when the first thread reaches the clause containing the reduction, and it remains so until the reduction computation is complete. Normally, the computation is complete at the end of the REDUCTION construct. If the REDUCTION clause is used on a construct to which NOWAIT is also applied, however, the shared variable remains undefined until a barrier synchronization has been performed. This ensures that all the threads have completed the REDUCTION clause.
The REDUCTION clause is intended to be used on a region or worksharing construct in which the reduction variable is used only in reduction statements having one of the following forms:
x = x operator expr
x = expr operator x   (except for subtraction)
x = intrinsic (x, expr)
x = intrinsic (expr, x)
Some reductions can be expressed in other forms. For instance, a MAX reduction might be expressed as follows:
IF (x .LT. expr) x = expr
Alternatively, the reduction might be hidden inside a subroutine call. Be careful that the operator you specify in the REDUCTION clause matches the reduction operation.
Table D-1 lists the operators and intrinsics and their canonical initialization values. The actual initialization value will be consistent with the data type of the reduction variable.
Table D-1 Operator and Intrinsic Initialization Values

Operator/Intrinsic    Initialization Value
+                     0
*                     1
-                     0
.AND.                 .TRUE.
.OR.                  .FALSE.
.EQV.                 .TRUE.
.NEQV.                .FALSE.
MAX                   Smallest representable number
MIN                   Largest representable number
IAND                  All bits on
IOR                   0
IEOR                  0
You can specify any number of REDUCTION clauses on the directive, but a variable can appear only once in a REDUCTION clause for that directive. (A sketch combining the REDUCTION and SCHEDULE clauses appears after the clause descriptions.)
- SCHEDULE(type[,chunk])
This clause specifies how iterations of the DO loop are divided among the threads of the team. Within the SCHEDULE(type[,chunk]) clause syntax, type can be one of the following:
- STATIC: When (STATIC, chunk) is specified, iterations are divided into pieces of a size specified by chunk. The pieces are statically dispatched to threads in the team in a round-robin fashion in the order of the thread number. Chunk must be a scalar integer expression. When chunk is not specified, the iterations are first divided into contiguous pieces by dividing the number of iterations by the number of threads in the team. Each piece is then dispatched to a thread before loop execution begins.
- DYNAMIC: When (DYNAMIC, chunk) is specified, the iterations are broken into pieces of a size specified by chunk. As each thread finishes a piece of the iteration space, it dynamically obtains the next set of iterations. When no chunk is specified, it defaults to 1.
- GUIDED: When (GUIDED, chunk) is specified, the chunk size is reduced exponentially with each succeeding dispatch. Chunk specifies the minimum number of iterations to dispatch each time. If fewer than chunk iterations remain, the rest are dispatched. When no chunk is specified, it defaults to 1.
- RUNTIME: When (RUNTIME) is specified, the decision regarding scheduling is deferred until run time. The schedule type and chunk size can be chosen at run time by using the OMP_SCHEDULE environment variable. When (RUNTIME) is specified, it is illegal to specify chunk.
In the absence of the SCHEDULE clause, the default schedule type is STATIC.
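The following sketch (illustrative only; the array A, its bound N, and the variable TOTAL are assumed to be declared, and TOTAL initialized, elsewhere) combines several of the preceding clauses. The loop sums A into the shared variable TOTAL while the iterations are dispatched dynamically in chunks of four:

c     TOTAL must be defined (for example, set to 0.0) before this region.
c$OMP PARALLEL SHARED(A,N,TOTAL)
c$OMP DO REDUCTION(+:TOTAL) SCHEDULE(DYNAMIC,4)
      DO I = 1, N
         TOTAL = TOTAL + A(I)
      END DO
c$OMP END DO
c$OMP END PARALLEL

Each thread receives a private copy of TOTAL initialized to 0 (the + entry in Table D-1), and the private partial sums are combined into the shared TOTAL at the barrier implied by END DO. If SCHEDULE(RUNTIME) were specified instead, the same schedule could be selected at run time by setting the OMP_SCHEDULE environment variable to "DYNAMIC,4".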
Rules and Restrictions for DO and END DO Directives
If you do not specify an END DO directive, an END DO directive is assumed at the end of the DO loop. If you specify NOWAIT on the END DO directive, threads do not synchronize at the end of the parallel loop. Threads that finish early proceed straight to the instruction following the loop without waiting for the other members of the team to finish the DO directive.
Parallel DO loop control variables are block-level entities within the DO loop. If the loop control variable also appears in the LASTPRIVATE list of the parallel DO, it is copied out to a variable of the same name in the enclosing PARALLEL region. The variable in the enclosing PARALLEL region must be SHARED if it is specified on the LASTPRIVATE list of a DO directive.
The following restrictions apply to the DO directives:
Examples
In the following example, the loop iteration variable is private by default, and it is not necessary to declare it explicitly. The END DO directive is optional:
c$OMP PARALLEL
c$OMP DO
      DO I=2,N
         B(I) = (A(I) + A(I-1)) / 2.0
      END DO
c$OMP END DO
c$OMP END PARALLEL
If there are multiple independent loops within a parallel region, you can use the NOWAIT clause, shown in the following example, to avoid the implied BARRIER at the end of the DO directive:
c$OMP PARALLEL
c$OMP DO
      DO I=2,N
         B(I) = (A(I) + A(I-1)) / 2.0
      END DO
c$OMP END DO NOWAIT
c$OMP DO
      DO I=1,M
         Y(I) = SQRT(Z(I))
      END DO
c$OMP END DO NOWAIT
c$OMP END PARALLEL
Correct execution sometimes depends on the value that the last iteration of a loop assigns to a variable. Such programs must list all such variables as arguments to a LASTPRIVATE clause so that the values of the variables are the same as when the loop is executed sequentially. In the following example, the value of I at the end of the parallel region will equal N+1, as in the sequential case:
c$OMP PARALLEL
c$OMP DO LASTPRIVATE(I)
      DO I=1,N
         A(I) = B(I) + C(I)
      END DO
c$OMP END PARALLEL
      CALL REVERSE(I)
Ordered sections are useful for sequentially ordering the output from work that is done in parallel. Assuming that a reentrant I/O library exists, the following program prints out the indexes in sequential order:
c$OMP DO ORDERED SCHEDULE(DYNAMIC)
      DO I=LB,UB,ST
         CALL WORK(I)
      END DO
      . . .
      SUBROUTINE WORK(K)
c$OMP ORDERED
      WRITE(*,*) K
c$OMP END ORDERED