The final entries on the compiler listing are the compiler options and compiler statistics.
The options shown include the ones specified on the f90 command line and the ones in effect as defaults during the compilation. The compiler statistics are the machine resources used by the compiler.
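For example, a command line such as the following (a sketch that assumes a source file named listing.f90) compiles without linking and uses the -V option to request the listing file; it would produce the listing.l and listing.o files named in Example C-3:

% f90 -V -c listing.f90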
Example C-3 shows how the compiler options and compilation statistics appear on the listing.
Example C-3 Sample Compilation Summary
COMPILER OPTIONS BEING USED

  no -align commons        no -align dcommons       -align records               no -align rec1byte
  no -align rec2byte       no -align rec4byte       no -align rec8byte           -altparam
  -arch generic            -assume accuracy_sensitive   no -assume bigarrays     no -assume byterecl
  no -assume dummy_aliases no -assume minus0        -assume underscore           -assume source_include
  -assume zsize            no -automatic            -call_shared                 no -check bounds
  no -check format         no -check omp_bindings   no -check output_conversion  no -check overflow
  -check power             no -check underflow      -convert native              -double_size 64
  no -d_lines              -error_limit 30          no -extend_source            no -f66
  no -fpconstant           -fpe0                    -fprm nearest                -free
  -g1                      -granularity quadword    no -hpf_matmul               no -intconstant
  -integer_size 32         no -ladebug              -machine_code                -math_library accurate
  no -module               -names lowercase         -nearest_neighbor            no -nowsf_main
  no -non_shared           no -noinclude            -numnodes 0                  -O4
  -inline speed            no -transform_loops      no -pipeline                 -speculate none
  -tune generic            -unroll 0                no -pad_source               -parallel manual
  no -pg                   -real_size 32            no -recursive                -reentrancy none
  -shadow_width 0          no -shared               no -show include             -show map
  no -show wsfinfo         no -show hpf_all         no -show hpf_punt            no -show hpf_nearest
  no -show hpf_comm        no -show hpf_temps       no -show hpf_indep           no -show hpf_dev
  no -show hpf_default     no -std                  no -synchronous_exceptions   no -syntax_only
  no -vms                  -warn alignments         no -warn argument_checking   no -warn declarations
  -warn general            -warn granularity        no -warn truncated_source    -warn uncalled
  -warn uninitialized      -warn usage              -warning_severity warning    no -wsf
  no -fuse_xref
  -I path : /usr/lib/cmplrs/hpfrtl/,/usr/include/
  -V filename : listing.l
  -o filename : listing.o

COMPILER: DIGITAL Fortran 90 V5.x-xxx-xxxx
A summary of compilation statistics appears at the end of the listing file.
This appendix provides reference material for:
The set of OpenMP Fortran Directives allows you to specify the actions taken by the compiler and run-time system when executing a DIGITAL Fortran program in parallel.
For information about the directive format, refer to Chapter 6.
D.1.1 ATOMIC Directive
The ATOMIC directive ensures that a specific memory location is updated atomically, rather than exposing it to the possibility of multiple, simultaneous writing threads.
The ATOMIC directive takes the following form:
c$OMP ATOMIC
c
Is one of the following: C (or c), !, or * (see Chapter 6).
Rules and Restrictions for the ATOMIC Directive
This directive applies only to the immediately following statement, which must have one of the following forms:
x = x operator expr
x = expr operator x
x = intrinsic (x, expr)
x = intrinsic (expr, x)
In the preceding statements:
- x is a scalar variable of intrinsic type.
- expr is a scalar expression that does not reference x.
- operator is one of the following: +, *, -, /, .AND., .OR., .EQV., or .NEQV.
- intrinsic is one of the following: MAX, MIN, IAND, IOR, or IEOR.
This directive permits optimization beyond that of the critical section around the assignment. An implementation can replace all ATOMIC directives by enclosing the statement in a critical section. All of these critical sections must use the same unique name.
Only the load and store of x are atomic; the evaluation of expr is not atomic. To avoid race conditions, all updates of the location in parallel must be protected using the ATOMIC directive, except those that are known to be free of race conditions. The function intrinsic, the operator operator, and the assignment must be the Fortran intrinsic function, intrinsic operator, and intrinsic assignment (not user-defined versions).
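As an illustration of this equivalence (a sketch only, not the required implementation; the names X, XLOCAL, and ATOMIC_LOCK are hypothetical), an atomic update behaves as though the statement were enclosed in a critical section whose name is reserved by the implementation:

c$OMP ATOMIC
      X = X + XLOCAL

c     Behaves at most as restrictively as the equivalent critical section,
c     where ATOMIC_LOCK stands for the single name chosen by the implementation:
c$OMP CRITICAL (ATOMIC_LOCK)
      X = X + XLOCAL
c$OMP END CRITICAL (ATOMIC_LOCK)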
The following restriction applies to the ATOMIC directive:
Example
The following program avoids race conditions by protecting all simultaneous updates of the location, by multiple threads, with the ATOMIC directive. The ATOMIC directive applies only to the statement immediately following it. As a result, Y is not updated atomically:
c$OMP PARALLEL DO DEFAULT(PRIVATE) SHARED(X,Y,INDEX,N)
      DO I=1,N
         CALL WORK(XLOCAL, YLOCAL)
c$OMP ATOMIC
         X(INDEX(I)) = X(INDEX(I)) + XLOCAL
         Y(I) = Y(I) + YLOCAL
      END DO
D.1.2 BARRIER Directive
The BARRIER directive synchronizes all the threads in a team. When encountered, each thread waits until all of the other threads in the team have reached the barrier.
The BARRIER directive takes the following form:
c$OMP BARRIER
c
Is one of the following: C (or c), !, or * (see Chapter 6).
Rules and Restrictions for the BARRIER Directive
The following restrictions apply to BARRIER directives:
Examples
The directive binding rules call for a BARRIER directive to bind to the closest enclosing PARALLEL directive. For more information about directive binding, see Section D.1.4.
In the following example, the BARRIER directive ensures that all threads have executed the first loop and that it is safe to execute the second loop:
c$OMP PARALLEL
c$OMP DO PRIVATE(i)
      DO i = 1, 100
         b(i) = i
      END DO
c$OMP BARRIER
c$OMP DO PRIVATE(i)
      DO i = 1, 100
         a(i) = b(101-i)
      END DO
c$OMP END PARALLEL
D.1.3 CRITICAL Directive
The CRITICAL directive restricts access to the enclosed code to only one thread at a time.
The CRITICAL directive takes the following form:
c$OMP CRITICAL [(name)]
   block
c$OMP END CRITICAL [(name)]
c
Is one of the following: C (or c), !, or * (see Chapter 6).
Rules and Restrictions for CRITICAL and END CRITICAL Directives
The optional name argument identifies the critical section.
A thread waits at the beginning of a critical section until no other thread in the team is executing a critical section having the same name. All unnamed CRITICAL directives map to the same name. Critical section names are global entities of the program. If the name conflicts with any other entity, the behavior of the program is undefined.
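For example, in the following sketch (the routines UPDATE_A and UPDATE_B are hypothetical placeholders), both critical sections are unnamed and therefore map to the same name, so a thread executing either section excludes all other threads from both:

c$OMP PARALLEL
c$OMP CRITICAL
      CALL UPDATE_A
c$OMP END CRITICAL
c$OMP CRITICAL
      CALL UPDATE_B
c$OMP END CRITICAL
c$OMP END PARALLEL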
The following restrictions apply to the CRITICAL directive:
Examples
The following example includes several CRITICAL directives, and illustrates a queuing model in which a task is dequeued and worked on. To guard against multiple threads dequeuing the same task, the dequeuing operation must be in a critical section. Because there are two independent queues in this example, each queue is protected by CRITICAL directives having different names, XAXIS and YAXIS, respectively:
c$OMP PARALLEL DEFAULT(PRIVATE) SHARED(X,Y)
c$OMP CRITICAL(XAXIS)
      CALL DEQUEUE(IX_NEXT, X)
c$OMP END CRITICAL(XAXIS)
      CALL WORK(IX_NEXT, X)
c$OMP CRITICAL(YAXIS)
      CALL DEQUEUE(IY_NEXT, Y)
c$OMP END CRITICAL(YAXIS)
      CALL WORK(IY_NEXT, Y)
c$OMP END PARALLEL
D.1.4 Directive Binding
The rules that apply to the dynamic binding of directives are:
D.1.5 Directive Nesting
The rules that apply to the dynamic nesting of directives are:
Examples
The following program containing nested PARALLEL regions is legal, because the inner and outer DO directives bind to different PARALLEL regions:
c$OMP PARALLEL DEFAULT(SHARED)
c$OMP DO
      DO I = 1, N
c$OMP PARALLEL SHARED(I,N)
c$OMP DO
         DO J = 1, N
            CALL WORK(I,J)
         END DO
c$OMP END PARALLEL
      END DO
c$OMP END PARALLEL
The following variation of the preceding example is also legal:
c$OMP PARALLEL DEFAULT(SHARED)
c$OMP DO
      DO I = 1, N
         CALL SOME_WORK(I,N)
      END DO
c$OMP END PARALLEL
      . . .
      SUBROUTINE SOME_WORK(I,N)
c$OMP PARALLEL DEFAULT(SHARED)
c$OMP DO
      DO J = 1, N
         CALL WORK(I,J)
      END DO
c$OMP END PARALLEL
      RETURN
      END
D.1.6 DO Directive
The DO directive specifies that the iterations of the immediately following DO loop must be executed in parallel. The loop that follows a DO directive cannot be a DO WHILE loop or a DO loop without loop control. The iterations of the DO loop are distributed across the existing team of threads.
A DO directive takes the following form:
c$OMP DO [clause[[,] clause]...]
   do_loop
[c$OMP END DO [NOWAIT]]
c
Is one of the following: C (or c), !, or * (see Chapter 6).
clause
Is one of the following:
- FIRSTPRIVATE(list)
- LASTPRIVATE(list)
- ORDERED
- PRIVATE(list)
- REDUCTION({operator|intrinsic}: list)
- SCHEDULE(type[,chunk])
- FIRSTPRIVATE(list)
The FIRSTPRIVATE clause provides a superset of the functionality provided by the PRIVATE clause. Variables that appear in the list are subject to PRIVATE clause semantics. In addition, private copies of the variables are initialized from the original object existing before the construct.
- LASTPRIVATE(list)
This clause provides a superset of the functionality provided by the PRIVATE clause. Variables that appear in list are subject to the PRIVATE clause semantics.
When the LASTPRIVATE clause appears on a DO directive, the thread that executes the sequentially last iteration updates the version of the object it had before the construct. When the LASTPRIVATE clause appears in a SECTIONS directive, the thread that executes the lexically last SECTION updates the version of the object it had before the construct.
Subobjects that are not assigned a value by the last iteration of the DO or the lexically last SECTION of the SECTIONS directive are undefined after the construct.
- ORDERED
If ordered sections are contained in the dynamic extent of the DO directive, the ORDERED clause must be present. For more information about ordered sections, see the ORDERED directive described in Section D.1.9.
- PRIVATE(list)
The PRIVATE clause declares the variables in list to be private to each thread in a team. The behavior of a variable declared in a PRIVATE clause is as follows:
- A new object of the same type is declared once for each thread in the team. The new object is no longer storage associated with the original object.
- All references to the original object in the lexical extent of the directive construct are replaced with references to the private object.
- Variables defined as PRIVATE are undefined for each thread on entering the construct and the corresponding shared variable is undefined on exit from a parallel construct.
- Contents, allocation state, and association status of variables defined as PRIVATE are undefined when they are referenced outside the lexical extent (but inside the dynamic extent) of the construct, unless they are passed as actual arguments to called routines.
- REDUCTION({operator|intrinsic}: list)
This clause performs a reduction on the variables that appear in list, with operator or intrinsic.
- operator
Is one of the following: +, *, -, .AND., .OR., .EQV., or .NEQV.
- intrinsic
Is one of the following: MAX, MIN, IAND, IOR, or IEOR.
- list
The list of variables on which the reduction is performed.
Variables in list must be named scalar variables of intrinsic type. Variables that appear in a REDUCTION clause must be SHARED in the enclosing context. A private copy of each variable in list is created for each thread as if the PRIVATE clause had been used. The private copy is initialized according to the operator. See Table D-1 for more information.
At the end of the REDUCTION, the shared variable is updated to reflect the result of combining the original value of the (shared) reduction variable with the final value of each of the private copies using the operator specified. The reduction operators are all associative (except for subtraction), and the compiler can freely reassociate the computation of the final value (the partial results of a subtraction reduction are added to form the final value).
The value of the shared variable becomes undefined when the first thread reaches the clause containing the reduction, and it remains so until the reduction computation is complete. Normally, the computation is complete at the end of the REDUCTION construct. If the REDUCTION clause is used on a construct to which NOWAIT is also applied, however, the shared variable remains undefined until a barrier synchronization has been performed. This ensures that all the threads have completed the REDUCTION clause.
The REDUCTION clause is intended to be used on a region or worksharing construct in which the reduction variable is used only in reduction statements having one of the following forms:
x = x operator expr
x = expr operator x   (except for subtraction)
x = intrinsic (x, expr)
x = intrinsic (expr, x)
Some reductions can be expressed in other forms. For instance, a MAX reduction might be expressed as follows:
IF (x .LT. expr) x = expr
Alternatively, the reduction might be hidden inside a subroutine call. Be careful that the operator you specify in the REDUCTION clause matches the reduction operation.
Table D-1 lists the operators and intrinsics and their canonical initialization values. The actual initialization value will be consistent with the data type of the reduction variable.
Table D-1 Operator and Intrinsic Initialization Values

Operator/Intrinsic    Initialization Value
+                     0
*                     1
-                     0
.AND.                 .TRUE.
.OR.                  .FALSE.
.EQV.                 .TRUE.
.NEQV.                .FALSE.
MAX                   Smallest representable number
MIN                   Largest representable number
IAND                  All bits on
IOR                   0
IEOR                  0
You can specify any number of REDUCTION clauses on the directive, but a variable can appear only once in a REDUCTION clause for that directive. (A sketch combining the REDUCTION and SCHEDULE clauses appears after the clause descriptions.)
- SCHEDULE(type[,chunk])
This clause specifies how iterations of the DO loop are divided among the threads of the team. Within the SCHEDULE(type[,chunk]) clause syntax, type can be one of the following:
- STATIC: When (STATIC, chunk) is specified, iterations are divided into pieces of a size specified by chunk. The pieces are statically dispatched to threads in the team in a round-robin fashion in the order of the thread number. Chunk must be a scalar integer expression. When chunk is not specified, the iterations are first divided into contiguous pieces by dividing the number of iterations by the number of threads in the team. Each piece is then dispatched to a thread before loop execution begins.
- DYNAMIC: When (DYNAMIC, chunk) is specified, the iterations are broken into pieces of a size specified by chunk. As each thread finishes a piece of the iteration space, it dynamically obtains the next set of iterations. When no chunk is specified, it defaults to 1.
- GUIDED: When (GUIDED, chunk) is specified, the chunk size is reduced exponentially with each succeeding dispatch. Chunk specifies the minimum number of iterations to dispatch each time. If fewer than chunk iterations remain, the rest are dispatched. When no chunk is specified, it defaults to 1.
- RUNTIME: When (RUNTIME) is specified, the decision regarding scheduling is deferred until run time. The schedule type and chunk size can be chosen at run time by using the OMP_SCHEDULE environment variable. When (RUNTIME) is specified, it is illegal to specify chunk.
In the absence of the SCHEDULE clause, the default schedule type is STATIC.
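The following sketch (illustrative only; the array A, its bound N, and the variable TOTAL are assumed to be declared, and TOTAL initialized, elsewhere) combines several of the preceding clauses. The loop sums A into the shared variable TOTAL while the iterations are dispatched dynamically in chunks of four:

c     TOTAL must be defined (for example, set to 0.0) before this region.
c$OMP PARALLEL SHARED(A,N,TOTAL)
c$OMP DO REDUCTION(+:TOTAL) SCHEDULE(DYNAMIC,4)
      DO I = 1, N
         TOTAL = TOTAL + A(I)
      END DO
c$OMP END DO
c$OMP END PARALLEL

Each thread receives a private copy of TOTAL initialized to 0 (the + entry in Table D-1), and the private partial sums are combined into the shared TOTAL at the barrier implied by END DO. If SCHEDULE(RUNTIME) were specified instead, the same schedule could be selected at run time by setting the OMP_SCHEDULE environment variable to "DYNAMIC,4".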
Rules and Restrictions for DO and END DO Directives
If you do not specify an END DO directive, an END DO directive is assumed at the end of the DO loop. If you specify NOWAIT on the END DO directive, threads do not synchronize at the end of the parallel loop. Threads that finish early proceed straight to the instruction following the loop without waiting for the other members of the team to finish the DO directive.
Parallel DO loop control variables are block-level entities within the DO loop. If the loop control variable also appears in the LASTPRIVATE list of the parallel DO, it is copied out to a variable of the same name in the enclosing PARALLEL region. The variable in the enclosing PARALLEL region must be SHARED if it is specified on the LASTPRIVATE list of a DO directive.
The following restrictions apply to the DO directives:
Examples
In the following example, the loop iteration variable is private by default, and it is not necessary to declare it explicitly. The END DO directive is optional:
c$OMP PARALLEL
c$OMP DO
      DO I=2,N
         B(I) = (A(I) + A(I-1)) / 2.0
      END DO
c$OMP END DO
c$OMP END PARALLEL
If there are multiple independent loops within a parallel region, you can use the NOWAIT clause, shown in the following example, to avoid the implied BARRIER at the end of the DO directive:
c$OMP PARALLEL
c$OMP DO
      DO I=2,N
         B(I) = (A(I) + A(I-1)) / 2.0
      END DO
c$OMP END DO NOWAIT
c$OMP DO
      DO I=1,M
         Y(I) = SQRT(Z(I))
      END DO
c$OMP END DO NOWAIT
c$OMP END PARALLEL
Correct execution sometimes depends on the value that the last iteration of a loop assigns to a variable. Such programs must list all such variables as arguments to a LASTPRIVATE clause so that the values of the variables are the same as when the loop is executed sequentially. In the following example, the value of I at the end of the parallel region will equal N+1, as in the sequential case:
c$OMP PARALLEL
c$OMP DO LASTPRIVATE(I)
      DO I=1,N
         A(I) = B(I) + C(I)
      END DO
c$OMP END PARALLEL
      CALL REVERSE(I)
Ordered sections are useful for sequentially ordering the output from work that is done in parallel. Assuming that a reentrant I/O library exists, the following program prints out the indexes in sequential order:
c$OMP DO ORDERED SCHEDULE(DYNAMIC)
      DO I=LB,UB,ST
         CALL WORK(I)
      END DO
      . . .
      SUBROUTINE WORK(K)
c$OMP ORDERED
      WRITE(*,*) K
c$OMP END ORDERED