Language Support for Auto-Parallelization
This topic describes C++ language features that help the compiler parallelize code.
Annotating Functions with Declarations
Annotate your functions with the declaration:
Linux
__attribute__((concurrency_safe(cost(cycles) | profitable)))
Windows
__declspec(concurrency_safe(cost(cycles) | profitable))
The declaration guides the compiler to parallelize more loops and straight-line code.
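The two spellings of the declaration can be hidden behind a single macro so that one source file builds on both platforms. The following is a minimal sketch, not part of the compiler: the macro name CONCURRENCY_SAFE and the function copy_block are hypothetical, and the sketch assumes the compiler defines __INTEL_COMPILER. On any other compiler the macro expands to nothing, so the annotation is silently dropped.

```cpp
// CONCURRENCY_SAFE is a hypothetical helper macro introduced for illustration.
// It selects the __declspec spelling on Windows and the __attribute__ spelling
// elsewhere, and expands to nothing on compilers that are not assumed to
// understand concurrency_safe.
#if defined(__INTEL_COMPILER) && defined(_WIN32)
#  define CONCURRENCY_SAFE(clause) __declspec(concurrency_safe(clause))
#elif defined(__INTEL_COMPILER)
#  define CONCURRENCY_SAFE(clause) __attribute__((concurrency_safe(clause)))
#else
#  define CONCURRENCY_SAFE(clause)
#endif

// copy_block is a hypothetical function. The annotation asserts that loops
// calling it are worth parallelizing.
CONCURRENCY_SAFE(profitable)
int copy_block(const float* src, float* dst, int n) {
  for (int i = 0; i < n; ++i) {
    dst[i] = src[i];
  }
  return n;
}
```

A cost clause is passed the same way, for example CONCURRENCY_SAFE(cost(200)).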
The concurrency_safe attribute tells the compiler that there are no unacceptable side effects and no illegal (or improperly synchronized) memory access interferences among multiple invocations of the annotated function, or between an invocation of the annotated function and other statements in the program, when they are executed concurrently.
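To make the requirement concrete, the sketch below contrasts a function that could reasonably carry the annotation with one that could not. Both functions (scale_copy and next_ticket) are hypothetical examples and are left unannotated so that the code builds with any C++ compiler. The attribute is a promise made by the programmer, so the reasoning in the comments is what you would need to verify yourself.

```cpp
#include <cstddef>

// Plausible candidate for concurrency_safe: it reads only its arguments and
// writes only through dst. Concurrent calls do not interfere as long as the
// caller passes non-overlapping dst ranges (an assumption the caller must meet).
int scale_copy(const float* src, float* dst, std::size_t n, float factor) {
  for (std::size_t i = 0; i < n; ++i) {
    dst[i] = src[i] * factor;
  }
  return static_cast<int>(n);
}

// Not a candidate: every call updates a shared static counter without any
// synchronization, so concurrent invocations would race on it.
int next_ticket() {
  static int counter = 0;
  return ++counter;
}
```

Calling next_ticket from a parallelized loop could hand out duplicate or skipped values, which is the kind of memory access interference the attribute asserts is absent.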
The cost clause specifies the estimated execution cost of the annotated function, in cycles, which the compiler uses in its parallelization profitability analysis when compiling the enclosing loops or blocks. The profitable clause indicates that the loops or blocks that contain calls to the annotated function are profitable to parallelize.
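The effect of the two clauses can be pictured with a toy model. The sketch below is not the compiler's actual heuristic, which this topic does not describe. The type CallCost, the function worth_parallelizing, and the overhead constant are all invented for illustration. The model only captures the idea that a larger declared cost makes an enclosing loop more likely to pass a profitability check, and that profitable bypasses the check.

```cpp
#include <cstdint>

// Hypothetical stand-in for the information carried by the annotation.
struct CallCost {
  bool profitable;       // models the profitable clause
  std::uint64_t cycles;  // models the cost(cycles) clause
};

// Invented number: assumed cycles needed to start and join worker threads.
constexpr std::uint64_t kAssumedThreadingOverheadCycles = 2000;

// Toy profitability test: parallelize when the clause says so, or when the
// estimated total work of the loop exceeds the assumed threading overhead.
constexpr bool worth_parallelizing(std::uint64_t trip_count, CallCost call) {
  return call.profitable ||
         trip_count * call.cycles > kAssumedThreadingOverheadCycles;
}
```

Under these assumed numbers, a loop of 40 iterations around a call declared as cost(5) fails the check, while the same loop around a call declared as cost(200) or profitable passes it.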
The following example illustrates the use of this declaration:
#define N 10
#define M 40
#define NValue N

#if defined(COSTLOW)
// The function cost is ~5 cycles, so the loop calling "foo" will not be parallelized.
__declspec(concurrency_safe(cost(5)))
#elif defined(COSTHIGH)
// The function cost is ~200 cycles, so the loop calling "foo" will be parallelized.
__declspec(concurrency_safe(cost(200)))
#elif defined(PROFITABLE)
// The function is profitable to execute in parallel, so the loop calling "foo"
// should be parallelized.
__declspec(concurrency_safe(profitable))
#endif
__declspec(noinline)
int foo(float A[], float B[]) {
  for (int i = 0; i < N; i++) {
    B[i] = A[i];
  }
  return N;
}

int testp(float A[], float B[], float* In[], float* Out[]) {
  int i, j;
  for (i = 0; i < M; i++) {
    foo(A, B);
    for (j = 0; j < N; j++) {
      Out[i][j] = In[i][j] + (NValue * j);
    }
  }
  return N;
}
Compiling the example with each macro defined in turn produces the following parallelization reports:

[C:/temp] icl -c -DCOSTLOW -Qparallel -Qpar-report2 -Qansi-alias v.cpp
C:\temp\v.cpp(28): (col. 3) remark: loop was not parallelized: insufficient computational work.

[C:/temp] icl -c -DCOSTHIGH -Qparallel -Qpar-report -Qansi-alias v.cpp
C:\temp\v.cpp(28): (col. 3) remark: LOOP WAS AUTO-PARALLELIZED.

[C:/temp] icl -c -DPROFITABLE -Qparallel -Qpar-report -Qansi-alias v.cpp
C:\temp\v.cpp(28): (col. 3) remark: LOOP WAS AUTO-PARALLELIZED.