ivdep Attribute
Include the ivdep attribute in your single task kernel to direct the Intel® oneAPI DPC++/C++ Compiler to ignore memory aliasing dependencies carried by the loop that the attribute is applied to. This attribute applies only to the loop it is applied to, and not to any of the future loops that might appear as a result of the [[intel::loop_coalesce(N)]] attribute.
Syntax
[[intel::ivdep]]
[[intel::ivdep(safelen)]]
[[intel::ivdep(array)]]
[[intel::ivdep(array, safelen)]]
[[intel::ivdep(safelen, array)]]
Applying the ivdep attribute incorrectly results in functionally incorrect hardware and potential functional differences between the hardware run and emulation. The ivdep attribute is ignored in emulation.
During compilation, the Intel® oneAPI DPC++/C++ Compiler creates hardware that ensures load and store instructions operate within dependency constraints. An example of a dependency constraint is that dependent load and store instructions must execute in order. The ivdep attribute instructs the Intel® oneAPI DPC++/C++ Compiler to remove this extra hardware between load and store instructions in the loop that immediately follows the attribute declaration in the kernel code. Removing the extra hardware may reduce logic utilization and lower the initiation interval (II) of the loop.
To give the compiler more information about loop dependencies, specify a safelen parameter by adding an integer-type C++ constant expression argument to the attribute. The safelen parameter specifies the maximum number of consecutive loop iterations without loop-carried dependencies. For example, [[intel::ivdep(32)]] indicates to the compiler that there are at least 32 iterations of the loop before loop-carried dependencies are introduced. That is, while the [[intel::ivdep]] attribute guarantees to the compiler that there are no implicit memory dependencies between any iterations of this loop, [[intel::ivdep(32)]] guarantees that no loop-carried dependence has a dependence distance less than 32. For example, if an iteration reads from memory, the preceding 31 iterations and succeeding 31 iterations are guaranteed not to write to the same memory location.
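The dependence-distance guarantee can be checked on the host before committing to a safelen value. The following is a minimal sketch in plain C++, not part of the compiler or of any Intel API. The helper names min_dependence_distance and safelen_is_valid are hypothetical, and the sketch assumes the access pattern A[i] = A[i - X[i]] used in the examples on this page. It reports the smallest number of iterations separating a read of an element from the write to that element.

```cpp
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <vector>

// Hypothetical helper (not an Intel API). For a loop of the form
//   for (i = 0; i < N; ++i) A[i] = A[i - X[i]];
// return the smallest loop-carried dependence distance, or
// std::numeric_limits<long>::max() if no iteration depends on another.
long min_dependence_distance(const std::vector<long>& X) {
  const long n = static_cast<long>(X.size());
  long min_dist = std::numeric_limits<long>::max();
  for (long i = 0; i < n; ++i) {
    const long src = i - X[i];          // element read by iteration i
    if (src == i) continue;             // same iteration: not loop-carried
    if (src < 0 || src >= n) continue;  // no iteration of this loop writes it
    const long dist = std::labs(X[i]);  // iterations between write and read
    if (dist < min_dist) min_dist = dist;
  }
  return min_dist;
}

// A safelen value is consistent with the access pattern only if every
// loop-carried dependence is at least that many iterations apart.
bool safelen_is_valid(const std::vector<long>& X, long safelen) {
  return min_dependence_distance(X) >= safelen;
}
```

With every offset equal to 32, the sketch accepts a safelen of 32 and rejects 33, which matches the [[intel::ivdep(32)]] guarantee described above. A check like this covers only the data it is given, so it supports a safelen choice but does not prove it for all inputs.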
To specify that accesses to a particular memory array inside a loop do not cause loop-carried dependencies, add the array parameter to the attribute by specifying the array variable name as an argument to the attribute. The array specified by the ivdep attribute must be a local or private memory array, or a pointer variable that points to a global, local, or private memory storage. The array specified by the ivdep attribute can also be an array or a pointer member of a struct.
The ivdep attribute applies to memory aliasing dependencies and not memory ordering dependencies.
For example, the ivdep attribute in the following code does not cause the compiler to ignore dependencies.
atomic_ref<int, memory_order::acquire, memory_scope_device> B_ref(B);
atomic_ref<int, memory_order::release, memory_scope_device> A_ref(A);

[[intel::ivdep]]
for (int i = 0; i < N; ++i) {
  // Assume A and B don't alias
  A_ref[i] = ...;
  ... = B_ref[i];
}
The atomic store to A has release semantics (that is, the store must happen after any memory operation that comes before it), and the atomic load from B has acquire semantics (that is, the load must happen before any memory operation that comes after it). Therefore, the ivdep attribute does not cause the compiler to ignore the dependence between the load on iteration i and the store on iteration i+1.
Examples
// No loop-carried dependencies for accesses to arrays A and B
[[intel::ivdep]]
for (int i = 0; i < N; i++) {
  A[i] = A[i - X[i]];
  B[i] = B[i - Y[i]];
}

// No loop-carried dependencies for accesses to array A
// Compiler inserts hardware that enforces dependency constraints for B
[[intel::ivdep(A)]]
for (int i = 0; i < N; i++) {
  A[i] = A[i - X[i]];
  B[i] = B[i - Y[i]];
}

// No loop-carried dependencies for array A inside struct S
[[intel::ivdep(S.A)]]
for (int i = 0; i < N; i++) {
  S.A[i] = S.A[i - X[i]];
}

// No loop-carried dependencies for array A inside the struct pointed to by S
[[intel::ivdep(S->X[2][3].A)]]
for (int i = 0; i < N; i++) {
  S->X[2][3].A[i] = S->X[2][3].A[i - X[i]];
}
// The ivdep attribute is not applied to uses of accessorA inside
// the lambda foo or the function bar
[[intel::ivdep(accessorA, safelen)]]
for (;;) {
  auto foo = [=]() {
    ...; // Uses of accessorA
  };
  bar(accessorA, ...);
}
Even when you name the array that the ivdep attribute must apply to, the compiler can still report a memory dependency on that array. Consider the following example code:
[[intel::ivdep( kSafeLen, histogram.data_ )]]
for( uint32_t n = 0; n < kInitNumInputs; ++n ) {
  // Compute the Histogram index to increment
  uint32_t hist_group = input[n] % kNumOutputs;
  auto hist_count = histogram.read( hist_group );
  hist_count++;
  histogram.write( hist_group, hist_count );
}
The compiler reports an II of 2 for this loop, along with a memory dependency on the histogram.data_ array, which is the array that the ivdep attribute names. Because histogram.data_ is accessed only inside the histogram.read() and histogram.write() function calls, the ivdep attribute has no effect on those accesses.
The [[intel::ivdep(safelen, array)]] attribute applies only to accesses that refer to that specific array by name in code that is lexically inside the loop. The array is not traced through function calls.
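One way to act on this rule is to index the array directly in the loop body, so that its name appears lexically inside the loop. The following is a hedged host-side sketch in plain C++, not the original sample: the function build_histogram, the local array counts, and the bin count of 8 are assumptions for illustration, and the attribute appears only in a comment so that the code compiles with any C++ compiler. Whether a given safelen value is actually safe for the input data is a separate question that this sketch does not address.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kNumOutputs = 8;  // hypothetical bin count

// Hypothetical restructuring of the histogram loop: the counts array is
// indexed directly in the loop body instead of through read()/write() calls.
std::vector<std::uint32_t> build_histogram(
    const std::vector<std::uint32_t>& input) {
  std::uint32_t counts[kNumOutputs] = {};

  // In kernel code the attribute would go here, naming the array that is
  // accessed lexically inside the loop (kSafeLen as in the example above):
  //   [[intel::ivdep(counts, kSafeLen)]]
  for (std::size_t n = 0; n < input.size(); ++n) {
    // Compute the histogram index to increment
    const std::uint32_t hist_group = input[n] % kNumOutputs;
    std::uint32_t hist_count = counts[hist_group];  // direct read by name
    ++hist_count;
    counts[hist_group] = hist_count;                // direct write by name
  }
  return std::vector<std::uint32_t>(counts, counts + kNumOutputs);
}
```

The loop computes the same result as the version that goes through read() and write(). The only change is that every access to the array is now visible by name in the loop body, which is the condition the attribute requires.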
If a loop with the [[intel::ivdep]] or [[intel::ivdep(safelen)]] attribute has an array declared within the scope of the loop itself, the effects of the ivdep attribute apply to the loads and stores of that array.
It is currently undefined behavior to add an ivdep attribute to a loop that includes a locally-defined array if the same indices of that array are accessed in consecutive loop iterations. Even though such a program does not technically have loop-carried dependencies, this is a current limitation of the compiler.
Consider the following example:
[[intel::ivdep]]
for (int i = 0; i < N; ++i) {
  int A[2];
  A[0] = i;
  A[1] = i+1;
  ...
  B[idx[i]] = A[i%2];
}
Although A is local to the loop, the compiler generates a single physical memory for A, so the writes to A in iteration i must happen after the read from A in iteration i-1. Removing that dependence may lead to undefined behavior.
For this example, if the accesses to B do not have loop-carried dependencies, it would be legal to replace [[intel::ivdep]] with [[intel::ivdep(B)]].
For additional information, refer to "Loop ivdep Sample" on GitHub.