Use Branching Accurately
You can improve performance on Intel® Core™ and Intel® Xeon® processors by converting uniform conditions, that is, conditions that evaluate the same way for all work-items, into compile-time branches.
A common alternative is to write a single kernel that implements all desired behaviors and let the host logic disable the paths that are not currently required by passing constants as kernel arguments. However, branching on such constants wastes device resources, because the data is still calculated before it is discarded. Consider an approach based on preprocessor directives instead: use #ifdef blocks.
Consider the example where the original kernel uses constants for branching:
__kernel void foo(__constant int* src,
                  __global int* dst,
                  unsigned char bFullFrame,
                  unsigned char bAlpha)
{
    …
    if (bFullFrame) // uniform condition (equal for all work-items)
    {
        …
        if (bAlpha) // uniform condition
        {
            …
        }
        else
        {
            …
        }
    }
    else
    {
        …
    }
}
Now consider the same kernel with compile-time branches (the “specialization” technique):
__kernel void foo(__constant int* src,
                  __global int* dst)
{
    …
#ifdef bFullFrame
    {
        …
#ifdef bAlpha
        {
            …
        }
#else
        {
            …
        }
#endif
    }
#else
    {
        …
    }
#endif
}
Also consider similar optimization for other constants.
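For the specialized kernel to take a given path, the host has to define bFullFrame and bAlpha when it builds the program. The following host-side C sketch shows one way to assemble the build options. The helper name make_build_options and the buffer size are assumptions made for illustration. The only OpenCL facilities it relies on are the standard -D compiler option and the options argument of clBuildProgram, which appears only in a comment so the snippet compiles without the OpenCL headers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes compiler options that define the macro names
   the specialized kernel tests with #ifdef.
   Returns 0 on success, -1 if the buffer is too small. */
int make_build_options(char *buf, size_t size, int full_frame, int alpha)
{
    int n = snprintf(buf, size, "%s%s",
                     full_frame ? "-D bFullFrame " : "",
                     alpha ? "-D bAlpha " : "");
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Intended use with the OpenCL runtime (not compiled here):
 *
 *   char opts[64];
 *   if (make_build_options(opts, sizeof opts, 1, 0) == 0)
 *       clBuildProgram(program, 1, &device, opts, NULL, NULL);
 */
```

Each combination of options produces a separately compiled program, so a host that switches between modes at run time would typically build the variants it needs once and keep them, rather than rebuilding on every switch.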
Minimize or, in the best case, avoid branching in short computations by using the min, max, clamp, and select built-ins instead of if and else clauses.
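As a small illustration of replacing if and else clauses with built-ins, the sketch below pairs two short branching helpers with equivalents that call clamp and max. The function names are hypothetical. The block under #ifndef __OPENCL_VERSION__ provides plain C stand-ins, an assumption made only so that the same source can be compiled and checked on a host. In OpenCL C, the built-ins are used directly.

```c
#ifndef __OPENCL_VERSION__
/* Host-only stand-ins (assumption for illustration).
   OpenCL C provides max and clamp as built-ins. */
static int max(int a, int b) { return a > b ? a : b; }
static int clamp(int x, int lo, int hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}
#endif

/* Branching version: limits v to the range [lo, hi]. */
int saturate_branchy(int v, int lo, int hi)
{
    if (v < lo)
        return lo;
    else if (v > hi)
        return hi;
    return v;
}

/* Same result with the clamp built-in (requires lo <= hi). */
int saturate_builtin(int v, int lo, int hi)
{
    return clamp(v, lo, hi);
}

/* Branching version: replaces negative values with zero. */
int floor_at_zero_branchy(int v)
{
    if (v < 0)
        return 0;
    else
        return v;
}

/* Same result with the max built-in. */
int floor_at_zero_builtin(int v)
{
    return max(v, 0);
}
```

Whether the built-in form is actually faster depends on the device and compiler, so it is worth measuring both versions on the target hardware.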
Move memory accesses that are common to the then and else blocks outside of the conditional code.
Consider the original code with use of the if and else clauses:
if (…) {             // condition
    x = A[i1];       // reading from A
    …                // calculations
    B[i2] = y;       // storing into B
} else {
    q = A[i1];       // reading from A with same index as in first clause
    …                // different calculations
    B[i2] = w;       // storing into B with same index as in first clause
}
Now consider the optimized code that uses temporary variables:
temp1 = A[i1];       // reading from A in advance
if (…) {             // condition
    x = temp1;
    …                // some calculations
    temp2 = y;       // storing into temporary variable
} else {
    q = temp1;
    …                // some calculations
    temp2 = w;       // storing into temporary variable
}
B[i2] = temp2;       // storing to B once