Visible to Intel only — GUID: efb1517862229547
Ixiasoft
1. Introduction to Standard Edition Best Practices Guide
2. Reviewing Your Kernel's report.html File
3. OpenCL Kernel Design Best Practices
4. Profiling Your Kernel to Identify Performance Bottlenecks
5. Strategies for Improving Single Work-Item Kernel Performance
6. Strategies for Improving NDRange Kernel Data Processing Efficiency
7. Strategies for Improving Memory Access Efficiency
8. Strategies for Optimizing FPGA Area Usage
A. Additional Information
2.1. High Level Design Report Layout
2.2. Reviewing the Report Summary
2.3. Reviewing Loop Information
2.4. Reviewing Area Information
2.5. Verifying Information on Memory Replication and Stalls
2.6. Optimizing an OpenCL Design Example Based on Information in the HTML Report
2.7. HTML Report: Area Report Messages
2.8. HTML Report: Kernel Design Concepts
3.1. Transferring Data Via Channels or OpenCL Pipes
3.2. Unrolling Loops
3.3. Optimizing Floating-Point Operations
3.4. Allocating Aligned Memory
3.5. Aligning a Struct with or without Padding
3.6. Maintaining Similar Structures for Vector Type Elements
3.7. Avoiding Pointer Aliasing
3.8. Avoid Expensive Functions
3.9. Avoiding Work-Item ID-Dependent Backward Branching
4.3.4.1. High Stall Percentage
4.3.4.2. Low Occupancy Percentage
4.3.4.3. Low Bandwidth Efficiency
4.3.4.4. High Stall and High Occupancy Percentages
4.3.4.5. No Stalls, Low Occupancy Percentage, and Low Bandwidth Efficiency
4.3.4.6. No Stalls, High Occupancy Percentage, and Low Bandwidth Efficiency
4.3.4.7. Stalling Channels
4.3.4.8. High Stall and Low Occupancy Percentages
7.1. General Guidelines on Optimizing Memory Accesses
7.2. Optimize Global Memory Accesses
7.3. Performing Kernel Computations Using Constant, Local or Private Memory
7.4. Improving Kernel Performance by Banking the Local Memory
7.5. Optimizing Accesses to Local Memory by Controlling the Memory Replication Factor
7.6. Minimizing the Memory Dependencies for Loop Pipelining
Visible to Intel only — GUID: efb1517862229547
Ixiasoft
5.2. Removing Loop-Carried Dependencies Caused by Accesses to Memory Arrays
Include the ivdep pragma in your single work-item kernel to assert that accesses to memory arrays will not cause loop-carried dependencies.
During compilation, the creates hardware that ensures load and store instructions operate within dependency constraints. An example of a dependency constraint is that dependent load and store instructions must execute in order. The presence of the ivdep pragma instructs the offline compiler to remove this extra hardware between load and store instructions in the loop that immediately follows the pragma declaration in the kernel code. Removing the extra hardware might reduce logic utilization and lower the II value in single work-item kernels.
- If all accesses to memory arrays that are inside a loop will not cause loop-carried dependencies, add the line #pragma ivdep before the loop in your kernel code.
Example kernel code:
// no loop-carried dependencies for A and B array accesses #pragma ivdep for (int i = 0; i < N; i++) { A[i] = A[i - X[i]]; B[i] = B[i - Y[i]]; }
- To specify that accesses to a particular memory array inside a loop will not cause loop-carried dependencies, add the line #pragma ivdep array (array_name) before the loop in your kernel code.
The array specified by the ivdep pragma must be a local or private memory array, or a pointer variable that points to a global, local, or private memory storage. If the specified array is a pointer, the ivdep pragma also applies to all arrays that may alias with specified pointer.
The array specified by the ivdep pragma can also be an array or a pointer member of a struct.
Example kernel code:
// No loop-carried dependencies for A array accesses // The offline compiler will insert hardware that reinforces dependency constraints for B #pragma ivdep array(A) for (int i = 0; i < N; i++) { A[i] = A[i - X[i]]; B[i] = B[i - Y[i]]; } // No loop-carried dependencies for array A inside struct #pragma ivdep array(S.A) for (int i = 0; i < N; i++) { S.A[i] = S.A[i - X[i]]; } // No loop-carried dependencies for array A inside the struct pointed by S #pragma ivdep array(S->X[2][3].A) for (int i = 0; i < N; i++) { S->X[2][3].A[i] = S.A[i - X[i]]; } // No loop-carried dependencies for A and B because ptr aliases // with both arrays int *ptr = select ? A : B; #pragma ivdep array(ptr) for (int i = 0; i < N; i++) { A[i] = A[i - X[i]]; B[i] = B[i - Y[i]]; } // No loop-carried dependencies for A because ptr only aliases with A int *ptr = &A[10]; #pragma ivdep array(ptr) for (int i = 0; i < N; i++) { A[i] = A[i - X[i]]; B[i] = B[i - Y[i]]; }