Visible to Intel only — GUID: GUID-8DDAD0A8-9253-4C84-AC9E-726EFA2A66BD
Visible to Intel only — GUID: GUID-8DDAD0A8-9253-4C84-AC9E-726EFA2A66BD
Host-Side Analysis Optimization Tips
While you run the host-side performance analysis, the Code Analyzer identifies inefficient use of the OpenCL API. When the analysis is done, a TIPS screen appears, showing all the detected issues, each issue also has a short description.
Click a specific tip to open a related report and highlight the data within the report which is relevant to this tip. In addition a popup window appears, with a recommendation how to fix the reported issue:
The following table summarizes the recommendations that are reported from the Tips.
int Title | Description | Recommendation |
---|---|---|
Inefficient clCreateBuffer | The host program includes a call to clCreateBuffer, where flags includes CL_MEM_COPY_HOST_PTR. | There are two ways to ensure zero-copy path on memory objects mapping. For best results, allocate memory with CL_MEM_ALLOC_HOST_PTR, this method ensures that the memory is efficiently mirrored on the host. Another way is to allocate properly aligned and sized memory yourself and share the pointer with the OpenCL framework by using the CL_MEM_USE_HOST_PTR flag. |
clCreateBuffer call where host_ptr is not 4K aligned. | The host program includes a call to clCreateBuffer where host_ptr is not 4K aligned. | For best results, align memory address to host memory page (4K bytes) |
clCreateBuffer call where size is not a multiple of 64 bytes | The host program includes call to clCreateBuffer where size is not a multiple of 64 bytes. | For best results, make sure that the amount of memory you allocate and the size of the corresponding OpenCL buffer is a multiple of the cache line sizes (64 bytes). |
Redundant calls to clBuildProgram | The host program includes several calls to clBuildProgram with the same arguments. | When possible, call clGetProgramInfo to retrieve binaries generated from calls to clCreateProgramWithSource and clBuildProgram. |
Redundant calls to clCompileProgram | The host program includes several calls to clCompileProgram with the same arguments. | When possible, call clGetProgramInfo to retrieve previously compiled binaries. |
Redundant calls to clCreateContextFromType. | The host program includes calls to clCreateContextFromType with the same arguments. | Consider using the same OpenCL context instead of recreating it. |
Redundant calls to clCreateContext. | The host program includes calls to clCreateContext with the same arguments. | Consider using the same OpenCL context instead of recreating it. |
Redundant calls to clCreateCommandQueue | The host program includes several calls to clCreateCommandQueue that refer to the same device | Consider using the same command-queue to access the device. |
Redundant calls to clCreateCommandQueueWithProperties | The host program includes several calls to clCreateCommandQueueWithProperties that refer to the same device. | Consider using the same command-queue to access the device. |
clEnqueueReadBuffer call | The host program includes several calls to clEnqueueReadBuffer | When possible, use clEnqueueMapBuffer and clEnqueueUnmapMemObject instead of calls to clEnqueueReadBuffer or clEnqueueWriteBuffer. |
clEnqueueWriteBuffer call | The host program includes several calls to clEnqueueWriteBuffer | When possible, use clEnqueueMapBuffer and clEnqueueUnmapMemObject instead of calls to clEnqueueReadBuffer or clEnqueueWriteBuffer. |
clEnqueueReadImage call | The host program includes several calls to clEnqueueReadImage | When possible, use clEnqueueMapImage and clEnqueueUnmapMemObject instead of calls to clEnqueueReadImage or clEnqueueWriteImage. |
clEnqueueWriteImage call | The host program includes several calls to clEnqueueWriteImage | When possible, use clEnqueueMapImage and clEnqueueUnmapMemObject instead of calls to clEnqueueReadImage or clEnqueueWriteImage. |
clEnqueueReadBufferRect call | The host program includes several calls to clEnqueueReadBufferRect | When possible, use clEnqueueMapBuffer and clEnqueueUnmapMemObject instead of calls to clEnqueueReadBufferRect or clEnqueueWriteBufferRect. |
clEnqueueWriteBufferRect call | The host program includes several calls to clEnqueueWriteBufferRect | When possible, use clEnqueueMapBuffer and clEnqueueUnmapMemObject instead of calls to clEnqueueReadBufferRect or clEnqueueWriteBufferRect. |
The work-group dimensions are defined as "column" work-group | The host program includes a call to clEnqueueNDRange where the work-group dimensions are defined as "column" work-group. | When reading from memory, best to reorganize the work-group to read in lines instead of columns. |
Performance Information | Kernel register pressure is too high, spill fills will be generated. Additional surface needs to be allocated. | Consider simplifying your kernel. |
Performance Information | Kernel private memory usage is too high and exhaust register space. Additional surface needs to be allocated. | Consider reducing the amount of private memory used, avoid using private memory arrays. |
Performance Information | Local workgroup sizes selected for this workload may not be optimal | consider using a different local workgroup size, |
Performance Information | Not aligned surface detected. Driver needs to disable L3 caching. | |
Performance Information | Kernel submission requires coherency with CPU, this may impact performance. | |
Performance Information | Null local workgroup size detected, Following sizes will be used for execution |