Comparing OpenCL™ and Native Code Performance

OpenCL™ Developer Guide for Intel® Core™ and Intel® Xeon® Processors

ID 773005

Date 10/30/2018

Version 2018

When comparing the performance of an OpenCL™ kernel running on a CPU device with the performance of native code, make sure that both versions of the code are as similar as possible. Consider the following guidelines:

Wrap exactly the same set of operations.
Do not include program build time in the kernel execution time. You can amortize this step by precompiling the program and loading it with the clCreateProgramWithBinary call.
Track data transfer costs separately.
Use data mapping to make data transfers resemble the way data is passed in native code (through pointers). Refer to the Mapping Memory Objects (USE_HOST_PTR) section.
Ensure the working set is identical for native and OpenCL code.
Make the memory access patterns identical (for example, row-wise in both versions, not row-wise in one and column-wise in the other).
Demand the same accuracy. For example, on a CPU device the OpenCL rsqrt(x) built-in is inherently more accurate than the _mm_rsqrt_ps SSE intrinsic. To get the same accuracy in native code and OpenCL code, do one of the following:
- Follow _mm_rsqrt_ps in your native code with a couple of additional Newton-Raphson iterations to match the precision of OpenCL™ rsqrt.
- Use native_rsqrt in your OpenCL™ kernel, which maps exactly to the rsqrtps instruction in the final assembly code.
- Use the relaxed-math compilation flag (-cl-fast-relaxed-math) to get similar accuracy for the whole program. As with rsqrt, you can use the relaxed versions of rcp, sqrt, and so on. Refer to the Developer Guide for Intel® SDK for OpenCL™ Applications for the full list of supported functions.
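The guidelines on wrapping the same set of operations and tracking data transfer costs separately can be illustrated with a minimal native-side timing harness in C. This is a sketch under stated assumptions: the saxpy workload and the names saxpy_native, time_saxpy, phase_times, and now_seconds are hypothetical, the memcpy calls stand in for data staging, and C11 timespec_get is used as a wall clock. Each phase is timed on its own, so the compute figure covers only the kernel work and can be compared with the OpenCL kernel execution time alone.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Wall-clock time in seconds, using C11 timespec_get. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical per-phase timing record. */
typedef struct {
    double copy_in;   /* seconds spent staging input data   */
    double compute;   /* seconds spent in the kernel itself */
    double copy_out;  /* seconds spent copying results back */
} phase_times;

/* Hypothetical native "kernel": the same operation the OpenCL kernel performs. */
static void saxpy_native(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Stage the inputs, run the kernel, copy the result out, and time each
   phase separately so transfer cost never leaks into the compute figure. */
phase_times time_saxpy(float a, const float *x_src, const float *y_src,
                       float *x_work, float *y_work, float *y_dst, size_t n)
{
    phase_times t;

    double t0 = now_seconds();
    memcpy(x_work, x_src, n * sizeof *x_work);
    memcpy(y_work, y_src, n * sizeof *y_work);
    double t1 = now_seconds();
    saxpy_native(a, x_work, y_work, n);
    double t2 = now_seconds();
    memcpy(y_dst, y_work, n * sizeof *y_dst);
    double t3 = now_seconds();

    t.copy_in = t1 - t0;
    t.compute = t2 - t1;
    t.copy_out = t3 - t2;
    return t;
}
```

In a real measurement, repeat each phase several times and discard the first (cold-cache) run; a single call only shows the structure.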
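The access-pattern guideline can be illustrated with a small C sketch (the function names are hypothetical). Both functions sum the same row-major matrix, but one walks it row-wise (unit stride) and the other column-wise (a stride of cols elements). If the native code uses one order and the OpenCL kernel effectively uses the other, the timing difference reflects the memory access pattern rather than OpenCL itself.

```c
#include <stddef.h>

/* Row-wise traversal of a row-major matrix: consecutive iterations touch
   consecutive addresses (unit stride), which is cache- and SIMD-friendly. */
double sum_row_wise(const float *m, size_t rows, size_t cols)
{
    double s = 0.0;
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            s += m[r * cols + c];
    return s;
}

/* Column-wise traversal of the same matrix: consecutive iterations are
   cols elements apart. Same sum up to floating-point rounding order,
   but a different memory access pattern and usually a different cost. */
double sum_column_wise(const float *m, size_t rows, size_t cols)
{
    double s = 0.0;
    for (size_t c = 0; c < cols; ++c)
        for (size_t r = 0; r < rows; ++r)
            s += m[r * cols + c];
    return s;
}
```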
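The first accuracy option can be sketched as follows, assuming an x86 target with SSE (xmmintrin.h). The _mm_rsqrt_ps estimate has a maximum relative error below 1.5*2^-12, and each Newton-Raphson step y' = y * (1.5 - 0.5 * x * y * y) roughly doubles the number of correct bits. The names rsqrt_refined and rsqrt_array are hypothetical, and the number of iterations you need to match OpenCL rsqrt should be checked against your own accuracy requirements.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rsqrt_ps, _mm_mul_ps, ... */

/* Approximate 1/sqrt(x) for four floats with rsqrtps, then refine the
   estimate with the given number of Newton-Raphson iterations. */
static __m128 rsqrt_refined(__m128 x, int iterations)
{
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);
    __m128 y = _mm_rsqrt_ps(x);              /* low-precision estimate */
    for (int i = 0; i < iterations; ++i) {
        __m128 xyy = _mm_mul_ps(_mm_mul_ps(x, y), y);
        /* y = y * (1.5 - 0.5 * x * y * y) */
        y = _mm_mul_ps(y, _mm_sub_ps(three_halves, _mm_mul_ps(half, xyy)));
    }
    return y;
}

/* Apply rsqrt_refined to an array: four elements at a time, then the
   remaining n % 4 elements one by one. */
void rsqrt_array(const float *in, float *out, int n, int iterations)
{
    int i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i, rsqrt_refined(_mm_loadu_ps(in + i), iterations));
    for (; i < n; ++i)
        out[i] = _mm_cvtss_f32(rsqrt_refined(_mm_set1_ps(in[i]), iterations));
}
```

With iterations set to 0, rsqrt_array returns the raw rsqrtps estimate, which is useful for checking how much accuracy the refinement buys on your data.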
