Use Lower Math Precision
OpenCL™ offers two basic ways to trade precision for speed:
- The native_* and half_* math built-ins, which have lower precision but run faster than their unprefixed variants.
- Compiler options that relax floating-point arithmetic for the whole OpenCL program, for example, the -cl-fast-relaxed-math flag.
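As an illustration of the second approach, build options are passed as a string to clBuildProgram on the host. The fragment below is a minimal sketch, not a complete host program: the helper name build_with_relaxed_math is hypothetical, and the program and device handles are assumed to have been created elsewhere.

```c
#include <CL/cl.h>

/* Hypothetical helper: builds an existing program object for one device
   with relaxed floating-point math enabled for every kernel in it.
   `program` and `device` are assumed to be valid handles created elsewhere. */
cl_int build_with_relaxed_math(cl_program program, cl_device_id device)
{
    /* The option applies to the whole program, not to individual kernels. */
    const char *options = "-cl-fast-relaxed-math";

    /* No completion callback: the call returns when the build finishes. */
    return clBuildProgram(program, 1, &device, options, NULL, NULL);
}
```

Because the flag is set per program, kernels that need full accuracy can be kept in a separate program object that is built without it.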
In general, while the -cl-fast-relaxed-math flag is a quick way to get performance gains for kernels with many math operations, it applies to the whole program and does not allow fine-grained control over numeric accuracy. Consider experimenting with the native_* equivalents separately for each specific case, keeping track of the resulting accuracy.
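One way to keep track of accuracy is to run the full-precision and the lower-precision versions of a kernel on the same input and compare the two output buffers on the host. The following is a minimal sketch in plain C, not part of any OpenCL API: the function name max_rel_error and the tiny threshold parameter are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical helper: returns the largest relative error of `test`
   against `ref` over n elements. Where the reference magnitude is not
   above `tiny`, the absolute error is used instead, to avoid dividing
   by a value close to zero. */
static double max_rel_error(const float *ref, const float *test,
                            size_t n, double tiny)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double r = (double)ref[i];
        double diff = (double)test[i] - r;
        if (diff < 0.0) diff = -diff;          /* |test - ref| */
        double mag = r < 0.0 ? -r : r;         /* |ref| */
        double err = mag > tiny ? diff / mag : diff;
        if (err > worst) worst = err;
    }
    return worst;
}
```

Calling this on buffers read back from both kernel variants gives a single number to compare against the error tolerance of the application.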
The native_* versions of math built-ins are supported in hardware and run substantially faster, at the cost of lower accuracy. Use the native trigonometric and transcendental functions, such as native_sin, native_cos, native_exp, and native_log, when performance is more important than precision.
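The difference in kernel code is limited to the function names. The OpenCL C sketch below shows the same computation written both ways. The kernel names and the computation itself are hypothetical and chosen only for illustration.

```c
// Full-precision variant: exp and sin meet the accuracy requirements
// of the OpenCL specification.
__kernel void damped_wave(__global const float *x, __global float *y)
{
    size_t i = get_global_id(0);
    y[i] = exp(-x[i]) * sin(x[i]);
}

// Lower-precision variant: native_exp and native_sin map to device
// instructions, and their accuracy is implementation-defined.
__kernel void damped_wave_native(__global const float *x, __global float *y)
{
    size_t i = get_global_id(0);
    y[i] = native_exp(-x[i]) * native_sin(x[i]);
}
```

Keeping both variants during development makes it straightforward to measure the speedup and the accuracy loss on the target device before committing to the native_* version.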
For a full list of OpenCL build options and their descriptions, refer to the OpenCL specification.
See Also
Use Restrict Qualifier for Kernel Arguments
Intel Tools for OpenCL™ Applications
OpenCL™ 1.2 Specification at https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf