IMF Device Library
The Intel Math Functions (IMF) Device Library is a set of standard math functions implemented for execution on SYCL devices (GPUs, CPUs, and accelerators). Most IMF functions comply with the ISO C99, SYCL, OpenCL™, and IEEE 754 standards in terms of computed results and handling of IEEE 754 special values.
The function interfaces are available through the header file:
#include <sycl/ext/intel/math.hpp>
Accuracy
IMF device library functions are provided at the following accuracy levels:
- default: Default accuracy, compliant with the best of the OpenCL/SYCL/CUDA requirements.
- ha: High accuracy (maximum error is not greater than 1.0 units in the last place (ULP)).
- la: Low accuracy (maximum error is not greater than 4.0 ULP).
- ep: Enhanced performance (where approximately half of the mantissa bits are correct).
Every function provides a default-accuracy implementation. A subset of functions also provides ha, la, and ep variants so that accuracy can be traded against performance in specific use cases.
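The ULP bounds above can be made concrete with a small host-side sketch in standard C++. It does not use the IMF library itself; the helper names `ulp_size_f32`, `ulp_error_f32`, and `approx_correct_bits` are hypothetical, and the conversion from ULP error to "correct bits" is only a rule of thumb, assuming IEEE 754 binary32 for `float`.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Spacing between adjacent float values at the magnitude of `ref`.
double ulp_size_f32(double ref) {
    assert(std::isfinite(ref));
    const int min_exp = std::numeric_limits<float>::min_exponent;  // -125 for binary32
    int exp = 0;
    (void)std::frexp(ref, &exp);  // ref = m * 2^exp with 0.5 <= |m| < 1
    if (ref == 0.0 || exp < min_exp) {
        exp = min_exp;  // zero and the subnormal range share the spacing 2^-149
    }
    return std::ldexp(1.0, exp - std::numeric_limits<float>::digits);
}

// Error of a float result against a higher-precision reference, in float ULPs.
double ulp_error_f32(float got, double ref) {
    return std::fabs(static_cast<double>(got) - ref) / ulp_size_f32(ref);
}

// Rule-of-thumb estimate of how many leading significand bits are still correct
// for a format with `digits` significand bits (24 for float, 53 for double).
double approx_correct_bits(double ulp_err, int digits) {
    if (ulp_err <= 1.0) return digits;
    return digits - std::log2(ulp_err);
}
```

Under this rule of thumb, an error of 4096 ULP in fp32 leaves about 12 of the 24 significand bits correct, which is the order of magnitude the ep level describes. Comparing `ulp_error_f32(result, reference)` against 1.0 or 4.0 is one way to check a result against the ha or la bound.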
The additional accuracy versions are available through the nested namespaces ha, la, and ep. For example:
float sycl::ext::intel::math::acos ( float x );     // default accuracy
float sycl::ext::intel::math::ha::acos ( float x ); // ha (High Accuracy)
float sycl::ext::intel::math::la::acos ( float x ); // la (Low Accuracy)
float sycl::ext::intel::math::ep::acos ( float x ); // ep (Enhanced Performance)
The following table shows the accuracy of the math functions. Accuracy is measured in ULPs on uniformly distributed random input values over commonly used, function-specific working intervals, with the addition of:
- Values with random mantissa and all possible exponent fields.
- Corner cases (sub-normals, largest normal values, etc.).
- IEEE 754 special numbers (zeros, infinities, NaNs, etc.).
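The input categories listed above can be sketched in standard C++ as follows. This is an illustrative generator for fp32 only, not the actual test harness; the function names, the seed handling, and the choice of one random-mantissa value per exponent field are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <random>
#include <vector>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit IEEE 754 float");

// Assemble a float from its sign bit, 8-bit exponent field, and 23-bit mantissa field.
float float_from_fields(std::uint32_t sign, std::uint32_t exp_field, std::uint32_t mantissa) {
    const std::uint32_t bits =
        ((sign & 1u) << 31) | ((exp_field & 0xFFu) << 23) | (mantissa & 0x7FFFFFu);
    float value;
    std::memcpy(&value, &bits, sizeof value);  // portable bit reinterpretation
    return value;
}

// Corner cases (subnormals, smallest and largest normals) and IEEE 754 special numbers.
std::vector<float> corner_and_special_values() {
    using lim = std::numeric_limits<float>;
    return {lim::denorm_min(), -lim::denorm_min(), lim::min(), lim::max(), lim::lowest(),
            0.0f, -0.0f, lim::infinity(), -lim::infinity(), lim::quiet_NaN()};
}

// Uniform samples on the working interval [lo, hi], then one random-mantissa value
// for each of the 256 exponent fields, then the corner and special values.
std::vector<float> make_test_inputs(float lo, float hi, std::size_t n_uniform, unsigned seed) {
    assert(lo < hi);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(lo, hi);
    std::vector<float> inputs;
    for (std::size_t i = 0; i < n_uniform; ++i) {
        inputs.push_back(dist(rng));
    }
    for (std::uint32_t exp_field = 0; exp_field < 256; ++exp_field) {
        const std::uint32_t sign = static_cast<std::uint32_t>(rng() >> 31);
        const std::uint32_t mantissa = static_cast<std::uint32_t>(rng());
        inputs.push_back(float_from_fields(sign, exp_field, mantissa));
    }
    const std::vector<float> extra = corner_and_special_values();
    inputs.insert(inputs.end(), extra.begin(), extra.end());
    return inputs;
}
```

Exponent field 0 yields zeros and subnormals and field 255 yields infinities and NaNs, so the second loop already overlaps with the special values; the explicit list makes sure the exact boundary values are always present.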
PRECISION | fp64 (double) | fp32 (float) | fp16 (sycl::half) | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
ACCURACY | default | ha | la | ep | default | ha | la | ep | default | ha | la | ep |
acos | 0.79 | 0.79 | 2.27 | 4.0E+07 | 3.0 | 0.78 | 3.0 | 525.0 | ||||
asin | 0.72 | 0.72 | 2.61 | 4.1E+07 | 3.73 | 0.69 | 3.73 | 535.0 | ||||
atan | 0.65 | 0.65 | 2.14 | 2.2E+07 | 0.87 | 0.87 | 3.05 | 2.2E+03 | ||||
atan2 | 0.76 | 0.76 | 2.31 | 2.2E+07 | 2.65 | 0.87 | 2.65 | 436 | ||||
acosh | 1.37 | 0.89 | 1.37 | 1.39 | 0.86 | 1.39 | 1.6E+03 | |||||
asinh | 1.6 | 0.62 | 1.6 | 1.58 | 0.68 | 1.58 | 1.6E+03 | |||||
atanh | 2.12 | 0.65 | 2.12 | 1.85 | 0.56 | 1.85 | 1.5E+03 | |||||
ceil | 0.0 | 0.0 | ||||||||||
cbrt | 0.73 | 0.79 | ||||||||||
copysign | 0.0 | 0.0 | ||||||||||
cdfnorm*** | 1.0 | 1.12 | ||||||||||
cdfnorminv*** | 2.0 | 3.46 | ||||||||||
cos | 0.85 | 0.85 | 3.23 | 6.1E+07 | 1.79 | 0.64 | 1.79 | 2.5E+03 | 1.43 | |||
cosh | 0.75 | 0.75 | 1.42 | 1.99 | 0.56 | 1.99 | 380.0 | |||||
cospi | 1.0 | 1.78 | ||||||||||
erf | 0.82 | 0.82 | 2.07 | 7.03 | 0.90 | 0.90 | 2.16 | 6.33 | ||||
erfc | 2.92 | 0.75 | 2.92 | 2.72 | 0.76 | 2.72 | ||||||
erfcinv | 1.0 | 3.15 | ||||||||||
erfcx | 2.0 | 2.34 | ||||||||||
erfinv | 1.41 | 1.0 | ||||||||||
exp10 | 1.0 | 0.51 | 1.00 | 2.8E+07 | 0.93 | 0.93 | ||||||
exp2 | 0.71 | 0.71 | 1.07 | 6.0E+04 | 0.68 | 0.68 | 1.66 | |||||
exp | 0.92 | 0.92 | 1.25 | 1.7E+07 | 0.82 | 0.82 | 1.61 | 0.83 | 1.61 | |||
expm1 | 0.75 | 0.75 | 1.76 | 1.1E+07 | 0.74 | 0.74 | 1.69 | 328.0 | ||||
fdim | 0.0 | 0.0 | ||||||||||
floor | 0.0 | 0.0 | ||||||||||
fmod | 0.0 | 0.0 | ||||||||||
frexp | 0.0 | 0.0 | ||||||||||
hypot | 1.12 | 0.85 | 1.12 | 0.96 | 0.5 | 0.96 | ||||||
cyl_bessel_i0 | 1.36 | 5.21 | ||||||||||
cyl_bessel_i1 | 2.77 | 5.69 | ||||||||||
j0 | 3.81 | 2.78 | ||||||||||
j1 | 3.01 | 2.38 | ||||||||||
jn | 2.7E+03 | 8.0E+01 | ||||||||||
lgamma | 3.52 | 2.99 | ||||||||||
ilogb | 0.0 | 0.0 | ||||||||||
isfinite | 0.0 | 0.0 | ||||||||||
isinf | 0.0 | 0.0 | ||||||||||
isnan | 0.0 | 0.0 | ||||||||||
ldexp | 0.0 | 0.0 | ||||||||||
llrint | 0.0 | 0.0 | ||||||||||
llround | 0.0 | 0.0 | ||||||||||
log | 0.5 | 0.5 | 1.35 | 4.0E+07 | 0.94 | 0.94 | 1.14 | 1.5E+03 | 0.59 | |||
log10 | 0.5 | 0.5 | 1.9 | 1.58 | 0.72 | 1.58 | 989.0 | 0.58 | ||||
log1p | 0.77 | 0.77 | 1.6 | 0.55 | 0.55 | 1.73 | 1.6E+03 | |||||
log2 | 0.5 | 0.5 | 1.58 | 0.71 | 0.71 | 1.93 | 889.0 | 0.6 | ||||
logb | 0.0 | 0.0 | ||||||||||
lrint | 0.0 | 0.0 | ||||||||||
lround | 0.0 | 0.0 | ||||||||||
modf | 0.0 | 0.0 | ||||||||||
nan | 0.0 | 0.0 | ||||||||||
nearbyint | 0.0 | 0.0 | ||||||||||
nextafter | 0.0 | 0.0 | ||||||||||
norm | 1.31 | 1.46 | ||||||||||
norm3d | 0.5 | 1.04 | ||||||||||
norm4d | 0.5 | 1.09 | ||||||||||
pow | 0.98 | 0.85 | 0.98 | 1.05 | 0.78 | 1.05 | 1.8E+03 | |||||
powi | 1.48 | 18.4 | ||||||||||
rcbrt | 0.53 | 0.85 | ||||||||||
remainder | 0.0 | 0.0 | ||||||||||
remquo | 0.0 | 0.0 | ||||||||||
rhypot | 0.75 | 1.36 | ||||||||||
rint | 0.0 | 0.0 | ||||||||||
rnorm | 2.2 | 1.66 | ||||||||||
rnorm3d | 0.74 | 1.24 | ||||||||||
rnorm4d | 0.75 | 1.26 | ||||||||||
round | 0.0 | 0.0 | ||||||||||
saturate | 0.0 | |||||||||||
scalbn | 0.0 | 0.0 | ||||||||||
signbit | 0.0 | 0.0 | ||||||||||
sin | 0.85 | 0.85 | 3.15 | 6.1E+07 | 1.96 | 0.65 | 1.96 | 2.5E+03 | 1.88 | |||
sincos | 1.49 | 0.85 | 1.49 | 2.8E+07 | 2.38 | 0.86 | 2.38 | |||||
sincospi | 2.0 | 1.78 | ||||||||||
sinh | 1.74 | 0.79 | 1.74 | 1.34 | 0.68 | 1.34 | 1.1E+03 | |||||
sinpi | 1.0 | 1.78 | ||||||||||
tan | 0.52 | 0.52 | 3.01 | 5.2E+07 | 3.88 | 0.76 | 3.88 | |||||
tanh | 0.65 | 0.65 | 2.11 | 0.57 | 0.57 | 1.36 | 1.5E+03 | |||||
tgamma | 9.06 | 3.01 | ||||||||||
trunc | 0.0 | 0.0 | ||||||||||
y0 | 5.47 | 3.2 | ||||||||||
y1 | 3.64 | 4.86 | ||||||||||
yn | 2.0E+03 | 145.0 |
The accuracy of the inlined functions inv, sqrt, and rsqrt is defined by the OpenCL™/SYCL standards and may be affected by the -f[no-]fast-math compiler switch.
The reported ULP values are obtained via random sampling over a large number of data points. The actual ULP error might be higher for specific argument values.
*** The cdfnorm and cdfnorminv functions have CUDA-specific aliases, normcdf and normcdfinv, which map to the same computation kernels.
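The note on random sampling means that a reported maximum is a lower bound on the true worst-case error. A minimal host-side sketch in standard C++ illustrates this. Here `rough_sin` is a hypothetical low-accuracy stand-in, not an IMF function, and the ULP measure is a simplified one based on `std::nextafter`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>

// Error of a float result, in units of the float spacing just above |ref|.
double ulp_error(float got, double ref) {
    const float r = static_cast<float>(std::fabs(ref));
    const float next = std::nextafter(r, std::numeric_limits<float>::infinity());
    const double spacing = static_cast<double>(next) - static_cast<double>(r);
    return std::fabs(static_cast<double>(got) - ref) / spacing;
}

// Hypothetical low-accuracy function: degree-5 Taylor polynomial for sin(x).
float rough_sin(float x) {
    const float x2 = x * x;
    return x * (1.0f - x2 / 6.0f + x2 * x2 / 120.0f);
}

// Double-precision reference used to judge the float result.
double reference_sin(double x) { return std::sin(x); }

// Largest ULP error observed over n uniformly distributed samples in [lo, hi].
template <class Approx, class Reference>
double sampled_max_ulp(Approx approx, Reference reference, float lo, float hi,
                       std::size_t n, unsigned seed) {
    assert(lo < hi);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(lo, hi);
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const float x = dist(rng);
        worst = std::max(worst, ulp_error(approx(x), reference(static_cast<double>(x))));
    }
    return worst;
}
```

With a fixed seed, a longer run replays the shorter run's samples and then continues, so `sampled_max_ulp(rough_sin, reference_sin, 0.1f, 1.0f, 100000, 7u)` can only match or exceed the value obtained with 100 samples. An argument that was never drawn may still produce a larger error than any reported value.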
- IMF Device Library Usage Example
- IMF Device Library Function List
- IMF Device Library Trigonometric Functions
- IMF Device Library Hyperbolic Functions
- IMF Device Library Exponential Functions
- IMF Device Library Logarithmic Functions
- IMF Device Library Power Functions
- IMF Device Library Special Functions
- IMF Device Library Rounding Functions
- IMF Device Library Miscellaneous Functions