Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference

ID 767253
Date 6/24/2024
Public

IMF Device Library

The Intel Math Functions (IMF) Device Library is a set of standard math functions implemented for execution on SYCL devices (GPU, CPU, and accelerators). Most IMF functions comply with the ISO C99, SYCL, OpenCL™, and IEEE 754 standards in terms of computed outputs and the handling of IEEE 754 special values.

The function interfaces are available through the header file:

#include <sycl/ext/intel/math.hpp>

Accuracy

The IMF device library functions provide the following accuracy levels:

  • default: Default accuracy compliant to the best of OpenCL/SYCL/CUDA requirements.
  • ha: High accuracy (error is not greater than 1.0 units in the last place (ULP)).
  • la: Low accuracy (error is not greater than 4.0 ULP).
  • ep: Enhanced performance (where approximately half of the mantissa bits are correct).

All functions provide a default accuracy implementation. A subset of functions also provides ha, la, and ep variants, which let you trade accuracy for performance in specific use cases.

The additional accuracy versions are available through the nested namespaces ha, la, and ep. For example:

float sycl::ext::intel::math::acos ( float x );     // default accuracy
float sycl::ext::intel::math::ha::acos ( float x ); // ha (High Accuracy)
float sycl::ext::intel::math::la::acos ( float x ); // la (Low Accuracy)
float sycl::ext::intel::math::ep::acos ( float x ); // ep (Enhanced Performance)

The following table shows the accuracy of the math functions. Accuracy is measured in ULPs on uniformly distributed random input values over commonly used, function-specific work intervals, with the addition of:

  • Values with random mantissa and all possible exponent fields.
  • Corner cases (sub-normals, largest normal values, etc.).
  • IEEE 754 special numbers (zeroes, infinities, NaNs, etc.).

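PRECISION     | fp64 (double)                   | fp32 (float)                    | fp16 (sycl::half)
ACCURACY      | default | ha   | la   | ep      | default | ha   | la   | ep      | default | ha   | la   | ep
--------------+---------+------+------+---------+---------+------+------+---------+---------+------+------+----
acos          | 0.79    | 0.79 | 2.27 | 4.0E+07 | 3.0     | 0.78 | 3.0  | 525.0   |         |      |      |
asin          | 0.72    | 0.72 | 2.61 | 4.1E+07 | 3.73    | 0.69 | 3.73 | 535.0   |         |      |      |
atan          | 0.65    | 0.65 | 2.14 | 2.2E+07 | 0.87    | 0.87 | 3.05 | 2.2E+03 |         |      |      |
atan2         | 0.76    | 0.76 | 2.31 | 2.2E+07 | 2.65    | 0.87 | 2.65 | 436     |         |      |      |
acosh         | 1.37    | 0.89 | 1.37 |         | 1.39    | 0.86 | 1.39 | 1.6E+03 |         |      |      |
asinh         | 1.6     | 0.62 | 1.6  |         | 1.58    | 0.68 | 1.58 | 1.6E+03 |         |      |      |
atanh         | 2.12    | 0.65 | 2.12 |         | 1.85    | 0.56 | 1.85 | 1.5E+03 |         |      |      |
ceil          | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
cbrt          | 0.73    |      |      |         | 0.79    |      |      |         |         |      |      |
copysign      | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
cdfnorm***    | 1.0     |      |      |         | 1.12    |      |      |         |         |      |      |
cdfnorminv*** | 2.0     |      |      |         | 3.46    |      |      |         |         |      |      |
cos           | 0.85    | 0.85 | 3.23 | 6.1E+07 | 1.79    | 0.64 | 1.79 | 2.5E+03 | 1.43    |      |      |
cosh          | 0.75    | 0.75 | 1.42 |         | 1.99    | 0.56 | 1.99 | 380.0   |         |      |      |
cospi         | 1.0     |      |      |         | 1.78    |      |      |         |         |      |      |
erf           | 0.82    | 0.82 | 2.07 | 7.03    | 0.90    | 0.90 | 2.16 | 6.33    |         |      |      |
erfc          | 2.92    | 0.75 | 2.92 |         | 2.72    | 0.76 | 2.72 |         |         |      |      |
erfcinv       | 1.0     |      |      |         | 3.15    |      |      |         |         |      |      |
erfcx         | 2.0     |      |      |         | 2.34    |      |      |         |         |      |      |
erfinv        | 1.41    |      |      |         | 1.0     |      |      |         |         |      |      |
exp10         | 1.0     | 0.51 | 1.00 | 2.8E+07 | 0.93    | 0.93 |      |         |         |      |      |
exp2          | 0.71    | 0.71 | 1.07 | 6.0E+04 | 0.68    | 0.68 |      |         | 1.66    |      |      |
exp           | 0.92    | 0.92 | 1.25 | 1.7E+07 | 0.82    | 0.82 |      |         | 1.61    | 0.83 | 1.61 |
expm1         | 0.75    | 0.75 | 1.76 | 1.1E+07 | 0.74    | 0.74 | 1.69 | 328.0   |         |      |      |
fdim          | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
floor         | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
fmod          | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
frexp         | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
hypot         | 1.12    | 0.85 | 1.12 |         | 0.96    | 0.5  | 0.96 |         |         |      |      |
cyl_bessel_i0 | 1.36    |      |      |         | 5.21    |      |      |         |         |      |      |
cyl_bessel_i1 | 2.77    |      |      |         | 5.69    |      |      |         |         |      |      |
j0            | 3.81    |      |      |         | 2.78    |      |      |         |         |      |      |
j1            | 3.01    |      |      |         | 2.38    |      |      |         |         |      |      |
jn            | 2.7E+03 |      |      |         | 8.0E+01 |      |      |         |         |      |      |
lgamma        | 3.52    |      |      |         | 2.99    |      |      |         |         |      |      |
ilogb         | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
isfinite      | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
isinf         | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
isnan         | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
ldexp         | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
llrint        | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
llround       | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
log           | 0.5     | 0.5  | 1.35 | 4.0E+07 | 0.94    | 0.94 | 1.14 | 1.5E+03 | 0.59    |      |      |
log10         | 0.5     | 0.5  | 1.9  |         | 1.58    | 0.72 | 1.58 | 989.0   | 0.58    |      |      |
log1p         | 0.77    | 0.77 | 1.6  |         | 0.55    | 0.55 | 1.73 | 1.6E+03 |         |      |      |
log2          | 0.5     | 0.5  | 1.58 |         | 0.71    | 0.71 | 1.93 | 889.0   | 0.6     |      |      |
logb          | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
lrint         | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
lround        | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
modf          | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
nan           | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
nearbyint     | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
nextafter     | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
norm          | 1.31    |      |      |         | 1.46    |      |      |         |         |      |      |
norm3d        | 0.5     |      |      |         | 1.04    |      |      |         |         |      |      |
norm4d        | 0.5     |      |      |         | 1.09    |      |      |         |         |      |      |
pow           | 0.98    | 0.85 | 0.98 |         | 1.05    | 0.78 | 1.05 | 1.8E+03 |         |      |      |
powi          | 1.48    |      |      |         | 18.4    |      |      |         |         |      |      |
rcbrt         | 0.53    |      |      |         | 0.85    |      |      |         |         |      |      |
remainder     | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
remquo        | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
rhypot        | 0.75    |      |      |         | 1.36    |      |      |         |         |      |      |
rint          | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
rnorm         | 2.2     |      |      |         | 1.66    |      |      |         |         |      |      |
rnorm3d       | 0.74    |      |      |         | 1.24    |      |      |         |         |      |      |
rnorm4d       | 0.75    |      |      |         | 1.26    |      |      |         |         |      |      |
round         | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
saturate      |         |      |      |         | 0.0     |      |      |         |         |      |      |
scalbn        | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
signbit       | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
sin           | 0.85    | 0.85 | 3.15 | 6.1E+07 | 1.96    | 0.65 | 1.96 | 2.5E+03 | 1.88    |      |      |
sincos        | 1.49    | 0.85 | 1.49 | 2.8E+07 | 2.38    | 0.86 | 2.38 |         |         |      |      |
sincospi      | 2.0     |      |      |         | 1.78    |      |      |         |         |      |      |
sinh          | 1.74    | 0.79 | 1.74 |         | 1.34    | 0.68 | 1.34 | 1.1E+03 |         |      |      |
sinpi         | 1.0     |      |      |         | 1.78    |      |      |         |         |      |      |
tan           | 0.52    | 0.52 | 3.01 | 5.2E+07 | 3.88    | 0.76 | 3.88 |         |         |      |      |
tanh          | 0.65    | 0.65 | 2.11 |         | 0.57    | 0.57 | 1.36 | 1.5E+03 |         |      |      |
tgamma        | 9.06    |      |      |         | 3.01    |      |      |         |         |      |      |
trunc         | 0.0     |      |      |         | 0.0     |      |      |         |         |      |      |
y0            | 5.47    |      |      |         | 3.2     |      |      |         |         |      |      |
y1            | 3.64    |      |      |         | 4.86    |      |      |         |         |      |      |
yn            | 2.0E+03 |      |      |         | 145.0   |      |      |         |         |      |      |
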
NOTE:

The accuracy of the inlined functions inv, sqrt, and rsqrt is defined by the OpenCL™/SYCL standards and may be affected by the -f[no-]fast-math compiler switch.

The reported ULP values were obtained by random sampling over a large number of data points. The actual ULP error may be higher for specific argument values.

*** The cdfnorm and cdfnorminv functions have CUDA-specific aliases, normcdf and normcdfinv, which are mapped to the same computation kernels.