IMF Device Library

Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference

ID 767253

Date 6/24/2024

The Intel Math Functions (IMF) Device Library is a set of standard math functions implemented for execution on SYCL devices (GPU, CPU, and accelerators). Most IMF functions comply with the ISO C99, SYCL, OpenCL™, and IEEE 754 standards in terms of computed results and the handling of IEEE 754 special values.

The function interfaces are available through the header file:

#include <sycl/ext/intel/math.hpp>

Accuracy

IMF device library functions are provided at the following accuracy levels:

default: Default accuracy, compliant with the best of the OpenCL/SYCL/CUDA requirements.
ha: High accuracy (error does not exceed 1.0 units in the last place (ULP)).
la: Low accuracy (error does not exceed 4.0 ULP).
ep: Enhanced performance (approximately half of the mantissa bits are correct).

Every function has a default-accuracy implementation. A subset of functions also provides ha, la, and ep variants, so that accuracy can be traded against performance in specific use cases.
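
To relate the ep description ("approximately half of the mantissa bits are correct") to the ULP figures used for the other levels, the following host-side sketch converts a number of correct significand bits into an order-of-magnitude ULP error. The helper names are hypothetical and are not part of the IMF API, and reading "approximately half" as floor(digits / 2) is an assumption made only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Approximate ULP error when only the leading `correct_bits` of a
// `total_bits`-bit significand are right: the remaining low bits are
// effectively noise, so the error is on the order of
// 2^(total_bits - correct_bits) ULP.
inline double ulp_error_for_correct_bits(int total_bits, int correct_bits) {
    assert(correct_bits >= 0 && correct_bits <= total_bits);
    return std::ldexp(1.0, total_bits - correct_bits);
}

// Order-of-magnitude ULP error for an "ep" result of type T, reading
// "approximately half" as floor(digits / 2) correct bits (an assumption).
template <class T>
double ep_ulp_estimate() {
    const int digits = std::numeric_limits<T>::digits;  // 24 for float, 53 for double
    return ulp_error_for_correct_bits(digits, digits / 2);
}
```

Under that reading, ep_ulp_estimate<float>() evaluates to 4096 and ep_ulp_estimate<double>() to roughly 1.3E+08. The ep entries in the accuracy table later in this section stay below these estimates.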

The additional accuracy variants are available in the nested namespaces ha, la, and ep. For example:

float sycl::ext::intel::math::acos ( float x );     // default accuracy
float sycl::ext::intel::math::ha::acos ( float x ); // ha (High Accuracy)
float sycl::ext::intel::math::la::acos ( float x ); // la (Low Accuracy)
float sycl::ext::intel::math::ep::acos ( float x ); // ep (Enhanced Performance)
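
Because the variants share a function name and differ only in namespace, a namespace alias is one way to choose the accuracy flavor in a single place. The sketch below is a host-only illustration of that pattern: demo_math and its functions are hypothetical stand-ins that mirror the namespace layout shown above (they are not the IMF implementations), and USE_HIGH_ACCURACY is a made-up macro. In device code, the alias would presumably point at sycl::ext::intel::math or one of its nested namespaces instead.

```cpp
#include <cmath>

// Hypothetical stand-in that mirrors the layout of sycl::ext::intel::math
// and its nested ha namespace. These host functions are NOT the IMF code.
namespace demo_math {
inline float acos(float x) { return std::acos(x); }  // plays the role of "default"

namespace ha {
// Plays the role of the "ha" flavor: evaluate in double, then round once.
inline float acos(float x) {
    return static_cast<float>(std::acos(static_cast<double>(x)));
}
}  // namespace ha
}  // namespace demo_math

// Select the accuracy flavor once; call sites below stay unchanged.
#ifdef USE_HIGH_ACCURACY  // hypothetical build-time switch
namespace acc = demo_math::ha;
#else
namespace acc = demo_math;
#endif

// Example call site: angle (in radians) between two unit vectors,
// given their dot product.
inline float angle_between_unit_vectors(float dot) {
    return acc::acos(dot);
}
```

Keeping the choice in one alias means a kernel can be moved between flavors without touching each call, which is convenient when comparing accuracy against performance.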

The following table lists the measured accuracy of each math function. Accuracy is measured in ULPs on uniformly distributed random input values over commonly used, function-specific working intervals, with the addition of:

Values with a random mantissa and all possible exponent fields.
Corner cases (subnormals, largest normal values, etc.).
IEEE 754 special numbers (zeroes, infinities, NaNs, etc.).
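
The measurement approach described above can be sketched on the host as follows. This is a minimal illustration, not the harness used to produce the table: it assumes a double-precision result is an adequate reference for an fp32 function, it measures the ULP size at the magnitude of the reference value (one of several common ULP definitions), and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
#include <random>
#include <vector>

// Size of one float ULP at the magnitude of `reference`.
inline double float_ulp_at(double reference) {
    int exp = 0;
    std::frexp(reference, &exp);  // reference = m * 2^exp with 0.5 <= |m| < 1
    const int min_exp = std::numeric_limits<float>::min_exponent;  // -125
    // Zero and subnormal magnitudes use the fixed subnormal spacing.
    if (reference == 0.0 || exp < min_exp) exp = min_exp;
    return std::ldexp(1.0, exp - std::numeric_limits<float>::digits);
}

// Error of a float result, in float ULPs, against a double reference.
inline double ulp_error(float computed, double reference) {
    assert(std::isfinite(reference));
    return std::fabs(static_cast<double>(computed) - reference) /
           float_ulp_at(reference);
}

// One positive input per finite exponent field, each with a random mantissa.
inline std::vector<float> inputs_covering_all_exponents(std::uint32_t seed) {
    std::mt19937 rng(seed);
    std::vector<float> xs;
    for (std::uint32_t e = 0; e < 255; ++e) {  // field 255 encodes Inf/NaN
        const std::uint32_t mantissa = static_cast<std::uint32_t>(rng()) & 0x7FFFFFu;
        const std::uint32_t bits = (e << 23) | mantissa;
        float x;
        std::memcpy(&x, &bits, sizeof x);  // reinterpret the bit pattern
        xs.push_back(x);
    }
    return xs;
}

// Largest observed ULP error of `f` against `ref` over the given inputs.
template <class F, class Ref>
double max_ulp_error(F f, Ref ref, const std::vector<float>& inputs) {
    double worst = 0.0;
    for (float x : inputs) {
        const double expected = ref(static_cast<double>(x));
        if (!std::isfinite(expected)) continue;  // special values need exact checks
        worst = std::max(worst, ulp_error(f(x), expected));
    }
    return worst;
}
```

A function under test and its reference are passed as callables, for example max_ulp_error(f, ref, inputs_covering_all_exponents(42)). Because the inputs are sampled, the returned value is only a lower bound on the true worst-case error.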

PRECISION	fp64 (double)				fp32 (float)				fp16 (sycl::half)
ACCURACY	default	ha	la	ep	default	ha	la	ep	default	ha	la	ep

acos	0.79	0.79	2.27	4.0E+07	3.0	0.78	3.0	525.0
asin	0.72	0.72	2.61	4.1E+07	3.73	0.69	3.73	535.0
atan	0.65	0.65	2.14	2.2E+07	0.87	0.87	3.05	2.2E+03
atan2	0.76	0.76	2.31	2.2E+07	2.65	0.87	2.65	436
acosh	1.37	0.89	1.37		1.39	0.86	1.39	1.6E+03
asinh	1.6	0.62	1.6		1.58	0.68	1.58	1.6E+03
atanh	2.12	0.65	2.12		1.85	0.56	1.85	1.5E+03
ceil	0.0				0.0
cbrt	0.73				0.79
copysign	0.0				0.0
cdfnorm***	1.0				1.12
cdfnorminv***	2.0				3.46
cos	0.85	0.85	3.23	6.1E+07	1.79	0.64	1.79	2.5E+03	1.43
cosh	0.75	0.75	1.42		1.99	0.56	1.99	380.0
cospi	1.0				1.78
erf	0.82	0.82	2.07	7.03	0.90	0.90	2.16	6.33
erfc	2.92	0.75	2.92		2.72	0.76	2.72
erfcinv	1.0				3.15
erfcx	2.0				2.34
erfinv	1.41				1.0
exp10	1.0	0.51	1.00	2.8E+07	0.93	0.93
exp2	0.71	0.71	1.07	6.0E+04	0.68	0.68			1.66
exp	0.92	0.92	1.25	1.7E+07	0.82	0.82			1.61	0.83	1.61
expm1	0.75	0.75	1.76	1.1E+07	0.74	0.74	1.69	328.0
fdim	0.0				0.0
floor	0.0				0.0
fmod	0.0				0.0
frexp	0.0				0.0
hypot	1.12	0.85	1.12		0.96	0.5	0.96
cyl_bessel_i0	1.36				5.21
cyl_bessel_i1	2.77				5.69
j0	3.81				2.78
j1	3.01				2.38
jn	2.7E+03				8.0E+01
lgamma	3.52				2.99
ilogb	0.0				0.0
isfinite	0.0				0.0
isinf	0.0				0.0
isnan	0.0				0.0
ldexp	0.0				0.0
llrint	0.0				0.0
llround	0.0				0.0
log	0.5	0.5	1.35	4.0E+07	0.94	0.94	1.14	1.5E+03	0.59
log10	0.5	0.5	1.9		1.58	0.72	1.58	989.0	0.58
log1p	0.77	0.77	1.6		0.55	0.55	1.73	1.6E+03
log2	0.5	0.5	1.58		0.71	0.71	1.93	889.0	0.6
logb	0.0				0.0
lrint	0.0				0.0
lround	0.0				0.0
modf	0.0				0.0
nan	0.0				0.0
nearbyint	0.0				0.0
nextafter	0.0				0.0
norm	1.31				1.46
norm3d	0.5				1.04
norm4d	0.5				1.09
pow	0.98	0.85	0.98		1.05	0.78	1.05	1.8E+03
powi	1.48				18.4
rcbrt	0.53				0.85
remainder	0.0				0.0
remquo	0.0				0.0
rhypot	0.75				1.36
rint	0.0				0.0
rnorm	2.2				1.66
rnorm3d	0.74				1.24
rnorm4d	0.75				1.26
round	0.0				0.0
saturate					0.0
scalbn	0.0				0.0
signbit	0.0				0.0
sin	0.85	0.85	3.15	6.1E+07	1.96	0.65	1.96	2.5E+03	1.88
sincos	1.49	0.85	1.49	2.8E+07	2.38	0.86	2.38
sincospi	2.0				1.78
sinh	1.74	0.79	1.74		1.34	0.68	1.34	1.1E+03
sinpi	1.0				1.78
tan	0.52	0.52	3.01	5.2E+07	3.88	0.76	3.88
tanh	0.65	0.65	2.11		0.57	0.57	1.36	1.5E+03
tgamma	9.06				3.01
trunc	0.0				0.0
y0	5.47				3.2
y1	3.64				4.86
yn	2.0E+03				145.0

NOTE:

The accuracy of the inlined functions inv, sqrt, and rsqrt is defined by the OpenCL™/SYCL standards and may be affected by the -f[no-]fast-math compiler switch.

The reported ULP values were obtained by random sampling over a large number of data points. The actual ULP error may be higher for specific argument values.

The cdfnorm and cdfnorminv functions (marked *** in the table) have CUDA-specific aliases, normcdf and normcdfinv, which map to the same computation kernels.
