Developer Reference
Intel® oneAPI Math Kernel Library Vector Mathematics Performance and Accuracy Data
Vector Mathematics (VM) computes elementary functions on vector arguments. VM includes a set of highly optimized implementations of computationally expensive core mathematical functions (power, trigonometric, exponential, hyperbolic, and others) that operate on vectors. VM can improve performance for applications like nonlinear optimization, computations of integrals, and others.
The table below describes the three VM accuracy modes, with the expected performance level and the maximum error in single and double precision for each mode.
Performance / Max. Error | High Accuracy (HA) | Low Accuracy (LA) | Enhanced Performance (EP) |
---|---|---|---|
Expected performance | Default | Better performance | Best performance available |
Maximum accuracy error | 1 ULP* | 4 ULP* | The lower half of the significand bits may be incorrect |
* Unit in the last place (ULP)
Most VM functions have different implementations corresponding to each of these three modes.
Given the reduction in accuracy as described in the table, the EP mode may be adequate for applications that do not rely on accurate results, such as media applications or some Monte Carlo simulations.
Accuracy behavior is processor specific, so results might differ slightly across processor families and components of one family such as processor models or libraries. Results might also vary slightly from release to release. However, all differences are within specified error bounds. Error and special value behavior are identical for HA and LA functions regardless of the processor used to run the software. For the EP mode, correct error and special value behavior are not guaranteed.
To control the VM accuracy modes, use the vmlSetMode function. For more information, refer to the Intel® oneAPI Math Kernel Library Developer Reference.
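As a rough illustration of switching modes, the sketch below wraps a VM call in a helper that sets the accuracy mode with vmlSetMode and then restores the previous one. The helper name vec_inv, the acc_mode enum, and the HAVE_MKL build macro are assumptions made for this example, not part of the library. The fallback loop exists only so the sketch still compiles where oneMKL is not installed. Check the Developer Reference for the exact mode constants and signatures before relying on it.

```c
#include <stddef.h>

#ifdef HAVE_MKL              /* hypothetical macro: define when linking oneMKL */
#include <mkl.h>
#endif

/* Hypothetical enum for this sketch; maps onto the three VM accuracy modes. */
typedef enum { ACC_HA, ACC_LA, ACC_EP } acc_mode;

/* Compute r[i] = 1 / a[i] for i in [0, n) in the requested accuracy mode. */
void vec_inv(size_t n, const float *a, float *r, acc_mode mode)
{
#ifdef HAVE_MKL
    static const unsigned int vm_mode[] = { VML_HA, VML_LA, VML_EP };
    /* vmlSetMode returns the mode that was active before the call. */
    unsigned int old_mode = vmlSetMode(vm_mode[mode]);
    vsInv((MKL_INT)n, a, r);
    vmlSetMode(old_mode);    /* restore the caller's mode */
#else
    (void)mode;              /* plain C division has a single accuracy level */
    for (size_t i = 0; i < n; ++i)
        r[i] = 1.0f / a[i];
#endif
}
```

Calling vec_inv(n, a, r, ACC_LA) would then request the LA implementation for that one call and leave the global mode as it was found.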
NOTE on Performance:
Performance numbers in the tables are shown for working argument intervals. Performance behavior may be different for other intervals. For example, it is quite expensive to compute trigonometric functions accurately for huge arguments. Each function lists the working interval over which performance is measured. The same page contains graphs that show how the performance behavior depends on the vector length. There are two extreme cases: short and long vectors.
For short vectors, functions incur fixed overheads that are amortized as the vector length increases. For vectors longer than a few dozen elements, performance remains quite flat until the vector no longer fits in the L2 cache.
Data prefetching greatly reduces the performance penalty for vectors that do not fit in the cache.
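To see how cost depends on vector length on a particular machine, one option is a small timing harness like the sketch below. It is not the methodology behind the published numbers. The names vec_kernel, square_kernel, and seconds_per_element are hypothetical, and square_kernel is only a stand-in for whichever vector function is being measured.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for a vector kernel under test. */
typedef void (*vec_kernel)(size_t n, const float *a, float *r);

/* Stand-in kernel for the sketch: r[i] = a[i] * a[i]. */
void square_kernel(size_t n, const float *a, float *r)
{
    for (size_t i = 0; i < n; ++i)
        r[i] = a[i] * a[i];
}

/* Average CPU time per element over `reps` calls on vectors of length n.
   Assumes n > 0 and reps > 0. */
double seconds_per_element(vec_kernel f, size_t n,
                           const float *a, float *r, int reps)
{
    clock_t start = clock();
    for (int k = 0; k < reps; ++k)
        f(n, a, r);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC / ((double)n * reps);
}
```

Running it for a range of lengths (for example 8, 64, 1024, and a length larger than the L2 cache) should show the per-element cost dropping for short vectors and rising again once the data no longer fits in the cache. The clock() resolution is coarse, so reps needs to be large for short vectors.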
NOTE on Accuracy:
The design requirements for the HA functions are to have an accuracy error less than 1.0 ULP, and to have all special values processed correctly. For the LA functions, the error bound is 4.0 ULP.
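A minimal sketch of how an error in ULPs might be measured for a single-precision result, assuming a more precise double-precision reference value is available. The function names and the two macros are made up for this example. The helper assumes finite values below FLT_MAX in magnitude, and it takes one ULP to be the spacing between the rounded reference and the next float of larger magnitude.

```c
#include <stdint.h>
#include <string.h>

#define HA_MAX_ULP 1.0   /* HA bound quoted above */
#define LA_MAX_ULP 4.0   /* LA bound quoted above */

/* Spacing between |x| and the next float of larger magnitude (one ULP at x).
   Assumes x is finite and |x| < FLT_MAX. */
static float ulp_of(float x)
{
    uint32_t bits;
    float mag, next;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0x7fffffffu;                 /* clear the sign bit: work with |x| */
    memcpy(&mag, &bits, sizeof mag);
    bits += 1;                           /* adjacent float, larger magnitude */
    memcpy(&next, &bits, sizeof next);
    return next - mag;
}

/* Error of a float result in ULPs, measured against a double reference. */
double ulp_error(float computed, double reference)
{
    double diff = (double)computed - reference;
    if (diff < 0.0)
        diff = -diff;
    return diff / (double)ulp_of((float)reference);
}

/* Nonzero if the error does not exceed `bound` ULPs (inclusive comparison). */
int within_ulp_bound(float computed, double reference, double bound)
{
    return ulp_error(computed, reference) <= bound;
}
```

For example, within_ulp_bound(y, y_ref, LA_MAX_ULP) checks one result against the LA bound. The published accuracy tables report the maximum such error observed over the tested arguments.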
For the EP functions, approximately one half of the bits in the significand (the most significant ones) of the floating-point result need to be correct. For details, see the accuracy tables with ULP errors for all the functions. Any deviations from these error bounds are highlighted in the accuracy tables, and should be considered to be temporary.
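One way to make the "half of the significand" target concrete for single precision is sketched below. It is an interpretation, not the library's own test. It estimates the number of agreeing leading bits from the distance between the two values in representable steps, rather than by comparing bits literally. The function names are hypothetical, and both inputs are assumed to be finite and of the same sign.

```c
#include <stdint.h>
#include <string.h>

#define FLOAT_SIGNIFICAND_BITS 24   /* 23 stored bits plus the implicit leading 1 */

/* Number of representable floats between a and b.
   Assumes both are finite and have the same sign. */
static uint32_t ulp_distance(float a, float b)
{
    uint32_t ia, ib;
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);
    ia &= 0x7fffffffu;               /* drop the sign bits */
    ib &= 0x7fffffffu;
    return ia > ib ? ia - ib : ib - ia;
}

/* Estimated count of leading significand bits on which the values agree. */
int agreeing_bits(float computed, float reference)
{
    uint32_t d = ulp_distance(computed, reference);
    int lost = 0;
    while (d != 0) {                 /* bit length of the distance */
        ++lost;
        d >>= 1;
    }
    int agree = FLOAT_SIGNIFICAND_BITS - lost;
    return agree > 0 ? agree : 0;
}

/* Nonzero if at least half of the significand bits agree. */
int meets_ep_target(float computed, float reference)
{
    return agreeing_bits(computed, reference) >= FLOAT_SIGNIFICAND_BITS / 2;
}
```

Under this reading, a single-precision EP result can be off by up to a few thousand ULPs and still meet the target, which is why the EP mode suits only applications that tolerate reduced accuracy.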
For complex functions, the ULP error is the maximum of the two ULP errors calculated for the real and the imaginary parts of the result.
Special Value Processing
Special values are processed in conformance with the C99 standard (known as C9X during its development). See the information for the special value behavior of every function in the Intel® oneAPI Math Kernel Library Developer Reference.