We are providing two files, RECIP14.c and RECIP28EXP2.c, containing reference implementations for the scalar versions of 10 approximation instructions introduced in the Intel® Architecture Instruction Set Extensions Programming Reference document. The files can be downloaded from the links provided above.
RECIP14.c contains emulation routines for the underlying algorithms of:
- VRCP14PD - Compute Approximate Reciprocals of Packed Float64 Values with relative error of less than 2^-14
- VRCP14SD - Compute Approximate Reciprocal of Scalar Float64 Value with relative error of less than 2^-14
- VRCP14PS - Compute Approximate Reciprocals of Packed Float32 Values with relative error of less than 2^-14
- VRCP14SS - Compute Approximate Reciprocal of Scalar Float32 Value with relative error of less than 2^-14
- VRSQRT14PD - Compute Approximate Reciprocals of Square Roots of Packed Float64 Values with relative error of less than 2^-14
- VRSQRT14SD - Compute Approximate Reciprocal of Square Root of Scalar Float64 Value with relative error of less than 2^-14
- VRSQRT14PS - Compute Approximate Reciprocals of Square Roots of Packed Float32 Values with relative error of less than 2^-14
- VRSQRT14SS - Compute Approximate Reciprocal of Square Root of Scalar Float32 Value with relative error of less than 2^-14
The corresponding emulation routines (only scalar versions) are:
- RCP14S - reciprocal approximation for Float32
- RCP14D - reciprocal approximation for Float64
- RSQRT14S - reciprocal square root approximation for Float32
- RSQRT14D - reciprocal square root approximation for Float64
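To make the "relative error of less than 2^-14" bound concrete, the following is a minimal sketch of how such a bound could be checked for a reciprocal approximation. It does not call the routines in RECIP14.c. The names `rel_error`, `toy_rcp`, and `within_rcp14_bound` are hypothetical, and `toy_rcp` is only a stand-in that divides in single precision and keeps 15 fraction bits; it is not the algorithm used by the instructions or by the reference code. The sketch assumes finite, nonzero inputs whose reciprocal is a normal Float32 value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 2^-14, written as an exact quotient (1 / 16384). */
#define BOUND_2_M14 (1.0 / 16384.0)

/* Relative error |approx - exact| / |exact|, evaluated in double precision. */
static double rel_error(double approx, double exact)
{
    assert(exact != 0.0);
    double diff = approx - exact;
    if (diff < 0.0)
        diff = -diff;
    double mag = (exact < 0.0) ? -exact : exact;
    return diff / mag;
}

/* Hypothetical stand-in, NOT the RECIP14.c algorithm: divide in single
 * precision, then clear the low 8 bits of the 23-bit stored significand so
 * that only 15 fraction bits remain. For normal results the truncation
 * contributes a relative error below 2^-15. */
static float toy_rcp(float x)
{
    float y = 1.0f / x;
    uint32_t bits;
    memcpy(&bits, &y, sizeof bits);   /* reinterpret the float as raw bits */
    bits &= 0xFFFFFF00u;              /* drop the 8 least significant bits */
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* Returns 1 if the stand-in meets the 2^-14 relative error bound at x,
 * measured against a double-precision division. */
static int within_rcp14_bound(float x)
{
    return rel_error((double)toy_rcp(x), 1.0 / (double)x) < BOUND_2_M14;
}
```

Calling `within_rcp14_bound(3.0f)` compares the stand-in against `1.0 / 3.0` computed in double precision. The same comparison could be applied to the output of an emulation routine once its actual prototype is taken from the downloaded source file.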
RECIP28EXP2.c contains emulation routines for the underlying algorithms of:
- VRCP28PD - Approximation to the Reciprocal of Packed Double Precision Floating-Point Values with Less Than 2^-28 Relative Error
- VRCP28SD - Approximation to the Reciprocal of Scalar Double Precision Floating-Point Value with Less Than 2^-28 Relative Error
- VRCP28PS - Approximation to the Reciprocal of Packed Single Precision Floating-Point Values with Less Than 2^-28 Relative Error
- VRCP28SS - Approximation to the Reciprocal of Scalar Single Precision Floating-Point Value with Less Than 2^-28 Relative Error
- VRSQRT28PD - Approximation to the Reciprocal Square Root of Packed Double Precision Floating-Point Values with Less Than 2^-28 Relative Error
- VRSQRT28SD - Approximation to the Reciprocal Square Root of Scalar Double Precision Floating-Point Value with Less Than 2^-28 Relative Error
- VRSQRT28PS - Approximation to the Reciprocal Square Root of Packed Single Precision Floating-Point Values with Less Than 2^-28 Relative Error
- VRSQRT28SS - Approximation to the Reciprocal Square Root of Scalar Single Precision Floating-Point Value with Less Than 2^-28 Relative Error
- VEXP2PD - Approximation to the Exponential 2^x of Packed Double Precision Floating-Point Values with Less Than 2^-23 Relative Error
- VEXP2PS - Approximation to the Exponential 2^x of Packed Single Precision Floating-Point Values with Less Than 2^-23 Relative Error
The corresponding emulation routines (only scalar versions) are:
- RCP28S - reciprocal approximation for Float32
- RCP28D - reciprocal approximation for Float64
- RSQRT28S - reciprocal square root approximation for Float32
- RSQRT28D - reciprocal square root approximation for Float64
- EXP2S - Base-2 exponential approximation for Float32
- EXP2D - Base-2 exponential approximation for Float64
The reference functions must be compiled with the denormals-are-zero (DAZ) and flush-to-zero (FTZ) modes turned off (e.g. with the Intel compiler for Linux, using the -no-ftz option). They must be run with the rounding mode set to round-to-nearest and with floating-point exceptions masked.
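As an illustration of these run-time requirements, the sketch below sets the SSE control register (MXCSR) explicitly before any reference function would be called. It assumes an x86 target with SSE and a compiler that provides `<xmmintrin.h>`; on other targets the setup function does nothing. The names `prepare_fp_environment`, `subnormals_preserved`, `HAVE_MXCSR`, and `MXCSR_DAZ_BIT` are hypothetical and are not part of the downloaded files. The sketch only touches MXCSR; x87 control state is not modified.

```c
#include <float.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_MXCSR 1
#endif

/* Denormals-are-zero control flag: bit 6 of MXCSR. */
#define MXCSR_DAZ_BIT 0x0040u

/* Put MXCSR into the state described above: FTZ off, DAZ off,
 * round-to-nearest, and all floating-point exceptions masked. */
static void prepare_fp_environment(void)
{
#ifdef HAVE_MXCSR
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);   /* FTZ off */
    _mm_setcsr(_mm_getcsr() & ~MXCSR_DAZ_BIT);     /* DAZ off */
    _MM_SET_ROUNDING_MODE(_MM_ROUND_NEAREST);      /* round-to-nearest */
    _MM_SET_EXCEPTION_MASK(_MM_MASK_MASK);         /* mask all exceptions */
#endif
}

/* Sanity check: with FTZ and DAZ off, halving the smallest normal float
 * yields a nonzero subnormal. The volatile qualifiers keep the compiler
 * from folding the computation at compile time. */
static int subnormals_preserved(void)
{
    volatile float smallest_normal = FLT_MIN;
    volatile float half = smallest_normal * 0.5f;
    return half > 0.0f;
}
```

A test program could call `prepare_fp_environment()` at the start of `main` and then use `subnormals_preserved()` as a quick check that results are not being flushed to zero.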
Usage example for RCP14S and RCP14D
The following example may be compiled with any of the following (or other, equivalent) commands:
where main.c is shown below:
Usage example for RSQRT14S and RSQRT14D
The following example may be compiled with any of the following (or other, equivalent) commands:
where main.c is shown below:
Usage example for RCP28S and RCP28D
The following example may be compiled with any of the following (or other, equivalent) commands:
where main.c is shown below:
Usage example for RSQRT28S and RSQRT28D
The following example may be compiled with any of the following (or other, equivalent) commands:
where main.c is shown below:
Usage example for EXP2S and EXP2D
The following example may be compiled with any of the following (or other, equivalent) commands:
where main.c is shown below: