Intrinsics for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) BF16 Instructions
The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) BF16 instruction intrinsics are located in the zmmintrin.h header file.
To use these intrinsics, include the immintrin.h file as follows:
#include <immintrin.h>
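Including the header declares the intrinsics. The compiler must also be allowed to emit the instructions, and the processor and operating system must support them. The sketch below assumes a GCC- or Clang-compatible compiler on x86, where options such as `-mavx512bf16 -mavx512vl` define the `__AVX512BF16__` and `__AVX512VL__` macros. Both function names are hypothetical. The run-time check reads the CPUID feature bits for AVX512F, AVX512VL, and AVX512_BF16, and it uses XGETBV to confirm that the operating system saves the AVX-512 register state. The 512-bit intrinsics require only AVX512F and AVX512_BF16, so code that uses only those forms could drop the AVX512VL test.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang header providing __get_cpuid and __get_cpuid_count */

/* Read XCR0 with inline assembly so that no -mxsave option is required. */
static uint64_t read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}
#endif

/* Compile time: 1 if this translation unit was built with BF16 and VL enabled. */
int built_with_avx512bf16_vl(void)
{
#if defined(__AVX512BF16__) && defined(__AVX512VL__)
    return 1;
#else
    return 0;
#endif
}

/* Run time: 1 if the CPU reports AVX512F, AVX512VL and AVX512_BF16
   and the OS has enabled the AVX-512 register state in XCR0. */
int has_avx512bf16_vl(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* CPUID.1:ECX bit 27 (OSXSAVE) must be set before XGETBV is used. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
        return 0;
    /* XCR0 bits 1-2 (SSE, AVX) and 5-7 (opmask, ZMM_Hi256, Hi16_ZMM). */
    if ((read_xcr0() & 0xE6u) != 0xE6u)
        return 0;
    /* CPUID.(7,0): EAX = highest subleaf, EBX bit 16 = AVX512F, bit 31 = AVX512VL. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) || eax < 1)
        return 0;
    if (!(ebx & (1u << 16)) || !(ebx & (1u << 31)))
        return 0;
    /* CPUID.(7,1): EAX bit 5 = AVX512_BF16. */
    __get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx);
    return (eax & (1u << 5)) != 0;
#else
    return 0;
#endif
}
```

A program would typically call `has_avx512bf16_vl()` once at start-up. It would then dispatch to code built with the BF16 options only when the function returns 1.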
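variable | definition |
---|---|
a | first source vector |
b | second source vector |
src | pass-through vector for the writemask (mask_) forms: its elements are copied to dst where the corresponding mask bit is not set. In the dot-product intrinsics, src is the single-precision accumulator in every form |
k | mask used as a selector; depending on the intrinsic, it is a writemask or a zeromask, and bit i controls element i of dst |
dst | destination vector returned by the intrinsic |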
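The writemask and zeromask behavior described for k can be modeled one element at a time. In this sketch, `apply_mask16` is a hypothetical helper. It operates on raw 16-bit patterns like those the conversion intrinsics produce. The dot-product intrinsics apply the same rule to 32-bit elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of writemask/zeromask selection over n 16-bit
   elements (n <= 32). result holds the unmasked outcome for every element;
   bit i of k decides what lands in dst[i]. src is not read when zeroing. */
static void apply_mask16(uint16_t *dst, const uint16_t *result,
                         const uint16_t *src, uint32_t k, size_t n,
                         int zeroing)
{
    assert(n <= 32);
    for (size_t i = 0; i < n; ++i) {
        if ((k >> i) & 1u)
            dst[i] = result[i];   /* bit set: take the new value */
        else if (zeroing)
            dst[i] = 0;           /* zeromask (maskz_ forms): clear */
        else
            dst[i] = src[i];      /* writemask (mask_ forms): keep src */
    }
}
```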
_mm_cvtne2ps_pbh
__m128bh _mm_cvtne2ps_pbh (__m128 a, __m128 b)
Instructions: vcvtne2ps2bf16 xmm, xmm, xmm
CPUID Flags: AVX512_BF16 + AVX512VL
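Converts packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and stores the results in a single vector dst. The four converted elements of b occupy the lower half of dst (elements 0-3), and the four converted elements of a occupy the upper half (elements 4-7).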
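The conversion and the element order can be modeled in portable C, which is useful for checking vector results on machines without AVX512_BF16. This sketch follows the conversion pseudocode for VCVTNE2PS2BF16 in the Intel® 64 and IA-32 Architectures Software Developer's Manual:

- Normal values are rounded to nearest even.
- Zero and denormal inputs become a signed zero.
- Infinities pass through unchanged.
- NaNs are truncated, with the quiet bit forced.

BF16 values are kept as raw `uint16_t` bit patterns. The names `fp32_to_bf16`, `bf16_to_fp32`, and `model_mm_cvtne2ps_pbh` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the FP32 -> BF16 conversion (round to nearest even). */
static uint16_t fp32_to_bf16(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);                       /* reinterpret the bits */
    uint32_t exponent = x & 0x7F800000u, mantissa = x & 0x007FFFFFu;

    if (exponent == 0)                              /* zero or denormal */
        return (uint16_t)((x >> 16) & 0x8000u);     /* -> signed zero */
    if (exponent == 0x7F800000u)                    /* infinity or NaN */
        return (uint16_t)((x >> 16) | (mantissa ? 0x0040u : 0u)); /* quiet NaN */
    /* Round to nearest, ties to even, on the 16 discarded bits. */
    return (uint16_t)((x + 0x7FFFu + ((x >> 16) & 1u)) >> 16);
}

/* Widening BF16 -> FP32 is exact: the pattern becomes the top 16 bits. */
static float bf16_to_fp32(uint16_t h)
{
    uint32_t x = (uint32_t)h << 16;
    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}

/* Model of _mm_cvtne2ps_pbh(a, b): dst[0..3] from b, dst[4..7] from a. */
static void model_mm_cvtne2ps_pbh(uint16_t dst[8], const float a[4],
                                  const float b[4])
{
    for (int j = 0; j < 4; ++j) {
        dst[j]     = fp32_to_bf16(b[j]);
        dst[j + 4] = fp32_to_bf16(a[j]);
    }
}
```

For example, the FP32 bit pattern 0x3F808000 lies exactly halfway between two BF16 values and rounds to the even pattern 0x3F80. FLT_MAX (0x7F7FFFFF) rounds up to +infinity (0x7F80).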
_mm_mask_cvtne2ps_pbh
__m128bh _mm_mask_cvtne2ps_pbh (__m128bh src, __mmask8 k, __m128 a, __m128 b)
Instructions: vcvtne2ps2bf16 xmm {k}, xmm, xmm
CPUID Flags: AVX512_BF16 + AVX512VL
Converts packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and stores the results in a single vector dst using writemask k. Elements are copied from src when the corresponding mask bit is not set.
_mm_maskz_cvtne2ps_pbh
__m128bh _mm_maskz_cvtne2ps_pbh (__mmask8 k, __m128 a, __m128 b)
Instructions: vcvtne2ps2bf16 xmm {k}{z}, xmm, xmm
CPUID Flags: AVX512_BF16 + AVX512VL
Converts packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and stores the results in a single vector dst using zeromask k. Elements are zeroed out when the corresponding mask bit is not set.
_mm256_cvtne2ps_pbh
__m256bh _mm256_cvtne2ps_pbh (__m256 a, __m256 b)
Instructions: vcvtne2ps2bf16 ymm, ymm, ymm
CPUID Flags: AVX512_BF16 + AVX512VL
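Converts packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and stores the results in a single vector dst. The eight converted elements of b occupy the lower half of dst (elements 0-7), and the eight converted elements of a occupy the upper half (elements 8-15).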
_mm256_mask_cvtne2ps_pbh
__m256bh _mm256_mask_cvtne2ps_pbh (__m256bh src, __mmask16 k, __m256 a, __m256 b)
Instructions: vcvtne2ps2bf16 ymm {k}, ymm, ymm
CPUID Flags: AVX512_BF16 + AVX512VL
Converts packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and stores the results in a single vector dst using writemask k. Elements are copied from src when the corresponding mask bit is not set.
_mm256_maskz_cvtne2ps_pbh
__m256bh _mm256_maskz_cvtne2ps_pbh (__mmask16 k, __m256 a, __m256 b)
Instructions: vcvtne2ps2bf16 ymm {k}{z}, ymm, ymm
CPUID Flags: AVX512_BF16 + AVX512VL
Converts packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and stores the results in a single vector dst using zeromask k. Elements are zeroed out when the corresponding mask bit is not set.
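The converted elements of b fill the lower half of dst. To keep memory order when converting 16 consecutive floats with one `_mm256_cvtne2ps_pbh` call, a caller would therefore pass the upper eight floats as a and the lower eight as b. At the intrinsic level, this would correspond to `_mm256_cvtne2ps_pbh(_mm256_loadu_ps(x + 8), _mm256_loadu_ps(x))`.

The scalar sketch below models that convention. The helpers `fp32_to_bf16`, `model_mm256_cvtne2ps_pbh`, and `convert16_in_order` are hypothetical. The conversion rounds to nearest even, maps denormals to signed zero, and quiets NaNs.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the FP32 -> BF16 conversion (round to nearest even). */
static uint16_t fp32_to_bf16(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint32_t exponent = x & 0x7F800000u, mantissa = x & 0x007FFFFFu;
    if (exponent == 0)                              /* zero or denormal */
        return (uint16_t)((x >> 16) & 0x8000u);
    if (exponent == 0x7F800000u)                    /* infinity or NaN */
        return (uint16_t)((x >> 16) | (mantissa ? 0x0040u : 0u));
    return (uint16_t)((x + 0x7FFFu + ((x >> 16) & 1u)) >> 16);
}

/* Model of _mm256_cvtne2ps_pbh(a, b): dst[0..7] from b, dst[8..15] from a. */
static void model_mm256_cvtne2ps_pbh(uint16_t dst[16], const float a[8],
                                     const float b[8])
{
    for (int j = 0; j < 8; ++j) {
        dst[j]     = fp32_to_bf16(b[j]);
        dst[j + 8] = fp32_to_bf16(a[j]);
    }
}

/* Convert 16 consecutive floats so that out[i] corresponds to x[i]:
   the upper eight go in as a, the lower eight as b. */
static void convert16_in_order(uint16_t out[16], const float x[16])
{
    model_mm256_cvtne2ps_pbh(out, x + 8, x);
}
```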
_mm512_cvtne2ps_pbh
__m512bh _mm512_cvtne2ps_pbh (__m512 a, __m512 b)
Instructions: vcvtne2ps2bf16 zmm, zmm, zmm
CPUID Flags: AVX512_BF16 + AVX512F
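Converts packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and stores the results in a single vector dst. The sixteen converted elements of b occupy the lower half of dst (elements 0-15), and the sixteen converted elements of a occupy the upper half (elements 16-31).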
_mm512_mask_cvtne2ps_pbh
__m512bh _mm512_mask_cvtne2ps_pbh (__m512bh src, __mmask32 k, __m512 a, __m512 b)
Instructions: vcvtne2ps2bf16 zmm {k}, zmm, zmm
CPUID Flags: AVX512_BF16 + AVX512F
Converts packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and stores the results in a single vector dst using writemask k. Elements are copied from src when the corresponding mask bit is not set.
_mm512_maskz_cvtne2ps_pbh
__m512bh _mm512_maskz_cvtne2ps_pbh (__mmask32 k, __m512 a, __m512 b)
Instructions: vcvtne2ps2bf16 zmm {k}{z}, zmm, zmm
CPUID Flags: AVX512_BF16 + AVX512F
Converts packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and stores the results in a single vector dst using zeromask k. Elements are zeroed out when the corresponding mask bit is not set.
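When the number of floats is not a multiple of 32, the final block of a `_mm512_cvtne2ps_pbh` loop needs a 32-bit mask with one bit per remaining element. That mask might be passed to `_mm512_maskz_cvtne2ps_pbh` or used for masked loads and stores. Computing it needs care: in C, shifting a 32-bit value by 32 is undefined behavior, so the full-block case has to be handled separately.

The following is a minimal sketch with the hypothetical helpers `tail_mask32`, `mask_low16`, and `mask_high16`. It assumes the upper sixteen floats are passed as a and the lower sixteen as b. Under that convention, bit i of the mask corresponds to float i of the block. Its low and high 16 bits could then serve as load masks for b and a, for example with `_mm512_maskz_loadu_ps`.

```c
#include <stddef.h>
#include <stdint.h>

/* Mask with the low `remaining` bits set (all 32 bits when remaining >= 32).
   Avoids the undefined shift (uint32_t)1 << 32. */
static uint32_t tail_mask32(size_t remaining)
{
    if (remaining >= 32)
        return 0xFFFFFFFFu;
    return ((uint32_t)1 << remaining) - 1u;
}

/* Halves of a 32-element mask: elements 0-15 come from b, 16-31 from a. */
static uint16_t mask_low16(uint32_t k)  { return (uint16_t)(k & 0xFFFFu); }
static uint16_t mask_high16(uint32_t k) { return (uint16_t)(k >> 16); }
```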
_mm_cvtneps_pbh
__m128bh _mm_cvtneps_pbh (__m128 a)
Instructions: vcvtneps2bf16 xmm, xmm
CPUID Flags: AVX512_BF16 + AVX512VL
Converts packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and stores the results in the lower 64 bits of dst (elements 0-3). The upper 64 bits of dst are set to zero.
_mm_mask_cvtneps_pbh
__m128bh _mm_mask_cvtneps_pbh (__m128bh src, __mmask8 k, __m128 a)
Instructions: vcvtneps2bf16 xmm {k}, xmm
CPUID Flags: AVX512_BF16 + AVX512VL
Converts packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and stores the results in the lower 64 bits of dst using writemask k; only bits 0-3 of k are used. Elements are copied from src when the corresponding mask bit is not set, and the upper 64 bits of dst are set to zero.
_mm_maskz_cvtneps_pbh
__m128bh _mm_maskz_cvtneps_pbh (__mmask8 k, __m128 a)
Instructions: vcvtneps2bf16 xmm {k}{z}, xmm
CPUID Flags: AVX512_BF16 + AVX512VL
Converts packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and stores the results in the lower 64 bits of dst using zeromask k; only bits 0-3 of k are used. Elements are zeroed out when the corresponding mask bit is not set, and the upper 64 bits of dst are set to zero.
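For `_mm_cvtneps_pbh` and its masked forms, four floats produce four BF16 elements. Only the lower 64 bits of the 128-bit result carry data, and only bits 0-3 of k take part. This scalar sketch models the zeromask form under that reading of the instruction's description in Intel's Software Developer's Manual. Both helper names are hypothetical, and BF16 values are raw `uint16_t` patterns.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the FP32 -> BF16 conversion (round to nearest even). */
static uint16_t fp32_to_bf16(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint32_t exponent = x & 0x7F800000u, mantissa = x & 0x007FFFFFu;
    if (exponent == 0)                              /* zero or denormal */
        return (uint16_t)((x >> 16) & 0x8000u);
    if (exponent == 0x7F800000u)                    /* infinity or NaN */
        return (uint16_t)((x >> 16) | (mantissa ? 0x0040u : 0u));
    return (uint16_t)((x + 0x7FFFu + ((x >> 16) & 1u)) >> 16);
}

/* Model of _mm_maskz_cvtneps_pbh(k, a): eight 16-bit result elements,
   of which 4-7 are always zero; bits 4-7 of k are ignored. */
static void model_mm_maskz_cvtneps_pbh(uint16_t dst[8], uint8_t k,
                                       const float a[4])
{
    for (int j = 0; j < 4; ++j)
        dst[j] = ((k >> j) & 1u) ? fp32_to_bf16(a[j]) : (uint16_t)0;
    for (int j = 4; j < 8; ++j)
        dst[j] = 0;                       /* upper 64 bits are zeroed */
}
```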
_mm256_cvtneps_pbh
__m128bh _mm256_cvtneps_pbh (__m256 a)
Instructions: vcvtneps2bf16 xmm, ymm
CPUID Flags: AVX512_BF16 + AVX512VL
Converts packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and stores the results in dst.
_mm256_mask_cvtneps_pbh
__m128bh _mm256_mask_cvtneps_pbh (__m128bh src, __mmask8 k, __m256 a)
Instructions: vcvtneps2bf16 xmm {k}, ymm
CPUID Flags: AVX512_BF16 + AVX512VL
Converts packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and stores the results in dst using writemask k. Elements are copied from src when the corresponding mask bit is not set.
_mm256_maskz_cvtneps_pbh
__m128bh _mm256_maskz_cvtneps_pbh (__mmask8 k, __m256 a)
Instructions: vcvtneps2bf16 xmm {k}{z}, ymm
CPUID Flags: AVX512_BF16 + AVX512VL
Converts packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and stores the results in dst using zeromask k. Elements are zeroed out when the corresponding mask bit is not set.
_mm512_cvtneps_pbh
__m256bh _mm512_cvtneps_pbh (__m512 a)
Instructions: vcvtneps2bf16 ymm, zmm
CPUID Flags: AVX512_BF16 + AVX512F
Converts packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and stores the results in dst.
_mm512_mask_cvtneps_pbh
__m256bh _mm512_mask_cvtneps_pbh (__m256bh src, __mmask16 k, __m512 a)
Instructions: vcvtneps2bf16 ymm {k}, zmm
CPUID Flags: AVX512_BF16 + AVX512F
Converts packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and stores the results in dst using writemask k. Elements are copied from src when the corresponding mask bit is not set.
_mm512_maskz_cvtneps_pbh
__m256bh _mm512_maskz_cvtneps_pbh (__mmask16 k, __m512 a)
Instructions: vcvtneps2bf16 ymm {k}{z}, zmm
CPUID Flags: AVX512_BF16 + AVX512F
Converts packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and stores the results in dst using zeromask k. Elements are zeroed out when the corresponding mask bit is not set.
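The conversions keep the FP32 sign and 8-bit exponent but only 8 significant bits. For normal values that do not overflow, rounding to nearest even therefore gives a relative error of at most 2^-8.

The following sketch checks that bound. It converts sixteen floats with a hypothetical scalar model of `_mm512_cvtneps_pbh` (sixteen floats in, sixteen BF16 elements out) and widens the results back to FP32. All helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the FP32 -> BF16 conversion (round to nearest even). */
static uint16_t fp32_to_bf16(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint32_t exponent = x & 0x7F800000u, mantissa = x & 0x007FFFFFu;
    if (exponent == 0)                              /* zero or denormal */
        return (uint16_t)((x >> 16) & 0x8000u);
    if (exponent == 0x7F800000u)                    /* infinity or NaN */
        return (uint16_t)((x >> 16) | (mantissa ? 0x0040u : 0u));
    return (uint16_t)((x + 0x7FFFu + ((x >> 16) & 1u)) >> 16);
}

/* Widening BF16 -> FP32 is exact. */
static float bf16_to_fp32(uint16_t h)
{
    uint32_t x = (uint32_t)h << 16;
    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}

/* Model of _mm512_cvtneps_pbh: 16 floats -> 16 BF16 elements (256 bits). */
static void model_mm512_cvtneps_pbh(uint16_t dst[16], const float a[16])
{
    for (int j = 0; j < 16; ++j)
        dst[j] = fp32_to_bf16(a[j]);
}

/* Largest relative round-trip error over 16 nonzero, normal, finite inputs. */
static float max_rel_error16(const float a[16])
{
    uint16_t h[16];
    float worst = 0.0f;
    model_mm512_cvtneps_pbh(h, a);
    for (int j = 0; j < 16; ++j) {
        float err = (bf16_to_fp32(h[j]) - a[j]) / a[j];
        if (err < 0.0f)
            err = -err;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

For instance, 1/3 converts to the pattern 0x3EAB, which widens back to 0.333984375.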
_mm_dpbf16_ps
__m128 _mm_dpbf16_ps (__m128 src, __m128bh a, __m128bh b)
Instructions: vdpbf16ps xmm, xmm, xmm
CPUID Flags: AVX512_BF16 + AVX512VL
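Computes the dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and stores the results in dst. For each 32-bit element i, dst[i] = src[i] + a[2i+1]*b[2i+1] + a[2i]*b[2i], where a[j] and b[j] denote the 16-bit BF16 elements of a and b.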
_mm_mask_dpbf16_ps
__m128 _mm_mask_dpbf16_ps (__m128 src, __mmask8 k, __m128bh a, __m128bh b)
Instructions: vdpbf16ps xmm {k}, xmm, xmm
CPUID Flags: AVX512_BF16 + AVX512VL
Computes the dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and stores the results in dst using writemask k. Elements are copied from src when the corresponding mask bit is not set.
_mm_maskz_dpbf16_ps
__m128 _mm_maskz_dpbf16_ps (__mmask8 k, __m128 src, __m128bh a, __m128bh b)
Instructions: vdpbf16ps xmm {k}{z}, xmm, xmm
CPUID Flags: AVX512_BF16 + AVX512VL
Computes the dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and stores the results in dst using zeromask k. Elements are zeroed out when the corresponding mask bit is not set.
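The per-element arithmetic can be written out in scalar C. This sketch models `_mm_mask_dpbf16_ps` with the following simplifications:

- BF16 inputs are raw 16-bit patterns.
- The odd-indexed pair is accumulated before the even-indexed pair, as in the instruction's pseudocode in Intel's Software Developer's Manual.
- The hardware's denormal handling (inputs treated as zero, outputs flushed to zero) is not modeled.

The product of two BF16 values is exact in FP32, barring overflow and underflow. Each multiply-add below therefore rounds once, like the fused operation the pseudocode describes. Only bits 0-3 of k are used, one per 32-bit element. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Widening BF16 -> FP32 is exact. */
static float bf16_to_fp32(uint16_t h)
{
    uint32_t x = (uint32_t)h << 16;
    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}

/* Model of _mm_mask_dpbf16_ps(src, k, a, b): four FP32 lanes, lane i fed by
   the BF16 pair (2i, 2i+1) of a and b. Lanes with a clear bit keep src. */
static void model_mm_mask_dpbf16_ps(float dst[4], const float src[4],
                                    uint8_t k, const uint16_t a[8],
                                    const uint16_t b[8])
{
    for (int i = 0; i < 4; ++i) {
        float acc = src[i];
        if ((k >> i) & 1u) {
            acc += bf16_to_fp32(a[2 * i + 1]) * bf16_to_fp32(b[2 * i + 1]);
            acc += bf16_to_fp32(a[2 * i])     * bf16_to_fp32(b[2 * i]);
        }
        dst[i] = acc;
    }
}
```

The zeromask form differs only in writing 0.0f instead of src[i] for lanes whose bit is clear. Note that `_mm_maskz_dpbf16_ps` takes k as its first argument, ahead of src.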
_mm256_dpbf16_ps
__m256 _mm256_dpbf16_ps (__m256 src, __m256bh a, __m256bh b)
Instructions: vdpbf16ps ymm, ymm, ymm
CPUID Flags: AVX512_BF16 + AVX512VL
Computes the dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and stores the results in dst.
_mm256_mask_dpbf16_ps
__m256 _mm256_mask_dpbf16_ps (__m256 src, __mmask8 k, __m256bh a, __m256bh b)
Instructions: vdpbf16ps ymm {k}, ymm, ymm
CPUID Flags: AVX512_BF16 + AVX512VL
Computes the dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and stores the results in dst using writemask k. Elements are copied from src when the corresponding mask bit is not set.
_mm256_maskz_dpbf16_ps
__m256 _mm256_maskz_dpbf16_ps (__mmask8 k, __m256 src, __m256bh a, __m256bh b)
Instructions: vdpbf16ps ymm {k}{z}, ymm, ymm
CPUID Flags: AVX512_BF16 + AVX512VL
Computes the dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and stores the results in dst using zeromask k. Elements are zeroed out when the corresponding mask bit is not set.
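A typical use of `_mm256_dpbf16_ps` is the dot product of two BF16 arrays. Each call consumes sixteen BF16 elements per operand and adds into eight FP32 lanes, and the lanes are summed once at the end.

The scalar sketch below follows that pattern. The helper names are hypothetical, and elements that do not fill a full block of 16 are handled by a scalar tail. A vector implementation might instead use masked loads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widening BF16 -> FP32 is exact. */
static float bf16_to_fp32(uint16_t h)
{
    uint32_t x = (uint32_t)h << 16;
    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}

/* Model of acc = _mm256_dpbf16_ps(acc, a, b) on one block of 16 BF16 elements. */
static void model_mm256_dpbf16_ps(float acc[8], const uint16_t a[16],
                                  const uint16_t b[16])
{
    for (int i = 0; i < 8; ++i) {
        acc[i] += bf16_to_fp32(a[2 * i + 1]) * bf16_to_fp32(b[2 * i + 1]);
        acc[i] += bf16_to_fp32(a[2 * i])     * bf16_to_fp32(b[2 * i]);
    }
}

/* Dot product of two BF16 arrays of length n, accumulated in FP32. */
static float bf16_dot(const uint16_t *a, const uint16_t *b, size_t n)
{
    float acc[8] = {0};
    size_t i = 0;
    for (; i + 16 <= n; i += 16)            /* full 256-bit blocks */
        model_mm256_dpbf16_ps(acc, a + i, b + i);

    float total = 0.0f;
    for (int lane = 0; lane < 8; ++lane)    /* horizontal sum of the lanes */
        total += acc[lane];
    for (; i < n; ++i)                      /* scalar tail */
        total += bf16_to_fp32(a[i]) * bf16_to_fp32(b[i]);
    return total;
}
```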
_mm512_dpbf16_ps
__m512 _mm512_dpbf16_ps (__m512 src, __m512bh a, __m512bh b)
Instructions: vdpbf16ps zmm, zmm, zmm
CPUID Flags: AVX512_BF16 + AVX512F
Computes the dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and stores the results in dst.
_mm512_mask_dpbf16_ps
__m512 _mm512_mask_dpbf16_ps (__m512 src, __mmask16 k, __m512bh a, __m512bh b)
Instructions: vdpbf16ps zmm {k}, zmm, zmm
CPUID Flags: AVX512_BF16 + AVX512F
Computes the dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and stores the results in dst using writemask k. Elements are copied from src when the corresponding mask bit is not set.
_mm512_maskz_dpbf16_ps
__m512 _mm512_maskz_dpbf16_ps (__mmask16 k, __m512 src, __m512bh a, __m512bh b)
Instructions: vdpbf16ps zmm {k}{z}, zmm, zmm
CPUID Flags: AVX512_BF16 + AVX512F
Computes the dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and stores the results in dst using zeromask k. Elements are zeroed out when the corresponding mask bit is not set.
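A dot product of two FP32 arrays can be computed by converting them to BF16 with `_mm512_cvtne2ps_pbh` and accumulating with `_mm512_dpbf16_ps`. The scalar sketch below models that pipeline for lengths that are multiples of 32, and the helper names are hypothetical. Passing the upper sixteen floats as a and the lower sixteen as b keeps BF16 element j aligned with float j. Pair (2i, 2i+1) therefore refers to the same positions in both arrays. The result can differ from an FP32 dot product because the inputs are rounded to BF16 first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the FP32 -> BF16 conversion (round to nearest even). */
static uint16_t fp32_to_bf16(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint32_t exponent = x & 0x7F800000u, mantissa = x & 0x007FFFFFu;
    if (exponent == 0)                              /* zero or denormal */
        return (uint16_t)((x >> 16) & 0x8000u);
    if (exponent == 0x7F800000u)                    /* infinity or NaN */
        return (uint16_t)((x >> 16) | (mantissa ? 0x0040u : 0u));
    return (uint16_t)((x + 0x7FFFu + ((x >> 16) & 1u)) >> 16);
}

/* Widening BF16 -> FP32 is exact. */
static float bf16_to_fp32(uint16_t h)
{
    uint32_t x = (uint32_t)h << 16;
    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}

/* Dot product of n floats (n a multiple of 32) through BF16: one block of
   32 floats per iteration, sixteen FP32 lanes, lane i fed by pair (2i, 2i+1). */
static float bf16_pipeline_dot(const float *x, const float *y, size_t n)
{
    float acc[16] = {0};
    assert(n % 32 == 0);
    for (size_t base = 0; base < n; base += 32) {
        uint16_t xb[32], yb[32];
        for (int j = 0; j < 32; ++j) {          /* like cvtne2ps(upper, lower) */
            xb[j] = fp32_to_bf16(x[base + j]);
            yb[j] = fp32_to_bf16(y[base + j]);
        }
        for (int i = 0; i < 16; ++i) {          /* like dpbf16_ps */
            acc[i] += bf16_to_fp32(xb[2 * i + 1]) * bf16_to_fp32(yb[2 * i + 1]);
            acc[i] += bf16_to_fp32(xb[2 * i])     * bf16_to_fp32(yb[2 * i]);
        }
    }
    float total = 0.0f;
    for (int i = 0; i < 16; ++i)                /* horizontal sum of the lanes */
        total += acc[i];
    return total;
}
```

With 32 copies of 0.1f multiplied by 1.0f, an FP32 dot product gives approximately 3.2. This model gives 3.203125, because 0.1f is stored in BF16 as 0.10009765625.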