Intrinsics for Bit Manipulation Operations

Intel® C++ Compiler Classic Developer Guide and Reference

Download PDF

ID 767249

Date 7/13/2023

Version

Public

Intrinsics for Bit Manipulation Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>

variable	definition
`src`	source element to use based on writemask result
`k`	writemask used as a selector
`a`	first source vector element

_mm_lzcnt_epi32

__m128i _mm_lzcnt_epi32(__m128i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntd

Counts the number of leading zero bits in each packed 32-bit integer in a, and return the results.

_mm_mask_lzcnt_epi32

__m128i _mm_mask_lzcnt_epi32(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntd

Counts the number of leading zero bits in each packed 32-bit integer in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).

_mm_maskz_lzcnt_epi32

__m128i _mm_maskz_lzcnt_epi32(__mmask8 k, __m128i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntd

Counts the number of leading zero bits in each packed 32-bit integer in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm256_lzcnt_epi32

__m256i _mm256_lzcnt_epi32(__m256i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntd

Counts the number of leading zero bits in each packed 32-bit integer in a, and return the results.

_mm256_mask_lzcnt_epi32

__m256i _mm256_mask_lzcnt_epi32(__m256i src, __mmask8 k, __m256i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntd

Counts the number of leading zero bits in each packed 32-bit integer in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).

_mm256_maskz_lzcnt_epi32

__m256i _mm256_maskz_lzcnt_epi32(__mmask8 k, __m256i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntd

Counts the number of leading zero bits in each packed 32-bit integer in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm_lzcnt_epi64

__m128i _mm_lzcnt_epi64(__m128i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntq

Counts the number of leading zero bits in each packed 64-bit integer in a, and return the results.

_mm_mask_lzcnt_epi64

__m128i _mm_mask_lzcnt_epi64(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntq

Counts the number of leading zero bits in each packed 64-bit integer in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).

_mm_maskz_lzcnt_epi64

__m128i _mm_maskz_lzcnt_epi64(__mmask8 k, __m128i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntq

Counts the number of leading zero bits in each packed 64-bit integer in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm256_lzcnt_epi64

__m256i _mm256_lzcnt_epi64(__m256i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntq

Counts the number of leading zero bits in each packed 64-bit integer in a, and return the results.

_mm256_mask_lzcnt_epi64

__m256i _mm256_mask_lzcnt_epi64(__m256i src, __mmask8 k, __m256i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntq

Counts the number of leading zero bits in each packed 64-bit integer in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).

_mm256_maskz_lzcnt_epi64

__m256i _mm256_maskz_lzcnt_epi64(__mmask8 k, __m256i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntq

Counts the number of leading zero bits in each packed 64-bit integer in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm_multishift_epi64_epi8

__m128i _mm_multishift_epi64_epi8(__m128i a, __m128i b)

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpmultishiftqb

For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of the return value.

_mm_mask_multishift_epi64_epi8

__m128i _mm_mask_multishift_epi64_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpmultishiftqb

For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).

_mm_maskz_multishift_epi64_epi8

__m128i _mm_maskz_multishift_epi64_epi8(__mmask16 k, __m128i a, __m128i b)

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpmultishiftqb

For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm256_multishift_epi64_epi8

__m256i _mm256_multishift_epi64_epi8(__m256i a, __m256i b)

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpmultishiftqb

For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of the return value.

_mm256_mask_multishift_epi64_epi8

__m256i _mm256_mask_multishift_epi64_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpmultishiftqb

For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).

_mm256_maskz_multishift_epi64_epi8

__m256i _mm256_maskz_multishift_epi64_epi8(__mmask32 k, __m256i a, __m256i b)

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpmultishiftqb

For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm512_multishift_epi64_epi8

__m512i _mm512_multishift_epi64_epi8(__m512i a, __m512i b)

CPUID Flags: AVX512VBMI

Instruction(s): vpmultishiftqb

For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of the return value.

_mm512_mask_multishift_epi64_epi8

__m512i _mm512_mask_multishift_epi64_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512VBMI

Instruction(s): vpmultishiftqb

For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).

_mm512_maskz_multishift_epi64_epi8

__m512i _mm512_maskz_multishift_epi64_epi8(__mmask64 k, __m512i a, __m512i b)

CPUID Flags: AVX512VBMI

Instruction(s): vpmultishiftqb

For each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

Parent topic: Intrinsics for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Additional Instructions