Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 12/16/2022
Public


Intrinsics for Bit Manipulation Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>


Variable    Definition
src         source element to use based on writemask result
k           writemask used as a selector
a           first source vector element


_mm_lzcnt_epi32

__m128i _mm_lzcnt_epi32(__m128i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntd

Counts the number of leading zero bits in each packed 32-bit integer in a and returns the results.
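
The per-element count described above can be sketched as follows. This is a minimal illustration, not Intel's implementation: the helper names lzcnt32_ref and lzcnt_epi32_x4 are hypothetical, and the sketch assumes the compiler defines the __AVX512CD__ and __AVX512VL__ macros when those features are enabled. Without them, a portable scalar loop stands in for the intrinsic. A zero element produces 32, matching the behavior of vplzcntd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX512CD__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Scalar model of one lane: count zero bits above the highest set bit.
   A zero input yields 32. (Hypothetical helper, not part of any Intel header.) */
uint32_t lzcnt32_ref(uint32_t x)
{
    uint32_t n = 0;
    for (uint32_t bit = UINT32_C(1) << 31; bit != 0 && (x & bit) == 0; bit >>= 1)
        ++n;
    return n;
}

/* Hypothetical wrapper: leading-zero counts for four packed 32-bit integers. */
void lzcnt_epi32_x4(const uint32_t in[4], uint32_t out[4])
{
    assert(in != NULL && out != NULL);
#if defined(__AVX512CD__) && defined(__AVX512VL__)
    /* Intrinsic path, compiled only when both feature macros are defined. */
    __m128i a = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_lzcnt_epi32(a));
#else
    /* Portable fallback that mirrors the per-lane semantics. */
    for (int i = 0; i < 4; ++i)
        out[i] = lzcnt32_ref(in[i]);
#endif
}
```

Calling lzcnt_epi32_x4 on the inputs 0, 1, 0x00010000, and 0x80000000 gives 32, 31, 15, and 0 on either path.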



_mm_mask_lzcnt_epi32

__m128i _mm_mask_lzcnt_epi32(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntd

Counts the number of leading zero bits in each packed 32-bit integer in a and returns the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_lzcnt_epi32

__m128i _mm_maskz_lzcnt_epi32(__mmask8 k, __m128i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntd

Counts the number of leading zero bits in each packed 32-bit integer in a and returns the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
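
The difference between the writemask and zeromask forms can be sketched as below. The helper names mask_lzcnt_epi32_x4 and maskz_lzcnt_epi32_x4 are hypothetical, and the feature-macro guard is an assumption about how the build enables AVX512CD and AVX512VL. For 128-bit vectors of 32-bit elements, only the low four bits of k are relevant: bit i controls element i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX512CD__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Scalar leading-zero count for one 32-bit lane (hypothetical helper). */
uint32_t lzcnt32_lane(uint32_t x)
{
    uint32_t n = 0;
    for (uint32_t bit = UINT32_C(1) << 31; bit != 0 && (x & bit) == 0; bit >>= 1)
        ++n;
    return n;
}

/* Writemask form: lanes whose mask bit is clear keep the value from src. */
void mask_lzcnt_epi32_x4(const uint32_t src[4], uint8_t k,
                         const uint32_t a[4], uint32_t dst[4])
{
    assert(src != NULL && a != NULL && dst != NULL);
#if defined(__AVX512CD__) && defined(__AVX512VL__)
    __m128i vsrc = _mm_loadu_si128((const __m128i *)src);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    _mm_storeu_si128((__m128i *)dst, _mm_mask_lzcnt_epi32(vsrc, (__mmask8)k, va));
#else
    for (int i = 0; i < 4; ++i)
        dst[i] = ((k >> i) & 1) ? lzcnt32_lane(a[i]) : src[i];
#endif
}

/* Zeromask form: lanes whose mask bit is clear become zero. */
void maskz_lzcnt_epi32_x4(uint8_t k, const uint32_t a[4], uint32_t dst[4])
{
    assert(a != NULL && dst != NULL);
#if defined(__AVX512CD__) && defined(__AVX512VL__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    _mm_storeu_si128((__m128i *)dst, _mm_maskz_lzcnt_epi32((__mmask8)k, va));
#else
    for (int i = 0; i < 4; ++i)
        dst[i] = ((k >> i) & 1) ? lzcnt32_lane(a[i]) : 0u;
#endif
}
```

With k = 0x5, only elements 0 and 2 receive a computed count. The other two elements come from src in the writemask form and are zero in the zeromask form.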



_mm256_lzcnt_epi32

__m256i _mm256_lzcnt_epi32(__m256i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntd

Counts the number of leading zero bits in each packed 32-bit integer in a and returns the results.



_mm256_mask_lzcnt_epi32

__m256i _mm256_mask_lzcnt_epi32(__m256i src, __mmask8 k, __m256i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntd

Counts the number of leading zero bits in each packed 32-bit integer in a and returns the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_lzcnt_epi32

__m256i _mm256_maskz_lzcnt_epi32(__mmask8 k, __m256i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntd

Counts the number of leading zero bits in each packed 32-bit integer in a and returns the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_lzcnt_epi64

__m128i _mm_lzcnt_epi64(__m128i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntq

Counts the number of leading zero bits in each packed 64-bit integer in a and returns the results.
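
For the 64-bit element width, the same idea applies to two elements per 128-bit vector, and a zero element produces 64. The sketch below uses the hypothetical helper names lzcnt64_ref and lzcnt_epi64_x2 and assumes the __AVX512CD__ and __AVX512VL__ macros indicate that the intrinsic may be used. Otherwise it falls back to a scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX512CD__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Scalar model of one 64-bit lane; a zero input yields 64.
   (Hypothetical helper, not part of any Intel header.) */
uint64_t lzcnt64_ref(uint64_t x)
{
    uint64_t n = 0;
    for (uint64_t bit = UINT64_C(1) << 63; bit != 0 && (x & bit) == 0; bit >>= 1)
        ++n;
    return n;
}

/* Hypothetical wrapper: leading-zero counts for two packed 64-bit integers. */
void lzcnt_epi64_x2(const uint64_t in[2], uint64_t out[2])
{
    assert(in != NULL && out != NULL);
#if defined(__AVX512CD__) && defined(__AVX512VL__)
    __m128i a = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_lzcnt_epi64(a));
#else
    for (int i = 0; i < 2; ++i)
        out[i] = lzcnt64_ref(in[i]);
#endif
}
```

For the inputs 0 and 0x0000000100000000, the results are 64 and 31.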



_mm_mask_lzcnt_epi64

__m128i _mm_mask_lzcnt_epi64(__m128i src, __mmask8 k, __m128i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntq

Counts the number of leading zero bits in each packed 64-bit integer in a and returns the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm_maskz_lzcnt_epi64

__m128i _mm_maskz_lzcnt_epi64(__mmask8 k, __m128i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntq

Counts the number of leading zero bits in each packed 64-bit integer in a and returns the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm256_lzcnt_epi64

__m256i _mm256_lzcnt_epi64(__m256i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntq

Counts the number of leading zero bits in each packed 64-bit integer in a and returns the results.



_mm256_mask_lzcnt_epi64

__m256i _mm256_mask_lzcnt_epi64(__m256i src, __mmask8 k, __m256i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntq

Counts the number of leading zero bits in each packed 64-bit integer in a and returns the results using writemask k (elements are copied from src when the corresponding mask bit is not set).



_mm256_maskz_lzcnt_epi64

__m256i _mm256_maskz_lzcnt_epi64(__mmask8 k, __m256i a)

CPUID Flags: AVX512CD, AVX512VL

Instruction(s): vplzcntq

Counts the number of leading zero bits in each packed 64-bit integer in a and returns the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).



_mm_multishift_epi64_epi8

__m128i _mm_multishift_epi64_epi8(__m128i a, __m128i b) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpmultishiftqb

For each 64-bit element in b, selects eight 8-bit fields at unaligned bit offsets, where each offset is taken from the low 6 bits of the corresponding control byte in the matching 64-bit element of a and wraps around within the 64-bit element, and stores the eight assembled bytes in the corresponding 64-bit element of the return value.
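
The selection described above can be modeled in scalar C as a rotate followed by a byte extract. This is a sketch under stated assumptions: the names multishift64_ref and multishift_epi64_epi8_x2 are hypothetical, and the intrinsic path is compiled only when the compiler defines __AVX512VBMI__ and __AVX512VL__. Note the argument roles: a carries the control bytes and b carries the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX512VBMI__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Scalar model of one 64-bit element. Control byte j (low 6 bits) gives the
   bit offset in `data` at which result byte j starts; the 8-bit field wraps
   around the end of the element. (Hypothetical helper.) */
uint64_t multishift64_ref(uint64_t ctrl, uint64_t data)
{
    uint64_t result = 0;
    for (int j = 0; j < 8; ++j) {
        unsigned shift = (unsigned)(ctrl >> (8 * j)) & 63u;
        /* Rotate right so the selected 8 bits land in the low byte. */
        uint64_t rotated = shift ? (data >> shift) | (data << (64 - shift)) : data;
        result |= (rotated & 0xFFu) << (8 * j);
    }
    return result;
}

/* Hypothetical wrapper for two packed 64-bit elements (a = ctrl, b = data). */
void multishift_epi64_epi8_x2(const uint64_t ctrl[2], const uint64_t data[2],
                              uint64_t out[2])
{
    assert(ctrl != NULL && data != NULL && out != NULL);
#if defined(__AVX512VBMI__) && defined(__AVX512VL__)
    __m128i a = _mm_loadu_si128((const __m128i *)ctrl);
    __m128i b = _mm_loadu_si128((const __m128i *)data);
    _mm_storeu_si128((__m128i *)out, _mm_multishift_epi64_epi8(a, b));
#else
    for (int i = 0; i < 2; ++i)
        out[i] = multishift64_ref(ctrl[i], data[i]);
#endif
}
```

Control bytes 0, 8, 16, ..., 56 reproduce the data element unchanged. With data 0x0123456789ABCDEF, a control byte of 4 selects 0xDE and a control byte of 60 selects 0xF0, because the field wraps from bit 63 back to bit 0.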



_mm_mask_multishift_epi64_epi8

__m128i _mm_mask_multishift_epi64_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpmultishiftqb

For each 64-bit element in b, selects eight 8-bit fields at unaligned bit offsets, where each offset is taken from the low 6 bits of the corresponding control byte in the matching 64-bit element of a and wraps around within the 64-bit element, and stores the eight assembled bytes in the corresponding 64-bit element of the return value using writemask k (bytes are copied from src when the corresponding mask bit is not set).
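
In the masked forms, k has one bit per result byte rather than one bit per 64-bit element, which is why the 128-bit variant takes a __mmask16. The following sketch models that byte-level merge. The name mask_multishift_epi64_epi8_x2 is hypothetical, and the guard on __AVX512VBMI__ and __AVX512VL__ is an assumption about the build configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX512VBMI__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical wrapper: bit (8*i + j) of k decides whether byte j of element i
   is the selected field from data or the original byte from src. */
void mask_multishift_epi64_epi8_x2(const uint64_t src[2], uint16_t k,
                                   const uint64_t ctrl[2], const uint64_t data[2],
                                   uint64_t dst[2])
{
    assert(src != NULL && ctrl != NULL && data != NULL && dst != NULL);
#if defined(__AVX512VBMI__) && defined(__AVX512VL__)
    __m128i vsrc = _mm_loadu_si128((const __m128i *)src);
    __m128i a = _mm_loadu_si128((const __m128i *)ctrl);
    __m128i b = _mm_loadu_si128((const __m128i *)data);
    _mm_storeu_si128((__m128i *)dst,
                     _mm_mask_multishift_epi64_epi8(vsrc, (__mmask16)k, a, b));
#else
    for (int i = 0; i < 2; ++i) {
        uint64_t out = 0;
        for (int j = 0; j < 8; ++j) {
            unsigned shift = (unsigned)(ctrl[i] >> (8 * j)) & 63u;
            /* Rotate right so the selected 8 bits land in the low byte. */
            uint64_t rotated = shift ? (data[i] >> shift) | (data[i] << (64 - shift))
                                     : data[i];
            uint64_t computed = rotated & 0xFFu;
            uint64_t kept = (src[i] >> (8 * j)) & 0xFFu;
            uint64_t byte = ((k >> (8 * i + j)) & 1) ? computed : kept;
            out |= byte << (8 * j);
        }
        dst[i] = out;
    }
#endif
}
```

With identity control bytes and k = 0x0FFF, all eight bytes of element 0 and the low four bytes of element 1 come from the data, while the high four bytes of element 1 are copied from src.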



_mm_maskz_multishift_epi64_epi8

__m128i _mm_maskz_multishift_epi64_epi8(__mmask16 k, __m128i a, __m128i b) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpmultishiftqb

For each 64-bit element in b, selects eight 8-bit fields at unaligned bit offsets, where each offset is taken from the low 6 bits of the corresponding control byte in the matching 64-bit element of a and wraps around within the 64-bit element, and stores the eight assembled bytes in the corresponding 64-bit element of the return value using zeromask k (bytes are zeroed out when the corresponding mask bit is not set).



_mm256_multishift_epi64_epi8

__m256i _mm256_multishift_epi64_epi8(__m256i a, __m256i b) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpmultishiftqb

For each 64-bit element in b, selects eight 8-bit fields at unaligned bit offsets, where each offset is taken from the low 6 bits of the corresponding control byte in the matching 64-bit element of a and wraps around within the 64-bit element, and stores the eight assembled bytes in the corresponding 64-bit element of the return value.



_mm256_mask_multishift_epi64_epi8

__m256i _mm256_mask_multishift_epi64_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpmultishiftqb

For each 64-bit element in b, selects eight 8-bit fields at unaligned bit offsets, where each offset is taken from the low 6 bits of the corresponding control byte in the matching 64-bit element of a and wraps around within the 64-bit element, and stores the eight assembled bytes in the corresponding 64-bit element of the return value using writemask k (bytes are copied from src when the corresponding mask bit is not set).



_mm256_maskz_multishift_epi64_epi8

__m256i _mm256_maskz_multishift_epi64_epi8(__mmask32 k, __m256i a, __m256i b) 

CPUID Flags: AVX512VBMI, AVX512VL

Instruction(s): vpmultishiftqb

For each 64-bit element in b, selects eight 8-bit fields at unaligned bit offsets, where each offset is taken from the low 6 bits of the corresponding control byte in the matching 64-bit element of a and wraps around within the 64-bit element, and stores the eight assembled bytes in the corresponding 64-bit element of the return value using zeromask k (bytes are zeroed out when the corresponding mask bit is not set).



_mm512_multishift_epi64_epi8

__m512i _mm512_multishift_epi64_epi8(__m512i a, __m512i b) 

CPUID Flags: AVX512VBMI

Instruction(s): vpmultishiftqb

For each 64-bit element in b, selects eight 8-bit fields at unaligned bit offsets, where each offset is taken from the low 6 bits of the corresponding control byte in the matching 64-bit element of a and wraps around within the 64-bit element, and stores the eight assembled bytes in the corresponding 64-bit element of the return value.



_mm512_mask_multishift_epi64_epi8

__m512i _mm512_mask_multishift_epi64_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b) 

CPUID Flags: AVX512VBMI

Instruction(s): vpmultishiftqb

For each 64-bit element in b, selects eight 8-bit fields at unaligned bit offsets, where each offset is taken from the low 6 bits of the corresponding control byte in the matching 64-bit element of a and wraps around within the 64-bit element, and stores the eight assembled bytes in the corresponding 64-bit element of the return value using writemask k (bytes are copied from src when the corresponding mask bit is not set).



_mm512_maskz_multishift_epi64_epi8

__m512i _mm512_maskz_multishift_epi64_epi8(__mmask64 k, __m512i a, __m512i b) 

CPUID Flags: AVX512VBMI

Instruction(s): vpmultishiftqb

For each 64-bit element in b, selects eight 8-bit fields at unaligned bit offsets, where each offset is taken from the low 6 bits of the corresponding control byte in the matching 64-bit element of a and wraps around within the 64-bit element, and stores the eight assembled bytes in the corresponding 64-bit element of the return value using zeromask k (bytes are zeroed out when the corresponding mask bit is not set).