_mm_mask_i32gather_ps, _mm256_mask_i32gather

Intel® C++ Compiler Classic Developer Guide and Reference

Download PDF

ID 767249

Date 3/31/2023

Version

Public

A newer version of this document is available. Customers should click here to go to the newest version.

Visible to Intel only — GUID: GUID-01963F91-B5F2-4143-B55F-F9521205C161

View Details

_mm_mask_i32gather_ps, _mm256_mask_i32gather_ps

Gathers 2/4 packed single-precision floating point values from memory referenced by the given base address, dword indices and scale, and using the given single-precision FP mask values. The corresponding Intel® AVX2 instruction is VGATHERDPS.

Syntax

extern __m128 _mm_mask_i32gather_ps(__m128 def_vals, float const * base, __m128i vindex __m128 vmask, const int scale);

extern __m256 _mm256_mask_i32gather_ps(__m256 def_vals, float const * base, __m256i vindex __m256 vmask, const int scale);

Arguments

`def_vals`	the vector of single-precision FP values copied to the destination when the corresponding element of the single-precision FP mask is '0'.
`base`	the base address used to reference the loaded FP elements.
`vindex`	the vector of dword indices used to reference the loaded FP elements.
`vmask`	the vector of FP elements used as a vector mask; only the most significant bit of each data element is used as a mask.
`scale`	The compilation time literal constant, which is used as the vector indices scale to address the loaded elements. Possible values are one of the following: 1, 2, 4, 8.

Description

The intrinsics conditionally load 2/4 packed single-precision floating-point values from memory using dword indices according to mask values.

Below is the pseudo-code for the intrinsics:

_mm_mask_i32gather_ps():

result[31:0] = (vmask[31]==1) ? (mem[base+vindex[31:0]*scale]) : (def_vals[31:0]);
result[63:32] = (vmask[63]==1) ? (mem[base+vindex[63:32]*scale]) : (def_vals[63:32]);
result[95:64] = (vmask[95]==1) ? (mem[base+vindex[95:64]*scale]) : (def_vals[95:64]);
result127:96] = (vmask[127]==1) ? (mem[base+vindex[127:96]*scale]) : (def_vals[127:96]);

_mm256_mask_i32gather_ps():

result[31:0] = (vmask[31]==1) ? (mem[base+vindex[31:0]*scale]) : (def_vals[31:0]);
result[63:32] = (vmask[63]==1) ? (mem[base+vindex[63:32]*scale]) : (def_vals[63:32]);
result[95:64] = (vmask[95]==1) ? (mem[base+vindex[95:64]*scale]) : (def_vals[95:64]);
result127:96] = (vmask[127]==1) ? (mem[base+vindex[127:96]*scale]) : (def_vals[127:96]);
result[159:128] = (vmask[159]==1) ? (mem[base+vindex[159:128]*scale]) : (def_vals[159:128]);
result[191:160] = (vmask[191]==1) ? (mem[base+vindex[191:160]*scale]) : (def_vals[191:160]);
result[223:192] = (vmask[223]==1) ? (mem[base+vindex[223:192]*scale]) : (def_vals[223:192]);
result[255:224] = (vmask[255]==1) ? (mem[base+vindex[255:224]*scale]) : (def_vals[255:224]);

Returns

A 128/256-bit vector with conditionally gathered single-precision FP values.

Parent topic: Intrinsics for GATHER Operations

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in