Vectorization Basics for Intel® Architecture Processors
Intel® Architecture Processors provide performance acceleration using Single Instruction Multiple Data (SIMD) instruction sets, which include:
- Intel® Streaming SIMD Extensions (Intel® SSE)
- Intel® Advanced Vector Extensions (Intel® AVX) instructions
- Intel® Advanced Vector Extensions 2 (Intel® AVX2) instructions
- Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Foundation instructions, Intel® AVX-512 Conflict Detection instructions, Intel® AVX-512 Doubleword and Quadword instructions, Intel® AVX-512 Byte and Word instructions, and Intel® AVX-512 Vector Length Extensions for Intel® processors
By processing multiple data elements in a single instruction, these ISA extensions enable data parallelism.
When using SIMD instructions, vector registers can store a group of data elements of the same data type, such as float or char. The number of data elements that fit in one register depends on the microarchitecture and on the width of the data type. For example, if the CPU supports 512-bit vector registers, each vector (ZMM) register can store sixteen float values, sixteen 32-bit integer values, and so on.
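The relationship between register width and element count can be checked with a little arithmetic. The following is a minimal sketch in C; the helper name `lanes_per_register` is hypothetical and is not part of any Intel API.

```c
#include <stddef.h>

/* Hypothetical helper (not an Intel API): number of elements of the given
   size in bytes that fit in a vector register of the given width in bits.
   Assumes 8-bit bytes. */
static size_t lanes_per_register(size_t register_bits, size_t element_bytes)
{
    return register_bits / (element_bytes * 8);
}
```

For example, `lanes_per_register(512, 4)` yields 16, which matches the sixteen 32-bit float or integer values per ZMM register, while a 1-byte char type gives 64 elements in the same register.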
When using the Single Program Multiple Data (SPMD) technique, the Intel® OpenCL™ implementation can map the work-items to the hardware in one of the following ways:
- Scalar code, when work-items execute one-by-one
- SIMD elements, when several work-items fit into one register to run simultaneously
The Intel® SDK for OpenCL™ Applications contains an implicit vectorization module, which implements the second method. Depending on the kernel code, implicit vectorization might have some limitations. If the vectorization module is disabled, the Intel® SDK for OpenCL™ Applications uses the first method.
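The two mappings can be modeled conceptually in plain C. This is only an illustrative sketch, not how the vectorization module is implemented: the kernel body (an element-wise add), the function names, and the width of 16 lanes (float elements in a 512-bit register) are all assumptions made for the example.

```c
#include <stddef.h>

#define SIMD_WIDTH 16  /* assumed: 16 float lanes, as in a 512-bit register */

/* Hypothetical kernel body: the work done by one work-item with id gid. */
static void kernel_body(size_t gid, const float *a, const float *b, float *out)
{
    out[gid] = a[gid] + b[gid];
}

/* First method: scalar code, work-items execute one by one. */
static void run_scalar(size_t global_size,
                       const float *a, const float *b, float *out)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        kernel_body(gid, a, b, out);
}

/* Second method: SIMD_WIDTH consecutive work-items are packed together.
   The inner loop stands in for a single vector instruction operating on
   all lanes at once; leftover work-items fall back to the scalar path. */
static void run_packed(size_t global_size,
                       const float *a, const float *b, float *out)
{
    size_t gid = 0;
    for (; gid + SIMD_WIDTH <= global_size; gid += SIMD_WIDTH) {
        for (size_t lane = 0; lane < SIMD_WIDTH; ++lane)
            out[gid + lane] = a[gid + lane] + b[gid + lane];
    }
    for (; gid < global_size; ++gid)
        kernel_body(gid, a, b, out);
}
```

Both functions produce the same results for any `global_size`. The difference is only in how many work-items are processed per step, which is the property that packing work-items into SIMD elements exploits.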