Tips for Auto-Vectorization
During kernel compilation, the vectorization module often transforms the kernel's memory access pattern from array of structures (AOS) to structure of arrays (SOA), which is SIMD-friendly.
This transformation comes at a cost: the transpose penalty. Organizing the input data as SOA instead of AOS avoids this penalty.
For example, the following code suffers from the transpose penalty:
__kernel void sum(__global float4* input, __global float* output)
{
    int tid = get_global_id(0);
    output[tid] = input[tid].x + input[tid].y + input[tid].z + input[tid].w;
}
The following code, in contrast, does not suffer from the transpose penalty:
__kernel void sum(__global float* inx, __global float* iny,
                  __global float* inz, __global float* inw,
                  __global float* output)
{
    int tid = get_global_id(0);
    output[tid] = inx[tid] + iny[tid] + inz[tid] + inw[tid];
}
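If the data arrives in AOS layout, a one-time host-side transpose into SOA arrays is typically still cheaper than paying the transpose penalty inside the kernel on every execution. The following is a minimal host-side sketch in plain C; the float4_aos type and the aos_to_soa function name are hypothetical, not part of the OpenCL API:
#include <stddef.h>
/* Hypothetical AOS element type matching the float4 input above. */
typedef struct { float x, y, z, w; } float4_aos;
/* One-time transpose from AOS into the four SOA arrays. */
void aos_to_soa(const float4_aos *in, size_t n,
                float *inx, float *iny, float *inz, float *inw)
{
    for (size_t i = 0; i < n; ++i) {
        inx[i] = in[i].x;
        iny[i] = in[i].y;
        inz[i] = in[i].z;
        inw[i] = in[i].w;
    }
}
The four resulting arrays can then be bound to the inx, iny, inz, and inw arguments of the SOA kernel above.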
Take care when dealing with branches. In particular, avoid data loads and stores within branch statements:
if (…) {           // condition
    x = A[i1];     // reading from A
    …              // calculations
    B[i2] = y;     // storing into B
} else {
    q = A[i1];     // reading from A with the same index as in the first clause
    …              // different calculations
    B[i2] = w;     // storing into B with the same index as in the first clause
}
The following code avoids loading from and storing to memory within branches:
temp1 = A[i1];     // reading from A in advance
if (…) {           // condition
    x = temp1;
    …              // some calculations
    temp2 = y;     // storing into a temporary variable
} else {
    q = temp1;
    …              // some calculations
    temp2 = w;     // storing into a temporary variable
}
B[i2] = temp2;     // storing to B once
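For completeness, the following is a minimal, self-contained kernel applying this pattern; the condition and the arithmetic in each path are placeholders, not part of the original example:
__kernel void branch_example(__global const float* A, __global float* B)
{
    int tid = get_global_id(0);
    float temp1 = A[tid];          // single unconditional load
    float temp2;
    if (temp1 > 0.0f) {            // placeholder condition
        temp2 = temp1 * 2.0f;      // calculations for one path
    } else {
        temp2 = temp1 - 1.0f;      // different calculations for the other path
    }
    B[tid] = temp2;                // single unconditional store
}
With the memory operations hoisted out of the branch, the vectorizer can typically execute both paths under a SIMD mask and select between the temporary values, instead of generating conditional scattered loads and stores.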