Efficient Data Layout
The vectorization module transforms scalar data type operations on adjacent work-items into an equivalent vector operation. If vector operations already exist in the kernel source code, the module scalarizes them (breaks them into component operations) and revectorizes them. This transformation improves performance by turning the memory access pattern of the kernel into a structure of arrays (SOA), which is often more cache-friendly than an array of structures (AOS).
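As a rough source-level sketch of what scalarization means (the actual transformation happens inside the compiler, and the regrouping across work-items is implicit; the kernel names here are illustrative):

// Vector form, as written by the programmer:
__kernel void scale(__global float4* in, __global float4* out)
{
    int tid = get_global_id(0);
    out[tid] = in[tid] * 2.0f;
}

// Scalarized form: the same arithmetic broken into component operations.
// The vectorization module then regroups such scalar operations across
// adjacent work-items into SIMD instructions.
__kernel void scale_scalarized(__global float4* in, __global float4* out)
{
    int tid = get_global_id(0);
    float4 v = in[tid];
    out[tid] = (float4)(v.x * 2.0f, v.y * 2.0f, v.z * 2.0f, v.w * 2.0f);
}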
This transformation comes at a cost: when the input data is laid out as AOS, each vector load must be transposed into SOA form. Organizing the input data as SOA reduces the transpose penalty. For example, the following code example suffers from transpose overhead:
__kernel void sum(__global float4* input, __global float* output)
{
    int tid = get_global_id(0);
    output[tid] = input[tid].x + input[tid].y + input[tid].z + input[tid].w;
}
The following piece of code, in contrast, does not suffer from this penalty:
__kernel void sum(__global float* inx, __global float* iny, __global float* inz, __global float* inw, __global float* output)
{
    int tid = get_global_id(0);
    output[tid] = inx[tid] + iny[tid] + inz[tid] + inw[tid];
}
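One way to supply data in SOA form is to split it on the host before passing it to the kernel. A minimal host-side C sketch, assuming the packed input starts out as x/y/z/w quadruples (the helper name is hypothetical):

#include <stddef.h>

/* Splits a packed AOS array of x/y/z/w quadruples into four separate
 * SOA arrays, suitable for binding to the inx/iny/inz/inw arguments
 * of the kernel above. */
void aos_to_soa(const float* aos, size_t n,
                float* x, float* y, float* z, float* w)
{
    for (size_t i = 0; i < n; ++i) {
        x[i] = aos[4 * i + 0];
        y[i] = aos[4 * i + 1];
        z[i] = aos[4 * i + 2];
        w[i] = aos[4 * i + 3];
    }
}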
For vectorization to be most efficient, sequential work-items should refer to sequential memory locations. Otherwise, the data gathering required for SIMD processing can be expensive. For example:
int tid = get_global_id(0);
output[tid] = inx[tid];     // sequential access (with respect to adjacent work-items)
output[tid] = inx[2 * tid]; // non-sequential access (triggers data gathering)
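The same two patterns as complete kernels, for comparison (kernel names are illustrative):

// Sequential: work-item tid reads element tid, so adjacent work-items
// touch adjacent memory and the vectorized loads stay contiguous.
__kernel void copy_sequential(__global const float* inx, __global float* output)
{
    int tid = get_global_id(0);
    output[tid] = inx[tid];
}

// Strided: work-item tid reads element 2*tid, so adjacent work-items
// touch every other element, which triggers data gathering.
__kernel void copy_strided(__global const float* inx, __global float* output)
{
    int tid = get_global_id(0);
    output[tid] = inx[2 * tid];
}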
There is an alternative to AOS that generally preserves the expressiveness of AOS and the efficiency of SOA. It is sometimes referred to as strided (or “stripped”) SOA, or even AOSSOA. Consider the code example below:
struct AOSSOA
{
    float x[16];
    float y[16];
    float z[16];
};
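A sketch of how a kernel might index this layout, assuming the block size of 16 used in the struct above (the kernel name is illustrative). Adjacent work-items read adjacent lanes within a block, so accesses stay sequential inside each field array, while each logical element still keeps its x, y, and z fields close together in memory:

#define BLOCK 16

struct AOSSOA
{
    float x[BLOCK];
    float y[BLOCK];
    float z[BLOCK];
};

__kernel void sum(__global const struct AOSSOA* in, __global float* output)
{
    int tid   = get_global_id(0);
    int block = tid / BLOCK; // which AOSSOA chunk
    int lane  = tid % BLOCK; // position inside the chunk
    output[tid] = in[block].x[lane] + in[block].y[lane] + in[block].z[lane];
}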