Benefitting From Implicit Vectorization
OpenCL™ Code Builder includes an implicit vectorization module as part of the program build process. When it is beneficial in terms of performance, this module packs several work-items together and executes them with SIMD instructions. This enables you to benefit from the vector units in the Intel® Architecture Processors without writing explicit vector code.
The vectorization module transforms operations on scalar data types performed by adjacent work-items into equivalent vector operations. When vector operations already exist in the kernel source code, the module scalarizes them (breaks them down into component operations) and revectorizes them. This improves performance by transforming the memory access pattern of the kernel into a structure of arrays (SOA), which is often more cache-friendly than an array of structures (AOS).
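For example, a kernel written with explicit vector types is conceptually handled as shown in the following illustrative sketch (the kernel name and arguments are made up for this example):

// Written form: each work-item operates on one float4 value,
// so the memory access is AOS-style.
__kernel void scale(__global float4* v, float s)
{
    size_t i = get_global_id(0);
    v[i] = v[i] * s;
}

// Conceptually, the module first scalarizes the float4 operation into
// its .x/.y/.z/.w component operations, then revectorizes across
// adjacent work-items, so that each SIMD instruction processes the same
// component from several work-items (SOA-style access).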
You can find more details in the "Intel OpenCL™ Implicit Vectorization Module overview" article.
The implicit vectorization module works best for kernels that operate on elements that are four bytes wide, such as the float or int data types. You can define the computational width of a kernel using the OpenCL vec_type_hint attribute.
Since the default computation width is four bytes, kernels are vectorized by default. If your kernel uses vectors explicitly, you can specify __attribute__((vec_type_hint(<typen>))), where <typen> is any vector type (for example, float3 or char4). This attribute indicates to the vectorization module that it should apply only the transformations that are useful for this type.
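For illustration, a kernel that already computes on float4 values could declare the hint as follows (the kernel name and arguments are made up for this sketch):

// Hypothetical kernel: the vec_type_hint attribute tells the
// vectorization module that the code is already written at a
// four-float width, so only transformations useful for float4
// are applied.
__kernel __attribute__((vec_type_hint(float4)))
void scale4(__global float4* data, float factor)
{
    size_t i = get_global_id(0);
    data[i] = data[i] * factor;   // explicit vector arithmetic
}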
The performance benefit from the vectorization module might be lower for kernels that include complex control flow.
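As an illustrative sketch of such a case (the kernel and buffer names are hypothetical), a loop whose trip count depends on per-work-item data forces the vectorized code to keep executing with some lanes masked off:

// Adjacent work-items may need different iteration counts, so the
// packed SIMD code has to execute the longest path and mask the
// inactive lanes, which reduces the benefit of vectorization.
__kernel void variable_trip_count(__global const int* counts,
                                  __global const float* in,
                                  __global float* out)
{
    size_t i = get_global_id(0);
    float acc = 0.0f;
    for (int k = 0; k < counts[i]; ++k)
        acc += in[i] * (float)k;
    out[i] = acc;
}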
To benefit from vectorization, your kernels do not need to contain for loops. For best results, let each kernel instance deal with a single data element and let the vectorization module take care of the rest: the more straightforward your OpenCL code is, the more optimization you get from vectorization. Writing the kernel as plain scalar code works best for efficient vectorization. This method of coding also avoids the potential disadvantages associated with explicit (manual) vectorization described in the "Using Vector Data Types" section.
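As a minimal sketch of this style (the kernel and buffer names are illustrative), each work-item handles exactly one element in plain scalar code, and the module packs adjacent work-items into SIMD lanes:

// Plain scalar kernel: one work-item per element, no loops, no
// explicit vector types. The implicit vectorization module can pack
// adjacent work-items into SIMD instructions on its own.
__kernel void add_scalar(__global const float* a,
                         __global const float* b,
                         __global float* result)
{
    size_t i = get_global_id(0);
    result[i] = a[i] + b[i];
}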
See Also
Tips for Auto-Vectorization Module
Intel OpenCL™ Implicit Vectorization Module overview at http://llvm.org/devmtg/2011-11/Rotem_IntelOpenCLSDKVectorizer.pdf