Developer Guide and Reference

ID 767253
Date 10/31/2024
Public

SIMD Data Layout Templates

SIMD Data Layout Templates (SDLT) is a C++11 template library that provides containers representing arrays of Plain Old Data objects (structs whose data members are not pointers or references and that have no virtual functions). These containers use memory layouts that let the compiler generate efficient SIMD (single instruction, multiple data) vector code. SDLT is written in standard ISO C++11 and does not require a special language or compiler to function. It does, however, take advantage of performance features, such as OpenMP* SIMD extensions and pragma ivdep, that may not be available in all compilers. It is designed to promote scalable SIMD vector programming. To use the library, specify SIMD loops and data layouts with the explicit vector programming model and SDLT containers, and let the compiler generate efficient SIMD code.
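As a rough illustration of the kind of type these containers are meant to hold, the sketch below uses standard type traits to accept a simple three-float struct and reject a type with a virtual function. The helper name is_plain_data_candidate is hypothetical and is not part of SDLT. The traits cannot reliably detect pointer or reference members, which the definition above also excludes, so this is only a partial check.

```cpp
#include <type_traits>

// Hypothetical helper (not part of SDLT): a partial compile-time check that a
// type is trivially copyable and standard layout, which rules out types with
// virtual functions. It does NOT detect pointer or reference data members.
template <typename T>
struct is_plain_data_candidate
    : std::integral_constant<bool,
          std::is_trivially_copyable<T>::value &&
          std::is_standard_layout<T>::value> {};

// Qualifies: three float data members, no virtual functions.
struct Point3s {
    float x;
    float y;
    float z;
};

// Does not qualify: the virtual destructor makes it neither trivially
// copyable nor standard layout.
struct Shape {
    virtual ~Shape() {}
    float area;
};

static_assert(is_plain_data_candidate<Point3s>::value, "Point3s should qualify");
static_assert(!is_plain_data_candidate<Shape>::value, "Shape should not qualify");
```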

Many library interfaces employ generic programming, in which interfaces are defined by requirements on types and not specific types. The C++ Standard Template Library (STL) is an example of generic programming. Generic programming enables SDLT to be flexible yet efficient. The generic interfaces enable you to customize components to your specific needs.
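To illustrate what "requirements on types" means, the hypothetical function template below (not part of SDLT) names no container type at all. It only requires that in[i] yields something convertible to a three-float point type like the Point3s struct used in the Motivation section, and that out[i] can be assigned one. Any types meeting those requirements, such as a std::vector and a plain array, can be passed in.

```cpp
#include <cstddef>
#include <vector>

struct Point3s {
    float x;
    float y;
    float z;
};

// Hypothetical generic algorithm: the only requirements are that in[i] is
// convertible to Point3s and that out[i] accepts a Point3s assignment.
template <typename InputT, typename OutputT>
void scale_points(const InputT& in, OutputT& out, std::size_t count, float s)
{
    for (std::size_t i = 0; i < count; ++i) {
        Point3s p = in[i];
        out[i] = Point3s{p.x * s, p.y * s, p.z * s};
    }
}
```

For example, the input can be a std::vector<Point3s> while the output is a plain Point3s array; the algorithm body does not change.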

The net result is that SDLT lets you specify a preferred SIMD data layout far more conveniently than completely restructuring your code around a new data structure for effective vectorization, and it can improve performance at the same time.

Motivation

C++ programs often express an algorithm in terms of high-level objects. Many algorithms process a dataset that is commonly represented as an array of plain old data objects, and developers often store that array in a C++ Standard Template Library container such as std::vector. For example:

#include <vector>

struct Point3s
{
    float x;
    float y;
    float z;
    // helper methods
};

void process(int count)
{
    std::vector<Point3s> inputDataSet(count);
    std::vector<Point3s> outputDataSet(count);

    for (int i = 0; i < count; ++i) {
        Point3s inputElement = inputDataSet[i];
        // Transformation of inputElement that is independent of other iterations;
        // the algorithm can stay high level by using the object's helper methods.
        Point3s result = inputElement; // placeholder for the transformation
        outputDataSet[i] = result;
    }
}

When possible, a compiler may attempt to vectorize the loop above. However, the overhead of loading the Array of Structures (AOS) dataset into vector registers may outweigh any performance gain from vectorization. Programs that follow this pattern are good candidates for an SDLT container with a SIMD-friendly internal memory layout. SDLT containers provide accessor objects that import and export primitives between the underlying memory layout and the object's original representation. For example:

#include <sdlt/sdlt.h>

// Point3s is the struct defined in the previous example.
SDLT_PRIMITIVE(Point3s, x, y, z)

void process(int count)
{
    sdlt::soa1d_container<Point3s> inputDataSet(count);
    sdlt::soa1d_container<Point3s> outputDataSet(count);

    auto inputData = inputDataSet.const_access();
    auto outputData = outputDataSet.access();

    #pragma forceinline recursive
    #pragma omp simd
    for (int i = 0; i < count; ++i) {
        Point3s inputElement = inputData[i];
        // Transformation of inputElement that is independent of other iterations;
        // the algorithm can stay high level by using the object's helper methods.
        Point3s result = inputElement; // placeholder for the transformation
        outputData[i] = result;
    }
}

When a local variable inside the loop is initialized from, or stored to, the container using that loop's index, the compiler's vectorizer can access the underlying SIMD-friendly data format and, when possible, perform unit-stride loads. If the compiler can prove that nothing outside the loop can access the local object, it can optimize the object's private representation as Structure of Arrays (SOA). In this example, the container's underlying memory layout is also SOA, so unit-stride loads can be generated. The container also allocates aligned memory, and its accessor objects give the compiler the correct alignment information so it can optimize code generation accordingly.
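The difference between the two layouts can be sketched by hand. In the hypothetical Point3sSoA type below (a conceptual illustration, not SDLT's internal implementation), each data member lives in its own contiguous array, so a loop over i reads x[i], x[i+1], and so on with unit stride. The same loop over a std::vector<Point3s> instead steps through each member with a stride of sizeof(Point3s). The loop body still works with whole Point3s objects, much as SDLT accessors import and export them.

```cpp
#include <cstddef>
#include <vector>

struct Point3s {
    float x;
    float y;
    float z;
};

// Conceptual Structure of Arrays layout: one contiguous array per data member.
// Illustration only: std::vector<float> does not provide the extra alignment
// that SDLT containers allocate.
struct Point3sSoA {
    std::vector<float> x, y, z;

    explicit Point3sSoA(std::size_t count) : x(count), y(count), z(count) {}

    // "Import": gather the members at index i into a Point3s object.
    Point3s get(std::size_t i) const { return Point3s{x[i], y[i], z[i]}; }

    // "Export": scatter a Point3s object into the per-member arrays.
    void set(std::size_t i, const Point3s& p) { x[i] = p.x; y[i] = p.y; z[i] = p.z; }

    std::size_t size() const { return x.size(); }
};

// Consecutive iterations touch consecutive floats in each array (unit stride).
void scale_soa(const Point3sSoA& in, Point3sSoA& out, float s)
{
    for (std::size_t i = 0; i < in.size(); ++i) {
        Point3s p = in.get(i);
        out.set(i, Point3s{p.x * s, p.y * s, p.z * s});
    }
}
```

Writing and maintaining such a type for every struct is the kind of restructuring that SDLT containers are intended to spare you.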

This documentation is for SDLT version 2, which extends version 1 by introducing support for n-dimensional containers.

Backwards Compatibility

Public interfaces of version 2 are fully backward compatible with interfaces of version 1.

The backward compatibility includes:

  • Existing source code compatibility. Any source code using the SDLT v1 public API (non-internal interfaces) can be recompiled against SDLT v2 headers with no changes.
  • Binary compatibility:
    • Because SDLT v2 APIs live in a new namespace, sdlt::v2, their ABI symbols should not collide with any existing SDLT v1 ABIs, which exist only in the sdlt namespace.
    • A dynamically linked library binary that uses SDLT v1 internally can be linked into a program that uses SDLT v2, and vice versa.
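One common C++ technique for this kind of namespace versioning is an inline namespace: names declared in lib::v2 are also reachable as lib::name, while the linkage names of symbols involving them include v2. The sketch below uses a hypothetical library, mylib, to show the idea; the text above does not say whether SDLT uses an inline namespace or some other mechanism to make sdlt:: names refer to sdlt::v2.

```cpp
#include <type_traits>

// Hypothetical library "mylib" (not SDLT) illustrating namespace versioning.
namespace mylib {

// The current version lives in an inline namespace, so these names are usable
// as mylib::Widget and mylib::version(), but the linkage (mangled) names of
// symbols involving them include "v2". An older release that declared these
// directly in namespace mylib produced different symbol names, so the two
// cannot collide at link time.
inline namespace v2 {

struct Widget {
    int value;
};

inline int version() { return 2; }

} // namespace v2
} // namespace mylib

// Source written against the unversioned spelling keeps compiling unchanged.
static_assert(std::is_same<mylib::Widget, mylib::v2::Widget>::value,
              "mylib::Widget names the v2 type");
```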

Limitations on backward compatibility include:

  • Passing SDLT containers or accessors as part of a library's public API (ABI). When SDLT is part of an ABI, the library and the calling code must use the same version of SDLT; versions cannot be mixed.

This compatibility does not cover the internal implementation. The internal implementation of SDLT v1 was updated and unified with parts introduced in v2, so backward compatibility is not guaranteed for code that depends on internal interfaces.

Deprecated

The interfaces below are deprecated; use the replacements provided in the table.

Deprecated Interface       Deprecated in Version    Replaced By
sdlt::fixed_offset<>       v2                       sdlt::fixed<>
sdlt::aligned_offset<>     v2                       sdlt::aligned<>

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Notice revision #20201201