Intel® CPU Runtime for OpenCL™ Applications 2021 Release Notes

ID 740118
Updated 6/28/2021

This page provides the current Release Notes for Intel® CPU Runtime for OpenCL™ Applications for Intel® Core™ and Intel® Xeon® processors. This page covers the CPU (x86-64) OpenCL™ implementation only. See the OpenCL™ Runtimes for Intel® Processors article for additional Intel® Graphics Technology information. The notes are categorized by year, from newest to oldest, with individual releases listed within each year.

Each version below summarizes the new features and changes since the previous release. The detailed release notes for each version include important information such as prerequisites, software compatibility, installation instructions, and known issues.


All release note files are in PDF format; Adobe Reader* (or a compatible reader) is required.
For OpenCL™ developer tools, visit the Intel® SDK for OpenCL™ Applications 2019 page.
For questions or technical support, visit Intel® Software Developer Support.

NOTE (deprecated): For Intel® Xeon Phi™ coprocessor device support, you must install Intel® MPSS version 3.3.

2021

2021.4

Release Notes 

Overview

  • Migrated to OpenCL 3.0.
  • The native debugger is now the default on Windows*. You no longer need to set CL_CONFIG_USE_NATIVE_DEBUGGER=1 to debug programs in Visual Studio*.
  • Minor bug fixes

2021.3

Release Notes

Overview

  • CL_DEVICE_AFFINITY_DOMAIN_NUMA partition mode is now supported when creating sub-devices on Linux platforms.
  • Minor bug fixes

2021.2

Release Notes 

Overview

  • Removed the dependency on libxml2
  • Fixed a bug in SVM/USM memory management
  • Fixed compatibility issue with oneTBB 2021.2 

2021.1

Release Notes 

Overview

  • Added subgroup support
  • Supports the latest Windows* versions, including Windows* 10, Windows Server 2016*, and Windows Server 2019*
  • Supports Linux* distributions including Ubuntu* 20.04 LTS, Red Hat* Enterprise Linux* 8.1, CentOS* 8.x, and SUSE* 15.x
  • The Intel® CPU Runtime for OpenCL™ Applications for Linux is distributed through APT and YUM repositories. Please refer to the Release Notes for installation instructions. 
  • Bug fixes

2018

18.1

Release Notes

Overview

  • Support for the Intel® Advanced Vector Extensions 512 (Intel® AVX-512) ISA on Intel® Xeon® Platinum processors (formerly code name Skylake)
  • Enabled features of OpenCL™ 2.1. The product is based on a published Khronos* Specification and has passed the Khronos Conformance Process. For the conformance record, refer to submission #322, recorded on October 7, 2018.
  • Support for a vectorization width of 16 in the CL_CONFIG_CPU_VECTORIZER_MODE environment and configuration file variable, as well as in the optional OpenCL™ C kernel attribute intel_vec_len_hint
  • Support for OpenCL™ Kernel debugging on Linux* OS with GDB*
  • Improved coexistence support with Intel® Graphics Compute Runtime for OpenCL™ Driver when both are installed.
  • Changed the platform name returned by the clGetPlatformInfo(...) OpenCL™ API call for the CL_PLATFORM_NAME parameter to “Intel(R) CPU Runtime for OpenCL(TM) Applications”
  • New environment variable CL_CONFIG_CPU_TARGET_ARCH, which makes the compiler generate code exclusively for the given target CPU architecture. The value can only lower the instruction set level supported by the CPU, not raise it. Allowed values are:
    • skx: Generates code for processors that support Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Foundation instructions, Intel® AVX-512 Conflict Detection instructions, Intel® AVX-512 Doubleword and Quadword instructions, Intel® AVX-512 Byte and Word instructions, and Intel® AVX-512 Vector Length Extensions for Intel® processors, as well as the instructions enabled with core-avx2.
    • core-avx2: Generates code for processors that support Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
    • corei7-avx: Generates code for processors that support Intel® Advanced Vector Extensions (Intel® AVX), Intel® SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
    • corei7: Generates code for processors that support Intel® SSE4.2 Efficient Accelerated String and Text Processing instructions. May also generate code for Intel® SSE4 Vectorizing Compiler and Media Accelerator, Intel® SSE3, SSE2, SSE, and SSSE3 instructions.
  • Fixed an issue with user functions not being inlined in programs created using clCreateProgramWithIL(...) OpenCL™ API call
  • Fixed incorrectly reported CL_DEVICE_MAX_COMPUTE_UNITS for multi-socket Intel® Xeon® systems (reported on the forum)
  • Fixed incompatibility with Intel® Threading Building Blocks (Intel® TBB) max_allowed_parallelism parameter
  • Fixed an issue with CL_DRIVER_VERSION returning incorrect driver version
  • Improved OpenCL™ C compiler diagnostics
  • Minor bug fixes
  • Updated the compiler infrastructure to LLVM* version 6.0
  • Intel® CPU Runtime for OpenCL™ Applications 18.1 supports CPUs only. For Intel® Xeon Phi™ coprocessor support, use version 14.2. For more information, see the OpenCL™ runtime entry and release notes on the OpenCL™ driver page.
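Two of the 18.1 changes can affect host code: the new CL_PLATFORM_NAME string and the CL_CONFIG_CPU_TARGET_ARCH variable. The following is a minimal sketch in plain C, not an Intel-provided API. The helper names are hypothetical, the name check assumes an exact match against the string listed above, and setting the variable from inside the process assumes the runtime reads it when it initializes, that is, before the first OpenCL API call. Because the variable can only lower the instruction set level, the chosen value should not exceed what the host CPU supports; the sketch does not check that.

```c
#define _POSIX_C_SOURCE 200112L /* exposes setenv() under strict ISO C modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* CL_PLATFORM_NAME value reported starting with release 18.1. */
static const char kCpuRuntimeName[] =
    "Intel(R) CPU Runtime for OpenCL(TM) Applications";

/* Values listed for CL_CONFIG_CPU_TARGET_ARCH in the 18.1 notes. */
static const char *const kTargetArchs[] = {
    "skx", "core-avx2", "corei7-avx", "corei7"
};

/* Returns 1 if `name` (for example, the string filled in by
 * clGetPlatformInfo(platform, CL_PLATFORM_NAME, ...)) is the 18.1+ name of
 * this runtime. Releases before 18.1 reported a different name. */
static int is_intel_cpu_runtime(const char *name)
{
    return name != NULL && strcmp(name, kCpuRuntimeName) == 0;
}

/* Returns 1 if `arch` is one of the documented target architectures. */
static int is_documented_target_arch(const char *arch)
{
    for (size_t i = 0; i < sizeof kTargetArchs / sizeof kTargetArchs[0]; ++i) {
        if (strcmp(arch, kTargetArchs[i]) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: sets CL_CONFIG_CPU_TARGET_ARCH for this process.
 * Assumes the runtime reads the variable at initialization, so call it
 * before the first OpenCL API call. Uses POSIX setenv(); on Windows,
 * _putenv_s() is the analogue. Returns 0 on success, -1 for an unlisted
 * value or if setenv() fails. */
static int set_cpu_target_arch(const char *arch)
{
    assert(arch != NULL);
    if (!is_documented_target_arch(arch))
        return -1;
    return setenv("CL_CONFIG_CPU_TARGET_ARCH", arch, 1) == 0 ? 0 : -1;
}
```

In an application, the name would come from clGetPlatformInfo(platform, CL_PLATFORM_NAME, size, buf, NULL), and set_cpu_target_arch("corei7-avx") would run at the top of main(). Exporting the variable in the shell before launching the application avoids the initialization-order question altogether.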

2016

16.1.2

Release Notes

Overview:

  • New optional __attribute__((intel_vec_len_hint(<uint>)))
    • This attribute can be used to provide a hint to the compiler that the kernel will perform best if vectorized to the specified vector length.
    • You can specify one of the following lengths for this attribute: 
      • 0: The compiler uses heuristics to decide whether to vectorize the kernel and, if so, which vector length to use. This is the default behavior.
      • 1: The compiler performs no vectorization. Explicit vector data types in kernels are left intact.
      • 4: Disables heuristics and vectorizes to a length of 4.
      • 8: Disables heuristics and vectorizes to a length of 8.
  • New OpenCL™ C predefined macro __INTEL_OPENCL_CPU_<CPUSIGN>
    • This macro can be used to fine tune the kernel for a specific CPU device microarchitecture. <CPUSIGN> is the CPU signature of a device.
    • The macro takes one of the following forms, depending on the device microarchitecture:
      • __INTEL_OPENCL_CPU_SKL__: Intel® microarchitecture code name Skylake
      • __INTEL_OPENCL_CPU_SKX__: Intel® microarchitecture code name Skylake on Intel Xeon® processor family
      • __INTEL_OPENCL_CPU_BDW__: Intel® microarchitecture code name Broadwell
      • __INTEL_OPENCL_CPU_BDW_XEON__: Intel® microarchitecture code name Broadwell on Intel Xeon® processor family
      • __INTEL_OPENCL_CPU_HSW__: Intel® microarchitecture code name Haswell
      • __INTEL_OPENCL_CPU_HSW_XEON__: Intel® microarchitecture code name Haswell on Intel Xeon® processor family
      • __INTEL_OPENCL_CPU_IVB__: Intel® microarchitecture code name Ivy Bridge
      • __INTEL_OPENCL_CPU_IVB_XEON__: Intel® microarchitecture code name Ivy Bridge on Intel Xeon® processor family
      • __INTEL_OPENCL_CPU_SNB__: Intel® microarchitecture code name Sandy Bridge
      • __INTEL_OPENCL_CPU_SNB_XEON__: Intel® microarchitecture code name Sandy Bridge on Intel Xeon® processor family
      • __INTEL_OPENCL_CPU_WST__: Intel® microarchitecture code name Westmere
      • __INTEL_OPENCL_CPU_WST_XEON__: Intel® microarchitecture code name Westmere on Intel Xeon® processor family
      • __INTEL_OPENCL_CPU_UNKNOWN__: Unknown microarchitecture
  • Improved heuristics for choosing the local work size when an NDRange is enqueued to a command queue created with the CL_QUEUE_THREAD_LOCAL_EXEC_ENABLE_INTEL property (extension https://www.khronos.org/registry/OpenCL/extensions/intel/cl_intel_thread_local_exec.txt).
  • Fixed an issue where an incorrect library was loaded when running on Intel® microarchitecture code name Skylake.
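To show where the intel_vec_len_hint attribute sits in kernel code, the sketch below generates OpenCL C source for a simple SAXPY kernel carrying the hint and rejects lengths other than the documented ones (16 is included because the 18.1 notes add it). The helper name and the kernel are hypothetical illustrations, and placing the attribute after __kernel is an assumption based on how other OpenCL C kernel attributes are written.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes OpenCL C source for a SAXPY kernel annotated
 * with intel_vec_len_hint(len) into buf. Returns the snprintf() result
 * (the source length), or -1 if len is not a documented value. */
static int make_saxpy_source(char *buf, size_t size, unsigned len)
{
    /* 0 = compiler heuristics (default), 1 = no vectorization,
     * 4 and 8 = fixed widths; 16 is accepted starting with release 18.1. */
    static const unsigned allowed[] = {0, 1, 4, 8, 16};
    int ok = 0;

    assert(buf != NULL && size > 0);
    for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; ++i) {
        if (allowed[i] == len)
            ok = 1;
    }
    if (!ok)
        return -1;

    return snprintf(buf, size,
                    "__kernel __attribute__((intel_vec_len_hint(%u)))\n"
                    "void saxpy(float a, __global const float *x,\n"
                    "           __global float *y)\n"
                    "{\n"
                    "    size_t i = get_global_id(0);\n"
                    "    y[i] = a * x[i] + y[i];\n"
                    "}\n",
                    len);
}
```

The generated string would then be passed to clCreateProgramWithSource() and built as usual.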

16.1.1

Release Notes

Overview:

  • Fixed the known incompatibility between the CPU Kernel Debugger from the Intel® SDK for OpenCL™ Applications 2016 R2 and the CPU-only runtime package version 16.1.
  • Performance optimizations:
    • Compiler vectorizer heuristic tuning for a set of workloads
    • Workgroup fusion optimization improvements
    • Performance enhancements of the vload()/vstore() built-in functions
  • Fixed an issue reported on the forum: the vectorizer produced incorrect code on SSE4.2 architectures when the samplerless read_imagef() built-in function was called with image2d_t and int2 coordinate arguments.
  • The cl_khr_gl_sharing extension was disabled due to an incompatibility with the Microsoft* Basic Display Adapter. To use this extension, install the OpenCL Driver for Intel® Iris™ Graphics and HD Graphics for Windows* OS, which includes the OpenCL Runtime package for CPUs.
  • Due to a performance bug, the Threading Building Blocks (TBB) library was downgraded from version 4.2, interface version 7001 (Oct 2, 2013), to version 4.2, interface version 7005 (Jun 1, 2014).

16.1

Release Notes

Overview:

  • Support for 6th generation Intel® Core™ processors and Intel® Xeon® v4 processors (formerly Intel microarchitecture code name Broadwell)
  • Support for OpenCL™ 2.0 specification
  • Improved cross-CPU support of pre-compiled kernel binary in Runtime:
    • Enables loading pre-generated kernel binaries, which saves OpenCL program build time. For more information, see https://software.intel.com/en-us/node/540584
    • Enables the Intel® SDK for OpenCL™ Offline Compiler to generate a JIT binary for a target CPU model. For more information, see https://software.intel.com/en-us/node/539388
  • Bug and memory leak fixes.
  • Compiler infrastructure was updated to LLVM version 3.6.2
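Loading a pre-generated kernel binary starts with reading the file into memory; the buffer and its size correspond to the `binaries` and `lengths` arguments of clCreateProgramWithBinary(). The following is a minimal sketch in plain C: the helper name is hypothetical, and the OpenCL calls appear only in a comment because they need a context and device that are not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads an entire file (for example, a kernel binary
 * produced by the offline compiler) into a malloc'd buffer and stores its
 * size in *size_out. Returns NULL on failure; the caller frees the buffer.
 *
 * With a context `ctx` and device `dev` (not shown), the buffer would be
 * passed roughly as follows:
 *   const unsigned char *bins[] = { buf };
 *   size_t lens[] = { size };
 *   cl_int status, err;
 *   cl_program p = clCreateProgramWithBinary(ctx, 1, &dev, lens, bins,
 *                                            &status, &err);
 *   err = clBuildProgram(p, 1, &dev, NULL, NULL, NULL);  // still required
 */
static unsigned char *read_kernel_binary(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return NULL;
    }
    long len = ftell(f);
    if (len < 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }
    /* Allocate at least one byte so an empty file still yields a buffer. */
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (!buf) {
        fclose(f);
        return NULL;
    }
    if (fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *size_out = (size_t)len;
    return buf;
}
```

Reading in binary mode ("rb") matters on Windows*, where text mode would translate line endings and corrupt the binary.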

2015

15.1

Release Notes

Overview:

  • Removed support for the Intel® Xeon Phi™ coprocessors
  • New performance-related environment variables:
    • CL_CONFIG_CPU_RT_LOOP_UNROLL_FACTOR for loop unrolling of loops with non-constant trip count (CPU only)
    • CL_CONFIG_USE_FAST_RELAXED_MATH for enabling floating-point calculation optimizations (forcing -cl-fast-relaxed-math)
  • Improved Microsoft Visual Studio* debugging of OpenCL kernels on CPU device
  • Bug and memory leak fixes
  • Several performance enhancements including better auto-vectorization and alias analysis of OpenCL kernels for CPU device.

2014

14.2 (deprecated)

Release Notes

Overview:

  • Added support for offline kernel compilation and kernel binary distribution on Intel® Xeon Phi™ coprocessors. With this release, the kernel binary is the final executable binary on both Intel® Xeon Phi™ coprocessors and Intel CPUs; in the previous release, the kernel binary on Intel Xeon Phi coprocessors was intermediate code.
  • Improved kernel invocation time on Intel Xeon Phi coprocessor device in case of batching kernel commands into in-order queues
  • Optimized compiler vectorizer
  • New feature: user logger for API tracing and for debugging functional failures in OpenCL applications
  • New environment variable CL_CONFIG_CPU_VECTORIZER_MODE
  • SPIR is now conformant on Intel Xeon Phi coprocessor
  • Bug fixes

14.1 (deprecated)

Release Notes

Overview:

  • Support for OpenCL Standard Portable Intermediate Representation (SPIR) 1.2 consumption.
  • Intel® Manycore Platform Software Stack (Intel® MPSS) 3.2 and 3.2.3 support.
    NOTE: Using OpenCL Runtime 14.1 with MPSS 3.2.1 is not recommended, as this combination introduces stability issues.
  • Performance improvements:
    • Faster execution of code dominated by statically divergent but dynamically uniform branches
    • More efficient event traversing algorithm
    • NO_DMA mode is default, which improves buffer creation speed (not a preview feature anymore)
    • Improved device side memory pool control
  • CPU only: Starting with this release, the kernel binary is the final machine code. This enables creating the kernel binary offline and distributing it with the application binary, and it eliminates compilation time on the end user's system (clCreateProgramWithBinary).
  • Bug fixed (for Intel® Xeon Phi™ coprocessors only): Compilation crash when a struct is defined globally in the CL file.
  • New performance-related environment variables on Intel Xeon Phi (see the user guide for details):
    • CL_CONFIG_MIC_DEVICE_FORCE_BUFFERS_PINNING_ON_HOST
    • CL_CONFIG_MIC_DEVICE_2MB_POOL_FINI_SIZE_MB
    • CL_CONFIG_MIC_DEVICE_2MB_POOL_INIT_SIZE_MB
  • Added 32-bit version of the runtime for Windows OS.
  • Added OpenCL CPU device support on Intel Core™ processors.