2024 Tools Update
I’m pleased to share that the latest updates (2024.1) for the 2024 Software Development Tools from Intel are now available. This includes numerous optimizations to help boost AI performance wherever you need them—from AI PCs to data centers and supercomputers. I am also happy to share that this update includes the world’s first conformant SYCL* 2020 compiler.
World’s First SYCL 2020 Conformant Compiler
C++ with SYCL has proven to be a valuable way to target accelerators, regardless of vendor or architecture. Development projects have adopted it to target NVIDIA*, AMD*, and Intel GPUs with considerable success, and compelling performance data has been published over the past few years (including data in my first blog on 2024.0 tools, and data Steve published in his blog). Codeplay has steadily supported and advanced the oneAPI plugins for NVIDIA and AMD GPUs (learn more).
SYCL is a key standard from the Khronos Group that reached a new level of completeness with SYCL 2020. That ambitious standard added numerous capabilities, including our favorite five (backends, USM, reductions, group library, atomic references) detailed in an earlier piece titled “Five Outstanding Additions Found in SYCL 2020.”
Multiple C++ compilers have supported most of SYCL for the past several years, but it is quite notable to have an implementation now recognized as conformant by Khronos. Intel submitted the results of its compiler (available in the 2024.1 release) and, following a successful review, it is now listed as the first conformant SYCL 2020 implementation on the Khronos site.
Steve Hikida, a long-time compiler guru and Intel Vice President and General Manager of Compiler Engineering, likes to emphasize how the first conformant implementation of any standard helps lift everyone. He published a more detailed account of this accomplishment in his blog: Intel® Compiler First to Achieve SYCL* 2020 Conformance. In it, he highlights this key value: “Now that we have a complete reference compiler implementation, it is possible to determine easily whether an application is correctly using SYCL. Developers can design and test their software with confidence, knowing that if their codebase compiles successfully with the Intel oneAPI DPC++/C++ compiler, a future SYCL conformant compiler for another set of hardware will be able to process the same source code with the only compatibility concerns being in hardware-specific optimizations, not in code correctness.”
We were really, really close when we released the 2024.0 version late last year. Since then, we tied up the few remaining loose ends, ran the necessary tests (SYCL Conformance Test Suite for the Khronos Group SYCL standard), submitted our results, and received the okay and listing by Khronos. We are delighted that this has all come together this spring!
We have reached this critical point, and we are not done innovating. To address important customer needs, we continue to look for ways to build on where we are. A great example is our support for graphs (aka “SYCL Graph”), an innovation available in the Intel compiler that lets developers explore a feature we are offering for consideration in a future SYCL standard. A recent Codeplay blog shares that SYCL Graph has support for NVIDIA GPUs as well. Every standard evolves based on usage and feedback, and your feedback is critical to this process.
Great resources for learning SYCL are the book (eBook is free) and the resources of sycl.tech.
Of Course, the 2024.1 Update is Much More Than Just SYCL
We made a number of updates and improvements since our initial 2024 release in December (see the announcement: 2024 Tools Deliver Standards, Performance, Dependability, and Innovation).
First up, our tools' commitment to high performance and productivity includes a broad array of support for AI acceleration.
From AI PC to Data Center AI – Intel Tools Are There to Help
Intel’s recent introduction of the world’s first AI PC is well-supported by our tools, helping application development span from the PC up to the latest data centers and supercomputers. Our Intel software tools supply key foundational support for programming, which is complemented by other Intel investments in frameworks, OpenVINO™ toolkit, drivers, and more. We are here to help you put AI everywhere.
In this release, there are many new optimizations and refinements. Here are a few of the most notable:
- Continues our commitment to an intelligent Parallel STL offering (one that helps strike the appropriate balance between the CPU and accelerators while always working on any system): the Intel® oneAPI DPC++ Library extends C++ standard parallelism with histogram algorithms to accelerate AI, scientific, and other data-intensive applications.
- Improves performance for complex models like large language models (LLMs) and Stable Diffusion: the Intel® oneAPI Deep Neural Network Library (oneDNN) offers performance improvements for Intel® Arc™ GPUs, Intel data center GPUs, and Intel® Xeon® Scalable processors.
- Delivers 100% conformance to the Python Array API standard with the Data Parallel Control library (dpctl), which also includes support for NVIDIA* devices. New functions cover reduction, statistics, sorting, set, elementwise, linear algebra, and in-place elementwise operations.
- Makes NumPy acceleration (via drop-in upgrade) easier still thanks to the Data Parallel Extension for NumPy* (dpnp) library. New performance enhancements and features include functions for linear algebra, data manipulation, mathematics, statistics, and data types. Extended support for keyword arguments now covers functions for array creation, counting, indexing, linear algebra, data manipulation, mathematics, searching, sorting, and statistics.
- Delivers Numba acceleration from the dpnp library, including new performance enhancements and features/functions for linear algebra, data manipulation, mathematics, statistics, and data types. Extended support for keyword arguments now covers functions for array creation, counting, indexing, linear algebra, data manipulation, mathematics, searching, sorting, and statistics.
- Advances Intel optimizations of Modin with significant enhancements in both security and performance. The robust security solution ensures proactive identification and remediation of vulnerabilities, offering protection for your organization’s data assets. This release includes numerous performance enhancements to optimize asynchronous execution too.
- Advances Intel® Open Path Guiding Library (Intel® Open PGL) with a spatial structure build that is now fully multithreaded, significantly reducing the training time needed to optimize path efficiency.
- Advances Intel® oneAPI Deep Neural Network Library (oneDNN) with improvements in graphics processing for Intel data center GPUs and Intel Arc GPUs, perfect for complex models like LLMs and Stable Diffusion* and increased performance for Intel Xeon Scalable processors.
- Advances Intel® oneAPI Data Analytics Library (oneDAL) with performance enhancements for gradient boosting inference across XGBoost, LightGBM, and CatBoost* without sacrificing accuracy with new fast tree inference.
- Intel® oneAPI Collective Communications Library (oneCCL) delivers even more performance for distributed deep learning and machine learning training and inference workloads. All key communication patterns were further optimized to not only speed up message passing but also to do so in a memory-efficient manner. This release in particular improves inference performance.
- Intel® oneAPI Math Kernel Library (oneMKL) eases porting NVIDIA CUDA* applications to SYCL by adding multiple functions equivalent to those available in cuSolver*, cuBLAS* and the CUDA Math Library*.
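As a rough illustration of what a histogram algorithm like the one mentioned in the first bullet computes, the sketch below counts how many samples fall into each of several equal-width bins. It is plain Python, not the oneDPL interface: the function name `even_histogram`, its parameters, and the choice to ignore out-of-range samples are all assumptions made for this example.

```python
def even_histogram(values, num_bins, lo, hi):
    """Count values into num_bins equal-width bins covering [lo, hi).

    Hypothetical helper for illustration only; values outside
    [lo, hi) are ignored (an assumption of this sketch).
    """
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for v in values:
        if lo <= v < hi:
            # Guard against rounding pushing the index past the last bin.
            idx = min(int((v - lo) / width), num_bins - 1)
            counts[idx] += 1
    return counts


samples = [0.5, 1.5, 1.7, 2.2, 3.9, 4.0, -1.0]
print(even_histogram(samples, 4, 0.0, 4.0))  # [1, 2, 1, 1]
```

Each bin update is independent of the others apart from the final count, which is why this kind of loop is a natural candidate for parallel or offloaded execution.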
Finding the AI Tools You Need Most
Our teams created a remarkably useful aid we call the AI Tools Selector. You will encounter it naturally when you go to download the AI tools. It currently covers only the AI tools for Linux, but feedback is overwhelmingly in favor of our offering this style of interface more widely in the future. Please enjoy the AI Tools Selector and give us feedback on how and where to extend this fine work.
Commitments to Production-Conscious Developers
Intel’s commitment to offering reproducibility options has been welcomed by developers for more than a decade. We continually add support to keep up with new hardware and software, which yields numerous new performance optimizations. In this release, we proudly add Conditional Numeric Reproducibility (CNR) support in BLAS 3, even when operations are offloaded to data center GPUs. We know these features matter a lot when debugging and deploying production-worthy applications, and we appreciate the warm reception we get from so many developers for our support here.
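Reproducibility options matter because floating-point addition is not associative: adding the same numbers in a different order can give a different result, and parallel or offloaded code may combine partial results in a different order from one run or machine to the next. The sketch below is plain Python and does not call oneMKL or any CNR setting; `ordered_sum` is a hypothetical helper used only to show the effect.

```python
import math


def ordered_sum(values):
    """Add values strictly left to right, rounding after each addition."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [1e16, 1.0, -1e16, 1.0]
reordered = [1e16, -1e16, 1.0, 1.0]  # same numbers, different order

print(ordered_sum(values))     # 1.0 (the first 1.0 is absorbed by 1e16)
print(ordered_sum(reordered))  # 2.0
print(math.fsum(values))       # 2.0 (correctly rounded reference)

# With the order pinned, repeated runs agree exactly.
print(ordered_sum(values) == ordered_sum(values))  # True
```

Fixing the order of operations makes results repeatable, though not necessarily more accurate, which is the trade-off a reproducibility mode is designed to manage.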
Our continued support for GDB* includes a few notable updates:
- Intel® Distribution for GDB was rebased to GDB 14, staying current and aligned with the latest enhancements supporting effective application debugging.
- Intel’s distribution adds the following:
  - Online page fault handling for GPUs, allowing developers to monitor and troubleshoot memory access issues in real-time, while providing insight into GPU driver behavior, resulting in improved application performance and reliability.
  - Large General Purpose Register File (GRF) debug mode support for GPUs providing developers with more visibility into the GPU's internal state and allowing for more comprehensive debugging and optimization of GPU-accelerated applications. This mode is particularly useful for debugging complex or performance-critical code.
Standards – We Deliver Leading Support
We believe our commitment to open standards is unequaled. We know that open standards, with open governance, are highly valuable in helping make your coding investment portable, performance portable, and free of vendor lock-in. Standard parallelism is best when supported by an open ecosystem.
We work hard to implement every aspect of the standards we support. We also work diligently to be transparent in enumerating our support, offering useful tips and noting any known limitations. The table below is a handy summary of our support and includes links to particularly useful details.
Fortran programmers will enjoy our continued commitment to offer an incredible compiler, thanks to many additions in this update release:
- More help on compatibility and interoperability between C and Fortran code in the intrinsic module ISO_C_BINDING
- Simpler trigonometric calculations, thanks to intrinsic functions that accept arguments in degrees.
- Support for predefined data types of specific sizes in intrinsic module ISO_FORTRAN_ENV.
- OpenMP offload includes more runtime checks to alert the developer if mapped data is not currently allocated/associated.
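To show what degree-argument trigonometric functions save you from writing, here is a small stand-in in Python rather than Fortran. The helper names `sind`, `cosd`, and `atand` are chosen for this illustration and are not taken from the release notes; each one simply wraps the explicit degree/radian conversion that would otherwise appear at every call site.

```python
import math


def sind(x_deg):
    """Sine of an angle given in degrees (illustrative helper)."""
    return math.sin(math.radians(x_deg))


def cosd(x_deg):
    """Cosine of an angle given in degrees (illustrative helper)."""
    return math.cos(math.radians(x_deg))


def atand(y):
    """Arctangent of y, returned in degrees (illustrative helper)."""
    return math.degrees(math.atan(y))


print(round(sind(30.0), 12))   # 0.5
print(round(cosd(60.0), 12))   # 0.5
print(round(atand(1.0), 12))   # 45.0
```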
MPI improvements include support for using GPU Remote Memory Access (RMA) more efficiently via one-sided communication, and the new MPI 4.0 Persistent Collectives feature. Developers can now enjoy higher productivity through reduced code changes when implementing large-count operations by using the new MPI 4.0 large counts feature.
C/C++ users see numerous enhancements to OpenMP and SYCL support, including the previously mentioned conformance to SYCL 2020.
Our cryptographic support in the Intel® Integrated Performance Primitives library now satisfies FIPS 140-3 security requirements.
Standard | Status | Details |
---|---|---|
SYCL 2020 | Conformant (see Khronos list) | |
C++ | Conforms to C++20, C++17, C++14, and C++11 while supporting many C++23 features already. We also cover older C++ standards in line with LLVM support (we are an LLVM-based compiler release). | |
C | We also cover all the C standards thanks to LLVM support (we are an LLVM-based compiler release). | |
Fortran | Conforms to 2018, 2008, 2003, 95, 90, 77, and 66 standards, and offers high compatibility with Visual Fortran, DEC Fortran 90, and VAX FORTRAN 77. Of course, this includes Coarray and DO CONCURRENT. DO CONCURRENT supports a reduce clause and GPU offload. | |
OpenMP | Supports 1.x, 2.x, 3.x, 4.x, 5.0, 5.1, 5.2, and much of the latest TR12 (aka “future 6.0”). | |
MPI | Conforms to the MPI-1, MPI-2.2, and MPI-3.1 specifications. Some MPI 4.0 support is now implemented. | |
FIPS 140-3 - Security Requirements for Cryptographic Modules | Intel IPP Cryptography now offers compliance with FIPS 140-3. It is ideal for anyone who manages sensitive data. | |
UXL / oneAPI | Full support for the current oneAPI specification. | We are a founding member of UXL and will continue to provide technical contributions and support in our products. |
Use the Latest Tools Today
Download the latest tools directly from Intel or via popular repositories.
You can find a lot more information in release notes for all the tools and libraries.
Thank you for your continued use and support of our tools, and we look forward to your feedback.