Boost AI and Accelerated Compute Productivity with Intel® oneAPI Toolkits 2025.0

November 18, 2024

oneAPI has come a long way. Five years ago, in November 2019, Bill Savage (then VP, Intel Architecture, Graphics and Software and GM Compute Performance and Developer Products) announced the creation of the oneAPI industry initiative at SC19 in Denver, Colorado.

The latest 2025.0 release of the Intel® Developer Tools represents our tremendous progress on oneAPI’s promise: an open, cross-architecture software development platform for highly parallel accelerated Edge, AI, and HPC compute that scales without vendor lock-in.

You can find the complete detailed news update listing all the key feature improvements at this link:
→ The Intel® Software Development Tools 2025.0 Are Here

Our current product releases put an emphasis on strengthening developer productivity through a wholehearted embrace of open industry standards like LLVM*, SPIR-V*, OpenMP*, SYCL*, Fortran*, MPI*, and Python*. We focus on expanding support for the latest parallel programming extensions, optimizations, and tuning for the latest AI and compute platforms. We pursue these improvements always with productivity, software scaling, maintainability, and flexibility in mind.

The result is a comprehensive software development stack covering everything from parallel hardware runtimes, to Fortran, C/C++, and Python tools, to data analytics, ML, and AI models and frameworks.

Figure 1: Software Development Stack

Of course, this software development tools stack includes optimized support for the latest Intel platforms: Intel® Xeon® 6 processors with E-cores, Intel® Xeon® 6 processors with P-cores and Intel® Core Ultra processors (Series 2) with integrated GPU and NPU.

Five Years of oneAPI: The Movement Evolves

The vision of oneAPI is to provide a comprehensive set of libraries, open source repositories, SYCL*-based C++ language extensions, and optimized reference implementations to accelerate the following goals:

  1. Define a common unified and open multiarchitecture multivendor software platform.
  2. Ensure functional code and performance portability across hardware vendors and accelerator technologies.
  3. Maintain and nurture a comprehensive set of library APIs to cover programming domain needs across sectors and use cases.
  4. Provide a developer community and open forum to drive unified API functionality and interfaces that meet the needs of a unified multiarchitecture software development model.
  5. Encourage collaboration on oneAPI projects and compatible oneAPI implementations across the developer ecosystem.

With millions of installations and a constantly growing active developer community, including the over 30 research institutions and universities comprising the oneAPI Academic Centers of Excellence and a long list of awesome-oneapi projects, oneAPI has evolved into a cornerstone of the Unified Acceleration (UXL) Foundation, established over a year ago.

The oneAPI software platform and open multiarchitecture programming model, together with a community of over 150 active participants, its libraries, and its specifications, complete the rich set of building blocks the UXL Foundation is expanding on:

Figure 2: oneAPI Specification Elements

oneAPI, along with the SYCL and OpenCL projects at the Khronos Group* and the UXL Foundation’s other affiliate partner, the Autoware Foundation*, assists the foundation's over 30 members, including silicon vendors, software vendors, original design manufacturers, AI solution providers, and automotive companies, in achieving their goals. Its reach extends beyond open accelerated parallel computing into Artificial Intelligence, Visual Computing, Edge Computing, and more.

A Community Celebrates

oneAPI changed the way the software developer community can scale entire application software stacks across various hardware configurations. It frees software developers from vendor lock-in, minimizing the need for code rewrites when moving between hardware platforms with diverse heterogeneous architectures and compute capabilities.

Below, some of our fellow travelers and contributors in the oneAPI community share their experience adopting the open standards paradigm that defines oneAPI as well as the UXL Foundation:

Celebrating five years of oneAPI. In ExaHyPE, oneAPI has been instrumental in implementing the numerical compute kernels for hyperbolic equation systems, making a huge difference in performance with SYCL providing the ideal abstraction and agnosticism for exploring these variations. This versatility enabled our team, together with Intel engineers, to publish three distinct design paradigms for our kernels.

– Dr. Tobias Weinzierl, director, Institute for Data Science, Durham University

oneAPI has revolutionized the way we approach heterogeneous computing by enabling seamless development across architectures. Its open, unified programming model has accelerated innovation in fields from AI to HPC, unlocking new potential for researchers and developers alike. Happy 5th anniversary to oneAPI!

– Dr. Gal Oren, assistant professor, Department of Computer Science, Technion Israel Institute of Technology

Intel's commitment to their oneAPI software stack is testament to their developer-focused, open-standards commitment. As oneAPI celebrates its 5th anniversary, it provides comprehensive and performant implementations of OpenMP and SYCL for CPUs and GPUs, bolstered by an ecosystem of libraries and tools to make the most of Intel processors.

– Dr. Tom Deakin, senior lecturer, head of Advanced HPC Research Group, University of Bristol

Happy 5th anniversary, oneAPI! We've been partners since the private beta program in 2019. We are currently exploring energy-efficient solutions for simulations in material science and data analysis in bioinformatics with different accelerators. For that, the components of oneAPI, its compilers with back ends for various GPUs and FPGAs, oneMKL, and the performance tools Intel® VTune™ Profiler and Intel® Advisor are absolutely critical.

– Dr. Thomas Steinke, head of the Supercomputing Department, ZIB Zuse Institute Berlin

Flexibility Through Open Industry Standards

Let us discuss some of the new and extended capabilities that help you take software development productivity to the next level.

Intel was not only a founding member of oneAPI and the UXL Foundation; our commitment and contribution to the open source software ecosystem has a long history. With the 2025.0 release, we continue to be at the forefront of adopting the latest open industry standard features and proposals.

LLVM

Let us start by having a look at the LLVM sanitizers. They help identify and pinpoint undesirable or undefined behavior in your code, and they provide a convenient way for software developers to verify code changes before submitting them to a repository branch. The Intel compilers support the following sanitizers:

  1. AddressSanitizer – detect memory safety bugs.
  2. UndefinedBehaviorSanitizer – detect undefined behavior bugs.
  3. MemorySanitizer – detect use of uninitialized memory bugs.
  4. ThreadSanitizer – detect data races.
  5. Device-Side AddressSanitizer – detect memory safety bugs in SYCL device code.

Among these, ThreadSanitizer and the Device-Side AddressSanitizer are newly added:

  • The new ThreadSanitizer allows you to catch data races in OpenMP and threaded applications. You can enable the sanitizer via the -fsanitize=thread flag (a minimal example follows this list).
  • The AddressSanitizer, a tool for detecting memory errors in C/C++ code, now includes support for SYCL device code. To activate this feature for the device code, use the flag -Xarch_device -fsanitize=address. The flag -Xarch_host -fsanitize=address should be used to identify memory access problems in the host code. This new SYCL accelerator extension thus provides the Device-Side AddressSanitizer.
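A minimal sketch of the kind of bug the new ThreadSanitizer reports; the file name and compile line here are illustrative:

```cpp
// race.cpp - a classic data race that ThreadSanitizer flags at runtime.
// Compile with the sanitizer enabled, e.g.:
//   icpx -fsanitize=thread -g race.cpp -o race
#include <cstdio>
#include <thread>

int counter = 0;  // shared and unsynchronized: both threads write it

int main() {
    std::thread t1([] { for (int i = 0; i < 100000; ++i) ++counter; });
    std::thread t2([] { for (int i = 0; i < 100000; ++i) ++counter; });
    t1.join();
    t2.join();
    // The result is nondeterministic; under -fsanitize=thread, a
    // "data race" report pinpoints both conflicting writes.
    std::printf("counter = %d\n", counter);
    return 0;
}
```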

Find out more in the article:
→ Find Bugs Quickly Using Sanitizers with the Intel® oneAPI DPC++/C++ Compiler

oneTBB

The collaboration between Intel and the oneAPI Center of Excellence (COE) at Durham University on oneTBB is a prime example of our community engagement. Professor Tobias Weinzierl and his team develop ExaHyPE, a generic collection of state-of-the-art numerical ingredients for writing new solvers for hyperbolic equation systems.

The most widely used oneTBB generic parallel algorithms, such as parallel_for, take a range as an argument. Together with Intel engineers, the team developed and proposed a new oneTBB range type, blocked_nd_range, targeted for inclusion in the official oneTBB specification. You can find more details about this extension in the Unified Acceleration Foundation (UXL) GitHub* oneAPI Specification.
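A hedged sketch of the new range type in use, assuming the blocked_nd_range name and headers as shipped in recent oneTBB releases (the final specification may differ in detail):

```cpp
// Iterate over a 2D grid with one N-dimensional range instead of
// nesting two parallel_for calls.
#include <cstddef>
#include <oneapi/tbb/blocked_nd_range.h>
#include <oneapi/tbb/blocked_range.h>
#include <oneapi/tbb/parallel_for.h>
#include <vector>

void scale(std::vector<float>& a, int rows, int cols, float s) {
    using range2d = oneapi::tbb::blocked_nd_range<int, 2>;
    oneapi::tbb::parallel_for(
        range2d(oneapi::tbb::blocked_range<int>(0, rows),
                oneapi::tbb::blocked_range<int>(0, cols)),
        [&](const range2d& r) {
            // dim(i) exposes a classic blocked_range for each dimension.
            for (int i = r.dim(0).begin(); i < r.dim(0).end(); ++i)
                for (int j = r.dim(1).begin(); j < r.dim(1).end(); ++j)
                    a[std::size_t(i) * cols + j] *= s;
        });
}
```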

Multi-threaded applications also run faster thanks to the improved scalability of the Intel® oneAPI Threading Building Blocks (oneTBB) task_group, flow_graph, and parallel_for_each.

Durham University is also working with the oneTBB development team on enhancements to the task_group API. Among them is an extension to the task_handle objects returned by task_group so they can represent tasks for the purpose of adding dependencies. With the new handles, you can submit tasks straightaway and set them as predecessors of other tasks later. This dramatically increases the theoretical concurrency by avoiding sequential task graph assembly.
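A hedged sketch of the task_handle mechanics this builds on: task_group::defer creates a handle without scheduling the task, and run submits it. The dependency-setting calls themselves are still being designed in the proposal, so only submission is shown:

```cpp
#include <oneapi/tbb/task_group.h>
#include <utility>

int main() {
    oneapi::tbb::task_group tg;

    // defer() creates the task and returns a handle without scheduling it.
    oneapi::tbb::task_handle producer = tg.defer([] { /* produce data */ });
    oneapi::tbb::task_handle consumer = tg.defer([] { /* consume data */ });

    // The proposed extension would let producer be registered here as a
    // predecessor of consumer, before either task is submitted.

    tg.run(std::move(producer));  // submit straightaway
    tg.run(std::move(consumer));
    tg.wait();                    // wait for all tasks in the group
    return 0;
}
```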

Find out more in the article:
→ The oneAPI Center of Excellence at Durham University Brings Its Experience from ExaHyPE into oneTBB

In addition, the oneTBB flow graph now enables you to process overlapping messages on a shared graph, waiting for a specific message using the new try_put_and_wait experimental API.
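A hedged sketch of the idea; the preview macro name and the set of nodes supporting the call may vary between releases:

```cpp
// Wait for completion of work spawned by ONE specific message while
// other messages continue flowing through the same shared graph.
#define TBB_PREVIEW_FLOW_GRAPH_TRY_PUT_AND_WAIT 1  // preview macro (assumed)
#include <oneapi/tbb/flow_graph.h>

int main() {
    using namespace oneapi::tbb::flow;
    graph g;
    function_node<int, int> process(g, unlimited, [](int v) {
        return v * 2;  // stand-in for real per-message work
    });

    process.try_put(1);           // unrelated traffic on the shared graph
    process.try_put_and_wait(42); // block until work for message 42 is done
    g.wait_for_all();             // drain any remaining messages
    return 0;
}
```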

OpenMP

In the latest releases of the Intel® oneAPI DPC++/C++ Compiler and Intel® Fortran Compiler, we are implementing many new features introduced as part of OpenMP 5.2 enhancements and the latest proposals for OpenMP 6.0. This includes support for the latest generation Intel® Arc™ Graphics GPU, Intel® Data Center GPU, and the integrated Intel® Arc™ Xe2 Graphics GPU.

Looking specifically at GPU execution control, OpenMP provides:

  1. Data management in heterogeneous memory architectures.
  2. Execution policy/configuration of GPU thread management.
  3. Leveraging existing APIs to GPU-optimized libraries or other compilation units.
  4. GPU instruction selection/optimization.
  5. Control flow/branch control of concurrent thread execution.

With the Intel® Compilers 2025.0, we introduce the following new OpenMP features (a minimal sketch follows the list):

  • GROUPPRIVATE directive (OpenMP 6.0) for data management in GPU shared local memory.
  • LOOP directive.
  • REDUCTION clause on the TEAMS directive.
  • NOWAIT clause on the TARGET directive (OpenMP 5.1) for execution policy.
  • INTEROP clause on the DISPATCH directive (OpenMP 6.0) for leveraging existing APIs and enabling SYCL* and OpenMP interoperability.
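As a minimal sketch of how two of these combine (sizes and compile line are illustrative), the REDUCTION clause on TEAMS and the NOWAIT clause on TARGET can be used together:

```cpp
// sum.cpp - offload a reduction via the combined "target teams loop"
// construct; NOWAIT defers the target region as a task.
// Example compile line: icpx -fiopenmp -fopenmp-targets=spir64 sum.cpp
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<double> x(n, 1.0);
    double* xp = x.data();
    double sum = 0.0;

    // The host may proceed while the device executes the deferred region.
    #pragma omp target teams loop reduction(+:sum) \
        map(to: xp[0:n]) map(tofrom: sum) nowait
    for (int i = 0; i < n; ++i)
        sum += xp[i];

    #pragma omp taskwait  // synchronize with the deferred target task
    std::printf("sum = %.1f\n", sum);  // expect 1048576.0
    return 0;
}
```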

Find out more in the article:
→ Advanced OpenMP* Device Offload with Intel® Compilers

MPI

The Intel® MPI Library now offers a full MPI 4.0 implementation, including partitioned communication and improved error handling.
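A hedged sketch of partitioned communication (ranks, sizes, and tags are illustrative; run with at least two ranks). Partitions of one persistent message can be marked ready independently, for example by different worker threads:

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int partitions = 8, count = 1024;  // elements per partition
    std::vector<double> buf(partitions * count, 1.0);
    MPI_Request req;

    if (rank == 0) {
        MPI_Psend_init(buf.data(), partitions, count, MPI_DOUBLE,
                       /*dest=*/1, /*tag=*/0, MPI_COMM_WORLD,
                       MPI_INFO_NULL, &req);
        MPI_Start(&req);
        for (int p = 0; p < partitions; ++p) {
            // ... fill partition p here ...
            MPI_Pready(p, req);  // partition p may be transferred now
        }
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Request_free(&req);
    } else if (rank == 1) {
        MPI_Precv_init(buf.data(), partitions, count, MPI_DOUBLE,
                       /*src=*/0, /*tag=*/0, MPI_COMM_WORLD,
                       MPI_INFO_NULL, &req);
        MPI_Start(&req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Request_free(&req);
    }
    MPI_Finalize();
    return 0;
}
```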

New optimizations for MPI_Allreduce improve scale-up and scale-out performance for Intel GPUs.

Cryptography

In today’s digital environment, ensuring the security of cryptographic modules is critical, especially for organizations that manage sensitive information. Be ready for FIPS compliance and the new challenges posed by bad actors in a post-quantum computing world. The Intel® Cryptography Primitives Library stays at the forefront of security and privacy by following the latest open standards and providing thread-safe, parallelism-enabled cryptography algorithms.

Speed Through Parallelism    

The emphasis on parallel compute acceleration does not, however, stop there.

oneMKL

The Intel® oneAPI Math Kernel Library (oneMKL), now in its 30th year, continues to be at the forefront of math libraries.

  • Workloads using single-precision 3D real in-place FFTs see significant performance improvements on Intel® Data Center GPUs (a sketch of the SYCL API follows this list).
  • The SYCL Discrete Fourier Transform API is easier to use and debug, with more compilation messages added for type safety, reducing application development time, especially when targeting Intel GPUs.
  • The sparse domain of the SYCL API now supports sparse matrices in Coordinate Format (COO). This format is widely used for fast sparse matrix construction, and it can easily be converted to other popular formats such as Compressed Sparse Row (CSR) and Compressed Sparse Column (CSC).
  • New distribution models and data types are available for Random Number Generation (RNG) using the SYCL device API.
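Picking up the first bullet, here is a hedged sketch of a single-precision 3D real in-place FFT with the SYCL DFT API (dimensions are illustrative; stride and padding configuration may need refinement in real code):

```cpp
#include <cstddef>
#include <oneapi/mkl/dfti.hpp>
#include <sycl/sycl.hpp>

int main() {
    sycl::queue q;  // default device

    namespace dft = oneapi::mkl::dft;
    dft::descriptor<dft::precision::SINGLE, dft::domain::REAL>
        desc({64, 64, 64});  // 64 x 64 x 64 transform
    desc.commit(q);          // bind the plan to the device queue

    // In-place real transforms store the conjugate-even result in the same
    // buffer, so the last dimension is padded to 2 * (n/2 + 1) floats.
    const std::size_t elems = 64 * 64 * 2 * (64 / 2 + 1);
    float* data = sycl::malloc_shared<float>(elems, q);
    for (std::size_t i = 0; i < elems; ++i) data[i] = 1.0f;

    dft::compute_forward(desc, data).wait();  // forward FFT, in place

    sycl::free(data, q);
    return 0;
}
```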

One specific example of how oneMKL algorithm performance benefits from our laser focus on increased parallelism is the introduction of sub-sequence parallelism for random number generation on GPUs.

Random number generators are essential in many applications, such as cryptography, simulation, and scientific computing. They play a key role in providing the random seeds for many different prediction scenarios, from predictive maintenance and quantitative finance risk assessment to earthquake and tsunami emergency response planning.

We improve its performance even further by splitting the algorithm into several smaller tasks that can be processed simultaneously, as the following sketch illustrates.
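Conceptually, each parallel worker advances its own engine to a disjoint sub-sequence of one logical stream. A hedged sketch with the oneMKL SYCL RNG API (worker count and sizes are illustrative):

```cpp
#include <cstdint>
#include <oneapi/mkl/rng.hpp>
#include <sycl/sycl.hpp>

int main() {
    sycl::queue q;
    const std::uint64_t seed = 777;
    const std::int64_t n_per_worker = 1 << 20;

    float* out = sycl::malloc_shared<float>(4 * n_per_worker, q);
    oneapi::mkl::rng::uniform<float> distr(0.0f, 1.0f);

    for (int w = 0; w < 4; ++w) {
        oneapi::mkl::rng::mrg32k3a engine(q, seed);
        // Jump ahead so worker w starts at element w * n_per_worker of the
        // same logical stream; the sub-sequences never overlap.
        oneapi::mkl::rng::skip_ahead(engine, w * std::uint64_t(n_per_worker));
        oneapi::mkl::rng::generate(distr, engine, n_per_worker,
                                   out + w * n_per_worker).wait();
    }
    sycl::free(out, q);
    return 0;
}
```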

Find out how it works in the article:
→ Fast Sub-Stream Parallelization for oneMKL MRG32k3a Random Number Generator

Intel® Distribution for Python

Streamlining parallel Python execution is at the heart of a lot of our work upstreaming and contributing our optimizations to the PyTorch ecosystem.

The latest updates to the Intel® Distribution for Python come with the following new capabilities:  

  • Drop-in, near-native performance on CPU and GPU for numeric compute, powered by oneAPI.
  • Data Parallel Extension for Python (dpnp) expands compatibility, adding NumPy* 2.0 support in the runtime and providing asynchronous execution of offloaded operations. This update provides significant new functionality, with support for more than 25 new functions and keywords, reaching 90% functional compatibility with CuPy*.
  • Data Parallel Control (dpctl) expands compatibility, adding NumPy 2.0 support in the runtime and providing asynchronous execution of offloaded operations. This update expands the functionality of Python Array API support.

PyTorch Optimizations

Native support for PyTorch 2.5 is available on Intel’s Data Center GPUs, Core Ultra processors, and client GPUs, where it can be used to develop on Windows with out-of-the-box support for Intel® Arc™ Graphics and Intel® Iris® Xe Graphics GPUs.

The latest in CPU performance for PyTorch inference is now available in PyTorch 2.5, with support for torch.compile, optimizations for the current processor generation, and the ability to compile TorchInductor-generated code with the Intel oneAPI DPC++/C++ Compiler.

Optimized for the Latest AI and Compute Platforms

With the latest release of the Intel Developer Tools, we can thus boost AI and accelerated compute productivity and scale performance across the entire portfolio of Intel CPUs and GPUs.

Focus on Developer Productivity

It is, however, not only about performance but also about productivity.

Compiler Optimization Reports

You can improve the performance of your application early during software development. The Intel® oneAPI DPC++/C++ Compiler and Intel® Fortran Compiler 2025.0 come with substantially enhanced optimization reports covering:

  • Inlining
  • Profile Guided Optimization
  • Loop Optimization
  • SIMD Vectorization
  • OpenMP
  • Code Generation

Let the compiler tell you how to streamline your code as you are writing it.
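For example, assuming the -qopt-report flag spelling of recent compiler releases, a report for a simple loop can be requested at build time:

```cpp
// saxpy.cpp - a loop the vectorization section of the report will cover.
// Generate an optimization report alongside the object file, e.g.:
//   icpx -O2 -qopt-report=3 -c saxpy.cpp
// The resulting report (saxpy.optrpt) lists inlining, loop, and SIMD
// decisions the compiler made for this translation unit.
void saxpy(int n, float a, const float* x, float* y) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];  // candidate for SIMD vectorization
}
```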

Find out more in the article:
→ Develop Highly Optimized Applications Faster with Compiler Optimization Reports

Tuning AI Workload Performance

Application tuning and profiling with Intel® VTune™ Profiler 2025.0 is not limited to accelerated computing: PyTorch and OpenVINO™ AI workloads using the CPU, GPU, and Intel Core Ultra processor (Series 2) NPU can be profiled as well.

Identify performance bottlenecks, streamline parallel execution of deep learning inference and training workloads, and eliminate memory and cache latency, all by simply running Intel VTune Profiler on your workload and leveraging the Intel® Instrumentation and Tracing Technology API (ITT API) as well as Python awareness and OpenVINO integration.
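For example, a minimal hedged sketch of annotating a hot region with the ITT API so it appears as a named task on the VTune timeline (the domain and task names are illustrative; link against the ittnotify library):

```cpp
// Mark a region of interest so VTune attributes time to "inference_step".
#include <ittnotify.h>

static __itt_domain* domain = __itt_domain_create("MyApp");
static __itt_string_handle* task =
    __itt_string_handle_create("inference_step");

void run_inference() {
    __itt_task_begin(domain, __itt_null, __itt_null, task);
    // ... workload to attribute on the VTune timeline ...
    __itt_task_end(domain);
}
```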

With Intel VTune Profiler you can indeed streamline and accelerate ML and AI performance across CPU, GPU and NPU.

Find out more in the following cookbook recipes:
→ Profiling Large Language Models on Intel® Core™ Ultra 200V
→ Profiling OpenVINO™ Applications
→ Profiling Data Parallel Python* Applications

Download the Software

The 2025.0 releases of our Software Development Tools are available for download here:

Looking for smaller download packages?

Streamline your software setup with our toolkit selector to install full kits or new, right-sized sub-bundles containing only the needed components for specific use cases! Save time, reduce hassle, and get the perfect tools for your project with Intel® C++ Essentials, Intel® Fortran Essentials, and Intel® Deep Learning Essentials.

Additional Resources

Compilers

Libraries and APIs

Performance Profiling