High Performance AI Development with the Latest Intel® Software Development Tools

Benefit from the newest AI tools, libraries, and framework optimizations powered by oneAPI


Nikita Shiledarbaxi

Software Tools Technical Marketing Engineer

 

Robert Mueller-Albrecht

Software Tools Marketing Manager

 

Intel Corporation

With the 2025.0 release of the Intel® Software Development Tools, Intel’s extensive software portfolio powered by oneAPI and SYCL* aims to revolutionize AI and enable open, accelerated computing. Intel’s rapidly evolving software stack lets developers perform accelerated, vendor-independent, parallel computing across modern heterogeneous architectures, including CPUs, GPUs, and other accelerators.

In a recent webinar, Intel software developer tools expert Robert Mueller-Albrecht discussed the newest features introduced in the Intel Software Development Tools and how they help accelerate AI development on the latest architectures, free from vendor lock-in. The topics covered in the webinar included:

  • PyTorch* and Python* optimizations from Intel for fast, responsive AI,

  • Open source, open-standards-based accelerated parallel programming in C++ with SYCL*, enabled by the Unified Acceleration (UXL) Foundation and the oneAPI community,

  • Tools for profiling the performance of AI workloads on CPUs, GPUs, and AI PC NPUs.

→ Watch the complete webinar recording: Develop AI with the Latest Intel Libraries and Tools.

This blog highlights the webinar and gives you an overview of the developer-focused capabilities added to Intel’s software stack with the 2025.0 oneAPI release.

 

Intel® Software Development Tools for Production-Ready AI: What’s New? 

Intel provides an extensive stack of software tools (shown in Fig. 1 below) for delivering fast, high-performance, scalable, industry-standards-compliant AI solutions on the latest accelerated hardware of your choice. A variety of libraries, compilers, debuggers, and optimized extensions to traditional data science algorithms and frameworks for the machine learning life cycle and data analytics enable accelerated AI development in CPU-only environments or on a combination of CPUs, GPUs, and other AI accelerators. The components of the software stack also help you develop efficient AI and HPC workloads not only on individual computers but also in multi-node distributed computing environments using the Intel® oneAPI Collective Communications Library (oneCCL) and the Intel® MPI runtime library.

 

Fig. 1: Intel Software Development Tools Stack

 

With the recent 2025.0 release of the oneAPI tools, ready-to-use bundles of new product packages are available for download, eliminating the need to download large base packages and add-ons. You can now download the Intel® oneAPI HPC Toolkit as a stand-alone package, not just as an extension of the Intel® oneAPI Base Toolkit. You can also get sub-packages specific to a programming language (for example, C/C++ and Fortran* essentials).

The AI Tools Selector now allows you to pick the repository and packages of your choice (e.g., AI Tools, OpenVINO™ Toolkit, or Intel® Gaudi® AI accelerator), Python* versions, and Intel-optimized AI libraries and frameworks depending on the use case (e.g., classical ML, deep learning, or data analytics).

Watch the webinar from [00:04:40] for more details on the newly available customized packages. 

→ Check out a complete list of ways to install oneAPI developer toolkits.

 

Fast Responsive AI Development and Deployment 

As noted above, Intel’s software stack gives you the flexibility to choose the AI compute resources that best fit your needs. Building on some of the most widely used, industry-standards-based AI tools and frameworks, we continuously optimize them and fold those optimizations into our software stack. You can thus keep using existing libraries and frameworks in your AI workloads and migrate legacy codebases to unlock the potential of the latest Intel hardware and AI accelerators without being locked into vendor-specific hardware.

PyTorch*, TensorFlow*, and Hugging Face* are some of the major communities we contribute to and collaborate with to integrate their frameworks into Intel’s AI software stack. Using our AI tools and libraries, you can integrate several popular Large Language Models (LLMs) such as Meta* Llama, Microsoft* Phi-3, and various Hugging Face models. With the open-source OpenVINO toolkit, you can speed up AI inference and streamline AI development.
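As a quick illustration of the OpenVINO inference flow, here is a minimal Python sketch. The model path, target device, and input shape are placeholders, and it assumes you have already converted a model to the OpenVINO IR format:

```python
# Minimal OpenVINO inference sketch; "model.xml" and the input shape are
# hypothetical placeholders, assuming a model already converted to OpenVINO IR.
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")

# Compile for a target device: "CPU", "GPU", or "NPU", depending on your hardware.
compiled_model = core.compile_model(model, device_name="CPU")

# Run inference; the (1, 3, 224, 224) image-like shape is illustrative only.
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)
results = compiled_model([input_data])
print(results[compiled_model.output(0)])
```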

Beginning at [00:08:35] in the webinar recording, Robert discusses PyTorch optimizations from Intel: their distinguishing features and how they aid AI workflows. PyTorch 2.5 supports these optimizations on the latest Intel® Xeon® processors, Intel® Core™ Ultra processors with the Intel® Arc™ Graphics family, and the Intel® Data Center GPU Max Series. The latest SYCL-based GPU offload support is available for PyTorch 2.4, and work is in progress to incorporate it into PyTorch 2.5. The Intel® Extension for PyTorch is distributed both as binaries and as an open-source repository, and its optimizations are upstreamed to the PyTorch community. The PyTorch optimizations let you leverage high-performance libraries such as:

  • Intel oneAPI Deep Neural Network Library (oneDNN) for accelerated deep learning on Intel hardware,

  • Intel oneAPI DPC++ Library (oneDPL) for parallel programming in C++ with SYCL,

  • Intel oneAPI Math Kernel Library (oneMKL) for accelerated math routines.

At [00:09:20], the presenter also demonstrates the minimal change required in your CUDA* code to use Intel GPUs and SYCL device capabilities. Watch the webinar from [00:14:00] to learn about the current and planned improvements for Intel GPU support upstreamed into the open-source PyTorch community.
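In the PyTorch case, that change can be as small as swapping the device string. Here is a hedged sketch, assuming PyTorch 2.5 or later with the built-in "xpu" backend (the model and tensors are illustrative):

```python
# Hedged sketch: moving a PyTorch workload from CUDA to an Intel GPU ("xpu").
# Assumes PyTorch 2.5+ with Intel GPU support; falls back gracefully otherwise.
import torch

if hasattr(torch, "xpu") and torch.xpu.is_available():
    device = torch.device("xpu")      # Intel GPU
elif torch.cuda.is_available():
    device = torch.device("cuda")     # NVIDIA GPU
else:
    device = torch.device("cpu")

model = torch.nn.Linear(128, 64).to(device)   # same .to() call as with "cuda"
x = torch.randn(32, 128, device=device)
y = model(x)
print(y.device)   # e.g., xpu:0 on an Intel GPU
```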

The backbone of all our PyTorch optimizations is the Intel® Distribution for Python and the Data Parallel Extensions for Python, which enable heterogeneous computing for high-performance data science, including more accurate data analytics and faster training and inference with machine learning models.
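As a hedged illustration, the sketch below uses the Data Parallel Extension for NumPy (dpnp) as a drop-in replacement for the NumPy import; it assumes the dpnp package is installed and a SYCL-capable device (CPU or GPU) is available:

```python
# Hedged sketch: NumPy-style code offloaded to a SYCL device via dpnp.
import dpnp as np   # drop-in replacement for "import numpy as np"

a = np.arange(1_000_000, dtype=np.float32)
b = np.sqrt(a) + 2.0 * a        # runs on the default SYCL device
print(b.sum())
print(b.device)                 # shows which SYCL device executed the computation
```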

 

Parallel Computing Approach for Accelerated AI 

We celebrated significant milestones in 2024: 

  • 30 years since the first release of oneMKL,

  • The 5th anniversary of the oneAPI initiative.

The webinar (from [00:17:40]) discusses the motivation for the UXL Foundation as a catalyst for open-standards-based, multiarchitecture, cross-vendor accelerated parallel computing, and how it helps achieve AI acceleration.

The Linux Foundation’s Joint Development Foundation brought together 30+ industry-leading companies to form the UXL Foundation. The foundation provides a software ecosystem for vendor-independent heterogeneous computing across all accelerators, built on the SYCL and OpenCL™ parallel computing frameworks. The oneAPI software platform and its multiarchitecture programming paradigm, with open-source libraries and frameworks, enrich the building blocks of the UXL Foundation’s developer ecosystem.

The oneAPI 2025.0 release added several new features to the Intel® oneAPI DPC++/C++ Compiler, the world’s first compiler fully conformant with SYCL 2020[1]. Some of the major updates include increased OpenMP* support, expanded compiler optimization reporting, and more efficient SYCL offload with SYCL Graph. At [00:22:15] in the webinar recording, the presenter discusses the new functionalities recently added to the compiler. 

The Intel® DPC++ Compatibility Tool and its open-source counterpart, SYCLomatic, enable easy, automated migration of CUDA code to C++ with SYCL, allowing the migrated code to run in parallel on multiarchitecture hardware from diverse vendors. Check out the webinar recording from [00:26:20] onward to learn how the tools accomplish the migration in five simple steps and what features the 2025.0 oneAPI release adds to SYCLomatic: 126 more CUDA library APIs can now be translated to their oneAPI library counterparts and SYCL image API extensions.

While the oneAPI performance libraries (such as oneCCL, oneMKL, oneDAL, oneDNN, and oneDPL) deliver significant improvements on the latest CPUs and GPUs in and beyond the field of AI, the Intel® Cryptography Primitives Library preserves data security and privacy in AI applications that handle massive amounts of critical data (for example, cryptographic operations in multimedia, embedded systems, and enterprise data).

 

Profiling the Performance of AI Applications  

The Intel® VTune™ Profiler is a powerful performance analysis tool. It lets you conduct in-depth analysis of performance bottlenecks at both the hardware and software levels, such as identifying hotspots and I/O issues and analyzing hardware utilization and memory consumption. You can then replace the source code lines causing performance loss with Intel-optimized NumPy* and the Data Parallel Extensions for Python (which include the Data Parallel Extension for NumPy and the Data Parallel Extension for Numba*).

Beginning at [00:35:20] in the webinar recording, you will find a demonstration of profiling a simple pairwise-distance calculation in Python using VTune Profiler. The same example is discussed in a recent recipe, ‘Profiling Data Parallel Python Applications,’ in the VTune Profiler Cookbook.
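If you want to experiment before watching, the sketch below is a plain NumPy pairwise-distance kernel of the kind profiled in the webinar (the exact demo code may differ):

```python
# Hedged sketch of a naive pairwise Euclidean distance computation in NumPy,
# the kind of kernel VTune Profiler can analyze for hotspots.
import numpy as np

def pairwise_distance(points):
    """Distance between every pair of rows in `points` (shape: n x d)."""
    # Broadcasting builds an (n, n, d) difference tensor, then reduces over d.
    diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

data = np.random.rand(2000, 3).astype(np.float32)
print(pairwise_distance(data).shape)   # (2000, 2000)
```

Saved as, say, a hypothetical pairwise.py, such a script can be profiled from the command line with, for example, vtune -collect hotspots -- python pairwise.py; the hotspot lines reported are then candidates for replacement with their Data Parallel Extensions counterparts.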

VTune Profiler supports Intel CPUs, GPUs, and the AI PC NPUs in Intel Core Ultra processors, and it can profile OpenVINO applications on any combination of these architectures. Watch the webinar recording from [00:42:24], where the presenter demonstrates how to profile OpenVINO applications using VTune Profiler. You can leverage different analysis types depending on the underlying hardware: Hotspots analysis for CPU bottlenecks, GPU Compute/Media Hotspots analysis for ensuring efficient GPU utilization, and NPU Exploration analysis for examining performance on Intel® NPUs. More details are available in a recent recipe, ‘Profiling OpenVINO Applications,’ in the VTune Profiler Cookbook.

 

Fig. 2: Intel Core Ultra Processor NPU Exploration using Intel VTune Profiler

 

What’s Next?

Explore the Intel Software Development Tools: a rich set of compilers, performance libraries, CUDA-to-SYCL code migration tools, performance profiling tools, and much more. Use these developer resources to build highly productive and scalable AI, HPC, and edge computing applications on multi-vendor heterogeneous hardware.

Check out the ‘Awesome oneAPI’ and ‘Awesome OpenVINO’ open-source GitHub repositories to learn how you can leverage the oneAPI tools and libraries and the OpenVINO toolkit for practical use cases in fields including autonomous systems, Generative AI (GenAI), Natural Language Processing (NLP), gaming, manufacturing, and data visualization and rendering.

Make the best use of AI frameworks and tools for accelerating and optimizing real-world AI applications at each stage: data preparation, model training, inference, and fixing bottlenecks for performance improvements. Sign up for Intel® Tiber™ AI Cloud, a unified platform where you can experiment with Intel AI tools and framework optimizations on the latest Intel hardware, including CPUs, GPUs, and other AI accelerators.

 

Additional Resources

 

 

[1] Intel® Compiler First to Achieve SYCL 2020 Conformance