Explore SYCL with Samples from Intel


ID 772037

Date 3/31/2023


A newer version of this document is available.



SYCL* applications are C++ programs that express parallelism for data-parallel programming and heterogeneous computing. SYCL provides a uniform programming experience across CPUs, GPUs, FPGAs, and AI accelerators by offering a consistent C++ language and a consistent set of Application Programming Interfaces (APIs). You can program each architecture individually or use several of them in combination, so you learn the programming model once and apply it to many accelerators. Reaching the best performance on each accelerator class still requires tailoring and fine-tuning your algorithms, but the core language and programming model stay the same across target devices. For in-depth information on SYCL, visit the SYCL Specification.

This guide shows you how to navigate the oneAPI programming model, with a focus on choosing the most suitable architecture for your application and refining your code to reach peak performance on it.

To explore FPGA-specific samples, visit the Explore SYCL Through Intel® FPGA Code Samples page.

Build and Run a Sample Project

The links below take you to the Get Started with the Intel® oneAPI Base Toolkit content for the Command Line and IDE:

Build and Run a Sample Project Using the Command Line:
- Linux
- Windows
Build and Run a Sample Project Using an IDE:
GitHub: each sample section below links to that sample's own GitHub repository.

Sample 1: Simple Device Offload Structure

Sample 1 introduces Vector Add, the data-parallel equivalent of a "Hello, World!" program. It shows the basic structure of a SYCL application and how to target an offload device. The sample includes two source files: one manages memory with buffers and the other with Unified Shared Memory (USM).

Vector Add supports both GPU and FPGA device selectors.

In this sample, you learn to use SYCL's basic features to offload a simple computation on 1D arrays to an accelerator. These features include:

A one-dimensional array of data.
A device selector, a queue, a buffer, an accessor, and a kernel.
Memory management using buffers and accessors or USM.

Visit Code Sample: Vector Add for a detailed code walkthrough.

Get the sample:

CLI or IDE sample name: vector-add
Git Repo for Vector Add Sample

Sample 2: Basic SYCL Features Defined

Sample 2 walks you through the core concepts of SYCL by using a two-dimensional stencil to simulate a wave propagating through a 2D isotropic medium. It covers:

SYCL queues (including device selectors and exception handlers).
SYCL buffers and accessors.
Calling a function inside a kernel definition and passing accessor arguments to it as pointers. The function called inside the kernel updates the grid point identified by the work-item's global ID for a single time step.

Visit Code Sample: Two-Dimensional Finite-Difference Wave Propagation in Isotropic Media (ISO2DFD) for a detailed code walkthrough. Visit Explore Data Parallel C++ with Samples from Intel: ISO2DFD for a detailed video walkthrough.

Get the sample:

CLI or IDE sample name: iso2dfd_dpcpp
Git Repo for ISO2DFD Sample

Sample 3: Optimizing for More Complex Applications

Sample 3 builds on the SYCL concepts from the previous sample and applies them to a more complex 3D stencil computation. Moving from 2D to 3D grids can expose common issues in general-purpose GPU (device) programming, such as inefficient data access patterns, a low ratio of floating-point operations to bytes transferred, and low occupancy. The sample shows how to use SYCL features to address these issues through five versions of the same code, each improving on the performance of the previous one.

The sample provides step-by-step instructions for adapting CPU-based code for GPU offload with SYCL and then improving its performance over several iterations with the help of Intel® Advisor. It uses several important SYCL features:

Local buffers and accessors (local memory declared for each SYCL work-group and shared by the work-items in that group).
Shared local memory (SLM) optimizations.
Kernels (including the parallel_for function and nd_range<3> objects).

Get the sample:

CLI or IDE sample name: iso3dfd_dpcpp
Git Repo for Guided ISO3DFD Sample

Sample 4: Introducing Synchronization

Sample 4 adds complexity by simulating a large number of moving particles that interact with a stationary grid of cells. It demonstrates new SYCL features, including synchronization with atomic operations.

This code sample demonstrates how to offload computation to an accelerator using the following SYCL features and libraries:

SYCL queues (including device selectors and exception handlers).
SYCL buffers and accessors (to communicate data between the host and the device).
SYCL kernels (including the parallel_for function and range<1> objects).
SYCL atomic operations for synchronization.
API-based programming: using oneMKL to generate random numbers.

Visit Code Sample: Particle Diffusion for a detailed code walkthrough.

Get the sample:

CLI or IDE sample name: particle-diffusion
Git Repo for Particle Diffusion Sample

Next Steps

Code Walkthroughs

Next, try a detailed code walkthrough on the following topics:

Determine Which Code to Offload

You can use Intel® Advisor to determine which parts of your code would benefit from offloading to an accelerator. Its Offload Advisor feature collects performance-prediction data in addition to the standard profiling data, and it identifies code regions you can offload to a target device to speed up CPU-based applications. The Get Started with Intel® Advisor guide helps you:

Optimize CPU or GPU code for memory and compute with Roofline Analysis.
Enhance vector parallelism and its efficiency.
Model, tune, and test multiple threading designs.
Develop and analyze data-flow and dependency computations for heterogeneous algorithms.

Transform CUDA Code into SYCL Code

The Intel® DPC++ Compatibility Tool is a migration engine that converts CUDA code into standards-based SYCL code. Its Get Started Guide and User Guide describe the general workflow for migrating existing CUDA applications. The tool can migrate programs made up of multiple source and header files, and it provides:

One-time migration support for kernels and API calls.
Inline comments that guide you in finishing the migrated output so that it can be compiled with the Intel® oneAPI DPC++/C++ Compiler.
Command-line tools and IDE plug-ins that streamline operations.

Additional Resources

You can access tutorials, videos, and webinar replays to learn more about SYCL and the supporting tools on the Intel® oneAPI Toolkits site.

Document	Description
Intel® oneAPI Programming Guide	Learn about oneAPI and SYCL, programming models and interfaces, SYCL runtimes, APIs, and software development processes.
Documentation Library	Look through our content to search for specific documents.
Explore SYCL Through Intel® FPGA Code Samples	Look through the FPGA code samples for more in-depth information.

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
