Intel® SHMEM

PGAS Programming Implementation for Host and Device SYCL* Kernels

Overview

Intel® SHMEM is a C++ software library that enables applications to use OpenSHMEM communication APIs with device kernels implemented in SYCL*. It supports high-performance computing (HPC) and AI-focused Intel GPUs, beginning with the Intel® Data Center GPU Max Series.

Intel SHMEM implements a Partitioned Global Address Space (PGAS) programming model. The library includes a subset of the host-initiated operations defined in the current OpenSHMEM standard, along with new device-initiated operations that can be called directly from GPU kernels.
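To make the PGAS idea concrete, the following is a minimal sketch in standard C++ that models the address space inside a single process. It does not use Intel SHMEM or SYCL. The `ToyPgas` class, its `put` and `get` members, and `ring_exchange` are hypothetical names chosen for illustration only. The sketch assumes each processing element (PE) owns one partition of the same size and that a one-sided operation names a target PE and an offset that is valid on every PE.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy, single-process model of a partitioned global address space.
// Every name here is hypothetical; none of this is the Intel SHMEM
// or OpenSHMEM API.
class ToyPgas {
public:
    ToyPgas(int n_pes, std::size_t heap_elems)
        : heaps_(static_cast<std::size_t>(n_pes),
                 std::vector<int>(heap_elems, 0)) {}

    int n_pes() const { return static_cast<int>(heaps_.size()); }

    // One-sided write: copy nelems values into the partition owned by
    // target_pe, starting at the given symmetric offset.
    void put(std::size_t offset, const int* src, std::size_t nelems,
             int target_pe) {
        check(target_pe, offset, nelems);
        std::vector<int>& heap = heaps_[static_cast<std::size_t>(target_pe)];
        for (std::size_t i = 0; i < nelems; ++i) heap[offset + i] = src[i];
    }

    // One-sided read: copy nelems values out of the partition owned by
    // source_pe, starting at the given symmetric offset.
    void get(int* dest, std::size_t offset, std::size_t nelems,
             int source_pe) const {
        check(source_pe, offset, nelems);
        const std::vector<int>& heap =
            heaps_[static_cast<std::size_t>(source_pe)];
        for (std::size_t i = 0; i < nelems; ++i) dest[i] = heap[offset + i];
    }

private:
    // Reject unknown PEs and ranges that fall outside a partition.
    void check(int pe, std::size_t offset, std::size_t nelems) const {
        if (pe < 0 || pe >= n_pes()) throw std::out_of_range("invalid PE");
        const std::size_t size = heaps_[static_cast<std::size_t>(pe)].size();
        if (offset > size || nelems > size - offset)
            throw std::out_of_range("range outside the partition");
    }

    std::vector<std::vector<int>> heaps_;  // one partition per PE
};

// Ring exchange: each PE writes its own rank into slot 0 of its
// right-hand neighbor, then every PE's slot 0 is read back.
std::vector<int> ring_exchange(int n_pes) {
    assert(n_pes > 0);
    ToyPgas space(n_pes, 1);
    for (int pe = 0; pe < n_pes; ++pe) {
        const int value = pe;
        space.put(0, &value, 1, (pe + 1) % n_pes);
    }
    std::vector<int> received(static_cast<std::size_t>(n_pes));
    for (int pe = 0; pe < n_pes; ++pe)
        space.get(&received[static_cast<std::size_t>(pe)], 0, 1, pe);
    return received;
}
```

Calling `ring_exchange(4)` leaves each PE holding the rank of its left-hand neighbor. In a real PGAS library the partitions live in separate processes or on separate GPUs and the copy travels over an interconnect; the toy model only mirrors the addressing scheme.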

Download the Stand-Alone Version

A stand-alone version of Intel SHMEM is available.


Download as Part of the Toolkit

Intel SHMEM is included in the Intel® HPC Toolkit. Get the toolkit to analyze, optimize, and deliver applications that scale.


Features

- A complete specification that details the programming model, supported API, example programs, build and run instructions, and more.
- Device and host API support for:
  - OpenSHMEM 1.5-compliant point-to-point Remote Memory Access (RMA), atomic memory operations, signaling, memory ordering, and synchronization operations
  - OpenSHMEM collective operations
- Device API support for SYCL work-group and subgroup level extensions of RMA, signaling, collective, memory ordering, and synchronization operations.
- Support for C++ template functions that replace the C11 generic selection routines from the OpenSHMEM specification.
- GPU Remote Direct Memory Access (RDMA) support when configured with Sandia OpenSHMEM and suitable libfabric providers for high-performance networking services.
- Choice of device memory (the default) or Unified Shared Memory (USM) for the SHMEM symmetric heap.
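The difference between per-type routines and a single template function can be sketched in standard C++ as follows. This is an illustration of the API shape only, not Intel SHMEM code: `toy_int_put`, `toy_double_put`, and `toy_put` are hypothetical names, and the copy is a local `memcpy` rather than a transfer to a remote PE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// C-style shape: one routine per element type. A C11 _Generic macro
// would pick among routines like these based on the pointer type.
// The toy_* names are hypothetical, not real library functions.
void toy_int_put(int* dest, const int* src, std::size_t nelems) {
    std::memcpy(dest, src, nelems * sizeof(int));
}

void toy_double_put(double* dest, const double* src, std::size_t nelems) {
    std::memcpy(dest, src, nelems * sizeof(double));
}

// C++ shape: a single function template. The compiler deduces T from
// the pointer arguments, so no macro-based dispatch is needed.
template <typename T>
void toy_put(T* dest, const T* src, std::size_t nelems) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "elements must be safe to copy byte by byte");
    assert(dest != nullptr && src != nullptr);
    std::memcpy(dest, src, nelems * sizeof(T));
}

// Checks that the template produces the same bytes as the typed routines.
bool template_matches_typed_routines() {
    const int int_src[3] = {1, 2, 3};
    int a[3] = {};
    int b[3] = {};
    toy_int_put(a, int_src, 3);
    toy_put(b, int_src, 3);  // T deduced as int

    const double dbl_src[2] = {0.5, 1.5};
    double c[2] = {};
    double d[2] = {};
    toy_double_put(c, dbl_src, 2);
    toy_put(d, dbl_src, 2);  // T deduced as double

    return std::memcmp(a, b, sizeof a) == 0 &&
           std::memcmp(c, d, sizeof c) == 0;
}
```

With the template form, a type mismatch between source and destination is a compile-time deduction error instead of a fall-through in a macro, which is one plausible reason a C++ library would prefer it.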

More Information

Documentation & Code Samples

Documentation

Code Samples

View All Code Samples (GitHub)

Specifications

Related Tools

Intel® MPI Library: Optimized implementation of the MPI standard.
Intel® oneAPI Collective Communications Library: Scalable and efficient distributed training for deep neural networks.
Libfabric: Defines a communication API for high-performance parallel and distributed applications.
