Visible to Intel only — GUID: GUID-AF3966F8-2B36-4A6F-A48B-34BE26821A22
Overview of Intel® oneAPI Math Kernel Library (oneMKL) Sparse BLAS for DPC++
The following pages describe the oneMKL Sparse BLAS computational routines for DPC++ in detail. These routines, along with other helper routines (see Sparse BLAS Routines for the full list), are declared in the header file oneapi/mkl/spblas.hpp.
Several conventions are used throughout this document:
All oneMKL DPC++ data types and non-domain-specific functions are inside the oneapi::mkl namespace.
All oneMKL DPC++ Sparse BLAS functions are inside the oneapi::mkl::sparse namespace.
For brevity, the sycl namespace is omitted from DPC++ object types such as buffers and queues. For example, a single-precision, 1D buffer A is written as buffer<float,1> &A instead of sycl::buffer<float,1> &A.
Computational routines are overloaded on precision. Unless otherwise specified, all oneMKL Sparse BLAS computational routines support the float, double, std::complex<float>, and std::complex<double> floating-point types. Mixed-precision computations are not yet supported.
The oneMKL Sparse BLAS domain currently does not offer bitwise reproducibility (BWR) guarantees for most of its APIs.
For sparse matrix row and column indices, oneMKL Sparse BLAS supports std::int32_t and std::int64_t integer types for all supported matrix formats. Matrix handle creation routines are overloaded on integer types.
Some APIs require user-provided temporary workspaces. For sycl::buffer APIs, the temporary workspaces are of type sycl::buffer<std::uint8_t, 1> *; for USM APIs, they are of type void *.
For users of USM APIs, oneMKL supports all types of USM allocations (device, shared, and host), although performance may differ between them. Unless otherwise specified, we recommend using device memory allocations (sycl::malloc_device()) wherever possible for maximum Sparse BLAS performance; any explicit data movement this requires is the user's responsibility.
Device Support
DPC++ supports several types of devices:
CPU device: Performs computations on a CPU using OpenCL™.
GPU device: Performs computations on a GPU using OpenCL™ or Level Zero.
The documentation for each routine lists the device types it currently supports.
In the current release of oneMKL DPC++ Sparse BLAS, all listed routines support CPU and GPU devices with the Compressed Sparse Row (CSR) matrix format unless otherwise noted. Limited support for the Coordinate (COO) matrix format is also available; see the documentation of each API for details.
| Routine | Description |
|---|---|
| Level 2: | |
| | General sparse matrix-dense vector product |
| | General sparse matrix-dense vector product with fused dot product |
| | Symmetric sparse matrix-dense vector product |
| | Triangular sparse matrix-dense vector product |
| | Triangular solve of sparse matrix against a dense vector |
| Level 3: | |
| | General sparse matrix-dense matrix product with dense matrix output |
| | Triangular solve of sparse matrix against a dense matrix |
| | General sparse matrix-sparse matrix addition with sparse matrix output |
| | General sparse matrix-sparse matrix product with sparse matrix output |
| | General sparse matrix-sparse matrix product with dense matrix output |