Single op partition on GPU
This example demonstrates how to build a simple single-op partition and run it on a GPU.
Example code: gpu_single_op_partition.cpp
Some key takeaways from this example:
how to build a single-op partition quickly
how to create an engine, allocator and stream
how to compile a partition
how to execute a compiled partition
Some assumptions in this example:
Only the workflow is demonstrated, without checking the correctness of the results
Unsupported partitions should be handled by users themselves (see the sketch after this list)
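As a reference for that second point, dnnl::graph::partition::is_supported() reports whether the library can handle a given partition. A minimal sketch of such a fallback check (the fallback itself is application-specific):
if (!part.is_supported()) {
    // Handle this partition with user code, e.g. a reference implementation.
}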
Public headers
To start using oneDNN Graph, we must include the dnnl_graph.hpp header file in the application; the SYCL interoperability APIs additionally require dnnl_graph_sycl.hpp. All the C++ APIs reside in namespace dnnl::graph.
#include "oneapi/dnnl/dnnl_graph.hpp"
#include "oneapi/dnnl/dnnl_graph_sycl.hpp"
#include "oneapi/dnnl/dnnl_sycl.hpp"
using namespace dnnl::graph;
using namespace sycl;
#include <assert.h>
#include <iostream>
#include <memory>
#include <vector>
#include <unordered_map>
#include <unordered_set>
#include "example_utils.hpp"
#include "graph_example_utils.hpp"
using namespace dnnl::graph;
using data_type = logical_tensor::data_type;
using layout_type = logical_tensor::layout_type;
using dim = logical_tensor::dim;
using dims = logical_tensor::dims;
gpu_single_op_partition_tutorial() function
Build Graph and Get Partitions
In this section, we create a partition containing a single MatMul op directly, skipping the usual steps of building a graph and getting partitions from it.
Create the MatMul op (dnnl::graph::op) and attach attributes to it, including transpose_a and transpose_b.
logical_tensor matmul_src0_desc {0, data_type::f32};
logical_tensor matmul_src1_desc {1, data_type::f32};
logical_tensor matmul_dst_desc {2, data_type::f32};
op matmul(0, op::kind::MatMul, {matmul_src0_desc, matmul_src1_desc},
{matmul_dst_desc}, "matmul");
matmul.set_attr<bool>(op::attr::transpose_a, false);
matmul.set_attr<bool>(op::attr::transpose_b, false);
Compile and Execute Partition
In a real-world case, users such as frameworks should provide device information at this stage; in this example, we simply use a self-defined device to simulate the real behavior.
Create a dnnl::graph::allocator with two user-defined callback functions of types dnnl_graph_sycl_allocate_f and dnnl_graph_sycl_deallocate_f (a sketch of such callbacks follows the code below).
allocator alloc = sycl_interop::make_allocator(
sycl_malloc_wrapper, sycl_free_wrapper);
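The wrappers above come from graph_example_utils.hpp. For illustration only, here is a minimal sketch of what such callbacks could look like, assuming shared USM allocation and that the deallocation event wraps a sycl::event (this is an assumption, not the example's actual implementation):
void *sycl_malloc_wrapper(size_t size, size_t alignment,
        const void *dev, const void *ctx) {
    // Allocate shared USM memory accessible from both host and device.
    return sycl::aligned_alloc_shared(alignment, size,
            *static_cast<const sycl::device *>(dev),
            *static_cast<const sycl::context *>(ctx));
}

void sycl_free_wrapper(
        void *ptr, const void *dev, const void *ctx, void *event) {
    // Wait for the last usage of the buffer before freeing it
    // (assuming the event pointer wraps a sycl::event).
    if (event) static_cast<sycl::event *>(event)->wait();
    sycl::free(ptr, *static_cast<const sycl::context *>(ctx));
}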
Define SYCL queue (code outside of oneDNN graph)
sycl::queue q = (ekind == engine::kind::gpu)
        ? sycl::queue(
                sycl::gpu_selector_v, sycl::property::queue::in_order {})
        : sycl::queue(
                sycl::cpu_selector_v, sycl::property::queue::in_order {});
Create a dnnl::engine based on the SYCL device and context, and set the user-defined dnnl::graph::allocator on this engine.
dnnl::engine eng = sycl_interop::make_engine_with_allocator(
q.get_device(), q.get_context(), alloc);
Create a dnnl::stream on a given engine
dnnl::stream strm = dnnl::sycl_interop::make_stream(eng, q);
Skip building a graph and getting partitions, and directly create the single-op partition. Since this example targets GPU, the partition is created with the GPU engine kind.
partition part(matmul, dnnl::engine::kind::gpu);
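The compile() call below expects vectors of input and output logical tensors that carry concrete shapes. A minimal sketch of preparing them (the shapes here are hypothetical; the logical tensors are re-created with the same IDs, now with shape and layout information):
// Hypothetical shapes for illustration; same IDs as the op's descriptors.
logical_tensor src0_shaped {0, data_type::f32, {8, 64}, layout_type::strided};
logical_tensor src1_shaped {1, data_type::f32, {64, 32}, layout_type::strided};
logical_tensor dst_shaped {2, data_type::f32, {8, 32}, layout_type::strided};
std::vector<logical_tensor> inputs {src0_shaped, src1_shaped};
std::vector<logical_tensor> outputs {dst_shaped};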
Compile the partition into a compiled partition with the given input and output logical tensors.
compiled_partition cp = part.compile(inputs, outputs, eng);
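Before executing, data buffers must be bound to dnnl::graph::tensor objects; the example uses helpers from graph_example_utils.hpp for this. A minimal, self-contained sketch with SYCL shared USM (buffer names and shapes are hypothetical, matching the sketch above):
// Query the output layout chosen by the compiled partition (tensor ID 2).
logical_tensor dst_lt = cp.query_logical_tensor(2);

// Allocate shared USM buffers sized from the logical tensors.
void *src0_data = sycl::malloc_shared(
        src0_shaped.get_mem_size(), q.get_device(), q.get_context());
void *src1_data = sycl::malloc_shared(
        src1_shaped.get_mem_size(), q.get_device(), q.get_context());
void *dst_data = sycl::malloc_shared(
        dst_lt.get_mem_size(), q.get_device(), q.get_context());

// Bind the buffers to tensors for execution.
std::vector<tensor> inputs_ts {tensor(src0_shaped, eng, src0_data),
        tensor(src1_shaped, eng, src1_data)};
std::vector<tensor> outputs_ts {tensor(dst_lt, eng, dst_data)};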
Execute the compiled partition on the specified stream.
cp.execute(strm, inputs_ts, outputs_ts);
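Execution on the stream is asynchronous with respect to the host; wait for it to finish before reading the results:
strm.wait(); // Block until the MatMul has completed.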