Getting started on CPU with Graph API
This example demonstrates how to build a simple graph and run it on CPU.
Example code: cpu_getting_started.cpp
Some key take-aways from this example:
how to build a graph and get partitions from it
how to create an engine, allocator and stream
how to compile a partition
how to execute a compiled partition
Some assumptions in this example:
Only the workflow is demonstrated, without checking correctness
Unsupported partitions must be handled by users themselves
Public headers
To start using oneDNN Graph, we must include the dnnl_graph.hpp header file in the application. All the C++ APIs reside in namespace dnnl::graph.
#include <iostream>
#include <memory>
#include <vector>
#include <unordered_map>
#include <unordered_set>
#include <assert.h>
#include "oneapi/dnnl/dnnl_graph.hpp"
#include "example_utils.hpp"
#include "graph_example_utils.hpp"
using namespace dnnl::graph;
using data_type = logical_tensor::data_type;
using layout_type = logical_tensor::layout_type;
using dim = logical_tensor::dim;
using dims = logical_tensor::dims;
cpu_getting_started_tutorial() function
Build Graph and Get Partitions
In this section, we build a graph containing the pattern conv0->relu0->conv1->relu1 and then get all of the partitions, which are determined by the backend.
To build a graph, the connection relationships between different ops must be known. In oneDNN Graph, dnnl::graph::logical_tensor is used to express such relationships. So, the next step is to create logical tensors for these ops, including their inputs and outputs.
Create input/output dnnl::graph::logical_tensor objects for the first Convolution op.
logical_tensor conv0_src_desc {0, data_type::f32};
logical_tensor conv0_weight_desc {1, data_type::f32};
logical_tensor conv0_dst_desc {2, data_type::f32};
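These logical tensors carry only an id and a data type; shape and layout may stay undefined while building the graph. A logical tensor can also be created with concrete dimensions and a layout, for example (the id and dims below are illustrative, not part of this example):
// Illustrative only: a logical tensor with a concrete shape and strided layout.
logical_tensor example_desc {100, data_type::f32, dims {8, 3, 227, 227},
        layout_type::strided};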
Create the first Convolution op (dnnl::graph::op) and attach attributes to it, such as strides, pads_begin, pads_end, data_format, etc.
op conv0(0, op::kind::Convolution, {conv0_src_desc, conv0_weight_desc},
{conv0_dst_desc}, "conv0");
conv0.set_attr<dims>(op::attr::strides, {4, 4});
conv0.set_attr<dims>(op::attr::pads_begin, {0, 0});
conv0.set_attr<dims>(op::attr::pads_end, {0, 0});
conv0.set_attr<dims>(op::attr::dilations, {1, 1});
conv0.set_attr<int64_t>(op::attr::groups, 1);
conv0.set_attr<std::string>(op::attr::data_format, "NCX");
conv0.set_attr<std::string>(op::attr::weights_format, "OIX");
Create input/output logical tensors for the first BiasAdd op and create the op.
logical_tensor conv0_bias_desc {3, data_type::f32};
logical_tensor conv0_bias_add_dst_desc {
4, data_type::f32, layout_type::undef};
op conv0_bias_add(1, op::kind::BiasAdd, {conv0_dst_desc, conv0_bias_desc},
{conv0_bias_add_dst_desc}, "conv0_bias_add");
conv0_bias_add.set_attr<std::string>(op::attr::data_format, "NCX");
Create the output logical tensor for the first ReLU op and create the op.
logical_tensor relu0_dst_desc {5, data_type::f32};
op relu0(2, op::kind::ReLU, {conv0_bias_add_dst_desc}, {relu0_dst_desc},
"relu0");
Create input/output logical tensors for the second Convolution op and create the op.
logical_tensor conv1_weight_desc {6, data_type::f32};
logical_tensor conv1_dst_desc {7, data_type::f32};
op conv1(3, op::kind::Convolution, {relu0_dst_desc, conv1_weight_desc},
{conv1_dst_desc}, "conv1");
conv1.set_attr<dims>(op::attr::strides, {1, 1});
conv1.set_attr<dims>(op::attr::pads_begin, {0, 0});
conv1.set_attr<dims>(op::attr::pads_end, {0, 0});
conv1.set_attr<dims>(op::attr::dilations, {1, 1});
conv1.set_attr<int64_t>(op::attr::groups, 1);
conv1.set_attr<std::string>(op::attr::data_format, "NCX");
conv1.set_attr<std::string>(op::attr::weights_format, "OIX");
Create input/output logical tensors for the second BiasAdd op and create the op.
logical_tensor conv1_bias_desc {8, data_type::f32};
logical_tensor conv1_bias_add_dst_desc {9, data_type::f32};
op conv1_bias_add(4, op::kind::BiasAdd, {conv1_dst_desc, conv1_bias_desc},
{conv1_bias_add_dst_desc}, "conv1_bias_add");
conv1_bias_add.set_attr<std::string>(op::attr::data_format, "NCX");
Create the output logical tensor for the second ReLU op and create the op.
logical_tensor relu1_dst_desc {10, data_type::f32};
op relu1(5, op::kind::ReLU, {conv1_bias_add_dst_desc}, {relu1_dst_desc},
"relu1");
Finally, the created ops are added into the graph. The graph internally maintains a list of all added ops. To create a graph, a dnnl::engine::kind is needed because the returned partitions may vary across devices. For this example, we use the CPU engine kind.
Create graph and add ops to the graph
graph g(dnnl::engine::kind::cpu);
g.add_op(conv0);
g.add_op(conv0_bias_add);
g.add_op(relu0);
g.add_op(conv1);
g.add_op(conv1_bias_add);
g.add_op(relu1);
After adding all ops into the graph, call dnnl::graph::graph::finalize() to indicate that graph building is over and the graph is ready for partitioning. Adding new ops into a finalized graph or partitioning an unfinalized graph will both lead to a failure.
g.finalize();
After finishing the above steps, we can get partitions by calling dnnl::graph::graph::get_partitions().
In this example, the graph will be partitioned into two partitions:
conv0 + conv0_bias_add + relu0
conv1 + conv1_bias_add + relu1
auto partitions = g.get_partitions();
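Each returned partition can be checked with dnnl::graph::partition::is_supported() before compilation. As noted in the assumptions above, partitions that are not supported by the backend must be handled by the user, for example by executing those ops through another implementation. A minimal sketch:
for (const auto &partition : partitions) {
    if (!partition.is_supported()) {
        // Fall back to another implementation for the ops in this partition.
        continue;
    }
    // Compile and execute the supported partition as shown below.
}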
Compile and Execute Partition
In a real use case, users such as frameworks should provide device information at this stage. In this example, we simply use a self-defined device to simulate that behavior.
Create a dnnl::engine and set a user-defined dnnl::graph::allocator to this engine.
allocator alloc {};
dnnl::engine eng
= make_engine_with_allocator(dnnl::engine::kind::cpu, 0, alloc);
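The default-constructed allocator above relies on the library's built-in memory management. An allocator can also wrap user-provided host allocation and deallocation hooks; a minimal sketch, assuming the library always requests a non-zero, power-of-two alignment (the function names are illustrative, and the snippet requires <cstdlib>):
void *my_host_malloc(size_t size, size_t alignment) {
    // std::aligned_alloc (C++17) requires size to be a multiple of alignment.
    size_t rounded_size = (size + alignment - 1) / alignment * alignment;
    return std::aligned_alloc(alignment, rounded_size);
}

void my_host_free(void *ptr) {
    std::free(ptr);
}

// Pass the hooks to the allocator constructor instead of using the default.
allocator alloc_with_hooks {my_host_malloc, my_host_free};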
Create a dnnl::stream on the engine.
dnnl::stream strm {eng};
Compile the partition into a compiled partition, using the input and output logical tensors.
compiled_partition cp = partition.compile(inputs, outputs, eng);
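The inputs and outputs passed to compile() above are vectors of logical tensors describing the inputs and outputs of this partition. Before compilation, the input logical tensors must carry concrete shapes and a defined layout so the backend can infer the output shapes, and their ids must match the ones used during graph building. A hypothetical sketch for the first partition (the dims below are illustrative, not taken from the example):
// Illustrative shapes only; ids 0, 1, 3, and 5 match the graph-building step.
logical_tensor conv0_src_shaped {0, data_type::f32, dims {8, 3, 227, 227},
        layout_type::strided};
logical_tensor conv0_weight_shaped {1, data_type::f32, dims {96, 3, 11, 11},
        layout_type::strided};
logical_tensor conv0_bias_shaped {3, data_type::f32, dims {96},
        layout_type::strided};
logical_tensor relu0_dst_shaped {5, data_type::f32, dims {8, 96, 55, 55},
        layout_type::strided};
std::vector<logical_tensor> inputs {
        conv0_src_shaped, conv0_weight_shaped, conv0_bias_shaped};
std::vector<logical_tensor> outputs {relu0_dst_shaped};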
Execute the compiled partition on the specified stream.
cp.execute(strm, inputs_ts, outputs_ts);
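The inputs_ts and outputs_ts arguments are dnnl::graph::tensor objects that bind user buffers to the corresponding logical tensors. A minimal sketch for one output tensor, here assuming the relu1 output of the second partition (buffer management simplified):
// Query the output logical tensor from the compiled partition to get the
// memory size chosen by the backend, then bind a user buffer to it.
logical_tensor queried_dst = cp.query_logical_tensor(relu1_dst_desc.get_id());
std::vector<float> dst_buffer(queried_dst.get_mem_size() / sizeof(float));
tensor relu1_dst_ts {queried_dst, eng, dst_buffer.data()};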