Getting started on CPU with Graph API
This example demonstrates how to build a simple graph and run it on CPU.
Example code: cpu_getting_started.cpp
Some key take-aways from this example:
how to build a graph and get partitions from it
how to create an engine, allocator and stream
how to compile a partition
how to execute a compiled partition
Some assumptions in this example:
Only the workflow is demonstrated, without checking correctness
Unsupported partitions must be handled by users themselves
Public headers
To start using oneDNN Graph, we must include the dnnl_graph.hpp header file in the application. All the C++ APIs reside in namespace dnnl::graph.
#include <iostream>
#include <memory>
#include <vector>
#include <unordered_map>
#include <unordered_set>
#include <assert.h>
#include "oneapi/dnnl/dnnl_graph.hpp"
#include "example_utils.hpp"
#include "graph_example_utils.hpp"
using namespace dnnl::graph;
using data_type = logical_tensor::data_type;
using layout_type = logical_tensor::layout_type;
using dim = logical_tensor::dim;
using dims = logical_tensor::dims;
cpu_getting_started_tutorial() function
Build Graph and Get Partitions
In this section, we build a graph containing the pattern conv0->relu0->conv1->relu1 and then get all of the partitions, which are determined by the backend.
To build a graph, the connection relationships between different ops must be known. In oneDNN Graph, dnnl::graph::logical_tensor is used to express such relationships. So, the next step is to create logical tensors for these ops, including their inputs and outputs.
Create input/output dnnl::graph::logical_tensor objects for the first Convolution op.
logical_tensor conv0_src_desc {0, data_type::f32};
logical_tensor conv0_weight_desc {1, data_type::f32};
logical_tensor conv0_dst_desc {2, data_type::f32};
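These logical tensors carry only an id and a data type; shape and layout may stay undefined while building the graph. A logical tensor can also be created with concrete dimensions and a layout, for example (the id and dims below are illustrative, not part of this example):
// Illustrative only: a logical tensor with a concrete shape and strided layout.
logical_tensor example_desc {100, data_type::f32, dims {8, 3, 227, 227},
        layout_type::strided};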
Create the first Convolution op (dnnl::graph::op) and attach attributes to it, such as strides, pads_begin, pads_end, data_format, etc.
op conv0(0, op::kind::Convolution, {conv0_src_desc, conv0_weight_desc},
{conv0_dst_desc}, "conv0");
conv0.set_attr<dims>(op::attr::strides, {4, 4});
conv0.set_attr<dims>(op::attr::pads_begin, {0, 0});
conv0.set_attr<dims>(op::attr::pads_end, {0, 0});
conv0.set_attr<dims>(op::attr::dilations, {1, 1});
conv0.set_attr<int64_t>(op::attr::groups, 1);
conv0.set_attr<std::string>(op::attr::data_format, "NCX");
conv0.set_attr<std::string>(op::attr::weights_format, "OIX");
Create input/output logical tensors for the first BiasAdd op and create the op.
logical_tensor conv0_bias_desc {3, data_type::f32};
logical_tensor conv0_bias_add_dst_desc {
4, data_type::f32, layout_type::undef};
op conv0_bias_add(1, op::kind::BiasAdd, {conv0_dst_desc, conv0_bias_desc},
{conv0_bias_add_dst_desc}, "conv0_bias_add");
conv0_bias_add.set_attr<std::string>(op::attr::data_format, "NCX");
Create the output logical tensor for the first ReLU op and create the op.
logical_tensor relu0_dst_desc {5, data_type::f32};
op relu0(2, op::kind::ReLU, {conv0_bias_add_dst_desc}, {relu0_dst_desc},
"relu0");
Create input/output logical tensors for the second Convolution op and create the op.
logical_tensor conv1_weight_desc {6, data_type::f32};
logical_tensor conv1_dst_desc {7, data_type::f32};
op conv1(3, op::kind::Convolution, {relu0_dst_desc, conv1_weight_desc},
{conv1_dst_desc}, "conv1");
conv1.set_attr<dims>(op::attr::strides, {1, 1});
conv1.set_attr<dims>(op::attr::pads_begin, {0, 0});
conv1.set_attr<dims>(op::attr::pads_end, {0, 0});
conv1.set_attr<dims>(op::attr::dilations, {1, 1});
conv1.set_attr<int64_t>(op::attr::groups, 1);
conv1.set_attr<std::string>(op::attr::data_format, "NCX");
conv1.set_attr<std::string>(op::attr::weights_format, "OIX");
Create input/output logical tensors for the second BiasAdd op and create the op.
logical_tensor conv1_bias_desc {8, data_type::f32};
logical_tensor conv1_bias_add_dst_desc {9, data_type::f32};
op conv1_bias_add(4, op::kind::BiasAdd, {conv1_dst_desc, conv1_bias_desc},
{conv1_bias_add_dst_desc}, "conv1_bias_add");
conv1_bias_add.set_attr<std::string>(op::attr::data_format, "NCX");
Create the output logical tensor for the second ReLU op and create the op.
logical_tensor relu1_dst_desc {10, data_type::f32};
op relu1(5, op::kind::ReLU, {conv1_bias_add_dst_desc}, {relu1_dst_desc},
"relu1");
Finally, the created ops are added into the graph. The graph internally maintains a list of all added ops. To create a graph, a dnnl::engine::kind is needed because the returned partitions may vary across devices. For this example, we use the CPU engine kind.
Create graph and add ops to the graph
graph g(dnnl::engine::kind::cpu);
g.add_op(conv0);
g.add_op(conv0_bias_add);
g.add_op(relu0);
g.add_op(conv1);
g.add_op(conv1_bias_add);
g.add_op(relu1);
After adding all ops into the graph, call dnnl::graph::graph::finalize() to indicate that graph building is over and the graph is ready for partitioning. Adding new ops into a finalized graph or partitioning an unfinalized graph will both lead to a failure.
g.finalize();
After finishing the above steps, we can get partitions by calling dnnl::graph::graph::get_partitions().
In this example, the graph will be partitioned into two partitions:
conv0 + conv0_bias_add + relu0
conv1 + conv1_bias_add + relu1
auto partitions = g.get_partitions();
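Each returned partition can be checked with dnnl::graph::partition::is_supported() before compilation. As noted in the assumptions above, partitions that are not supported by the backend must be handled by the user, for example by executing those ops through another implementation. A minimal sketch:
for (const auto &partition : partitions) {
    if (!partition.is_supported()) {
        // Fall back to another implementation for the ops in this partition.
        continue;
    }
    // Compile and execute the supported partition as shown below.
}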
Compile and Execute Partition
In a real use case, users such as frameworks should provide device information at this stage. In this example, we simply use a self-defined device to simulate that behavior.
Create a dnnl::engine and set a user-defined dnnl::graph::allocator to this engine.
allocator alloc {};
dnnl::engine eng
= make_engine_with_allocator(dnnl::engine::kind::cpu, 0, alloc);
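The default-constructed allocator above relies on the library's built-in memory management. An allocator can also wrap user-provided host allocation and deallocation hooks; a minimal sketch, assuming the library always requests a non-zero, power-of-two alignment (the function names are illustrative, and the snippet requires <cstdlib>):
void *my_host_malloc(size_t size, size_t alignment) {
    // std::aligned_alloc (C++17) requires size to be a multiple of alignment.
    size_t rounded_size = (size + alignment - 1) / alignment * alignment;
    return std::aligned_alloc(alignment, rounded_size);
}

void my_host_free(void *ptr) {
    std::free(ptr);
}

// Pass the hooks to the allocator constructor instead of using the default.
allocator alloc_with_hooks {my_host_malloc, my_host_free};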
Create a dnnl::stream on the engine.
dnnl::stream strm {eng};
Compile the partition into a compiled partition, using the input and output logical tensors.
compiled_partition cp = partition.compile(inputs, outputs, eng);
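The inputs and outputs passed to compile() above are vectors of logical tensors describing the inputs and outputs of this partition. Before compilation, the input logical tensors must carry concrete shapes and a defined layout so the backend can infer the output shapes, and their ids must match the ones used during graph building. A hypothetical sketch for the first partition (the dims below are illustrative, not taken from the example):
// Illustrative shapes only; ids 0, 1, 3, and 5 match the graph-building step.
logical_tensor conv0_src_shaped {0, data_type::f32, dims {8, 3, 227, 227},
        layout_type::strided};
logical_tensor conv0_weight_shaped {1, data_type::f32, dims {96, 3, 11, 11},
        layout_type::strided};
logical_tensor conv0_bias_shaped {3, data_type::f32, dims {96},
        layout_type::strided};
logical_tensor relu0_dst_shaped {5, data_type::f32, dims {8, 96, 55, 55},
        layout_type::strided};
std::vector<logical_tensor> inputs {
        conv0_src_shaped, conv0_weight_shaped, conv0_bias_shaped};
std::vector<logical_tensor> outputs {relu0_dst_shaped};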
Execute the compiled partition on the specified stream.
cp.execute(strm, inputs_ts, outputs_ts);
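The inputs_ts and outputs_ts arguments are dnnl::graph::tensor objects that bind user buffers to the corresponding logical tensors. A minimal sketch for one output tensor, here assuming the relu1 output of the second partition (buffer management simplified):
// Query the output logical tensor from the compiled partition to get the
// memory size chosen by the backend, then bind a user buffer to it.
logical_tensor queried_dst = cp.query_logical_tensor(relu1_dst_desc.get_id());
std::vector<float> dst_buffer(queried_dst.get_mem_size() / sizeof(float));
tensor relu1_dst_ts {queried_dst, eng, dst_buffer.data()};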