Verbose Mode

Intel® oneAPI Deep Neural Network Developer Guide and Reference

Download PDF

ID 768875

Date 3/31/2023

Version

Public

A newer version of this document is available. Customers should click here to go to the newest version.

Visible to Intel only — GUID: GUID-368FCFFD-45B5-4BDB-BC3E-C76CAE4CB431

View Details

Verbose Mode

It is often useful to collect information about how much of an application runtime is spent executing oneDNN primitives and which of those take the most time. oneDNN verbose mode enables tracing execution of oneDNN primitives and collection of basic statistics like execution time and primitive parameters. When verbose mode is enabled oneDNN will print out information to stdout.

Build-Time Controls

At build-time, support for this feature is controlled via cmake option ONEDNN_VERBOSE.

CMake Option	Supported values (defaults in bold)	Description
ONEDNN_VERBOSE	ON , OFF	Enables verbose mode

Run-Time Controls

When the feature is enabled at build-time, the ONEDNN_VERBOSE environment variable can be used to turn verbose mode on and control the level of verbosity.

Environment variable	Value	Description
ONEDNN_VERBOSE	0 1 2	no verbose output (default) primitive information at execution primitive information at creation and execution
ONEDNN_VERBOSE_TIMESTAMP	0 1	display timestamps disabled (default) display timestamps enabled

This feature can also be managed at run-time with the following functions:

dnnl_set_verbose

The function setting takes precedence over the environment variable.

Example

Enable ONEDNN_VERBOSE

ONEDNN_VERBOSE=1 ./benchdnn --conv ic16ih7oc16oh7kh5ph2n"wip"

This produces the following output (the line breaks were added to fit the page width):

dnnl_verbose,info,DNNL v1.3.0 (commit d0fc158e98590dfad0165e568ca466876a794597)
dnnl_verbose,info,cpu,runtime:OpenMP
dnnl_verbose,info,cpu,isa:Intel AVX2
dnnl_verbose,info,gpu,runtime:none
dnnl_verbose,info,prim_template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:aBcd8b:f0,,,2x16x7x7,0.0200195
dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:ABcd8b8a:f0,,,16x16x5x5,0.0251465
dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:aBcd8b:f0,,,2x16x7x7,0.0180664
dnnl_verbose,exec,cpu,reorder,simple:any,undef,src_f32::blocked:a:f0 dst_f32::blocked:a:f0,,,16,0.0229492
dnnl_verbose,exec,cpu,convolution,jit:avx2,forward_training,src_f32::blocked:aBcd8b:f0 wei_f32::blocked:ABcd8b8a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:aBcd8b:f0,,alg:convolution_direct,mb2_ic16oc16_ih7oh7kh5sh1dh0ph2_iw7ow7kw5sw1dw0pw2,0.0390625
dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:aBcd8b:f0 dst_f32::blocked:abcd:f0,,,2x16x7x7,0.173096

Enable ONEDNN_VERBOSE with timestamps

ONEDNN_VERBOSE=1 ONEDNN_VERBOSE_TIMESTAMP=1 ./benchdnn --conv ic16ih7oc16oh7kh5ph2n"wip"

This produces the following output (the line breaks were added to fit the page width):

dnnl_verbose,info,oneDNN v2.0.0 (commit N/A)
dnnl_verbose,info,cpu,runtime:OpenMP
dnnl_verbose,info,cpu,isa:Intel AVX2
dnnl_verbose,info,gpu,runtime:none
dnnl_verbose,info,prim_template:timestamp,operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
dnnl_verbose,1607393146348.667969,exec,cpu,reorder,jit:blk,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:aBcd8b:f0,,,2x16x7x7,3.58594
dnnl_verbose,1607393146356.743896,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:ABcd8b8a:f0,,,16x16x5x5,3.63916
dnnl_verbose,1607393146364.541992,exec,cpu,reorder,jit:blk,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:aBcd8b:f0,,,2x16x7x7,2.35693
dnnl_verbose,1607393146367.198975,exec,cpu,reorder,simple:any,undef,src_f32::blocked:a:f0 dst_f32::blocked:a:f0,,,16,3.71191
dnnl_verbose,1607393146371.002930,exec,cpu,convolution,jit:avx2,forward_training,src_f32::blocked:aBcd8b:f0 wei_f32::blocked:ABcd8b8a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:aBcd8b:f0,,alg:convolution_direct,mb2_ic16oc16_ih7oh7kh5sh1dh0ph2_iw7ow7kw5sw1dw0pw2,3.93018
dnnl_verbose,1607393146380.145020,exec,cpu,reorder,jit:blk,undef,src_f32::blocked:aBcd8b:f0 dst_f32::blocked:abcd:f0,,,2x16x7x7,1.75708

Decrypting the Output

The first lines of verbose information, which are denoted with info, contain the build version and git hash, if available, as well as CPU and GPU runtimes, the supported instruction set architecture and the verbose output format template since amount of fields may vary depeding on set of enviroment variables enabled.

Each subsequent line of verbose information is formatted as a comma-separated list contains, in order of appearance in the line from left to right:

dnnl_verbose marker string
if ONEDNN_VERBOSE_TIMESTAMP=1 is specified, start time of the call. On Linux this number represents amount of milliseconds since Unix epoch. On Windows this number represents amount of milliseconds since the last system start.
operation: create:<cache_hit|cache_miss|from_cache_blob> or exec
engine kind: cpu or gpu (cpu2gpu or gpu2cpu for cross-engine reorder)
primitive name: convolution, reorder, sum, etc
primitive implementation
propagation kind: forward_training, forward_inference, backward, etc
information about all operation tensors (separated by space)
primitive attributes
auxiliary information like algorithm name or number of inputs
a problem description in benchdnn format
execution time in milliseconds

The information about a particular operation tensors has the following format: tensor_name _ data_type :: format_kind : format_tag : extra_flags, where:

tensor_name is one of the tensors names listed in the Naming Conventions, and denotes a tensor supported by the corresponding primitive
data_type, format_kind and format_tag denote values from dnnl::memory::data_type, dnnl::memory::format_kind and dnnl::memory::format_tag respectively. Note, that certain markers may be missing in some cases, such as format_tag for the tensor for the Winograd convolution.
extra_flags is unspecified information that is intended for development purposes

Please see the profiling example here, as it uses ONEDNN_VERBOSE output to tune oneDNN code to align with best practices.

NOTE:

When oneDNN verbose mode is enabled with GPU engines, oneDNN adds extra stream synchronization on entry and on exit in the dnnl::primitive::execute() call. The execution time is calculated based on wall time measured before and after primitive execution.

NOTE:

When oneDNN verbose mode is enabled for builds with Compute Library for the Arm architecture, any failures in the validation of Compute Library primitives will be detailed in the verbose output.

WARNING:

Verbose mode has non-negligible performance impact especially on GPU or if the output rate is high.

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Intel® oneAPI Deep Neural Network Developer Guide and Reference

Verbose Mode

Build-Time Controls

Run-Time Controls

Example

Decrypting the Output