Supported Fusion Patterns

Intel® oneAPI Deep Neural Network Developer Guide and Reference

ID 768875

Date 2/28/2024

Fusion Patterns

The following fusion patterns are subgraphs that the oneDNN Graph API recognizes as candidates for fusion. The patterns are described using oneDNN Graph operation (op) names with the following conventions.

NOTE:

oneDNN Graph performs limited input validation to minimize performance overhead. The application is responsible for sanitizing inputs passed to the library. Because large u8 or s8 inputs may lead to accumulator overflow, you can use floating-point patterns instead of quantized patterns for such inputs.

"+" describes a chain of two ops. The preceding op produces an output tensor, which is consumed by the following op as its first operand.

"[]" describes a component of the overall pattern description. For example, it could include a subgraph or all the op choices within the bracket.

"|" describes a choice among multiple operations. For example, A+[B|C] means the graph partition contains A followed by B or C.

"," describes a graph composed of multiple subgraphs. Each subgraph marks its output tensor explicitly, and that tensor is consumed by other subgraphs.

Superscript denotes the number of times a pattern component repeats; it is written here with "^". For example, A+[B|C]^3 means the graph partition contains A followed by three ops, each of which is either B or C. The superscript can also be a range of numbers, which allows a range of repetitions. If the range is between 0 and 1, the superscript "?" is used.

Subscript denotes the input and output tensors whose producer and consumer relation must be marked explicitly within one graph partition; it is written here with "_{}". For example, A_{>t1}+B+C_{<t1} refers to a pattern that starts with A followed by B and C, where C takes an implicit input tensor from B and an extra tensor t1 output from A. ">" marks an output tensor and "<" marks an input tensor. Input and output tensors between neighboring ops are not explicitly marked; for example, B consumes t1 implicitly in the example above.

Subscript "out" marks the output tensor of an op as an output of the graph partition. For example, in A_{>t1}+B_{>out1}+C_{<t1,>out2}, the outputs of B and C are marked as output tensors of the partition.

Subscript "in" marks the input tensor of an op as an input of the graph partition. For example, in A_{<in1}+B_{<in1}, A's input and B's second input are graph partition inputs, and they share the same input tensor in1. Most input tensors of a graph partition are not explicitly marked. For example, the input tensors of the first op are implicitly regarded as graph partition inputs. In addition, input tensors of other ops that are not produced by any preceding op are regarded as implicit graph partition inputs. In the example A_{>t1}+B+C_{<t1}, A's inputs are regarded as implicit graph partition inputs, and if B is a binary operation, its second input tensor is an implicit graph partition input.
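As a rough illustration of the chain, bracket, choice, and repetition conventions above, the following Python sketch enumerates the op chains that a pattern stands for. The helper names (single, choice, repeat, chain) are hypothetical and are not part of the oneDNN Graph API; the sketch ignores the "," convention and the tensor subscripts.

```python
from itertools import product

# Each helper returns a list of alternatives; an alternative is a list of op names.
# All names here are illustrative assumptions, not oneDNN identifiers.

def single(name):
    """A single op, for example A."""
    return [[name]]

def choice(*names):
    """[A|B]: exactly one of the listed ops."""
    return [[name] for name in names]

def repeat(alternatives, lo, hi):
    """Repetition count: the component occurs between lo and hi times."""
    chains = []
    for count in range(lo, hi + 1):
        for combo in product(alternatives, repeat=count):
            chains.append([name for part in combo for name in part])
    return chains

def chain(*components):
    """A + B + ...: one alternative from each component, in order."""
    return [[name for part in combo for name in part]
            for combo in product(*components)]

# A followed by three ops, each of which is either B or C.
three = chain(single("A"), repeat(choice("B", "C"), 3, 3))
print(len(three))

# A followed by an optional B (repetition range 0 to 1, the "?" case).
optional = chain(single("A"), repeat(single("B"), 0, 1))
print(optional)
```

Under these assumptions, the first pattern expands to every length-four chain that starts with A, and the optional case yields the chain with and without B.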

The following categories are used to describe fusion patterns.

Inference

Floating Point Patterns

Pattern	Description
Convolution + BiasAdd + BatchNormInference + [Unary | Binary]	This pattern is widely used in Convolutional Neural Networks, for example ResNet, ResNeXt, and SSD.
ConvTranspose + BiasAdd + [Unary | Binary]	This pattern is widely used in Generative Adversarial Networks.
Interpolate + [Unary | Binary]	This pattern is widely used for image processing.
MatMul + BiasAdd + [Unary | Binary]	This pattern is widely used in language models and recommendation models, for example BERT and DLRM.
Reduction + [Unary | Binary]	This pattern is widely used for data processing, for example loss reduction.
Unary + Binary	This pattern is widely used in Convolutional Neural Networks.
Binary + [Unary | Binary]	This pattern is widely used in Generative Adversarial Networks, for example ParallelWaveGAN.
[AvgPool | MaxPool] + Binary	This pattern is widely used in Convolutional Neural Networks.
BatchNormInference + ReLU	This pattern is widely used in Convolutional Neural Networks, for example DenseNet.
Reciprocal + Multiply	N/A
Reorder + Add	N/A

Quantized Patterns

Pattern	Description
Quantize + Dequantize, Dequantize, Dequantize + Convolution + BiasAdd + [Unary | Binary] + Quantize	N/A
Quantize + Dequantize, Dequantize, Dequantize + ConvTranspose + BiasAdd + [Unary | Binary] + Quantize	N/A
Quantize + Dequantize, Dequantize, Dequantize + MatMul + BiasAdd + [Unary | Binary] + Quantize	N/A
Dequantize + [AvgPool | MaxPool] + Quantize	N/A
Dequantize, Dequantize + [AvgPool | MaxPool] + Add + Quantize	N/A
Dequantize + Reorder + Quantize	N/A
Dequantize, Dequantize + Reorder + Add + Quantize	N/A

Training

Pattern	Description
ConvolutionBackwardWeights + BiasAddBackward	N/A
ReLUBackward + BatchNormTrainingBackward	N/A

All the above fusion patterns are supported by default.

Aggressive Fusion Patterns

Aggressive fusion patterns also follow the pattern description convention defined in the Fusion Patterns section.

NOTE:

Aggressive fusion patterns are only supported when Graph Compiler is enabled.

The following categories are also used to describe aggressive fusion patterns.

ReshapeTranspose = [StaticReshape + StaticTranspose]
Activation = [ReLU | Sigmoid | GELU]
ActivationBackward = [ReLUBackward | SigmoidBackward | GELUBackward]
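
One way to read these category names is as shorthand that is substituted into a pattern before matching. The Python sketch below expands a "+"-separated chain into the concrete op chains it covers, using the three definitions above exactly as printed. The `CATEGORIES` table and the `expand` function are hypothetical helpers for illustration only; they are not oneDNN APIs, and the sketch does not model repetition counts or tensor subscripts.

```python
from itertools import product

# Category definitions as printed above. ReshapeTranspose is a fixed sub-chain,
# while the other two are choices among single ops.
CATEGORIES = {
    "ReshapeTranspose": ["StaticReshape + StaticTranspose"],
    "Activation": ["ReLU", "Sigmoid", "GELU"],
    "ActivationBackward": ["ReLUBackward", "SigmoidBackward", "GELUBackward"],
}

def expand(pattern):
    """Replace each category name in a '+'-separated chain by every member."""
    slots = []
    for token in pattern.split("+"):
        name = token.strip()
        # Names that are not categories stand for themselves.
        slots.append(CATEGORIES.get(name, [name]))
    return [" + ".join(combo) for combo in product(*slots)]

mlp_layer = expand("MatMul + Activation")
for concrete in mlp_layer:
    print(concrete)

attention_input = expand("ReshapeTranspose + MatMul")
print(attention_input)
```

With these definitions, a single "MatMul + Activation" layer stands for three concrete chains, one per activation op.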

Inference

Floating Point Patterns

Pattern	Description
MatMul + [Multiply | Divide] + Add + Softmax + MatMul + StaticTranspose + Reorder	Multi-head Attention. This pattern is widely used in models containing encoder-decoder structures, for example BERT.
ReshapeTranspose, ReshapeTranspose, ReshapeTranspose + MatMul + [Multiply | Divide] + Add + Softmax + MatMul + StaticTranspose + StaticReshape	Multi-head Attention.
MatMul + Activation, [MatMul + Activation], MatMul + Activation	Multi-layer Perceptron. This pattern is widely used in recommendation models, for example DLRM.
[Convolution + BiasAdd + ReLU] + Convolution + BiasAdd + Add + ReLU	Identical Bottleneck. Enabled only in the single-thread runtime scenario. This pattern is widely used in Convolutional Neural Networks, for example ResNet.
Convolution + BiasAdd, [Convolution + BiasAdd + ReLU] + Convolution + BiasAdd + Add + ReLU	Convolutional Bottleneck. Enabled only in the single-thread runtime scenario. This pattern is widely used in Convolutional Neural Networks, for example ResNet.

Quantized Patterns

Pattern	Description
Dequantize, Dequantize, Dequantize + MatMul + [Multiply | Divide] + Add + Softmax + Quantize + Dequantize + MatMul + StaticTranspose + Reorder + Quantize	Quantized Multi-head Attention.
Dequantize + ReshapeTranspose, Dequantize + ReshapeTranspose, Dequantize + MatMul + [Multiply | Divide] + Add + Softmax + Quantize + Dequantize + MatMul + StaticTranspose + StaticReshape + Quantize	Quantized Multi-head Attention.
Dequantize, Dequantize + MatMul + Activation + Quantize, [Dequantize, Dequantize + MatMul + Activation + Quantize], Dequantize, Dequantize + MatMul + Activation + Quantize	Quantized Multi-layer Perceptron.
Dequantize, Dequantize, [Dequantize, Dequantize + Convolution + BiasAdd + ReLU + Quantize] + Dequantize + Convolution + BiasAdd + Add + ReLU + Quantize	Quantized Identical Bottleneck. Enabled only in the single-thread runtime scenario.
[Dequantize, Dequantize + Convolution + BiasAdd + Quantize + Dequantize], Dequantize, [Dequantize, Dequantize + Convolution + BiasAdd + ReLU + Quantize] + Dequantize + Convolution + BiasAdd + Add + ReLU + Quantize	Quantized Convolutional Bottleneck. Enabled only in the single-thread runtime scenario.

Training

Pattern	Description
Dequantize, Dequantize, Dequantize + MatMul + [Multiply | Divide] + Add + Softmax + Quantize + Dequantize + MatMul + StaticTranspose + Reorder + Quantize	Multi-head Attention Training Forward Pattern.
StaticReshape + StaticTranspose + MatMul + Multiply + Subtract + Multiply + [Multiply | Divide] + MatMul, Multiply + ReduceSum, MatMul, MatMul	Multi-head Attention Training Backward Pattern.
MatMul + Activation, [MatMul + Activation], MatMul + Activation	Multi-layer Perceptron Training Forward Pattern.
StaticTranspose, ActivationBackward + MatMul, ReduceSum, StaticTranspose + MatMul, [StaticTranspose, ActivationBackward + MatMul, ReduceSum, StaticTranspose + MatMul], StaticTranspose, ActivationBackward + MatMul, ReduceSum, StaticTranspose + MatMul	Multi-layer Perceptron Training Backward Pattern.
Convolution + BatchNormForwardTraining + ReLU + Convolution + BatchNormForwardTraining + ReLU + Convolution + BatchNormForwardTraining + Add + ReLU	Identical Bottleneck Training Forward Pattern.
Convolution + BatchNormForwardTraining, Convolution + BatchNormForwardTraining + ReLU + Convolution + BatchNormForwardTraining + ReLU + Convolution + BatchNormForwardTraining + Add + ReLU	Convolutional Bottleneck Training Forward Pattern.
ReLUBackward + BatchNormTrainingBackward + ConvolutionBackwardData + ReLUBackward + BatchNormTrainingBackward + ConvolutionBackwardData + ReLUBackward + BatchNormTrainingBackward + ConvolutionBackwardData + Add, ConvolutionBackwardWeights, ConvolutionBackwardWeights, ConvolutionBackwardWeights	Identical Bottleneck Training Backward Pattern.
ReLUBackward + BatchNormTrainingBackward + ConvolutionBackwardData + ReLUBackward + BatchNormTrainingBackward + ConvolutionBackwardData + ReLUBackward + BatchNormTrainingBackward + ConvolutionBackwardData + Add, BatchNormTrainingBackward + ConvolutionBackwardData, ConvolutionBackwardWeights, ConvolutionBackwardWeights, ConvolutionBackwardWeights, ConvolutionBackwardWeights	Convolutional Bottleneck Training Backward Pattern.
