Detect Dynamic Shapes

Dynamic Shapes and How to Detect Them

Dynamic shapes describe model behaviors that are based on how dynamic input data or operators affect the generation of variable output tensor shapes.

Usually dynamicity introduces recompilations, which slow down running the model. To optimize a model's speed:

Identify whether the model has dynamic inputs or ops.
If possible, mitigate the issues by following the steps in Handling Dynamic Shapes.

This article discusses some utilities to detect dynamic inputs and ops.

Types of Dynamicity

Two main areas generate dynamic shapes:

Inputs: The result of varying input shapes during training, such as varying sentence lengths in language models or differing image resolutions in an image model.
Ops: Occur for ops whose output shape depends on the actual input data, rather than only the input shapes. In other words, ops that have noninferable output shapes for given input shapes.

Look for Dynamicity

To look for dynamicity associated with a model, users can check for general recompilations by enabling the Dynamic Shape automated support feature in Intel® Gaudi® products. Use this feature by:

Step 1) Enable by setting the appropriate environment variables before executions:

export PT_HPU_METRICS_FILE=/root/metricslog.json
export PT_HPU_METRICS_DUMP_TRIGGERS=process_exit,metric_change

Enabling this feature gives a broad sense of the recompilations in the model. Running models with this feature enabled creates a metricslog.json file that shows how often the graph_compilation method is called. After a few steps, a reduction in recompilations is expected for static graphs.

Step 2) If recompilations continue to exist, to enable automated Dynamic Shape control in Intel® Gaudi® products, set the following:


export PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES=1

Setting this environment variable enables the PyTorch* bridge in the Intel® Gaudi® products and Graph Compiler to automatically manage dynamic shapes in model scripts. The graphs are automatically bucketed and padded into ranges to achieve a common size, reducing recompilations and improving performance when working with dynamic workloads.

If recompilations continue to exist or instability is still causing unacceptable performance, go to the next section.

Deeper Analysis of Model Data and Ops

The rest of this tutorial covers how to analyze a model with utilities that allow users to pinpoint areas of dynamicity and make targeted improvements.

Detect Dynamic Inputs

The data_dynamicity utility accepts a torch data loader and produces a report on the number of distinct input shapes. This allows users to evaluate if a dataset introduces low or high dynamicity because of the inputs. For more strategies on mitigating high-input dynamicity by padding, see the Text Datasets section below.

Note All examples in this article must be executed in a supported Pytorch* Libary Docker* Image for Intel® Gaudi® Accelerators.

Image Datasets - Low-Input Dynamicity

Consider the following example using a MNIST dataset with 60,000 training images and a specified batch size of 7:

# Enable PT_HPU_LAZY_MODE=1 for this example
import os
os.environ['PT_HPU_LAZY_MODE'] = '1'

from habana_frameworks.torch.utils.experimental import data_dynamicity
import torchvision
from torch.utils.data import DataLoader

# Creating a sample MNIST dataloader
mnist_ds = torchvision.datasets.MNIST('mnist', download=True, transform=torchvision.transforms.ToTensor())
mnist_dl = DataLoader(mnist_ds, batch_size=7, num_workers=2)

# Call the dataloader dynamicity utility on the dataloader
res = data_dynamicity(mnist_dl)

During training, two batch sizes will be encounterd because 60,000%7 = 3, meaning that the last batch will only contain 3 images. The dynamicity of the input is low in this case, and has a small affect on performance. Running the code produces the following report on dynamicity:

==============================================================================
|Shape                                               |Count                  |
==============================================================================
|((7, 1, 28, 28), (7,))                              |8571                   |
------------------------------------------------------------------------------
|((3, 1, 28, 28), (3,))                              |1                      |
------------------------------------------------------------------------------

Number of unique shapes:  2

The dynamicity of the input is low in this case, and has a small affect on performance.

Image Datasets - High-Input Dynamicity

Now consider the Flowers 102 dataset, which has images of 29 different input shapes. This example requires the scipy python package.


pip install scipy

Use the following script to analyze the Flowers 102 dataset:

# Enable PT_HPU_LAZY_MODE=1 for this example
import os
os.environ['PT_HPU_LAZY_MODE'] = '1'

from habana_frameworks.torch.utils.experimental import data_dynamicity
import torchvision
from torch.utils.data import DataLoader
import torch

# Join a list of images/labels into a single batched tensor
# In this case we find the image with the largets dimensions in the batch,
# and then pad everything else to that size
def collate(batch):
   dim1 = min([k[0].shape[1] for k in batch])
   dim2 = min([k[0].shape[2] for k in batch])
   images = torch.stack([k[0][:,:dim1,:dim2] for k in batch])
   labels = torch.tensor([k[1] for k in batch])
   return (images,labels)

flowers_ds = torchvision.datasets.Flowers102('flowers', download=True, transform=torchvision.transforms.ToTensor())
flowers_dl = DataLoader(flowers_ds, batch_size=7, num_workers=2, collate_fn=collate)
res = data_dynamicity(flowers_dl)

Running the example produces the following output:

===================================================================================
|Shape                                                      |Count                |
===================================================================================
|((7, 3, 500, 500), (7,))                                   |111                  |
-----------------------------------------------------------------------------------
|((7, 3, 500, 667), (7,))                                   |5                    |
----------------------------------------------------------------------------------
|((7, 3, 500, 528), (7,))                                   |2                    |
-----------------------------------------------------------------------------------
|((7, 3, 500, 501), (7,))                                   |2                    |
----------------------------------------------------------------------------------
|((7, 3, 500, 542), (7,))                                   |2                    |
--------------------------------------------------------------------------------
|((7, 3, 500, 592), (7,))                                   |1                    |
----------------------------------------------------------------------------------
***
***
|((7, 3, 500, 549), (7,))                                   |1                    |
-----------------------------------------------------------------------------------
|((5, 3, 500, 500), (5,))                                   |1                    |
-----------------------------------------------------------------------------------
Number of unique shapes:  29

This output shows there is a lot of dynamicity in input data shapes.

Depending on the use case, user can bucket images to certain fixed sizes or resize/crop them to a single shape to reduce dynamicity. A center-crop solution is shown in the following example, which makes the Flowers 102 dataset more static:

# Enable PT_HPU_LAZY_MODE=1 for this example
import os
os.environ['PT_HPU_LAZY_MODE'] = '1'

from habana_frameworks.torch.utils.experimental import data_dynamicity
import torchvision
from torch.utils.data import DataLoader
import torch

def collate(batch):
   images = torch.stack([k[0] for k in batch])
   labels = torch.tensor([k[1] for k in batch])
   return (images,labels)

# Center crop to a fixed size, applied as a transform
transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor(), torchvision.transforms.CenterCrop((300,300))])
flowers_ds = torchvision.datasets.Flowers102('flowers', download=True, transform=transform)
flowers_dl = DataLoader(flowers_ds, batch_size=7, num_workers=2, collate_fn=collate)
res = data_dynamicity(flowers_dl)

This method produces much more manageble input dynamicity:

==============================================================================
|Shape                                                 |Count                |
==============================================================================
|((7, 3, 300, 300), (7,))                              |145                  |
------------------------------------------------------------------------------
|((5, 3, 300, 300), (5,))                              |1                    |
------------------------------------------------------------------------------
Number of unique shapes:  2

Text Datasets

Since sentence sizes vary greatly, there is often high input dynamicity for text datasets. The following example has 443 different shapes for a SQUAD dataset when batching with batch size = 7.

First, install the datasets and transformers python packages:

pip install datasets
pip install transformers

In this script, each batch is padded to the largest sentence size:

# Enable PT_HPU_LAZY_MODE=1 for this example
import os
os.environ['PT_HPU_LAZY_MODE'] = '1'

from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoTokenizer
import torch
from habana_frameworks.torch.utils.experimental import data_dynamicity

# Pad to max length sentence in each batch
def collate(batch):
    def pad(item, val, maxlen):
        return torch.tensor([i + [val]*(maxlen-len(i)) for i in item])
    token = [k['token_type_ids'] for k in batch]
    attention = [k['attention_mask'] for k in batch]
    inp = [k['input_ids'] for k in batch]
    token_lens = [len(i) for i in token]
    # Find the max length sentence in this batch
    max_len = max(token_lens)
    assert token_lens == [len(i) for i in attention] == [len(i) for i in inp]
    return {'token_type_ids': pad(token, 0, max_len), 'attention_mask': pad(attention, 0, max_len), 'input_ids': pad(inp, 0, max_len)}

tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')

squad_dataset = load_dataset('squad')
tokenized_dataset = squad_dataset.map(lambda x: tokenizer(x['context']), batched=True)

dt = DataLoader(tokenized_dataset['train'], batch_size=7, num_workers=2, collate_fn=collate)
res = data_dynamicity(dt)

This example generates the following output, demonstrating that text datasets have a lot of dynamicity in input data shapes:

=================================================================================================================
|Shape                                                                                                   |Count |
=================================================================================================================
|((-1023680607561683160, (7, 160)), (-4748259973688274144, (7, 160)), (-5213422677791015773, (7, 160)))  |114   |
-----------------------------------------------------------------------------------------------------------------
|((-1023680607561683160, (7, 145)), (-4748259973688274144, (7, 145)), (-5213422677791015773, (7, 145)))  |109   |
-----------------------------------------------------------------------------------------------------------------
|((-1023680607561683160, (7, 143)), (-4748259973688274144, (7, 143)), (-5213422677791015773, (7, 143)))  |108   |
------------------------------------------------------------------------------------------------------------
|((-1023680607561683160, (7, 180)), (-4748259973688274144, (7, 180)), (-5213422677791015773, (7, 180)))  |107   |
----------------------------------------------------------------------------------------------------------------
***
***
|((-1023680607561683160, (7, 149)), (-4748259973688274144, (7, 149)), (-5213422677791015773, (7, 149)))  |99    |
----------------------------------------------------------------------------------------------------------------
|((-1023680607561683160, (7, 513)), (-4748259973688274144, (7, 513)), (-5213422677791015773, (7, 513)))  |1     |
----------------------------------------------------------------------------------------------------------------
|((-1023680607561683160, (7, 431)), (-4748259973688274144, (7, 431)), (-5213422677791015773, (7, 431)))  |1     |
----------------------------------------------------------------------------------------------------------------
|((-1023680607561683160, (1, 159)), (-4748259973688274144, (1, 159)), (-5213422677791015773, (1, 159)))  |1     |
----------------------------------------------------------------------------------------------------------------
Number of unique shapes:  443

A simple way to massage the text input into static shapes is to pad the data to the longest sentence length. This can be inefficient computationally, however, because the compute effort on the padded sections (which are thrown away later) are wasted. But it will greatly reduce dynamicity in input data shapes. The next example shows results for the same SQUAD dataset padded to maximum sentence length:

# Enable PT_HPU_LAZY_MODE=1 for this example
import os
os.environ['PT_HPU_LAZY_MODE'] = '1'

from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoTokenizer
import torch
from habana_frameworks.torch.utils.experimental import data_dynamicity

# Pad to max sentence length in the whole dataset
def get_collate(max_sentence):
    def collate(batch):
        def pad(item, val):
            return torch.tensor([i + [val]*(max_sentence-len(i)) for i in item])
        token = [k['token_type_ids'] for k in batch]
        attention = [k['attention_mask'] for k in batch]
        inp = [k['input_ids'] for k in batch]
        return {'token_type_ids': pad(token, 0), 'attention_mask': pad(attention, 0), 'input_ids': pad(inp, 0)}
    return collate

tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')

squad_dataset = load_dataset('squad')
tokenized_dataset = squad_dataset.map(lambda x: tokenizer(x['context']), batched=True)
# Find max sentence length in the whole dataset
max_sentence = max([len(dt['input_ids']) for dt in tokenized_dataset['train']])
dt = DataLoader(tokenized_dataset['train'], batch_size=7, num_workers=2, collate_fn=get_collate(max_sentence))
res = data_dynamicity(dt)

The resulting output shows a marked decrease in dynamicity, as expected:

==================================================================================================================
|Shape                                                                                                    |Count |
==================================================================================================================
|((-1023680607561683160, (7, 867)), (-4748259973688274144, (7, 867)), (-5213422677791015773, (7, 867)))   |12514 |
------------------------------------------------------------------------------------------------------------------
|((-1023680607561683160, (1, 867)), (-4748259973688274144, (1, 867)), (-5213422677791015773, (1, 867)))   |1     |
------------------------------------------------------------------------------------------------------------------
Number of unique shapes:  2

Another useful method to reduce dynamicity is bucketing. Bucketing can reduce compilation overhead but avoids wasting computation introduced by padding to the longest sentence. Select a hyperparameter (the number of buckets) and, in the dataset, use an algorithm to divide into buckets the range between the lengths of the shortest and the longest sentences. Then, for each batch, find the longest sentence in the batch and pad each input to a bucket size that is slightly larger than the size of the longest sentence.

# Enable PT_HPU_LAZY_MODE=1 for this example
import os
os.environ['PT_HPU_LAZY_MODE'] = '1'

from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoTokenizer
import torch
from habana_frameworks.torch.utils.experimental import data_dynamicity
import numpy as np

def get_buckets(sizes, num_buckets):
   buckets = np.unique(
      np.percentile(
            sizes,
            np.linspace(0, 100, num_buckets + 1),
            interpolation="lower",
      )[1:]
   )
   return buckets

# Find the largest sentence in the batch
# Then find the bucket just larger than it, and pad everything to that
def get_collate(buckets):
    def collate(batch):
        def pad(item, val):
            max_in_batch = max([len(i) for i in item])
            nearest_bucket = np.where(buckets>=max_in_batch)[0][0]
            return torch.tensor([i + [val]*(buckets[nearest_bucket]-len(i)) for i in item])
        token = [k['token_type_ids'] for k in batch]
        attention = [k['attention_mask'] for k in batch]
        inp = [k['input_ids'] for k in batch]
        return {'token_type_ids': pad(token, 0), 'attention_mask': pad(attention, 0), 'input_ids': pad(inp, 0)}
    return collate

tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')

squad_dataset = load_dataset('squad')
tokenized_dataset = squad_dataset.map(lambda x: tokenizer(x['context']), batched=True)
buckets = get_buckets([len(dt['input_ids']) for dt in tokenized_dataset['train']], 5)
dt = DataLoader(tokenized_dataset['train'], batch_size=7, num_workers=2, collate_fn=get_collate(buckets))
res = data_dynamicity(dt)

This is the resulting output:

===============================================================================================================
|Shape                                                                                                |Count |
================================================================================================================
|((-1023680607561683160, (7, 867)), (-4748259973688274144, (7, 867)), (-5213422677791015773, (7, 867))) |4543  |
----------------------------------------------------------------------------------------------------------------
|((-1023680607561683160, (7, 207)), (-4748259973688274144, (7, 207)), (-5213422677791015773, (7, 207))) |3350  |
----------------------------------------------------------------------------------------------------------------
|((-1023680607561683160, (7, 164)), (-4748259973688274144, (7, 164)), (-5213422677791015773, (7, 164))) |2277  |
----------------------------------------------------------------------------------------------------------------
|((-1023680607561683160, (7, 138)), (-4748259973688274144, (7, 138)), (-5213422677791015773, (7, 138))) |1388  |
----------------------------------------------------------------------------------------------------------------
|((-1023680607561683160, (7, 114)), (-4748259973688274144, (7, 114)), (-5213422677791015773, (7, 114))) |956   |
----------------------------------------------------------------------------------------------------------------
|((-1023680607561683160, (1, 164)), (-4748259973688274144, (1, 164)), (-5213422677791015773, (1, 164))) |1     |
----------------------------------------------------------------------------------------------------------------
Number of unique shapes:  6

The model now has only six input shapes, where the sentence lengths have been separated into buckets and then used the smallest amount of padding possible to fill each bucket.

Note For a more advanced example of bucketing, see this case study using Wav2vec.

Detect Dynamic Ops

With the ability to detect dynamic inputs, it is now possible to detect dynamic ops in models. Dynamic ops are operations whose output shapes cannot be predicted just from knowing the input shapes.

Consider the following example: A simple toy model is run for five steps. The input shape changes at the fourth step which usually results in a recompilation. However, the model itself has dynamic ops, so the utility identifies the module, which might be dynamic.

The following code example can be copied into a Python* file dyn_ops.py, and then run in the terminal window.

# Enable PT_HPU_LAZY_MODE=1 for this example
import os
os.environ['PT_HPU_LAZY_MODE'] = '1'

from habana_frameworks.torch.utils.experimental import detect_recompilation_auto_model
import torch

class InnerNet(torch.nn.Module):
   def __init__(self):
      super(InnerNet, self).__init__()
      self.conv = torch.nn.Conv2d(1, 8, 3, 3)

   def forward(self, x):
      x = torch.flatten(self.conv(x), 1)
      x = x[x>0] # This is dynamic
      return x.sum()

net = torch.nn.Sequential(torch.nn.ReLU(), InnerNet()).to('hpu')
net = detect_recompilation_auto_model(net) # wrap model in dynamic op detection utility

for bs in [20,20,30,30]: #Input shape changes at 3rd step
   inp = torch.rand(bs, 1, 50, 50).to('hpu')
   print(net(inp))  
net.analyse_dynamicity() # Call this after a few steps to generate the dynamicity report

The detect_recompilation_auto_model utility outputs two tables and corresponding .csv files.

The first table shows what happens at each step, while the second table shows which module or submodule recompiled the most times. Let's analyze the first table:

Step	Recompiling Modules	New In	New Out	Class	Location	Comment
0	Net/0	True	True	torch.nn.modules.activation.ReLU	/usr/local/lib/python3.8/dist-packages/torch/nn/modules/activation.py	Recompiled due to new input shape
0	Net/1/conv	True	True	torch.nn.modules.conv.Conv2d	/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py	Recompiled due to new input shape
0	Net/1	True	True	main.InnerNet	dyn_ops.py	Recompiled due to new input shape
0	Net	True	True	torch.nn.modules.container.Sequential	/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py	Recompiled due to new input shape
1	Net/1	False	False	main.InnerNet	dyn_ops.py	Already processed input shape still recompiled. The issue may be dynamic ops.
1	Net	False	False	torch.nn.modules.container.Sequential	/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py	Already processed input shape still recompiled. The issue could be due to dyn ops or a dynamic child.
2	Net/0	True	True	torch.nn.modules.activation.ReLU	/usr/local/lib/python3.8/dist-packages/torch/nn/modules/activation.py	Recompiled due to new input shape
2	Net/1/conv	True	True	torch.nn.modules.conv.Conv2d	/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py	Recompiled due to new input shape
2	Net/1	True	False	main.InnerNet	dyn_ops.py	Recompiled due to new input shape
2	Net	True	False	torch.nn.modules.container.Sequential	/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py	Recompiled due to new input shape
3	Net/1	False	False	main.InnerNet	dyn_ops.py	Already processed input shape still recompiled. The issue may be dynamic ops.
3	Net	False	False	torch.nn.modules.container.Sequential	/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py	Already processed input shape still recompiled. The issue could be due to dyn ops or a dynamic child.

Step 0) Shows all modules to recompile (since it is the first step).

Step 1) Shows an InnerNet and Net recompile. The Comment column shows:

InnerNet might be dynamic because it recompiled even without dynamic children modules.
Net might not be dynamic. It might have recompiled because its child (InnerNet) recompiled.

Step 2) A new input shape is seen, so every module recompiles as expected (shown in the Comment column).

Step 3) Indicates InnerNet has dynamic ops. As a result, possible outputs from the tool (as shown in the Comment column) are:

Recompiled due to new input shape.
An already-processed input shape recompiled and has a new output shape that may be a result of dynamic ops or a dynamic child.
An already-processed input shape is recompiled and maybe the issue is related dynamic ops.
An already-processed input shape recompiled and has a new output shape, possibly as a result of dynamic ops.

Note This tool takes a long time to run, so we recommend doing the following:

Run for a short number of steps.
Run on a single Intel® Gaudi® card (without distributed).
While the tool can detect and ignore recompilation due to inputs, it is recommended to pass in the same shape inputs where possible to save time running the tool.
With static inputs, the tool can focus only on finding dynamic ops.

In the next example, the dynamic portion of the code is replaced with a static equivalent:

# Enable PT_HPU_LAZY_MODE=1 for this example
import os
os.environ['PT_HPU_LAZY_MODE'] = '1'

from habana_frameworks.torch.utils.experimental import detect_recompilation_auto_model
import torch

class InnerNet(torch.nn.Module):
   def __init__(self):
      super(InnerNet, self).__init__()
      self.conv = torch.nn.Conv2d(1, 8, 3, 3)

   def forward(self, x):
      x = torch.flatten(self.conv(x), 1)
      #x = x[x>0] # This is dynamic, replacing in next line with static implementation
      x = torch.where(x>0, x, torch.zeros_like(x))
      return x.sum()

net = torch.nn.Sequential(torch.nn.ReLU(), InnerNet()).to('hpu')
net = detect_recompilation_auto_model(net)

for bs in [20,20,30,30]: #Input shape changes at 4th step
   inp = torch.rand(bs, 1, 50, 50).to('hpu')
   print(net(inp))
net.analyse_dynamicity() # Call this after a few steps to generate the dynamicity report

On running the detect_recompilation_auto_model utility, only dynamicity from inputs appears:

Step	Recompiling Modules	New In	New Out	Class	Location	Comment
0	Net/0	True	True	torch.nn.modules.activation.ReLU	/usr/local/lib/python3.8/dist-packages/torch/nn/modules/activation.py	Recompiled due to new input shape
0	Net/1/conv	True	True	torch.nn.modules.conv.Conv2d	/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py	Recompiled due to new input shape
0	Net/1	True	True	main.InnerNet	dyn_ops_static.py	Recompiled due to new input shape
0	Net	True	True	torch.nn.modules.container.Sequential	/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py	Recompiled due to new input shape
2	Net/0	True	True	torch.nn.modules.activation.ReLU	/usr/local/lib/python3.8/dist-packages/torch/nn/modules/activation.py	Recompiled due to new input shape
2	Net/1/conv	True	True	torch.nn.modules.conv.Conv2d	/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py	Recompiled due to new input shape
2	Net/1	True	False	main.InnerNet	dyn_ops_static.py	Recompiled due to new input shape
2	Net	True	False	torch.nn.modules.container.Sequential	/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py	Recompiled due to new input shape

Detect Dynamic Sections

This example focuses on detecting dynamic sections in a real model, Faster RCNN.

First, download the coco128 dataset and expand the archive:

wget https://ultralytics.com/assets/coco128.zip
unzip coco128.zip

Now, run the following code:

# Enable PT_HPU_LAZY_MODE=1 for this example
import os
os.environ['PT_HPU_LAZY_MODE'] = '1'

import torchvision, os
from PIL import Image
import torchvision.transforms as T

import habana_frameworks.torch.core as htcore
device = 'hpu'

#load model
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval() # set to evaluation mode
model = model.to(device) # move model to device

from habana_frameworks.torch.utils.experimental import detect_recompilation_auto_model
model = detect_recompilation_auto_model(model, waittime=0.3)

for idx, k in enumerate(os.listdir('coco128/images/train2017/')):
    img = Image.open('coco128/images/train2017/' + k).resize((600,600))
    img = T.ToTensor()(img).to(device)
    print('inp shape:', img.shape)
    pred = model([img])
    htcore.mark_step()
    if idx == 6: # just running first few images
        break
    print('done img', idx)
model.analyse_dynamicity()

The outputs show the following:

Step	Recompiling Modules	New In	New Out	Class	Location	Comment
1	Net/roi_heads/box_roi_pool	False	False	torchvision.ops.poolers.MultiScaleRoIAlign	/usr/local/lib/python3.8/dist-packages/torchvision/ops/poolers.py	The already processed input shape was still recompiled. The issue may be dynamic ops.
1	Net/roi_heads	False	True	torchvision.models.detection.roi_heads.RoIHeads	/usr/local/lib/python3.8/dist-packages/torchvision/models/detection/roi_heads.py	Already processed input shape still recompiled and has new output shape. The issue may be dynamic ops.

This information relays that the MultiScaleRoIAlign and RoIHeads classes have some dynamic ops. As a next step, users can rewrite these sections as static or move the operation to a CPU. For other strategies, see Mitigation Techniques for Dynamic Ops.

Licensed under the Apache* License, Version 2.0 (the “License”) You may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Detect Dynamic Shapes

Dynamic Shapes and How to Detect Them

Types of Dynamicity

Look for Dynamicity

Deeper Analysis of Model Data and Ops

Detect Dynamic Inputs

Image Datasets - Low-Input Dynamicity

Image Datasets - High-Input Dynamicity

Text Datasets

Detect Dynamic Ops

Detect Dynamic Sections

You May Also Like:

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Detect Dynamic Shapes

Dynamic Shapes and How to Detect Them

Types of Dynamicity

Look for Dynamicity

Deeper Analysis of Model Data and Ops

Detect Dynamic Inputs

Image Datasets - Low-Input Dynamicity

Image Datasets - High-Input Dynamicity

Text Datasets

Detect Dynamic Ops

Detect Dynamic Sections

You May Also Like:

Product and Performance Information