Detect Frequent Graph Recompilations

Optimize with Intel® Gaudi® AI Accelerators

Create new deep learning models or migrate existing code in minutes.
Deliver generative AI performance with simplified development and increased productivity.

Learn more

In training workloads, recompilations can sometimes happen. This can create system latency and slow down the overall training process with multiple iterations of graph compilation. This blog focuses on detecting these graph recompilations.

If you experience graph recompilations with your model, you can find possible solutions toward eliminating potential latency in the Handling Dynamic Shapes guide.

The Habana bridge provides the functionality to compile a Intel® Gaudi® software graph and launch the resulting recipe in an asynchronous method. The recipes are cached by the Habana bridge to avoid recompilation of the same graph. This caching is done at an eager op level as well as at a JIT graph level. During training, the graph compilation is only required for the initial iteration; thereafter the same compiled recipe is rerun every iteration (with new inputs) unless there is a change in the operations being run.

In some cases, Intel Gaudi software must recompile the graph. This is mostly due to dynamic shaped input such as varying sentence lengths in language models or differing image resolutions in image model. Frequent graph recompilations can lead to a longer time to train.

Following is the step-by-step process for detecting frequent graph recompilations on the Intel Gaudi software platform.

Start Docker*

Make sure to use the latest PyTorch* container.

docker pull vault.habana.ai/gaudi-docker/1.9.0/ubuntu20.04/habanalabs/pytorch-installer-1.13.1:latest

command prompt

docker run -it \
--runtime=habana \
-e HABANA_VISIBLE_DEVICES=all \
-e OMPI_MCA_btl_vader_single_copy_mechanism=none \
--cap-add=sys_nice \
--net=host \
--ipc=host vault.habana.ai/gaudi-docker/1.9.0/ubuntu20.04/habanalabs/pytorch-installer-
1.13.1:latest

Prepare the Model

This tutorial uses an MNIST example model.

Clone the Model References repository inside the container that you just started:

git clone https://github.com/HabanaAI/Model-References.git

command prompt

Move to the subdirectory containing the hello_world example:

cd Model-References/PyTorch/examples/computer_vision/hello_world/

Update PYTHONPATH to include Model-References repository and set PYTHON to python8 executable:

export PYTHONPATH=$PYTHONPATH:Model-References
export PYTHON=/usr/bin/python3.8

Training on a Single Intel® Gaudi® Processor

Run training on a single Intel® Gaudi® processor in BF16 with Habana mixed precision enabled:

$PYTHON mnist.py --batch-size=128 --epochs=1 --lr=1.0 \
      --gamma=0.7 --hpu --hmp \
      --hmp-bf16=ops_bf16_mnist.txt \
      --hmp-fp32=ops_fp32_mnist.txt \
      --use_lazy_mode

command prompt

Detect Recompilations

Add the code to model artificial dynamicity by constantly changing the batch size. To detect the frequent recompilations, use the Metric APIs.

Edit the mnist.py file, and in line 61, add the line:

from habana_frameworks.torch.hpu.metrics import metric_global
gc_metric = metric_global("graph_compilation")
print("graph_compilation: ", gc_metric.stats())

command prompt

Now run the training code again. You will notice some graph compilations occur.

$PYTHON mnist.py --batch-size=128 --epochs=1 --lr=1.0 \
      --gamma=0.7 --hpu --hmp \
      --hmp-bf16=ops_bf16_mnist.txt \
      --hmp-fp32=ops_fp32_mnist.txt \
      --use_lazy_mode

command prompt

Now add some artificial dynamicity by adding the following code at line 46

index = max(-batch_idx -1, -args.batch_size + 1)
data, target = data[:-index, :, :, :], target[:-index]

command prompt

While running the training code again, you can see that graph compilation is much more frequent.

$PYTHON mnist.py --batch-size=128 --epochs=1 --lr=1.0 \
      --gamma=0.7 --hpu --hmp \
      --hmp-bf16=ops_bf16_mnist.txt \
      --hmp-fp32=ops_fp32_mnist.txt \
      --use_lazy_mode

command prompt

As long as the batch size keeps changing, there are more recompilations. Training may also be noticeably slower.

What’s Next?

Feel free to use the same technique to check the frequency of graph recompilations in your code. If needed, look for possible resolutions using the Handling Dynamic Shapes guide.

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Detect Frequent Graph Recompilations

Optimize with Intel® Gaudi® AI Accelerators

Start Docker*

Prepare the Model

Training on a Single Intel® Gaudi® Processor

Detect Recompilations

What’s Next?

Product and Performance Information