Profiling and optimization work together to take your model from merely functional to efficient and performant. To get started, follow the Performance Optimization Guide Checklist.
Optimizing a model falls into three main categories:
- Initial model porting ensures the model is functional on Intel® Gaudi® processors by running GPU migration. Follow the Get Started instructions for training or inference.
- Model optimizations are general performance enhancements that apply to most models, such as managing dynamic shapes and using HPU Graphs for training or inference (a minimal code sketch follows this list).
- Profiling lets you identify bottlenecks on the host CPU or on the Intel Gaudi processor. Follow the steps outlined in Table 1, starting with the TensorBoard* toolkit on the Intel Gaudi platform; hedged code sketches illustrating these steps follow the table.
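As a concrete reference for the porting and HPU Graphs items above, the sketch below shows a minimal training step and an HPU Graph-wrapped inference call. It assumes the Intel Gaudi PyTorch bridge (`habana_frameworks.torch`) is installed; the toy `Linear` model, tensor shapes, and learning rate are placeholders, and API names such as `mark_step` and `wrap_in_hpu_graph` should be verified against the Intel Gaudi documentation for your release.

```python
# Minimal porting sketch, assuming the Intel Gaudi PyTorch bridge is installed.
# For models written for CUDA, the GPU Migration Toolkit
# (import habana_frameworks.torch.gpu_migration) is intended to map "cuda"
# calls to "hpu"; here the model targets the HPU device directly.
import torch
import habana_frameworks.torch as ht
import habana_frameworks.torch.core as htcore  # Gaudi PyTorch bridge

device = torch.device("hpu")                   # target the Gaudi accelerator

model = torch.nn.Linear(512, 512).to(device)   # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

inputs = torch.randn(64, 512, device=device)
targets = torch.randn(64, 512, device=device)

# Training step in lazy mode: mark_step() tells the bridge where to cut
# and launch the accumulated graph.
optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(inputs), targets)
loss.backward()
htcore.mark_step()                             # flush the backward graph
optimizer.step()
htcore.mark_step()                             # flush the optimizer graph

# Inference: wrapping the model in an HPU Graph replays a recorded graph,
# which reduces per-iteration host launch overhead.
model.eval()
graph_model = ht.hpu.wrap_in_hpu_graph(model)
with torch.no_grad():
    out = graph_model(inputs)
```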
Table 1. Tasks, activities, and results to optimize your model
| What to Do | What Happens | Where to Learn More |
|---|---|---|
| 1. Perform PyTorch* profiling using TensorBoard | Obtains Intel Gaudi-specific performance recommendations in TensorBoard | Profile with PyTorch |
| 2. Review the PT_HPU_METRICS_FILE | Checks for excessive recompilations at runtime | Set HPU Metrics Review |
| 3. Profile using a trace viewer for Intel Gaudi accelerators | Uses the accelerator-specific Perfetto trace viewer for in-depth analysis of CPU and accelerator activity | Get Started Profiling with Intel Gaudi Software |
| 4. Perform model logging | Sets ENABLE_CONSOLE to enable logging for debug and analysis | Runtime Environment Variables |
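A hedged sketch of step 1 in Table 1: collecting a trace with `torch.profiler` and writing it for TensorBoard. The model, shapes, step counts, and the `./logs/profile` output directory are illustrative, and `ProfilerActivity.HPU` is assumed to be exposed when the Gaudi PyTorch bridge is installed.

```python
# Sketch of PyTorch profiling on HPU with TensorBoard output.
import torch
import habana_frameworks.torch.core as htcore

device = torch.device("hpu")
model = torch.nn.Linear(512, 512).to(device)    # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
data = torch.randn(64, 512, device=device)
target = torch.randn(64, 512, device=device)

with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CPU,
                torch.profiler.ProfilerActivity.HPU],
    schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=1),
    on_trace_ready=torch.profiler.tensorboard_trace_handler("./logs/profile"),
) as prof:
    for _ in range(6):                           # covers wait + warmup + active
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(data), target)
        loss.backward()
        htcore.mark_step()
        optimizer.step()
        htcore.mark_step()
        prof.step()                              # advance the profiler schedule

# View the results with: tensorboard --logdir ./logs/profile
```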
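Steps 2 through 4 rely on runtime environment variables. The sketch below sets them from Python before the bridge is imported; in practice they are usually exported in the shell before launching the workload. The variable names follow the Intel Gaudi runtime documentation, but the file path, trigger values, and log level shown here are examples and should be checked against your release.

```python
# Hedged sketch: environment variables for metrics, profiler trace, and logging.
# These must be set before the habana_frameworks modules are imported.
import os

# Step 2: dump runtime metrics (including graph recompilation counts) to a file.
os.environ["PT_HPU_METRICS_FILE"] = "/tmp/hpu_metrics.json"            # example path
os.environ["PT_HPU_METRICS_DUMP_TRIGGERS"] = "process_exit,metric_change"

# Step 3: enable the Intel Gaudi profiler so the trace can be opened in the
# Perfetto-based viewer (assumed flag; confirm in the profiler getting-started guide).
os.environ["HABANA_PROFILE"] = "1"

# Step 4: route runtime logs to the console for debug and analysis.
os.environ["ENABLE_CONSOLE"] = "true"
os.environ["LOG_LEVEL_ALL"] = "3"   # lower values are more verbose (see runtime docs)

import torch                                    # imports after environment setup
import habana_frameworks.torch.core as htcore   # bridge picks up the variables
```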
Maximize Model Performance
The names and brands Habana® Labs and Habana® Gaudi® have been replaced throughout these training documents to reflect the revised product name, Intel® Gaudi® AI accelerators. Any mention of Habana in this training should be understood as referring to Intel® Gaudi® AI accelerators.