Optimize and Debug Your Model
Profiling and optimization work together to take your model from merely functional to highly efficient and performant. To get started, follow the Performance Optimization Guide Checklist.
Optimizing a model falls into three main categories:
- Initial model porting ensures the model is functional on Intel® Gaudi® processors, for example by running GPU migration. Follow the Get Started instructions for training or inference.
- Model optimizations are general performance enhancements that apply to most models. These include managing dynamic shapes and using HPU Graphs for training or inference.
- Profiling lets you identify bottlenecks on the host CPU or on the Intel Gaudi processor. Follow the steps outlined in Table 1, starting with the TensorBoard* toolkit on the Intel Gaudi platform to pinpoint specific bottlenecks.
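As a minimal sketch of the porting step, GPU migration can be enabled through an environment variable so that CUDA API calls in an existing script are redirected to HPU equivalents. The variable name below is taken from the Intel Gaudi GPU Migration Toolkit documentation; verify it against your software release, and note that the script name is a placeholder:

```shell
# Enable the GPU Migration Toolkit so CUDA calls in an existing PyTorch
# script are mapped to HPU equivalents at runtime (assumed variable name
# from the Intel Gaudi documentation)
export PT_HPU_GPU_MIGRATION=1

# Then launch the otherwise unmodified training script, e.g.:
# python train.py   # placeholder script name
```

This avoids editing device strings throughout the script during the initial port; device-specific tuning can follow once the model runs.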
Table 1. Tasks, activities, and results to optimize your model
| What to Do | What Happens | Where to Learn More |
|---|---|---|
| 1. Perform PyTorch* profiling using TensorBoard | Obtains Intel Gaudi accelerator-specific performance recommendations in TensorBoard | |
| 2. Review the `PT_HPU_METRICS_FILE` | Looks for excessive recompilations during runtime | |
| 3. Profile using a trace viewer for Intel Gaudi accelerators | Uses the accelerator-specific Perfetto trace viewer for in-depth analysis of CPU and accelerator activity | |
| 4. Perform model logging | Sets `ENABLE_CONSOLE` to enable logging for debug and analysis | |
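The environment variables referenced in steps 2 and 4 can be exported before launching the run. A minimal sketch, assuming the variable names documented for the Intel Gaudi software stack (the file path and log level are illustrative values, not defaults):

```shell
# Step 2: dump runtime metrics (including graph compilation counts) to a
# JSON file for later review (illustrative path)
export PT_HPU_METRICS_FILE=/tmp/hpu_metrics.json
export PT_HPU_METRICS_DUMP_TRIGGERS=process_exit,metric_change

# Step 4: mirror Intel Gaudi component logs to the console for debug
# and analysis (illustrative log level; lower values are more verbose)
export ENABLE_CONSOLE=true
export LOG_LEVEL_ALL=3
```

After the run, a steadily growing graph-compilation count in the metrics file is a typical sign of excessive recompilation caused by dynamic shapes.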
Stay Informed
Register for the latest Intel Gaudi AI accelerator developer news, events, training, and updates.