We're excited to release Intel® Gaudi® software version 1.19.0, which brings numerous enhancements and updates for an improved GenAI development experience on the Intel® Gaudi® accelerator platform.
We have enhanced support for vLLM by improving performance and adding support for Multi-step scheduling, Asynchronous Output Processing, Long context with LoRA (up to 128k), Automatic Prefix Caching, Repetition penalty, Structured Output, FusedMoE, and more. For more information, see Intel Gaudi vLLM.
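As a rough illustration of two of these features, the sketch below enables automatic prefix caching and applies a repetition penalty using the standard vLLM offline-inference API. The model name and parameter values are placeholders, and the exact set of flags exposed depends on the Gaudi vLLM build you install:

```python
from vllm import LLM, SamplingParams

# Automatic prefix caching lets requests that share a prompt prefix reuse
# the cached KV blocks (assumes this flag is available in your vLLM build).
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_prefix_caching=True)

# Repetition penalty discourages the model from repeating tokens it has
# already generated; values slightly above 1.0 are typical.
params = SamplingParams(temperature=0.7, repetition_penalty=1.2, max_tokens=128)

outputs = llm.generate(["Summarize the Gaudi 1.19.0 release in one sentence."], params)
print(outputs[0].outputs[0].text)
```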
Release 1.19.0 provides preview support for stock PyTorch, allowing you to run inference and training using eager mode and torch.compile graph mode. See Public PyTorch Support. In addition, the Gaudi PyTorch Bridge source code and the associated Gaudi PyTorch Fork code are publicly available at Gaudi PyTorch Bridge and Gaudi PyTorch Fork.
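A minimal sketch of what this looks like in practice follows. The "hpu" device and the "hpu_backend" compile backend follow the existing Gaudi PyTorch documentation; whether the stock PyTorch preview uses exactly this import path is an assumption, so treat it as illustrative:

```python
import torch
import habana_frameworks.torch.core as htcore  # side effect: registers the Gaudi "hpu" device

model = torch.nn.Linear(16, 4).to("hpu")   # place the model on the Gaudi accelerator
x = torch.randn(8, 16, device="hpu")

# Eager mode: each op is dispatched to the device as it is issued.
y_eager = model(x)

# Graph mode: torch.compile with the Gaudi backend captures and optimizes the graph.
compiled_model = torch.compile(model, backend="hpu_backend")
y_graph = compiled_model(x)

print(y_eager.shape, y_graph.shape)
```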
We have also added the RDMA PerfTest tool, which tests low-level, high-performance connectivity through ping-pong and bandwidth communication tests. See Intel Gaudi RDMA PerfTest Tool.
With release 1.19, we have added many new firmware features, including the Hypervisor tools package and support for enabling/disabling external NICs.
In addition, the update provides:
- Support for TencentOS for Gaudi 3.
- Improved performance of various LLM models for Intel Gaudi 2 and Gaudi 3 accelerators, including Llama 3.1 8B/70B for inference. For more information, check out the Intel Gaudi model performance page.
- Support for Megatron-LM pretraining of Llama 3.1 8B/70B and Mixtral 8x7B. See Intel Gaudi Megatron-LM.
- Upgraded versions of several libraries, including PyTorch 2.5.1, vLLM 0.6.4, Text Generation Inference (TGI) 2.0.6, and Megatron-LM 0.8.0. You can find the full Gaudi Support Matrix here; a quick environment check is sketched after this list.
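If you want to confirm that your environment picked up the upgraded components, a check like the following works. The pip distribution names ("torch", "vllm") are the usual ones and are assumptions here; adjust them to whatever your installation actually provides:

```python
from importlib.metadata import version, PackageNotFoundError

# Versions expected for the 1.19.0 release, per the support matrix above.
expected = {"torch": "2.5.1", "vllm": "0.6.4"}

for pkg, want in expected.items():
    try:
        have = version(pkg)
        status = "OK" if have.startswith(want) else f"expected {want}"
        print(f"{pkg}: {have} ({status})")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```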
Lastly, a reminder that in release 1.20, support for Megatron-DeepSpeed will be deprecated and replaced by Megatron-LM.
You can find more information on the Gaudi Software 1.19.0 release on the
Intel Gaudi release notes page.