Description
This white paper provides an in-depth performance evaluation of the Intel® Gaudi® 2 AI accelerator, focusing on its ability to efficiently serve advanced large language models (LLMs) such as Llama-3.1-8B and Falcon3-10B. The evaluation benchmarks the accelerator against critical metrics, including latency, throughput, and Time to First Token (TTFT), under varied conditions such as standard chat interactions and Retrieval-Augmented Generation (RAG) scenarios. Key findings show that the Intel® Gaudi® 2 AI accelerator sustains low latency and high throughput even under heavy load with multiple concurrent users. These insights are intended to help organizations optimize their AI infrastructure, realize the full potential of their LLM investments, and strengthen their competitiveness and capacity for innovation in the AI-driven market.