Letter from the Editor
The Changing Landscape of AI Hardware and Software
This is our premier event for technologists to learn more about Intel software and hardware. I’d also like to thank James Reinders, founding editor of The Parallel Universe, for covering the previous issue while I was away on sabbatical.
MLCommons recently published “New MLPerf Inference Benchmark Results Highlight the Rapid Growth of Generative AI Models.” Intel submitted Gaudi 2 results for the first time, demonstrating how this accelerator provides a lower-cost alternative for AI applications:
“The industry has a clear need: address the gaps in today’s generative AI enterprise offerings with high-performance, high-efficiency compute options. The latest MLPerf results...illustrate the unique value Intel Gaudi brings to market as enterprises and customers seek more cost-efficient, scalable systems with standard networking and open software, making GenAI more accessible to more customers.”
I’m looking forward to seeing MLPerf results for the Intel Gaudi 3 AI accelerator, which should be available in fall 2024.
Lately, I’ve been experimenting with low-end AI PCs to see what I can get away with from an AI end-user perspective. Meanwhile, my colleague, Tony Mongkolsmai, has been experimenting with higher-end AI PCs to see what they allow AI developers to do locally. In this issue’s feature article, AI PC Brings Larger LLM Development to Your Desk, he shows how it’s possible to do serious development more conveniently and securely on a local system instead of automatically defaulting to remote data center or cloud systems to develop large models.
This is followed by a series of short articles covering a variety of AI topics. Low-Bit Quantized Open LLM Leaderboard describes a new tool to find high-quality models for a given client. Enterprise AI Art Exhibition at Intel Vision 2024 recaps the “exhibition” and gives a tutorial on generating similar works of art. Even though “art” is defined loosely in this case, the resulting images and practical applications are compelling. Accelerating GGUF Models with Transformers shows how to accelerate low-bit LLM inference while taking advantage of GGUF, the GPT-Generated Unified Format (a binary format that optimizes model storage and loading efficiency). Run LLMs on Intel® GPUs Using llama.cpp shows how to use the new SYCL* backend to run llama.cpp (a lightweight, high-performance LLM framework that is gaining popularity) on Intel GPUs.
Next, we have an article about Accelerating Simulations and Backpropagation with Python* and C++ Analytics from Dmitri Goloubentsev (CTO, MatLogica). Finally, Accelerating Memory-Bandwidth-Bound Kernels Using the Intel® Data Streaming Accelerator describes a hybrid CPU plus accelerator approach to software pipelining.
As always, don’t forget to check out Tech.Decoded for more information on Intel solutions for AI and data science, code modernization, visual computing, data center and cloud computing, systems and IoT development, and heterogeneous parallel programming with oneAPI.
Henry A. Gabb
July 2024