Intel® Distribution of OpenVINO™ Toolkit Release Notes

ID 780177
Updated 12/20/2024
Public

What’s New 

  • The OpenVINO 2024.6 release includes updates for enhanced stability and improved LLM performance.
  • Support has been introduced for Intel® Arc™ B-Series Graphics (formerly known as Battlemage).
  • Memory optimizations have been implemented to improve inference time and LLM performance on NPUs.
  • Improved LLM performance with GenAI API optimizations. 

OpenVINO™ Runtime 

CPU Device Plugin 

  • The KV cache now uses asymmetric 8-bit unsigned integer (U8) as the default precision, reducing memory pressure for LLMs and increasing their performance. This option can be controlled by model metadata.
  • Quality and accuracy have been improved for selected models with several bug fixes.
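
To illustrate what "asymmetric U8" means for cached values, the following is a minimal sketch in plain Python. It is not the CPU plugin's implementation: the function names, the per-list granularity, and the rounding choices are assumptions made for illustration. Asymmetric quantization maps a float range [min, max] onto the integers 0 to 255 using a scale and a zero point, so each stored element takes one byte plus a small amount of shared metadata.

```python
def quantize_u8_asymmetric(values):
    """Map floats to integers in [0, 255] using a scale and a zero point.

    Hypothetical helper for illustration only; a real runtime quantizes
    per group or per channel and works on tensors, not Python lists.
    """
    lo, hi = min(values), max(values)
    # Guard against a constant input, where hi == lo would give a zero scale.
    scale = (hi - lo) / 255.0 or 1.0
    # The zero point is the integer that represents the real value 0.0.
    zero_point = max(0, min(255, round(-lo / scale)))
    quantized = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize_u8(quantized, scale, zero_point):
    """Recover approximate floats from the stored 8-bit integers."""
    return [(q - zero_point) * scale for q in quantized]


if __name__ == "__main__":
    original = [-1.0, 0.0, 0.5, 3.0]
    q, scale, zero_point = quantize_u8_asymmetric(original)
    restored = dequantize_u8(q, scale, zero_point)
    for before, stored, after in zip(original, q, restored):
        print(f"{before:+.4f} -> {stored:3d} -> {after:+.4f}")
```

The restored values differ from the originals by at most about one quantization step (the scale), which is the accuracy cost paid for the smaller cache.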

GPU Device Plugin 

  • Device memory copy optimizations have been introduced for inference with Intel® Arc™ B-Series Graphics (formerly known as Battlemage). Because this hardware does not use the L2 cache when copying memory between the device and the host, a dedicated copy operation is used when inputs or results are not expected in device memory.
  • ChatGLM4 inference on GPU has been optimized. 

NPU Device Plugin 

  • LLM performance and inference time have been improved with memory optimizations.

OpenVINO.GenAI 

  • The encrypted_model_causal_lm sample is now available, showing how to decrypt a model. 
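
The general pattern behind decrypting a model before use can be sketched as follows. This does not reproduce the sample's code: `xor_bytes` and `load_decrypted` are hypothetical helpers, and the XOR cipher is an insecure stand-in chosen only to keep the sketch dependency-free. A real deployment would presumably use an authenticated cipher from a vetted cryptography library and hand the decrypted buffers to the runtime's in-memory model loading API rather than writing plaintext to disk.

```python
import tempfile
from itertools import cycle
from pathlib import Path


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy, insecure cipher used as a placeholder; XOR is its own inverse."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def load_decrypted(path: Path, key: bytes) -> bytes:
    """Read an encrypted file and return the plaintext, keeping it in memory only."""
    return xor_bytes(path.read_bytes(), key)


if __name__ == "__main__":
    key = b"demo-key"                        # hypothetical key for the demo
    plaintext = b"<net>model graph</net>"    # stands in for model file contents
    with tempfile.TemporaryDirectory() as tmp:
        encrypted_path = Path(tmp) / "model.enc"
        encrypted_path.write_bytes(xor_bytes(plaintext, key))
        model_buffer = load_decrypted(encrypted_path, key)
    # model_buffer now holds the decrypted bytes, ready to pass to a loader.
    print(model_buffer == plaintext)
```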

Other Changes and Known Issues 

Deprecated

  • Starting with 2025.0, macOS x86 will no longer be recommended for use because validation is being discontinued. Full support will be removed later in 2025.

Jupyter Notebooks