What’s New
- The OpenVINO 2024.6 release includes updates for enhanced stability and improved LLM performance.
- Introduced support for Intel® Arc™ B-Series Graphics (formerly known as Battlemage).
- Memory optimizations implemented to improve inference time and LLM performance on NPUs.
- Improved LLM performance with GenAI API optimizations.
OpenVINO™ Runtime
CPU Device Plugin
- KV cache now uses asymmetric 8-bit unsigned integer (U8) as the default precision, reducing memory stress for LLMs and increasing their performance. This option can be controlled by model metadata (see the sketch below).
- Quality and accuracy have been improved for selected models with several bug fixes.
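The U8 default can also be overridden at compile time if a different KV-cache precision is preferred. A minimal sketch, assuming the `ov::hint::kv_cache_precision` property is exposed through the Python API for the CPU plugin in this release; `model.xml` is a placeholder path:

```python
# Minimal sketch: overriding the default KV-cache precision on the CPU plugin.
# Assumes openvino.properties.hint.kv_cache_precision is available in this
# release; "model.xml" is a placeholder path.
import openvino as ov
import openvino.properties.hint as hints

core = ov.Core()
model = core.read_model("model.xml")

# Request an f16 KV cache instead of the new u8 default, e.g. for accuracy checks.
compiled = core.compile_model(model, "CPU", {hints.kv_cache_precision: ov.Type.f16})
```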
GPU Device Plugin
- Device memory copy optimizations have been introduced for inference with Intel® Arc™ B-Series Graphics (formerly known as Battlemage). Since this hardware does not utilize the L2 cache for copying memory between the device and host, a dedicated copy operation is used when inputs or results are not expected to reside in device memory.
- ChatGLM4 inference on GPU has been optimized.
NPU Device Plugin
- LLM performance and inference time have been improved with memory optimizations.
OpenVINO.GenAI
- The encrypted_model_causal_lm sample is now available, showing how to decrypt a model.
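The general idea the sample demonstrates is loading a decrypted model directly from memory so the plaintext IR never touches disk. A minimal sketch of that idea using the core runtime API, not the sample itself; the `decrypt` helper, key handling, and `*.enc` file names are hypothetical placeholders:

```python
# Minimal sketch (not the encrypted_model_causal_lm sample itself): decrypt an
# encrypted IR in memory and load it without writing plaintext files to disk.
# decrypt(), the key, and the *.enc file names are hypothetical placeholders.
import numpy as np
import openvino as ov

def decrypt(data: bytes, key: bytes) -> bytes:
    # Placeholder XOR "decryption"; substitute a real cipher (e.g. AES) in practice.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"my-secret-key"
xml_bytes = decrypt(open("model.xml.enc", "rb").read(), key)
bin_bytes = decrypt(open("model.bin.enc", "rb").read(), key)

core = ov.Core()
weights = ov.Tensor(np.frombuffer(bin_bytes, dtype=np.uint8).copy())
model = core.read_model(model=xml_bytes.decode("utf-8"), weights=weights)
compiled = core.compile_model(model, "CPU")
```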
Other Changes and Known Issues
Deprecated
- Starting with 2025.0, macOS x86 will no longer be recommended for use due to the discontinuation of validation. Full support will be removed later in 2025.
Jupyter Notebooks
- Visual-language assistant with GLM-Edge-V and OpenVINO
- Local AI and OpenVINO
- Multimodal understanding and generation with Janus and OpenVINO