Introduction
This package contains the Intel® Distribution of OpenVINO™ Toolkit software version 2024.5 for Linux*, Windows* and macOS*.
Available Downloads
- Debian Linux*
  - Size: 29.8 MB
  - SHA256: 6FBFF98E228D400609B50B0E8EE805B3FFBF0A2675DAC85D51F1ADC35F0F54F3
- CentOS 7 (1908)*
  - Size: 52.2 MB
  - SHA256: 0986EED55951D7AE8ECFA300F5BFEFD4374087C3AA1E3523F45906FB3E69227F
- Red Hat Enterprise Linux 8*
  - Size: 57 MB
  - SHA256: F0638B10DD063BA1EC00A9A61F1D80962567C9BACEAEB0142300BCC34F6F62B2
- Ubuntu 20.04 LTS*
  - Size: 32.9 MB
  - SHA256: C96EE2B4B50ACE80DC71D171D3CFD188EE9686D2413778F73FC86C6340C5D0C9
- Ubuntu 20.04 LTS*
  - Size: 48.8 MB
  - SHA256: 2EAE0638B595F844FB72903A1B42A2124C5D9645858FFA9B9B15C60E2F97C633
- Ubuntu 22.04 LTS*
  - Size: 51.3 MB
  - SHA256: F597E56E405A03F67869985FB0B85D5A4E14C219AA8458DD2AD3017C022EA373
  - Size: 52.5 MB
  - SHA256: B602AE818064E4BB909B07BAC508A9B7C6A5DA1035D5D3899D9D99C5EABCBDE2
- macOS*
  - Size: 138.7 MB
  - SHA256: AA1920E52D394387EA3ED8F3F817B598A2A13B8FEB9D14A5ED3FE77545896E0B
- macOS*
  - Size: 33.4 MB
  - SHA256: F4CA2BB87032135359B2D96A6315F6E6DBCDE2ED5633BF2DC02BB20E190FC868
- Windows 11*, Windows 10*
  - Size: 106.1 MB
  - SHA256: E30C60518B6A3CA5D7F1B4FC56673C5B55CAF1962A34F1B50FB6B8A6436AB0C7
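To confirm that an archive downloaded intact, compare its SHA256 digest against the value listed above. Below is a minimal Python sketch; the archive file name is a placeholder for whichever package you downloaded, and the expected digest here is copied from the first Ubuntu 22.04 LTS* entry.

```python
import hashlib

# Placeholder file name; substitute the archive you actually downloaded.
ARCHIVE = "openvino_toolkit_ubuntu22_2024.5.tgz"
# Expected digest, copied from the download list above.
EXPECTED = "F597E56E405A03F67869985FB0B85D5A4E14C219AA8458DD2AD3017C022EA373"

sha256 = hashlib.sha256()
with open(ARCHIVE, "rb") as f:
    # Hash in 1 MB chunks so large archives need not fit in memory.
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha256.update(chunk)

digest = sha256.hexdigest().upper()
print("OK" if digest == EXPECTED else f"MISMATCH: {digest}")
```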
Detailed Description
Summary of major features and improvements
- More Gen AI coverage and framework integrations to minimize code changes
  - New models supported: Llama* 3.2 (1B & 3B), Gemma* 2 (2B & 9B), and YOLO11*.
  - LLM support on NPU: Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct, and Phi-3.
  - Noteworthy notebooks added: Sam2, Llama3.2, Llama3.2 - Vision*, Wav2Lip*, Whisper*, and LLaVA*.
  - Preview: support for Flax*, a high-performance Python* neural network library based on JAX*. Its modular design allows for easy customization and accelerated inference on GPUs.
- Broader Large Language Model (LLM) support and more model compression techniques
  - Optimizations for built-in GPUs on Intel® Core™ Ultra Processors (Series 1) and Intel® Arc™ Graphics include KV cache compression for memory reduction, improved usability, and model load time optimizations that reduce first-token latency for LLMs.
  - Dynamic quantization has been enabled to improve first-token latency for LLMs on the built-in GPUs of Intel® Core™ Ultra Processors (Series 1), without impacting accuracy. Second-token latency also improves for large-batch inference.
  - A new method for generating synthetic text data has been implemented in the Neural Network Compression Framework (NNCF), allowing LLMs to be compressed more accurately using data-aware methods even when no dataset is available. This feature will soon be accessible via Optimum Intel on Hugging Face.
- More portability and performance to run AI at the edge, in the cloud, or locally
  - Support for Intel® Xeon® 6 Processors with P-cores (formerly codenamed Granite Rapids) and Intel® Core™ Ultra 200S series processors (formerly codenamed Arrow Lake-S).
  - Preview: the GenAI API enables multimodal AI deployment with support for multimodal pipelines for improved contextual awareness, transcription pipelines for easy audio-to-text conversion, and image generation pipelines for streamlined text-to-visual conversion.
  - A speculative decoding feature has been added to the GenAI API for improved performance and efficient text generation, using a small draft model that is periodically corrected by the full-size model (see the first sketch after this list).
  - Preview: LoRA adapters are now supported in the GenAI API, letting developers quickly and efficiently customize image and text generation models for specialized tasks (see the second sketch after this list).
  - The GenAI API now also supports LLMs on NPU, allowing developers to specify NPU as the target device, specifically for the Whisper pipeline (whisper-base, whisper-medium, and whisper-small) and the LLM pipeline (Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct, and Phi-3 Mini-Instruct). Use NPU driver version 32.0.100.3104 or later for best performance (see the third sketch after this list).
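For illustration, here is a minimal sketch of speculative decoding with the OpenVINO GenAI Python package, assuming openvino-genai 2024.5 is installed and that both the main and draft models have already been exported to OpenVINO IR; the directory names are placeholders.

```python
import openvino_genai as ov_genai

# Placeholder paths to exported OpenVINO IR model directories.
MAIN_MODEL_DIR = "Llama-3-8B-ov"
DRAFT_MODEL_DIR = "Llama-3.2-1B-ov"

# The small draft model proposes tokens; the full-size model verifies
# and corrects them, which is what speeds up generation.
pipe = ov_genai.LLMPipeline(
    MAIN_MODEL_DIR,
    "CPU",
    draft_model=ov_genai.draft_model(DRAFT_MODEL_DIR, "CPU"),
)

config = ov_genai.GenerationConfig()
config.max_new_tokens = 100
# Number of candidate tokens the draft model proposes per step.
config.num_assistant_tokens = 5

print(pipe.generate("What is OpenVINO?", config))
```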
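Next, a sketch of attaching a LoRA adapter through the GenAI API. The Adapter/AdapterConfig usage follows the pattern in the GenAI samples, but the paths and the alpha blending weight are illustrative assumptions.

```python
import openvino_genai as ov_genai

# Illustrative paths; substitute your exported IR model and LoRA adapter.
adapter = ov_genai.Adapter("lora_adapter.safetensors")
adapter_config = ov_genai.AdapterConfig()
adapter_config.add(adapter, 0.75)  # alpha blending weight (illustrative)

pipe = ov_genai.LLMPipeline("Llama-3-8B-ov", "CPU", adapters=adapter_config)
print(pipe.generate("Write a haiku about spring.", max_new_tokens=64))
```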
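Finally, targeting the NPU is essentially a device-string change. A sketch using the Whisper pipeline, assuming an exported whisper-base IR directory and librosa for loading 16 kHz mono audio (both assumptions, following the GenAI Whisper sample):

```python
import librosa  # assumption: used only to load and resample the audio
import openvino_genai as ov_genai

# Placeholder path to an exported whisper-base OpenVINO IR directory.
pipe = ov_genai.WhisperPipeline("whisper-base-ov", "NPU")

# Whisper expects 16 kHz mono input.
raw_speech, _ = librosa.load("speech.wav", sr=16000)
print(pipe.generate(raw_speech.tolist()))
```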
Support Change and Deprecation Notices
- Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO™ version supporting them. For more details, refer to the OpenVINO Legacy Features and Components page.
- Discontinued in 2024.0:
  - Runtime components:
    - Intel® Gaussian & Neural Accelerator (Intel® GNA). Consider using the Neural Processing Unit (NPU) for low-powered systems such as Intel® Core™ Ultra or 14th-generation processors and beyond.
    - OpenVINO C++/C/Python 1.0 APIs (see the 2023.3 API transition guide for reference).
    - All ONNX* Frontend legacy API (known as ONNX_IMPORTER_API).
    - 'PerformanceMode.UNDEFINED' property as part of the OpenVINO Python API.
  - Tools:
    - Deployment Manager. See the installation and deployment guides for current distribution options.
    - Accuracy Checker.
    - Post-Training Optimization Tool (POT). The Neural Network Compression Framework (NNCF) should be used instead.
    - A Git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
    - Support for Apache* MXNet, Caffe*, and Kaldi* model formats. Conversion to ONNX may be used as a solution (see the sketch after this list).
- Deprecated and to be removed in the future:
  - The macOS x86_64 debug bins will no longer be provided with the OpenVINO toolkit, starting with OpenVINO 2024.5.
  - Python 3.8 is no longer supported, starting with OpenVINO 2024.5.
    - As MXNet does not support Python versions higher than 3.8 (per the MXNet PyPI project), it is no longer supported by OpenVINO either.
  - Discrete Keem Bay support is discontinued, starting with OpenVINO 2024.5.
  - Support for discrete devices (formerly codenamed Raptor Lake) is no longer available for NPU.
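As a migration path for the retired MXNet, Caffe, and Kaldi frontends mentioned above, a model exported to ONNX can be converted with the standard openvino.convert_model API. A minimal sketch; the file names are placeholders:

```python
import openvino as ov

# First export the legacy-framework model to ONNX using that framework's
# own tooling (e.g., MXNet's ONNX export), then convert the result to IR.
model = ov.convert_model("model.onnx")

# Save as OpenVINO IR (writes model.xml plus a model.bin weights file).
ov.save_model(model, "model.xml")
```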
Installation instructions
You can choose how to install OpenVINO™ Runtime according to your operating system:
- Install OpenVINO Runtime on Linux*
- Install OpenVINO Runtime on Windows*
- Install OpenVINO Runtime on macOS*
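Whichever route you choose, a quick way to confirm the Python API is working after installation is to query the runtime version and the devices it can see (output varies by machine):

```python
import openvino as ov

# Print the installed runtime version.
print(ov.get_version())

# List devices visible to the runtime on this machine (e.g., CPU, GPU, NPU).
core = ov.Core()
print(core.available_devices)
```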
What's included in the download package
- OpenVINO™ Runtime/Inference Engine for C/C++ and Python APIs
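As a taste of the Python API included in the package, here is a minimal compile-and-infer sketch. The IR path is a placeholder, and the random input assumes a model with a single static-shaped input:

```python
import numpy as np
import openvino as ov

core = ov.Core()
# Placeholder IR path; any converted model works.
model = core.read_model("model.xml")
compiled = core.compile_model(model, "AUTO")

# Build a random input matching the model's first input shape.
input_port = compiled.input(0)
data = np.random.rand(*input_port.shape).astype(np.float32)

# Run inference and print the shape of the first output.
result = compiled([data])
print(result[compiled.output(0)].shape)
```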
Product and Performance Information
Intel is in the process of removing non-inclusive language from our current documentation, user interfaces, and code. Please note that retroactive changes are not always possible, and some non-inclusive language may remain in older documentation, user interfaces, and code.