OpenVINO™ toolkit: An open source AI toolkit that makes it easier to write once, deploy anywhere.
What's New in Version 2024.2
The OpenVINO™ toolkit 2024.2 release enhances generative AI (GenAI) accessibility with improved large language model (LLM) performance and expanded model coverage. It also boosts portability and performance for deployment anywhere: at the edge, in the cloud, or locally.
Latest Features
Easier Model Access and Conversion
| Product | Details |
| --- | --- |
| New Model Support | Support for Phi-3-mini, a family of AI models that leverages the power of small language models for faster, more accurate, and cost-effective text processing. Llama 3 optimizations for CPUs, built-in GPUs, and discrete GPUs deliver improved performance and efficient memory usage. |
| Python* | Python custom operations are now enabled in the OpenVINO toolkit, making it easier for Python developers to code their custom operations instead of using C++ custom operations (also supported). Custom operations let you implement your own specialized operations in any model; a minimal sketch follows this table. |
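To make the Python custom-operation path concrete, here is a minimal, hedged sketch of the subclassing pattern: a toy `AddConstant` op invented for this example. The class and method names (`openvino.Op`, `DiscreteTypeInfo`, `validate_and_infer_types`, `evaluate`) follow our reading of the OpenVINO Python API and should be verified against the 2024.2 documentation.

```python
# Hedged sketch of a Python custom operation; the op itself is a toy
# example, and the exact Op constructor signature is an assumption.
import numpy as np
import openvino as ov
import openvino.runtime.opset13 as ops
from openvino import Op
from openvino.runtime import DiscreteTypeInfo


class AddConstant(Op):
    """Toy custom op that adds a fixed scalar to its single input."""

    class_type_info = DiscreteTypeInfo("AddConstant", "extension")

    def __init__(self, inputs, value=1.0):
        super().__init__(self, inputs)  # per our reading of the Op API
        self.value = value
        self.constructor_validate_and_infer_types()

    def validate_and_infer_types(self):
        # Output keeps the input's element type and shape.
        self.set_output_type(0, self.get_input_element_type(0),
                             self.get_input_partial_shape(0))

    def clone_with_new_inputs(self, new_inputs):
        return AddConstant(new_inputs, self.value)

    def get_type_info(self):
        return AddConstant.class_type_info

    def evaluate(self, outputs, inputs):
        # Reference implementation executed at inference time.
        outputs[0].shape = inputs[0].shape
        outputs[0].data[:] = inputs[0].data + self.value
        return True

    def has_evaluate(self):
        return True


# Build a tiny model around the custom op and run it on CPU.
param = ops.parameter([2, 2], dtype=np.float32)
node = AddConstant([param.output(0)], value=5.0)
model = ov.Model([node], [param], "custom_op_demo")
compiled = ov.Core().compile_model(model, "CPU")
print(compiled(np.zeros((2, 2), dtype=np.float32))[0])  # expect all 5.0
```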
Generative AI and LLM Enhancements
Expanded model support and accelerated inference.
| Feature | Details |
| --- | --- |
| New Jupyter* Notebooks | An expanded set of Jupyter Notebooks ensures better coverage for new models, with several noteworthy notebooks added in this release. |
| Performance Improvements for LLMs | A GPTQ method for 4-bit weight compression was added to the Neural Network Compression Framework (NNCF) for more efficient inference and improved performance of compressed LLMs; a sketch follows this table. LLM performance is significantly improved, and latency reduced, on both built-in and discrete GPUs. |
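As a hedged illustration of the new GPTQ option, the sketch below applies 4-bit weight compression with `nncf.compress_weights()`. The `gptq` flag, the calibration-dataset requirement, and the placeholder model path and random token ids are assumptions from our reading of the NNCF API at the time of this release; check the NNCF documentation for the exact parameters.

```python
# Hedged sketch of 4-bit GPTQ weight compression with NNCF; the IR path
# is a placeholder and random token ids stand in for real prompts.
import numpy as np
import nncf
import openvino as ov

core = ov.Core()
model = core.read_model("llm/openvino_model.xml")  # placeholder IR path

# GPTQ is data-aware, so a small calibration set is required.
samples = [np.random.randint(0, 32000, size=(1, 128)) for _ in range(8)]
calibration = nncf.Dataset(samples, lambda ids: {"input_ids": ids})

compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,  # 4-bit symmetric weights
    ratio=0.8,              # share of weights compressed to 4 bits
    group_size=128,         # per-group quantization granularity
    dataset=calibration,    # calibration data consumed by GPTQ
    gptq=True,              # enable the GPTQ algorithm (assumed flag)
)
ov.save_model(compressed, "llm/openvino_model_int4.xml")
```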
More Portability and Performance
Develop once, deploy anywhere. OpenVINO toolkit enables developers to run AI at the edge, in the cloud, or locally.
| Product | Details |
| --- | --- |
| Model Serving Enhancements | Preview: The OpenVINO model server now supports an OpenAI*-compatible API, continuous batching, and PagedAttention, which enable significantly higher throughput for parallel inferencing, especially on Intel® Xeon® processors serving LLMs to many concurrent users; a client sketch follows this table. The OpenVINO toolkit back end for the NVIDIA Triton* Inference Server now supports dynamic input shapes. TorchServe is integrated through torch.compile with the OpenVINO toolkit back end for easier model deployment, provisioning to multiple instances, model versioning, and maintenance. |
| Intel Hardware Support | A significant improvement in second-token latency and memory footprint for FP16-weight LLMs on CPU platforms with Intel® Advanced Vector Extensions 2 (13th gen Intel® Core™ processors) and Intel® Advanced Vector Extensions 512 (3rd gen Intel® Xeon® Scalable processors), particularly for small batch sizes. Preview: Support for the Intel® Xeon® 6 processor. |
| Generate API | Preview: The new Generate API simplifies text generation with LLMs to only a few lines of code; a sketch follows the serving example below. The API is available through the newly launched OpenVINO Toolkit GenAI package. |
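To make the OpenAI-compatible serving preview concrete, here is a hedged client sketch using the standard `openai` Python package. The host, port, `/v3` base path, and model name are assumptions about a typical OpenVINO model server deployment; consult the model server documentation for the exact endpoint.

```python
# Hypothetical client for the model server's OpenAI-compatible chat
# endpoint; host, port, /v3 path, and model name are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v3",  # assumed server endpoint
    api_key="unused",                     # the server does not check keys
)

stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # as configured on the server
    messages=[{"role": "user", "content": "What is OpenVINO?"}],
    stream=True,  # continuous batching serves many such streams at once
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```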
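And a minimal sketch of the Generate API preview from the new OpenVINO GenAI package (installed as `openvino-genai`); the model directory is a placeholder for an LLM already converted to OpenVINO format.

```python
# Minimal text generation with the preview Generate API; the model
# directory is a placeholder for a converted OpenVINO LLM.
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0-ov", "CPU")
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
```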