An Easy Guide to Deploying AI Applications on AI PCs

Boost Your AI Skills Today!

Looking to advance your expertise in AI? Don't miss out on our resource collection at the end of the article. 

What is an AI PC?

AI PCs are the new generation of personal computers that include a central processing unit (CPU), a graphics processing unit (GPU), and a neural processing unit (NPU) to provide power-efficient AI acceleration and handle AI tasks locally. Each processing unit has specific AI acceleration capabilities:

  • CPU: For AI tasks that involve smaller workloads, sequential data, and low-latency inference
  • GPU: For larger workloads, such as training deep neural networks, that require parallel throughput
  • NPU: A dedicated hardware accelerator designed to handle AI workloads on your PC at low power for greater efficiency, instead of sending data to the cloud for processing

Benefits of deployment on AI PCs

  • Power on Multiple Fronts: AI PCs are special because they pack multiple accelerators and a general compute engine on the same chip. This means you can tap into different architectures: the NPU for energy-efficient inference, the iGPU for more demanding parallel tasks, and the CPU for traditional ML and complex operations (see the device-targeting sketch after this list). It's like having a unique tool for every situation!
  • Your Data, Your Rules: With AI PCs, you can run inference right on your device, so there's no need to send your data to third-party cloud services. This keeps your data secure and under your control, all within the comfort of your own premises.
  • No Internet? No Problem: AI PCs eliminate the need for a high-speed internet connection to perform meaningful AI tasks. By cutting out the middleman, your machine can handle AI workloads directly, no matter where you are.
  • Local Compute, Less Cloud Hassle: Since AI PCs run compute tasks locally, developers can deploy AI solutions that leverage the user's device. This means less worrying about cloud infrastructure like load balancing and autoscaling, and more focus on building awesome applications.
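
To make the first point concrete, here is a minimal sketch of per-device targeting with the OpenVINO™ toolkit (assumptions: OpenVINO is installed, and model.xml is a hypothetical model already converted to OpenVINO IR format):

import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # hypothetical pre-converted model

# The same model can be compiled for whichever engine fits the task:
# "CPU" for light low-latency work, "GPU" for parallel-heavy workloads,
# "NPU" for sustained, power-efficient inference
compiled = core.compile_model(model, device_name="NPU")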

Tips for AI PC Beginners

Optimizing an AI PC involves refining both hardware and software to deliver better performance on AI tasks and an efficient user experience. Here are my key ways to optimize an AI PC.

  1. Optimize Hardware Configurations:
  • Use a high-performance CPU, GPU, and NPU to improve AI processing speed
  • Ensure sufficient RAM to handle large datasets and complex models
  • Use fast storage solutions to reduce data loading times
  2. Software Optimizations:
  • Use AI frameworks like PyTorch and TensorFlow with builds optimized for your specific hardware
  • Apply techniques like mixed-precision training, model pruning, and quantization to increase computational speed and reduce memory usage without losing accuracy (see the quantization sketch after this list)
  • Use AI model runtimes and compilers like ONNX Runtime that are optimized for specific hardware
  3. System Maintenance:
  • Avoid thermal throttling by keeping the PC clean and dust free
  • Track resource usage with performance monitoring tools to identify inefficiencies
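
As an illustration of the quantization tip above, here is a minimal sketch using PyTorch's post-training dynamic quantization (the toy model is hypothetical; always validate accuracy after quantizing a real model):

import torch
import torch.nn as nn

# Toy model standing in for a real network (hypothetical)
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

# Dynamic quantization: nn.Linear weights are stored as int8 and
# activations are quantized on the fly, cutting memory use and often latency
quantized = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

print(quantized(torch.randn(1, 256)).shape)  # torch.Size([1, 10])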

Deploy on Intel’s AI PC

Intel’s first AI PC platform features the Intel® Core™ Ultra processor. These processors can run over 500 optimized AI models on a single machine, including Phi-2, Mistral, Llama, BERT, Whisper, and Stable Diffusion 1.5, and support applications such as large language models, diffusion, super resolution, object detection, and computer vision.

Check if your Intel processor has an NPU. To leverage the NPU on an AI PC, Intel offers the Intel® NPU Acceleration Library, a Python library developed to increase the efficiency of AI applications.
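
If you have the OpenVINO™ toolkit installed, one quick way to check for an NPU programmatically is to list the devices the runtime can see (a minimal sketch; the reported device names can vary with driver version):

import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU'] on a Core Ultra machine
if "NPU" in core.available_devices:
    print("NPU detected:", core.get_property("NPU", "FULL_DEVICE_NAME"))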

Example: How to run an LLM on the NPU

  1. Install the Intel NPU Acceleration Library and the transformers library.
    pip install intel-npu-acceleration-library
    pip install transformers
  2. Update your existing LLM inference script with a couple of lines of code.
    # First import the library
    import intel_npu_acceleration_library
    # Call the compile function to offload kernels to the NPU
    model = intel_npu_acceleration_library.compile(model)
    

Full code snippet:

from transformers import AutoTokenizer, TextStreamer, AutoModelForCausalLM
#####code change#######
import intel_npu_acceleration_library

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained(model_id, use_cache=True).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, use_default_system_prompt=True)
tokenizer.pad_token_id = tokenizer.eos_token_id

# Stream generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

print("Compile model for the NPU")
#####code change#######
model = intel_npu_acceleration_library.compile(model)

query = "What is the meaning of life?"
prefix = tokenizer(query, return_tensors="pt")["input_ids"]

# Sampling parameters for generation
generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
)

print("Run inference")
_ = model.generate(**generation_kwargs)

Source for the code: GitHub

Dive into our resource library

Check out our latest documentation, videos, and technical articles on how to deploy large language models with Intel AI PCs and NPUs. This section is designed for developers of every skill level.

What you’ll learn:

  • How to implement large multimodal models on the neural processing unit (NPU)
  • The advantages of optimizing large language models with the OpenVINO™ toolkit
  • How to leverage the OpenVINO™ toolkit to develop a GenAI assistant chatbot on an AI PC

How to get started

Step 1: Watch the video on how to run large multimodal models on the NPU.

This short introductory video explains how to run the LLaVA-Gemma-2b model using the Intel NPU Acceleration Library. Additionally, check out this article, which provides a detailed guide to the Intel NPU Acceleration Library and how to enable the NPU on your laptop to run LMMs.

Step 2: Optimize Large Language Models with the OpenVINO™ Toolkit

This whitepaper explains methods to optimize large language models through compression techniques and the advantages of using the OpenVINO™ toolkit for LLM deployment.
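
As a taste of what weight compression looks like in practice, here is a minimal sketch using the Optimum Intel integration for OpenVINO (assumptions: the optimum-intel package with OpenVINO support is installed; TinyLlama is reused from the earlier example):

from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Export to OpenVINO IR with 8-bit weight compression applied at load time
model = OVModelForCausalLM.from_pretrained(model_id, export=True, load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is the meaning of life?", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))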

Step 3: Building a Gen AI Assistant Chatbot on AI PC

This article walks through a code sample showing how to build a virtual travel assistant chatbot on an AI PC using Intel's OpenVINO™ toolkit.

Follow the above steps to build and deploy GenAI applications on AI PCs!

Additional Resources