Runtime Instructions
The following are the instructions needed to set up the node, the model infrastructure, and the full runtime for the model.
Accessing the Intel Gaudi Node
To access an Intel Gaudi node in the Intel Tiber AI Cloud, go to the Intel Tiber AI Cloud console, open the hardware instances page, select the Intel Gaudi 2 platform for deep learning, and follow the steps to start and connect to the node.
The console will provide an ssh command to log in to the node. It is advisable to add local port forwarding to that command so you can reach a Jupyter Notebook running on the node. For example: ssh -L 8888:localhost:8888 ..
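As a concrete sketch of the forwarding syntax, the login command can be composed as below. The host value is a placeholder, not a value from the console; substitute the address the console gives you:

```shell
# Placeholder -- substitute the user/host string from the Tiber AI Cloud console.
GAUDI_HOST="<node-address>"
LOCAL_PORT=8888    # port you will open in your local browser
REMOTE_PORT=8888   # port Jupyter will listen on, on the node
# -L maps localhost:$LOCAL_PORT on your machine to the node's $REMOTE_PORT.
SSH_CMD="ssh -L ${LOCAL_PORT}:localhost:${REMOTE_PORT} ${GAUDI_HOST}"
echo "$SSH_CMD"
```

Once logged in, starting Jupyter on the node with `jupyter notebook --no-browser --port 8888` lets you open the notebook at localhost:8888 in your local browser.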
Details about setting up Jupyter Notebooks on an Intel Gaudi Platform are available here.
Docker Setup
With access to the node, use the latest Intel Gaudi Docker image. The docker run command below automatically downloads the image, if it is not already present, and starts the container:
docker run -itd --name Gaudi_Docker \
  --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
  --cap-add=sys_nice \
  --net=host --ipc=host \
  vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest
The container is already running in the background (started with the -itd flags); enter it by issuing the following command:
docker exec -it Gaudi_Docker bash
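Inside the container, a quick way to confirm the Gaudi devices are visible is the hl-smi utility (the Gaudi analogue of nvidia-smi, which ships with the image). The guard below is a sketch so the check degrades gracefully if run outside the container:

```shell
# Print Gaudi device status if hl-smi is available; warn otherwise.
if command -v hl-smi >/dev/null 2>&1; then
  hl-smi && HPU_CHECK="ok" || HPU_CHECK="hl-smi reported an error"
else
  HPU_CHECK="hl-smi not found (run this inside the Gaudi container)"
fi
echo "$HPU_CHECK"
```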
More information on Gaudi Docker setup and validation can be found here.
Model Setup
Once the Docker environment is running, install the remaining libraries and model repositories.
Start in the home directory and install the DeepSpeed library. DeepSpeed reduces memory consumption on Intel Gaudi when running large language models.
cd ~
pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.19.0
Now install the Hugging Face Optimum for Intel Gaudi library and clone its GitHub examples, selecting the latest validated release of optimum-habana:
pip install optimum-habana==1.15.0
git clone -b v1.15.0 https://github.com/huggingface/optimum-habana
Finally, change to the language-modeling example directory and install the final set of requirements for running the model:
cd ~/optimum-habana/examples/language-modeling
pip install -r requirements.txt
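Before moving on, it can be worth confirming that the stack installed above is importable. The loop below is a sketch: it only reports versions and prints "missing" for anything it cannot import, rather than failing:

```shell
# Report the version of each key package; fall back to "missing" on import failure.
for pkg in optimum.habana transformers deepspeed; do
  v=$(python3 -c "import importlib; m=importlib.import_module('$pkg'); print(getattr(m, '__version__', '?'))" 2>/dev/null || echo missing)
  echo "$pkg: $v"
done
STACK_CHECK=done
```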
How to Access and Use the Llama 3 Model
Use of the pre-trained model is subject to compliance with third-party licenses, including the “META LLAMA 3 COMMUNITY LICENSE AGREEMENT”. For guidance on the intended use of the LLAMA 3 model, what constitutes misuse and out-of-scope use, who the intended users are, and additional terms, please review the license instructions. Users bear sole liability and responsibility for following and complying with any third-party licenses, and Habana Labs disclaims and will bear no liability with respect to users’ use of or compliance with third-party licenses. To run a gated model such as Llama-3-70b, perform the following steps:
- Have a Hugging Face account and agree to the terms of use of the model in its model card on the Hugging Face Hub
- Create a read token and request access to the Llama 3 model from meta-llama
- Log in to your account using the Hugging Face CLI:
huggingface-cli login --token <your_hugging_face_token_here>
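After logging in, you can confirm the token is recognized with the CLI's whoami subcommand. The guard below is a sketch so the check degrades gracefully if the CLI is not on the path:

```shell
# Confirm the CLI sees your token; warn instead of failing if anything is off.
if command -v huggingface-cli >/dev/null 2>&1; then
  huggingface-cli whoami && HF_AUTH="ok" || HF_AUTH="not logged in"
else
  HF_AUTH="huggingface-cli not found (it is installed with the huggingface_hub package)"
fi
echo "$HF_AUTH"
```

If whoami prints your account name, gated downloads of the Llama 3 weights should succeed, provided your access request on the model card has been approved.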
To run the associated Jupyter Notebook for fine-tuning, see the running and fine-tuning addendum section for Jupyter Notebook setup. You can then run these steps directly in the Jupyter interface.