Software Installation or Running the Docker* image for Intel Gaudi AI Accelerators
In most cases, when you get direct access to an Intel Gaudi node or virtual machine from a cloud service provider, the software is pre-installed and you only need to load the latest Docker image to run your workloads. Follow the steps in the section below to verify the installed software. If you are working on a new Intel Gaudi node or want to install all the software manually, refer to the detailed installation guide. We provide an automated installer script (habanalabs-installer.sh) to install or update the Intel Gaudi Software. You can use it to set up a bare metal environment, where the software is installed directly on the machine, or a virtual environment. You can also use the script to set up the platform to run the Docker image, which requires the base system software installation: the Intel Gaudi Software driver, firmware, and firmware tools.
To run the Docker image, follow these steps:
- Run the hl-smi command to verify the version of the software that is currently running. The version information appears at the top of the output, for example:
HL-SMI Version: hl-1.18.0-XXXXXXX
Driver Version: 1.18.0-XXXXXX
Check support documentation and confirm base installation
- Refer to the support matrix in the documentation to see which Docker image versions are compatible with the driver and software.
- Confirm that the base system software is installed (see below) and that the Docker container runtime is active by running the docker images command. If the Docker runtime is not functional, refer to the installation guide.
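The runtime check above can be scripted. The sketch below assumes that `docker info` prints a "Runtimes:" line and that the Intel Gaudi runtime is registered under the name `habana`, the same name passed to `--runtime=habana` in the docker run example; the function name `has_habana_runtime` is hypothetical. It reads text on stdin so it can be tried on sample output.

```shell
# Hypothetical helper: succeed if `docker info` text on stdin lists a runtime
# named "habana" (assumed to match the name used by --runtime=habana).
has_habana_runtime() {
  grep -qE '^[[:space:]]*Runtimes:(.*[[:space:]])?habana([[:space:]]|$)'
}

# Usage on the host (not run here):
#   docker images >/dev/null && echo "Docker daemon is reachable"
#   docker info 2>/dev/null | has_habana_runtime && echo "habana runtime registered"
```

If the check fails, the installation guide is the place to look for registering the container runtime.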
Select the appropriate OS and PyTorch versions, and then pull and run the matching Docker image. The example below shows how to pull and run the Docker image for version 1.18.0 on Ubuntu* 22.04:
docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
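The image reference in the example combines the software release, the OS, and the PyTorch version. A minimal sketch that assembles it from those parts, assuming other combinations follow the same path layout as this single 1.18.0 / Ubuntu 22.04 / PyTorch 2.4.0 example (always confirm the combination against the support matrix); `gaudi_image` is a hypothetical helper name.

```shell
# Hypothetical helper: build an image reference from release, OS and PyTorch
# version. The layout is inferred from the one example above, not from a spec.
gaudi_image() {
  local release="$1" os="$2" torch="$3"
  printf 'vault.habana.ai/gaudi-docker/%s/%s/habanalabs/pytorch-installer-%s:latest\n' \
    "$release" "$os" "$torch"
}

# Usage (not run here):
#   img="$(gaudi_image 1.18.0 ubuntu22.04 2.4.0)"
#   docker pull "$img"
#   docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all \
#     -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
#     --cap-add=sys_nice --net=host --ipc=host "$img"
```

Keeping the version strings in one place makes it harder to pull an image whose release does not match the installed driver.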
- Check Installation: Follow these steps to confirm which version is installed on the platform. For full details on managing the Intel Gaudi Software on the platform, refer to the detailed installation guide.
The Intel Gaudi Software stack consists of two components:
- The base system software: the Intel Gaudi firmware, drivers, and core software.
- The platform software: the Intel Gaudi versions of PyTorch* and other packages.
The first step is to determine which version of the firmware and driver is loaded on the system. To do so, run the hl-smi command and check the HL-SMI Version at the top of the output. For example, if version 1.18.0 is installed, the output should look like this:
HL-SMI Version: hl-1.18.0-XXXXXXX
Driver Version: 1.18.0-XXXXXX
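To compare the installed release against the version you plan to use, the release number can be pulled out of the HL-SMI Version line. A minimal sketch, assuming the line contains the release as an x.y.z number as in the sample above; `hl_smi_release` is a hypothetical helper that reads hl-smi output on stdin.

```shell
# Hypothetical helper: print the first x.y.z number on the "HL-SMI Version"
# line of hl-smi output read from stdin ("hl-1.18.0-XXXXXXX" -> "1.18.0").
hl_smi_release() {
  grep -m1 'HL-SMI Version' | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1
}

# Usage on a Gaudi node (not run here):
#   hl-smi | hl_smi_release
```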
To verify the base system software, run the following command:
apt list --installed | grep habana
For a correct base system software installation, you should see a list similar to the following:
habanalabs-container-runtime/focal,now 1.18.0-524 amd64 [installed]
habanalabs-dkms/focal,focal,now 1.18.0-524 all [installed]
habanalabs-firmware/focal,now 1.18.0-524 amd64 [installed]
habanalabs-firmware-odm/focal,now 1.18.0-524 amd64 [installed]
habanalabs-firmware-tools/focal,now 1.18.0-524 amd64 [installed]
habanalabs-graph/focal,now 1.18.0-524 amd64 [installed]
habanalabs-rdma-core/now 1.18.0-524 all [installed]
habanalabs-qual/focal,now 1.18.0-524 amd64 [installed]
habanalabs-thunk/focal,focal,now 1.18.0-524 all [installed]
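Checking this list by eye is error-prone, so the sketch below reads `apt list --installed` output on stdin and reports any package from the list above that is missing or not at the expected release. The package set is copied from the example output and may differ between releases; `check_base_packages` is a hypothetical helper name. The `2>/dev/null` in the usage line hides the warning apt prints on stderr when its output is piped.

```shell
# Hypothetical helper: report missing or mismatched base system packages.
# $1 is the expected release, e.g. 1.18.0. Returns non-zero on any problem.
check_base_packages() {
  local want="$1" listing pkg ver rc=0
  listing="$(cat)"
  # Package names taken from the example listing above (an assumption).
  for pkg in container-runtime dkms firmware firmware-odm firmware-tools \
             graph rdma-core qual thunk; do
    # apt lines look like: habanalabs-dkms/focal,now 1.18.0-524 all [installed]
    ver="$(printf '%s\n' "$listing" |
      awk -v p="habanalabs-$pkg" 'index($1, p "/") == 1 { print $2; exit }')"
    if [ -z "$ver" ]; then
      echo "missing: habanalabs-$pkg"; rc=1
    elif [ "${ver%%-*}" != "$want" ]; then
      echo "version mismatch: habanalabs-$pkg $ver"; rc=1
    fi
  done
  return "$rc"
}

# Usage on the host (not run here):
#   apt list --installed 2>/dev/null | check_base_packages 1.18.0
```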
To verify the platform software, run the following command:
pip list | grep habana
For a correct platform software installation, you should see a list similar to the following:
habana_gpu_migration 1.18.0.524
habana-media-loader 1.18.0.524
habana-pyhlml 1.18.0.524
habana-torch-dataloader 1.18.0.524
habana-torch-plugin 1.18.0.524
habana_transformer_engine 1.18.0.524
lightning-habana 1.6.0
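The same idea applies to the Python packages. In the listing above, the habana* packages carry the release number (1.18.0.524) while lightning-habana shows 1.6.0, so it appears to be versioned independently and the sketch below skips it. `check_platform_packages` is a hypothetical helper that reads `pip list` output on stdin.

```shell
# Hypothetical helper: print habana* pip packages whose version does not start
# with the expected release ($1, e.g. 1.18.0). lightning-habana is skipped
# because it appears to follow its own version numbering.
check_platform_packages() {
  awk -v want="$1" '
    $1 ~ /habana/ && $1 != "lightning-habana" && index($2, want ".") != 1 {
      print "version mismatch: " $1 " " $2; bad = 1
    }
    END { exit (bad ? 1 : 0) }'
}

# Usage inside the container (not run here):
#   pip list | grep habana | check_platform_packages 1.18.0
```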
You can then download the habanalabs-installer.sh script for the required version (this example uses 1.18.0; change the version in the path as needed) and follow the installation guide to install the base system software, the platform software, or both.
wget -nv https://vault.habana.ai/artifactory/gaudi-installer/1.18.0/habanalabs-installer.sh
chmod +x habanalabs-installer.sh
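To avoid editing the URL by hand for each release, the download can be parameterized by version. A minimal sketch, assuming other releases use the same path layout as the 1.18.0 URL above; `installer_url` and `fetch_installer` are hypothetical helper names, and the sketch only downloads the script. Running it is covered by the installation guide.

```shell
# Hypothetical helper: installer URL for a release, following the layout of
# the 1.18.0 example above.
installer_url() {
  printf 'https://vault.habana.ai/artifactory/gaudi-installer/%s/habanalabs-installer.sh\n' "$1"
}

# Download the installer for a release and make it executable.
fetch_installer() {
  wget -nv -O habanalabs-installer.sh "$(installer_url "$1")" &&
    chmod +x habanalabs-installer.sh
}

# Usage (not run here):
#   fetch_installer 1.18.0
```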