Fine-Tune GPT-2* with Hugging Face* and Intel® Gaudi® Accelerators



This tutorial demonstrates fine-tuning a GPT-2* model on Intel® Gaudi® AI processors using the Hugging Face* Optimum for Intel library with Microsoft DeepSpeed*.


Fine-Tune Defined

Training models from scratch can be expensive, especially with today’s large-scale models. Depending on the model size and scale, the hardware needed to train such models can cost anywhere from thousands to millions of dollars. Fine-tuning is the process of taking a neural network that has already been trained (usually called a pretrained model) and updating it to perform a specific task. When the original task is similar to the new one, starting from a pretrained model lets us reuse the features the network has already learned to extract, without having to design and train a model from scratch.
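As an illustration of the idea, here is a minimal PyTorch sketch of one common form of fine-tuning, in which the pretrained layers are frozen and only a new task-specific head is trained. The tiny `backbone` and `head` modules are hypothetical stand-ins: in practice the backbone would be loaded from a pretrained checkpoint, and the fine-tuning run later in this tutorial updates all of GPT-2's weights rather than freezing any of them.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained feature extractor (normally loaded from a checkpoint)
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
# New, randomly initialized head for the downstream task (3 classes here)
head = nn.Linear(16, 3)

# Freeze the pretrained weights so that only the head is updated
for param in backbone.parameters():
    param.requires_grad = False

model = nn.Sequential(backbone, head)
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)

# One training step on a random mini-batch for the new task
inputs = torch.randn(4, 8)
labels = torch.tensor([0, 1, 2, 1])
backbone_before = backbone[0].weight.clone()
head_before = head.weight.clone()

loss = nn.functional.cross_entropy(model(inputs), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After the step, the backbone weights are unchanged and only the head has moved. Unfreezing the backbone (and usually lowering the learning rate) turns this into full fine-tuning.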

This blog focuses on transformers. Pretrained transformers can be fine-tuned quickly for many downstream tasks and perform well on them. Consider a pretrained transformer model that already understands language: fine-tuning then trains it to perform question answering, language generation, named-entity recognition, sentiment analysis, or a similar task.

Given the cost and complexity of training large models, using pretrained models is an appealing approach, and many pretrained models are publicly available. This blog uses Hugging Face, the most popular open source transformer library. The Hugging Face Hub hosts a wide variety of pretrained transformer models, and the Hugging Face Transformers library makes it easy to load these models for fine-tuning.

Use Pretrained GPU Models to Fine-Tune on Intel Gaudi AI Processors and Vice Versa

Although pretraining is done on a specific hardware architecture, the saved pretrained model can be used on other architectures. For example, you can pretrain a model on an Intel Gaudi AI processor, save it, and later fine-tune it on a CPU. Or you can load a publicly available model that was originally pretrained on a GPU and continue training or fine-tuning it on an Intel Gaudi AI processor.
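At the PyTorch level, a checkpoint is portable because it stores plain tensors that can be remapped to a different device when loaded. The minimal sketch below uses a tiny hypothetical `nn.Linear` model and stays on the CPU so that it runs anywhere: it saves a state dict and reloads it with `torch.load(..., map_location="cpu")`. Hugging Face's `save_pretrained` and `from_pretrained` handle the same step at a higher level.

```python
import io

import torch
from torch import nn

# Hypothetical tiny model standing in for a pretrained network. In the scenario above,
# the source device would be a GPU or an HPU; this sketch stays on CPU to run anywhere.
source_model = nn.Linear(4, 2)

# Save only the weights (the state dict), as checkpoints usually do
buffer = io.BytesIO()
torch.save(source_model.state_dict(), buffer)
buffer.seek(0)

# map_location remaps every saved tensor onto the target device at load time,
# so a checkpoint written on one device type can be restored on another
state_dict = torch.load(buffer, map_location="cpu")

target_model = nn.Linear(4, 2)
target_model.load_state_dict(state_dict)
```

On a Gaudi system, moving the restored model with `target_model.to("hpu")` would be the next step before continuing training.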

Start with Intel Gaudi Software and Hugging Face

Set up an Amazon EC2* DL1 instance with the latest Intel Gaudi software. For full instructions, see the AWS DL1 Quick Start Guide.

Start the Docker* Container

Pull a PyTorch* container from the PyTorch Docker Images for the Intel Gaudi Accelerator. The commands below use the 1.6.1 release; to use the latest release, replace the version numbers in the image path.

docker pull vault.habana.ai/gaudi-docker/1.6.1/ubuntu20.04/habanalabs/pytorch-installer-1.12.0:latest


docker run -it \
--runtime=habana \
-e HABANA_VISIBLE_DEVICES=all \
-e OMPI_MCA_btl_vader_single_copy_mechanism=none \
--cap-add=sys_nice \
--net=host \
--ipc=host vault.habana.ai/gaudi-docker/1.6.1/ubuntu20.04/habanalabs/pytorch-installer-1.12.0:latest
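Once inside the container, a quick sanity check is to confirm that PyTorch can see the Gaudi devices exposed by `HABANA_VISIBLE_DEVICES=all`. The sketch below assumes the `habana_frameworks.torch.hpu` module that ships with the Intel Gaudi PyTorch images; `count_visible_hpus` is a hypothetical helper name. On a DL1 instance, which has eight Gaudi accelerators, you would expect it to report 8.

```python
def count_visible_hpus():
    """Return the number of Gaudi devices PyTorch can see, or 0 if the Habana bridge is missing."""
    try:
        # Part of the Intel Gaudi PyTorch bridge preinstalled in the Gaudi Docker images
        import habana_frameworks.torch.hpu as hthpu
    except ImportError:
        return 0
    return hthpu.device_count() if hthpu.is_available() else 0


if __name__ == "__main__":
    print(f"Visible HPUs: {count_visible_hpus()}")
```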


Create the Model Folder

cd ~
mkdir gpt2
cd gpt2


Clone Optimum for Intel from Hugging Face and Set Up the Requirements

git clone https://github.com/huggingface/optimum-habana.git
cd optimum-habana
python3 setup.py install
cd examples/language-modeling
pip install -r requirements.txt


Install Microsoft DeepSpeed*

pip install git+https://github.com/HabanaAI/DeepSpeed.git
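Before writing the training script, you can check that both installs landed in the Python environment the container uses. This sketch relies only on the standard library's `importlib.util.find_spec`; `check_packages` is a hypothetical helper, and `optimum.habana` and `deepspeed` are assumed to be the import names of the two packages installed above.

```python
from importlib import util


def check_packages(module_names):
    """Map each module name to True if it can be found in this Python environment."""
    results = {}
    for name in module_names:
        try:
            results[name] = util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for a dotted name whose parent package (e.g. "optimum") is missing
            results[name] = False
    return results


if __name__ == "__main__":
    for name, found in check_packages(["optimum.habana", "deepspeed"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```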


Fine-Tune the Model

Return to the GPT-2 folder.

cd ~/gpt2

 

Create a new file called main.py with the following content:

from optimum.habana.distributed import DistributedRunner

# Arguments passed to run_clm.py as "--key value" command-line flags
training_args = {
    "output_dir": "/tmp/clm_gpt2_xl",
    "dataset_name": "wikitext",
    "dataset_config_name": "wikitext-2-raw-v1",
    "num_train_epochs": 1,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "gradient_checkpointing": True,
    "do_train": True,
    "do_eval": True,
    "overwrite_output_dir": True,
}

model_name = "gpt2-xl"

training_args["model_name_or_path"] = model_name

training_args["use_habana"] = True                  # Whether to use HPUs or not
training_args["use_lazy_mode"] = True               # Whether to use lazy or eager mode
training_args["gaudi_config_name"] = "Habana/gpt2"  # Gaudi configuration to use

# DeepSpeed ZeRO stage 2 configuration from the optimum-habana repository cloned earlier
training_args["deepspeed"] = "optimum-habana/tests/configs/deepspeed_zero_2.json"

# Build the command to execute
training_args_command_line = " ".join(f"--{key} {value}" for key, value in training_args.items())
command = f"optimum-habana/examples/language-modeling/run_clm.py {training_args_command_line}"

# Instantiate a distributed runner
distributed_runner = DistributedRunner(
    command_list=[command],  # The command(s) to execute
    world_size=8,            # The number of HPUs
    use_deepspeed=True,      # Enable DeepSpeed
)

# Launch training
ret_code = distributed_runner.run()
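The command string in main.py is built by turning each entry of `training_args` into a `--key value` pair. The sketch below isolates that step in a hypothetical `build_command` helper, so you can print and inspect the exact command before launching a long run. Booleans are rendered as `True` or `False`, which the Transformers argument parser used by run_clm.py accepts after a boolean flag.

```python
def build_command(script_path, args):
    """Turn a dict of arguments into the 'script --key value ...' string used in main.py."""
    flags = " ".join(f"--{key} {value}" for key, value in args.items())
    return f"{script_path} {flags}"


example_args = {
    "model_name_or_path": "gpt2-xl",
    "num_train_epochs": 1,
    "do_train": True,
}
print(build_command("run_clm.py", example_args))
# run_clm.py --model_name_or_path gpt2-xl --num_train_epochs 1 --do_train True
```

This simple join does not quote anything, so a value containing spaces would likely need quoting (for example with `shlex.quote`) before being passed along.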

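The main.py script fine-tunes the pretrained GPT-2 XL model (gpt2-xl) on the WikiText-2 dataset. It launches distributed training with DeepSpeed across eight Intel Gaudi AI processors (world_size=8, matching the eight accelerators of a DL1 instance); adjust world_size to the number of devices you have. Because model_name_or_path is set, run_clm.py initializes the weights from the pretrained checkpoint instead of training the model from scratch.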
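In data-parallel training, each of the eight processes receives its own per-device batch, so the number of samples consumed per optimizer step is the per-device batch size multiplied by the number of devices, and by `gradient_accumulation_steps`, which main.py leaves at its default of 1. The arithmetic is sketched below with hypothetical helper names; the 1,000-sample dataset size is a made-up figure, not the size of WikiText-2.

```python
import math


def global_batch_size(per_device_batch, num_devices, grad_accum_steps=1):
    """Samples consumed per optimizer step in data-parallel training."""
    return per_device_batch * num_devices * grad_accum_steps


def steps_per_epoch(num_samples, batch):
    """Optimizer steps needed to see every sample once (a final partial batch counts as a step)."""
    return math.ceil(num_samples / batch)


batch = global_batch_size(per_device_batch=4, num_devices=8)
print(batch)  # 32
print(steps_per_epoch(1000, batch))  # hypothetical 1,000 samples -> 32 steps
```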

Run the code using the following command:

python3.8 main.py

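When the run completes, the training and evaluation metrics are printed to the console, and the fine-tuned model, its tokenizer, and the metrics are saved in /tmp/clm_gpt2_xl.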
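Causal language modeling scripts typically report perplexity as the exponential of the mean evaluation loss. The sketch below assumes run_clm.py behaves like the upstream Transformers example, which writes its metrics, including `eval_loss`, to `all_results.json` in `output_dir`; the helper names are hypothetical.

```python
import json
import math
from pathlib import Path


def perplexity_from_loss(eval_loss):
    """Perplexity is exp(mean cross-entropy loss); an overflow means the loss is huge."""
    try:
        return math.exp(eval_loss)
    except OverflowError:
        return float("inf")


def metrics_from_results(results):
    """Derive perplexity from a parsed results dict that contains 'eval_loss'."""
    loss = results["eval_loss"]
    return {"eval_loss": loss, "perplexity": perplexity_from_loss(loss)}


def read_eval_metrics(output_dir):
    """Load all_results.json (assumed location) from output_dir and derive perplexity."""
    results = json.loads((Path(output_dir) / "all_results.json").read_text())
    return metrics_from_results(results)
```

After training, `read_eval_metrics("/tmp/clm_gpt2_xl")` would return the loss and perplexity; a lower perplexity means the model fits the held-out text better.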

Use the New Fine-Tuned Model for Text Prediction

Create a new file called test.py with the following content:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Importing the Intel Gaudi PyTorch bridge makes the "hpu" device available
import habana_frameworks.torch.core as htcore

# The sequence to complete
prompt_text = "Contrary to the common belief, Chocolate is actually good for you because "

path_to_model = "/tmp/clm_gpt2_xl"  # the output_dir of the fine-tuning run in main.py

device = torch.device("hpu")

# Load the fine-tuned tokenizer and model, then move the model to the HPU
tokenizer = GPT2Tokenizer.from_pretrained(path_to_model)
model = GPT2LMHeadModel.from_pretrained(path_to_model)
model.to(device)

# Encode the prompt
encoded_prompt = tokenizer.encode(prompt_text, add_special_tokens=False, return_tensors="pt")
encoded_prompt = encoded_prompt.to(device)

# Sample a continuation of at most 16 new tokens after the prompt
output_sequences = model.generate(
    input_ids=encoded_prompt,
    max_length=16 + len(encoded_prompt[0]),
    do_sample=True,
    num_return_sequences=1,
)

# Remove the batch dimension when returning multiple sequences
if len(output_sequences.shape) > 2:
    output_sequences.squeeze_()

# Length of the prompt as the tokenizer decodes it (may differ slightly from prompt_text)
decoded_prompt_len = len(tokenizer.decode(encoded_prompt[0], clean_up_tokenization_spaces=True))

generated_sequences = []

for generated_sequence_idx, generated_sequence in enumerate(output_sequences):
    print(f"=== GENERATED SEQUENCE {generated_sequence_idx + 1} ===")
    generated_sequence = generated_sequence.tolist()

    # Decode text
    text = tokenizer.decode(generated_sequence, clean_up_tokenization_spaces=True)

    # Drop the decoded prompt, keeping only the newly generated text
    continuation = text[decoded_prompt_len:]

    # Remove all text after the stop token (the first period), if one was generated
    stop_idx = continuation.find(".")
    if stop_idx != -1:
        continuation = continuation[:stop_idx]

    # Add the original prompt back at the beginning of the sequence
    total_sequence = prompt_text + continuation

    generated_sequences.append(total_sequence)
    print(total_sequence)

Run the code using the following command:

python3.8 test.py

The script prints the prompt followed by the model's generated continuation. Because do_sample=True, the continuation differs from run to run.
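The string post-processing at the end of test.py can be tried without a model or an HPU. This sketch restates it as a hypothetical `trim_continuation` helper: it strips the decoded prompt from the decoded output, cuts the continuation at the first period, and keeps the whole continuation when no period was generated (the case in which `str.find` returns -1). The example strings are made up, not model output.

```python
def trim_continuation(prompt_text, decoded_prompt, decoded_output, stop_token="."):
    """Rebuild 'prompt + continuation', cutting the continuation at the first stop token.

    decoded_output is the full decoded generation, which starts with decoded_prompt.
    """
    continuation = decoded_output[len(decoded_prompt):]
    stop_idx = continuation.find(stop_token)
    if stop_idx != -1:  # str.find returns -1 when the stop token never appears
        continuation = continuation[:stop_idx]
    return prompt_text + continuation


prompt = "Chocolate is good for you because "
print(trim_continuation(prompt, prompt, prompt + "it contains antioxidants. It also"))
# Chocolate is good for you because it contains antioxidants
```

Passing the decoded prompt separately matters because the tokenizer's decoded text can differ slightly from the original prompt string, so slicing by its length removes exactly the prompt portion of the output.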

What’s Next?

You can try different prompts and different configurations for running the model. For more information, see the Hugging Face Optimum Habana GitHub page and the Habana Developer site.