Stable Code Finetuning with Axolotl

Axolotl is a popular library for finetuning models. In this tutorial, we will use Axolotl to finetune our Stable Code 3B model on FORTRAN code to show you how to create a custom code completion model on a new programming language.

This tutorial is based on the wonderful notebooks from Maxime Labonne’s llm course

import torch
# Check so there is a gpu available, a T4(free tier) is enough to run this notebook
assert (torch.cuda.is_available()==True)

Let’s get the environment set up first. We will go ahead and install the necessary packages.

!pip install torch=="2.1.2"
!pip install -e git+https://github.com/OpenAccess-AI-Collective/axolotl#egg=axolotl
!pip install flash-attn=="2.5.0"
!pip install deepspeed=="0.13.1"

Axolotl relies on a config file written in YAML to specify the model and the dataset and how to train the model. We will create a config file for our finetuning task to finetune the Stable Code 3B model on FORTRAN code using QLoRA. The dataset we are picking is from codeparrot/github-code-clean dataset which is a collection of code scraped from GitHub. Specifically, we will be utilizing the FORTRAN MIT licensed code.

import yaml

# Your YAML string
yaml_string = """
base_model: stabilityai/stable-code-3b
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: codeparrot/github-code-clean
    name: FORTRAN-mit
    type: completion
    field: code
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./qlora-out

adapter: qlora
lora_model_dir:

sequence_len: 1096
sample_packing: true
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

mlflow_experiment_name: colab-example

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 4
max_steps: 20
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: false
fp16: true
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false

warmup_steps: 10
evals_per_epoch:
saves_per_epoch:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:

"""

# Convert the YAML string to a Python dictionary
yaml_dict = yaml.safe_load(yaml_string)

# Specify your file path
file_path = 'test_axolotl.yaml'

# Write the YAML file
with open(file_path, 'w') as file:
    yaml.dump(yaml_dict, file)

Next, we can launch the run, which should take about XX if you are running this in a Colab environment with a T4 GPU:

# Buy using the ! the comand will be executed as a bash command
!accelerate launch -m axolotl.cli.train /content/test_axolotl.yaml

Once the model has been finetuned, we can run it in a gradio environment:

# Buy using the ! the comand will be executed as a bash command
!accelerate launch -m axolotl.cli.inference /content/test_axolotl.yaml \
    --qlora_model_dir="./qlora-out" --gradio