Stable Code Finetuning with Axolotl
Axolotl is a popular library for finetuning models. In this tutorial, we will use Axolotl to finetune our Stable Code 3B model on FORTRAN code to show you how to create a custom code completion model on a new programming language.
This tutorial is based on the wonderful notebooks from Maxime Labonne’s llm course
import torch
# Check so there is a gpu available, a T4(free tier) is enough to run this notebook
assert (torch.cuda.is_available()==True)
Let’s get the environment set up first. We will go ahead and install the necessary packages.
!pip install torch=="2.1.2"
!pip install -e git+https://github.com/OpenAccess-AI-Collective/axolotl#egg=axolotl
!pip install flash-attn=="2.5.0"
!pip install deepspeed=="0.13.1"
Axolotl relies on a config file written in YAML to specify the model and
the dataset and how to train the model. We will create a config file for
our finetuning task to finetune the Stable Code 3B model on FORTRAN code
using QLoRA. The dataset we are picking is from
codeparrot/github-code-clean
dataset which is a collection of code
scraped from GitHub. Specifically, we will be utilizing the FORTRAN MIT
licensed code.
import yaml
# Your YAML string
yaml_string = """
base_model: stabilityai/stable-code-3b
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
- path: codeparrot/github-code-clean
name: FORTRAN-mit
type: completion
field: code
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./qlora-out
adapter: qlora
lora_model_dir:
sequence_len: 1096
sample_packing: true
pad_to_sequence_len: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
mlflow_experiment_name: colab-example
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 4
max_steps: 20
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: false
fp16: true
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false
warmup_steps: 10
evals_per_epoch:
saves_per_epoch:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
"""
# Convert the YAML string to a Python dictionary
yaml_dict = yaml.safe_load(yaml_string)
# Specify your file path
file_path = 'test_axolotl.yaml'
# Write the YAML file
with open(file_path, 'w') as file:
yaml.dump(yaml_dict, file)
Next, we can launch the run, which should take about XX if you are running this in a Colab environment with a T4 GPU:
# Buy using the ! the comand will be executed as a bash command
!accelerate launch -m axolotl.cli.train /content/test_axolotl.yaml
Once the model has been finetuned, we can run it in a gradio environment:
# Buy using the ! the comand will be executed as a bash command
!accelerate launch -m axolotl.cli.inference /content/test_axolotl.yaml \
--qlora_model_dir="./qlora-out" --gradio