Notebook for Training Llama 3.1 8B (4-bit Quantized) using SFTTrainer
This training is done in a Kaggle Notebook with the GPU accelerator enabled (a GPU is required for quantized training/inference; a quick availability check follows the imports below).
Install Dependencies
%%capture
!pip install -U transformers[torch] datasets
!pip install -q bitsandbytes trl peft accelerate
!pip install flash-attn --no-build-isolation
!pip install huggingface_hub
Import Modules
from transformers import BitsAndBytesConfig, AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from trl import SFTTrainer
from peft import LoraConfig
from huggingface_hub import login
import torch
from datasets import Dataset
from kaggle_secrets import UserSecretsClient
import os
Remember to generate a token with write access on Hugging Face and add it as a secret in your Kaggle Notebook
hf_token = UserSecretsClient().get_secret("HF_TOKEN_LLAMA3")
login(token=hf_token)
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # Use only GPU 0
Remember to customize this with your own data, using at least 1000 examples.
Data_examples = [
    {"instruction": "Who has taken oath as Prime Minister of India in 2024?",
     "response": "Shri Narendra Modi took oath as Prime Minister of India on 9th June 2024. He is now serving his third consecutive term."},
    # ... add the rest of your examples here
]
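If your examples live in a JSON file instead of being defined inline, you could load them like this; a small optional sketch where the file path is a hypothetical placeholder:
import json

# Optional: load examples from a JSON file instead of defining them inline.
# "/kaggle/input/my-dataset/examples.json" is a hypothetical path; replace it with your own file.
with open("/kaggle/input/my-dataset/examples.json") as f:
    Data_examples = json.load(f)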
Process the data so that each example becomes a single text field
processed_data = []
for example in Data_examples:
    processed_data.append({"text": f"{example['instruction']} \n {example['response']}"})
Create a Dataset from the list of dictionaries
dataset = Dataset.from_list(processed_data)
Split into train and test datasets
dataset = dataset.train_test_split(test_size=0.01)
# Access train and test splits
train_dataset = dataset['train']
test_dataset = dataset['test']
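A quick optional check of the split sizes and one formatted example:
# Optional: inspect the split sizes and a sample record.
print(f"train: {len(train_dataset)} examples, test: {len(test_dataset)} examples")
print(train_dataset[0]["text"])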
First, add the model to the Kaggle notebook by navigating to Add Input and adding Llama 3.1 8B Instruct to our notebook
model_path="/kaggle/input/llama-3.1/transformers/8b-instruct/2" # Change it according to your model path in Notebook
trained_model_name = "Llama-3-8B-instruct-4bit-finetuned"
output_dir = '/kaggle/working/' + trained_model_name
Set configs for 4-bit quantization (QLoRA)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
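Note: bfloat16 compute needs an Ampere-or-newer GPU; on Kaggle's T4/P100 you may have to fall back to float16. An optional sketch for picking the dtype dynamically (you would then pass bnb_4bit_compute_dtype=compute_dtype above instead of hard-coding bfloat16):
# Optional: choose the compute dtype based on what the GPU supports.
compute_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
print(f"Using compute dtype: {compute_dtype}")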
peft_config = LoraConfig(
r=16,
lora_alpha=16,
lora_dropout=0.1,
bias="none",
task_type="CAUSAL_LM",
target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
Load the Model and Tokenizer and set pad token
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quantization_config,
    device_map="auto",
)
Use eos_token as pad_token
tokenizer.pad_token = tokenizer.eos_token
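Optionally, check how much memory the 4-bit model occupies (get_memory_footprint is a standard transformers model method):
# Optional: rough GPU memory used by the quantized weights, in GB.
print(f"Model memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")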
Set Training configurations
training_args = TrainingArguments(
    fp16=False,  # set fp16=True (and bf16=False) instead when the GPU does not support bf16
    bf16=True,
    do_eval=True,
    eval_strategy="epoch",
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    learning_rate=2.0e-05,
    log_level="info",
    logging_steps=5,
    logging_strategy="steps",
    lr_scheduler_type="cosine",
    max_steps=-1,
    num_train_epochs=1,  # number of full passes over the training dataset
    output_dir=output_dir,
    overwrite_output_dir=True,
    per_device_eval_batch_size=8,   # reduce if out-of-memory errors occur
    per_device_train_batch_size=8,  # reduce if out-of-memory errors occur
    report_to="none",  # skip wandb logging
    save_strategy="no",
    save_total_limit=None,
)
Set up the Trainer (supervised fine-tuning). Note: with newer trl versions, dataset_text_field, packing, and max_seq_length may need to be passed via an SFTConfig instead of directly to SFTTrainer.
trainer = SFTTrainer(
    model=model,  # the quantized model loaded above
    args=training_args,
    train_dataset=train_dataset,  # if training fails, try reducing the dataset size
    eval_dataset=test_dataset,
    dataset_text_field="text",
    tokenizer=tokenizer,
    packing=False,  # True packs several short examples into one sequence of up to max_seq_length, giving fewer but denser training samples
    peft_config=peft_config,
    max_seq_length=1024,
)
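Optionally, confirm that only the LoRA adapter weights are trainable; since a peft_config was passed, trainer.model should be a PEFT-wrapped model exposing the standard print_trainable_parameters method:
# Optional: show how many parameters the LoRA adapters actually train.
trainer.model.print_trainable_parameters()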
Train the model now. Note: it may take a long time (several minutes to hours) depending on your dataset size
Clear the CUDA cache first (useful after an unsuccessful run)
torch.cuda.empty_cache()
train_result = trainer.train()
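Optionally, log and save the metrics returned by trainer.train() using the standard Trainer helpers:
# Optional: log and persist the training metrics.
metrics = train_result.metrics
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)
trainer.save_state()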
Save the model in the notebook (in output_dir)
trainer.save_model()
Merge LoRA with the base model and save the merged model
merged_model = trainer.model.merge_and_unload()
merged_model.save_pretrained("merged_model",safe_serialization=True)
tokenizer.save_pretrained("merged_model")
Push the merged model to the Hugging Face Hub (you must already be logged in; replace username/model_name with your own repo id)
merged_model.push_to_hub("username/model_name")
tokenizer.push_to_hub("username/model_name")
-------------------- End of training and uploading the trained model to the Hugging Face Hub --------------------