|
--- |
|
library_name: transformers |
|
tags: |
|
- trl |
|
- sft |
|
license: apache-2.0 |
|
language: |
|
- en |
|
base_model: |
|
- meta-llama/Llama-3.1-8B-Instruct |
|
pipeline_tag: text-generation |
|
--- |
|
----------------------------------------------------------------------------------------------------- |
|
**Remember: this model is for illustration and knowledge purposes only. Only freely available online materials were used throughout the process.**
|
|
|
## Model Details |
|
This model is trained on custom data of interactive sales conversations, stored as an array of objects with `instruction` and `response` keys.
|
- **Parameters:** ~8 billion

- **Quantization:** 4-bit (QLoRA)
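
For illustration, the raw data is shaped roughly like this (the content below is a made-up example; the actual format used for training is shown in the training section further down):

```
sample_examples = [
    {"instruction": "How should I open a cold call?",
     "response": "Introduce yourself briefly, explain why you are calling, and ask an open-ended question."},
]
```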
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
This is the model card of a 🤗 Transformers model that has been pushed to the Hub.
|
|
|
- **Trained by:** vakodiya (Viru Akodiya)

- **Model type:** Text generation

- **License:** apache-2.0

- **Finetuned from model:** meta-llama/Llama-3.1-8B-Instruct
|
|
|
|
|
### Training Data |
|
|
|
The training data was generated specifically by me for this use case.

It consists of just 500 examples; to increase the dataset size, the original data was duplicated, bringing it to 1,000 examples (see the sketch below).
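
As a minimal sketch (assuming the examples live in a Python list such as the `Data_examples` list used in the training section below), the duplication is just:

```
# Data_examples holds ~500 {"instruction": ..., "response": ...} dicts;
# duplicating the list doubles it to ~1,000 training examples.
Data_examples = Data_examples * 2
```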
|
|
|
|
|
#### Compute Infrastructure
|
|
|
- **Hardware Type:** Kaggle GPU (T4 x2)

- **Time used:** 37 minutes

- **Cloud Provider:** Kaggle
|
----------------------------------------------------------------------------------------------------------- |
|
|
|
## Inference (GPU required)
|
------------------------------------------------------------------------------------------------------------ |
|
|
|
# Install Dependencies |
|
``` |
|
%%capture |
|
!pip install transformers accelerate bitsandbytes |
|
``` |
|
``` |
|
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline, AutoConfig |
|
import torch |
|
``` |
|
--------------------------------------------------------------------------------------------------------- |
|
# Load model and Tokenizer |
|
``` |
|
model_name = "vakodiya/Llama-3-8B-instruct-4bit-salesbot" |
|
config = AutoConfig.from_pretrained(model_name) |
|
bnb_config = BitsAndBytesConfig( |
|
load_in_4bit=True, |
|
bnb_4bit_quant_type="nf4", |
|
bnb_4bit_use_double_quant=True, |
|
) |
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
model = AutoModelForCausalLM.from_pretrained( |
|
model_name, |
|
quantization_config=bnb_config, |
|
device_map="auto", |
|
torch_dtype=torch.bfloat16, |
|
) |
|
# Model evaluation mode |
|
model.eval() |
|
``` |
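
Note: `bnb_4bit_compute_dtype` is left unset in this `BitsAndBytesConfig`, so bitsandbytes uses its float32 default for the 4-bit compute dtype; the training configuration later in this card sets it to `torch.bfloat16` explicitly.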
|
------------------------------------------------------------------------------------------------------- |
|
|
|
# Creating the Inference Function
|
``` |
|
def Trained_Llama3_1_inference(prompt):
    model.eval()
    conversation = [
        {"role": "user", "content": prompt},
    ]
    # Build the prompt with the Llama 3.1 chat template
    input_ids = tokenizer.apply_chat_template(
        conversation,
        add_generation_prompt=True,
        return_tensors="pt",
        padding=True,
        truncation=True,
    )
    # Guard against prompts longer than the 8k-token window used here
    if input_ids.shape[1] > 8192:
        return "Input tokens more than 8k"
    inputs = input_ids.to(model.device)
    attention_mask = torch.ones_like(inputs, dtype=torch.long)
    final_prompt = tokenizer.decode(inputs[0])
    outputs = model.generate(
        inputs,
        max_new_tokens=256,
        do_sample=True,  # sampling must be enabled for temperature to take effect
        temperature=0.4,
        attention_mask=attention_mask,
        pad_token_id=tokenizer.pad_token_id,
    )
    response = tokenizer.decode(outputs[0])
    # Strip the prompt and the end-of-turn token from the generated text
    final_response = response.replace(final_prompt, "").replace("<|eot_id|>", "")
    return final_response
|
``` |
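
Note: `do_sample=True` is what makes `temperature=0.4` take effect; with greedy decoding (the default when sampling is disabled), `generate` ignores the temperature setting.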
|
------------------------------------------------------------------------------------------------------------------------------- |
|
# Invoking Inference |
|
``` |
|
Trained_Llama3_1_inference("What are qualities of good Sales-person ?") |
|
``` |
|
---------- End of Inference --------------------
|
|
|
---------------------------------------------------------------------------------------------------------------------------------- |
|
|
|
---------- Start of Training ----------------- |
|
|
|
#### Training (on Kaggle Notebook)

Training was done in a Kaggle Notebook with the GPU accelerator enabled (required for quantized training and inference).
|
|
|
# Install Dependencies |
|
``` |
|
%%capture |
|
!pip install -U transformers[torch] datasets |
|
!pip install -q bitsandbytes trl peft accelerate |
|
!pip install flash-attn --no-build-isolation |
|
!pip install huggingface_hub |
|
``` |
|
|
|
------------------------------------------------------------------------------------------------------------------------------------------ |
|
# Import Modules |
|
``` |
|
from transformers import BitsAndBytesConfig, AutoTokenizer, AutoModelForCausalLM, TrainingArguments |
|
from trl import SFTTrainer |
|
from peft import LoraConfig |
|
from huggingface_hub import notebook_login |
|
import torch |
|
from huggingface_hub import login |
|
from datasets import Dataset |
|
from kaggle_secrets import UserSecretsClient |
|
import os |
|
``` |
|
------------------------------------------------------------------------------------------------------------------------------------------ |
|
|
|
# Remember to generate a token with write access on Hugging Face and add it as a secret in the Kaggle notebook
|
``` |
|
hf_token = UserSecretsClient().get_secret("HF_TOKEN_LLAMA3") |
|
login(token = hf_token) |
|
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # Use only GPU 0 |
|
``` |
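
Note that `CUDA_VISIBLE_DEVICES = "0"` limits the run to a single GPU even though Kaggle's T4 x2 accelerator exposes two T4s.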
|
|
|
------------------------------------------------------------------------------------------------------------------------------------------- |
|
# Remember to customize your own data with at least 1,000 examples
|
``` |
|
Data_examples = [{"instruction":"Who has taken oath as Prime Minister of India in 2024?", "response":"Shri Narendra Modi took the oath as Prime Minister of India on 9th June 2024. He has now become Prime Minister for a third consecutive term."},
|
...................................................................................,] |
|
``` |
|
|
|
------------------------------------------------------------------------------------------------------------------------------------------ |
|
# Process the data into a single `text` field per example
|
``` |
|
processed_data = []
for example in Data_examples:
    processed_data.append({'text': f"{example['instruction']} \n {example['response']}"})
|
|
|
# Create a Dataset from the list of dictionaries |
|
dataset = Dataset.from_list(processed_data) |
|
|
|
# Split into train and test Data sets |
|
|
|
dataset = dataset.train_test_split(test_size=0.01) |
|
# Access train and test splits |
|
|
|
train_dataset = dataset['train'] |
|
test_dataset = dataset['test'] |
|
``` |
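
Note that this simple `instruction \n response` concatenation does not match the Llama 3.1 chat format that the inference code above builds with `apply_chat_template`. If you want the training text in the same chat format, a minimal alternative sketch (an assumption, not what was used for this model; it requires the `tokenizer` loaded in the "Load the Model and Tokenizer" step below) is:

```
# Alternative: format each example with the model's chat template so that training
# text matches the prompt format used at inference time (requires `tokenizer`).
processed_data = []
for example in Data_examples:
    messages = [
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["response"]},
    ]
    processed_data.append({"text": tokenizer.apply_chat_template(messages, tokenize=False)})
```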
|
--------------------------------------------------------------------------------------------------------------------------------------- |
|
|
|
# First, add the base model to the Kaggle notebook by navigating to Add Input and adding Llama 3.1 8B Instruct to your notebook
|
|
|
``` |
|
model_path = "/kaggle/input/llama-3.1/transformers/8b-instruct/2"  # Change according to the model path in your notebook
trained_model_name = "Llama-3-8B-instruct-4bit-finetuned"
output_dir = '/kaggle/working/' + trained_model_name
|
``` |
|
---------------------------------------------------------------------------------------------------------------------------------------- |
|
## Set configs for 4-bit quantization (QLoRA)
|
``` |
|
quantization_config = BitsAndBytesConfig( |
|
load_in_4bit=True, |
|
bnb_4bit_use_double_quant=True, |
|
bnb_4bit_quant_type="nf4", |
|
bnb_4bit_compute_dtype=torch.bfloat16,) |
|
|
|
peft_config = LoraConfig( |
|
r=16, |
|
lora_alpha=16, |
|
lora_dropout=0.1, |
|
bias="none", |
|
task_type="CAUSAL_LM", |
|
target_modules=["q_proj", "k_proj", "v_proj", "o_proj"], |
|
) |
|
``` |
|
----------------------------------------------------------------------------------------------------------------------------------------- |
|
# Load the Model and Tokenizer and set pad token |
|
``` |
|
tokenizer = AutoTokenizer.from_pretrained(model_path) |
|
model = AutoModelForCausalLM.from_pretrained( |
|
model_path, |
|
quantization_config=quantization_config, |
|
device_map="auto") |
|
|
|
# Use eos_token as pad_token |
|
tokenizer.pad_token = tokenizer.eos_token |
|
``` |
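
Llama 3.1's tokenizer does not define a dedicated padding token, which is why the end-of-sequence token is reused for padding here; this is a common convention for causal-LM fine-tuning.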
|
|
|
----------------------------------------------------------------------------------------------------------------------------------------- |
|
# Set Training configurations |
|
``` |
|
training_args = TrainingArguments( |
|
    fp16=False,  # set fp16=True instead (and bf16=False) on GPUs that do not support bfloat16
|
bf16=True, |
|
do_eval=True, |
|
eval_strategy="epoch", |
|
gradient_accumulation_steps=4, |
|
gradient_checkpointing=True, |
|
gradient_checkpointing_kwargs={"use_reentrant": False}, |
|
learning_rate=2.0e-05, |
|
log_level="info", |
|
logging_steps=5, |
|
logging_strategy="steps", |
|
lr_scheduler_type="cosine", |
|
max_steps=-1, |
|
    num_train_epochs=1,  # number of passes over the training dataset
|
output_dir=output_dir, |
|
overwrite_output_dir=True, |
|
    per_device_eval_batch_size=8,  # reduce this if out-of-memory errors occur

    per_device_train_batch_size=8,  # reduce this if out-of-memory errors occur
|
report_to="none", # for skipping wandb logging |
|
save_strategy="no", |
|
save_total_limit=None, |
|
) |
|
``` |
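
With `per_device_train_batch_size=8`, `gradient_accumulation_steps=4`, and the single visible GPU set earlier via `CUDA_VISIBLE_DEVICES`, each optimizer step sees an effective batch of 8 × 4 = 32 examples, so one epoch over the ~990 training examples takes roughly 31 optimizer steps.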
|
-------------------------------------------------------------------------------------------------------------------------------------------- |
|
# Set up the Trainer (supervised fine-tuning)
|
``` |
|
trainer = SFTTrainer( |
|
model=model, # Use above quantized model |
|
args=training_args, |
|
    train_dataset=train_dataset,  # if training fails, try reducing the dataset size
|
eval_dataset=test_dataset, |
|
dataset_text_field="text", |
|
tokenizer=tokenizer, |
|
    packing=False,  # setting True packs multiple short examples into each max_seq_length sequence, reducing the number of training steps
|
peft_config=peft_config, |
|
max_seq_length=1024, |
|
) |
|
``` |
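
Note: passing `dataset_text_field`, `packing`, `max_seq_length`, and `tokenizer` directly to `SFTTrainer` matches the `trl` API current when this notebook was written; in newer `trl` releases most of these arguments moved into `SFTConfig`, so pin your `trl` version if you reproduce this setup. Once the trainer has wrapped the model with the LoRA adapters, an optional one-line check of how many parameters are actually trainable is:

```
# Optional: report trainable vs. total parameters after PEFT wraps the model.
trainer.model.print_trainable_parameters()
```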
|
------------------------------------------------------------------------------------------------------------------------------------------------- |
|
# Note: training may take a long time (several minutes to hours) depending on your dataset size
|
``` |
|
# Clear the CUDA cache left over from any previous unsuccessful run
|
torch.cuda.empty_cache() |
|
|
|
train_result = trainer.train() |
|
``` |
|
------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
|
|
# Save the model in the notebook (to output_dir)
|
``` |
|
trainer.save_model() |
|
``` |
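
Because a `peft_config` was passed to the trainer, `trainer.save_model()` stores only the LoRA adapter weights (plus tokenizer and config files) rather than the full model, which is why the adapters are merged back into the base model in the next step.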
|
------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
|
|
# Merge LoRA with the base model and save the merged model |
|
``` |
|
merged_model = trainer.model.merge_and_unload() |
|
merged_model.save_pretrained("merged_model",safe_serialization=True) |
|
tokenizer.save_pretrained("merged_model") |
|
``` |
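
As an optional sanity check (a sketch, assuming the merged model was saved to the local `merged_model` directory as above and fits in GPU memory), the merged model can be reloaded and queried directly:

```
# Optional: reload the merged model from the local directory and generate a short reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

check_tokenizer = AutoTokenizer.from_pretrained("merged_model")
check_model = AutoModelForCausalLM.from_pretrained(
    "merged_model", device_map="auto", torch_dtype=torch.float16
)
inputs = check_tokenizer("What are the qualities of a good salesperson?", return_tensors="pt").to(check_model.device)
print(check_tokenizer.decode(check_model.generate(**inputs, max_new_tokens=32)[0]))
```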
|
--------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
|
|
# Push the merged model to the Hugging Face Hub (you must already be logged in)
|
``` |
|
merged_model.push_to_hub("username/model_name") |
|
tokenizer.push_to_hub("username/model_name") |
|
``` |
|
------------------- End of training and uploading the trained model to the Hugging Face Hub ----------------------------------